CN115733980A - Video transmission method, system, electronic device, storage medium and chip system - Google Patents


Info

Publication number
CN115733980A
Authority
CN
China
Prior art keywords: video, data, frame, video frames, frames
Legal status: Pending
Application number
CN202111017380.6A
Other languages
Chinese (zh)
Inventor
于野
东巍
陈曦
李扬
李雪晨
朱洲
杨剑
苏诚
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202111017380.6A
Publication of CN115733980A
Status: Pending


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to the technical field of video and provides a video transmission method, system, electronic device, storage medium, and chip system. The method includes: determining two groups of video frames from the video data of a target video, where all video frames within a group have the same resolution, one group has a greater resolution than the other, no frame number in one group appears in the other, and each group contains a plurality of video frames with non-consecutive frame numbers; encoding each group of video frames separately to obtain encoded data corresponding to each group; and transmitting each piece of encoded data. Encoding and transmitting the encoded data at different resolutions can raise the compression rate by 30%-40%, effectively improving the compression of the target video's data, reducing the network bandwidth occupied by its transmission, avoiding waste of network bandwidth resources, and lowering the cost of transmitting the video data.

Description

Video transmission method, system, electronic device, storage medium and chip system
Technical Field
The present application relates to the field of video technologies, and in particular, to a video transmission method, a video transmission system, an electronic device, a storage medium, and a chip system.
Background
With the continuing development of streaming media, the storage space occupied by video data keeps growing, and transmitting that video data over the internet consumes ever more of the already scarce network bandwidth. It is therefore necessary to compress video data.
To reduce the network bandwidth occupied by video transmission, a sending-end device currently compresses the video data to be transmitted according to a video coding rule before sending it to the receiving-end device. However, even the compressed video data still occupies considerable network bandwidth during transmission.
Disclosure of Invention
The application provides a video transmission method, a video transmission system, an electronic device, a storage medium, and a chip system, which address the problem that transmitting video data in the prior art still occupies excessive network bandwidth resources.
To this end, the following technical solutions are adopted:
in a first aspect, a video transmission method is provided, where the method is applied to a video transmission system formed by a sending end device and a receiving end device, and the method includes:
the sending end equipment determines two groups of video frames according to video data of a target video, wherein the resolution of each video frame in each group of video frames is the same, the resolution of one group of video frames is higher than that of the other group of video frames, the frame number of any one video frame in one group of video frames is different from that of any one video frame in the other group of video frames, and each group of video frames comprises a plurality of video frames with discontinuous frame numbers;
the sending end equipment respectively encodes each group of video frames to obtain encoded data corresponding to each group of video frames;
the sending end equipment sends first coded data and second coded data corresponding to the two groups of video frames respectively, wherein the resolution corresponding to the first coded data is greater than the resolution corresponding to the second coded data;
the receiving end equipment receives the first coded data and the second coded data;
the receiving end device decodes the first encoded data and the second encoded data respectively to obtain first decoded data corresponding to the first encoded data and second decoded data corresponding to the second encoded data, wherein the resolution of the first decoded data is greater than that of the second decoded data.
The sending-end device determines two groups of video frames from the video data of the target video, encodes and compresses each group to obtain its corresponding encoded data, and then sends each piece of encoded data to the receiving-end device. Encoding and transmitting the encoded data at different resolutions can raise the compression rate by 30%-40%, effectively improving the compression of the target video's data, reducing the network bandwidth occupied when transmitting it, avoiding waste of network bandwidth resources, and lowering the cost of transmitting the video data.
The receiving-end device decodes the first encoded data and the second encoded data to obtain the first decoded data and the second decoded data, and then synthesizes the first decoded data and the second decoded data to obtain the synthesized video data.
In a first possible implementation manner of the first aspect, the video data of the target video includes: first video data and second video data;
the sending end equipment determines two groups of video frames according to the video data of the target video, and the method comprises the following steps:
the sending end equipment performs down-sampling on the first video data to obtain second video data, wherein the resolution of each video frame in the first video data is higher than that of each video frame in the second video data;
and the sending end equipment respectively selects a plurality of video frames from the first video data and the second video data to obtain the two groups of video frames.
In a second possible implementation manner of the first aspect, the resolutions of the video frames in the video data of the target video are the same;
the sending end equipment determines two groups of video frames according to the video data of the target video, and the method comprises the following steps:
the sending end equipment selects a part of video frames from the video data of the target video to obtain a group of video frames;
and the sending end equipment performs down-sampling on another part of video frames in the video data of the target video to obtain another group of video frames.
Because the sending-end device can acquire the video data of the target video in different ways and encode it in a manner matching the acquisition method, both the flexibility of acquiring the video data and the flexibility of encoding it are improved.
Based on the first or second possible implementation manner of the first aspect, in a third possible implementation manner of the first aspect, in a group of video frames with a higher resolution in the two groups of video frames, a difference between frame numbers of every two adjacent video frames is m, where m is a positive integer greater than 1.
Placing video frames with non-consecutive frame numbers in the higher-resolution group reduces the storage space that group occupies and improves the compression rate achieved when encoding it.
Based on any one of the first to third possible implementation manners of the first aspect, in a fourth possible implementation manner of the first aspect, a plurality of groups of video frames with consecutive frame numbers are included in a group of video frames with a lower resolution of the two groups of video frames, and the number of the video frames included in each group of video frames with consecutive frame numbers is n, where n is a positive integer greater than or equal to 1.
Likewise, arranging video frames with consecutive or non-consecutive frame numbers in the lower-resolution group reduces the storage space that group occupies and improves the compression rate achieved when encoding it.
Based on any one of the foregoing possible implementation manners of the first aspect, in a fifth possible implementation manner of the first aspect, the frame numbers of the video frames obtained by combining the two groups of video frames are consecutive.
The frame numbers of the video frames in the two groups of video frames are continuous, which indicates that the two groups of video frames are complementary, so that the accuracy of recovering the obtained video frames can be improved, and the playing effect of the synthesized video data can be improved.
Based on any one of the foregoing possible implementation manners of the first aspect, in a sixth possible implementation manner of the first aspect, before the sending-end device determines two sets of video frames according to video data of a target video, the method further includes:
the sending end equipment acquires video data of the target video from a storage space;
or the sending end equipment collects the video data of the target video in real time.
The flexibility of acquiring the video data can be improved by acquiring the video data in different modes, and the acquired video data can be encoded by adopting an encoding mode corresponding to the acquiring mode.
Based on any one of the foregoing possible implementation manners of the first aspect, in a seventh possible implementation manner of the first aspect, the sending, by the sending end device, first encoded data and second encoded data corresponding to the two groups of video frames respectively includes:
the sending end equipment sends the first coded data and the second coded data respectively corresponding to the two groups of video frames to the receiving end equipment;
or, the sending end device sends the first encoded data and the second encoded data corresponding to the two sets of video frames to a server, and the server is configured to forward the first encoded data and the second encoded data corresponding to the two sets of video frames to the receiving end device.
By transmitting the encoded data in different ways, the flexibility of transmitting the encoded data can be improved.
In an eighth possible implementation manner of the first aspect, the processing, by the receiving end device, the second decoded data according to the first decoded data to obtain a restored frame with a resolution that is the same as that of the first decoded data includes:
and the receiving-end device processes the video frames of the second decoded data according to the video frames of the first decoded data through a preset artificial intelligence (AI) super-resolution model, to obtain the restored frames.
Because the receiving-end device uses the AI super-resolution model to process each low-resolution video frame according to a high-resolution video frame with an adjacent frame number, the high-frequency spatial-domain information of the high-resolution frames is exploited. This reduces the complexity of the super-resolution network and the resources it needs at run time, so the model can run even on a portable terminal, improving its general applicability.
Based on the eighth possible implementation manner of the first aspect, in a ninth possible implementation manner of the first aspect, the processing, by the receiving end device, the video frames of the second decoded data according to the video frames of the first decoded data through the preset artificial intelligence AI super-resolution model to obtain the restored frames includes:
the receiving-end device inputs a first video frame of the second decoded data and at least one video frame whose frame number is consecutive with that of the first video frame into the AI super-resolution model, to obtain the restored frame corresponding to the first video frame.
When the receiving-end device processes video frames through the AI super-resolution model, the missing frame information can be recovered from the low-resolution video frames, which saves network bandwidth, reduces the complexity of processing the video frames, and improves the accuracy of the restored frames.
In a second aspect, a video transmission method is provided, the method comprising:
determining two groups of video frames according to video data of a target video, wherein the resolution of each video frame in each group of video frames is the same, the resolution of one group of video frames is greater than that of the other group of video frames, the frame number of any one video frame in one group of video frames is different from that of any one video frame in the other group of video frames, and each group of video frames comprises a plurality of video frames with discontinuous frame numbers;
respectively coding each group of video frames to obtain coded data corresponding to each group of video frames;
and transmitting each coded data.
Here, non-consecutive frame numbers mean that the frame numbers of two adjacent video frames are not consecutive; correspondingly, each group of video frames contains adjacent video frames whose frame numbers differ by a positive integer greater than 1.
In a first possible implementation manner of the second aspect, the video data of the target video includes: first video data and second video data;
the determining two groups of video frames according to the video data of the target video comprises:
down-sampling the first video data to obtain second video data, wherein the resolution of each video frame in the first video data is higher than that of each video frame in the second video data;
and respectively selecting a plurality of video frames from the first video data and the second video data to obtain the two groups of video frames.
In a second possible implementation manner of the second aspect, the resolution of each video frame in the video data of the target video is the same;
the determining two groups of video frames according to the video data of the target video comprises:
selecting a part of video frames from the video data of the target video to obtain a group of video frames;
and performing down-sampling on another part of video frames in the video data of the target video to obtain another group of video frames.
Based on the first or second possible implementation manner of the second aspect, in a third possible implementation manner of the second aspect, in a group of video frames with a higher resolution of the two groups of video frames, a difference between frame numbers of every two adjacent video frames is m, where m is a positive integer greater than 1.
Based on any one of the first to third possible implementation manners of the second aspect, in a fourth possible implementation manner of the second aspect, in a group of video frames with a smaller resolution of the two groups of video frames, multiple groups of video frames with consecutive frame numbers are included, and the number of the video frames included in each group of video frames with consecutive frame numbers is n, where n is a positive integer greater than or equal to 1.
Based on any one of the foregoing possible implementation manners of the second aspect, in a fifth possible implementation manner of the second aspect, the frame numbers of the video frames obtained by combining the two groups of video frames are consecutive.
Based on any one of the foregoing possible implementation manners of the second aspect, in a sixth possible implementation manner of the second aspect, before the determining two sets of video frames according to the video data of the target video, the method further includes:
acquiring video data of the target video from a storage space;
or acquiring the video data of the target video in real time.
Based on any one of the foregoing possible implementation manners of the second aspect, in a seventh possible implementation manner of the second aspect, the sending each encoded data includes:
transmitting each coded data to receiving end equipment;
or, each of the encoded data is sent to a server, and the server is configured to forward each of the encoded data to a receiving end device.
In a third aspect, a video transmission method is provided, the method including:
receiving first coded data and second coded data of a target video, wherein the first coded data and the second coded data have different corresponding resolutions;
decoding the first encoded data and the second encoded data respectively to obtain first decoded data corresponding to the first encoded data and second decoded data corresponding to the second encoded data, wherein the resolution of the first decoded data is greater than that of the second decoded data;
processing the second decoded data according to the first decoded data to obtain restored frames with the same resolution as the first decoded data;
and combining the first decoded data and the restored frames to obtain the video data of the target video.
In a first possible implementation manner of the third aspect, the processing, according to the first decoded data, the second decoded data to obtain a restored frame with a resolution that is the same as that of the first decoded data includes:
and processing the video frames of the second decoded data according to the video frames of the first decoded data through a preset AI super-resolution model to obtain the restored frames.
Based on the first possible implementation manner of the third aspect, in a second possible implementation manner of the third aspect, the processing, through the preset AI super-resolution model, the video frames of the second decoded data according to the video frames of the first decoded data to obtain the restored frames includes:
inputting a first video frame of the second decoded data and at least one video frame whose frame number is consecutive with that of the first video frame into the AI super-resolution model, to obtain the restored frame corresponding to the first video frame.
In a fourth aspect, there is provided a video transmission apparatus, the apparatus comprising:
the encoding module is used for determining two groups of video frames according to the video data of the target video, wherein the resolution of each video frame in each group of video frames is the same, the resolution of one group of video frames is higher than that of the other group of video frames, the frame number of any one video frame in one group of video frames is different from that of any one video frame in the other group of video frames, and each group of video frames comprises a plurality of video frames with discontinuous frame numbers;
the encoding module is further used for respectively encoding each group of video frames to obtain encoded data corresponding to each group of video frames;
and the sending module is used for sending each coded data.
In a first possible implementation manner of the fourth aspect, the video data of the target video includes: first video data and second video data;
the encoding module is specifically configured to perform downsampling on the first video data to obtain the second video data, where a resolution of each video frame in the first video data is higher than a resolution of each video frame in the second video data; and respectively selecting a plurality of video frames from the first video data and the second video data to obtain the two groups of video frames.
In a second possible implementation manner of the fourth aspect, the resolutions of the video frames in the video data of the target video are the same;
the encoding module is specifically configured to select a part of video frames from the video data of the target video to obtain a group of video frames; and performing down-sampling on another part of video frames in the video data of the target video to obtain another group of video frames.
Based on the first or second possible implementation manner of the fourth aspect, in a third possible implementation manner of the fourth aspect, in a group of video frames with a higher resolution of the two groups of video frames, a difference between frame numbers of every two adjacent video frames is m, where m is a positive integer greater than 1.
Based on any one of the first to third possible implementation manners of the fourth aspect, in a fourth possible implementation manner of the fourth aspect, in a group of video frames with a smaller resolution of the two groups of video frames, multiple groups of video frames with consecutive frame numbers are included, and the number of the video frames included in each group of video frames with consecutive frame numbers is n, where n is a positive integer greater than or equal to 1.
Based on any one of the foregoing possible implementation manners of the fourth aspect, in a fifth possible implementation manner of the fourth aspect, the frame numbers of the video frames obtained by combining the two groups of video frames are consecutive.
Based on any one of the foregoing possible implementation manners of the fourth aspect, in a sixth possible implementation manner of the fourth aspect, the apparatus further includes:
the acquisition module is used for acquiring the video data of the target video from a storage space;
or the acquisition module is used for acquiring the video data of the target video in real time.
Based on any one of the foregoing possible implementation manners of the fourth aspect, in a seventh possible implementation manner of the fourth aspect, the sending module is specifically configured to send each encoded data to a receiving end device;
or, the sending module is specifically configured to send each encoded data to a server, and the server is configured to forward each encoded data to receiving end equipment.
In a fifth aspect, a video transmission apparatus is provided, the apparatus comprising:
the video coding device comprises a receiving module, a decoding module and a processing module, wherein the receiving module is used for receiving first coded data and second coded data of a target video, and the first coded data and the second coded data have different corresponding resolutions;
a decoding module, configured to decode the first encoded data and the second encoded data respectively to obtain first decoded data corresponding to the first encoded data and second decoded data corresponding to the second encoded data, where a resolution of the first decoded data is greater than a resolution of the second decoded data;
the processing module is configured to process the second decoded data according to the first decoded data to obtain restored frames with the same resolution as the first decoded data;
and the processing module is further configured to combine the first decoded data and the restored frames to obtain the video data of the target video.
In a first possible implementation manner of the fifth aspect, the processing module is specifically configured to process, through a preset AI super-resolution model, the video frames of the second decoded data according to the video frames of the first decoded data, to obtain the restored frames.
Based on the first possible implementation manner of the fifth aspect, in a second possible implementation manner of the fifth aspect, the processing module is specifically configured to input a first video frame of the second decoded data and at least one video frame whose frame number is consecutive with that of the first video frame into the AI super-resolution model, to obtain the restored frame corresponding to the first video frame.
In a sixth aspect, there is provided a video transmission system comprising: a sending-end device and a receiving-end device;
the sending-end device is configured to perform the video transmission method according to any one of the second aspects;
the receiving-end device is configured to perform the video transmission method according to any one of the third aspects.
In a seventh aspect, an electronic device is provided, including: a processor for executing a computer program stored in a memory to implement the video transmission method according to any one of the second or third aspects.
An eighth aspect provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which when executed by a processor, implements the video transmission method according to any one of the second aspect or the third aspect.
In a ninth aspect, a chip system is provided, the chip system comprising a memory and a processor, the processor executing a computer program stored in the memory to implement the video transmission method according to any one of the second aspect or the third aspect.
It is understood that the beneficial effects of the second to ninth aspects can be seen from the description of the first aspect, and are not described herein again.
Drawings
fig. 1 is an architecture diagram of a video transmission system according to an embodiment of the present application;
fig. 2 is a block diagram of a structure in which a sending end device and a receiving end device transmit video data according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a video transmission method according to an embodiment of the present application;
fig. 4A is a schematic diagram of selecting video frames for encoding from video data according to an embodiment of the present application;
fig. 4B is a schematic diagram of another way of selecting video frames for encoding from video data according to an embodiment of the present application;
fig. 4C is a schematic diagram of yet another way of selecting video frames for encoding from video data according to an embodiment of the present application;
fig. 5A is a schematic diagram of grouping video frames in video data according to an embodiment of the present application;
fig. 5B is a schematic diagram of another way of grouping video frames in video data according to an embodiment of the present application;
fig. 5C is a schematic diagram of yet another way of grouping video frames in video data according to an embodiment of the present application;
fig. 6 is a schematic diagram of decoding encoded data according to an embodiment of the present application;
fig. 7 is a schematic flowchart of generating a restored frame through an AI super-resolution model according to an embodiment of the present application;
fig. 8 is a framework diagram of an AI super-resolution model according to an embodiment of the present application;
fig. 9 is a schematic flow chart of another video transmission method provided in the embodiment of the present application;
fig. 10 is a schematic diagram of yet another way of grouping video frames in video data according to an embodiment of the present application;
fig. 11 is a block diagram of a video transmission apparatus according to an embodiment of the present application;
fig. 12 is a block diagram of another video transmission apparatus according to an embodiment of the present application;
fig. 13 is a block diagram of a further video transmission apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known video compression methods, video compression standards, video transmission methods, and electronic devices are omitted so as not to obscure the description of the present application with unnecessary detail.
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise.
First, a system framework of a video transmission system according to an embodiment of the present application will be described. Referring to fig. 1, the system framework of the video transmission system shown in fig. 1 includes: a sending end device 110, a receiving end device 120, and a server 130.
The server 130 may establish communication connections with the sending-end device 110 and the receiving-end device 120, respectively, and the sending-end device 110 may also establish communication connections with the receiving-end device 120. Correspondingly, in the process of transmitting video data, the sending-end device 110 may send the video data to the receiving-end device 120, or may send the video data to the server 130, and then forward the video data to the receiving-end device 120 through the server 130.
The following describes the transmission process by taking as an example the sending-end device 110 transmitting video data of a target video to the receiving-end device 120.
Referring to fig. 2, fig. 2 is a block diagram illustrating a structure of a transmitting end device and a receiving end device for transmitting video data, and as shown in fig. 2, the transmitting end device 110 may include: a data acquisition module 1101 and an encoding module 1102.
The data acquisition module 1101 is connected to the encoding module 1102. The data collection module 1101 may be a camera of the sending end device 110, and the encoding module 1102 may include: an encoder.
Before transmitting the video data of the target video, the sending-end device 110 may first acquire, by using the data acquisition module 1101, first video data of the target video, and then perform down-sampling on the first video data to obtain second video data, where a resolution of the first video data is higher than a resolution of the second video data. Then, the encoder in the encoding module 1102 may perform encoding processing on a portion of the video frame in the first video data and a portion of the video frame in the second video data in parallel in a dual-thread manner, to obtain first encoded data corresponding to the first video data and second encoded data corresponding to the second video data, so that the first encoded data and the second encoded data may be sent to the receiving end device 120 or the server 130.
The above describes encoding and compressing video data of the target video captured in real time. The sending-end device 110 may also hold video data of the target video stored in advance, compress that pre-stored video data, and then send the resulting first encoded data and second encoded data to the receiving-end device 120 or the server 130.
Accordingly, referring to fig. 2, the sending-end device 110 may further include a storage module 1103, which stores the video data of the target video in advance.
In the process of encoding the video data of the pre-stored target video, the sending end device 110 may first process the video data of the target video pre-stored in the storage module 1103 to obtain a plurality of high-resolution continuous video frames, and then group the plurality of continuous video frames according to a preset rule to obtain two groups of video frames. Then, one group of video frames may be down-sampled to obtain a low-resolution video frame, so that a group of high-resolution video frames and a group of low-resolution video frames may be obtained, and then, two groups of video frames with different resolutions may be encoded by the encoder of the encoding module 1102 to obtain first encoded data and second encoded data, so that the first encoded data and the second encoded data may be sent to the receiving end device 120 or the server 130.
Referring to fig. 2, the receiving-end device 120 may include: a decoding module 1201, a synthesizing module 1202, a playing module 1203, and a storage module 1204.
The decoding module 1201 is connected to the synthesizing module 1202 and the storage module 1204, respectively, and the synthesizing module 1202 is further connected to the playing module 1203. Also, the decoding module 1201 may include: a decoder.
After receiving the first encoded data and the second encoded data, which correspond to different resolutions, the receiving-end device 120 may decode them in parallel, in a dual-thread manner, through the decoder of the decoding module 1201, obtaining the first decoded data and the second decoded data, that is, two sets of video frames. The synthesizing module 1202 may then restore the group of low-resolution video frames according to the group of high-resolution video frames using a preset artificial intelligence (AI) super-resolution model, obtaining restored frames, and combine the group of high-resolution video frames with the restored frames to obtain the synthesized video data. Finally, the playing module 1203 may play the synthesized video data.
After the decoding module 1201 decodes the two groups of video frames, the storage module 1204 may store first decoded data and second decoded data respectively formed by the two groups of video frames. Further, the receiving end device 120 may play the first decoded data and the second decoded data respectively.
It should be noted that fig. 2 illustrates a process that the sending end device 110 may send the first encoded data and the second encoded data to the receiving end device 120 by using a point-to-point transmission method. In practical applications, the sending-end device 110 may also send the first encoded data and the second encoded data to the server 130 first. After that, the server 130 may forward the received first encoded data and the second encoded data to the receiving end device 120, and a manner of sending the first encoded data and the second encoded data to the receiving end device 120 is not limited in this embodiment of the application.
Further, in the video transmission system provided in the embodiments of the present application, the sending-end device 110 may transmit currently collected video data in real time; for example, the video transmission system may be applied in video call, teleconference, and live-streaming scenarios.
Of course, in the video transmission system provided in the embodiment of the present application, the sending-end device 110 may also transmit the video data stored in advance, and the application time and the application scene of the video transmission system are not limited in the embodiment of the present application.
In addition, the above is described by taking an example that the video data of the target video includes video data of two different resolutions, and in practical applications, the sending-end device 110 may obtain video data corresponding to multiple resolutions of the target video.
Moreover, the sending-end device 110 may also perform downsampling on the video data of the target video to obtain multiple groups of video frames corresponding to multiple resolutions, and in the embodiment of the present application, there is no limitation on the multiple resolutions obtained by downsampling the sending-end device 110, and there is no limitation on the number of the multiple groups of video frames obtained by encoding and compressing.
For the sake of simplicity, the following description takes an example that the transmitting-end device 110 performs encoding and compression to obtain encoded data corresponding to two resolutions, that is, the encoded data transmitted to the receiving-end device 120 by the transmitting-end device 110 includes first encoded data and second encoded data.
Fig. 3 is a schematic flowchart of a video transmission method provided in an embodiment of the present application, and by way of example and not limitation, the method may be applied to the sending end device and the receiving end device, and referring to fig. 3, the method includes:
step 301, the sending end device obtains video data of the target video.
The sending-end device can acquire video data in various ways: specifically, it can collect the video data of the target video in real time, or it can obtain pre-stored video data of the target video from a preset storage space.
The sending end device may obtain the video data of the target video in any one of the following manners, which is referred to as the following manner one and manner two:
in the first mode, the sending terminal device collects video data of a target video.
While conducting a video call, teleconference, or live video broadcast with the receiving-end device, the sending-end device can collect video data of the target video in real time through a preset camera. Moreover, to facilitate the encoding and compression of the video data in the subsequent steps, the sending-end device may simultaneously obtain video data of multiple resolutions for the target video through an adapted application programming interface (API).
For example, during a video call the sending-end device may capture the images shot by the camera in real time and simultaneously output video data at two resolutions, namely first video data at a high resolution and second video data at a low resolution, where the second video data is obtained by down-sampling the first video data.
It should be noted that the camera used to collect the video data of the target video may be a camera built into the sending-end device itself or an external camera connected to the sending-end device, for example a mobile phone's built-in camera or an external camera attached to a computer.
And secondly, the sending terminal equipment acquires the video data of the pre-stored target video.
The sending end device can send the video data of the target video collected in real time to the receiving end device, and can also send the video data of the target video stored in advance to the receiving end device. If the sending end device needs to send the video data of the pre-stored target video to the receiving end device, the sending end device may obtain the video data of the target video from a pre-set storage space.
For example, the sending end device may detect an operation triggered by a user, determine a storage path of video data of a target video for transmission according to the operation triggered by the user, and then may obtain the video data of the target video from a corresponding storage space according to the storage path.
It should be noted that, in practical application, the storage space for the sending end device to obtain the video data may be a storage space of the sending end device, an external storage device connected to the sending end device, or a server connected to the sending end device, that is, a cloud storage space.
Step 302, the sending end device performs coding compression on the acquired video data of the target video to obtain first coded data and second coded data.
The first coded data is high-resolution video data after coding compression, and the second coded data is low-resolution video data after coding compression.
In step 301, the sending end device may obtain the video data of the target video in different manners, and corresponding to step 301, in step 302, the sending end device may perform encoding and compression in different manners according to the different obtaining manners of the video data of the target video, so as to obtain the first encoded data and the second encoded data.
Similarly to step 301, the sending-end device may also compress the acquired video data of the target video in any one of the following manners, see manner one and manner two below, where manner one of step 302 corresponds to manner one of step 301, and manner two of step 302 corresponds to manner two of step 301.
In the first mode, the sending end device encodes partial video frames in the first video data and the second video data according to a preset rule to obtain first encoded data and second encoded data.
After the first video data and the second video data of the target video are acquired by the camera, the sending end device can respectively encode partial video frames in the first video data and the second video data through a preset encoder, so that first encoded data generated by the first video data and second encoded data generated by the second video data can be obtained.
In the process of encoding partial video frames in the first video data and the second video data, the sending end device may select, according to a video frame number used for indicating a sequence of each video frame in the first video data, the partial video frame of the first video data according to a preset rule to obtain a group of video frames, and then encode, by using an encoder, the selected group of video frames to obtain first encoded data.
The sending end device may select a part of the video frames in the second video data to obtain another group of video frames, and then the encoder may also encode the another group of video frames in a manner similar to the above encoding process to obtain the second encoded data.
For example, referring to figs. 4A, 4B, and 4C, each figure shows the video frames of first video data and second video data; each includes 10 video frames, and the video frames of the first video data have a higher resolution than those of the second video data. When selecting video frames for the encoder, referring to fig. 4A, the sending-end device may select the 1st, 3rd, 5th, 7th, and 9th frames of the first video data, and the 2nd, 4th, 6th, 8th, and 10th frames of the second video data; alternatively, referring to fig. 4B, it may select the 1st, 4th, 7th, and 10th frames of the first video data, and the 2nd, 3rd, 5th, 6th, 8th, and 9th frames of the second video data; or, referring to fig. 4C, it may select the 1st, 5th, and 9th frames of the first video data, and the 2nd, 3rd, 4th, 6th, 7th, 8th, and 10th frames of the second video data. The sending-end device may then encode the frames selected from the first video data through the encoder to obtain the first encoded data, while encoding the frames selected from the second video data to obtain the second encoded data.
It should be noted that the sending-end device may also select the video frame in other manners according to a preset rule, and the manner of selecting the video frame is not limited in the embodiment of the present application.
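As a concrete illustration, the following minimal sketch (Python; the function name and data layout are illustrative, not taken from the application) reproduces the stride-based selections of figs. 4A to 4C: with m = 2 it assigns frames 1, 3, 5, 7, 9 to the high-resolution group, with m = 3 frames 1, 4, 7, 10, and with m = 4 frames 1, 5, 9.

```python
# A minimal sketch, assuming `frames` is a list whose first element is frame
# number 1, of the stride-m split illustrated in figs. 4A-4C.
def split_frames(frames, m=2):
    """Return (high_res_group, low_res_group) as lists of (frame_number, frame)."""
    high, low = [], []
    for number, frame in enumerate(frames, start=1):
        (high if number % m == 1 else low).append((number, frame))
    return high, low
```

Any rule that keeps the two groups complementary would serve equally well; the stride pattern above merely matches the figures.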
And secondly, the sending end equipment performs down-sampling on partial video frames in the obtained video data of the target video according to a preset rule to obtain two groups of video frames with different resolutions, and then respectively encodes the two groups of video frames to obtain first encoded data and second encoded data.
After the sending end device obtains the video data of the pre-stored target video from the storage space, it may perform down-sampling on a part of video frames in the video data of the pre-stored target video to obtain two sets of video frames with different resolutions, so that the two sets of video frames may be compressed by a preset encoder to obtain the first encoded data and the second encoded data.
Specifically, the sending end device may first group, according to a preset rule, each video frame by combining the video frame number of each video frame in the pre-stored video data, to obtain two groups of video frames with the same resolution. Then, the sending end device may perform downsampling on one group of video frames according to a preset rule to obtain a group of low-resolution video frames, and may obtain two groups of video frames with different resolutions by combining with a group of video frames that are not downsampled.
Then, the sending end device may use a similar manner to the first manner to encode and compress the two sets of video frames by the encoder, respectively, so as to obtain the first encoded data and the second encoded data.
It should be noted that, in the two sets of video frames acquired in the first and second modes, all video frames within a set have the same resolution, and the resolution of one set is greater than that of the other. However, the frame number of any video frame in one set differs from the frame number of any video frame in the other set, and each set may include a plurality of video frames with non-consecutive frame numbers.
Moreover, the frame numbers within the higher-resolution set are never consecutive, while the frame numbers within the lower-resolution set may or may not be consecutive. To improve the compression rate of the video data and reduce the storage space and network bandwidth occupied by the compressed data, the difference between the frame numbers of every two adjacent video frames in the higher-resolution set is m, where m is a positive integer greater than 1; the lower-resolution set may contain runs of video frames with consecutive frame numbers, each run containing n video frames, where n is a positive integer greater than or equal to 1 (for example, 1 ≤ n ≤ 3). Here, two video frames are adjacent within a set when no other frame of that set lies between them; for example, in a set containing the frames numbered 1, 3, and 5, frames 1 and 3 are adjacent, as are frames 3 and 5.
Each video frame corresponding to the first encoded data is complementary to each video frame corresponding to the second encoded data. That is, after combining each video frame corresponding to the first encoded data with each video frame corresponding to the second encoded data, each video frame with consecutive video frame numbers can be obtained by combining, and each consecutive video frame number corresponds to the video frame number of the target video one to one.
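The constraints just described (gap m within the higher-resolution group, runs of at most n in the lower-resolution group, and complementary coverage) can be checked mechanically. The sketch below is illustrative only; it assumes frame numbers start at 1 and that each group is given as a sorted list of frame numbers:

```python
# A minimal sketch checking the m/n structure and complementarity described above.
def check_grouping(high_nums, low_nums, m, n):
    # higher-resolution group: every two adjacent frame numbers differ by m (> 1)
    assert all(b - a == m for a, b in zip(high_nums, high_nums[1:]))
    # lower-resolution group: runs of consecutive frame numbers no longer than n
    run = 1
    for a, b in zip(low_nums, low_nums[1:]):
        run = run + 1 if b == a + 1 else 1
        assert run <= n
    # complementarity: together the groups cover consecutive frame numbers 1..N
    total = len(high_nums) + len(low_nums)
    assert sorted(high_nums + low_nums) == list(range(1, total + 1))
```

For instance, the grouping of fig. 4B passes with m = 3 and n = 2, and that of fig. 4C with m = 4 and n = 3.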
For example, referring to figs. 5A, 5B, and 5C, each figure shows the video frames of pre-stored video data comprising 10 video frames. The sending-end device may group these 10 frames according to a preset rule: referring to fig. 5A, it may take the 1st, 3rd, 5th, 7th, and 9th frames as the first group and the 2nd, 4th, 6th, 8th, and 10th frames as the second group; alternatively, referring to fig. 5B, it may take the 1st, 4th, 7th, and 10th frames as the first group and the 2nd, 3rd, 5th, 6th, 8th, and 9th frames as the second group; or, referring to fig. 5C, it may take the 1st, 5th, and 9th frames as the first group and the 2nd, 3rd, 4th, 6th, 7th, 8th, and 10th frames as the second group. The sending-end device may then down-sample the second group to obtain a group of low-resolution video frames. Finally, it may encode the first group through the encoder to obtain the first encoded data, and encode the down-sampled second group to obtain the second encoded data.
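For the pre-stored path, a minimal sketch of this grouping-then-downsampling step might look as follows, reusing `split_frames` from the earlier sketch and assuming OpenCV is available for resizing (the scale factor 0.5 is an assumption; the application does not fix one):

```python
import cv2  # assumed available; any resampling routine would do

def prepare_groups(frames, m=2, scale=0.5):
    high, low = split_frames(frames, m)  # from the earlier sketch
    # down-sample only the second group before encoding
    low_ds = [(n, cv2.resize(f, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_AREA))
              for n, f in low]
    return high, low_ds
```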
It should be noted that, in practical applications, the sending-end device in the first mode of step 301 may also only collect the high-resolution video data, and then perform encoding compression through the second mode of step 302.
Step 303, the sending end device sends the first encoded data and the second encoded data to the receiving end device.
After obtaining the first encoded data and the second encoded data, the sending end device may send the first encoded data and the second encoded data to the receiving end device in a dual-channel manner under the condition of occupying a lower network bandwidth resource.
Step 304, the receiving end device receives the first encoded data and the second encoded data.
Step 305, the receiving end device decodes the first encoded data and the second encoded data respectively to obtain first decoded data and second decoded data.
After receiving the first encoded data and the second encoded data, the receiving end device may decode the first encoded data through a preset decoder to obtain first decoded data corresponding to the first encoded data, and decode the second encoded data through the decoder to obtain second decoded data corresponding to the second encoded data, that is, to obtain video frames corresponding to the first encoded data and the second encoded data, respectively.
Meanwhile, the receiving-end device can store the first decoded data and the second decoded data, saving storage space on the receiving-end device. Accordingly, if the receiving-end device needs to play the high-resolution video data corresponding to the first decoded data and the second decoded data, it may perform steps 306 and 307 to play according to the first decoded data and the second decoded data.
It should be noted that, while receiving the encoded data, the receiving-end device may decode it in real time to obtain decoded video frames and then assemble the decoded data from them. For example, referring to fig. 6, after receiving part of the first encoded data and part of the second encoded data, the receiving-end device may decode what it has received, obtaining the 1st and 3rd frames from the partial first encoded data and the 2nd and 4th frames from the partial second encoded data, while continuing to receive the remaining first and second encoded data from the sending-end device.
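A minimal sketch of this incremental decoding, assuming a PyAV-style decoder whose decode(packet) returns whatever frames the packet completes (a real decoder may buffer frames internally):

```python
def decode_stream(decoder, packets):
    """Yield decoded frames as soon as they become available."""
    for packet in packets:              # packets arrive over the network
        for frame in decoder.decode(packet):
            yield frame                 # usable before the stream ends
```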
And step 306, the receiving end device synthesizes the first decoded data and the second decoded data to obtain synthesized video data.
After the receiving-end device decodes the first encoded data and the second encoded data into the first decoded data and the second decoded data, it may restore the video frames contained in the second decoded data according to the high-resolution video frames contained in the first decoded data through a preset AI super-resolution model, obtaining high-resolution restored frames. The receiving-end device may then combine the high-resolution video frames of the first decoded data with the high-resolution restored frames to obtain video data of the target video composed of a plurality of consecutive high-resolution video frames.
Specifically, referring to fig. 7, which shows the flow by which the receiving-end device generates a high-resolution restored frame through the AI super-resolution model: the model contains a plurality of neural-network convolution layers. To restore a first video frame of the second decoded data, the receiving-end device may select, from the high-resolution video frames of the first decoded data and according to the frame number of the first video frame, a high-resolution video frame whose frame number is consecutive with it. The receiving-end device may then input the first video frame and the selected high-resolution frame into the preset AI super-resolution model, which restores the first video frame with the help of the high-resolution frame to produce the corresponding high-resolution restored frame.
The first video frame may be any video frame of the second decoded data. The high-resolution video frame whose frame number is consecutive with that of the first video frame may be the frame immediately preceding or immediately following the low-resolution frame; for example, if the frame number of the first video frame is 2, the consecutive frame number may be 1 or 3, which is not limited in the embodiments of the present application.
After restoring each low-resolution video frame corresponding to the second decoded data, the receiving end device may perform sorting and combining on each high-resolution video frame and each restored frame according to the video frame number of each high-resolution video frame corresponding to the first decoded data and the video frame number of each low-resolution video frame corresponding to each restored frame, so as to obtain the high-resolution synthesized video data.
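Putting the restoration and merging together, a minimal sketch (names are illustrative; `sr_model(ref, lo)` stands in for the AI super-resolution model) could look like this:

```python
def synthesize(hi_frames, lo_frames, sr_model):
    """hi_frames / lo_frames: dicts mapping frame number -> decoded frame."""
    refs = dict(hi_frames)        # reference pool; grows as frames are restored
    restored = {}
    for num in sorted(lo_frames):
        # use the neighbouring high-resolution frame or, within a run of
        # consecutive low-resolution frames, an already-restored neighbour
        ref = refs[num - 1] if (num - 1) in refs else refs[num + 1]
        restored[num] = refs[num] = sr_model(ref, lo_frames[num])
    merged = {**hi_frames, **restored}
    return [merged[n] for n in sorted(merged)]   # consecutive frame numbers
```

Feeding already-restored neighbours back into the reference pool is one way to handle the runs of consecutive low-resolution frames described at the end of this section.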
Further, referring to fig. 8, which shows the framework of the AI super-resolution model: the model receives, through its first input (input_1), one high-resolution video frame with dimensions 1 × 540 × 960 × 6, where 1 is the number of video frames, 540 × 960 is the resolution, and 6 is the number of channels; the dimensions of the other tensors below are read the same way. Similarly, the model receives, through its second input (input_2), one low-resolution video frame with dimensions 1 × 270 × 480 × 6. The model first applies a space-to-depth operation (SpaceToDepth) to the high-resolution frame, producing a tensor of 1 × 270 × 480 × 24, and then concatenates the converted high-resolution frame with the low-resolution frame along the channel axis to obtain a combined tensor of 1 × 270 × 480 × 30.
Then, the AI hyper-resolution model performs multiple convolutions (Conv2D) on the associated video frame, each followed by an activation function (ReLU), superimposes (Add) the convolved video frame on the associated video frame, performs a depth-to-space conversion operation (DepthToSpace) on the superimposed video frame, and finally outputs (Identity) a restored video frame with parameters of 1 × 540 × 960.
When the video frames are convolved by the convolution layers, the convolution kernel parameters differ from layer to layer. Referring to fig. 8, when the first convolution is performed on the video frame with parameters of 1 × 270 × 480 × 30, the convolution kernel (filter) parameters are 6 × 3 × 30 and the bias parameter is 6, yielding a video frame with parameters of 1 × 270 × 480 × 6; the second convolution, with kernel parameters of 12 × 3 × 6 and a bias parameter of 12, yields a video frame with parameters of 1 × 270 × 480 × 12; the third convolution, with kernel parameters of 24 × 1 × 12 and a bias parameter of 24, yields a video frame with parameters of 1 × 270 × 480 × 24.
In addition, the AI hyper-resolution model may convolve the video frame with parameters of 1 × 270 × 480 × 30 (i.e., the associated video frame) again along a second branch, with kernel parameters of 24 × 1 × 30 and a bias parameter of 24, to obtain another video frame with parameters of 1 × 270 × 480 × 24, so that the two video frames with parameters of 1 × 270 × 480 × 24 can be superimposed to obtain the superimposed video frame.
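For readers who prefer code, the following PyTorch sketch reproduces the structure of fig. 8. It is an illustration under stated assumptions, not the patent's implementation: the class and layer names are invented, the 3 × 3 and 1 × 1 kernel sizes are inferred from the filter parameters above, and pixel_unshuffle/pixel_shuffle stand in for the SpaceToDepth and DepthToSpace operations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperResolutionNet(nn.Module):
    """Minimal sketch of the 4-convolution super-resolution network of fig. 8."""

    def __init__(self):
        super().__init__()
        # Main branch: 30 -> 6 -> 12 -> 24 channels; kernel sizes inferred
        # from the filter parameters 6x3x30, 12x3x6 and 24x1x12.
        self.conv1 = nn.Conv2d(30, 6, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(6, 12, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(12, 24, kernel_size=1)
        # Second branch: 30 -> 24 channels (filter parameter 24x1x30).
        self.skip = nn.Conv2d(30, 24, kernel_size=1)

    def forward(self, hi, lo):
        # hi: (1, 6, 540, 960) high-resolution frame
        # lo: (1, 6, 270, 480) low-resolution frame to be restored
        hi_small = F.pixel_unshuffle(hi, 2)    # SpaceToDepth -> (1, 24, 270, 480)
        x = torch.cat([hi_small, lo], dim=1)   # splice -> (1, 30, 270, 480)
        y = F.relu(self.conv1(x))              # Conv2D + ReLU
        y = F.relu(self.conv2(y))
        y = self.conv3(y)                      # -> (1, 24, 270, 480)
        y = y + self.skip(x)                   # Add of the two branches
        return F.pixel_shuffle(y, 2)           # DepthToSpace -> (1, 6, 540, 960)
```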
Moreover, the AI hyper-resolution model in the above example may include only 4 convolution layers with small channel counts, so the required amount of computation is low. Therefore, even when the receiving end device is a portable mobile device such as a mobile phone or a tablet, it can meet the computing requirements of the AI hyper-resolution model and obtain the restored frames.
It should be noted that each pixel in the video frames is represented in YUV format. Correspondingly, before the video frames are input into the AI hyper-resolution model, the YUV data of each video frame may be converted into 6-channel YUV data, where the first 4 channels carry the Y data indicating brightness, and the last 2 channels carry the U data and V data indicating color and saturation, respectively.
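One plausible packing consistent with this channel layout is to apply a space-to-depth rearrangement to the full-resolution luma plane of a YUV 4:2:0 frame, so that all six channels share the chroma resolution. The NumPy sketch below illustrates this; the function name and the space-to-depth assumption are illustrative, not spelled out in the patent.

```python
import numpy as np

def yuv420_to_6ch(y, u, v):
    """Pack a YUV 4:2:0 frame into a 6-channel layout (assumed scheme).

    y: (H, W) luma plane; u, v: (H/2, W/2) chroma planes.
    Returns an array of shape (H/2, W/2, 6): four space-to-depth Y
    channels followed by the U and V channels.
    """
    h, w = y.shape
    # Space-to-depth on Y: each 2x2 block becomes 4 channels at half resolution.
    y4 = (y.reshape(h // 2, 2, w // 2, 2)
           .transpose(0, 2, 1, 3)
           .reshape(h // 2, w // 2, 4))
    return np.concatenate([y4, u[..., None], v[..., None]], axis=-1)
```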
In addition, the above description takes restoring one low-resolution video frame from one high-resolution video frame as an example; in practical applications, the receiving end device may input a plurality of high-resolution video frames into the AI hyper-resolution model to restore one low-resolution video frame. For example, if the video frame number of the low-resolution video frame to be restored is 2, the video frame numbers of the high-resolution video frames input into the AI hyper-resolution model may be 1 and 3, which is not limited in the embodiment of the present application.
In addition, if the video frames of the second decoded data include video frames with consecutive frame numbers, then in the process of restoring these consecutive video frames, the receiving end device may restore them using an adjacent video frame included in the first decoded data, or may restore the remaining video frames in the run using the adjacent restored frames. The process of restoring the consecutive video frames is similar to the process of restoring a single low-resolution video frame, and is not described here again.
For example, if the first decoded data includes a video frame with frame number 1, and the second decoded data includes 3 consecutive video frames with frame numbers 2, 3, and 4, the receiving end device may input the video frames with frame numbers 1, 2, 3, and 4 into the AI hyper-resolution model, and the AI hyper-resolution model may process the video frames with frame numbers 2, 3, and 4 to obtain the restored frames corresponding to each of them.
Alternatively, the receiving end device may first input the video frames with frame numbers 1 and 2 into the AI hyper-resolution model, which processes the video frame with frame number 2 according to the video frame with frame number 1 to obtain the restored frame corresponding to frame 2; it may then input the video frame with frame number 3 and process it using the restored frame corresponding to frame 2 to obtain the restored frame corresponding to frame 3. The receiving end device may continue to restore the video frame with frame number 4 in a similar manner, thereby completing the restoration of all video frames with consecutive frame numbers.
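This sequential variant reduces to a short loop. The sketch below is illustrative only: sr_model stands in for the AI hyper-resolution model, assumed to take a reference frame and a low-resolution frame and return a restored frame.

```python
def restore_consecutive(sr_model, reference, low_res_run):
    """Restore a run of consecutive low-resolution frames.

    reference: the adjacent high-resolution frame (e.g. frame 1);
    low_res_run: low-resolution frames with consecutive numbers (e.g. 2, 3, 4).
    Each restored frame becomes the reference for the next one.
    """
    restored = []
    for lo in low_res_run:
        reference = sr_model(reference, lo)
        restored.append(reference)
    return restored
```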
Step 307, the receiving end device plays the synthesized video data.
After obtaining the synthesized video data according to the first decoded data and the second decoded data, the receiving end device can play the synthesized video data, so that high-resolution video data is obtained while occupying less network bandwidth and less storage space.
In practical applications, the receiving end device may not only play the video data synthesized from the first decoded data and the second decoded data, but may also play the stored first decoded data or second decoded data separately, which is not limited in the embodiment of the present application.
In addition, referring to fig. 9, the sending end device may also send the first encoded data and the second encoded data to the server, and the server forwards the first encoded data and the second encoded data to the receiving end device.
Further, the sending end device may send the video data of the high-resolution target video to the server without encoding and compressing it; the server then encodes and compresses the video data of the target video to obtain the first encoded data and the second encoded data, and sends them to the receiving end device.
It should be noted that the foregoing embodiment only describes the sending end device sending two pieces of compressed data with different resolutions to the receiving end device. In practical applications, the sending end device may also first obtain three or more groups of video frames with different resolutions, compress each group of video frames, and then send each piece of compressed data to the receiving end device.
For example, referring to fig. 10, fig. 10 shows the video frames of pre-stored video data comprising 10 video frames. The sending end device may divide the 10 video frames into 3 groups according to a preset rule: for example, the 1st, 5th, and 9th frames may form the first group; the 2nd, 4th, 6th, 8th, and 10th frames the second group; and the 3rd and 7th frames the third group. Then, the sending end device may down-sample the second group and the third group respectively to obtain two groups of low-resolution video frames, where the resolution of the third group is lower than that of the second group. Finally, the sending end device may encode the first group, the down-sampled second group, and the down-sampled third group respectively through an encoder to obtain first encoded data, second encoded data, and third encoded data.
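For illustration, this split of fig. 10 can be written out in a few lines of Python; the modulo tests below are one possible encoding of the preset rule, assumed here rather than specified by the patent.

```python
frames = list(range(1, 11))                  # frame numbers 1..10
group1 = [f for f in frames if f % 4 == 1]   # [1, 5, 9]        kept at full resolution
group2 = [f for f in frames if f % 2 == 0]   # [2, 4, 6, 8, 10] down-sampled
group3 = [f for f in frames if f % 4 == 3]   # [3, 7]           down-sampled further
```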
Correspondingly, the receiving end device may receive the plurality of pieces of encoded data sent by the sending end device and decode them to obtain a plurality of groups of video frames with different resolutions. Then, through the preset AI hyper-resolution model, the lower-resolution video frames are restored based on the higher-resolution video frames to obtain the restored frames, thereby improving the accuracy of each restored frame.
To sum up, according to the video transmission method provided by the embodiment of the present application, the sending end device determines two groups of video frames according to the video data of the target video, encodes and compresses each group of video frames to obtain the encoded data corresponding to each group, and then sends each piece of encoded data to the receiving end device. By encoding and transmitting the pieces of encoded data with different resolutions, the compression rate can be improved by 30% to 40%, thereby effectively improving the compression rate of the video data of the target video, reducing the network bandwidth resources occupied when transmitting the video data, avoiding the waste of network bandwidth resources, and reducing the cost of transmitting the video data.
The receiving end device decodes the first encoded data and the second encoded data to obtain the first decoded data and the second decoded data, and then synthesizes them to obtain the synthesized video data.
Similarly, the server does not need to store the video data of multiple resolutions of the target video, so that the storage space occupied by the video data of the target video in the server can be reduced, the utilization rate of the storage space of the server is improved, and the maintenance cost of the server is reduced.
In addition, the sending end device may acquire the video data of the target video in different modes and encode it in a mode corresponding to the acquisition mode, which improves the flexibility of both acquiring and encoding the video data of the target video.
In addition, the process of encoding to obtain the first encoded data and the second encoded data, and the process of decoding to obtain the first decoded data and the second decoded data, are independent of the standard encoding and decoding stage and are therefore compatible with any standard codec technology, improving the compatibility of video data transmission.
Furthermore, the receiving end device synthesizes, through the preset AI hyper-resolution model, the first decoded data and the second decoded data whose video frames complement each other, which can improve the accuracy of the restored video frames and thus the playing effect of the synthesized video data.
Moreover, the receiving end device processes each low-resolution video frame according to a high-resolution video frame with an adjacent frame number through the AI hyper-resolution model, and uses the high-frequency spatial information of the high-resolution video frame to reduce the complexity of the super-resolution network. This lowers the operating requirements of the AI hyper-resolution model, so that it can run on portable terminals, improving its universality.
In addition, in the process of processing the video frames through the AI hyper-resolution model, the receiving end device can obtain the missing frame information from the low-resolution video frames, which saves network bandwidth resources, reduces the complexity of processing the video frames, and improves the accuracy of the restored frames.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 11 is a block diagram of a video transmission apparatus according to an embodiment of the present application, which corresponds to the video transmission method described in the foregoing embodiment, and only the relevant portions of the embodiment of the present application are shown for convenience of illustration.
Referring to fig. 11, the apparatus includes:
the encoding module 1101 is configured to determine two groups of video frames according to video data of a target video, where the resolution of each video frame in each group of video frames is the same, and the resolution of one group of video frames is greater than that of another group of video frames, a frame number of any one video frame in one group of video frames is different from that of any one video frame in another group of video frames, and each group of video frames includes multiple video frames with discontinuous frame numbers;
the encoding module 1101 is further configured to encode each group of video frames respectively to obtain encoded data corresponding to each group of video frames;
a sending module 1102, configured to send each encoded data.
Optionally, the video data of the target video includes: first video data and second video data;
the encoding module 1101 is specifically configured to perform downsampling on the first video data to obtain the second video data, where a resolution of each video frame in the first video data is higher than a resolution of each video frame in the second video data; and respectively selecting a plurality of video frames from the first video data and the second video data to obtain the two groups of video frames.
Optionally, the resolution of each video frame in the video data of the target video is the same;
the encoding module 1101 is specifically configured to select a part of video frames from the video data of the target video to obtain a group of video frames; and performing down-sampling on another part of video frames in the video data of the target video to obtain another group of video frames.
Optionally, in a group of video frames with a higher resolution of the two groups of video frames, a difference between frame numbers of every two adjacent video frames is m, where m is a positive integer greater than 1.
Optionally, the group of video frames with the lower resolution of the two groups includes multiple groups of video frames with consecutive frame numbers, where the number of video frames included in each group of video frames with consecutive frame numbers is n, and n is a positive integer greater than or equal to 1.
Optionally, the frame numbers of the video frames obtained by combining the two groups of video frames are continuous.
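Taken together, these optional features can be illustrated with a short sketch (illustrative only; the patent does not prescribe this selection code): picking every m-th frame for the higher-resolution group leaves runs of n = m − 1 consecutive frame numbers for the lower-resolution group, and the two groups together cover a continuous frame-number range.

```python
def split_groups(frame_numbers, m):
    # Every m-th frame stays at high resolution; the rest form the
    # low-resolution group, in runs of n = m - 1 consecutive numbers.
    hi = frame_numbers[::m]
    lo = [f for f in frame_numbers if f not in hi]
    return hi, lo

hi, lo = split_groups(list(range(1, 11)), 3)
# hi = [1, 4, 7, 10]; lo = [2, 3, 5, 6, 8, 9] (runs of length 2)
```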
Optionally, referring to fig. 12, the apparatus further includes:
an obtaining module 1103, configured to obtain video data of the target video from a storage space;
or, the obtaining module 1103 is configured to collect video data of the target video in real time.
Optionally, the sending module 1102 is specifically configured to send each encoded data to a receiving end device;
or, the sending module 1102 is specifically configured to send each encoded data to a server, and the server is configured to forward each encoded data to the receiving end device.
Fig. 13 is a block diagram of another video transmission apparatus according to the embodiment of the present application, and only a portion related to the embodiment of the present application is shown for convenience of description.
Referring to fig. 13, the apparatus includes:
a receiving module 1301, configured to receive first encoded data and second encoded data of a target video, where resolutions of the first encoded data and the second encoded data are different;
a decoding module 1302, configured to decode the first encoded data and the second encoded data respectively to obtain first decoded data corresponding to the first encoded data and second decoded data corresponding to the second encoded data, where a resolution of the first decoded data is greater than a resolution of the second decoded data;
a processing module 1303, configured to process the second decoded data according to the first decoded data to obtain a restored frame with the same resolution as that of the first decoded data;
the processing module 1303 is further configured to combine the first decoded data and the restored frame to obtain video data of the target video.
Optionally, the processing module 1303 is specifically configured to process the video frames of the second decoded data according to the video frames of the first decoded data through a preset AI hyper-resolution model, so as to obtain the restored frames.
Optionally, the processing module 1303 is specifically configured to input a first video frame of the second decoded data and at least one video frame whose frame number is consecutive to that of the first video frame into the AI hyper-resolution model, so as to obtain a restored frame corresponding to the first video frame.
To sum up, in the video transmission apparatus provided by the embodiment of the present application, the sending end device encodes and compresses the video data of the target video to obtain high-resolution first encoded data and low-resolution second encoded data and sends them to the receiving end device. By compressing and transmitting the first encoded data and the second encoded data with different resolutions, the compression rate can be improved by 30% to 40%, thereby effectively improving the compression rate of the video data of the target video, reducing the network bandwidth resources occupied when transmitting the video data, avoiding the waste of network bandwidth resources, and reducing the cost of transmitting the video data.
The sending end device and the receiving end device in the embodiments of the present application are described below by taking an electronic device as an example. Referring to fig. 14, fig. 14 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
The electronic device may include a processor 1410, an external memory interface 1420, an internal memory 1421, a Universal Serial Bus (USB) interface 1430, a charging management module 1440, a power management module 1441, a battery 1442, an antenna 1, an antenna 2, a mobile communication module 1450, a wireless communication module 1460, an audio module 1470, a speaker 1470A, a receiver 1470B, a microphone 1470C, an earphone interface 1470D, a sensor module 1480, buttons 1490, a motor 1491, an indicator 1492, a camera 1493, a display 1494, and a Subscriber Identity Module (SIM) card interface 1495, and the like. The sensor module 1480 may include a pressure sensor 1480A, a gyroscope sensor 1480B, an air pressure sensor 1480C, a magnetic sensor 1480D, an acceleration sensor 1480E, a distance sensor 1480F, a proximity light sensor 1480G, a fingerprint sensor 1480H, a temperature sensor 1480J, a touch sensor 1480K, an ambient light sensor 1480L, a bone conduction sensor 1480M, and the like.
It should be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device. In other embodiments of the present application, an electronic device may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components may be used. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 1410 may include one or more processing units, such as: the processor 1410 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller can be a neural center and a command center of the electronic device. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 1410 for storing instructions and data. In some embodiments, the memory in the processor 1410 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 1410. If the processor 1410 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 1410, thereby increasing the efficiency of the system.
In some embodiments, processor 1410 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 1410 may include multiple sets of I2C buses. The processor 1410 may be coupled to the touch sensor 1480K, the charger, the flash, the camera 1493, etc. through different I2C bus interfaces. For example: the processor 1410 may be coupled to the touch sensor 1480K via an I2C interface, such that the processor 1410 and the touch sensor 1480K communicate via an I2C bus interface to implement touch functionality of the electronic device.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 1410 with the wireless communication module 1460. For example: the processor 1410 communicates with a bluetooth module in the wireless communication module 1460 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 1470 may transmit an audio signal to the wireless communication module 1460 through a UART interface, so as to implement a function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 1410 with peripheral devices such as a display 1494, a camera 1493, etc. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 1410 and camera 1493 communicate through a CSI interface to implement the capture functionality of the electronic device. The processor 1410 and the display 1494 communicate via the DSI interface to implement the display function of the electronic device.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 1410 with the camera 1493, the display 1494, the wireless communication module 1460, the audio module 1470, the sensor module 1480, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, and the like.
The USB interface 1430 is an interface conforming to the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 1430 may be used to connect a charger to charge the electronic device, and may also be used to transmit data between the electronic device and a peripheral device. It can also be used to connect an earphone and play audio through the earphone. The interface may also be used to connect other electronic devices, such as AR devices.
It should be understood that the interface connection relationship between the modules according to the embodiment of the present invention is only an exemplary illustration, and does not limit the structure of the electronic device. In other embodiments of the present application, the electronic device may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The power management module 1441 is used to connect the battery 1442, the charging management module 1440 and the processor 1410. The power management module 1441 receives input from the battery 1442 and/or the charging management module 1440, and provides power to the processor 1410, the internal memory 1421, the external memory, the display 1494, the camera 1493, and the wireless communication module 1460. The power management module 1441 may also be used to monitor parameters such as battery capacity, battery cycle number, battery state of health (leakage, impedance), etc. In other embodiments, a power management module 1441 may also be disposed in the processor 1410. In other embodiments, the power management module 1441 and the charging management module 1440 may be disposed in the same device.
The wireless communication function of the electronic device can be implemented by the antenna 1, the antenna 2, the mobile communication module 1450, the wireless communication module 1460, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in an electronic device may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 1450 may provide a solution including 2G/3G/4G/5G wireless communication applied on the electronic device. The mobile communication module 1450 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 1450 may receive electromagnetic waves from the antenna 1, filter, amplify, etc. the received electromagnetic waves, and transmit the electromagnetic waves to the modem processor for demodulation. The mobile communication module 1450 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 1450 may be provided in the processor 1410. In some embodiments, at least some of the functional blocks of the mobile communication module 1450 may be provided in the same device as at least some of the blocks of the processor 1410.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 1470A, the receiver 1470B, etc.) or displays an image or video through the display 1494. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be separate from the processor 1410, and may be located in the same device as the mobile communication module 1450 or other functional modules.
The wireless communication module 1460 may provide solutions for wireless communication applied to electronic devices, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 1460 may be one or more devices integrating at least one communication processing module. The wireless communication module 1460 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on electromagnetic wave signals, and transmits the processed signals to the processor 1410. The wireless communication module 1460 may also receive a signal to be transmitted from the processor 1410, frequency modulate it, amplify it, and convert it into electromagnetic waves via the antenna 2 to radiate it out.
In some embodiments, the antenna 1 of the electronic device is coupled to the mobile communication module 1450 and the antenna 2 is coupled to the wireless communication module 1460 so that the electronic device can communicate with networks and other devices through wireless communication techniques. The wireless communication technology may include global system for mobile communications (GSM), general Packet Radio Service (GPRS), code division multiple access (code division multiple access, CDMA), wideband Code Division Multiple Access (WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a Satellite Based Augmentation System (SBAS).
The electronic device implements display functions via the GPU, the display screen 1494, and the application processor, etc. The GPU is a microprocessor for image processing, connected to the display 1494 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 1410 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 1494 is used to display images, video, and the like. The display screen 1494 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 1494, N being a positive integer greater than 1.
The electronic device may implement a shooting function through the ISP, the camera 1493, the video codec, the GPU, the display 1494, the application processor, and the like.
The ISP is used to process the data fed back by the camera 1493. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 1493.
The camera 1493 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device may include 1 or N cameras 1493, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device selects a frequency point, the digital signal processor is used for performing fourier transform and the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The electronic device may support one or more video codecs. In this way, the electronic device can play or record video in a variety of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor, which processes input information quickly by referring to a biological neural network structure, for example, by referring to a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can realize applications such as intelligent cognition of electronic equipment, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 1420 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the electronic device. The external memory card communicates with the processor 1410 through an external memory interface 1420 to implement data storage functions. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 1421 may be used to store computer-executable program code, which includes instructions. The processor 1410 performs various functional applications of the electronic device and data processing by executing instructions stored in the internal memory 1421. The internal memory 1421 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data (such as audio data and a phone book) created during use of the electronic device. In addition, the internal memory 1421 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device may implement audio functions, such as music playing and recording, via the audio module 1470, the speaker 1470A, the receiver 1470B, the microphone 1470C, the earphone interface 1470D, the application processor, etc.
The audio module 1470 is used to convert digital audio information into an analog audio signal output and also used to convert an analog audio input into a digital audio signal. The audio module 1470 may also be used to encode and decode audio signals. In some embodiments, the audio module 1470 may be disposed in the processor 1410, or some functional modules of the audio module 1470 may be disposed in the processor 1410.
The speaker 1470A, also referred to as a "horn," is used to convert electrical audio signals into acoustic signals. The electronic device can listen to music through the speaker 1470A or listen to a hands-free call.
A receiver 1470B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic device answers a call or voice information, voice can be answered by placing the receiver 1470B close to the ear.
The microphone 1470C, also referred to as a "mic", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can input a voice signal into the microphone 1470C by speaking close to it. The electronic device may be provided with at least one microphone 1470C. In other embodiments, the electronic device may be provided with two microphones 1470C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device may be provided with three, four, or more microphones 1470C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The headset interface 1470D is used to connect wired headsets. The headset interface 1470D may be the USB interface 1430, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 1480A is configured to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 1480A may be disposed on the display screen 1494. The pressure sensor 1480A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. The capacitive pressure sensor may be a sensor comprising at least two parallel plates having an electrically conductive material. When a force acts on the pressure sensor 1480A, the capacitance between the electrodes changes. The electronic device determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 1494, the electronic device detects the intensity of the touch operation according to the pressure sensor 1480A. The electronic device may also calculate the position of the touch from the detection signal of the pressure sensor 1480A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: and when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message. And when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.
Acceleration sensor 1480E may detect the magnitude of acceleration of the electronic device in various directions (typically three axes). When the electronic device is at rest, the magnitude and direction of gravity can be detected. The method can also be used for recognizing the posture of the electronic equipment, and is applied to horizontal and vertical screen switching, pedometers and other applications.
Touch sensor 1480K, also referred to as a "touch panel". The touch sensor 1480K may be disposed on the display screen 1494, and the touch sensor 1480K and the display screen 1494 form a touch screen, which is also referred to as a "touch screen". The touch sensor 1480K is used to detect a touch operation applied thereto or therearound. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display 1494. In other embodiments, the touch sensor 1480K may be disposed on a different surface of the electronic device than the display 1494.
The buttons 1490 include a power button, a volume button, and the like. The buttons 1490 may be mechanical buttons or touch buttons. The electronic device may receive a button input and generate a button signal input related to user settings and function control of the electronic device.
The indicator 1492 may be an indicator light, and may be used to indicate a charging status, a change in power, or a message, a missed call, a notification, etc.
The SIM card interface 1495 is used to connect a SIM card. A SIM card can be attached to or detached from the electronic device by being inserted into or pulled out of the SIM card interface 1495. The electronic device can support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 1495 can support a Nano-SIM card, a Micro-SIM card, a SIM card, and the like. Multiple cards can be inserted into the same SIM card interface 1495 simultaneously, and the types of the cards may be the same or different. The SIM card interface 1495 is also compatible with different types of SIM cards and with external memory cards. The electronic device interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device employs an eSIM, i.e., an embedded SIM card, which can be embedded in the electronic device and cannot be separated from it.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to an electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random-access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal in accordance with legislation and patent practice.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope disclosed in the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (23)

1. A video transmission method, applied to a video transmission system composed of a sending end device and a receiving end device, the method comprising:
the sending end device determines two groups of video frames according to video data of a target video, wherein the resolution of each video frame in each group of video frames is the same, the resolution of one group of video frames is greater than that of the other group of video frames, the frame number of any video frame in one group of video frames is different from that of any video frame in the other group of video frames, and each group of video frames comprises a plurality of video frames with discontinuous frame numbers;
the sending end device encodes each group of video frames respectively to obtain encoded data corresponding to each group of video frames;
the sending end device sends first encoded data and second encoded data respectively corresponding to the two groups of video frames, wherein the resolution corresponding to the first encoded data is greater than the resolution corresponding to the second encoded data;
the receiving end device receives the first encoded data and the second encoded data;
the receiving end device decodes the first encoded data and the second encoded data respectively to obtain first decoded data corresponding to the first encoded data and second decoded data corresponding to the second encoded data, wherein the resolution of the first decoded data is greater than that of the second decoded data;
the receiving end device processes the second decoded data according to the first decoded data to obtain a restored frame with the same resolution as the first decoded data;
and the receiving end device combines the first decoded data and the restored frame to obtain the video data of the target video.
2. The method of claim 1, wherein the video data of the target video comprises: first video data and second video data;
wherein the sending end device determining the two groups of video frames according to the video data of the target video comprises:
the sending end device down-samples the first video data to obtain the second video data, wherein the resolution of each video frame in the first video data is higher than the resolution of each video frame in the second video data;
and the sending end device selects a plurality of video frames from the first video data and the second video data respectively to obtain the two groups of video frames.
3. The method of claim 1, wherein the resolution of each video frame in the video data of the target video is the same;
wherein the sending end device determining the two groups of video frames according to the video data of the target video comprises:
the sending end device selects a part of the video frames from the video data of the target video to obtain one group of video frames;
and the sending end device down-samples another part of the video frames in the video data of the target video to obtain the other group of video frames.
4. The method according to claim 2 or 3, wherein in the group of video frames with the higher resolution, the difference between the frame numbers of every two adjacent video frames is m, where m is a positive integer greater than 1.
5. The method according to any one of claims 2 to 4, wherein the group of video frames with the lower resolution of the two groups comprises a plurality of groups of video frames with consecutive frame numbers, and the number of video frames included in each group of video frames with consecutive frame numbers is n, where n is a positive integer greater than or equal to 1.
6. The method according to any one of claims 1 to 5, wherein the frame numbers of the video frames obtained by combining the two groups of video frames are consecutive.
7. The method according to any one of claims 1 to 6, wherein before the sending end device determines the two groups of video frames according to the video data of the target video, the method further comprises:
the sending end device acquires the video data of the target video from a storage space;
or, the sending end device collects the video data of the target video in real time.
8. The method according to any one of claims 1 to 7, wherein the sending end device sending the first encoded data and the second encoded data respectively corresponding to the two groups of video frames comprises:
the sending end device sends the first encoded data and the second encoded data respectively corresponding to the two groups of video frames to the receiving end device;
or, the sending end device sends the first encoded data and the second encoded data respectively corresponding to the two groups of video frames to a server, and the server is configured to forward them to the receiving end device.
9. The method according to claim 1, wherein the receiving end device processing the second decoded data according to the first decoded data to obtain the restored frame with the same resolution as the first decoded data comprises:
the receiving end device processes the video frames of the second decoded data according to the video frames of the first decoded data through a preset artificial intelligence (AI) hyper-resolution model to obtain the restored frame.
10. The method according to claim 9, wherein the receiving end device processing the video frames of the second decoded data according to the video frames of the first decoded data through the preset artificial intelligence (AI) hyper-resolution model to obtain the restored frame comprises:
the receiving end device inputs a first video frame of the second decoded data and at least one video frame whose frame number is consecutive to that of the first video frame into the AI hyper-resolution model to obtain a restored frame corresponding to the first video frame.
11. A method of video transmission, the method comprising:
determining two groups of video frames according to video data of a target video, wherein the resolution of each video frame in each group of video frames is the same, the resolution of one group of video frames is greater than that of the other group of video frames, the frame number of any one video frame in one group of video frames is different from that of any one video frame in the other group of video frames, and each group of video frames comprises a plurality of video frames with discontinuous frame numbers;
encoding each group of video frames respectively to obtain encoded data corresponding to each group of video frames;
and sending each piece of encoded data.
12. The method of claim 11, wherein the video data of the target video comprises: first video data and second video data;
the determining two groups of video frames according to the video data of the target video comprises:
down-sampling the first video data to obtain the second video data, wherein the resolution of each video frame in the first video data is higher than the resolution of each video frame in the second video data;
and respectively selecting a plurality of video frames from the first video data and the second video data to obtain the two groups of video frames.
13. The method of claim 11, wherein the resolution of each video frame in the video data of the target video is the same;
the determining two groups of video frames according to the video data of the target video comprises:
selecting a part of video frames from the video data of the target video to obtain a group of video frames;
and performing down-sampling on another part of video frames in the video data of the target video to obtain another group of video frames.
14. The method according to claim 12 or 13, wherein in the group of video frames with the higher resolution, the difference between the frame numbers of every two adjacent video frames is m, where m is a positive integer greater than 1.
15. The method according to any one of claims 12 to 14, wherein the group of video frames with the lower resolution of the two groups comprises a plurality of groups of video frames with consecutive frame numbers, and the number of video frames included in each group of video frames with consecutive frame numbers is n, where n is a positive integer greater than or equal to 1.
16. The method according to any one of claims 11 to 15, wherein the frame numbers of the video frames obtained by combining the two groups of video frames are consecutive.
17. A method of video transmission, the method comprising:
receiving first coded data and second coded data of a target video, wherein the first coded data and the second coded data have different corresponding resolutions;
decoding the first encoded data and the second encoded data respectively to obtain first decoded data corresponding to the first encoded data and second decoded data corresponding to the second encoded data, wherein the resolution of the first decoded data is greater than that of the second decoded data;
processing the second decoded data according to the first decoded data to obtain a restored frame with the same resolution as the first decoded data;
and combining the first decoded data and the restored frame to obtain the video data of the target video.
18. The method of claim 17, wherein the processing the second decoded data according to the first decoded data to obtain a restored frame with the same resolution as the first decoded data comprises:
and processing the video frames of the second decoded data according to the video frames of the first decoded data through a preset AI super-resolution model to obtain the restored frame.
19. The method of claim 18, wherein the processing the video frames of the second decoded data according to the video frames of the first decoded data through the preset AI super-resolution model to obtain the restored frame comprises:
and inputting a first video frame of the second decoded data, together with at least one video frame whose frame number is consecutive with that of the first video frame, into the AI super-resolution model to obtain a restored frame corresponding to the first video frame.
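Claim 19 fixes only the model's input contract: the low-resolution frame plus at least one frame whose frame number is adjacent to it. A sketch of assembling that input from the decoded high-resolution group; the ±1 window and the model call are assumptions, since the claim discloses no architecture:

```python
def build_superres_input(low_num, low_frame, high_frames_by_number):
    """Collect the low-res frame and its frame-number neighbours from the
    decoded high-res group, forming claim 19's model input."""
    neighbours = [high_frames_by_number[n]
                  for n in (low_num - 1, low_num + 1)
                  if n in high_frames_by_number]
    return low_frame, neighbours

# The restored frame is then model(low_frame, neighbours) at full resolution.
```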
20. A video transmission system, characterized in that the video transmission system comprises: a sending end device and a receiving end device;
the sending end device is configured to perform the video transmission method according to any one of claims 11 to 16;
the receiving end device is configured to perform the video transmission method according to any one of claims 17 to 19.
21. An electronic device, comprising: a processor for executing a computer program stored in a memory to implement the video transmission method of any one of claims 11 to 16 or claims 17 to 19.
22. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the video transmission method according to any one of claims 11 to 16 or claims 17 to 19.
23. A chip system, comprising a memory and a processor, wherein the processor executes a computer program stored in the memory to implement the video transmission method of any one of claims 11 to 16 or claims 17 to 19.
CN202111017380.6A 2021-08-31 2021-08-31 Video transmission method, system, electronic device, storage medium and chip system Pending CN115733980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111017380.6A CN115733980A (en) 2021-08-31 2021-08-31 Video transmission method, system, electronic device, storage medium and chip system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111017380.6A CN115733980A (en) 2021-08-31 2021-08-31 Video transmission method, system, electronic device, storage medium and chip system

Publications (1)

Publication Number Publication Date
CN115733980A (en) 2023-03-03

Family

ID=85291911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111017380.6A Pending CN115733980A (en) 2021-08-31 2021-08-31 Video transmission method, system, electronic device, storage medium and chip system

Country Status (1)

Country Link
CN (1) CN115733980A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination