CN114363649B - Video processing method, device, equipment and storage medium - Google Patents


Info

Publication number
CN114363649B
Authority
CN
China
Prior art keywords
target, key, frame, compressed, video
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202111669381.9A
Other languages
Chinese (zh)
Other versions
CN114363649A (en)
Inventor
张傲阳
李清
陈颖
马晓腾
马茜
孟胜彬
邹龙昊
Current Assignee (the listed assignees may be inaccurate)
Southwest University of Science and Technology
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Southwest University of Science and Technology
Beijing ByteDance Network Technology Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Southwest University of Science and Technology, Beijing ByteDance Network Technology Co Ltd filed Critical Southwest University of Science and Technology
Priority to CN202111669381.9A priority Critical patent/CN114363649B/en
Publication of CN114363649A publication Critical patent/CN114363649A/en
Application granted granted Critical
Publication of CN114363649B publication Critical patent/CN114363649B/en

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention discloses a video processing method, device, equipment, and storage medium. The method, applied to the live broadcast user side, comprises the following steps: obtaining target original encoded data of a live video; determining, according to the uplink network condition and/or the super-resolution image processing capability of the server side, the target number of original non-key frames to be processed; downsampling the original key frame and the non-key frames to be processed; and uploading to the server side a video frame sequence comprising the compressed key frame, the compressed non-key frames, and the uncompressed non-key frames. The server side performs super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence, and sends the resulting target video frame sequence to the viewer user side. With this technical scheme, the video can be compressed more flexibly, better adapting to network variability, reducing end-to-end transmission time, ensuring video quality, and lowering live broadcast delay.

Description

Video processing method, device, equipment and storage medium
Technical Field
The embodiments of the present disclosure relate to the technical field of video processing, and in particular to a video processing method, device, equipment, and storage medium.
Background
With the development of network technology and terminal devices, video services such as video on demand and live streaming have become mainstream Internet applications. The rapid growth of video traffic puts tremendous strain on network bandwidth. At the same time, users' requirements on Quality of Experience (QoE) keep increasing, covering video quality, stalling, bitrate switching, and live delay, which poses a great challenge to video transmission.
In video transmission networks, uplink bandwidth is often far smaller than downlink bandwidth; in 4G networks, for example, downlink bandwidth can be up to 10 times the uplink bandwidth, so the uplink becomes the main bottleneck for live video transmission. Moreover, live video quality largely depends on the uploader's uplink network quality during the broadcast. If uplink bandwidth is insufficient, the video quality seen by viewers degrades significantly and stalling is likely to occur. How to improve the quality of a live video stream under limited uplink bandwidth is therefore a pressing technical problem.
Disclosure of Invention
The embodiments of the present disclosure provide a video processing method, device, equipment, and storage medium, which can optimize existing video processing schemes.
In a first aspect, an embodiment of the present disclosure provides a video processing method, which is applied to a live broadcast user side in a live broadcast system, where the live broadcast system further includes a server side and a viewer user side, and the method includes:
acquiring target original coding data of a live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group;
determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side;
determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain a compressed key frame and a compressed non-key frame;
uploading a video frame sequence to the server to instruct the server to perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence respectively to obtain a target key frame and target non-key frames, and to send a target video frame sequence to the viewer user side, wherein the video frame sequence comprises the compressed key frame, the compressed non-key frames, and the remaining original non-key frames in the original non-key frame set other than the non-key frames to be processed, and the target video frame sequence comprises the target key frame, the target non-key frames, and the remaining original non-key frames.
In a second aspect, an embodiment of the present disclosure provides a video processing method, which is applied to a server in a live broadcast system, where the live broadcast system further includes a live broadcast user side and a viewer user side, and the method includes:
receiving a video frame sequence sent by the live broadcast user side, wherein the video frame sequence comprises a compressed key frame, compressed non-key frames, and the remaining original non-key frames in an original non-key frame set other than the non-key frames to be processed; the live broadcast user side is used for: acquiring target original encoded data of a live video, the target original encoded data comprising an original key frame and the original non-key frame set in a target group of pictures; determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side; determining original non-key frames of the target frame number from the original non-key frame set as the non-key frames to be processed; and respectively downsampling the original key frame and the non-key frames to be processed to obtain the compressed key frame and the compressed non-key frames;
performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively to obtain target key frames and target non-key frames;
determining a target video frame sequence according to the target key frame, the target non-key frames, and the remaining original non-key frames;
and sending the target video frame sequence to the audience user side.
In a third aspect, an embodiment of the present disclosure provides a video processing apparatus configured at a live user end in a live broadcast system, where the live broadcast system further includes a server end and a viewer user end, and the apparatus includes:
the original data acquisition module is used for acquiring target original coding data of the live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group;
the target frame number determining module is used for determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side;
the downsampling module is used for determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and downsampling the original key frame and the non-key frame to be processed respectively to obtain a compressed key frame and a compressed non-key frame;
the video frame sequence uploading module is used for uploading a video frame sequence to the server, instructing the server to perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence respectively to obtain a target key frame and target non-key frames, and to send a target video frame sequence to the viewer user side, wherein the video frame sequence comprises the compressed key frame, the compressed non-key frames, and the remaining original non-key frames in the original non-key frame set other than the non-key frames to be processed, and the target video frame sequence comprises the target key frame, the target non-key frames, and the remaining original non-key frames.
In a fourth aspect, an embodiment of the present disclosure provides a video processing apparatus configured at a server in a live broadcast system, where the live broadcast system further includes a live broadcast user side and a viewer user side, where the apparatus includes:
the video frame sequence receiving module is used for receiving a video frame sequence sent by the live broadcast user side, wherein the video frame sequence comprises a compressed key frame, compressed non-key frames, and the remaining original non-key frames in an original non-key frame set other than the non-key frames to be processed; the live broadcast user side is used for: acquiring target original encoded data of a live video, the target original encoded data comprising an original key frame and the original non-key frame set in a target group of pictures; determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side; determining original non-key frames of the target frame number from the original non-key frame set as the non-key frames to be processed; and respectively downsampling the original key frame and the non-key frames to be processed to obtain the compressed key frame and the compressed non-key frames;
the image processing module is used for respectively carrying out super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence to obtain target key frames and target non-key frames;
the target video frame sequence determining module is configured to determine a target video frame sequence according to the target key frame, the target non-key frames, and the remaining original non-key frames;
and the target video frame sequence sending module is used for sending the target video frame sequence to the audience user side.
In a fifth aspect, embodiments of the present disclosure provide an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a video processing method as provided by embodiments of the present disclosure when the computer program is executed.
In a sixth aspect, the disclosed embodiments provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a video processing method as provided by the disclosed embodiments.
According to the video processing scheme provided by the embodiments of the present disclosure, the live broadcast user side in a live broadcast system acquires the original key frame and the original non-key frame set in a target group of pictures of a live video; determines, according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, the target number of original non-key frames to be processed; respectively downsamples the original key frame and the non-key frames to be processed to obtain a compressed key frame and compressed non-key frames; and uploads to the server side a video frame sequence containing the compressed key frame, the compressed non-key frames, and the uncompressed non-key frames. The server side performs super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence, and sends the resulting target video frame sequence to the viewer user side. With this technical scheme, the video can be compressed more flexibly, better adapting to network variability, reducing end-to-end transmission time, ensuring video quality, and lowering live broadcast delay.
Drawings
Fig. 1 is an application scenario schematic diagram of a live broadcast system provided in an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the disclosure;
fig. 3 is a flowchart of yet another video processing method according to an embodiment of the disclosure;
fig. 4 is a flowchart of another video processing method according to an embodiment of the disclosure;
fig. 5 is a schematic diagram of a video frame processing procedure of a live user side according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of a video frame processing procedure of a server provided in an embodiment of the disclosure;
fig. 7 is a block diagram of a video processing apparatus according to an embodiment of the present disclosure;
fig. 8 is a block diagram of another video processing apparatus according to an embodiment of the present disclosure;
fig. 9 is a block diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the accompanying drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms are given in the description below.
It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that references to "one" and "a plurality" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
In the following description, each embodiment provides optional features and examples; the features described in the embodiments may be combined into multiple alternative solutions, and each numbered embodiment should not be regarded as defining only a single technical solution.
Fig. 1 is an application scenario schematic diagram of a live broadcast system provided in an embodiment of the present disclosure. As shown in fig. 1, the live broadcast system includes a live broadcast user side 101, a server side 102, and a viewer user side 103. Live broadcast user sides may be distributed throughout the world and are the source of the video stream. Specifically, the live broadcast user side collects data through an image collection device or the like, processes the data into a transmittable video stream, and finally uploads the processed video stream to the server side through a network channel such as the fourth-generation mobile communication technology (4G) or wireless fidelity (Wi-Fi). The server side collects the video stream uploaded by the live broadcast user side and pushes it to the viewer user side. The viewer user side expects to see the video stream pushed by the live broadcast user side in real time, with guaranteed video quality and playback stability.
The network link from the live broadcast user side to the server side is the uplink, and the network link from the server side to the viewer user side or to the live broadcast user side is the downlink. In practice, uplink bandwidth is often far smaller than downlink bandwidth, so the uplink becomes the main bottleneck for live video transmission. Moreover, live video quality largely depends on the uploader's uplink network quality during the broadcast. If uplink bandwidth is insufficient, the video quality at the viewer user side degrades significantly and stalling is likely to occur.
Through research, the inventors found that the related art mainly adopts deep-neural-network-based super-resolution (SR) technology in video transmission to address insufficient bandwidth. Specifically, a low-resolution video (for example, 240P) is uploaded by the live broadcast user side, and the server side then reconstructs all frames in the video using super-resolution technology to obtain a high-resolution video (for example, 960P), thereby improving the video quality of the live stream. However, the whole server-side super-resolution enhancement process includes three steps: decoding the video, super-resolving each frame, and re-encoding each super-resolved frame. This process takes a long time, resulting in high live transmission delay, making it difficult to meet strict latency requirements.
In the embodiment of the disclosure, the non-key frames in the picture group of the live video can be selectively compressed, and the requirements of video quality and live low delay can be better considered under the condition of limited uplink bandwidth.
The embodiments of the present disclosure apply to live broadcast scenes, such as conference live broadcast, teaching live broadcast, livestream shopping, and the like. A live broadcast user side may be understood as the user side used by a host user (e.g., a lecturing teacher, conference speaker, or streamer). A viewer user side may be understood as a user side used by a user watching the live broadcast (e.g., a student, a conference attendee, or a product buyer). The live broadcast user side and the viewer user side may be installed on devices such as smartphones, laptops, or desktop computers. The server side generally refers to a server carrying the live broadcast service, and may be an independent server, a cluster server, a cloud server, or the like, which is not limited in the embodiments of the present disclosure.
Fig. 2 is a schematic flow chart of a video processing method according to an embodiment of the present disclosure, where the method is applied to a live user side in a live system, and may be executed by a video processing device corresponding to the live user side, where the device may be implemented by software and/or hardware, and may generally be integrated in an electronic device. As shown in fig. 2, the method includes:
Step 201, obtaining target original coding data of a live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group.
The live broadcast user side can collect live video data through a collection device with a video collection function; the collection device may be the electronic device to which the live broadcast user side belongs, or another device having a communication connection with that electronic device. After the live video is collected, the live broadcast user side may encode the collected live video data, through a preset encoder (e.g., H.264), into encoded data comprising groups of pictures (Group of Pictures, GOP), referred to herein as original encoded data. A GOP can be understood as a group of consecutive pictures that can be decoded independently; there is generally no dependency between GOPs. The first frame of a GOP is typically a key frame (KF) and the remaining frames are non-key frames (NK); key frames are obtained by intra-frame coding and compression, and non-key frames are obtained by inter-frame coding and compression.
In the embodiment of the present disclosure, the target picture group may be all picture groups or part of picture groups included in the original encoded data, and the specific determination manner is not limited. The target original coded data includes an original key frame and an original non-key frame set within the target group of pictures. The original key frame may be understood as a key frame encoded by the above-mentioned preset encoder, and the original non-key frame may be understood as a non-key frame encoded by the above-mentioned preset encoder. For a single target group of pictures, the set of original non-key frames may include all of the original non-key frames in the target group of pictures. The target original encoded data may also be understood as an original video frame sequence comprising original key frames and original non-key frames.
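The GOP structure described above can be sketched in code. The following Python model is illustrative only (the `Frame` and `GroupOfPictures` names are not from the patent): the first frame of each group is the intra-coded key frame, and the rest are inter-coded non-key frames.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    index: int         # position within the GOP
    is_key: bool       # True only for the intra-coded first frame
    resolution: tuple  # (width, height)

@dataclass
class GroupOfPictures:
    frames: list

    @property
    def key_frame(self):
        # By convention the first frame of a GOP is the key frame (KF).
        return self.frames[0]

    @property
    def non_key_frames(self):
        # All remaining frames are non-key frames (NK).
        return self.frames[1:]

def make_gop(size: int, resolution=(1280, 720)) -> GroupOfPictures:
    return GroupOfPictures([Frame(i, i == 0, resolution) for i in range(size)])

gop = make_gop(8)
assert gop.key_frame.is_key and len(gop.non_key_frames) == 7
```

Under this model, the "original non-key frame set" of a target GOP is simply `gop.non_key_frames`, from which the frames to be downsampled are later chosen.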
Step 202, determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server.
For example, in order to reduce uplink network resource (such as bandwidth) consumption, the video frames in the target original encoded data may be selectively compressed according to the actual situation, and the server is responsible for performing super-resolution image processing on the compressed video frames, so as to improve the resolution of the compressed video frames and ensure the video quality of the video frames transmitted to the viewer user side.
In the embodiment of the disclosure, whether a certain group of pictures needs to be compressed may be determined on a per-GOP basis; if so, that group of pictures may be determined as a target group of pictures, and the specific determination manner is not limited. Because inter-frame coding achieves a higher compression ratio than intra-frame coding, the size of a key frame is typically much larger than that of a non-key frame within a group of pictures; therefore, key frames may be preferentially selected for compression, while for the non-key frames within a group of pictures, the number of frames that need to be compressed may be determined according to the actual situation. That is, for a target group of pictures, the original key frame needs to be compressed, and some of the original non-key frames in the original non-key frame set may be compressed.
In the embodiment of the disclosure, the number of original non-key frames, namely the target frame number, which need to be compressed can be determined according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side. Of course, more reference factors may be combined to comprehensively determine the target frame number, such as the resolutions of the original key frame and the original non-key frame, which is not limited in particular.
By way of example, uplink network conditions may include uplink network bandwidth, real-time throughput, network congestion conditions, and the like, without specific limitation. Optionally, the uplink network condition may be predicted by the live broadcast user side itself, or may be predicted by the server side and sent to the live broadcast user side. The super-resolution image processing capability of the server side may include the amount of computing resources the server side can currently allocate to super-resolution image processing tasks, the time the server side takes to super-resolve a single video frame, and the load condition of the server side. In the embodiment of the disclosure, a target frame number matching the current actual situation can be dynamically determined according to the uplink network condition and/or the super-resolution image processing capability of the server side; the specific determination manner is not limited. The target frame number can be understood as the number of original non-key frames in a single target group of pictures that need to be further compressed.
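As a hedged illustration of such a dynamic decision (the patent does not fix a formula; the function name and policy below are assumptions), one simple rule compresses more non-key frames the larger the uplink bandwidth deficit, capped by how many frames the server side can super-resolve per GOP:

```python
def target_frame_count(uplink_mbps: float, required_mbps: float,
                       nonkey_total: int, sr_frames_per_gop: int) -> int:
    # If the uplink already covers the stream's bitrate, compress nothing extra.
    if uplink_mbps >= required_mbps:
        return 0
    # Fraction of the bitrate that extra compression must save.
    deficit = 1.0 - uplink_mbps / required_mbps
    needed = round(deficit * nonkey_total)
    # Never schedule more super-resolution work than the server can absorb.
    return max(0, min(needed, sr_frames_per_gop, nonkey_total))

# Ample bandwidth: nothing extra is compressed.
assert target_frame_count(8.0, 8.0, 29, 10) == 0
# Half the needed bandwidth, but the server's SR budget caps the count at 10.
assert target_frame_count(4.0, 8.0, 29, 10) == 10
```

Any policy with the same shape (more compression under worse uplink conditions, bounded by server-side capability) fits the description above.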
Step 203, determining an original non-key frame of a target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain a compressed key frame and a compressed non-key frame.
For example, after the target frame number is determined, original non-key frames of the target frame number may be selected from the original non-key frame set, and the selected original non-key frames are used as the non-key frames to be processed. The selection manner is not limited: a run of consecutive original non-key frames of the target frame number may be selected from the original non-key frame set, or original non-key frames of the target frame number may be selected at intervals. Optionally, for the interval selection manner, the sequence formed by the original non-key frames in the original non-key frame set may be divided according to the target frame number into a number of sub-sequences equal to the target frame number, and one original non-key frame is selected from each sub-sequence, either at random or as the first original non-key frame of the sub-sequence.
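The interval selection option above, splitting the non-key-frame sequence into target-frame-number sub-sequences and taking the first frame of each, can be sketched as follows (illustrative helper, not the patent's code):

```python
def select_spaced(non_key_frames: list, target_count: int) -> list:
    """Pick `target_count` frames spread evenly across the sequence:
    divide it into `target_count` roughly equal sub-sequences and take
    the first frame of each sub-sequence."""
    if target_count <= 0:
        return []
    n = len(non_key_frames)
    picked = []
    for k in range(target_count):
        start = k * n // target_count  # first index of the k-th sub-sequence
        picked.append(non_key_frames[start])
    return picked

# Eight non-key frames, three to compress: one from each rough third.
assert select_spaced([1, 2, 3, 4, 5, 6, 7, 8], 3) == [1, 3, 6]
```

Spacing the compressed frames out this way keeps the quality loss from being concentrated in one stretch of the GOP, which is one plausible reason for preferring interval selection over a consecutive run.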
Illustratively, the original key frame and the selected non-key frame to be processed are taken as video frames needing to be further compressed, and are compressed in a downsampling mode, so that the resolution is reduced, and the data volume is reduced. For example, after the encoding by the preset encoder, the resolutions of the original key frame and the original non-key frame are the first resolution, and after the downsampling, the resolutions of the compressed key frame and the compressed non-key frame are the second resolution, and the second resolution is smaller than the first resolution.
For example, downsampling may be performed according to a preset compression ratio, or the compression ratio may be dynamically determined according to the current actual situation. Optionally, a target compression ratio is determined according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, and the original key frame and the non-key frames to be processed are respectively downsampled based on the target compression ratio. The compression ratio can be understood as the ratio of the image resolution after downsampling to the image resolution before downsampling. Generally, the lower the sampling rate, the stronger the compression and the smaller the data size of the compressed video frame, but super-resolution image processing becomes more difficult and takes longer. Dynamically determining the target compression ratio has the advantage that the compression ratio of the video frames is determined more reasonably according to the current actual situation.
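A minimal sketch of downsampling by a target compression ratio, using nearest-neighbour sampling over a 2-D pixel grid (real encoders use proper resampling filters; the function name and data layout are assumptions made here only to show the resolution bookkeeping):

```python
def downsample(frame, ratio: float):
    """`ratio` follows the definition above: resolution after / before (< 1).
    Keep every `step`-th row and column (nearest-neighbour)."""
    step = round(1 / ratio)
    return [row[::step] for row in frame[::step]]

frame = [[r * 10 + c for c in range(8)] for r in range(8)]  # 8x8 "pixels"
small = downsample(frame, 0.5)                               # -> 4x4
assert len(small) == 4 and len(small[0]) == 4
```

With `ratio = 0.5` the frame holds a quarter of the original pixels, which is the data-size reduction that saves uplink bandwidth at the cost of harder super-resolution work later.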
Step 204, uploading a video frame sequence to a server, where the video frame sequence includes a compressed key frame, a compressed non-key frame, and remaining original non-key frames except for a non-key frame to be processed in an original non-key frame set, and the target video frame sequence includes a target key frame, a target non-key frame, and remaining original key frames.
Illustratively, after obtaining the compressed key frames and the compressed non-key frames, the video frames may be re-encoded to incorporate the downsampled video frames into the original video stream. The re-encoding mode may be to select a compressed key frame as a reference frame to encode a compressed non-key frame using inter-frame encoding, replace an original key frame with the compressed key frame, replace the original non-key frame with the compressed non-key frame, and retain original temporal and spatial features of the remaining original non-key frame.
Illustratively, after recoding, a video frame sequence including the compressed key frame, the compressed non-key frame and the remaining original non-key frames except the non-key frames to be processed in the original non-key frame set is obtained, and the video frame sequence is uploaded to the server. After receiving the video frame sequence, the server can extract the compressed key frame and the compressed non-key frame from the video frame sequence, perform super-resolution image processing on the compressed key frame to obtain a target key frame, and perform super-resolution image processing on the compressed non-key frame to obtain the target non-key frame. The resolution of the target key frame and the target non-key frame may be a third resolution, where the third resolution is greater than the second resolution, may be the same as or different from the first resolution, and may be determined by a super-resolution image processing policy of the server.
For example, a super-resolution model may be set at the server, and super-resolution image processing may be performed on the compressed key frames and the compressed non-key frames using the super-resolution model. The super-resolution model can be obtained through on-line training or off-line training, and is not particularly limited. For example, the compressed key frame and the compressed non-key frame are up-sampled by using the super-resolution model, so as to obtain the target key frame and the target non-key frame with the first resolution, that is, the resolution of the original video frame is restored.
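As a stand-in for the super-resolution model (which reconstructs detail rather than merely resizing), the bookkeeping of restoring a compressed frame to the first resolution can be sketched with nearest-neighbour upsampling; this is purely illustrative and is not the model described above:

```python
def upsample(frame, factor: int):
    """Repeat each pixel `factor` times horizontally and vertically,
    inverting the earlier downsampling's resolution change."""
    out = []
    for row in frame:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

# A 1x2 "frame" upsampled by 2 becomes 2x4.
assert upsample([[1, 2]], 2) == [[1, 1, 2, 2], [1, 1, 2, 2]]
```

A trained SR network would fill in plausible high-frequency detail where this sketch merely duplicates pixels; only the input/output resolutions match.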
For example, after obtaining the target key frame and the target non-key frame, the server may replace the compressed key frame in the video frame sequence with the target key frame, and replace the compressed non-key frame with the target non-key frame, to obtain the target video frame sequence including the target key frame, the target non-key frame, and the remaining original key frames. Then, the server can send the target video frame sequence to the audience user end, so that the smoothness of video playing and the high quality of video images are ensured.
According to the video processing method provided by the embodiment of the disclosure, a live user side in a live system acquires an original key frame and an original non-key frame set in a target picture group of a live video, determines a target frame number of an original non-key frame to be processed according to an uplink network condition of the live system and/or super-resolution image processing capability of a server side, respectively downsamples the original key frame and the non-key frame to be processed to obtain a compressed key frame and a compressed non-key frame, uploads a video frame sequence containing the compressed key frame, the compressed non-key frame and the non-compressed non-key frame to the server side, respectively carries out super-resolution image processing on the compressed key frame and the compressed non-key frame in the video frame sequence by the server side, and sends the obtained target video frame sequence to a viewer user side. By adopting the technical scheme, the video can be compressed more flexibly, so that the video compression method is better suitable for the variability of a network, and further reduces the time consumption of end-to-end transmission, can ensure the video quality and reduce the live broadcast time delay.
In some embodiments, the determining the target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, and determining the target compression ratio according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, includes: determining a target frame number and a target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server. The advantage of this arrangement is that the target frame number and the target compression ratio can be determined jointly, taking both the uplink network condition and the server's super-resolution image processing capability into account.
For example, a model predictive control (Model Predictive Control, MPC) algorithm may be utilized to determine a target frame number and a target compression ratio based on the uplink network conditions of the live broadcast system and the super-resolution image processing capabilities of the server. The MPC algorithm is a multivariable feedback control strategy, and can be understood as a control algorithm based on the prediction of a controlled object, wherein the measured value obtained at each sampling moment is used as an initial condition for predicting the future dynamic state of the system at the current moment. In the embodiment of the disclosure, the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server, which are acquired at the current moment, can be used as initial conditions for predicting future dynamics of the live broadcast system, namely, the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server at the next moment are predicted, so that the target frame number and the target compression ratio to be adopted are determined.
In some embodiments, the determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server side includes: determining a target super-resolution duration threshold corresponding to the target picture group according to the uplink network condition of the live broadcast system; determining a single-frame super-resolution duration according to the super-resolution image processing capability of the server; determining a maximum processing frame number according to the target super-resolution duration threshold and the single-frame super-resolution duration; and on the premise that the target frame number to be determined is smaller than the maximum processing frame number, determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server. The advantage of this is that the maximum processing frame number is determined first, and the selectable range of the target frame number is constrained by it, which reduces the number of frame-number and compression-ratio combinations to evaluate and thus the computational complexity.
The target super-resolution duration threshold corresponding to the target picture group can be understood as the maximum time that can be allotted to the server side for performing super-resolution image processing on a single picture group; if this duration is exceeded, the live stream transmission incurs a higher delay, seriously affecting the user's quality of experience. Optionally, a first preset mapping relationship may be established in advance, containing the correspondence between uplink network conditions and super-resolution duration thresholds; the target super-resolution duration threshold can then be obtained quickly by querying this mapping according to the current uplink network condition of the live broadcast system.
For example, the single-frame super-resolution duration may be understood as the time the server consumes when performing super-resolution image processing on a single video frame. Optionally, a second preset mapping relationship may be established in advance, containing the correspondence between the server's super-resolution image processing capability and the single-frame super-resolution duration; the current single-frame super-resolution duration can then be obtained quickly by querying this mapping according to the current super-resolution image processing capability of the server.
For example, the maximum processing frame number may be determined as the quotient of the target super-resolution duration threshold and the single-frame super-resolution duration.
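As a minimal sketch of this quotient-based bound (function name and millisecond units are assumptions for illustration, not part of the disclosed embodiments):

```python
def max_processing_frames(sr_duration_threshold_ms: float, per_frame_sr_ms: float) -> int:
    # Largest whole number of frames whose total super-resolution time
    # still fits inside the per-GOP duration threshold.
    return int(sr_duration_threshold_ms // per_frame_sr_ms)
```

For example, with a 100 ms per-GOP threshold and 30 ms per frame, at most 3 frames can be super-resolved.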
Optionally, on the premise that the target frame number to be determined is smaller than the maximum processing frame number, the target frame number and the target compression ratio are determined by using an MPC algorithm according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server. For example, this may specifically include: determining a target frame number from the maximum processing frame number, determining a target compression ratio from candidate compression ratios, determining a live effect score after the target video frame sequence obtained by processing based on the target frame number and target compression ratio is transmitted to the viewer user end, and dynamically performing feedback adjustment on the target frame number and target compression ratio according to the live effect score, the uplink network condition of the live broadcast system, and the super-resolution image processing capability of the server. The feedback adjustment may include decreasing the target frame number, increasing the target compression ratio, or decreasing the target compression ratio.
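The single-step look-ahead of such an MPC-style selection could be sketched as follows; `predict_qoe` is a hypothetical predictor standing in for the system model built from the measured network and server conditions:

```python
from itertools import product

def choose_target(max_frames, candidate_ratios, predict_qoe):
    # Evaluate every feasible (frame_count, compression_ratio) pair with
    # the QoE predictor and keep the best one; max_frames is the bound
    # derived from the super-resolution duration threshold.
    best, best_score = None, float("-inf")
    for frames, ratio in product(range(1, max_frames + 1), candidate_ratios):
        score = predict_qoe(frames, ratio)
        if score > best_score:
            best, best_score = (frames, ratio), score
    return best
```

A real MPC controller would re-run this selection at every sampling instant, feeding back the observed live effect score.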
Alternatively, the live effect score may be measured by the quality of experience (QoE) corresponding to the target video frame sequence received at the viewer user end. The specific QoE value can be calculated from a QoE function, whose parameters may include video quality, video smoothness, live broadcast delay, and stall (freeze) duration; the specific form of the QoE function is not limited. The video quality can be determined using the Video Multimethod Assessment Fusion (VMAF) metric.
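One hedged example of such a QoE function (the weights and the linear form are illustrative assumptions; the disclosure leaves the exact expression open):

```python
def qoe_score(vmaf, smoothness, latency_s, stall_s,
              w_quality=1.0, w_smooth=0.5, w_latency=2.0, w_stall=4.0):
    # Reward picture quality (VMAF) and smoothness; penalize live
    # broadcast delay and stall duration (both in seconds).
    return (w_quality * vmaf + w_smooth * smoothness
            - w_latency * latency_s - w_stall * stall_s)
```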
In some embodiments, according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server, a preset mapping relation table is queried to obtain a target frame number and a target compression ratio; the preset mapping relation table comprises a corresponding relation between a first combination parameter and a second combination parameter, the first combination parameter comprises an uplink network condition and super-resolution image processing capability, and the second combination parameter comprises a frame number and a compression ratio. The advantage of this arrangement is that the correspondence between the combination of different uplink network conditions and super-resolution image processing capabilities and the combination of different frame numbers and compression ratios can be pre-established, and in the live broadcast process, the combination of the currently suitable frame numbers and compression ratios, that is, the target frame numbers and the target compression ratios, can be quickly determined by querying the preset mapping relation table.
The preset mapping table may be constructed in advance as follows: for each first combination parameter value, determine the live effect score obtained after video processing with each candidate second combination parameter value in the live broadcast scene corresponding to that first combination parameter value, and select the second combination parameter value with the highest live effect score as the one corresponding to that first combination parameter value.
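A sketch of this offline table construction, assuming a hypothetical `score_fn` that returns the live effect score for a given scene and parameter combination:

```python
def build_mapping_table(network_levels, sr_capabilities,
                        candidate_frames, candidate_ratios, score_fn):
    # For every (network condition, server capability) pair, exhaustively
    # score each (frame_count, ratio) combination and store the winner;
    # at run time a single dictionary lookup replaces the search.
    table = {}
    for net in network_levels:
        for cap in sr_capabilities:
            best = max(
                ((f, r) for f in candidate_frames for r in candidate_ratios),
                key=lambda fr: score_fn(net, cap, fr[0], fr[1]),
            )
            table[(net, cap)] = best
    return table
```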
In some embodiments, further comprising: and sending compression information to the server side, wherein the compression information is used for instructing the server side to extract the compressed non-key frames from the video frame sequence according to the compression information, and the compression information comprises frame identification information of the compressed non-key frames. The advantage of this setting is that the live user side sends the compression information to the server side to inform it which frames are the downsampled non-key frames; the server side can then rapidly extract the compressed non-key frames according to the frame identification information without judging frame by frame whether super-resolution image processing is needed, improving the processing efficiency of the server side. The frame identification information may be a frame number, such as a sequence number within the target picture group, which is not specifically limited. The compression information may also include other information, such as the target compression ratio, so that the server side can restore the compressed key frame and the compressed non-key frames to their original resolution.
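A minimal sketch of such a compression-information message (the JSON encoding and field names are assumptions; the patent only requires frame identifiers and, optionally, the target compression ratio):

```python
import json

def make_compression_info(compressed_frame_ids, target_ratio):
    # Side-channel metadata: which non-key frames were downsampled, and
    # at what ratio, so the server can skip per-frame detection.
    return json.dumps({
        "compressed_non_key_frames": compressed_frame_ids,
        "target_compression_ratio": target_ratio,
    })
```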
In some embodiments, the acquiring target raw encoded data of the live video includes: determining whether to compress a current picture group in the live video according to the uplink network condition of the live system and/or the super-resolution image processing capacity of the server; if yes, determining the current picture group as a target picture group, and acquiring target original coding data corresponding to the target picture group. The method has the advantages that whether each picture group needs to be compressed or not can be dynamically determined according to the current actual situation by taking the picture group as a unit, so that more flexible video compression can be realized, and the method is better suitable for the variability of a network.
Fig. 3 is a flowchart of another video processing method according to an embodiment of the present disclosure, where the method is applied to a server in a live broadcast system, and may be executed by a video processing device corresponding to the server, where the device may be implemented by software and/or hardware, and may generally be integrated in an electronic device such as a server. As shown in fig. 3, the method includes:
step 301, receiving a video frame sequence sent by a live user side, wherein the video frame sequence comprises a compressed key frame, compressed non-key frames and the remaining original non-key frames other than the non-key frames to be processed in an original non-key frame set; the live user side is used for acquiring target original encoded data of the live video, determining a target frame number according to an uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, determining original non-key frames of the target frame number from the original non-key frame set as the non-key frames to be processed, and respectively downsampling the original key frame and the non-key frames to be processed to obtain the compressed key frame and the compressed non-key frames, wherein the target original encoded data comprises the original key frame and the original non-key frame set in a target picture group.
The live user side obtains a video frame sequence after processing the target original coded data, and sends the video frame sequence to the server side, and the server side receives the video frame sequence sent by the live user side. The processing of the target original encoded data may be described with reference to the above related descriptions, and will not be repeated here.
And 302, respectively performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence to obtain target key frames and target non-key frames.
The server may extract a compressed key frame and a compressed non-key frame from the video frame sequence, perform super-resolution image processing on the compressed key frame to obtain a target key frame, and perform super-resolution image processing on the compressed non-key frame to obtain the target non-key frame.
Optionally, the server may be provided with a super-resolution model, and the super-resolution model is used to perform super-resolution image processing on the compressed key frames and the compressed non-key frames. The super-resolution model can be obtained through on-line training or off-line training, and is not particularly limited.
Step 303, determining a target video frame sequence according to the target key frame, the target non-key frame and the rest of the original non-key frames.
For example, a compressed key frame in a video frame sequence may be replaced with a target key frame, and a compressed non-key frame may be replaced with a target non-key frame, resulting in a target video frame sequence comprising the target key frame, the target non-key frame, and the remaining original key frames.
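This replacement step could be sketched as follows, treating frames abstractly and assuming the super-resolved frames are keyed by their index in the sequence (an illustrative choice, not mandated by the disclosure):

```python
def assemble_target_sequence(frames, upscaled_by_index):
    # Swap each super-resolved frame back into its original slot;
    # untouched original non-key frames pass through unchanged.
    return [upscaled_by_index.get(i, frame) for i, frame in enumerate(frames)]
```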
Step 304, the target video frame sequence is sent to the audience user terminal.
According to the video processing method provided by the embodiment of the disclosure, a live user side in a live system respectively downsamples an original key frame and a part of non-key frames according to current actual conditions to obtain a video frame sequence containing compressed key frames, compressed non-key frames and uncompressed non-key frames, the video frame sequence is sent to a server side, after receiving the video frame sequence, the server side respectively carries out super-resolution image processing on the compressed key frames and the compressed non-key frames, and sends the obtained target video frame sequence to a viewer user side. By adopting the technical scheme, the video can be compressed more flexibly, so that the video compression method is better suitable for the variability of a network, and further reduces the time consumption of end-to-end transmission, can ensure the video quality and reduce the live broadcast time delay.
In some embodiments, further comprising: and receiving compression information sent by the live broadcast user side, wherein the compression information comprises frame identification information of the compression non-key frames. The performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively includes: performing super-resolution image processing on the compressed key frames in the video frame sequence; extracting the compressed non-key frames from the sequence of video frames according to the compressed information; and performing super-resolution image processing on the compressed non-key frames. The method has the advantages that the server can rapidly extract the compressed non-key frames according to the frame identification information, does not need to judge whether super-resolution image processing is needed or not frame by frame, and improves the processing efficiency of the server.
In some embodiments, the performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively includes: predicting the current popularity of the live video according to the associated information of the live account corresponding to the live user side, wherein the associated information comprises historical play information and/or account attribute information; when the current popularity degree meets the preset requirement, adopting a first super-resolution model trained on line to respectively perform super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence; otherwise, performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence by adopting a second super-resolution model trained offline. The method has the advantages that the corresponding super-resolution model can be selected in a targeted manner for different live videos, so that the computing resources of the server side can be utilized and distributed more reasonably.
The performance of a single general-purpose super-resolution model is limited, and its output quality differs greatly across different types of video. Training a dedicated super-resolution model online on the high-resolution video frame data sent by the live user side can further improve the server's super-resolution performance on the compressed key frames and compressed non-key frames sent by that user. Furthermore, due to the nature of live broadcasting, the full video content cannot be acquired in advance, and scene cuts in the live stream may occur at any time, so the training data set must be updated in time to maintain the quality-enhancement effect of the super-resolution model. However, because live content is highly varied, online training and applying a super-resolution model for each content type consumes enormous computing resources. Therefore, in the embodiment of the present disclosure, the current popularity of the live video may be predicted according to the associated information of the live account corresponding to the live user terminal, and the super-resolution model may be selected according to whether the current popularity meets the preset requirement.
For example, the associated information may include historical play information, where the historical play information may include a number of plays for a preset historical period (e.g., the last month), an average live time period, a maximum number of viewers per live, an average number of viewers per live, a number of audience utterances, and so on; the association information may also include account attribute information, which may include, for example, account levels, the number of people of interest in the account, and the duration of the account, among others. Alternatively, linear regression algorithms may be employed to predict popularity based on the correlation information.
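The linear-regression prediction could be sketched with an ordinary least-squares fit; the feature layout and function names are assumptions for illustration:

```python
import numpy as np

def fit_popularity_model(features, popularity):
    # Ordinary least squares on per-account feature vectors
    # (play counts, follower numbers, ...); a bias column is appended.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, popularity, rcond=None)
    return coef

def predict_popularity(coef, feature_vec):
    # Apply the fitted coefficients (plus bias) to a new account's features.
    return float(np.append(feature_vec, 1.0) @ coef)
```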
The preset requirement may be that the current popularity ranks higher than a preset proportion of all live accounts currently broadcasting; for example, with a preset proportion of 99%, the current live video is among the top 1% most popular videos.
In the embodiment of the disclosure, in order to improve the super-resolution processing performance of the super-resolution model on a specific live video, on-line training may be performed according to high-resolution video frame data (such as the above pre-original encoded data) sent by a live user terminal in a live process, so as to obtain a first super-resolution model, so that super-resolution image processing is performed on compressed key frames and compressed non-key frames in the live video with current popularity meeting preset requirements according to the first super-resolution model.
In the embodiment of the disclosure, offline training may be performed through a sample high-resolution image to obtain a general super-resolution model, and the general super-resolution model is used as a second super-resolution model. For example, the sample high-resolution image may be downsampled to obtain a sample low-resolution image, then the sample low-resolution image is input into a preset super-resolution model, and the output image of the preset super-resolution model and the corresponding sample high-resolution image are subjected to calculation of a loss function, so as to realize offline training of the super-resolution model, and obtain a second super-resolution model.
Optionally, popularity may be divided into further tiers. For example, when the current popularity does not meet the first preset requirement (i.e., the preset requirement above), it may be further determined whether it meets a second preset requirement (lower than the first). If so, the offline-trained second super-resolution model is used to perform super-resolution image processing on the compressed key frames and compressed non-key frames in the video frame sequence; if not, super-resolution image processing is skipped for these frames, and the video frame sequence is instead directly determined as the target video frame sequence and sent to the viewer user terminal.
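The three-tier policy above can be sketched as a simple dispatch (the threshold representation and return labels are illustrative assumptions):

```python
def select_sr_action(popularity, first_threshold, second_threshold):
    # Hot streams get the online-trained model, warm streams the
    # offline general model, and cold streams skip super-resolution.
    if popularity >= first_threshold:
        return "online_model"
    if popularity >= second_threshold:
        return "offline_model"
    return "skip_super_resolution"
```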
Fig. 4 is a schematic flow chart of another video processing method according to an embodiment of the disclosure, where the method is optimized based on the foregoing optional embodiments, and the method may include:
step 401, the live user side acquires target original coding data of a live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group.
Fig. 5 is a schematic diagram of a video frame processing procedure of a live user side according to an embodiment of the present disclosure. For example, it is assumed that the original video frame sequence of a target group of pictures obtained by encoding with the preset encoder contains 1 high-resolution original key frame (denoted as KF_H) and 5 high-resolution original non-key frames (denoted as NKF_H).
Step 402, the live user side determines the target frame number and the target compression ratio according to the uplink network condition of the live system and the super-resolution image processing capability of the server side.
For example, assume that the target frame number is determined to be 2 and the target compression ratio is 1/2 according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server side.
Step 403, the live user side determines an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, respectively downsamples the original key frame and the non-key frame to be processed based on the target compression ratio to obtain a compressed key frame and a compressed non-key frame, and records the frame identification information containing the compressed non-key frame and the compression information of the target compression ratio.
For example, the original key frame and the original non-key frames are extracted from the target picture group, and 2 of the 5 original non-key frames are selected as non-key frames to be processed; if the frame numbers of the selected original non-key frames are 4 and 5, frame numbers 4 and 5 are recorded as the compression information. The original key frame and the 2 non-key frames to be processed are decoded and downsampled to obtain a low-resolution compressed key frame (denoted as KF_L) and 2 low-resolution compressed non-key frames (denoted as NKF_L); the remaining 3 original non-key frames are temporarily left unprocessed.
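The selection step could be sketched as follows; picking the tail of the GOP matches the example above (frames 4 and 5 of 5), though the disclosure does not fix a selection rule:

```python
def select_frames_to_compress(non_key_frame_numbers, target_frame_count):
    # One simple policy: compress the last `target_frame_count`
    # non-key frames of the group of pictures.
    return non_key_frame_numbers[-target_frame_count:]
```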
Step 404, the live user side determines a video frame sequence according to the compressed key frame, the compressed non-key frame and the rest original non-key frames except the non-key frames to be processed in the original non-key frame set.
Illustratively, the compressed key frames and the compressed non-key frames are re-encoded, the original key frames in the original video frame sequence are replaced by the re-encoded compressed key frames, and the original non-key frames in the original video frame sequence are replaced by the re-encoded compressed non-key frames, so that the video frame sequence is re-encapsulated (repacking) and a video frame sequence is obtained.
Step 405, the live user side uploads the video frame sequence and the compressed information to the server side.
Step 406, the server extracts the compressed key frame from the video frame sequence, and extracts the compressed non-key frame from the video frame sequence according to the compressed information.
Fig. 6 is a schematic diagram of a video frame processing procedure of a server according to an embodiment of the disclosure. The server can extract each video frame from the received video frame sequence, decode the compressed key frame and the compressed non-key frames with the serial numbers of 4 and 5, and do no processing for the original non-key frames.
And step 407, the server predicts the current popularity of the live video according to the associated information of the live account corresponding to the live user terminal, and determines the adopted super-resolution model according to whether the current popularity meets the preset requirement.
The popularity degree is determined according to the historical play record of the live account number and the related information such as the number of the fans, and if the popularity degree is high and meets the preset requirement, a super-resolution model of online training can be adopted.
And 408, the server side adopts the determined super-resolution model to respectively perform super-resolution image processing on the compressed key frame and the compressed non-key frame to obtain a target key frame and a target non-key frame.
For example, based on the target compression ratio, super-resolution image processing is performed on the compressed key frame and the 2 compressed non-key frames respectively, restoring them to a high-resolution target key frame (denoted as KF_H) and high-resolution target non-key frames (denoted as NKF_H), which are then re-encoded.
Step 409, the server determines a target video frame sequence according to the target key frame, the target non-key frame and the remaining original non-key frames.
By way of example, the compressed key frames and the compressed non-key frames in the video frame sequence are replaced by the high-resolution target key frames and the target non-key frames obtained through super-division up-sampling, so that the quality and fluency of video playing are ensured, the audience is difficult to feel the encoding processing trace, and the player at the user end of the audience does not need to be modified.
Step 410, the server sends the target video frame sequence to the viewer client.
According to the video processing method provided by the embodiment of the present disclosure, a live user side in a live broadcast system acquires the original key frame and original non-key frame set in a target picture group of a live video, determines the target frame number and target compression ratio of the original non-key frames to be processed according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server side, respectively downsamples the original key frame and the non-key frames to be processed to obtain the compressed key frame and compressed non-key frames, and uploads a video frame sequence containing the compressed key frame, the compressed non-key frames, and the uncompressed non-key frames to the server side. The server side dynamically selects a suitable super-resolution model according to the popularity of the live video, performs super-resolution image processing on the compressed key frame and compressed non-key frames in the video frame sequence, and sends the obtained target video frame sequence to the viewer user end. In this way, the video can be compressed more flexibly, better adapting to network variability, further reducing end-to-end transmission time, ensuring video quality, and reducing live broadcast delay. For example, experiments were performed on 90 4G bandwidth datasets with the related-art schemes and the scheme of the present disclosure, yielding the results shown in Table 1. Related-art scheme 1 is a scheme in which the live user uploads low-resolution video frames (both key frames and non-key frames) and the server super-resolves every video frame; related-art scheme 2 is a scheme in which the live user compresses only the key frames and the server super-resolves only the key frames.
Table 1 Comparison of experimental data

Comparison                          | Video quality improvement | Live delay reduction | QoE improvement
Present scheme vs. related art 1    | 7.06%                     | 41.80%               | 58.97%
Present scheme vs. related art 2    | 2.03%                     | 18.31%               | 16.88%
Referring to Table 1, compared with the scheme of related art 1, the video quality of the scheme of the present disclosure is improved by 7.06%, the live broadcast delay is reduced by 41.80%, and the QoE is improved by 58.97%; compared with the scheme of related art 2, the video quality is improved by 2.03%, the live broadcast delay is reduced by 18.31%, and the QoE is improved by 16.88%.
Fig. 7 is a block diagram of a video processing apparatus according to an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device as a live user terminal, and may perform video processing by performing a video processing method. As shown in fig. 7, the apparatus includes:
the original data acquisition module 701 is configured to acquire target original encoded data of a live video, where the target original encoded data includes an original key frame and an original non-key frame set in a target picture group;
a target frame number determining module 702, configured to determine a target frame number according to an uplink network condition of the live broadcast system and/or a super-resolution image processing capability of the server;
a downsampling module 703, configured to determine an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and downsample the original key frame and the non-key frame to be processed respectively to obtain a compressed key frame and a compressed non-key frame;
The video frame sequence uploading module 704 is configured to upload a video frame sequence to the server side, instructing the server side to perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence, respectively, to obtain a target key frame and target non-key frames, and to send a target video frame sequence to the viewer user side, where the video frame sequence includes the compressed key frame, the compressed non-key frames, and the remaining original non-key frames in the original non-key frame set other than the non-key frames to be processed, and the target video frame sequence includes the target key frame, the target non-key frames, and the remaining original non-key frames.
The video processing apparatus provided by the embodiment of the present disclosure can compress video more flexibly, better adapting to network variability, thereby reducing end-to-end transmission time, improving video quality, and reducing live delay.
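As an illustration of the client-side flow the modules above implement, the following sketch selects the non-key frames to compress and downsamples them together with the key frame. The frame layout, the choice of the first N non-key frames, the nearest-neighbour decimation, and all names are illustrative assumptions, not the patented implementation:

```python
def build_upload_sequence(gop, target_frame_count, ratio):
    """Split a GOP into frames to compress and frames to pass through.

    `gop` is a list of frames; index 0 is the key frame, the rest are
    non-key frames. Frames are 2D lists of pixel values. Which non-key
    frames are selected (here simply the first N) is an assumption.
    """
    def downsample(frame, r):
        # Keep every r-th row and column (nearest-neighbour decimation).
        return [row[::r] for row in frame[::r]]

    key = downsample(gop[0], ratio)
    to_process = gop[1:1 + target_frame_count]
    compressed = [downsample(f, ratio) for f in to_process]
    untouched = gop[1 + target_frame_count:]
    # Frame identifiers of the compressed non-key frames travel alongside
    # the sequence so the server can locate them (the "compression info").
    compressed_ids = list(range(1, 1 + target_frame_count))
    return [key] + compressed + untouched, compressed_ids
```

The returned identifier list plays the role of the compression information sent by the compressed information sending module below.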
Optionally, the target frame number determining module is further configured to: and determining a target compression ratio according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capacity of the server. The step of respectively downsampling the original key frame and the non-key frame to be processed includes: and respectively downsampling the original key frame and the non-key frame to be processed based on the target compression ratio.
Optionally, the target frame number determining module is specifically configured to: and determining a target frame number and a target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capacity of the server.
Optionally, the target frame number determining module includes:
a target super-resolution duration threshold determining unit, configured to determine a target super-resolution duration threshold corresponding to the target picture group according to the uplink network condition of the live broadcast system;
a single-frame super-resolution duration determining unit, configured to determine a single-frame super-resolution duration according to the super-resolution image processing capability of the server side;
a maximum processing frame number determining unit, configured to determine a maximum processing frame number according to the target super-resolution duration threshold and the single-frame super-resolution duration;
and the frame number and compression ratio determining unit is used for determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capacity of the server on the premise that the target frame number to be determined is smaller than the maximum processing frame number.
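The cap described by the units above can be sketched as follows, assuming the duration threshold and the single-frame super-resolution duration are given in milliseconds; the candidate scoring is a hypothetical stand-in for whatever joint selection of frame number and compression ratio is actually used:

```python
def max_processing_frames(budget_ms, per_frame_ms):
    """Largest frame count the server can super-resolve within the per-GOP
    duration threshold (derived from the uplink network condition), given
    the single-frame super-resolution duration (derived from the server's
    processing capability)."""
    return budget_ms // per_frame_ms

def pick_frames_and_ratio(candidates, budget_ms, per_frame_ms):
    """From hypothetical (frame_count, compression_ratio, score) candidates,
    keep only those whose frame count is below the cap and return the
    best-scoring one; None if nothing is feasible."""
    cap = max_processing_frames(budget_ms, per_frame_ms)
    feasible = [c for c in candidates if c[0] < cap]
    return max(feasible, key=lambda c: c[2]) if feasible else None
```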
Optionally, the apparatus further includes: the compressed information sending module is used for sending compressed information to the server and indicating the server to extract the compressed non-key frames from the video frame sequence according to the compressed information, wherein the compressed information comprises frame identification information of the compressed non-key frames.
Optionally, the raw data acquisition module includes:
the compression judging unit is used for determining whether to compress the current picture group in the live video according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capacity of the server;
and the data acquisition unit is used for determining the current picture group as a target picture group and acquiring target original coding data corresponding to the target picture group when the judgment result of the compression judgment unit is yes.
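A minimal sketch of the compression judging unit's decision, under the assumption that the judgment boils down to comparing the estimated upload time of the raw picture group against a latency budget while the server has spare super-resolution capacity; the actual criterion is not specified here and may differ:

```python
def should_compress_gop(gop_bytes, uplink_bps, latency_budget_s,
                        server_has_capacity):
    """Compress the current GOP only when uploading it raw would exceed
    the latency budget and the server can absorb the super-resolution
    work. All thresholds and inputs are illustrative."""
    est_upload_s = gop_bytes * 8 / uplink_bps
    return est_upload_s > latency_budget_s and server_has_capacity
```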
Fig. 8 is a block diagram of another video processing apparatus according to an embodiment of the present disclosure, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device as a server, and may perform video processing by performing a video processing method. As shown in fig. 8, the apparatus includes:
a video frame sequence receiving module 801, configured to receive a video frame sequence sent by the live broadcast client, where the video frame sequence includes a compressed key frame, a compressed non-key frame, and remaining original non-key frames in an original non-key frame set, except for a non-key frame to be processed; the live broadcast user side is used for acquiring target original coding data of a live broadcast video, determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain the compressed key frame and the compressed non-key frame, wherein the target original coding data comprises the original key frame and the original non-key frame set in a target picture group;
An image processing module 802, configured to perform super-resolution image processing on the compressed key frame and the compressed non-key frame in the video frame sequence, so as to obtain a target key frame and a target non-key frame;
a target video frame sequence determining module 803, configured to determine a target video frame sequence according to the target key frame, the target non-key frame, and the remaining original non-key frames;
and the target video frame sequence sending module 804 is configured to send the target video frame sequence to the viewer user side.
Optionally, the apparatus further includes: the compressed information receiving module is used for receiving compressed information sent by the live broadcast user side, wherein the compressed information comprises frame identification information of the compressed non-key frames; wherein the performing super-resolution image processing on the compressed key frame and the compressed non-key frame in the video frame sequence respectively includes: performing super-resolution image processing on the compressed key frames in the video frame sequence; extracting the compressed non-key frames from the sequence of video frames according to the compressed information; and performing super-resolution image processing on the compressed non-key frames.
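A sketch of the server-side restoration just described: the key frame and the non-key frames named in the compression information are upscaled, while all other frames pass through untouched. Nearest-neighbour upscaling stands in for a real super-resolution model, and all names are illustrative:

```python
def restore_sequence(frames, compressed_ids, ratio):
    """Upscale frame 0 (the compressed key frame) and the compressed
    non-key frames identified by `compressed_ids`; leave the remaining
    original non-key frames as-is."""
    def upscale(frame, r):
        # Repeat each pixel r times horizontally and each row r times
        # vertically, inverting the earlier decimation.
        return [
            [px for px in row for _ in range(r)]
            for row in frame for _ in range(r)
        ]

    targets = {0, *compressed_ids}
    return [upscale(f, ratio) if i in targets else f
            for i, f in enumerate(frames)]
```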
Optionally, the image processing module includes:
a current popularity determining unit, configured to predict the current popularity of the live video according to association information of the live account corresponding to the live user side, where the association information includes historical play information and/or account attribute information;
an image processing unit, configured to, when the current popularity meets a preset requirement, perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence using a first super-resolution model trained online to obtain the target key frame and the target non-key frames; otherwise, perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence using a second super-resolution model trained offline to obtain the target key frame and the target non-key frames.
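The popularity-gated model choice can be sketched as follows; the popularity predictor, its weights, and the threshold are illustrative assumptions, and the model arguments are stand-in values for the online- and offline-trained super-resolution models:

```python
def predict_popularity(history_viewers, follower_count,
                       w_hist=0.7, w_attr=0.3):
    """Toy popularity score combining historical play information with an
    account attribute; the weights are illustrative assumptions."""
    avg_hist = sum(history_viewers) / len(history_viewers)
    return w_hist * avg_hist + w_attr * follower_count

def choose_superres_model(popularity, threshold, online_model, offline_model):
    """Popular streams get the model trained online (kept current for that
    content); the rest fall back to the generic offline-trained model."""
    return online_model if popularity >= threshold else offline_model
```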
Referring now to fig. 9, a schematic diagram of an electronic device 900 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 9 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 9, the electronic device 900 may include a processing means (e.g., a central processor, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage means 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are also stored. The processing device 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
In general, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication means 909 may allow the electronic device 900 to communicate wirelessly or by wire with other devices to exchange data. While fig. 9 shows an electronic device 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 909, or installed from the storage device 908, or installed from the ROM 902. When executed by the processing device 901, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement any of the video processing methods provided by the embodiments of the present disclosure.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote computer case, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or hardware. The name of the module is not limited to the module itself in some cases, for example, the target frame number determining module may also be described as "a module for determining a target frame number according to an uplink network condition of the live broadcast system and/or a super resolution image processing capability of the server".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, a video processing method is provided, which is applied to a live user side in a live system, where the live system further includes a server side and a viewer user side, and the method includes:
acquiring target original coding data of a live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group;
determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capacity of the server;
determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain a compressed key frame and a compressed non-key frame;
uploading a video frame sequence to the server side, instructing the server side to perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence, respectively, to obtain a target key frame and target non-key frames, and to send a target video frame sequence to the viewer user side, where the video frame sequence includes the compressed key frame, the compressed non-key frames, and the remaining original non-key frames in the original non-key frame set other than the non-key frames to be processed, and the target video frame sequence includes the target key frame, the target non-key frames, and the remaining original non-key frames.
Further, the method further comprises the following steps: determining a target compression ratio according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capacity of the server;
the step of respectively downsampling the original key frame and the non-key frame to be processed includes: and respectively downsampling the original key frame and the non-key frame to be processed based on the target compression ratio.
Further, the determining the target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, and determining the target compression ratio according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side includes:
and determining a target frame number and a target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capacity of the server.
Further, the determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server side includes:
determining a target super-resolution duration threshold corresponding to the target picture group according to the uplink network condition of the live broadcast system;
determining a single-frame super-resolution duration according to the super-resolution image processing capability of the server side;
determining a maximum processing frame number according to the target super-resolution duration threshold and the single-frame super-resolution duration;
and on the premise that the target frame number to be determined is smaller than the maximum processing frame number, determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capacity of the server.
Further, the method further comprises the following steps: and sending compression information to the server side, wherein the compression information is used for indicating the server side to extract the compressed non-key frames from the video frame sequence according to the compression information, and the compression information comprises frame identification information of the compressed non-key frames.
Further, the acquiring the target original encoded data of the live video includes:
determining whether to compress a current picture group in the live video according to the uplink network condition of the live system and/or the super-resolution image processing capacity of the server;
if yes, determining the current picture group as a target picture group, and acquiring target original coding data corresponding to the target picture group.
According to one or more embodiments of the present disclosure, a video processing method is provided, which is applied to a server in a live broadcast system, where the live broadcast system further includes a live broadcast user side and a viewer user side, and the method includes:
Receiving a video frame sequence sent by the live broadcast user side, wherein the video frame sequence comprises compressed key frames, compressed non-key frames and residual original non-key frames except for the non-key frames to be processed in an original non-key frame set; the live broadcast user side is used for acquiring target original coding data of a live broadcast video, determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain the compressed key frame and the compressed non-key frame, wherein the target original coding data comprises the original key frame and the original non-key frame set in a target picture group;
performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively to obtain target key frames and target non-key frames;
determining a target video frame sequence according to the target key frame, the target non-key frame and the residual original non-key frames;
and sending the target video frame sequence to the viewer user side.
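Determining the target video frame sequence requires re-interleaving the super-resolved frames with the untouched originals in display order. A sketch of that bookkeeping, assuming frame 0 is the key frame and `compressed_ids` records the positions of the compressed non-key frames (an assumed convention, not the patented layout):

```python
def assemble_target_sequence(target_key, target_nonkeys, original_nonkeys,
                             compressed_ids, gop_size):
    """Place the super-resolved key frame at position 0, the super-resolved
    non-key frames at their recorded positions, and fill the remaining
    positions with the untouched original non-key frames, in order."""
    out = [None] * gop_size
    out[0] = target_key
    sr = iter(target_nonkeys)
    orig = iter(original_nonkeys)
    for i in range(1, gop_size):
        out[i] = next(sr) if i in compressed_ids else next(orig)
    return out
```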
Further, the method further comprises the following steps:
receiving compression information sent by the live broadcast user side, wherein the compression information comprises frame identification information of the compression non-key frames;
wherein the performing super-resolution image processing on the compressed key frame and the compressed non-key frame in the video frame sequence respectively includes:
performing super-resolution image processing on the compressed key frames in the video frame sequence;
extracting the compressed non-key frames from the sequence of video frames according to the compressed information;
and performing super-resolution image processing on the compressed non-key frames.
Further, the performing super-resolution image processing on the compressed key frame and the compressed non-key frame in the video frame sequence respectively includes:
predicting the current popularity of the live video according to the associated information of the live account corresponding to the live user side, wherein the associated information comprises historical play information and/or account attribute information;
when the current popularity degree meets the preset requirement, adopting a first super-resolution model trained on line to respectively perform super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence; otherwise, performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence by adopting a second super-resolution model trained offline.
The foregoing description is merely of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the specific combinations of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, solutions formed by substituting the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).
Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (13)

1. A video processing method, applied to a live broadcast user side in a live broadcast system, wherein the live broadcast system further comprises a server side and a viewer user side, and the method comprises:
acquiring target original coding data of a live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group;
determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capacity of the server;
determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain a compressed key frame and a compressed non-key frame;
uploading a video frame sequence to the server side to instruct the server side to perform super-resolution image processing on the compressed key frame and the compressed non-key frames in the video frame sequence, respectively, to obtain a target key frame and target non-key frames, and to send a target video frame sequence to the viewer user side, wherein the video frame sequence comprises the compressed key frame, the compressed non-key frames and the remaining original non-key frames in the original non-key frame set other than the non-key frames to be processed, and the target video frame sequence comprises the target key frame, the target non-key frames and the remaining original non-key frames.
2. The method as recited in claim 1, further comprising:
determining a target compression ratio according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capacity of the server;
the step of respectively downsampling the original key frame and the non-key frame to be processed includes:
and respectively downsampling the original key frame and the non-key frame to be processed based on the target compression ratio.
3. The method according to claim 2, wherein the determining the target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, determining the target compression ratio according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, comprises:
and determining a target frame number and a target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capacity of the server.
4. The method of claim 3, wherein the determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capability of the server side comprises:
determining a target super-resolution duration threshold corresponding to the target picture group according to the uplink network condition of the live broadcast system;
determining a single-frame super-resolution duration according to the super-resolution image processing capability of the server side;
determining a maximum processing frame number according to the target super-resolution duration threshold and the single-frame super-resolution duration;
and on the premise that the target frame number to be determined is smaller than the maximum processing frame number, determining the target frame number and the target compression ratio according to the uplink network condition of the live broadcast system and the super-resolution image processing capacity of the server.
5. The method as recited in claim 1, further comprising:
and sending compression information to the server side, wherein the compression information is used for indicating the server side to extract the compressed non-key frames from the video frame sequence according to the compression information, and the compression information comprises frame identification information of the compressed non-key frames.
6. The method of claim 1, wherein the obtaining the target raw encoded data of the live video comprises:
determining whether to compress a current picture group in the live video according to the uplink network condition of the live system and/or the super-resolution image processing capacity of the server;
If yes, determining the current picture group as a target picture group, and acquiring target original coding data corresponding to the target picture group.
7. A video processing method, applied to a server side in a live broadcast system, wherein the live broadcast system further comprises a live broadcast user side and a viewer user side, and the method comprises:
receiving a video frame sequence sent by the live broadcast user side, wherein the video frame sequence comprises compressed key frames, compressed non-key frames and residual original non-key frames except for the non-key frames to be processed in an original non-key frame set; the live broadcast user side is used for acquiring target original coding data of a live broadcast video, determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and respectively downsampling the original key frame and the non-key frame to be processed to obtain the compressed key frame and the compressed non-key frame, wherein the target original coding data comprises the original key frame and the original non-key frame set in a target picture group;
Performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively to obtain target key frames and target non-key frames;
determining a target video frame sequence according to the target key frame, the target non-key frame and the residual original non-key frames;
and sending the target video frame sequence to the audience user side.
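The server-side steps of claim 7 can be sketched as a single pass over the received sequence: compressed frames are restored by super-resolution, remaining original non-key frames pass through untouched, and the target sequence keeps display order. The string-rewriting `upscale` below is a stand-in for a real super-resolution model, which the patent does not specify; the `(kind, frame)` tagging is likewise an assumption made for this toy.

```python
def upscale(frame):
    # Placeholder for super-resolution inference on one compressed frame.
    return frame.replace("compressed", "target")

def process_sequence(sequence):
    """sequence: list of (kind, frame), kind in {'key', 'nonkey', 'orig'}."""
    out = []
    for kind, frame in sequence:
        if kind in ("key", "nonkey"):   # downsampled at the live user side
            out.append(upscale(frame))  # restore resolution via SR
        else:                           # remaining original non-key frame
            out.append(frame)
    return out
```

Note that only the downsampled frames incur super-resolution cost; the remaining original non-key frames are forwarded to the audience user side unchanged.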
8. The method as recited in claim 7, further comprising:
receiving the compression information sent by the live broadcast user side, wherein the compression information comprises frame identification information of the compressed non-key frames;
wherein the performing super-resolution image processing on the compressed key frame and the compressed non-key frame in the video frame sequence respectively includes:
performing super-resolution image processing on the compressed key frames in the video frame sequence;
extracting the compressed non-key frames from the video frame sequence according to the compression information;
and performing super-resolution image processing on the compressed non-key frames.
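A possible server-side use of the compression information in claim 8 is to split the received sequence into the compressed non-key frames (to be super-resolved) and everything else. Identifying frames by their index in the sequence, and the `"compressed_non_key_frames"` key, are assumptions of this sketch, not details from the patent.

```python
def split_by_compression_info(frames, info):
    """Separate compressed non-key frames from the rest of the sequence."""
    ids = set(info["compressed_non_key_frames"])
    compressed = [f for i, f in enumerate(frames) if i in ids]
    others = [f for i, f in enumerate(frames) if i not in ids]
    return compressed, others
```

After the split, only the `compressed` list is fed to the super-resolution model, which matches the claim's ordering of processing key frames first and then the extracted compressed non-key frames.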
9. The method of claim 7, wherein the super-resolution image processing of the compressed key frames and the compressed non-key frames in the sequence of video frames, respectively, comprises:
predicting the current popularity of the live video according to associated information of a live broadcast account corresponding to the live broadcast user side, wherein the associated information comprises historical play information and/or account attribute information;
and when the current popularity meets a preset requirement, performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively by using a first super-resolution model trained online; otherwise, performing super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence by using a second super-resolution model trained offline.
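The model-selection logic of claim 9 can be sketched as: score the live account's popularity from historical play information and an account attribute, then route hot streams to the online-trained model and the rest to the offline model. The linear score, its weights, and the 10,000 threshold are all invented for illustration; the patent leaves the prediction method open.

```python
def predict_popularity(history_views, follower_count):
    """Toy linear popularity predictor over assumed account features."""
    return 0.7 * history_views + 0.3 * follower_count

def select_sr_model(history_views, follower_count, threshold=10_000):
    """Pick the online-trained model for popular streams, the offline one otherwise."""
    score = predict_popularity(history_views, follower_count)
    return "online_model" if score >= threshold else "offline_model"
```

The design intuition is that an online-trained (continually fine-tuned) model is worth its extra cost only for streams many viewers will watch, while long-tail streams reuse a cheaper offline model.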
10. A video processing device, configured at a live broadcast user side in a live broadcast system, wherein the live broadcast system further comprises a server side and an audience user side, and the device comprises:
the original data acquisition module is used for acquiring target original coding data of the live video, wherein the target original coding data comprises an original key frame and an original non-key frame set in a target picture group;
the target frame number determining module is used for determining a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side;
the downsampling module is used for determining an original non-key frame of the target frame number from the original non-key frame set as a non-key frame to be processed, and downsampling the original key frame and the non-key frame to be processed respectively to obtain a compressed key frame and a compressed non-key frame;
the video frame sequence uploading module is used for uploading a video frame sequence to the server side, and for instructing the server side to perform super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence respectively to obtain target key frames and target non-key frames and to send the target video frame sequence to the audience user side, wherein the video frame sequence comprises the compressed key frames, the compressed non-key frames, and the remaining original non-key frames in the original non-key frame set other than the non-key frames to be processed, and the target video frame sequence comprises the target key frames, the target non-key frames, and the remaining original non-key frames.
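The client-side modules of claim 10 can be approximated in two steps: derive a target frame number from the uplink deficit, then "downsample" the key frame and that many non-key frames before upload. The string tag stands in for real spatial downsampling (e.g. bicubic resizing), and the bitrate arithmetic is an assumed heuristic, not the patent's method.

```python
import math

def choose_target_frame_number(uplink_mbps, required_mbps,
                               per_frame_saving_mbps, n_non_key):
    """The tighter the uplink, the more non-key frames get downsampled."""
    deficit = max(0.0, required_mbps - uplink_mbps)
    return min(math.ceil(deficit / per_frame_saving_mbps), n_non_key)

def prepare_upload(key_frame, non_key_frames, target_n):
    """Build the video frame sequence: compressed key frame, compressed
    non-key frames to be processed, then the remaining originals."""
    to_process = non_key_frames[:target_n]   # non-key frames to be processed
    remaining = non_key_frames[target_n:]    # stay at original resolution
    return (["compressed:" + key_frame]
            + ["compressed:" + f for f in to_process]
            + remaining)
```

Taking the first `target_n` non-key frames is one arbitrary selection policy; the claim only fixes how many frames are chosen, not which ones.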
11. A video processing device, configured at a server side in a live broadcast system, wherein the live broadcast system further comprises a live broadcast user side and an audience user side, and the device comprises:
the video frame sequence receiving module is used for receiving a video frame sequence sent by the live broadcast user side, wherein the video frame sequence comprises compressed key frames, compressed non-key frames, and the remaining original non-key frames in an original non-key frame set other than the non-key frames to be processed; the live broadcast user side is configured to acquire target original coding data of a live video, determine a target frame number according to the uplink network condition of the live broadcast system and/or the super-resolution image processing capability of the server side, determine original non-key frames of the target frame number from the original non-key frame set as the non-key frames to be processed, and downsample the original key frame and the non-key frames to be processed respectively to obtain the compressed key frames and the compressed non-key frames, wherein the target original coding data comprises the original key frame and the original non-key frame set in a target picture group;
the image processing module is used for respectively carrying out super-resolution image processing on the compressed key frames and the compressed non-key frames in the video frame sequence to obtain target key frames and target non-key frames;
the target video frame sequence determining module is used for determining a target video frame sequence according to the target key frame, the target non-key frame, and the remaining original non-key frames;
and the target video frame sequence sending module is used for sending the target video frame sequence to the audience user side.
12. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method of any one of claims 1-9.
13. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-9.
CN202111669381.9A 2021-12-31 2021-12-31 Video processing method, device, equipment and storage medium Active CN114363649B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111669381.9A CN114363649B (en) 2021-12-31 2021-12-31 Video processing method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN114363649A CN114363649A (en) 2022-04-15
CN114363649B true CN114363649B (en) 2024-02-09

Family

ID=81106248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111669381.9A Active CN114363649B (en) 2021-12-31 2021-12-31 Video processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114363649B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900717B (en) * 2022-05-13 2023-09-26 杭州网易智企科技有限公司 Video data transmission method, device, medium and computing equipment
CN117176979B (en) * 2023-04-24 2024-05-03 青岛尘元科技信息有限公司 Method, device, equipment and storage medium for extracting content frames of multi-source heterogeneous video
CN117058002B (en) * 2023-10-12 2024-02-02 深圳云天畅想信息科技有限公司 Video frame super-resolution reconstruction method and device and computer equipment
CN117896552A (en) * 2024-03-14 2024-04-16 浙江华创视讯科技有限公司 Video conference processing method, video conference system and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101938656A (en) * 2010-09-27 2011-01-05 上海交通大学 Video coding and decoding system based on keyframe super-resolution reconstruction
CN109660796A (en) * 2018-11-09 2019-04-19 建湖云飞数据科技有限公司 The method that a kind of pair of video frame is encoded
CN110062232A (en) * 2019-04-01 2019-07-26 杭州电子科技大学 A kind of video-frequency compression method and system based on super-resolution
CN113115067A (en) * 2021-04-19 2021-07-13 脸萌有限公司 Live broadcast system, video processing method and related device
WO2021193648A1 (en) * 2020-03-25 2021-09-30 株式会社ソニー・インタラクティブエンタテインメント Image processing device and server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769681B (en) * 2018-06-20 2022-06-10 腾讯科技(深圳)有限公司 Video encoding method, video decoding method, video encoding apparatus, video decoding apparatus, computer device, and storage medium



Similar Documents

Publication Publication Date Title
CN114363649B (en) Video processing method, device, equipment and storage medium
CN113115067A (en) Live broadcast system, video processing method and related device
US11057657B2 (en) Methods, systems, processor and computer code for providing video clips
CA2742111C (en) Video conference rate matching
US20150156557A1 (en) Display apparatus, method of displaying image thereof, and computer-readable recording medium
US10771789B2 (en) Complexity adaptive rate control
WO2018010662A1 (en) Video file transcoding method and device, and storage medium
TW201404170A (en) Techniques for adaptive video streaming
US10187648B2 (en) Information processing device and method
KR101350915B1 (en) Multi-view video steaming system and providing method thereof
US20180184089A1 (en) Target bit allocation for video coding
US11025987B2 (en) Prediction-based representation selection in video playback
CN113906764B (en) Method, apparatus and computer readable medium for transcoding video
US9877056B1 (en) Compressed media with still images selected from a video stream
AU2018250308B2 (en) Video compression using down-sampling patterns in two phases
CN113302928A (en) System and method for transmitting multiple video streams
Li et al. A super-resolution flexible video coding solution for improving live streaming quality
US20120033727A1 (en) Efficient video codec implementation
JP2020524450A (en) Transmission system for multi-channel video, control method thereof, multi-channel video reproduction method and device thereof
JP6483850B2 (en) Data processing method and apparatus
CN115706829A (en) Multi-window video communication method, device and system
CN117291810B (en) Video frame processing method, device, equipment and storage medium
CN113038277B (en) Video processing method and device
KR20090086715A (en) Apparatus and method for displaying thumbnail image
Kobayashi et al. A Low-Latency 4K HEVC Multi-Channel Encoding System with Content-Aware Bitrate Control for Live Streaming

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant