WO2022206960A1 - Video transcoding method, system and electronic device - Google Patents

Video transcoding method, system and electronic device

Info

Publication number
WO2022206960A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video frame
frame sequence
framework
sequence
Prior art date
Application number
PCT/CN2022/084838
Other languages
English (en)
French (fr)
Inventor
高艳
Original Assignee
京东方科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司 filed Critical 京东方科技集团股份有限公司
Publication of WO2022206960A1

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Definitions

  • the present disclosure relates to the technical field of image transcoding, and in particular, to a video transcoding method, system, and electronic device.
  • the purpose of the present disclosure is to provide a video transcoding method, system and electronic device.
  • the present disclosure provides a video transcoding method, including:
  • a transcoded video is sequentially generated according to the second video frame sequence, and the transcoded video is output.
  • the present disclosure also provides a video transcoding system, including:
  • a decoding unit for decoding the input video to generate a first video frame sequence
  • the super-resolution enhancement unit is used to pre-process, with a parallel computing framework, all the data of each frame of the first video frame sequence obtained in sequence, and to perform, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
  • An encoding output unit configured to generate a transcoded video in sequence according to the second video frame sequence, and output the transcoded video.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements any of the methods described above when executing the program.
  • a video transcoding method, system and electronic device include: decoding an input video to generate a first video frame sequence; pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence; performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence; and generating the transcoded video in sequence from the second video frame sequence and outputting the transcoded video.
  • One or more embodiments of this specification utilize a combination of a parallel computing framework and a deep learning inference framework to make full use of graphics processor resources.
  • FIG. 1 is a schematic flowchart of a video transcoding method proposed by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a denoising network framework of a video transcoding method proposed by an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a super-division network framework of a video transcoding method proposed by an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a synchronization control flow of a video transcoding method proposed by an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a video transcoding system framework in a specific application scenario proposed by an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a data processing flow of an AI super-resolution enhancement unit in a specific application scenario proposed by an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of a video transcoding system proposed by an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure proposes a video transcoding scheme, which utilizes a combination of a parallel computing framework and a deep learning inference framework to make full use of graphics processing unit (GPU, Graphics Processing Unit) resources.
  • data transmission is not required between inference frameworks, and data is shared, which reduces the redundancy caused by multiple data transmissions.
  • the video frame is only transmitted once, and the rest of the operations are completed in the graphics processor through the parallel computing framework and the deep learning inference framework, which greatly improves the effective utilization of resources and data processing speed.
  • referring to FIG. 1, a schematic flowchart of a video transcoding method according to an embodiment of the present specification, the method specifically includes the following steps:
  • Step 101: Decode the input video to generate a first video frame sequence.
  • the purpose of this step is to decode the video into individual video frames in preparation for the subsequent super-resolution of the video frames.
  • the format of the video can be wma, rm, avi, mod and so on.
  • the decoding method can be hardware decoding or software decoding, etc.
  • the decoding tool can be FFmpeg (Fast Forward Mpeg), MPEG-4, DivX, etc.
  • the video frame sequence is a sequence in which each video frame is arranged in the order of playback time nodes after the video is decoded.
  • video sampling may be performed when the video frame sequence is generated, for example, the decoded video frames are automatically sampled to 1080p. That is, decoding the input video to generate the first video frame sequence includes: decoding the input video through an audio and video processing program; and sampling the decoded video frames into a set video display format to generate the first video frame sequence.
  • sampling may not be performed at all when generating the first video frame sequence, and the final video frame sequence is generated after decoding is completed.
  • the audio and video processing program is an FFMpeg (Fast Forward Mpeg) program.
  • Step 102: Use a parallel computing framework to pre-process all the data of each frame obtained in sequence from the first video frame sequence, and use a deep learning inference framework to perform transcoding model calculation on the pre-processed first video frame sequence, generating a transcoded second video frame sequence.
  • parallel computing refers to the process of using multiple computing resources to solve computing problems at the same time, and it is an effective means to improve the computing speed and processing capacity of computer systems. Its basic idea is to use multiple processors to solve the same problem collaboratively, that is, to decompose the problem to be solved into several parts, and each part is calculated in parallel by an independent processor.
  • a parallel computing system can be either a specially designed supercomputer with multiple processors, or a cluster of several independent computers interconnected in some way.
  • common parallel computing frameworks are: MPI, OpenMP, OpenCL (Open Computing Language), OpenGL (Open Graphics Library), CUDA (Compute Unified Device Architecture), and so on.
  • Common deep learning inference frameworks are: TensorRT, OpenVINO, NCNN, MNN, etc.
  • pre-processing all the data of each frame with the parallel computing framework is a process of standardizing the data of each frame, which may be a process of normalizing and/or type-converting the data of each frame, for example, converting uint8 data in the range 0-255 into float32 data in the range 0-1.
  • the CUDA framework is selected as the parallel computing framework. That is, the parallel computing framework is a CUDA framework; the preprocessing includes: performing normalization and type conversion operations on the first video frame sequence through the CUDA framework.
  • after the pre-processing, the deep learning inference framework is used to load different models for model inference calculation to complete the transcoding and super-resolution process.
  • the loaded model can be a denoising model, a super-resolution model, a detail enhancement model, and so on.
  • the denoising model calculation and the super-resolution model calculation must be carried out in order: denoising first, then super-resolution.
  • other models, such as the detail enhancement model, can be added or omitted according to the specific situation.
  • to achieve real-time processing, a set number of further convolution and/or deconvolution operations can be performed on top of the existing denoising and super-resolution models.
  • the deep learning inference framework is the tensorRT framework
  • the use of the deep learning inference framework to perform the transcoding model calculation on the pre-processed first video frame sequence includes: calculating, through the tensorRT framework, the pre-processed
  • first video frame sequence in the order of the denoising model, the detail enhancement model and/or the super-resolution model; and performing at least one convolution and/or deconvolution operation in the denoising model, the detail enhancement model and/or the super-resolution model to complete the transcoding model calculation.
  • referring to FIG. 2, a schematic diagram of a denoising network framework of a video transcoding method;
  • and FIG. 3, a schematic diagram of a super-resolution network framework of a video transcoding method.
  • the two network frameworks add convolution and/or deconvolution operations on the basis of the existing denoising network framework and super-resolution network framework, respectively.
  • for example, under the denoising network framework, after the existing denoising network completes its data processing, the processed data undergoes the four convolutions shown in FIG. 2 before being output, completing the denoising processing.
  • the Gaussian kernel of each convolution layer is 3×3, and the number of feature maps is 64.
  • real-time HD-to-4K video super-resolution processing can be realized, covering the entire pipeline of video decoding, super-resolution enhancement and video encoding, with an overall processing time of 0.04 s per frame. If the complexity of the model is high, the number of GPUs can be increased accordingly to achieve real-time processing.
  • in a specific application scenario, in order to determine the number of available GPU resources, the GPU processing space is used to the fullest.
  • the pre-processing, with a parallel computing framework, of all the data of each frame obtained in sequence from the first video frame sequence includes: identifying the processing space of the currently available graphics processors, determining
  • a distribution amount for the first video frame sequence according to the processing space, and distributing each frame of the first video frame sequence in sequence according to the distribution amount. That is, before distributing the video frame sequence, the amount of currently usable GPU space is determined first, and is then divided according to the number of processing models to be run afterwards.
  • for example, if both denoising and super-resolution calculations are required, the current GPU space is divided into two parts, one for denoising and one for super-resolution. Since the processing of a video frame requires denoising before super-resolution, the distribution amount of each video frame batch is determined by the space allocated to the denoising model, and the video frames are then distributed in sequence according to that amount. To realize the sequential distribution and sequential processing of the video frames, the data must be synchronously controlled; in a specific application scenario, a semaphore can be used for synchronization control, as shown in FIG. 4, a schematic flowchart of synchronization control.
  • sem_wait denotes waiting on the semaphore: if the semaphore is 0 the thread is suspended; if it is 1, the semaphore is decremented by one. sem_post denotes releasing the semaphore, incrementing it by one.
  • the specific steps are as follows: (1) wait until shared variable 1 is readable, i.e., wait for the previous thread to finish processing and assign its result to shared variable 1; (2) take the value from shared variable 1; (3) release shared variable 1 as writable, telling the previous thread that it can assign a value to shared variable 1; (4) process the data; (5) wait until shared variable 2 is writable, i.e., wait for the next thread to take the previous value of shared variable 2; (6) assign the value to shared variable 2; (7) release shared variable 2 as readable, telling the next thread that it can take the value of shared variable 2. In this way, the current thread processes frame n while the previous thread processes frame n-1 and the next thread processes frame n+1, so processing proceeds as a pipeline and multi-core resources are used more efficiently.
  • Step 103: Generate a transcoded video in sequence according to the second video frame sequence, and output the transcoded video.
  • the purpose of this step is to arrange the super-resolved video frame sequence in order and re-encode it into a super-resolved video for output, so that the user can view the super-resolution transcoded video or reprocess it.
  • generating the transcoded video in sequence from the video frame sequence may mean first collecting the transcoded video frames, arranging them in order, and then forwarding them together to the video encoding software for video encoding to generate the transcoded video; or handing each video frame directly to the video encoding software as soon as its transcoding is completed, since the transcoding itself also proceeds in order, so the encoder only needs to encode in the order of reception to generate the transcoded video.
  • the sorting process in this step may be similar to the synchronization control method in the specific application scenario of step 102.
  • for the encoding program, the same audio and video processing program as in step 101 can be used; meanwhile, to reduce the number of data transmissions between transcoding and encoding and thereby reduce transmission redundancy,
  • generating the transcoded video in sequence according to the second video frame sequence includes: acquiring all of the second video frame sequence and re-encoding it in sequence with the audio and video processing program. That is, after all the video frames have been transcoded, they are arranged in order and sent to the encoding program for encoding.
  • finally, the transcoded video is output, so that the transcoded video can be stored, displayed, used or reprocessed.
  • the specific output mode of the transcoded video can be flexibly selected.
  • the transcoded video can be output directly, in display mode, on the display component of the current device (a display, projector, etc.), so that the operator of the current device can see the content of the transcoded video directly on the display component.
  • the transcoded video may be sent, through any data communication method (wired connection, NFC, Bluetooth, wifi, cellular mobile network, etc.), to other preset devices in the system acting as receivers, so that the preset device that receives the transcoded video can perform subsequent processing on it.
  • the preset device may be a preset server, and the server is generally set in the cloud as a data processing and storage center, which can store and distribute the transcoded video; wherein, the recipient of the distribution is a terminal device, The holder or operator of these terminal devices may be the current user, the relevant personnel of the subsequent video processing, and so on.
  • the transcoded video can be directly sent to a preset terminal device through any data communication method, and the terminal device can be one or more of those listed in the preceding paragraphs.
  • referring to FIG. 5, a schematic diagram of a video transcoding system framework in the specific application scenario.
  • the framework mainly has five processing units: a video decoding unit, a GPU distribution unit, an AI (Artificial Intelligence) super-resolution enhancement unit, a GPU aggregation unit, and a video encoding unit.
  • these five parts adopt a multi-threaded parallel processing mechanism, with data synchronization between threads controlled by semaphores.
  • the video decoding unit (FFMpeg decoding can be used): the FFMpeg API decodes the video and automatically samples the decoded video frames to 1080p.
  • GPU distribution unit: automatically identifies the amount of GPU space and distributes the video frame sequence according to the number of GPUs, so that the subsequent AI super-resolution enhancement unit can process the distributed frame sequences.
  • AI super-resolution enhancement unit: in this processing unit, a combination of CUDA and tensorrt is used to make full use of GPU resources. No data transmission is needed between the tensorrt inference engines, which share GPU memory, reducing the redundancy caused by multiple data transmissions. Data is transmitted from the CPU side to the GPU side only once; the remaining operations are implemented on the GPU side, and the final processed data is transmitted from the GPU side to the CPU side for use by the encoding unit.
  • referring to FIG. 6, a schematic diagram of the data processing flow of the AI super-resolution enhancement unit.
  • the specific processing flow is as follows: the CPU-side data is copied to the GPU side in one pass; CUDA performs the data pre-processing, normalization and type conversion: uint8 data in the range 0-255 is converted into float32 data in the range 0-1; the tensorrt framework is called for model 1 inference, and the data address is handed to model 2; the tensorrt framework is then called for model 2 inference; CUDA performs the data post-processing, data clipping (clip) and data type conversion (float32 to uint8), where data clipping refers to truncating data outside the range 0-1: values less than 0 are replaced with 0 and values greater than 1 are replaced with 1.
  • model 1 is generally a denoising model
  • model 2 is generally a super-resolution model.
  • GPU aggregation unit: used to receive the transcoding results and pass them to the encoding unit in order, ensuring that no frame-sequence confusion occurs.
  • video encoding unit (FFMpeg hardware encoding can be used): the video encoding unit can support conventional encoding formats, and the FFMpeg hardware encoding method (h264_nvenc, hevc_nvenc) is used here to improve the encoding speed.
  • a video transcoding method includes: decoding an input video to generate a first video frame sequence; pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence; performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence; and generating the transcoded video in sequence from the second video frame sequence and outputting the transcoded video.
  • One or more embodiments of this specification utilize a combination of a parallel computing framework and a deep learning inference framework to make full use of graphics processor resources.
  • data transmission is not required between inference frameworks, and data is shared, which reduces the redundancy caused by multiple data transmissions.
  • the video frame is only transmitted once, and the rest of the operations are completed in the graphics processor through the parallel computing framework and the deep learning inference framework, which greatly improves the effective utilization of resources and data processing speed.
  • the methods of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server.
  • the method of the embodiment of the present disclosure can also be applied to a distributed scenario, where multiple devices cooperate with each other to complete.
  • one device among the multiple devices may perform only one or more steps of the method of the embodiment of the present disclosure, and the multiple devices interact with each other to complete the described method.
  • a multi-threaded mechanism is used to manage video encoding and decoding and GPU resource scheduling, and CPU resources are fully utilized.
  • the tensorrt framework and cuda are used for acceleration, and the CPU-GPU data transmission is optimized to minimize redundant operations during processing.
  • real-time HD-to-4K super-resolution was finally realized on a single NVIDIA GeForce RTX™ 2080 Ti graphics card.
  • the number of GPUs can be increased to achieve real-time processing of high-complexity models.
  • all GPUs can be automatically identified and used, and compared with a single card the processing speed increases nearly in proportion to the number of cards.
  • the parallel computing framework is a CUDA framework
  • the pre-processing includes:
  • Normalization and type conversion operations are performed on the first video frame sequence through the CUDA framework.
  • the deep learning inference framework is a tensorRT framework
  • the transcoding model calculation performed on the pre-processed first video frame sequence using the deep learning inference framework includes: calculating, through the tensorRT framework, the pre-processed first video frame sequence in the order of the denoising model, the detail enhancement model and/or the super-resolution model; and performing at least one convolution and/or deconvolution operation in those models to complete the transcoding model calculation.
  • the decoding of the input video to generate the first video frame sequence includes: decoding the input video through an audio and video processing program;
  • the decoded video frames are sampled into a set video display format, and the first video frame sequence is generated.
  • the pre-processing, with a parallel computing framework, of all the data of each frame obtained in sequence from the first video frame sequence includes:
  • identifying the processing space of the currently available graphics processors, determining a distribution amount for the first video frame sequence according to the processing space, and distributing each frame of the first video frame sequence in sequence according to the distribution amount.
  • generating the transcoded video in sequence according to the second video frame sequence includes: acquiring all of the second video frame sequence and re-encoding it in sequence with the audio and video processing program.
  • the audio and video processing program is an FFMpeg program.
  • the present disclosure also provides a video transcoding system, as shown in FIG. 7 , which specifically includes:
  • a decoding unit 701 configured to decode the input video to generate a first video frame sequence
  • the super-resolution enhancement unit 702 is configured to pre-process, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and to perform, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
  • the encoding output unit 703 is configured to generate a transcoded video in sequence according to the second video frame sequence, and output the transcoded video.
  • each module may be implemented in one or more software and/or hardware.
  • the system provides an extension interface based on the polymorphism mechanism of assembly language, where "assembly language" here refers to programming languages such as C and C++.
  • when writing a unit module, the unit can be written based on the polymorphism mechanism of C++; thus, when the system is written, the ports of units that may be added can be defined in advance using the polymorphism mechanism, defining the insertion position and operational logic of other processing units. When other processing units need to be added in a specific application scenario, the units can then be written according to the defined format and added directly into the system framework, making it easy to add further processing units.
  • for example, an HDR (High-Dynamic Range) unit can be added directly after the super-resolution enhancement unit, performing HDR processing after the video frames have been super-resolved.
  • the system provides an extension interface based on the polymorphism mechanism of assembly language.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the video transcoding method described in any one of the above embodiments.
  • FIG. 8 shows a schematic diagram of a more specific hardware structure of an electronic device provided in this embodiment.
  • the device may include: a processor 810 , a memory 820 , an input/output interface 830 , a communication interface 840 and a bus 850 .
  • the processor 810 , the memory 820 , the input/output interface 830 and the communication interface 840 realize the communication connection among each other within the device through the bus 850 .
  • the processor 810 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided by the embodiments of this specification.
  • the memory 820 can be implemented in the form of a ROM (Read Only Memory, read-only memory), a RAM (Random Access Memory, random access memory), a static storage device, a dynamic storage device, and the like.
  • the memory 820 may store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, the relevant program codes are stored in the memory 820 and invoked by the processor 810 for execution.
  • the input/output interface 830 is used to connect the input/output module to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 840 is used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices.
  • the communication module may implement communication through wired means (eg, USB, network cable, etc.), or may implement communication through wireless means (eg, mobile network, WIFI, Bluetooth, etc.).
  • Bus 850 includes a path to transfer information between the various components of the device (eg, processor 810, memory 820, input/output interface 830, and communication interface 840).
  • the above-mentioned device only shows the processor 810, the memory 820, the input/output interface 830, the communication interface 840 and the bus 850, in the specific implementation process, the device may also include necessary components for normal operation. other components.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of the present specification, rather than all the components shown in the figures.
  • the electronic device in the foregoing embodiment is used to implement the corresponding video transcoding method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which will not be repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a video transcoding method, system and electronic device, including: decoding an input video to generate a first video frame sequence; pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence; and generating a transcoded video in sequence from the second video frame sequence and outputting the transcoded video.

Description

Video transcoding method, system and electronic device
This application claims priority to the Chinese patent application with filing date March 29, 2021, application number 202110336370.2, and invention title "Video transcoding method, system and electronic device".
TECHNICAL FIELD
The present disclosure relates to the technical field of image transcoding, and in particular to a video transcoding method, system and electronic device.
BACKGROUND
With the rapid development of Internet technology, users' demand for high-definition video grows daily, and their requirements for video definition become ever higher. Therefore, on the Internet, different definitions are provided for each video for users to choose from.
In the prior art, however, when a user switches the definition of a video, the system performs a large amount of data processing and may fetch the same piece of data multiple times, causing redundancy in data transmission, which is detrimental to the effective utilization of resources and to the data processing speed of the server.
SUMMARY
In view of this, the purpose of the present disclosure is to provide a video transcoding method, system and electronic device.
Based on the above purpose, the present disclosure provides a video transcoding method, including:
decoding an input video to generate a first video frame sequence;
pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
generating a transcoded video in sequence from the second video frame sequence, and outputting the transcoded video.
Based on the same concept, the present disclosure also provides a video transcoding system, including:
a decoding unit configured to decode an input video to generate a first video frame sequence;
a super-resolution enhancement unit configured to pre-process, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and to perform, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
an encoding output unit configured to generate a transcoded video in sequence from the second video frame sequence and output the transcoded video.
Based on the same concept, the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in any one of the above items when executing the program.
As can be seen from the above, the video transcoding method, system and electronic device provided by the present disclosure include: decoding an input video to generate a first video frame sequence; pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence; generating a transcoded video in sequence from the second video frame sequence and outputting the transcoded video. One or more embodiments of this specification combine a parallel computing framework with a deep learning inference framework to make full use of graphics processor resources. Moreover, no data transmission is needed between the inference engines, which share their data, reducing the redundancy caused by multiple data transmissions. A video frame is transmitted only once; all remaining operations are completed in the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving the effective utilization of resources and the data processing speed.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the drawings required in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a video transcoding method proposed by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a denoising network framework of a video transcoding method proposed by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a super-resolution network framework of a video transcoding method proposed by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a synchronization control flow of a video transcoding method proposed by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a video transcoding system framework in a specific application scenario proposed by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the data processing flow of an AI super-resolution enhancement unit in a specific application scenario proposed by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a video transcoding system proposed by an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device proposed by an embodiment of the present disclosure.
DETAILED DESCRIPTION
To make the purpose, technical solutions and advantages of this specification clearer, this specification is further described in detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure shall have the ordinary meaning understood by a person with ordinary skill in the field to which the present disclosure belongs. "First", "second" and similar words used in the embodiments of the present disclosure do not denote any order, quantity or importance, but are merely used to distinguish different components. Words such as "include" or "comprise" mean that the element, item or method step preceding the word covers the elements, items or method steps listed after the word and their equivalents, without excluding other elements, items or method steps. Words such as "connect" or "connected" are not limited to physical or mechanical connections, but can include electrical connections, whether direct or indirect. "Up", "down", "left", "right" and the like are only used to express relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.
As described in the background section, existing systems generally handle video definition transcoding on a central processing unit (CPU, Central Processing Unit). Because multiple image models are involved in the processing, data must be fetched many times, and the repeated fetching inevitably creates redundancy in the processing, thereby reducing the efficiency of video transcoding and wasting system resources.
In view of the above, an embodiment of the present disclosure proposes a video transcoding scheme that combines a parallel computing framework with a deep learning inference framework to make full use of graphics processing unit (GPU, Graphics Processing Unit) resources. Moreover, no data transmission is needed between the inference engines, which share their data, reducing the redundancy caused by multiple data transmissions. A video frame is transmitted only once; all remaining operations are completed in the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving the effective utilization of resources and the data processing speed.
Referring to FIG. 1, a schematic flowchart of a video transcoding method according to an embodiment of this specification, the method specifically includes the following steps:
Step 101: Decode an input video to generate a first video frame sequence.
This step aims to decode the video into individual video frames in preparation for the subsequent super-resolution of the video frames. For the input video, the video format can be wma, rm, avi, mod, and so on. The decoding method can be hardware decoding or software decoding, and the decoding tool can be FFmpeg (Fast Forward Mpeg), MPEG-4, DivX, and so on. The video frame sequence is the sequence obtained after decoding the video, in which the video frames are arranged in the order of their playback time nodes.
Optionally, in some application scenarios, to facilitate subsequent model processing, video sampling can be performed when generating the video frame sequence, for example automatically sampling the decoded video frames to 1080p. That is, decoding the input video to generate the first video frame sequence includes: decoding the input video through an audio and video processing program; and sampling the decoded video frames into a set video display format to generate the first video frame sequence. Of course, sampling can also be omitted entirely when generating the first video frame sequence, with the final video frame sequence generated as soon as decoding completes.
Optionally, to ensure high portability and encoding/decoding quality and to improve the encoding speed, the audio and video processing program is an FFMpeg (Fast Forward Mpeg) program.
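As a concrete illustration of this decoding step, the following is a minimal sketch of FFmpeg-based decoding with downsampling to 1080p, using the public libavformat/libavcodec/libswscale APIs. The patent does not publish its decoder source, so the program structure, the bilinear scaling flag and the omitted error handling are assumptions made for illustration.

```cpp
// Hedged sketch: decode a video and resample each frame to 1080p (assumed details).
// Build (assumed): g++ decode.cpp -lavformat -lavcodec -lavutil -lswscale
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
}

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    AVFormatContext* fmt = nullptr;
    if (avformat_open_input(&fmt, argv[1], nullptr, nullptr) < 0) return 1;
    avformat_find_stream_info(fmt, nullptr);

    const AVCodec* dec = nullptr;
    int vidx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, &dec, 0);
    AVCodecContext* ctx = avcodec_alloc_context3(dec);
    avcodec_parameters_to_context(ctx, fmt->streams[vidx]->codecpar);
    avcodec_open2(ctx, dec, nullptr);

    // Scaler that samples every decoded frame to the set display format (1080p here).
    SwsContext* sws = sws_getContext(ctx->width, ctx->height, ctx->pix_fmt,
                                     1920, 1080, AV_PIX_FMT_YUV420P,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    AVPacket* pkt = av_packet_alloc();
    AVFrame* frame = av_frame_alloc();
    AVFrame* out = av_frame_alloc();
    out->format = AV_PIX_FMT_YUV420P; out->width = 1920; out->height = 1080;
    av_frame_get_buffer(out, 0);

    while (av_read_frame(fmt, pkt) >= 0) {
        if (pkt->stream_index == vidx && avcodec_send_packet(ctx, pkt) == 0) {
            while (avcodec_receive_frame(ctx, frame) == 0) {
                sws_scale(sws, frame->data, frame->linesize, 0, ctx->height,
                          out->data, out->linesize);
                // "out" is now one element of the first video frame sequence,
                // ordered by playback time; hand it to the GPU distribution stage.
            }
        }
        av_packet_unref(pkt);
    }
    sws_freeContext(sws);
    avcodec_free_context(&ctx);
    avformat_close_input(&fmt);
    return 0;
}
```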
Step 102: Pre-process, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and perform, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence.
This step aims, when transcoding the video, to first obtain all the video frame data in one pass, and to make full use of processor resources by combining a parallel computing framework with a deep learning inference framework; because the deep learning inference framework shares its data, the redundancy caused by multiple data transmissions is reduced. Parallel computing refers to the process of using multiple computing resources simultaneously to solve a computational problem, and is an effective means of increasing the computing speed and processing capacity of a computer system. Its basic idea is to use multiple processors to solve the same problem collaboratively: the problem to be solved is decomposed into several parts, each of which is computed in parallel by an independent processor. A parallel computing system can be either a specially designed supercomputer containing multiple processors, or a cluster of several independent computers interconnected in some way; the data processing is completed by the parallel computing cluster, and the results are returned to the user. Common parallel computing frameworks include MPI, OpenMP, OpenCL (Open Computing Language), OpenGL (Open Graphics Library), CUDA (Compute Unified Device Architecture), and so on. In the early days of deep learning inference, every deep learning researcher had to write large amounts of repetitive code; to improve efficiency, researchers packaged this code into various frameworks and published them online for everyone to use: these are the deep learning inference frameworks. Common deep learning inference frameworks include TensorRT, OpenVINO, NCNN, MNN, and so on.
Pre-processing all the data of each frame with the parallel computing framework is a process of standardizing the data of each frame; it can be a process of normalizing and/or type-converting the data of each frame, for example converting uint8 data in the range 0-255 into float32 data in the range 0-1. To make maximal use of the graphics processor (GPU) so that pre-processing such as normalization and/or type conversion can be performed quickly and efficiently, the CUDA framework is optionally selected as the parallel computing framework. That is, the parallel computing framework is a CUDA framework, and the pre-processing includes: performing normalization and type conversion operations on the first video frame sequence through the CUDA framework.
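To make the pre-processing concrete, here is a small CUDA sketch of the normalization and type conversion just described (uint8 in 0-255 to float32 in 0-1). The kernel name, the launch geometry and the buffer management are illustrative assumptions, not the patent's actual code.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Pre-processing on the GPU: normalize uint8 pixel data in [0,255]
// to float32 in [0,1] (normalization + type conversion).
__global__ void preprocess_u8_to_f32(const uint8_t* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * (1.0f / 255.0f);
}

// Host wrapper: one host-to-device copy, after which the data stays on the GPU.
void preprocess_frame(const uint8_t* host_frame, float* dev_out, int n,
                      cudaStream_t stream) {
    uint8_t* dev_in = nullptr;
    cudaMalloc((void**)&dev_in, n);
    cudaMemcpyAsync(dev_in, host_frame, n, cudaMemcpyHostToDevice, stream);
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    preprocess_u8_to_f32<<<blocks, threads, 0, stream>>>(dev_in, dev_out, n);
    cudaStreamSynchronize(stream);
    cudaFree(dev_in);  // in the real pipeline the buffer would be reused, not freed per frame
}
```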
After the pre-processing, the deep learning inference framework is used to load different models for model inference calculation to complete the transcoding and super-resolution process, where the loaded models can be a denoising model, a super-resolution model, a detail enhancement model, and so on. Optionally, during transcoding, the denoising model calculation and the super-resolution model calculation must be performed in order: the denoising model calculation is completed first, followed by the super-resolution model calculation, while other models such as the detail enhancement model can be added or omitted depending on the specific situation. Optionally, to achieve real-time processing, a set number of further convolution and/or deconvolution operations can be performed on top of the existing denoising and super-resolution models. Then, to make maximal use of GPU resources and perform model calculation quickly while sharing GPU memory, the tensorRT framework is optionally selected as the deep learning inference framework. That is, the deep learning inference framework is a tensorRT framework, and performing the transcoding model calculation on the pre-processed first video frame sequence with the deep learning inference framework includes: calculating, through the tensorRT framework, the pre-processed first video frame sequence in the order of the denoising model, the detail enhancement model and/or the super-resolution model; and performing at least one convolution and/or deconvolution operation in the denoising model, the detail enhancement model and/or the super-resolution model to complete the transcoding model calculation.
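The point that the tensorRT inference engines share GPU memory, with only a data address handed from model 1 to model 2, can be sketched as follows. This assumes two pre-built engines and a TensorRT 7/8-era binding-array API with one input and one output binding per engine; the binding order and buffer sizing are assumptions.

```cpp
#include <NvInfer.h>
#include <cuda_runtime.h>

// Chain two engines (model 1: denoising, model 2: super-resolution)
// without copying data back to the CPU between them.
void run_transcode_models(nvinfer1::ICudaEngine* denoise,
                          nvinfer1::ICudaEngine* superres,
                          void* d_in, void* d_mid, void* d_out,
                          cudaStream_t stream) {
    nvinfer1::IExecutionContext* ctx1 = denoise->createExecutionContext();
    nvinfer1::IExecutionContext* ctx2 = superres->createExecutionContext();

    // Model 1 writes to d_mid; model 2 reads d_mid directly: only the device
    // address changes hands, and the data never leaves the GPU.
    void* bindings1[] = { d_in,  d_mid };
    void* bindings2[] = { d_mid, d_out };

    ctx1->enqueueV2(bindings1, stream, nullptr);  // denoising inference
    ctx2->enqueueV2(bindings2, stream, nullptr);  // super-resolution inference
    cudaStreamSynchronize(stream);

    ctx1->destroy();  // TensorRT 7 style; newer versions use `delete ctx1;`
    ctx2->destroy();
}
```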
In a specific application scenario, FIG. 2 shows a schematic diagram of a denoising network framework of a video transcoding method, and FIG. 3 shows a schematic diagram of a super-resolution network framework of a video transcoding method. The two network frameworks add convolution and/or deconvolution operations on the basis of the existing denoising network framework and super-resolution network framework, respectively. For example, under the denoising network framework, after the existing denoising network completes its data processing, the processed data undergoes the four convolutions shown in FIG. 2 before being output, completing the denoising processing. The Gaussian kernel of each convolution layer is 3×3, and the number of feature maps is 64. With the framework system of this specific embodiment, real-time HD-to-4K video super-resolution processing can be realized, covering the entire pipeline of video decoding, super-resolution enhancement and video encoding, with an overall processing time of 0.04 s per frame. If the complexity of the model is high, the number of GPUs can be increased accordingly to achieve real-time processing.
In a specific application scenario, in order to determine the number of GPU resources that can be used and to make maximal use of the GPU processing space, the pre-processing, with a parallel computing framework, of all the data of each frame obtained in sequence from the first video frame sequence includes: identifying the processing space of the currently available graphics processors, determining a distribution amount for the first video frame sequence according to the processing space, and distributing each frame of the first video frame sequence in sequence according to the distribution amount. That is, before the video frame sequence is distributed, the amount of currently usable GPU space is determined first, and the GPU space is then divided according to the number of processing models to be run afterwards. For example, if denoising and super-resolution model calculations are both required, all of the current GPU space is divided evenly into two parts, one for denoising and one for super-resolution; then, since a video frame must be denoised before it is super-resolved, the distribution amount of each batch of the video frame sequence is determined by the space allocated to the denoising model, and the video frames are distributed in sequence according to that amount. To realize the sequential distribution and sequential processing of the video frames, the data must be synchronously controlled. In a specific application scenario, semaphores can be used for synchronization control; FIG. 4 shows a schematic flowchart of such synchronization control, where sem_wait denotes waiting on the semaphore (if the semaphore is 0 the thread is suspended; if it is 1, the semaphore is decremented by one) and sem_post denotes releasing the semaphore (incrementing it by one). The specific steps are as follows: (1) wait until shared variable 1 is readable, i.e., wait for the previous thread to finish processing and assign its result to shared variable 1; (2) take the value from shared variable 1; (3) release shared variable 1 as writable, telling the previous thread that it can assign a value to shared variable 1; (4) process the data; (5) wait until shared variable 2 is writable, i.e., wait for the next thread to take the previous value of shared variable 2; (6) assign the value to shared variable 2; (7) release shared variable 2 as readable, telling the next thread that it can take the value of shared variable 2. In this way, the current thread can process frame n while the previous thread processes frame n-1 and the next thread processes frame n+1, so that processing proceeds as a pipeline and multi-core resources are used more efficiently; a sketch of this handshake follows.
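The semaphore handshake of steps (1) to (7) maps directly onto POSIX semaphores. Below is a minimal sketch of the middle thread of such a pipeline, with the GPU-space query placed at the top; the float type of the shared variables and the placeholder processing are assumptions made for illustration.

```cpp
#include <semaphore.h>
#include <pthread.h>
#include <cuda_runtime.h>
#include <cstdio>

float shared1, shared2;          // "shared variable 1" and "shared variable 2"
sem_t s1_readable, s1_writable;  // guard shared1
sem_t s2_readable, s2_writable;  // guard shared2

void* middle_thread(void*) {
    for (;;) {
        sem_wait(&s1_readable);  // (1) wait until shared variable 1 is readable
        float v = shared1;       // (2) take the value from shared variable 1
        sem_post(&s1_writable);  // (3) tell the previous thread it may write again
        v = v * 2.0f;            // (4) process the data (placeholder work)
        sem_wait(&s2_writable);  // (5) wait until shared variable 2 is writable
        shared2 = v;             // (6) assign the value to shared variable 2
        sem_post(&s2_readable);  // (7) tell the next thread it may read
    }
    return nullptr;
}

int main() {
    // Query the currently usable GPU space before deciding the distribution amount.
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("GPU free: %zu of %zu bytes\n", free_bytes, total_bytes);

    sem_init(&s1_readable, 0, 0); sem_init(&s1_writable, 0, 1);
    sem_init(&s2_readable, 0, 0); sem_init(&s2_writable, 0, 1);
    pthread_t t;
    pthread_create(&t, nullptr, middle_thread, nullptr);
    // A producer would post s1_readable after writing shared1, and a consumer
    // would wait on s2_readable; with one such thread per pipeline stage, the
    // current thread processes frame n while its neighbours process n-1 and n+1.
    pthread_join(t, nullptr);
    return 0;
}
```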
Step 103: Generate a transcoded video in sequence from the second video frame sequence, and output the transcoded video.
This step aims to arrange the super-resolved video frame sequence in order and re-encode it into a super-resolved video for output, so that the user can view the super-resolution transcoded video or reprocess it. Generating the transcoded video in sequence from the video frame sequence can mean first collecting the transcoded video frames, arranging them in order, and then forwarding them together to the video encoding software for video encoding to generate the transcoded video; or handing each video frame directly to the video encoding software as soon as its transcoding is completed, since the video transcoding itself also proceeds in order, so the encoder only needs to encode in the order of reception to generate the transcoded video. The sorting process in this step can be similar to the synchronization control method in the specific application scenario of step 102. For the encoding program, the same audio and video processing program as in step 101 can be used; meanwhile, to reduce the number of data transmissions between transcoding and encoding and thereby reduce transmission redundancy, generating the transcoded video in sequence from the second video frame sequence includes: acquiring all of the second video frame sequence and re-encoding it in sequence with the audio and video processing program. That is, after all the video frames have been transcoded, they are arranged in order and sent together to the encoding program for encoding; a condensed sketch of such an encoder is given below.
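Under stated assumptions, the sketch below shows how an FFmpeg hardware encoder can be opened and fed frames in reception order. h264_nvenc is the encoder named in the specific application scenario described later; the bitrate and the other parameter values here are illustrative assumptions.

```cpp
extern "C" {
#include <libavcodec/avcodec.h>
}

// Open the NVENC H.264 encoder; because transcoding proceeds in order,
// frames can simply be encoded in the order they are received.
AVCodecContext* open_nvenc_encoder(int w, int h, int fps) {
    const AVCodec* enc = avcodec_find_encoder_by_name("h264_nvenc");
    if (!enc) return nullptr;  // NVENC unavailable on this machine
    AVCodecContext* c = avcodec_alloc_context3(enc);
    c->width = w;
    c->height = h;
    c->time_base = {1, fps};
    c->framerate = {fps, 1};
    c->pix_fmt = AV_PIX_FMT_YUV420P;
    c->bit_rate = 8000000;  // assumed bitrate
    if (avcodec_open2(c, enc, nullptr) < 0) { avcodec_free_context(&c); return nullptr; }
    return c;
}

void encode_frame(AVCodecContext* c, AVFrame* frame, AVPacket* pkt) {
    avcodec_send_frame(c, frame);  // passing frame == nullptr flushes the encoder
    while (avcodec_receive_packet(c, pkt) == 0) {
        // write pkt to the output container here (muxing omitted for brevity)
        av_packet_unref(pkt);
    }
}
```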
Finally, the transcoded video is output, so that it can be stored, displayed, used or reprocessed. Depending on the application scenario and implementation requirements, the specific output mode of the transcoded video can be chosen flexibly.
For example, for an application scenario in which the method of this embodiment is executed on a single device, the transcoded video can be output directly, in display mode, on the display component (a display, projector, etc.) of the current device, so that the operator of the current device can see the content of the transcoded video directly on the display component.
As another example, for an application scenario in which the method of this embodiment is executed on a system composed of multiple devices, the transcoded video can be sent, through any data communication method (wired connection, NFC, Bluetooth, wifi, cellular mobile network, etc.), to other preset devices in the system acting as receivers, so that the preset device that receives the transcoded video can perform subsequent processing on it. Optionally, the preset device can be a preset server; the server is generally set up in the cloud as a data processing and storage center, which can store and distribute the transcoded video. The receivers of the distribution are terminal devices, and the holders or operators of these terminal devices can be the current user, personnel involved in subsequent video processing, and so on.
As yet another example, for an application scenario in which the method of this embodiment is executed on a system composed of multiple devices, the transcoded video can be sent directly, through any data communication method, to a preset terminal device, which can be one or more of those listed in the preceding paragraphs.
In a specific application scenario, FIG. 5 is a schematic diagram of a video transcoding system framework. The framework mainly has five processing units: a video decoding unit, a GPU distribution unit, an AI (Artificial Intelligence) super-resolution enhancement unit, a GPU aggregation unit, and a video encoding unit. To make full use of CPU resources, these five parts adopt a multi-threaded parallel processing mechanism, with data synchronization between threads controlled by semaphores. Specifically: the video decoding unit (FFMpeg decoding can be used): the FFMpeg API decodes the video and automatically samples the decoded video frames to 1080p. The GPU distribution unit: automatically identifies the amount of GPU space and distributes the video frame sequence according to the number of GPUs, so that the subsequent AI super-resolution enhancement unit can process the distributed frame sequences. The AI super-resolution enhancement unit: this processing unit uses a combination of CUDA and tensorrt to make full use of GPU resources; no data transmission is needed between the tensorrt inference engines, which share GPU memory, reducing the redundancy caused by multiple data transmissions. Data is transmitted from the CPU side to the GPU side only once; the remaining operations are implemented on the GPU side, and the final processed data is transmitted from the GPU side to the CPU side for use by the encoding unit. FIG. 6 is a schematic diagram of the data processing flow of the AI super-resolution enhancement unit. The specific processing flow is as follows: the CPU-side data is copied to the GPU side in one pass; CUDA performs the data pre-processing, normalization and type conversion (uint8 data in the range 0-255 is converted into float32 data in the range 0-1); the tensorrt framework is called for model 1 inference, and the data address is handed to model 2; the tensorrt framework is then called for model 2 inference; CUDA performs the data post-processing, data clipping (clip) and data type conversion (float32 to uint8), where data clipping refers to truncating data outside the range 0-1, replacing values less than 0 with 0 and values greater than 1 with 1 (a CUDA sketch of this post-processing is given after this paragraph). Model 1 is generally the denoising model and model 2 is generally the super-resolution model; of course, other models can also be added in a predetermined order according to the actual situation, whether between the two models, or after or before them, and so on. The GPU aggregation unit: receives the transcoding results and passes them to the encoding unit in order, ensuring that no frame-sequence confusion occurs. The video encoding unit (FFMpeg hardware encoding can be used): the video encoding unit can support conventional encoding formats; the FFMpeg hardware encoding method (h264_nvenc, hevc_nvenc) is used here to improve the encoding speed.
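Under the same caveats as the earlier sketches, the post-processing step (clipping to the range 0-1 and converting float32 back to uint8 before the single device-to-host copy) could look like the following; the round-to-nearest conversion is an assumption.

```cpp
#include <cuda_runtime.h>
#include <cstdint>

// Post-processing on the GPU: clip float32 values to [0,1], then convert to uint8.
// Values below 0 become 0 and values above 1 become 1, as described above.
__global__ void postprocess_f32_to_u8(const float* in, uint8_t* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = fminf(fmaxf(in[i], 0.0f), 1.0f);  // clip
        out[i] = (uint8_t)(v * 255.0f + 0.5f);      // type conversion (rounding assumed)
    }
}

void postprocess_frame(const float* d_in, uint8_t* d_out, uint8_t* host_out,
                       int n, cudaStream_t stream) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    postprocess_f32_to_u8<<<blocks, threads, 0, stream>>>(d_in, d_out, n);
    // The single device-to-host copy, handing the finished frame to the encoding unit.
    cudaMemcpyAsync(host_out, d_out, n, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
}
```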
By applying the video transcoding method provided by one or more embodiments of this specification, which includes: decoding an input video to generate a first video frame sequence; pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence; and generating a transcoded video in sequence from the second video frame sequence and outputting the transcoded video, one or more embodiments of this specification combine a parallel computing framework with a deep learning inference framework to make full use of graphics processor resources. No data transmission is needed between the inference engines, which share their data, reducing the redundancy caused by multiple data transmissions. A video frame is transmitted only once; the remaining operations are completed in the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving the effective utilization of resources and the data processing speed.
It should be noted that the methods of the embodiments of the present disclosure can be executed by a single device, such as a computer or server. The methods of the embodiments of the present disclosure can also be applied in distributed scenarios, completed by multiple devices cooperating with each other. In such a distributed scenario, one of the multiple devices may perform only one or more steps of the method of the embodiment of the present disclosure, and the multiple devices interact with each other to complete the described method.
It should be noted that specific embodiments of the present disclosure have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the above embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
In a specific application scenario, a multi-threading mechanism is used to manage video encoding/decoding and GPU resource scheduling, making full use of CPU resources. In the AI super-resolution enhancement unit and in the data pre- and post-processing, the tensorrt framework and cuda are used for acceleration, and the CPU-GPU data transmission has been optimized to minimize redundant operations during processing. Taking into account the special data-precision requirements of the super-resolution model, and comparing the computing power of various graphics cards at different data precisions as well as their cost, real-time HD-to-4K super-resolution was finally realized on a single NVIDIA GeForce RTX™ 2080 Ti graphics card. Depending on the computational complexity of the specific model, the number of GPUs can be increased to achieve real-time processing of highly complex models; under this system framework, all GPUs can be automatically identified and used, and compared with a single card the processing speed increases nearly in proportion.
It should be noted that the embodiments of the present disclosure can be further described in the following ways:
In some embodiments, the parallel computing framework is a CUDA framework;
the pre-processing includes:
performing normalization and type conversion operations on the first video frame sequence through the CUDA framework.
In some embodiments, the deep learning inference framework is a tensorRT framework;
performing, with the deep learning inference framework, the transcoding model calculation on the pre-processed first video frame sequence includes:
calculating, through the tensorRT framework, the pre-processed first video frame sequence in the order of the denoising model, the detail enhancement model and/or the super-resolution model; and performing at least one convolution and/or deconvolution operation in the denoising model, the detail enhancement model and/or the super-resolution model to complete the transcoding model calculation.
In some embodiments, decoding the input video to generate the first video frame sequence includes:
decoding the input video through an audio and video processing program;
sampling the decoded video frames into a set video display format to generate the first video frame sequence.
In some embodiments, pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence includes:
identifying the processing space of the currently available graphics processors, determining a distribution amount for the first video frame sequence according to the processing space, and distributing each frame of the first video frame sequence in sequence according to the distribution amount.
In some embodiments, generating the transcoded video in sequence from the second video frame sequence includes:
acquiring all of the second video frame sequence and re-encoding it in sequence with the audio and video processing program.
In some embodiments, the audio and video processing program is an FFMpeg program.
Based on the same concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides a video transcoding system, as shown in FIG. 7, which specifically includes:
a decoding unit 701 configured to decode an input video to generate a first video frame sequence;
a super-resolution enhancement unit 702 configured to pre-process, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and to perform, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
an encoding output unit 703 configured to generate a transcoded video in sequence from the second video frame sequence and output the transcoded video.
For convenience of description, the above system is described with its functions divided into various modules. Of course, when implementing the embodiments of the present disclosure, the functions of the modules can be implemented in one or more pieces of software and/or hardware.
The system of the above embodiment is used to implement the corresponding video transcoding method of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
In an optional embodiment of this specification, to enable a "plug-and-play" mode for newly added processing units, the system provides an extension interface based on the polymorphism mechanism of assembly language, where "assembly language" here refers to programming languages such as C and C++. In specific application scenarios, the unit modules can all be written based on the polymorphism mechanism of C++; thus, when the system is written, the ports of units that may be added can be defined in advance using the polymorphism mechanism, defining the insertion position and operational logic of other processing units. When other processing units need to be added in a specific application scenario, the units can then be written according to the defined format and added directly into the system framework, making it easy to add further processing units (a minimal sketch of such an interface is given below). For example, an HDR (High-Dynamic Range) unit can be added directly after the super-resolution enhancement unit, performing HDR processing after the video frames have been super-resolved.
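A minimal sketch of such a polymorphism-based extension interface follows. The names (FrameProcessor, GpuFrame, HdrUnit) are hypothetical; the patent does not publish its interface definitions.

```cpp
#include <memory>
#include <vector>

// Assumed wrapper around a frame resident in GPU memory (hypothetical type).
struct GpuFrame {
    void* dev_data = nullptr;
    int width = 0, height = 0;
};

// The pre-defined "port": any future processing unit implements this interface
// and can be inserted into the pipeline without modifying existing code.
struct FrameProcessor {
    virtual ~FrameProcessor() = default;
    virtual void process(GpuFrame& frame) = 0;
};

// Example plug-in: an HDR unit added after the super-resolution enhancement unit.
struct HdrUnit : FrameProcessor {
    void process(GpuFrame& frame) override {
        (void)frame;  // HDR tone mapping of the super-resolved frame (details omitted)
    }
};

struct Pipeline {
    std::vector<std::unique_ptr<FrameProcessor>> units;
    void run(GpuFrame& frame) {
        for (auto& u : units) u->process(frame);  // units run in insertion order
    }
};

// Usage: pipeline.units.push_back(std::make_unique<HdrUnit>());
```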
It should be noted that the embodiments of the present disclosure can be further described in the following way:
In some embodiments, the system provides an extension interface based on the polymorphism mechanism of assembly language.
Based on the same concept, and corresponding to the method of any of the above embodiments, the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the video transcoding method of any of the above embodiments when executing the program.
FIG. 8 shows a more specific schematic diagram of the hardware structure of an electronic device provided by this embodiment. The device can include: a processor 810, a memory 820, an input/output interface 830, a communication interface 840 and a bus 850, where the processor 810, the memory 820, the input/output interface 830 and the communication interface 840 are communicatively connected to one another within the device through the bus 850.
The processor 810 can be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute relevant programs to implement the technical solutions provided by the embodiments of this specification.
The memory 820 can be implemented in the form of ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, and so on. The memory 820 can store an operating system and other application programs; when the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 820 and called and executed by the processor 810.
The input/output interface 830 is used to connect input/output modules to realize information input and output. The input/output modules can be configured in the device as components (not shown in the figure), or externally connected to the device to provide the corresponding functions. Input devices can include a keyboard, mouse, touch screen, microphone, various sensors, etc.; output devices can include a display, speaker, vibrator, indicator light, etc.
The communication interface 840 is used to connect a communication module (not shown in the figure) to realize communication interaction between this device and other devices. The communication module can communicate by wired means (e.g., USB, network cable) or by wireless means (e.g., mobile network, WIFI, Bluetooth).
The bus 850 includes a path that transmits information between the components of the device (e.g., the processor 810, the memory 820, the input/output interface 830 and the communication interface 840).
It should be noted that, although the above device shows only the processor 810, the memory 820, the input/output interface 830, the communication interface 840 and the bus 850, in specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may also contain only the components necessary to implement the solutions of the embodiments of this specification, rather than all the components shown in the figure.
The electronic device of the above embodiment is used to implement the corresponding video transcoding method of any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not repeated here.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is merely exemplary and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples; within the spirit of the present disclosure, the technical features of the above embodiments or of different embodiments can also be combined, the steps can be implemented in any order, and many other variations of the different aspects of the embodiments of the present disclosure as described above exist, which are not provided in detail for the sake of brevity.
In addition, to simplify the description and discussion, and so as not to obscure the embodiments of the present disclosure, well-known power/ground connections to integrated circuit (IC) chips and other components may or may not be shown in the provided drawings. Furthermore, devices may be shown in block diagram form to avoid obscuring the embodiments of the present disclosure, and this also takes into account the fact that the details of the implementation of these block diagram devices are highly dependent on the platform on which the embodiments of the present disclosure are to be implemented (i.e., these details should be fully within the understanding of those skilled in the art). Where specific details (e.g., circuits) have been set forth to describe exemplary embodiments of the present disclosure, it will be apparent to those skilled in the art that the embodiments of the present disclosure can be practiced without these specific details or with variations of these details. Accordingly, these descriptions should be regarded as illustrative rather than restrictive.
Although the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the present disclosure are intended to cover all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement, improvement, etc. made within the spirit and principles of the embodiments of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (10)

  1. A video transcoding method, comprising:
    decoding an input video to generate a first video frame sequence;
    pre-processing, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and performing, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
    generating a transcoded video in sequence from the second video frame sequence, and outputting the transcoded video.
  2. The method according to claim 1, wherein the parallel computing framework is a CUDA framework;
    the pre-processing comprises:
    performing normalization and type conversion operations on the first video frame sequence through the CUDA framework.
  3. The method according to claim 1, wherein the deep learning inference framework is a tensorRT framework;
    performing, with the deep learning inference framework, the transcoding model calculation on the pre-processed first video frame sequence comprises:
    calculating, through the tensorRT framework, the pre-processed first video frame sequence in the order of a denoising model, a detail enhancement model and/or a super-resolution model; and performing at least one convolution and/or deconvolution operation in the denoising model, the detail enhancement model and/or the super-resolution model to complete the transcoding model calculation.
  4. The method according to claim 1, wherein decoding the input video to generate the first video frame sequence comprises:
    decoding the input video through an audio and video processing program;
    sampling the decoded video frames into a set video display format to generate the first video frame sequence.
  5. The method according to claim 4, wherein pre-processing, with the parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence comprises:
    identifying the processing space of the currently available graphics processors, determining a distribution amount for the first video frame sequence according to the processing space, and distributing each frame of the first video frame sequence in sequence according to the distribution amount.
  6. The method according to claim 4, wherein generating the transcoded video in sequence from the second video frame sequence comprises:
    acquiring all of the second video frame sequence, and re-encoding it in sequence with the audio and video processing program.
  7. The method according to claim 4, wherein the audio and video processing program is an FFMpeg program.
  8. A video transcoding system, comprising:
    a decoding unit configured to decode an input video to generate a first video frame sequence;
    a super-resolution enhancement unit configured to pre-process, with a parallel computing framework, all the data of each frame obtained in sequence from the first video frame sequence, and to perform, with a deep learning inference framework, transcoding model calculation on the pre-processed first video frame sequence to generate a transcoded second video frame sequence;
    an encoding output unit configured to generate a transcoded video in sequence from the second video frame sequence and output the transcoded video.
  9. The system according to claim 8, wherein the system provides an extension interface based on the polymorphism mechanism of assembly language.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the program.
PCT/CN2022/084838 2021-03-29 2022-04-01 Video transcoding method, system and electronic device WO2022206960A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110336370.2 2021-03-29
CN202110336370.2A CN113038279B (zh) 2021-03-29 2021-03-29 Video transcoding method, system and electronic device

Publications (1)

Publication Number Publication Date
WO2022206960A1 true WO2022206960A1 (zh) 2022-10-06

Family

ID=76452781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/084838 WO2022206960A1 (zh) 2021-03-29 2022-04-01 视频转码方法、系统及电子设备

Country Status (2)

Country Link
CN (1) CN113038279B (zh)
WO (1) WO2022206960A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113038279B (zh) 2021-03-29 2023-04-18 京东方科技集团股份有限公司 Video transcoding method, system and electronic device
CN114501141B (zh) * 2022-01-04 2024-02-02 杭州网易智企科技有限公司 Video data processing method, apparatus, device and medium
CN114449295A (zh) * 2022-01-30 2022-05-06 京东方科技集团股份有限公司 Video processing method and apparatus, electronic device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090125538A1 (en) * 2007-11-13 2009-05-14 Elemental Technologies, Inc. Video encoding and decoding using parallel processors
CN103596012A (zh) * 2013-11-14 2014-02-19 山东电子职业技术学院 Real-time AVS-based inter-frame macroblock type selection method for video frame-rate transcoding
WO2018051330A1 (en) * 2016-09-14 2018-03-22 Beamr Imaging Ltd. Method of pre-processing of video information for optimized video encoding
CN108920274A (zh) * 2018-06-21 2018-11-30 北京陌上花科技有限公司 Performance optimization and apparatus for an image processing server side
CN110418144A (zh) * 2019-08-28 2019-11-05 成都索贝数码科技股份有限公司 Method for one-input multi-output transcoding of multi-bitrate video files based on NVIDIA GPU
CN111726633A (zh) * 2020-05-11 2020-09-29 河南大学 Compressed video stream re-encoding method based on deep learning and saliency perception
CN113038279A (zh) * 2021-03-29 2021-06-25 京东方科技集团股份有限公司 Video transcoding method, system and electronic device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101098483A (zh) * 2007-07-19 2008-01-02 上海交通大学 Video cluster transcoding system using a group-of-pictures structure as the parallel processing unit
US9338467B1 (en) * 2010-07-19 2016-05-10 Google Inc. Parallel video transcoding
HK1205426A2 (zh) * 2015-09-24 2015-12-11 Tfi Digital Media Ltd Distributed video encoding method
US10798393B2 (en) * 2018-07-09 2020-10-06 Hulu, LLC Two pass chunk parallel transcoding process
US20210067952A1 (en) * 2019-09-03 2021-03-04 Nvidia Corporation Performing scrambling and/or descrambling on parallel computing architectures
CN110992260B (zh) * 2019-10-15 2022-04-22 网宿科技股份有限公司 Method and apparatus for video super-resolution reconstruction

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090125538A1 (en) * 2007-11-13 2009-05-14 Elemental Technologies, Inc. Video encoding and decoding using parallel processors
CN103596012A (zh) * 2013-11-14 2014-02-19 山东电子职业技术学院 Real-time AVS-based inter-frame macroblock type selection method for video frame-rate transcoding
WO2018051330A1 (en) * 2016-09-14 2018-03-22 Beamr Imaging Ltd. Method of pre-processing of video information for optimized video encoding
CN108920274A (zh) * 2018-06-21 2018-11-30 北京陌上花科技有限公司 Performance optimization and apparatus for an image processing server side
CN110418144A (zh) * 2019-08-28 2019-11-05 成都索贝数码科技股份有限公司 Method for one-input multi-output transcoding of multi-bitrate video files based on NVIDIA GPU
CN111726633A (zh) * 2020-05-11 2020-09-29 河南大学 Compressed video stream re-encoding method based on deep learning and saliency perception
CN113038279A (zh) * 2021-03-29 2021-06-25 京东方科技集团股份有限公司 Video transcoding method, system and electronic device

Also Published As

Publication number Publication date
CN113038279B (zh) 2023-04-18
CN113038279A (zh) 2021-06-25

Similar Documents

Publication Publication Date Title
WO2022206960A1 (zh) Video transcoding method, system and electronic device
US11977388B2 (en) Quantizing autoencoders in a neural network
US11263525B2 (en) Progressive modification of neural networks
CN111258744A (zh) 一种基于异构计算的任务处理方法及软硬件框架系统
US10504275B2 (en) Methods and apparatus for more efficient ray tracing of instanced geometry
US9576340B2 (en) Render-assisted compression for remote graphics
US10445043B2 (en) Graphics engine and environment for efficient real time rendering of graphics that are not pre-known
US11159790B2 (en) Methods, apparatuses, and systems for transcoding a video
US8928680B1 (en) Method and system for sharing a buffer between a graphics processing unit and a media encoder
US11082720B2 (en) Using residual video data resulting from a compression of original video data to improve a decompression of the original video data
TWI725024B (zh) 促進高效圖形命令產生和執行的設備、方法及非暫態機器可讀取媒體
US11550632B2 (en) Facilitating efficient communication and data processing across clusters of computing machines in heterogeneous computing environment
US20170178594A1 (en) Method and apparatus for color buffer compression
CN110650347A (zh) Multimedia data processing method and apparatus
CN103888771A (zh) Parallel video image processing method based on GPGPU technology
US11954830B2 (en) High dynamic range support for legacy applications
US20160283825A1 (en) Clustered Palette Compression
Kalva et al. Parallel programming for multimedia applications
US20170213314A1 (en) Smart optimization of unused graphics buffer memory in computing environments
CN114529443A (zh) Adaptive sampling at a target sampling rate
WO2023124428A1 (zh) Chip, accelerator card, electronic device, and data processing method
CN114237916B (zh) Data processing method and related device
JP2023157833A (ja) System and method for optimizing graphics processing for machine learning inference
CN103891272B (zh) Multiple stream processing for video analytics and encoding
CN114385867A (zh) Device, method and computer program product for processing multi-dimensional data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22779140

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19-02-2024)