CN113038279B - Video transcoding method and system and electronic device - Google Patents
Video transcoding method and system and electronic device
- Publication number: CN113038279B
- Application number: CN202110336370.2A
- Authority: CN (China)
- Legal status: Active
Classifications
- H04N21/4402 — Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
Abstract
The present disclosure provides a video transcoding method, system and electronic device, including: decoding an input video to generate a first video frame sequence; preprocessing all of the data of each frame in the first video frame sequence, obtained in sequence, using a parallel computing framework, and performing transcoding model computation on the preprocessed first video frame sequence using a deep learning inference framework to generate a transcoded second video frame sequence; and sequentially generating the transcoded video from the second video frame sequence and outputting it. One or more embodiments of the present description make full use of graphics processor resources by combining a parallel computing framework with a deep learning inference framework. No data transmission is needed between the inference engines: data is shared, reducing the redundancy caused by transmitting data multiple times. Each video frame is transmitted only once, and all other operations are completed on the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving effective resource utilization and data processing speed.
Description
Technical Field
The present disclosure relates to the field of image transcoding technologies, and in particular, to a video transcoding method, system and electronic device.
Background
With the rapid development of internet technology, user demand for high-definition video is growing day by day and expectations for video definition keep rising. Internet videos are therefore offered at several definitions for the user to choose from.
However, in the prior art, when a user switches the definition of a video, the system performs a large amount of data processing and may call the same data multiple times, causing redundant data transmission that harms both the server's effective resource utilization and its data processing speed.
Disclosure of Invention
In view of the above, the present disclosure is directed to a video transcoding method, system and electronic device.
Based on the above purpose, the present disclosure provides a video transcoding method, including:
decoding an input video to generate a first video frame sequence;
preprocessing all data of each frame in the first video frame sequence, obtained in sequence, using a parallel computing framework, and performing transcoding model computation on the preprocessed first video frame sequence using a deep learning inference framework to generate a transcoded second video frame sequence;
and generating a transcoded video according to the second video frame sequence in sequence, and outputting the transcoded video.
Based on the same concept, the present disclosure also provides a video transcoding system, comprising:
a decoding unit for decoding an input video to generate a first video frame sequence;
the super-resolution enhancement unit is used for preprocessing all data of each frame in the first video frame sequence, acquired in sequence, using a parallel computing framework, and performing transcoding model computation on the preprocessed first video frame sequence using a deep learning inference framework to generate a transcoded second video frame sequence;
and the encoding output unit is used for generating the transcoded video in sequence according to the second video frame sequence and outputting the transcoded video.
Based on the same concept, the present disclosure also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method as described in any one of the above when executing the program.
As can be seen from the foregoing, the present disclosure provides a video transcoding method, system and electronic device, including: decoding an input video to generate a first video frame sequence; preprocessing all data of each frame in the first video frame sequence, obtained in sequence, using a parallel computing framework, and performing transcoding model computation on the preprocessed first video frame sequence using a deep learning inference framework to generate a transcoded second video frame sequence; and sequentially generating the transcoded video from the second video frame sequence and outputting it. One or more embodiments of the present description make full use of graphics processor resources by combining a parallel computing framework with a deep learning inference framework. No data transmission is needed between the inference engines: data is shared, reducing the redundancy caused by transmitting data multiple times. Each video frame is transmitted only once, and all other operations are completed on the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving effective resource utilization and data processing speed.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions in the related art more clearly, the drawings used in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description show only embodiments of the present disclosure, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a video transcoding method according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a denoising network framework of a video transcoding method according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a super-resolution network framework of a video transcoding method according to an embodiment of the present disclosure;
fig. 4 is a schematic diagram illustrating a synchronization control flow of a video transcoding method according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a video transcoding system framework in a specific application scenario proposed in the embodiment of the present disclosure;
fig. 6 is a schematic data processing flow diagram of an AI super-resolution enhancement unit in a specific application scenario according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a video transcoding system according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the present specification more apparent, the present specification is further described in detail below with reference to the accompanying drawings in combination with specific embodiments.
It is to be noted that, unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure have the ordinary meaning understood by those skilled in the art to which the present disclosure belongs. The use of "first," "second," and similar terms in the embodiments of the disclosure does not indicate any order, quantity, or importance, but merely distinguishes one element from another. The word "comprising", "comprises", or the like means that the element or article preceding the word covers the elements, articles, or method steps listed after the word, without excluding other elements, articles, or method steps. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like merely indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
As described in the background section, a Central Processing Unit (CPU) is generally used to transcode the definition of a video. Since multiple image models are involved in the transcoding, data must be extracted many times, and this repeated extraction inevitably introduces redundancy into the transcoding process, reducing transcoding efficiency and wasting system resources.
In view of the above, the embodiment of the present disclosure provides a video transcoding scheme that makes full use of Graphics Processing Unit (GPU) resources by combining a parallel computing framework with a deep learning inference framework. No data transmission is needed between the inference engines: data is shared, reducing the redundancy caused by transmitting data multiple times. Each video frame is transmitted only once, and all other operations are completed on the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving effective resource utilization and data processing speed.
Referring to fig. 1, a schematic flow diagram of a video transcoding method according to an embodiment of the present disclosure is shown, which specifically includes the following steps:
Step 101: decoding an input video to generate a first video frame sequence.
This step decodes the video into video frames in preparation for the subsequent super-resolution of those frames. The input video may be in a format such as wma, rm, avi, or mod. Decoding may be done in hardware or software, with tools such as FFmpeg (Fast Forward MPEG), MPEG-4, or DivX. The video frame sequence is the sequence in which, after decoding, the video frames are arranged according to their playback time.
Optionally, in some application scenarios, to facilitate the subsequent model processing, the video may be sampled when generating the video frame sequence, for example by automatically sampling each decoded video frame to 1080p. That is, decoding the input video to generate a first video frame sequence includes: decoding the input video with an audio/video processing program; and sampling the decoded video frames into a set video display format to generate the first video frame sequence. Of course, the first video frame sequence may also be generated without any sampling, the final video frame sequence being produced directly after decoding completes.
Optionally, to ensure high portability and encoding/decoding quality while improving encoding speed, the audio/video processing program is FFmpeg (Fast Forward MPEG).
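The decode-and-sample step above can be sketched as an FFmpeg invocation. The patent does not specify the exact command, so the flags below are an assumption using standard FFmpeg options (`scale` video filter, raw RGB output for a downstream GPU stage):

```python
# Hypothetical sketch: building an FFmpeg command that decodes an input
# video and scales every frame to 1080p raw frames, as the decoding unit
# described above might do. The flag values are standard FFmpeg options,
# but the patent does not specify the exact invocation.
def build_decode_command(input_path: str, height: int = 1080) -> list:
    return [
        "ffmpeg",
        "-i", input_path,            # input video (wma/rm/avi/mod, etc.)
        "-vf", f"scale=-2:{height}", # sample each decoded frame to 1080p
        "-f", "rawvideo",            # emit raw frames for the GPU stage
        "-pix_fmt", "rgb24",
        "pipe:1",                    # stream frames to stdout
    ]

cmd = build_decode_command("input.avi")
```

In practice this command would be run via a subprocess (or the FFmpeg API called directly, as the embodiment in fig. 5 does) and the raw frames read from its stdout.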
Step 102: preprocessing all data of each frame in the first video frame sequence, obtained in sequence, using a parallel computing framework, and performing transcoding model computation on the preprocessed first video frame sequence using a deep learning inference framework to generate a transcoded second video frame sequence.
The goal of this step is to acquire all video frame data at once during transcoding, to make full use of processor resources by combining a parallel computing framework with a deep learning inference framework, and to reduce the redundancy caused by multiple data transfers, since the deep learning inference framework shares data. Parallel computing refers to solving a computing problem with multiple computing resources simultaneously, and is an effective means of improving the computing speed and processing capacity of a computer system. The basic idea is to solve the same problem with multiple processors: the problem is decomposed into several parts, each computed in parallel by an independent processor. A parallel computing system may be a specially designed supercomputer with multiple processors, or a cluster of separate computers interconnected in some fashion; data processing is completed on the parallel computing cluster and the result is returned to the user. Common parallel computing frameworks include MPI, OpenMP, OpenCL (Open Computing Language), OpenGL (Open Graphics Library), and CUDA (Compute Unified Device Architecture). In the early days of deep learning inference, every researcher had to write large amounts of repetitive code. To improve efficiency, researchers packaged this code into frameworks and published them for common use — the deep learning inference frameworks. Common deep learning inference frameworks include TensorRT, OpenVINO, NCNN, and MNN.
The preprocessing of all data of each frame by the parallel computing framework normalizes each frame's data, which may involve normalization and/or type conversion — for example, converting uint8 data in the range 0-255 into float32 data in the range 0-1. To use the Graphics Processing Unit (GPU) to the fullest and carry out preprocessing such as normalization and/or type conversion quickly and efficiently, the CUDA framework may be chosen among the parallel computing frameworks. That is, the parallel computing framework is the CUDA framework, and the preprocessing includes performing normalization and type conversion operations on the first video frame sequence through the CUDA framework.
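The normalization arithmetic itself is simple; in the patent it runs as a CUDA kernel on the GPU, and the plain-Python sketch below only illustrates the uint8-to-float mapping:

```python
# Minimal sketch of the preprocessing step: converting uint8 pixel values
# in [0, 255] to float32-style values in [0, 1]. In the patent this runs
# as a CUDA kernel on the GPU; plain Python is used here only to show the
# arithmetic.
def normalize_frame(pixels):
    """Map each 0-255 integer pixel to a 0-1 float."""
    return [p / 255.0 for p in pixels]

frame = [0, 128, 255]
normalized = normalize_frame(frame)  # [0.0, 0.50196..., 1.0]
```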
After preprocessing, different models are loaded with the deep learning inference framework for model inference, completing the transcoding and super-resolution process. The loaded models may include a denoising model, a super-resolution model, a detail enhancement model, and so on. Optionally, during transcoding, the denoising model computation and the super-resolution model computation must run in order — denoising first, super-resolution second — while other models, such as the detail enhancement model, may be added or omitted as the situation requires. Optionally, to achieve real-time processing, the denoising and super-resolution models may additionally perform a set number of convolution and/or deconvolution operations on top of the two existing models. To maximize GPU utilization while sharing GPU video memory and computing the models quickly, the TensorRT framework may optionally be chosen among the deep learning inference frameworks. That is, the deep learning inference framework is the TensorRT framework, and the transcoding model computation on the preprocessed first video frame sequence includes: computing the preprocessed first video frame sequence through the TensorRT framework according to the denoising model, the detail enhancement model, and/or the super-resolution model in sequence; and performing at least one convolution and/or deconvolution operation within the denoising model, the detail enhancement model, and/or the super-resolution model to complete the transcoding model computation.
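The fixed model order — denoising first, super-resolution second, each engine handing its output buffer directly to the next without a host round-trip — can be sketched as follows. The `Engine` class and the lambda operations are hypothetical stand-ins; in the patent these would be TensorRT inference engines sharing GPU video memory:

```python
# Illustrative sketch of the fixed model order: denoising first, then
# super-resolution, each engine handing its output buffer directly to the
# next (no host round-trip). The Engine class is hypothetical; in the
# patent these would be TensorRT inference engines sharing GPU memory.
class Engine:
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def infer(self, buffer):
        return self.fn(buffer)

def run_pipeline(buffer, engines):
    for engine in engines:          # strict order: denoise -> super-res
        buffer = engine.infer(buffer)
    return buffer

pipeline = [
    Engine("denoise", lambda b: [x - 1 for x in b]),  # stand-in operation
    Engine("super_resolution", lambda b: b + b),      # stand-in upscale
]
result = run_pipeline([3, 5], pipeline)  # [2, 4, 2, 4]
```

A detail enhancement model would slot into the `pipeline` list at the desired position without changing `run_pipeline`.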
In a specific application scenario, fig. 2 shows a schematic diagram of a denoising network framework of the video transcoding method, and fig. 3 shows a schematic diagram of a super-resolution network framework. For example, in the denoising network framework, after data processing is completed by the existing denoising network, the processed data is convolved 4 times as shown in fig. 2 and then output, completing the denoising. Each convolution layer uses a 3×3 Gaussian kernel with 64 feature maps. With the framework of this embodiment, real-time 4K video super-resolution can be achieved, covering the whole pipeline of video decoding, super-resolution enhancement, and video encoding, with a total processing time of 0.04 s per frame. If the model is more complex, the number of GPUs can be increased accordingly to maintain real-time processing.
In one specific application scenario, GPU processing space is used maximally by first determining how many GPU resources are available. Preprocessing all data of each frame in the first video frame sequence with the parallel computing framework then includes: identifying the processing space of the currently available graphics processors, determining a distribution amount for the first video frame sequence according to that space, and distributing each frame of the first video frame sequence in turn according to the distribution amount. That is, before the video frame sequence is distributed, the amount of currently available GPU space is determined and divided according to the processing models to be run — for example, the current GPU space is split evenly into two parts, one for denoising and one for super-resolution; the distribution amount of each video frame sequence is then determined from the space assigned to the denoising model, the frames are distributed in turn according to that amount, and each frame is denoised first and then super-resolved. To distribute and process the video frames in order, the data must be synchronized. In a specific application scenario, a semaphore may be used for synchronization, as shown in fig. 4, a schematic flow diagram of the synchronization control. Here sem_wait suspends the caller if the semaphore is 0 and decrements it if it is 1, and sem_post releases the semaphore, incrementing it by 1.
The specific steps are: (1) wait until shared variable 1 is readable, i.e., wait for the previous thread to finish processing and assign its result to shared variable 1; (2) take the value from shared variable 1; (3) release shared variable 1 as writable, telling the previous thread it may assign to shared variable 1; (4) process the data; (5) wait until shared variable 2 is writable, i.e., wait for the next thread to take the previous value of shared variable 2; (6) assign the result to shared variable 2; (7) release shared variable 2 as readable, telling the next thread it may take the value of shared variable 2. In this way the current thread processes frame n while the previous thread processes frame n-1 and the next thread processes frame n+1, so the frames are handled in pipeline fashion and multi-core resources are used more efficiently.
Step 103: sequentially generating a transcoded video from the second video frame sequence, and outputting the transcoded video.
The aim of this step is to re-encode the super-resolved video frame sequence, in order, into the super-resolved video and output it, so that the user can watch the transcoded video or process it further. Generating the transcoded video in sequence from the video frame sequence may proceed in either of two ways: the transcoded frames are first collected and arranged in order, then forwarded together to video encoding software for encoding; or each video frame is sent directly to the encoding software as soon as its transcoding finishes — since video encoding proceeds in order during transcoding, encoding in order of receipt produces the transcoded video. The ordering here may be handled much like the synchronization control in the specific application scenario of step 102. The encoding program may be the same audio/video processing program as in step 101, which also reduces the number of data transfers between transcoding and encoding and thus the transmission redundancy. Sequentially generating the transcoded video from the second video frame sequence includes: acquiring the whole second video frame sequence and re-encoding it in order with the audio/video processing program — that is, once all video frames are transcoded, they are arranged in order and sent together to the encoding program.
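When frames finish transcoding out of order across GPU workers, a small reorder buffer can release them to the encoder strictly by frame index. The heap-based mechanism below is an assumption about how the "sequential arrangement" could be implemented; the patent only requires that encoding receive frames in order:

```python
# Hedged sketch of the assembly step: transcoded frames may finish out of
# order across GPU workers, so a reorder buffer releases them to the
# encoder strictly by frame index. This heap-based mechanism is an
# assumption; the patent only requires in-order delivery to the encoder.
import heapq

def reorder(arrivals):
    """Yield (index, frame) pairs in index order as frames arrive."""
    heap, next_index = [], 0
    for index, frame in arrivals:
        heapq.heappush(heap, (index, frame))
        # Release every frame that is now contiguous with the output.
        while heap and heap[0][0] == next_index:
            yield heapq.heappop(heap)
            next_index += 1

arrivals = [(1, "b"), (0, "a"), (3, "d"), (2, "c")]
ordered = list(reorder(arrivals))  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
```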
Finally, the transcoded video is output for storage, display, use, or further processing. The specific output mode can be chosen flexibly according to the application scenario and implementation requirements.
For example, for an application scenario in which the method of the present embodiment is executed on a single device, the transcoded video may be directly output in a display manner on a display component (display, projector, etc.) of the current device, so that an operator of the current device can directly see the content of the transcoded video from the display component.
For another example, when the method of this embodiment is executed on a system composed of multiple devices, the transcoded video may be sent, through any data communication means (e.g., wired connection, NFC, Bluetooth, Wi-Fi, cellular mobile network), to other preset devices in the system acting as receivers, so that those devices can carry out subsequent processing. Optionally, the preset device may be a preset server; the server, generally deployed in the cloud as a data processing and storage center, can store and distribute the transcoded video. The recipients of the distribution are terminal devices, whose holders or operators may be the current users, people involved in the subsequent processing of the video, and the like.
For another example, when executed on a system composed of multiple devices, the method of this embodiment may directly send the transcoded video, by any data communication means, to a preset terminal device, which may be any of the devices described in the preceding paragraph.
In a specific application scenario, fig. 5 shows a schematic diagram of a video transcoding system framework. The framework has five main processing units: a video decoding unit, a GPU distribution unit, an AI (Artificial Intelligence) super-resolution enhancement unit, a GPU assembly unit, and a video encoding unit. To make full use of CPU resources, the five parts run under a multithreaded parallel processing mechanism, with data synchronization between threads controlled by semaphores. Specifically, the video decoding unit (which can use FFmpeg decoding) decodes the video through the FFmpeg API and automatically samples the decoded frames to 1080p. The GPU distribution unit automatically identifies the number of available GPUs and distributes the video frame sequence accordingly, so that the subsequent AI super-resolution enhancement unit can process the distributed frame sequences. The AI super-resolution enhancement unit combines CUDA with TensorRT to make full use of GPU resources: no data transmission is needed between the TensorRT inference engines, which share GPU video memory, reducing the redundancy caused by multiple transfers. Data is transmitted from the CPU to the GPU only once, the remaining operations run on the GPU, and the final processed data is transmitted back from the GPU to the CPU for the encoding unit. Fig. 6 shows a schematic diagram of the data processing flow of the AI super-resolution enhancement unit.
The specific processing is as follows: copy the CPU-side data to the GPU once; perform CUDA data preprocessing — normalization and type conversion, turning uint8 data in the range 0-255 into float32 data in the range 0-1; call the TensorRT framework to run model 1 inference and pass the data address to model 2; then call the TensorRT framework to run model 2 inference; and post-process the data with CUDA — data clipping (clip), which truncates data outside the range 0-1 (values below 0 are replaced with 0 and values above 1 with 1), and data type conversion (float32 back to uint8). Model 1 is typically a denoising model and model 2 a super-resolution model, but other models may be inserted before, between, or after them in a predetermined order as the situation requires. The GPU assembly unit receives the transcoding results and passes them to the encoding unit in order, guaranteeing that the frame sequence never becomes disordered. The video encoding unit (which can use FFmpeg hardware encoding) supports conventional encoding formats and adopts FFmpeg hardware encoding (h264_nvenc, hevc_nvenc) to improve encoding speed.
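The post-processing arithmetic — clip to [0, 1], then convert back to 0-255 integers — can be sketched as follows; plain Python stands in for the CUDA kernel:

```python
# Minimal sketch of the CUDA post-processing arithmetic: clip values to
# [0, 1] (values below 0 become 0, above 1 become 1), then convert back
# to 0-255 uint8-style integers. Plain Python stands in for the GPU
# kernel here.
def postprocess(values):
    out = []
    for v in values:
        v = min(max(v, 0.0), 1.0)   # clip: <0 -> 0, >1 -> 1
        out.append(int(round(v * 255)))
    return out

clipped = postprocess([-0.2, 0.5, 1.3])  # [0, 128, 255]
```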
A video transcoding method provided by one or more embodiments of the present specification includes: decoding an input video to generate a first video frame sequence; preprocessing all data of each frame in the first video frame sequence, obtained in sequence, using a parallel computing framework, and performing transcoding model computation on the preprocessed first video frame sequence using a deep learning inference framework to generate a transcoded second video frame sequence; and sequentially generating the transcoded video from the second video frame sequence and outputting it. One or more embodiments of the present description make full use of graphics processor resources by combining a parallel computing framework with a deep learning inference framework. No data transmission is needed between the inference engines: data is shared, reducing the redundancy caused by transmitting data multiple times. Each video frame is transmitted only once, and all other operations are completed on the graphics processor by the parallel computing framework and the deep learning inference framework, greatly improving effective resource utilization and data processing speed.
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment of the disclosure can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It is noted that the above describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In a specific application scenario, a multithreading mechanism is used to manage video encoding/decoding and GPU resource scheduling, making full use of CPU resources. In the AI super-resolution enhancement unit and in data pre- and post-processing, the TensorRT framework and CUDA acceleration are used, and CPU-GPU data transfer is optimized so that redundant operations in the processing pipeline are reduced to a minimum. Taking into account the super-resolution model's particular requirements on data precision, and comparing the computing power of various graphics cards at different data precisions against their cost, an NVIDIA GeForce RTX™ 2080 Ti was finally chosen to achieve real-time HD-to-4K super resolution on a single graphics card. The number of GPUs can be increased according to the computational complexity of a specific model to achieve real-time processing of highly complex models; all GPUs are automatically recognized and used under the system framework, and the processing speed increases by a factor approaching the number of cards.
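The text names the NVENC hardware encoders (h264_nvenc, hevc_nvenc) used for FFmpeg hardware encoding. A hedged sketch of assembling such an FFmpeg invocation follows; the encoder names come from the text, but the surrounding flags, defaults, and the helper function itself are illustrative assumptions, not the patent's actual command line.

```python
def build_encode_cmd(src, dst, codec="h264_nvenc", bitrate="8M"):
    """Assemble an FFmpeg command line that uses an NVENC hardware encoder
    (h264_nvenc or hevc_nvenc). Flags other than the encoder name are
    illustrative assumptions."""
    if codec not in ("h264_nvenc", "hevc_nvenc"):
        raise ValueError("expected an NVENC hardware encoder")
    return ["ffmpeg", "-y", "-i", src, "-c:v", codec, "-b:v", bitrate, dst]
```

In the patented system the encoder consumes frames handed over in order by the GPU collection unit rather than reading a file, so this sketch only illustrates the encoder selection.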
It should be noted that the embodiments of the present disclosure can be further described by the following ways:
in some embodiments, the parallel computing framework is a CUDA framework;
the pretreatment comprises the following steps:
and carrying out normalization and type conversion operation on the first video frame sequence through the CUDA framework.
In some embodiments, the deep learning inference framework is a TensorRT framework;
the transcoding model calculation of the preprocessed first video frame sequence by using the deep learning inference framework comprises the following steps:
calculating the preprocessed first video frame sequence according to a denoising model, a detail enhancement model and/or a hyper-resolution model sequence through the tensorRT frame; and performing convolution and/or deconvolution operation at least once in the denoising model, the detail enhancement model and/or the hyper-resolution model to complete the calculation of the transcoding model.
In some embodiments, the decoding the input video generates a first sequence of video frames comprising:
decoding the input video through an audio and video processing program;
resampling the decoded video frames into a set video display format to generate the first video frame sequence.
In some embodiments, the preprocessing, with the parallel computing framework, of all data of each frame sequentially obtained from the first video frame sequence comprises:
identifying the processing space currently available on a graphics processor, determining an allocation amount for the first video frame sequence according to the processing space, and allocating each frame in the first video frame sequence in sequence according to the allocation amount.
In some embodiments, said sequentially generating transcoded video from said second sequence of video frames comprises:
acquiring the entire second video frame sequence and re-encoding it in sequence with the audio and video processing program.
In some embodiments, the audio and video processing program is an FFmpeg program.
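The embodiment above that identifies the available processing space of a graphics processor, derives an allocation amount, and distributes frames in sequence might be sketched as follows. The chunking policy and the per-frame memory figure are assumptions introduced for illustration; the patent does not specify them.

```python
def plan_batches(num_frames, free_mem_bytes, bytes_per_frame):
    """Derive how many frames fit in the currently free GPU memory (the
    allocation amount) and split frame indices 0..num_frames-1 into
    in-order batches of that size."""
    per_batch = max(1, free_mem_bytes // bytes_per_frame)
    return [list(range(i, min(i + per_batch, num_frames)))
            for i in range(0, num_frames, per_batch)]
```

Distributing frames in contiguous, in-order batches keeps the downstream collection unit's reordering work minimal.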
Based on the same concept, corresponding to any of the above embodiments, the present disclosure further provides a video transcoding system, as shown in fig. 7, which specifically includes:
a decoding unit 701, configured to decode an input video to generate a first video frame sequence;
the super-resolution enhancement unit 702 is configured to perform pre-processing on all data of each frame in the first video frame sequence obtained in sequence by using a parallel computing frame, perform transcoding model computation on the pre-processed first video frame sequence by using a deep learning inference frame, and generate a transcoded second video frame sequence;
and an encoding output unit 703 configured to generate a transcoded video in sequence according to the second video frame sequence, and output the transcoded video.
For convenience of description, the above system is described with functions divided into various modules, which are described separately. Of course, the functions of the modules may be implemented in the same or multiple software and/or hardware in implementing embodiments of the present disclosure.
The system of the foregoing embodiment is used to implement the corresponding video transcoding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
In an alternative embodiment of the present specification, a "plug and play" mode is used so that additional processing units can be added. The system provides an extension interface based on the polymorphic mechanism of a programming language such as C or C++. In a specific application scenario, unit modules can be written on the basis of the C++ polymorphism mechanism: when the system is written, the ports of units that may later be added are defined in advance, together with the insertion positions and operational logic of other processing units. When another processing unit needs to be added, it is written in the defined format and then plugged directly into the system framework, so that other processing units can be added conveniently. For example, an HDR (High Dynamic Range) unit can be added directly after the super-resolution enhancement unit, so that HDR processing is performed after a video frame has been super-resolved.
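The "plug and play" extension interface described above rests on language-level polymorphism (C++ in the text). A Python rendering of the same idea follows: an abstract base class plays the role of the predefined port, and units conforming to it can be inserted into the chain in a defined order. All class and method names are illustrative, not from the patent.

```python
from abc import ABC, abstractmethod

class ProcessingUnit(ABC):
    """Predefined 'port': every pluggable unit implements the same interface,
    so a new unit (e.g. an HDR stage) drops into the chain without changes
    to the framework."""
    @abstractmethod
    def process(self, frame): ...

class SuperResolutionUnit(ProcessingUnit):
    def process(self, frame):
        return frame + ["super-resolved"]

class HDRUnit(ProcessingUnit):
    def process(self, frame):
        return frame + ["hdr-mapped"]

def run_chain(units, frame):
    # Each call dispatches polymorphically through the common interface.
    for unit in units:
        frame = unit.process(frame)
    return frame
```

Placing `HDRUnit` after `SuperResolutionUnit` in the chain mirrors the example in the text, where HDR processing runs after super-resolution.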
It should be noted that, the embodiments of the present disclosure can be further described by the following ways:
in some embodiments, the system provides an extension interface based on the polymorphic mechanism of a programming language.
Based on the same concept, corresponding to the method of any embodiment, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the video transcoding method according to any embodiment.
Fig. 8 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 810, a memory 820, an input/output interface 830, a communication interface 840, and a bus 850. Wherein processor 810, memory 820, input/output interface 830, and communication interface 840 are communicatively coupled to each other within the device via bus 850.
The processor 810 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present specification.
The Memory 820 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 820 can store an operating system and other application programs, and when the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program codes are stored in the memory 820 and called to be executed by the processor 810.
The input/output interface 830 is used to connect an input/output module for information input and output. The input/output module may be configured as a component within the device (not shown in the figure) or may be external to the device, providing the corresponding functionality. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.
The communication interface 840 is used for connecting a communication module (not shown in the figure) to implement communication interaction between the present device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, bluetooth and the like).
It should be noted that although the above-mentioned device only shows the processor 810, the memory 820, the input/output interface 830, the communication interface 840 and the bus 850, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding video transcoding method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Further, devices may be shown in block diagram form in order to avoid obscuring embodiments of the disclosure, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.
Claims (9)
1. A method of video transcoding, comprising:
decoding an input video to generate a first video frame sequence;
preprocessing, by using a parallel computing framework, all data of each frame sequentially obtained from the first video frame sequence, and performing transcoding model calculation on the preprocessed first video frame sequence by using a deep learning inference framework to generate a transcoded second video frame sequence;
sequentially generating a transcoded video according to the second video frame sequence, and outputting the transcoded video;
the deep learning inference framework is a TensorRT framework;
the performing transcoding model calculation on the preprocessed first video frame sequence by using the deep learning inference framework comprises:
computing the preprocessed first video frame sequence in sequence through a denoising model, a detail enhancement model and/or a super-resolution model by means of the TensorRT framework, and performing at least one convolution and/or deconvolution operation in the denoising model, the detail enhancement model and/or the super-resolution model to complete the transcoding model calculation.
2. The method of claim 1, wherein the parallel computing framework is a CUDA framework;
the preprocessing comprises:
performing normalization and type conversion operations on the first video frame sequence through the CUDA framework.
3. The method of claim 1, wherein the decoding the input video to generate a first sequence of video frames comprises:
decoding the input video through an audio and video processing program;
resampling the decoded video frames into a set video display format to generate the first video frame sequence.
4. The method as defined in claim 3, wherein the preprocessing, with a parallel computing framework, of all data of each frame sequentially obtained from the first video frame sequence comprises:
identifying the processing space currently available on a graphics processor, determining an allocation amount for the first video frame sequence according to the processing space, and allocating each frame in the first video frame sequence in sequence according to the allocation amount.
5. The method of claim 3, wherein the sequentially generating transcoded video from the second sequence of video frames comprises:
acquiring the entire second video frame sequence and re-encoding it in sequence with the audio and video processing program.
6. The method according to claim 3, wherein the audio and video processing program is an FFmpeg program.
7. A video transcoding system, comprising:
a decoding unit for decoding an input video to generate a first video frame sequence;
the super-resolution enhancement unit is used for preprocessing all data of each frame in the first video frame sequence acquired in sequence by using a parallel computing frame, and performing transcoding model computation on the preprocessed first video frame sequence by using a deep learning inference frame to generate a transcoded second video frame sequence;
the encoding output unit is used for generating a transcoded video according to the second video frame sequence in sequence and outputting the transcoded video;
the deep learning inference framework is a TensorRT framework;
the performing transcoding model computation on the preprocessed first video frame sequence by using the deep learning inference framework comprises:
computing the preprocessed first video frame sequence in sequence through a denoising model, a detail enhancement model and/or a super-resolution model by means of the TensorRT framework, and performing at least one convolution and/or deconvolution operation in the denoising model, the detail enhancement model and/or the super-resolution model to complete the transcoding model computation.
8. The system of claim 7, wherein the system provides an extension interface based on the polymorphic mechanism of a programming language.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 6 when executing the program.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110336370.2A CN113038279B (en) | 2021-03-29 | 2021-03-29 | Video transcoding method and system and electronic device |
PCT/CN2022/084838 WO2022206960A1 (en) | 2021-03-29 | 2022-04-01 | Video transcoding method and system, and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110336370.2A CN113038279B (en) | 2021-03-29 | 2021-03-29 | Video transcoding method and system and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113038279A CN113038279A (en) | 2021-06-25 |
CN113038279B true CN113038279B (en) | 2023-04-18 |
Family
ID=76452781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110336370.2A Active CN113038279B (en) | 2021-03-29 | 2021-03-29 | Video transcoding method and system and electronic device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113038279B (en) |
WO (1) | WO2022206960A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113038279B (en) * | 2021-03-29 | 2023-04-18 | 京东方科技集团股份有限公司 | Video transcoding method and system and electronic device |
CN114501141B (en) * | 2022-01-04 | 2024-02-02 | 杭州网易智企科技有限公司 | Video data processing method, device, equipment and medium |
CN114449295A (en) * | 2022-01-30 | 2022-05-06 | 京东方科技集团股份有限公司 | Video processing method and device, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920274A (en) * | 2018-06-21 | 2018-11-30 | 北京陌上花科技有限公司 | Performance optimization and device for image processing server end |
CN110418144A (en) * | 2019-08-28 | 2019-11-05 | 成都索贝数码科技股份有限公司 | A method of realizing that one enters to have more transcoding multi code Rate of Chinese character video file based on NVIDIA GPU |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101098483A (en) * | 2007-07-19 | 2008-01-02 | 上海交通大学 | Video cluster transcoding system using image group structure as parallel processing element |
US8121197B2 (en) * | 2007-11-13 | 2012-02-21 | Elemental Technologies, Inc. | Video encoding and decoding using parallel processors |
US9338467B1 (en) * | 2010-07-19 | 2016-05-10 | Google Inc. | Parallel video transcoding |
CN103596012B (en) * | 2013-11-14 | 2017-05-10 | 山东电子职业技术学院 | Interframe macro block type selecting method used in real-time AVS-based video frame rate transcoding |
HK1205426A2 (en) * | 2015-09-24 | 2015-12-11 | Tfi Digital Media Ltd | Method for distributed video transcoding |
WO2018051330A1 (en) * | 2016-09-14 | 2018-03-22 | Beamr Imaging Ltd. | Method of pre-processing of video information for optimized video encoding |
US10798393B2 (en) * | 2018-07-09 | 2020-10-06 | Hulu, LLC | Two pass chunk parallel transcoding process |
US20210067952A1 (en) * | 2019-09-03 | 2021-03-04 | Nvidia Corporation | Performing scrambling and/or descrambling on parallel computing architectures |
CN110992260B (en) * | 2019-10-15 | 2022-04-22 | 网宿科技股份有限公司 | Method and device for reconstructing video super-resolution |
CN111726633B (en) * | 2020-05-11 | 2021-03-26 | 河南大学 | Compressed video stream recoding method based on deep learning and significance perception |
CN113038279B (en) * | 2021-03-29 | 2023-04-18 | 京东方科技集团股份有限公司 | Video transcoding method and system and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN113038279A (en) | 2021-06-25 |
WO2022206960A1 (en) | 2022-10-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113038279B (en) | Video transcoding method and system and electronic device | |
US10504275B2 (en) | Methods and apparatus for more efficient ray tracing of instanced geometry | |
CN108206937B (en) | Method and device for improving intelligent analysis performance | |
US11496773B2 (en) | Using residual video data resulting from a compression of original video data to improve a decompression of the original video data | |
US8856212B1 (en) | Web-based configurable pipeline for media processing | |
US8928680B1 (en) | Method and system for sharing a buffer between a graphics processing unit and a media encoder | |
WO2022005611A1 (en) | Image super-resolution reconstructing | |
US20110292057A1 (en) | Dynamic Bandwidth Determination and Processing Task Assignment for Video Data Processing | |
US20170178594A1 (en) | Method and apparatus for color buffer compression | |
CN113457160A (en) | Data processing method and device, electronic equipment and computer readable storage medium | |
Biookaghazadeh et al. | Toward multi-fpga acceleration of the neural networks | |
US20130077690A1 (en) | Firmware-Based Multi-Threaded Video Decoding | |
WO2023124428A1 (en) | Chip, accelerator card, electronic device and data processing method | |
CN106575443A (en) | Hierarchical index bits for multi-sampling anti-aliasing | |
CN117011397A (en) | Data processing method, apparatus, device, readable storage medium, and program product | |
CN117036912A (en) | Video AI reasoning optimization method, device, computer equipment and storage medium | |
WO2016167876A1 (en) | Supporting multi-level nesting of command buffers in graphics command streams at computing devices | |
US20220103831A1 (en) | Intelligent computing resources allocation for feature network based on feature propagation | |
CN116741197B (en) | Multi-mode image generation method and device, storage medium and electronic equipment | |
US10742834B2 (en) | Buffer management for plug-in architectures in computation graph structures | |
CN115393490A (en) | Image rendering method and device, storage medium and electronic equipment | |
Dong et al. | Real-time UHD video super-resolution and transcoding on heterogeneous hardware | |
CN115546036A (en) | Image enhancement method, device, equipment and computer readable storage medium | |
Deng et al. | GPU-based real-time decoding technique for high-definition videos | |
Kim et al. | A parallel implementation of JPEG2000 encoder on multi-GPU system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||