CN115589488A - Video transcoding system, method, GPU, electronic device and storage medium - Google Patents


Info

Publication number
CN115589488A
Authority
CN
China
Prior art keywords
gpu
transcoding
video
gpus
video transcoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211215800.6A
Other languages
Chinese (zh)
Other versions
CN115589488B (en)
Inventor
卢子威
朱瑞博
王坤
黄旭
刘晓峰
张钰勃
马凤翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202211215800.6A
Publication of CN115589488A
Application granted
Publication of CN115589488B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to a video transcoding system, method, GPU, electronic device and storage medium. The system comprises multiple GPUs connected to one another through a video transcoding interconnection bus, the GPUs including a master GPU; the master GPU distributes video transcoding tasks among the GPUs; and the GPUs transmit the transcoding data corresponding to the video transcoding tasks over the video transcoding interconnection bus. Embodiments of the disclosure thereby enable high-speed transmission of the transcoding data corresponding to a video transcoding task between GPUs over the video transcoding interconnection bus.

Description

Video transcoding system, method, GPU, electronic device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a video transcoding system, a video transcoding method, a GPU, an electronic device, and a storage medium.
Background
Conventionally, video transcoding is performed on a Central Processing Unit (CPU), which occupies substantial CPU resources and degrades overall computing performance, particularly in graphics processing or cloud-gaming scenarios. Performing video transcoding with a dedicated video transcoding computing unit inside a Graphics Processing Unit (GPU) chip saves computing resources and improves overall computing performance, so an increasing number of scenarios offload large volumes of video transcoding services to such units. In the related art, however, video transcoding tasks spanning multiple GPU chips rely on CPU scheduling and data transfers controlled over the Peripheral Component Interconnect Express (PCIE) bus; the control flow is complex, transcoding efficiency is low, and the video transcoding capability of the multiple GPU chips cannot be fully exploited.
Disclosure of Invention
The disclosure provides a video transcoding system, a video transcoding method, a GPU, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a video transcoding system, the system comprising: multiple GPUs connected to one another through a video transcoding interconnection bus, the GPUs including a master GPU; the master GPU is configured to distribute video transcoding tasks among the GPUs; and the GPUs transmit the transcoding data corresponding to the video transcoding tasks over the video transcoding interconnection bus.
In one possible implementation, the master GPU includes a configuration module; the configuration module is configured to configure memory address information and perform video-frame-level synchronization for the transcoding data corresponding to the video transcoding tasks executed on the GPUs.
In one possible implementation, any GPU can, based on the memory address information configured by the configuration module, read the transcoding data to be transcoded from the DDR memory of another GPU over the video transcoding interconnection bus.
In one possible implementation, the master GPU is further configured to, when the operation load of a first GPU among the multiple GPUs exceeds a preset threshold, reassign the unprocessed video transcoding tasks of the first GPU to a second GPU among the multiple GPUs whose operation load does not exceed the preset threshold.
In one possible implementation, the video transcoding task includes at least one of: decoding task, rendering task, video processing task, and encoding task.
In one possible implementation, the multiple GPUs are located on one circuit board; or, the multiple GPUs are located on multiple circuit boards.
According to an aspect of the present disclosure, there is provided a GPU serving as a master GPU connected to one or more slave GPUs via a video transcoding interconnection bus, the master GPU comprising: a task allocation module configured to distribute video transcoding tasks between the master GPU and the slave GPUs; and a configuration module configured to configure memory address information and perform video-frame-level synchronization for the transcoding data corresponding to the video transcoding tasks executed on the master GPU and the slave GPUs.
In one possible implementation, the GPU further includes: a transcoding module configured to execute a video transcoding task and store the resulting transcoding data in the DDR memory of the GPU; the master GPU is further configured to read the corresponding transcoding data from the DDR memory of each GPU over the video transcoding interconnection bus, based on the memory address information configured by the configuration module and the video-frame-level synchronization, and to obtain a transcoding result after transcoding; and a data transmission module configured to output the transcoding result over the PCIE bus or Ethernet.
In one possible implementation, the GPU further includes: a transcoding module configured to execute a video transcoding task and store the resulting transcoding data in the DDR memory of the GPU; each GPU obtains a corresponding transcoding result after processing the video transcoding task distributed by the task allocation module; and a data transmission module configured to output the transcoding result of each GPU over the PCIE bus or Ethernet.
According to an aspect of the present disclosure, there is provided a video transcoding method applied to a master GPU connected to one or more slave GPUs via a video transcoding interconnection bus, the method comprising: distributing, by the master GPU, video transcoding tasks between the master GPU and the slave GPUs; and configuring, by the master GPU, memory address information and performing video-frame-level synchronization for the transcoding data corresponding to the video transcoding tasks executed on the master GPU and the slave GPUs.
In one possible implementation, the method further includes: executing the video transcoding task and storing the resulting transcoding data in the DDR memory of the GPU; reading the corresponding transcoding data from the DDR memory of each GPU over the video transcoding interconnection bus, based on the configured memory address information and the video-frame-level synchronization, and transcoding it to obtain a transcoding result; and outputting the transcoding result over the PCIE bus or Ethernet.
In one possible implementation, the method further includes: executing the video transcoding task and storing the resulting transcoding data in the DDR memory of the GPU; each GPU obtaining a corresponding transcoding result after processing its distributed video transcoding task; and outputting the transcoding result of each GPU over the PCIE bus or Ethernet.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored by the memory to run the system, or run the GPU, or perform the method.
According to an aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, which, when executed by a processor, run the above-described system, or run the above-described GPU, or implement the above-described method.
In embodiments of the disclosure, the video transcoding system comprises multiple GPUs connected through a video transcoding interconnection bus. A master GPU among them distributes video transcoding tasks among the GPUs, so the transcoding capability of the different GPUs is fully utilized. The transcoding data corresponding to the video transcoding tasks is transmitted between the GPUs at high speed over the video transcoding interconnection bus, so data transfers between GPUs no longer need the PCIE bus; this effectively increases the data transmission speed and thereby improves the transcoding efficiency of the video transcoding system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of a video transcoding system in the related art;
fig. 2 shows a schematic diagram of a video transcoding system according to an embodiment of the present disclosure;
fig. 3 shows a schematic diagram of a master GPU according to an embodiment of the present disclosure;
FIG. 4 shows a schematic diagram of a slave GPU according to an embodiment of the present disclosure;
fig. 5 shows a flow diagram of a method of video transcoding in accordance with an embodiment of the present disclosure;
FIG. 6 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure;
FIG. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of a, B, and C, and may mean including any one or more elements selected from the group consisting of a, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the subject matter of the present disclosure.
With the explosive growth of video transcoding demand on the network server side, the video transcoding computing capability of GPU chips becomes increasingly important, especially for services that transcode multiple video streams simultaneously. For example: after one original video stream is decoded, the GPU performs resolution scaling and the stream is finally encoded into multiple streams of different standards and resolutions to meet the demands of multiple users on the server, so the compute ratio of decoding to encoding is 1:n; after multiple original video streams are decoded through multiple decoding channels, the streams are merged into one stream for encoding, so the ratio of decoding to encoding is m:1; or after multiple original video streams are decoded, they are encoded into multiple streams of different resolutions and video formats and delivered to the user side, so the ratio of decoding to encoding is m:n.
Due to the resource and area limitations of a GPU chip, only a video encoding/decoding computing unit of fixed computing power can be provided on it. When a large number of video encoding/decoding tasks must be handled, the related art requires a CPU to perform complex task scheduling and memory management.
The control flow of video encoding/decoding on a single GPU chip is relatively simple, but its processing capability and data bandwidth cannot meet the real-time processing requirements of a large number of video encoding/decoding tasks.
When multiple GPU chips perform video encoding/decoding in parallel, the related art requires a CPU to control the PCIE bus to transfer and synchronize transcoding data between the GPU chips. However, because PCIE must remain compatible with earlier protocol versions, each transfer carries additional payload such as routing and parity-check information, which lowers the efficiency of transcoding data transmission during video transcoding.
Fig. 1 shows a schematic diagram of a video transcoding system in the related art. As shown in fig. 1, four GPU chips communicate and transmit data over a PCIE bus. When transcoding data is transferred over the PCIE bus, it must first be copied from the DDR memory of one GPU chip into the DDR memory of another before that chip can transcode it, so the video transcoding system needs extra data storage space and occupies a larger bus bandwidth. When the video transcoding task is a complex encoding/decoding task, the PCIE-based data transmission of the related art cannot efficiently complete the processing of multiple encoding/decoding tasks.
In order to efficiently complete various video coding and decoding tasks in a complex video transcoding scene, the embodiment of the disclosure provides a video transcoding system. The following describes a video transcoding system provided by an embodiment of the present disclosure in detail.
Fig. 2 shows a schematic diagram of a video transcoding system according to an embodiment of the present disclosure. As shown in fig. 2, the video transcoding system includes: the system comprises a plurality of GPUs, a video transcoding interconnection bus, a master control GPU and a plurality of video transcoding interconnection buses, wherein the GPUs comprise the master control GPU; the master control GPU is used for distributing video transcoding tasks among the GPUs; and the GPUs transmit the transcoding processing data corresponding to the video transcoding task based on a video transcoding interconnection bus.
The video transcoding interconnection bus is a fast serial transmission bus communication protocol with high bandwidth, low latency and low power consumption. Based on this bus, data interconnection between different GPU chips can be realized effectively, which improves both the data transmission efficiency and the video transcoding efficiency of the system in complex video transcoding scenarios.
According to embodiments of the disclosure, the video transcoding system comprises multiple GPUs connected through a video transcoding interconnection bus. A master GPU among them distributes video transcoding tasks among the GPUs, so the transcoding capability of the different GPUs is fully utilized; the transcoding data corresponding to the video transcoding tasks is transmitted between the GPUs at high speed over the video transcoding interconnection bus, so no PCIE bus is needed for data transfers between GPUs. This effectively increases the data transmission speed and thereby improves the transcoding efficiency of the video transcoding system.
In one possible implementation, the video transcoding system further includes: a CPU; and the CPU is used for configuring the master GPU in the plurality of GPUs.
Taking fig. 2 as an example, the video transcoding system includes four GPUs, GPU_0 (A), GPU_1 (B), GPU_2 (C) and GPU_3 (D), and one CPU (J). The CPU (J) can configure GPU_0 (A) as the master GPU, i.e. GPU_0_main (A). All GPUs other than the master GPU are slave GPUs.
GPU_0 (A) can distribute video transcoding tasks among the four GPUs. The four GPUs are connected through video transcoding interconnection buses a, b, c, d, e and f, over which the transcoding data corresponding to the video transcoding tasks can be interconnected and transmitted among the four GPUs.
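With four GPUs, pairwise interconnection requires six bus links, matching the buses a to f in fig. 2 and the full-mesh count n(n-1)/2. A minimal sketch of this topology (the GPU labels follow fig. 2; the helper function itself is illustrative and not part of the patent):

```python
from itertools import combinations

def full_mesh_links(gpus):
    """Enumerate the point-to-point interconnection links needed so
    every pair of GPUs can exchange transcoding data directly."""
    return list(combinations(gpus, 2))

# Four GPUs need 4 * 3 / 2 = 6 links, i.e. buses a..f in fig. 2.
links = full_mesh_links(["GPU_0", "GPU_1", "GPU_2", "GPU_3"])
```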
In one possible implementation, the master GPU includes a configuration module; the configuration module is configured to configure memory address information and perform video-frame-level synchronization for the transcoding data corresponding to the video transcoding tasks executed on the GPUs.
Fig. 3 shows a schematic diagram of a master GPU according to an embodiment of the present disclosure. As shown in fig. 3, the master GPU includes: the video transcoding unit A1 (encoder _ core (A1)) responsible for video encoding, the video transcoding unit A2 (decoder _ core (A2)) responsible for video decoding, the GPU computing unit A3 (GPU _ core (A3)) responsible for computing, the DDR memory unit A5 responsible for data storage, and the configuration module A4.
When the master GPU shown in fig. 3 is the master GPU_0 (A) shown in fig. 2, the configuration module A4 in GPU_0 (A) can, for the four GPUs shown in fig. 2 (GPU_0 (A), GPU_1 (B), GPU_2 (C) and GPU_3 (D)), configure memory address information and perform video-frame-level synchronization for the transcoding data corresponding to the video transcoding tasks executed on them.
In one possible implementation, any GPU can, based on the memory address information configured by the configuration module, read the transcoding data to be transcoded from the DDR memories of the other GPUs over the video transcoding interconnection bus.
Through the configuration module, the master GPU can configure the memory address information of the transcoding data to be transcoded onto the slave GPU that executes the video transcoding task. The slave GPU can then use that memory address information to read the transcoding data directly from the DDR memories of the other GPUs, without first copying the data into its own DDR memory. This effectively shares DDR memory resources among the GPUs, makes full use of the memory on each GPU, improves the flexibility of memory usage, and reduces the pressure on system bandwidth.
The memory address information may include first-address information and address-length information, or may be set to another format according to the actual situation, which is not specifically limited by this disclosure.
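As a concrete illustration, the memory address information could be modeled as a descriptor holding the owning GPU, the first address, and the length; the field names below are hypothetical, chosen only to mirror the "first-address information and address-length information" mentioned above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemAddrInfo:
    gpu_id: int      # GPU whose DDR memory holds the transcoding data
    first_addr: int  # start address of the data region
    length: int      # byte length of the data region

def contains(info: MemAddrInfo, addr: int) -> bool:
    """Check whether a remote read at `addr` falls inside the region
    the master GPU configured for a slave GPU."""
    return info.first_addr <= addr < info.first_addr + info.length

# One decoded frame stored in GPU_1's DDR, as the master might describe it.
frame = MemAddrInfo(gpu_id=1, first_addr=0x8000_0000, length=0x20_0000)
```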
Fig. 4 shows a schematic diagram of a slave GPU according to an embodiment of the present disclosure. As shown in fig. 4, the slave GPU includes: the video transcoding unit B1 (encoder _ core (B1)) responsible for video encoding, the video transcoding unit B2 (decoder _ core (B2)) responsible for video decoding, the GPU computing unit B3 (GPU _ core (B3)) responsible for computing and the DDR memory unit B4 responsible for data storage. The slave GPU _1 (B), the slave GPU _2 (C), and the slave GPU _3 (D) shown in fig. 2 may have the structure shown in fig. 4, which is not described herein again.
Taking fig. 2 as an example, suppose the transcoding data to be transcoded is stored in the DDR memory of slave GPU_1 (B). The master GPU_0 (A) configures the memory address information of that data onto slave GPU_2 (C), which needs to perform the video transcoding task, so that GPU_2 (C) can directly access the DDR memory of GPU_1 (B) over the video transcoding interconnection bus f between GPU_2 (C) and GPU_1 (B) and read the transcoding data for processing.
In one possible implementation, the video transcoding task includes at least one of: a decoding task, a rendering task, a video processing task, and an encoding task.
In a complex video transcoding scenario, a video transcoding task may include decoding tasks, rendering tasks (rendering the decoded data), video processing tasks (processing the decoded and/or rendered data), and encoding tasks. A video processing task may include scaling, edge enhancement, background blurring, contrast adjustment, color enhancement and other processing, which is not specifically limited by this disclosure. Besides the tasks described above, a video transcoding task may also include other graphics processing tasks, which are likewise not specifically limited.
In one possible implementation, the master GPU is further configured to, when the operation load of a first GPU among the multiple GPUs exceeds a preset threshold, reassign the unprocessed video transcoding tasks of the first GPU to a second GPU among the multiple GPUs whose operation load does not exceed the preset threshold.
This hardware-scheduling approach makes full use of the transcoding computing power of the different GPUs and avoids the situation in which one GPU is overloaded while others sit idle, thereby avoiding wasted computing resources and improving the transcoding efficiency of the system.
The specific value of the preset threshold may be determined according to actual conditions, and is not specifically limited by the present disclosure.
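A software analogue of this scheduling rule might look as follows; the threshold, the load metric and the task queues are hypothetical placeholders, since the patent describes the policy rather than a specific data structure:

```python
def rebalance(loads, pending, threshold):
    """Move unprocessed tasks off any GPU whose operation load exceeds
    the preset threshold onto a GPU whose load does not exceed it.

    loads:   {gpu_id: current operation load}
    pending: {gpu_id: [unprocessed task ids]}
    Returns a list of (task, from_gpu, to_gpu) reassignments.
    """
    moves = []
    for first, load in loads.items():
        if load <= threshold:
            continue  # this GPU is not overloaded
        # Find a second GPU under the threshold to take over.
        for second, other_load in loads.items():
            if second != first and other_load <= threshold:
                moves += [(task, first, second) for task in pending[first]]
                pending[second].extend(pending[first])
                pending[first] = []
                break
    return moves
```

For example, with `loads = {0: 0.9, 1: 0.2}` and threshold 0.8, every pending task of GPU 0 is handed to GPU 1.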
In one possible implementation, multiple GPUs are located on one circuit board; or, multiple GPUs are located on multiple circuit boards.
The following describes in detail the specific transcoding flow of the video transcoding system provided by the present disclosure in several video transcoding scenarios.
First video transcoding scenario: multiple original video streams are decoded, the decoded data is processed by GPU computation according to user requirements, and the results are merged into one stream for encoding.
In the first step, the CPU of the video transcoding system designates the master GPU and the slave GPUs among the multiple GPUs.
Taking fig. 2 as an example, the CPU (J) configures GPU_0 (A) as the master GPU, i.e. GPU_0_main (A). GPU_1 (B), GPU_2 (C) and GPU_3 (D) are all slave GPUs.
In the second step, the master GPU decodes its input original video stream and stores the decoded data into its own DDR memory in units of video frames. Each slave GPU likewise decodes its input original video stream and stores the decoded data into its own DDR memory in units of video frames.
In the third step, according to user requirements, the master GPU performs GPU computation such as rendering and/or scaling on the decoded data, and each slave GPU performs the same GPU computation on its own decoded data.
In the fourth step, because different GPUs decode, render and/or scale at different speeds, the master GPU configures memory address information and performs video-frame-level synchronization for the transcoding data on all GPUs.
Taking fig. 2 as an example, the master GPU GPU_0_main (A) configures memory address information and performs video-frame-level synchronization for the slave GPUs GPU_1 (B), GPU_2 (C) and GPU_3 (D).
For example, the master GPU configures the memory address information of the transcoding data to be transcoded onto the slave GPU that performs the transcoding. The memory address information includes first-address information, address-length information, and so on.
Based on the master GPU's configuration, the GPUs can access one another to obtain the transcoding data they need.
In the fifth step, all GPUs simultaneously perform GPU computation such as video decoding, rendering and/or scaling at the video-frame level, and store the transcoding results into their own DDR memories in units of video frames.
In the sixth step, the transcoding data stored in the DDR memories of the different GPUs is transmitted to the master GPU over the video transcoding interconnection bus to complete the final composite encoding.
Taking fig. 2 as an example, after the four GPUs complete the synchronized transcoding computation, the slave GPUs GPU_1 (B), GPU_2 (C) and GPU_3 (D) transmit the transcoding data in their respective DDR memories to the master GPU GPU_0_main (A) over the video transcoding interconnection buses a, e and d between them and the master GPU.
The master GPU GPU_0_main (A) then encodes the aggregated transcoding data to obtain the final transcoding result.
In the seventh step, after the video transcoding is finished, the final transcoding result is transmitted to the user over the PCIE bus.
Taking fig. 2 as an example, the master GPU GPU_0_main (A) transmits the final transcoding result to the user side over PCIE bus g.
After the video transcoding is completed, the final transcoding result may also be transmitted to the user over Ethernet, which is not specifically limited by this disclosure.
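The steps of this m:1 scenario can be sketched end to end; the function name and the string placeholders for decoded frames and the encoded output are illustrative stand-ins, under the assumption that each GPU decodes one input stream and the master performs the final composite encode:

```python
def transcode_m_to_1(streams, gpus, master):
    """m:1 scenario: each GPU decodes one input stream into its own
    DDR (step 2), processes it (steps 3-5), then the master gathers
    every GPU's frames over the interconnection bus and performs the
    composite encode (steps 6-7)."""
    ddr = {g: [] for g in gpus}  # per-GPU DDR memory, frame granularity
    for g, stream in zip(gpus, streams):
        ddr[g] = [f"decoded({frame})" for frame in stream]  # steps 2-3
    # Step 6: master reads every GPU's DDR over the interconnection bus.
    gathered = [frame for g in gpus for frame in ddr[g]]
    ddr[master].append(f"encoded[{len(gathered)} frames]")  # composite encode
    return ddr[master][-1]
```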
Second video transcoding scenario: one original video stream is decoded, the decoded data is processed by GPU computation according to user requirements, and it is then encoded into streams of multiple video formats and different resolutions.
In the first step, the CPU of the video transcoding system designates the master GPU and the slave GPUs among the multiple GPUs.
Taking fig. 2 as an example, the CPU (J) configures GPU_0 (A) as the master GPU, i.e. GPU_0_main (A). GPU_1 (B), GPU_2 (C) and GPU_3 (D) are all slave GPUs.
In the second step, the master GPU decodes the input original video stream and stores the decoded data into its own DDR memory in units of video frames.
Taking fig. 2 as an example, the master GPU GPU_0_main (A) decodes the original video stream and stores the decoded data into the DDR memory of GPU_0_main (A).
Third, because different GPUs decode, render, and/or scale at different speeds, the master GPU configures the memory address information and performs video-frame-level synchronization for the transcoding processing data on all GPUs.
Taking fig. 2 above as an example, the master GPU GPU_0_main (A) configures the memory address information and performs video-frame-level synchronization for the slave GPUs GPU_1 (B), GPU_2 (C), and GPU_3 (D).
For example, the master GPU provides the slave GPU performing the transcoding with the memory address information of the transcoding processing data to be transcoded. The memory address information includes start address information, address length information, and the like.
Based on this configuration by the master GPU, different GPUs can access each other to acquire the transcoding processing data they need.
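The start-address-plus-length configuration just described can be sketched as a small descriptor that the master hands to each slave. `MemRegion` and `assign_regions` are hypothetical names; real hardware would carry physical DDR addresses, not Python integers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemRegion:
    """Memory address information: start (first) address plus address length."""
    base: int      # start address inside the owner GPU's DDR memory
    length: int    # number of bytes the peer is allowed to read

    @property
    def end(self):
        return self.base + self.length

def assign_regions(block_sizes, base=0):
    """Master-side sketch: lay out one region per data block, back to back,
    so each slave knows exactly where to fetch its transcoding data."""
    regions = []
    for size in block_sizes:
        regions.append(MemRegion(base, size))
        base += size
    return regions
```

With such descriptors, a slave never needs to scan the master's memory; it reads exactly the `(base, length)` window it was configured with.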
Fourth, the master GPU transmits the decoded data to the slave GPUs through the video transcoding interconnection bus. Meanwhile, according to user requirements, the master GPU performs GPU computation such as rendering or scaling on the decoded data, and each slave GPU likewise performs GPU computation such as rendering or scaling on the video data received over the video transcoding interconnection bus.
Fifth, all GPUs executing the transcoding task encode their calculation results, after GPU computation such as rendering or scaling, into code streams of different resolutions and video formats according to the requirements of different users to obtain their respective transcoding results, and finally transmit the transcoding results to the users through the PCIE bus.
Taking fig. 2 above as an example, according to user requirements, the master GPU GPU_0_main (A) reads the decoded data from its DDR memory to perform GPU computation such as rendering and/or scaling, stores the calculation result back in its DDR memory, and then reads the calculation result from that memory to perform video encoding.
Because the master GPU GPU_0_main (A) has configured the memory address information and performed video-frame-level synchronization for the slave GPUs GPU_1 (B), GPU_2 (C), and GPU_3 (D), these slave GPUs can obtain the correct transcoding processing data from the DDR memory of GPU_0_main (A). GPU_0_main (A) completes the data transmission with GPU_1 (B), GPU_2 (C), and GPU_3 (D) through the video transcoding interconnection buses a, e, and d between them.
According to user requirements, GPU_1 (B), GPU_2 (C), and GPU_3 (D) perform GPU computation such as rendering and scaling on the acquired transcoding processing data and store the calculation results in their respective DDR memories. Then, according to user requirements and the video-frame-level synchronization information shared among GPU_0_main (A), GPU_1 (B), GPU_2 (C), and GPU_3 (D), each GPU reads the calculation results from its DDR memory to perform video encoding.
Finally, GPU_0_main (A), GPU_1 (B), GPU_2 (C), and GPU_3 (D) transmit their transcoding results to the user side through PCIE buses g, h, i, and j, respectively.
After the video transcoding is completed, the final transcoding result may also be transmitted to the user through Ethernet (Ethernet), which is not specifically limited by this disclosure.
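The one-decode, many-encode flow of this second scenario can be sketched as below. `fan_out_transcode` is a hypothetical name, the decode and encode steps are string stand-ins for the hardware codecs, and frame-level synchronization is modeled by finishing every encoder's frame n before decoding frame n+1.

```python
def fan_out_transcode(frames, encoders):
    """Decode each frame once (master side), then let every encoder --
    the master's own plus one per slave -- produce its own output format
    for that frame before the next frame is decoded."""
    outputs = {name: [] for name in encoders}
    for frame in frames:
        decoded = f"dec({frame})"              # stand-in for hardware decode
        for name, encode in encoders.items():  # frame-level fan-out
            outputs[name].append(encode(decoded))
    return outputs

# Example: one source stream, two target formats/resolutions.
encoders = {
    "h264_1080p": lambda d: f"h264[{d}]",
    "h265_720p": lambda d: f"h265[{d}]",
}
result = fan_out_transcode(["f0", "f1"], encoders)
```

The point of the sketch is the data-flow shape: the decode cost is paid once per frame, while each per-format encode runs on whichever GPU was assigned that output.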
Third video transcoding scenario: multiple original video code streams are decoded, the decoded data is processed by GPU computation according to user requirements, and the results are encoded into code streams of various video formats and different resolutions.
First, the CPU of the video transcoding system configures one of the plurality of GPUs as the master GPU and the others as slave GPUs.
Taking fig. 2 above as an example, the CPU (J) configures GPU_0 (A) as the master GPU, i.e., GPU_0_main (A); GPU_1 (B), GPU_2 (C), and GPU_3 (D) are all slave GPUs.
Second, the master GPU decodes its input original video code stream and stores the decoded data, in units of video frames, in its own DDR memory. Each slave GPU likewise decodes its input original video code stream and stores the decoded data, in units of video frames, in its own DDR memory.
Third, according to user requirements, the master GPU performs GPU computation such as rendering or scaling on the decoded data, and each slave GPU does the same for its corresponding decoded data.
Fourth, because different GPUs decode, render, and/or scale at different speeds, the master GPU configures the memory address information and performs video-frame-level synchronization for the transcoding processing data on all GPUs.
Taking fig. 2 above as an example, the master GPU GPU_0_main (A) configures the memory address information and performs video-frame-level synchronization for the slave GPUs GPU_1 (B), GPU_2 (C), and GPU_3 (D).
For example, the master GPU provides the slave GPU performing the transcoding with the memory address information of the transcoding processing data to be transcoded. The memory address information includes start address information, address length information, and the like.
Based on this configuration by the master GPU, different GPUs can access each other to acquire the transcoding processing data they need.
Fifth, after all GPUs finish the video decoding and GPU computation such as rendering or scaling of one frame of video data, they store the calculation results, in units of video frames, in their respective DDR memories.
Sixth, all GPUs executing the transcoding task encode their calculation results, after GPU computation such as rendering or scaling, into code streams of different resolutions and video formats according to the requirements of different users.
Seventh, when a GPU is idle with respect to video transcoding tasks, the master GPU configures a GPU carrying multiple video transcoding tasks to transmit transcoding processing data to the idle GPU through the video transcoding interconnection bus.
In one example, the master GPU may determine which GPUs are idle with respect to transcoding tasks and which are under heavy load according to each GPU's running load and the current state of its video transcoding tasks, and then schedule the video transcoding tasks based on each GPU's actual load.
Taking fig. 2 above as an example, during scheduling, GPU_0_main (A), GPU_1 (B), GPU_2 (C), and GPU_3 (D) transmit the transcoding processing data corresponding to the video transcoding tasks through the video transcoding interconnection buses a, b, c, d, e, and f.
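The load-based rebalancing described above can be sketched as a simple planner. The function name and the `heavy` threshold are illustrative, not the patent's actual policy; each returned pair represents one task's data moving over the interconnection bus from a loaded GPU to an idle one.

```python
def plan_migrations(task_counts, heavy=3):
    """Pair heavily loaded GPUs with idle ones: while a GPU still carries
    at least `heavy` tasks and an idle GPU remains, move one task over."""
    counts = dict(task_counts)
    idle = [gpu for gpu, n in counts.items() if n == 0]
    moves = []
    for src, _ in sorted(task_counts.items(), key=lambda kv: -kv[1]):
        while counts[src] >= heavy and idle:
            dst = idle.pop(0)              # one bus transfer per move
            counts[src] -= 1
            counts[dst] += 1
            moves.append((src, dst))
    return moves, counts

# Example: GPU_0 is overloaded while GPU_1 and GPU_3 are idle.
moves, counts = plan_migrations({"GPU_0": 4, "GPU_1": 0, "GPU_2": 1, "GPU_3": 0})
```

A real scheduler would weigh stream bitrate and codec cost rather than raw task counts, but the master-driven decision shape is the same.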
Eighth, after the video transcoding is finished, the final transcoding results are transmitted to the users through the PCIE bus.
Taking fig. 2 above as an example, GPU_0_main (A), GPU_1 (B), GPU_2 (C), and GPU_3 (D) transmit their transcoding results to the user side through PCIE buses g, h, i, and j, respectively.
After the video transcoding is completed, the final transcoding result may also be transmitted to the user through Ethernet (Ethernet), which is not specifically limited by this disclosure.
The video transcoding system provided by the embodiments of the present disclosure uses the video transcoding interconnection bus to efficiently transmit transcoding processing data among different GPUs, so inter-chip data transmission no longer needs to go through a PCIE bus; this greatly improves the transcoding computing capability of the video transcoding system and also improves its real-time performance.
In the video transcoding system provided by the embodiments of the present disclosure, the master GPU configures the memory address information and performs video-frame-level synchronization for all slave GPUs, and this flexible master-to-slave configuration allows DDR memory resources to be shared among different GPUs. In addition, the GPUs involved in transcoding can access each other's DDR memory, so memory resources on different GPUs are fully utilized, the flexibility of memory use in the video transcoding system is improved, and the pressure on system bandwidth is reduced.
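The mutual DDR access described here can be sketched as a bounded bus read. `PeerGpu.bus_read` is a hypothetical name, and the `(base, length)` pair stands for the memory address information the master GPU configured for the peer.

```python
class PeerGpu:
    """GPU whose DDR memory peers may read over the interconnection bus,
    but only within the region the master GPU configured for them."""
    def __init__(self, name, ddr):
        self.name = name
        self.ddr = bytes(ddr)

    def bus_read(self, base, length):
        """Modeled bus read, bounds-checked against this GPU's DDR size."""
        if base < 0 or base + length > len(self.ddr):
            raise ValueError("read outside the configured region")
        return self.ddr[base:base + length]

# Example: a peer reads a slice of decoded frame data out of GPU_1's DDR.
gpu_1 = PeerGpu("GPU_1", b"decoded-frame-bytes")
chunk = gpu_1.bus_read(base=0, length=7)
```

The bounds check stands in for the isolation the master's configuration provides: a peer can only touch the window it was granted.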
An embodiment of the present disclosure further provides a GPU. The GPU includes a master GPU connected to slave GPUs through a video transcoding interconnection bus, and the master GPU includes: a task allocation module, configured to allocate video transcoding tasks between the master GPU and the slave GPUs; and a configuration module, configured to configure the memory address information and perform video-frame-level synchronization for the transcoding processing data corresponding to the video transcoding tasks on the master GPU and the slave GPUs.
The specific structures of the master GPU and the slave GPU and the connection relationship therebetween may refer to the above description, which is not repeated herein.
In one possible implementation, the GPU further includes: a transcoding module, configured to execute a video transcoding task and store the resulting transcoding processing data in the DDR memory of the GPU, where the master GPU is further configured to read, based on the memory address information configured by the configuration module and the video-frame-level synchronization, the corresponding transcoding processing data from the DDR memory of each GPU through the video transcoding interconnection bus and perform transcoding to obtain a transcoding result; and a data transmission module, configured to output the transcoding result through the PCIE bus or Ethernet.
The process of the GPU executing the transcoding task may refer to the detailed description of the first video transcoding scenario, which is not described herein again.
In one possible implementation, the GPU further includes: a transcoding module, configured to execute a video transcoding task and store the resulting transcoding processing data in the DDR memory of the GPU, where each GPU obtains its corresponding transcoding result after processing the video transcoding task allocated by the task allocation module; and a data transmission module, configured to output the transcoding result corresponding to each GPU through the PCIE bus or Ethernet.
The process of the GPU executing the transcoding task may refer to the detailed description of the second and third video transcoding scenarios, which is not described herein again.
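The module split described above (task allocation module, configuration module) can be sketched as a skeleton. The round-robin allocation policy and every class and attribute name here are assumptions for illustration, not the patent's actual implementation.

```python
class TaskAllocationModule:
    """Spreads video transcoding tasks across the master and slave GPUs."""
    def __init__(self, gpu_names):
        self.gpu_names = list(gpu_names)

    def allocate(self, tasks):
        plan = {name: [] for name in self.gpu_names}
        for i, task in enumerate(tasks):          # simple round-robin policy
            plan[self.gpu_names[i % len(self.gpu_names)]].append(task)
        return plan

class ConfigurationModule:
    """Records per-GPU (start address, length) regions and the last
    video frame index all GPUs have synchronized on."""
    def __init__(self):
        self.regions = {}
        self.synced_frame = -1

    def configure(self, gpu_name, base, length):
        self.regions[gpu_name] = (base, length)

class MasterGpu:
    """Master GPU skeleton: task allocation module plus configuration module."""
    def __init__(self, slave_names):
        self.task_allocation = TaskAllocationModule(["GPU_0_main"] + list(slave_names))
        self.configuration = ConfigurationModule()

master = MasterGpu(["GPU_1", "GPU_2", "GPU_3"])
plan = master.task_allocation.allocate(["t0", "t1", "t2", "t3", "t4"])
```

The transcoding and data transmission modules would sit alongside these, consuming the plan and the configured regions.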
Fig. 5 shows a flowchart of a video transcoding method according to an embodiment of the present disclosure. The video transcoding method is applied to a GPU that includes a master GPU, and the master GPU is connected to slave GPUs through a video transcoding interconnection bus. The specific structures of the master GPU and the slave GPUs and the connection relationship between them may refer to the above description and are not repeated herein. As shown in fig. 5, the video transcoding method includes:
in step S51, based on the master GPU, allocating video transcoding tasks between the master GPU and the slave GPUs;
in step S52, based on the master GPU, configuring the memory address information and performing video-frame-level synchronization for the transcoding processing data corresponding to the video transcoding tasks on the master GPU and the slave GPUs.
According to the embodiments of the present disclosure, video transcoding tasks are allocated, based on the master GPU, between the master GPU and the slave GPUs connected to it through the video transcoding interconnection bus, which fully utilizes the video transcoding capabilities of the different GPUs. The transcoding processing data corresponding to these tasks is transmitted at high speed over the inter-GPU video transcoding interconnection bus, so data transmission between GPUs no longer needs to go through a PCIE bus, which effectively increases the data transmission speed and further improves the video transcoding efficiency.
In one possible implementation, the video transcoding method further includes: executing a video transcoding task and storing the resulting transcoding processing data in the DDR memory of the GPU; reading, based on the configured memory address information and the video-frame-level synchronization, the corresponding transcoding processing data from the DDR memory of each GPU through the video transcoding interconnection bus, and performing transcoding to obtain a transcoding result; and outputting the transcoding result through the PCIE bus or Ethernet.
The process of the GPU executing the transcoding task may refer to the detailed description of the first video transcoding scenario, which is not described herein again.
In one possible implementation, the video transcoding method further includes: executing a video transcoding task and storing the resulting transcoding processing data in the DDR memory of the GPU, where each GPU obtains its corresponding transcoding result after processing its allocated video transcoding task; and outputting the transcoding result corresponding to each GPU through the PCIE bus or Ethernet.
The process of the GPU executing the transcoding task may refer to the detailed description of the second and third video transcoding scenarios, which is not described herein again.
It is understood that the above method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from their principles and logic; for brevity, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the methods of the specific embodiments above, the specific execution order of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides an electronic device, a computer-readable storage medium, and a program, all of which may be used to implement any of the video transcoding systems, methods, and GPUs provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here.
The method has specific technical relevance to the internal structure of a computer system and can solve technical problems of improving hardware operation efficiency or execution effect (including reducing data storage, reducing data transmission, increasing hardware processing speed, and the like), thereby achieving the technical effect of improving the internal performance of the computer system in accordance with the laws of nature.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and specific implementation thereof may refer to the description of the above method embodiments, and for brevity, will not be described again here.
Embodiments of the present disclosure also provide a computer-readable storage medium, on which computer program instructions are stored, and when executed by a processor, the computer program instructions execute the above-mentioned system, or execute the above-mentioned GPU, or implement the above-mentioned method. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored by the memory to run the system described above, or run the GPU described above, or perform the method described above.
Embodiments of the present disclosure also provide a computer program product, which includes computer readable code or a non-volatile computer readable storage medium carrying computer readable code, when the computer readable code runs in a processor of an electronic device, the processor in the electronic device executes the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 shows a block diagram of an electronic device according to an embodiment of the disclosure. Referring to fig. 6, the electronic device 800 may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like.
Referring to fig. 6, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as wireless network (Wi-Fi), second generation mobile communication technology (2G), third generation mobile communication technology (3G), fourth generation mobile communication technology (4G), long term evolution of universal mobile communication technology (LTE), fifth generation mobile communication technology (5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
FIG. 7 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure. Referring to fig. 7, the electronic device 1900 may be provided as a server or a terminal device. Referring to fig. 7, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, that are executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the Apple graphical-user-interface-based operating system (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as a memory 1932, is also provided that includes computer program instructions executable by a processing component 1922 of an electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as a punch card or an in-groove protruding structure with instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized with the state information of the computer-readable program instructions, and this electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK) or the like.
The foregoing description of the various embodiments emphasizes the differences between the embodiments; for the same or similar parts, reference may be made among the embodiments, and for brevity, those parts are not described again herein.
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
If the technical solution of this application involves personal information, a product applying the technical solution clearly informs users of the personal information processing rules and obtains their separate consent before processing the personal information. If the technical solution of this application involves sensitive personal information, a product applying the technical solution obtains the individual's separate consent before processing the sensitive personal information and also meets the requirement of "express consent". For example, at a personal information collection device such as a camera, a clear and prominent sign is set up to inform users that they are entering a personal information collection range and that personal information will be collected; if a person voluntarily enters the collection range, the person is regarded as consenting to the collection of his or her personal information. Alternatively, on a device that processes personal information, personal authorization is obtained, under the condition that the personal information processing rules are announced by means of conspicuous signs or messages, through a pop-up message, a request for the person to upload his or her personal information, or similar means. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information to be processed.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or technical improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (14)

1. A video transcoding system, wherein the system comprises: a plurality of Graphics Processing Units (GPUs), a video transcoding interconnection bus, and a master control GPU, wherein the GPUs are connected with one another through the video transcoding interconnection bus;
the master control GPU is used for distributing video transcoding tasks among the GPUs;
and the GPUs transmit the transcoding processing data corresponding to the video transcoding task based on the video transcoding interconnection bus.
2. The system according to claim 1, wherein the master GPU comprises: a configuration module;
the configuration module is used for performing configuration of memory address information and video-frame-level synchronization for the transcoding processing data corresponding to the video transcoding tasks executed on the GPUs.
3. The system according to claim 2, wherein any GPU reads, based on the memory address information configured by the configuration module, the transcoding processing data to be transcoded from DDR memories of other GPUs by using the video transcoding interconnection bus.
4. The system according to any one of claims 1 to 3, wherein the master control GPU is further configured to, when a running load of a first GPU of the GPUs exceeds a preset threshold, allocate an unprocessed video transcoding task corresponding to the first GPU to a second GPU of the GPUs, wherein a running load of the second GPU does not exceed the preset threshold.
5. The system according to any one of claims 1 to 4, wherein the video transcoding task comprises at least one of: a decoding task, a rendering task, a video processing task, and an encoding task.
6. The system according to any one of claims 1 to 5, wherein the plurality of GPUs are located on one circuit board; or, the multiple GPUs are located on multiple circuit boards.
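The load-based reallocation described in claim 4 can be sketched in ordinary Python. This is a minimal illustrative model, not code from the specification: the `Gpu` class, the per-task load share, and the threshold value of 0.8 are all assumptions introduced here; the claim itself only requires "a preset threshold".

```python
from dataclasses import dataclass, field

# Hypothetical load threshold (fraction of GPU capacity); illustrative only.
LOAD_THRESHOLD = 0.8
TASK_LOAD_SHARE = 0.1  # assumed per-task contribution to running load

@dataclass
class Gpu:
    gpu_id: int
    load: float                                   # current running load, 0.0 to 1.0
    pending: list = field(default_factory=list)   # unprocessed transcoding tasks

def rebalance(gpus):
    """Move unprocessed tasks from GPUs over the threshold (the "first GPU")
    to GPUs under it (the "second GPU"), as in claim 4. Returns tasks moved."""
    moved = 0
    under_threshold = [g for g in gpus if g.load <= LOAD_THRESHOLD]
    for g in gpus:
        while g.load > LOAD_THRESHOLD and g.pending and under_threshold:
            # Prefer the least-loaded eligible GPU as the reallocation target.
            target = min(under_threshold, key=lambda t: t.load)
            target.pending.append(g.pending.pop())
            g.load -= TASK_LOAD_SHARE
            target.load += TASK_LOAD_SHARE
            moved += 1
    return moved
```

With one overloaded GPU holding two unprocessed tasks and one idle GPU, both tasks migrate to the idle GPU and the overloaded GPU drops back under the threshold.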
7. A GPU, wherein the GPU comprises a master control GPU connected to a slave GPU through a video transcoding interconnection bus, the master control GPU comprising:
the task allocation module is used for allocating video transcoding tasks between the master control GPU and the slave GPU;
and the configuration module is used for performing configuration of memory address information and video-frame-level synchronization for the transcoding processing data corresponding to the video transcoding tasks executed on the master control GPU and the slave GPU.
8. The GPU of claim 7, wherein the GPU further comprises:
the transcoding module is used for executing the video transcoding task and storing the obtained transcoding processing data in a DDR memory of the GPU;
the master control GPU is further used for reading, based on the memory address information configured by the configuration module and the video-frame-level synchronization, corresponding transcoding processing data from the DDR memory of each GPU through the video transcoding interconnection bus, and obtaining a transcoding result after performing transcoding processing;
and the data transmission module is used for outputting the transcoding result through a PCIE bus or Ethernet.
9. The GPU of claim 7, wherein the GPU further comprises:
the transcoding module is used for executing the video transcoding task and storing the obtained transcoding processing data in a DDR memory of the GPU;
each GPU is used for obtaining a corresponding transcoding result after the video transcoding task distributed by the task distribution module is processed;
and the data transmission module is used for outputting the transcoding result corresponding to each GPU through a PCIE bus or Ethernet.
10. A video transcoding method, applied to a GPU, wherein the GPU comprises a master control GPU, the master control GPU is connected to a slave GPU through a video transcoding interconnection bus, and the method comprises the following steps:
allocating, by the master control GPU, video transcoding tasks between the master control GPU and the slave GPU;
and performing, by the master control GPU, configuration of memory address information and video-frame-level synchronization for the transcoding processing data corresponding to the video transcoding tasks executed on the master control GPU and the slave GPU.
11. The method of claim 10, further comprising:
executing the video transcoding task, and storing the obtained transcoding processing data in a DDR memory of the GPU;
reading, based on the configured memory address information and the video-frame-level synchronization, corresponding transcoding processing data from the DDR memory of each GPU through the video transcoding interconnection bus, and performing transcoding processing to obtain a transcoding result;
and outputting the transcoding result through a PCIE bus or Ethernet.
12. The method of claim 10, further comprising:
executing the video transcoding task, and storing the obtained transcoding processing data in a DDR memory of the GPU;
each GPU respectively obtains a corresponding transcoding result after the distributed video transcoding task is processed;
and outputting the transcoding result corresponding to each GPU through a PCIE bus or Ethernet.
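The master/slave flow of claims 10 and 11 (allocate tasks, store transcoded data in each GPU's DDR memory, then read it back via the configured memory address information) can be modeled with ordinary Python objects. This is a minimal sketch, not drawn from the specification: a dict stands in for each GPU's DDR memory, direct attribute reads stand in for the interconnection bus, and all class and variable names are illustrative assumptions.

```python
class SlaveGpu:
    """Models a slave GPU with a DDR memory holding transcoding processing data."""
    def __init__(self, gpu_id):
        self.gpu_id = gpu_id
        self.ddr = {}  # task_id -> processed frames ("DDR memory")

    def run_task(self, task_id, frames):
        # Stand-in for the actual decode/process/encode work: tag each frame.
        self.ddr[task_id] = [f"gpu{self.gpu_id}:{f}" for f in frames]

class MasterGpu:
    """Allocates tasks to slave GPUs and records where each result lives."""
    def __init__(self, slaves):
        self.slaves = slaves
        self.addr_table = {}  # task_id -> owning GPU ("memory address information")

    def dispatch(self, tasks):
        # Round-robin allocation of video transcoding tasks among the slaves.
        for i, (task_id, frames) in enumerate(tasks):
            slave = self.slaves[i % len(self.slaves)]
            slave.run_task(task_id, frames)
            self.addr_table[task_id] = slave

    def collect(self, task_id):
        # Read the transcoded data back using the configured address info;
        # in hardware this read would travel over the interconnection bus.
        return self.addr_table[task_id].ddr[task_id]

slaves = [SlaveGpu(0), SlaveGpu(1)]
master = MasterGpu(slaves)
master.dispatch([("t0", ["f0", "f1"]), ("t1", ["f2"])])
result = master.collect("t1")  # data produced by slave GPU 1
```

In the claimed system the final step would then push `result` out over a PCIE bus or Ethernet; here the output stage is omitted.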
13. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to run the system of any one of claims 1 to 6, or run the GPU of any one of claims 7 to 9, or perform the method of any one of claims 10 to 12.
14. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, run the system of any one of claims 1 to 6, or run the GPU of any one of claims 7 to 9, or implement the method of any one of claims 10 to 12.
CN202211215800.6A 2022-09-30 2022-09-30 Video transcoding system, method, GPU, electronic device and storage medium Active CN115589488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211215800.6A CN115589488B (en) 2022-09-30 2022-09-30 Video transcoding system, method, GPU, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN115589488A true CN115589488A (en) 2023-01-10
CN115589488B CN115589488B (en) 2023-09-08

Family

ID=84778508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211215800.6A Active CN115589488B (en) 2022-09-30 2022-09-30 Video transcoding system, method, GPU, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115589488B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1897031A (en) * 2005-07-12 2007-01-17 索尼计算机娱乐公司 Multi-graphics processor system, graphics processor and rendering method
CN104767956A (en) * 2005-11-04 2015-07-08 辉达公司 Video processing with multiple graphical processing units
US20180288435A1 (en) * 2017-04-01 2018-10-04 Intel Corporation Mv/mode prediction, roi-based transmit, metadata capture, and format detection for 360 video
US10109030B1 (en) * 2016-12-27 2018-10-23 EMC IP Holding Company LLC Queue-based GPU virtualization and management system
CN111343503A (en) * 2020-03-31 2020-06-26 北京金山云网络技术有限公司 Video transcoding method and device, electronic equipment and storage medium
CN112425178A (en) * 2018-07-09 2021-02-26 胡露有限责任公司 Two-pass parallel transcoding process for chunks
WO2021052058A1 (en) * 2019-09-20 2021-03-25 北京达佳互联信息技术有限公司 Video processing method and apparatus, electronic device and storage medium
WO2022043724A1 (en) * 2020-08-27 2022-03-03 Intel Corporation Method and apparatus for multi-adapter execution of look ahead video encoding


Also Published As

Publication number Publication date
CN115589488B (en) 2023-09-08

Similar Documents

Publication Publication Date Title
US11445202B2 (en) Adaptive transfer function for video encoding and decoding
US10181203B2 (en) Method for processing image data and apparatus for the same
CN110070496B (en) Method and device for generating image special effect and hardware device
CN110458218B (en) Image classification method and device and classification network training method and device
CN112785672B (en) Image processing method and device, electronic equipment and storage medium
CN109168032B (en) Video data processing method, terminal, server and storage medium
CN115543535B (en) Android container system, android container construction method and device and electronic equipment
CN111815750A (en) Method and device for polishing image, electronic equipment and storage medium
CN115767181A (en) Live video stream rendering method, device, equipment, storage medium and product
CN113822798B (en) Method and device for training generation countermeasure network, electronic equipment and storage medium
CN114581542A (en) Image preview method and device, electronic equipment and storage medium
CN115204117A (en) Collaborative editing method and terminal equipment
CN115550669B (en) Video transcoding method and device, electronic equipment and storage medium
CN115589488B (en) Video transcoding system, method, GPU, electronic device and storage medium
CN114356529A (en) Image processing method and device, electronic equipment and storage medium
CN115576470A (en) Image processing method and apparatus, augmented reality system, and medium
CN111556303B (en) Face image processing method and device, electronic equipment and computer readable medium
CN116069214A (en) Cross-control interface interaction method, electronic device, medium and program product
CN113407318A (en) Operating system switching method and device, computer readable medium and electronic equipment
CN111327816A (en) Image processing method and device, electronic device and computer storage medium
CN113329375B (en) Content processing method, device, system, storage medium and electronic equipment
CN113658283B (en) Image processing method, device, electronic equipment and storage medium
CN115334053B (en) Method for realizing associated screen projection in cloud conference and related products
CN116205806B (en) Image enhancement method and electronic equipment
CN117667423A (en) Method, system, apparatus, device and medium for accessing storage device by codec

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant