CN111314706B

CN111314706B - Video transcoding method and device

Info

Publication number: CN111314706B
Application number: CN201811510054.7A
Authority: CN
Inventors: 李庆文
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2023-08-25
Anticipated expiration: 2038-12-11
Also published as: CN111314706A; WO2020119670A1

Abstract

The embodiment of the application discloses a video transcoding method and a device, wherein the method comprises the following steps: acquiring a source video; determining path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path; and transcoding the source video based on the determined path information. The technical scheme provided by the application can improve the video transcoding efficiency.

Description

Video transcoding method and device

Technical Field

The application relates to the technical field of internet, in particular to a video transcoding method and device.

Background

With the continuous development of internet technology, more and more video playing platforms are emerging. In order to provide video of different quality to users, video playback platforms typically need to transcode the source video to generate multiple videos with different resolutions and different code rates.

Currently, for some multi-level dependent transcoding scenarios, e.g. scenarios for producing high frame rate video, it is often necessary to perform different transcoding tasks by means of multiple transcoding machines, respectively. For example, prior to transcoding the source video, it is necessary to perform high frame rate conversion on the source video to generate an intermediate result, and then transcode the intermediate result, thereby generating multiple videos with different resolutions and different code rates. In such a transcoding scenario relying on an intermediate result, after the completion of the transcoding task for generating the intermediate result, it is generally required to upload the intermediate result to the external storage platform, and then, when performing multiple transcoding tasks based on the intermediate result, it is required to read the intermediate result from the external storage platform multiple times, and both the uploading process and the reading process from the external storage platform are time-consuming, resulting in low video transcoding efficiency.

Accordingly, there is a need to provide a faster video transcoding method.

Disclosure of Invention

The embodiment of the application aims to provide a video transcoding method and device, which can improve the video transcoding efficiency.

To achieve the above object, an embodiment of the present application provides a video transcoding method, including: acquiring a source video; determining path information for transcoding the source video into a target video, wherein the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path; and transcoding the source video based on the determined path information.

In order to achieve the above object, an embodiment of the present application further provides a video transcoding device, including: a video acquisition unit configured to acquire a source video; a path determining unit configured to determine path information for transcoding the source video into a target video, wherein the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path; and a transcoding unit configured to transcode the source video based on the determined path information.

In order to achieve the above object, an embodiment of the present application further provides a video transcoding device, where the device includes a memory and a processor, where the memory is configured to store a computer program, and the computer program is executed by the processor to implement the video transcoding method described above.

From the above, according to the technical scheme provided by the application, after the source video is acquired, the path information of transcoding from the source video to the target video can be determined for the target video to be output. The path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path. In this way, according to the transcoding path, the source video and other intermediate nodes in the transcoding path can be transcoded in turn by a transcoding mode between nodes in the transcoding path, and the target video is output. Therefore, the whole transcoding process can be completed in one transcoding machine, and the uploading process and the process of reading from an external storage platform are reduced, so that the video transcoding time is shortened, and the video transcoding efficiency is improved.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some of the embodiments described in the application, and that other drawings can be obtained from these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic diagram of a video transcoding method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of a directed acyclic transcoding architecture in accordance with an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a video transcoding device according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of another video transcoding device according to an embodiment of the present application.

Detailed Description

In order to make the technical solution of the present application better understood by those skilled in the art, the technical solution of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

The application provides a video transcoding method which can be applied to terminal equipment with an image processing function. The terminal device may be, for example, a desktop computer, a notebook computer, a tablet computer, a workstation, etc. In addition, the method can be applied to a service server of the video playing website, wherein the service server can be an independent server or a server cluster formed by a plurality of servers.

Referring to fig. 1, the video transcoding method provided by the present application includes the following steps.

S11: acquiring a source video.

In this embodiment, by transcoding the source video, multiple videos with different resolutions and different code rates can be generated.

In this embodiment, the source video may be obtained by reading it from a provided storage path or by receiving it from another terminal device.

S13: determining path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path.

In an actual application scenario, transcoding the source video into a target video to be output may require a transcoding process that passes through multiple nodes. For example, in some multi-level dependent transcoding scenarios, such as producing high frame rate video, the source video must first be converted into a high frame rate source video by frame rate conversion (Frame Rate Conversion, FRC), which generates an intermediate result. This intermediate result is then transcoded to generate a video with a specified resolution, and that video is finally transcoded to generate a target video with a specified video format and the specified resolution. Thus, outputting the target video with the specified video format and the specified resolution requires two intermediate nodes, and the whole transcoding process can be divided into four levels: a first level with the source video as the root node, a second level with the high frame rate source video as a child node, a third level with the video of the specified resolution as a three-level node, and a fourth level with the target video of the specified resolution and the specified video format as a leaf node. The output of the second-level node depends on the first-level node, the output of the third-level node depends on the second-level node, and the output of the fourth-level node depends on the third-level node. To implement such a complex multi-level dependent transcoding process in the terminal device, path information for transcoding from the source video to the target video may be determined first. The path information may include a transcoding path and a transcoding mode between each pair of adjacent nodes in the transcoding path. In this way, the transcoding process of the source video can be implemented in one terminal device based on the determined path information, so as to obtain the target video.
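The four-level example above can be pictured as a small data structure. The following is only a minimal sketch of one possible in-memory representation of the path information; the class names `TranscodeMode` and `PathInfo`, the node labels, and the parameter values are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TranscodeMode:
    """Hypothetical transcoding mode for one edge: an operation name plus its parameters."""
    operation: str
    params: Tuple[Tuple[str, object], ...] = ()


@dataclass
class PathInfo:
    """Path information: ordered nodes plus one transcoding mode per adjacent node pair."""
    nodes: List[str]
    modes: List[TranscodeMode]

    def __post_init__(self) -> None:
        # A path of n nodes has exactly n - 1 edges, hence n - 1 modes.
        if len(self.modes) != len(self.nodes) - 1:
            raise ValueError("expected one transcoding mode per adjacent node pair")

    def edges(self) -> List[Tuple[str, str, TranscodeMode]]:
        """Return (input node, output node, mode) triples in transcoding order."""
        return list(zip(self.nodes, self.nodes[1:], self.modes))


# Root node, two intermediate nodes, and the leaf node (illustrative labels).
path_info = PathInfo(
    nodes=["source", "high_frame_rate", "specified_resolution", "target"],
    modes=[
        TranscodeMode("frame_rate_conversion", (("frame_rate", 60),)),
        TranscodeMode("scale", (("height", 1080),)),
        TranscodeMode("convert_format", (("format", "mp4"),)),
    ],
)

for src, dst, mode in path_info.edges():
    print(f"{src} -> {dst}: {mode.operation} {dict(mode.params)}")
```

Keeping the modes in a list parallel to the nodes makes the level-by-level dependency explicit: the mode at index i consumes node i and produces node i + 1.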

In one embodiment, the process of transcoding the source video into the target video may be a multi-level dependent transcoding process that already exists in practical applications. In that case, the dependency relationships between the transcoding tasks can be obtained from the existing separate transcoding tasks, and the inputs and outputs of those tasks can be used as the nodes of a transcoding path. In this way, the transcoding path from the source video to the target video, and the individual nodes in the transcoding path, can be obtained. The transcoding mode between nodes can also be obtained directly from each separate transcoding task. Path information for transcoding from the source video to the target video can thus be determined. In this embodiment, the transcoding mode corresponds to the video transcoding parameters required for transcoding one video into another, and the values of these parameters may be determined from the values of the video parameters and audio parameters of the two videos. The transcoding parameters may include, for example, fidelity, resolution, and transmission code rate. After the transcoding parameters are set, the video can be transcoded to obtain a transcoded video that conforms to them.
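As an illustration of deriving a path from existing separate tasks, the sketch below chains tasks by matching each task's output to the next task's input. It assumes, hypothetically, that every task is described by an (input, output, mode) triple and that each output has a single producing task; the function name `build_path` and the node labels are illustrative only.

```python
from typing import Dict, List, Tuple

Task = Tuple[str, str, str]  # (input node, output node, transcoding mode)


def build_path(tasks: List[Task], source: str, target: str) -> Tuple[List[str], List[str]]:
    """Walk backwards from the target to the source through the task dependencies."""
    # Map each output to the task that produces it (assumes one producer per output).
    producer: Dict[str, Tuple[str, str]] = {out: (inp, mode) for inp, out, mode in tasks}
    nodes, modes = [target], []
    current = target
    while current != source:
        if current not in producer:
            raise ValueError(f"no task produces {current!r}")
        current, mode = producer[current]
        if current in nodes:
            raise ValueError("cyclic dependency between tasks")
        nodes.append(current)
        modes.append(mode)
    nodes.reverse()
    modes.reverse()
    return nodes, modes


# Separate tasks, deliberately listed out of order.
tasks = [
    ("1080p", "target", "convert_format"),
    ("source", "high_frame_rate", "frame_rate_conversion"),
    ("high_frame_rate", "1080p", "scale"),
]
nodes, modes = build_path(tasks, "source", "target")
print(nodes)
print(modes)
```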

In one embodiment, considering that the process of transcoding the source video into the target video may not be found among existing multi-level dependent transcoding processes, a path recognition model for recognizing transcoding path information may be constructed using a machine learning method. For example, path information corresponding to a video group composed of the source video and the target video may be identified by a support vector machine (Support Vector Machine, SVM). The source video and the target video serve as the path start node and the path end node in the path recognition model, respectively. Specifically, when the path recognition model is constructed, a training sample set may be obtained in advance and used to train the path recognition model, so that the model can recognize the path information corresponding to an input video group. The training sample set may include sample video groups whose corresponding transcoding path conforms to the path information and sample video groups whose corresponding transcoding path does not conform to the path information. Each sample video group may include the sample videos corresponding to the path start node and the path end node, respectively. During training, the sample video groups in the training sample set may be input into the path recognition model in sequence. An initial neural network can be constructed in the path recognition model, with initial prediction parameters preset in the neural network. After an input sample video group is processed using the initial prediction parameters, a prediction result for the sample video group is obtained, which indicates whether the transcoding path corresponding to the sample video group conforms to the path information.
Specifically, when the path recognition model processes a sample video group, it may first extract a first feature vector corresponding to the parameter information of the source video and a second feature vector corresponding to the parameter information of the target video. The elements of the first feature vector may be the values of the respective parameters of the source video, for example video parameters or audio parameters, which may include video resolution, video code rate, video frame rate, video format, and so on. Similarly, the elements of the second feature vector may be the values of the respective parameters of the target video. In this way, the path recognition model may read the value of each parameter of the source video corresponding to the path start node in the sample video group and the value of each parameter of the target video corresponding to the path end node, and assemble these values, in reading order, into the first feature vector and the second feature vector. In practical applications, the number of parameters is generally large, so the extracted feature vectors have a high dimension and consume more resources to process. In view of this, in this embodiment a convolutional neural network (Convolutional Neural Network, CNN) may also be used to process the sample video group, so as to obtain feature vectors of smaller dimension for subsequent recognition.
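A minimal sketch of assembling the two feature vectors is given below. The parameter names, their order, and the numeric coding of the video format are assumptions made purely for illustration; the application does not fix any of them, and the dimension-reducing CNN step is not shown.

```python
from typing import Dict, List

# Hypothetical fixed reading order of the parameters.
PARAM_ORDER = ("width", "height", "bitrate_kbps", "frame_rate", "format")
# Hypothetical numeric codes for the categorical "format" parameter.
FORMAT_CODES = {"flv": 0.0, "mp4": 1.0, "mkv": 2.0}


def feature_vector(params: Dict[str, object]) -> List[float]:
    """Read the parameter values in a fixed order and form a numeric vector."""
    vector = []
    for name in PARAM_ORDER:
        value = params[name]
        if name == "format":
            vector.append(FORMAT_CODES[str(value)])
        else:
            vector.append(float(value))
    return vector


source_params = {"width": 3840, "height": 2160, "bitrate_kbps": 20000,
                 "frame_rate": 25, "format": "flv"}
target_params = {"width": 1920, "height": 1080, "bitrate_kbps": 6000,
                 "frame_rate": 60, "format": "mp4"}

first_vector = feature_vector(source_params)    # path start node
second_vector = feature_vector(target_params)   # path end node
vector_group = first_vector + second_vector     # input to the recognition step
print(vector_group)
```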

In this embodiment, after the data of the input sample video group is processed by the neural network, a probability value vector for the sample video group may be obtained. The probability value vector may include two probability values, which respectively represent the probability that the transcoding path conforms to the specified path information and the probability that it does not. For example, after a sample video group whose transcoding path conforms to the specified path information is input, the path recognition model may output a probability value vector of (0.4, 0.8), where 0.4 represents the probability that the transcoding path conforms to the specified path information and 0.8 represents the probability that it does not. Because the initial prediction parameters in the path recognition model may not be set accurately enough, the predicted probabilities may be inconsistent with the actual situation. In this example, the input sample video group does conform to the specified path information, yet the predicted probability of conforming is only 0.4 while the predicted probability of not conforming is 0.8, so the prediction result is incorrect. The initial prediction parameters in the path recognition model may then be adjusted according to the difference between the prediction result and the correct result. Specifically, each sample video group may have a theoretical probability value result. For example, the theoretical probability value result for a transcoding path that conforms to the specified path information may be (1, 0), where 1 is the probability that the transcoding path conforms to the specified path information. The difference between the predicted probability value result and the theoretical probability value result can be computed by subtraction, and this difference can then be used to adjust the initial prediction parameters of the neural network, so that when the sample video group is processed again with the adjusted prediction parameters, the prediction result matches the correct result. After training on a large number of training samples, the path recognition model can distinguish whether the transcoding path corresponding to a video group conforms to a specified transcoding path, and can therefore identify the path information of the actual transcoding path corresponding to the video group.
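The adjust-by-difference idea can be sketched with a deliberately small stand-in model. The code below is not the neural network of the application: it assumes a single linear layer followed by a softmax (so, unlike the (0.4, 0.8) example, its two outputs always sum to 1), and the feature values, learning rate, and function names are hypothetical. For this particular model, the difference between the predicted and theoretical probabilities is exactly the gradient of the cross-entropy loss with respect to the layer outputs, which is why it can drive the parameter update directly.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: List[List[float]], bias: List[float], features: List[float]) -> List[float]:
    """Return (P(conforms), P(does not conform)) for one feature vector."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


def train_step(weights, bias, features, theoretical, learning_rate=0.1) -> List[float]:
    """Adjust the prediction parameters using predicted minus theoretical probabilities."""
    predicted = predict(weights, bias, features)
    difference = [p - t for p, t in zip(predicted, theoretical)]
    for k, row in enumerate(weights):
        for j, x in enumerate(features):
            row[j] -= learning_rate * difference[k] * x
        bias[k] -= learning_rate * difference[k]
    return difference


features = [1.0, 0.5, -0.5]      # illustrative (already normalised) feature values
theoretical = [1.0, 0.0]         # the sample conforms to the specified path information
weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # initial prediction parameters
bias = [0.0, 0.0]

before = predict(weights, bias, features)[0]
for _ in range(200):
    train_step(weights, bias, features, theoretical)
after = predict(weights, bias, features)[0]
print(before, after)
```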

S15: transcoding the source video based on the determined path information.

In this embodiment, after the path information for transcoding from the source video to the target video to be output is determined, the transcoding process for the source video may be implemented in one terminal device based on the determined path information, so as to obtain the target video. Specifically, following the transcoding path included in the path information, the source video may be transcoded using the transcoding mode between each pair of nodes in the transcoding path, so that the target video is obtained. For example, suppose the transcoding path includes four nodes which are, in transcoding order, a root node, a child node, a three-level node, and a four-level node. The root node is the source video, and the four-level node is the target video. The root node is first transcoded using the transcoding mode between the root node and the child node to obtain the video corresponding to the child node. The video corresponding to the child node is then transcoded using the transcoding mode between the child node and the three-level node to obtain the video corresponding to the three-level node. Finally, the video corresponding to the three-level node is transcoded using the transcoding mode between the three-level node and the four-level node to obtain the target video. The whole multi-level dependent transcoding process is therefore completed in one terminal device, which reduces the uploading of videos corresponding to the intermediate nodes and the reading of those videos from an external storage platform, so that the video transcoding time is reduced and the video transcoding efficiency is improved.
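The sequential walk along a four-node path can be sketched as a simple loop in which each step consumes the previous node's output directly. In this sketch a video is modelled as a dictionary of parameters and each transcoding mode as a plain function; these stand-ins, their names, and the parameter values are assumptions for illustration, not a real transcoder.

```python
from typing import Callable, Dict, List, Tuple

Video = Dict[str, object]
Step = Callable[[Video], Video]


# Stand-in transcoding modes; a real system would invoke an actual transcoder here.
def convert_frame_rate(video: Video) -> Video:
    return {**video, "frame_rate": 60}


def scale(video: Video) -> Video:
    return {**video, "height": 1080}


def convert_format(video: Video) -> Video:
    return {**video, "format": "mp4"}


def transcode_along_path(source: Video, steps: List[Step]) -> Tuple[Video, List[Video]]:
    """Apply each edge's transcoding mode in order, keeping intermediates locally."""
    current = source
    outputs = []
    for step in steps:
        current = step(current)      # output of one node is the input of the next
        outputs.append(current)
    return current, outputs[:-1]     # (target video, intermediate node videos)


source = {"frame_rate": 25, "height": 2160, "format": "flv"}
target, intermediates = transcode_along_path(
    source, [convert_frame_rate, scale, convert_format]
)
print(target)
print(len(intermediates))
```

Because the intermediate results never leave the process, no upload to or download from external storage is involved between steps.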

In this embodiment, in some complex transcoding scenarios, the video to be output often includes at least two target videos, so there may be multiple transcoding paths, one for each target video. When the transcoding paths overlap, to avoid repeating the transcoding process along the overlapping path, it may first be determined whether the transcoding paths for the target videos contain an overlapping path. If they do, the source video may first be transcoded along the overlapping path to obtain an intermediate node, and the intermediate node may then be transcoded along the non-overlapping path of each transcoding path. Specifically, for two target videos, after first path information for transcoding the source video into the first target video and second path information for transcoding the source video into the second target video are determined, if an overlapping path exists between the first path information and the second path information, the source video is transcoded along the overlapping path to obtain an intermediate node, and the intermediate node is then transcoded along the non-overlapping paths in the first path information and the second path information. To implement this procedure, a directed acyclic graph (Directed Acyclic Graph, DAG) transcoding structure containing the nodes can be constructed from the upper-lower dependency relationships between the nodes in the path information for the target videos.
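One simple way to detect an overlapping path is to take the longest common prefix of the two node sequences. The sketch below assumes, for illustration, that a node label identifies both the video and the transcoding mode that produced it, so that equal labels mean equal work; the function name `split_overlap` and the labels are hypothetical.

```python
from typing import List, Tuple


def split_overlap(first: List[str], second: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (overlapping prefix, rest of first path, rest of second path)."""
    overlap = []
    for a, b in zip(first, second):
        if a != b:
            break
        overlap.append(a)
    n = len(overlap)
    return overlap, first[n:], second[n:]


first_path = ["source", "high_frame_rate", "1080p_mp4"]
second_path = ["source", "high_frame_rate", "720p_mp4"]

overlap, first_rest, second_rest = split_overlap(first_path, second_path)
# Transcode once along `overlap`; its last node is the shared intermediate node.
print(overlap, first_rest, second_rest)
```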

For example, in an application scenario in which a source video is transcoded into target videos with Dolby audio, multiple target videos with different Dolby sound effects need to be output, and the transcoding paths in the path information corresponding to these target videos partially overlap. A corresponding DAG transcoding structure can then be constructed from the upper-lower dependency relationships between the nodes in the path information so as to merge the overlapping paths, and the complex transcoding process of outputting multiple target videos can be completed in one terminal device directly according to the DAG transcoding structure. As shown in fig. 2, among the nodes for the target videos to be finally output, the paths corresponding to Node11 with Dolby sound effect 11, Node12 with Dolby sound effect 12, and Node13 with Dolby sound effect 13 each include the path from the root node to Node1 with Dolby sound effect 1. These overlapping paths may be merged, so that the terminal device transcodes the source video along the overlapping path to obtain the intermediate node Node1, and then transcodes Node1 along the non-overlapping paths to obtain Node11, Node12, and Node13. Similarly, for the other final output nodes, Node21 with Dolby sound effect 21 and Node22 with Dolby sound effect 22, the overlapping paths may be merged in the same manner, so as to construct the transcoding paths of the DAG transcoding structure. In fig. 2, the root node is the first level, Node1 and Node2 form the second level, and Node11, Node12, Node13, Node21, and Node22 form the third level, with a dependency relationship between the levels: the first level is used as input to output the second level, and the second level is then used as input to output the third level. In this embodiment, after the DAG transcoding structure is constructed, it may be further extended horizontally and vertically as transcoding service requirements are added. As shown in the dashed box in fig. 2, to meet added transcoding service requirements for outputting videos with Dolby sound effect 211, Dolby sound effect 212, Dolby sound effect 213, Dolby sound effect 221, and Dolby sound effect 311, according to the determined path information of the videos to be output, Node3 may be added in the horizontal direction, Node211, Node212, Node213, and Node221 may be added in the vertical direction, and Node31 and Node311 may also be added. In this way, the transcoding paths of the DAG transcoding structure can be constructed, and complex transcoding service requirements can be met in one terminal device. During transcoding, the videos corresponding to the intermediate nodes are stored locally on the terminal device, so they need not be uploaded to an external storage device, nor read from the external storage device multiple times, before the subsequent transcoding process is performed.
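The merged structure of fig. 2 can be sketched as a parent-to-children map built from the individual paths and then traversed level by level, so that each node is produced exactly once. The node names follow fig. 2, but the helper names `build_dag` and `run_dag`, the string-based stand-in for transcoding, and the parents assumed for the added nodes (inferred from the node numbering) are assumptions for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


def build_dag(paths: List[List[str]]) -> Dict[str, List[str]]:
    """Merge per-target paths into one parent -> children map (shared edges stored once)."""
    children: Dict[str, List[str]] = {}
    for path in paths:
        for parent, child in zip(path, path[1:]):
            kids = children.setdefault(parent, [])
            if child not in kids:
                kids.append(child)
    return children


def run_dag(children: Dict[str, List[str]], root: str, source: str,
            transcode: Callable[[str, str], str]) -> Tuple[Dict[str, str], List[str]]:
    """Traverse level by level; each node is transcoded once from its parent's local output."""
    outputs = {root: source}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in outputs:     # already produced via another parent
                continue
            outputs[child] = transcode(outputs[node], child)
            order.append(child)
            queue.append(child)
    return outputs, order


def fake_transcode(parent_video: str, node: str) -> str:
    """Stand-in for a real transcoding step: records the chain of nodes applied."""
    return f"{parent_video}>{node}"


paths = [
    ["root", "Node1", "Node11"], ["root", "Node1", "Node12"], ["root", "Node1", "Node13"],
    ["root", "Node2", "Node21"], ["root", "Node2", "Node22"],
]
outputs, order = run_dag(build_dag(paths), "root", "src", fake_transcode)
print(order)

# Extending the structure: add paths and rebuild (assumed parents for the new nodes).
extended = paths + [["root", "Node3", "Node31", "Node311"],
                    ["root", "Node2", "Node21", "Node211"]]
outputs_ext, order_ext = run_dag(build_dag(extended), "root", "src", fake_transcode)
```

In this sketch the five original targets need seven transcoding steps in total, whereas running the five paths independently would need ten, because the steps producing Node1 and Node2 would be repeated.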

In this embodiment, the functions implemented in the above-described method steps may be implemented by a computer program, which may be stored in a computer storage medium. In particular, the computer storage medium may be coupled to a processor, which may thereby read a computer program in the computer storage medium. The computer program, when executed by a processor, may perform the following functions:

S11: acquiring a source video;

S13: determining path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path;

S15: transcoding the source video based on the determined path information.

In one embodiment, the computer program when executed by the processor further performs the steps of:

transcoding the source video along the transcoding path, using the transcoding mode between nodes in the transcoding path, to obtain the target video.

In one embodiment, when the computer program is executed by the processor, the following steps are further implemented when at least two target videos are included in the video to be output:

when an overlapped path exists between first path information corresponding to a first target video and second path information corresponding to a second target video, transcoding the source video according to the overlapped path to obtain an intermediate node, and transcoding the intermediate node according to a non-overlapped path in the first path information and the second path information.

inputting a video group formed by the source video and the target video into a path identification model, and determining path information transcoded from the source video into the target video; and the source video and the target video are respectively used as a path starting node and a path ending node in the path identification model.

inputting a video group formed by the source video and the target video into a path recognition model, respectively extracting a first feature vector corresponding to the parameter information of the source video and a second feature vector corresponding to the parameter information of the target video through the path recognition model, and determining a predicted value corresponding to a vector group formed by the first feature vector and the second feature vector through the path recognition model;

and taking the path information characterized by the predicted value as the path information of transcoding the source video into the target video.

acquiring a training sample set, wherein the training sample set comprises a sample video group of which the corresponding transcoding path accords with the path information and a sample video group of which the corresponding transcoding path does not accord with the path information; the sample video group comprises sample videos respectively corresponding to the path starting node and the path ending node;

inputting a sample video group in the training sample set into a path identification model, wherein the path identification model comprises initial prediction parameters;

processing the input sample video group through the initial prediction parameters to obtain a prediction result of the sample video group, wherein the prediction result is used for representing whether a transcoding path corresponding to the sample video group accords with the path information;

and if the predicted result is incorrect, adjusting the initial predicted parameter in the path recognition model according to the difference value between the predicted result and the correct result so that the obtained predicted result accords with the correct result after the sample video group is processed again through the adjusted predicted parameter.

Referring to fig. 3, the present application further provides a video transcoding device, which includes:

a video acquisition unit 100 for acquiring a source video;

a path determining unit 200 for determining path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path;

and a transcoding unit 300, configured to transcode the source video based on the determined path information.

In one embodiment, the transcoding unit is further configured to transcode, according to the transcoding path, the source video in a transcoding manner between nodes in the transcoding path, so as to obtain the target video.

In one embodiment, when at least two kinds of target videos are included in the video to be output,

the transcoding unit is further configured to transcode the source video according to the overlapping path when an overlapping path exists between first path information corresponding to the first target video and second path information corresponding to the second target video, so as to obtain an intermediate node, and transcode the intermediate node according to non-overlapping paths in the first path information and the second path information.

In one embodiment, the path determining unit is further configured to input a video group formed by the source video and the target video into a path recognition model, and determine path information for transcoding from the source video to the target video; and the source video and the target video are respectively used as a path starting node and a path ending node in the path identification model.

For the specific functions of the unit modules of the video transcoding device provided in this embodiment, reference may be made to the foregoing method embodiments; the device achieves the technical effects of the method embodiments, which are not repeated here.

Referring to fig. 4, the present application further provides a video transcoding device, the device comprising a memory and a processor, the memory being configured to store a computer program, the computer program, when executed by the processor, implementing the steps of:

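S11: acquiring a source video;

S13: determining path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode between nodes in the transcoding path;

S15: transcoding the source video based on the determined path information.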

In this embodiment, the memory may include a physical device for storing information, which typically digitizes the information and then stores it in a medium using electrical, magnetic, or optical methods. The memory according to this embodiment may include: devices that store information using electrical energy, such as RAM, ROM, and USB flash disks; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, and bubble memories; and devices that store information optically, such as CDs or DVDs. Of course, there are other kinds of memory, such as quantum memory and graphene memory.

In this embodiment, the processor may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, an embedded microcontroller, and the like.

In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD), such as a field programmable gate array (Field Programmable Gate Array, FPGA), is an integrated circuit whose logic function is determined by the user programming the device. A designer programs a digital system to "integrate" it onto a PLD, without requiring a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can readily be obtained by lightly programming the method flow in one of the hardware description languages described above and programming it into an integrated circuit.

Those skilled in the art also know that, besides implementing the video image transcoding device purely as computer-readable program code, the method steps can be logically programmed so that the device performs the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a video image transcoding device can therefore be regarded as a hardware component, and the means included in it for implementing various functions can be regarded as structures within the hardware component. The means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present application.

The embodiments in this specification are described in a progressive manner. Identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, for the embodiments of the video image transcoding device, reference may be made to the description of the method embodiments above.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
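By way of non-limiting illustration only, the following Python sketch shows one way such program modules might be organized for the method summarized in the abstract: a data structure holding the transcoding path and the transcoding mode between nodes, and a routine that runs the dependent tasks in path order. The names `Step`, `transcode`, `to_high_frame_rate`, and `to_720p` are hypothetical, the "videos" are plain dictionaries rather than real media, and holding each intermediate result in memory is an assumption of this sketch rather than a statement of the claimed implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a video: a dictionary of its properties.
Video = Dict[str, object]


@dataclass(frozen=True)
class Step:
    """One hop of a transcoding path: produce node `target` using `mode`."""
    target: str                     # name of the node this step produces
    mode: Callable[[Video], Video]  # transcoding mode applied between nodes


def to_high_frame_rate(video: Video) -> Video:
    # Placeholder for a high-frame-rate conversion task (no real codec work).
    return {**video, "fps": 60}


def to_720p(video: Video) -> Video:
    # Placeholder for a resolution-changing transcoding task.
    return {**video, "resolution": "720p"}


def transcode(source: Video, path: List[Step]) -> Dict[str, Video]:
    """Run dependent tasks in path order, keeping intermediate results in memory."""
    results: Dict[str, Video] = {"source": source}
    current = source
    for step in path:
        # Each task consumes the output of the task it depends on.
        current = step.mode(current)
        results[step.target] = current
    return results


# Path information: source -> high-frame-rate intermediate -> 720p target.
path_info = [Step("hfr", to_high_frame_rate), Step("720p", to_720p)]
outputs = transcode({"fps": 30, "resolution": "1080p"}, path_info)
print(outputs["720p"])
```

In this sketch the intermediate node `hfr` is passed directly to the next task, so the second task does not need to read it back from anywhere else; a real system would replace the placeholder functions with actual transcoding operations.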

While the present application has been described by way of embodiments, those of ordinary skill in the art will recognize that there are many variations and modifications of the present application without departing from the spirit of the application, and it is intended that the appended claims encompass such variations and modifications as do not depart from the spirit of the application.

Claims

1. A method of video transcoding, the method comprising:

acquiring a source video;

determining path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode among nodes in the transcoding path, wherein the transcoding path is used for representing a plurality of transcoding tasks passing from a source video to a target video, and the plurality of transcoding tasks have a dependency relationship;

and transcoding the source video based on the determined path information so as to realize a multi-level transcoding process from the source video to the target video through the dependency relationship.

2. The method of claim 1, wherein transcoding the source video based on the determined path information comprises:

3. The method of claim 1, wherein, when at least two target videos are included in the video to be output, transcoding the source video based on the determined path information comprises:

4. The method of claim 1, wherein the path information is determined as follows:

5. The method of claim 4, wherein the path information is determined as follows:

6. The method of claim 4, wherein the path recognition model is determined as follows:

7. A video transcoding apparatus, said apparatus comprising:

a video acquisition unit configured to acquire a source video;

a path determining unit configured to determine path information transcoded from the source video into a target video; the path information comprises a transcoding path and a transcoding mode among nodes in the transcoding path, wherein the transcoding path is used for representing a plurality of transcoding tasks passing from a source video to a target video, and the plurality of transcoding tasks have a dependency relationship;

and a transcoding unit configured to transcode the source video based on the acquired path information so as to realize a multi-level transcoding process from the source video to the target video through the dependency relationship.

8. The apparatus of claim 7, wherein the transcoding unit is further configured to transcode the source video according to the transcoding path in a transcoding manner between nodes in the transcoding path to obtain the target video.

9. The apparatus of claim 7, wherein when at least two kinds of target videos are included in the video to be outputted,

10. The apparatus according to claim 7, wherein the path determining unit is further configured to input a video group composed of the source video and the target video into a path recognition model, and determine path information for transcoding from the source video to the target video; and the source video and the target video are respectively used as a path starting node and a path ending node in the path identification model.

11. A video transcoding device, characterized in that the device comprises a memory and a processor, the memory being adapted to store a computer program which, when executed by the processor, implements the method according to any of claims 1 to 6.