WO2015002582A1

WO2015002582A1 - Method and arrangement for video transcoding

Info

Publication number: WO2015002582A1
Application number: PCT/SE2013/050849
Authority: WO
Inventors: Jacob STRÖM
Original assignee: Telefonaktiebolaget L M Ericsson (Publ)
Priority date: 2013-07-02
Filing date: 2013-07-02
Publication date: 2015-01-08

Abstract

A video transcoder arrangement (1) includes at least one decoder arrangement (20), each configured for decoding a received video signal of a respective first video format into video data of a common intermediate format, and at least two encoder arrangements (30), each configured for selectively encoding the video data of the common intermediate format into a video signal of a respective second video format.

Description

METHOD AND ARRANGEMENT FOR VIDEO TRANSCODING

TECHNICAL FIELD

Embodiments herein generally relate to the field of video coding, in particular to methods and arrangements for video transcoding.

BACKGROUND

One of the fundamental challenges in deploying multimedia systems is to deliver a smooth and uninterrupted flow of audio-visual information. A multimedia system might consist of various devices, such as PCs, laptops, PDAs and smart phones, interconnected via heterogeneous wireline and wireless networks. In such systems, multimedia content originally authored and compressed with a certain format might need bit rate adjustment and format conversion in order to allow access by receiving devices with diverse capabilities. Thus, a transcoding mechanism is required to make the multimedia content adaptive to the capabilities of diverse networks and client devices. For example, if the bandwidth required for a particular video is fluctuating due to congestion or other causes, a transcoder can provide fine and dynamic adjustments in the bit rate of the video bitstream in the compressed domain without imposing additional functional requirements in the decoder. In addition, a video transcoder can change the coding parameters of the compressed video, adjust spatial and temporal resolution, and modify the video content and/or the coding standard used.

In many situations, it is necessary to do transcoding between different codecs. For instance, in a video conference scenario, not all users might be able to decode a particular video format. It might then be necessary for a node or device in the network to transcode from the current video format to a video format that the end user can accept. The simplest way to do video transcoding is to first decode the video to pure pixels, and then encode the pixels again to the desired format. This is shown in Figure 1. However, one problem with this is that it is not very efficient. Typically the encoding process is much more time consuming than the decoding process. It is possible to do the encoding quickly, but then the quality or bit rate suffers.

A known and more efficient way is to directly transcode from e.g. H.264 to VP8, as is shown in Figure 2. In this case, the transcoder can remember selections made in the H.264 format and use similar settings in the VP8 format. This increases the speed in relation to compression efficiency, e.g. quality per bit.

For instance, most video coders use some form of motion vectors. Finding the best motion vector to use is a slow process. Forcing this search to happen quicker typically lowers quality for a certain bit rate. However, in the above described "direct transcoding" case, the motion vector used by the H.264 format can be used as a starting point for the search. This will mean that the VP8 format will converge quicker to a good solution. Another way to look at this is that when going from pixels in Figure 1, a lot of useful information that has taken a lot of computational effort to obtain is typically discarded. In the "direct transcode" case, the information is preserved.

A real world example of a direct transcoder is presented by Thomas Rusert and Sina Tamanna [1], and this is a good example of existing technology in this area.

However, creating a transcoder that operates directly in this fashion is unique to the particular codecs between which transcoding takes place. When transcoding to HEVC instead of to VP8 the direct transcoder would be different.

With the advent of Google's codecs VP8 and the soon-to-come VP9, there are suddenly many more possible codecs around. Figure 3 depicts the five likely to be interesting in the next couple of years. As can be seen, in order to implement direct transcoding from every codec to every other codec, 25 transcoder implementations are needed, and this is only for the five most common codecs.

In the example of Figure 3, the case of transcoding from one format to itself is included. This is a rather common case, for instance, when an end user cannot receive video at the current resolution and bit rate, and a network node has to down-sample the video.

As stated above, 25 transcoder implementations are needed. This is a huge number of transcoders to keep track of. In addition, every time there is a new codec the number of needed transcoders goes up further due to the combinatorics. At the same time, transcoding to pixels does not provide sufficient speed in combination with compression efficiency.

Consequently, there is a need for improving the efficiency of video transcoding, as the number of possible codecs is continuously increasing.

SUMMARY

It is an object to provide an improved transcoder.

This and other objects are met by embodiments of the proposed technology. According to a first aspect, there is provided a video transcoder arrangement which includes at least one decoder arrangement, each configured for decoding a received video signal of a respective first video format into video data of a common intermediate format, and at least two encoder arrangements, each configured for selectively encoding the video data of the common intermediate format into a video signal of a respective second video format.

According to a second aspect, there is provided a video transcoding method, which includes the steps of providing a video signal of at least a first predetermined video format, and decoding, in a decoder arrangement, the provided video signal of the at least a first predetermined video format into video data of a common intermediate format. Further, the method includes the step of selectively encoding, in an encoder arrangement, the video data of the common intermediate format into a video signal of one of at least two second predetermined formats.

An advantage of the proposed technology is that it enables quickly creating, maintaining and using a system of transcoders, where transcoding can happen from any codec to any other codec.

Other advantages will be appreciated when reading the detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

The embodiments, together with further objects and advantages thereof, might best be understood by referring to the following description taken together with the accompanying drawings, in which:

Fig. 1 illustrates a known transcoder;

Fig. 2 illustrates a known transcoder;

Fig. 3 illustrates necessary transcoder combinations;

Fig. 4 illustrates an embodiment of the current technology;

Fig. 5 illustrates an embodiment of a transcoding method of the current technology;

Fig. 6 illustrates an embodiment of a transcoder arrangement according to the current technology;

Fig. 7 illustrates a computer implementation of an embodiment of the current technology.

DETAILED DESCRIPTION

Throughout the drawings, the same reference numbers are used for similar or corresponding elements.

For a better understanding of the proposed technology, it might be useful to begin with a brief overview of what motion estimation and motion vectors entail in relation to the pixels of decoded video signals.

Motion estimation is the process of determining motion vectors that describe the transformation from one two-dimensional image to another, such as from adjacent video frames in a video sequence. The motion vectors can relate to the entire image or specific parts thereof, such as rectangular blocks, arbitrarily shaped patches or even individual image pixels. Applying the motion vectors to an image to synthesize the transformation to the next image is called motion compensation. By determining motion vectors for the pixels or blocks of pixels in an image, it is possible to predict a subsequent image.
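As an illustration of motion compensation, the following minimal Python sketch predicts a frame by copying, for every block, the pixels of a reference frame displaced by that block's motion vector. The function name, the grayscale list-of-rows frame layout, the convention that a vector (dx, dy) points from the predicted pixel to its source in the reference frame, and the clamping at the frame border are all assumptions made for this example; actual codecs differ in these details.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame stored as rows of pixel values


def motion_compensate(reference: Frame, block_size: int,
                      vectors: List[List[Tuple[int, int]]]) -> Frame:
    """Predict a frame from `reference` using one (dx, dy) vector per block."""
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = vectors[y // block_size][x // block_size]
            # Clamp the source position so vectors pointing outside the frame stay valid.
            src_x = min(max(x + dx, 0), width - 1)
            src_y = min(max(y + dy, 0), height - 1)
            predicted[y][x] = reference[src_y][src_x]
    return predicted


reference = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
# One vector per 2x2 block: top-left block reads one pixel to the right,
# bottom-right block reads one pixel above, the others are unchanged.
vectors = [[(1, 0), (0, 0)],
           [(0, 0), (0, -1)]]
predicted = motion_compensate(reference, 2, vectors)
```

An encoder would then only need to code the difference between the predicted frame and the actual frame, which is why finding good vectors matters.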

The above described motion estimation can also be referred to as inter-prediction. Consequently, a prediction model is created from one or more previously encoded video frames. The model is formed by shifting samples in a reference frame, which is a case of motion compensated prediction. An image frame which includes a multitude of individual image pixels is typically divided into one or more sub-frames or macro-blocks, each of which can be further divided into sub-blocks. A motion vector can be determined for one or more pixels, for one or more sub-blocks, or for one or more macro-blocks. Finding an appropriate motion vector during inter-prediction gives rise to a certain amount of computational load. With known transcoding arrangements, the result of this computational load is sometimes discarded during a transcoding operation, something that the current disclosure addresses and avoids.

A basic idea of the present disclosure is to use or define a common intermediate media format or an interface other than pure pixels such that every decoder can be coupled with every encoder. Thereby, only one respective decoder and encoder per video format needs to be implemented, thus reducing the complexity of a transcoder arrangement and also simplifying the addition of new video formats. Every new video format only requires one encoder and one decoder functionality to be added to the transcoder. This is shown in Figure 4, where a number of video formats are illustrated, and the encoding/decoding process is indicated by the arrows.

With reference to Figure 5, an embodiment of a video transcoding method will be described. The method can be implemented in a network node and/or user device in a wireless or wired communication network.

Initially, a video signal of at least a first predetermined video format is provided in step S10, as input to a transcoder arrangement or functionality. The provided video signal is subsequently decoded in step S20, in a decoder arrangement, from the first predetermined video format into video data of a common intermediate format. Finally, the video data of the common intermediate format is selectively encoded in step S30, in an encoder arrangement, into a video signal of one of at least two second predetermined formats.
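The three steps can be sketched as a small Python pipeline in which decoders and encoders are looked up by format name and only ever meet through the intermediate data. Everything here is hypothetical: the format names "A", "B" and "C", the dictionary-based intermediate representation, and the stand-in decoder and encoder bodies, which do no real bitstream parsing.

```python
from typing import Any, Callable, Dict

Intermediate = Dict[str, Any]  # assumed shape: pixel data plus optional extra information


def decode_format_a(signal: bytes) -> Intermediate:
    """Stand-in decoder for a hypothetical format A (a real one would parse a bitstream)."""
    return {"pixels": list(signal), "extra": {}}


def encode_format_b(data: Intermediate) -> bytes:
    """Stand-in encoder for a hypothetical format B."""
    return b"B:" + bytes(data["pixels"])


def encode_format_c(data: Intermediate) -> bytes:
    """Stand-in encoder for a hypothetical format C."""
    return b"C:" + bytes(data["pixels"])


DECODERS: Dict[str, Callable[[bytes], Intermediate]] = {"A": decode_format_a}
ENCODERS: Dict[str, Callable[[Intermediate], bytes]] = {
    "B": encode_format_b,
    "C": encode_format_c,
}


def transcode(signal: bytes, source_format: str, target_format: str) -> bytes:
    """S10: `signal` is provided; S20: decode to intermediate; S30: encode selectively."""
    intermediate = DECODERS[source_format](signal)  # S20
    return ENCODERS[target_format](intermediate)    # S30


to_b = transcode(b"\x01\x02", "A", "B")
to_c = transcode(b"\x01\x02", "A", "C")
```

Because the encoder is selected only at the last step, the same decoded data can feed either target format.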

By using the common intermediate format, it is possible to decode a video signal into video data with a common intermediate format that is useable for multiple encoders. According to a further embodiment, the decoding step S20 includes decoding the video signal of the first predetermined video format into image pixel values and additional information relating to the video signal of the first predetermined video format. The additional information is then selectively used to encode the video data to the second predetermined video format.

The additional information can comprise a multitude of different types of information relating to the provided video signal, where the information can be used or ignored in a subsequent encoding operation. Thereby, it is possible to utilize already existing video information data that would otherwise be lost, thus further improving the efficiency of the transcoding operation. According to a particular embodiment, the additional information includes at least motion vector information for one or more pixels of a decoded image. This motion vector information could include one or more motion vectors related to some or each of the one or more pixels of the decoded image. As an example, one motion vector points towards a previous frame and another motion vector points towards a future frame.

According to an embodiment of the present disclosure, the common intermediate format or interface can be a set of pixel images, but where every pixel not only has a color but also additional information such as a motion vector associated with it. Assume that the aim is to transcode from one video format such as H.264 to another video format such as VP8. Instead of doing direct transcoding or decoding to pixels only, as mentioned in the background section, according to embodiments of the video transcoding method of the current disclosure, the video signal is decoded into the above-mentioned interchangeable format where every pixel has a color and potentially a motion vector. For pixels belonging to so-called intra-blocks (which have no motion compensation) it is possible to introduce an indication or a dummy value, such as not-a-number (NaN). The encoder arrangement takes this interchangeable format as input when starting to encode to VP8. When the encoder compresses a block, it can examine the motion vector for e.g. the center pixel in the block. The VP8 encoder arrangement can then try this motion vector position, which is likely to be good, instead of trying all possible motion vectors. It is thus likely that this will work much better than decoding followed by encoding as in Figure 1. If the motion vector associated with the center pixel does not yield a satisfactory result, the motion vector associated with one or more of the other pixels in the block can be tried, for instance the one associated with the top left pixel. If this also fails, the encoder can of course disregard the extra information and search for a motion vector from scratch. It should therefore never be worse than decoding followed by encoding as in Figure 1.
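The fallback order described above (center pixel, then top left pixel, then a search from scratch) can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-pixel hint grid, the `cost` callback, the `good_enough` threshold used to decide whether a result is "satisfactory", and the `full_search` callback are hypothetical names that do not come from any particular encoder.

```python
import math
from typing import Callable, List, Tuple

Vector = Tuple[float, float]
NO_MOTION: Vector = (math.nan, math.nan)  # dummy value for pixels of intra-blocks


def choose_motion_vector(
    hints: List[List[Vector]],        # per-pixel vectors from the intermediate format
    block_x: int,
    block_y: int,
    block_size: int,
    cost: Callable[[Vector], float],  # assumed: lower cost means better prediction
    good_enough: float,               # assumed threshold for a satisfactory result
    full_search: Callable[[], Vector],
) -> Vector:
    """Try the hinted vectors before falling back to a search from scratch."""
    center = (block_y + block_size // 2, block_x + block_size // 2)
    top_left = (block_y, block_x)
    for row, col in (center, top_left):
        candidate = hints[row][col]
        if math.isnan(candidate[0]) or math.isnan(candidate[1]):
            continue  # intra pixel: no hint available here
        if cost(candidate) <= good_enough:
            return candidate
    # Neither hint was satisfactory: disregard the extra information.
    return full_search()


# Toy example: the "true" motion is (1, 0) and the cost is the distance to it.
hints = [[NO_MOTION] * 4 for _ in range(4)]
hints[0][0] = (1.0, 0.0)              # only the top left pixel carries a hint
toy_cost = lambda v: abs(v[0] - 1.0) + abs(v[1])
chosen = choose_motion_vector(hints, 0, 0, 4, toy_cost, 0.0, lambda: (9.0, 9.0))
```

In this sketch a hinted vector is accepted as-is when it is good enough; an encoder could equally use it as the starting point of a small local refinement.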

In addition, this also solves the above-described combinatorial explosion. Since the same common interchangeable format can be used for all codecs, it is only necessary to implement five decoders and five encoders as shown in Figure 4. In other words, one decoder for decoding from each particular format into the interchangeable format, and one encoder for encoding from the interchangeable format into each particular format. When introducing a new codec, only one new decoder and one new encoder need to be implemented. This enables a more efficient manner in which to add new video formats to the transcoder capabilities.
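The difference in growth can be checked with a few lines of Python. The function names are illustrative; the first counts one direct transcoder per ordered pair of codecs, including the same-format case as in Figure 3, and the second counts one decoder plus one encoder per codec.

```python
def direct_transcoders(num_codecs: int) -> int:
    """One direct transcoder per (source, target) pair, same-format pairs included."""
    return num_codecs * num_codecs


def intermediate_components(num_codecs: int) -> int:
    """One decoder and one encoder per codec when a common intermediate format is used."""
    return 2 * num_codecs


# Adding a sixth codec costs 11 new direct transcoders but only 2 new components.
extra_direct = direct_transcoders(6) - direct_transcoders(5)
extra_components = intermediate_components(6) - intermediate_components(5)
```

For five codecs this gives 25 direct transcoders against 10 components, matching the figures quoted in the text.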

In Figure 4, the common interchangeable format is described to include pixels and "more". One example of "more" is the motion vector per pixel information described above. Another can be a "true/false" flag indicating whether the block or pixel was skipped. Another type of information can be about the block structure, such as the number of surrounding pixels with the same motion vector as the current pixel, for example whether the block covers 16x16, 32x32, or 64x64 pixels. Other types of information can be whether the block was intra-coded or inter-coded, which reference picture or reference pictures the motion vector was referring to if it was inter-coded, etc. In case a block has been intra-coded, the block is typically also predicted from previously coded parts of the image in question. For instance, the pixels of the block are predicted from the left or from the top of the block. In other words, the intra-prediction can have a directional quality. Consequently, the additional information, according to a further embodiment, could include an indication that directional intra-prediction has been used and, if so, an indication of the direction from which a particular pixel has been intra-predicted. Any type of information that is possible to parse from the bit stream of the video signal and that is present in several of the other codecs can be of interest and part of the interchangeable format. Not all of the information needs to be used by the encoder. For example, some coders might have a fixed block structure, and will therefore ignore information about block structures.
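One possible way to picture such a per-pixel record is the Python dataclass below. It is a sketch only: the class name, the field names, the use of a NaN pair as the dummy motion vector, and the string encoding of the intra-prediction direction are assumptions for illustration, not a layout defined by the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PixelInfo:
    """Hypothetical per-pixel entry of the interchangeable format."""
    color: Tuple[int, int, int]                                # e.g. R, G, B or Y, U, V
    motion_vector: Tuple[float, float] = (math.nan, math.nan)  # NaN pair: no motion compensation
    reference_picture: Optional[int] = None                    # picture the vector refers to
    skipped: bool = False                                      # block or pixel was skipped
    intra_coded: bool = False                                  # intra-coded rather than inter-coded
    intra_direction: Optional[str] = None                      # e.g. "left" or "top", if directional
    same_vector_neighbors: Optional[int] = None                # block-structure hint

    @property
    def is_motion_compensated(self) -> bool:
        return not any(math.isnan(component) for component in self.motion_vector)


intra_pixel = PixelInfo(color=(128, 64, 32), intra_coded=True, intra_direction="left")
inter_pixel = PixelInfo(color=(10, 20, 30), motion_vector=(1.5, -2.0), reference_picture=0)
```

Optional fields default to "absent", so a decoder for a codec that lacks a given feature can simply leave them unset.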

Consequently, according to various embodiments, the motion vector information can comprise motion vector information of every motion compensated pixel, and/or a predetermined value if the pixel is not motion compensated. In addition, the common intermediate format can also comprise an indication whether a current pixel was intra coded or not.

Since not all codecs utilize motion vectors in the same manner, the encoding step S30 can be based on at least part of the provided additional information. Thereby, an encoder which does not recognize all the provided additional information only utilizes the information that it does recognize.
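A minimal Python sketch of this selective use: an encoder declares which kinds of additional information it understands and silently drops the rest. The key names and the helper function are hypothetical.

```python
from typing import Any, Dict, Set


def select_supported_info(additional_info: Dict[str, Any],
                          supported_keys: Set[str]) -> Dict[str, Any]:
    """Keep only the additional information that this encoder recognizes."""
    return {key: value for key, value in additional_info.items() if key in supported_keys}


info = {"motion_vector": (2, -1), "block_size": 16, "intra_direction": None}
# An encoder with a fixed block structure has no use for block-size information.
fixed_block_encoder_keys = {"motion_vector", "intra_direction"}
usable = select_supported_info(info, fixed_block_encoder_keys)
```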

When going between resolutions, the data that is available per pixel can be scaled using e.g. nearest neighbor to the new resolution. For instance, if a downscale from 640x480 to 320x240 is performed, the information stored in the interchangeable format might be of 320x240 pixels and contain the information of every second pixel in the x- and y-dimension from the 640x480 size.
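The nearest-neighbor scaling of per-pixel data can be sketched in Python as below. The function name and the grid-of-rows layout are assumptions. The stored values are copied unchanged; the text does not say whether motion vectors would additionally be rescaled to the new resolution, so this sketch leaves them as they are.

```python
from typing import Any, List


def downscale_nearest(grid: List[List[Any]], new_width: int, new_height: int) -> List[List[Any]]:
    """Resample a per-pixel grid to a new size by picking the nearest source entry."""
    old_height, old_width = len(grid), len(grid[0])
    return [
        [grid[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


# A 4x4 grid whose entries record their own (row, column) position.
full = [[(row, col) for col in range(4)] for row in range(4)]
half = downscale_nearest(full, 2, 2)
```

Halving both dimensions keeps every second entry in the x- and y-dimension, as in the 640x480 to 320x240 example.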

The above-described embodiments of a video transcoding method can be used to transcode from a first video format to a second, different video format, or from a first video format to the same video format, for example when changing the bitrate or resolution.

With reference to Figure 6, an embodiment of a video transcoder arrangement 1 according to the present technology will be described. The video transcoder arrangement 1 is configured for decoding received video signals of at least one first predetermined format into video data of a common intermediate format, and further configured for selectively encoding the video data into a transcoded video signal of a second video format.

Consequently, an embodiment of a video transcoder arrangement 1 includes at least one decoder arrangement 20, each configured for decoding a received video signal of a respective first video format into video data of a common intermediate format. Further, the video transcoder arrangement 1 includes at least two encoder arrangements 30, each of which is configured for selectively encoding the video data from the common intermediate format into a transcoded video signal of a respective second video format. In this embodiment, one decoder arrangement 20 and two encoder arrangements 30 are disclosed. However, in a particular embodiment a plurality of both encoder arrangements 30 and decoder arrangements 20 are provided, thus enabling transcoding from any one of a plurality of first video formats into any one of a plurality of second video formats.

According to a further embodiment, the first and the second video formats can be a same video format but with different bitrate or other characteristics, or be altogether different video formats.

The video transcoder arrangement is beneficially implemented in a network node or a user equipment node or device in a wireless or wired communication system.

Although not disclosed in Figure 6, the video transcoder arrangement includes all necessary equipment e.g. RX/TX, antenna etc. for successfully receiving and transmitting video signals. Consequently, the transcoder can be included and utilized in a device or node before transmitting the transcoded signal, or included and utilized in a device or node upon reception of a video signal.

Further, the video transcoder arrangement 1 is configured to enable all functionality described with relation to Figure 6. This includes decoding video signals into an intermediate format comprising image pixel values and motion vector information for the received video signal.

Image pixel values can be generally understood to include pixel colors such as for example red, green and blue (RGB), or cyan, magenta, yellow and black (CMYK), or luminance, chrominance-U and chrominance-V (YUV). The motion vector information can comprise motion vector information of every motion compensated pixel, or a predetermined value if the pixel is not motion compensated. In addition, according to a further embodiment, the intermediate format can include an indication whether a current pixel was intra coded or not. As mentioned previously, the transcoder arrangement and method of the current technology can be utilized to transcode a video signal from a first video format to a second different video format, or from one video format to the same video format with a different compression or bitrate. The latter case occurs when a certain bitrate is not supported by a receiving device or if the available bandwidth is reduced for the transmission of the video signal.

The embodiments of the present technology can also be viewed as a system capable of transcoding between at least two combinations of formats (A to B and A to C) for instance where each such transcoding is divided into two steps, by decoding from one format to an intermediate format containing pixel values such as R, G and B (or Y, U or V) and additional information, and encoding from said intermediate format to another format and where the same intermediate format is used for all transcoding operations in the system.

The steps, functions, procedures, and/or blocks described above might be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.

Particular examples include one or more suitably configured digital signal processors and other known electronic circuits, e.g. discrete logic gates interconnected to form a specialized function, or Application Specific Integrated Circuits, ASICs.

Alternatively, at least some of the steps, functions, procedures, and/or blocks described above might be implemented in software such as a computer program for execution by suitable processing circuitry including one or more processing units. Examples of processing circuitry include, but are not limited to, one or more microprocessors, one or more Digital Signal Processors (DSPs), one or more Central Processing Units (CPUs), video acceleration hardware, and/or any suitable programmable logic circuitry, such as one or more Field Programmable Gate Arrays (FPGAs), or one or more Programmable Logic Controllers (PLCs).

It should also be understood that it might be possible to re-use the general processing capabilities of any conventional device or unit in which the proposed technology is implemented. It might also be possible to re-use existing software or to add new software components.

In the following, an example of a computer implementation will be described with reference to Figure 7. The transcoder arrangement 200 comprises processing circuitry such as one or more processors 210, e.g. a microprocessor, which executes a software component 221 for providing a video signal of a first predetermined video format, and a software component 222 for decoding the provided video signal into video data of a common intermediate format. Further, the transcoder arrangement 200 includes a software component 223 for encoding the video data of the common intermediate format into a transcoded video signal of a second predetermined video format. These software components are stored in a memory 220. In this particular example, at least some of the steps, functions, procedures, and/or blocks described above are implemented in a computer program, which is loaded into the memory for execution by the processing circuitry. The processing circuitry 210 and memory 220 are interconnected to each other to enable normal software execution. An optional input/output device might also be interconnected to the processing circuitry and/or the memory to enable input and/or output of relevant data such as input parameter(s) and/or resulting output parameter(s). The processor 210 communicates with the memory over a system bus. A video signal is received by an input/output (I/O) controller 230 controlling an I/O bus, to which the processor 210 and the memory 220 are connected. In this embodiment, the signal received by the I/O controller 230 is stored in the memory 220, where it is processed by the software components. Software component 221 might implement the functionality of the video signal-providing step S10 in the embodiment described with reference to Figure 5. Software component 222 might implement the functionality of the decoding step S20, also with reference to Figure 5. Finally, software component 223 might implement the functionality of the encoding step S30 described with reference to Figure 5. The I/O unit 230 might be interconnected to the processor 210 and the memory 220 via an I/O bus to enable input and/or output of relevant data such as input parameters and/or resulting output parameters.

The term 'computer' should be interpreted in a general sense as any system or device capable of executing program code or computer program instructions to perform a particular processing, determining or computing task. In a particular embodiment, the computer program comprises program code which, when executed by the processing circuitry or computer, causes the processing circuitry or computer to provide (S10) a video signal of at least a first predetermined video format, and to decode (S20), in a decoder arrangement, the provided video signal of the at least first predetermined video format into video data of a common intermediate format. Finally, the processing circuitry or computer selectively encodes (S30), in an encoder arrangement, the video data of the common intermediate format into a video signal of one of at least two second predetermined formats.

The software or computer program might be realized as a computer program product, which is normally carried or stored on a computer-readable medium. The computer-readable medium might include one or more removable or non-removable memory devices including, but not limited to a Read-Only Memory, ROM, a Random Access Memory, RAM, a Compact Disc, CD, a Digital Versatile Disc, DVD, a Universal Serial Bus, USB, memory, a Hard Disk Drive, HDD, storage device, a flash memory, or any other conventional memory device. The computer program might thus be loaded into the operating memory of a computer or equivalent processing device for execution by the processing circuitry thereof.

For example, the computer program stored in memory includes program instructions executable by the processing circuitry, whereby the processing circuitry is able or operative to execute the above-described steps, functions, procedures and/or blocks.

The transcoder arrangement 1 is thus configured to perform, when executing the computer program, well-defined processing tasks such as those described above.

The computer or processing circuitry does not have to be dedicated to only execute the above-described steps, functions, procedures, and/or blocks, but might also execute other tasks.

The main advantage of the present technology is the possibility to quickly create, maintain and use a transcoder or system of transcoders where transcoding can happen from any codec to any other codec, without the penalty of doing blind re-encoding as in Figure 1 and without the combinatorial explosion described in Figure 3.

The embodiments described above are merely given as examples, and it should be understood that the proposed technology is not limited thereto. It will be understood by those skilled in the art that various modifications, combinations and changes might be made to the embodiments without departing from the present scope as defined by the appended claims. In particular, different part solutions in the different embodiments can be combined in other configurations, where technically possible.

Claims

1. A video transcoder arrangement (1), comprising:

at least one decoder arrangement (10), each configured for decoding a received video signal of a respective first video format into video data of a common intermediate format;

at least two encoder arrangements (20), each configured for selectively encoding said video data of said common intermediate format into a video signal of a respective second video format.

2. The video transcoder arrangement according to claim 1, wherein said common intermediate format comprises image pixel values and motion vector information for said decoded received video signal.

3. The video transcoder arrangement according to claim 2, wherein said motion vector information comprises motion vector information of every motion compensated pixel.

4. The video transcoder arrangement according to claim 3, wherein said motion vector information comprises a predetermined value if the pixel is not motion compensated.

5. The video transcoder arrangement according to claim 1, wherein said common intermediate format further comprises an indication whether the pixel was intra coded or not.

6. The video transcoder arrangement according to claim 5, wherein if the pixel was intra-coded, said additional information comprises an indication whether said pixel was directionally predicted.

7. The video transcoder arrangement according to claim 6, wherein said additional information comprises an indication that said pixel was predicted directionally and an indication of a direction from which said pixel was predicted.

8. The video transcoder arrangement according to claim 1, wherein said first video format and said at least two second video formats comprise the same or different video formats.

9. The video transcoder arrangement according to any of claims 1-8, wherein said video transcoder arrangement (1) comprises a plurality of decoder arrangements (10) and a plurality of encoder arrangements (20), thereby enabling selectively transcoding a video signal from any of a plurality of first formats to a video signal of any of a plurality of second formats.

10. A video transcoding method, comprising the steps of:

providing (S10) a video signal of at least a first predetermined video format;

decoding (S20), in a decoder arrangement, said provided video signal of said at least a first predetermined video format into video data of a common intermediate format;

selectively encoding (S30), in an encoder arrangement, said video data of said common intermediate format into a video signal of one of at least two second predetermined formats.

11. The video transcoding method according to claim 10, wherein said decoding step (S20) comprises decoding said video signal into image pixel values and additional information relating to said video signal of said first predetermined video format.

12. The video transcoding method according to claim 11, wherein said additional information includes at least motion vector information.

13. The video transcoding method according to claim 12, wherein said at least motion vector information comprises motion vector information of every motion compensated pixel.

14. The video transcoding method according to claim 12, wherein said at least motion vector information comprises a predetermined value if the pixel is not motion compensated.

15. The video transcoding method according to claim 11, wherein said intermediate format also comprises an indication whether the pixel was intra coded or not.

16. The video transcoding method according to claim 15, wherein if the pixel was intra-coded, said additional information comprises an indication whether said pixel was directionally predicted.

17. The video transcoding method according to claim 14, wherein said additional information comprises an indication that said pixel was predicted directionally and an indication of a direction from which said pixel was predicted.

18. The video transcoding method according to claim 11, wherein said encoding step (S30) is based on at least part of the provided additional information.