GB2553597A - Multimedia processing in IP networks - Google Patents

Multimedia processing in IP networks

Info

Publication number
GB2553597A
GB2553597A GB1618438.4A GB201618438A GB2553597A GB 2553597 A GB2553597 A GB 2553597A GB 201618438 A GB201618438 A GB 201618438A GB 2553597 A GB2553597 A GB 2553597A
Authority
GB
United Kingdom
Prior art keywords
multimedia
stream
multimedia stream
processing steps
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1618438.4A
Other versions
GB201618438D0 (en)
Inventor
Andre Jean-Marie Surcouf
John Marshall
Jamie Markevitch
Yoann Desmouceaux
Axel Taldir
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cisco Technology Inc
Original Assignee
Cisco Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cisco Technology Inc filed Critical Cisco Technology Inc
Publication of GB201618438D0 publication Critical patent/GB201618438D0/en
Priority to PCT/US2017/046927 priority Critical patent/WO2018048585A1/en
Publication of GB2553597A publication Critical patent/GB2553597A/en
Withdrawn legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/61Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H04L65/612Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for unicast
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/70Media network packetisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/75Media network packet handling
    • H04L65/765Media network packet handling intermediate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/56Provisioning of proxy services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/647Control signaling between network components and server or clients; Network processes for video distribution between server and clients, e.g. controlling the quality of the video stream, by dropping packets, protecting content from unauthorised alteration within the network, monitoring of network load, bridging between two different networks, e.g. between IP and wireless
    • H04N21/64784Data processing by the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A method of handling multimedia processing of a multimedia stream in an IP network comprises (i) receiving 951 an incoming multimedia stream; (ii) receiving 952 instructions specifying processing steps to be performed on the multimedia stream; (iii) allocating 953 the processing steps to compute nodes in the IP network; (iv) determining 954 whether criteria are met; and (v) handling 955 transfer of the multimedia stream between consecutive processing steps by: (a) writing to/reading from a shared memory segment; or (b) packaging the multimedia stream as IP packets, depending on which criteria are met. The IP network may transmit and receive IP packets encapsulated in a header, which includes fields storing information relating to processing steps previously carried out, the frame number, and/or identification of compute nodes which carried out processing steps. Instructions may specify text or video overlays for video streams; image processing for video streams; or an audio stream to merge or replace a native audio stream. The conversion between IP packets and video frames may be called IP-to-Video-Frame (IP2VF).

Description

(71) Applicant(s):
CISCO TECHNOLOGY, INC.
(Incorporated in USA - California)
170 West Tasman Drive, San Jose, CA 95134, United States of America
(72) Inventor(s):
Andre Jean-Marie Surcouf, John Marshall, Jamie Markevitch, Yoann Desmouceaux, Axel Taldir
(74) Agent and/or Address for Service:
Mathys & Squire LLP
The Shard, 32 London Bridge Street, LONDON, SE1 9SG, United Kingdom
(51) INT CL:
G06F 9/50 (2006.01) H04N 21/647 (2011.01)
(56) Documents Cited:
US 20120017062 A1 US 20090031119 A1
(58) Field of Search:
INT CL G06F, H04L, H04N
Other: WPI, EPODOC, Patent Full-Text, XPSPRNG, XPI3E, XPIPCOM, XPMISC, XPRD, XPESP, TDB, INSPEC, Internet
(54) Title of the Invention: Multimedia processing in IP networks
Abstract Title: Multimedia Processing of a Multimedia Stream in IP networks (57) A method of handling multimedia processing of a multimedia stream in an IP network comprises (i) receiving 951 an incoming multimedia stream; (ii) receiving 952 instructions specifying processing steps to be performed on the multimedia stream; (iii) allocating 953 the processing steps to compute nodes in the IP network; (iv) determining 954 whether criteria are met; and (v) handling 955 transfer of the multimedia stream between consecutive processing steps by: (a) writing to/reading from a shared memory segment; or (b) packaging the multimedia stream as IP packets, depending on which criteria are met. The IP network may transmit and receive IP packets encapsulated in a header, which includes fields storing information relating to processing steps previously carried out, the frame number, and/or identification of compute nodes which carried out processing steps. Instructions may specify text or video overlays for video streams; image processing for video streams; or an audio stream to merge or replace a native audio stream. The conversion between IP packets and video frames may be called "IP-to-Video-Frame" (IP2VF).
Multimedia processing in IP networks
Technical Field
The present disclosure relates generally to methods and systems for performing multimedia processing on a multimedia stream, and in particular to methods and systems for distributing the multimedia processing around an IP network.
Background
In traditional multimedia processing centres, multimedia data is propagated through various processing steps in a manner compliant with Serial Digital Interface (SDI) standards. Connecting many pieces of specialised equipment together in this way requires a specialised routing and interface infrastructure.
Brief Description of the Figures
Embodiments of the method and apparatus described herein are illustrated in the accompanying Figures, in which:
Figure 1 is a schematic of a prior art multimedia processing centre;
Figure 2 is a schematic of a multimedia processing centre utilising an IP network, according to the present disclosure;
Figure 3 is a schematic of a video frame;
Figure 4 is a schematic of a multimedia processing pipeline according to the present disclosure;
Figure 5 is a schematic of another multimedia processing pipeline according to the present disclosure;
Figure 6 is a detailed schematic of a processing step according to the present disclosure;
Figure 7 is a detailed schematic of two processing steps according to the present disclosure;
Figure 8 is a detailed schematic of a video frame converter according to the present disclosure; and
Figure 9 is a flow chart illustrating a method according to the present disclosure.
Detailed Description
Overview
Described herein is a method of handling multimedia processing in an IP network, the method comprising:
(i) receiving an incoming multimedia stream;
(ii) receiving instructions specifying one or more processing steps to be performed on the multimedia stream;
(iii) allocating the processing steps to compute nodes in the IP network;
(iv) determining whether one or more criteria are met; and
(v) handling transfer of the multimedia stream between consecutive processing steps by:
(a) writing to, and reading from, a shared memory segment; or
(b) packaging the multimedia stream as IP packets,
depending on which criteria are met.
There is also described herein a network device and a network for implementing the methods described herein. In addition, a computer program, computer program product and computer readable medium comprising instructions for implementing the methods described herein is disclosed.
Example Embodiments
Aspects and embodiments of the disclosure are now set out in detail below, in which similar or identical parts are numbered with like reference numerals.
Figure 1 shows a prior art multimedia processing centre 100. Media is received from a variety of sources; for example, a network source 102 is configured to send multimedia to be processed. Additionally, media may be generated locally by graphics systems 104 or cameras and/or microphones 108, or supplied from a local server 106. Each of these sources is fed to a media SDI router 110. The control system and control panel 112 is arranged to manipulate the input multimedia feeds according to the requirements of the user of the system. For example, two video streams may be merged, or one overlaid on a portion of another. Other effects, such as overlaying text or images, audio processing, or altering the transparency of an overlay, may also be applied, e.g. using the alpha channel of an ARGB colour model encoded image. Additionally, the processing may include audio processing, such as merging or replacing (dubbing) the audio stream supplied with the multimedia stream (the native audio stream) with another audio stream.
In some cases, the processing may be handled entirely by the control system and control panel 112. In other cases, specialist processing modules may be used for a particular step, such as audio mixer 120, multi-viewer 118 and video switcher 116. In addition, a monitor 114 may be provided, to allow users of the system to view the multimedia stream in real (or near real) time.
When the processing is complete, the media stream is output, for example to a network location 122 (for broadcasting, further processing, etc.) or to storage 124.
As technology progresses, the quality of multimedia streams increases, and with it the bitrate of the stream. For example, a compressed stream at a resolution of 1080p (1920 x 1080 pixels) requires a bitrate of around 5 Mbit/s, while the 4K video now gaining popularity requires closer to 50 Mbit/s. The situation is exacerbated with uncompressed video, where the bitrates are around 3 Gbit/s for 1080p and 12 Gbit/s for 4K. This shift has led processing centres like that shown in Figure 1 to offload more and more of the processing steps to specialist modules, rather than processing by the control panel and control system 112, as each step is too computationally intensive to handle centrally in real time.
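The raw figures above can be sanity-checked with simple arithmetic. The sketch below is purely illustrative: it assumes 10-bit 4:2:2 sampling (20 bits per pixel) and a 50 fps frame rate, and it ignores blanking intervals and interface overheads, which is why the nominal 3 Gbit/s and 12 Gbit/s link rates quoted above come out somewhat higher.

```python
# Back-of-envelope uncompressed video bitrates (illustrative only; assumes
# 10-bit 4:2:2 sampling, i.e. 20 bits/pixel, ignoring blanking/overheads).
def raw_bitrate_gbps(width: int, height: int, fps: int, bits_per_pixel: int = 20) -> float:
    """Return the raw video payload bitrate in Gbit/s."""
    return width * height * bits_per_pixel * fps / 1e9

print(f"1080p50: {raw_bitrate_gbps(1920, 1080, 50):.2f} Gbit/s")  # ~2.07 Gbit/s
print(f"2160p50: {raw_bitrate_gbps(3840, 2160, 50):.2f} Gbit/s")  # ~8.29 Gbit/s
```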
This in turn leads to a very complex routing situation, in which the media router is required to have a large number of input connections, intermediate processing connections (for the specialised processing modules), and output connections. Not only does this increase the complexity of the routing apparatus, but the use of specialist modules designed with specific tasks in mind may be an inefficient use of resources if some modules are used only rarely.
Turning now to Figure 2, which shows a multimedia processing centre 200 according to the present disclosure, a similar set up is shown to that of Figure 1. Once again, multimedia is received from a variety of sources: a network source 202, graphics systems 204, cameras and/or microphones 208, and local server 206.
Instead of being received at an SDI router, the multimedia streams are supplied to an IP network 226. The IP network 226 comprises an interlinked series of processors 230, for example nodes which are able to process video or audio data, convert it to a different format, etc. The processors 230, also called compute nodes, are linked to one another by connections to IP routers (or switches). IP networks are well understood, and easily scalable, so resources are unlikely to be wasted. Without loss of generality, the IP network may be an IPv4 network, or an IPv6 network.
The control system and control panel 212 once again controls the overall processing operation. In this case, it interacts with the IP network via a network interface and open API 234, which allow the control panel and control system 212 to oversee the processes occurring on the network 226.
Additionally, specialist equipment may be provided, such as audio mixer 220, multi-viewer 218 and video switcher 216. However, the IP networking arrangement allows the functionality of each of these modules to be provided on a compute node, for example as a software implementation. It may simply be a case of providing a Graphics Processing Unit (GPU), or other dedicated hardware on some of the compute nodes 230, so that there is sufficient processing power on the compute node to perform the desired processing step.
In addition, a monitor 214 may be provided, connected to the IP network 226 to allow users of the system to view the multimedia stream in real (or near real) time, as before.
While the IP network approach allows a simplification of the processing centre, as described above, IP networks and multimedia processing operate on different protocols. For example, IP networks are based around discrete packets of information, sent using transport-layer protocols such as TCP or UDP, while multimedia processing uses specific multimedia protocols; in particular, video processing is performed in a frame-by-frame manner. It is possible to transfer multimedia data over an IP network, for example the SMPTE 2022-6 standard relates to Transport of High Bit Rate Media Signals over IP Networks (HBRMT). However, the packet nature of IP makes processing of streams in an IP format difficult.
The control panel 212 may be used to control, via the interface and API 234, not just which processes, and with what processing parameters, are to be carried out in manipulating the multimedia stream, but also which nodes should perform which process. It may be known to a user that certain nodes are particularly suited for carrying out a particular process, or the control panel 212 may be configured to assess the network and make this decision on the user’s behalf. Considerations which may be used in this latter case include general state of network traffic at various nodes, and the functionality of each node, for example.
As used in this disclosure, a compute node is a network node able to perform computations, and process multimedia streams. When two or more nodes are referred to, they may be physically separate entities, linked by a communications pathway (Ethernet cables, wireless protocols, fibre-optic communications etc.), or they may be located on the same server, even running as processes in separate containers or virtual machines on the same processor. That is, at one extreme a compute node may be a workload encapsulated in a container, while at the other extreme a compute node may be a separate processing entity entirely.
An example of a video frame 300 is shown in Figure 3. Here, the frame comprises 16 x 12 = 192 individual pixels 301. This is merely an example, however, and typical pixel resolutions might range from 640 x 480 (480p) to 3840 x 2160 (2160p or 4K). Indeed, yet higher resolutions are being contemplated by multimedia producers, such as 8K (7680 x 4320 pixels). Whatever the resolution, the image represented by the frame is divided into lines, shown alternately shaded and unshaded in Figure 3. A line is simply a row of pixels extending across the entire image, that is, a 1 x m block of pixels, where m is the horizontal pixel resolution. A video is formed by displaying a series of frames one after the other in quick succession; typically frames are shown at around 50-60 frames per second in modern media systems. Some systems use even higher frame rates, and are known as high-FPS systems (collectively, high FPS and high dynamic range imaging are known as Ultra High Definition (UHD) formats). The concepts disclosed herein are applicable to these very high throughput systems as well.
The processing steps described above may involve directly manipulating the pixels of one or more frames, for example altering a block of pixels to a single colour to introduce a banner, and then overlaying pixels of another colour in particular shapes, for example to display text on the banner. In order for the overlaid text to remain visible as the video plays, the same alterations should be made to subsequent frames, for as long as the text is intended to remain visible. More complex modifications may be made, which nonetheless follow this same idea. For example, overlaying video may simply require pixels from one stream to be replaced by corresponding pixels from another stream. Transparency can be adjusted by merging pixel colours and brightness values rather than overwriting.
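A minimal sketch of these three pixel operations is given below, using NumPy on a hypothetical 1080p RGB frame; the dimensions, colours and regions are invented for illustration, and a real processing node would work on decoded frames arriving from the pipeline rather than a blank array.

```python
import numpy as np

# Hypothetical 1080p RGB frame (values 0-255) standing in for a decoded frame.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)

# 1. Solid banner: overwrite a block of pixels with a single colour.
frame[980:1060, :, :] = (32, 32, 160)  # dark blue banner across the bottom

# 2. Video overlay: replace pixels outright with those of a second stream.
inset = np.full((270, 480, 3), 200, dtype=np.uint8)  # stand-in second stream
frame[20:290, 20:500, :] = inset  # picture-in-picture in the top-left corner

# 3. Transparency: merge colour/brightness values instead of overwriting.
alpha = 0.4
region = frame[400:700, 700:1220, :].astype(np.float32)
overlay = np.full_like(region, 255.0)  # stand-in semi-transparent white box
frame[400:700, 700:1220, :] = (alpha * overlay + (1 - alpha) * region).astype(np.uint8)
```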
Each frame may also include audio data, and other data, such as encoding schemes, language information or DRM data. These can be manipulated (or added) by processing steps, in a similar manner.
In order to send multimedia streams over IP networks, they must be broken into packets. Typically, a single frame is larger than the maximum size of an IP packet payload, so not only is the stream broken down into frames, but the frames themselves are broken up prior to sending them over the network, and are reassembled when they arrive, for example to enable processing to be carried out. In fact, modern resolutions can be so large that even individual lines are too large, and an IP packet may contain only a fragment of a line, or just part of the audio or other information. When all the packets making up a frame arrive, they can be assembled into a frame. Each subsequent frame is assembled sequentially, so that the multimedia stream can be recreated, having been sent as packets over the IP network.
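For scale: one line of a 1080p frame at 20 bits per pixel is 1920 x 20 / 8 = 4,800 bytes, already above a typical 1,500-byte Ethernet MTU, so even single lines must be split. The sketch below shows the general fragment/reassemble idea with an invented 8-byte header of (frame number, sequence number, chunk count); it is not the SMPTE 2022-6 wire format.

```python
import struct

PAYLOAD = 1400  # hypothetical payload budget per IP packet

def fragment(frame_no: int, frame: bytes) -> list[bytes]:
    """Split one frame into packets, each prefixed (frame_no, seq, total)."""
    chunks = [frame[i:i + PAYLOAD] for i in range(0, len(frame), PAYLOAD)]
    return [struct.pack("!IHH", frame_no, seq, len(chunks)) + c
            for seq, c in enumerate(chunks)]

def reassemble(packets: list[bytes]) -> bytes:
    """Reorder packets by sequence number and concatenate the payloads."""
    parts, total = {}, 0
    for p in packets:
        _frame_no, seq, total = struct.unpack("!IHH", p[:8])
        parts[seq] = p[8:]
    assert len(parts) == total, "frame incomplete - packets still missing"
    return b"".join(parts[i] for i in sorted(parts))

frame = bytes(range(256)) * 100  # 25,600-byte stand-in frame
assert reassemble(fragment(7, frame)) == frame
```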
In particular, for the system shown in Figure 2, where multimedia streams (in a frame format) are received from sources by the IP network 226, they are converted to IP packets before they are propagated around the network.
When the IP packets arrive at a compute node 230, which is configured to apply one or more processing steps on the multimedia stream on a frame-by-frame basis, the packets are reassembled into frames and the node processes the data. Similarly, when one compute node 230 completes the processing step which it is required to, it sends the frame to the next processing step, or to output. Prior to sending the frame onwards, it is converted to an IP packet format.
This conversion between IP packets and video frames and vice versa is sometimes referred to as IP2VF (IP-to-Video-Frame). Figure 4 shows an example of a media processing pipeline 400 configured to operate on an IP network such as the network 226 shown in Figure 2.
A media stream enters the IP2VF domain at input 403. It is assumed that the input is in an IP packet format, described above. The first IP2VF node 405 therefore is tasked with assembling the incoming IP packets into media frames. The first IP2VF node 405 also duplicates the incoming multimedia stream, for example by creating two copies of the assembled frames for output. One resultant stream is sent to a processing step, while another resultant stream is sent to a monitor 411, for example so that a user can review the stream at entry. The monitor may include a loudspeaker to review any audio information included in the stream. Optionally, the monitoring process may store a copy of the stream or transmit it elsewhere, for example. In some embodiments, the duplicate streams are not identical, for example, one may have a reduced bitrate relative to the other, as described in more detail below.
The processing step can be performed locally (that is, on the same node as that performing the assembly of IP packets into frames), in which case the video frame format is used; alternatively, the processing can be performed on a different compute node, in which case the first IP2VF node 405 converts the stream to IP2VF format and sends it over IP, for receipt by the compute node on which the processing is to take place. There, the packets are reassembled into frames, and the frames are processed individually, as described above.
The IP2VF format shares some similarities with IP packets, differing in that each media frame carries a specially defined IP2VF internal header. At the entry of the pipeline, the first module accepts raw video in a format such as SMPTE 2022-6, and converts it into an internal IP2VF transport format by adding this specific header.
The header may comprise but is not limited to one or more of the following elements:
• 4 bytes: 0x00 0x00 0x00 0x01, to notify nodes on the system that this is the start of an IP2VF frame.
• 4 bytes: frame number, starting at 0 and incrementing until there are no further frames in this stream. When the number of frames exceeds the maximum representable by this field (around 4 billion frames), the earliest numbers are recycled, for example starting again at 0.
• 1 byte: module id. The module id is a unique identifier across the whole video-processing pipeline, used to identify the producer. This producer can be either a video-processing node or another IP2VF module. Each time a frame in the IP2VF format enters a module, the module must override the module id byte in the header with its own id. This allows identification and tracking of the frame source, so that e.g. a module outputting corrupted or incorrect processed frames can be easily identified.
The IP2VF internal header therefore contains information on the processing. Specifically, the header may include fields in which information can be stored relating to the compute nodes which have performed processing steps, which processing steps have occurred, and/or the frame number. Indeed, any other information which may be of use to the user may also be included. Not only does this assist in tracking the packets around the network, and monitoring the steps which have been performed, but the system may operate more efficiently by making decisions based on the history of a given packet, frame, etc.
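The three fields described above map naturally onto a 9-byte header. The sketch below packs and re-stamps such a header; the exact byte layout beyond the listed fields, and the function names, are assumptions for illustration.

```python
import struct

MAGIC = b"\x00\x00\x00\x01"  # start-of-IP2VF-frame marker described above
FRAME_WRAP = 2 ** 32          # frame numbers recycle after ~4 billion frames

def pack_header(frame_no: int, module_id: int) -> bytes:
    """Build the 9-byte internal header: magic, frame number, module id."""
    return MAGIC + struct.pack("!IB", frame_no % FRAME_WRAP, module_id)

def stamp_module_id(frame: bytes, my_id: int) -> bytes:
    """On entry to a module, override the module-id byte with its own id."""
    assert frame[:4] == MAGIC, "not an IP2VF frame"
    return frame[:8] + bytes([my_id]) + frame[9:]

hdr = pack_header(frame_no=0, module_id=3)
assert stamp_module_id(hdr + b"payload", my_id=7)[8] == 7  # id overridden
```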
The processing continues in this manner, converting to IP2VF format to transfer to the next compute node, where the IP packets are assembled into frames for the next processing step, as often as necessary until each required processing step has been completed. At this stage, the multimedia stream is sent to the final output, for example in Figure 2, network location 222 or storage 224. Additionally, the final IP2VF node 405 may be configured to duplicate the stream again, sending one stream to the final output and the other stream to a monitor for review by a user. In this way, a user can easily track the changes which have occurred during video processing, by comparing the input monitoring stream with the output monitoring stream.
Figure 5 shows a similar processing pipeline 500, albeit more complex than the one shown in Figure 4. In this example each IP2VF node 505, which connects the processing steps with one another, duplicates the stream. In most cases, this is used to allow monitors 511 to display the status of the stream at that stage in the processing, so allowing a user to track the progress of each of the processing steps as the multimedia stream traverses the pipeline.
In Figure 5 another use of the duplicating feature of the IP2VF node is shown between the output of step “Process 1” and the input of step “Process 2”. In this case, the multimedia stream is duplicated, and each stream undergoes different processing steps. The result is two final outputs 513a and 513b. Since they have undergone two different sets of processing steps (compare {Process 2 followed by Process 3} with {Process 4 followed by Process 5}), the final output multimedia streams can be different from one another. As a simple example, the processing steps may include overlaying subtitles on a video stream. The two different streams may then be used to concurrently overlay subtitles in different languages. For example output 513a may have English subtitles, while output 513b may have German subtitles.
While pipelines 400 and 500 show some specific implementations of the options for monitoring and processing multimedia streams, many more branching options are available to provide as many streams as required.
Note that copying of streams is generally minimised by the IP2VF system, and monitoring every stage may be computationally inefficient. Therefore, sometimes not every processing stage is monitored as described above. Alternatively, the stream sent to a monitor may be reduced in quality, by reducing the bitrate, when the IP2VF module is informed (by a user or otherwise) that the output will be going to a monitoring station. For example, every nth frame may be dropped to reduce the number of frames which need to be duplicated. Here n is an integer of 2 or more. This reduces the copying workload by 1/n of the full value. Alternatively, the resolution of the frames may be reduced by reducing each 2 x 2 pixel square to a single pixel, e.g. by replacing the 4 pixels with one averaged one, or simply deleting 3 out of every 4 pixels. Alternatively, the video can simply be compressed, to generate e.g. an H.264 compressed stream. Finally, the audio or other information may be manipulated to reduce the bitrate. The audio may be down-sampled, or even deleted, and other information (e.g. metadata relating to video frame rate; video resolution; multimedia input file format; multimedia output file format; audio format; audio quality; codec usage; multimedia frame size in bytes; or multimedia stream bitrate) may be deleted, as it may not be deemed required by the monitor.
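The two picture-level reductions mentioned above are straightforward to express. The sketch below drops every nth frame and averages 2 x 2 pixel squares; the frame shapes and generator-based plumbing are illustrative assumptions.

```python
import numpy as np

def drop_every_nth(frames, n: int = 2):
    """Yield frames for the monitor, dropping every nth one (n >= 2)."""
    for i, f in enumerate(frames):
        if (i + 1) % n:          # skip frames n, 2n, 3n, ...
            yield f

def halve_resolution(frame: np.ndarray) -> np.ndarray:
    """Replace each 2x2 pixel square with its average (even H and W assumed)."""
    h, w, c = frame.shape
    blocks = frame.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.mean(axis=(1, 3)).astype(frame.dtype)

frame = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
assert halve_resolution(frame).shape == (4, 4, 3)
assert len(list(drop_every_nth([frame] * 10, n=2))) == 5
```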
Multimedia streams which are to be sent to a monitor can be converted or transcoded to a suitable format for monitoring them prior to sending to the monitor. In this context, converting may mean converting the format, quality, resolution or encoding of the stream. Additionally, it may include sending it to a particular socket in a particular format, e.g. to allow a monitor to view the stream. It may even include stripping away superfluous information, e.g. metadata or audio data.
Figure 6 shows an example 600 of the situation in which a compute node 615 processes multimedia which arrives as IP packets 617. As described above, the incoming packets may be IP2VF packets, depending on the context. Without loss of generality, such packets will be referred to in what follows simply as IP packets, the main difference being the specialised internal header which is applied by the first IP2VF module in the network.
The compute node comprises an input gateway IP2VF module 619a running in a container or a virtual machine, for receiving the IP packets. The IP2VF node 619a assembles the incoming IP packets into frames for sending as a multimedia stream. For simplicity, the IP2VF module 619a is shown only outputting one stream, but as described above, a second stream may be generated concurrently.
The multimedia stream is output to a data buffer, e.g. shared memory segment (SHM) 621a, which is shared with a video processing application 632 on the same compute node of the network as the IP2VF process.
Once the packets have been converted to video frames, the video processing step can occur. The video processing application 632 also runs inside a container or a virtual machine, and processes video according to particular processing instructions provided to it by a user of the system. This processing typically occurs frame by frame.
The video processing workload reads an input frame from SHM 621a, processes it, and stores the result in a second SHM 621b. Since processing steps usually use a GPU, which works in ARGB format, while video frames are usually transmitted in YUV format, the processing step also performs a translation between these two formats. The general schema for a processing workload is therefore:
- The CPU of the compute node receives notification from the input IP2VF module 619a that a video frame is available in the input SHM. That is, sufficient IP packets have arrived, in this example, to construct a full frame.
- A GPU forming part of the compute node reads the YUV format frame from the first (input) SHM 621a and performs YUV to (A)RGB conversion on the fly.
- The GPU performs processing on the frame in (A)RGB format.
- The GPU performs (A)RGB back to YUV conversion of the processed frame on the fly, and stores the result in the second (output) SHM 621b.
- The CPU notifies the IP2VF output module 619b that a video frame is available in the second (output) SHM 621b.
- The IP2VF output module 619b sends the video frame to the next stage in the process, and the system starts again from step 1 once the next frame is available in the first SHM 621a.
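The following single-iteration sketch mirrors this schema using NumPy on the CPU in place of a GPU, with interleaved 4:4:4 YUV and BT.601-style full-range conversions for simplicity; the frame dimensions, the banner "processing step", and the omission of the notification mechanism are all simplifying assumptions.

```python
import numpy as np
from multiprocessing import shared_memory

H, W = 1080, 1920  # hypothetical frame dimensions

def yuv_to_rgb(yuv: np.ndarray) -> np.ndarray:
    y, u, v = (yuv[..., i].astype(np.float32) for i in range(3))
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)

def rgb_to_yuv(rgb: np.ndarray) -> np.ndarray:
    r, g, b = (rgb[..., i].astype(np.float32) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return np.clip(np.stack([y, u, v], axis=-1), 0, 255).astype(np.uint8)

# Input and output shared memory segments (SHM 621a and 621b in the text).
shm_in = shared_memory.SharedMemory(create=True, size=H * W * 3)
shm_out = shared_memory.SharedMemory(create=True, size=H * W * 3)
frame_in = np.ndarray((H, W, 3), dtype=np.uint8, buffer=shm_in.buf)
frame_out = np.ndarray((H, W, 3), dtype=np.uint8, buffer=shm_out.buf)

# One iteration: read YUV from the input SHM, process in RGB, write YUV back.
rgb = yuv_to_rgb(frame_in)
rgb[0:80, :, :] = (255, 255, 255)  # stand-in "processing step": a white banner
frame_out[:] = rgb_to_yuv(rgb)     # next, the output module would be notified

del frame_in, frame_out  # release the views before closing the segments
shm_in.close(); shm_in.unlink()
shm_out.close(); shm_out.unlink()
```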
In a second example, shown in Figure 7, a similar situation occurs, but here there are two processing steps occurring on the same compute node. In this case, an IP2VF module is not required between the two processing steps. Instead, the general schema for this compute node is:
- CPU of the compute node receives notification from input IP2VF 717a that a Video Frame is available in the input SHM. That is, sufficient IP packets have arrived, in this example, to construct a full frame.
- A GPU forming part of the compute node reads the YUV format frame from the first (input) SHM 721a and performs YUV to (A)RGB conversion on the fly.
- The GPU performs processing on the frame in (A)RGB format.
- The GPU performs (A)RGB back to YUV conversion of the processed frame on the fly, and stores the result in the second SHM 721b.
- The CPU is notified that a frame is available in the second SHM 721b.
- A GPU forming part of the compute node reads the YUV format frame from the second SHM 721b and performs YUV to (A)RGB conversion on the fly.
- The GPU performs processing on the frame in (A)RGB format.
- The GPU performs (A)RGB back to YUV conversion of the processed frame on the fly, and stores the result in the third (output) SHM 721c.
- The CPU notifies the IP2VF output module 719b that a video frame is available in the third (output) SHM 721c.
- The IP2VF output module 719b sends the video frame to the next stage in the process, and the system starts again from step 1 once the next frame is available in the first SHM 721a.
This approach can be used to chain together as many processing steps as desired on a single compute node. The GPU in step 6 of the above may be the same as the one used in step 2, or it may be a second one, depending on the application. The use of shared memory to transfer the stream to the next stage in a processing pipeline is possible only when the two processing steps are performed on the same compute node. The main limitation on the processing steps which can be performed comes from the capabilities of the GPU(s), so in some cases a compute node may only be able to support one video processing stage. It should be noted that SHM cannot be used to connect either into or out of a virtual machine, meaning that if the video processing is hosted in a virtual machine, IP2VF conversion modules are used to connect the virtual machine to the rest of the processing pipeline.
In any case, when the compute node has performed all the processing steps required of it, it sends the output of the final processing step to a final SHM, which is read by an IP2VF module, configured to convert the stream to IP packets, for transport around the network.
The use of shared memory in this way helps to improve the overall performance of the network, because it reduces the number of copies which need to be made, thereby reducing cache storage, processing etc. For this reason, it is preferred that IP2VF modules have at most two outputs, to reduce the copies generated. However, certain applications may require three or more outputs from a given IP2VF module, which is achieved in a manner analogous with that described above.
Figure 8 shows an IP2VF module 819 in detail. It comprises an input pin 821, a core 827, two output pins 823a and 823b, and a control input 825. As described above, the input pin receives multimedia data in either a frame format, or as IP packets. The core loops over the following cycle:
1. Core 827 reads a new frame from input 821
2. Core converts input frame to desired output
3. Core writes converted frame to an output pin, and optionally to all output pins (2 in this example)
4. Repeat.
The core performs these steps repeatedly, as fast as possible, to keep the flow moving in real time. In this context, real time means with minimal delay: for example, the transfer of a single frame through an IP2VF module does not take more time than the frame is displayed for when the stream is viewed at normal speed (that is, for a 50 frames/second stream, not longer than 1/50 s, or 0.02 s). Ideally, the core completes its cycle and has to wait for a full new frame to be ready at the input pin before it can continue, so the IP2VF module does not introduce any additional delay over the finite processing time inherent in executing the computation steps described herein.
The module 819 receives instructions over control pin 825, which give information about the frame size, resolution, encoding parameters etc., which are useful for knowing when to alert other parts of the system that a full frame is ready (see above in relation to Figs 6 and 7). For example, the frame size and resolution can be used to determine that once a certain amount of data (e.g. number of packets) has been received, a full frame has arrived. This may be used to trigger the sending of a message via the control pin to the next stage in the system that a frame is ready for processing, forwarding, display, storage etc.
In addition, control pin 825 may provide information relating to the conversion required (e.g. from frames to packets or vice versa). For example synchronisation and control instructions may be provided from a user, or automatic control system, which uses a messaging system to send instruction messages. Instruction messages of this type may be formatted in the ZeroMQ format, for example. The core may be instructed to perform the conversion as efficiently as possible, for example to minimise the number of copies of the stream/frame which are generated. Note that the transfer from input pin to output pin(s) does not necessarily mean that a copy has been made. It may simply involve noting that a full frame is ready for sending to the next step, and then providing a shared memory key for that frame to the next stage of the process.
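As a concrete illustration of the control-pin handshake, the sketch below has a controller announce stream parameters as a ZeroMQ message, from which the module derives the frame size and counts incoming payload bytes until a full frame is ready. The disclosure only says that instruction messages may use the ZeroMQ format; the PAIR/inproc transport, the JSON message schema, and the byte-counting logic are assumptions for illustration (requires pyzmq).

```python
import zmq

ctx = zmq.Context()
control_tx = ctx.socket(zmq.PAIR)  # stand-in for the controller
control_rx = ctx.socket(zmq.PAIR)  # the module's control pin 825
control_tx.bind("inproc://control")
control_rx.connect("inproc://control")

# Controller announces stream parameters; the module derives the frame size.
control_tx.send_json({"width": 1920, "height": 1080, "bits_per_pixel": 20})
params = control_rx.recv_json()
frame_bytes = params["width"] * params["height"] * params["bits_per_pixel"] // 8

received = 0
def on_packet(payload: bytes) -> bool:
    """Accumulate payload bytes; return True when a full frame has arrived."""
    global received
    received += len(payload)
    if received >= frame_bytes:
        received -= frame_bytes
        return True  # alert the next stage that a frame is ready
    return False

# 3,703 packets of 1,400 bytes cross the 5,184,000-byte frame boundary once.
assert sum(on_packet(b"\x00" * 1400) for _ in range(3703)) == 1
ctx.destroy()
```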
The output to pins 823 may be IP packets or it may be video frames. Generally, an IP2VF module will convert one of these formats to the other. As shown in Figure 5, however, some nodes may be used solely for creating duplicate streams, and may therefore input e.g. IP packets, and output two (possibly identical, possibly merely similar, e.g. same content at a different resolution) IP packet streams. It should be noted that the IP payload is an IP2VF internal frame format which is transported over IP. In other words, the IP2VF packets transmitted around the network are IP2VF frames. The outputs may each be sent independently to subsequent processing steps, other network locations (i.e. leaving the IP2VF domain entirely, and traversing the internet, or a Local Area Network, etc.), storage, or in some cases (e.g. testing, or when a destination is disconnected), deleted. In some examples, an additional output pin, e.g. an X264 output pin, may be used to generate an output stream.
In the case where the packets leave the IP2VF domain entirely, the last IP2VF node may convert the stream back to SMPTE 2022-6 format which is understood as a suitable format for long range transmission.
Instructions may be received by way of e.g. control panel 212, or they may be read, determined or otherwise supplied with the incoming data stream. For example, it may be possible to determine or read information such as video frame rate; video resolution; multimedia input file format; multimedia output file format; audio format; audio quality; codec usage; multimedia frame size in bytes; or multimedia stream bitrate from the incoming data stream. Such information may be useful to any of the processing steps in the pipeline, and may even be included in the internal IP2VF header in some embodiments. Such information may alternatively be supplied by the control panel or the user. In any event, the instructions may include instructions to change one or more of these things as part of the processing steps to be carried out.
An IP2VF module can be arranged to transfer the multimedia stream between different parts of the system intelligently, so that it reduces computational load as far as possible. For example, an IP2VF module can decide whether to output a frame to a shared memory segment, and then share the shared memory key with the next processing module to push the multimedia stream through the system. In particular, this may be used when the next processing step is hosted on the same compute node. Alternatively, the IP2VF module can choose to write the output frame to the output pins as IP packets, for sending to other processes, located on different compute nodes, connected via an IP network.
Turning now to Fig. 9, there is shown a flow chart 950 describing the process for performing a method as disclosed herein.
At step 951, an incoming multimedia stream is received. This occurs much as described above.
At step 952, instructions are received which specify one or more processing steps to be performed on the multimedia stream. These instructions are received, for example from the control panel 212 of Figure 2, and specify steps which a user wishes to have performed on the multimedia stream. Referring to Fig. 8, the instructions may be sent to modules in the network using control pin 825, for example.
The receipt of instructions specifying the steps to be performed can be in a variety of formats. For example, the instructions may originate from a user, who directly controls the system, providing instructions to the system on the fly. Alternatively, the system may be preconfigured, so that particular nodes are arranged to perform particular processing steps, and continue to perform those steps on multimedia streams which pass through them. In this case, the instructions to the system (or equivalently to each node) may be supplied only once at the start of the process, to configure the system. In yet other embodiments, the instructions may be hard-wired into the nodes. That is, the instructions are received in the form of an internal protocol, e.g. provided as firmware or as an application-specific integrated circuit, and the node operates according to this internal protocol. In some cases, the receipt of instructions may comprise both a user providing the instructions and retrieving the instructions from an internal protocol as set out above. That is, the source of part of the instructions may be internal, and part may be derived from an external source.
For the avoidance of doubt, the use of “instructions” herein includes both a complete set of instructions, as well as parts of such a complete set. For example a full set of instructions may specify each processing step to be performed, in which order, which nodes are to perform each step, and where each node passes the stream to once it has performed its processing steps. Alternatively, the instructions may comprise parts of this full process, for example, they may only relate to a single processing step, for example on a single node. Another example of a partial set of instructions is an instruction to a particular node to output content to a particular location, e.g. shared memory, or to an IP pipeline for transfer to a different node. In this latter case, the partial instruction may further include addressing information of the next node.
At step 953, the processing steps are allocated to compute nodes in the IP network. Examples of such compute nodes are shown in Figs 6 and 7.
Step 954 determines whether one or more criteria are met, for example whether certain processing steps occur on the same compute node.
Lastly at step 955, the transfer of the multimedia stream between consecutive processing steps is handled by either:
(a) writing to, and reading from, a shared memory segment; or
(b) packaging the multimedia stream as IP packets,
depending on which criteria are met. For example, when two steps are handled on the same compute node, shared memory may be used to reduce the number of copies made, and thereby reduce computational complexity.
This process allows the IP2VF domain to efficiently propagate the stream between consecutive nodes by selecting a transfer method which is computationally least intensive for the situation at hand. By consecutive it is simply meant that the user of the system is able to define an ordered sequence of operations to be performed on the stream. Consecutive operations are those directly adjacent to one another in that ordered sequence.
The criteria, as described above, may include whether two consecutive processes are on the same compute node or not. Other considerations may include the network traffic status, e.g. so that nodes can perform load balancing, and/or determinations may be based on the traffic type being handled. The criteria may simply include determining certain properties of the multimedia stream and using these properties to decide how to distribute the processing steps, and how to handle transitions between consecutive processing steps. In some embodiments, the properties of the multimedia stream may even be used to determine the type of operations to be performed, and/or the order in which the various operations are to be performed. Such analysis may be used to improve the computational efficiency of the processing steps and/or to determine the optimal route which the stream takes around the network.
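A minimal sketch of this dispatch decision is shown below; the Step record, node names, and shared-memory key scheme are hypothetical, and a real implementation would also weigh the traffic and stream-property criteria just described.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    node: str  # identifier of the compute node the step is allocated to

def transfer(current: Step, nxt: Step, frame: bytes):
    """Step 955: same node -> hand over an SHM key; different node -> IP."""
    if current.node == nxt.node:
        return ("shm", f"/ip2vf_{current.name}_{nxt.name}")  # shared-memory key
    return ("ip", fragment_for_ip(frame))                     # packetise

def fragment_for_ip(frame: bytes, payload: int = 1400) -> list[bytes]:
    return [frame[i:i + payload] for i in range(0, len(frame), payload)]

a, b, c = Step("decode", "node1"), Step("overlay", "node1"), Step("encode", "node2")
assert transfer(a, b, b"x" * 3000)[0] == "shm"  # consecutive steps, same node
assert transfer(b, c, b"x" * 3000)[0] == "ip"   # consecutive steps, different nodes
```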
These considerations may also be used to determine the distribution of processing steps around the network. Certain nodes may be preferred for certain tasks due to their functionality, or for other reasons, such as load balancing, or planning for other processes to be run in parallel.
In IP networks, single packets can be lost. This applies equally to the IP2VF domain described above. For example, packets may be lost before they even arrive at the IP2VF domain, if they arrive from a network location (the internet, a Local Area Network, etc.). Error correction can be included in general, for example using Forward Error Correction or other error correction systems involving redundancy. However, given the high throughput nature of modern video processing, it may not be viable to spend the time error correcting for missing packets, since a single missing packet in a 50 frame per second video stream accounts for at most 0.02 s of display time from a viewer's perspective. Therefore, in some embodiments, a missing IP packet can be corrected for by simply repeating the previous line. In particular examples, e.g. where many packets from a single frame are lost, the best solution may be to repeat the entire previous frame.
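The line-repetition concealment is cheap to apply once a frame has been assembled with known gaps. The sketch below assumes (hypothetically) that packet loss can be mapped to whole missing lines of a frame held as a NumPy array:

```python
import numpy as np

def conceal_missing_lines(frame: np.ndarray, missing: set) -> np.ndarray:
    """Repeat the previous line for each line whose packets were lost."""
    out = frame.copy()
    for row in sorted(missing):
        src = row - 1 if row > 0 else row + 1  # fall back for the first line
        out[row] = out[src]
    return out

frame = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)
fixed = conceal_missing_lines(frame, missing={100, 101})
# Both lost lines end up as copies of the last good line above them.
assert np.array_equal(fixed[100], fixed[99])
assert np.array_equal(fixed[101], fixed[99])
```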
The IP2VF methods and systems described herein are provided to serve as the transport layer of any multimedia-processing pipeline. They achieve high performance, allowing distribution of UHD video processing across several video-processing nodes, possibly distributed over different CPU cores or over different physical compute nodes.
In addition, the IP2VF system neither provides nor requires any video synchronisation mechanism, nor does it control the video frame rate. That is to say that IP2VF can transport raw video at any frame rate (as long as the underlying hardware allows). The maximum frame rate is limited by the hardware capacity, and not by any overlying system protocol. Limitations can come from the network (Network Interface Controller capacity) limiting the maximum throughput, or from the CPU limiting the speed at which IP2VF modules can process IP packets to rebuild video frames in the server memory.
In classical systems, e.g. Serial Digital Interface (SDI) systems, the video processing is driven by the physical frame rate. Each processing step must be synchronised with the others, meaning that a delay introduced by a processing stage is always an integer multiple of the duration of a frame, as each process is regulated by a tick (e.g. a 50 Hz pulse for a 50 frames/second (FPS) stream).
In IP2VF systems such as that described herein, the video frame rate is controlled by the video-processing application, e.g. by a camera producing a 1080p 50 fps stream. Downstream processing steps introduce a further delay as calculations are carried out. In classical SDI systems, however, any such delay is rounded up to the next nearest tick (i.e. to the next highest 20 ms in a 50 FPS stream). IP2VF systems do not require synchronisation in this way, so they process streams with fewer delays.
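The latency advantage can be made concrete with a small worked example: with three hypothetical processing stages, SDI rounds each stage's delay up to a whole 20 ms tick, whereas the asynchronous pipeline only accumulates the actual processing times.

```python
import math

TICK_MS = 20.0                       # one frame period at 50 fps
stage_delays_ms = [3.0, 7.5, 12.0]   # hypothetical per-stage processing times

sdi = sum(math.ceil(d / TICK_MS) * TICK_MS for d in stage_delays_ms)
ip2vf = sum(stage_delays_ms)
print(f"SDI pipeline latency:   {sdi:.1f} ms")    # 60.0 ms (three full ticks)
print(f"IP2VF pipeline latency: {ip2vf:.1f} ms")  # 22.5 ms
```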

Claims (38)

Claims
1. A method of handling multimedia processing of a multimedia stream in an IP network, the method comprising:
(i) receiving an incoming multimedia stream;
(ii) receiving instructions specifying one or more processing steps to be performed on the multimedia stream;
(iii) allocating the processing steps to compute nodes in the IP network;
(iv) determining whether one or more criteria are met; and
(v) handling transfer of the multimedia stream between consecutive processing steps by:
(a) writing to, and reading from, a shared memory segment; or
(b) packaging the multimedia stream as IP packets,
depending on which criteria are met.
2. The method of claim 1, wherein the criteria include whether consecutive processing steps are allocated to the same compute node.
3. The method of claim 2, wherein, in the event that consecutive processing steps are allocated to the same compute node, the multimedia stream is transferred between the consecutive processing steps by writing to, and reading from, a shared memory segment.
4. The method of claim 2 or 3, wherein, in the event that consecutive processing steps are allocated to different compute nodes, the multimedia stream is transferred between the consecutive processing steps by packaging the stream as IP packets.
5. The method of any preceding claim, wherein receiving instructions comprises receiving instructions from a user.
6. The method of any preceding claim, wherein receiving instructions comprises retrieving instructions from an internal protocol.
7. The method of any preceding claim, wherein the step of receiving instructions includes receiving instructions specifying which compute node is to be used for a particular processing step.
8. The method of claim 7, wherein the instructions specifying which compute node is to be used for a particular processing step include specifying that two processing steps are to be carried out on a single compute node.
9. The method of claim 8, wherein the two processing steps to be carried out on a single compute node are consecutive steps.
10. The method of any preceding claim, wherein a compute node comprises a separate node in the network.
11. The method of any of claims 1 to 9, wherein a compute node comprises a workload encapsulated in a container.
12. The method of any preceding claim, wherein the IP network is configured to transmit and receive IP packets encapsulated in a header, the header including one or more fields for storing information relating to:
processing steps previously carried out;
frame number; and/or
identification of compute nodes which previously carried out processing steps.
13. The method of any preceding claim, wherein step (v) comprises duplicating the stream to provide two or more output multimedia streams, and transferring to the next consecutive processing step in each stream by:
(a) writing to, and reading from, a shared memory segment; and/or
(b) packaging the stream as IP packets,
depending on which criteria are met.
14. The method of claim 13, wherein each next consecutive processing step for the two multimedia streams is different.
15. The method of any preceding claim, wherein the incoming multimedia stream is sourced from one or more of:
a camera;
a microphone;
a stored file;
another multimedia processing system; or
a network location.
16. The method of any preceding claim, wherein the incoming multimedia stream is compliant with the SMPTE 2022-6 standard.
17. The method of claim 16, wherein step (ii) comprises reading incoming IP packets and assembling video frames in real time.
18. The method of any preceding claim, wherein step (b) comprises deconstructing multimedia frames into chunks, and packaging each chunk into an IP packet.
19. The method of any preceding claim, wherein multimedia includes at least one of audio and/or video information.
20. The method of any preceding claim wherein the instructions are provided by a messaging system.
21. The method of claim 20, wherein the instructions are provided in the ZeroMQ format.
22. The method of any preceding claim, wherein the instructions specify one or more of:
video frame rate;
video resolution;
multimedia input file format;
multimedia output file format;
audio format;
audio quality;
codec usage;
multimedia frame size in bytes; or
multimedia stream bitrate
of the multimedia stream, before or after a particular processing step has occurred.
23. The method of any preceding claim, wherein the instructions specify one or more of:
text or video overlays for video streams;
image processing for video streams; or
an audio stream to merge or replace the native audio stream
of the multimedia stream, before or after a particular processing step has occurred.
24. The method of any preceding claim, wherein the transfer in step (v) results in the multimedia stream being:
(1) sent to a monitor;
(2) sent to a loudspeaker;
(3) written to file;
(4) sent to a next consecutive processing step;
(5) sent to a network location; and/or
(6) sent to an X264 output pin.
25. The method of claim 24, wherein sending the multimedia stream to a monitor comprises sending a reduced quality multimedia stream to the monitor.
26. The method of claim 25, wherein reduced quality comprises lowering a frame rate of the multimedia stream, relative to the frame rate of the input multimedia stream.
27. The method of claim 26, wherein the frame rate reduction comprises deleting every nth frame, where n is an integer of two or more.
28. The method of any of claims 25 to 27, wherein the reduced quality comprises lowering a video resolution of the multimedia stream, relative to the video resolution of the input multimedia stream.
29. The method of claim 28, wherein the video resolution reduction comprises deleting pixels from each video frame.
30. The method of any of claims 24 to 29, wherein sending a stream to an output includes converting the multimedia stream to a suitable format for monitoring prior to sending the multimedia stream to the monitor.
31. The method of any of claims 24 to 30, wherein the reduced quality comprises lowering the audio quality of the multimedia stream, relative to the audio quality of the input multimedia stream.
32. The method of claim 31, wherein the audio is downsampled.
33. The method of claim 31, wherein the audio is deleted.
34. The method of any preceding claim, further comprising error correction to correct for missing IP packets.
35. The method of claim 34, wherein the error correction includes duplicating a previous frame, a line of a previous frame, or a packet of a previous frame.
36. A network device configured to perform the method of any of claims 1 to 35.
37. A network configured to perform the method of any of claims 1 to 35.
38. A computer program, computer program product or computer readable medium comprising instructions for implementing the method of any of claims 1 to 35.
Intellectual Property Office
Application No: GB 1618438.4 Examiner: Mr Andrew Stephens
GB1618438.4A 2016-09-07 2016-11-01 Multimedia processing in IP networks Withdrawn GB2553597A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2017/046927 WO2018048585A1 (en) 2016-09-07 2017-08-15 Multimedia processing in ip networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US201662384546P 2016-09-07 2016-09-07

Publications (2)

Publication Number Publication Date
GB201618438D0 GB201618438D0 (en) 2016-12-14
GB2553597A true GB2553597A (en) 2018-03-14

Family

ID=57963710

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1618438.4A Withdrawn GB2553597A (en) 2016-09-07 2016-11-01 Multimedia processing in IP networks

Country Status (2)

Country Link
GB (1) GB2553597A (en)
WO (1) WO2018048585A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090031119A1 (en) * 2007-07-25 2009-01-29 Siemens Aktiengesellschaft Method for the operation of a multiprocessor system in conjunction with a medical imaging system
US20120017062A1 (en) * 2010-07-19 2012-01-19 Advanced Micro Devices, Inc. Data Processing Using On-Chip Memory In Multiple Processing Units

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6275536B1 (en) * 1999-06-23 2001-08-14 General Instrument Corporation Implementation architectures of a multi-channel MPEG video transcoder using multiple programmable processors
US20060103736A1 (en) * 2004-11-12 2006-05-18 Pere Obrador Sequential processing of video data
US20060133513A1 (en) * 2004-12-22 2006-06-22 Kounnas Michael K Method for processing multimedia streams
US8392497B2 (en) * 2009-11-25 2013-03-05 Framehawk, LLC Systems and algorithm for interfacing with a virtualized computing service over a network using a lightweight client

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090031119A1 (en) * 2007-07-25 2009-01-29 Siemens Aktiengesellschaft Method for the operation of a multiprocessor system in conjunction with a medical imaging system
US20120017062A1 (en) * 2010-07-19 2012-01-19 Advanced Micro Devices, Inc. Data Processing Using On-Chip Memory In Multiple Processing Units

Also Published As

Publication number Publication date
WO2018048585A1 (en) 2018-03-15
GB201618438D0 (en) 2016-12-14

Similar Documents

Publication Publication Date Title
CN110402581B (en) System and method for controlling media content capture for live video broadcast production
US20180199082A1 (en) Providing low and high quality streams
US12108097B2 (en) Combining video streams in composite video stream with metadata
EP2675129B1 (en) Streaming media service processing method
CN108476220A (en) Use the peripheral bus video communication of Internet protocol
RU2634206C2 (en) Device and method of commutation of media streams in real time mode
US20180367465A1 (en) Transmission and Reception of Raw Video Using Scalable Frame Rate
EP2135100B1 (en) Converting video data into video streams
US11039200B2 (en) System and method for operating a transmission network
US20180376181A1 (en) Networked video communication applicable to gigabit ethernet
KR101085554B1 (en) Real-time input/output module system for Ultra High-Definition image
US9693011B2 (en) Upgraded image streaming to legacy and upgraded displays
GB2553597A (en) Multimedia processing in IP networks
Koyama et al. Implementing 8k vision mixer for cloud-based production system
Hudson et al. UHD in a Hybrid SDI/IP World
Koyama et al. Implementation of 8K vision mixer that transports 8K image as multiple 2K SMPTE ST 2110-20 flows
Han et al. An implementation of capture and playback for ip-encapsulated video in professional media production
US12015794B2 (en) Method and apparatus for content-driven transcoder coordination
JP5367771B2 (en) Video transmission system
US20220398216A1 (en) Appliances and methods to provide robust computational services in addition to a/v encoding, for example at edge of mesh networks
Lobov et al. Media server for video and audio exchange between the U-70 accelerator complex control rooms
CN116599941A (en) Fused media protocol conversion method, device and control system
Hudson et al. UHD in a Hybrid SDI/IP World
Lee et al. Uncompressed HD video transport system with real-time playout

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)