GB2483250A - Producing edit decision list by comparing edited video with source video - Google Patents

Producing edit decision list by comparing edited video with source video

Info

Publication number
GB2483250A
GB2483250A GB1014494.7A GB201014494A
Authority
GB
United Kingdom
Prior art keywords
data
frame
source
video
comparison
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1014494.7A
Other versions
GB201014494D0 (en)
Inventor
Jonathan Mckinnell
John Fletcher
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Broadcasting Corp filed Critical British Broadcasting Corp
Priority to GB1014494.7A priority Critical patent/GB2483250A/en
Publication of GB201014494D0 publication Critical patent/GB201014494D0/en
Publication of GB2483250A publication Critical patent/GB2483250A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/034Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Edit decision list (EDL) data 44 is produced by comparing 43 an edited mixer video output, derived by selecting one of multiple video sources at any given time, against each of the sources to determine the times at which each source was selected. The comparison can use reduced data indicating characteristics of each frame, in particular a signature of each frame derived from average luminance and texture values. The average luminance values are for the frame as a whole, or for portions of each frame, and are calculated using multiple different masks. The comparison algorithm includes a threshold arranged to determine that a given source is selected at a given time if the signatures from the edited video and the signatures from one of the sources match within the threshold limit. The method may be used for recording a rough cut EDL from a live vision mix where the source video streams are from respective cameras.

Description

METHOD AND APPARATUS FOR PRODUCING EDIT DECISION LIST DATA
BACKGROUND OF THE INVENTION
The present invention relates to a method and apparatus for producing edit decision list data.
An edit decision list is a list of the times at which an editor chose to select from one of a number of sources to produce an edited video programme. Using such an edit decision list, the edited programme produced may be recreated from the original source video data by switching between the sources at the times indicated in the edit decision list.
A typical multi-camera production involves multiple synchronised sources ('genlocked' cameras) between which cut decisions are made in real time using a vision mixer, a recording and storage system and a post-production online editing system to produce the final program. It is common to record the separate sources in addition to the vision mixer output, so that cut decisions can be revised or adjusted after recording during post production.
Tapeless recording systems can save time and money by removing the need to ingest the material off the tapes digitally after a production for the post-production stage. The vision mixer's primary use in multi-camera productions is to select sources for real time playback and recording. Metadata can also be recorded during production by a tapeless production system either through interaction with a Graphical User Interface (GUI) (for example for logging and recording cut points) or through automation such as creating camera cut points in an edit decision list (EDL) by parsing messages from a hardware router or vision mixer. For example, the multi-camera director's cut decisions can be recorded in real time by a tapeless recording system as a rough cut by recording the vision mixer output via a control output cable. However, these messages and the physical connectivity of the hardware devices to the tapeless production system can vary.
It is not always possible to record an edit decision list at the point of recording each of multiple camera sources and creating an edited cut version, either because the recording hardware does not provide an appropriate output or simply because of a lack of connectivity between devices. An edit decision list can refer to any list of instructions for making an edited sequence and can include describing how certain sections of an interview are cut out. The main example, though, is switching between cameras, and we have appreciated that it is in this arrangement that a lack of connectivity between devices creates a problem in creating an edit decision list at the point of recording.
SUMMARY OF THE INVENTION
We have appreciated that it is not in fact necessary to record an edit decision list at the point of creating an edited version from multiple input video sources.
In broad terms, the invention relates to an apparatus and method for producing edit decision list data by comparing an edited video output, such as the output from a vision mixer with each of multiple sources of video, such as multiple individual camera outputs.
The invention is defined in the claims to which reference is now directed.
In an embodiment of the invention, a system for producing edit decision list data has an input for receiving multiple streams of data, each stream of data being derived from a respective video source. The streams of data may be directly derived from video sources, such as video cameras, but preferably each stream of data is derived from a respective source by reducing data from each frame of video such that the data denotes a characteristic of each frame of each source. The system also has an input for receiving an edited stream of data, the edited stream of data being derived from selecting one of the respective video sources at any given time. The edited data therefore represents a video stream as created by a director by selecting from one of multiple sources, for example using a vision mixer, to select from one of multiple cameras. The edited stream may similarly be derived by reducing data from edited video frame by frame to denote some characteristic of each video frame in the same manner as performed for each of the multiple streams.
A comparator is then arranged to compare the edited stream of data and each of the multiple streams of data to determine the times at which each of the respective sources was selected. The times may be relative to any suitable reference, but will preferably be time codes as used in each of the video data sources. An output is then arranged to produce edit decision list data from the times derived by the comparator. The edit decision list data may be in any appropriate format, but will typically comprise a list of time codes and the source selected at that time code.
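By way of illustration only, a minimal sketch of such edit decision list data is shown below, represented as a list of timecode and source pairs. The field names, the timecode format and the final serialisation (for example to AAF or XML) are assumptions made for the example; the text above only requires a list of time codes and the source selected at each time code.

```python
# Illustrative sketch only: edit decision list data as a list of
# (timecode, selected source) entries. Field names and the timecode
# format are assumptions, not a format defined in this document.
edl = [
    {"timecode": "10:00:00:00", "source": "camera 1"},
    {"timecode": "10:00:12:05", "source": "camera 3"},
    {"timecode": "10:00:31:17", "source": "camera 2"},
]

for entry in edl:
    print(f"cut to {entry['source']} at {entry['timecode']}")
```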
As noted above, whilst the system embodying the invention may directly compare video frames of multiple sources to the edited video, we have appreciated that it is computationally more efficient to derive one or more characteristics of each video frame in the sources to produce a digest, signature or similar from each frame being a significantly smaller amount of data yet remaining distinctive in comparison to equivalent data from other sources at the same point in time. The comparator may then compare such signatures allowing the comparison to be computationally simpler than comparing whole frames of video. This approach also reduces the amount of data that needs to be transmitted throughout a system embodying the invention, as the signature from each frame may be derived at a local store and transmitted to a central comparator, the transmission of the signature using significantly less bandwidth than transmitting the entire video frames.
The signature of each frame may comprise any appropriate characteristic that may be denoted by significantly less data than an entire image frame, provided that the characteristic is sufficiently distinctive from characteristics from other sources such that, when performing the comparison, one source may be distinguished as a better match than other sources. The signature preferably comprises one or more of the average luminance over a frame and average texture data.
The comparator preferably determines which source is selected by determining whether the match between the edited stream and a given one of the multiple streams is above a specified threshold. The threshold may be adjustable so as to take account of differences in the video stream that may be introduced in the production of the edited video, such as colour, luminance or overlay data introduced by a vision mixer.
The invention may also be embodied in a method for producing edit decision list data operable either on dedicated hardware or on a general purpose processor.
The invention may also be embodied in programme code for executing such a method.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will now be described in more detail by way of example with reference to the drawings, in which:
Figure 1 is a diagram of a typical known approach to multi-camera production;
Figure 2 is a diagram showing a tapeless production workflow;
Figure 3 is a diagram showing the main functional components of a system embodying the invention;
Figure 4 shows each of the functional components of Figure 3 in greater detail showing the scalability of the system; and
Figure 5 shows texture responses derived from an example image frame.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The system embodying the invention that will be described relates to a tapeless recording system, named Ingex herein, which comprises a computer with multiple Serial Digital Interface (SDI) video I/O cards for capturing multiple feeds of video and storing these to disc. The invention may equally be embodied in other systems. The use of a tapeless recording system can help overcome physical connectivity problems of existing vision mixers by allowing direct comparison of data derived from video feeds from cameras to data derived from the feed from the vision mixer. In this way, it is not necessary to interface to a specific vision mixer, and the system may therefore accommodate use with multiple different vision mixer manufacturers. The output of the system is metadata that may be referred to as Cut Decision data or edit decision list (EDL) data that may be provided in an appropriate interchange format such as AAF or XML file formats which may then be used by a variety of post production systems. The functionality of the comparator within the system will be described in terms of an algorithm which is implemented in such a manner that it is scalable in terms of the number of cameras used in the system (and hence the number of recording systems needed to record those cameras). In order to facilitate understanding of the system, a typical multiple camera system with which the invention may be used will first be described with respect to Figure 1.
The multi-camera production arrangement shown in Figure 1 comprises multiple cameras 8 each providing an SDI video feed to inputs 11 to a vision mixer 10 which may be used by a director to produce an output video stream on SDI video mixer output line 13 to a recorder 12. The recorder 12 then records the edited video which comprises video selected from one of the cameras 8 at any given time. In addition, the recorder 12 receives a stream of video direct from each camera 8 at inputs 9 and also records these raw video streams. The edited video and streams of video from each camera may be recorded as uncompressed video, but typically will be recorded as a compressed version. Optionally, the vision mixer may have a control output 17 which provides metadata which would also be recorded by the recorder 12. The recorder 12 can produce an output 15 of the recorded data for use by post production system 14. The post production system 14 may thereby retrieve and process each of the video streams from the cameras 8 as well as the edited video streams from the vision mixer 10. However, for the reasons previously described, the vision mixer 10 in such a system may not produce an appropriate control output and so the edit decisions made using the vision mixer 10 may be lost.
The edit decision data may be derived using the method and system of the present invention using a workflow as shown in Figure 2. In the example of Figure 2, there are two streams of audio-video data on two separate cameras provided at inputs 21 to respective recorders 22. The preferred recorders use the "Ingex" technology, and will be referred to as Ingex recorders or Ingex boxes for ease of reference, but other recording technologies are perfectly useable in the system embodying the invention. Hard disc drives are provided as part of the Ingex recorders 22. A controller PC 30 may be used to control the recorders over a network 29.
The number of video sources which can be captured and recorded by one Ingex box is dependent upon the computational power of the Ingex box (as well as the limitation of the number of Serial Digital Interface (SDI) capture cards which can slot into the motherboard). Typically each Ingex recorder has 2 High Definition Serial Digital Interface (HD SDI) feeds or 4 Standard Definition Serial Digital Interface (SD SDI) feeds per box. As an example, in current productions 4 SD SDI camera feeds are required to be recorded in addition to the SD SDI video feed from the vision mixer.
The system may compare the video feeds directly (using for example computer vision techniques) to reliably and frame accurately match the vision mixer output to one of the camera video feeds. In order to match the signals to the mixer out video, the signals (or some proxy "signatures" for each frame) from each source video and the mixer out video must be conveyed to a single comparison machine (which could be one of the recorders).
Preferably, the system uses proxy "signature(s)" for the frames to ensure that network bandwidth is not an issue, as may well be the case if the entire video frames had to be available for comparison on one Ingex box (especially for HD content). Additionally, this ensures that allocation of memory on the Ingex box making the comparison is not an issue.
Given the variety of different vision mixers in use on productions, another desirable feature of the proxy signature comparison algorithm is that it does not require the video feed from the mixer out to be exactly the same as one of the camera video feeds (which cannot be guaranteed as some minor distortion or image processing algorithm may have been executed on the video feeds inside the vision mixer).
This approach also allows for more general use cases which will be discussed later.
The process blocks implemented by the invention are shown in Figure 3. At block 40, a signature for each frame of an image in each of four SDI video feeds is calculated. The preferred signature will be described later, but may be any appropriate digest of a video frame which provides a reduced quantity of data to represent some characteristics of the image. Each signature should be sufficiently distinctive of the underlying image frame so that it may be distinguished from signatures of frames taken at the same time from other sources. At block 42, a signature is calculated for each frame of the edited or mixed SDI output. The signature should be produced using the same algorithm or technique as the signatures for each frame of the four SDI camera feeds, or alternatively, a different signature technique may be used provided that this is accommodated in the comparison algorithm used. The signatures for each frame along with the time code of each frame are provided to a comparator block 43 implemented by a comparison algorithm. The comparison algorithm may be implemented on one of the Ingex recorders or may be a separate device receiving the signature data via a network 45. Referring back to Figure 2, briefly, the alternative in which the comparison algorithm is implemented on one of the Ingex recorders is shown with the signatures being distributed between the Ingex recorders by multicast shown as connection 27. Referring back to Figure 3, the algorithm in the comparison block 43 performs a comparison for each signature from a given time code of each of the four streams against the signature from the mixed stream and provides output metadata denoting where cuts have been made in producing the mixed output and saves this metadata 44 in a database or output file. The edit decision list data may be recorded in a variety of possible formats. Preferably, the data is stored in some form of database allowing later retrieval as an AAF or XML file which may be generated from the data in the database.
The overview of the technique is outlined in Figure 3 and consists broadly of the following steps:
* For each feed going into an Ingex recorder, an image signature is calculated for each frame. This is done for the multiple camera source feeds and one mixer out feed.
* The signatures from each recorder in the system are collected together and compared.
* For each frame, a decision is made as to which camera source matches the mixer out (if any), and recorded as cut metadata with the matching timecode.
The frame matching algorithm used in the embodiment for a multi-camera shoot is now described in more detail.
The Capture process captures live SDI feeds in real time using a capture card and software to place the frames into a shared memory buffer on each Ingex Recorder.
Each Ingex Recorder is capable of recording 4 Standard Definition SDI feeds. If more than four SD SDI feeds are in the multi-camera shoot, a second (and third etc) Ingex Recorder can be used.
During the Capture process, or later, the signature responses (Image Signatures) of each frame are derived for each camera feed and for the SDI feed from the "Vision Mixer". In total, 10 signatures are derived for each frame, comprised of:
a) Average luminance over a frame.
b) Average over a frame of the absolute value of the nine Laws' 3 by 3 texture mask response energies over the luma values of the image. The nine 2D masks are formed by the set of outer products of the average mask (L3) [1 2 1]; the edginess mask (E3) [-1 0 1]; and the spottiness mask (S3) [-1 2 -1]. This set of nine masks forms an orthogonal basis for texture on the Cartesian x-y grid. The Laws' masks are known to the skilled person.
An example of one of the nine 3 by 3 texture masks is shown below, formed from the outer product of L3 and S3:

    [1]                 [-1  2 -1]
    [2] x [-1 2 -1]  =  [-2  4 -2]
    [1]                 [-1  2 -1]

The absolute value of this mask response is taken at each pixel in a frame (taking care of image boundary conditions) and the average over the image is the L3S3 signature response for that frame.
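A minimal sketch of how the ten per-frame signature values described above could be computed is given below, assuming the frame's luma plane is available as a NumPy array. The use of SciPy's convolution routine and "nearest" boundary handling are implementation assumptions; the text only notes that image boundary conditions must be taken care of.

```python
import numpy as np
from scipy.ndimage import convolve

# Laws' 1D kernels: average (L3), edginess (E3) and spottiness (S3).
L3 = np.array([1, 2, 1])
E3 = np.array([-1, 0, 1])
S3 = np.array([-1, 2, -1])
KERNELS = [("L3", L3), ("E3", E3), ("S3", S3)]

def frame_signature(luma):
    """Return the ten signature values for one frame of luma data:
    the average luminance plus the mean absolute response of each of
    the nine Laws' 3x3 masks (outer products of L3, E3 and S3)."""
    luma = luma.astype(float)
    signature = [float(luma.mean())]          # a) average luminance over the frame
    for _, a in KERNELS:
        for _, b in KERNELS:
            mask = np.outer(a, b)             # e.g. outer(L3, S3) gives the matrix shown above
            response = convolve(luma, mask, mode="nearest")
            signature.append(float(np.abs(response).mean()))  # b) mean |response| over the frame
    return signature
```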
The averages are stored as integers, together with a standardised timecode for the frame and any other relevant frame data, as a frame signature packet (which corresponds to the frame itself) in a shared memory first in first out buffer. The frame signature packets are then multicast using UDP packets across a network using, for example, an Ethernet connection. These multicast streams are scalable in terms of the number of cameras in a multi-camera shoot because, to record a greater number of cameras, more Ingex Recorders are attached to the Ethernet network.
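A minimal sketch of assembling and multicasting a frame signature packet is shown below, assuming a simple fixed binary layout of a source identifier, a frame-count timecode and the ten signature averages stored as integers. The packet layout, multicast group address and port are illustrative assumptions rather than values taken from this document.

```python
import socket
import struct

# Assumed multicast group and port for the example; not specified in the text.
MULTICAST_GROUP = ("239.255.0.1", 5000)

def pack_signature(source_id, timecode_frames, signature):
    """Pack one frame signature packet: a source identifier, a timecode
    expressed as a frame count, and the ten signature averages stored as
    integers, as described above."""
    values = [int(round(v)) for v in signature]
    return struct.pack("!II10i", source_id, timecode_frames, *values)

def open_multicast_sender():
    """Open a UDP socket suitable for multicasting signature packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # keep packets on the local network
    return sock

# Usage sketch: one packet per frame per feed (placeholder signature values).
sock = open_multicast_sender()
sock.sendto(pack_signature(source_id=1, timecode_frames=900000, signature=[16] * 10),
            MULTICAST_GROUP)
```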
The Comparison Algorithm is then run on one of the Ingex Recorder machines (or a separate machine) in the system, which, as well as recording the video feeds captured by that recorder, makes a connection to the multicast stream of UDP frame signature packets coming from each camera and the vision mixer output.
These frame signature packets are then placed into a second, separate, shared memory buffer on that Ingex Recorder.
The signature responses of the camera feeds are then analysed and compared with the signature responses of the mixer out feed for a particular timecode. By sorting the shared memory buffer into a list, a recursive algorithm (which stops searching as soon as timecodes from all feeds are matched) can be used to match the timecode across the camera feed and mixer out frame signature packets. This is advantageous because if the separate multicast frame signature packet streams drift against each other over time (or if one or more multicast streams are lost intermittently for whatever reason) the matching algorithm can still record the correct timecode by comparing previous frames in the buffer.
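A much simplified sketch of this step is shown below. It does not reproduce the recursive buffer search described above, only the idea of collecting the camera and mixer-out packets that share a timecode before they are compared; the dictionary form of a received packet and the feed names are assumptions made for the example.

```python
from collections import defaultdict

def index_by_timecode(packets):
    """Index received frame signature packets by timecode and feed name.
    Each packet is assumed to be a dict with 'timecode', 'feed' and
    'signature' keys."""
    by_timecode = defaultdict(dict)
    for p in packets:
        by_timecode[p["timecode"]][p["feed"]] = p["signature"]
    return by_timecode

def collect_frame(by_timecode, timecode, camera_feeds, mixer_feed="mixer"):
    """Return (mixer-out signature, {camera feed: signature}) for one
    timecode, or None if the mixer-out packet or all camera packets are
    missing; in that case the caller simply moves on to the next timecode."""
    frame = by_timecode.get(timecode, {})
    if mixer_feed not in frame:
        return None
    cameras = {feed: frame[feed] for feed in camera_feeds if feed in frame}
    return (frame[mixer_feed], cameras) if cameras else None
```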
If the signatures of a camera are within a tolerance value of the signatures of the mixer out, and the camera has changed from the previous cut decision, then a cut decision is recorded as raw data. This data can then later be recorded and associated with the media such that it will then appear in a multi-camera sequence as a cut in the edit decision list in post-production using an appropriate open standards based recording program (for example in AAF or XML format). An example of the comparison algorithm will be discussed in more detail later.
The choice of multicasting the signatures (and the use of the UDP network protocol) can have advantages in a real time production environment. It is observed that connectivity can be interrupted, especially when a recording is not taking place. This can be due to reorganisation (e.g. of cameras) or uncontrolled movements within the studio leading to wires being unplugged and re-plugged or a connection lost for a fraction of a second. Therefore, if for any reason a signature or timecode for a certain frame has been corrupted, or cannot reliably be frame matched, the algorithm's response is to move onto the next timecode. This is useful in real time applications, as the correct cut decision will still be identified, but a frame late (or later depending on the loss of connectivity time). This is preferable to no cut decision being identified. This also ensures that future cut decision identification is not affected by a previous loss of connectivity.
The scalability of the embodying system is demonstrated in Figure 4 which shows the arrangement of Figure 2 in greater detail with like components given the same reference numerals. As before, inputs 21 to respective recorders 22 provide SDI video feeds from cameras and an SDI feed from a vision mixer. The output of each recorder is provided over a multicast network 29 either to a separate device or from one box to another box which has the functionality for a comparison block 43 to produce the edit decision list output data 44 as already described. The comparison block is fed by a shared memory buffer 50 as described above.
Each recorder 22 has the functionality to provide the image signatures: an image is captured in block 51, the signatures are produced in block 54, and a separate time code block 55 takes a time code from a common source. Typically, in a multi-camera environment a common time code source will be defined, which may be one of the cameras or a separate source used to provide the time code to each camera and also to provide frame synchronisation of each camera. In the absence of a time code provided by the video data sources, a time code may be derived within the recorder in the time code block 55. The image signatures for each frame and the accompanying time code for that frame are provided to the frame signature block 55, which is assembled along with a YUV frame 52 into a frame 53 and provided to a shared memory buffer 58 and then recorded on a hard disc or other recorder 57. The signature UDP packets 59 described above are then output from the shared memory buffer and multicast as already described.
A particular advantage showing the scalability is that the computationally expensive process of deriving the signatures may be distributed amongst multiple respective hardware boxes, rendering a smaller amount of data to be transmitted across the network 29. For each extra source or group of sources added, an extra recorder box may be added to the system with the accompanying functionality to derive the image signatures. In theory, a large number of separate sources may therefore be added to the system, each having additional hardware for deriving the image signatures.
The example comparison algorithm will now be described in further detail. For ease of understanding of the algorithm, the nine separate texture masks are visually represented by the results of applying a mask to an image as shown in Figure 5. As can be seen, each mask produces a different visual representation from an image. The average luminance is then taken for that image rendering a single numerical value for each of the nine masks plus a numerical value for the average luminance of the frame giving a sequence of ten numbers and an appended time code for transmission across the network and for comparison within the comparison algorithm.
An example of the comparison algorithm for a particular matching time code is:
1) If the average percentage difference of the nine texture response signatures AND the percentage difference of the luminance response signature is less than the threshold (typically around 1%) then a potential match for the corresponding source to the mixer out video feed has been found.
2) If more than one potential match for that frame is found then the source which gives the least average percentage differences in texture and luminance is selected as having the highest confidence value, provided that the potential matches do not have very similar percentage differences (within a second threshold).
3) If more than one potential match for that frame is found with equivalent percentage differences for both luminance and texture, i.e. differing by less than the second threshold, then the first source is selected to avoid multiple cuts being recorded in the case of two or more identical sources matching the mixer out.
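A minimal sketch of this per-timecode decision is given below, assuming each signature is supplied as a list of ten values with the luminance value first and the nine texture responses after it, and that each percentage difference is taken relative to the mixer-out value. The tie threshold value and the way the texture and luminance differences are combined into a single score for ranking candidates are assumptions made for illustration; the check that the camera has changed since the previous cut decision is left to the caller.

```python
def percentage_difference(value, reference):
    """Percentage difference of a source signature value against the
    corresponding mixer-out value."""
    if reference == 0:
        return abs(value - reference) * 100.0
    return abs(value - reference) / abs(reference) * 100.0

def match_source(mixer_sig, source_sigs, threshold=1.0, tie_threshold=0.1):
    """Decide which source, if any, matches the mixer-out frame.
    'mixer_sig' and each value of 'source_sigs' are [luminance, T1..T9];
    'source_sigs' preserves source order. Returns a source name or None."""
    candidates = []
    for name, sig in source_sigs.items():
        lum_diff = percentage_difference(sig[0], mixer_sig[0])
        tex_diff = sum(percentage_difference(s, m)
                       for s, m in zip(sig[1:], mixer_sig[1:])) / 9.0
        # Step 1: both the average texture difference and the luminance
        # difference must be below the threshold for a potential match.
        if tex_diff < threshold and lum_diff < threshold:
            candidates.append((name, tex_diff + lum_diff))
    if not candidates:
        return None
    # Steps 2 and 3: prefer the closest match, but if the leading candidates
    # differ by less than the second threshold take the first source found,
    # so that identical sources do not cause spurious cuts.
    best = min(score for _, score in candidates)
    for name, score in candidates:
        if score - best < tie_threshold:
            return name
```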
The algorithm described should ensure that the correct source is selected in a variety of situations. For example, a single source may have been recorded twice by two different recorders. The algorithm thereby needs a mechanism to select one or the other and to avoid swapping between the two different sources if, in fact, no edit has been made. If a small amount of noise has been introduced in the recordings, it would be possible for one source to match and then another in quick succession. To avoid this, the algorithm selects the first source that meets the matching criteria.
Whilst the signatures derived from each image have so far been described in terms of producing a signature for each entire frame, it is also possible to derive a signature from a portion of each frame so that multiple signatures are provided for each frame. One example would be to separate the top, middle and bottom portion of a frame and derive a set of ten (nine masks plus average luminance) signatures for each portion of the frame giving thirty signatures for each frame.
These may then be compared and matched against similar signatures for the edited video. Using this technique, spurious differences may be eliminated, such as sub-titles introduced by a mixer, which may be accommodated in the comparison algorithm by determining that a match on the top two sections of the image may be sufficient if no other match is found. The images may be split into other such regions, such as other numbers of horizontal bands or vertical bands, or a combination of both, to give separate square or rectangular regions.
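As a sketch of this variant, the frame could be divided into equal-height horizontal bands and the ten signature values computed per band, re-using the frame_signature() sketch given earlier. The equal band heights are an assumption; the text only refers to top, middle and bottom portions (or other rectangular regions).

```python
import numpy as np

def region_signatures(luma, bands=3):
    """Split a luma frame into 'bands' horizontal strips (top, middle and
    bottom for bands=3) and compute the ten signature values for each strip,
    giving bands * 10 values per frame. Re-uses frame_signature() from the
    earlier sketch."""
    edges = np.linspace(0, luma.shape[0], bands + 1, dtype=int)
    return [frame_signature(luma[edges[i]:edges[i + 1], :])
            for i in range(bands)]
```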
The comparison algorithm preferably implements a threshold such that a match based on a confidence level must be found with a frame above a given threshold for the frame to be considered a candidate match. If two different sources provide frames which are above this threshold in terms of the confidence of a match, then the one with the higher level of confidence is selected as the source and the edit decision list created accordingly.
In the event that a given camera source is deemed to be a match and then subsequently frames for that camera are not received for some reason during which period no match is found, and then frames are subsequently received from that source and matched, the algorithm determines that no edit has been made and determines that a drop in the signal has occurred.
An example of how the algorithm behaves for an example set of data is shown below. Consider, for example, the case shown below, where T stands for a texture response percentage difference between source and mixer out and L for the corresponding luminance percentage difference. In this case, consider the threshold to be 1%. In this case source 1 is found to be a match for the vision mixer.

           T1     T2      T3     T4    T5     T6     T7     T8     T9     AvgT    L
Source 1   0.1%   0.01%   0.2%   1%    1.5%   0.5%   0.6%   0.4%   0.8%   0.57%   0.01%
Source 2   20%    15%     5%     30%   30%    60%    35%    60%    50%    34%     30%

Consider the second case below. In this case no match is found for the vision mixer at this frame because the luminance value is over the threshold for both sources.

           T1     T2      T3     T4    T5     T6     T7     T8     T9     AvgT    L
Source 1   0.1%   0.01%   0.2%   1%    1.5%   0.5%   0.6%   0.4%   0.8%   0.54%   1.1%
Source 2   20%    15%     5%     30%   30%    60%    35%    60%    50%    34%     30%

In the third case below, both sources could be matches, but the algorithm selects the first matching source, namely source 1.

           T1     T2      T3     T4    T5     T6     T7     T8     T9     AvgT    L
Source 1   0.1%   0.01%   0.2%   1%    0.5%   0.5%   0.6%   0.4%   0.8%   0.54%   0.1%
Source 2   0.4%   0.01%   0.3%   1%    5%     0.5%   0%     0.5%   0.8%   0.54%   0.9%

Some advantages of the described algorithm will now be set out for completeness of understanding.
Consider the case where the vision mixer signal has been compressed, and/or the camera video feeds have been compressed, using some standard compression format, and therefore the vision mixer output does not exactly (bit for bit) match that of any of the camera feeds. Due to the use of computer vision techniques (specifically texture and brightness) to match the camera signals, the vision mixer output will still be matched to the correct camera and the EDL is therefore still recorded correctly. This is because the proxy signatures are matched within a small tolerance value for each frame and individual pixels are not matched at specific positions of the camera feeds to the vision mixer output feed. In this way the relevant information content of the frames is the brightness and texture of the image as a whole rather than the values of individual pixels.
* Consider the case where the output from the vision mixer is of a different resolution to the camera video feeds. Again, because the proxy signatures are matched and not individual pixels, the correct camera feed is recorded as being the output of the vision mixer.
* Consider the case where the output from the vision mixer has been line shifted or cropped by some small value. Using the proxy signature matching technique the correct camera output is matched to the vision mixer output.
* Consider the case where the output from the vision mixer has some graphic painted over a certain position in the image (for example a burnt-in timecode) and it is desirable to still match the output from the vision mixer with the correct camera feed even though the vision mixer output is visually different from any camera feed. The algorithm can still be used, with each frame of the image now being split into (e.g.) 16 rectangular regions, each with the 10 corresponding proxy signatures. If a sufficient number of these segments of a camera feed's frame are matched to the corresponding segments of the mixer out feed's frame then the camera matching algorithm can still be used to record a cut decision.
Consider the case where the output from the vision mixer has been colour graded and it is still desirable to be able to match the vision mixer output to one of the cameras. This can be achieved by allowing different threshold values to be applied to the texture proxy signatures for the frame and the intensity proxy signature for the frame.

Claims (16)

  1. An apparatus for producing edit decision list data, comprising: an input for receiving multiple streams of data, each stream of data being derived from a respective video source; an input for receiving an edited stream of data, the edited stream of data being derived from selecting one of the respective video sources at any given time; a comparator arranged to compare the edited stream of data and each of the multiple streams of data to determine the times at which each of the respective sources was selected; and an output arranged to produce edit decision list data from the times derived by the comparator.
  2. An apparatus according to claim 1, further comprising means for producing the multiple streams of data and the edited stream of data by producing a reduced data version of each frame of the respective multiple streams and edited stream.
  3. An apparatus according to claim 2, wherein the reduced data comprises data representing a characteristic of each frame.
  4. An apparatus according to claim 2 or 3, wherein the reduced data comprises deriving a value for one or more signatures for each frame.
  5. An apparatus according to claim 2, 3 or 4, wherein the reduced data is produced for one or more portions of each video frame.
  6. An apparatus according to any preceding claim, wherein the comparator is arranged to indicate an edit time if the results of the comparison for a given frame indicates a different source is matched, than the results of the comparison for the previous frame.
  7. An apparatus according to any preceding claim, wherein the comparator indicates that a given source is selected if the comparison for that source provides a result above a threshold level.
  8. An apparatus according to claim 7, wherein the comparator determines that a given source is selected if that source has the highest such comparison result.
  9. A method for producing edit decision list data in a recording system comprising: producing multiple streams of data, each stream being derived from a respective video source; producing an edited stream of data, the edited stream being derived from selecting one of the respective video sources at any given time; asserting a comparison process in which the edited stream of data is compared to each of the multiple streams of data to determine the times at which each of the respective sources was selected; and producing an edit decision list data output from the times derived by the comparison.
  10. A method according to claim 9, wherein producing the multiple streams of data and the edited stream of data comprises producing a reduced data version of each frame of the respective multiple streams and edited stream.
  11. A method according to claim 10, wherein the reduced data comprises data representing a characteristic of each frame.
  12. A method according to claim 10 or 11, wherein the reduced data comprises a value for one or more signatures for each frame.
  13. A method according to claim 10, 11 or 12, wherein the reduced data is produced for one or more portions of each video frame.
  14. A method according to any of claims 10 to 13, wherein the comparison process indicates an edit time if the result of the comparison for a given frame indicates a different source is matched, than the results of the comparison for the previous frame.
  15. A method according to any of claims 10 to 14, wherein the comparison process indicates that a given source is selected if the comparison for that source provides a result above a threshold level.
  16. A method according to claim 15, wherein the comparison process determines that a given source is selected if that source has the highest such comparison result.
GB1014494.7A 2010-08-31 2010-08-31 Producing edit decision list by comparing edited video with source video Withdrawn GB2483250A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1014494.7A GB2483250A (en) 2010-08-31 2010-08-31 Producing edit decision list by comparing edited video with source video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1014494.7A GB2483250A (en) 2010-08-31 2010-08-31 Producing edit decision list by comparing edited video with source video

Publications (2)

Publication Number Publication Date
GB201014494D0 GB201014494D0 (en) 2010-10-13
GB2483250A true GB2483250A (en) 2012-03-07

Family

ID=43013492

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1014494.7A Withdrawn GB2483250A (en) 2010-08-31 2010-08-31 Producing edit decision list by comparing edited video with source video

Country Status (1)

Country Link
GB (1) GB2483250A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3053347A1 (en) * 2013-10-03 2016-08-10 Supponor Oy Method and apparatus for image frame identification and video stream comparison
US9501684B1 (en) 2013-12-13 2016-11-22 Google Inc. Providing non-destructive editing reconciliation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0248533A2 (en) * 1986-05-02 1987-12-09 Ceridian Corporation Method, apparatus and system for recognising broadcast segments
EP0283570A2 (en) * 1984-04-26 1988-09-28 A.C. Nielsen Company Signal identification system
EP0593202A1 (en) * 1992-10-15 1994-04-20 TAYLOR NELSON AGB plc Method for identifying a programme in an audience measurement system
WO2006055971A2 (en) * 2004-11-22 2006-05-26 Nielsen Media Research, Inc Methods and apparatus for media source identification and time shifted media consumption measurements
EP1909418A2 (en) * 2006-09-28 2008-04-09 K.K.Video Research Method and apparatus for monitoring TV channel selecting status
WO2009143668A1 (en) * 2008-05-26 2009-12-03 Yuvad Technologies Co., Ltd. A method for automatically monitoring viewing activities of television signals

Also Published As

Publication number Publication date
GB201014494D0 (en) 2010-10-13

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)