GB2507729B - Method and apparatus for quality checking - Google Patents

Method and apparatus for quality checking

Info

Publication number
GB2507729B
GB2507729B (application GB1219924.6A)
Authority
GB
United Kingdom
Prior art keywords
analyser
warning
audio
output
source
Prior art date
Legal status
Active
Application number
GB1219924.6A
Other versions
GB201219924D0 (en)
GB2507729A (en)
Inventor
Mckinnell Jonathan
Glanville Mark
Tudor Phil
Current Assignee
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date
Filing date
Publication date
Application filed by British Broadcasting Corp
Priority to GB1219924.6A
Publication of GB201219924D0
Publication of GB2507729A
Application granted
Publication of GB2507729B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/64 Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00 Diagnosis, testing or measuring for television systems or their details
    • H04N17/002 Diagnosis, testing or measuring for television systems or their details for television cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00 Monitoring arrangements; Testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control

Description

Method and Apparatus for Quality Checking
BACKGROUND OF THE INVENTION
The present invention relates to systems and methods for monitoring the quality of audio video content from each of multiple sources. In particular, the invention relates to monitoring the audio and video produced by multiple cameras in a production environment.
Programme makers are being required to produce greater volumes of high quality content for a larger variety of delivery platforms and audiences with ever decreasing budgets. Alerting production staff to potential problems with audio or video content could help save time and money both on the production and in the editing suite by expediting the process whilst potentially avoiding the failure of a technical review. There are a variety of possible technical errors or potential problems which can occur in a production.
SUMMARY OF THE INVENTION
We have appreciated the need for improved systems and methods for alerting users to the fact that a source of audio video content may not be providing that content at a required quality level.
In broad terms, the invention provides systems and methods for monitoring the quality of audio video content by analysing the content from each of multiple sources and determining whether the quality of the content matches a required level. An output warning signal is then asserted if a given source does not match the required level.
The nature of the quality checking may take a variety of forms including, for example, checking for the presence or absence of the signal, for significant corruption of a signal or more subtle quality issues such as colour changes and noise within an image or audio fading or noise within an audio track.
In a first aspect, a system embodying the invention provides a store for allowing user selection of metrics to apply to the content so as to define for each source of audio video content the metrics that should be applied. For example, in a multi camera setup, some cameras may be on tripods and others handheld. In this scenario, the user can define for each camera whether or not the camera shake metric should be analysed.
The invention provides a system for monitoring quality of audio video content, comprising: an input for receiving audio video content from each of multiple sources; an analyser arranged to receive the audio video content from the input and to analyse the content in accordance with user selectable metrics; a settings store for storing user selections defining which of the selectable metrics to apply to the content from each of the multiple sources; and an output arranged to assert a warning signal for each respective source if the analyser determines that the content from that source does not match the selected metrics for that source.
The output preferably asserts a warning signal in the form of a visual indication to be displayed on or alongside a monitor showing the video for each respective source, preferably by some form of overlay on the display. A system embodying the invention is thereby able to detect such errors and display them to a production team. Errors can be critical, in which case they are given a red traffic light, or potentially more subjective, in which case they are given an amber traffic light. These traffic lights can be overlain on the live feed, shown, for example, on a monitor in the studio gallery. Footage in which no errors were detected is given a green traffic light. The content may also be tagged with metadata pertaining to the individual error at the relevant time code, making it readily available in the editing process.
In a second broad aspect embodying the invention, a system is arranged to compare the quality of one source against at least one other source in a multiple source arrangement and to assert a warning signal if one source does not match the quality of another source by a given amount. For example, with multiple cameras used in a studio environment, the colour balance of all cameras should be similar and if one of the cameras produces a different colour balance, a warning may be given.
The invention according to this aspect provides a system for monitoring quality of audio video content, comprising: an input for receiving audio video content from each of multiple sources; an analyser arranged to receive the audio video content from the input and to analyse the content by comparing a quality of the content from each source against a quality of the content from at least one other source; and an output arranged to assert a warning signal for each respective source if the analyser determines that the quality of content from that source does not match quality of content from another source by a given amount.
The improvements of the present invention are defined in the independent claims below, to which reference may now be made. Advantageous features are set forth in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be described in more detail by way of example with reference to the accompanying drawings, in which:
Figure 1: is a block diagram showing the main functional components of a system embodying the invention;
Figure 2: illustrates possible video errors;
Figure 3: illustrates possible audio errors;
Figure 4: is a schematic diagram relating to frame stability detection;
Figure 5: shows a graphical user interface to illustrate regions of high exposure;
Figure 6: is a flow diagram of a camera shake algorithm;
Figure 7: shows a difference image used for difference detection;
Figure 8: is a flow diagram showing an algorithm for determining if a feature is in shot;
Figure 9: shows a sample image for histogram detection;
Figure 10: shows the histograms for the image of Figure 9;
Figure 11: illustrates a graph for a zero-crossing algorithm;
Figure 12: illustrates a first derivative peak finding algorithm;
Figure 13: is a flow diagram of a peak finding algorithm;
Figure 14: shows the peaks found by a peak finding algorithm on the histograms of Figure 9;
Figure 15: illustrates a quality checking graphic user interface;
Figure 16: is a flow diagram of a temporal photosensitive epilepsy detection algorithm;
Figure 17: shows Fourier transforms of maximum intensity historic arrays;
Figure 18: illustrates the use of chroma key;
Figure 19: shows graphical user interfaces in use;
Figure 20: is a flow diagram of an audio processing algorithm;
Figure 21: is a Fourier transform of one frame of audio samples;
Figure 22: illustrates a peak finding algorithm applied to Figure 21;
Figure 23: is a plot of frequency domain response when feedback is present;
Figure 24: shows waveforms of speech and sine wave;
Figure 25: shows the waveforms of Figure 24 with a logarithmic y-axis;
Figure 26: shows the sum of two waveforms of Figure 25;
Figure 27: shows a graphical user interface for a multi-microphone setup;
Figure 28: shows a fast Fourier transform of samples; and
Figure 29: shows historical peak power over a few seconds of audio.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION
Broadcasters have technical standards that they require content to meet before being broadcast. The process of checking the technical quality of programmes is known as ‘technical review’. Programmes that fail a technical review must be fixed, generally by re-editing, which can be time consuming and expensive. A system embodying the invention provides a general tool which flags up all technical errors that would fail a technical review, to aid in ensuring a high level of technical quality as well as mitigating the cost of fixing errors in post-production. This possible set of errors is quite large and varied. It could include, for example, acoustic feedback on one of the audio feeds, or a high level of noise on one of the isolated video feeds. By flagging any potential errors to the production staff, faults can be corrected or problematic footage reshot if necessary.
As well as technical errors, it is also useful to flag unusable footage (for example, an isolated camera feed which is out-of-focus while the camera operator lines up the next shot) and record an associated tag as metadata which would be available in the editing process. The process of automatically creating metadata, such as metadata about the quality of the content, is part of a wider area of study pertaining to automatically marking up footage. It has numerous benefits in terms of search, classification and the linking of content. Examples include: face detection and recognition; speech recognition for tagging the most frequently occurring words to a scene (which can aid searching and organisation of content), or aligning a script or transcript to the audio (which could be useful in, for example, text-based editing); and detecting sky as a first step towards classifying footage as indoor or outdoor, which is the subject of future work. Metadata is considered most useful when it persists throughout the production workflow. A system embodying the invention provides automated quality checking. A variety of possible checks can be performed in order to ascertain the quality of footage. Each of these checks can be carried out separately, and subsequently combined and presented to the production staff via a Graphical User Interface (GUI).
The errors flagged to the production team are of two general classes. The first are those which can be detected both with a high degree of certainty, and would be absolutely critical to the production (such as a dropped feed), and are given the colour red. In the case of video, these are the “capture errors” which are shown in Figure 2 below. Secondly, there are errors which can be subjective, such as an out-of-focus shot, and errors which are not absolutely critical to the production, such as footage which is over-exposed (these are shown in Figure 2 below as the amber errors). These amber errors could be introduced by the camera operator, or due to the camera itself, or they may be errors that could have a health effect on the viewer. These error messages are collated for each isolated camera feed and presented in a GUI which shows the multi-camera group, the error message(s) and the “traffic light” associated with that isolated camera feed. An example of this GUI is shown in the bottom of Figure 2.
Automated detection of audio errors is also desirable in a production. Due to time pressures, fewer resources and staff, and the potential for a large number of isolated microphone feeds, efficient audio monitoring and error flagging tools would be useful to production staff.
Similar to the video errors, audio errors can be split into categories and levels of priority. Those errors which would cause the recorded audio material to be unusable are flagged as red errors, and fall into the category of capture errors. Examples could include a dropped feed, or high levels of crackling or pops and clicks causing the footage to be unusable. All other categories of errors are flagged as amber errors, as they may in some sense be subjective or could be fixed in post-production. Clipping due to incorrect microphone levels could be considered a critical error if the clipping is excessive. However, clipping can also occur naturally due to a short period of unusually loud noise (such as a gun-shot), where the microphone levels may be correct even though clipping has occurred; we therefore consider clipping an amber error.
The GUI for audio errors can be seen at the bottom-middle of Figure 3. The left hand column shows a short history of the peak power of each of the individual audio feeds, with their respective errors appearing as the red boxes to the right of these peak power histories. The middle column shows the frequency domain plot of each audio feed, whilst the right hand column indicates the historical peak powers of the audio feeds on the same axis as a comparison.

A system embodying the invention will now be described in relation to Figure 1, which shows the functional components of an entire system embodying the invention. Multiple sources of audio video content provide that audio video content to one or more displays 12 via a connection path 7. Typically, the sources will be cameras either in a studio or at a remote location connected via a wired or wireless communication link 7 directly to the displays 12 of a production team. In addition, the cameras will connect to a variety of other studio equipment including recorders, controllers, and so on that are not shown here for simplicity of explanation. In addition to connecting to the displays 12, the cameras 2 have an input 3 to a monitoring system 4. The input 3 may comprise multiple separate hardware inputs, one for each camera, or more typically will be a communication network over which audio video content from each of the cameras will be streamed and received at a single network input port to the monitoring system 4. The monitoring system 4 has an output 5 that can be asserted to the displays 12 to provide a visual indication of the existence of a quality problem with one or more of the cameras 2. This visual indication can be separate from the display of the image on the display, but is preferably an overlay on the relevant display or part of the display. In this regard, the output signal on the output 5 may include the audio video content itself combined with the visual indicator. In such an arrangement, the monitoring system 4 would be included as part of a larger system such as a production suite.
The monitoring system 4 also has an output 9 providing data indicating the nature of any quality issues regarding the audio video sources, coupled to a metadata store 13 arranged to receive the data and to store metadata describing any quality problems with the corresponding feeds. The metadata may be stored periodically, such as at a certain interval of a number of frames, or may be stored whenever there is a change indicated by the signal.
The monitoring system 4 comprises an analysis module 6, a settings database 8 and an output control 10. The settings database 8 stores user definable values to indicate the sources 2 for which monitoring is required and the types of analysis that should be performed to determine if those sources are correctly functioning. The types of analysis will be described later.
The analysis module 6 undertakes the analysis of each audio video signal in accordance with the settings from the settings database 8 and provides the results to output control 10 which asserts the output signal either separately or as part of the audio video signal at the output 5 as already mentioned.
The settings database 8 may be implemented by any known database technology and functionally provides a table with a row for each source 2 and columns for each of the types of analysis that are to be performed. The settings database 8 may be updated by an input at input line 9, which would typically be provided by a graphical user interface software controller. In this way, the user can select exactly which of the analysis techniques should be applied to which of the camera sources 2. In addition, the settings database stores, for each source and analysis combination, the nature of the alert that should be given, for example green, amber or red. The user thus has control over the nature of the analysis for each source as well as the nature of the indication given.
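By way of illustration only, the settings store can be thought of as a per-source table of metric selections and warning levels. The following minimal sketch uses a Python dictionary in place of a real database; the metric names and warning levels are assumptions for illustration and are not prescribed by this description.

```python
# Hypothetical settings table: one row per source, one entry per selectable metric.
# Metric names and warning levels are illustrative, not taken from the text.
settings = {
    "camera_1": {"camera_shake":  {"enabled": True,  "warning": "amber"},
                 "exposure":      {"enabled": True,  "warning": "amber"},
                 "dropped_frame": {"enabled": True,  "warning": "red"}},
    "camera_2": {"camera_shake":  {"enabled": False, "warning": "amber"},  # handheld camera
                 "exposure":      {"enabled": True,  "warning": "amber"},
                 "dropped_frame": {"enabled": True,  "warning": "red"}},
}

def metrics_for(source):
    """Return the metrics the analyser should apply to a given source."""
    return [m for m, cfg in settings.get(source, {}).items() if cfg["enabled"]]
```

In use, the analyser would query this table for each incoming feed and run only the enabled checks, reporting each warning at the stored severity.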
The various types of analysis that will be performed will now be described in greater detail with reference to the remaining figures. The video quality checking tools will be discussed first followed by the audio checking tools.
Video quality checking tools
Video content can contain a variety of different errors which manifest themselves in a variety of different ways. Some of these errors could be both quick and essential to fix, such as frames being dropped caused by a faulty cable, or more difficult to fix, such as noise being caused by a high gain setting in a relatively dark environment. This variety of possible technical errors is compounded by time pressures on production staff who have multiple tasks to perform in a short period of time. Therefore, a tool which attempts to identify and flag up errors from this large set would be most beneficial to production staff rather than concentrating on one possible error and refining and optimising the algorithm to detect this one aspect of the video.
As shown in Figure 2, video errors can be split into four categories:
1.) Capture errors
2.) Camera operator errors
3.) Camera errors
4.) Health
Capture errors would tend to be caused by studio equipment such as cables, recording devices, connections, time code generators, etc. This type of error can cause irrevocable damage to a video feed, such as a dropped frame, and it is therefore of a high priority to detect these errors reliably and to promptly inform the production staff. Due to the critical nature of these errors, they are also relatively easy to detect requiring few concepts from computer vision.
Capture Errors
Examples of errors which can occur due to a faulty capture process are dropped frames, duplicate frames or even dropped feeds. These errors can all be classified as some kind of instability in the video through time. Therefore, a method for quantifying the stability of an isolated camera feed would be beneficial. One possible approach is to attempt to identify a single frame error using the surrounding frames as comparison.
Consider a sequence of frames: A, B and C. By comparing the difference between frames A and C to the difference between A and B, for example, a measure of the stability of the video feed can be formed. The two conditions for stable video are stated below. For example, if frame B is a black frame (dropped frame) then it will exhibit a large difference from its surrounding frames and hence the video is “unstable”. A separate algorithm is run which detects black pixels in a frame: if the number of black pixels (for example, pixel values less than 16 for 8-bit integers) is greater than a threshold, a black frame is detected. This can be used in conjunction with instability detection to reliably detect dropped frames.
Another possible instability is a video feed where part of the information encoded in the frame was lost, where video could tear or a frame could have encoding artefacts on it. This could also be detected via the criteria outlined above, as long as the effect was larger than the difference threshold. The threshold was set from the analysis of a variety of test footage, however, a future work would be to classify the threshold from a large set of data including known errors.
In the case of a duplicate frame, for example frames B and C being identical, a different algorithm must be employed, that makes a mathematical comparison of the pixel values of the two frames. It should be noted that this algorithm is intended for use with video images captured from a camera.
Duplicate frames in computer generated graphics are not necessarily erroneous.
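As a rough illustration of how these capture checks might be combined, the sketch below compares each frame with its neighbours, counts black pixels, and tests for identical frames. The difference threshold and the black-pixel fraction are assumptions; the text derives its thresholds experimentally from test footage.

```python
import numpy as np

def mean_abs_diff(x, y):
    """Mean absolute pixel difference between two greyscale frames."""
    return np.mean(np.abs(x.astype(np.int16) - y.astype(np.int16)))

def check_capture_errors(frame_a, frame_b, frame_c,
                         diff_threshold=20.0, black_fraction=0.9):
    """Flag instability, black (dropped) frames and duplicate frames.

    Thresholds are illustrative assumptions, not the experimentally
    derived values referred to in the text.
    """
    warnings = []

    ab = mean_abs_diff(frame_a, frame_b)
    bc = mean_abs_diff(frame_b, frame_c)
    ac = mean_abs_diff(frame_a, frame_c)

    # Instability: frame B differs from both neighbours far more than
    # they differ from each other, suggesting a single-frame error.
    if min(ab, bc) > ac + diff_threshold:
        warnings.append("unstable video")

    # Black frame: most pixel values below 16 (for 8-bit video).
    if np.mean(frame_b < 16) > black_fraction:
        warnings.append("black frame")

    # Duplicate frame: frames B and C are mathematically identical.
    if np.array_equal(frame_b, frame_c):
        warnings.append("duplicate frame")

    return warnings
```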
Camera Operator Errors
Camera operator errors are ones which would usually be flagged up by the multi-camera director on a production. Production environments can be hectic, with little time to watch the footage, and a director can have multiple tasks to perform. A tool which could aid the director in spotting these types of errors on each isolated camera feed would be useful. Typically, errors of this type are not as severe as the capture errors listed above. For example, soft focus may be intentional, and footage is often still usable even with small amounts of noise associated with it. These errors are therefore in the “amber” category, some of which are listed below:
1.) Chromatic aberration
2.) Focus
3.) Noise
4.) Exposure
5.) Shaky / nudged camera
6.) Boom in shot
7.) Multi-camera white / colour balance
Automatic detection of chromatic aberration, out-of-focus images and noisy images are all self-contained problems in computer vision, and are the subject of many known algorithms which will therefore not be covered here. When a camera operator is lining up for the next shot, it is likely that the footage will be out-of-focus. Therefore, as well as flagging the potential error to the production staff, out-of-focus flags can be useful in the edit for distinguishing between “good” footage and camera line-up footage. Noisy images can be formed due to operator error, such as having the gain settings too high on the camera. The technical review process for broadcasting requires new footage not to contain an excessive amount of noise or chromatic aberration. It is therefore desirable to fix these errors during production.
Exposure
Automated detection of incorrect exposure can be useful not only in terms of individual camera feeds, but also in terms of a multi-camera group, where the director or editor can select different cameras at different times to form the final cut of the programme. For example, if one camera is over-exposed relative to the others it would be useful to flag that to the production staff so the director can prompt the camera operator or vision operator to fix the problem. Although cameras’ exposures might be aligned at the beginning of a production, the camera positions and lighting may change, causing exposure to become a problem. Correct exposure is most simply characterised as an effective use of the available bit depth. Ultimately though, bit depth is finite and there will be some situations where it is not possible to capture all regions of the image with sufficient dynamic range to represent the image accurately. Some sense of exposure quality can be calculated using a rudimentary algorithm which counts the number of pixels over and under certain high and low threshold values (set by experiment at 235 and 20 respectively for 8-bit video). If either of these pixel counts exceeds a given proportion of the total pixels in the image (set by experiment at 20%) then the frame is flagged with an exposure warning. It was found that frames were frequently flagged as over-exposed in cases where there is sky in the image. In such cases there exists a large range of exposures across the pixels in the image, so it is not necessarily the case that the exposure settings on the camera are incorrect - it is just the nature of the scene. This problem was mitigated by concurrently running a sky detection algorithm on the video. In cases where the region of over-exposure was sky, the choice could be made either to not flag a warning, or to flag an exposure warning “with sky”, allowing the warning to be more readily ignored by the director. In future this algorithm could be improved by further analysis of the contiguous regions of over-exposure (or under-exposure) and attempting to characterise the reasons for the exposure warning.
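A direct reading of that exposure rule gives a sketch like the following. The 235/20 pixel-value thresholds and the 20% proportion follow the text; the optional sky mask is only a placeholder for the separate sky detection algorithm, whose details are not reproduced here.

```python
import numpy as np

def exposure_warning(luma, high=235, low=20, fraction=0.2, sky_mask=None):
    """Flag over- or under-exposure on an 8-bit luma frame.

    If a sky detection mask is supplied (True where sky was detected),
    over-exposed sky pixels are reported separately so the warning can be
    downgraded, as suggested in the text.
    """
    total = luma.size
    over = luma > high
    under = luma < low

    over_sky = np.count_nonzero(over & sky_mask) if sky_mask is not None else 0

    if np.count_nonzero(over) - over_sky > fraction * total:
        return "over-exposed"
    if over_sky > fraction * total:
        return "over-exposed (sky)"
    if np.count_nonzero(under) > fraction * total:
        return "under-exposed"
    return None
```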
It was also found to be beneficial to include zebra lines on regions of high exposure on the GUI, mimicking the behaviour of camera viewfinders. An example of the lines can be seen on the image in the bottom left of Figure 5 below.
Shaky / Nudged Camera
One problem which can occur during productions is for the camera to get knocked, either by knocking the tripod or by the shaking of the floor or mounting of the camera. A roaming camera which the operator is not able to hold sufficiently steady should also be flagged to the multi-camera director so they can give feedback to the camera operator. A shaking or nudged camera can be characterised in the video footage as a brief period of excessive random movement, surrounded both before and after by relatively little movement. In this way it can be distinguished from the normal movement of, for instance, a panning camera.
The shaky camera detection algorithm is shown in Figure 6. The factor in the final box was chosen experimentally to be 3.5; a smaller value gives fewer false positives but more false negatives in the shaky camera warnings.
Boom in shot
Frequently in a multi-camera production environment, especially with guests in a studio, hands or other objects (such as the boom, papers a presenter is holding, etc.) can encroach from the edge of one of the camera's shots. This can be corrected by the camera operator adjusting the shot or by informing the guest of the issue. In a multi-camera shoot, the director would prefer to select one of the other cameras for the live shot or final edit, if possible. Therefore it is useful to alert the director and production staff when such objects are impinging on one of the camera’s shots. The algorithm implemented is labelled a “boom in shot” detection algorithm, but in fact it is constructed in such a way as to attempt to identify any unwanted object appearing and disappearing at the edge of a shot. The algorithm employed was based on a difference image, which is calculated as the absolute difference between the pixel values in the present frame and the preceding frame. As such it represents the amount of “movement” there is in a scene at a particular point in time. The algorithm forms a comparison between the average amount of “movement” in the outer regions of the difference image and the maximum amount of “movement” in the inner regions of the difference image. Hence it should mitigate the effect of a panning camera, and isolate incidents of large movement at the edges of scenes, which could be due to unwanted items coming in and out of shot. The inner and outer regions of the difference image are illustrated in Figure 7. The algorithm is shown in Figure 8. The count threshold was chosen experimentally to be 3% of the total number of pixels in the outer region.
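The outer/inner comparison might be sketched as follows. The 3% count threshold follows the text, but the border width and the exact decision rule are assumptions: the full logic is given in Figure 8 and is not reproduced here.

```python
import numpy as np

def boom_in_shot(prev_frame, frame, border=0.1, count_threshold=0.03):
    """Flag large movement at the frame edges relative to the centre.

    `border` is the assumed fraction of width/height treated as the outer
    region; `count_threshold` (3% of outer-region pixels) follows the text.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))

    h, w = diff.shape
    by, bx = int(h * border), int(w * border)

    inner = diff[by:h - by, bx:w - bx]
    outer_mask = np.ones_like(diff, dtype=bool)
    outer_mask[by:h - by, bx:w - bx] = False
    outer = diff[outer_mask]

    inner_max = inner.max()
    # Edge pixels moving more than anything in the centre of the frame.
    moving_outer = np.count_nonzero(outer > inner_max)

    # A boom or similar object shows up as widespread edge movement that the
    # centre of the image cannot explain (e.g. by a panning camera).
    return (outer.mean() > inner_max) and (moving_outer > count_threshold * outer.size)
```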
Multi-camera white / colour balance
Before a shoot, a camera line-up is usually performed in which the white level, black level and colour balance of each camera are set. In a multi-camera shoot, the cameras all need to have the same “look and feel”, otherwise cuts between them will be distracting for the audience. On larger productions, continuous monitoring and adjustment of camera colour and exposure settings, or “racking”, will be the job of a team of vision engineers. However, on smaller shoots with lower budgets, responsibility may fall to the director or lighting supervisor. They may coordinate line-up by eye using the vision mixer before a shoot, and it might be checked again during a break. During continuous shooting it may be difficult to keep track of the colour alignment between the cameras with the multiple tasks required of the production staff in the production environment. Therefore, a measure of the alignment between the cameras in terms of white balance and colour could be useful to the production staff. By calculating and comparing Red, Green, Blue and Hue histograms between the cameras such a measure can be obtained.
Initially a simple cross correlation between each histogram was calculated. However, this alone was found to be insufficient. Consider a studio example where two cameras in a multi-camera group are pointing at the presenter’s face. In the first camera almost half of the picture is taken up by the face alone, whereas in the case of the second camera the desk and presenter’s body are also in shot. If the cameras are aligned properly then the colour and brightness (unless there is a particularly strong light source in one direction, which in most studio setups would not be the case) of the skin tones of the presenter’s face in both cameras should be similar. If they are not then the viewer will find a cut between the cameras odd, because if they were viewing the scene with their own eyes such a colour change from different viewpoints would not take place.
The skin tones of the presenter’s face should correspond to a peak or peaks in the corresponding histograms of the images of each camera which occur at closely aligned positions in the colour spaces. However, the magnitude of the peaks will be different - in this example the peak would be larger in the case of the first camera where the presenter’s face takes up more of the image.
Therefore, as well as the overall correlation between the histograms, a comparison between the positions of the peaks in the histograms should be made. A peak finding algorithm needs to be implemented, with the difficulty being the stochastic nature of the data of the histograms (due to the finite number of pixels used to resolve the images). In Figure 5 the multi-camera GUI of the images for each camera was shown, and comparing camera 1 with cameras 2 and 3 it can be observed by eye that the colour and brightness between 2 and 3 is much more closely correlated than those between 1 and 2 or 1 and 3. Therefore the camera operator of camera 2 and/or camera 3 should adjust their settings for the three cameras to become more aligned to the currently selected camera 1. At the top of Figure 5 this is demonstrated graphically with the green and red “cut quality” lines (for cameras 2 and 3) both being far from the white line for the selected camera 1. This would indicate to the director that according to the algorithm, a cut from camera 1 to camera 2 or to camera 3 would not be desirable without adjustment. These “cut quality” lines are created using the histogram comparison algorithm via peak finding which will now be discussed.
Peak finding algorithm as applied to the image histogram
Consider the image presented in Figure 9 and its respective HSV and RGB histograms in Figure 10. Figure 10 shows the histograms with peaks detected, note the difficulty of detecting peaks for data of a locally stochastic nature (which is true in general for image histograms and audio spectra), for example the data of the top left Hue histogram. In order to calculate the positions of peaks, a novel peak finding algorithm is employed.
In order to attempt to mitigate this problem of locally stochastic data, first a smoothing window is passed over the data, which eliminates some of the unwanted stochastic peaks. This can be seen by comparison of a high order polynomial interpolation of the original data (jagged line) with a high order polynomial interpolation of the smoothed data (smoothed line) in Figure 11. The first derivative of this polynomial can then be taken, which is shown by the dashed line. Experiments were carried out into finding the zero points of the first derivative and checking the sign of the second derivative at this point in order to characterise the stationary point as a maximum (if the second derivative is negative).
However, due to the locally stochastic nature of the data, this was found not to be a reliable enough method to find the main peaks in practice. The new method implemented analyses the zero-crossing behaviour of the first derivative which is illustrated in Figure 12.
The peak finding algorithm is shown in Figure 13, with Figure 12 as an illustration. The roundness threshold (threshold) was chosen by experimentation to be 0.2. The step where the data is squared is optional, but it affects the roundness threshold. The peak finding algorithm operates as follows. First, the low level noise is removed from the histogram. The data is then smoothed using a smoothing function such as a Gaussian window. The data is optionally squared to make the peaks larger in comparison to other data. The first derivative of the data is then calculated, for example using high order polynomial interpolation. The maximum first derivative is then determined to set a scale. The first derivative shows the positions of peaks and troughs, as these will have a zero first derivative, termed a “zero crossing”. These are found and those with a negative gradient identified. A negative gradient shows a change from increasing to decreasing histogram values and so identifies a peak. However, at this stage we do not know whether the peak is a genuine peak or not.
To identify the genuine peaks in the histogram, each negative zero crossing is analysed in turn. For each, the height of the previous first derivative peak, “local_max”, and the depth of the subsequent first derivative trough, “local_min”, are checked. If the value of local_max - local_min is greater than the defined threshold multiplied by the maximum derivative, then a main peak is deemed to have been found. This is because the calculation identifies the peak in the histogram with the greatest difference between its rates of increase and decrease.
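A rough rendering of those steps is given below. The 0.2 roundness threshold is the value stated above; the Gaussian smoothing width, the use of numpy.gradient in place of high order polynomial interpolation, and the local search window around each zero crossing are simplifying assumptions.

```python
import numpy as np

def find_peaks(hist, noise_floor=0.01, smooth_width=5, threshold=0.2, square=True):
    """Find the main peaks of a locally stochastic histogram.

    Follows the steps described in the text: noise removal, smoothing,
    optional squaring, first derivative, then analysis of negative-going
    zero crossings of the derivative.
    """
    data = hist.astype(float)
    data[data < noise_floor * data.max()] = 0.0        # remove low level noise

    # Gaussian smoothing window (width is an assumption).
    t = np.arange(-smooth_width, smooth_width + 1) / (smooth_width / 2)
    window = np.exp(-0.5 * t ** 2)
    data = np.convolve(data, window / window.sum(), mode="same")

    if square:
        data = data ** 2                               # emphasise the peaks

    d1 = np.gradient(data)                             # first derivative
    scale = np.abs(d1).max() or 1.0                    # maximum derivative sets the scale

    peaks = []
    # Negative-going zero crossings of the derivative mark candidate peaks.
    crossings = np.where((d1[:-1] > 0) & (d1[1:] <= 0))[0]
    for c in crossings:
        lo = max(0, c - 2 * smooth_width)
        hi = min(len(d1), c + 2 * smooth_width)
        local_max = d1[lo:c + 1].max()                 # preceding derivative peak
        local_min = d1[c:hi].min()                     # subsequent derivative trough
        if local_max - local_min > threshold * scale:
            peaks.append(c)
    return peaks
```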
Figure 14 shows the red, green and blue histograms of Figure 10 with the peaks found using the peak finding algorithm indicated by the bold lines.
The results of the peak finding algorithm are then used to compare the relative positions of the peaks in each of the red, green and blue channels from each of the camera sources. Where there is a close match in the peaks, the white balance of two sources are deemed to be aligned.
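One simple way to turn the matched peak positions into an alignment measure, given here only as an assumed illustration (the matching rule and the offset scale are not specified in the text), is to average the offsets between nearest peaks in each channel:

```python
def channel_alignment(peaks_a, peaks_b, max_offset=16):
    """Crude alignment score between two cameras' histogram peaks for one channel.

    peaks_a / peaks_b are lists of peak positions (0-255) as returned by the
    peak finding algorithm. The nearest-peak matching rule and max_offset
    are assumptions made for illustration.
    """
    if not peaks_a or not peaks_b:
        return 0.0
    offsets = [min(abs(p - q) for q in peaks_b) for p in peaks_a]
    mean_offset = sum(offsets) / len(offsets)
    # 1.0 means closely aligned peaks; 0.0 means offsets of max_offset or more.
    return max(0.0, 1.0 - mean_offset / max_offset)
```

Averaging such a score over the red, green and blue channels would give one possible basis for the “cut quality” measure shown in the GUI.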
The information gathered via the video quality checking algorithms is displayed in the multi-camera GUI shown again below in Figure 15. Warning messages are shown in red boxes for each individual camera, with a traffic light shown in the top right hand corner. The red, green, blue and hue histograms with their peaks indicated are also shown overlain on the images. The graph at the top indicates the “cut quality” from the selected camera (which is camera 1, indicated by the red box around its image) to the other cameras. The relatively low values of the green (camera 2) and red (camera 3) historical lines indicate the difference in brightness and colour between camera 1 and the other two cameras, whilst higher values would indicate better aligned cameras.
Camera Errors
Camera errors are those which pertain to the camera itself, which may or may not be due to incorrect settings on the camera. They may be viewed as engineering problems, as opposed to creative decisions that could result in similar technical errors. Examples are:
1.) Stuck pixel / line
2.) Gamma / black crushing
3.) Out of gamut
Stuck pixels / lines or black pixels / lines can be detected by analysing the change in the image through time. A wholly black line, for example in a particular field of the image, can be found by looping through each field separately and finding pixels which are zero. In general, few if any pixels should be zero in a “normal” image.
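A minimal check along these lines, assuming 8-bit frames and treating the two interlaced fields as the even and odd rows, could look like this:

```python
import numpy as np

def black_line_warning(frame):
    """Flag any wholly black line in either field of an 8-bit frame.

    The fields are taken as the even and odd rows; a line whose every pixel
    is zero is assumed to indicate a stuck or dropped line.
    """
    for field in (frame[0::2], frame[1::2]):       # even rows, then odd rows
        black_rows = np.all(field == 0, axis=1)
        if np.any(black_rows):
            return "black line detected"
    return None
```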
Incorrect gamma curve settings can cause an inefficient utilisation of dynamic range (bit depth) of the camera’s imaging system. This can result in a loss of detail in light or dark regions of the image.
Standards for the technical delivery of HD material impose restrictions on the allowable colour gamut for video images. These standards require that a certain percentage of the pixels remain “in gamut” to pass the test. The EBU defines the allowable gamut as follows:
1.) RGB components must be between -5% and 105% (-35 and 735mV)
2.) Luminance (Y) must be between -1% and 103% (-7mV and 721mV)
Following on from these recommendations, overshoots and undershoots which exist for a relatively short period of time may be filtered out before measuring. An error will only be registered where the out of gamut signals total at least 1% of picture area. In the current implementation, only the check on luminance (Y) is carried out, and the RGB check is left as a future work.
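The luminance check might look something like the sketch below. The -1%/103% limits and the 1% picture-area rule follow the text; the mapping of those percentages onto the 8-bit 16-235 coding range and the filtering of brief overshoots by requiring persistence over a few frames are assumptions.

```python
import numpy as np

def luminance_gamut_warning(y_frames, area_threshold=0.01, min_frames=3):
    """Flag a sequence whose luminance is persistently out of gamut.

    y_frames is an iterable of 8-bit luma frames. The EBU limits of -1% and
    103% are mapped onto the nominal 16-235 coding range here (an assumption);
    overshoots shorter than `min_frames` frames are filtered out.
    """
    black, white = 16, 235
    span = white - black
    low = black - 0.01 * span       # -1% of the nominal range
    high = black + 1.03 * span      # 103% of the nominal range

    persistent = 0
    for frame in y_frames:
        out = np.count_nonzero((frame < low) | (frame > high))
        if out >= area_threshold * frame.size:
            persistent += 1
            if persistent >= min_frames:
                return "luminance out of gamut"
        else:
            persistent = 0
    return None
```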
Health
Minimising the likelihood that a piece of video footage could trigger photosensitive epilepsy (PSE) is a technical requirement for delivery of a programme for broadcast in the UK. Here we discuss a method for the detection of flashing imagery. It is not intended to replace the Harding test currently used by broadcasters and advertisers to test their footage. It could, however, be used to warn the production team when images are likely to fail a Harding test because of flashing.
In general there are two possible cases where PSE could be triggered. The first is temporal, for example a flashing light (such as flash photography) and the second is due to spatial variations in light patterns (such as light reflecting off waves in water). The first case was implemented as in Figure 16 below, and the second case was left as a future work. The spatial variations thought likely to trigger PSE are poorly defined, so such work may focus on higher-level scenarios, such as the detection of water using similar techniques to sky detection.
The image is split into 16 regions because PSE could be triggered by any region of the image (it is not required that the entire image is flashing, only a sufficiently large region of it). In Figure 17 below, we show Fourier transforms of the maximum intensity historical arrays for two pieces of footage.
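One way this temporal check might be realised, sketched under assumptions (the flash frequency band, the history length and the energy threshold are not specified here), is to keep a per-region history of maximum luminance and look for strong components at flashing rates in its Fourier transform:

```python
import numpy as np

def flashing_regions(max_intensity_history, fps=25.0,
                     flash_band=(3.0, 12.0), energy_threshold=0.3):
    """Return indices of regions whose brightness history suggests flashing.

    max_intensity_history has shape (16, n_frames): the per-frame maximum
    luminance of each of the 16 image regions. The 3-12Hz band (limited by
    the frame rate) and the energy threshold are illustrative assumptions.
    """
    flashing = []
    n = max_intensity_history.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    band = (freqs >= flash_band[0]) & (freqs <= flash_band[1])

    for region, history in enumerate(max_intensity_history):
        spectrum = np.abs(np.fft.rfft(history - history.mean()))
        total = spectrum.sum()
        if total > 0 and spectrum[band].sum() / total > energy_threshold:
            flashing.append(region)
    return flashing
```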
Presentation to production staff
The video GUI in Figure 15 provides the production staff with a visualisation of the multi-camera group and any errors (in red boxes) associated with each feed. This information is further summarised into a traffic light, as often staff would know what the error is upon inspection of the video itself. Finally, the “cut quality” chart provides the director with a measure of the alignment of the multi-camera group in terms of white balance and colour. However, there is a great preference from production staff to view live (or extremely close to live) content at full resolution, for example on a studio reference monitor. The GUI displayed in Figure 15 above is delayed, due to (amongst other factors) the delay in ingesting the data from the capture card to the software memory buffer, and then rendering out the video to the display device. Coupled with this consideration is the drawback associated with providing the director with an extra screen which they may not have time to look at in a busy studio environment. Therefore, it would be desirable to overlay the traffic lights on top of the live footage shown on a studio reference monitor, thus requiring no extra screen and introducing no delay to the visualisation of the video itself.
This can be done by playing out the relevant key shown in Figure 18 back through the capture card, which is “genlocked” to the multi-camera group. The relevant traffic light is selected by sending a message from the quality checking GUI to the key player telling the player which traffic light key should be selected for that video feed. A hardware chroma key device can then be used to combine the relevant camera feed with its traffic light in real time. Similarly, the relevant traffic light can be overlain individually on each video feed to the vision mixing device for the entire multi-camera group.
Increasingly, production galleries are being equipped with “multi-viewers”. These display multiple video feeds and other data on a single large display, allowing dynamic reconfiguration to suit the current production. A multi-viewer could also be used to display the quality checking traffic lights.
The important features of whatever method is used to display the data are the same: There should be no additional display devices for the production staff to view; the system should not introduce a delay to the display of video images; there should be an option to turn off the overlay so video can be viewed normally, unencumbered by the overlay.
Post Production
As well as correcting mistakes that could occur during a production, the quality checking tools can be used to record metadata, which may be useful in the editing process. The use of this metadata in the editing process depends upon the particular production being undertaken. Larger productions tend to have more video and audio feeds, but also more staff to monitor those feeds and fix problems. On an “as live” recording it may not be possible to fix problems as quickly as in, for instance, a drama production. This could result in sections of footage being recorded with errors and therefore being unusable in the edit. Automatically tagging with errors content which has subsequently been reshot in the production can help to expedite the editing process by avoiding the editor having to reassess the quality of the footage. Production staff can log such errors manually. However, time pressures, budget constraints, and the latency and lack of diligence of human error flagging make this non-ideal. Therefore, this type of automated metadata can help to avoid the inclusion of low quality or erroneous footage in the final edit of the programme.
As well as the arguments above, amber errors (such as focus warnings) can occur naturally on a multi-camera shoot, for example during shot line-up. This metadata can be useful in the editing process in terms of sorting the footage (for example, a timeline which only shows the editor error-free footage) and searching the footage. This can save time and money in the editing process. A smaller production (such as self-operated news journalism) may well have fewer video and audio feeds, meaning less information is required to be monitored at the point of capture, and sorted upon editing. However, this may also mean fewer production staff and a shorter editing period with a smaller budget to fix mistakes as and when they arise. Typically such productions will have less time and fewer resources to review video footage, especially at full resolution. Therefore, assuming highly reliable and accurate quality checking tagging, editing using low resolution proxy footage on a laptop could result in high (technical) quality output for smaller budgets.
Pressure in the editing process could well continue to go up in future due to an increasing number of delivery platforms, which may require different edits for different audiences. Automated quality checking metadata can help to alleviate this pressure. Finally, footage may also be required to be delivered into an archive, or footage may be included from an archive. Automated quality checking of this footage and the use of this metadata in searching and sorting large archives is likely to become more important in the future.
The analysis discussed so far is of the video component of an audio video signal. The analysis may also be on the audio component of the audio video signal in conjunction with the analysis of the video component or instead of the analysis of the video component. This analysis will now be described with respect to the remaining figures.
Audio quality checking tools
In a production environment, production staff members have to deal with a great deal of information in a short period of time. For example, it is often the case that there is insufficient time to listen to audio feeds whilst the programme is being shot, and therefore a monitoring tool for each feed would be desirable. Currently, sound engineers have only three tools to identify a faulty audio feed: clipping lights on the console; VU meters on the console; and pre-fade listen, which could be routed to other monitoring tools such as a peak programme meter (PPM). Hence, some visualisation of the audio may also be helpful, as it could provide instant visual feedback to the sound engineer without having to isolate each audio feed.
Errors which could occur on a single audio feed are varied. Examples include levels being too low, clipping due to levels being too high, or something intermittent like feedback or crackling. As outlined in Figure 3, audio errors can be split into five categories: capture, microphone, multi-microphone, studio and health. The capture errors are given a red warning level because they could make the content for that microphone feed totally unusable. Other errors are given an amber warning level as they may be intentional or fixable in the post-production process.
Capture Errors
Capture errors are those errors which occur due to the capture process itself, which could be due to cables, connections, capture hardware or electromagnetic interference from other devices in the studio. Errors of this nature can be critical for production staff to correct as soon as possible, as they may be impossible to fix in post-production, leading to a potential loss of crucial content. Algorithms have been implemented to attempt to automatically detect the following four common capture errors:
1.) Dropped feed
2.) Radio microphone drop off
3.) Crackling
4.) Pops / clicks
Multiple feeds of audio can be difficult to monitor all at once. Hence, it is possible for an entire recording to be carried out with one of the audio feeds not working at all. A simple algorithm which measures the peak power of an audio feed and warns the production staff when this level becomes too low (or falls to zero) is therefore useful. Dropped feeds can be due to faulty microphones or faulty cables, and are in general quick to fix once they are identified, for example by swapping out a bad cable. A similar algorithm can be used to detect radio microphone drop off which exhibits sudden drops in peak power.
The first two of the faults listed above are detected in the audio processing algorithm shown in Figure 20 below, which is run in segments of time equivalent to a single frame of video (for example 1/25 of a second, equivalent to 1920 samples of audio) for every frame. Note that through experimentation peak_power_threshold was set at -3dBFS; zeros_threshold was set at 10 and mean_threshold was set to be 1% of the sample scale (sample maximum).
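The decision logic of Figure 20 is not reproduced here, but a hedged sketch of per-frame audio checks using the quoted values (peak_power_threshold of -3dBFS, zeros_threshold of 10, mean_threshold of 1% of the sample maximum) might be structured as follows; how these thresholds are actually combined, and the very low level used for the dropped feed test, are assumptions.

```python
import numpy as np

def check_audio_frame(samples, sample_max=32768.0,
                      peak_power_threshold=-3.0,   # dBFS, from the text
                      zeros_threshold=10,          # from the text
                      mean_threshold=0.01):        # 1% of sample_max, from the text
    """Checks on one video frame's worth of audio (1920 samples at 48kHz).

    The way the warnings are combined is an assumption made for illustration;
    the actual routing is shown in Figure 20.
    """
    warnings = []
    x = samples.astype(float)

    peak = np.abs(x).max()
    peak_dbfs = 20 * np.log10(peak / sample_max) if peak > 0 else -np.inf

    # Dropped feed / radio mic drop-off: peak power collapses (level assumed).
    if peak_dbfs < -60.0:
        warnings.append("dropped feed")

    # Level too high / possible clipping: peak power close to full scale.
    if peak_dbfs > peak_power_threshold:
        warnings.append("level too high / possible clipping")

    # "Digitally dead" feed: a long run of contiguous zero samples.
    zero_run, longest = 0, 0
    for s in x:
        zero_run = zero_run + 1 if s == 0 else 0
        longest = max(longest, zero_run)
    if longest > zeros_threshold:
        warnings.append("digitally dead feed")

    # DC offset: mean sample value exceeds 1% of the sample scale.
    if abs(x.mean()) > mean_threshold * sample_max:
        warnings.append("DC offset")

    return warnings
```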
Detection of crackling or pops and clicks is more challenging. Simple algorithms which looked at characteristic frequencies of pops and clicks were investigated; however, it is still difficult to separate the characteristics of the unwanted pops and clicks from the desired audio. The robust detection of pops and clicks is left as a future work.
Microphone Errors
What we refer to as microphone errors pertain to the individual source setup, be that a microphone or line-level source. These errors are varied and their nature makes it difficult to identify their exact origin. Examples include RF interference, DC offset, clipping, or levels being set incorrectly when considered as a group of sources (for example, one microphone is far too quiet in comparison with the rest of the microphones). They are listed below:
1.) Clipping
2.) Levels
3.) DC offset
4.) Feedback
5.) Interference / Hum
Microphone errors most frequently occur due to either production staff errors, or due to the unpredictable nature of a production (such as one of the guests on a panel show speaking more loudly than others) or through faulty equipment. A production may have a large number of microphone feeds, making it more difficult for sound staff to spot errors when they occur. Therefore, simple visualisation tools of the peak power for all of the feeds at the point of recording would be useful. Peak power is calculated for each video frame-worth of audio, as shown in Figure 20. The level of a particular microphone could be correct at the sound desk, but then for some reason too low or too high at the point of capture. Similarly, warnings for when a particular microphone is clipping would be desirable and is also calculated as shown in Figure 20. If the clipping is due to a performer, for example, then this may not be an error, but a microphone which is consistently clipping should probably have its pre-amp gain turned down at the desk.
Levels
We differentiate between “low” signal levels, which are considered to have a peak power too close to the noise floor, and signals that appear “digitally dead” because they have multiple contiguous sample values of zero.
Mains Hum
Mains hum and RF interference can be caused (amongst other reasons) when cables are incorrectly laid. If analogue audio lines are laid parallel (and close) to mains-carrying power cables, the 50Hz frequency of the mains current can be induced onto the audio cable. This can cause an audible low-frequency hum to be added to the audio signal. Many productions involve rigging and de-rigging of the studio or space in a limited amount of time, which can result in mistakes.
Either before a shoot begins or even once it has started, an alert which advises the staff of issues such as interference would be useful. A simple method of detecting mains hum was implemented as shown in Figure 20; however, the reliability of this algorithm needed to be improved, both in terms of false positives and false negatives. In order to improve on this performance, the peak finding algorithm was used to identify a persistent peak at 50Hz. The audio_data_fft is plotted in Figure 21, where the lowest portion of the frequency range is shown at the bottom for illustration. Note the highly stochastic nature of the data.
The peak finding algorithm of Figure 13 applied to this data identifies the peak shown in Figure 22, most easily seen in the bottom chart as the line bolder than the rest of the plot. If this peak occurs at 50Hz and persists at this frequency over a few contiguous seconds we consider this a positive detection. In order to trigger an alert, we require that this algorithm and the mains hum detector shown in Figure 20 both return positive. In this case the combined mains hum warning is set.
DC Offset
A difference in electrical potential between two parts of a circuit can result in a DC offset. This is easily fixed in post-production, but while recording it limits the headroom of the signal and impairs dynamic range. While generally not a problem with production-grade equipment, it is a simple fault to detect. A basic detector was implemented, which calculates the mean sample value of 1920 consecutive samples. This mean value is required to be below a threshold. To avoid the mean value being skewed by data at the beginning or end of the 1920-sample chunk, we only use data between the first and last positive-going zero-crossing points in the chunk.
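A direct sketch of that DC offset detector, trimming the chunk to whole cycles between positive-going zero crossings, might read as follows; the 1% threshold is the figure quoted earlier, while the exact trimming behaviour when fewer than two crossings are present is an assumption.

```python
import numpy as np

def dc_offset_warning(samples, sample_max=32768.0, mean_threshold=0.01):
    """Detect a DC offset over 1920 consecutive audio samples.

    Only the data between the first and last positive-going zero crossings
    is averaged, so partial cycles at the chunk edges do not skew the mean.
    """
    x = samples.astype(float)

    # Positive-going zero crossings: sample goes from negative to non-negative.
    crossings = np.where((x[:-1] < 0) & (x[1:] >= 0))[0] + 1
    if len(crossings) >= 2:
        x = x[crossings[0]:crossings[-1]]

    if abs(x.mean()) > mean_threshold * sample_max:
        return "DC offset detected"
    return None
```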
Feedback
Acoustic feedback can occur when a microphone and a speaker outputting the sound from that microphone are positioned such that a positive feedback loop is created. This will typically occur at some resonant frequency or frequencies associated with the equipment and their relative positions. One possible algorithm employed for the detection of unwanted feedback is to detect persistent peaks in the frequency domain. This method could trigger false positives in, for example, electronic music containing a single fixed frequency. However, this would be rare. A musical instrument in general would have a broader frequency peak which fluctuates slightly though time associated with each note. This method is implemented in exactly the same way as the persistent peak finding algorithm presented above for the detection of a 50Hz mains hum, but the persistent peak (due to feedback) could occur at any frequency in the spectrum as shown in Figure 23.
An alternative, complementary method for detecting feedback is to use zero-crossing analysis. We investigated zero-crossing rate as an indicator of unwanted tonal components such as feedback. Figure 24 shows two waveforms. The first is of a section of audio from a television programme; the second is a 1 kHz sine wave, which we introduce as an unwanted tone, similar in some ways to acoustic feedback.
To aid visualisation, we switch to a logarithmic axis for the amplitude (Figure 25), which exaggerates the waveform making it easier to see the zero-crossing points. The zero-crossing rate is defined as the number of zero-crossings per unit time. Initial experiments involved calculating the zero-crossing rate for successive chunks of audio and analysing the change in this value over time. However, this property was found to not vary much even with the introduction of feedback. A better solution was found based on research into acoustic feedback reduction in hearing aids. Rather than measuring zero-crossing rate, we measure the distances between zero-crossing points. A low variance in this data indicates the presence of a tonal component.
Applying this analysis to the two waveforms in Figure 25 we can see that, as is expected for a sine wave, the distance between the zero-crossings is constant, so the variance is zero. For the speech waveform, we see large variance in the zero-crossing distances.
As an example, a sum of the two waveforms (Figure 26) exhibits the low variance in zero-crossing distances that we predict for a signal with a strong tonal component. This same method can be applied to detecting acoustic feedback, a primary feature of which is the presence of a strong tonal component. This method is the time domain analogue of the frequency domain peak finding method outlined above. The difference is that this zero-crossing based method works best when there is only one tonal component, and harmonics are of negligible amplitude. This is often the case with acoustic feedback and is one of the features that distinguish it from other tonal sources, such as a flute. As such, we run both detectors concurrently, but only show one warning, giving priority to the zero-crossing method.
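A minimal version of the zero-crossing distance measure, with an assumed variance threshold that would need tuning against real material, is sketched below.

```python
import numpy as np

def tonal_component_warning(samples, variance_threshold=2.0):
    """Flag a strong tonal component (e.g. acoustic feedback) in a chunk of audio.

    Measures the variance of the distances (in samples) between successive
    zero crossings; a low variance indicates a near-periodic, tonal signal.
    The threshold value is an illustrative assumption.
    """
    x = samples.astype(float)
    signs = np.signbit(x)
    crossings = np.where(signs[:-1] != signs[1:])[0]

    if len(crossings) < 3:
        return None                      # not enough crossings to judge

    distances = np.diff(crossings)
    if distances.var() < variance_threshold:
        return "possible feedback / tonal component"
    return None
```

Because this time-domain measure works best when a single tonal component dominates, it can be run alongside the frequency-domain persistent-peak detector, with priority given to the zero-crossing result as described above.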
Multi-Microphone Setup
For a production with a large number of microphone feeds, a simple visualisation which shows each microphone compared with the others would be useful. This can also make it obvious if two feeds are identical (which would most likely be caused by incorrect routing of signals, either physically or electronically). Problems can occur if two similar signals, say from two microphones in the same studio, are out of phase by approximately 180°. When mixed, the sum of these signals can be close to zero, so the two signals cancel each other out. It is difficult to conceive of a method that could detect when this might be a problem, because it requires knowledge of how the signals will be mixed down. As such, implementation of an algorithm to check the phase of a multi-microphone setup has been left as a future work. A short list of multi-microphone setup errors is:
1.) Identical feeds
2.) Levels - visual
3.) Phase
Figure 27 shows the GUI used to present audio information and potential errors on the audio feeds to production staff; identical feeds, and feeds with incorrect levels, can be identified visually from this GUI. Errors such as feedback, clipping or dead feeds are indicated by the red warning messages associated with each audio feed (the central column). The left-hand column shows a short history of the peak power, whilst the right-hand graph overlays these historical peak powers on the same axes. The blue chart shows the frequency domain plot of that audio feed with a logarithmic y-axis (note the peak due to feedback).
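The GUI above relies on visual inspection to spot identical feeds. Purely as an illustration of how such a check might be automated, the following sketch compares feed buffers pairwise using a normalised correlation; the feed dictionary and the correlation threshold are assumptions, not part of the described system.

```python
import numpy as np
from itertools import combinations

def find_identical_feeds(feeds, correlation_threshold=0.999):
    """Return pairs of feed names whose current buffers look like the same signal.

    `feeds` maps a feed name to a 1-D array of samples for the current frame.
    """
    suspects = []
    for (name_a, a), (name_b, b) in combinations(feeds.items(), 2):
        n = min(len(a), len(b))
        a = np.asarray(a[:n], dtype=float)
        b = np.asarray(b[:n], dtype=float)
        if a.std() == 0 or b.std() == 0:
            continue  # silent or DC-only buffers are handled by other checks
        if np.corrcoef(a, b)[0, 1] > correlation_threshold:
            suspects.append((name_a, name_b))
    return suspects
```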
Studio Errors
It is common in a studio to pick up unwanted acoustic sounds from the surrounding environment. These may be caused by people moving about in the studio, air conditioning or other plant, or wind noise if outdoors. An example is the unwanted banging noise that occurs when a speaker hits the table while talking, or nudges the microphone itself. An algorithm that identifies whether a feed contains an unwanted bang is therefore useful, especially in programmes with a rapid turnaround and multiple microphones, where it is difficult for production staff to monitor each feed individually. By taking the Fourier transform of a small subset of the samples in each second (here 1/32 of the duration of a video frame was used - 60 audio samples at 48kHz) and comparing the low frequency and high frequency responses, a measure of whether a bang has occurred can be constructed. An average for the low frequency and the high frequency response is computed, as shown by the horizontal lines in Figure 28. For a “normal” audio signal (not containing a bang), if the average low frequency response becomes large the average high frequency response will in general be much smaller (top chart). For an audio signal containing a bang, both the average high frequency and the average low frequency responses become larger (bottom chart). If both average responses are greater than a threshold, and the ratio between the average low frequency response and the average high frequency response exceeds a ratio threshold, then a banging warning is issued.
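A minimal sketch of such a bang detector is given below. Note that the paragraph above phrases the ratio test as low-to-high average response, whereas claim 24 phrases it as high-to-low; this sketch follows the claim's reading, and both thresholds are illustrative assumptions rather than values from the specification.

```python
import numpy as np

def detect_bang(samples, level_threshold=500.0, ratio_threshold=0.5):
    """Flag a possible bang when both halves of the spectrum are strong at once.

    `samples` is a short buffer, e.g. 60 samples at 48 kHz (roughly 1/32 of a
    25 fps video frame). Both thresholds are illustrative and depend on the
    sample scale.
    """
    spectrum = np.abs(np.fft.rfft(np.asarray(samples, dtype=float)))
    half = len(spectrum) // 2
    low_avg = float(np.mean(spectrum[1:half]))   # skip the DC bin
    high_avg = float(np.mean(spectrum[half:]))
    if low_avg == 0.0:
        return False
    both_strong = low_avg > level_threshold and high_avg > level_threshold
    return both_strong and (high_avg / low_avg) > ratio_threshold
```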
Health
The EBU and ITU have issued recommendations pertaining to the maximum loudness of audio signals. It is therefore important for production staff to be able to easily monitor a measure of the loudness of the mixer output. If the mixer output is fed into an automated quality checking system, this might help with monitoring loudness for a live broadcast. The historical peak power can be monitored visually (as shown in Figure 29), which gives an indication of part of the EBU loudness recommendations regarding insufficient dynamics in a programme. This can also help avoid other errors such as clipping.
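As a rough sketch, per-frame peak power can be logged and its recent spread inspected. This tracks only the peak power history described above, not a full EBU R 128 loudness measurement, and the history length is an assumed value.

```python
import numpy as np
from collections import deque

class PeakPowerMonitor:
    """Rolling history of per-frame peak level, for visual loudness checks."""

    def __init__(self, history_frames=250):          # ~10 s at 25 fps
        self.history = deque(maxlen=history_frames)

    def update(self, samples):
        """Record the peak level (dBFS) of one frame of float samples in [-1, 1]."""
        peak = float(np.max(np.abs(np.asarray(samples, dtype=float))))
        peak_db = 20.0 * np.log10(peak) if peak > 0 else float("-inf")
        self.history.append(peak_db)
        return peak_db

    def recent_spread(self):
        """Spread of recent peak levels; a very small spread may hint at
        insufficient dynamics, a peak near 0 dBFS at possible clipping."""
        finite = [p for p in self.history if np.isfinite(p)]
        return (max(finite) - min(finite)) if finite else None
```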
Post Production
As well as alerting production staff to audio errors as and when they occur, so that they can attempt to correct them, tagging the audio content with metadata is useful in post production. Audio differs from video in post production: audio may be directly associated with a video feed (emanating from something that is happening on camera, for example an interview), or it may come from elsewhere, or from a larger number of feeds, as in a music or sporting event. Unusable audio which is directly associated with video may mean that footage has to be reshot or not used. In general, an audio edit is a mix down of multiple audio feeds, produced by associating a different weighting with each feed. This makes the use of the quality checking tags less discrete than for video: content which has small errors associated with it may still be used in the final mix, but with a smaller weighting than it would otherwise have had.
Although audio can be monitored at the audio desk, where, depending upon the individual desk, various tools are at the sound engineer's disposal, errors can still occur downstream of the audio desk. Automatically recording these errors as metadata can be useful in the editing process.
In general it may not be practical (especially for a large number of audio feeds) to carefully listen to each individual feed before beginning the editing process. However, it is useful to know if there are errors (such as feedback) occurring at particular times on particular feeds. An editor in this case may choose to fade the erroneous feed out for the duration of the fault(s). Alternatively, knowledge of the exact locations of particular errors can enable post production staff to quickly locate and attempt to mitigate or fix these errors.
As with video, metadata pertaining to audio quality can be used to expedite the sorting and searching process. For example, an editor could be presented with only those audio clips which do not contain errors, reducing the amount of audio content that a human has to listen to. Another potential application of automated quality checking metadata is to large quantities of user generated content, where it is practically impossible for a human to view and listen to everything; automated searching and sorting of this content then becomes essential, with the quality of the content an important factor in selection for the final edit. User generated content often has poor quality audio, so automatically selecting the content with the fewest audio errors can vastly reduce the amount of audio that needs to be listened to in the editing process.
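As a trivial illustration of such filtering (the clip and warning metadata schema here is hypothetical, not one defined by the system described above):

```python
def clips_without_errors(clips, ignore=()):
    """Return clips whose quality-check metadata carries no (non-ignored) warnings.

    `clips` is an iterable of dicts such as {"name": "...", "warnings": ["feedback"]};
    this schema is hypothetical and only serves to illustrate the filtering step.
    """
    return [c for c in clips if not (set(c.get("warnings", [])) - set(ignore))]
```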

Claims (48)

1. A system for monitoring quality of audio video content, comprising: an input for receiving audio video content from each of multiple sources; an analyser arranged to receive the audio video content from the input and to analyse the content in accordance with user selectable metrics; a settings store for storing user selections defining which of the selectable metrics to apply to the content from each of the multiple sources; and an output arranged to assert a warning signal for each respective source if the analyser determines that the content from that source does not match the selected metrics for that source.
2. A system according to any preceding claim, wherein the analyser determines whether a source has dropped one or more frames.
3. A system according to claim 2, wherein the analyser determines that a source has dropped frames by comparing a difference between successive frames.
4. A system according to any of claims 1 to 3, having a buffer for contiguous frames, wherein the analyser is arranged to take the difference of two frames in the buffer to produce a difference image to determine if the difference does not match a metric.
5. A system according to claim 4, wherein the difference image is calculated by subtracting pixel by pixel one frame from the other and taking the modulus of this value at each pixel to produce the difference image.
6. A system according to claim 4 or 5, wherein the analyser is arranged to calculate the arithmetic mean pixel value of the difference image and to determine if this value is greater than a threshold to determine whether a frame is stuck.
7. A system according to claim 4, wherein the analyser is arranged to analyse three contiguous frames and to take the difference between each of them, and the output asserts a warning if the difference between two consecutive frames is greater than the difference between the first and third frames.
8. A system according to claim 1, wherein the analyser is arranged to detect a departure in exposure level for a source.
9. A system according to claim 3 wherein the exposure level is determined by comparison of available bit depth to a threshold.
10. A system according to any preceding claim wherein the analyser is arranged to determine if the movement of the image source is above a threshold.
11. A system according to claim 10, wherein the analyser is arranged to determine if the amount of movement of an inner region of an image is different from movement of an outer region of an image between frames thereby indicating an object entering into view at the edge of an image.
12. A system according to claim 11, wherein the analyser is arranged to determine the amount of movement by determining a difference image between contiguous frames by determining an arithmetic mean pixel value for a predefined inner area of the image and a predefined outer area of the image and the output asserts a warning if the ratio of the outer mean as the numerator and the inner mean as the denominator exceeds a threshold.
13. A system according to claim 4, wherein the analyser is arranged to calculate the difference images for each successive pair of frames, calculate the arithmetic mean pixel value for each difference image and find the maximum of these mean values and the arithmetic mean of these mean values, if the ratio of the maximum to the mean is greater than a threshold then the output asserts a warning of a shaky or nudged camera.
14. A system according to any preceding claim wherein the analyser is arranged to determine the presence of one or more of stuck pixels, lines, gamma problems, black crushing and out of gamut.
15. A system according to any preceding claim wherein the analyser is arranged to detect the presence of flashes in an image.
16. A system according to claim 15, wherein the analyser is arranged to buffer contiguous frames of video of a predetermined size; split these images into multiple equally sized regions forming multiple distinct buffers; find the maximum pixel value of the region based multiple buffers to form an historical array of maximum pixel values for each region; take the Fourier transform of each of these historical maximum pixel value arrays; find any well defined peaks using a peak finding algorithm; and if any peak is within a predefined frequency range and the peak's amplitude is above a certain threshold then the output asserts a warning of flashing images.
17. A system according to any preceding claim wherein the analyser analyses one or more regions of images from a source.
18. A system according to any preceding claim wherein the analyser is arranged to analyse the audio signal of the content to determine one or more of a dropped feed, crackling, pops, clipping, levels, DC offset, feedback, interference or hum.
19. A system according to claim 18, wherein the analyser is arranged to loop through a number of audio samples in a buffer of a video frame and to count the number of contiguous zero samples, where if this number of contiguous zero samples exceeds a threshold, the output asserts a warning of a dropped audio frame.
20. A system according to claim 18, wherein the analyser is arranged to loop through audio samples in a buffer of a video frame; to calculate the mean of the samples between the first and last positive-going zero-crossings of the data; and if this mean value is above a threshold then the output asserts a warning of DC offset.
21. A system according to claim 18, wherein the analyser is arranged to operate on a number of audio samples in a buffer of a video frame; to calculate the peak power of this single frame worth of audio data, if the peak power is above a threshold then the output warns of clipping.
22. A system according to claim 18, wherein the analyser is arranged to operate on a number of audio samples in a buffer; to take the Fourier transform of a single frame worth of audio data; to find peaks in the frequency domain spectrum using a peak finding algorithm; and to check if a peak is a persistent peak by requiring that the peak occurs at the same position in contiguous frames, wherein if a persistent peak occurs at approximately 50Hz then the output asserts a warning of mains hum.
23. A system according to claim 18, wherein the analyser is arranged to loop through a number of audio samples in a buffer of predefined size; form an array of the number of audio samples between each zero-crossing in the buffer of audio samples; calculate the variance of the data in this zero-crossing array; and if the variance is low then the output asserts a warning of acoustic feedback.
24. A system according to claim 18, wherein the analyser is arranged to operate on a number of audio samples in a buffer of size approximately equal to 1 millisecond of audio; to take the Fourier transform of this buffer; to calculate the ratio of average amplitude of the higher half of the frequency response to the average amplitude of the lower half of the frequency response and if this ratio is above a threshold and the overall average amplitude of the frequency response is greater than a threshold then the output asserts a warning of a microphone bang.
25. A system for monitoring quality of audio video content, comprising: an input for receiving audio video content from each of multiple sources; an analyser arranged to receive the audio video content from the input and to analyse the content by comparing a quality of the content from each source against a quality of the content from at least one other source; and an output arranged to assert a warning signal for each respective source if the analyser determines that the quality of content from that source does not match quality of content from another source by a given amount.
26. A system according to claim 25 wherein the quality is white balance and the analyser is arranged to determine if the white balance of an image source is within an appropriate range.
27. A system according to claim 26 wherein the analyser is arranged to determine if a white balance of one of the sources is different from the white balance of other sources by a predetermined amount.
28. A system according to claim 27, wherein the analyser is arranged to calculate histograms for one or more of the Hue, Saturation, Value, Red, Green and Blue for each video source; to take an overall correlation comparison between each of the histograms for each video feed; to find peaks in each histogram using a peak finding algorithm; and to compare the position of each of these peaks between the corresponding histograms for each video source.
29. A system according to claim 28, wherein the peak finding algorithm operates by taking a first derivative, analysing negative zero crossings of the first derivative and, for each, determining the height of the previous first derivative peak “local_max” and the depth of the subsequent first derivative trough “local_min”, and determining if the value of local_max - local_min is greater than a defined threshold.
30. A system according to claim 29, wherein the threshold is a value multiplied by the maximum derivative.
31. A system according to any preceding claim, wherein the output varies the output signal so that the output signal indicates a warning level.
32. A system according to claim 31, wherein the output varies the output signal to indicate a level of confidence of a problem.
33. A system according to any preceding claim, wherein the analyser is arranged to perform multiple comparisons and the output asserts a warning signal based on a combination of the comparisons.
34. A system according to any preceding claim, wherein the analyser performs multiple analyses and the system allows a user to select a type of warning signal for a corresponding analysis.
35. A system according to any preceding claim, wherein the analyser performs multiple analyses and the system allows a user to select a type of warning signal for a corresponding combination of analyses.
36. A system according to any preceding claim, wherein the analyser includes a sensitivity setting by which the sensitivity of the analysis may be set.
37. A system according to any preceding claim wherein the warning signal comprises an overlay on a display of the image from the respective source for which the warning is provided.
38. A system according to any preceding claim wherein the warning signal is an overlay on an image using chroma key type techniques.
39. A system according to any preceding claim, further comprising a metadata store arranged to receive the warning signal and to store metadata indicating the nature of the warning.
40. A system according to claim 39, wherein the metadata store is arranged to selectively store metadata for selectable types of warning.
41. A system according to claim 39 or 40, wherein the metadata store is arranged to store metadata at periodic intervals.
42. A system according to claim 39 or 40, wherein the metadata store is arranged to store metadata whenever the warning signal is asserted.
43. A system according to any preceding claim, wherein the output is arranged to assert the warning signal in real time.
44. A system according to any preceding claim wherein the output is arranged to assert the warning signal to an interface to studio monitors.
45. A system according to claim 40, wherein the output is arranged to collate a warning into a scene based, shot based or clip based summary.
46. A system according to claim 45, wherein the output is arranged to provide an in point and an out point for each warning, wherein the frame at which the warning first occurs is the in point, and the frame at which the warning no longer applies is the out point.
47. A system according to any preceding claim, wherein the parameters of algorithms such as thresholds and durations are optimised based on analysis of a large set of data of known quality levels.
48. A system according to claim 47, wherein the optimisation in these algorithms of the buffer size and threshold(s) used is obtained from a machine learning method using a set of test footage with pre-labelled positive footage containing the relevant quality error and pre-labelled negative test footage not containing the quality error.
GB1219924.6A 2012-11-05 2012-11-05 Method and apparatus for quality checking Active GB2507729B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1219924.6A GB2507729B (en) 2012-11-05 2012-11-05 Method and apparatus for quality checking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1219924.6A GB2507729B (en) 2012-11-05 2012-11-05 Method and apparatus for quality checking

Publications (3)

Publication Number Publication Date
GB201219924D0 GB201219924D0 (en) 2012-12-19
GB2507729A GB2507729A (en) 2014-05-14
GB2507729B true GB2507729B (en) 2019-06-12

Family

ID=47429207

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1219924.6A Active GB2507729B (en) 2012-11-05 2012-11-05 Method and apparatus for quality checking

Country Status (1)

Country Link
GB (1) GB2507729B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10587872B2 (en) * 2017-07-05 2020-03-10 Project Giants, Llc Video waveform peak indicator
US10742923B2 (en) 2018-10-25 2020-08-11 International Business Machines Corporation Detection of photosensitive triggers in video content

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5596364A (en) * 1993-10-06 1997-01-21 The United States Of America As Represented By The Secretary Of Commerce Perception-based audio visual synchronization measurement system
EP1056297A2 (en) * 1999-05-24 2000-11-29 Agilent Technologies Inc. Multimedia decoder with error detection
GB2369756A (en) * 2000-08-23 2002-06-05 Sanyo Electric Co Reception degradation warning in digital broadcast receiver
US20030090590A1 (en) * 2001-11-09 2003-05-15 Kazuhiko Yoshizawa Video processing device
JP2004166075A (en) * 2002-11-14 2004-06-10 Hitachi Kokusai Electric Inc Television camera apparatus
JP2009043294A (en) * 2007-08-06 2009-02-26 Sharp Corp Recorder

Also Published As

Publication number Publication date
GB201219924D0 (en) 2012-12-19
GB2507729A (en) 2014-05-14

Similar Documents

Publication Publication Date Title
CN110808048B (en) Voice processing method, device, system and storage medium
US7512883B2 (en) Portable solution for automatic camera management
US8408710B2 (en) Presentation recording apparatus and method
EP1202571A2 (en) Controlled access to audio signals based objectionable audio content detected via sound recognition
US9615029B2 (en) Method and apparatus for determining a need for a change in a pixel density requirement due to changing light conditions
GB2507729B (en) Method and apparatus for quality checking
KR100444784B1 (en) Security system
EP1613084A2 (en) AV system and control unit
US9633692B1 (en) Continuous loop audio-visual display and methods
US20040019899A1 (en) Method of and system for signal detection
US11238577B2 (en) Video dynamic range analysis
US11627278B2 (en) High dynamic range video format detection
KR101733305B1 (en) Smart alarm broadcasting apparatus
CN109151417B (en) Detection system and method for multimedia equipment
KR101721224B1 (en) Apparatus and method for detecting real-time video and audio distortion
GB2507576A (en) Focus detection
KR101576223B1 (en) Apparatus for Monitering of Brodcasting Signal and Method thereof
CN108667683A (en) For monitoring and broadcasting and/or the method and apparatus of the relevant data of crossfire
US20230141019A1 (en) Attention Tracking of a Crowd
CN115695930A (en) Program signal processing method, device, equipment and storage medium
Liu et al. A novel audio extraction and restoration system for optical soundtrack
JP2009135754A (en) Digest creating apparatus and method
CN115457747A (en) Signal abnormal state alarm method, device, equipment and storage medium
WO2018152630A1 (en) Systems and methods for monitoring playback of recorded content
JP2006525621A (en) Digital reproduction of variable density film soundtrack