EP3317881B1 - Audio-video content control - Google Patents

Audio-video content control

Info

Publication number
EP3317881B1
Authority
EP
European Patent Office
Prior art keywords
audio
video content
vector
metadata value
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP16744832.3A
Other languages
German (de)
English (en)
Other versions
EP3317881A1 (fr)
Inventor
Simon RANKINE
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Broadcasting Corp
Original Assignee
British Broadcasting Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Broadcasting Corp
Publication of EP3317881A1
Application granted
Publication of EP3317881B1
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/84 Generation or processing of descriptive data, e.g. content descriptors
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102 Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105 Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/266 Channel or content management, e.g. generation and management of keys and entitlement messages in a conditional access system, merging a VOD unicast channel into a multicast channel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/14 Picture signal circuitry for video frequency region

Definitions

  • This invention relates to a system and method for allowing control of output of audio-video content.
  • Audio-video content, such as television programmes, comprises video frames and an accompanying sound track, which may be stored in any of a wide variety of coding formats, such as MPEG-2 or MPEG-4.
  • the audio and video data may be multiplexed and stored together or stored separately.
  • a programme comprises such audio video content as defined by the programme maker.
  • Programmes include television programmes, films, news bulletins and other such audio video content that may be stored and broadcast as part of a television schedule. Audio-video content as described herein thus includes content that comprises audio, video or both audio and video.
  • GB 1 336 498 A discloses extracting features from audio-video content and applying the feature values to a low pass filter and a Schmitt trigger detector to determine shot changes.
  • US 2003/103565 A1 discloses classifying and segmenting video according to specific extracted features.
  • hysteresis filters in the context of audio-video data are generally known, e.g. from WO 94/16443 A1, US 2013/050254 A1 or US 2014/232818 A1.
  • the invention provides a controller, as defined by claim 1, for controlling the output of audio-video content such as programmes received either as a live feed or retrieved from a store such as an audio-video player.
  • the controller is arranged to analyse the audio-video stream, to produce a vector at intervals that correlates to a defined ground truth and to operate a two-stage filtration process on the vector.
  • the output of the two-stage filtration process is provided to a retrieval module which allows control such as providing the audio-video data to a display or transmitter for output, or controlling a player to provide different content or jump to a new location within the programme being processed.
  • An example advantageous use case for a controller embodying the invention is in the analysis of a live feed to determine a fault condition.
  • a live feed of an audio-video programme such as coverage of a football match will comprise various aspects such as camera motion, shot changes, changes in audio from the crowd and commentator and so on.
  • this condition should be detected at the earliest opportunity so as to substitute audio-video content either by changing camera or by retrieving prior audio-video data such as an action replay from the store of a player.
  • a controller can rapidly detect a variety of conditions such as described above, or indeed different "chapters" such as the game being stopped due to injury or half time and to control the output accordingly.
  • the invention may be embodied in a variety of methods and systems for allowing control of the output of audio video content such as programmes.
  • the main embodiment described is a controller for control of recorded or live programmes to be output on a local display and/or provided to a transmitter for broadcast.
  • a controller embodying the invention is particularly beneficial for rapid analysis of a live audio-video feed to determine that a condition has occurred such that the controller should, instead of presenting the live feed as an output, retrieve alternative content from an audio-video player or store.
  • a condition may be a drastic fault such as a frozen feed from a camera or, as sometimes happens with live broadcast, a camera pointing at material that is not relevant.
  • the controller is able to detect such conditions. Whilst described primarily in terms of a live audio-video feed, the controller may equally operate in relation to audio-video programmes provided from an archive.
  • the embodiment of the invention provides an improved approach to detecting a change in condition from an audio-video feed to avoid both false negatives and false positives.
  • a controller is of value to broadcasters, researchers and users wishing to provide audio-video output if it correctly identifies the portions of an audio-video programme matching a ground truth.
  • the embodying system receives audio video programmes, processes the programmes to produce metadata values, referred to as mood vectors, at intervals throughout the programme and controls the output of programmes.
  • A system embodying the invention is shown in Figure 1.
  • the system embodying the invention comprises a player 12 having a store of audio-video programmes which may be a high performance live store or a lower performance archive.
  • live feed input 14 may receive a live feed of an audio-video programme, such as an outside broadcast.
  • a controller 20 receives the retrieved audio-video data and live feed at inputs and provides the live feed and/or retrieved programme at outputs for display on a local display 16 or for broadcast via a transmission arrangement, here shown as transmitter 18.
  • the controller 20 comprises the following components or modules which together operate to determine if the received audio-video stream should be continuously output or whether one or more "chapter" points have occurred.
  • a chapter point may be considered to be the point at which the audio-video feed no longer matches a defined criterion or changes from no longer matching the criterion to resuming matching that criterion.
  • the controller 20 comprises a vector engine 21 which receives the audio-video programme and analyses the programme at intervals to produce a vector that correlates to a ground truth.
  • the ground truth may be derived from characteristics of the audio-video stream itself. The characteristics may be audio or video characteristics described later.
  • the output of the vector engine will be a vector at intervals, the vector relating to the ground truth selected, for example rate, pace, tone or other "mood" vectors.
  • the vector signal comprising vector values at time intervals is passed to a low pass filter 22 arranged to filter out rapid changes in the vector signal.
  • the filtered vector signal is then provided to a Hysteresis filter 23 arranged to avoid potential "ringing" in the sense of rapidly switching between asserting the presence or absence of the defined condition.
  • the output of the Hysteresis filter 23 is provided to a retrieval module 24 which is arranged to either allow the feed from the player 12 or live feed 14 to pass to the display or transmitter, or to interrupt the provided programme and instead jump to a feature section of an already recorded programme or indeed swap to a different programme as appropriate.
  • the vector engine 21, low pass filter 22, Hysteresis filter 23 and retrieval module 24 will now be described in further detail.
  • the vector engine 21 is arranged to process the audio video data of the programme to produce a vector at intervals that represents a predefined quality of the programme such as the "mood" of the programme for that interval.
  • the processing module may process other data associated with the programme, for example subtitles, to produce the mood vectors at intervals.
  • the intervals for which the processing is performed may be variable or fixed time intervals, such as every minute or every few minutes, or may be intervals defined in relation to the programme content, such as based on video shot changes or other indicators that are stored or derived from the programme. The intervals are thus useful subdivisions of the whole programme. Vector production is described in more detail later.
  • the output of the vector engine may be a vector comprising pace and tone data.
  • the pace and tone vector may be used to determine a condition such as a "chapter" of game play, a problem with the live feed and so on as previously described.
  • the main chapterisation will be described as a simple example of determining the first half, second half and the half time interval of a football match.
  • An example of the pace and tone data produced by the mood classification algorithm is illustrated in Figure 3, with a ground truth overlaid on top.
  • the chapterisation uses this correlation to perform the chapterisation process.
  • the chapterisation may be broken into two distinct sections: a filtering process to remove transient peaks and troughs, allowing larger trends to be observed, and a detection process which applies a threshold to the data, resulting in a binary playing/not playing output.
  • Figure 4 shows these steps.
  • a low pass filter is provided to avoid unwanted transitioning due to transient changes.
  • the low pass filter may have a fixed or variable threshold.
  • the threshold may be determined from analysis of the data.
  • An implementation-specific example provides that the mood vector data is filtered by a second order IIR Butterworth filter with a cut-off of 0.0016 of the Nyquist frequency. This very low cut-off frequency compared to the Nyquist rate rendered use of an FIR filter impractical in this specific use case, as the number of taps required was excessive (in the order of hundreds of thousands).
  • IIR filters are able to produce a similar response with far fewer poles, usually at the cost of distortion and potential instability. This makes IIR filters a better choice for a variety of use cases.
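  • By way of illustration only, the low pass stage described above may be sketched as follows. This is a minimal pure-Python sketch, assuming a bilinear-transform design and a direct form I realisation starting from zero state; the function names `butterworth_lowpass` and `iir_filter` and the test signals are hypothetical and are not taken from the embodiment.

```python
import math


def butterworth_lowpass(cutoff):
    """Coefficients of a second-order Butterworth low-pass filter.

    cutoff is a fraction of the Nyquist frequency (0 < cutoff < 1).
    Designed with the bilinear transform; returns (b, a) with a[0] == 1.
    """
    k = math.tan(math.pi * cutoff / 2.0)  # pre-warped cut-off
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def iir_filter(b, a, samples):
    """Apply the second-order section in direct form I (zero initial state)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# Cut-off of 0.0016 of the Nyquist frequency, as in the example above.
b, a = butterworth_lowpass(0.0016)
steady = iir_filter(b, a, [1.0] * 5000)                        # slow trend
jitter = iir_filter(b, a, [(-1.0) ** n for n in range(5000)])  # rapid change
```

    In this sketch a constant input settles at unity gain, whereas an input alternating at every sample is almost entirely suppressed, which is the behaviour wanted for removing transient peaks and troughs.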
  • the magnitude and phase response of the filter is illustrated in Figure 5.
  • Hysteresis filter, one specific example of which is a Schmitt trigger
  • a hysteresis filter is known to the skilled person and takes a continuous input and provides a discrete output based on a threshold. The threshold differs depending upon the direction of the input signal. Early tests used a simple mean crossing detector; however, this was susceptible to small fluctuations in the low pass filtered data when the input to the detector was close to the mean value. This would result in oscillation of the output in transition regions. Use of hysteresis ensures that once the transition has taken place the output of the detector remains stable.
  • the centre point of the hysteresis may be set in a variety of ways.
  • the centre point is determined by taking the average of selected maximum and minimum samples, such as the 20 maximum and 20 minimum samples. This avoids the problem of being dependent on the match shape, but reduces the impact of extreme outliers by using more than one value.
  • the centre point is the mean value of the smoothed data.
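  • The two ways of setting the centre point may be sketched as follows. This is an illustrative sketch only; the function names and the synthetic `smoothed` data are assumptions introduced here, and the average of the selected samples is taken over all of them together.

```python
from statistics import mean


def centre_from_extremes(samples, n=20):
    """Average of the n largest and n smallest samples."""
    ordered = sorted(samples)
    n = min(n, len(ordered) // 2)  # guard against very short inputs
    if n == 0:
        raise ValueError("at least two samples are required")
    return mean(ordered[:n] + ordered[-n:])


def centre_from_mean(samples):
    """Mean value of the smoothed data."""
    return mean(samples)


# Synthetic smoothed data: low for a quarter of the time, high for the rest.
smoothed = [0.0] * 100 + [1.0] * 300
by_extremes = centre_from_extremes(smoothed)  # midway between the two levels
by_mean = centre_from_mean(smoothed)          # pulled towards the longer level
```

    With this lopsided data the mean moves towards the level that lasts longer, whereas the average of the extremes stays midway between the two levels.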
  • the amount of hysteresis used in the embodiment is defined as a function of the standard deviation of the smoothed data, for example as a fraction of the standard deviation of the smoothed data. The optimum value of this may be determined experimentally from sample data, in this example to be 1/5 of the standard deviation of the smoothed data.
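  • A hysteresis detector of this kind may be sketched as follows, purely by way of example. The sketch assumes that the amount of hysteresis is the total gap between the upper and lower thresholds, placed symmetrically about the centre point, and that the population standard deviation is used; the function name `hysteresis_filter` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def hysteresis_filter(samples, centre, gap, initial=False):
    """Schmitt-trigger style detector producing a binary output.

    The output goes high only above centre + gap/2 and goes low only
    below centre - gap/2; between the thresholds it keeps its state.
    """
    upper = centre + gap / 2.0
    lower = centre - gap / 2.0
    state = initial
    out = []
    for value in samples:
        if value > upper:
            state = True
        elif value < lower:
            state = False
        out.append(state)
    return out


# Fixed thresholds (0.4 and 0.6) to make the behaviour easy to follow.
wobble = [0.0, 0.65, 0.45, 0.55, 0.45, 0.3, 0.45, 0.55, 0.7]
stable = hysteresis_filter(wobble, centre=0.5, gap=0.2)

# Thresholds derived from the data: mean centre, gap of 1/5 of the deviation.
derived = hysteresis_filter(wobble, mean(wobble), pstdev(wobble) / 5.0)
```

    In the fixed-threshold case, the samples at 0.45 and 0.55 lie on either side of the centre point but inside the hysteresis band, so the output does not oscillate as a simple mean crossing detector would.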
  • Figure 4 summarises the steps undertaken by the arrangement of Figure 1 .
  • the pace and tone inputs received from the vector engine may be selected by a source selector, or alternatively the vector sum of these may be selected, and passed to the low pass filter and Hysteresis filter.
  • the output is a binary output waveform defining the presence or absence of the condition tested, here the presence of the football game being played or its absence.
  • This binary output waveform provided to the retrieval module 24 of Figure 1 allows either portions of new programme to be skipped (those portions for which the output is low) and effectively fast forwarded to the next "chapter" for a retrieved programme, or to swap to alternative content for a live feed.
  • the retrieval module could be used to directly control output of audio-video content.
  • the retrieval module may be used to indicate chapter points such as by storing timestamps or a flag along with the audio-video content to be passed to a further controller.
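  • As an illustration of how chapter points might be derived from the binary output waveform, the following sketch records a timestamp at each change of state and lists the portions for which the output is high. The function names, the tuple layout and the fixed interval length in seconds are assumptions made for this example.

```python
def chapter_points(binary, interval_seconds):
    """Return (timestamp, new_state) for every change of the binary output."""
    points = []
    for i in range(1, len(binary)):
        if binary[i] != binary[i - 1]:
            points.append((i * interval_seconds, binary[i]))
    return points


def high_segments(binary, interval_seconds):
    """Return (start, end) times of each run where the output is high."""
    segments = []
    start = None
    for i, state in enumerate(binary):
        if state and start is None:
            start = i
        elif not state and start is not None:
            segments.append((start * interval_seconds, i * interval_seconds))
            start = None
    if start is not None:  # output still high at the end of the data
        segments.append((start * interval_seconds,
                         len(binary) * interval_seconds))
    return segments


waveform = [False, True, True, False, False, True]
points = chapter_points(waveform, 60)
segments = high_segments(waveform, 60)
```

    A player could skip the low portions by jumping from the end of one high segment to the start of the next.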
  • ground truth for "mood".
  • Other ground truths may be used such as video quality defined in various ways.
  • Such a ground truth would have use cases such as analysing archived video to determine sections for restoration.
  • Another ground truth would be amount of "activity" within a video sequence.
  • Such a ground truth would be useful for analysing content that has infrequent activity, such as natural history footage, to determine sections of interest.
  • the system comprises an input 2 for receiving the AV content, for example, retrieved from an archive database.
  • a characteristics extraction engine 4 analyses the audio and/or video data to produce values for a number of different characteristics, such as audio frequency, audio spectrum, video shot changes, video luminance values and so on.
  • a data comparison unit 6 receives the multiple characteristics for the content and compares the multiple characteristics to characteristics of other known content to produce a value for each characteristic.
  • Such characteristic values, having been produced by comparison to known AV data, can thereby represent features such as the probability of laughter, the relative rate of shot changes (high or low), and the existence and size of faces directed towards the camera.
  • a multi-dimensional metadata engine 8 then receives the multiple feature values and reduces these feature values to a complex metadata value of M dimensions which may be referred to as a mood vector.
  • the extracted features may represent aspects such as laughter, gun shots, explosions, car tyre screeching, speech rates, motion, cuts, faces, luminance and cognitive features.
  • the data comparison and multi-dimensional metadata units generate a complex metadata "mood" value from the extracted features.
  • the complex mood value has humorous, serious, fast paced and slow paced components.
  • the audio features include laughter, gun shots, explosions, car tyre screeching and speech rates.
  • the video features include motion, cuts, luminance, faces and cognitive values.
  • the characteristic extraction engine 4 provides a process by which the audio data and video data may be analysed and characteristics discussed above extracted.
  • the data itself is typically time coded and may be analysed at a defined sampling rate discussed later.
  • the video data is typically frame by frame data and so may be analysed frame by frame, as groups of frames or by sampling frames at intervals. Various characteristics that may be used to generate the mood vectors are described later.
  • the process described so far takes characteristics of audio-video content and produces values for features, as discussed.
  • the feature values produced by the process described above relate to samples of the AV content, such as individual frames.
  • multiple characteristics are combined together to give a value for features such as laughter.
  • characteristics such as motion may be directly assessed to produce a motion feature value.
  • the feature values need to be combined to provide a more readily understandable representation of the metadata in the form of a complex metadata value.
  • the metadata value is complex in the sense that it may be represented in M dimensions.
  • a variety of such complex values are possible representing different attributes of the AV content, but the preferred example is a so-called "mood" value indicating how a viewer would perceive the features within the AV content.
  • the main example mood vector that will be discussed has two dimensions: fast/slow and humorous/serious.
  • the metadata engine 8 operates a machine learning system.
  • the ground truth data may be from user trials where members of the general public manually tag 3-minute clips of archive and current programmes in terms of content mood, or from user trials in which the members tag the whole programme with a single mood tag.
  • the users tag programmes in each mood dimension to be used such as 'activity' (exciting/relaxing) generating one mood tag representing the mood of the complete programme (called whole programme user tag).
  • the whole programme user tag and the programmes' audio/video features are used to train a mood classifier.
  • the preferred machine learning method is Support Vector Machine (SVM) regression. Whilst the whole programme tagged classifier is used in the preferred embodiment for the time-line mood classification, other sources of ground truth could be used to train the machine learning system.
  • the metadata engine 8 may produce mood values at intervals throughout the duration of the programme.
  • the time intervals evaluated are consecutive non-overlapping windows of 1 minute, 30 seconds and 15 seconds.
  • the mood vector for a given interval is calculated from the features present during that time interval. This will be referred to as variable time-line mood classification.
  • the machine learning algorithm used to produce the mood data for this study uses a 60s temporal window by default.
  • results using mood data produced with a 5s, 10s, 30s, 60s and 120s window were compared.
  • Of pace, tone and vector sum, the vector sum performed best, so the results for the vector sum are summarised in the table below.
  • time interval can affect how the system may be used. For the purpose of identifying moods of particular parts of a programme, a short time interval allows accurate selection of small portions of a programme. For improved accuracy, a longer time period is beneficial. The choice of a fixed time interval around one minute gives a benefit as this is short in comparison to the length of most programmes, but long enough to provide accuracy of deriving the mood vector for each interval.
  • the low level audio features or characteristics that are identified include formant frequencies, power spectral density, bark filtered root mean square amplitudes, spectral centroid and short time frequency estimation. These low level characteristics may then be compared to known data to produce a value for each feature.
  • the spectral centroid is used to determine where the dominant centre of the frequency spectrum is.
  • a Fourier Transform of the signal is taken, and the amplitudes of the component frequencies are used to calculate the weighted mean. This weighted mean, along with the standard deviation and auto covariance were used as three feature values.
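  • The weighted-mean step may be sketched as follows, by way of illustration only. A naive discrete Fourier transform is used so that the example is self-contained; the function names, the 8 kHz sample rate and the test tone are assumptions. The standard deviation and auto covariance features are not covered by this sketch.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) transform)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        mags.append(abs(acc))
    return mags


def spectral_centroid(samples, sample_rate):
    """Amplitude-weighted mean of the component frequencies, in Hz."""
    mags = magnitude_spectrum(samples)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent window: no meaningful centroid
    n = len(samples)
    return sum((k * sample_rate / n) * m for k, m in enumerate(mags)) / total


# A pure 1 kHz tone sampled at 8 kHz falls exactly on DFT bin 8 of 64.
rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * i / rate) for i in range(64)]
centroid = spectral_centroid(tone, rate)
```

    For a pure tone the centroid coincides with the tone frequency; for broadband audio it indicates where the bulk of the spectral energy lies.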
  • Each windowed sample is split into a sub window each 2048 samples in length. From this autocorrelation was used to estimate the main frequency of this subwindow. The average frequency of all these sub-windows, the standard deviation and auto covariance were used as the feature vectors.
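  • One possible reading of this short time frequency estimation is sketched below. It assumes that the main frequency is taken from the strongest autocorrelation peak after the zero-lag lobe and that lags are searched only up to a hypothetical `max_lag`, which limits the lowest detectable frequency; all names and the 100 Hz test tone are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def autocorrelation(window, max_lag):
    """Raw autocorrelation of the window for lags 0..max_lag."""
    n = len(window)
    return [sum(window[i] * window[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def estimate_frequency(window, sample_rate, max_lag):
    """Main frequency from the strongest peak after the zero-lag lobe."""
    r = autocorrelation(window, max_lag)
    # Skip the zero-lag lobe: start at the first lag with negative correlation.
    start = next((lag for lag in range(1, max_lag + 1) if r[lag] < 0), None)
    if start is None:
        return None
    best = max(range(start, max_lag + 1), key=lambda lag: r[lag])
    return sample_rate / best if r[best] > 0 else None


def sub_window_frequencies(samples, sample_rate, size=2048, max_lag=200):
    """Frequency estimate for each consecutive sub-window of `size` samples."""
    freqs = []
    for begin in range(0, len(samples) - size + 1, size):
        f = estimate_frequency(samples[begin:begin + size], sample_rate, max_lag)
        if f is not None:
            freqs.append(f)
    return freqs


rate = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * i / rate) for i in range(4096)]
freqs = sub_window_frequencies(tone, rate)
average, spread = mean(freqs), pstdev(freqs)  # two of the feature values
```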
  • the low level features or characteristics described above give certain information about the audio-video content, but in themselves are difficult to interpret, either by subsequent processes or by a video representation. Accordingly, the low level features or characteristics are combined by data comparison as will now be described.
  • a low level feature such as formant frequencies, in itself may not provide a sufficiently accurate indication of the presence of a given feature, such as laughter, gun shots, tyre screeches and so on.
  • the likely presence of features within the audio content may be determined.
  • the main example is laughter estimation.
  • a laughter value is produced from low level audio characteristics in the data comparison engine.
  • the audio window length in samples is half the sampling frequency. Thus, if the sampling frequency is 44.1 kHz, the window will be 22,050 samples long, or 500 ms. There was a 0.2 sampling frequency overlap between windows.
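  • The window arithmetic may be illustrated as follows. The sketch assumes that the overlap is 0.2 of the sampling frequency expressed in samples, so that consecutive windows start one window length minus the overlap apart; the function names are hypothetical.

```python
def window_layout(sample_rate):
    """Return (window, overlap, hop) in samples for a given sampling rate."""
    window = sample_rate // 2    # half the sampling frequency
    overlap = sample_rate // 5   # assumed: 0.2 of the sampling frequency
    hop = window - overlap       # distance between successive window starts
    return window, overlap, hop


def window_starts(total_samples, sample_rate):
    """Start index of every complete window in a signal of given length."""
    window, _, hop = window_layout(sample_rate)
    return list(range(0, total_samples - window + 1, hop))


window, overlap, hop = window_layout(44100)
duration_seconds = window / 44100          # length of one window in seconds
starts = window_starts(3 * 44100, 44100)   # three seconds of audio
```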
  • Once the characteristics have been calculated, they are compared to known data (training data) using a variation on N-dimensional Euclidean distance. From the above characteristics extraction, the following characteristics are extracted:
    • Formant Frequencies: formants 1-5
    • Power Spectral Density: mean, standard deviation, auto covariance
    • Bark Filtered RMS Amplitudes: RMS amplitudes for Bark filter bands 1-23
    • Spectral Centroid: mean, standard deviation, auto covariance
    • Short Time Frequency Estimation: mean, standard deviation, auto covariance
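  • A comparison against training data may be sketched as below. The exact form of the distance measure is not detailed here, so plain N-dimensional Euclidean distance with a nearest-neighbour decision is used as an assumption; the labels, vectors and function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Plain N-dimensional Euclidean distance between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(characteristics, training):
    """Label of the training vector closest to the given characteristics.

    training is a list of (label, vector) pairs.
    """
    return min(training,
               key=lambda item: euclidean_distance(characteristics, item[1]))[0]


# Hypothetical two-dimensional training data for a laughter decision.
training = [("laughter", [1.0, 0.0]), ("no laughter", [0.0, 1.0])]
label = nearest_label([0.9, 0.1], training)
```

    In practice the characteristic vector would contain all of the values listed above, and the characteristics would normally be scaled to comparable ranges before distances are computed.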
  • the video features, which may be directly determined from certain identified characteristics, are as follows.
  • Motion values are calculated from a 32x32 pixel grey-scaled version of the AV content. The motion value is produced from the mean difference between the current frame $f_k$ and the tenth previous frame $f_{k-10}$.
  • $\text{Motion} = \text{scale} \cdot \sum \lvert f_k - f_{k-10} \rvert$
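  • The motion value may be illustrated with the following sketch, which takes the mean absolute pixel difference between each frame and the frame ten positions earlier. Frames are assumed to be 32x32 lists of grey-scale values; the use of absolute differences, the default `scale` of 1.0 and the function names are assumptions.

```python
def mean_abs_difference(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def motion_values(frames, lag=10, scale=1.0):
    """Motion value for every frame that has a frame `lag` positions earlier."""
    return [scale * mean_abs_difference(frames[k], frames[k - lag])
            for k in range(lag, len(frames))]


# Twelve synthetic 32x32 frames whose brightness rises by one per frame.
frames = [[[float(k)] * 32 for _ in range(32)] for k in range(12)]
motion = motion_values(frames)

# A static scene gives zero motion.
static = motion_values([[[5.0] * 32 for _ in range(32)]] * 11)
```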
  • Cuts values are calculated from a 32x32 pixel grey-scaled version of the AV content. The cuts value is produced from the thresholded product of the mean difference (md) and the inverse of the phase correlation (pc) between the current frame $f_k$ and the previous frame $f_{k-1}$.
  • $\text{Cuts} = \operatorname{threshold}\bigl(\text{md} \cdot (1 - \text{pc})\bigr)$
  • Change in lighting is the summation of the difference in luminance values. Constant lighting is the number of luminance histogram bins that are above a threshold.
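  • The two lighting measures may be sketched as follows. The description leaves several details open, so this sketch assumes that the differences are absolute differences between corresponding pixels of consecutive frames, that luminance is in the range 0 to 255, and that a 16-bin histogram is used; all names and parameter values are hypothetical.

```python
def luminance_histogram(frame, bins=16, max_value=256):
    """Count of pixels falling in each luminance bin."""
    counts = [0] * bins
    for row in frame:
        for p in row:
            index = min(int(p) * bins // max_value, bins - 1)
            counts[index] += 1
    return counts


def constant_lighting(frame, threshold, bins=16):
    """Number of histogram bins whose count is above the threshold."""
    return sum(1 for c in luminance_histogram(frame, bins) if c > threshold)


def lighting_change(previous, current):
    """Summed absolute luminance difference between two frames."""
    return sum(abs(a - b)
               for row_a, row_b in zip(previous, current)
               for a, b in zip(row_a, row_b))


frame = [[0, 0], [255, 128]]
histogram = luminance_histogram(frame)
bins_above = constant_lighting(frame, threshold=1)
change = lighting_change([[0, 0], [0, 0]], [[1, 2], [3, 4]])
```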
  • Face value is the number of full frontal faces and the proportion of the frame covered by faces for each frame. Face detection on the grey-scale image of each frame is implemented using a MEX implementation of OpenCV's face detector from MATLAB Central. The code implements the Viola-Jones AdaBoost-based algorithm for face detection.
  • Cognitive features are the output of simulated simple cells and complex cells in the initial feed forward stage of object recognition in the visual cortex. Cognitive features are generated by the 'FH' package of the Cortical Network Simulator from Centre for Biological and Computational Learning, MIT.
  • the invention may be implemented in systems or methods, but may also be implemented in program code executable on a device, such as a set top box, or on an archive system or on a personal device.
  • Alternative implementations include a set top box, larger scale machines for retrieval and display of television programme archives containing thousands of programmes and smaller scale implementations such as personal audio video players, smart phones, tablets and other such devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Television Signal Processing For Recording (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Claims (15)

  1. Unité de commande pour commander la sortie d'un contenu audio-vidéo, comprenant :
    - une entrée (12,14) destinée à recevoir un contenu audio-vidéo ;
    - un moteur de vecteur (21) destiné à :
    recevoir le contenu audio-vidéo ;
    extraire, à partir du contenu audio-vidéo, des valeurs représentant des caractéristiques du contenu audio-vidéo ; et
    produire un vecteur, à intervalles, le vecteur étant produit à partir des valeurs extraites, et le vecteur étant une valeur de métadonnées continues de M dimensions ;
    - un filtre passe-bas (22) destiné à recevoir la valeur de métadonnées continues et à passer des changements en dessous d'un seuil à un filtre d'hystérésis (23) ;
    - le filtre d'hystérésis étant destiné à recevoir la valeur de métadonnées continues filtrée et à produire une sortie binaire à partir de la valeur de métadonnées continues filtrée ; et
    - a retrieval module (24) arranged to receive the binary output and to assert a control signal to allow control of the output of the audio-video content depending upon the binary output;
    - characterised in that the hysteresis filter has a gap between upper and lower thresholds that is a function of the standard deviation of the continuous metadata value.
  2. A controller according to claim 1, wherein the audio-video content comprises a programme.
  3. A controller according to claim 1, wherein the audio-video content comprises a live feed.
  4. A controller according to any preceding claim, wherein the hysteresis filter further includes a centre point determined using the mean of selected maximum and minimum samples.
  5. A controller according to any preceding claim, wherein the retrieval module is arranged to store chapter points of the audio-video content.
  6. A controller according to any preceding claim, wherein the retrieval module is arranged to select other audio-video content if the binary output changes state.
  7. A controller according to any preceding claim, wherein the hysteresis filter further comprises a centre point determined using the mean of samples of the filtered continuous metadata value.
  8. A method of controlling the output of audio-video content, comprising:
    - receiving audio-video content;
    - extracting, from the audio-video content, values representing characteristics of the audio-video content; and
    - producing, using a vector engine, a vector, at intervals, from the audio-video content, the vector being produced from the extracted values, and the vector being a continuous metadata value of M dimensions;
    - filtering, using a low pass filter arranged to receive the continuous metadata value and to pass changes below a threshold to a hysteresis filter;
    - filtering, using the hysteresis filter arranged to receive the filtered continuous metadata value and to produce a binary output, the filtered continuous metadata value; and
    - asserting a control signal derived from the binary output to allow control of the output of the audio-video content depending upon the binary output;
    - characterised in that the hysteresis filter has a gap between upper and lower thresholds that is a function of the standard deviation of the continuous metadata value.
  9. A method according to claim 8, wherein the audio-video content comprises a programme.
  10. A method according to claim 8, wherein the audio-video content comprises a live feed.
  11. A method according to any of claims 8 to 10, wherein the hysteresis filter further includes a centre point determined using the mean of selected maximum and minimum samples.
  12. A method according to any of claims 8 to 11, wherein the retrieval module is arranged to store chapter points of the audio-video content.
  13. A method according to any of claims 8 to 12, wherein the retrieval module is arranged to select other audio-video content if the binary output changes state.
  14. A computer program comprising code which, when executed, undertakes the method according to any of claims 8 to 13.
  15. A device comprising the controller according to any of claims 1 to 7.
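
The filtering chain recited in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a one-dimensional metadata value (M = 1), uses an exponential moving average as a stand-in for the low pass filter, and takes the standard deviation over the filtered samples. The names low_pass, hysteresis, chapter_points, alpha and gap_scale are hypothetical and do not appear in the patent.

```python
from statistics import pstdev


def low_pass(samples, alpha=0.2):
    """Exponential moving average (an assumed choice of low pass filter).

    alpha is a hypothetical smoothing factor between 0 and 1.
    """
    smoothed = []
    prev = samples[0]
    for x in samples:
        prev = alpha * x + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def hysteresis(samples, gap_scale=1.0):
    """Turn a continuous value into a binary output.

    The centre point is the mean of the maximum and minimum samples.
    The gap between the upper and lower thresholds is gap_scale times
    the standard deviation of the samples (gap_scale is hypothetical).
    """
    centre = (max(samples) + min(samples)) / 2
    gap = gap_scale * pstdev(samples)
    upper = centre + gap / 2
    lower = centre - gap / 2
    state = False
    output = []
    for x in samples:
        if x > upper:
            state = True
        elif x < lower:
            state = False
        # Between the thresholds the previous state is kept.
        output.append(state)
    return output


def chapter_points(binary):
    """Indices at which the binary output changes state."""
    return [i for i in range(1, len(binary)) if binary[i] != binary[i - 1]]


# Illustrative metadata samples, one per interval (made-up values).
values = [0.1, 0.2, 0.1, 0.9, 1.0, 0.8, 0.9, 0.2, 0.1, 0.0]
filtered = low_pass(values, alpha=0.5)
binary = hysteresis(filtered)
points = chapter_points(binary)
```

In this sketch the state changes found by chapter_points stand in for the chapter points that the retrieval module may store. A real controller would work on M-dimensional vectors and could select the maximum and minimum samples differently.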
EP16744832.3A 2015-06-30 2016-06-30 Audio-video content control Active EP3317881B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1511450.7A GB201511450D0 (en) 2015-06-30 2015-06-30 Audio-video content control
PCT/GB2016/051986 WO2017001860A1 (en) 2015-06-30 2016-06-30 Audio-video content control

Publications (2)

Publication Number Publication Date
EP3317881A1 (en) 2018-05-09
EP3317881B1 (en) 2021-01-27

Family

ID=53872434

Family Applications (1)

Application Number Title Priority Date Filing Date
EP16744832.3A Active EP3317881B1 (en) 2015-06-30 2016-06-30 Audio-video content control

Country Status (4)

Country Link
US (1) US10701459B2 (en)
EP (1) EP3317881B1 (en)
GB (2) GB201511450D0 (en)
WO (1) WO2017001860A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
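KR102660124B1 (en) * 2018-03-08 2024-04-23 Electronics and Telecommunications Research Institute Method for generating data for video emotion learning, method for determining video emotion, and video emotion determination apparatus using the same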

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130050254A1 (en) * 2011-08-31 2013-02-28 Texas Instruments Incorporated Hybrid video and graphics system with automatic content detection process, and other circuits, processes, and systems
US20140232818A1 (en) * 2013-02-19 2014-08-21 Disney Enterprises, Inc. Method and device for spherical resampling for video generation

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB1336498A (en) 1971-09-13 1973-11-07 British Broadcasting Corp Television shot-change detector
US5634020A (en) * 1992-12-31 1997-05-27 Avid Technology, Inc. Apparatus and method for displaying audio data as a discrete waveform
US6865226B2 (en) * 2001-12-05 2005-03-08 Mitsubishi Electric Research Laboratories, Inc. Structural analysis of videos with hidden markov models and dynamic programming
US7506816B2 (en) * 2004-10-04 2009-03-24 Datalogic Scanning, Inc. System and method for determining a threshold for edge detection based on an undifferentiated equalized scan line signal
US20070242880A1 (en) * 2005-05-18 2007-10-18 Stebbings David W System and method for the identification of motional media of widely varying picture content
US8355442B2 (en) * 2007-11-07 2013-01-15 Broadcom Corporation Method and system for automatically turning off motion compensation when motion vectors are inaccurate
US8311390B2 (en) * 2008-05-14 2012-11-13 Digitalsmiths, Inc. Systems and methods for identifying pre-inserted and/or potential advertisement breaks in a video sequence
US7832928B2 (en) * 2008-07-24 2010-11-16 Carestream Health, Inc. Dark correction for digital X-ray detector
CN102893602B (en) * 2010-02-22 2016-08-10 Dolby Laboratories Licensing Corporation Video display with rendering control using metadata embedded in the bitstream
IT1403800B1 (en) * 2011-01-20 2013-10-31 Sisvel Technology Srl Methods and devices for recording and reproducing multimedia content using dynamic metadata
GB2510424A (en) * 2013-02-05 2014-08-06 British Broadcasting Corp Processing audio-video (AV) metadata relating to general and individual user parameters
GB2515481A (en) 2013-06-24 2014-12-31 British Broadcasting Corp Programme control
KR102051798B1 (en) * 2013-07-30 2019-12-04 Dolby Laboratories Licensing Corporation Systems and methods for generating scene-stabilized metadata
US9895121B2 (en) * 2013-08-20 2018-02-20 Densitas Incorporated Methods and systems for determining breast density
GB2523311B (en) * 2014-02-17 2021-07-14 Grass Valley Ltd Method and apparatus for managing audio visual, audio or visual content

Also Published As

Publication number Publication date
US20180184179A1 (en) 2018-06-28
EP3317881A1 (en) 2018-05-09
GB2556737A8 (en) 2018-07-04
WO2017001860A1 (en) 2017-01-05
GB201801163D0 (en) 2018-03-07
GB2556737A (en) 2018-06-06
GB201511450D0 (en) 2015-08-12
US10701459B2 (en) 2020-06-30

Similar Documents

Publication Publication Date Title
EP1081960B1 (en) Signal processing method and video/voice signal processing device
Huang et al. Scream detection for home applications
US7982797B2 (en) Detecting blocks of commercial content in video data
US6697564B1 (en) Method and system for video browsing and editing by employing audio
US20050131688A1 (en) Apparatus and method for classifying an audio signal
US20130073578A1 (en) Processing Audio-Video Data To Produce Metadata
EP3701528B1 (en) Segmentation-based feature extraction for acoustic scene classification
CN101129064A (en) Dynamic generative process modeling
US20080292273A1 (en) Uniform Program Indexing Method with Simple and Robust Audio Feature and Related Enhancing Methods
US8473294B2 (en) Skipping radio/television program segments
US20210098008A1 (en) A method and system for triggering events
Okuyucu et al. Audio feature and classifier analysis for efficient recognition of environmental sounds
Kim et al. Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation
EP3317881B1 (en) Audio-video content control
Boril et al. Automatic excitement-level detection for sports highlights generation.
JPH10187182A (en) Video classification method and apparatus
US20160163354A1 (en) Programme Control
Islam et al. Sports highlights generation using decomposed audio information
Giannakopoulos et al. A novel efficient approach for audio segmentation
US8285051B2 (en) Information processing apparatus and method for detecting associated information from time-sequential information
Zubari et al. Speech detection on broadcast audio
Raventós et al. The importance of audio descriptors in automatic soccer highlights generation
Kim et al. Detection of goal events in soccer videos
Yoshitaka et al. Video summarization based on film grammar
Changapur et al. Bioacoustics Monitoring to Improve Conservation Efforts for Endangered Species

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20180124

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20191030

GRAJ Information related to disapproval of communication of intention to grant by the applicant or resumption of examination proceedings by the epo deleted

Free format text: ORIGINAL CODE: EPIDOSDIGR1

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

GRAJ Information related to disapproval of communication of intention to grant by the applicant or resumption of examination proceedings by the epo deleted

Free format text: ORIGINAL CODE: EPIDOSDIGR1

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: H04N 21/234 20110101ALI20200714BHEP

Ipc: G11B 27/10 20060101AFI20200714BHEP

Ipc: H04N 5/14 20060101ALI20200714BHEP

Ipc: G11B 27/28 20060101ALI20200714BHEP

Ipc: H04N 21/233 20110101ALI20200714BHEP

Ipc: H04N 21/84 20110101ALI20200714BHEP

INTG Intention to grant announced

Effective date: 20200810

INTG Intention to grant announced

Effective date: 20200819

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1359081

Country of ref document: AT

Kind code of ref document: T

Effective date: 20210215

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602016052069

Country of ref document: DE

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20210127

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG9D

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1359081

Country of ref document: AT

Kind code of ref document: T

Effective date: 20210127

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210427

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210428

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210527

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210427

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210527

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602016052069

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20211028

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20210630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210527

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210630

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230330

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20210127

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20160630

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230510

Year of fee payment: 8

Ref country code: DE

Payment date: 20230502

Year of fee payment: 8

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20230511

Year of fee payment: 8

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210127