WO2007114796A1 - Apparatus and method for analysing a video broadcast - Google Patents

Apparatus and method for analysing a video broadcast

Info

Publication number
WO2007114796A1
WO2007114796A1 (PCT/SG2007/000091)
Authority
WO
WIPO (PCT)
Prior art keywords
commercial
boundary
video
candidate
audio
Prior art date
Application number
PCT/SG2007/000091
Other languages
English (en)
Inventor
Lingyu Duan
Yantao Zheng
Changsheng Xu
Qi Tian
Original Assignee
Agency For Science, Technology And Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency For Science, Technology And Research
Publication of WO2007114796A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H 60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H 60/35 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H 60/37 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users, for identifying segments of broadcast information, e.g. scenes or extracting programme ID
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H 60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H 60/56 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H 60/59 Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44209 Monitoring of downstream path of the transmission network originating from a server, e.g. bandwidth variations of a wireless network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/812 Monomedia components thereof involving advertisement data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H04N 5/91 Television signal processing therefor

Definitions

  • the invention relates to an apparatus and method for analysing a video broadcast.
  • the invention relates to an apparatus and method for determining a likelihood that a candidate commercial boundary in a segmented video broadcast is a commercial boundary.
  • the invention also relates to an apparatus and method for classifying a commercial broadcast in a pre-defined category.
  • the invention also relates to an apparatus and method for identifying a boundary of a commercial broadcast in a video broadcast and classifying the commercial broadcast.
  • TV advertising is ubiquitous, persistent, and economically vital. Millions of people's living and working habits are affected by TV commercials, if not immediately, at least over time. Today, TV commercials are generally produced in 30- or 60-second formats, costing millions of US dollars to produce and air. One 30-second commercial in prime time can easily cost up to 120,000 US dollars [Reference 1 - see appended list of references].
  • Advertising may be considered an organised method of communicating information about a product or service which a company or individual wants to promote to people.
  • An advertisement is a paid announcement that is conveyed through words, pictures, music, and action in a medium (e.g., newspaper, magazine, broadcast channels, etc.).
  • United States Patent No. 6100941 discloses a method and apparatus for locating a commercial within a video data stream.
  • the average cut frame distance, cut rate, changes in the average cut frame distance, the absence of a logo, commercial signature detection, brand name detection, a series of black frames preceding a high cut rate, similar frames located within a specified period of time before a frame being analysed, and character detection are combined to provide a commercial isolation apparatus and/or method with an increased detection reliability rate.
  • However, a method for detecting an individual TV commercial's boundaries is not disclosed in that patent.
  • Reference [2] discusses a method of extracting a number of audio, visual, and temporal features (such as audio class histogram, commercial pallet histogram, text location indicator, scene change rate, and blank frame rate) within a window around each scene boundary and utilises an SVM classifier to classify each candidate segment into commercial segments or programme segments.
  • Reference [18] discloses a technique for a commercial video's semantic analysis. However, this work is limited to the mapping between low-level visual features and subjective semiotic categories (i.e., practical, playful, utopic, and critical). It utilises heuristic rules used in the practice of commercial production to associate a set of perceptual features with four major types, namely, practical commercials, playful commercials, utopic commercials, and critical commercials.
  • Shots and sequences are a useful level of granularity, as a few useful features (e.g., scene change rate or shot frequency in [2], etc.) rely on shots directly, and many statistically meaningful features (e.g., blank frame rate and audio class histogram in [2], average and variance of edge change ratio and frame differences) have to undergo the accumulation over a temporal window.
  • Apparatuses incorporating features defined in the appended independent claims can be used to identify a TV commercial's boundary and TV commercial classification by advertised products or services.
  • a flexible and reliable solution may resort to the representation of intra-commercial characteristics that are of interest to indicate the beginning and ending of a commercial, and to indicate the transition from one commercial to the other.
  • apparatuses implementing the features of the independent claims may provide any or all of the following advantages:
  • Apparatuses implementing the techniques described may provide a generic and reliable system and method for locating each individual TV commercial within a video data stream by utilising machine learning to assess the likelihood that a candidate commercial boundary is a true commercial boundary (for example, as a boundary or not) on the basis of a set of mid-level features, which are developed to capture useful audio-visual characteristics within a commercial and at the boundary between two consecutive commercials.
  • Some apparatuses implementing the invention utilise a binary classifier to assess simply whether or not the candidate commercial boundary is a commercial boundary.
  • Video shots containing such key image frames, together with some modest encouragement coming from the announcer/voice-over, are often employed to highlight the offer at the end of a commercial. This may be a reliable indicator that the video shot in question is in the vicinity (in the video broadcast stream) of a commercial boundary.
  • an alignment algorithm is carried out to seek the most probable position of audio scene change within a neighbourhood of a video shot transition point.
  • Boundary classifier modules may comprise a set of mid-level features to capture audio-visual characteristics significant for parsing commercials' video content (e.g. key frames, structure), Black Frame inclusive/exclusive multi-modal feature vectors, and a supervised learning algorithm (e.g. support vector machines (SVMs), decision tree, naïve Bayesian classifier, etc.).
  • Apparatuses implementing the techniques described may provide a system and method for automatically classifying an individual TV commercial into a predefined category. This may be done according to advertised product and/or service by making use of, for example, ASR (Automatic Speech Recognition), OCR (Optical Character Recognition), object recognition and IR (Information Retrieval) techniques.
  • Commercial categoriser modules may comprise ASR and OCR modules for extracting raw textual information followed by spell checking and correction, keyword selection and keyword-based query expansion using external resources (such as Google, an encyclopaedia and a dictionary), an SVM-based classifier trained from external resources such as a public document corpus categorised according to different topics, and an IR text pre-processing module (such as Porter stemming, stop word removal, and vocabulary pruning); visual-based object recognition (e.g. car, computer, etc.) may be useful in the case of weak textual information.
  • Figure 1 is a block diagram illustrating an application paradigm of TV commercial segmentation, categorisation and identification.
  • Figure 1 is the Figure 1 used in the published paper, not that of the specification;
  • Figure 2 is a block diagram illustrating an architecture for a boundary classifier and a commercial classifier;
  • Figure 3 is a process flow diagram illustrating a first set of techniques for determining a likelihood that a candidate commercial boundary is a commercial boundary
  • Figure 4 is an architecture and flow diagram illustrating a second technique for determining a likelihood that a candidate commercial boundary is a commercial boundary
  • FIG. 5 illustrates a series of Image Frames Marked with Product Information (FMPI);
  • Figure 6 is a process diagram illustrating low-level visual FMPI feature extraction;
  • Figure 7 is a line graph showing system performance results for FMPI classification using different features;
  • Figure 8 shows a series of images incorrectly classified as an FMPI frame
  • Figure 9 is a block diagram illustrating an Audio Scene Change Indicator (ASCI), alignment of audio offset and training process flow;
  • ASCI Audio Scene Change Indicator
  • Figure 10 is a bar graph illustrating statistics of time offsets between an audio scene change and its associated video scene change in news programs and commercials;
  • Figure 11 illustrates a Kullback-Leibler distance-based alignment process for audio- video scene changes
  • Figure 12 is a graph illustrating a series of Kullback-Leibler distances calculated from
  • Figure 13 is a table illustrating the simulation results of ASCI
  • Figure 14 is a graph illustrating statistics of the number of shots and the duration of TV commercials in the simulation video database
  • Figure 15 is a line graph illustrating the simulation results of an individual TV commercial's boundaries detection
  • Figure 16 is a block diagram illustrating the architecture of a commercial classifier
  • Figure 17 is a process flow diagram illustrating a first process for classifying a commercial
  • Figure 18 is an architecture/process flow diagram for a second commercial classification method
  • Figure 19 is a process flow diagram illustrating the method for keyword determination and proxy assignation of Figure 18 in more detail
  • Figure 20 is a process flow diagram illustrating the method for word feature selection of
  • Figure 21 illustrates an example of actual speech script, ASR generated speech script, and an acquired article from World Wide Web for the purpose of query expansion/proxy assignation
  • Figure 22 shows a group of key image frames containing significant semantic information in TV commercial videos
  • Figure 23 is a pie chart illustrating system performance results for TV commercial classification
  • Figure 24 is a bar graph illustrating the number of commercials in which the OCR and ASR of Figure 18 recognise brand names successfully;
  • Figure 25 is a bar graph illustrating the F1 values of classifications based on three types of input; and
  • Figure 26 is a table illustrating results of classification processes.
  • A TV commercial management system detects commercial segments, determines the boundaries of individual commercials, identifies and tracks new commercials, and summarises the commercials within a period by removing repeated instances.
  • TV commercial classification with respect to the advertised products or services (e.g., automobile, finance, etc.) helps to enable commercial filtering for personalised consumer services. For example, an MMS or email message (containing key frames or adapted video) on the commercials of interest to a registered user can be sent to her/his mobile device or email account.
  • TV commercials have changed significantly; they are almost always edited on a computer. Their appearance owes much to the MTV generation, and MTV-type commercials are more visual, more quickly paced, use more camera movement, and often combine multiple looks, such as black and white with colour, or stills with quick cuts [1]. Accordingly, a TV commercial archive system including browsing, classification, and search may inspire the creation of a good commercial. Marketing companies may even utilise it to observe competitors' behaviour.
  • the apparatus 60 comprises TV commercial detector 62 configured to locate boundaries of video programmes and commercial broadcasts in the video broadcast and to derive a segmented video broadcast, video shot (or frame) transition detector 64 configured to identify candidate commercial boundaries in the segmented video broadcast, boundary classifier 66 for assessing a likelihood a candidate commercial boundary is a commercial boundary, and commercial classifier 68.
  • boundary classifier 66 is a binary boundary classifier. As shown in Figure 2, boundary classifier 66 comprises FMPI recognition module 70 for determining whether a particular frame comprises an FMPI frame. Boundary classifier 66 also comprises an SVM training module 74 configured to train the classifier model with video frames of the segmented video broadcast which comprise product information (e.g. FMPI frames). Additionally, boundary classifier 66 assesses whether a candidate commercial boundary can be considered to be a commercial boundary; it performs this assessment for an FMPI frame with FMPI recognition module 70.
  • the boundary classifier may, optionally, comprise ASC (audio scene change) recognition module 76, silent frame recognition module 78, black frame recognition module 80 and HMM training module 82 used to train an HMM (Hidden Markov model) utilised in the ASC recognition module 76.
  • (at least) visual features are extracted within a symmetric window of each candidate commercial boundary location from a video data stream as shown in Figure 3.
  • Multi-modal audio-visual features are extracted in apparatuses implementing ASC and/or silence recognition.
  • Although Figure 3 illustrates a multi-modal technique, it has been found that excellent results are obtainable (as described below) with an implementation of FMPI techniques only. Boundary classification is carried out to determine whether a candidate commercial boundary is indeed a commercial boundary of each individual TV commercial.
  • the input video data stream can be any combination of video/audio source. It could be, for example, a television signal or an Internet file broadcast.
  • the disclosed techniques have particular application for digital video broadcasts. Implementations of the techniques described are extendable to analogue video signals.
  • the analogue video signals are converted to digital format prior to application of the techniques.
  • the disclosed techniques may be implemented on, for example, a computer apparatus, and be implemented either in hardware, software or in a combination thereof.
  • process 100 starts at step 102.
  • the input video broadcast signal is partitioned into commercial and programme sections, as is known.
  • a candidate commercial boundary is detected by use of, for example, a video shot detector 64.
  • image frame marked with product information (FMPI) recognition is carried out.
  • FMPI recognition used in isolation may provide perfectly acceptable results for assessing whether the candidate commercial boundary is a commercial boundary at step 110.
  • the boundary classifier determines a likelihood the candidate commercial boundary is a commercial boundary in dependence of a determination the candidate video frame comprises an audio scene change; that is ASCI recognition may be implemented at step 114 and/or silence and black frames recognition may be implemented at step 116.
  • FMPI recognition is discussed in more detail with reference to Figure 3b and ASCI recognition is discussed in more detail with reference to Figure 3c. The process of Figure 3a ends at step 112.
  • an apparatus which determines a likelihood a candidate commercial boundary is a commercial boundary.
  • the apparatus comprises a boundary classifier which determines whether a candidate video frame associated with a candidate commercial boundary of a segmented video broadcast comprises product information and determines a likelihood the candidate commercial boundary is a commercial boundary in dependence of the determination the candidate video frame comprises product information.
  • the boundary classifier determines a likelihood the candidate commercial boundary is a commercial boundary in dependence of a determination the candidate video frame comprises an audio scene change.
  • the boundary classifier is configured to make the classification according to a determination the candidate video frame (or frames thereof) comprises audio silence or video black frames.
  • a candidate boundary is detected at step 106 of Figure 3a.
  • the MPEG motion vectors of the video signal are queried in order to identify key frames at step 122.
  • the identification of key frames will be described in more detail below.
  • the video frame comprising the candidate commercial boundary is parsed in order to determine local image features at step 124 and global image features at step 126.
  • the local features derived comprise 128 features (or dimensions) and the global features derived comprise 13 features (or dimensions).
  • the local features and global features are merged to form a 141-dimensional feature vector.
  • the 141-dimensional feature vector is examined by a statistical model, in the present example a support vector machine (SVM) such as the C-SVC (C-support vector classification) model.
  • the SVM model determines at step 132 whether or not the candidate boundary video frame comprises an FMPI frame; that is, it determines whether the candidate video frame which is associated with the candidate commercial boundary of the segmented video broadcast comprises product information. If the query returns a positive result (i.e. the candidate boundary video frame is an FMPI frame), an FMPI confidence/likelihood score is computed for the or each frame in a candidate window (the candidate window comprising a set of video frames associated with the candidate commercial boundary) at step 134.
  • the confidence/likelihood score may be a probability value, as discussed below.
  • the candidate boundary likelihood assessment is then made at step 110 of Figure 3a; that is, the apparatus determines a likelihood the candidate commercial boundary is a commercial boundary in dependence of the determination the candidate video frame comprises product information.
  • the assessment of the likelihood the candidate commercial boundary is a commercial boundary at step 110 may be augmented by ASCI (audio scene change indicator) recognition in step 114 of Figure 3a.
  • a process for assessing the audio scene change is illustrated in more detail in Figure 3c.
  • the candidate boundary is detected at step 106 of Figure 3a.
  • a symmetric audio window is defined at step 140. This will be described further below.
  • the symmetric window is segmented into frames, and a sliding window is derived. Again, this will be described further below.
  • audio features are extracted for each sliding window in the segmented window.
  • At step 148, the K-L (Kullback-Leibler) distance metric is applied to the extracted audio features, and alignment of the audio window takes place at step 150, looping back to step 148, again as described in detail below.
  • At steps 152 and 154, ASC and non-ASC HMM-trained models analyse the extracted audio features, and probability scores for ASC and non-ASC are derived at steps 156 and 158 respectively. The probability scores will be described further below and are applied to the candidate boundary likelihood assessment at step 110 of Figure 3a.
  • An input video stream is first partitioned into commercial segments and programme segments. Shot change detection is applied to detect cuts, dissolves, and fade in/fade out, which are considered as candidate commercial boundaries.
  • Thresholding is used to detect Silence and Black Frames that constitute an integrated feature set together with ASCI and FMPI.
  • a supervised learning algorithm is utilised to fuse ASCI, FMPI, Silence, and Black Frames to distinguish true boundaries of an individual TV commercial. Derivation of these models and features is described below.
  • An SVM is utilised to accomplish the binary classification problem of an FMPI frame. This may be a simple binary ("Yes"/"No") classification. Compared with artificial neural networks, SVMs are faster, more interpretable, and deterministic. Advantages of SVMs over other methods include a) providing better prediction on unseen test data, b) providing a unique optimal solution for a training problem, c) containing fewer parameters compared to other methods, and d) working well for data with a large number of features. It has been found that C-Support Vector Classification (C-SVC) works particularly well with the described techniques.
  • the radial basis function (RBF) kernel is used to map training vectors into a high-dimensional feature space for classification.
  • the term scene transition detection (STD) is used to differentiate from commonly known scene change detection, which aims to detect shot boundaries by visual primitives.
  • a scene or a story unit is composed of a series of "interrelated shots that are unified by location or dramatic incident" [9].
  • STD aims to detect scenes on the basis of computable audio-visual characteristics and production rules. Many prior works deal with STD concentrating on sitcoms, movies [5]-[9], or broadcast news video [10][11].
  • One exemplary approach described herein reduces the problem of commercial STD to that of a classification of True Scene Changes versus False Scene Changes at candidate positions consisting of video shot change points. It is reasonably assumed that a TV commercial scene transition always comes with a shot change (i.e., cuts, fade-in/-out, and dissolves).
  • Different or multi-scale window sizes may be optionally applied to different kinds of features.
  • supervised learning is subsequently applied to fuse multi-modal features.
  • ASCI and FMPI characterise computational video contents (structural or semantic) of interest to signify the boundaries of an individual commercial.
  • FMPI and ASCI are two mid-level features based on the video and audio content within an individual TV commercial. Silence and Black Frames are based on the post-editing of a sequence of TV commercials. FMPI - whether or not in combination with ASCI - provides a post-editing-independent system and method. The combination of FMPI and ASCI, optionally together with Silence and Black Frames, provides a more reliable system and method where Silence and Black Frames are used in the post-editing process. (Silence and Black Frames are created in post-editing processes.) Further, as different countries make use of them differently, it is a significant advantage of the disclosed techniques that FMPI and, optionally, ASCI do not depend on these features.
  • FMPI is used to describe those images containing visual information explicitly illustrating an advertised product or service.
  • the visual information is expressed in a combination of three ways: text, computer graphics, and frames from live footage of real things and people.
  • Figure 5 illustrates some examples of FMPI frames.
  • the textual section may consist of the brand name, the store name, the address, the telephone number, and the cost, etc.
  • a drawing or photo of a product might be placed with computer graphics techniques.
  • graphics create a more or less abstract, symbolic, or "unreal" universe in which immense things can happen (from a viewer's perspective)
  • live footage of real things or people is usually combined with computer graphics to solve the problem of impersonality.
  • Each frame of film can be layered with any number of superimposed images.
  • Figure 5(a)-(e) are the simplest yet most prevalent ones.
  • In Figure 5(f)-(j), the product is projected into the foreground, usually in crisp, clear magnification.
  • In Figure 5(k)-(o), the FMPI frames are yielded by superimposed text bars, graphics, and live footage. From the image recognition point of view, Figure 5(a)-(e) produce a fairly uniform pattern; for Figure 5(f)-(j), the pattern variability mainly derives from the layout and the appearance of a product; Figure 5(k)-(o) present more diverse patterns due to unexpected real things.
  • the spatial relationship between the FMPI frames and an individual commercial's boundaries is revealed by the production rules as below.
  • a shot containing at least one FMPI frame is referred to as an FMPI shot.
  • one or two FMPI frames are utilised to highlight the offer at the end of a commercial.
  • good examples are commercials for services, expensive consumer durables, and big companies.
  • These commercials usually work through context or setting plus the technical sophistication of the photograph or camera work to concentrate on the presentation of luxury and status, or to explore subconscious feelings and subtle associations between product and situation. For these cases, it is sometimes hard to see what precisely is on offer in commercials since the product or service is buried in the combination of commentary and video shots. Accordingly, an FMPI frame is a useful 'prop'.
  • an FMPI frame might be irregularly interposed in the course of some TV commercials (say, a 30-seconder or 60-seconder), as our memories are served by endless repetition as well as by brand names, slogans and catchphrases, and snatches of song. Occasionally an FMPI frame may be present at the beginning of a commercial.
  • an FMPI frame can be considered as an indicator, which helps to determine a much smaller set of commercial boundary candidates from large amounts of shot transitions. It is possible to rely on the FMPI frames only to identify commercial boundaries, but performance may feature a higher recall but a lower precision. As illustrated in Figure 4, and particularly by Figure 15 below, by combining FMPI and ASCI techniques, this problem can be alleviated and yet more accurate results may be obtained.
  • Figure 6 shows an FMPI frame represented by properties of colour, texture, and edge features.
  • since the layout is a significant factor in distinguishing an FMPI frame, it is beneficial to incorporate spatial information about visual features.
  • One common approach is to divide images into subregions and impose positional constraints on the image comparison (image partitioning). This approach is used to train the SVM and also to determine whether the candidate video frame comprises FMPI.
  • dominant colours are used to construct an approximate representation of colour distribution. These dominant colours can be easily identified from colour histograms.
  • since Gabor filters exhibit optimal localisation properties in the spatial domain as well as in the frequency domain, they are used to capture rich texture information in the FMPI frame.
  • Edge features are a useful complement to texture, especially when an FMPI frame features stand-alone edges forming the contour of an object, as texture relies on a collection of similar edges.
  • the boundary classifier derives the training data by parsing video frames comprising product information and extracting a video frame feature for one or more portions of the video frame and/or for a complete video frame.
  • a given image is first sub-divided into 4x4 sub-images, and local features of eight dimensions for each of these sub-images are computed.
  • the LUV colour space is used to manipulate colour.
  • a uniform quantisation of the LUV space to 300 bins is employed, each channel being assigned 100 bins.
  • Three maximum bin values are selected as features from L, U, and V channels, respectively, as indicated by solid bars in Figure 6.
  • Edges derived from an image using the Canny algorithm provide an accumulation of edge pixels for each sub-image, which finally acts as 16-dimensional edge density features.
  • a set of two-dimensional Gabor filters is employed to extract texture features.
  • the Gabor filter is characterised by a preferred orientation and a preferred spatial frequency.
  • the filter bank comprises 4 Gabor filters that are the results of using one centre frequency (i.e., one scale) and four different equidistant orientations.
  • the application of such a filter bank to an input image results in a 4-dimensional feature vector (consisting of the magnitudes of the transform coefficients) for each point of that image.
  • the mean of the feature vectors is calculated for each sub-image.
  • a 128-dimensional feature vector is then formed to represent local features.
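A minimal sketch of this local feature computation, assuming OpenCV: each of the 4x4 sub-images contributes three dominant LUV colour bin values, one Canny edge density, and the mean magnitudes of four equidistant-orientation Gabor responses (one scale), giving 16 × 8 = 128 dimensions. The Canny thresholds and Gabor kernel parameters are illustrative assumptions not stated in the text.

```python
import cv2
import numpy as np

def local_features(bgr):
    """128-dim local features: 4x4 sub-images x (3 colour + 1 edge + 4 Gabor)."""
    luv = cv2.cvtColor(bgr, cv2.COLOR_BGR2LUV)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)                      # thresholds assumed
    gabors = [cv2.filter2D(gray.astype(np.float32), cv2.CV_32F,
                           cv2.getGaborKernel((15, 15), 4.0, theta, 8.0, 0.5))
              for theta in np.arange(4) * np.pi / 4]       # 1 scale, 4 orientations

    h, w = gray.shape
    feats = []
    for r in range(4):
        for c in range(4):
            ys = slice(r * h // 4, (r + 1) * h // 4)
            xs = slice(c * w // 4, (c + 1) * w // 4)
            sub = luv[ys, xs]
            for ch in range(3):                            # 100 bins per channel
                hist = np.histogram(sub[:, :, ch], bins=100)[0]
                feats.append(hist.max() / hist.sum())      # max bin = dominant colour
            feats.append(edges[ys, xs].mean() / 255.0)     # edge density
            feats.extend(np.abs(g[ys, xs]).mean() for g in gabors)  # Gabor means
    return np.array(feats)                                 # 16 x 8 = 128 dims
```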
  • the first p maximum bin values are selected as dominant colour features.
  • the bin values are meant to represent the spatial coherency of colour, irrespective of concrete colour values.
  • edge pixels are summed up within each sub-image, thereby yielding edge density features with r × c dimensions.
  • a set of two-dimensional Gabor filters is employed for texture.
  • the mean μ_sk of the magnitudes of the transform coefficients is used.
  • texture features of r · c · S · K dimensions are finally constructed using μ_sk.
  • colour and edge are taken into account.
  • the first q maximum bin values are selected from each channel. Edges are broadly grouped into h categories of orientation by using an angle quantiser.
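A short sketch of the global features and the final merge, assuming q = 3 maximum bin values per LUV channel (9 dimensions) and h = 4 edge orientation categories (4 dimensions); this q/h split is an assumption consistent with the stated 13 global dimensions and the 128 + 13 = 141 total. It reuses local_features from the sketch above.

```python
def global_features(bgr, q=3, h=4):
    """13-dim global features: q max bins per LUV channel + h edge orientations."""
    luv = cv2.cvtColor(bgr, cv2.COLOR_BGR2LUV)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    feats = []
    for ch in range(3):
        hist = np.histogram(luv[:, :, ch], bins=100)[0].astype(float)
        feats.extend(np.sort(hist)[::-1][:q] / hist.sum())   # q largest bin values
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    ang = np.arctan2(gy, gx)[cv2.Canny(gray, 100, 200) > 0]  # angles at edge pixels
    hist_h = np.histogram(ang, bins=h, range=(-np.pi, np.pi))[0]
    feats.extend(hist_h / max(len(ang), 1))                  # h orientation categories
    return np.array(feats)                                   # 3q + h = 13 dims

# The 141-dimensional FMPI vector is then the concatenation:
# fmpi_vec = np.hstack([local_features(img), global_features(img)])
```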
  • Figure 7 shows a set of recall/precision curves yielded by using different visual features and different C-SVC parameters, demonstrating the effectiveness of the proposed method in distinguishing FMPI frames from an extensive set of commercial images.
  • LIBSVM [16] is utilized to accomplish C-SVC learning.
  • The radial basis function (RBF) kernel, K(x_i, x_j) = exp(−γ‖x_i − x_j‖²), γ > 0, is used.
  • Weights w_i are used for weighted SVMs to deal with unbalanced data, setting the cost C of class i to w_i × C.
  • ε sets the tolerance of the termination criterion; for the Non-FMPI class, ε is set to 0.0001.
  • γ is tuned between 0.1 and 10 while C is tuned between 0.1 and 1. An optimal pair (γ, C) = (0.6, 0.7) is selected.
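A minimal training sketch with the parameters the text reports (RBF kernel, γ = 0.6, C = 0.7, tolerance 0.0001, per-class weights for unbalanced data). scikit-learn's SVC, which wraps LIBSVM's C-SVC, stands in for the LIBSVM tooling; the feature file names and the class-weight ratio are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC  # wraps LIBSVM's C-SVC

# X: 141-dimensional FMPI feature vectors; y: 1 = FMPI, 0 = Non-FMPI.
# Hypothetical training data files.
X = np.load("fmpi_features.npy")
y = np.load("fmpi_labels.npy")

clf = SVC(kernel="rbf", gamma=0.6, C=0.7,      # optimal (gamma, C) from the text
          tol=1e-4,                            # termination tolerance epsilon
          class_weight={0: 1.0, 1: 2.0})       # w_i x C per class (assumed ratio)
clf.fit(X, y)
```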
  • classification is performed with different SVM kernel parameters and different feature combinations: colour, texture, edge. Note that different parameters generate different performance figures for recall and precision. The different recall/precision values for each feature combination are linked to generate curves like those of Figure 7, to reveal overall tendencies.
  • a set of manually-labelled training feature vectors for FMPI frames and Non-FMPI frames are fed into a LIBSVM to train the SVM classifier in a supervised manner.
  • the apparatus extracts the feature vector from the image and feeds the feature vector into the trained SVM classifier.
  • the SVM classifier determines whether the image associated with the feature vector is an FMPI frame or not. Given a set of test images, the SVM correctly classifies some images as FMPI frames and incorrectly classifies some images as FMPI frames.
  • the classification results vary with different SVM kernel parameters. Examples of performance are illustrated in the recall/precision curves of Figure 7.
  • the FMPI recognition may be applied to those key frames selected from a shot to identify a candidate video frame from a motion measurement of a video frame associated with the candidate commercial boundary.
  • Motion is utilised to identify key frames.
  • the average intensity of motion vectors in the video frame from B- and P- frames in MPEG videos is used to measure the motion in a shot and select key frames at the local minima of motion.
  • Directing recognition at key frames has two advantages: 1) reducing computation, and 2) avoiding distracting frames due to animation effects.
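A sketch of the key-frame selection rule: measure per-frame motion intensity and keep frames at local minima of motion within a shot. The patent reads motion vectors directly from MPEG B-/P-frames; approximating the same measure with dense optical flow, as below, is an assumption.

```python
import cv2
import numpy as np

def key_frame_indices(gray_frames):
    """gray_frames: list of greyscale frames from one shot."""
    motion = [0.0]
    for prev, cur in zip(gray_frames, gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motion.append(np.linalg.norm(flow, axis=2).mean())  # avg motion intensity
    motion = np.array(motion)
    # key-frame candidates lie at local minima of the motion curve
    return [i for i in range(1, len(motion) - 1)
            if motion[i] <= motion[i - 1] and motion[i] <= motion[i + 1]]
```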
  • Figure 8 illustrates a group of images incorrectly classified as an FMPI frame from FMPI recognition alone.
  • In Figure 8(a), (b), (c), and (e), strong texture over a large area is one main reason for false alarms.
  • Figure 8(d) shows clear edges overlapping a large blank area, which exhibits a similar pattern to an FMPI frame.
  • The same applies to Figure 8(f), where an object is delineated at the centre of a blank frame.
  • Such a kind of picture often appears in an FMPI frame to highlight the foreground product.
  • an algorithm would be required to understand what an image frame is actually conveying.
  • An audio scene is often modelled as a collection of sound sources, and the scene is further assumed to be dominated by a few of these sources [5].
  • ASC is said to occur when the majority of the dominant sound sources change. Determining the ASC transition pattern in terms of acoustic classes [13] is complicated and sensitive because of the weaknesses of model-based methods: large amounts of samples are required, and class labelling is subjective.
  • An alternative is to examine a distance metric between two windows based on audio features. Metric-based methods are straightforward and produce a quantitative indicator. However, human knowledge is not incorporated, for example through labelled training data.
  • the boundary classifier may make the determination the candidate video frame comprises an audio scene change from a distance measurement of audio properties of first and second audio frames of an audio segment of the video broadcast associated with the candidate commercial boundary .
  • Figure 9 shows an audio segment located within a symmetric window at each video shot transition point. The window may be of a pre-defined length.
  • An HMM is utilised to train two models for representing Audio Scene Change (ASC) and Non-audio Scene Change (Non-ASC) on the basis of low-level audio features extracted from the audio segment. Given a candidate commercial boundary, two probability values output from the trained HMM models are combined with FMPI-related feature values and Silence and Black Frame related feature values to represent a commercial boundary, as illustrated in Figure 4.
  • an audio scene is usually modelled as a collection of sound sources and the scene is further assumed to be dominated by a few of these sources.
  • ASC is said to occur when the majority of the dominant sources in the sound change.
  • previous work has classified the audio track into pure speech, pure music, song, silence, speech with music background, environmental sound with music background, etc.
  • ASC is accordingly associated with the transition among major sound categories or different kinds of sounds in the same major category (e.g. speaker change).
  • the proposed ASCI is meant to provide a probabilistic representation of ASC.
  • HMM is utilized to train two models for "explaining" the audio dynamic patterns, namely, ASC and Non-ASC.
  • An unknown audio segment is classified according to whichever of the two models returns the higher posterior probability, i.e. as ASC or Non-ASC.
  • This model-based method is different from that based on acoustic classes. Firstly, the labelling of ASC/Non-ASC is simpler and can more or less capture the sense of hearing when one is viewing TV commercial videos.
  • a mixture Gaussian HMM (left-to-right) is utilised to train ASC/Non-ASC recognisers.
  • a diagonal covariance matrix is used to estimate the mixture Gaussian distribution.
  • the ASCI considers 43-dimensional audio features comprising Mel-frequency cepstral coefficients (MFCCs) and their first and second derivatives (36 features), mean and variance of the short-time energy log measure (STE) (2 features), mean and variance of the short-time zero-crossing rate (ZCR) (2 features), short-time fundamental frequency (or Pitch) (1 feature), mean of the spectrum flux (SF) (1 feature), and harmonic degree (HD) (1 feature).
  • MFCCs furnish an efficient representation of speech spectra and are widely used in speech recognition.
  • STE provides a basis for discriminating between voiced speech components and unvoiced speech components, speech and music, audible sounds and silence.
  • music produces much lower variances and amplitudes than speech does.
  • ZCR is also useful for distinguishing environmental sounds.
  • Pitch determines the harmonic property of audio signals. Voiced speech components are harmonic while unvoiced speech components are non-harmonic. Sounds from most musical instruments are harmonic while most environmental sounds are non-harmonic. In general, the SF values of speech are higher than those of music but less than those of environmental sounds.
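A sketch assembling the 43-dimensional feature vector described above, using librosa. The 20 ms unit with 10 ms overlap follows the text; the 12-coefficient MFCC order, the pitch range, and the harmonic-degree estimate (harmonic-to-total energy ratio) are assumptions.

```python
import numpy as np
import librosa

def asci_features(y, sr):
    """One 43-dim vector for an audio segment (sketch; per-window in practice)."""
    n_fft = int(0.020 * sr)                     # 20 ms analysis unit
    hop = int(0.010 * sr)                       # 10 ms overlap

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12, n_fft=n_fft, hop_length=hop)
    d1, d2 = librosa.feature.delta(mfcc), librosa.feature.delta(mfcc, order=2)
    mfcc36 = np.concatenate([mfcc, d1, d2]).mean(axis=1)          # 36 dims

    ste = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)[0] ** 2
    log_ste = np.log(ste + 1e-10)                                 # STE log measure
    zcr = librosa.feature.zero_crossing_rate(y, frame_length=n_fft, hop_length=hop)[0]

    f0 = librosa.yin(y, fmin=60, fmax=500, sr=sr)                 # pitch track
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    sf = np.mean(np.sum(np.diff(S, axis=1) ** 2, axis=0))         # spectrum flux mean
    hd = np.sum(librosa.effects.harmonic(y) ** 2) / (np.sum(y ** 2) + 1e-10)

    return np.hstack([mfcc36,
                      log_ste.mean(), log_ste.var(),              # STE mean/var
                      zcr.mean(), zcr.var(),                      # ZCR mean/var
                      np.median(f0), sf, hd])                     # pitch, SF, HD = 43
```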
  • an anchor person or live reporter can hardly remain synchronised when camera shots are switched to weave news stories.
  • the production of commercial video tends to use more editing effects.
  • the time offsets are mainly attributed to post-editing effects. For example, for fade-in/-out, the visual change is located at the middle of the shot transition whereas the audio change point is often delayed till the end of the shot transition.
  • Figure 11 shows the Kullback-Leibler distance metric in use to evaluate the changes between successive audio analysis windows and to align the audio window. Window size is important to good modelling.
  • the difference curves indicate the different locations of peak change for different window sizes.
  • multi-scale difference computation is used since it is unknown what sounds are being analysed.
  • the boundary classifier determines the candidate video frame comprises an audio scene change by partitioning the audio segment into a plurality of sets of audio frames, each set of audio frames having frames of equal length, the length of one set of audio frames being different from a length of another set of audio frames to determine a set of difference sequences of audio properties from the sets of audio frames, and determining a correlation between difference sequences of the set of difference sequences.
  • each difference sequence is then normalized to [0, 1] through dividing difference values by the maximum of each sequence; the most likely audio scene change is determined by locating the highest accumulated difference values derived from the set of difference sequences.
  • a set of uniform difference peaks associated with the true audio scene change has been located with around 240 ms delay; the offset is identified from a correlation between difference sequences.
  • the boundary classifier aligns the audio scene change with the candidate commercial boundary. According to offset statistics in Figure 10, the shift of adjusted change point is currently confined to the range of [-500ms, 500ms].
  • Audio features are further extracted and arranged within adjusted 4-sec feature windows to be fed into two HMM-based classifiers associated with ASC and Non-ASC, respectively. That is, boundary classifier extracts audio features from the audio segment, trains first and second statistical models for audio scene change and for non-audio scene change from the audio features extracted from the audio segment and classifies a candidate audio segment associated with the candidate commercial boundary from the first and second statistical models.
  • The sequence d(W_i, W_{i+1}) of distances between successive windows is then formed.
  • An ASC from W_i to W_{i+1} is declared if D_i is the maximum within a symmetric window of WS ms. Window size is important for good modelling.
  • The difference curves in Figure 11 indicate different change peaks in the case of different window sizes. Since one does not know a priori what sound one is analysing, multi-scale computing is used.
  • Each difference sequence Distance_scale is then normalised to [0, 1] by dividing the difference values D_i^scale by the maximum of each series, Max(Distance_scale); the most likely ASC point ω̂ is finally determined by locating the highest accumulated values.
  • From the accumulated values, the probability p(ω) of a candidate window position ω being an ASC point is calculated, where M denotes the total number of candidate window positions and ω̂ denotes the window corresponding to an ASC point.
  • the Kullback-Leibler distance metric is a formal measure of the differences between two density functions.
  • the normal density function is currently employed to estimate the probability distribution of the 43-dimensional audio features for each sliding analysis window. For the minimum window of 500 ms, a total of 499 samples of 20 ms units with a 10 ms overlap result.
  • Figure 11 shows that, at the sliding window level, an overlap of 100 ms has been uniformly employed for multi-scale computing.
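A minimal sketch of the multi-scale Kullback-Leibler alignment just described: each analysis window is modelled by a diagonal Gaussian over its per-frame feature vectors, the symmetric KL divergence between successive windows forms one difference sequence per scale, each sequence is normalised to [0, 1], and the accumulated peak gives the most likely ASC point. The window scales are illustrative assumptions.

```python
import numpy as np

def sym_kl_diag(mu1, var1, mu2, var2):
    """Symmetric KL divergence between two diagonal Gaussians."""
    kl12 = 0.5 * np.sum(np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1)
    kl21 = 0.5 * np.sum(np.log(var1 / var2) + (var2 + (mu1 - mu2) ** 2) / var1 - 1)
    return kl12 + kl21

def most_likely_asc(frame_feats, scales=(50, 100, 150)):
    """frame_feats: (T, D) per-frame audio features (e.g. 10 ms apart).
    Returns the frame index with the highest accumulated, per-scale
    normalised KL difference, i.e. the most likely audio scene change."""
    T = frame_feats.shape[0]
    acc = np.zeros(T)
    for w in scales:                      # window size in frames
        diffs = np.zeros(T)
        for t in range(w, T - w):
            a, b = frame_feats[t - w:t], frame_feats[t:t + w]
            diffs[t] = sym_kl_diag(a.mean(0), a.var(0) + 1e-6,
                                   b.mean(0), b.var(0) + 1e-6)
        if diffs.max() > 0:
            acc += diffs / diffs.max()    # normalise each sequence to [0, 1]
    return int(np.argmax(acc))
```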
  • Figure 12 shows the Kullback-Leibler distances of a small set of ASC and Non-ASC samples, illustrating the effectiveness of the low-level audio features.
  • the duration of each audio sample is 2 seconds.
  • Two probability distributions are computed for two symmetric windows of one second.
  • the same sampling strategy is applied, i.e., 20 ms unit with a 10 ms overlap.
  • the audio samples are selected to cover diverse audio classes such as speech, different kinds of music, speech with music background, speech with noise background, etc.
  • Two clusters of Kullback-Leibler distances can be delineated clearly. This indicates selected low-level audio features' capability in discriminating ASC samples from Non-ASC samples.
  • The HMM is a powerful model for characterising the temporally non-stationary but learnable and regular patterns of the speech signal, especially when utilised in conjunction with the Kullback-Leibler distance metric.
  • the audio data set comprises 2394 Non-ASC samples and 1932 ASC samples.
  • a Half-and-Half training/testing partition is applied.
  • a left-to-right HMM consisting of 8 hidden states is employed.
  • a diagonal covariance matrix is used to estimate the mixture Gaussian distribution comprising 12 components.
  • the forward-backward algorithm generates two likelihood values of an observation sequence.
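A sketch of the two HMM recognisers (ASC vs Non-ASC) following the stated configuration: a left-to-right topology with 8 hidden states and 12-component diagonal-covariance Gaussian mixtures, with per-sequence log-likelihoods from the forward algorithm. hmmlearn is an assumed stand-in; the patent does not name a toolkit, and the data shapes in the comments are hypothetical.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def left_to_right_hmm(n_states=8, n_mix=12):
    model = GMMHMM(n_components=n_states, n_mix=n_mix,
                   covariance_type="diag", n_iter=50,
                   init_params="mcw")  # keep our topology below
    # Left-to-right: start in state 0; only self- and forward transitions
    # (zero entries stay zero under Baum-Welch re-estimation).
    model.startprob_ = np.eye(n_states)[0]
    trans = np.zeros((n_states, n_states))
    for i in range(n_states):
        trans[i, i] = 0.5
        trans[i, min(i + 1, n_states - 1)] += 0.5
    model.transmat_ = trans
    return model

asc_model, non_model = left_to_right_hmm(), left_to_right_hmm()
# asc_X / non_X: stacked (n_frames, 43) feature sequences; the lengths
# lists give per-sequence frame counts, as hmmlearn requires.
# asc_model.fit(asc_X, lengths=asc_lengths)
# non_model.fit(non_X, lengths=non_lengths)
# log_p_asc, log_p_non = asc_model.score(segment), non_model.score(segment)
```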
  • the probability/likelihood scores for each of these can be fused later to provide what may be acceptable results.
  • the co-occurrence of some features can effectively indicate the boundary, and performance may be improved.
  • F1 or overall accuracy is increased by 3.9%-4.6%.
  • the HMM-based method improves the F1 or overall accuracy by 2.9%-4.2%.
  • the alignment plays a more important role in performance improvement.
  • An emphasis should be put on the overall accuracy of ASC and Non-ASC, since two generated probabilities for ASC and Non-ASC jointly contribute to the boundary classification. According to simulation results, a promising accuracy of 87.9% has been achieved by HMM with an alignment process.
  • Silence is detected by examining the audio energy level.
  • the short-time energy function is measured every 10 ms and smoothed using an 8-frame FIR filter.
  • the smoothing implicitly imposes a minimum length constraint on the silence period.
  • a threshold is applied, and a segment that has its energy below the threshold is classified as Silence.
  • a black frame is detected by evaluating the mean and the variance of intensity values for a frame.
  • a threshold method is applied. A series of consecutive black frames (say 8) is considered to indicate the presence of Black Frames.
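A sketch of the two threshold detectors described above. The 10 ms energy measurement, the 8-frame FIR smoothing, and the 8-consecutive-black-frame run follow the text; the concrete threshold values are assumptions.

```python
import numpy as np

def detect_silence(y, sr, energy_thresh=1e-4):
    """Per-10-ms Silence mask from the smoothed short-time energy."""
    hop = int(0.010 * sr)                                 # measure every 10 ms
    frames = [y[i:i + hop] for i in range(0, len(y) - hop, hop)]
    ste = np.array([np.mean(f ** 2) for f in frames])
    ste = np.convolve(ste, np.ones(8) / 8, mode="same")   # 8-frame FIR smoothing
    return ste < energy_thresh                            # True where Silence

def detect_black_frames(frames, mean_thresh=20, var_thresh=30, run=8):
    """frames: list of greyscale images (2-D uint8 arrays). Black Frames
    are flagged when at least `run` consecutive frames are black."""
    black = np.array([(f.mean() < mean_thresh) and (f.var() < var_thresh)
                      for f in frames])
    runs = np.convolve(black.astype(int), np.ones(run, int), mode="valid")
    return bool(np.any(runs == run))
```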
  • The usefulness of Silence and Black Frames is limited by the varying editing techniques at TV commercial boundaries and by their frequent occurrence within an individual commercial.
  • Silence and Black Frames can be combined with FMPI and ASCI to form a complete feature set useful for detecting TV commercial boundaries.
  • the boundary classifier classifies the candidate commercial boundary as a commercial boundary from a fusion of likelihood scores for frame marked with product information (FMPI), audio scene change (ASC) and, optionally, audio silence and video black frame.
  • ASCI yields two probability values p(ASC) and p(Non-ASC)
  • Silence and Black Frames yield two values p(Silence) and p(Black Frames) to indicate the presence of Silence and Black Frames, respectively.
  • the candidate video frame comprises a frame of a plurality of video frames of a candidate commercial window associated with the candidate commercial boundary
  • the boundary classifier determines a commercial boundary probability score for video frames of the candidate commercial window and determines the likelihood the candidate commercial boundary is a commercial boundary from a plurality of the commercial boundary probability scores.
  • An overall likelihood score is derived from one or more of the probability scores.
  • Machine learning is used to complete the fusion of the probability scores because it is not a trivial task to construct manually the heuristic rules to fuse the probabilities.
  • an SVM is used to learn the patterns associated with (true) commercial boundaries or false commercial boundaries in terms of those probabilities, from a series of manually labelled true or false boundary examples.
  • the fusion can be linear or non-linear.
  • the boundary detection problem is transformed into a binary classification problem.
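A sketch of this fusion stage: the per-candidate scores p(FMPI), p(ASC), p(Non-ASC), p(Silence), and p(Black Frames) are stacked into one feature vector and an SVM learns the true/false boundary decision from labelled examples. scikit-learn and the RBF kernel choice are assumptions (the text specifies only a supervised learner), and the labelled examples here are hypothetical placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical labelled examples: ([p_fmpi, p_asc, p_non_asc, p_silence,
# p_black], label) with label 1 = true boundary, 0 = false boundary.
labelled = [([0.9, 0.8, 0.2, 0.7, 0.6], 1), ([0.2, 0.3, 0.7, 0.1, 0.0], 0),
            ([0.8, 0.6, 0.4, 0.0, 0.9], 1), ([0.1, 0.4, 0.6, 0.2, 0.1], 0)]

X = np.array([scores for scores, _ in labelled])
y = np.array([label for _, label in labelled])

clf = SVC(kernel="rbf").fit(X, y)          # non-linear fusion of the five scores

# At run time, the five scores of a new candidate are fused in one call:
is_boundary = clf.predict([[0.8, 0.7, 0.3, 0.1, 0.0]])[0]
```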
  • a commercial video database is built for assessment, which consists of 499 clips of individual TV commercial videos covering 390 different commercials.
  • the TV commercial video clips come from a heterogeneous video data set of 169 hours of news video taken from 6 different sources, namely, LBC, CCTV4, NTDTV, CNN, NBC, and MSNBC.
  • These commercials have extensively covered three concepts: namely, Ideas (e.g. education opportunities, vehicle safety), Products (e.g. vehicles, food items, decoration, cigarettes, perfume, soft drink, health and beauty aids), and Services (e.g. banking, insurance, training, travel and tourism).
  • Figure 14 shows the statistics in terms of the number of video shots and the duration within a single TV commercial clip.
  • Three major duration modes are observed, roughly located at 15 seconds, 30 seconds, and 60 seconds.
  • The 30-second mode is often used and is claimed to cut costs as well as gain reach.
  • The 60-second mode is considered a media idea featuring the substance, tone, and humour of a creative idea.
  • The 15-second mode is the saviour of the single-minded idea.
  • the number of video shots features a larger variance. This may be related to various types (e.g. Problem-Solution Format, Demonstration Format, Product Alone Format, Spokesperson Format, Testimonial Format, etc.) of TV commercials.
  • The performance of FMPI+ASCI+Silence+Black Frames may vary with different video data streams due to non-uniform post-editing techniques.
  • a heterogeneous video data set has been employed aiming at a fair performance evaluation.
  • the apparatus of Figure 2 comprising separable boundary and commercial classifiers can be considered as an apparatus for identifying a boundary of a commercial broadcast in a video broadcast and classifying the commercial broadcast in a pre-defined category.
  • the apparatus comprises a video shot transition detector configured to identify a candidate commercial boundary in the video broadcast, a boundary classifier configured to verify the candidate commercial boundary as a commercial boundary and a commercial classifier configured to classify the commercial in a pre-defined category.
  • a commercial classifier apparatus for classifying a commercial video broadcast in a predefined category will now be described.
  • the commercial classifier may be used in conjunction with the apparatus for determining a likelihood that a candidate commercial boundary of a commercial broadcast in a segmented video broadcast is a commercial boundary described above.
  • Use of the two apparatuses together may be particularly advantageous; if a candidate commercial boundary can be determined to be a commercial boundary with any level of certainty, this facilitates identification of a commercial broadcast for its classification.
  • the architecture of classifier 68 is shown in more detail in Figure 16.
  • the commercial classifier 68 comprises, optionally, a video processor 200 for extracting video and/or audio data from a frame of the video broadcast commercial and converting the video and/or audio data to text data, a classifier model 202, and a proxy document identifier 204 for identifying a proxy document as a proxy of the commercial video broadcast.
  • the proxy document identifier may identify the proxy document as a document related to a keyword identified by first keyword derivation module 206.
  • The commercial classifier also comprises first text pre-processing module 208, test word vector mapper 210 and training module 212.
  • Training module 212 is for the compilation of training data from a corpus of training documents and may comprise second keyword derivation module 214, second text pre-processing module 216 and training data vector mapper 218.
  • the classifier module is trained by data from the training data, and classifies the commercial video broadcast from an examination of proxy data from the proxy document.
  • the classifier module may be a support vector machine module.
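A sketch of the text side of this classifier: IR pre-processing (stop word removal, Porter stemming, vocabulary pruning), word-vector mapping, and an SVM trained on a topic-labelled corpus. scikit-learn and NLTK are assumed stand-ins for the unnamed tooling, and the training corpus variables are hypothetical.

```python
import nltk
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

nltk.download("stopwords", quiet=True)
STOP = set(stopwords.words("english"))
stemmer = PorterStemmer()

def analyse(doc):
    """IR pre-processing: lowercase, stop word removal, Porter stemming."""
    return [stemmer.stem(w) for w in doc.lower().split()
            if w.isalpha() and w not in STOP]

# train_docs / train_labels: a hypothetical topic-labelled document corpus
# (categories such as "automobile", "finance", ...).
pipeline = make_pipeline(
    TfidfVectorizer(analyzer=analyse, min_df=2, max_features=20000),  # pruning
    LinearSVC())
# pipeline.fit(train_docs, train_labels)
# category = pipeline.predict([proxy_article_text])[0]
```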
  • the proxy document identifier 204 is configured to interface with a document index/database 220 which may be a remote external resource, as shown in Figure 16, from commercial classifier 68.
  • a process flow of a first commercial classifier 68 is described as follows with respect to Figure 17.
  • the classification process starts at step 230 and, at step 232, video processor 200 parses a commercial video broadcast for video and/or audio data.
  • proxy document identifier 204 identifies a proxy document from the video/audio data. As described below, this may be done by converting the video/audio data to text data and identifying the proxy document from the text data with ASR and OCR modules of the video processor.
  • the classifier model 202 is trained with training data from training module 212.
  • the classifier model 202 classifies a commercial broadcast from an examination of proxy data from the proxy document identified by proxy document identifier 204.
  • the Commercial Video Processing Module (COVM) 200 aims to expand the deficient and less informative transcripts from ASR 252 and OCR 254 with relevant proxy articles retrieved from the World Wide Web (WWW) at step 268, using resources such as the Google search engine and online encyclopaedias.
  • For each incoming TV commercial video TVCOM_i 250, the module first converts the video/audio data to text by extracting the raw semantic information via ASR 252 and OCR 254 on the key frame images. Key frames can be extracted at the local minima of motion, as described above for FMPI recognition.
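  • As a minimal sketch (one possible realisation, assumed for illustration), key frame indices can be picked at local minima of a per-frame motion activity curve:

    def keyframe_indices(motion):
        """Return indices of frames lying at local minima of a motion curve."""
        return [i for i in range(1, len(motion) - 1)
                if motion[i] <= motion[i - 1] and motion[i] <= motion[i + 1]]

    print(keyframe_indices([5.0, 2.1, 3.4, 3.0, 0.9, 4.2]))  # [1, 4]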
  • the accuracy of OCR depends on the resolution of characters in an image. It is empirically observed that text of a larger size contains more significant information than small text.
  • Both an English dictionary and encyclopaedias are used as the ground truth for spell checking, since a normal English dictionary may not include out-of-vocabulary terms such as brand names.
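  • A minimal sketch of such spell checking follows, assuming a small merged lexicon (LEXICON) standing in for the dictionary plus encyclopaedia entries; the helper name correct_token is likewise an assumption.

    import difflib

    # Toy lexicon merged from dictionary words and encyclopaedia/brand entries,
    # so that brand names such as "singulair" survive correction.
    LEXICON = {"allergy", "relief", "singulair", "credit", "card", "microsoft"}

    def correct_token(token):
        """Map a noisy ASR/OCR token to its closest lexicon entry, if any."""
        t = token.lower()
        if t in LEXICON:
            return t
        close = difflib.get_close_matches(t, LEXICON, n=1, cutoff=0.8)
        return close[0] if close else t

    print([correct_token(w) for w in "alergy releif Singulair".split()])
    # ['allergy', 'relief', 'singulair']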
  • the proxy article d_i is obtained.
  • the testing document vector is generated from d_i.
  • Keyword expansion is made at step 268 with respect to, for example, the internet, and a proxy document assignation step 270 then takes place. Steps 264, 266 and 270 are described in more detail with respect to Figure 19. (Note that the same or similar process may be applied when identifying training keywords by the training data and word feature processing module 212 of Figure 18.)
  • the proposed approach first preprocesses the output transcripts of ASR and OCR in TV commercial video TVCOM_i with spell checking at step 258 to generate a corrected transcript S_i at step 300.
  • a list L_i of nouns and noun phrases is extracted from S_i by a natural language processor at step 302.
  • a set of keywords K_i = (kw_1, ..., kw_n) is selected by applying the steps below: a) Check S_i for an occurrence of a brand name from a dictionary of brand names at step 302. b) If the result returns that a brand name is found in S_i at step 306, the brand is selected as the keyword kw_t and searched on the online encyclopaedia.
  • the keyword derivation module therefore identifies a keyword by querying the text data for an occurrence of a brand name identifier word and, in dependence on detecting an occurrence of the identifier word, identifying the identifier word as a keyword. c) If the result returns "No" at step 306, other words from L_i, such as the first n nouns and/or noun phrases with the largest font size from OCR and the last m from ASR, are heuristically selected at step 266 as keywords.
  • the document identifier identifies the proxy document as a document related to the keyword by querying an external document index or database with the keyword as a query term and assigning a most relevant result document of the query as the proxy document.
  • the keyword derivation module identifies another word in the text data, for example a noun word, as a keyword.
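  • A sketch of this keyword derivation follows, assuming NLTK (with its punkt and tagger data installed) and a toy brand dictionary; the fallback count n_fallback is an illustrative assumption.

    import nltk  # assumes punkt and averaged_perceptron_tagger data installed

    BRAND_NAMES = {"singulair", "microsoft"}  # toy brand dictionary

    def derive_keywords(transcript, n_fallback=3):
        """Prefer a brand name in S_i; otherwise fall back to nouns from L_i."""
        tokens = nltk.word_tokenize(transcript.lower())
        brands = [t for t in tokens if t in BRAND_NAMES]
        if brands:                     # brand name found at step 306
            return brands[:1]
        tagged = nltk.pos_tag(tokens)  # heuristic noun selection (step 266)
        return [w for w, tag in tagged if tag.startswith("NN")][:n_fallback]

    print(derive_keywords("relief from allergy symptoms all season long"))
    # e.g. ['relief', 'allergy', 'symptoms']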
  • the Google search engine may be utilised at step 312 for its performance in assuring the relevancy of the retrieved articles.
  • the result with the highest relevancy rating is selected at step 270 by proxy document identifier 204 as the proxy document d_i, which we denote as the proxy article of TV commercial TVCOM_i.
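  • A sketch of the proxy assignment, assuming a hypothetical search_web(query) helper that returns results ordered by relevancy, each with a text attribute; no particular search API is implied.

    def assign_proxy_document(keywords, search_web):
        """Query the external index and take the top-ranked article as d_i."""
        results = search_web(" ".join(keywords))
        if not results:
            return None          # no relevant article found
        return results[0].text   # highest relevancy rating becomes the proxy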
  • a value T assigned to the pair (d_i, c_j) indicates that the proxy article d_i falls under category c_j.
  • a value F assigned to (d_i, c_j) means that d_i does not fall under c_j.
  • Some learning algorithms may generate an output probability ranging from 0 to 1 instead of the absolute values of 1 or 0; thresholding may be applied for a final determination of the category, as in the sketch below.
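  • A one-line sketch of such thresholding (the 0.5 threshold is an assumption):

    def label_from_probability(p, threshold=0.5):
        """Map an output probability to an absolute label: True (T) or False (F)."""
        return p >= threshold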
  • the first IR Preprocessing Module (IRPM) 208 performs, at steps 272 and 274, a known vocabulary term normalisation process used in the setting-up of IR systems. It applies two major steps, the Porter Stemming Algorithm (PSA) 276 and the Stop Word Removal Algorithm (SWRA) 278, to rationalise the proxy data.
  • PSA is a process of removing the common morphological and inflexional endings from words in English so that different word forms are all mapped to the same token (which is assumed to have essentially equal meaning for all forms).
  • SWRA eliminates words of little or no semantic significance, such as "the", "you", "can", etc. As shown in Figure 18, both testing and training documents go through this module before any other process runs on them.
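  • A minimal sketch of the IRPM normalisation using NLTK's Porter stemmer and English stop word list (assuming the relevant NLTK data is installed):

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    stop_words = set(stopwords.words("english"))

    def normalise(text):
        """Drop stop words (SWRA), then map word forms to one token (PSA)."""
        tokens = nltk.word_tokenize(text.lower())
        return [stemmer.stem(t) for t in tokens
                if t.isalpha() and t not in stop_words]

    print(normalise("The advertised medicines are relieving allergies"))
    # e.g. ['advertis', 'medicin', 'reliev', 'allergi']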
  • test word vector mapper 210 forms the test vector at step 282 from proxy data for examination by the classifier model 202 at step 284.
  • the classifier model 202 is trained with training data from the training module 212.
  • the training module 212 is composed of a Training Data & Word Feature Processing Module (TRFM) which accomplishes two tasks. Firstly, a topic-wise document corpus 286 is constructed from available public IR corpora or related articles manually collected from the WWW 287 as the training dataset of a text categoriser. In this way, the training corpus can possess a large number of training documents and wide coverage of topics. Such a training corpus can avoid the potential over-fitting problem, which could arise if only the textual information of a limited set of TV commercials were taken as training data. In a proposed system, the categorised Reuters-21578 and 20 Newsgroups corpora are combined to construct the training dataset. The defined topics of these corpora may not exactly match the categories of TV commercials.
  • One solution is to select the topics from these corpora that are related to a commercial category and combine them to jointly construct the training dataset for representing the commercial category. For example, the documents on the topics of "earn", "money" and "trade" in Reuters-21578 are merged together to yield the training dataset for the finance category.
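  • A sketch of this topic merging; the mapping beyond the cited "earn"/"money"/"trade" example is an illustrative assumption.

    # Map corpus topics onto commercial categories (entries partly assumed).
    TOPIC_TO_CATEGORY = {
        "earn": "finance", "money": "finance", "trade": "finance",
        "comp.sys.ibm.pc.hardware": "it", "sci.med": "healthcare",
        "rec.autos": "automobile",
    }

    def build_training_set(corpus):
        """corpus: iterable of (topic, document_text) pairs."""
        training = {}
        for topic, doc in corpus:
            category = TOPIC_TO_CATEGORY.get(topic)
            if category is not None:
                training.setdefault(category, []).append(doc)
        return training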
  • Document frequency is a technique for vocabulary reduction. Its promising performance, together with a computational complexity approximately linear in the number of training documents, means it lends itself to the present implementation.
  • the word feature selection process 292 measures the number of documents in which a term w_i occurs, resulting in the document frequency DF(w_i). If DF(w_i) exceeds a predetermined threshold at step 350, w_i is selected as a feature at step 354; otherwise, w_i is discarded and removed from the feature space at step 352.
  • An example of a suitable threshold is 2, with which 9107 word features are selected. The basic assumption is that rare terms are either non-informative for category prediction or not influential in global performance.
  • the number of occurrences of term w_i is taken as the feature value tf(w_i) at step 356.
  • each document vector is normalised to unit length at steps 294, 296 so as to eliminate the influence of different document lengths.
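  • A minimal sketch of steps 292 and 350-356, together with the unit-length normalisation of steps 294-296, over pre-tokenised documents:

    import math
    from collections import Counter

    def select_features(docs, df_threshold=2):
        """Keep terms whose document frequency DF(w_i) exceeds the threshold."""
        df = Counter()
        for doc in docs:
            df.update(set(doc))  # count each term once per document
        return sorted(w for w, n in df.items() if n > df_threshold)

    def to_unit_tf_vector(doc, features):
        """Use tf(w_i) as the feature value, then normalise to unit length."""
        tf = Counter(doc)
        vec = [float(tf[w]) for w in features]
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec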
  • the Classifier Module performs text categorisation of query articles based on the training corpus and determines the classification of the commercial video.
  • SVM is able to handle a high-dimensional input space. Text categorisation usually involves feature spaces of extremely high dimensionality (around 10,000 dimensions). Moreover, the over-fitting protection in SVM enables it to handle such a large feature space.
  • SVM is able to tackle a sparse document corpus. Due to the short length of documents and the large feature space, each document vector contains only a few non-zero entries. As has been shown both theoretically and empirically, SVM is suitable for problems with dense concepts and sparse instances.
  • Figure 21 shows the output script of ASR (at step 256 of Fig. 18) on a TV commercial of Singulair, which is a brand name of a medicine relieving asthma and allergic symptoms.
  • the script is erroneous and deficient due to background music.
  • By comparing the ASR-generated script with the actual speech script, it can be found that the innate noise of the audio data encumbers ASR techniques from delivering a semantically meaningful and coherent passage describing the advertised commodity. Any other relevant article that falls into the same category can serve as the proxy of the TV commercial in the semantic classification task.
  • certain nouns or noun phrases, such as <allergy>, can be extracted as keywords.
  • by searching the keywords on the World Wide Web at step 260, an example of a relevant article is acquired which can be assigned as the proxy document.
  • Figure 22 shows another source of potential keywords provided by key image frames of commercial videos.
  • the examples shown present text significantly related to the advertised commodity's category, such as <Credit Card> for finance, or even its brand names, such as <Microsoft>.
  • An example system uses 499 English TV commercials extracted from the TRECVID05 video database, of which 191 are distinct. Based on their advertised products or services, the 191 distinct TV commercials are distributed across eight categories, as illustrated in Figure 23. This system involves four categories: Automobile, Finance, Healthcare and IT. Though these do not exclusively cover all TV commercials, they account for 141 commercials, 74% of the total. Therefore, they should be able to demonstrate the effectiveness of the proposed approach.
  • For each category, 1,000 training documents are selected from the Reuters and 20 Newsgroups corpora. Altogether the training documents amount to 4,000.
  • In the word feature selection phase, the document frequency threshold is set to 2, and 9107 word features are selected.
  • Prior to training the SVM, these 4,000 documents were evaluated by three-fold cross validation to examine their integrity and qualification as training data. The cross validation accuracy reached 96.9%, where a radial basis function (RBF) kernel was used and the SVM cost and gamma parameters were determined to be 8,000 and 0.0005 respectively.
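  • A sketch of this set-up with scikit-learn, assuming X holds the unit-length tf vectors and y the category labels; the library choice is an assumption, but the kernel and parameter values are those reported above.

    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def validate_and_train(X, y):
        """Three-fold cross validation, then training, with the reported RBF set-up."""
        clf = SVC(kernel="rbf", C=8000.0, gamma=0.0005)
        scores = cross_val_score(clf, X, y, cv=3)  # three-fold cross validation
        print("cross validation accuracy: %.3f" % scores.mean())
        return clf.fit(X, y)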
  • The classification based on manually recorded speech transcripts of commercials is performed first. As Figure 26(a) shows, all categories except IT achieve satisfactory classification results, and the overall classification accuracy reaches 85.8%.
  • The IT category mainly covers computer hardware and software; however, the testing commercials include other IT products, such as printers and photocopiers.
  • ASR transcripts are also applied to perform text categorisation. As Figure 26(b) shows, the ASR transcripts deliver poor results in all categories.
  • Figure 26(c) shows the classification results with proxy articles. Compared with ASR transcripts, the classification results have been improved drastically and the overall classification accuracy increases from 43.3% to 80.9%.
  • Figure 25 displays the F1 values of classifications based on all three types of inputs.
  • The proxy articles deliver slightly lower accuracies than the manually recorded speech transcripts. The accuracy differences imply that errors in keyword selection and proxy article acquisition do occur; however, they do not necessarily cause serious degradation of the final performance.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus for determining a likelihood that a candidate commercial boundary is a commercial boundary comprises a boundary classifier. The boundary classifier determines whether a candidate video frame comprises product information, and determines a likelihood that the candidate commercial boundary is a commercial boundary in dependence on a determination that the candidate video frame comprises product information. A further apparatus for classifying a commercial video broadcast comprises a proxy document identifier for identifying a proxy document as a proxy of the commercial video broadcast. The apparatus also comprises a training module for compiling training data from a corpus of training documents and a classifier module, trained by the training data, for classifying the commercial video broadcast from an examination of proxy data from the proxy document.
PCT/SG2007/000091 2006-04-05 2007-04-05 Appareil et procédé d'analyse de diffusion vidéo WO2007114796A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US78953406P 2006-04-05 2006-04-05
US60/789,534 2006-04-05

Publications (1)

Publication Number Publication Date
WO2007114796A1 true WO2007114796A1 (fr) 2007-10-11

Family

ID=38563972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2007/000091 WO2007114796A1 (fr) 2006-04-05 2007-04-05 Appareil et procédé d'analyse de diffusion vidéo

Country Status (2)

Country Link
SG (1) SG155922A1 (fr)
WO (1) WO2007114796A1 (fr)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050216516A1 (en) * 2000-05-02 2005-09-29 Textwise Llc Advertisement placement method and system using semantic analysis
US20030001977A1 (en) * 2001-06-28 2003-01-02 Xiaoling Wang Apparatus and a method for preventing automated detection of television commercials
US20030147466A1 (en) * 2002-02-01 2003-08-07 Qilian Liang Method, system, device and computer program product for MPEG variable bit rate (VBR) video traffic classification using a nearest neighbor classifier
US20030185541A1 (en) * 2002-03-26 2003-10-02 Dustin Green Digital video segment identification
WO2004030360A1 (fr) * 2002-09-26 2004-04-08 Koninklijke Philips Electronics N.V. Dispositif de recommandation publicitaire
WO2004080073A2 (fr) * 2003-03-07 2004-09-16 Half Minute Media Ltd Procede et systeme de detection et de substitution de segment video
JP2006050240A (ja) * 2004-08-04 2006-02-16 Sharp Corp 放送信号受信装置および受信方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PATENT ABSTRACTS OF JAPAN *

Cited By (88)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10074108B2 (en) 2008-11-26 2018-09-11 Free Stream Media Corp. Annotation of metadata through capture infrastructure
US10986141B2 (en) 2008-11-26 2021-04-20 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10977693B2 (en) 2008-11-26 2021-04-13 Free Stream Media Corp. Association of content identifier of audio-visual data with additional data through capture infrastructure
US10880340B2 (en) 2008-11-26 2020-12-29 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10791152B2 (en) 2008-11-26 2020-09-29 Free Stream Media Corp. Automatic communications between networked devices such as televisions and mobile devices
US9154942B2 (en) 2008-11-26 2015-10-06 Free Stream Media Corp. Zero configuration communication between a browser and a networked media device
US9167419B2 (en) 2008-11-26 2015-10-20 Free Stream Media Corp. Discovery and launch system and method
US10771525B2 (en) 2008-11-26 2020-09-08 Free Stream Media Corp. System and method of discovery and launch associated with a networked media device
US9258383B2 (en) 2008-11-26 2016-02-09 Free Stream Media Corp. Monetization of television audience data across muliple screens of a user watching television
US9386356B2 (en) 2008-11-26 2016-07-05 Free Stream Media Corp. Targeting with television audience data across multiple screens
US9519772B2 (en) 2008-11-26 2016-12-13 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10631068B2 (en) 2008-11-26 2020-04-21 Free Stream Media Corp. Content exposure attribution based on renderings of related content across multiple devices
US10567823B2 (en) 2008-11-26 2020-02-18 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US9560425B2 (en) 2008-11-26 2017-01-31 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US9576473B2 (en) 2008-11-26 2017-02-21 Free Stream Media Corp. Annotation of metadata through capture infrastructure
US9591381B2 (en) 2008-11-26 2017-03-07 Free Stream Media Corp. Automated discovery and launch of an application on a network enabled device
US9589456B2 (en) 2008-11-26 2017-03-07 Free Stream Media Corp. Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements
US10425675B2 (en) 2008-11-26 2019-09-24 Free Stream Media Corp. Discovery, access control, and communication with networked services
US9686596B2 (en) 2008-11-26 2017-06-20 Free Stream Media Corp. Advertisement targeting through embedded scripts in supply-side and demand-side platforms
US9706265B2 (en) 2008-11-26 2017-07-11 Free Stream Media Corp. Automatic communications between networked devices such as televisions and mobile devices
US9703947B2 (en) 2008-11-26 2017-07-11 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9716736B2 (en) 2008-11-26 2017-07-25 Free Stream Media Corp. System and method of discovery and launch associated with a networked media device
US10419541B2 (en) 2008-11-26 2019-09-17 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US9838758B2 (en) 2008-11-26 2017-12-05 David Harrison Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9848250B2 (en) 2008-11-26 2017-12-19 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9854330B2 (en) 2008-11-26 2017-12-26 David Harrison Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9866925B2 (en) 2008-11-26 2018-01-09 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10334324B2 (en) 2008-11-26 2019-06-25 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US10142377B2 (en) 2008-11-26 2018-11-27 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US9961388B2 (en) 2008-11-26 2018-05-01 David Harrison Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements
US9967295B2 (en) 2008-11-26 2018-05-08 David Harrison Automated discovery and launch of an application on a network enabled device
US9986279B2 (en) 2008-11-26 2018-05-29 Free Stream Media Corp. Discovery, access control, and communication with networked services
US10032191B2 (en) 2008-11-26 2018-07-24 Free Stream Media Corp. Advertisement targeting through embedded scripts in supply-side and demand-side platforms
US10949458B2 (en) 2009-05-29 2021-03-16 Inscape Data, Inc. System and method for improving work load management in ACR television monitoring system
US10271098B2 (en) 2009-05-29 2019-04-23 Inscape Data, Inc. Methods for identifying video segments and displaying contextually targeted content on a connected television
US10116972B2 (en) 2009-05-29 2018-10-30 Inscape Data, Inc. Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device
US11272248B2 (en) 2009-05-29 2022-03-08 Inscape Data, Inc. Methods for identifying video segments and displaying contextually targeted content on a connected television
US11080331B2 (en) 2009-05-29 2021-08-03 Inscape Data, Inc. Systems and methods for addressing a media database using distance associative hashing
US10820048B2 (en) 2009-05-29 2020-10-27 Inscape Data, Inc. Methods for identifying video segments and displaying contextually targeted content on a connected television
US10169455B2 (en) 2009-05-29 2019-01-01 Inscape Data, Inc. Systems and methods for addressing a media database using distance associative hashing
US10185768B2 (en) 2009-05-29 2019-01-22 Inscape Data, Inc. Systems and methods for addressing a media database using distance associative hashing
US10375451B2 (en) 2009-05-29 2019-08-06 Inscape Data, Inc. Detection of common media segments
US9906834B2 (en) 2009-05-29 2018-02-27 Inscape Data, Inc. Methods for identifying video segments and displaying contextually targeted content on a connected television
CN102474568B (zh) * 2009-08-12 2015-07-29 英特尔公司 基于共同处理元件执行视频稳定化和检测视频镜头边界的技术
CN102474568A (zh) * 2009-08-12 2012-05-23 英特尔公司 基于共同处理元件执行视频稳定化和检测视频镜头边界的技术
WO2011017823A1 (fr) * 2009-08-12 2011-02-17 Intel Corporation Techniques d’exécution de stabilisation vidéo et de détection de limites de scènes vidéo en fonction d'éléments de traitement communs
US8930980B2 (en) 2010-05-27 2015-01-06 Cognitive Networks, Inc. Systems and methods for real-time television ad detection using an automated content recognition database
US10192138B2 (en) 2010-05-27 2019-01-29 Inscape Data, Inc. Systems and methods for reducing data density in large datasets
EP4221235A3 (fr) * 2013-03-15 2023-09-20 Inscape Data, Inc. Systèmes et procédés d'identification de segments vidéo pour afficher un contenu présentant une pertinence contextuelle
CN105052161A (zh) * 2013-03-15 2015-11-11 康格尼蒂夫媒体网络公司 用于使用自动化内容识别数据库的实时电视广告检测的系统和方法
WO2014145938A1 (fr) * 2013-03-15 2014-09-18 Zeev Neumeier Systèmes et procédés permettant de détecter en temps réel une publicité télévisuelle en utilisant une base de données de reconnaissance de contenu automatisée
CN105052161B (zh) * 2013-03-15 2018-12-28 构造数据有限责任公司 实时电视广告检测的系统和方法
EP3534615A1 (fr) * 2013-03-15 2019-09-04 Inscape Data, Inc. Systèmes et procédés permettant de détecter en temps réel une publicité télévisuelle en utilisant une base de données de reconnaissance de contenus automatisée
EP2982131A4 (fr) * 2013-03-15 2017-01-18 Cognitive Media Networks, Inc. Systèmes et procédés permettant de détecter en temps réel une publicité télévisuelle en utilisant une base de données de reconnaissance de contenu automatisée
US10893321B2 (en) 2013-08-07 2021-01-12 Enswers Co., Ltd. System and method for detecting and classifying direct response advertisements using fingerprints
US10231011B2 (en) 2013-08-07 2019-03-12 Enswers Co., Ltd. Method for receiving a broadcast stream and detecting and classifying direct response advertisements using fingerprints
EP3032837A4 (fr) * 2013-08-07 2017-01-11 Enswers Co., Ltd. Système et procédé permettant de détecter et de classifier de la publicité directe
US9609384B2 (en) 2013-08-07 2017-03-28 Enswers Co., Ltd System and method for detecting and classifying direct response advertisements using fingerprints
US9955192B2 (en) 2013-12-23 2018-04-24 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
US10284884B2 (en) 2013-12-23 2019-05-07 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
US11039178B2 (en) 2013-12-23 2021-06-15 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
US10306274B2 (en) 2013-12-23 2019-05-28 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
US9838753B2 (en) 2013-12-23 2017-12-05 Inscape Data, Inc. Monitoring individual viewing of television events using tracking pixels and cookies
US11711554B2 (en) 2015-01-30 2023-07-25 Inscape Data, Inc. Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device
US10405014B2 (en) 2015-01-30 2019-09-03 Inscape Data, Inc. Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device
US10945006B2 (en) 2015-01-30 2021-03-09 Inscape Data, Inc. Methods for identifying video segments and displaying option to view from an alternative source and/or on an alternative device
US10482349B2 (en) 2015-04-17 2019-11-19 Inscape Data, Inc. Systems and methods for reducing data density in large datasets
EP3286757A4 (fr) * 2015-04-24 2018-12-05 Cyber Resonance Corporation Procédés et systèmes permettant de réaliser une analyse de signal pour identifier des types de contenu
US10674223B2 (en) 2015-07-16 2020-06-02 Inscape Data, Inc. Optimizing media fingerprint retention to improve system resource utilization
US10902048B2 (en) 2015-07-16 2021-01-26 Inscape Data, Inc. Prediction of future views of video segments to optimize system resource utilization
US10873788B2 (en) 2015-07-16 2020-12-22 Inscape Data, Inc. Detection of common media segments
US11971919B2 (en) 2015-07-16 2024-04-30 Inscape Data, Inc. Systems and methods for partitioning search indexes for improved efficiency in identifying media segments
US10080062B2 (en) 2015-07-16 2018-09-18 Inscape Data, Inc. Optimizing media fingerprint retention to improve system resource utilization
US11308144B2 (en) 2015-07-16 2022-04-19 Inscape Data, Inc. Systems and methods for partitioning search indexes for improved efficiency in identifying media segments
US11451877B2 (en) 2015-07-16 2022-09-20 Inscape Data, Inc. Optimizing media fingerprint retention to improve system resource utilization
US11659255B2 (en) 2015-07-16 2023-05-23 Inscape Data, Inc. Detection of common media segments
US10983984B2 (en) 2017-04-06 2021-04-20 Inscape Data, Inc. Systems and methods for improving accuracy of device maps using media viewing data
CN111444335A (zh) * 2019-01-17 2020-07-24 阿里巴巴集团控股有限公司 中心词的提取方法及装置
CN111444335B (zh) * 2019-01-17 2023-04-07 阿里巴巴集团控股有限公司 中心词的提取方法及装置
CN111695622A (zh) * 2020-06-09 2020-09-22 全球能源互联网研究院有限公司 变电作业场景的标识模型训练方法、标识方法及装置
CN111695622B (zh) * 2020-06-09 2023-08-11 全球能源互联网研究院有限公司 变电作业场景的标识模型训练方法、标识方法及装置
CN113836992A (zh) * 2021-06-15 2021-12-24 腾讯科技(深圳)有限公司 识别标签的方法、训练标签识别模型的方法、装置及设备
CN113836992B (zh) * 2021-06-15 2023-07-25 腾讯科技(深圳)有限公司 识别标签的方法、训练标签识别模型的方法、装置及设备
CN114339375A (zh) * 2021-08-17 2022-04-12 腾讯科技(深圳)有限公司 视频播放方法、生成视频目录的方法及相关产品
CN114339375B (zh) * 2021-08-17 2024-04-02 腾讯科技(深圳)有限公司 视频播放方法、生成视频目录的方法及相关产品
CN113723305A (zh) * 2021-08-31 2021-11-30 北京百度网讯科技有限公司 图像和视频检测方法、装置、电子设备和介质
CN114332729A (zh) * 2021-12-31 2022-04-12 西安交通大学 一种视频场景检测标注方法及系统
CN114332729B (zh) * 2021-12-31 2024-02-02 西安交通大学 一种视频场景检测标注方法及系统

Also Published As

Publication number Publication date
SG155922A1 (en) 2009-10-29

Similar Documents

Publication Publication Date Title
Duan et al. Segmentation, categorization, and identification of commercial clips from TV streams using multimodal analysis
WO2007114796A1 (fr) Appareil et procédé d'analyse de diffusion vidéo
US10262239B2 (en) Video content contextual classification
Hua et al. Robust learning-based TV commercial detection
Snoek et al. Multimodal video indexing: A review of the state-of-the-art
Brezeale et al. Automatic video classification: A survey of the literature
Li et al. Content-based movie analysis and indexing based on audiovisual cues
Kotsakis et al. Investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification
Evangelopoulos et al. Video event detection and summarization using audio, visual and text saliency
Li et al. Video content analysis using multimodal information: For movie content extraction, indexing and representation
Ekenel et al. Multimodal genre classification of TV programs and YouTube videos
Wang et al. A multimodal scheme for program segmentation and representation in broadcast video streams
Ekenel et al. Content-based video genre classification using multiple cues
Liu et al. Exploiting visual-audio-textual characteristics for automatic tv commercial block detection and segmentation
Maragos et al. Cross-modal integration for performance improving in multimedia: A review
Rouvier et al. Audio-based video genre identification
Doulaty et al. Automatic genre and show identification of broadcast media
Koskela et al. PicSOM Experiments in TRECVID 2005.
Qi et al. Automated coding of political video ads for political science research
Tapu et al. DEEP-AD: a multimodal temporal video segmentation framework for online video advertising
Rouvier et al. On-the-fly video genre classification by combination of audio features
Duan et al. Digesting commercial clips from TV streams
Chu et al. Generative and discriminative modeling toward semantic context detection in audio tracks
Kannao et al. Only overlay text: novel features for TV news broadcast video segmentation
Chu et al. Toward semantic indexing and retrieval using hierarchical audio models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07748636

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07748636

Country of ref document: EP

Kind code of ref document: A1