US20170201793A1 - TV Content Segmentation, Categorization and Identification and Time-Aligned Applications - Google Patents

TV Content Segmentation, Categorization and Identification and Time-Aligned Applications

Info

Publication number
US20170201793A1
Authority
US
United States
Prior art keywords
content
video
user device
audio
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/297,658
Inventor
Jose Pio Pereira
Sunil Suresh Kulkarni
Oleksiy Bolgarov
Prashant Ramanathan
Shashank Merchant
Mihailo M. Stojancic
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Roku Inc
Original Assignee
Gracenote Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/141,163 (external priority, patent US8229227B2)
Priority claimed from US12/772,566 (external priority, patent US8195689B2)
Priority claimed from US12/788,796 (external priority, patent US8335786B2)
Priority claimed from US13/102,479 (external priority, patent US8655878B1)
Priority claimed from US13/276,110 (external priority, patent US8959108B2)
Priority to US15/297,658 (published as US20170201793A1)
Application filed by Gracenote Inc filed Critical Gracenote Inc
Assigned to JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT reassignment JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GRACENOTE, INC., TRIBUNE BROADCASTING COMPANY, LLC, TRIBUNE DIGITAL VENTURES, LLC
Assigned to TRIBUNE DIGITAL VENTURES, LLC, CastTV Inc., TRIBUNE MEDIA SERVICES, LLC, GRACENOTE, INC. reassignment TRIBUNE DIGITAL VENTURES, LLC RELEASE OF SECURITY INTEREST IN PATENT RIGHTS Assignors: JPMORGAN CHASE BANK, N.A.
Assigned to CITIBANK, N.A., AS COLLATERAL AGENT reassignment CITIBANK, N.A., AS COLLATERAL AGENT SUPPLEMENTAL SECURITY AGREEMENT Assignors: GRACENOTE DIGITAL VENTURES, LLC, GRACENOTE MEDIA SERVICES, LLC, GRACENOTE, INC.
Publication of US20170201793A1
Assigned to THE NIELSEN COMPANY (US), LLC, GRACENOTE, INC. reassignment THE NIELSEN COMPANY (US), LLC PARTIAL RELEASE OF SECURITY INTEREST Assignors: CITIBANK, N.A.
Assigned to ROKU, INC. reassignment ROKU, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GRACENOTE, INC.
Assigned to GRACENOTE DIGITAL VENTURES, LLC, GRACENOTE, INC. reassignment GRACENOTE DIGITAL VENTURES, LLC RELEASE (REEL 042262 / FRAME 0601) Assignors: CITIBANK, N.A.
Legal status: Abandoned

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM]
    • G06F21/101Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM] by binding digital rights to specific entities
    • G06F21/1011Protecting distributed programs or content, e.g. vending or licensing of copyrighted material ; Digital rights management [DRM] by binding digital rights to specific entities to devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/35Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H60/37Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying segments of broadcast information, e.g. scenes or extracting programme ID
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H60/00Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/56Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54
    • H04H60/59Arrangements characterised by components specially adapted for monitoring, identification or recognition covered by groups H04H60/29-H04H60/54 of video
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44204Monitoring of content usage, e.g. the number of times a movie has been viewed, copied or the amount which has been watched
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/835Generation of protective data, e.g. certificates
    • H04N21/8352Generation of protective data, e.g. certificates involving content or source identification data, e.g. Unique Material Identifier [UMID]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H20/00Arrangements for broadcast or for distribution combined with broadcast
    • H04H20/10Arrangements for replacing or switching information during the broadcast or the distribution
    • H04H20/106Receiver-side switching
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04HBROADCAST COMMUNICATION
    • H04H2201/00Aspects of broadcast communication
    • H04H2201/90Aspects of broadcast communication characterised by the use of signatures

Definitions

  • the present invention generally relates to techniques for video and audio multi-media processing shared between a central server and remote client devices and more specifically to techniques for multi-media content segmentation, classification, monitoring, publishing in time-aligned broadcast applications, and usability for content viewing and interaction.
  • Video content segmentation, categorization and identification can be applied to a number of major application areas.
  • The major application areas are broadcast content indexing and broadcast content monitoring.
  • a number of applications utilize video segmentation and content identification. Also, a number of techniques to detect commercials within broadcast content use feature detectors and a decision tree, also considered a form of classifier. Such techniques are generally performed after a show is recorded.
  • an embodiment of the invention addresses a method for time aligned identification of segments of multimedia content on a client device.
  • Multimedia content of broadcast multimedia data received on a client device is identified.
  • a time alignment of content playing on the client device relative to the received broadcast content is tracked and refined.
  • A change in the multimedia content and the time at which the change occurred are identified.
  • a sample of the multimedia content beginning at the time of the change in multimedia content is verified to match an expected multimedia content, wherein a time aligned service is provided beginning at the time of change in multimedia content.
  • Another embodiment of the invention addresses a method of video segmentation. Fingerprints of incoming video are generated. A reference database is searched to identify content of the incoming video. Segments are associated with classification scores generated based on the incoming video content using search reports and content analytics, wherein the content classification scores represent types of content contained in the incoming video.
  • Another embodiment of the invention addresses a method of video segmentation based on graph based partitioning. Fingerprints of incoming multimedia content are generated. Nodes in a graph are identified, wherein each node represents a change in multimedia content and the point in time the change occurred in the multimedia content. A weight value associated with each edge between the nodes is generated based on similarity scores between different nodes in the graph. The graph is partitioned into segments. The segments are classified according to types of content contained in segments.
  • Another embodiment of the invention addresses a method of providing time aligned services.
  • An incoming video stream is processed to identify content.
  • Third party alternative content is received for selected display by a user.
  • a scene change is determined to have occurred in the identified content, wherein replaceable content is detected at the scene change.
  • the replaceable content detected at the scene change is replaced with the third party alternative content selected by the user.
  • Another embodiment of the invention addresses a computer readable non-transitory medium encoded with computer readable program data and code for operating a system.
  • An incoming video stream is processed to identify content.
  • Third party alternative content is received for selected display by a user.
  • a scene change is determined to have occurred in the identified content, wherein replaceable content is detected at the scene change.
  • the replaceable content detected at the scene change is replaced with the third party alternative content selected by the user.
  • FIG. 1 illustrates a fingerprinting and search system for both media fingerprinting and identification in accordance with an embodiment of the present invention
  • FIG. 2A illustrates with a flowchart an embodiment of the invention using content id matching, logo tracking, and video transition detection and audio silence detection to perform video segmentation;
  • FIG. 2B illustrates a flowchart to detect frame alignment between query video frames and reference video frames
  • FIG. 2C illustrates a flowchart to perform video segmentation using graph based partitioning;
  • FIG. 3 illustrates a flowchart showing the states of detected content and state transitions for video segmentation
  • FIG. 4A illustrates the data structures used to store the reports from fingerprint tools and from search servers
  • FIG. 4B illustrates the data structures used for non-recorded broadcast content
  • FIG. 5A illustrates a flowchart to perform fast and accurate content segmentation, and identification which can be used for time-aligned applications including advertisement replacement;
  • FIG. 5B illustrates a method for specific advertisement replacement or overlay
  • FIG. 5C illustrates a method for publishing content and metadata for first/second screen time aligned applications
  • FIG. 6 illustrates a method to segment broadcast TV content on a consumer device and offer time aligned services
  • FIG. 7A illustrates a flowchart to perform fast and accurate content segmentation on non-recorded broadcast content playing on a consumer device and to offer time aligned services;
  • FIG. 7B illustrates a method for time aligned applications with multi-media content publishing and user control
  • FIG. 8 illustrates with a flowchart a process to perform audience measurement or video monitoring on consumer devices;
  • FIG. 9A illustrates a method to perform time aligned services such as advertisement replacement on consumer devices
  • FIG. 9B illustrates an exemplary time aligned application that can be created using various services described in this application;
  • FIG. 9C illustrates an example partial xml showing two menu options
  • FIG. 10 illustrates a method to enable multiple language choices for over the air or over cable broadcast on consumer devices; by overlaying text appropriately on the video screen, and substituting audio with the selected language;
  • FIG. 11 illustrates a simple embodiment to enable multiple language choice for over the air or over cable broadcast on consumer devices. This method can also be applied to live linear broadcast where content fingerprints are not immediately available; and
  • FIG. 12 illustrates a system method to monitor broadcast TV content on a consumer device while using adaptive and hybrid fingerprinting methods.
  • The SBlock descriptor analyses sub-images of a frame and the time-distance between the blocks, and helps reduce false detections during a fade.
  • the SArea descriptor detects the presence of a logo.
  • the recognition of logos is typically computationally expensive.
  • the above reference uses a fast algorithm to detect the presence of a transparent or non-transparent logo.
  • The visual descriptors are combined and a decision tree is used to segment a video into commercial and content sections.
  • the present disclosure may be embodied as methods, systems, or computer program products. Accordingly, the present inventive concepts disclosed herein may take the form of a hardware embodiment, a software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present inventive concepts disclosed herein may take the form of a computer program product on a computer readable storage medium having non-transitory computer usable program code embodied in the medium. Any suitable computer readable medium may be utilized including hard disks, CD-ROMs, optical storage devices, flash memories, or magnetic storage devices.
  • Computer program code or software programs that are operated upon or for carrying out operations according to the teachings of the invention may be written in a high level programming language such as C, C++, JAVA®, Smalltalk, JavaScript®, Visual Basic®, TSQL, Python, Ruby, Perl, use of the .NET™ Framework, Visual Studio® or in various other programming languages.
  • Software programs may also be written directly in a native assembler language for a target processor.
  • a native assembler program uses instruction mnemonic representations of machine level binary instructions.
  • Program code or computer readable medium as used herein refers to code whose format is understandable by a processor.
  • Software embodiments of the disclosure do not depend upon their implementation with a particular programming language.
  • a software module may reside as non-transitory signals in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • a computer-readable storage medium may be coupled to the processor through local connections such that the processor can read information from, and write information to, the storage medium or through network connections such that the processor can download information from or upload information to the storage medium.
  • the storage medium may be integral to the processor.
  • Embodiments of the present invention go beyond segmentation of commercials on digital video recorder discs (DVDs) and address segmentation of broadcast content and live broadcast content into individual advertisements. Additional embodiments are described that enable quick detection of new advertisements appearing in broadcast content using the advantageous segmentation techniques described below.
  • Segmentation, as described herein, has also been utilized to improve identification and to support time-aligned applications.
  • The embodiments of the invention provide a method to identify and segment video content that is playing on a consumer device or sensed ambiently. Further embodiments include methods to track the content accurately in time at a client site or device and methods to provide time-aligned services. The methods are based on a collection of detectors and descriptors, a content identification system, a tracking search method, a classification and identification method, and a few additional modes to intelligently control the overall system solution.
  • applications related to social networking, entertainment (content publishing) and advertising can take advantage of identification of the precise multimedia program and the program's exact time as it is played on a consumer device.
  • Time aligned knowledge enables useful services and solutions for the user and is valuable to advertisers and content owners as well.
  • Such applications take advantage of segmentation and identification, along with other methods such as content tracking to enable time aligned applications for broadcast content playing on consumer devices or sensed ambiently.
  • An embodiment of the invention addresses techniques for time-aligned services that utilize tracking when a match between incoming video and a stored content sequence is detected.
  • the time aligned services technique allows a user to select displays of relevant content and results of metadata matching to a detected content's time and user menu choices.
  • a content specific menu is prepared for the user to make selections from, such as content type and information.
  • a user interface allows time scrolling to allow the user to go back into the program for missed information.
  • FIG. 1 illustrates a fingerprinting and search system 100 for both media fingerprinting and identification in accordance with an embodiment of the present invention.
  • The fingerprinting and search system 100 includes user sites 102 and 103, a server 106, a video database 108, and a remote user device 114 with a wireless connection to the server 106 and, for example, to a video fingerprinting and video identification process 112 operated, for example, by user site 102.
  • The remote user device 114 is representative of a plurality of remote user devices which may operate as described in accordance with embodiments of the present invention.
  • a network 104 such as the Internet, a wireless network, or a private network, connects sites 102 and 103 and server 106 .
  • Each of the user sites, 102 and 103 , remote user device 114 , and server 106 may include a processor complex having one or more processors, having internal program storage and local user controls such as a monitor, a keyboard, a mouse, a printer, and may include other input or output devices, such as an external file storage device and communication interfaces.
  • the user site 102 may comprise, for example, a personal computer, a laptop computer, a tablet computer, or the like equipped with programs and interfaces to support data input and output and video fingerprinting and search monitoring that may be implemented both automatically and manually.
  • the user site 102 may store programs, such as the video fingerprinting and search process, 112 which is an implementation of a content based video identification process of the present invention.
  • the user site 102 may also have access to such programs through electronic media, such as may be downloaded over the Internet from an external server, accessed through a universal serial bus (USB) port from flash memory, accessed from disk media of various types, or the like.
  • the fingerprinting and search system 100 may also suitably include more servers and user sites than shown in FIG. 1 . Also, multiple user sites each operating an instantiated copy or version of the video fingerprinting and search process 112 may be connected directly to the server 106 while other user sites may be indirectly connected to it over the network 104 .
  • User sites 102 and 103 and remote user device 114 may generate user video content which is uploaded over the Internet 104 to a server 106 for storage in the video database 108 .
  • the user sites 102 and 103 and remote user device 114 may also operate a video fingerprinting and video identification process 112 to generate fingerprints and search for video content in the video database 108 .
  • the video fingerprinting and video identification process 112 in FIG. 1A is scalable and utilizes highly accurate video fingerprinting and identification technology as described in more detail below.
  • the process 112 is operable to check unknown video content against a database of previously fingerprinted video content, which is considered an accurate or “golden” database.
  • the video fingerprinting and video identification process 112 is different in a number of aspects from commonly deployed processes.
  • the process 112 extracts features from the video itself rather than modifying the video.
  • the video fingerprinting and video identification process 112 allows the server 106 to configure a “golden” database specific to its business requirements. For example, general multimedia content may be filtered according to a set of guidelines for acceptable multimedia content that may be stored on the business system.
  • the user site 102 that is configured to connect with the network 104 , uses the video fingerprinting and search process 112 to compare local video streams against a previously generated database of signatures in the video database 108 .
  • the video database 108 may store video archives, as well as data related to video content stored in the video database 108 .
  • the video database 108 also may store a plurality of video fingerprints that have been adapted for use as described herein and in accordance with the present invention. It is noted that depending on the size of an installation, the functions of the video fingerprinting and search process 112 and the management of the video database 108 may be combined in a single processor system, such as user site 102 or server 106 , and may operate as directed by separate program threads for each function.
  • the fingerprinting and search system 100 for both media fingerprinting and identification is readily scalable to very large multimedia databases, has high accuracy in finding a correct clip, has a low probability of misidentifying a wrong clip, and is robust to many types of distortion.
  • the fingerprinting and search system 100 uses one or more fingerprints for a unit of multimedia content that are composed of a number of compact signatures, including cluster keys and associated metadata.
  • the compact signatures and cluster keys are constructed to be easily searchable when scaling to a large database of multimedia fingerprints.
  • the multimedia content is also represented by many signatures that relate to various aspects of the multimedia content that are relatively independent from each other. Such an approach allows the system to be robust to distortion of the multimedia content even when only small portions of the multimedia content are available.
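  • The general shape of such a fingerprint record and its cluster-key index can be sketched as follows; the field names, the hash-style cluster key, and the bucket-per-key layout are illustrative assumptions for this description, not the specific signature format of the system 100.

```python
from dataclasses import dataclass, field
from collections import defaultdict
from typing import Dict, List, Tuple

@dataclass
class Signature:
    """One compact signature for a short span of content (illustrative fields)."""
    cluster_key: int          # coarse key used to bucket similar signatures
    bits: bytes               # compact descriptor payload
    timestamp: float          # seconds from the start of the content

@dataclass
class Fingerprint:
    """A fingerprint for a unit of multimedia content: many compact signatures plus metadata."""
    content_id: str
    signatures: List[Signature] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

class ClusterIndex:
    """Cluster-key -> (content_id, timestamp) lookup, so a query signature
    probes one bucket instead of scanning the whole fingerprint database."""
    def __init__(self) -> None:
        self.buckets: Dict[int, List[Tuple[str, float]]] = defaultdict(list)

    def add(self, fp: Fingerprint) -> None:
        for sig in fp.signatures:
            self.buckets[sig.cluster_key].append((fp.content_id, sig.timestamp))

    def candidates(self, sig: Signature) -> List[Tuple[str, float]]:
        return self.buckets.get(sig.cluster_key, [])
```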
  • Embodiments of this invention address accurate classification of queries. By accurately classifying query content, a classified query can be correctly directed to relevant search servers and avoid a large search operation that generally would involve a majority of database servers. Further embodiments of this invention address systems and methods for accurate content identification.
  • Searching, content monitoring, and content tracking applications may be distributed to literally millions of remote devices, such as tablets, laptops, smart phones, and the like.
  • Content monitoring comprises continuous identification of content on one or more channels or sources.
  • Content tracking comprises continued identification of an already identified content without performing search on the entire database. For example, a television program may be identified by comparing a queried content with content already identified, such as television programs and primarily with the anticipated time location of the program as described in more detail below. This is in contrast to a number of current solutions that involve a large number of database servers for such applications.
  • FIG. 2A illustrates with flowchart process 200 an embodiment of the invention to segment video and to identify content segments accurately using content id matching, logo tracking, scene change detection, and video transitions and audio silence and audio turn detection.
  • the process 200 is operable to run on a client device or a supporting server.
  • the client or monitoring device can be a consumer device/studio/broadcast equipment configured to perform fingerprinting, scene change detection, logo detection, and commercial break cues detection on incoming content received directly or sensed ambiently in order to segment and track the incoming content.
  • the client device transitions between different states based on the content identified and activates specific detectors based on its state.
  • the client device utilizes fingerprints, content search, and processing of sensed audio and video to identify and segment the incoming video content. To identify content the client performs a similarity search and correlation against stored video and audio sequences.
  • the client performs content tracking and segmentation of content to enable a variety of applications. For example, applications may be provided for the purpose of separating content from advertisements and monitoring of advertisements, in order to identify and separate out new advertisements. Also, applications may be provided to accurately track content and to identify, for example, advertisements and promotions accurately in time to enable time-aligned services.
  • The method is used on a central server for archiving and monitoring applications, and on remote clients, such as smart TVs, tablets, computers, smart phones, and the like, for time aligned and monitoring applications.
  • the method avoids reliance on logo and black frames detection, and uses other detectors and features to segment broadcast video. While logo detection is used in methods such as tracking a known content or narrowing a query, to segment video the reliance on logo detection is reduced.
  • the client performs content tracking and segmentation of content to enable applications for separating content from advertisements and monitoring of advertisements, quickly identifying and separating out new advertisements, or determining more accurate time identification of content for time-aligned services.
  • a method uses classification and state based segmentation that is effective for live broadcast content to identify content, advertisements and promos quickly.
  • the incoming video 201 is processed at step 203 to generate fingerprints for the video.
  • the terms fingerprint and signature may be used interchangeably.
  • the step 203 also generates reports using audio and video analysis.
  • Step 204 performs, in parallel with step 203 , logo detection, identification and tracking.
  • a search is performed on the database of all collected content and advertisements, to identify the content and the time location of the content. Configurations for implementation of step 205 can vary based on the device performing this function. Examples of devices performing these operations are smart TVs, tablets, and smart phones or central server. As a result of the search, an initial match is detected and evaluated in step 206 .
  • At step 207, the match is verified using more information and signatures, such as additional fingerprints, logo information, color descriptors, and scene changes. If there is no match, the process 200 returns to steps 203 and then 205 to identify content. If there is a match, the process 200 proceeds to step 208.
  • At step 208, the video frame transformation and audio transformation are calculated. Step 208 detects the transformation of the reference content into the content on the client. Possible transformations are cropping of video frames, zooming in, changes in image ratio along the x or y axis, and image brightness and contrast changes. Similar changes can occur for the audio, such as pitch changes and frequency response changes. The presence of these changes increases the compute effort needed to fingerprint and detect the reference content.
  • Step 208 can use the transformed query video and audio so that the generated fingerprints represent the original video and audio fingerprints more closely and are thus more likely to match the reference fingerprints. For example, when client content is detected to have been stretched 20% on the y-axis, that information is taken into account in fingerprint generation to obtain a more accurate representation of the client content.
  • Step 208 generates the fingerprints and reports to track the monitored content with reference to the original reference (via fingerprints).
  • Query content is transformed to represent the original aligned content. If the query video is cropped, then the query transform considers this, so that the generated fingerprint better represents the original.
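  • As a non-limiting sketch of applying a detected transform before fingerprinting, the following undoes an assumed stretch and crop so that query frames line up more closely with the reference geometry; the nearest-neighbour resampling, the crop convention, and the function name are illustrative assumptions, not the specific processing prescribed at step 208.

```python
import numpy as np

def undo_detected_transform(frame: np.ndarray, scale_y: float, scale_x: float,
                            crop: tuple) -> np.ndarray:
    """Map a client frame back toward reference geometry before fingerprinting.

    frame   : H x W x 3 array as played out on the client
    scale_y : detected stretch along the y-axis (e.g. 1.2 for a 20% stretch)
    scale_x : detected stretch along the x-axis
    crop    : (top, bottom, left, right) pixels detected as cropped away
              from the reference; they are padded back with black.
    Nearest-neighbour resampling keeps the sketch dependency-free; a real
    implementation would use a proper resampler.
    """
    top, bottom, left, right = crop
    h, w = frame.shape[:2]
    # Undo the stretch: resample the frame back to the reference aspect ratio.
    new_h = max(1, int(round(h / scale_y)))
    new_w = max(1, int(round(w / scale_x)))
    ys = (np.arange(new_h) * h / new_h).astype(int)
    xs = (np.arange(new_w) * w / new_w).astype(int)
    restored = frame[ys][:, xs]
    # Pad back any detected crop so coordinates line up with the reference frame.
    return np.pad(restored, ((top, bottom), (left, right), (0, 0)), mode="constant")
```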
  • A correlation between the generated transform fingerprints and the reference is performed to achieve accurate matching between the monitored content and the reference. If the tracked content no longer matches the reference, this is considered a divergence and is detected at step 215. If divergence is detected, control loops back to steps 203 and 204 for fingerprinting, logo processing, and content identification. Since the previous content no longer matches, the content is identified again at step 203 and may this time match a different program or video.
  • a state based classifier takes in all the reports from the fingerprint tools, and the database search, detected logos and other information generated in steps 203 , 204 , 205 and 209 .
  • The classifier analyzes these reports and generates a higher level of classification, such as advertisements, identified content, and promotions, and a finer level of segmentation which identifies individual advertisements and chapters of the content.
  • FIG. 3 illustrates a state based classifier described in more detail below. Promotions are content that advertises video programs yet to be broadcast, or other content that is not an advertisement and not a video program.
  • the results of segmentation process 200 include the following: (i) separate content and index of the content for archival purposes, (ii) information to identify and monitor advertisements, (iii) information to identify new advertising, (iv) information to classify a video during live broadcast to reduce cost of content tracking and monitoring, and (v) information to classify live content for synchronous time aligned services.
  • the classification of video can be performed using a graph structure, where each node is a point in time of the video content, and the arcs between the nodes are similarity scores between the nodes, while other information (such as logo detect, audio turns, scene change, database search) are used to generate the classification into advertisement or content or other types.
  • FIG. 2B illustrates a method 245 to detect a frame alignment mapping between a query and a reference video.
  • the process 245 is operable to run on a client device or a supporting server.
  • the detected frame alignment can be used to reduce fingerprint compute cost since the client video frames can now be aligned with reference frames.
  • any distortion or disturbance between the reference and query fingerprint can be avoided resulting in high matching accuracy or reducing the fingerprints to be compared.
  • the transformed query video and audio represent original video and audio fingerprints more closely, and thus more likely to better match the reference fingerprint.
  • the distortion (disturbance) between the reference and query fingerprint can be avoided resulting in high matching accuracy or reducing the fingerprints to be compared.
  • Detecting frame alignment enables applications that perform overlays of text and specific images without unintended effects since the overlay can be selected to be at appropriate locations on video screen image.
  • Applications such as multi-language broadcast, advertising, subtitles, or 3 rd party content overlays can be performed accurately.
  • Detecting frame alignment enables applications where text and specific image overlays can be performed without unintended effects since the overlay can be selected to be at appropriate and accurate locations in position and in time on video screen image.
  • Applications such as multi-language broadcast, advertising, subtitles, or 3 rd party content overlay can be performed correctly.
  • the detected video and audio transforms on the consumer device are used on the consumer device to reduce the cost of fingerprinting by reducing content variation while tracking the identified content.
  • the transformed query video and audio represent original video and audio fingerprints more closely, and thus more likely to better match the reference fingerprint.
  • the video content is received at step 250 .
  • video signatures are generated that include the detected or selected region's location or equivalent coordinate information and scale.
  • a region may be determined, and thereby selected, and a location of the determined region provided.
  • frame alignment is performed using scale or size of a selected region and x and y coordinates of a fingerprint descriptor center.
  • a search and content match process is performed to detect for a match between the query which is the incoming video received at step 250 and reference database.
  • the reference database may be located on a central server or at a client device.
  • the same content match process evaluates the confidence of the match.
  • One method of estimating confidence of the match includes using a geometric correlation between the scale and x, y coordinates of the fingerprints. If a reliable match, as determined by the confidence, is not detected, the query is generated once again by returning to step 251 for signature generation. If a reliable match is not found, another search is processed in an attempt to obtain a match with good confidence before making assumptions about video frame alignment. The intent is to have as correct a match as possible before making an estimate of the geometric alignment between the query and reference video frames. If a reliable match is detected, the process 245 proceeds to step 257.
  • Step 257 involves calculating a scale ratio on each X-axis and Y-axis between 2 pairs of matching query and reference signatures by obtaining the geometric x, and y coordinate difference between the query signature pair and the reference signature pair along each axis.
  • regions of a video frame are selected for a fingerprint.
  • the center of each region fingerprinted can be described with x, y coordinate.
  • the size of the region is described using the scale value.
  • For a pair of matching signature pairs A and B, the scale ratio along the x-axis is Sx = (QA(x) − QB(x)) / (RA(x) − RB(x)), where QA(x) is the x coordinate of the query signature of pair A and RA(x) is the x coordinate of the corresponding reference signature, and similarly for pair B; the y-axis ratio Sy is formed the same way from the y coordinates.
  • an additional condition can be used to select or prefer pairs of fingerprints that agree geometrically, and in this alternate embodiment only pairs which have center coordinate difference greater than a threshold are considered.
  • the scale ratio on the x-axis is denoted as Sx, and that on the y axis as Sy.
  • The average scale ratios ASx and ASy on each axis are calculated. Outliers, which are pairs with high geometric alignment error, are eliminated while calculating this average.
  • The pixel offset between the query and the reference video frames is calculated. For each matching pair, the pixel offset along each axis is calculated from the coordinates of the query and reference signatures and the average scale ratios.
  • the evaluated frame alignment information between query and reference video is reported at step 260 .
  • the reported frame alignment information includes pixel or equivalent offsets along x-axis and y-axis, and the scale ratios on the x-axis and y-axis. With this information, it is possible to map the location of the query video frame to exact pixel locations on the reference video.
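  • The arithmetic of steps 257 through 259 can be sketched as follows; the pair-selection rule, the outlier tolerance, and the offset convention (query coordinate minus scale-adjusted reference coordinate) are assumptions made for illustration, since only the quantities involved are named above.

```python
import numpy as np

def estimate_frame_alignment(q_pts, r_pts, outlier_tol=0.15):
    """Estimate per-axis scale ratios and pixel offsets between query and
    reference frames from matched fingerprint centre coordinates.

    q_pts, r_pts : lists of (x, y) centres of matching query/reference signatures
    Returns (ASx, ASy, offset_x, offset_y).
    """
    q = np.asarray(q_pts, dtype=float)
    r = np.asarray(r_pts, dtype=float)
    # Scale ratio from every pair of matches (A, B): coordinate differences
    # of the query pair divided by those of the reference pair.
    ratios = []
    for a in range(len(q)):
        for b in range(a + 1, len(q)):
            dq, dr = q[a] - q[b], r[a] - r[b]
            if abs(dr[0]) > 1 and abs(dr[1]) > 1:      # skip degenerate pairs
                ratios.append(dq / dr)
    ratios = np.array(ratios)
    med = np.median(ratios, axis=0)
    # Drop outlier pairs with high alignment error, then average.
    keep = np.all(np.abs(ratios - med) <= outlier_tol * np.abs(med), axis=1)
    ASx, ASy = ratios[keep].mean(axis=0)
    # Pixel offset per match: query coordinate minus scale-adjusted reference.
    offsets = q - r * np.array([ASx, ASy])
    offset_x, offset_y = offsets.mean(axis=0)
    return ASx, ASy, offset_x, offset_y
```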
  • The frame alignment information is used to generate transformed query video and audio fingerprints that represent the original video and audio fingerprints more closely and are thus more likely to match the reference fingerprints.
  • Because the query signatures generated using frame alignment more accurately represent the reference, fewer query signatures may be used to determine a continued match of the incoming video broadcast at the consumer device with the reference.
  • the detected frame alignment is also very useful to align any overlay text or image in various applications that are described further.
  • FIG. 2C illustrates a method 270 of video segmentation using graph based partitioning.
  • A unique aspect of this graph segmentation method is the use of edge weights defined to represent the similarity between content nodes, where each node represents a unique content time. Each node is associated with a likely class based on processed reports, including content search results, blank video and scene change reports, and audio silence and turn reports.
  • the graph segmentation method is able to combine both local content similarity with the global similarities in the program content, to assist in segmentation.
  • the graph segmentation method uses content matching results with a large database to assist in classification and segmentation.
  • The method 270 combines local-in-time similarity with global similarities in the program content, using large content databases to assist in segmentation.
  • a database search is performed to indicate what kind of content is being evaluated. If the content is an advertisement, it is likely to match an advertisement from the main database. If the content is an actual video program, it may at least match an opening sequence or closing credits if the program is a continuation of an existing TV or video program series.
  • content search is utilized on acquired databases of content, advertisements, promotions, opening sequences, closing credits to assist in accurate segmentation of the video content.
  • Each node, as defined below, is given a class score based on audio, video processed reports and database search.
  • A graph G(V, E) consists of nodes v_i ∈ V and edges (v_i, v_j) ∈ E.
  • Each node v_i is selected at audio and video turns and at specific time intervals.
  • Each edge (v_i, v_j) connects certain pairs of nodes, usually neighboring time nodes, and neighboring significant nodes that are unique because of an audio or video scene change or at boundaries of content matching sequences.
  • a node represents a point in time in the video content. The node at the selected time holds relevant information including audio signatures, video signatures, type of event, such as audio silence, an audio turn, a scene change, or just a sample.
  • a weight is associated with each edge that is based on the similarity between the nodes.
  • the graph can be partitioned using any of the well known graph partitioning methods.
  • One approach for graph segmentation is a method using pairwise region comparison, as described in "Efficient Graph-Based Image Segmentation" by P. Felzenszwalb and D. Huttenlocher, Int'l J. Computer Vision, vol. 59, no. 2, pp. 167-181, 2004.
  • In order to partition a graph into classified segments such as advertisements, promotions, and content, additional edge weights are added based on the likely classification.
  • the classified content can be further segmented into individual advertisements, or content chapters.
  • a graph cut method using pairwise region comparison calculates an edge weight between 2 regions.
  • a low cost implementation of the edge weight may be the highest similarity score between nodes in each region, while in more robust implementations, an edge weight would be calculated between the 2 entire regions.
  • the 2 regions can be merged if the edge similarity is greater than the average or median (or another function) of the 2 regions.
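  • A minimal sketch of pairwise region merging over content-time nodes, in the spirit of the graph-based segmentation cited above, might look as follows; the greedy union-find merging and the fixed similarity threshold are simplifying assumptions rather than the specific partitioning used at step 280.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class TimeNode:
    t: float                        # time of the node (audio turn, scene change, ...)
    class_scores: Dict[str, float]  # e.g. {"ad": 0.7, "content": 0.2, "promo": 0.1}

def segment_by_merging(nodes: List[TimeNode],
                       edges: List[Tuple[int, int, float]],
                       merge_threshold: float = 0.6) -> List[List[int]]:
    """Greedy pairwise region merging over content-time nodes.

    edges : (i, j, similarity) edge weights between nodes
    Two regions are merged when the connecting edge similarity exceeds the
    threshold; a fuller implementation would adapt the threshold to the
    internal similarity of each region, as in graph-based image segmentation.
    """
    parent = list(range(len(nodes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, sim in sorted(edges, key=lambda e: -e[2]):
        ri, rj = find(i), find(j)
        if ri != rj and sim >= merge_threshold:
            parent[rj] = ri            # merge the two regions

    regions: Dict[int, List[int]] = {}
    for idx in range(len(nodes)):
        regions.setdefault(find(idx), []).append(idx)
    return list(regions.values())
```

  • Each resulting region can then be labelled as advertisement, promotion, or content by aggregating the class scores of its member nodes.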
  • the input multi-media content is received at step 271 .
  • the received multi-media content is used to generate audio and video fingerprints and detection of audio turns, audio silence, and video scene changes, and video blanks.
  • logo identification is performed at step 273 .
  • a database search is performed using the fingerprints against a database of advertisements, promotions, and content information including opening scenes or full content.
  • At step 274, frame similarity between different time locations of the content is evaluated. The time locations can correspond to nodes in a constructed graph, and the time points are selected because they are significant, such as a beginning period of silence, blank video frames, an audio turn, a video scene change, or a content match boundary.
  • The results from step 275 (database search), from step 273 (logo identification), from step 272 (audio and video event reports), and the content similarity reports from step 274 are input to graph analysis and partitioning at step 280.
  • Graph segmentation is also performed at step 280 to generate a classified video segmentation such as advertisement, promos, and content.
  • a finer segmentation can also be performed to identify individual advertisements, and individual content chapters.
  • new advertisements are identified using the video segmentation by graph or classification methods. Segmented advertisements that partially matched or did not match previous advertisements are identified and are considered candidates for new advertisements. With this method, new advertisements can be identified efficiently and quickly while monitoring hundreds of channels with continuous broadcast.
  • FIG. 3 illustrates a state transition diagram 300 of an embodiment of the state based classifier.
  • the inputs at state 301 include the audio and video analysis reports: audio silence, audio turn, scene change, video blank/black frame, and initial search results are processed by the state classifier.
  • When the classifier detects a particular type of content, it causes a state transition into states such as the "likely advertisement" state 303, the "likely broadcast content" state 305, or the "likely movie" state 308.
  • the inputs including further search results, and fingerprint reports are fed to the classifier.
  • When the classifier confirms the detection of a particular category of content, the state transitions to a "confirmed" advertisement, confirmed program, or other confirmed state, as in states 304 and 306. If the content is unknown and meets certain rules, then a query is made to the search server at state 302.
  • the fingerprinting and analytics have different modes, and their compute cost is reduced in the “likely” states.
  • Analytics are methods to extract information, such as logos, scene changes, and the like. In the "confirmed" states, the audio and video analysis cost can be reduced even further until a change in state occurs.
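  • A simplified sketch of such a state machine is shown below; the report keys and the confidence threshold are illustrative assumptions, and only a subset of the states in FIG. 3 is modelled.

```python
from enum import Enum, auto

class State(Enum):
    UNKNOWN = auto()            # querying the search server (state 302)
    LIKELY_AD = auto()          # state 303
    CONFIRMED_AD = auto()       # state 304
    LIKELY_CONTENT = auto()     # state 305
    CONFIRMED_CONTENT = auto()  # state 306
    LIKELY_MOVIE = auto()       # state 308

def next_state(state: State, report: dict) -> State:
    """One illustrative transition function driven by search and analysis reports.

    The report keys used here ("match_type", "confidence", "divergence") are
    assumptions; the text above only says that further search results and
    fingerprint reports confirm or change the likely classification.
    """
    match, conf = report.get("match_type"), report.get("confidence", 0.0)
    if state in (State.UNKNOWN, State.LIKELY_AD, State.LIKELY_CONTENT, State.LIKELY_MOVIE):
        if match == "advertisement":
            return State.CONFIRMED_AD if conf > 0.9 else State.LIKELY_AD
        if match == "program":
            return State.CONFIRMED_CONTENT if conf > 0.9 else State.LIKELY_CONTENT
        if match == "movie":
            return State.LIKELY_MOVIE
        return State.UNKNOWN          # unknown content: query the search server
    # Confirmed states persist until a divergence or content change is reported.
    if report.get("divergence"):
        return State.UNKNOWN
    return state
```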
  • FIG. 4A is an illustration of the data structures 400 to capture the video and audio analysis during fingerprinting, and search results for segmenting the video.
  • the data structures hold data generated by audio analysis, video analysis, logo detect, content similarity and database search reports.
  • The video report data structure holds the time of the event, the frameType, such as audio or video, and the transitionType, which may include blank, fade, or black.
  • the similarity search data structure holds the time, the time difference offset between compared frames, and the similarity score.
  • the audio report data structure holds the time of the event, the audio event, such as an audio turn, silence and audioLevel.
  • the search result data structure holds the time, length of match, number of unique matching programs, total number of matches, and the match type, including classification of database that is searched on.
  • the logo data structure is used to identify the matching logo and holds the time, whether logo detected and the logo ID.
  • the data structures are used to classify and segment the multimedia program by adding a classification weight or score at each time node. When graph based segmentation is used, the data structures are utilized to generate node classification and edge weights.
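  • For illustration only, the report records described above might be represented with structures along the following lines; the exact field names and encodings are assumptions, not the data structures 400 themselves.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VideoReport:
    time: float
    frame_type: str        # "audio" or "video"
    transition_type: str   # "blank", "fade", "black", ...

@dataclass
class AudioReport:
    time: float
    audio_event: str       # "turn", "silence", ...
    audio_level: float

@dataclass
class SimilarityReport:
    time: float
    offset: float          # time difference between the compared frames
    score: float

@dataclass
class SearchResult:
    time: float
    match_length: float
    unique_programs: int
    total_matches: int
    match_type: str        # classification of the database that was matched

@dataclass
class LogoReport:
    time: float
    detected: bool
    logo_id: Optional[str] = None
```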
  • FIG. 4B describes the relevant data structures 450 for non-recorded broadcast which are used to segment streaming video.
  • the data structures hold data generated by the audio analysis, video analysis, logo detect, and content similarity reports.
  • an additional frame similarity search report is added and this holds the results of frame level content matching with client databases of opening sequences, closing credits, specific images and sound bites.
  • a frame similarity search is performed only on detected frames at the client, and these are triggered by events such as scene change, audio turn, silence or video blanks, fading.
  • An additional frame similarity data structure holds event time, type of match, match score.
  • a graph connecting different time points within the video is generated.
  • the data extracted from the video and audio analysis reports and stored in the data structures are used to generate similarity scores regarding the similarity of different sections of the video.
  • Similarity scores are also generated that represent a likelihood of content or advertisement at a particular time. Such a prediction is based on the history of previous database content searches and on the previous content of the same video.
  • the scores are mapped onto a graph structure and the graph is segmented into sections representing content and advertisement classes, as well as into individual advertisement and content chapters.
  • FIG. 5A illustrates a flowchart 500 to identify and track content and perform a specific time-aligned application such as advertisement replacement. This includes a method of fast and accurate content segmentation and content identification.
  • a video segmentation method utilizes graph partitioning or a classifier to segment or classify sections of the video content.
  • the inputs for the classifier or graph partitioning technique are video and audio analytic reports in time, content similarity in time, and content match reports for advertisements and content with matching time information.
  • the video and audio analytics include detection of video scene changes, including black frames, audio silence detection and audio turns, and number of active audio channels.
  • a content query on a search server is performed to identify current video content playing on a selected client.
  • the search and content match method at step 502 identifies video and audio transforms on content played out at the client, in addition to identifying the content.
  • the detected audio and video transforms at the client include detection of the frame mapping between reference and query video frames.
  • FIG. 2B illustrates a method for detecting frame alignment between query and reference video.
  • the client now performs a video and audio transform, as required to better align the client fingerprints to the reference and then generates query fingerprints.
  • a detected transform for frame alignment is performed on query content while generating fingerprints. This step enables low compute cost and better tracking of client content to the reference in upcoming processing steps.
  • scene change detection is utilized on the client content to select frames to perform fingerprinting and correlate with the reference.
  • the fingerprints are used to track the client content to the reference.
  • client content is tracked with reference to the expected broadcast and that includes time sections where the content being played is not known such as unidentified advertisements. Processing is optimized if the expected time slot for the advertisement or content to be tracked or replaced is known.
  • At step 505, on a scene change or audio transition, a check is made as to whether the sampled incoming content is at an appropriate transition after which the expected content is likely to play out.
  • At step 506, the incoming content in the client buffer, which may not necessarily have been played out yet, is verified with multiple fingerprint methods to determine whether any matches are found with the expected content. If the tracked advertisement or content is associated with a time-aligned service, that action is performed at step 507.
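  • The check performed at steps 505 through 507 can be sketched, under assumed interfaces, as follows; the similarity callable and the 0.8 score threshold are placeholders rather than the verification method itself.

```python
def maybe_trigger_time_aligned_action(buffer_fps, expected_fps, similarity,
                                      on_transition, min_score=0.8):
    """Sketch of steps 505-507: on a scene change or audio transition, verify
    that the not-yet-played buffered content matches the expected content
    before triggering the time-aligned action (e.g. advertisement replacement).

    buffer_fps, expected_fps : fingerprint sequences (opaque objects here)
    similarity               : callable returning a 0..1 match score
    on_transition            : True when a scene change or audio turn was detected
    """
    if not on_transition:
        return False                      # only check at plausible boundaries
    score = similarity(buffer_fps, expected_fps)
    return score >= min_score             # caller performs the replacement/overlay
```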
  • FIG. 5B illustrates a flowchart 510 for performing advertisement replacement for a specific pre-selected and identified advertisement slot.
  • This method is advantageous and requires specific information of an advertisement or content that needs to be replaced.
  • One embodiment needs the specific time when the advertisement is expected to occur.
  • Embodiments of the invention include the transition 512, which describes the content to be replaced and the time information (known or unknown); step 516, which verifies the instance of occurrence via video frame, audio, or watermarks in audio or video; and step 517, which tracks the original incoming content while the replacement content is being displayed or played at the client site.
  • the time location of the advertisement or specific information to be overlaid or displayed is defined for multi-media content.
  • the content is sampled or sensed.
  • a content query is performed on a search server to identify current video content playing on the client.
  • the client also tracks the input content fingerprints with the reference. Processing may be optimized if the expected time slot for the advertisement or content to be tracked or replaced is known. If the exact location is unknown, as may be the case with a live broadcast or a non-recorded linear broadcast, verification processing is required on all possible transitions.
  • The incoming content in the client buffer, which may not necessarily have been played out yet, is verified with multiple fingerprint methods to determine whether any matches are found with the expected content. If the tracked advertisement or content is associated with a time-aligned service, that action is performed quickly at step 517.
  • FIG. 5C illustrates a flowchart 520 to publish content and offer user interaction modes for time-aligned applications.
  • The method of content publishing with associated content data, associated content links, content control menus, and user control menus enables an entire ecosystem of content publishing.
  • The advantageous methods of content publishing as described herein offer a user a choice of the content to be presented and support efficient and engaging user interaction.
  • Time-aligned services can be consumed on a separate device or screen without disturbing a first screen, such as a primary TV display. In many cases, the user may not have control or may not want to exert control especially when other viewers are watching.
  • The methods for time-aligned services enable each user to have a private, selected experience of viewing a program along with additional specific information, such as player statistics, dance steps, or local costume designers of actor apparel.
  • the user choices can be remembered and can be different for each program.
  • the same user may want player statistics, game scores, and standings for an NBA game, but may also want to learn dance steps while watching a network TV show “Dancing with the Stars”.
  • while watching a streaming movie, the user may want to control the first screen and place it into "family viewing mode". Such control would be possible by restricting non-family-rated pieces and fetching family-friendly replacements.
  • the reference content is processed initially to generate, sequentially or in parallel, fingerprints and associated data as shown in steps 522 through 525.
  • fingerprints for the content are generated and stored with the timestamps in memory 521 .
  • content information is defined, at step 525 the content control options are defined, and at step 526 the user menus to be offered are defined.
  • time based behavior for the metadata is defined which includes content information, content control and user menus.
  • the memory associated with access step 521 stores the information from steps 522 , 524 , 523 , 526 , and 527 .
  • the content is sampled or sensed.
  • a content query is initiated on the client device and performed on a search server to identify the current video content playing on the client, when the content is not found on the client.
  • part of the database resides on the client and is searched first.
  • the client also tracks the input content fingerprints with the reference.
  • the content information determined from metadata and content metadata links is displayed.
  • the user is offered control for content viewed on one or more display screens. For example, a display screen selection, display format selection, content type selection, and time scrolling may be offered among other control options.
  • content, fingerprints, and control metadata are downloaded at the request of the tracking function at step 530.
  • at step 530, if tracked content continues matching updated content, display and control options are provided to the user. If content does not track, segmentation is used to decide the content type and also to decide whether to keep looking for a local match or send a new query to the search server. At step 532, the process returns to the content identification and content tracking step 530.
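One way to represent the published reference record assembled by steps 522 through 527 is sketched below: fingerprints with timestamps plus time-scoped content information, control options, and user menus. The field names are illustrative assumptions, not terms from the disclosure:

```python
# Sketch of a published reference record: fingerprints plus time-scoped
# metadata (content information, control options, user menus). Field names
# are illustrative only.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TimedMetadata:
    start_s: float           # playout time at which this metadata becomes active
    end_s: float             # playout time at which it expires
    content_info: dict       # e.g. {"title": ..., "links": [...]}
    control_options: list    # e.g. ["screen_select", "format_select", "time_scroll"]
    user_menu: list          # menu entries offered to the user

@dataclass
class PublishedContent:
    content_id: str
    fingerprints: List[Tuple[float, bytes]] = field(default_factory=list)
    metadata: List[TimedMetadata] = field(default_factory=list)

    def active_metadata(self, t_s: float):
        """Return the metadata entries active at playout time t_s."""
        return [m for m in self.metadata if m.start_s <= t_s < m.end_s]
```

A client that is tracking the content can call active_metadata with the tracked playout time to drive the display and control menus.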
  • FIG. 6 illustrates a method 600 to segment broadcast TV content, and provide time-aligned services.
  • Step 601 performs fingerprinting and content analysis of broadcast content.
  • Step 601 transmits the fingerprints of each recording such as a TV program as a query to the content search at step 603 .
  • a content search server at step 603 returns a search report containing the detected match data to step 605 .
  • the content search server transfers the information about the video frame alignment between the reference and query to the fingerprint generator.
  • the content search server sends information about the detected audio transforms between reference and query.
  • the fingerprint generator can use light weight processes with much lower compute cost, since the detected transforms can be applied to reduce the similarity error of the generated signatures.
  • the time schedule of the program, ad slots and required action are retrieved when the content is identified.
  • audio and video analysis reports are received from the fingerprinting step 601 .
  • the search, audio, video analysis, detected logo information, and similarity reports are received and video segmentation is performed.
  • the content is tracked until the expected time slot of action.
  • the incoming content is verified to determine whether it is exactly the same as the expected content. This check is performed at step 611.
  • video and audio analysis is performed to locate a likely location on the video frame where information can be inserted. This functionality can be used to enhance the program watched, and to overlay messages or advertisements.
  • the video analysis at step 607 detects space on the video frame that is relatively free or smooth.
  • Video frame alignment at step 604 provides information that describes the relationship between the client video frame and the reference video frame.
  • Step 613 executes the overlay, insertion or advertisement replacement onto the streaming broadcast content.
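A much simplified stand-in for the free-space detection attributed to the video analysis at step 607 is sketched below; it scores fixed-size blocks by pixel variance and returns the smoothest one as a candidate overlay location. The block size and the use of plain variance are assumptions for illustration:

```python
# Sketch: locate a relatively free or smooth area of the frame where an
# overlay or message could be inserted. Block size and the variance measure
# are illustrative choices only.

import numpy as np

def find_smooth_region(frame: np.ndarray, block: int = 64):
    """frame: 2-D grayscale array. Returns (row, col) of the top-left corner
    of the lowest-variance block, a candidate position for an overlay."""
    best_var, best_pos = None, (0, 0)
    rows, cols = frame.shape
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            var = float(frame[r:r + block, c:c + block].var())
            if best_var is None or var < best_var:
                best_var, best_pos = var, (r, c)
    return best_pos
```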
  • FIG. 7A illustrates a flowchart 700 to perform fast and accurate content segmentation on broadcast non-recorded content, and overlay content information on one or more selected display screens.
  • a content analysis is performed, and then a query and search operation executes on a database on the client device; if no match is found on the local client device, a query is sent to a central server (cloud) to identify the current video, as sketched below. Since the goal is to detect non-recorded television broadcast segments, the process cannot rely only on fingerprints since none exist for a non-recorded broadcast segment.
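A minimal sketch of that client-first lookup with a cloud fallback follows; the local database layout, scoring function, and cloud_search callable are assumptions for illustration:

```python
# Sketch of local-first content identification with a fallback query to a
# central (cloud) search server. The database layout, scoring, and the
# cloud_search callable are illustrative assumptions.

def match_score(query_fps, ref_fps):
    """Toy score: fraction of query fingerprints present in the reference."""
    ref_set = set(ref_fps)
    return sum(fp in ref_set for fp in query_fps) / max(len(query_fps), 1)

def identify_content(query_fps, local_db, cloud_search, min_score=0.75):
    """local_db maps content_id -> reference fingerprints.
    Returns (content_id, score, source) or (None, 0.0, None)."""
    best_id, best_score = None, 0.0
    for content_id, ref_fps in local_db.items():
        score = match_score(query_fps, ref_fps)
        if score > best_score:
            best_id, best_score = content_id, score
    if best_score >= min_score:
        return best_id, best_score, "client"
    result = cloud_search(query_fps)          # only query the server on a miss
    if result is not None:
        return result["content_id"], result["score"], "cloud"
    return None, 0.0, None
```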
  • the logo of the channel, the program logos, and opening sequences of the program are used to assist in identifying the content.
  • the client performs continued tracking by verifying the program details from text and logos in the content.
  • scene change detection is utilized on the client content to select frames to perform fingerprinting and correlate to generate reports to support segmentation.
  • the client content is also tracked through time sections where the content being played does not even have logo information.
  • the process 700 is able to do such tracking by using similarity information from audio and video, which can indicate the likelihood that the same video is playing.
  • at step 705, on a scene change or audio transition, a determination is made whether this transition is a possible content change. If the transition is not a possible content change, the process 700 returns to step 704. If the transition is a possible content change, the process 700 proceeds to step 706.
  • an "expected transition at a given time" is checked for, since the intent is to replace a specific ad which is expected at a given time for a typical TV program. For a live broadcast such as NBA basketball, "the expected transition" may occur at any time and is checked for accordingly.
  • the processing step 706 communicates the detected frame alignment of the query video, and information about space usage of the video frame via step 708 .
  • the display information at step 708 enables the optimal overlay of user service or advertisement information.
  • Some examples of the time aligned services that are provided are listed below.
  • FIG. 7B illustrates a flowchart 710 to offer time-aligned services of enhanced viewing experience on one or more selected display screens utilizing content metadata or additional content sources.
  • Another embodiment of the invention addresses content identification and tracking, and segmentation that enables new time-aligned services to the user.
  • Another embodiment of the invention addresses a method of content publishing with associated content data, associated content links, and content control menus, supported by intuitive user control menus.
  • Such content publishing enables an entire ecosystem of content publishing.
  • An ecosystem of time aligned (synchronous) content publishing enables the provider to distribute synchronous streams of information that can be consumed on different user devices such as second screens.
  • the synchronous streams can be used to replace original content with targeted ads, subtitles, audience rating or the like when desired.
  • the ecosystem of content publishing includes generating synchronous content streams, associated data, content control, and user control and display menus.
  • the time-aligned services can be consumed on a separate device or screen without disturbing a main display screen.
  • a user may not have control or may not want to exert control especially when other viewers are watching.
  • the methods for time-aligned services enable each user to have a private selected experience of viewing a program along with additional specific information such as player statistics, dance steps, or local costume designers of actor apparel.
  • the reference content is processed initially to generate fingerprints and associated data and content streams at step 712 .
  • additional information must be generated and linked at the servers. Fingerprints and watermarks in content are used to identify content at the client. For each item of broadcast content, additional content choices can be created, such as an alternative language (for example, a Spanish audio stream and Spanish text overlay for the screen), sports statistics per event in a sports game, or bio or action information during a prime time TV program. Links to such content, or metadata associated with the content for the additional information, may be stored at servers along with the reference fingerprints, if required. To enable a rich user experience, menus for user control of information, display, and content selection are provided to the users.
  • 3rd party content information or streams are provided.
  • the 3rd party content is sampled or sensed.
  • a content query is performed, for example on a search server, to identify current video content playing on the client.
  • the tracking function requests further download of fingerprints, content, and control metadata.
  • the client tracks the input content fingerprints with the reference. Also at step 716, if tracked content continues matching updated content, display and control options are provided to the user.
  • a determination is made if the content at the transition is expected content.
  • if the expected content is found at the transition, then further actions and information transfer for the next actions are performed by steps 720, 721, and 722, and content continues to be tracked at step 716. If content does not track, segmentation is used to decide the content type and to decide whether to keep looking for a local match or send a new query to a search server. If the sensed or input content stops tracking the reference, the process 710 continues to the content identification step 713.
  • at step 720, the content information from the 3rd party metadata and content metadata links is displayed.
  • at step 721, the user is offered control of content viewed on one or more display screens, including choices for display screen and format selection and content type selection. Time scrolling selection is offered at step 722.
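The track-or-requery behaviour described for FIG. 7B can be summarized with the sketch below; the match, metadata display, segmentation, and re-identification callables are assumed inputs, and the miss counter is an illustrative way to decide when to stop looking for a local match:

```python
# Sketch of the FIG. 7B tracking loop: keep tracking identified content and
# surfacing its time-aligned metadata; on repeated misses, classify the
# segment and re-identify. All callables are assumed inputs.

def track_and_serve(samples, reference, match, show_metadata,
                    classify_segment, reidentify, max_misses=3):
    misses = 0
    for t, sample in enumerate(samples):
        if match(sample, reference, t):
            misses = 0
            show_metadata(reference, t)   # content info, control menus, scrolling
            continue
        misses += 1
        if misses < max_misses:
            continue                      # keep looking for a local match
        segment_type = classify_segment(sample)            # e.g. "ad", "promo"
        reference = reidentify(sample, hint=segment_type)   # new search query
        misses = 0
    return reference
```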
  • FIG. 8 illustrates a method 800 to perform efficient broadcast monitoring on clients using video segmentation and a central search system for content matching. Segmentation is utilized to improve accuracy and bring scale efficiency to advantageous time-aligned applications.
  • An embodiment of the invention is a method that uses the current identification state to selectively invoke specific feature detectors or descriptors, thus optimizing the memory and compute resources required on the remote client.
  • the invoked feature detectors or descriptors are then used in performing a search to obtain content match or track the content.
  • This method is particularly useful when supporting many clients, making large scale deployment economical and reducing the compute loads on the remote client devices. With reduced compute loads, the client devices are able to perform user friendly tasks such as fetching and displaying content and responding to user interactions.
  • Another embodiment of the invention is a technique for time-aligned services that identifies content and tracks incoming or sensed content against a stored content sequence that may be used for detection.
  • a correlation is performed at scene changes and audio turns, to check and verify that the incoming content remains similar to the expected program content.
  • This method can improve the accuracy of content tracking while reducing the computation cost.
  • the feature to track content more intelligently using scene change and audio turns also enables delivery of time-aligned applications for live broadcast content where pre-recorded fingerprints are not available.
  • Techniques for efficient content monitoring and audience measurement include tracking of a logo, a program logo, and other types of logos and scene change markers, which are used to reduce client computation and fingerprint processing bandwidth. Computation is reduced by electing to do fingerprinting in conditions where it is likely that the content has changed due to user or broadcast network action, such as at a scene change or audio turn. Similarly, bandwidth is reduced by sending fingerprints only at significant events or at a lower sampling rate once content has been identified and is being tracked, as sketched below.
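A minimal sketch of that event-gated fingerprinting follows; the scene-change and audio-turn detectors, the fingerprint function, and the heartbeat rate are illustrative assumptions:

```python
# Sketch of event-gated fingerprinting: fingerprints are generated (and could
# be transmitted) only at scene changes or audio turns, plus a low-rate
# heartbeat once content is already being tracked. Helpers and the rate are
# illustrative.

def gated_fingerprints(frames, is_scene_change, is_audio_turn, fingerprint,
                       tracking=False, heartbeat_every=150):
    """Yield (frame_index, signature) only for frames worth fingerprinting."""
    for i, frame in enumerate(frames):
        significant = is_scene_change(frame) or is_audio_turn(frame)
        heartbeat = tracking and (i % heartbeat_every == 0)
        if significant or heartbeat:
            yield i, fingerprint(frame)   # all other frames are skipped,
                                          # saving compute and upload bandwidth
```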
  • a logo detection and identification is performed on the incoming broadcast video input.
  • the broadcast video is identified and classified on a client device using any of the following methods:
  • at step 804, after identifying the incoming broadcast video content, critical relevant information about an event is extracted from the played audio and video, utilizing available information such as an electronic program guide (EPG) or simply a program guide (PG).
  • at step 805, a check is made as to whether the classified and identified content is among the channels and programs that need to be monitored.
  • at step 806, a determination is made whether additional information is required at the client. If so, at step 807, a query including detected signatures, text, logos, and detected channel and programs is submitted to the search servers, which accurately identify the content.
  • the efficiency of broadcast monitoring is improved by deriving information from video segmentation. Queries from monitored clients can be limited to a particular class of database, based on an identified channel or program. Video segmentation classifies commercials or promos being played, and the queries to the search server can be avoided if commercials for some or all programs do not need to be monitored. Video segmentation methods for pre-recorded and live broadcast content are described in FIGS. 2A, 2C, 3, 4A, 4B and applications in FIGS. 5A, 5B, 5C, 6, 7A, and 7B. If content being played at the client site is classified or identified as an advertisement, the client agent can avoid a query to the server when only content is being monitored.
  • Learning rules to identify new content are used to improve efficiency of the search system. If a particular user watches or plays popular video games, these can be identified by the remote client based on a set of rules about the content playing.
  • the set of rules about the content played by user can include extracted logos, text, video frame color and interest region based fingerprints and audio fingerprints.
  • the rules for segmentation are program specific, since each program follows a particular format. Further, each user typically watches a few programs. It is possible to learn the rules for segmentation for each user based on this information and achieve high segmentation accuracy.
  • the basic video segmentation utilizes content search databases to segment known content, and uses inter-frame and content similarity analysis to further assist segmentation, besides using other information such as program and channel logos, content information, and an EPG which indicates the broadcast schedule.
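A simple sketch of such rule-driven local classification is shown below; the rule fields (expected channel logo, program keywords) and the EPG lookup are assumptions standing in for the learned, program-specific rules described above:

```python
# Sketch of rule-based local classification for broadcast monitoring: learned
# cues (channel logo, on-screen text) plus an EPG lookup are checked before
# deciding whether a query to the search server is needed. Fields are
# illustrative.

from dataclasses import dataclass
from typing import Tuple

@dataclass
class ChannelRule:
    channel: str
    logo_id: str                         # identifier of the expected channel logo
    program_keywords: Tuple[str, ...]    # text expected in program overlays

def classify_locally(detected_logo, detected_text, rules, epg_lookup, now_s):
    """Return (channel, program) if local rules identify the content,
    otherwise None, meaning a server query may still be required."""
    for rule in rules:
        if detected_logo != rule.logo_id:
            continue
        if any(kw.lower() in detected_text.lower()
               for kw in rule.program_keywords):
            return rule.channel, epg_lookup(rule.channel, now_s)
    return None
```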
  • FIG. 9A describes a method to provide a time-aligned service such as advertisement replacement during a video broadcast.
  • the content playing is identified at step 902 .
  • the content playing can be identified using a variety of methods including:
  • (3) querying a search server reference database using audio and video signatures of the content and other extracted information such as channel and program identification.
  • the search server also detects the video frame mapping between the consumer device video and the reference video and determines frame alignment information between a query and reference content found in the reference database.
  • the actual time alignment of the content playing on the consumer device relative to broadcast content is identified and tracked.
  • the time alignment of reference and query content is determined.
  • the accuracy of the time alignment is further improved.
  • the incoming video is processed to detect scene changes and audio turns, and this is followed by video and audio processing such as at the detected scene change and audio turn.
  • the video processing includes signature generation, logo detection and identification, using the generated data to track the identified content, to identify changes in the content, and to start content identification afresh.
  • the tasks of scene change, audio turn detection, and segmentation are performed on the incoming video.
  • At step 905, the expected start time for the advertisement to be replaced is updated using a projected value of the match time.
  • step 905 includes projecting the time of the expected advertisement in terms of the current system clock time, while monitoring the segmentation changes within the incoming video content.
  • Step 905 eventually identifies that a scene change event is within the target range of the start of the selected advertisement to be replaced. Then, step 905 invokes the verification step.
  • At step 906, the incoming content at the expected time range is verified to be the expected advertisement.
  • Step 906 also recovers the frame alignment information between the query and reference video, and can regenerate the video frame or interpret the video analysis process appropriately.
  • Step 906 also generates signatures on a small time sample of the incoming video beginning at the identified scene change event, using audio and video fingerprinting. Next, the generated signatures are compared against the beginning period of the original advertisement, such as the first video frame and associated audio of the original advertisement. If the incoming video agrees with the expected advertisement, the local video buffer display is switched to the new alternate advertisement. It is possible to perform a highly accurate check that the expected video frame matches the incoming video's first frame.
  • Video fingerprinting, which detects interest regions at salient locations on the frame and generates descriptors of the regions around the interest regions together with the associated coordinates and scale of the detected regions, allows a very accurate check, as sketched below. Additionally, the video time locations and selected transitions allow only very few possibilities for matching. Alternate methods of video fingerprinting using intensity and color information can also be used for highly accurate matching between the reference and the first video frame. If the comparison does not match, the process 900 returns to step 902.
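A simplified sketch of that first-frame check appears below; interest-region extraction is assumed to have already produced (x, y, scale, descriptor) tuples, and the distance and position tolerances are illustrative values rather than disclosed parameters:

```python
# Sketch of first-frame verification at the expected ad start: compare
# interest-region descriptors, together with their coordinates, between the
# incoming frame and the reference advertisement's first frame. Thresholds
# are illustrative only.

import numpy as np

def verify_first_frame(query_regions, ref_regions,
                       max_desc_dist=0.25, max_pos_err=8.0, min_matches=10):
    """Each region is (x, y, scale, descriptor: np.ndarray).
    True when enough regions agree in descriptor and position."""
    matched = 0
    for qx, qy, qscale, qdesc in query_regions:
        for rx, ry, rscale, rdesc in ref_regions:
            desc_dist = float(np.linalg.norm(qdesc - rdesc))
            pos_err = ((qx - rx) ** 2 + (qy - ry) ** 2) ** 0.5
            if desc_dist <= max_desc_dist and pos_err <= max_pos_err:
                matched += 1
                break
    return matched >= min_matches
```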
  • the advertisement is switched and the video frame mapping is decided based on the detected frame mapping from step 902 , and tracked through steps 903 , 904 and 906 .
  • At step 902, when content is identified, an initial mapping of the reference to the query screen is performed.
  • the mapping is refined and tracked through client operations 903, 904, and 906.
  • the incoming content is monitored and tracked to verify it matches the expected content.
  • the advertisement replacement process continues until the incoming advertisement ends or the defined substitution time ends, and only while the incoming content, such as the advertisement, is the expected content.
  • a replacement advertisement may be a partial replacement or an overlay.
  • An appropriate delay buffer may be used to accommodate the delays for identifying and verifying the advertisement for switching, so that the user experience is not degraded.
  • a simpler method for advertisement replacement may be employed by the cable operators with the co-operation of the content owners.
  • the timing information of the original advertisement and the one to be replaced are available to the cable operator and at the end user set top box or equivalent.
  • the problem remains how to deliver the alternative advertisement to the end user.
  • This alternative advertisement can be delivered by internet or over certain channels on the cable.
  • a similar approach can be assumed for over-the-air broadcast. However, these solutions are not applicable when the assumptions are not valid, such as when the content owners and cable operators do not agree on deploying this mode of advertisement replacement.
  • FIG. 9A in contrast illustrates a content publishing method with user control; wherein the user can choose the type of synchronous content during the entire TV viewing experience.
  • FIG. 9B illustrates an example of publishing content as a time-aligned application.
  • the time aligned applications would synchronize to the show the user is currently watching on a big-screen TV using content identification technology. After the identification of the content, these applications would display additional content on the second screen, which may include text, images, and video, links to supplemental on-line information, and buttons for shopping, voting, or other actions.
  • the contents to be displayed are synced with the main content and the content publisher would be able to specify the relationship between the content and the additional information displayed on the second screen.
  • FIG. 9B describes an example time aligned application that can be created using the various services described in this application.
  • the second screen device is shown displaying various menu options 952 , 954 , 956 .
  • the menu option 952 is associated with the content being currently displayed, while menu options 954 and 956 are associated with content that was displayed in the past.
  • When a user clicks on any of the menu options 952, 954, 956, the application displays more information 970 as shown in the figure. As the content on the main display screen progresses, menu option 952 is updated with a new menu option obtained through a request to the content identification service network. The menu options for previous events are pushed down, forming a stack of the menu options. Such stacked menu options 952, 954, 956 can be scrolled using scrollbar 958. Menu options can have further menu options within themselves, forming a chain of menu options. To implement such time aligned publishing, the publisher needs to provide the following information.
  • FIG. 9C illustrates menu formats.
  • the menu choices can be provided in various formats.
  • An example partial xml format is illustrated in FIG. 9C .
  • This example shows two menu options. The first menu option is displayed 60 seconds after the program starts with a menu title "Welcome to the program". The detailed content is specified under the tag "display_content", which for this menu is shown only for 240 seconds. After 240 seconds, this menu option is removed from the stack of the menu options. The second menu option is displayed after 300 seconds with a menu title "Know more about the program" and is displayed for 900 seconds.
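The XML of FIG. 9C is not reproduced here; the sketch below simply restates the two described menu options as data and applies their show/hide timing. The field names loosely echo the described tags (for example display_content) but are otherwise assumptions:

```python
# Sketch of the two menu options described for FIG. 9C, with their show/hide
# timing applied. Field names loosely mirror the described XML tags and are
# otherwise illustrative.

menu_options = [
    {"title": "Welcome to the program", "start_s": 60,
     "duration_s": 240, "display_content": "..."},
    {"title": "Know more about the program", "start_s": 300,
     "duration_s": 900, "display_content": "..."},
]

def active_menu_stack(options, playout_time_s):
    """Return the options visible at playout_time_s, newest first, mimicking
    the stacked second-screen menu of FIG. 9B."""
    visible = [o for o in options
               if o["start_s"] <= playout_time_s < o["start_s"] + o["duration_s"]]
    return sorted(visible, key=lambda o: o["start_s"], reverse=True)

# Example: at 90 s only the first option is visible; at 320 s only the second,
# because the first expired at 300 s (60 s start + 240 s display time).
```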
  • FIG. 10 describes a method 1000 to provide a time-aligned service such as selecting a language of choice for a broadcast video program.
  • the content playing is identified at step 1002 .
  • the content playing can be identified using a variety of methods including:
  • the search server also detects the video frame mapping of the consumer device content query relative to the reference.
  • the actual time alignment of the content playing on the consumer device relative to broadcast content is identified. Further, the incoming video is processed to detect scene changes and audio turns, and this is followed by video and audio processing such as at the scene change or at an audio turn.
  • the video processing includes signature generation, logo detection and identification, and these are used to track the identified content or to identify a change in the content and start content identification afresh.
  • the tasks of scene change detection, audio turn detection, and segmenting the incoming video for processing are performed.
  • the frame alignment information is recovered between the query and reference video. Then signatures of the incoming broadcast content and reference video are generated. The signatures are used to synchronize the incoming video time to the reference.
  • the detected time relationship of the incoming video is used to align the selected language customizations, the audio track, and the text and/or video overlays with the original video.
  • the continued updating of the detected time relationship between reference and current video can be performed by multiple methods including:
  • at step 1007, the selected audio tracks are switched and the text and video overlays are performed using the video frame mapping information from step 1002.
  • the incoming video content is thus aligned in time and in video space through steps 1003 , 1004 and 1006 .
  • at step 1008, the incoming content is monitored and tracked in time against the expected content.
  • alignment between reference and current playout is updated at step 1006 .
  • a language customization application for audio substitution and video overlay continues while the incoming content is as expected, as sketched below. If the content stops tracking with the expected content, then control moves to step 1002.
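A minimal sketch of applying the selected-language customization once the time alignment and frame mapping are known is given below; the alternate audio source, subtitle records, frame-mapping function, and overlay helper are assumed inputs rather than elements of the disclosure:

```python
# Sketch of applying a selected-language customization to one playout instant:
# map client time to reference time, pick the alternate-language audio
# segment, and overlay the subtitle text using the detected frame mapping.
# All data sources and helpers are assumed inputs.

def apply_language_customization(frame, client_time_s, time_offset_s,
                                 alt_audio, subtitles, frame_map, overlay_text):
    """Return (frame_with_overlay, audio_segment) for one playout instant."""
    ref_time_s = client_time_s + time_offset_s        # tracked time alignment
    audio_segment = alt_audio.segment_at(ref_time_s)   # selected-language audio
    for sub in subtitles:
        if sub["start_s"] <= ref_time_s < sub["end_s"]:
            x, y = frame_map(sub["x"], sub["y"])        # reference -> client frame
            frame = overlay_text(frame, sub["text"], (x, y))
    return frame, audio_segment
```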
  • a simpler method for selecting language of choice may be employed by the content owners with the co-operation of the cable operators.
  • the timing information of the broadcast content and the language customization data are available to the cable operator and at the end user set top box or equivalent.
  • the problem remains how to deliver the alternative language customization data to the end user.
  • This additional data can be delivered by internet or over certain channels on the cable.
  • a similar approach can be assumed for over-the-air broadcast. However, these solutions are not applicable when the assumptions are not valid, such as when the content owners and the cable operators do not agree on this mode of deployment of the multi-language choice service.
  • FIG. 11 describes another embodiment of a method 1100 to provide a time-aligned service, for selecting language of choice for a live non-recorded broadcast video program.
  • the incoming video received at step 1101 is processed at step 1103 to identify the content using any of the following methods:
  • time alignment information is maintained between query and reference.
  • the detected incoming video's reference time is used to align with the selected language customizations.
  • the audio track and the text and/or video overlays are added or overlaid at step 1107 and 1108 , over the original video.
  • the additional data to implement the language customizations, determined from step 1105, can be provided over the air, over cable, or over the internet.
  • the video frame alignment is also optionally detected between the incoming video and the reference.
  • the video frame alignment is detected using the known locations of logos and detected text between the client video and the reference video. Time alignment is performed by comparing scene change timings for audio and video content, including text and logo changes.
  • the participation of the original content provider is necessary to generate the customization information simultaneously with the current content. Since both the original content and the customization are generated together, crucial information to align the original and client-side playout can be generated via signatures, or via scene change and content change information with associated times. Since the broadcast content is live and not pre-recorded, querying a server cannot be used without a delay factor involved, which can be upwards of 5 or more seconds.
  • a solution which may be used transfers the information that enables time alignment of the language customization directly to the client. The client can thus detect the time alignment between the reference and the language customization data and stream. Earlier, at step 1105, the client extracts content alignment synchronization information such as text, logos, scene changes, and fingerprints from the incoming broadcast video input, which can be received over the air, over cable, or over the internet.
  • at step 1107, the selected audio tracks are switched and text and video overlays are performed using the video frame mapping information from step 1106. At step 1108, the text and video overlays for the selected language are overlaid on the video frames.
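For the live case, the offset between client playout and the provider's customization stream can be estimated by matching their scene-change timestamps, as the sketch below illustrates; the tolerance and the brute-force search are simplifications assumed for illustration:

```python
# Sketch of live time alignment: find the offset between client playout and
# the customization stream whose shifted scene-change times best agree.
# The tolerance and brute-force search are illustrative simplifications.

def estimate_offset(client_changes_s, reference_changes_s, tolerance_s=0.2):
    """Inputs are lists of scene-change times in seconds on each clock.
    Returns the offset (reference_time - client_time) with the most support."""
    best_offset, best_votes = 0.0, -1
    for c in client_changes_s:
        for r in reference_changes_s:
            offset = r - c
            votes = sum(
                any(abs((cc + offset) - rr) <= tolerance_s
                    for rr in reference_changes_s)
                for cc in client_changes_s)
            if votes > best_votes:
                best_offset, best_votes = offset, votes
    return best_offset
```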
  • FIG. 12 illustrates a method 1200 to segment broadcast TV content using hybrid and adaptive fingerprint methods.
  • tracking of a logo, a program logo, and other logos and scene change markers is used to reduce client computation and fingerprint transfer bandwidth.
  • the computation and fingerprint bandwidth is reduced by electing to do fingerprinting in conditions where it is likely that the content has changed due to user or broadcast network action.
  • at step 1201, fingerprinting and content analysis are performed on broadcast content.
  • the fingerprints of each program are transmitted as a query to the content search Server 1 , for a search operation at step 1202 .
  • the content search server 1 returns the search report containing the detected match data to step 1204 , to fingerprint step 1203 , and to the segment/classifier step 1205 .
  • the content search server 1 transfers the information about the frame alignment and the time alignment between the reference and query to the fingerprint generator 2 , step 1203 . Subsequent content searches are sent to content search server 2 , step 1204 .
  • the fingerprint generator 2 (step 1203 ) can use light weight processes with much lower compute cost, since the detected transforms, such as frame alignment and audio transform, can be applied to reduce the similarity error of the generated signatures.
  • the segment/classifier step 1205 manages the incoming content, and controls (activates and disables) the time aligned service. Step 1205 includes the functionality of segmenting, classifying and predicting the time alignment of incoming video. Step 1205 also communicates the video frame alignment information, so that video overlays can be performed optimally. Step 1209 executes the video overlay, insertion or advertisement replacement onto the streaming broadcast content. Before any overlay can start, the time alignment between the reference and the incoming content, and the incoming content itself, are verified at step 1206.
  • the verification step 1206 can use a variety of fingerprinting methods to generate signatures and correlate to verify the time alignment with the reference signatures.
  • Step 1208 continues to perform more lightweight verification, content tracking, and trick mode detection on the incoming content while the time aligned services are overlaid on the incoming broadcast video by step 1209.
  • Trick mode is defined as digital video recorder (DVR) actions of fast forwarding, skipping sections, or rewinding video content. Scene changes and audio turns that are detected are compared with the expected times, as these may be unaligned due to possible trick mode operations. Then, a verification for trick mode or other unexpected changes is performed and a graceful transition to normal video input is performed.
  • the verify process for trick mode can be as simple as checking that the audio and video content is not aligned to the expected content's scene changes and audio turns, as sketched below. A more complex process employs comparison of fingerprints between the expected content and the currently played out content.
  • the verify process can be used for live broadcast where pre-recorded content is not available.
  • fingerprints of already played out live broadcast content can be stored locally or on a central server. These recorded fingerprints of non-pre recorded broadcast can be used to detect possible trick modes, such as rewind, and align with the correct time of video content being played out on the TV or other screens.
  • a trick mode is detected by performing logo detect processing and matching for trick mode overlay buttons on the video.
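The simple trick-mode check described above can be summarized with the following sketch; the tolerance and the fraction of events required to stay aligned are illustrative values:

```python
# Sketch of trick-mode detection: if recently observed scene changes and audio
# turns no longer line up with the expected content's event times, a fast
# forward, skip, or rewind has probably occurred. Thresholds are illustrative.

def detect_trick_mode(observed_events_s, expected_events_s,
                      tolerance_s=0.5, min_aligned_fraction=0.6):
    """Event times are seconds on a common clock. Returns True when too few
    observed events align with the expected ones."""
    if not observed_events_s:
        return False
    aligned = sum(
        any(abs(o - e) <= tolerance_s for e in expected_events_s)
        for o in observed_events_s)
    return (aligned / len(observed_events_s)) < min_aligned_fraction
```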
  • the client stores a small search database of fingerprints that match opening sequences of programs. Additionally, the client stores a small database of logos and program logos, and in certain cases specific logos of teams for sports programming. To detect dynamic logos, a set of rules about the dynamic logos is stored. These local databases are then utilized to identify content playing on a client, or utilized to make a likely guess about the match. To verify the "likely match", specific additional information is downloaded or queried with the central servers to support identification and segmentation. The additional information can include color descriptors, signatures of template videos, and speaker voice models.
  • the client learns and caches critical information about the popular channels and programs watched and the associated channel logos, program logos, program specific text and logos, and video frame layouts. This learning is used to optimize the cost and accuracy of content identification and segmentation.
  • the above learning of video frame layouts for popular programs includes specific details such as text locations, color, logos or text locations within video frames such as team scores.
  • a learning engine is used to learn rules to best segment and identify content at each client by adding relevant and user specific sequences, video layouts, opening sequences and closing credits to the content databases.
  • the learning engine also assists in creating the rules for identification of new programs and channels at the client device.
  • the ability to learn new rules to identify content can significantly improve the efficiency of a content monitoring system, since identification at a client can prevent queries being sent that are avoidable and can target a search to appropriate search databases separate from the client device.
  • the rules learned at the client are communicated to the server and all the rules learned for content can be stored on the central servers, which enables classification and categorization and identification of the content.

Abstract

A content segmentation, categorization and identification method on consumer devices (clients) is described. Methods for content tracking are illustrated that are suitable for large scale deployment and applications such as broadcast monitoring, novel content publishing and interaction. Time-aligned (synchronous) applications such as multi-language selection, customized advertisements, second screen services and content monitoring applications can be economically deployed at large scales. The client performs fingerprinting, scene change detection, audio turn detection, and logo detection on incoming video and gathers database search results, logos and text to identify and segment video streams into content, promos, and commercials. A learning engine is configured to learn rules for optimal identification and segmentation at each client for each channel and program. Content sensed at the client site is tracked with reduced computation and applications are executed with timing precision. A method and user interface for time-aligned publishing of content and subsequent usage and interaction on one or more displays is described.

Description

  • This application is a continuation of U.S. patent application Ser. No. 13/327,350 entitled “TV Content Segmentation, Categorization and Identification and Time-Aligned Applications” filed Dec. 15, 2011, which in turn claims the benefit of U.S. Provisional Patent Application Ser. No. 61/423,205 entitled “TV Content Segmentation, Categorization and Identification and Time-Aligned Applications” filed on Dec. 15, 2010, both of which are hereby incorporated by reference in their entireties.
  • CROSS REFERENCE TO RELATED APPLICATIONS
  • U.S. application Ser. No. 12/141,337 filed on Jun. 18, 2009 entitled “Method and Apparatus for Multi-dimensional Content Search and Video Identification”, U.S. application Ser. No. 12/141,163 filed on Jun. 18, 2008 entitled “Methods and Apparatus for Providing a Scalable Identification of Digital Video Sequences”, U.S. patent application Ser. No. 12/772,566 filed on May 3, 2010 entitled “Media Fingerprinting and Identification System”, U.S. application Ser. No. 12/788,796 filed on May 27, 2010 entitled “Multi-Media Content Identification Using Multi-Level Content Signature Correlation and Fast Similarity Search”, U.S. application Ser. No. 13/102,479 filed on May 6, 2011 entitled “Scalable, Adaptable, and Manageable System for Multimedia Identification”, and U.S. application Ser. No. 13/276,110 filed on Oct. 18, 2011 entitled “Distributed and Tiered Architecture for Content Search and Content Monitoring”.
  • FIELD OF THE INVENTION
  • The present invention generally relates to techniques for video and audio multi-media processing shared between a central server and remote client devices and more specifically to techniques for multi-media content segmentation, classification, monitoring, publishing in time-aligned broadcast applications, and usability for content viewing and interaction.
  • BACKGROUND OF THE INVENTION
  • Video content segmentation, categorization and identification can be applied to a number of major application areas. The major application areas are broadcast content indexing, and monitoring broadcast content.
  • A number of applications utilize video segmentation and content identification. Also, a number of techniques to detect commercials within broadcast content use feature detectors and a decision tree, also considered a form of classifier. Such techniques are generally performed after a show is recorded.
  • Traditional content identification applications such as audience measurement, broadcast monitoring, and play out verification are currently limited to a lower scale of deployment for a limited number of clients. For monitoring of large scale deployments, there is a need to perform monitoring tasks with higher efficiency.
  • SUMMARY OF THE INVENTION
  • In one or more of its several aspects, the present invention recognizes and addresses problems such as those described above. To such ends, an embodiment of the invention addresses a method for time aligned identification of segments of multimedia content on a client device. Multimedia content of broadcast multimedia data received on a client device is identified. A time alignment of content playing on the client device relative to the received broadcast content is tracked and refined. A change in multimedia content and the time at which the change occurred are identified. A sample of the multimedia content beginning at the time of the change in multimedia content is verified to match an expected multimedia content, wherein a time aligned service is provided beginning at the time of change in multimedia content.
  • Another embodiment of the invention addresses a method of video segmentation. Fingerprints of incoming video are generated. A reference database is searched to identify content of the incoming video. Segments are associated with classification scores generated based on the incoming video content using search reports and content analytics, wherein the content classification scores represent types of content contained in the incoming video.
  • Another embodiment of the invention addresses a method of video segmentation based on graph based partitioning. Fingerprints of incoming multimedia content are generated. Nodes in a graph are identified, wherein each node represents a change in multimedia content and the point in time the change occurred in the multimedia content. A weight value associated with each edge between the nodes is generated based on similarity scores between different nodes in the graph. The graph is partitioned into segments. The segments are classified according to types of content contained in segments.
  • Another embodiment of the invention addresses a method of providing time aligned services. An incoming video stream is processed to identify content. Third party alternative content is received for selected display by a user. A scene change is determined to have occurred in the identified content, wherein replaceable content is detected at the scene change. The replaceable content detected at the scene change is replaced with the third party alternative content selected by the user.
  • Another embodiment of the invention addresses a computer readable non-transitory medium encoded with computer readable program data and code for operating a system. An incoming video stream is processed to identify content. Third party alternative content is received for selected display by a user. A scene change is determined to have occurred in the identified content, wherein replaceable content is detected at the scene change. The replaceable content detected at the scene change is replaced with the third party alternative content selected by the user.
  • These and other features, aspects, techniques and advantages of the present invention will be apparent to those skilled in the art from the following detailed description, taken together with the accompanying drawings and claims.
  • BRIEF DESCRIPTION OF DRAWINGS
  • FIG. 1 illustrates a fingerprinting and search system for both media fingerprinting and identification in accordance with an embodiment of the present invention;
  • FIG. 2A illustrates with a flowchart an embodiment of the invention using content id matching, logo tracking, and video transition detection and audio silence detection to perform video segmentation;
  • FIG. 2B illustrates a flowchart to detect frame alignment between query video frames and reference video frames;
  • FIG. 2C illustrates a flowchart to perform video segmentation using a graph;
  • FIG. 3 illustrates a flowchart showing the states of detected content and state transitions for video segmentation;
  • FIG. 4A illustrates the data structures used to store the reports from fingerprint tools and from search servers;
  • FIG. 4B illustrates the data structures used for non-recorded broadcast content;
  • FIG. 5A illustrates a flowchart to perform fast and accurate content segmentation, and identification which can be used for time-aligned applications including advertisement replacement;
  • FIG. 5B illustrates a method for specific advertisement replacement or overlay;
  • FIG. 5C illustrates a method for publishing content and metadata for first/second screen time aligned applications;
  • FIG. 6 illustrates a method to segment broadcast TV content on a consumer device and offer time aligned services;
  • FIG. 7A illustrates a flowchart to perform fast and accurate content segmentation on broadcast non-recorded content playing on a consumer device and offer time aligned services;
  • FIG. 7B illustrates a method for time aligned applications with multi-media content publishing and user control;
  • FIG. 8 illustrates a flowchart to perform audience measurement or video monitoring on consumer devices;
  • FIG. 9A illustrates a method to perform time aligned services such as advertisement replacement on consumer devices;
  • FIG. 9B illustrates an example time aligned application that can be created using various services described in this application;
  • FIG. 9C illustrates an example partial xml showing two menu options;
  • FIG. 10 illustrates a method to enable multiple language choices for over the air or over cable broadcast on consumer devices, by overlaying text appropriately on the video screen and substituting audio with the selected language;
  • FIG. 11 illustrates a simple embodiment to enable multiple language choice for over the air or over cable broadcast on consumer devices. This method can also be applied to live linear broadcast where content fingerprints are not immediately available; and
  • FIG. 12 illustrates a system method to monitor broadcast TV content on a consumer device while using adaptive and hybrid fingerprinting methods.
  • DETAILED DESCRIPTION
  • The present invention will now be described more fully with reference to the accompanying drawings, in which several embodiments of the invention are shown. This invention may, however, be embodied in various forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
  • A prior art segmentation system is described in a paper "Recognizing Commercials in Real-Time using Three Visual Descriptors and a Decision-Tree", by Ronald Glasberg, Cengiz Tas, and Thomas Sikora, at ICME 2006 pages 1481-1484. The Glasberg et al. reference uses hard cut, static area (SArea), and separating block (SBlock) descriptors. The hard cut descriptor is generated from the appearance of several monochrome black frames between each commercial block. In this context, Lienhart et al. in "On the Detection and Recognition of Television Commercials", IEEE Conference on Multimedia Computing and Systems, pp. 509-516, 1997, published an approach requiring that the average and the standard deviation intensity values of the pixels in these frames should be below a certain threshold. The SBlock descriptor analyses sub-images of a frame and the time-distance between the blocks, and helps reduce false detection during a fade. The SArea descriptor detects the presence of a logo. The recognition of logos is typically computationally expensive. The above reference uses a fast algorithm to detect the presence of a transparent or non-transparent logo. The visual descriptors are combined and a decision tree is used to segment a video into commercial and content sections.
  • Prior art and other work in video segmentation, such as Glasberg et al., have focused on using black frames to separate commercials and specific improvements to reduce false detection. However, in many countries including the USA, black frame breaks for commercials are infrequent. Additional characteristics of channels that cause difficulties include channels that do not insert a logo, and a significant number of other channels that have a temporally varying logo. Additionally, current approaches address segmentation of content that is already recorded, and not during a live broadcast. In embodiments of the present invention, new methods are defined for accurate segmentation using content similarity, and content database searches. Techniques as described herein address large scale deployment of segmentation for applications such as time-aligned services which include specific services such as language subtitles, specific advertisement replacement or overlay, identifying new advertisements that are on broadcast channels, as described in more detail below.
  • It will be appreciated that the present disclosure may be embodied as methods, systems, or computer program products. Accordingly, the present inventive concepts disclosed herein may take the form of a hardware embodiment, a software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present inventive concepts disclosed herein may take the form of a computer program product on a computer readable storage medium having non-transitory computer usable program code embodied in the medium. Any suitable computer readable medium may be utilized including hard disks, CD-ROMs, optical storage devices, flash memories, or magnetic storage devices.
  • Computer program code or software programs that are operated upon or for carrying out operations according to the teachings of the invention may be written in a high level programming language such as C, C++, JAVA®, Smalltalk, JavaScript®, Visual Basic®, TSQL, Python, Ruby, Perl, use of .NET™ Framework, Visual Studio® or in various other programming languages. Software programs may also be written directly in a native assembler language for a target processor. A native assembler program uses instruction mnemonic representations of machine level binary instructions. Program code or computer readable medium as used herein refers to code whose format is understandable by a processor. Software embodiments of the disclosure do not depend upon their implementation with a particular programming language.
  • The methods described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside as non-transitory signals in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A computer-readable storage medium may be coupled to the processor through local connections such that the processor can read information from, and write information to, the storage medium or through network connections such that the processor can download information from or upload information to the storage medium. In the alternative, the storage medium may be integral to the processor.
  • Embodiments of the present invention go beyond segmentation of commercials on digital video discs (DVDs) and address segmentation of broadcast content and live broadcast content into individual advertisements. Additional embodiments are described that enable quick detection of new advertisements appearing in broadcast content using the advantageous segmentation techniques described below.
  • Segmentation, as described herein, has also been utilized to improve identification and support time-aligned applications.
  • The embodiments of the invention provide a method to identify and segment video content that is playing on a consumer device or sensed ambiently. Further embodiments include methods to track the content accurately in time at a client site or device and methods to provide time-aligned services. The methods are based on a collection of detectors and descriptors, a content identification system, a tracking search method, and a classification and identification method, and a few additional modes to intelligently control the overall system solution.
  • Also, applications related to social networking, entertainment (content publishing) and advertising can take advantage of identification of the precise multimedia program and the program's exact time as it is played on a consumer device. Such time aligned knowledge enables useful services and solutions for the user and is valuable to advertisers and content owners as well. Such applications take advantage of segmentation and identification, along with other methods such as content tracking, to enable time aligned applications for broadcast content playing on consumer devices or sensed ambiently.
  • An embodiment of the invention addresses techniques for time-aligned services that utilize tracking when a match between incoming video and a stored content sequence is detected. The time aligned services technique allows a user to select displays of relevant content and results of metadata matching to a detected content's time and user menu choices. A content specific menu is prepared for the user to make selections from, such as content type and information. A user interface allows time scrolling to allow the user to go back into the program for missed information.
  • To provide for such needs, FIG. 1 illustrates a fingerprinting and search system 100 for both media fingerprinting and identification in accordance with an embodiment of the present invention. The fingerprinting and search system 100 includes user sites 102 and 103, a server 106, a video database 108, and a remote user device 114 with a wireless connection to the server 106 and, for example, to a video fingerprinting and video identification process 112 operated, for example, by user site 102. The remote user device 114 is representative of a plurality of remote user devices which may operate as described in accordance with embodiments of the present invention. A network 104, such as the Internet, a wireless network, or a private network, connects sites 102 and 103 and server 106. Each of the user sites, 102 and 103, remote user device 114, and server 106 may include a processor complex having one or more processors, having internal program storage and local user controls such as a monitor, a keyboard, a mouse, a printer, and may include other input or output devices, such as an external file storage device and communication interfaces.
  • The user site 102 may comprise, for example, a personal computer, a laptop computer, a tablet computer, or the like equipped with programs and interfaces to support data input and output and video fingerprinting and search monitoring that may be implemented both automatically and manually. The user site 102, for example, may store programs, such as the video fingerprinting and search process, 112 which is an implementation of a content based video identification process of the present invention. The user site 102 may also have access to such programs through electronic media, such as may be downloaded over the Internet from an external server, accessed through a universal serial bus (USB) port from flash memory, accessed from disk media of various types, or the like. The fingerprinting and search system 100 may also suitably include more servers and user sites than shown in FIG. 1. Also, multiple user sites each operating an instantiated copy or version of the video fingerprinting and search process 112 may be connected directly to the server 106 while other user sites may be indirectly connected to it over the network 104.
  • User sites 102 and 103 and remote user device 114 may generate user video content which is uploaded over the Internet 104 to a server 106 for storage in the video database 108. The user sites 102 and 103 and remote user device 114, for example, may also operate a video fingerprinting and video identification process 112 to generate fingerprints and search for video content in the video database 108. The video fingerprinting and video identification process 112 in FIG. 1A is scalable and utilizes highly accurate video fingerprinting and identification technology as described in more detail below. The process 112 is operable to check unknown video content against a database of previously fingerprinted video content, which is considered an accurate or “golden” database. The video fingerprinting and video identification process 112 is different in a number of aspects from commonly deployed processes. For example, the process 112 extracts features from the video itself rather than modifying the video. The video fingerprinting and video identification process 112 allows the server 106 to configure a “golden” database specific to its business requirements. For example, general multimedia content may be filtered according to a set of guidelines for acceptable multimedia content that may be stored on the business system. The user site 102 that is configured to connect with the network 104, uses the video fingerprinting and search process 112 to compare local video streams against a previously generated database of signatures in the video database 108.
  • The video database 108 may store video archives, as well as data related to video content stored in the video database 108. The video database 108 also may store a plurality of video fingerprints that have been adapted for use as described herein and in accordance with the present invention. It is noted that depending on the size of an installation, the functions of the video fingerprinting and search process 112 and the management of the video database 108 may be combined in a single processor system, such as user site 102 or server 106, and may operate as directed by separate program threads for each function.
  • The fingerprinting and search system 100 for both media fingerprinting and identification is readily scalable to very large multimedia databases, has high accuracy in finding a correct clip, has a low probability of misidentifying a wrong clip, and is robust to many types of distortion. The fingerprinting and search system 100 uses one or more fingerprints for a unit of multimedia content that are composed of a number of compact signatures, including cluster keys and associated metadata. The compact signatures and cluster keys are constructed to be easily searchable when scaling to a large database of multimedia fingerprints. The multimedia content is also represented by many signatures that relate to various aspects of the multimedia content that are relatively independent from each other. Such an approach allows the system to be robust to distortion of the multimedia content even when only small portions of the multimedia content are available.
  • Embodiments of this invention address accurate classification of queries. By accurately classifying query content, a classified query can be correctly directed to relevant search servers, avoiding a large search operation that would generally involve a majority of the database servers. Further embodiments of this invention address systems and methods for accurate content identification. As addressed in more detail below, searching, content monitoring, and content tracking applications may be distributed to literally millions of remote devices, such as tablets, laptops, smart phones, and the like. Content monitoring comprises continuous identification of content on one or more channels or sources. Content tracking comprises continued identification of already identified content without performing a search on the entire database. For example, a television program may be identified by comparing queried content with content already identified, such as television programs, primarily at the anticipated time location within the program, as described in more detail below. This is in contrast to a number of current solutions that involve a large number of database servers for such applications.
  • FIG. 2A illustrates with flowchart process 200 an embodiment of the invention to segment video and to identify content segments accurately using content ID matching, logo tracking, scene change detection, video transitions, and audio silence and audio turn detection. The process 200 is operable to run on a client device or a supporting server.
  • The client or monitoring device can be a consumer device/studio/broadcast equipment configured to perform fingerprinting, scene change detection, logo detection, and commercial break cues detection on incoming content received directly or sensed ambiently in order to segment and track the incoming content. The client device transitions between different states based on the content identified and activates specific detectors based on its state. The client device utilizes fingerprints, content search, and processing of sensed audio and video to identify and segment the incoming video content. To identify content, the client performs a similarity search and correlation against stored video and audio sequences. The client performs content tracking and segmentation of content to enable a variety of applications. For example, applications may be provided for the purpose of separating content from advertisements and monitoring of advertisements, in order to identify and separate out new advertisements. Also, applications may be provided to accurately track content and to identify, for example, advertisements and promotions accurately in time to enable time-aligned services.
  • The method is used on a central server for archiving and monitoring applications, and on remote clients, such as smart TVs, tablets, computers, smart phones, and the like, for time-aligned and monitoring applications.
  • The method avoids reliance on logo and black frame detection, and uses other detectors and features to segment broadcast video. While logo detection is used in methods such as tracking known content or narrowing a query, reliance on logo detection for segmenting video is reduced. The client performs content tracking and segmentation of content to enable applications for separating content from advertisements and monitoring of advertisements, quickly identifying and separating out new advertisements, or determining more accurate time identification of content for time-aligned services.
  • A method, as shown in FIG. 3 and described in more detail below, uses classification and state based segmentation that is effective for live broadcast content to identify content, advertisements and promos quickly.
  • In FIG. 2A, the incoming video 201 is processed at step 203 to generate fingerprints for the video. The terms fingerprint and signature may be used interchangeably. The step 203 also generates reports using audio and video analysis. Step 204 performs, in parallel with step 203, logo detection, identification and tracking. At step 205, a search is performed on the database of all collected content and advertisements to identify the content and the time location of the content. Configurations for implementation of step 205 can vary based on the device performing this function. Examples of devices performing these operations are smart TVs, tablets, smart phones, or a central server. As a result of the search, an initial match is detected and evaluated in step 206. At step 207, the match is verified using more information and signatures, such as additional fingerprints, logo information, color descriptors, and scene changes. If there is no match, the process 200 returns to steps 203 and then 205 to identify content. If there is a match, the process 200 proceeds to step 208. At step 208, video frame transformation and audio transformation are calculated. The step 208 detects the transformation of the reference content to the content on the client. Possible transformations include cropping of video frames, zooming in, changes in image ratios along the x or y axis, and changes in image brightness and contrast. Similar changes can occur for the audio, such as pitch changes and frequency response changes. The presence of these changes increases the compute effort to fingerprint and detect the reference content. By detecting the type of transformation of the original content to the content played at a client site, most of the negative impact of these transformations may be reduced, thereby reducing the computational effort to identify the media content and increasing accuracy of the identification. Thus, the transforms are utilized to optimize the compute cycles used to generate fingerprints. Step 208 can then use the transformed query video and audio to represent the original video and audio fingerprints more closely, and thus be more likely to better match the reference fingerprints. For example, by detecting client content that has been stretched 20% on the y-axis, that information is taken into account in the generation of the fingerprint to obtain a more accurate representation of the client content. Other distortions, such as may affect x position, y position, and scale in video, and peak coefficient information and frequency entropy in audio, may be detected and likewise taken into account to improve accuracy. Step 208 generates the fingerprints and reports to track the monitored content with reference to the original reference (via fingerprints). Query content is transformed to represent the original aligned content. If the query video is cropped, then the query transform considers this, so that the generated fingerprint better represents the original. At step 209, a correlation between the generated transform fingerprints and the reference is performed to achieve accurate matching between monitored content and reference. If the tracked content no longer matches the reference, it is considered a divergence and is detected at step 215. If divergence is detected, control loops back to steps 203 and 204 for fingerprinting, logo processing, and identifying content.
Since the previous content no longer matches, at step 203 the content is identified again, and this time it may match a different program or video.
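  • A minimal sketch of the transform compensation described for step 208 is shown below, assuming a simple stretch-and-crop model and using NumPy only; the function name, parameters, and nearest-neighbour resampling are illustrative assumptions, not the claimed fingerprinting method itself.

```python
import numpy as np

def undo_detected_transform(frame, y_scale=1.0, x_scale=1.0, crop=(0, 0, 0, 0)):
    """Approximately invert a detected client-side transform so that
    fingerprints computed on the result resemble the reference more closely.

    frame   : 2-D luminance array (H x W)
    y_scale : detected stretch along the y-axis (e.g. 1.2 for a 20% stretch)
    x_scale : detected stretch along the x-axis
    crop    : pixels cropped from (top, bottom, left, right) of the reference
    """
    h, w = frame.shape
    # Resample rows/columns with nearest-neighbour indexing to undo the stretch.
    ys = np.clip((np.arange(int(h / y_scale)) * y_scale).astype(int), 0, h - 1)
    xs = np.clip((np.arange(int(w / x_scale)) * x_scale).astype(int), 0, w - 1)
    restored = frame[np.ix_(ys, xs)]
    # Pad back any cropped borders with edge pixels so the frame geometry matches.
    top, bottom, left, right = crop
    return np.pad(restored, ((top, bottom), (left, right)), mode="edge")

# Example: a client frame stretched 20% on the y-axis is normalized before
# fingerprint generation (the fingerprint function itself is out of scope here).
client_frame = np.random.rand(288, 352)          # stand-in for a decoded video frame
normalized = undo_detected_transform(client_frame, y_scale=1.2)
print(normalized.shape)
```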
  • At step 220, a state based classifier takes in all the reports from the fingerprint tools, the database search, detected logos, and other information generated in steps 203, 204, 205 and 209. The classifier analyzes these reports and generates a higher level of classification, such as advertisements, identified content, and promotions, and a finer level of segmentation that identifies individual advertisements and chapters of the content. FIG. 3 illustrates a state based classifier described in more detail below. Promotions are content that advertises video programs yet to be broadcast, or other content that is not an advertisement and not a video program. The results of segmentation process 200 include the following: (i) separate content and an index of the content for archival purposes, (ii) information to identify and monitor advertisements, (iii) information to identify new advertising, (iv) information to classify a video during live broadcast to reduce the cost of content tracking and monitoring, and (v) information to classify live content for synchronous time-aligned services.
  • In an alternate embodiment, the classification of video can be performed using a graph structure, where each node is a point in time of the video content and the arcs between the nodes are similarity scores between the nodes, while other information (such as logo detection, audio turns, scene changes, and database search) is used to generate the classification into advertisement, content, or other types.
  • FIG. 2B illustrates a method 245 to detect a frame alignment mapping between a query and a reference video. The process 245 is operable to run on a client device or a supporting server. By detecting frame alignment between client video frames and reference video frames, the system efficiency is improved. The detected frame alignment can be used to reduce fingerprint compute cost since the client video frames can now be aligned with reference frames. By detecting the alignment between the frames, distortion or disturbance between the reference and query fingerprints can be avoided, resulting in higher matching accuracy and fewer fingerprints to be compared. The transformed query video and audio represent the original video and audio fingerprints more closely, and thus are more likely to better match the reference fingerprints.
  • Detecting frame alignment enables applications that perform overlays of text and specific images without unintended effects, since the overlay can be selected to be at appropriate and accurate locations in position and in time on the video screen image. Applications such as multi-language broadcast, advertising, subtitles, or 3rd party content overlays can be performed accurately.
  • For embedded time-aligned applications, the video and audio transforms detected on the consumer device are used to reduce the cost of fingerprinting by reducing content variation while tracking the identified content. The transformed query video and audio represent the original video and audio fingerprints more closely, and thus are more likely to better match the reference fingerprints.
  • The video content is received at step 250. Next, at step 251, video signatures are generated that include the detected or selected region's location or equivalent coordinate information and scale. A region may be determined, and thereby selected, and a location of the determined region provided. In one aspect, frame alignment is performed using the scale or size of a selected region and the x and y coordinates of a fingerprint descriptor center. At step 253, a search and content match process is performed to detect a match between the query, which is the incoming video received at step 250, and the reference database. For example, the reference database may be located on a central server or at a client device. At step 255, the same content match process evaluates the confidence of the match. One method of estimating confidence of the match includes using a geometric correlation between the scale and the x, y coordinates of the fingerprints. If a reliable match, as determined by the confidence, is not detected, the query is generated once again by returning to step 251 for signature generation. If a reliable match is not found, another search is processed in an attempt to obtain a match with good confidence before making assumptions about video frame alignment. The intent is to have as correct a match as possible before making an estimate of the geometric alignment between the query and reference video frames. If a reliable match is detected, the process 245 proceeds to step 257. Step 257 involves calculating a scale ratio on each of the X-axis and Y-axis between 2 pairs of matching query and reference signatures by obtaining the geometric x and y coordinate difference between the query signature pair and the reference signature pair along each axis. With video fingerprinting, regions of a video frame are selected for a fingerprint. The center of each fingerprinted region can be described with x, y coordinates. The size of the region is described using the scale value.
  • The scale ratio along the X-axis for 2 pairs of matching signatures is calculated as:

  • Xscale Ratio=(QA(x)−QB(x))/(RA(x)−RB(x))  eqn(1)
  • where QA(x) is the x coordinate of the query and RA(x) is the x coordinate of the reference for matching signature pair A; and similarly for signature pair B.
  • In another embodiment, an additional condition can be used to select or prefer pairs of fingerprints that agree geometrically, and in this alternate embodiment only pairs which have center coordinate difference greater than a threshold are considered. The scale ratio on the x-axis is denoted as Sx, and that on the y axis as Sy.
  • At step 258, the average scale ratios ASx, ASy on each axis are calculated. Outliers are those pairs that have high geometric alignment error and are eliminated while calculating this average. At step 259, the pixel offset between the query and the reference video frames is calculated. For each matching pair the pixel offset is calculated with the following equation:

  • XOffset=QA(x)/ASx−RA(x)  eqn (2)
  • where QA(x) and RA(x) are the x coordinates for a matching signature pair, and ASx is the average scale ratio on the x-axis calculated from equation (1). The evaluated frame alignment information between query and reference video is reported at step 260. The reported frame alignment information includes pixel or equivalent offsets along the x-axis and y-axis, and the scale ratios on the x-axis and y-axis. With this information, it is possible to map the location of the query video frame to exact pixel locations on the reference video. The frame alignment information is used to generate transformed query video and audio fingerprints that represent the original video and audio fingerprints more closely, and thus are more likely to better match the reference fingerprints. Since the query signatures generated using frame alignment more accurately represent the reference, fewer query signatures may be used to determine a continued match of the incoming video broadcast at the consumer device with the reference. The detected frame alignment is also very useful to align any overlay text or image in various applications that are described further below.
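  • The following sketch applies equations (1) and (2) along the x-axis to a set of matching signature pairs; the pair-separation threshold, the median-based outlier rejection, and the function names are illustrative assumptions rather than part of the claimed method.

```python
from statistics import median

def x_scale_ratios(query_xs, ref_xs, min_separation=8):
    """Per-pair scale ratios along the x-axis, eqn (1):
    Xscale Ratio = (QA(x) - QB(x)) / (RA(x) - RB(x)).
    Only pairs whose reference centers are farther apart than
    min_separation are considered (the thresholding mentioned above)."""
    ratios = []
    for i in range(len(query_xs)):
        for j in range(i + 1, len(query_xs)):
            ref_diff = ref_xs[i] - ref_xs[j]
            if abs(ref_diff) > min_separation:
                ratios.append((query_xs[i] - query_xs[j]) / ref_diff)
    return ratios

def average_scale(ratios, tolerance=0.15):
    """Average scale ratio ASx after dropping outliers far from the median."""
    med = median(ratios)
    inliers = [r for r in ratios if abs(r - med) <= tolerance * abs(med)]
    return sum(inliers) / len(inliers)

def x_offsets(query_xs, ref_xs, asx):
    """Per-pair pixel offsets along the x-axis, eqn (2):
    XOffset = QA(x)/ASx - RA(x)."""
    return [q / asx - r for q, r in zip(query_xs, ref_xs)]

# Matching signature pairs: the query frame is the reference scaled by 1.25
# in x and shifted right by 10 reference pixels.
ref_x = [40, 120, 200, 280]
qry_x = [(r + 10) * 1.25 for r in ref_x]
asx = average_scale(x_scale_ratios(qry_x, ref_x))
print(asx, x_offsets(qry_x, ref_x, asx))   # ~1.25 and offsets of ~10
```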
  • FIG. 2C illustrates a method 270 of video segmentation using graph based partitioning. A unique aspect of the graph segmentation method is the use of edge weights defined to represent the similarity between content nodes, where each node represents a unique content time. Each node is associated with a likely class based on processed reports, including content search results, blank video and scene change reports, and audio silence and turn reports. The graph segmentation method is able to combine local content similarity with global similarities in the program content to assist in segmentation. The graph segmentation method uses content matching results with a large database to assist in classification and segmentation.
  • The method 270 combines local in time similarity with global similarities in the program content, using large content databases to assist in segmentation. A database search is performed to indicate what kind of content is being evaluated. If the content is an advertisement, it is likely to match an advertisement from the main database. If the content is an actual video program, it may at least match an opening sequence or closing credits if the program is a continuation of an existing TV or video program series.
  • In addition to evaluating the audio and video content properties, content search is utilized on acquired databases of content, advertisements, promotions, opening sequences, and closing credits to assist in accurate segmentation of the video content. Each node, as defined below, is given a class score based on processed audio and video reports and the database search.
  • A graph G(V, E) consists of nodes vi ∈ V and edges (vi, vj) ∈ E. Each node vi is selected at audio and video turns and at specific time intervals. Each edge (vi, vj) connects certain pairs of nodes, usually neighboring time nodes, and neighboring significant nodes that are unique because of an audio or video scene change or at boundaries of content matching sequences. A node represents a point in time in the video content. The node at the selected time holds relevant information including audio signatures, video signatures, and the type of event, such as audio silence, an audio turn, a scene change, or just a sample.
  • A weight is associated with each edge that is based on the similarity between the nodes.
  • Multiple methods are used to determine the similarity between nodes. When an audio turn or video scene change is present between two nodes, the nodes are more likely to be dissimilar, so a negative value will be added to the edge weight. If the content contained at the nodes matches the same reference content, then a positive value is added to the edge weight since the nodes are likely to belong to the same content, but if the nodes belong to different content, then a negative value is added to the weight of the edge. Comparing signatures and features from audio and video between the 2 nodes, as described in more detail below with regard to step 274 of FIG. 2C, is another method to calculate the similarity between the 2 nodes. Logo identification is an advantageous method used in television broadcast to classify content, and the presence of a similar logo at two nodes, or a difference in logo status between the 2 nodes, is used to calculate the similarity score used for the edge weight.
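  • A minimal sketch of such an edge weight is given below; the specific weight contributions and the report field names are illustrative assumptions, chosen only to show how the cues above could combine into a single similarity score.

```python
def edge_weight(pair, logo_a, logo_b, match_a, match_b, sig_similarity):
    """Similarity-based edge weight between two time nodes, combining the
    cues described above. The specific weight values are illustrative."""
    w = sig_similarity                      # audio/video signature similarity, in [0, 1]
    if pair.get("audio_turn") or pair.get("scene_change"):
        w -= 0.5                            # an event between the nodes suggests dissimilarity
    if match_a and match_b:
        w += 0.5 if match_a == match_b else -0.5   # same vs. different reference content
    if logo_a is not None and logo_b is not None:
        w += 0.25 if logo_a == logo_b else -0.25   # same broadcast logo at both nodes
    return w

# Two nodes inside the same tracked program, with no scene change between them:
print(edge_weight({"audio_turn": False, "scene_change": False},
                  logo_a="ch7", logo_b="ch7",
                  match_a="ep_1021", match_b="ep_1021",
                  sig_similarity=0.8))      # high weight -> likely the same segment
```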
  • Once a graph is defined with edges having similarity weights, the graph can be partitioned using any of the well known graph partitioning methods. One approach to graph segmentation is a method using pairwise region comparison, as described in "Efficient Graph-Based Image Segmentation" by P. Felzenszwalb and D. Huttenlocher, Int'l J. Computer Vision, vol. 59, no. 2, pp. 167-181, 2004.
  • In an embodiment of the present invention, in order to partition a graph into classified segments such as advertisement, promotions, and content, additional edge weights are added based on the likely classification. The classified content can be further segmented into individual advertisements, or content chapters.
  • In an embodiment of the present invention a graph cut method using pairwise region comparison calculates an edge weight between 2 regions. A low cost implementation of the edge weight may be the highest similarity score between nodes in each region, while in more robust implementations, an edge weight would be calculated between the 2 entire regions. The 2 regions can be merged if the edge similarity is greater than the average or median (or another function) of the 2 regions.
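  • A small sketch of this merge test follows; the choice of the mean of the internal similarities and the minimum over the two regions as the comparison function is an assumption, standing in for "the average or median (or another function)" mentioned above.

```python
def region_edge_weight(region_a, region_b, node_similarity):
    """Low-cost inter-region edge weight: the highest similarity score over
    node pairs spanning the two regions, as suggested above."""
    return max(node_similarity(a, b) for a in region_a for b in region_b)

def should_merge(region_a, region_b, node_similarity):
    """Merge test in the spirit of pairwise region comparison: merge when the
    inter-region similarity exceeds a function (here, the mean) of the
    similarities observed inside each region."""
    def internal(region):
        pairs = [(a, b) for i, a in enumerate(region) for b in region[i + 1:]]
        return sum(node_similarity(a, b) for a, b in pairs) / max(len(pairs), 1)
    between = region_edge_weight(region_a, region_b, node_similarity)
    return between >= min(internal(region_a), internal(region_b))

# Nodes reduced to scalar features; similarity falls off with their difference.
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
print(should_merge([1.0, 1.1, 0.9], [1.05, 1.15], sim))   # similar regions -> True
print(should_merge([1.0, 1.1, 0.9], [5.0, 5.2], sim))     # dissimilar regions -> False
```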
  • Returning to FIG. 2C and method 270, the input multi-media content is received at step 271. At step 272, the received multi-media content is used to generate audio and video fingerprints and to detect audio turns, audio silence, video scene changes, and video blanks. Logo identification is performed at step 273. At step 275, a database search is performed using the fingerprints against a database of advertisements, promotions, and content information including opening scenes or full content. At step 274, frame similarity between different time locations of the content is computed. The time locations can correspond to nodes in a constructed graph, and the time points are selected because they are significant, such as the beginning of a period of silence, blank video frames, an audio turn, a video scene change, or a content match boundary.
  • The results from step 275 (database search), step 273 (logo identification), step 272 (audio and video event reports), and step 274 (content similarity reports) are input to graph analysis and partitioning at step 280. Graph segmentation is also performed at step 280 to generate a classified video segmentation, such as advertisements, promos, and content. A finer segmentation can also be performed to identify individual advertisements and individual content chapters. These reports are generated at step 281.
  • In another embodiment, new advertisements are identified using the video segmentation by graph or classification methods. Segmented advertisements that partially matched or did not match previous advertisements are identified and are considered candidates for new advertisements. With this method, new advertisements can be identified efficiently and quickly while monitoring hundreds of channels with continuous broadcast.
  • FIG. 3 illustrates a state transition diagram 300 of an embodiment of the state based classifier. Initially the content is unknown, so the classifier is in an initial, unclassified state 301. The inputs at state 301 include the audio and video analysis reports: audio silence, audio turn, scene change, video blank/black frame, and initial search results, which are processed by the state classifier. If the classifier detects a particular type of content, it causes a state transition into states such as the likely advertisement state 303 when it detects content classified as such, the likely broadcast content state 305, or the likely movie state 308. In each of the "likely" states, the inputs, including further search results and fingerprint reports, are fed to the classifier. If the classifier confirms the detection of a particular category of content, the state transitions to a confirmed advertisement, confirmed program, or other confirmed state, as in states 304 and 306. If the content is unknown and meets certain rules, then a query is made to the search server at state 302.
  • As discussed earlier, in an embodiment the fingerprinting and analytics have different modes, and their compute cost is reduced in the "likely" states. Analytics are methods to extract information, such as logos, scene changes, and the like. In the "confirmed" states, the audio and video analysis cost can be reduced even further until a change in state occurs.
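  • A minimal sketch of the FIG. 3 transitions is shown below; the state names are abbreviations of those in the diagram, and the report fields and confidence threshold are illustrative assumptions rather than the claimed transition rules.

```python
# States from FIG. 3 (names abbreviated); transition rules are illustrative.
UNCLASSIFIED, QUERY_SERVER = "unclassified", "query_search_server"
LIKELY_AD, CONFIRMED_AD = "likely_advertisement", "confirmed_advertisement"
LIKELY_PROGRAM, CONFIRMED_PROGRAM = "likely_program", "confirmed_program"

def next_state(state, report):
    """Advance the classifier state from one batch of analysis reports.
    `report` carries the per-interval detector outputs described above."""
    if state in (UNCLASSIFIED, QUERY_SERVER):
        if report.get("matched_ad_db"):
            return LIKELY_AD
        if report.get("matched_content_db"):
            return LIKELY_PROGRAM
        return QUERY_SERVER                      # unknown: escalate to the search server
    if state == LIKELY_AD:
        return CONFIRMED_AD if report.get("match_confidence", 0) > 0.9 else state
    if state == LIKELY_PROGRAM:
        return CONFIRMED_PROGRAM if report.get("match_confidence", 0) > 0.9 else state
    # Confirmed states persist until a divergence (content stops matching).
    return UNCLASSIFIED if report.get("diverged") else state

state = UNCLASSIFIED
for r in [{"matched_content_db": True},
          {"match_confidence": 0.95},
          {"diverged": True}]:
    state = next_state(state, r)
    print(state)   # likely_program, confirmed_program, unclassified
```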
  • FIG. 4A is an illustration of the data structures 400 that capture the video and audio analysis during fingerprinting, and the search results for segmenting the video. The data structures hold data generated by audio analysis, video analysis, logo detection, content similarity, and database search reports. The video report data structure holds the time of the event, the frameType, such as audio or video, and the transitionType, which may include blank, fade, or black. The similarity search data structure holds the time, the time difference offset between compared frames, and the similarity score. The audio report data structure holds the time of the event, the audio event, such as an audio turn or silence, and the audioLevel. The search result data structure holds the time, length of match, number of unique matching programs, total number of matches, and the match type, including the classification of the database that is searched. The logo data structure is used to identify the matching logo and holds the time, whether a logo was detected, and the logo ID. The data structures are used to classify and segment the multimedia program by adding a classification weight or score at each time node. When graph based segmentation is used, the data structures are utilized to generate node classifications and edge weights.
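  • One possible encoding of these report records is sketched below; the field names follow the description above, but the exact layout, types, and units are assumptions.

```python
from dataclasses import dataclass

@dataclass
class VideoReport:
    time: float           # time of the event, seconds
    frame_type: str       # "audio" or "video"
    transition_type: str  # "blank", "fade", "black", ...

@dataclass
class SimilaritySearchReport:
    time: float
    offset: float         # time difference between compared frames
    score: float          # similarity score

@dataclass
class AudioReport:
    time: float
    audio_event: str      # "turn", "silence", ...
    audio_level: float

@dataclass
class SearchResult:
    time: float
    match_length: float
    unique_programs: int
    total_matches: int
    match_type: str       # class of database searched: ads, promos, content

@dataclass
class LogoReport:
    time: float
    logo_detected: bool
    logo_id: str

# A few example records as they might accumulate at one time node.
reports = [VideoReport(12.0, "video", "black"),
           AudioReport(12.1, "silence", -60.0),
           SearchResult(12.5, 30.0, 1, 3, "advertisement")]
print(reports)
```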
  • FIG. 4B describes the relevant data structures 450 for non-recorded broadcast which are used to segment streaming video. The data structures hold data generated by the audio analysis, video analysis, logo detect, and content similarity reports. To support live broadcast segmentation, an additional frame similarity search report is added and this holds the results of frame level content matching with client databases of opening sequences, closing credits, specific images and sound bites. A frame similarity search is performed only on detected frames at the client, and these are triggered by events such as scene change, audio turn, silence or video blanks, fading. An additional frame similarity data structure holds event time, type of match, match score.
  • In an embodiment for segmenting video content, a graph connecting different time points within the video is generated. The data extracted from the video and audio analysis reports and stored in the data structures are used to generate similarity scores regarding the similarity of different sections of the video. Similarity scores are also generated that represent a likelihood of content or advertisement at a particular time. Such a prediction is based on the past history of previous database content searches and the previous content of the same video. The scores are mapped onto a graph structure and the graph is segmented into sections representing content and advertisement classes, as well as into individual advertisement and content chapters.
  • FIG. 5A illustrates a flowchart 500 to identify and track content and perform specific time-aligned application such as advertisement replacement. This includes a method of fast and accurate content segmentation, and content identification.
  • A video segmentation method utilizes graph partitioning or a classifier to segment or classify sections of the video content. The inputs for the classifier or graph partitioning technique are video and audio analytic reports in time, content similarity in time, and content match reports for advertisements and content with matching time information. The video and audio analytics include detection of video scene changes, including black frames, audio silence detection and audio turns, and number of active audio channels.
  • At step 502, a content query on a search server is performed to identify current video content playing on a selected client. The search and content match method at step 502 identifies video and audio transforms on content played out at the client, in addition to identifying the content. The detected audio and video transforms at the client include detection of the frame mapping between reference and query video frames. FIG. 2B illustrates a method for detecting frame alignment between query and reference video.
  • At step 503, the client now performs a video and audio transform, as required, to better align the client fingerprints to the reference and then generates query fingerprints. In one example, a detected transform for frame alignment is performed on query content while generating fingerprints. This step enables low compute cost and better tracking of client content to the reference in upcoming processing steps. At step 504, scene change detection is utilized on the client content to select frames to perform fingerprinting and correlate with the reference. Next, the fingerprints are used to track the client content to the reference. At step 504, client content is tracked with reference to the expected broadcast, including time sections where the content being played is not known, such as unidentified advertisements. Processing is optimized if the expected time slot for the advertisement or content to be tracked or replaced is known. If the exact location is unknown, as may be the case with a live broadcast or a non-recorded linear broadcast, verification processing is required on all possible transitions. At step 505, on a scene change or audio transition, a check is made as to whether the sampled incoming content is an appropriate transition after which the expected content is likely to play out. At step 506, the incoming content in the client buffer, which may not necessarily be played out, is verified with multiple fingerprint methods to determine whether any matches are found with the expected content. If the tracked advertisement or content is associated with some time-aligned service, that action is performed at step 507.
  • FIG. 5B illustrates a flowchart 510 for performing advertisement replacement for a specific pre-selected and identified advertisement slot. This method is advantageous and requires specific information of an advertisement or content that needs to be replaced. One embodiment needs the specific time when the advertisement is expected to occur. Embodiments of the invention include the transition 512 to describe the content to be replaced, and the time information (known or unknown), step 516 to verify instance of occurrence via video frame, audio, or watermarks in audio or video, and the step 517 to track the original content that is incoming while the replacement content is being displayed or played at client site.
  • At step 511, the time location of the advertisement or specific information to be overlaid or displayed is defined for the multi-media content. At step 513, the content is sampled or sensed. At step 514, a content query is performed on a search server to identify current video content playing on the client. At step 514, the client also tracks the input content fingerprints with the reference. Processing may be optimized if the expected time slot for the advertisement or content to be tracked or replaced is known. If the exact location is unknown, as may be the case with a live broadcast or a non-recorded linear broadcast, verification processing is required on all possible transitions. At step 515, on a scene change or audio transition, a check is made as to whether the sampled incoming content is an appropriate transition after which the expected content is likely to play out. At step 516, the incoming content in the client buffer, which may not necessarily be played out, is verified with multiple fingerprint methods to determine whether any matches are found with the expected content. If the tracked advertisement or content is associated with some time-aligned service, that action is performed quickly in step 517.
  • FIG. 5C illustrates a flowchart 520 to publish content and offer user interaction modes for time-aligned applications. The method of content publishing with associated content data, associated content links, content control menus, and user control menus enables an entire ecosystem of content publishing. The advantageous methods of content publishing described herein offer a user a choice of the content to be presented and support efficient and engaging user interaction. Time-aligned services can be consumed on a separate device or screen without disturbing a first screen, such as a primary TV display. In many cases, the user may not have control or may not want to exert control, especially when other viewers are watching. The methods for time-aligned services enable each user to have a private selected experience of viewing a program along with additional specific information, such as player statistics, dance steps, or local costume designers of actor apparel. The user choices can be remembered and can be different for each program. The same user may want player statistics, game scores, and standings for an NBA game, but may also want to learn dance steps while watching a network TV show "Dancing with the Stars". While watching a streaming movie, the user may want to control the first screen and place it into "family viewing mode". Such control would be possible by restricting non-family rated pieces and fetching family friendly replacements.
  • The reference content is processed initially to generate, sequentially or in parallel, fingerprints and associated data as shown in steps 522 through 525. At step 522, fingerprints for the content are generated and stored with the timestamps in memory 521. At step 524, content information is defined, at step 525 the content control options are defined, and at step 526 the user menus to be offered are defined. At step 526, time based behavior for the metadata is defined, which includes content information, content control, and user menus. The memory associated with access step 521 stores the information from steps 522, 524, 523, 526, and 527. At step 528, the content is sampled or sensed. At step 530, a content query is initiated on the client device to be performed on a search server to identify current video content playing on the client, when the content is not found on the client. In support of step 530, part of the database is on the client, which is searched first. Also, at step 530, the client tracks the input content fingerprints with the reference. At step 532, the content information determined from metadata and content metadata links is displayed. At step 531, the user is offered control for content viewed on one or more display screens. For example, a display screen selection, display format selection, content type selection, and time scrolling may be offered among other control options. At step 527, content, fingerprints, and control metadata are downloaded at the request of the tracking function at step 530. Further, at step 530, if tracked content continues matching updated content, display and control options are provided to the user. If content does not track, segmentation is used to decide the content type and also decide whether to keep looking for a local match or send a new query to the search server. At step 532, the process returns to the content identification and content tracking step 530.
  • FIG. 6 illustrates a method 600 to segment broadcast TV content and provide time-aligned services. Step 601 performs fingerprinting and content analysis of broadcast content. Step 601 transmits the fingerprints of each recording, such as a TV program, as a query to the content search at step 603. A content search server at step 603 returns a search report containing the detected match data to step 605. At step 604, the content search server transfers the information about the video frame alignment between the reference and query to the fingerprint generator. Similarly, the content search server sends information about the detected audio transforms between reference and query. Thus, for further fingerprinting, the fingerprint generator can use lightweight processes with much lower compute cost, since the detected transforms can be applied to reduce the similarity error of the generated signatures. At step 609, the time schedule of the program, ad slots, and required action are retrieved when the content is identified. At step 605, audio and video analysis reports are received from the fingerprinting step 601. At step 605, the search, audio, video analysis, detected logo information, and similarity reports are received and video segmentation is performed. At step 605, the content is tracked until the expected time slot of action. At the expected event time, the incoming content is verified to determine whether it is exactly the same as the expected content. This check is performed in step 611. At step 607, video and audio analysis is performed to locate a likely location on the video frame and identify the location where information can be inserted. This functionality can be used to enhance the program watched and to overlay messages or advertisements. The video analysis at step 607 detects space on the video frame that is relatively free or smooth. Video frame alignment at step 604 provides information that describes the relationship between the client video frame and the reference video frame. Step 613 executes the overlay, insertion, or advertisement replacement onto the streaming broadcast content.
  • FIG. 7A illustrates a flowchart 700 to perform fast and accurate content segmentation on broadcast non-recorded content and overlay content information on one or more selected display screens. At step 702, a content analysis is performed, then a query and search operation executes on a database on the client device, and if no match is found on the local client device, a query is sent to a central server (cloud) to identify the current video, as sketched below. Since the goal is to detect non-recorded television broadcast segments, the process cannot rely only on fingerprints since none exist for a non-recorded broadcast segment. The logo of the channel, the program logos, and opening sequences of the program are used to assist in identifying the content. At step 703, the client performs continued tracking by verifying the program details from text and logos in the content. At step 704, scene change detection is utilized on the client content to select frames to perform fingerprinting and correlation to generate reports to support segmentation. At step 704, the client content that includes time sections is also tracked, where the content being played does not even have the logo information. The process 700 is able to do such tracking by using similarity information from audio and video which can identify the likelihood of the same video. At step 705, on a scene change or audio transition, a determination is made whether this transition is a possible content change. If the transition is not a possible content change, the process 700 returns to step 704. If the transition is a possible content change, the process 700 proceeds to step 706. At step 705, an "expected transition at a given time" is checked for, since it is intended to replace a specific ad which is expected at a given time for a typical TV program. For a live broadcast such as an NBA basketball game, "the expected transition" may occur at any time and is checked for accordingly. At step 706, multiple fingerprint methods verify whether the expected content matches the content in the client buffer, which may not necessarily be played out. If the advertisement or content to be replaced needs to be associated with a time-aligned service, that action is performed quickly in step 707. The processing step 706 communicates the detected frame alignment of the query video and information about space usage of the video frame via step 708. The display information at step 708 enables the optimal overlay of user service or advertisement information.
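  • A minimal sketch of the local-first lookup with cloud fallback mentioned for step 702 follows; the signature cluster keys, the dictionary-based on-device index, and the stand-in cloud query are illustrative assumptions.

```python
def identify_content(cluster_keys, local_db, cloud_search):
    """Two-tier lookup: try the on-device database first, then fall back to
    the central search server (cloud) only when no local match is found."""
    for key in cluster_keys:
        hit = local_db.get(key)
        if hit is not None:
            return {"source": "client", "match": hit}
    return {"source": "cloud", "match": cloud_search(cluster_keys)}

local_db = {0x3fa2: ("evening_news_opening", 0.0)}      # tiny on-device index
cloud = lambda keys: ("unknown_live_feed", None)        # stand-in for the server query
print(identify_content([0x3fa2, 0x9b10], local_db, cloud))  # resolved locally
print(identify_content([0x9b10], local_db, cloud))          # falls back to the cloud
```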
  • Some examples of the time aligned services that are provided are listed below.
      • a. Audio track for specific use such as languages or overlay commentary.
      • b. Sign language track overlay.
      • c. Personalized services overlay.
      • d. Overlay or replace content based on certain event detection based on personalized services programming rules.
      • e. Video overlay of local advertisements.
      • f. Video overlay of local activities.
      • g. Advertisement replacement.
      • h. Partial advertisement replacement.
      • i. Providing time-aligned services on another screen or personal phone.
  • FIG. 7B illustrates a flowchart 710 to offer time-aligned services of enhanced viewing experience on one or more selected display screens utilizing content metadata or additional content sources.
  • Another embodiment of the invention addresses content identification, tracking, and segmentation that enables new time-aligned services to the user. Another embodiment of the invention addresses a method of content publishing with associated content data, associated content links, and content control menus, supported by intuitive user control menus. Such content publishing enables an entire ecosystem of content publishing. An ecosystem of time aligned (synchronous) content publishing enables the provider to distribute synchronous streams of information that can be consumed on different user devices, such as second screens. The synchronous streams can be used to replace original content with targeted ads, subtitles, audience ratings, or the like when desired. The ecosystem of content publishing includes generating synchronous content streams, associated data, content control, and user control and display menus. Thus, new methods of content publishing, content consumption, and user interaction are enabled. For example, the time-aligned services can be consumed on a separate device or screen without disturbing a main display screen. In current TV and video playback cases, a user may not have control or may not want to exert control, especially when other viewers are watching. The methods for time-aligned services enable each user to have a private selected experience of viewing a program along with additional specific information, such as player statistics, dance steps, or local costume designers of actor apparel.
  • The reference content is processed initially to generate fingerprints and associated data and content streams at step 712. To enable content publishing for second screen applications, additional information must be generated and linked at the servers. Fingerprints and watermarks in content are used to identify content at the client. For each broadcast content, additional content choices can be created, such as an alternative language, for example a Spanish audio stream and Spanish text overlay for the screen, sports statistics per event in a sports game, or bio or action information during a prime time TV program. Links to such content, or metadata associated with the content for the additional information, may be stored at servers along with the reference fingerprints, if required. To enable a rich user experience, menus for user control of information, display, and content selection are provided to the users.
  • At step 714, 3rd party content information or streams are provided. At step 711, the content is sampled or sensed. At step 713, a content query is performed, for example on a search server, to identify current video content playing on the client. At step 715, the tracking function requests further download of fingerprints, content, and control metadata. At step 716, the client tracks the input content fingerprints with the reference. Also at step 716, if tracked content continues matching updated content, display and control options are provided to the user. At step 717, a determination is made whether the content at the transition is the expected content. If the expected content is found at the transition, then further actions and information transfer for next actions are performed by steps 720, 721, and 722, and content continues to be tracked at step 716. If content does not track, segmentation is used to decide the content type and decide whether to keep looking for a local match or send a new query to a search server. If the sensed or input content stops tracking the reference, the process 710 continues to the content identification step 713. At step 720, the content information from the 3rd party metadata and content metadata links is displayed. At step 721, the user is offered control for content viewed on one or more display screens, including choices for display screen and format selection and content type selection. Time scrolling selection is offered at step 722.
  • FIG. 8 illustrates a method 800 to perform efficient broadcast monitoring on clients using video segmentation and a central search system for content matching. Segmentation is utilized to improve accuracy and bring scale efficiency to advantageous time-aligned applications.
  • An embodiment of the invention is a method that uses the current identification state to selectively invoke specific feature detectors or descriptors, thus optimizing the memory and compute resources required on the remote client. The invoked feature detectors or descriptors are then used in performing a search to obtain a content match or track the content. This method is particularly useful when supporting many clients, making large scale deployment economical and reducing the compute loads on the remote client devices. With reduced compute loads, the client devices are capable of performing user friendly tasks such as fetching and displaying content and responding to user interactions.
  • Another embodiment of the invention is a technique for time-aligned services that identifies content and tracks incoming or sensed content against a stored content sequence that may be used for detection. In the tracking mode, a correlation is performed at scene changes and audio turns to check and verify that the incoming content remains similar to the expected program content. This method can improve the accuracy of content tracking while reducing the computation cost. The feature to track content more intelligently using scene changes and audio turns also enables delivery of time-aligned applications for live broadcast content where pre-recorded fingerprints are not available.
  • Techniques for efficient content monitoring and audience measurement include tracking of a logo, a program logo, and other types of logos and scene change markers, which are used to reduce client computation and fingerprint processing bandwidth. Computation is reduced by electing to do fingerprinting in conditions where it is likely that the content has changed due to the user or the broadcast network, such as at a scene change or audio turn. Similarly, bandwidth is reduced by sending fingerprints at significant events or at a lower sampling rate once content has been identified and is being tracked.
  • At step 802, a logo detection and identification is performed on the incoming broadcast video input. Next, at step 803, the broadcast video is identified and classified on a client device using any of the following methods:
  • (1) generating audio and video signatures, and searching on stored opening sequences of programs.
  • (2) extracting text and program logos or program specific logos, such as a team's name, from the videos.
  • At step 804, after identifying the broadcast incoming video content, critical relevant information of an event is extracted from the played audio and video utilizing available information such as an electronic program guide (EPG) or simply a program guide (PG). At step 805, a check is made as to whether the classified and identified content is among the channels and programs that need to be monitored. At step 806, a determination is made whether additional information is required at the client. If so, at step 807, the query, including detected signatures, text, logos, and the detected channel and programs, is submitted to the search servers, which accurately identify the content.
  • The efficiency of broadcast monitoring is improved by deriving information from video segmentation. Queries from monitored clients can be limited to a particular class of database, based on an identified channel or program. Video segmentation classifies commercials or promos being played, and queries to the search server can be avoided if commercials for some or all programs do not need to be monitored. Video segmentation methods for pre-recorded and live broadcast content are described in FIGS. 2A, 2C, 3, 4A, 4B and applications in FIGS. 5A, 5B, 5C, 6, 7A, and 7B. If content being played at the client site is classified or identified as an advertisement, the client agent can avoid a query to the server when only content is being monitored, as sketched below.
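  • The gating decision described above can be illustrated with the small sketch below; the function name, parameters, and class labels are illustrative assumptions.

```python
def should_query_server(segment_class, monitored_channels, channel,
                        monitor_ads=False):
    """Decide whether a monitoring client needs to query the search server.
    Queries are skipped for channels outside the monitored set and, when only
    program content is monitored, for segments classified as advertisements."""
    if channel not in monitored_channels:
        return False
    if segment_class == "advertisement" and not monitor_ads:
        return False
    return True

print(should_query_server("advertisement", {"ch7", "ch11"}, "ch7"))   # False: query avoided
print(should_query_server("content", {"ch7", "ch11"}, "ch7"))         # True: query the server
```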
  • Learning rules to identify new content are used to improve efficiency of the search system. If a particular user watches or plays popular video games, these can be identified by the remote client based on a set of rules about the content playing. The set of rules about the content played by user can include extracted logos, text, video frame color and interest region based fingerprints and audio fingerprints. By identifying and classifying different content at the user, queries to the search servers can be limited to content of interest to the video monitoring application. In general, the same applies to any content based application that is active, as described further in the application.
  • The rules for segmentation are program specific; each program follows a particular format. Further, each user typically watches a few programs. It is possible to learn the rules for segmentation for each user based on this information and have high segmentation accuracy. In addition, the basic video segmentation utilizes content search databases to segment known content, and uses inter-frame and content similarity analysis to further assist segmentation, besides using other information, such as program and channel logos, content information, and the EPG, which indicates the broadcast schedule.
  • FIG. 9A describes a method to provide a time-aligned service such as advertisement replacement during a video broadcast. Initially, the content playing is identified at step 902. The content playing can be identified using a variety of methods including:
  • (1) generating audio and video signatures and searching on stored opening sequences of programs for time aligned search on a local client device to minimize search latency or a central server.
  • (2) extracting text, program logos and program specific logos, such as a team's name by OCR (optical character recognition) from an image, or database description of detected logo.
  • (3) querying a search server reference database using audio and video signatures of content and other extracted information such as channel and program identification. The search server also detects the video frame mapping of the consumer device video to the reference video and determines frame alignment information between a query and reference content found in the reference database.
  • At step 903, the actual time alignment of the content playing on the consumer device relative to the broadcast content is identified and tracked. During search and correlation, the time alignment of reference and query content is determined. During tracking, the accuracy of the time alignment is further improved. Further, the incoming video is processed to detect scene changes and audio turns, and this is followed by video and audio processing, such as at the detected scene change and audio turn. The video processing includes signature generation and logo detection and identification, using the generated data to track the identified content, to identify changes in the content, and to start content identification afresh. At step 904, the tasks of scene change detection, audio turn detection, and segmentation are performed on the incoming video. Scene change methods may be used to detect a large change in the image, and similarly an audio turn, which is, for example, a large change in the audio sound, may be detected. If the identified content is selected to have an advertisement replacement, such as possible localized and personalized advertising, then at step 905, the expected start time for the advertisement to be replaced is updated using a projected value of the match time. Thus, step 905 includes projecting the time of the expected advertisement in terms of the current system clock time, while monitoring the segmentation changes within the incoming video content. Step 905 eventually identifies that a scene change event is within the target range of the start of the selected advertisement to be replaced. Then, step 905 invokes the verification step.
  • At step 906, the incoming content at the expected time range is verified to be the expected advertisement. Step 906 also recovers the frame alignment information between the query and reference video, and can regenerate the video frame or interpret the video analysis process appropriately. Step 906 also generates signatures on a small time sample of the incoming video beginning at the identified scene change event using audio and video fingerprinting. Next, the generated signatures are compared against the beginning period of the original advertisement, specifically, such as the first video frame and associated audio of the original advertisement. If the incoming video agrees with the expected advertisement, the local video buffer display is switched to the new alternate advertisement. It is possible to perform a highly accurate check that the expected video frame matches the incoming video's first frame. Video fingerprinting, which detects interest regions at interesting locations on the frame and generates descriptors of the regions around the interest regions along with the associated coordinates and scale of the detected regions, allows a very accurate check. Additionally, the video time locations and selected transitions allow only very few possibilities for matching. Alternate methods of video fingerprinting using intensity and color information can also be used for highly accurate matching between the reference and the first video frame. If the comparison does not match, the process 900 returns to step 902. At step 907, the advertisement is switched and the video frame mapping is decided based on the detected frame mapping from step 902 and tracked through steps 903, 904 and 906. At step 902, when content is identified, an initial mapping of the reference to the query screen is performed. Further, this mapping is refined and tracked through client operations 903, 904 and 906. In the meantime, at step 908, the incoming content is monitored and tracked to verify that it matches the expected content. The advertisement replacement process continues until the incoming advertisement ends or the defined substitution time ends, and while the incoming content, such as advertisements, is the expected content. A replacement advertisement may be a partial replacement or an overlay. An appropriate delay buffer may be used to accommodate the delays for identifying and verifying the advertisement for switching, so that the user experience is not degraded.
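  • A minimal sketch of the step 905/906 check is shown below, assuming frame signatures reduced to integer hashes compared by Hamming distance; the time tolerance, distance threshold, and signature format are illustrative assumptions and not the claimed fingerprint comparison.

```python
def verify_and_switch(scene_change_time, expected_start, first_frame_sigs,
                      incoming_sigs, tolerance_s=1.0, max_distance=8):
    """Check that a scene change falls in the target window for the selected
    advertisement (step 905) and that the first incoming frame matches the
    advertisement's opening frame (step 906) before switching the display.
    Signature comparison here is a simple Hamming distance on integer hashes."""
    if abs(scene_change_time - expected_start) > tolerance_s:
        return False                                  # not the expected slot
    for ref_sig, qry_sig in zip(first_frame_sigs, incoming_sigs):
        if bin(ref_sig ^ qry_sig).count("1") > max_distance:
            return False                              # opening frame does not match
    return True                                       # switch to the replacement ad

# Expected ad at t=600 s; scene change observed at 600.3 s with near-identical
# opening-frame signatures, so the replacement is triggered.
print(verify_and_switch(600.3, 600.0,
                        first_frame_sigs=[0xAB12CD34, 0x00FF00FF],
                        incoming_sigs=[0xAB12CD35, 0x00FF00FF]))   # True
```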
  • A simpler method for advertisement replacement may be employed by the cable operators with the co-operation of the content owners. In this situation, the timing information of the original advertisement and the one to be replaced are available to the cable operator and at the end user set top box or equivalent. The problem remains how to deliver the alternative advertisement to the end user. This alternative advertisement can be delivered by internet or over certain channels on the cable. A similar approach can be assumed for over the air broadcast. However these solutions are not applicable when the assumptions are not valid such as when the content owners and cable operators do not agree on deploying this mode of advertisement replacement.
  • Thus, we have described a method above that enables content broadcasters to customize their advertisement slots per user. The content owner creates the program schedule describing the time and location of advertisements and the rules for replacing specific advertisements. The rules for replacing specific advertisements are executed by the clients. The methods for executing the steps of "content replacement" are described in FIGS. 7B and 5C. FIG. 9A, in contrast, illustrates a content publishing method with user control, wherein the user can choose the type of synchronous content during the entire TV viewing experience.
  • FIG. 9B illustrates an example of publishing content as a time-aligned application. One aspect of such an application is that it synchronizes to the show the user is currently watching on a big-screen TV using content identification technology. After the content is identified, the application displays additional content on the second screen, which may include text, images and video, links to supplemental on-line information, and buttons for shopping, voting, or other actions. The content to be displayed is synced with the main content, and the content publisher is able to specify the relationship between the main content and the additional information displayed on the second screen.
  • FIG. 9B describes an exemplary time-aligned application that can be created using the various services described in this application. In this simple application, the second screen device is shown displaying various menu options 952, 954, 956. Menu option 952 is associated with the content currently being displayed, while menu options 954 and 956 are associated with content that was displayed in the past.
  • When a user clicks on any of the menu options 952, 954, 956, the application displays more information 970, as shown in the figure. As the content on the main display screen progresses, menu option 952 is updated with a new menu option obtained through a request to the content identification service network. The menu options for previous events are pushed down, forming a stack of menu options. The stacked menu options 952, 954, 956 can be scrolled using scrollbar 958. Menu options can contain further menu options within themselves, forming a chain of menu options.
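  • As a non-authoritative sketch of the stacked menu behavior just described, the following Python class keeps the newest menu option (952) at the top of the stack and pushes older options (954, 956) down; the class and method names are hypothetical and chosen only for illustration.

    class MenuStack:
        """Keeps the time-aligned menu options for a second-screen application.

        The newest option, tied to the content currently on the main screen,
        sits at index 0; older options are pushed down, as with menu options
        952, 954 and 956 of FIG. 9B. Options may carry nested sub-options,
        forming a chain of menus.
        """
        def __init__(self):
            self.options = []          # index 0 is the current menu option

        def push_current(self, title, content, sub_options=None):
            # Called when the content identification service reports a new event.
            self.options.insert(0, {
                "title": title,
                "content": content,          # text, images, links, actions, ...
                "sub_options": sub_options or [],
            })

        def visible(self, first, count):
            # The scrollbar (958) selects which slice of the stack is drawn.
            return self.options[first:first + count]

    stack = MenuStack()
    stack.push_current("Welcome to the program", "Introductory text and links")
    stack.push_current("Know more about the program", "Cast details, shopping button")
    print([o["title"] for o in stack.visible(0, 3)])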
    To implement such time-aligned publishing, the publisher needs to provide the following information:
      • Time offsets to display menu options from the start of the show;
      • The content associated with the menu option, which is displayed when the menu option is activated by the user. The content can include displayable and interactive content, including but not limited to text, graphics, multimedia, and actions.
  • FIG. 9C illustrates menu formats. For time-aligned publishing, the menu choices can be provided in various formats. An example partial XML format is illustrated in FIG. 9C. The example shows two menu options. The first menu option is displayed 60 seconds after the program starts, with the menu title “Welcome to the program”. The detailed content is specified under the tag “display_content”, and this menu is shown for only 240 seconds; after 240 seconds, the menu option is removed from the stack of menu options. The second menu option is displayed after 300 seconds, with the menu title “Know more about the program”, and is displayed for 900 seconds.
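  • The exact XML of FIG. 9C is not reproduced here; the following Python sketch parses a hypothetical reconstruction of such a specification. Only the “display_content” tag is named in the description above, so the remaining tag and attribute names (menu_option, title, offset_seconds, duration_seconds) are assumptions for illustration.

    import xml.etree.ElementTree as ET

    # Hypothetical reconstruction of the partial XML of FIG. 9C.
    MENU_XML = """
    <menus>
      <menu_option offset_seconds="60" duration_seconds="240">
        <title>Welcome to the program</title>
        <display_content>Introductory text shown for 240 seconds</display_content>
      </menu_option>
      <menu_option offset_seconds="300" duration_seconds="900">
        <title>Know more about the program</title>
        <display_content>Background on the program and its cast</display_content>
      </menu_option>
    </menus>
    """

    def active_menus(xml_text, seconds_from_start):
        """Return the titles of menu options that should be on the stack at the
        given time offset from the start of the show."""
        root = ET.fromstring(xml_text)
        titles = []
        for option in root.findall("menu_option"):
            start = int(option.get("offset_seconds"))
            stop = start + int(option.get("duration_seconds"))
            if start <= seconds_from_start < stop:
                titles.append(option.findtext("title"))
        return titles

    print(active_menus(MENU_XML, 120))   # ['Welcome to the program']
    print(active_menus(MENU_XML, 400))   # ['Know more about the program']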
  • FIG. 10 describes a method 1000 to provide a time-aligned service, such as selecting a language of choice for a broadcast video program. Initially, the content playing is identified at step 1002. The content playing can be identified using a variety of methods, including:
  • (1) generating audio and video signatures and searching over stored opening sequences of programs;
  • (2) extracting text, program logos, and program-specific logos, such as a team's name;
  • (3) querying a search server using audio and video signatures of the content and other extracted information, such as channel and program identification; and
  • (4) using a program guide to identify the content and performing content identification or an alignment operation between query and reference.
  • The search server also detects the video frame mapping for the consumer device's content query.
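  • As an illustrative sketch of how a client might bundle fingerprints and extracted side information into an identification query, the Python function below assembles such a request. The JSON envelope and all field names are assumptions; the patent does not prescribe a wire format.

    import json

    def build_content_id_query(audio_sigs, video_sigs, extracted):
        """Assemble a content identification query from fingerprints plus any
        side information extracted on the device, such as a channel logo, a
        program logo, on-screen text (for example a team's name), or a
        program-guide hint.
        """
        return json.dumps({
            "audio_signatures": audio_sigs,        # e.g. lists of hash values
            "video_signatures": video_sigs,
            "channel_logo": extracted.get("channel_logo"),
            "program_logo": extracted.get("program_logo"),
            "on_screen_text": extracted.get("text", []),
            "program_guide_hint": extracted.get("guide_entry"),
        })

    query = build_content_id_query(
        audio_sigs=[0x1a2b, 0x3c4d],
        video_sigs=[0x9e8f, 0x7a6b],
        extracted={"channel_logo": "SportsChannel", "text": ["FIFA", "BRA 1 - 0 POR"]},
    )
    print(query)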
  • At step 1003, the actual time alignment of the content playing on the consumer device relative to the broadcast content is identified. Further, the incoming video is processed to detect scene changes and audio turns, followed by video and audio processing at the scene change or at an audio turn. The video processing includes signature generation and logo detection and identification, which are used to track the identified content or to identify a change in the content and start content identification afresh. At step 1004, the tasks of scene change detection, audio turn detection, and segmenting the incoming video for processing are performed. At step 1006, the frame alignment information is recovered between the query and reference video. Then signatures of the incoming broadcast content and the reference video are generated. The signatures are used to synchronize the incoming video time to the reference. The detected time relationship of the incoming video is used to align the selected language customizations, the audio track, and the text and/or video overlays with the original video. The continued updating of the detected time relationship between reference and current video can be performed by multiple methods (a short illustrative sketch follows this list), including:
      • (1) Audio-video synchronization standardized signatures.
      • (2) Frame based video signatures and audio signatures.
      • (3) Interest region based video signatures and audio signatures.
      • (4) Audio turn and scene change timing.
      • (5) Audio turns and scene changes along with audio and video frame information at the relevant time.
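  • As referenced above, the following is a minimal Python sketch of methods (2) and (3): a short run of client-side frame signatures is slid over the reference signature sequence, and the offset with the lowest total Hamming distance is taken as the updated time relationship. The signature representation (small integers) and the cost function are simplifications for illustration.

    def best_time_offset(query_sigs, reference_sigs):
        """Slide a short run of client-side frame signatures over the reference
        signature sequence and return the offset (in frames) with the lowest
        total Hamming distance."""
        def hamming(a, b):
            return bin(a ^ b).count("1")

        best, best_cost = 0, float("inf")
        for offset in range(len(reference_sigs) - len(query_sigs) + 1):
            cost = sum(hamming(q, reference_sigs[offset + i])
                       for i, q in enumerate(query_sigs))
            if cost < best_cost:
                best, best_cost = offset, cost
        return best

    reference = [0b1010, 0b1100, 0b1111, 0b0110, 0b0011, 0b1001]
    query = [0b1111, 0b0110, 0b0011]
    print(best_time_offset(query, reference))   # 2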
  • At step 1007, the selected audio tracks are switched and the text and video overlays are applied using the video frame mapping information from step 1002. The incoming video content is thus aligned in time and in video space through steps 1003, 1004 and 1006. In the meantime, at step 1008, the incoming content is monitored and tracked in time against the expected content. At the same time, the alignment between the reference and the current playout is updated at step 1006. A language customization application for audio substitution and video overlay continues while the incoming content is as expected. If the content stops tracking with the expected content, control moves to step 1002.
  • A simpler method for selecting a language of choice may be employed by the content owners with the cooperation of the cable operators. In this situation, the timing information of the broadcast content and the language customization data are available to the cable operator and at the end user set top box or equivalent. The problem remains how to deliver the alternative language customization data to the end user. This additional data can be delivered over the internet or over certain channels on the cable. A similar approach can be assumed for over-the-air broadcast. However, these solutions are not applicable when the assumptions are not valid, such as when the content owners and the cable operators do not agree on this mode of deployment of a multi-language choice service.
  • FIG. 11 describes another embodiment of a method 1100 to provide a time-aligned service for selecting a language of choice for a live, non-recorded broadcast video program. The incoming video received at step 1101 is processed at step 1103 to identify the content using any of the following methods:
  • (1) generating audio and video signatures and searching locally or on a server; and
  • (2) extracting text, program logos, and program-specific logos, and using a program guide to identify the content for a detected program logo, verifying with the extracted text.
  • At step 1103, time alignment information is maintained between query and reference. At step 1106, the detected reference time of the incoming video is used to align with the selected language customizations. The audio track and the text and/or video overlays are added or overlaid over the original video at steps 1107 and 1108. The additional data to implement the language customizations, determined from step 1105, can be provided over the air, over cable, or over the internet. At step 1106, the video frame alignment is also optionally detected between the incoming video and the reference. The video frame alignment is detected using the known locations of logos and of detected text between the client video and the reference video. Time alignment is performed by comparing scene change timings for audio and video content, including text and logo changes. To provide the language customization service, the participation of the original content provider is necessary to generate the customization information simultaneously with the current content. Since both the original content and the customization are generated together, the crucial information to align the original and client-side playout can be generated via signatures, or via scene change and content change information with associated times. Since the broadcast content is live and not pre-recorded, querying a server cannot be used without a delay, which can be upwards of 5 seconds. A solution that may be used transfers the information that enables time alignment of the language customization directly to the client. The client can thus detect the time alignment between the reference and the language customization data and stream. Earlier, at step 1105, the client extracts content alignment synchronization information, such as text, logos, scene changes, and fingerprints, from the incoming broadcast video input, which can be received over the air, over cable, or over the internet.
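  • The following Python sketch illustrates one simple way the live alignment described above could be estimated: scene-change times detected locally are compared against scene-change times delivered with the language customization data, and the most common pairwise difference is taken as the playout delay. The voting scheme, tolerance value, and function name are assumptions for illustration.

    from collections import Counter

    def estimate_playout_delay(local_scene_changes, customization_scene_changes,
                               tolerance=0.2):
        """Estimate the delay (in seconds) between the client playout and the
        live customization stream by voting over pairwise differences of
        scene-change times. A positive result means the client lags the
        customization data, the usual case for live broadcast."""
        votes = Counter()
        for lc in local_scene_changes:
            for cc in customization_scene_changes:
                delta = round((lc - cc) / tolerance) * tolerance
                votes[round(delta, 3)] += 1
        if not votes:
            return None
        return votes.most_common(1)[0][0]

    print(estimate_playout_delay([12.4, 17.9, 25.1], [7.4, 12.9, 20.1]))  # ~5.0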
  • At step 1107, the selected audio tracks are switched and text and video overlays are applied using the video frame mapping information from step 1106. At step 1108, the text and video overlays for the selected language are overlaid on the video frames.
  • FIG. 12 illustrates a method 1200 to segment broadcast TV content using hybrid and adaptive fingerprint methods. In an embodiment for efficient content monitoring and audience measurement, tracking of the channel logo, program logo, other logos, and scene change markers is used to reduce client computation and fingerprint transfer bandwidth. The computation and fingerprint bandwidth are reduced by electing to perform fingerprinting only in conditions where it is likely that the content has changed due to user or broadcast network action.
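  • A minimal sketch of that gating decision follows, assuming lightweight detectors report boolean events; the event names are hypothetical and serve only to illustrate fingerprinting being triggered when a content change is likely.

    def should_fingerprint(event):
        """Decide whether the client should run (and upload) fingerprinting for
        the current video segment. Fingerprinting is triggered only when a
        logo change, a scene change near an expected boundary, or a user
        action makes a content change likely, which keeps client computation
        and upload bandwidth low."""
        return any((
            event.get("channel_logo_changed", False),
            event.get("program_logo_changed", False),
            event.get("user_channel_change", False),
            event.get("trick_mode_suspected", False),
            event.get("scene_change_at_expected_boundary", False),
        ))

    print(should_fingerprint({"channel_logo_changed": True}))                 # True
    print(should_fingerprint({"scene_change_at_expected_boundary": False}))   # False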
  • At step 1201, fingerprinting and content analysis are performed on the broadcast content. At step 1201, the fingerprints of each program are transmitted as a query to the content search server1 for a search operation at step 1202. The content search server1 returns the search report containing the detected match data to step 1204, to fingerprint step 1203, and to the segment/classifier step 1205. At step 1204, the content search server1 transfers the information about the frame alignment and the time alignment between the reference and query to the fingerprint generator2, step 1203. Subsequent content searches are sent to content search server2, step 1204. Thus, for further fingerprinting, the fingerprint generator2 (step 1203) can use lightweight processes with much lower compute cost, since the detected transforms, such as the frame alignment and audio transform, can be applied to reduce the similarity error of the generated signatures. The segment/classifier step 1205 manages the incoming content and controls (activates and disables) the time-aligned service. Step 1205 includes the functionality of segmenting, classifying, and predicting the time alignment of the incoming video. Step 1205 also communicates the video frame alignment information so that video overlays can be performed optimally. Step 1209 executes the video overlay, insertion, or advertisement replacement onto the streaming broadcast content. Before any overlay can start, the time alignment between the reference and the incoming content is verified in step 1206. The verification step 1206 can use a variety of fingerprinting methods to generate signatures and correlate them to verify the time alignment with the reference signatures. Step 1208 continues to perform more lightweight verification, content tracking, and trick mode detection on the incoming content while the time-aligned services are overlaid on the incoming broadcast video by step 1209.
  • An embodiment is described that detects trick mode playout; trick mode detection is necessary during execution of time-aligned services. Trick mode is defined as digital video recorder (DVR) actions such as fast forwarding, skipping sections, or rewinding video content. Detected scene changes and audio turns are compared with the expected times, as these may be unaligned due to possible trick mode operations. Then, a verify operation for trick mode or other unexpected changes is performed, and a graceful transition to normal video input is made. The verify process for trick mode can be as simple as checking that the audio and video content is not aligned to the expected content's scene changes and audio turns. A more complex process employs comparison of fingerprints between the expected content and the currently played out content. The verify process can be used for live broadcast where pre-recorded content is not available; however, fingerprints of already played out live broadcast content can be stored locally or on a central server. These recorded fingerprints of non-prerecorded broadcast content can be used to detect possible trick modes, such as rewind, and to align with the correct time of the video content being played out on the TV or other screens.
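  • The simple form of the verify process, checking that observed scene changes no longer line up with the expected times, can be sketched in Python as follows. The 50% rule, the tolerance, and the function name are illustrative assumptions; a deployment would confirm with fingerprints before reacting.

    def detect_trick_mode(observed_scene_changes, expected_scene_changes,
                          tolerance=0.5):
        """Flag a possible trick-mode operation (fast forward, skip, rewind) by
        checking whether recently observed scene-change times still line up
        with the scene-change times expected for the identified content.

        Returns True when fewer than half of the observed scene changes fall
        within 'tolerance' seconds of an expected one."""
        if not observed_scene_changes:
            return False
        aligned = sum(
            any(abs(obs - exp) <= tolerance for exp in expected_scene_changes)
            for obs in observed_scene_changes
        )
        return aligned < len(observed_scene_changes) / 2

    # A playout that was rewound by about 30 seconds no longer aligns.
    print(detect_trick_mode([100.2, 112.0, 125.5], [130.0, 142.1, 155.4]))  # True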
  • The above descriptions illustrate various methods to enable language customization, including for live broadcast TV. Below is another example that shows how a typical use case is supported with live broadcast TV using the invention described. A user is watching FIFA soccer matches on a TV using a cable subscription channel. The matches are presented in English, while the user prefers Portuguese. The user rewinds to watch some events and then fast forwards until the most current action is reached. The content playing on the TV is identified using content identification or logo identification and text extraction. Continuous synchronization is enabled by performing correlation between the information arriving via the language customization and the information extracted from the incoming broadcast video. When the user rewinds, the scene change misalignment is detected quickly, in about a second, and the time alignment between the rewound content and the reference is identified using signatures or with logo and text information. The same methods are applied for fast forward until the current time is reached.
  • In an alternate embodiment, a trick mode is detected by performing logo detection processing and matching for trick mode overlay buttons on the video.
  • In an alternate embodiment, the client stores a small search database of fingerprints that match opening sequences of programs. Additionally, the client stores a small database of logos and program logos, and in certain cases specific logos of teams for sports programming. To detect dynamic logos, a set of rules about the dynamic logos is stored. These local databases are then utilized to identify content playing on a client, or to make a likely guess about the match. To verify the “likely match”, specific additional information is downloaded from, or queried with, the central servers to support identification and segmentation. The additional information can be color descriptors, signatures of template videos, or speaker voice models.
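  • The following Python sketch shows one possible shape of such a local opening-sequence database used to produce a “likely match” without a server query. The exact-match lookup, signature values, and class name are simplifications and assumptions for illustration.

    from collections import Counter

    class LocalOpeningSequenceDB:
        """Small on-client database mapping opening-sequence signatures to
        program identities, used to identify content or make a likely guess
        without querying the central search server."""
        def __init__(self):
            self.index = {}      # signature -> program id

        def add(self, program_id, signatures):
            for sig in signatures:
                self.index[sig] = program_id

        def likely_match(self, query_signatures, min_hits=3):
            hits = Counter(self.index[s] for s in query_signatures if s in self.index)
            if not hits:
                return None
            program, count = hits.most_common(1)[0]
            # A likely match still needs verification with additional data
            # (color descriptors, template signatures, speaker models)
            # obtained from the central servers.
            return program if count >= min_hits else None

    db = LocalOpeningSequenceDB()
    db.add("evening_news", [0xa1, 0xa2, 0xa3, 0xa4])
    print(db.likely_match([0xa1, 0xa2, 0xa3, 0xff]))   # 'evening_news'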
  • In another embodiment, the client learns and caches critical information about the popular channels and programs watched and the associated channel logos, program logos, program-specific text and logos, and video frame layouts. This learning is used to optimize the cost and accuracy of content identification and segmentation. The learning of video frame layouts for popular programs includes specific details such as the locations of text, colors, and logos within video frames, for example the locations of team scores.
  • Additionally, this ability to learn video frame layouts and opening sequences for popular content is utilized to significantly reduce the number of queries sent to the search server to identify content being played out on remote clients.
  • A learning engine is used to learn rules to best segment and identify content at each client by adding relevant and user-specific sequences, video layouts, opening sequences, and closing credits to the content databases. The learning engine also assists in creating the rules for identification of new programs and channels at the client device. The ability to learn new rules to identify content can significantly improve the efficiency of a content monitoring system, since identification at a client avoids queries that need not be sent and can target a search to the appropriate search databases separate from the client device.
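  • As a speculative sketch of how such learned layouts might be cached and consulted before querying the server, consider the following Python class. The matching rule (counting overlapping layout features) and all names are assumptions made only to illustrate avoiding unnecessary server queries.

    class LayoutCache:
        """Client-side cache of learned video-frame layouts (logo and text
        locations, score boxes) for popular channels and programs. When an
        incoming frame matches a cached layout, the client can identify or
        track the content locally instead of sending a query to the search
        server."""
        def __init__(self):
            self.layouts = {}    # program id -> expected layout features

        def learn(self, program_id, layout):
            self.layouts[program_id] = layout

        def identify(self, observed_layout, min_overlap=2):
            for program_id, layout in self.layouts.items():
                overlap = len(set(layout.items()) & set(observed_layout.items()))
                if overlap >= min_overlap:
                    return program_id      # handled locally, no server query
            return None                    # fall back to the central search server

    cache = LayoutCache()
    cache.learn("league_soccer", {"logo_corner": "top_left", "score_box": "top_center"})
    print(cache.identify({"logo_corner": "top_left", "score_box": "top_center"}))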
  • In another embodiment, the rules learned at the client are communicated to the server, and all the rules learned for content can be stored on the central servers, which enables classification, categorization, and identification of the content.
  • It is understood that other embodiments of the present invention will become readily apparent to those skilled in the art from the foregoing detailed description, wherein various embodiments of the invention are shown and described by way of illustration. As will be realized, the invention is capable of other and different embodiments, and its several details are capable of modification in various other respects, all without departing from the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.

Claims (21)

1.-20. (canceled)
21. A method to provide content publishing services that are personalized for different user devices, the method comprising:
detecting a logo in an incoming video;
identifying the incoming video based on the detected logo;
collecting a first set of information regarding the identified incoming video that is personalized for use by a first user device;
collecting a second set of information regarding the identified incoming video that is personalized for use by a second user device; and
adjusting content publishing services for the first user device and for the second user device according to the personalized information received by each user device.
22. The method of claim 21 further comprising:
generating video signatures on initial frames of the incoming video; and
searching for the generated video signatures in opening sequences of programs stored in a reference database to further identify the incoming video.
23. The method of claim 21, wherein the incoming video is a live broadcast.
24. The method of claim 21 further comprising:
generating first time alignment information by calculating locations in position and in time for pixel offsets along an x-axis and along a y-axis between signatures generated for incoming video frames received on the first user device and corresponding matching signatures generated for an opening sequence of frames from a program found in a reference database, wherein the first time alignment information is used on the first user device to synchronize publishing content selected from the reference database with the incoming video frames.
25. The method of claim 21 further comprising:
generating second time alignment information by calculating locations in position and in time for pixel offsets along an x-axis and along a y-axis between signatures generated for incoming video frames received on the second user device and corresponding matching signatures generated for an opening sequence of frames from a program found in a reference database, wherein the second time alignment information is used on the second user device to synchronize publishing content selected from the reference database with the incoming video frames.
26. The method of claim 21, wherein the content publishing services for the first user device comprise a first language customization.
27. The method of claim 26, wherein first time alignment information is generated in the first user device to synchronize the first language customization with the incoming video.
28. The method of claim 21, wherein the content publishing services for the second user device further comprise a second language customization.
29. The method of claim 28, wherein second time alignment information is generated in the second user device to synchronize the second language customization with the incoming video.
30. A method to provide time aligned language presentations personalized for user devices, the method comprising:
identifying audio content and video content based on a logo detected in broadcast content of a broadcast program;
determining an audio time alignment of the audio content and a video time alignment of the video content on a first user device relative to the broadcast content;
synchronizing the audio content with the video content according to the audio time alignment and the video time alignment on the first user device;
substituting the synchronized audio content with a first selected language on the first user device; and
overlaying text and the logo with customized text and a customized logo in the first selected language in the synchronized video content on the first user device.
31. The method of claim 30 further comprising:
determining an audio time alignment of the audio content and a video time alignment of the video content on a second user device relative to the broadcast content;
synchronizing the audio content with the video content according to the audio time alignment and the video time alignment on the second user device;
substituting the synchronized audio content with a second selected language on the second user device; and
overlaying text and the logo with customized text and a customized logo in the second selected language in the synchronized video content on the second user device.
32. The method of claim 30 further comprising:
generating audio and video signatures on initial frames of the incoming video; and
searching for the generated audio and video signatures in opening sequences of programs stored in a reference database to further identify the audio content and the video content.
33. The method of claim 30 further comprising:
using a program guide to further identify the audio content and the video content.
34. The method of claim 30 further comprising:
generating first time alignment information by calculating locations in position and in time for pixel offsets along an x-axis and along a y-axis between signatures generated for frames of the broadcast content received on the first user device and corresponding matching signatures generated for an opening sequence of frames from a program found in a reference database, wherein the first time alignment information is used on the first user device to synchronize time aligned language presentations selected from the reference database with the incoming video frames.
35. The method of claim 30 further comprising:
identifying a logo change as an indicator of and a time of a scene change;
substituting the synchronized audio content with a selected language at the time of the scene change; and
overlaying text and the logo with customized text and a customized logo in the synchronized video content.
36. The method of claim 30 further comprising:
monitoring content defined by the detected logo and identified audio content and video content on a plurality of user devices; and
sending the monitored content to a server to determine audience measurements of contents watched on the plurality of user devices.
37. The method of claim 36, wherein the monitored content includes broadcast programs.
38. The method of claim 36, wherein the monitored content includes advertisements specified as having a content logo.
39. A computer readable non-transitory medium encoded with computer readable program data and code, the computer readable program data and code when executed perform a method to provide content publishing services that are personalized for different user devices, the method comprising:
detecting a logo in an incoming video;
identifying the incoming video based on the detected logo;
collecting a first set of information regarding the identified incoming video that is personalized for use by a first user device;
collecting a second set of information regarding the identified incoming video that is personalized for use by a second user device; and
adjusting content publishing services for the first user device and for the second user device according to the personalized information received by each user device.
40. The computer readable non-transitory medium of claim 39, the method further comprising:
generating video signatures on initial frames of the incoming video; and
searching for the generated video signatures in opening sequences of programs stored in a reference database to further identify the incoming video.
US15/297,658 2008-06-18 2016-10-19 TV Content Segmentation, Categorization and Identification and Time-Aligned Applications Abandoned US20170201793A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/297,658 US20170201793A1 (en) 2008-06-18 2016-10-19 TV Content Segmentation, Categorization and Identification and Time-Aligned Applications

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US12/141,163 US8229227B2 (en) 2007-06-18 2008-06-18 Methods and apparatus for providing a scalable identification of digital video sequences
US14133709A 2009-06-18 2009-06-18
US12/772,566 US8195689B2 (en) 2009-06-10 2010-05-03 Media fingerprinting and identification system
US12/788,796 US8335786B2 (en) 2009-05-28 2010-05-27 Multi-media content identification using multi-level content signature correlation and fast similarity search
US42320510P 2010-12-15 2010-12-15
US13/102,479 US8655878B1 (en) 2010-05-06 2011-05-06 Scalable, adaptable, and manageable system for multimedia identification
US13/276,110 US8959108B2 (en) 2008-06-18 2011-10-18 Distributed and tiered architecture for content search and content monitoring
US13/327,359 US9510044B1 (en) 2008-06-18 2011-12-15 TV content segmentation, categorization and identification and time-aligned applications
US15/297,658 US20170201793A1 (en) 2008-06-18 2016-10-19 TV Content Segmentation, Categorization and Identification and Time-Aligned Applications

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/327,359 Continuation US9510044B1 (en) 2008-06-18 2011-12-15 TV content segmentation, categorization and identification and time-aligned applications

Publications (1)

Publication Number Publication Date
US20170201793A1 true US20170201793A1 (en) 2017-07-13

Family

ID=57352092

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/327,359 Active 2032-12-08 US9510044B1 (en) 2008-06-18 2011-12-15 TV content segmentation, categorization and identification and time-aligned applications
US15/297,658 Abandoned US20170201793A1 (en) 2008-06-18 2016-10-19 TV Content Segmentation, Categorization and Identification and Time-Aligned Applications

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/327,359 Active 2032-12-08 US9510044B1 (en) 2008-06-18 2011-12-15 TV content segmentation, categorization and identification and time-aligned applications

Country Status (1)

Country Link
US (2) US9510044B1 (en)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160337716A1 (en) * 2013-12-19 2016-11-17 Lg Electronics Inc. Broadcast transmitting device and operating method thereof, and broadcast receiving device and operating method thereof
US10108709B1 (en) * 2016-04-11 2018-10-23 Digital Reasoning Systems, Inc. Systems and methods for queryable graph representations of videos
CN109379155A (en) * 2018-11-24 2019-02-22 合肥龙泊信息科技有限公司 A kind of emergency broadcase system having teletext self-checking function
CN109492126A (en) * 2018-11-02 2019-03-19 廊坊市森淼春食用菌有限公司 A kind of intelligent interactive method and device
CN109600184A (en) * 2018-11-24 2019-04-09 六安富华智能信息科技有限公司 A kind of emergent broadcast terminal having teletext self-checking function
WO2019088853A1 (en) * 2017-11-03 2019-05-09 Klaps Limited Live audio replacement in a digital stream
WO2019094403A1 (en) * 2017-11-08 2019-05-16 Roku, Inc. Enhanced playback bar
WO2020105993A1 (en) * 2018-11-19 2020-05-28 Samsung Electronics Co., Ltd. Display apparatus, server, electronic apparatus and control methods thereof
WO2020160563A1 (en) * 2019-01-22 2020-08-06 MGM Resorts International Operations, Inc. Systems and methods for customizing and compositing a video feed at a client device
WO2020247840A1 (en) * 2019-06-07 2020-12-10 The Nielson Company (Us), Llc Content-modification system with overlay handling feature
WO2020257424A1 (en) * 2019-06-18 2020-12-24 The Nielsen Company (Us), Llc Content-modification system with determination of input-buffer switching delay feature
CN112204989A (en) * 2018-12-20 2021-01-08 海信视像科技股份有限公司 Broadcast signal receiving apparatus, advertisement replacing method, and advertisement replacing system
WO2021091171A1 (en) * 2019-11-06 2021-05-14 Samsung Electronics Co., Ltd. Electronic device and method for controlling the same, and storage medium
CN113177603A (en) * 2021-05-12 2021-07-27 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
US11140328B2 (en) 2019-01-22 2021-10-05 Tempus Ex Machina, Inc. Systems and methods for partitioning a video feed to segment live player activity
CN113574901A (en) * 2019-03-15 2021-10-29 天时机械公司 System and method for customizing and compositing video feeds at a client device
US11172248B2 (en) * 2019-01-22 2021-11-09 Tempus Ex Machina, Inc. Systems and methods for customizing and compositing a video feed at a client device
US11184670B2 (en) 2018-12-18 2021-11-23 Samsung Electronics Co., Ltd. Display apparatus and control method thereof
US11190837B2 (en) 2018-06-25 2021-11-30 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
EP3920541A1 (en) * 2020-06-03 2021-12-08 Top Victory Investments Limited Method for obtaining television rating points for television channels
US11234060B2 (en) 2017-09-01 2022-01-25 Roku, Inc. Weave streaming content into a linear viewing experience
CN114503600A (en) * 2019-10-31 2022-05-13 六科股份有限公司 Content modification system with delay buffer feature
US20220215074A1 (en) * 2019-05-07 2022-07-07 The Nielsen Company (Us), Llc End-point media watermarking
US20220224968A1 (en) * 2019-04-28 2022-07-14 Huawei Technologies Co., Ltd. Screen Projection Method, Electronic Device, and System
US11418858B2 (en) 2017-09-01 2022-08-16 Roku, Inc. Interactive content when the secondary content is server stitched
US11475668B2 (en) 2020-10-09 2022-10-18 Bank Of America Corporation System and method for automatic video categorization
CN115309920A (en) * 2022-10-08 2022-11-08 国家广播电视总局信息中心 Audio and video management method and system based on fusion big data
US11514337B1 (en) * 2021-09-15 2022-11-29 Castle Global, Inc. Logo detection and processing data model
US11575962B2 (en) 2018-05-21 2023-02-07 Samsung Electronics Co., Ltd. Electronic device and content recognition information acquisition therefor
US11632598B2 (en) 2019-05-10 2023-04-18 Roku, Inc. Content-modification system with responsive transmission of reference fingerprint data feature
US11645866B2 (en) 2019-05-10 2023-05-09 Roku, Inc. Content-modification system with fingerprint data match and mismatch detection feature
US11653037B2 (en) 2019-05-10 2023-05-16 Roku, Inc. Content-modification system with responsive transmission of reference fingerprint data feature
EP3997651A4 (en) * 2019-07-09 2023-08-02 Hyphametrics, Inc. Cross-media measurement device and method
US11922600B2 (en) 2018-08-31 2024-03-05 Samsung Display Co., Ltd. Afterimage compensator, display device having the same, and method for driving display device

Families Citing this family (197)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8677377B2 (en) 2005-09-08 2014-03-18 Apple Inc. Method and apparatus for building an intelligent automated assistant
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US9756295B2 (en) * 2007-12-29 2017-09-05 International Business Machines Corporation Simultaneous recording of a live event and third party information
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US20100030549A1 (en) 2008-07-31 2010-02-04 Lee Michael M Mobile device having human language translation capability with positional feedback
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9519772B2 (en) 2008-11-26 2016-12-13 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10977693B2 (en) 2008-11-26 2021-04-13 Free Stream Media Corp. Association of content identifier of audio-visual data with additional data through capture infrastructure
US10567823B2 (en) * 2008-11-26 2020-02-18 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US9154942B2 (en) 2008-11-26 2015-10-06 Free Stream Media Corp. Zero configuration communication between a browser and a networked media device
US9961388B2 (en) 2008-11-26 2018-05-01 David Harrison Exposure of public internet protocol addresses in an advertising exchange server to improve relevancy of advertisements
US9986279B2 (en) 2008-11-26 2018-05-29 Free Stream Media Corp. Discovery, access control, and communication with networked services
US8180891B1 (en) 2008-11-26 2012-05-15 Free Stream Media Corp. Discovery, access control, and communication with networked services from within a security sandbox
US10419541B2 (en) 2008-11-26 2019-09-17 Free Stream Media Corp. Remotely control devices over a network without authentication or registration
US10880340B2 (en) 2008-11-26 2020-12-29 Free Stream Media Corp. Relevancy improvement through targeting of information based on data gathered from a networked device associated with a security sandbox of a client device
US10334324B2 (en) 2008-11-26 2019-06-25 Free Stream Media Corp. Relevant advertisement generation based on a user operating a client device communicatively coupled with a networked media device
US10631068B2 (en) 2008-11-26 2020-04-21 Free Stream Media Corp. Content exposure attribution based on renderings of related content across multiple devices
US9190110B2 (en) 2009-05-12 2015-11-17 JBF Interlude 2009 LTD System and method for assembling a recorded composition
US10255566B2 (en) 2011-06-03 2019-04-09 Apple Inc. Generating and processing task items that represent tasks to perform
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US11232458B2 (en) 2010-02-17 2022-01-25 JBF Interlude 2009 LTD System and method for data mining within interactive multimedia
US8682667B2 (en) 2010-02-25 2014-03-25 Apple Inc. User profiling for selecting user specific voice input processing information
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
CA3089869C (en) 2011-04-11 2022-08-16 Evertz Microsystems Ltd. Methods and systems for network based video clip generation and management
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10469886B2 (en) 2012-04-06 2019-11-05 Minerva Networks, Inc. System and methods of synchronizing program reproduction on multiple geographically remote display systems
US10321192B2 (en) * 2012-04-06 2019-06-11 Tok.Tv Inc. System and methods of communicating between multiple geographically remote sites to enable a shared, social viewing experience
US10674191B2 (en) 2012-04-06 2020-06-02 Minerva Networks, Inc Systems and methods to remotely synchronize digital data
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US9721563B2 (en) 2012-06-08 2017-08-01 Apple Inc. Name recognition system
US9871842B2 (en) * 2012-12-08 2018-01-16 Evertz Microsystems Ltd. Methods and systems for network based video clip processing and management
US8713600B2 (en) * 2013-01-30 2014-04-29 Almondnet, Inc. User control of replacement television advertisements inserted by a smart television
EP2954514B1 (en) 2013-02-07 2021-03-31 Apple Inc. Voice trigger for a digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
WO2014197335A1 (en) 2013-06-08 2014-12-11 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
WO2014200728A1 (en) 2013-06-09 2014-12-18 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10318579B2 (en) 2013-09-06 2019-06-11 Gracenote, Inc. Inserting information into playing content
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
GB2523311B (en) * 2014-02-17 2021-07-14 Grass Valley Ltd Method and apparatus for managing audio visual, audio or visual content
US9653115B2 (en) 2014-04-10 2017-05-16 JBF Interlude 2009 LTD Systems and methods for creating linear video from branched video
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10659851B2 (en) * 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
JP6043767B2 (en) * 2014-09-26 2016-12-14 株式会社アステム Program output device, auxiliary information management server, program and auxiliary information output method, and program
US9565456B2 (en) * 2014-09-29 2017-02-07 Spotify Ab System and method for commercial detection in digital media environments
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9792957B2 (en) 2014-10-08 2017-10-17 JBF Interlude 2009 LTD Systems and methods for dynamic video bookmarking
US11412276B2 (en) 2014-10-10 2022-08-09 JBF Interlude 2009 LTD Systems and methods for parallel track transitions
CN105791906A (en) * 2014-12-15 2016-07-20 深圳Tcl数字技术有限公司 Information pushing method and system
KR20160085076A (en) * 2015-01-07 2016-07-15 삼성전자주식회사 Method for determining broadcasting server for providing contents and electronic device for implementing the same
US10965965B2 (en) * 2015-03-06 2021-03-30 Arris Enterprises Llc Detecting of graphical objects to identify video demarcations
US10152299B2 (en) 2015-03-06 2018-12-11 Apple Inc. Reducing response latency of intelligent automated assistants
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
CN107431844A (en) * 2015-03-09 2017-12-01 瑞典爱立信有限公司 For providing method, system and the equipment of live data stream to content presenting device
US20160337691A1 (en) * 2015-05-12 2016-11-17 Adsparx USA Inc System and method for detecting streaming of advertisements that occur while streaming a media program
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US9578173B2 (en) 2015-06-05 2017-02-21 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US10460765B2 (en) 2015-08-26 2019-10-29 JBF Interlude 2009 LTD Systems and methods for adaptive and responsive video
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US9465996B1 (en) 2015-09-15 2016-10-11 Echostar Technologies Llc Apparatus, systems and methods for control of media content event recording
US10075751B2 (en) * 2015-09-30 2018-09-11 Rovi Guides, Inc. Method and system for verifying scheduled media assets
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10123073B2 (en) 2015-12-16 2018-11-06 Gracenote, Inc. Dynamic video overlays
US11164548B2 (en) 2015-12-22 2021-11-02 JBF Interlude 2009 LTD Intelligent buffering of large-scale video
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US11012719B2 (en) * 2016-03-08 2021-05-18 DISH Technologies L.L.C. Apparatus, systems and methods for control of sporting event presentation based on viewer engagement
US10360905B1 (en) 2016-03-11 2019-07-23 Gracenote, Inc. Robust audio identification with interference cancellation
US11856271B2 (en) 2016-04-12 2023-12-26 JBF Interlude 2009 LTD Symbiotic interactive video
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
US10013614B2 (en) * 2016-06-29 2018-07-03 Google Llc Using an image matching system to improve the quality of service of a video matching system
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10448107B2 (en) * 2016-11-11 2019-10-15 Lg Electronics Inc. Display device
CN108124167A (en) * 2016-11-30 2018-06-05 阿里巴巴集团控股有限公司 A kind of play handling method, device and equipment
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10146100B2 (en) 2016-12-12 2018-12-04 Gracenote, Inc. Systems and methods to transform events and/or mood associated with playing media into lighting effects
US11044520B2 (en) 2016-12-29 2021-06-22 Telefonaktiebolaget Lm Ericsson (Publ) Handling of video segments in a video stream
US11050809B2 (en) * 2016-12-30 2021-06-29 JBF Interlude 2009 LTD Systems and methods for dynamic weighting of branched video paths
US9977990B1 (en) * 2017-01-08 2018-05-22 International Business Machines Corporation Cognitive method for visual classification of very similar planar objects
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US10785116B1 (en) * 2017-01-12 2020-09-22 Electronic Arts Inc. Computer architecture for asset management and delivery
US10419794B2 (en) * 2017-03-17 2019-09-17 Rovi Guides, Inc. Systems and methods for synchronizing media asset playback from multiple sources
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK201770439A1 (en) 2017-05-11 2018-12-13 Apple Inc. Offline personal assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
DK201770428A1 (en) 2017-05-12 2019-02-18 Apple Inc. Low-latency intelligent automated assistant
DK201770432A1 (en) 2017-05-15 2018-12-21 Apple Inc. Hierarchical belief states for digital assistants
DK201770431A1 (en) 2017-05-15 2018-12-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
DK179560B1 (en) 2017-05-16 2019-02-18 Apple Inc. Far-field extension for digital assistant services
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10719715B2 (en) * 2017-06-07 2020-07-21 Silveredge Technologies Pvt. Ltd. Method and system for adaptively switching detection strategies for watermarked and non-watermarked real-time televised advertisements
EP3646197A1 (en) * 2017-06-30 2020-05-06 The Nielsen Company (US), LLC Frame certainty for automatic content recognition
US10803038B2 (en) * 2017-09-13 2020-10-13 The Nielsen Company (Us), Llc Cold matching by automatic content recognition
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10257578B1 (en) 2018-01-05 2019-04-09 JBF Interlude 2009 LTD Dynamic library display for interactive videos
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10419790B2 (en) * 2018-01-19 2019-09-17 Infinite Designs, LLC System and method for video curation
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11166054B2 (en) 2018-04-06 2021-11-02 The Nielsen Company (Us), Llc Methods and apparatus for identification of local commercial insertion opportunities
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US11601721B2 (en) 2018-06-04 2023-03-07 JBF Interlude 2009 LTD Interactive video dynamic adaptation and user profiling
US10623800B2 (en) 2018-07-16 2020-04-14 Gracenote, Inc. Dynamic control of fingerprinting rate to facilitate time-accurate revision of media content
US10489496B1 (en) * 2018-09-04 2019-11-26 Rovi Guides, Inc. Systems and methods for advertising within a subtitle of a media asset
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US10861482B2 (en) * 2018-10-12 2020-12-08 Avid Technology, Inc. Foreign language dub validation
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
CN110175267B (en) * 2019-06-04 2020-07-07 黑龙江省七星农场 Agricultural Internet of things control processing method based on unmanned aerial vehicle remote sensing technology
US11546647B2 (en) 2019-06-07 2023-01-03 Roku, Inc. Content-modification system with probability-based selection feature
US20210073273A1 (en) 2019-09-05 2021-03-11 Gracenote, Inc. Methods and apparatus to identify media based on historical data
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
CN110688498B (en) * 2019-09-27 2022-07-22 北京奇艺世纪科技有限公司 Information processing method and device, electronic equipment and storage medium
US11082730B2 (en) * 2019-09-30 2021-08-03 The Nielsen Company (Us), Llc Methods and apparatus for affiliate interrupt detection
US11490047B2 (en) 2019-10-02 2022-11-01 JBF Interlude 2009 LTD Systems and methods for dynamically adjusting video aspect ratios
US11071182B2 (en) 2019-11-27 2021-07-20 Gracenote, Inc. Methods and apparatus to control lighting effects
US11776286B2 (en) * 2020-02-11 2023-10-03 NextVPU (Shanghai) Co., Ltd. Image text broadcasting
US11245961B2 (en) 2020-02-18 2022-02-08 JBF Interlude 2009 LTD System and methods for detecting anomalous activities for interactive videos
KR20210107480A (en) * 2020-02-24 2021-09-01 삼성전자주식회사 Electronice device and control method thereof
US11533533B2 (en) * 2020-04-08 2022-12-20 Roku, Inc. Content-modification system with feature for detecting and responding to content modifications by tuner devices
US11043220B1 (en) 2020-05-11 2021-06-22 Apple Inc. Digital assistant hardware abstraction
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
TWI739437B (en) * 2020-05-21 2021-09-11 瑞昱半導體股份有限公司 Image display system and image data transmission apparatus and method thereof having synchronous data transmission mechanism
CN113747201B (en) * 2020-05-27 2024-01-12 瑞昱半导体股份有限公司 Image playing system and image data transmission device and method with synchronous data transmission mechanism
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
US11792482B2 (en) * 2020-10-14 2023-10-17 Dish Network L.L.C. Visual testing based on machine learning and automated workflow
US11463744B2 (en) * 2020-10-29 2022-10-04 Roku, Inc. Use of primary and backup instances of supplemental content to facilitate dynamic content modification
US20220264171A1 (en) * 2021-02-12 2022-08-18 Roku, Inc. Use of In-Band Data to Facilitate Ad Harvesting for Dynamic Ad Replacement
US11882337B2 (en) 2021-05-28 2024-01-23 JBF Interlude 2009 LTD Automated platform for generating interactive videos
US11934477B2 (en) 2021-09-24 2024-03-19 JBF Interlude 2009 LTD Video player integration within websites
CN114979691B (en) * 2022-05-23 2023-07-28 上海影谱科技有限公司 Statistical analysis method and system for advertisement of retransmission rights of sports event

Citations (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5875263A (en) * 1991-10-28 1999-02-23 Froessl; Horst Non-edit multiple image font processing of records
US20020087979A1 (en) * 2000-11-16 2002-07-04 Dudkiewicz Gil Gavriel System and method for determining the desirability of video programming events using keyword matching
US6504990B1 (en) * 1998-11-12 2003-01-07 Max Abecassis Randomly and continuously playing fragments of a video segment
US20030076448A1 (en) * 2001-10-19 2003-04-24 Hao Pan Identification of replay segments
US20040128343A1 (en) * 2001-06-19 2004-07-01 Mayer Daniel J Method and apparatus for distributing video programs using partial caching
US6771885B1 (en) * 2000-02-07 2004-08-03 Koninklijke Philips Electronics N.V. Methods and apparatus for recording programs prior to or beyond a preset recording time period
US20040252875A1 (en) * 2000-05-03 2004-12-16 Aperio Technologies, Inc. System and method for data management in a linear-array-based microscope slide scanner
US20050028208A1 (en) * 1998-07-17 2005-02-03 United Video Properties, Inc. Interactive television program guide with remote access
US20050093895A1 (en) * 2003-10-30 2005-05-05 Niranjan Damera-Venkata Generating and displaying spatially offset sub-frames on a diamond grid
US20050166142A1 (en) * 2004-01-09 2005-07-28 Pioneer Corporation Information display method, information display device, and information delivery and display system
US6961954B1 (en) * 1997-10-27 2005-11-01 The Mitre Corporation Automated segmentation, information extraction, summarization, and presentation of broadcast news
US20050251827A1 (en) * 1998-07-17 2005-11-10 United Video Properties, Inc. Interactive television program guide system having multiple devices within a household
US20060080716A1 (en) * 2004-09-28 2006-04-13 Sony Corporation Method and apparatus for navigating video content
US20060120624A1 (en) * 2004-12-08 2006-06-08 Microsoft Corporation System and method for video browsing using a cluster index
US20060143655A1 (en) * 1998-11-30 2006-06-29 United Video Properties, Inc. Interactive television program guide with selectable languages
US20060187358A1 (en) * 2003-03-07 2006-08-24 Lienhart Rainer W Video entity recognition in compressed digital video streams
US20060212580A1 (en) * 2005-03-15 2006-09-21 Enreach Technology, Inc. Method and system of providing a personal audio/video broadcasting architecture
US20070104369A1 (en) * 2005-11-04 2007-05-10 Eyetracking, Inc. Characterizing dynamic regions of digital media data
US20070192782A1 (en) * 2004-08-09 2007-08-16 Arun Ramaswamy Methods and apparatus to monitor audio/visual content from various sources
US20080127253A1 (en) * 2006-06-20 2008-05-29 Min Zhang Methods and apparatus for detecting on-screen media sources
US20080235087A1 (en) * 2007-03-20 2008-09-25 Sbc Knowledge Ventures L.P. System and method for presenting alternative advertising data
US20090083042A1 (en) * 2006-04-26 2009-03-26 Sony Corporation Encoding Method and Encoding Apparatus
US7540009B1 (en) * 2008-06-30 2009-05-26 International Business Machines Corporation Use tagging in television programs for scene filtering and alerts
US20090199242A1 (en) * 2008-02-05 2009-08-06 Johnson Bradley G System and Method for Distributing Video Content via a Packet Based Network
US20090222849A1 (en) * 2008-02-29 2009-09-03 Peters Mark E Audiovisual Censoring
US20090249387A1 (en) * 2008-03-31 2009-10-01 Microsoft Corporation Personalized Event Notification Using Real-Time Video Analysis
US20090249393A1 (en) * 2005-08-04 2009-10-01 Nds Limited Advanced Digital TV System
US20100306193A1 (en) * 2009-05-28 2010-12-02 Zeitera, Llc Multi-media content identification using multi-level content signature correlation and fast similarity search
US20100318916A1 (en) * 2009-06-11 2010-12-16 David Wilkins System and method for generating multimedia presentations
US20110072452A1 (en) * 2009-09-23 2011-03-24 Rovi Technologies Corporation Systems and methods for providing automatic parental control activation when a restricted user is detected within range of a device
US20110078717A1 (en) * 2009-09-29 2011-03-31 Rovi Technologies Corporation System for notifying a community of interested users about programs or segments
US20110214046A1 (en) * 2000-04-07 2011-09-01 Visible World, Inc. Template Creation and Editing for a Message Campaign
US20110246172A1 (en) * 2010-03-30 2011-10-06 Polycom, Inc. Method and System for Adding Translation in a Videoconference
US20110282906A1 (en) * 2010-05-14 2011-11-17 Rovi Technologies Corporation Systems and methods for performing a search based on a media content snapshot image
US20110296048A1 (en) * 2009-12-28 2011-12-01 Akamai Technologies, Inc. Method and system for stream handling using an intermediate format
US20120030182A1 (en) * 2010-07-27 2012-02-02 Timothy Claman Hierarchical multimedia program composition
US20120093476A1 (en) * 2010-10-13 2012-04-19 Eldon Technology Limited Apparatus, systems and methods for a thumbnail-sized scene index of media content
US20120099795A1 (en) * 2010-10-20 2012-04-26 Comcast Cable Communications, Llc Detection of Transitions Between Text and Non-Text Frames in a Video Stream
US8180667B1 (en) * 2008-06-03 2012-05-15 Google Inc. Rewarding creative use of product placements in user-contributed videos
US8209729B2 (en) * 2006-04-20 2012-06-26 At&T Intellectual Property I, Lp Rules-based content management
US20130148884A1 (en) * 2011-12-13 2013-06-13 Morris Lee Video comparison using color histograms
US20130177294A1 (en) * 2012-01-07 2013-07-11 Aleksandr Kennberg Interactive media content supporting multiple camera views
US20140245354A1 (en) * 2005-03-30 2014-08-28 Rovi Guides, Inc. Systems and methods for video-rich navigation
US8966003B2 (en) * 2008-09-19 2015-02-24 Limelight Networks, Inc. Content delivery network stream server vignette distribution
US20150249854A1 (en) * 2009-12-28 2015-09-03 Akamai Technologies, Inc. Method and system for recording streams

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6850252B1 (en) * 1999-10-05 2005-02-01 Steven M. Hoffberg Intelligent electronic appliance system and method
US7236969B1 (en) * 1999-07-08 2007-06-26 Nortel Networks Limited Associative search engine
US6357042B2 (en) * 1998-09-16 2002-03-12 Anand Srinivasan Method and apparatus for multiplexing separately-authored metadata for insertion into a video data stream
US7283954B2 (en) * 2001-04-13 2007-10-16 Dolby Laboratories Licensing Corporation Comparing audio using characterizations based on auditory events
US7493381B2 (en) * 2003-06-09 2009-02-17 National University Of Singapore System and method for providing a service
EP1652385B1 (en) * 2003-07-25 2007-09-12 Koninklijke Philips Electronics N.V. Method and device for generating and detecting fingerprints for synchronizing audio and video
CA2574998C (en) * 2004-07-23 2011-03-15 Nielsen Media Research, Inc. Methods and apparatus for monitoring the insertion of local media content into a program stream
US20090307234A1 (en) * 2005-08-12 2009-12-10 Zrike Kenneth L Sports Matchmaker Systems
US20070092104A1 (en) * 2005-10-26 2007-04-26 Shinhaeng Lee Content authentication system and method
US20090094159A1 (en) * 2007-10-05 2009-04-09 Yahoo! Inc. Stock video purchase
US20090144749A1 (en) * 2007-11-30 2009-06-04 Leviathan Entertainment Alert and Repair System for Data Scraping Routines
US9723254B2 (en) * 2008-04-14 2017-08-01 The Directv Group, Inc. Method and system of extending recording time for a run-over program
US8259177B2 (en) * 2008-06-30 2012-09-04 Cisco Technology, Inc. Video fingerprint systems and methods
US10229438B2 (en) * 2008-11-06 2019-03-12 Iheartmedia Management Services, Inc. System and method for integrated, automated inventory management and advertisement delivery
US8934545B2 (en) * 2009-02-13 2015-01-13 Yahoo! Inc. Extraction of video fingerprints and identification of multimedia using video fingerprinting
US8380050B2 (en) * 2010-02-09 2013-02-19 Echostar Technologies Llc Recording extension of delayed media content

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160337716A1 (en) * 2013-12-19 2016-11-17 Lg Electronics Inc. Broadcast transmitting device and operating method thereof, and broadcast receiving device and operating method thereof
US10108709B1 (en) * 2016-04-11 2018-10-23 Digital Reasoning Systems, Inc. Systems and methods for queryable graph representations of videos
US11234060B2 (en) 2017-09-01 2022-01-25 Roku, Inc. Weave streaming content into a linear viewing experience
US11418858B2 (en) 2017-09-01 2022-08-16 Roku, Inc. Interactive content when the secondary content is server stitched
WO2019088853A1 (en) * 2017-11-03 2019-05-09 Klaps Limited Live audio replacement in a digital stream
WO2019094403A1 (en) * 2017-11-08 2019-05-16 Roku, Inc. Enhanced playback bar
US10334326B2 (en) 2017-11-08 2019-06-25 Roku, Inc. Enhanced playback bar
US11575962B2 (en) 2018-05-21 2023-02-07 Samsung Electronics Co., Ltd. Electronic device and content recognition information acquisition therefor
US11190837B2 (en) 2018-06-25 2021-11-30 Samsung Electronics Co., Ltd. Electronic apparatus and controlling method thereof
US11922600B2 (en) 2018-08-31 2024-03-05 Samsung Display Co., Ltd. Afterimage compensator, display device having the same, and method for driving display device
CN109492126A (en) * 2018-11-02 2019-03-19 廊坊市森淼春食用菌有限公司 A kind of intelligent interactive method and device
WO2020105993A1 (en) * 2018-11-19 2020-05-28 Samsung Electronics Co., Ltd. Display apparatus, server, electronic apparatus and control methods thereof
US11467798B2 (en) 2018-11-19 2022-10-11 Samsung Electronics Co., Ltd. Display apparatus for changing an advertisement area, server, electronic apparatus and control methods thereof
CN113170229A (en) * 2018-11-19 2021-07-23 三星电子株式会社 Display device, server, electronic device and control method thereof
CN109600184A (en) * 2018-11-24 2019-04-09 六安富华智能信息科技有限公司 A kind of emergency broadcast terminal having teletext self-checking function
CN109379155A (en) * 2018-11-24 2019-02-22 合肥龙泊信息科技有限公司 A kind of emergency broadcast system having teletext self-checking function
EP3671486B1 (en) * 2018-12-18 2023-01-04 Samsung Electronics Co., Ltd. Display apparatus and control method thereof
US11184670B2 (en) 2018-12-18 2021-11-23 Samsung Electronics Co., Ltd. Display apparatus and control method thereof
CN112204989A (en) * 2018-12-20 2021-01-08 海信视像科技股份有限公司 Broadcast signal receiving apparatus, advertisement replacing method, and advertisement replacing system
AU2020216550B2 (en) * 2019-01-22 2022-09-08 Infinite Athlete, Inc. Systems and methods for customizing and compositing a video feed at a client device
US11140328B2 (en) 2019-01-22 2021-10-05 Tempus Ex Machina, Inc. Systems and methods for partitioning a video feed to segment live player activity
WO2020160563A1 (en) * 2019-01-22 2020-08-06 MGM Resorts International Operations, Inc. Systems and methods for customizing and compositing a video feed at a client device
US11754662B2 (en) 2019-01-22 2023-09-12 Tempus Ex Machina, Inc. Systems and methods for partitioning a video feed to segment live player activity
US20220038767A1 (en) * 2019-01-22 2022-02-03 Tempus Ex Machina, Inc. Systems and methods for customizing and compositing a video feed at a client device
US11172248B2 (en) * 2019-01-22 2021-11-09 Tempus Ex Machina, Inc. Systems and methods for customizing and compositing a video feed at a client device
CN113574901A (en) * 2019-03-15 2021-10-29 天时机械公司 System and method for customizing and compositing video feeds at a client device
JP7343588B2 (en) 2019-03-15 2023-09-12 テンパス・エクス・マキーナ・インコーポレーテッド System and method for customizing and compositing video feeds on client devices
JP2022519990A (en) * 2019-03-15 2022-03-28 テンパス・エクス・マキーナ・インコーポレーテッド Systems and methods for customizing and compositing video feeds on client devices
EP3939331A4 (en) * 2019-03-15 2022-11-16 Tempus Ex Machina, Inc. Systems and methods for customizing and compositing a video feed at a client device
US20220224968A1 (en) * 2019-04-28 2022-07-14 Huawei Technologies Co., Ltd. Screen Projection Method, Electronic Device, and System
US20220215074A1 (en) * 2019-05-07 2022-07-07 The Nielsen Company (Us), Llc End-point media watermarking
US11645866B2 (en) 2019-05-10 2023-05-09 Roku, Inc. Content-modification system with fingerprint data match and mismatch detection feature
US11736742B2 (en) * 2019-05-10 2023-08-22 Roku, Inc. Content-modification system with responsive transmission of reference fingerprint data feature
US11653037B2 (en) 2019-05-10 2023-05-16 Roku, Inc. Content-modification system with responsive transmission of reference fingerprint data feature
US11632598B2 (en) 2019-05-10 2023-04-18 Roku, Inc. Content-modification system with responsive transmission of reference fingerprint data feature
WO2020247840A1 (en) * 2019-06-07 2020-12-10 The Nielsen Company (Us), Llc Content-modification system with overlay handling feature
US11134292B2 (en) 2019-06-07 2021-09-28 Roku, Inc. Content-modification system with overlay handling feature
US11617001B2 (en) 2019-06-07 2023-03-28 Roku, Inc. Content-modification system with overlay handling feature
US11245870B2 (en) 2019-06-18 2022-02-08 Roku, Inc. Content-modification system with determination of input-buffer switching delay feature
WO2020257424A1 (en) * 2019-06-18 2020-12-24 The Nielsen Company (Us), Llc Content-modification system with determination of input-buffer switching delay feature
EP3997651A4 (en) * 2019-07-09 2023-08-02 Hyphametrics, Inc. Cross-media measurement device and method
EP4052477A4 (en) * 2019-10-31 2023-11-08 Roku, Inc. Content-modification system with delay buffer feature
CN114503600A (en) * 2019-10-31 2022-05-13 六科股份有限公司 Content modification system with delay buffer feature
US11678017B2 (en) 2019-11-06 2023-06-13 Samsung Electronics Co., Ltd. Electronic device and method for controlling the same, and storage medium
WO2021091171A1 (en) * 2019-11-06 2021-05-14 Samsung Electronics Co., Ltd. Electronic device and method for controlling the same, and storage medium
US20210385534A1 (en) * 2020-06-03 2021-12-09 Top Victory Investments Limited Method for obtaining television rating points for television channels
EP3920541A1 (en) * 2020-06-03 2021-12-08 Top Victory Investments Limited Method for obtaining television rating points for television channels
US11475668B2 (en) 2020-10-09 2022-10-18 Bank Of America Corporation System and method for automatic video categorization
CN113177603A (en) * 2021-05-12 2021-07-27 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
US11514337B1 (en) * 2021-09-15 2022-11-29 Castle Global, Inc. Logo detection and processing data model
US11601694B1 (en) 2021-09-15 2023-03-07 Castle Global, Inc. Real-time content data processing using robust data models
CN115309920A (en) * 2022-10-08 2022-11-08 国家广播电视总局信息中心 Audio and video management method and system based on fusion big data

Also Published As

Publication number Publication date
US9510044B1 (en) 2016-11-29

Similar Documents

Publication Publication Date Title
US9510044B1 (en) TV content segmentation, categorization and identification and time-aligned applications
US11615621B2 (en) Video processing for embedded information card localization and content extraction
US10733230B2 (en) Automatic creation of metadata for video contents by in cooperating video and script data
US9888279B2 (en) Content based video content segmentation
US8750681B2 (en) Electronic apparatus, content recommendation method, and program therefor
JP5533861B2 (en) Display control apparatus, display control method, and program
WO2021082668A1 (en) Bullet screen editing method, smart terminal, and storage medium
Truong et al. Video abstraction: A systematic review and classification
KR102068790B1 (en) Estimating and displaying social interest in time-based media
EP2541963B1 (en) Method for identifying video segments and displaying contextually targeted content on a connected television
US9100701B2 (en) Enhanced video systems and methods
KR102246305B1 (en) Augmented media service providing method, apparatus thereof, and system thereof
JP2011223287A (en) Information processor, information processing method, and program
Li et al. Bridging the semantic gap in sports video retrieval and summarization
JP2014130536A (en) Information management device, server, and control method
Li et al. Bridging the semantic gap in sports
CN117812377A (en) Display device and intelligent editing method
CN117880585A (en) Method, medium and system for extracting metadata from video stream
EP3044728A1 (en) Content based video content segmentation

Legal Events

Date Code Title Description
AS Assignment

Owner name: JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT, ILLINOIS

Free format text: SECURITY INTEREST;ASSIGNORS:GRACENOTE, INC.;TRIBUNE BROADCASTING COMPANY, LLC;TRIBUNE DIGITAL VENTURES, LLC;REEL/FRAME:040306/0814

Effective date: 20161110

AS Assignment

Owner name: GRACENOTE, INC., CALIFORNIA

Free format text: RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:041656/0804

Effective date: 20170201

Owner name: TRIBUNE MEDIA SERVICES, LLC, ILLINOIS

Free format text: RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:041656/0804

Effective date: 20170201

Owner name: TRIBUNE DIGITAL VENTURES, LLC, ILLINOIS

Free format text: RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:041656/0804

Effective date: 20170201

Owner name: CASTTV INC., ILLINOIS

Free format text: RELEASE OF SECURITY INTEREST IN PATENT RIGHTS;ASSIGNOR:JPMORGAN CHASE BANK, N.A.;REEL/FRAME:041656/0804

Effective date: 20170201

AS Assignment

Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK

Free format text: SUPPLEMENTAL SECURITY AGREEMENT;ASSIGNORS:GRACENOTE, INC.;GRACENOTE MEDIA SERVICES, LLC;GRACENOTE DIGITAL VENTURES, LLC;REEL/FRAME:042262/0601

Effective date: 20170412

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: THE NIELSEN COMPANY (US), LLC, NEW YORK

Free format text: PARTIAL RELEASE OF SECURITY INTEREST;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:056973/0280

Effective date: 20210415

Owner name: GRACENOTE, INC., NEW YORK

Free format text: PARTIAL RELEASE OF SECURITY INTEREST;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:056973/0280

Effective date: 20210415

AS Assignment

Owner name: ROKU, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GRACENOTE, INC.;REEL/FRAME:056103/0786

Effective date: 20210415

AS Assignment

Owner name: GRACENOTE DIGITAL VENTURES, LLC, NEW YORK

Free format text: RELEASE (REEL 042262 / FRAME 0601);ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:061748/0001

Effective date: 20221011

Owner name: GRACENOTE, INC., NEW YORK

Free format text: RELEASE (REEL 042262 / FRAME 0601);ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:061748/0001

Effective date: 20221011