WO2009111699A2 - Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects - Google Patents


Info

Publication number
WO2009111699A2
WO2009111699A2 (PCT/US2009/036332)
Authority
WO
WIPO (PCT)
Prior art keywords
video
objects
information
content
server
Prior art date
Application number
PCT/US2009/036332
Other languages
French (fr)
Inventor
Armin Moehrle
Original Assignee
Armin Moehrle
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Armin Moehrle filed Critical Armin Moehrle
Priority to CN200980117626.8A priority Critical patent/CN102160084B/en
Publication of WO2009111699A2 publication Critical patent/WO2009111699A2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30236 Traffic on road, railway or crossing

Definitions

  • the present invention relates to a system for automatically segmenting and classifying video content objects within a video, auctioning rights to associate advertising content with the video objects, and creating an overlay associating advertising with select video objects which enables a video viewer to interact with video objects in the video.
  • Video is the technology of electronically capturing, recording, processing, storing, transmitting, and reconstructing a sequence of still images representing scenes in motion.
  • Video technology was first developed for television systems, but has been further developed in many formats to allow for viewer video recording. Motion pictures on film can be converted into video formats. Video can also be viewed through the Internet as video clips or streaming media clips on computer monitors.
  • Animation is the rapid display of a sequence of images of artwork or model positions in order to create an illusion of movement. It is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in a number of ways.
  • the most common method of presenting animation is as a motion picture or video, although several other forms of presenting animation also exist.
  • Video content segmentation is the systematic decomposition of a motion picture frame into its objects (components) such as a person, a shirt, a tree, a leaf, etc.
  • Segmenting video content results in a large number of objects which have little value unless classified.
  • Classification is the process of assigning an object of one frame to the same class of the same object of another frame. It enables the automated recognition that a specific red shirt in one frame is the same as the red shirt in another frame.
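As a concrete illustration, classification can be sketched as greedy matching of per-frame detections by feature similarity. The feature vectors, threshold, and `classify` helper below are illustrative assumptions; the patent does not prescribe a particular feature or similarity measure.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def classify(detections, threshold=0.9):
    """Assign each per-frame detection a class id such that occurrences
    of the same object in different frames share one id."""
    class_reps = []   # one representative feature vector per class
    labels = []
    for feature in detections:
        best_id, best_sim = None, threshold
        for cid, rep in enumerate(class_reps):
            sim = cosine_similarity(feature, rep)
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            best_id = len(class_reps)
            class_reps.append(feature)
        labels.append(best_id)
    return labels

# Frames 1 and 2 each contain the same red shirt (similar features);
# frame 1 also contains a tree (dissimilar features).
labels = classify([
    [0.90, 0.10, 0.00],   # frame 1: red shirt
    [0.10, 0.80, 0.10],   # frame 1: tree
    [0.88, 0.12, 0.00],   # frame 2: red shirt again
])
```

Here the first and third detections resolve to the same class id, which is exactly the "same red shirt in another frame" recognition described above.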
  • Rotoscoping is an animation technique in which animators trace over live-action film movement, frame by frame, for use in animated films.
  • video segmentation algorithms can be used to automatically reconstruct 3D wireframes of moving objects.
  • An object of the invention is to provide an automated system for segmenting raw video to create an inventory of video objects which may be used to make the video interactive, and to auction these video objects to advertisers.
  • the invention is not tied to any specific method of segmenting or classifying video content objects.
  • an object information library which contains descriptive information and/or meta-data regarding objects which may appear in the video is used to associate meta-data such as product information, unique product identification information, or stock keeping units with the segmented video objects.
  • a further object of the invention is to create an advertising market exchange whereby rights to an inventory of video objects are automatically auctioned to a third party such as an advertiser.
  • a system for automatically segmenting and classifying video content into objects and auctioning the same, comprising: a video segmentation and classification server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; the video segmenting and classification server including one or more video files stored in a video database; an object information library stored on one of the random access memory, read-only memory, and mass storage memory, the object information library containing object information used to identify objects within the video files and at least one of descriptive information and semantic information used to describe the object; an object inventory database containing information describing a location of at least one video object within one of the video files; and a video content analysis application executed on the processor, the video content analysis application segmenting the video files to identify locations of video objects, classifying the video objects to match occurrences of a given video object, retrieving information describing the video object by matching occurrences of classified video objects with video objects in the object information library, and storing information describing the dynamic location of the video object within the object inventory database.
  • the system further includes at least one advertising server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; an automated bidding application executed on the advertising server; and an automated auction application executed on the video segmentation and classification server, the auction application transmitting auction information to the at least one advertising server, the auction information including information describing a selected video object, the automated auction application receiving bid information from the automated bidding application of the at least one advertising server and awarding rights to associate advertising content to a selected one of the at least one advertising server.
  • at least one advertising server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory
  • an automated bidding application executed on the advertising server
  • an automated auction application executed on the video segmentation and classification server
  • the automated auction application may transmit to the advertiser bidding application at least one of consumer behavioral information and market segment information associated with the given video object.
  • Any of the aforementioned embodiments of the system may include advertising content in a database; and an overlay generation application for creating a video overlay linking the advertising content with a given video object and creating a selectable hyperlink whose position tracks a dynamic location of the video object in the video.
  • the aforementioned system embodiment may further include a video broadcaster server comprising a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; a video consumer server comprising a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; the video broadcaster server receiving the video overlay from the video segmenting and classification server and transmitting the video overlay to the video consumer server, the video overlay selectively causing the display of content information linked with a given video object responsive to interaction with the given video object.
  • the video segmenting and classification server may maintain a database of objects which have been auctioned.
  • each database entry may include information indicating when rights to an auctioned object expire.
  • the at least one advertiser server may communicate to the video segmenting and classification server information specifying a desired demographic audience, and the auction server may communicate object auction information restricted to the desired demographic audience.
  • the auction information may include information specifying at least one of demographic information and user behavioral history information.
  • the video consumer server may further include a content display application for displaying video and which interacts with the video overlay and displays advertising content when a given video object is one of selected and rolled-over with a pointing device.
  • selection of a video object may cause the content display application to pause or slow the display of video.
  • the video consumer server may include a pointing device; the content display application displays first content associated with an object appearing in the video, and displays second content when a video object within the streaming video is one of selected and rolled-over with the pointing device.
  • the advertising content may include a selectable link wherein selection of the link provides ecommerce options.
  • a system for automatically creating selectable hyperlinks in a video comprising: segmenting the video file into a plurality of video objects; classifying the plurality of video objects to identify duplicate occurrences of a given video object; storing frame and sub-frame information for each video object in a database; and creating selectable hyperlinks in the video file using a video overlay linking at least one video object in the database with the video file.
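A minimal sketch of the overlay-creation step, assuming the object inventory stores one bounding box per frame per object (the tuple layout, `build_overlay` name, and example URL are illustrative, not from the patent):

```python
# Hypothetical inventory rows: (object_id, frame, x, y, width, height).
inventory = [
    ("car_1",   0,  40, 60, 120, 50),
    ("car_1",   1,  45, 60, 120, 50),
    ("shirt_2", 0, 200, 80,  40, 60),
]

# Advertising targets awarded for some objects (others stay unlinked).
links = {"car_1": "https://example.com/car-offer"}

def build_overlay(inventory, links):
    """Fold per-frame regions into one selectable-hyperlink entry per
    video object, keyed by object id."""
    overlay = {}
    for obj_id, frame, x, y, w, h in inventory:
        entry = overlay.setdefault(
            obj_id, {"href": links.get(obj_id), "regions": {}})
        entry["regions"][frame] = (x, y, w, h)
    return overlay

overlay = build_overlay(inventory, links)
```

Each overlay entry thus carries both the hyperlink target and the per-frame region the hyperlink occupies, so the link's position can track the moving object.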
  • the selectable hyperlink may be linked with advertising which is displayed when a user rolls over or selects the hyperlink with a pointing device.
  • a video object market exchange comprising: a video segmentation and classification application for automatically segmenting video into a plurality of objects, classifying the objects into groups of similar objects, labeling the objects with descriptive information, and storing information identifying a dynamic location of the video object within the video in a database; and an overlay generator for automatically creating a video overlay linking at least one group of video objects with the video.
  • each linked video object may be a selectable hyperlink whose position tracks the dynamic location of the video object in the video.
  • a method for providing active regions for an interactive layer for a video viewer application comprising: accessing video data that defines a plurality of frames showing a plurality of video objects, each video object being shown in a sequence of frames, and generating region definition data that defines a plurality of regions, each region corresponding to one of the plurality of video objects, wherein the outline of each region defined by the region definition data matches the outline of the corresponding video object as it is shown in the sequence of frames.
  • region definition data may be used to define a plurality of active regions for interactive video viewing.
  • the frames may be shown to a user on a display as a video, and the region definition data may be used to determine whether a user action directed to a location of at least one of these frame addresses one of the active regions.
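Determining whether a user action addresses an active region reduces to a per-frame hit test. The sketch below simplifies each region's outline to an axis-aligned box; in the described system the outline would match the object's actual shape. Names and data are illustrative assumptions.

```python
# Region definition data: for each region, a per-frame outline,
# simplified here to bounding boxes (x, y, width, height).
region_definition_data = {
    "car_1":   {0: (40, 60, 120, 50), 1: (45, 60, 120, 50)},
    "shirt_2": {0: (200, 80, 40, 60)},
}

def hit_test(regions, frame, px, py):
    """Return the id of the active region addressed by a user action
    at point (px, py) in the given frame, or None if no region is hit."""
    for region_id, frames in regions.items():
        box = frames.get(frame)
        if box is None:
            continue
        x, y, w, h = box
        if x <= px < x + w and y <= py < y + h:
            return region_id
    return None
```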
  • advertising is presented to the user in response to a determination that the user action addresses a certain active region, the advertising pertaining to the video object that corresponds to the certain active region.
  • the region definition data for at least one region includes a three-dimensional wireframe representation of the video object that corresponds to the region.
  • the region definition data for the region further contains, for at least one frame of the sequence of frames in which the corresponding video object is shown, data defining a perspective view of the three-dimensional wireframe representation, wherein the outline of the perspective view of the three-dimensional wireframe representation defines the outline of the region for the frame.
  • the region definition data for the region further contains, for at least one pair of frames of the sequence of frames in which the corresponding video object is shown, data defining a change of the three-dimensional wireframe representation between the frames of the pair of frames.
  • the three-dimensional wireframe representation includes a plurality of nodes
  • the data defining the change includes data that defines a displacement of a position of at least one node with respect to at least another node.
  • the data defining the change includes data that defines a change in at least one of the size and spatial orientation of the three-dimensional wireframe representation.
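The inter-frame change data described above (per-node displacement plus a change in size) can be sketched as a transform applied to a wireframe's node list; rotation handling is omitted for brevity, and the function name is an assumption:

```python
def apply_change(nodes, displacements=None, scale=1.0):
    """Apply inter-frame change data to wireframe nodes: displace each
    node, then scale the wireframe uniformly about its centroid."""
    if displacements is None:
        displacements = [(0, 0, 0)] * len(nodes)
    moved = [(x + dx, y + dy, z + dz)
             for (x, y, z), (dx, dy, dz) in zip(nodes, displacements)]
    cx = sum(p[0] for p in moved) / len(moved)
    cy = sum(p[1] for p in moved) / len(moved)
    cz = sum(p[2] for p in moved) / len(moved)
    return [(cx + (x - cx) * scale,
             cy + (y - cy) * scale,
             cz + (z - cz) * scale) for x, y, z in moved]

# A flat square wireframe doubled in size between two frames.
square = [(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]
grown = apply_change(square, scale=2.0)
```

Storing only such change data between frames, rather than a full wireframe per frame, is what gives the wireframe representation its compactness.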
  • FIG. 1 is a flowchart of a video object market exchange process according to the present invention.
  • FIG. 2 is a block diagram of a video object market exchange system according to the present invention.
  • FIG. 3 is a block diagram of a computer on which the video content analysis application executes.
  • FIG. 4 is a flowchart of how an advertiser interacts with the video object market exchange system according to the present invention.
  • FIGs. 5 and 6 are flowcharts showing the interactions of a viewer with the video object market exchange system according to the present invention.
  • FIG. 7 is an exemplary object inventory database.
  • FIGs. 8A-8E are perspective views of a video object and a wire frame model created therefrom.
  • the present invention is a system 100 for automatically segmenting video into video objects, classifying the video objects, assembling a database of the classified video objects, defining region definition data representing each video object on an interactive layer, auctioning the right to associate advertising with the regions representing video objects on a market exchange (hereinafter "VOME") 300, and creating a video overlay with region definition data linking advertising content with the video content objects and thereby creating an interactive video.
  • the region is a portion of the video frame which is congruent with the underlying video object.
  • the region definition data defines such portion of the video frame.
  • the system 100 of the present invention consists of several distinct yet related components.
  • the video object inventory 114 is a database containing region definition data in form of pointers or references to video objects within video or animation files.
  • the region definition data is used to make the video interactive by providing the ability to link supplemental information with a specific video object within a video. For example, the video viewer can select a car displayed within the video and learn the make and model of the car and other supplemental information.
  • the invention associates the supplemental information with the video object thereby making the object a selectable hyperlink.
  • recognized video objects are represented by 3D vector graphics data such as wire frame models (FIG. 8D).
  • the representation is created by computing the difference between perspective views.
  • These 3D wireframe models may be used to improve the recognition of video objects, and may also be used to represent the video objects as outlines of perspective views of the three-dimensional wireframe (FIG. 8E). Such an embodiment may have computational benefits.
  • video objects are associated with meta-data and/or an object description which enables users (e.g., advertisers) to search for all instances of "automobile" and the search results will include "car" as well.
  • the object description may be part of a semantic network which allows auction bidders to specify the object and the video context on which they want to bid. This may be useful for preventing exposure of an advertiser's brand in contexts which are not appropriate for the brand.
  • video object as used in this specification refers to a video frame component, e.g., a car, a runner or a dog which appears in the video or animation.
  • motion attributes of video objects in the database, such as fast, slow, up, down, etc., will be indexed, which will allow auction bidders to specify their bids with motion attributes.
  • the states of objects will be indexed, such as by means of facial expression algorithms already known in the art which extract the state of a person in a video, such as happy or sad.
  • the video object database includes detailed information for identifying the location, shape and movement of the video object within the video file.
  • the video object inventory may include detailed descriptions of the specific object in the video content such as manufacturer, make and model. As will be explained in further detail below, this detailed information may be used to link information such as advertising content with the video objects.
  • the video objects may be manually, semi-automatically or automatically identified and associated with relevant information.
  • a further aspect of the invention relates to the creation of a video object market exchange (VOME) in which bidders (advertisers) bid for the right to associate their advertising content with a given video object.
  • the invention also enables a video content owner to only auction a certain portion of the video object inventory and sell the rest directly to an inventory buyer without the public bidding process.
  • a further aspect of the invention relates to the creation of a video overlay which transforms "passive" video (video which the viewer passively watches but does not interact with) into interactive video, in which the viewer interacts with regions of the video by selecting or rolling over a region, thereby triggering the display of advertising content associated with the object.
  • the present invention segments video and animation content into its objects and stores region definition data such as shape, x, y, and temporal coordinates; in the case of volumetric video or volumetric animation, the invention stores shape, x, y, z, and temporal coordinates.
  • temporal coordinate refers to time, video frame or the like.
  • video frame is intended to convey an instantaneous (still) image frame of the video or animation at a particular time (location within the video stream). All of these coordinates are necessary to specify the video objects within a frame at a given moment in time.
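The stored coordinates can be pictured as one record per object occurrence; the record and field names below are an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class ObjectOccurrence:
    """One occurrence of a video object: its shape outline plus
    spatial and temporal coordinates (z is used for volumetric
    video or volumetric animation)."""
    object_id: str
    frame: int                         # temporal coordinate
    shape: list                        # outline vertices [(x, y), ...]
    x: float
    y: float
    z: float = 0.0

occ = ObjectOccurrence("car_1", frame=120,
                       shape=[(0, 0), (4, 0), (4, 2), (0, 2)],
                       x=40.0, y=60.0)
```

Together these fields fully specify the video object within a frame at a given moment in time, as the passage above requires.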
  • An object of the present invention is to take conventional video content analysis technology, such as that currently used to identify a person within a crowd or to identify/inspect a widget on a conveyor belt, and apply it to the field of marketing communication, advertising and commerce transactions. More particularly, it is an object of the invention to identify video objects of interest within video and animations.
  • the identified video objects or content objects may be used to populate a video content inventory 114 used in an advertising market exchange.
  • video may be activated or made interactive using region definition data linking video objects with advertising content.
  • FIG. 1 is a high-level flow diagram of the method of a first embodiment of the present invention.
  • a video is segmented and classified using an automated segmentation and classification application to create a list or table of objects.
  • the segmentation process 700 yields a list of video objects throughout the video (including the temporal coordinates and region definition data for each object) and the classification process 710 matches occurrences of the same object in different frames of the video thereby eliminating duplication/redundancy.
  • the location, size and shape of a video object can and usually will vary throughout a video.
  • the size of an object varies depending on its proximity to the camera, which will vary as the object moves throughout the video.
  • the shape of an object may vary depending on the perspective or vantage point from which it is seen, e.g. frontal view versus side view.
  • the system of the invention is able to segment and classify a moving object.
  • the location of the video object dynamically changes as the underlying object moves, which is represented in the region definition data.
  • In step 720, the video objects are compared with objects in an object library, which may be 3D wire frame data representing objects within the video. Perspective views from such 3D wire frame models may be advantageous to the automatic object recognition process. This step is optional. If a match is detected, then the object is associated with the product description and/or meta-data from the object library. An unmatched video object may be discarded or subjected to a secondary processing application, and/or an analyst may manually identify/configure the object boundaries (step 730), and then the object may be subjected to another classification step.
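The optional library-matching step can be sketched as a nearest-neighbour lookup with a confidence threshold; unmatched objects fall through to secondary processing or manual review. The toy similarity measure and entry layout are assumptions, not the patent's method:

```python
def similarity(a, b):
    """Toy similarity: 1 minus normalised L1 distance between features."""
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def match_against_library(video_objects, library, threshold=0.8):
    """Associate each segmented video object with the best-matching
    library entry's description; below-threshold objects are returned
    separately for secondary processing or manual identification."""
    matched, unmatched = [], []
    for obj_id, feature in video_objects:
        best = max(library, key=lambda e: similarity(feature, e["feature"]))
        if similarity(feature, best["feature"]) >= threshold:
            matched.append((obj_id, best["description"]))
        else:
            unmatched.append(obj_id)
    return matched, unmatched

library = [
    {"description": "sedan", "feature": [1.0, 0.0]},
    {"description": "tree",  "feature": [0.0, 1.0]},
]
matched, unmatched = match_against_library(
    [("obj1", [0.9, 0.1]), ("obj2", [0.5, 0.5])], library)
```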
  • In step 740, the video objects are published to the market exchange and subjected to an automated auction process.
  • In step 745, a video overlay is created which links the video object with the advertiser-provided content, and in step 750 the video overlay with region definition data is transmitted to the video broadcaster 120.
  • A video viewer interacts with the video overlay by rolling over or selecting a video region, thereby triggering the display of advertising content associated with the video object. It should be appreciated that rolling over may elicit the display of different advertising content than that displayed when the object is selected. For example, selecting an object may trigger more detailed information than that displayed when the object is simply rolled over with the pointing device.
  • the VOME 300 records the viewer interaction with the video objects and updates the viewer's behavioral profile. It should be noted that the video viewer's interactions with the video overlay (and the video objects) generally triggers the display of advertising content from the VOME 300. This enables the content associated with a video object to be updated on-the-fly without the need to alter the video overlay. The video overlay makes the video interactive by making video regions selectable hyperlinks, but the actual content comes directly from the VOME 300.
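Because content is fetched from the VOME at interaction time, the advertising linked to an object can be swapped without touching the overlay. A minimal sketch of this behaviour (the dict layout, content strings, and function name are illustrative):

```python
# Maps object id -> advertising content currently owned by the winning
# bidder; editing this mapping changes what viewers see on the fly,
# without regenerating the video overlay.
current_rights = {"car_1": "Spring sale on sedans"}
viewer_profile = {"interactions": []}

def on_object_selected(object_id):
    """Record the interaction in the viewer's behavioral profile and
    return the content currently linked to the selected object."""
    viewer_profile["interactions"].append(object_id)
    return current_rights.get(object_id, "")

first = on_object_selected("car_1")
current_rights["car_1"] = "New weekend offer"   # updated on the fly
second = on_object_selected("car_1")
```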
  • the VOME 300 completes a sales transaction initiated by the viewer's interactions with the video region representing the video object. As will be explained below in further detail, the VOME 300 may enable the viewer to complete a sales transaction.
  • steps 770 and 780 may be reversed without impacting the results.
  • the listing of steps in a particular order should not be read as a limitation to a specific order unless it is clear from the context that a particular order is required.
  • the method of the invention has been described with reference to video objects.
  • the invention also pertains to the identification of events within a video, where an event is defined as an object moving through space, for example a person walking or a car driving. Even if the object is not defined, the event can still have characteristics, such as high velocity, which might be of value to advertisers.
  • In object-based animation, each object is defined by its vector graphic class. Consequently, the analyst does not teach the system to recognize objects, but rather describes the objects of interest. For each described object, the system stores Object Information useful for identifying each occurrence of the object in the animation data.
  • Object-based animations such as Adobe Flash or Java maintain object and event information. Other animation techniques lose references to objects and events during rendering. Once the references are lost, object and event recognition techniques must be applied just as in regular pixel-based video.
  • video content analysis technologies are used to identify objects based on size, shape, color, color density etc.
  • the present invention is not limited to any particular method for identifying content objects within video and several different methods are discussed in detail below.
  • Analysts may manually train the segmentation application 106 to recognize an object by, for example, tracing the outline of a video object, or the system may present patterns of content objects it found by statistical pattern recognition.
  • a video content analysis or video analytics application 106 which is explained below in further detail automatically detects additional occurrences of the identified object in the video.
  • the video content analysis application 106 may be provided with an object information library 112 containing 3D wire frame models or characteristics for identifying one or more pre-defined objects from one or more differing perspectives, or the video content analysis application 106 may be provided with heuristics for identifying objects.
  • For each identified occurrence of an object, the VOME 300 stores information for identifying the frame (temporal coordinate) and region definition data (location within the frame, e.g., x, y, and z coordinates) in which the object appears. Using the region definition data, the VOME 300 is able to dynamically track an object.
  • the size, shape and location of the selectable region corresponds to the size, shape, and location of the underlying object.
  • the auction is automatically triggered when a viewer accesses or requests access to video content.
  • the auction may be triggered by expiration of an advertiser's right to associate advertising with a given video object.
  • the auction may further be triggered each time video objects are added to the inventory of video objects or on a periodic basis, e.g., every hour, day, or week.
  • the advertiser may search the database 114 of video objects (object inventory database) and purchase the rights to associate content with an object, thereby bypassing the auction process, or may signal interest in participating in an auction by submitting an opening bid. Moreover, the advertiser may advise the VOME 300 of particular market segments, demographics, user behavioral profiles or the like which it is interested in bidding on.
  • the advertiser 122 may be provided with viewer profile information pertaining to the video viewer 124 who triggered the auction, such as taught in US Patent Number 6,718,551, entitled "Method and system for providing targeted advertisements," which is hereby incorporated by reference. It should be noted that the viewer profile information is available because the video viewer 124 triggers the auction by requesting access to the video.
  • the viewer profile may be a multifaceted viewer profile identifying, among other things, the viewer's click history, purchasing habits, social network, history of geographic locations, browsing and search habits, and/or additional demographic data.
  • the multifaceted viewer profile may be compiled, inter alia, from cookies stored on the viewer's computer, or from third party information of the viewer.
  • the multifaceted viewer profile information may be used in determining the relative value of a given viewer for a given bidder (advertiser).
  • In addition to providing the bidders with viewer profile information, the VOME 300 provides a list of the objects contained in a video or set of videos.
  • According to one embodiment, the VOME 300 solicits bids on an individual basis for rights to associate advertising content with one or more of the video objects contained in the video accessed by the viewer. Thus, different advertisers may own temporary rights to different video objects in a given video.
  • Push-advertising is advertising which is displayed on the top (banner) or the side of the viewer's display screen. Push-advertising is pushed to the viewer, i.e., the viewer does not specifically request the advertising.
  • the video viewer pulls the advertising content by interacting with specific regions representing video objects within a video.
  • the viewer may point to a video frame, which causes the video to slow down, and select or roll over a video object within the video, thereby triggering the VOME 300 to display contextual information linked or associated with the object as a pop-up, overlay, or in a field next to the video player.
  • the VOME 300 may combine the pull advertising with conventional push-advertising.
  • the VOME 300 may push advertising content which relates to the objects as they appear in the video, or the VOME 300 may push advertising relating to the type of objects with which the viewer has interacted, e.g., objects which the viewer has rolled over or selected.
  • the VOME 300 may provide third parties such as advertisers 122 with the profile of an actual video viewer in real-time before making the bid.
  • the VOME 300 may simply auction rights to the video content objects for each of a plurality of market segments.
  • the VOME 300 may segment the market by a combination of age, gender, income, region, spending habits, etc. If the auction occurs prior to access by the video viewer 124, it will not be possible to provide the advertisers (bidders) with actual viewer profile information, and the VOME 300 will auction the rights by market segment.
  • the term automatic as used herein refers to actions which take place without human intervention.
  • the auction is initiated by the VOME 300 simply by the addition of new content to the inventory 114 or the expiration of previously auctioned rights etc.
  • the VOME 300 automatically segments video files and automatically classifies the video objects.
  • the advertiser's server may include an automated bidding application which automatically submits bids to the VOME 300.
  • the processing of video to create activated video objects, and the addition of such video objects to the inventory may itself occur without human intervention.
  • the VOME 300 may according to some embodiments be a fully automated system.
  • FIG. 2 is a block diagram of a first embodiment of the system 100 of the invention.
  • System 100 includes a database 102 of video content whose rights are owned by a broadcaster 120 or the like.
  • the term "broadcaster" simply refers to the party who owns the rights to the video content and makes it available to viewers 124 via interactive TV or streaming websites.
  • the database 102 resides on a broadcaster server 200 (FIG. 2) which may be accessible over a distributed network 104 such as the Internet.
  • Server 200 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, Read-Only-Memory (ROM) 208 and Random Access Memory (RAM) 210 (which may be volatile or nonvolatile).
  • the database 102 may be stored in RAM 210, ROM 208, or mass storage device 206.
  • Accessory devices such as keyboard 212, touch screen 214 which serves both as a keyboard and a display, display device 216, and pointing device (mouse) 218 may optionally be connected to the server 200.
  • the database 102 contains unprocessed or raw video content which is accessed by a video content segmentation and classification engine 106 hereinafter referred to as a content analysis application.
  • raw video content refers to video which has not been processed to identify objects.
  • the database 102 is shown as copied to database 108; however, copying of the database 102 is optional.
  • Database 108 resides on a video segmentation and classification server 300 (FIG. 2) which may be accessible over a distributed network such as the Internet 104.
  • accessing the database 102 should be understood to be synonymous with accessing database 108 and vice versa.
  • Server 300 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, Read-Only-Memory (ROM) 208 and Random Access Memory 210 (which may be volatile or nonvolatile).
  • the video file database 108 may be stored in RAM 210, ROM 208, or mass storage device 206.
  • Accessory devices such as keyboard 212, touch screen 214 which serves both as a keyboard and a display, display device 216, and pointing device (mouse) 218 may optionally be connected to the server 300.
  • An inventory 114 of video objects is assembled by segmenting and classifying the raw video content from database 108 (or 102) to identify video objects therein.
  • the video content analysis application 106 segments the raw video content to yield a list of all the video objects in a given video. Then the video content analysis application 106 classifies the list of video objects to resolve occurrences of the same video object throughout the video.
  • VOME 300 may be provided with separate software applications for performing segmentation and classification, or a single software application may perform both segmentation and classification.
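The two-stage pipeline described above (segmentation yielding per-frame objects, then classification resolving repeat occurrences of the same object) can be sketched as follows. The data structures, names, and the similarity rule are illustrative assumptions for this sketch, not the patent's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Detection:
    """One segmented video-object occurrence in a single frame."""
    frame: int
    bbox: tuple    # (x, y, w, h) location within the frame
    feature: tuple # appearance feature, e.g. a coarse colour signature

@dataclass
class VideoObject:
    """All occurrences of one classified object across the video."""
    object_id: int
    detections: list = field(default_factory=list)

def classify(detections, same_object):
    """Group per-frame detections so that repeat appearances of the
    same real-world object share one object_id."""
    objects = []
    for det in detections:
        for obj in objects:
            if same_object(obj.detections[-1], det):
                obj.detections.append(det)
                break
        else:
            objects.append(VideoObject(len(objects), [det]))
    return objects

# Toy similarity rule: identical features imply the same object.
same = lambda a, b: a.feature == b.feature
dets = [Detection(0, (10, 10, 5, 5), ("red",)),
        Detection(1, (12, 11, 5, 5), ("red",)),
        Detection(1, (40, 40, 8, 8), ("blue",))]
objects = classify(dets, same)
print(len(objects))                # 2 distinct objects
print(len(objects[0].detections))  # the red object appears twice
```

Whether segmentation and classification live in one application or two, the output is the same: a list of classified object tracks ready for the inventory 114.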
  • a method for providing active regions for an interactive layer for a video application comprising accessing video data that defines a plurality of frames showing a plurality of video objects, each video object being shown in a sequence of frames, and generating region definition data using video object recognition algorithms, including video object segmentation and classification.
  • Such region definition data defines a plurality of regions, each region corresponding to one of the plurality of video objects, wherein the outline of each region defined by the region definition data matches the outline of the corresponding video object as it is shown in the sequence of video frames.
  • the outline of each region dynamically changes in the sequence of frames to match changes in at least one of the perspective and the size and the angle of view in which the corresponding video object is shown in the sequence of frames.
  • region definition data is used to define a plurality of active regions for interactive video viewing.
  • the frames are shown to a user on a display as a video, and the region definition data is used to determine whether a user action directed to a location of at least one of these frames addresses one of the active regions.
  • the region definition data for at least one region includes a three-dimensional wireframe representation of the video object that corresponds to the region.
  • the region definition data for the region further contains, for at least one frame of the sequence of frames in which the corresponding video object is shown, data defining a perspective view of the three-dimensional wireframe representation, wherein the outline of the perspective view of the three-dimensional wireframe representation defines the outline of the region for the frame.
  • the region definition data for the region further contains, for at least one pair of frames of the sequence of frames in which the corresponding video object is shown, data defining a change of the three-dimensional wireframe representation between the frames of the pair of frames.
  • the three-dimensional wireframe representation includes a plurality of nodes, and the data defining the change includes data that defines a displacement of a position of at least one node with respect to at least another node.
  • the data defining the change includes data that defines a change in at least one of the size and spatial orientation of the 3D wireframe representation.
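One way to read the wireframe-based region definition above is that each frame's region outline is recovered by projecting the 3D nodes onto the image plane, and that delta data encodes node displacement between frame pairs. The following is a minimal sketch under those assumptions (orthographic projection, and an axis-aligned outline standing in for a true contour):

```python
def project(nodes, scale=1.0, offset=(0.0, 0.0)):
    """Orthographic projection of 3D wireframe nodes onto the image plane."""
    ox, oy = offset
    return [(x * scale + ox, y * scale + oy) for x, y, _z in nodes]

def apply_change(nodes, displacements):
    """Per-node displacement between a pair of frames (delta coding)."""
    return [(x + dx, y + dy, z + dz)
            for (x, y, z), (dx, dy, dz) in zip(nodes, displacements)]

def bounding_outline(points):
    """Axis-aligned outline of the projected region; a stand-in for
    the true contour of the perspective view."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Unit cube wireframe; the region outline follows its projection.
cube = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
        (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
frame1 = bounding_outline(project(cube, scale=10.0))
moved = apply_change(cube, [(0.1, 0.0, 0.0)] * 8)  # object drifts right
frame2 = bounding_outline(project(moved, scale=10.0))
print(frame1)  # (0.0, 0.0, 10.0, 10.0)
```

A change in size or spatial orientation of the wireframe would likewise be applied to the nodes before projection, moving the outline accordingly.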
  • the video content analysis application 106 may access an object information library 112 which is a database stored on or accessible to server 300.
  • the object information library 112 may be stored on a memory device such as memory device 206 and/or RAM 210 used to store the program instructions for the video content analysis application 106.
  • the library 112 stores images of objects from different viewing angles or 3D models of the objects. The image information may be used as the index or key to link descriptive information with the video object.
  • the library 112 further contains one or more of an object identifier, label, and/or meta-data description of the video object which may be used to describe the video content object to prospective bidders.
  • the content analysis application 106 may utilize logic to identify video content objects without recourse to object information library 112.
  • Applicant hereby incorporates by reference U.S. Patent No. 6,625,310 entitled "Video segmentation using statistical pixel modeling" which discloses one of many methods for segmenting video data into foreground and background portions which utilizes statistical modeling of the pixels.
  • a statistical model of the background is built for each pixel, and each pixel in an incoming video frame is compared with the background statistical model for that pixel. Pixels are determined to be foreground or background based on the comparisons.
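The per-pixel statistical comparison just described can be illustrated with a simple running Gaussian model per pixel. This toy version (plain Python, a single grey channel, and a running-mean update chosen for brevity) only sketches the idea and is not the method claimed in the incorporated patent:

```python
import math

def foreground_pixels(frame, bg_mean, bg_var, k=2.5):
    """Pixels deviating from their per-pixel background statistics by
    more than k standard deviations are labelled foreground."""
    fg = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if abs(value - bg_mean[y][x]) > k * math.sqrt(bg_var[y][x]):
                fg.append((x, y))
    return fg

def update_model(bg_mean, bg_var, frame, alpha=0.05):
    """Exponential update of each pixel's mean and variance."""
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            diff = value - bg_mean[y][x]
            bg_mean[y][x] += alpha * diff
            bg_var[y][x] = (1 - alpha) * (bg_var[y][x] + alpha * diff * diff)

# Static grey background with one bright moving blob.
mean = [[50.0] * 4 for _ in range(4)]
var = [[4.0] * 4 for _ in range(4)]
frame = [row[:] for row in mean]
frame[1][1] = frame[1][2] = 200.0
fg = foreground_pixels(frame, mean, var)
print(fg)  # [(1, 1), (2, 1)]
update_model(mean, var, frame)  # background adapts slowly to the scene
```

The foreground mask from such a comparison is the raw material from which video objects are segmented frame by frame.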
  • Applicant hereby incorporates by reference U.S. Patent No. 6,462,754 entitled "Method and apparatus for authoring and linking video documents" which discloses an authoring method for video documents, involving creating an anchorable information unit file based on boundaries of objects of interest, such that objects of interest are used to identify portions of video data.
  • Applicant hereby incorporates by reference U.S. Patent No. 7,325,245 entitled "Linking to video information" which discloses a system which enables dynamic linking between a variety of video formats including television broadcasts, web pages, and video displays which are stored on magnetic or optical media. Each frame of the video information is identified together with a plurality of locations within that frame. The locations selected by the user, for example using a pointing device, are then used to access associated information either within the system itself or on an external system.
  • Applicant hereby incorporates by reference U.S. Patent Publication 20080294694 entitled "Method, apparatus, system, medium, and signals for producing interactive video content" which discloses a method for producing interactive video content on a content publisher computer.
  • the method involves associating indicia with at least one image portion in the video content, the indicia being operably configured to follow the at least one image portion as a display position of the image portion changes in the video content.
  • the method also involves associating an image portion identifier with the indicia, and associating link properties with the indicia, the link properties being operable to cause transmission of a content location request to a registration server in response to selection of the indicia by a viewer of the interactive video content.
  • the content location request includes the image portion identifier.
  • the inventory 114 may be created by the content analysis application 106 with the assistance and/or review of a human analyst 110.
  • the analyst 110 may manually identify a given instance of a video object by, for example, viewing a still image of the video and tracing the video object.
  • an analyst 110 may review and refine the boundaries of an unmatched video object, and then subject the object to a second round of classification and/or second round of matching the object with objects in the object library.
  • the analyst 110 may review and edit objects which were automatically identified by the content analysis application 106.
  • the video object inventory 114 is stored on a storage device which is either accessible over the distributed network 104 (internet) or a copy of the database 114 is made accessible over the network.
  • video objects are used to create selectable regions (hyperlinks) which dynamically track the movement, size and position of the object throughout the video.
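A dynamic hyperlink of this kind reduces to per-frame hit-testing against the object's tracked region. A minimal sketch, where rectangular regions and a frame-indexed track are simplifying assumptions:

```python
def build_overlay(track):
    """track maps frame index -> (x, y, w, h) region for one object.
    Returns a hit-test function standing in for the dynamic hyperlink."""
    def hit(frame, px, py):
        region = track.get(frame)
        if region is None:
            return False
        x, y, w, h = region
        return x <= px <= x + w and y <= py <= y + h
    return hit

# The selectable region follows the object from frame to frame.
track = {0: (10, 10, 20, 20), 1: (35, 10, 20, 20)}
hit = build_overlay(track)
print(hit(0, 12, 12))  # True: the click lands on the object in frame 0
print(hit(1, 12, 12))  # False: the object has moved away
```

In the full system the region outlines come from segmentation (or wireframe projection) rather than fixed rectangles, but the viewer-side test is the same.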
  • a VOME 300 auctions advertising rights to video content objects stored in the inventory database 114 to advertisers 122.
  • the auction is performed by automated auction application 126 on VOME server 300 which communicates with an automated bidding application on the advertiser server 500.
  • auction application 126 is a software application executed on processor 202 and stored on one of mass storage device 206, ROM 208 and RAM 210.
  • the auction application 126 auctions rights to associate content with a video object.
  • the auctioned rights may be time limited, i.e., rights which expire after a pre-defined amount of time.
  • Auction application 126 may include logic for automatic billing and/or settlement of bids.
  • the auction application 126 stores auction information identifying the owner of rights to associate content with an object, the duration of such rights, the content to be associated with the object, and billing information. See, FIG. 7.
  • the auction information is stored in an auction information database on server 300. More particularly, auction information database is stored on one of mass storage device 206, ROM 208 and RAM 210.
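The auction record just described (owner, duration of rights, associated content, billing information) might be modelled as follows; the class and field names are hypothetical, chosen only to mirror the fields listed above:

```python
import time

class AuctionRecord:
    """One auctioned right: who owns it, what content it attaches to
    the object, how it is billed, and when it lapses."""
    def __init__(self, object_id, owner, content_url, billing_id,
                 duration_s, now=None):
        self.object_id = object_id
        self.owner = owner
        self.content_url = content_url
        self.billing_id = billing_id
        self.expires_at = (now if now is not None else time.time()) + duration_s

    def expired(self, now=None):
        return (now if now is not None else time.time()) >= self.expires_at

rec = AuctionRecord("obj-42", "advertiser-122", "http://ads.example/shirt",
                    "bill-7", duration_s=3600, now=1000.0)
print(rec.expired(now=2800.0))  # False: rights still active
print(rec.expired(now=8200.0))  # True: rights have lapsed
```

An expired record is exactly the condition that can re-trigger an automatic auction for the object.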
  • the VOME server 300 includes an object association application which creates a video overlay used to associate advertising content received from the advertiser 400 with the video objects.
  • the overlay is supplied by the VOME server 300 to the broadcaster 120 and in turn from the broadcaster 120 to the viewer 124 along with the underlying video.
  • the advertiser 122 uses a computer or server 500 (FIGs. 2, 3) to bid on the right to associate content with a video object.
  • Computer 500 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, ROM 208 and RAM 210 (which may be volatile or nonvolatile).
  • An automated bidding application executes on processor 202 and may be stored on one or more of the ROM 208, RAM 210, and mass storage 206.
  • the automated bidding application communicates auction bids to the automated auction application on the VOME 300.
  • the automated bidding application is responsive to information from VOME 300 describing the video object(s) being auctioned.
  • the use of video content objects transforms raw video into interactive video content.
  • the viewer 124 uses a computer 400 (FIG. 3) to access the video content made available by broadcaster 120 on a website or the like accessible over a distributed network such as the Internet.
  • Computer 400 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, Read-Only-Memory (ROM) 208 and Random Access Memory (RAM) 210 (which may be volatile or nonvolatile).
  • a web browser executes on the processor and is used to access websites on the Internet.
  • the viewer 124 interacts with the video overlay by selecting or rolling over a region representing a video object using a conventional pointing device 218, and/or using a touch sensitive screen 214 such as is known in the art. Interaction by the video viewer 124 triggers display of supplemental content such as advertisements.
  • the advertiser 122 is bidding on the right to supply the advertising content.
  • the auction of advertising rights may be automated.
  • the VOME 300 may request a minimum starting bid and specify bidding increments, and each advertiser 122 may provide automated bids for viewers matching specified criteria up to a pre-determined maximum bid.
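The minimum-bid, increment, and maximum-bid scheme above behaves like proxy bidding: the winner pays one increment above the runner-up's ceiling, never more than the winner's own maximum. A sketch under that assumption (advertiser names hypothetical):

```python
def resolve_auction(min_bid, increment, max_bids):
    """Proxy auction: each advertiser submits a maximum bid; the winner
    pays one increment above the runner-up's maximum (or the starting
    bid), capped at the winner's own maximum."""
    eligible = {a: m for a, m in max_bids.items() if m >= min_bid}
    if not eligible:
        return None, None
    ranked = sorted(eligible.items(), key=lambda kv: kv[1], reverse=True)
    winner, top = ranked[0]
    if len(ranked) == 1:
        return winner, min_bid
    runner_up = ranked[1][1]
    return winner, min(top, runner_up + increment)

bids = {"advertiser-A": 120, "advertiser-B": 95, "advertiser-C": 40}
print(resolve_auction(min_bid=50, increment=5, max_bids=bids))
# ('advertiser-A', 100): A wins at B's ceiling plus one increment
```

Because both sides are automated, such a resolution can run in real time between a viewer's page request and the start of playback.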
  • the auction of advertising rights to a video (including all of the video objects therein) or to individual video content objects is triggered when a video viewer 124 accesses the broadcaster's website and/or requests access to video content accessible therethrough.
  • the broadcaster 120 is able to provide viewer profile information for the video viewer 124 to the advertiser 122.
  • the viewer profile information may, for example, contain information regarding websites previously accessed by the viewer 124 and/or information regarding the purchasing habits of the viewer 124.
  • the end product is a database (video content inventory) 114 listing the coordinates (frame and sub frame) and semantic model for each identified object within a given media presentation (movie clip).
  • This inventory 114 may be offered on an advertising market exchange (VOME) for advertisers to bid on. Advertisers will bid on inventory based on contextual information, the multifaceted viewer profile of the viewer viewing the video content, and the inventory description of the video.
  • the advertiser may decide to push an overlay message content on the video object while a user with a certain multifaceted user profile views it.
  • the interaction of a viewer with video objects may be used to refine the messages pushed to the viewer in the same way as search terms are currently used to refine messages to users while searching for something.
  • FIG. 4 is a flowchart of how an advertiser interacts with the VOME 300.
  • the advertiser deploys a search of the video content inventory 114 based on inventory descriptors or may submit images of products that he would like to purchase inventory rights to.
  • the use of a semantic search as opposed to a more rudimentary keyword search is preferred because the semantic search is able to cope with the variations in descriptor information.
  • In step 802 the VOME 300 returns a list of objects and object classes matching the advertiser's search, and the advertiser aligns the search results with the advertiser's media strategy and budget.
  • In step 804A the advertiser simply chooses to purchase the inventory identified in step 802.
  • In step 804B the advertiser specifies to the VOME 100 the items which the advertiser is interested in bidding upon during the real-time auction.
  • the advertiser may specify a starting bid and/or a maximum bid.
  • the VOME 100 may specify the starting bid and incremental increases in the bid, and the advertiser merely specifies a maximum bid.
  • FIGs. 5 and 6 are flowcharts showing the interactions of a viewer with the VOME 300.
  • In step 600 a viewer searches or browses for video content.
  • the viewer is presented with advertising content (contextual advertising) based on the search.
  • In step 604 the viewer selects a video to view, and in step 606 the contextual advertising is refined in relation to the selected video.
  • In steps 608A and 608B the viewer is viewing the selected content (1700 in FIG. 6) and encounters video objects of interest.
  • pointing at the frame by, e.g., bringing pointer 1701 of pointing device 218 into video frame 1703 will cause the video to slow down, allowing the viewer to select an object.
  • the viewer can use a variety of pointing means including, but not limited to, a virtual pointer of the type popularized by the Nintendo Wii® which utilizes a glove or the like with sensors capable of determining X, Y, and Z coordinates.
  • In step 608A the viewer merely tags the objects of interest for later review (1702 and 1704 in FIG. 6).
  • In step 610 the contextual advertising is once again refined (this time in relation to the objects of interest) and the behavioral profile of the viewer is updated.
  • Steps 608A and 610 may be repeated any number of times during the viewing of the video.
  • the viewer reviews the list of tagged items from step 608A and either jumps back to the scenes in which the items appear (step 614A and 1704 in FIG. 6) or learns more about the items selected, e.g., price, features, etc. (step 614B).
  • In step 616 the viewer selects one or more objects (products) to purchase (from the tagged or identified objects), and in step 618 the viewer completes the transaction (1708 in FIG. 6).
  • Step 608B is an alternative to step 608A and presents the viewer with the option to immediately jump to 614 and learn more about the object.
  • the information associated with the video object may be displayed as an overlay pop-up or in a field next to the video player. Each time the viewer interacts with video objects his/her profile gets updated in the database.
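The per-interaction profile update described above can be sketched as a simple interest counter. The class and method names here are illustrative assumptions; a real multifaceted profile would combine many more signals (browsing history, purchases, demographics):

```python
from collections import Counter

class ViewerProfile:
    """Interest counters refined on every video-object interaction."""
    def __init__(self):
        self.interactions = Counter()

    def record(self, object_class):
        """Called whenever the viewer rolls over, tags, or selects an object."""
        self.interactions[object_class] += 1

    def top_interests(self, n=3):
        """Most-interacted object classes, used to refine pushed messages."""
        return [cls for cls, _ in self.interactions.most_common(n)]

profile = ViewerProfile()
for obj in ["shirt", "car", "shirt"]:  # a short interaction history
    profile.record(obj)
print(profile.top_interests(2))  # ['shirt', 'car']
```

The top interests are what the exchange can expose to bidders, in the same way search terms refine messages during web search.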


Abstract

Disclosed is a method and system for automatically segmenting and classifying video content into objects. The objects are used to create selectable hyperlinks in the video which dynamically track the position of the object in the video. Also disclosed is a method and system for associating video content objects within video, animations and visual data streams with contextually relevant information, for connecting such video content objects to an advertisement market exchange and multifaceted viewer profile, and for making these objects interactive for pull and push viewer interaction.

Description

AUTOMATED PROCESS FOR SEGMENTING AND CLASSIFYING VIDEO OBJECTS AND AUCTIONING RIGHTS TO INTERACTIVE VIDEO OBJECTS
Claim for Priority
This application claims priority to U.S. Provisional Patent Application Serial Number 61/034470 filed March 6, 2008 entitled "Method For Creating And Activating A Video Content Inventory, And A Method For Creating An Advertising Market Exchange Using The Same."
Field of the Invention
[0001] The present invention relates to a system for automatically segmenting and classifying video content objects within a video, auctioning rights to associate advertising content with the video objects, and creating an overlay which associates advertising with select video objects and enables a video viewer to interact with video objects in the video.
Background
[0002] Video is the technology of electronically capturing, recording, processing, storing, transmitting, and reconstructing a sequence of still images representing scenes in motion. Video technology was first developed for television systems, but has been further developed in many formats to allow for viewer video recording. Motion pictures on film can be converted into video formats. Video can also be viewed through the Internet as video clips or streaming media clips on computer monitors.
[0003] Animation is the rapid display of a sequence of images of artwork or model positions in order to create an illusion of movement. It is an optical illusion of motion due to the phenomenon of persistence of vision, and can be created and demonstrated in a number of ways. The most common method of presenting animation is as a motion picture or video, although several other forms of presenting animation also exist.
[0004] Video content segmentation is the systematic decomposition of a motion picture frame into its objects (components) such as a person, a shirt, a tree, a leaf, etc. Segmenting video content results in a large number of objects with little value if not classified.
[0005] Classification is the process of assigning an object in one frame to the same class as the same object in another frame. It enables the automated recognition that a specific red shirt in one frame is the same as the red shirt in another frame. There are several approaches to assigning video objects to the class they belong to, such as by the contours of their appearances in successive video frames. For example, this may be done by matching curvature features of the video object contour to a database containing preprocessed views of prototypical objects. See Attachment 1, entitled MOCA Project: Object Recognition.
[0006] For each two-dimensional appearance of an object in a video frame, curvature features of its contour are calculated. These features are matched to those of views of prototypical video objects stored in a database. By applying context rules such as "a house may have a car in the frame or may have a tree in the frame but does not have a TV in the frame" the accuracy can be increased. The final classification of the object is achieved by integrating the matching results for successive frames.
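The contour-matching idea in [0006] can be illustrated with a toy curvature feature. The turning-angle feature and nearest-prototype distance below are simplifications chosen for brevity, not the MOCA algorithm itself:

```python
import math

def turning_angles(contour):
    """Curvature-like feature: signed turning angle at each vertex of a
    closed polygonal contour, sorted for crude rotation invariance."""
    n = len(contour)
    angles = []
    for i in range(n):
        x0, y0 = contour[i - 1]
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        angles.append((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)
    return sorted(angles)

def best_match(contour, prototypes):
    """Nearest prototype by summed difference of curvature features.
    (zip truncates to the shorter feature list; a toy simplification.)"""
    feat = turning_angles(contour)
    def dist(name):
        proto = turning_angles(prototypes[name])
        return sum(abs(a - b) for a, b in zip(feat, proto))
    return min(prototypes, key=dist)

prototypes = {"square": [(0, 0), (2, 0), (2, 2), (0, 2)],
              "triangle": [(0, 0), (3, 0), (1, 2)]}
shifted_square = [(5, 5), (7, 5), (7, 7), (5, 7)]  # same shape, translated
print(best_match(shifted_square, prototypes))  # square
```

In the approach described above, such per-frame matches are then integrated across successive frames, with context rules filtering out implausible candidates.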
[0007] There are several paradigms and algorithms for video segmentation and classification. Most are based on segmenting video into layers such as a static background layer and a dynamic foreground layer and using multiple cues, such as spatial location, color, motion, contours and depth discontinuities, etc.
[0008] Rotoscoping is an animation technique in which animators trace over live-action film movement, frame by frame, for use in animated films.
[0009] By shooting video from several perspectives with synchronized cameras, video segmentation algorithms can be used to automatically reconstruct 3D wireframes of moving objects.
[00010] In one embodiment of the invention automatic rotoscoping techniques are applied to videos which have been shot from multiple camera angles to reconstruct the 3D objects and save their wireframes into the video object database. When a viewer selects an object for which 3D information is available, the viewer is presented with a means to control the animation of the 3D object, such as rotate, move, scale, etc.
[00011] An object of the invention is to provide an automated system for segmenting raw video to create an inventory of video objects which may be used to make the video interactive, and to auction these video objects to advertisers. The invention is not tied to any specific method of segmenting or classifying video content objects.
[00012] In one embodiment of the invention an object information library, which contains descriptive information and/or meta-data regarding objects which may appear in the video, is used to associate meta-data such as product information, unique product identification information, or stock keeping units with the segmented video objects.
[00013] A further object of the invention is to create an advertising market exchange whereby rights to an inventory of video objects are automatically auctioned to a third party such as an advertiser.
[00014] Summary of the Invention
[00015] Disclosed is a system for automatically segmenting and classifying video content into objects and auctioning the same, comprising a video segmentation and classification server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; the video segmenting and classification server including one or more video files stored in a video database; an object information library stored on one of the random access memory, read-only memory, and mass storage memory, the object information library containing object information used to identify objects within the video files and at least one of descriptive information and semantic information used to describe the object; an object inventory database containing information describing a location of at least one video object within one of the video files; and a video content analysis application executed on the processor, the video content analysis application segmenting the video files to identify locations of video objects, classifying the video objects to match occurrences of a given video object, retrieving information describing the video object by matching occurrences of classified video objects with video objects in the object information library, and storing information describing the dynamic location of the video object within the video and information describing the video object in the object inventory database.
[00016] According to one embodiment, the system further includes at least one advertising server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; an automated bidding application executed on the advertising server; and an automated auction application executed on the video segmentation and classification server, the auction application transmitting auction information to the at least one advertising server, the auction information including information describing a selected video object, the automated auction application receiving bid information from the automated bidding application from the at least one advertising server and awarding rights to associate advertising content to a selected one of the at least one advertising server.
[00017] In the aforementioned system the automated auction application may transmit to the advertiser bidding application at least one of consumer behavioral information and market segment information associated with the given video object.
[00018] Any of the aforementioned embodiments of the system may include advertising content in a database; and an overlay generation application for creating a video overlay linking the advertising content with a given video object and creating a selectable hyperlink whose position tracks a dynamic location of the video object in the video.
[00019] The aforementioned system embodiment may further include a video broadcaster server comprising a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; a video consumer server comprising a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; the video broadcaster server receiving the video overlay from the video segmenting and classification server and transmitting the video overlay to the video consumer server, the video overlay selectively causing the display of content information linked with a given the video object responsive to interaction with the given video object.
[00020] In any of the aforementioned embodiments of the system, the video segmenting and classification server may maintain a database of objects which have been auctioned.
[00021] In any of the aforementioned embodiments of the system, each database entry may include information indicating when rights to an auctioned object expire.
[00022] In any of the aforementioned embodiments of the system, the at least one advertiser server may communicate to the video segmenting and classification server information specifying a desired demographic audience, the auction server communicating object auction information which is restricted to the desired demographic audience.
[00023] In any of the aforementioned embodiments of the system, the auction information may include information specifying at least one of demographic information and user behavioral history information.
[00024] In any of the aforementioned embodiments of the system, the video consumer server may further include a content display application for displaying video and which interacts with the video overlay and displays advertising content when a given video object is one of selected and rolled-over with a pointing device.
[00025] In any of the aforementioned embodiments of the system, selection of a video object may cause the content display application to pause or slow the display of video.
[00026] In any of the aforementioned embodiments of the system, the video consumer server may include a pointing device; the content display application displays first content associated with an object appearing in the video, and displays second content when a selected video object within the streaming video is one of selected and rolled-over with the pointing device.
[00027] In any of the aforementioned embodiments of the system, the advertising content may include a selectable link wherein selection of the link provides ecommerce options.
[00028] Also disclosed is a system for automatically creating selectable hyperlinks in a video, comprising: segmenting the video file into a plurality of video objects; classifying the plurality of video objects to identify duplicate occurrences of a given video object; storing frame and sub-frame information for each video object in a database; and creating selectable hyperlinks in the video file using a video overlay linking at least one video object in the database with the video file.
[00029] In any of the aforementioned embodiments of the system, the selectable hyperlink may be linked with advertising which is displayed when a user rolls over or selects the hyperlink with a pointing device.
[00030] Disclosed is a video object market exchange, comprising: a video segmentation and classification application for automatically segmenting video into a plurality of objects, classifying the objects into groups of similar objects, labeling the objects with descriptive information, and storing information identifying a dynamic location of the video object within the video in a database; and an overlay generator for automatically creating a video overlay linking at least one group of video objects with the video.
[00031] In any of the aforementioned embodiments of the video object market exchange, each linked video object may be a selectable hyperlink whose position tracks the dynamic location of the video object in the video.
[00032] Disclosed is a method for providing active regions for an interactive layer for a video viewer application, comprising: accessing video data that defines a plurality of frames showing a plurality of video objects, each video object being shown in a sequence of frames, and generating region definition data that defines a plurality of regions, each region corresponding to one of the plurality of video objects, wherein the outline of each region defined by the region definition data matches the outline of the corresponding video object as it is shown in the sequence of frames.
[00033] In the aforementioned embodiment of the method for providing active regions, the outline of each region may change dynamically in the sequence of frames to match changes in at least one of the perspective and the size and the angle of view in which the corresponding video object is shown in the sequence of frames.
[00034] In any of the aforementioned embodiments of the method for providing active regions, region definition data may be used to define a plurality of active regions for interactive video viewing.
[00035] In any of the aforementioned embodiments of the method for providing active regions, the frames may be shown to a user on a display as a video, and the region definition data may be used to determine whether a user action directed to a location of at least one of these frames addresses one of the active regions.
[00036] In any of the aforementioned embodiments of the method for providing active regions, advertising may be presented to the user in response to a determination that the user action addresses a certain active region, the advertising pertaining to the video object that corresponds to the certain active region.
[00037] In any of the aforementioned embodiments of the method for providing active regions, the region definition data for at least one region includes a three-dimensional wireframe representation of the video object that corresponds to the region.
[00038] In any of the aforementioned embodiments of the method for providing active regions, the region definition data for the region further contains, for at least one frame of the sequence of frames in which the corresponding video object is shown, data defining a perspective view of the three-dimensional wireframe representation, wherein the outline of the perspective view of the three-dimensional wireframe representation defines the outline of the region for the frame.
[00039] In any of the aforementioned embodiments of the method for providing active regions, the region definition data for the region further contains, for at least one pair of frames of the sequence of frames in which the corresponding video object is shown, data defining a change of the three-dimensional wireframe representation between the frames of the pair of frames.
[00040] In any of the aforementioned embodiments of the method for providing active regions, the three-dimensional wireframe representation includes a plurality of nodes, and the data defining the change includes data that defines a displacement of a position of at least one node with respect to at least another node.
[00041] In any of the aforementioned embodiments of the method for providing active regions, the data defining the change includes data that defines a change in at least one of the size and spatial orientation of the three-dimensional wireframe representation.
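By way of illustration only, the node-displacement change data described above could be represented as in the following Python sketch; the class and function names are hypothetical and form no part of the claimed system:

```python
from dataclasses import dataclass

@dataclass
class Wireframe3D:
    """Minimal 3D wireframe: nodes are [x, y, z] points, edges index node pairs."""
    nodes: list
    edges: list

def apply_displacements(wf, displacements):
    """Apply per-node displacements (node index -> (dx, dy, dz)) to produce
    the node positions for the second frame of a frame pair."""
    moved = []
    for idx, (x, y, z) in enumerate(wf.nodes):
        dx, dy, dz = displacements.get(idx, (0.0, 0.0, 0.0))
        moved.append([x + dx, y + dy, z + dz])
    return moved
```

A change in size or spatial orientation of the whole wireframe could be expressed the same way, as a displacement applied to every node.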
Brief Description of the Drawings
[00042] FIG. 1 is a flowchart of an video object market exchange process according to the present invention;
[00043] FIG. 2 is a block diagram of a video object market exchange system according to the present invention;
[00044] FIG. 3 is a block diagram of a computer on which the video content analysis application executes;
[00045] FIG. 4 is a flowchart of how an advertiser interacts with the video object market exchange system according to the present invention;
[00046] FIGs. 5 and 6 are flowcharts showing the interactions of a viewer with the video object market exchange system according to the present invention;
[00047] FIG. 7 is an exemplary object inventory database; and
[00048] FIGs. 8A-8E are perspective views of a video object and a wire frame model created therefrom.
Detailed Description of the Invention
[00049] The present invention is a system 100 for automatically segmenting video into video objects, classifying the video objects, assembling a database of the classified video objects, defining region definition data representing each video object on an interactive layer, auctioning the right to associate advertising with the regions representing video objects on a market exchange (hereinafter "VOME") 300, and creating a video overlay with region definition data linking advertising content with the video content objects and thereby creating an interactive video. The region is a portion of the video frame which is congruent with the underlying video object. The region definition data defines such portion of the video frame. The system 100 of the present invention consists of several distinct yet related components.
[00050] One aspect of the invention relates to the creation of an inventory of video objects and corresponding region definition data. The video object inventory 114 is a database containing region definition data in the form of pointers or references to video objects within video or animation files. Importantly, the region definition data is used to make the video interactive by providing the ability to link supplemental information with a specific video object within a video. For example, the video viewer can select a car displayed within the video and learn the make and model of the car and other supplemental information. The invention associates the supplemental information with the video object, thereby making the object a selectable hyperlink.
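By way of a non-limiting illustration, such an inventory of region definition pointers and supplemental information might be sketched in Python as follows (all names are hypothetical):

```python
# Hypothetical inventory: each record points into a video file and carries
# the supplemental information that turns the object into a hyperlink.
inventory = {}

def register_object(object_id, video_id, frames, supplemental):
    """Store a pointer to a video object plus its linked information."""
    inventory[object_id] = {
        "video_id": video_id,          # which video or animation file
        "frames": frames,              # frame number -> region definition data
        "supplemental": supplemental,  # e.g. make/model shown on selection
    }

def lookup(object_id):
    """Return the inventory record for an object, or None if absent."""
    return inventory.get(object_id)
```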
[00051] In one embodiment of the invention, recognized video objects are represented by 3D vector graphics data such as wire frame models (FIG. 8D). The representation is created by computing the difference between perspective views (FIGs. 8A-8C) of the object and then specifying each edge of the physical object where two mathematically continuous smooth surfaces meet, or by connecting an object's constituent vertices using straight lines or curves.
[00052] If not all views are available, then only a partial 3D model is created, which is completed once the missing views become available in additional videos.
[00053] These 3D wireframe models may be used to improve the recognition of video objects, and also to represent the video objects as outlines of perspective views of the three-dimensional wireframe (FIG. 8E). Such an embodiment may have computational benefits.
[00054] According to one embodiment, video objects are associated with meta-data and/or an object description which enables users (e.g., advertisers) to search for all instances of "automobile" and have the search results include "car" as well. The object description may be part of a semantic network which allows auction bidders to specify the object and the video context on which they want to bid. This may be useful for preventing exposure of an advertiser's brand in contexts which are not appropriate for the brand. The term video object as used in this specification refers to a video frame component, e.g., a car, a runner or a dog which appears in the video or animation. In one embodiment of the invention, motion attributes of video objects in the database, such as fast, slow, up, down, etc., will be indexed, which will allow auction bidders to qualify their bids with motion attributes. In another embodiment, the states of objects will be indexed, such as by means of facial expression algorithms already known in the art which extract the state of a person in a video, such as happy or sad.
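A minimal Python sketch of the synonym-aware object search described above; a production system would use a full semantic network, and all names here are hypothetical:

```python
# Toy synonym table so that a search for "automobile" also returns
# objects labeled "car". A semantic network would generalize this.
SYNONYMS = {"automobile": {"automobile", "car"}}

def search(objects, term):
    """Return all objects whose label matches the term or a known synonym."""
    wanted = SYNONYMS.get(term, {term})
    return [obj for obj in objects if obj["label"] in wanted]
```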
[00055] Hereinafter, reference to video should be understood to encompass 2D video, 3D video and animation unless an explicit distinction is made. The video object database includes detailed information for identifying the location, shape and movement of the video object within the video file. The video object inventory may include detailed descriptions of the specific object in the video content such as manufacturer, make and model. As will be explained in further detail below, this detailed information may be used to link information such as advertising content with the video objects.
[00056] As will be explained below, according to various embodiments of the invention the video objects may be manually, semi-automatically or automatically identified and associated with relevant information.
[00057] A further aspect of the invention relates to the creation of a video object market exchange (VOME) in which bidders (advertisers) bid for the right to associate their advertising content with a given video object. It should be noted that the invention also enables a video content owner to auction only a certain portion of the video object inventory and sell the rest directly to an inventory buyer without the public bidding process.
Activation of Video Content
[00058] A further aspect of the invention relates to the creation of a video overlay which transforms "passive" video, i.e., video which the viewer watches but with which the viewer does not interact, into interactive video where the viewer interacts with regions of the video by selecting or rolling over a region within the video, thereby triggering the display of advertising content associated with the object.
[00059] The use of hyperlinks within static media such as a website is well known.
In video games and animations it is very common to click on objects; that is what makes them "interactive". Rich or interactive media refers to communication media that facilitate active participation by the recipient, hence interactivity. Traditional information theory would describe interactive media as those media that establish two-way communication.
Identification and Compilation of Video Content
[00060] The present invention segments video and animation content into its objects and stores region definition data such as shape, x, y, and temporal coordinates, or, in the case of volumetric video or volumetric animation, shape, x, y, z, and temporal coordinates. The term "temporal coordinate" refers to time, video frame or the like. Further, the term "video frame" is intended to convey an instantaneous (still) image frame of the video or animation at a particular time (location within the video stream). All of these coordinates are necessary to specify the video objects within a frame at a given moment in time.
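By way of illustration, the region definition data described above, i.e., shape plus x, y, (z) and temporal coordinates, could be modeled as in the following hypothetical Python sketch:

```python
from dataclasses import dataclass

@dataclass
class RegionDefinition:
    """One region sample: where a video object sits at one moment in time.
    For volumetric video or animation, a z coordinate is stored as well."""
    t: float          # temporal coordinate (time or frame number)
    x: float          # horizontal position within the frame
    y: float          # vertical position within the frame
    z: float = 0.0    # depth, used only for volumetric video/animation
    shape: tuple = () # outline, e.g. a polygon of (x, y) vertices
```

A video object's full region definition data would then be a sequence of such samples, one per frame in which the object appears.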
[00061] An object of the present invention is to take conventional video content analysis technology, such as is currently used to identify a person within a crowd or to identify/inspect a widget on a conveyor belt, and apply it to the field of marketing communication, advertising and commerce transactions. More particularly, it is an object of the invention to identify video objects of interest within video and animations. The identified video objects or content objects may be used to populate a video content inventory 114 used in an advertising market exchange. Moreover, video may be activated or made interactive using region definition data linking video objects with advertising content.
[00062] The method of the present invention should be understood to include both motion picture and object based animation. Hereinafter, reference to video should therefore be understood to include both motion picture and object based animation.
[00063] FIG. 1 is a high-level flow diagram of the method of a first embodiment of the present invention.
[00064] In steps 700 and 710 a video is segmented and classified using an automated segmentation and classification application to create a list or table of objects. The segmentation process 700 yields a list of video objects throughout the video (including the temporal coordinates and region definition data for each object) and the classification process 710 matches occurrences of the same object in different frames of the video, thereby eliminating duplication/redundancy. It should be noted that the location, size and shape of a video object can and usually will vary throughout a video. The size of an object varies depending on its proximity, which will vary as the object moves throughout the video. Similarly, the shape of an object may vary depending on the perspective or vantage point from which it is seen, e.g., frontal view versus side view. Moreover, the system of the invention is able to segment and classify a moving object. Thus the location of the video object dynamically changes as the underlying object moves, which is represented in the region definition data.
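An illustrative, greatly simplified Python sketch of the classification step 710, which merges per-frame detections into tracks of the same object; the matching predicate is a stand-in for whatever comparison the classifier actually uses:

```python
def classify(detections, same_object):
    """Group per-frame detections into video objects.

    `detections` is a list of (frame_no, region) pairs from segmentation;
    `same_object(a, b)` decides whether two regions show the same object.
    Returns a list of tracks, each a list of (frame_no, region) pairs,
    eliminating the duplication produced by per-frame segmentation.
    """
    tracks = []
    for det in detections:
        for track in tracks:
            if same_object(track[-1][1], det[1]):
                track.append(det)
                break
        else:
            tracks.append([det])  # no existing track matched: new object
    return tracks
```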
[00065] In step 720, the video objects are compared with objects in an object library, which may be 3D wire frame data representing objects within the video. Perspective views from such 3D wire frame models may be advantageous to the automatic object recognition process. This step is optional. If a match is detected, then the object is associated with the product description and/or meta-data from the object library. The unmatched video object may be discarded or subjected to a secondary processing application, and/or an analyst may manually identify/configure the object boundaries (step 730), and then the object may be subjected to another classification step (710) and/or another comparison with objects in the object library (720).
[00066] In step 740, the video objects are published to the market exchange and subject to an automated auction process.
[00067] In step 745 a video overlay is created which links the video object with the advertiser provided content, and in step 750 the video overlay with region definition data is transmitted to the video broadcaster 120.
[00068] In step 760, a video viewer interacts with the video overlay by rolling over or selecting a video region, thereby triggering the display of advertising content associated with the video object. It should be appreciated that rolling over may elicit the display of different advertising content than that displayed when the object is selected. For example, selecting an object may trigger more detailed information than that displayed when the object is simply rolled over with the pointing device.
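The rollover-versus-select behavior of step 760 can be sketched as follows in Python; the content-server structure and names are hypothetical:

```python
def handle_interaction(event, object_id, content_server):
    """Return the advertising content for a viewer interaction.

    Rolling over yields brief content; selecting yields more detailed
    content, mirroring the two interaction modes described above.
    """
    entry = content_server.get(object_id, {})
    if event == "rollover":
        return entry.get("brief")
    if event == "select":
        return entry.get("detail")
    return None
```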
[00069] In step 770 (optional), the VOME 300 records the viewer interaction with the video objects and updates the viewer's behavioral profile. It should be noted that the video viewer's interactions with the video overlay (and the video objects) generally trigger the display of advertising content from the VOME 300. This enables the content associated with a video object to be updated on-the-fly without the need to alter the video overlay. The video overlay makes the video interactive by making video regions selectable hyperlinks, but the actual content comes directly from the VOME 300.
[00070] In step 780 (optional), the VOME 300 completes a sales transaction initiated by the viewer's interactions with the video region representing the video object. As will be explained below in further detail, the VOME 300 may enable the viewer to complete a sales transaction.
[00071] Each of these steps will be described in additional detail below.
[00072] It should be appreciated that the relative order of steps can frequently be changed without impacting the system, for example steps 770 and 780 may be reversed without impacting the results. The listing of steps in a particular order should not be read as a limitation to a specific order unless it is clear from the context that a particular order is required.
[00073] Thus far, the method of the invention has been described with reference to video objects. However, the invention also pertains to the identification of events within a video, where an event is defined as an object moving through space, for example a person walking or a car driving. Even if the object is not defined, the event can still have characteristics, such as high velocity, which might be of value to advertisers.
[00074] In object based animation, each object is defined by its vector graphic class. Consequently, the analyst does not teach the system to recognize objects, but rather describes the objects of interest. For each described object, the system stores Object Information useful for identifying each occurrence of the object in the animation data.
[00075] Object based animations such as Adobe Flash or Java maintain object and event information. Other animation techniques lose references to objects and events during rendering. Once the references are lost, object and event recognition techniques must be applied just as in regular pixel based video.
[00076] In motion picture media, video content analysis technologies are used to identify objects based on size, shape, color, color density, etc. The present invention is not limited to any particular method for identifying content objects within video, and several different methods are discussed in detail below. Analysts may manually train the segmentation application 106 to recognize an object by, for example, tracing the outline of a video object, or the system may present patterns of content objects it has found by statistical pattern recognition.
[00077] A video content analysis or video analytics application 106, which is explained below in further detail, automatically detects additional occurrences of the identified object in the video. The video content analysis application 106 may be provided with an object information library 112 containing 3D wire frame models or characteristics for identifying one or more pre-defined objects from one or more differing perspectives, or the video content analysis application 106 may be provided with heuristics for identifying objects. For each identified occurrence of an object, the VOME 300 stores information for identifying the frame (temporal coordinate) and region definition data (location within the frame, e.g., x, y, and z coordinates) in which the object appears. Using the region definition data, the VOME 300 is able to dynamically track an object. It should be noted that the size, shape and location of the selectable region (hyperlink) corresponds to the size, shape, and location of the underlying object.
[00078] According to one refinement of the invention, the auction is automatically triggered when a viewer accesses or requests access to video content.
[00079] According to another refinement of the invention, the auction may be triggered by expiration of an advertiser's right to associate advertising with a given video object. The auction may further be triggered each time video objects are added to the inventory of video objects or on a periodic basis, e.g., every hour, day, or week.
[00080] According to yet another embodiment, the advertiser may search the database 114 of video objects (object inventory database) and purchase the rights to associate content with an object, thereby bypassing the auction process, or may signal interest in participating in an auction by submitting an opening bid. Moreover, the advertiser may advise the VOME 300 of particular market segments, demographics, user behavioral profiles or the like on which it is interested in bidding.
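By way of illustration, determining which dynamically tracked region a viewer's pointer action addresses could look like the following Python sketch; bounding boxes stand in for full region outlines, and all names are hypothetical:

```python
def hit_test(regions, frame_no, x, y):
    """Return the id of the object whose region contains (x, y) at frame_no.

    `regions` maps object_id -> {frame_no: (x0, y0, x1, y1)}. Because the
    stored box differs per frame, the selectable region tracks the object's
    changing size and position throughout the video.
    """
    for object_id, frames in regions.items():
        box = frames.get(frame_no)
        if box is not None:
            x0, y0, x1, y1 = box
            if x0 <= x <= x1 and y0 <= y <= y1:
                return object_id
    return None
```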
[00081] The advertiser 122 may be provided with viewer profile information pertaining to the video viewer 124 who triggered the auction, such as taught in U.S. Patent No. 6,718,551, entitled "Method and system for providing targeted advertisements," which is hereby incorporated by reference. It should be noted that the viewer profile information is available because the video viewer 124 triggers the auction by requesting access to the video.
[00082] The viewer profile may be a multifaceted viewer profile identifying, among other things, the viewer's click history, purchasing habits, social network, history of geographic locations, browsing and search habits, and/or additional demographic data. The multifaceted viewer profile may be compiled, inter alia, from cookies stored on the viewer's computer, or from third party information about the viewer. The multifaceted viewer profile information may be used in determining the relative value of a given viewer for a given bidder (advertiser).
[00083] In addition to providing the bidders with viewer profile information, the VOME 300 provides a list of the objects contained in a video or set of videos.
[00084] According to one embodiment, the VOME 300 solicits bids on an individual basis for rights to associate advertising content with one or more of the video objects contained in the video accessed by the viewer. Thus, different advertisers may own temporary rights to different video objects in a given video.
[00085] It should be noted that the advertising rights being auctioned are different from the traditional banner ads which are "pushed" to the video viewer. Push-advertising is advertising which is displayed on the top (banner) or the side of the viewer's display screen. Push-advertising is pushed to the viewer, i.e., the viewer does not specifically request the advertising. As will be explained below in further detail, according to one embodiment the video viewer pulls the advertising content by interacting with specific regions representing video objects within a video. For example, the viewer may point to a video frame, which causes the video to slow down, and select or roll over a video object within the video, thereby triggering the VOME 300 to display contextual information linked or associated with the object as a pop-up, an overlay, or in a field next to the video player. However, the VOME 300 may combine the pull advertising with conventional push-advertising. For example, the VOME 300 may push advertising content which relates to the objects as they appear in the video, or the VOME 300 may push advertising relating to the type of objects with which the viewer has interacted, e.g., objects which the viewer has rolled over or selected.
[00086] As noted above, the VOME 300 may provide 3rd parties such as advertisers 122 with the profile of an actual video viewer in real-time before making the bid. Alternatively, the VOME 300 may simply auction rights to the video content objects for each of a plurality of market segments. For example, the VOME 300 may segment the market by a combination of age, gender, income, region, spending habits, etc. If the auction occurs prior to access by the video viewer 124, it will not be possible to provide the advertisers (bidders) with actual viewer profile information, and the VOME 300 will auction the rights by market segment.
[00087] It should be understood that the term automatic as used herein refers to actions which take place without human intervention. In other words, the auction is initiated by the VOME 300 simply by the addition of new content to the inventory 114 or the expiration of previously auctioned rights, etc. The VOME 300 automatically segments video files and automatically classifies the video objects. The advertiser's server may include an automated bidding application which automatically submits bids to the VOME 300. Also, as will be explained below, the processing of video to create activated video objects, and the addition of such video objects to the inventory, may itself occur without human intervention. Thus, the VOME 300 may according to some embodiments be a fully automated system. The only requirement for the system to run fully automatically is a preprocessed database 112 with images of objects from different viewing angles or 3D wire frame models of the objects with object descriptions.
[00088] FIG. 2 is a block diagram of a first embodiment of the system 100 of the invention. System 100 includes a database 102 of video content whose rights are owned by a broadcaster 120 or the like. The term "broadcaster" simply refers to the party who owns the rights to the video content and makes it available to viewers 124 via interactive TV or streaming websites.
[00089] The database 102 resides on a broadcaster server 200 (FIG. 2) which may be accessible over a distributed network 104 such as the Internet. Server 200 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, Read-Only-Memory (ROM) 208 and Random Access Memory (RAM) 210 (which may be volatile or nonvolatile). The database 102 may be stored in RAM 210, ROM 208, or mass storage device 206.
Accessory devices such as keyboard 212, touch screen 214 which serves both as a keyboard and a display, display device 216, and pointing device (mouse) 218 may optionally be connected to the server 200.
[00090] The database 102 contains unprocessed or raw video content which is accessed by a video content segmentation and classification engine 106 hereinafter referred to as a content analysis application. The phrase "raw video content" refers to video which has not been processed to identify objects.
[00091] In FIG. 2, the database 102 is shown as copied to database 108; however, copying of the database 102 is optional.
[00092] Database 108 resides on a video segmentation and classification server 300 (FIG. 2) which may be accessible over a distributed network such as the internet 104. Hereinafter, reference to accessing the database 102 should be understood to be synonymous with accessing database 108 and vice versa.
[00093] Server 300 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, Read-Only-Memory (ROM) 208 and Random Access Memory 210 (which may be volatile or nonvolatile). The video file database 108 may be stored in RAM 210, ROM 208, or mass storage device 206. Accessory devices such as keyboard 212, touch screen 214 which serves both as a keyboard and a display, display device 216, and pointing device (mouse) 218 may optionally be connected to the server 300.
[00094] An inventory 114 of video objects is assembled by segmenting and classifying the raw video content from database 108 (or 102) to identify video objects therein. More particularly, the video content analysis application 106 segments the raw video content to yield a list of all the video objects in a given video. Then the video content analysis application 106 classifies the list of video objects to resolve occurrences of the same video object throughout the video. VOME 300 may be provided with separate software applications for performing segmentation and classification, or a single software application may perform both segmentation and classification.
[00095] Disclosed is a method for providing active regions for an interactive layer for a video application, comprising accessing video data that defines a plurality of frames showing a plurality of video objects, each video object being shown in a sequence of frames, and generating region definition data by using video object recognition algorithms including video object segmentation and classification. Such region definition data defines a plurality of regions, each region corresponding to one of the plurality of video objects, wherein the outline of each region defined by the region definition data matches the outline of the corresponding video object as it is shown in the sequence of video frames.
[00096] According to one refinement of the invention, the outline of each region dynamically changes in the sequence of frames to match changes in at least one of the perspective and the size and the angle of view in which the corresponding video object is shown in the sequence of frames.
[00097] According to one refinement of the invention, region definition data is used to define a plurality of active regions for interactive video viewing.
[00098] According to one refinement of the invention, the frames are shown to a user on a display as a video, and the region definition data is used to determine whether a user action directed to a location of at least one of these frames addresses one of the active regions.
[00099] According to one refinement of the invention, in response to a determination that the user action addresses a certain active region, additional information is presented to the user, the additional information pertaining to the video object that corresponds to the certain active region.
[000100] According to one refinement of the invention, the region definition data for at least one region includes a three-dimensional wireframe representation of the video object that corresponds to the region.
[000101] According to one refinement of the invention, the region definition data for the region further contains, for at least one frame of the sequence of frames in which the corresponding video object is shown, data defining a perspective view of the three-dimensional wireframe representation, wherein the outline of the perspective view of the three-dimensional wireframe representation defines the outline of the region for the frame.
[000102] According to one refinement of the invention, the region definition data for the region further contains, for at least one pair of frames of the sequence of frames in which the corresponding video object is shown, data defining a change of the three-dimensional wireframe representation between the frames of the pair of frames.
[000103] According to one refinement of the invention, the three-dimensional wireframe representation includes a plurality of nodes, and the data defining the change includes data that defines a displacement of a position of at least one node with respect to at least another node.
[000104] According to one refinement of the invention, the data defining the change includes data that defines a change in at least one of the size and spatial orientation of the 3D wireframe representation.
[000105] The video content analysis application 106 may access an object information library 112, which is a database stored on or accessible to server 300. For example, the object information library 112 may be stored on a memory device such as memory device 206 and/or RAM 210 used to store the program instructions for the video content analysis application 106. The library 112 stores images of objects from different viewing angles or 3D models of the objects. The image information may be used as the index or key to link descriptive information with the video object. The library 112 further contains one or more of an object identifier, label, and/or meta-data description of the video object which may be used to describe the video content object to prospective bidders.
[000106] Alternatively, the content analysis application 106 may utilize logic to identify video content objects without recourse to object information library 112.
[000107] Applicant hereby incorporates by reference U.S. Patent No. 6,625,310, entitled "Video segmentation using statistical pixel modeling," which discloses one of many methods for segmenting video data into foreground and background portions which utilizes statistical modeling of the pixels. A statistical model of the background is built for each pixel, and each pixel in an incoming video frame is compared with the background statistical model for that pixel. Pixels are determined to be foreground or background based on the comparisons.
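The per-pixel statistical background modeling summarized above might be sketched as follows in Python; this toy version uses a fixed per-pixel mean and standard deviation and is an illustration of the general technique, not the patented method itself:

```python
def foreground_mask(background_mean, background_std, frame, k=3.0):
    """Classify each pixel as foreground or background.

    Each pixel of `frame` is compared against that pixel's background
    statistics (mean and standard deviation, given as row-major nested
    lists); a pixel is foreground when it deviates by more than k
    standard deviations from its background mean.
    """
    mask = []
    for row_mean, row_std, row in zip(background_mean, background_std, frame):
        mask.append([abs(p - m) > k * max(s, 1e-6)
                     for m, s, p in zip(row_mean, row_std, row)])
    return mask
```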
[000108] Applicant hereby incorporates by reference U.S. Patent No. 6,462,754, entitled "Method and apparatus for authoring and linking video documents," which discloses an authoring method for video documents that involves creating an anchorable information unit file based on boundaries of objects of interest, such that the objects of interest are used to identify portions of video data.
[000109] Applicant hereby incorporates by reference U.S. Patent No. 7,325,245, entitled "Linking to video information," which discloses a system which enables dynamic linking between a variety of video formats including television broadcasts, web pages, and video displays which are stored on magnetic or optical media. Each frame of the video information is identified together with a plurality of locations within that frame. The locations selected by the user, for example using a pointing device, are then used to access associated information either within the system itself or on an external system.
[000110] Applicant hereby incorporates by reference U.S. Patent Publication 2008/0294694, entitled "Method, apparatus, system, medium, and signals for producing interactive video content," which discloses a method for producing interactive video content on a content publisher computer. The method involves associating indicia with at least one image portion in the video content, the indicia being operably configured to follow the at least one image portion as a display position of the image portion changes in the video content. The method also involves associating an image portion identifier with the indicia, and associating link properties with the indicia, the link properties being operable to cause transmission of a content location request to a registration server in response to selection of the indicia by a viewer of the interactive video content. The content location request includes the image portion identifier.
[000111] In the case of a manual or semi-automated process, the inventory 114 may be created by the content analysis application 106 with the assistance and/or review of a human analyst 110. The analyst 110 may manually identify a given instance of a video object by, for example, viewing a still image of the video and tracing the video object
(manual process), and then utilize the content analysis application 106 (semi-automated process) to identify other occurrences of the video object in the video. Additionally or alternatively, an analyst 110 may review and refine the boundaries of an unmatched video object, and then subject the object to a second round of classification and/or second round of matching the object with objects in the object library.
[000112] Alternatively, the analyst 110 may review and edit objects which were automatically identified by the content analysis application 106.
[000113] Thus far we have described the process by which an inventory 114 of video objects is created from raw video. The video object inventory 114 is stored on a storage device accessible over the distributed network 104 (e.g., the Internet), or a copy of the database 114 is made accessible over the network.
[000114] It is important to note that the video objects are used to create selectable regions (hyperlinks) which dynamically track the movement, size and position of the object throughout the video.
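One simple way to represent such dynamically tracked selectable regions is a per-frame mapping from frame numbers to bounding boxes, plus a hit test for pointer interaction. The structure and names below are illustrative assumptions, not a format prescribed by the specification:

```python
# Hypothetical per-object track: the region follows the object's
# movement, size, and position from frame to frame.
object_track = {
    "object_id": "obj-42",
    "label": "wristwatch",
    "regions": {                 # frame number -> (x, y, width, height)
        100: (50, 60, 40, 40),
        101: (54, 61, 42, 42),
        102: (58, 62, 44, 44),
    },
}

def region_at(track, frame_no):
    """Return the selectable region for a frame, or None if absent."""
    return track["regions"].get(frame_no)

def hit_test(track, frame_no, x, y):
    """True when a pointer position falls inside the object's region."""
    region = region_at(track, frame_no)
    if region is None:
        return False
    rx, ry, rw, rh = region
    return rx <= x < rx + rw and ry <= y < ry + rh
```

A production system might instead store polygon outlines or wireframe projections per frame (as in claims 23-27), but the lookup-then-hit-test pattern is the same.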
[000115] According to one embodiment, a VOME 300 auctions advertising rights to video content objects stored in the inventory database 114 to advertisers 122. The auction is performed by automated auction application 126 on VOME server 300, which communicates with an automated bidding application on the advertiser server 500. More particularly, auction application 126 is a software application executed on processor 202 and stored on one of mass storage device 206, ROM 208 and RAM 210. The auction application 126 auctions rights to associate content with a video object. The auctioned rights may be time limited, i.e., rights which expire after a pre-defined amount of time. Auction application 126 may include logic for automatic billing and/or settlement of bids.
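A minimal sketch of awarding time-limited rights of this kind follows; the record fields and the expiry mechanism are assumptions for illustration, not the claimed implementation:

```python
import time

def award_rights(object_id, winner, amount, duration_seconds, now=None):
    """Record an auction result; rights expire after the given duration."""
    start = now if now is not None else time.time()
    return {
        "object_id": object_id,
        "rights_holder": winner,
        "winning_bid": amount,
        "expires_at": start + duration_seconds,
    }

def rights_active(record, now):
    """True while the auctioned rights have not yet expired."""
    return now < record["expires_at"]

# Rights to obj-42 awarded for one hour (times fixed for illustration).
record = award_rights("obj-42", "advertiser-7", 12.50, 3600, now=1000.0)
```

Such a record also supplies the hooks for the automatic billing and settlement logic mentioned above.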
[000116] The auction application 126 stores auction information identifying the owner of the rights to associate content with an auctioned object, the duration of such rights, the content to be associated with the object, and billing information. See FIG. 7. The auction information is stored in an auction information database on server 300. More particularly, the auction information database is stored on one of mass storage device 206, ROM 208 and RAM 210.
[000117] The VOME server 300 includes an object association application which creates a video overlay used to associate advertising content received from the advertiser 122 with the video objects. The overlay is supplied by the VOME server 300 to the broadcaster 120 and in turn from the broadcaster 120 to the viewer 124 along with the underlying video.
[000118] The advertiser 122 uses a computer or server 500 (FIGs. 2, 3) to bid on the right to associate content with a video object. Computer 500 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, ROM 208 and RAM 210 (which may be volatile or nonvolatile). An automated bidding application executes on processor 202 and may be stored on one or more of the ROM 208, RAM 210, and mass storage 206. The automated bidding application communicates auction bids to the automated auction application on the VOME 300. The automated bidding application is responsive to information from VOME 300 describing the video object(s) being auctioned. The use of video content objects transforms raw video into interactive video content.
[000119] The viewer 124 uses a computer 400 (FIG. 3) to access the video content made available by broadcaster 120 on a website or the like accessible over a distributed network such as the Internet. Computer 400 includes a processor 202 which is connected via BUS 204 to a mass storage device 206, Read-Only Memory (ROM) 208 and Random Access Memory (RAM) 210 (which may be volatile or nonvolatile). A web browser executes on the processor and is used to access websites on the Internet. The viewer 124 interacts with the video overlay by selecting or rolling over a region representing a video object using a conventional pointing device 218 and/or a touch-sensitive screen 214 such as is known in the art. Interaction by the video viewer 124 triggers display of supplemental content such as advertisements. The advertiser 122 bids on the right to supply the advertising content.
[000120] The auction of advertising rights may be automated. For example, the VOME 300 may request a minimum starting bid and specify bidding increments, and each advertiser 122 may provide automated bids for viewers matching specified criteria up to a pre-determined maximum bid.
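The automated bidding just described resembles proxy bidding. The sketch below assumes a second-price-plus-increment settlement rule, which the specification does not itself prescribe; function and parameter names are likewise assumptions:

```python
def run_auction(starting_bid, increment, max_bids):
    """Resolve proxy bids; returns (winner, price) or (None, None).

    max_bids maps each advertiser to its pre-determined maximum bid.
    """
    eligible = {a: m for a, m in max_bids.items() if m >= starting_bid}
    if not eligible:
        return None, None
    winner = max(eligible, key=lambda a: eligible[a])
    others = [m for a, m in eligible.items() if a != winner]
    if not others:
        # Sole eligible bidder pays the starting bid.
        return winner, starting_bid
    # Winner pays one increment above the runner-up, capped at its max.
    price = min(eligible[winner], max(others) + increment)
    return winner, price

winner, price = run_auction(1.00, 0.10, {"adA": 2.00, "adB": 1.50})
```

Here adA wins at 1.60: one increment above adB's 1.50 maximum, well under adA's own 2.00 cap.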
[000121] According to a variation of the previous embodiment, the auction of advertising rights to a video (including all of the video objects therein) or to individual video content objects is triggered when a video viewer 124 accesses the broadcaster's website and/or requests access to video content accessible therethrough. According to this embodiment, the broadcaster 120 is able to provide viewer profile information for the video viewer 124 to the advertiser 122. The viewer profile information may, for example, contain information regarding websites previously accessed by the viewer 124 and/or information regarding the purchasing habits of the viewer 124.
[000122] Regardless of the starting point (manual or automated identification of objects), the end product is a database (video content inventory) 114 listing the coordinates (frame and sub-frame) and a semantic model for each identified object within a given media presentation (e.g., a movie clip). This inventory 114 may be offered on an advertising market exchange (VOME) for advertisers to bid on. Advertisers will bid on inventory based on contextual information, the multifaceted profile of the viewer viewing the video content, and the inventory description of the video.
[000123] The advertiser may decide to push overlay message content onto a video object while a user with a certain multifaceted user profile views it. The interaction of a viewer with video objects may be used to refine the messages pushed to the viewer in the same way that search terms are currently used to refine messages to users while they search.
[000124] FIG. 4 is a flowchart of how an advertiser interacts with the VOME 300. In step 800, the advertiser deploys a search of the video content inventory 114 based on inventory descriptors, or may submit images of products for which he would like to purchase inventory rights. The use of a semantic search, as opposed to a more rudimentary keyword search, is preferred because the semantic search is able to cope with variations in descriptor information.
[000125] In step 802 the VOME 300 returns a list of objects and object classes matching the advertiser's search, and the advertiser aligns the search results with the advertiser's media strategy and budget. In step 804A, the advertiser simply chooses to purchase the inventory identified in step 802. Alternatively, in step 804B the advertiser specifies to the VOME 300 items which the advertiser is interested in bidding upon during the real-time auction. Moreover, the advertiser may specify a starting bid and/or a maximum bid. Alternatively, the VOME 300 may specify the starting bid and incremental increases in the bid, and the advertiser merely specifies a maximum bid.
[000126] FIGs. 5 and 6 are flowcharts showing the interactions of a viewer with the VOME 300. In step 600, a viewer searches or browses for video content. In step 602, advertising content (contextual advertising) relating to the key words used in step 600 is displayed along with a list of video content search results. In step 604 the viewer selects a video to view, and in step 606 the contextual advertising is refined in relation to the selected video.
[000127] In steps 608A and 608B the viewer is viewing the selected content (1700 in FIG. 6) and encounters video objects of interest. According to one embodiment, pointing at the frame by, e.g., bringing pointer 1701 of pointing device 218 into video frame 1703, will cause the video to slow down, which allows the viewer to select an object. In the case of interactive TV or three-dimensional (3D) videos, the viewer can use a variety of pointing means including, but not limited to, a virtual pointer of the type popularized by the Nintendo Wii®, which utilizes a glove or the like with sensors capable of determining X, Y, and Z coordinates. In step 608A the viewer merely tags the objects of interest for later review (1702 and 1704 in FIG. 6), whereupon in step 610 the contextual advertising is once again refined (this time in relation to the objects of interest) and the behavioral profile of the viewer is updated. Steps 608A and 610 may be repeated any number of times during the viewing of the video. In step 612, the viewer reviews the list of tagged items from step 608A and either jumps back to the scenes in which the items appear (step 614A and 1704 in FIG. 6) or learns more about the items selected, e.g., price, features, etc. (step 614B). In step 616 the viewer selects one or more objects (products) to purchase (from the tagged or identified objects), and in step 618 the viewer completes the transaction (1708 in FIG. 6).
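Steps 608A and 610 above (tagging objects of interest and refining the viewer's behavioral profile) can be sketched as follows. Representing the profile as simple per-label interest counts is an assumption for illustration; the specification leaves the profile format open:

```python
def tag_object(profile, tagged, object_id, label):
    """Record a tagged object and update the viewer's interest counts."""
    tagged.append(object_id)
    profile[label] = profile.get(label, 0) + 1
    return profile, tagged

# A viewing session: each tag refines the behavioral profile (step 610).
profile, tagged = {}, []
tag_object(profile, tagged, "obj-1", "sneakers")
tag_object(profile, tagged, "obj-2", "sneakers")
tag_object(profile, tagged, "obj-3", "watch")
# profile now weights "sneakers" interest twice as heavily as "watch",
# which the exchange can use to refine contextual advertising.
```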
[000128] Step 608B is an alternative to step 608A and presents the viewer with the option to immediately jump to step 614 and learn more about the object. The information associated with the video object may be displayed as an overlay pop-up or in a field next to the video player. Each time the viewer interacts with video objects, his/her profile is updated in the database.
[000129] While the invention has been described in detail with respect to the specific embodiments thereof, it will be appreciated that those skilled in the art, upon attaining an understanding of the foregoing, may readily conceive of alterations to, variations of, and equivalents to these embodiments. Accordingly, the scope of the present invention should be assessed as that of the appended claims and any equivalents thereto.

[000130] Claims
[000131] 1. System for automatically segmenting and classifying video content into objects and auctioning the same, comprising:
[000132] a video segmentation and classification server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory;
[000133] said video segmenting and classification server including one or more video files stored in a video database;
[000134] an object information library stored on one of said random access memory, read-only memory, and mass storage memory, said object information library containing object information used to identify objects within the video files and at least one of descriptive information and semantic information used to describe the object;
[000135] an object inventory database containing information describing a location of at least one video object within one of the video files; and
[000136] a video content analysis application executed on said processor, said video content analysis application segmenting the video files to identify locations of video objects, classifying the video objects to match occurrences of a given video object, retrieving information describing the video object by matching occurrences of classified video objects with video objects in said object information library, and storing information describing the dynamic location of the video object within the video and information describing the video object in the object inventory database. [000137] 2. The system of claim 1, further comprising:
[000138] at least one advertising server including a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory;
[000139] an automated bidding application executed on said advertising server; and
[000140] an automated auction application executed on said video segmentation and classification server, said auction application transmitting auction information to said at least one advertising server, said auction information including information describing a selected video object, said automated auction application receiving bid information from said automated bidding application from said at least one advertising server and awarding rights to associate advertising content to a selected one of said at least one advertising server.
[000141] 3. The system of claim 2, wherein said automated auction application transmits to said advertiser bidding application at least one of consumer behavioral information and market segment information associated with said given video object.
[000142] 4. The system of claim 1 further comprising:
[000143] advertising content in a database;
[000144] an overlay generation application for creating a video overlay linking said advertising content with a given said video object and creating a selectable hyperlink whose position tracks a dynamic location of said video object in the video. [000145] 5. The system of claim 4 further comprising:
[000146] a video broadcaster server comprising a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; [000147] a video consumer server comprising a computer connectable to a distributed network and having a processor, random access memory, read-only memory, and mass storage memory; [000148] said video broadcaster server receiving said video overlay from said video segmenting and classification server and transmitting said video overlay to said video consumer server, wherein said video overlay selectively causes the display of content information linked with a given said video object responsive to interaction with said given video object. [000149] 6. The system of claim 2, wherein said video segmenting and classification server maintains a database of objects which have been auctioned. [000150] 7. The system of claim 6, wherein each said database entry includes information indicating when rights to an auctioned object expire. [000151] 8. The system of claim 2, wherein said at least one advertiser server communicates to said video segmenting and classification server information specifying a desired demographic audience, and said auction server communicates object auction information which is restricted to said desired demographic audience. [000152] 9. The system of claim 2, wherein said auction information includes information specifying at least one of demographic information and user behavioral history information. [000153] 10. The system of claim 5, wherein said video consumer server further includes a content display application for displaying video and which interacts with said video overlay and displays advertising content when a given video object is one of selected and rolled-over with a pointing device. [000154] 11. The system of claim 10, wherein selection of a video object causes the content display application to pause or slow the display of video. [000155] 12. 
The system of claim 10, wherein:
[000156] said video consumer server includes a pointing device; [000157] said content display application displays first content associated with an object appearing in the video, and displays second content when a selected said video object within the streaming video is one of selected and rolled-over with said pointing device. [000158] 13. The system of claim 10, where the advertising content includes a selectable link wherein selection of the link provides ecommerce options. [000159] 14. System for automatically creating selectable hyperlinks in a video, comprising: [000160] segmenting the video file into a plurality of video objects; [000161] classifying the plurality of video objects to identify duplicate occurrences of a given said video object; [000162] storing frame and sub-frame information for each video object in a database; and [000163] creating selectable hyperlinks in the video file using a video overlay linking at least one video object in the database with the video file. [000164] 15. The system of claim 14, wherein the selectable hyperlink is linked with advertising which is displayed when a user rolls over or selects the hyperlink with a pointing device.
[000165] 16. A video object market exchange, comprising:
[000166] a video segmentation and classification application for automatically segmenting video into a plurality of objects, classifying the objects into groups of similar objects, labeling the objects with descriptive information, and storing information identifying a dynamic location of the video object within the video in a database; and [000167] an overlay generator for automatically creating a video overlay linking at least one group of video objects with the video. [000168] 17. The video market exchange of claim 16, where each linked video object is a selectable hyperlink whose position tracks the dynamic location of the video object in the video. [000169] 18. A method for providing active regions for an interactive layer for a video viewer application, comprising: [000170] accessing video data that defines a plurality of frames showing a plurality of video objects, each video object being shown in a sequence of frames, and [000171] generating region definition data that defines a plurality of regions, each region corresponding to one of the plurality of video objects, wherein [000172] the outline of each region defined by the region definition data matches the outline of the corresponding video object as it is shown in the sequence of frames. [000173] 19. The method of claim 18, wherein the outline of each region dynamically changes in the sequence of frames to match changes in at least one of the perspective and the size and the angle of view in which the corresponding video object is shown in the sequence of frames. [000174] 20. The method of claim 18, further comprising using the region definition data to define a plurality of active regions for interactive video viewing. [000175] 21. The method of claim 20, wherein the frames are shown to a user on a display as a video, and wherein the region definition data is used to determine whether a user action directed to a location of at least one of these frame addresses one of the active regions. [000176] 22. 
The method of claim 21, wherein, in response to a determination that the user action addresses a certain active region, advertising is presented to the user, the advertising pertaining to the video object that corresponds to the certain active region. [000177] 23. The method of claim 18, wherein the region definition data for at least one region includes a three-dimensional wireframe representation of the video object that corresponds to the region. [000178] 24. The method of claim 23, wherein the region definition data for the region further contains, for at least one frame of the sequence of frames in which the corresponding video object is shown, data defining a perspective view of the three-dimensional wireframe representation, wherein the outline of the perspective view of the three-dimensional wireframe representation defines the outline of the region for the frame.
[000179] 25. The method of claim 24, wherein the region definition data for the region further contains, for at least one pair of frames of the sequence of frames in which the corresponding video object is shown, data defining a change of the three-dimensional wireframe representation between the frames of the pair of frames.
[000180] 26. The method of claim 25, wherein the three-dimensional wireframe representation includes a plurality of nodes, and wherein the data defining the change includes data that defines a displacement of a position of at least one node with respect to at least another node.
[000181] 27. The method of claim 25, wherein the data defining the change includes data that defines a change in at least one of the size and spatial orientation of the three-dimensional wireframe representation.
PCT/US2009/036332 2008-03-06 2009-03-06 Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects WO2009111699A2 (en)

Priority Applications (1)

- CN200980117626.8A (granted as CN102160084B): priority date 2008-03-06, filing date 2009-03-06, Chinese counterpart of "Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects"

Applications Claiming Priority (2)

- US3447008P: priority date 2008-03-06, filing date 2008-03-06
- US61/034,470: priority date 2008-03-06

Publications (1)

- WO2009111699A2: published 2009-09-11

Family

ID=41056665

Family Applications (1)

- PCT/US2009/036332 (WO2009111699A2): priority date 2008-03-06, filing date 2009-03-06, "Automated process for segmenting and classifying video objects and auctioning rights to interactive video objects"

Country Status (2)

- CN: CN102160084B
- WO: WO2009111699A2


Also Published As

- CN102160084A: published 2011-08-17
- CN102160084B: published 2015-09-23


Legal Events

- WWE: WIPO information, entry into national phase; ref document number 200980117626.8, country of ref document: CN
- 121: EP, the EPO has been informed by WIPO that EP was designated in this application; ref document number 09716262, country of ref document: EP, kind code: A2
- NENP: non-entry into the national phase; ref country code: DE
- 122: EP, PCT application non-entry in European phase; ref document number 09716262, country of ref document: EP, kind code: A2