GB2485573A - Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest - Google Patents

Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest

Info

Publication number
GB2485573A
GB2485573A GB1019578.2A GB201019578A GB2485573A GB 2485573 A GB2485573 A GB 2485573A GB 201019578 A GB201019578 A GB 201019578A GB 2485573 A GB2485573 A GB 2485573A
Authority
GB
United Kingdom
Prior art keywords
information
image
displayed
interest
items
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1019578.2A
Other versions
GB201019578D0 (en)
Inventor
Alan Geoffrey Rainer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to GB1019578.2A priority Critical patent/GB2485573A/en
Publication of GB201019578D0 publication Critical patent/GB201019578D0/en
Publication of GB2485573A publication Critical patent/GB2485573A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • G06F17/30256
    • G06K9/00711
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/44Receiver circuitry for the reception of television signals according to analogue transmission standards
    • H04N5/445Receiver circuitry for the reception of television signals according to analogue transmission standards for displaying additional information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A region of interest (RoI) (4) from a displayed video image (6) is selected (14), features defining the content of the RoI are identified, extracted and compared with an information source (16) to determine whether a correspondence exists between the extracted information and the information about items signified by displayed images contained within the information source. At least a portion of the information about items signified by the displayed image is relayed to a user. Known classifiers are combined to identify the RoI features, such as shape-based features, Scale Invariant Feature Transform (SIFT) features and gradient location and orientation histograms. The RoI may comprise a face, and the time and source of the selection may be correlated with scheduled transmissions to assist in RoI identification. Previous occurrences of the display of the item in the RoI may be used in identification. The additional information may be displayed on the same display as the video image or on a secondary display.

Description

Methods and systems of identifying video images
Field of the Invention
The invention concerns methods and systems of identifying video images and portions of such images, specifically those sourced from broadcast signals, such as television. The items on which the images are based include people, objects such as commercial products and places.
Background to the invention
A common problem experienced when viewing images, such as television pictures, is that of recalling the identity of the actor a viewer is watching play a role, or where a programme is set.
Often a location or an item can seem familiar in a drama, or be of interest to the user in another way - perhaps a user might want to visit a building featured in a drama, or find out how to buy a particular product featured.
A conventional manner of finding out such a thing is to use a separate computer or a book (such as a TV guide) to research the answer. Because of the separateness of the computer or book from the programme itself, it may be necessary for the inquisitive person to transfer attention from the television whilst the programme is on, which is particularly undesirable for both viewer and supplier of the programme - the viewer is sufficiently absorbed in the programme to ask the question in the first place, but now must turn away from the programme in order to find the answer. Alternatively, the user may wait until after the programme has finished to do his or her research; the question or questions may have been forgotten by then, and in the case of a product, the opportunity for a sale may have been missed. In either case, the very fact that the viewer is interested in an element of a programme is drawing his or her attention away from the programme itself, when it would be far more natural for the viewer to be drawn in as a result.
A further and pressing problem is that of the level of interaction in television. Oft criticised for being a 'passive medium', television is now thought of as an old form and is losing viewers to fresher-seeming, more interactive media such as the internet (especially since the advent of "web 2.0" software and the like). Television networks are looking for ways of making the experience more immersive and interactive to retain existing viewers and bring old viewers back. Several satellite and network systems already comprise limited interactivity, but more interactivity is desired.
It is an object of the invention, amongst others, to provide solutions to these and other problems.
Summary of the invention
In a first broad, independent aspect, the invention comprises a method for identifying a region of interest in an image, comprising the steps of receiving a signal corresponding to a selected region of interest from a displayed video image, querying the content of that region of interest, extracting the data obtained from the query, comparing the extracted information to an information source, determining whether a correspondence exists between the extracted information and the information about items signified by displayed images contained within the information source, and relaying at least a portion of the information about items signified by the displayed image to a user.
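By way of illustration only, the sketch below walks through these steps in Python. The function names, the histogram-based stand-in feature extractor and the correspondence threshold are assumptions made for the example and do not form part of the disclosed method.

```python
# Minimal, self-contained sketch of the method steps; all names are illustrative.
import numpy as np

MATCH_THRESHOLD = 0.5   # assumed correspondence threshold

def extract_features(crop):
    """Stand-in feature extractor: a normalised grey-level histogram of the ROI."""
    hist, _ = np.histogram(crop, bins=32, range=(0, 255))
    return hist / max(hist.sum(), 1)

def closest_match(features, information_source):
    """Compare the extracted features against every stored item; return the best one."""
    best_item, best_score = None, -1.0
    for stored_features, info in information_source:
        score = 1.0 - 0.5 * np.abs(features - stored_features).sum()  # histogram overlap
        if score > best_score:
            best_item, best_score = info, score
    return best_item, best_score

def identify_region_of_interest(frame, roi, information_source, relay=print):
    x, y, w, h = roi                                    # step 1: receive the selected region
    crop = frame[y:y + h, x:x + w]
    features = extract_features(crop)                   # steps 2-3: query and extract data
    item, score = closest_match(features, information_source)   # steps 4-5: compare
    if item is not None and score >= MATCH_THRESHOLD:
        relay(item)                                     # step 6: relay information to the user
        return item
    relay("No corresponding item found in the information source.")
    return None
```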
A first advantage of the method is that it allows the user to find the answer to an identity-based question contemporaneously with the question being posed. As such, it is possible for more questions to be answered, rather than merely forgotten.
It also increases the level of potential interaction between the viewer and the image, which has the advantage of keeping the viewer focused on the image.
Generally, it provides for a more absorbing experience.
The method also allows for queries to be made without interrupting the image (which in preferred embodiments may be a television show), and without the user having to have recourse to a different medium (such as a separate film database or a book) in order to find the answer to his or her query.
In this way, the method seeks to provide a solution to the problems mentioned above.
In a first subsidiary aspect, the query is effected by using a boosted or multiple instances classifier, combining a plurality of classifiers from the following list:
* Shape-based features,
* Scale invariant feature transform features,
* Speeded up robust features,
* Gradient location and orientation histograms,
* PCA-SIFT features,
* Contrast context histograms, and
* Data sieves.
Using a boosted or multiple instances classifier will allow for accuracy in recognition of correspondences. The classifiers selected are capable of effective and useful combination.
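As an illustration of one feature type from the list above, the following sketch extracts SIFT descriptors from a selected region using OpenCV. The use of OpenCV and the variable names are assumptions for the example; the other listed descriptors (GLOH, PCA-SIFT, contrast context histograms, data sieves and so on) would be computed by their own routines and combined in the same manner.

```python
# Sketch: SIFT descriptors for a selected region of interest (OpenCV assumed available).
import cv2

def roi_sift_descriptors(frame_bgr, roi):
    """Return SIFT keypoints and descriptors computed inside the selected region only."""
    x, y, w, h = roi
    crop = frame_bgr[y:y + h, x:x + w]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()                        # scale invariant feature transform
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    return keypoints, descriptors                   # descriptors: N x 128 float array
```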
In a second subsidiary aspect, the querying of the content of the image is done using a boosted classifier, combining a plurality of classifiers from the following list:
* Scale invariant feature transform features,
* Gradient location and orientation histograms, and
* PCA-SIFT features.
The classifiers above are the most accurate methods of object recognition and function well in combination; as such they comprise an inventive selection.
In a third subsidiary aspect, the invention comprises the additional step of employing a face detector to verify that a region of interest contains a face, subsequent to the step of selecting a region of interest from a displayed image but prior to the step of querying the content of that image, and wherein the methods of query subsequently employed are determined by whether the face detector detects a face or not.
The utilization of a face detection step allows for the speeding up of the system by culling down the amount of information to be searched through. It is anticipated that the system will often be used for the purposes of identifying people, and the provision of a face detector will therefore also serve to direct information traffic more effectively to the correct part of the database.
In a fourth subsidiary aspect the invention comprises the steps of noting the time of the generation of the query and/or the source of the image; and attempting to retrieve a list of potential items corresponding to said image on the basis of said time and source data.
This provides a means of reducing the time and computing power required for comparison and also increases the level of accuracy of the search.
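A minimal sketch of how the time and source of a query might be used to retrieve a shortlist of candidates is given below; the schedule structure, channel name and field layout are assumptions made purely for illustration.

```python
# Sketch: use the query time and channel to retrieve a shortlist of candidate items.
from datetime import datetime

SCHEDULE = [
    # (channel, programme start, programme end, items known to appear in the programme)
    ("Channel A", datetime(2010, 11, 19, 21, 0), datetime(2010, 11, 19, 22, 0),
     ["Performer X", "Performer Y", "Lamp model Z"]),
]

def candidate_items(channel, query_time):
    """Return the candidate list for whatever programme was airing at the query time."""
    for sched_channel, start, end, items in SCHEDULE:
        if sched_channel == channel and start <= query_time < end:
            return items
    return []   # no schedule entry: fall back to searching the full database

# A selection made at 21:30 on "Channel A" is compared only against that programme's list.
print(candidate_items("Channel A", datetime(2010, 11, 19, 21, 30)))
```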
In a fifth subsidiary aspect, the extracted data further comprises audio data.
Supplementing image data with audio data provides more material with which to find an accurate correspondence, adding further features to a boosted or multiple instance classifier.
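One hedged sketch of how audio data might supplement the image features is given below; it assumes the librosa library is available and simply appends a fixed-length audio summary to whatever image feature vector is already in use.

```python
# Sketch: supplement image features with an audio summary (librosa assumed available).
import numpy as np
import librosa

def combined_features(image_features, audio_path):
    """Append mean MFCC coefficients of the captured audio to the image feature vector."""
    y, sr = librosa.load(audio_path, sr=None)            # audio captured around the query
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 coefficients x time frames
    audio_summary = mfcc.mean(axis=1)                    # one value per coefficient
    return np.concatenate([image_features, audio_summary])
```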
In a second broad independent aspect, the invention comprises a computer programme configured to operate the method of claim 1.
Embodying the method in a computer programme is advantageous, because it allows the programme to be provided as a download or an application which may be put into effect with extant hardware.
In a third broad, independent aspect, the invention comprises a system for the identification of regions of interest derived from displayed images, comprising an interface, for allowing a user to select a region of interest from a displayed image, an extractor for extracting information associated with a selected region of interest from the displayed image, an information source, comprising information about items signified by displayed images, a processor, in communication with the information source and the extractor, for comparing the extracted information with the information about items signified by displayed images, in order to identify instances of correspondence between the extracted information and the information about items signified by displayed images, and a relayer, for relaying, in the event of a correspondence being found, at least a portion of the information about items signified by the displayed image to a site.
This system confers the advantages given by the method above.
In a first subsidiary aspect of the third broad, independent aspect, the information source further comprises information relating to previous occurrences of the display of the item as an image, and wherein the means of identification of the extracted information with an item is done by comparing the extracted information with information relating to the previous occurrences of the item as an image.
This advantageously allows for an information source independent of the image producers; there is no need for information to be embedded in the source itself. Preferably, the extracted information is stored on a database for subsequent use as information related to previous occurrences of an item as an image.
By including this information as part of the database, it is the case that the database will expand as an item occurs more frequently, and the level of accuracy in identifying a particular object will increase.
In a second subsidiary aspect of the third broad, independent aspect, the displayed images are derived from a broadcast signal.
Preferably the information source is data embedded in the said broadcast signal.
Where the information source is embedded, speed of access is increased.
Preferably the information source is a database.
Where the information source is a database, a larger amount of data may potentially be stored, including information about items which information embedded in a broadcast signal might not have covered. As such, where a system allows for selections to be determined by the user - i.e. not from a pre-selected pool determined by another party such as a broadcaster - the provision of a database is near essential.
In a third subsidiary aspect of the third broad, independent aspect, the region of interest is selected by the user from one or more predefined regions of interest. Predefined regions of interest speed up the identification process.
Preferably, a region of interest is defined by the user's manipulation of the interface.
This is advantageous in that it allows for a broader range of identification queries.
In a fourth subsidiary aspect of the third broad, independent aspect, the invention further comprises a display, upon which the displayed image is displayed.
The provision of a display as part of the system ensures the compatibility of the display.
In a fifth subsidiary aspect of the third broad, independent aspect, the interface comprises a display with a touch screen.
A touch screen selection allows the user to be more precise in his or her selections.
In a sixth subsidiary aspect of the third broad, independent aspect, the system comprises a display upon which the displayed image is displayed, and an interface which comprises a secondary display with a touch screen.
A configuration with two displays allows the first to be freely viewed by multiple people, whilst the second is used to make a query.
Preferably the relayer displays information about items signified by the displayed image on the same display on which the displayed image is displayed.
The overlaying of information in this way allows the user to continue viewing even whilst data following a search is displayed.
Preferably the relayer displays information about items signified by the displayed image on a secondary display.
Provision of the information on a secondary display allows viewing of the first display to be conducted without it being obscured by the information.
Preferably, the database stores the data collected by the processor, adding it to the store of information relating to the previous occurrences of the item as an image.
In so doing, the database expands, allowing for more information to be collected about items, which in turn improves accuracy.
In a seventh subsidiary aspect of the third broad, independent aspect, the extractor is a handheld imaging device.
Such a device is particularly easy to use, and saves the user from having to physically interface with the screen which first shows the image. This keeps the screen clean and free from being obscured.
Preferably, the extractor is a mobile telephone.
This advantageously utilizes existing appropriate technology in service of the system, allowing easier joining of and participation in the system.
In an eighth subsidiary aspect of the third broad, independent aspect at least a part of the database is remote from the rest of the system, and wherein said database may be accessed by a plurality of said systems.
The provision of a remote database and access for multiple systems allows for a large, shared database to be built up.
The invention also comprises a system substantially as described herein, with reference to and as illustrated by any combination of the text and/or drawings.
The invention also comprises a method substantially as described herein, with reference to and as illustrated by any combination of the text and/or drawings.
Brief description of the figures
The preferred embodiments of the invention will now be described in detail, with reference to the figures, in which:
Figure 1 shows a first embodiment of the invention.
Figure 2 shows a second embodiment of the invention.
Figure 3 shows a third embodiment of the invention.
Figure 4 is a block diagram showing a system wherein content may be embedded in a DVB stream.
Figure 5 is a block diagram of a system wherein the information is embedded, and a frame buffer stores a local copy of the television image.
Figure 6 is a block diagram of a system wherein the system stands apart from the broadcast itself.
The figures and the features contained therein will now be discussed in detail with regard to a variety of practical embodiments of the invention.
Detailed description of the preferred embodiments
At figure 1 is shown, generally, a system 2 of the invention. The system 2 is interactive and allows the user to select a region of interest 4 on a screen 6 such as a television screen and automatically retrieve properties that relate to the region of interest 4, which may contain a person, object or place. In order to do this, once the region of interest 4 has been acquired from the viewer, the content of the region of interest 4 is queried, data constituting features or elements of the region of interest 4 are extracted as a result of that query, and that query is compared with an information source, which in this embodiment is a database 8. A region of interest 4 can, as shown in figure 1, either be an area of a screen, preferably around an item 4i, or the area of screen directly corresponding to the item itself 4ii. In preferred embodiments, the database 8 is a collection of objects with data related to them, which may correspond to the subject matter contained in a region of interest 4. Thus, the system 2 will need to have had prior exposure to an object that a user wishes to find out about in order for it to supply the information, whether this prior exposure is constituted via previous instances of the object in previous television programming, or is constituted by abstracted objects and features placed there by other users.
Selection of the region of interest 4 may be done by a user pointing a device 14 at a screen 6 such as a television screen or other display to identify a particular item of interest. In other embodiments the movement of a cursor or the operation of a selector may be preferred. The screen co-ordinates defining the region of interest 4 are recorded and features that describe this image region are extracted. These features are then used to search a database 8 for items with similar features, the details of which are returned to the user. In preferred embodiments, to facilitate the database 8 search, the time of the query and the current channel are advantageously also captured. It is possible to select a region of interest 4 either when a given programme is "in motion", or when it is paused. In some embodiments the system 2 has a frame freezing facility.
There are several possible approaches for storing the database 8 and performing the search. A first is when the tasks are done locally to the user within a set top box 16 or similar. A second is when the extracted information could be sent to a host system for processing and the result returned to a set top box 16 or similarly locally sited device for display.
In the former embodiment, although no connection to external devices is required for extracting features and conducting the enquiry for correspondence, some means of populating the database 8 with new items as they become available will be required. This could be performed via a broadband connection (not shown), at any time, not just when the system 2 is being used for an enquiry. As such, in the former embodiment, information can be more readily accessed.
In an example of the second option, shown in figure 2, the image displayed on the screen 6 is also written to a frame buffer 15. The buffered image, the channel and the time are encoded and transmitted via a TCP/IP link to a central server 17 for processing.
In this embodiment, the server 17 stores the database 8 and performs the feature extraction; the coordinates of the region of interest 4 are sent to the server 17 to enable this. A principal advantage of this approach is that updating the database 8 and maintaining the search software (e.g. updating the search algorithms, bug fixing, etc) is much easier - only the central system or systems need updating - there is no requirement to update the technology at the user end.
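A minimal client-side sketch of such a transmission is shown below. The host name, port, length-prefixed framing and JSON message layout are all assumptions for illustration; a real deployment would define its own protocol.

```python
# Sketch: encode the buffered frame, channel, time and ROI coordinates and send them
# over a TCP/IP link to a central server, then read back the identification result.
import base64
import json
import socket
import time

def send_query(frame_jpeg_bytes, channel, roi, host="server.example.invalid", port=5000):
    message = json.dumps({
        "image":   base64.b64encode(frame_jpeg_bytes).decode("ascii"),
        "channel": channel,
        "time":    time.time(),
        "roi":     {"x": roi[0], "y": roi[1], "w": roi[2], "h": roi[3]},
    }).encode("utf-8")
    with socket.create_connection((host, port)) as conn:
        conn.sendall(len(message).to_bytes(4, "big"))    # simple length-prefixed framing
        conn.sendall(message)
        reply_len = int.from_bytes(conn.recv(4), "big")
        reply = b""
        while len(reply) < reply_len:                    # read the full reply
            chunk = conn.recv(reply_len - len(reply))
            if not chunk:
                break
            reply += chunk
        return json.loads(reply)                         # server's identification result
```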
Where the means of selection is handheld and wireless, after a selection is made, e.g. a button press on the pointing device 14, the location of the pointing device 14 relative to the screen 6 would be required to infer the 2D screen coordinates. One way of doing this is by utilising a combination of an infrared camera 20 and a sensor bar 22 containing infrared LEDs attached to or placed adjacent to the screen 6.
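One highly simplified sketch of this inference is given below; the camera resolution, screen resolution and the plain linear mapping are assumptions, and a practical system would calibrate the mapping rather than use fixed constants.

```python
# Sketch: infer 2D screen coordinates from the two sensor-bar LEDs as seen by the
# infrared camera on the pointing device.
def pointer_screen_coords(led_a, led_b, cam_w=1024, cam_h=768,
                          screen_w=1920, screen_h=1080):
    """led_a, led_b: (x, y) positions of the sensor-bar LEDs in the camera image."""
    mid_x = (led_a[0] + led_b[0]) / 2.0
    mid_y = (led_a[1] + led_b[1]) / 2.0
    # When the device points to the right of centre, the LED pair appears to the left
    # of centre in its camera image, so the offset is inverted before scaling.
    sx = (1.0 - mid_x / cam_w) * screen_w
    sy = (1.0 - mid_y / cam_h) * screen_h
    return int(sx), int(sy)

# LEDs seen towards the left of the camera frame imply a pointer towards the screen's right.
print(pointer_screen_coords((200, 380), (260, 380)))
```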
The user will require some form of visual feedback during their selection to inform them where the region of interest 4 that they have selected is placed. That is, the user will need to know which point on the screen the pointing or selecting device 14 is currently pointing to. This could either be a cursor rendered using a set top box, or it could be, say, a laser projected from the pointing device onto the screen itself. In some preferred embodiments, the regions of interest 4 are generated on demand by a user. This can be done either in the context of an embedded system integrated at a low level with current set top box technology, or in the context of a stand-alone system, which would not require access to the data stream at the source and would instead generate features for a database query directly from the transmitted and decoded image content at the destination; alternatively, a hybrid system could be operated.
In all embodiments, the descriptors of the items in the database 8 must be robust to at least partial occlusion, to viewpoint change, to differences in illumination and to age-related issues which might affect appearance. Likewise, the appearance of people and especially actors changes over time; they age, their hairstyle and hair colour might change, they may have different styles of facial hair, they may wear different forms of makeup to alter their appearance, and they sometimes might wear facial jewellery and at other times not, and so on.
In use, a user might select a region of interest 4 on the screen corresponding to a character and retrieve personal details and information about the career history of the person selected (e.g. their name, age, other shows and films in which they have appeared and the characters that they have played). In an example where an object on the screen 6 such as a lamp, a car or an item of clothing, forms the subject of a region of interest 4, the viewer might be presented with the information about the manufacturer of the object, similar products, and information about where the item can be purchased. In an example where a building or place is identified, then information such as opening times, entry costs and services provided might be displayed, such as where the bookings for weddings and other functions can be made, as well as contact details for such services.
In a preferred embodiment shown in figure 3, the pointing device or interface, which is a handheld device 100 such as a PDA or mobile telephone, has networking capability 102, a touch screen 104, and an on-phone camera 106. A built-in imaging device such as the on-phone camera 106 may be used to acquire an image 108 from a screen 110 such as a television screen, which may then be used for selection on the touch screen 104 rather than selection being made from the screen itself. Thus, acquiring the screen co-ordinates is a much easier task, as they are read directly from the touch screen 104. In the first instance then, the handheld device 100 is external to the television network. The handheld device has an onboard clock 112, which could log the time that the image 108 was captured. The user can mark the region of interest 114 directly on the touch screen 104. An advantage of this process is that the image 108 on which the user makes their selection is static, which removes problems such as users trying to mark objects that are moving on the source screen 110 or the touch screen 104 that subsequently disappear from view, or where a scene cut might occur before the selection is made. Following this, the user might have the option to input information into the device 100 as to the source - i.e. the television channel - and all of this information could be sent to a remote server 116 via the 3G network, or similar, or over a local area network 117, or similar. Searching and comparison would occur at the remote server 116 via the means of a processor 118. Any correspondence found, or results associated with partial or non-correspondence, along with any information associated with that correspondence, could then be sent back, over the utilised network 117, to the device 100 for display.
At figure 4 there is a system 40 wherein content may be embedded in a DVB stream. The process steps are shown in the figure.
At figure 5 there is a system 50 wherein the information is embedded, and a frame buffer stores a local copy of the television image. The process steps are shown in the figure.
At figure 6 there is a system 60 wherein the system stands apart from the broadcast itself. The process steps are shown in the figure.
In some embodiments, predetermined regions of interest 4 could be readily embedded in a broadcast signal such as a television picture; for example, they could be generated at the source and transmitted as metadata for an interactive television broadcast with a corresponding digital video broadcast, the interactive content being embedded in the DVB stream. There are several related network-dependent standards defining the protocol for implementing the encoding, decoding and broadcast over an interactive network (cable, satellite, broadband etc).
There are several advantages of generating features at the source. The first is transparency to users; in an up to date interactive television environment the existing technology may be utilised. The second advantage is that the search of the database 8 can be simplified.
For example, if a region of interest 4 has no main features, the user could be told that the object or objects (s)he has included in a region of interest 4 do not yet exist in the database 8. The third advantage is that there are standards available that can be utilised to achieve the method. Standards defined by, for example, the European Telecommunications Standards Institute (ETSI) provide a protocol for two-way communication to send information about the selection from the user to a server for the database search, and for returning and displaying the results in an interactive television environment, which is a familiar interface for users. Additionally, the Moving Pictures Experts Group (MPEG) defines standards for encoding digital content for transmission and object indexing. In particular, MPEG-7 describes multimedia content using low level and high level semantic features; the MPEG-21 open framework builds on this to describe a mechanism for representing and protecting digital items. These MPEG representations are XML based, so can be readily integrated with current web-based technologies for packaging, sending and reviewing data; the technology underpinning the system could therefore be provided under licence, allowing costs to be passed on to broadcasters and manufacturers.
If the features are generated on demand, the database query will be performed and the closest matching items will be returned; as such, a feedback system wherein inaccurate correspondences can be corrected by the user is desirable.
In any embodiment, there are a number of ways in which an item depicted in a given image can be characterised, such that information pertaining to it can be retrieved from a database 8. In all embodiments, the principal way utilised by the system 2 is the image based search, wherein features are extracted automatically from image content and these features are compared to existing features in the database 8. The examples where the correspondence is closest are assumed to be most similar and returned. This is known as "query by image content". A text based search may be used in combination with the query by image content (as a hybrid search) or as a corrective to an incorrect answer. In this way accuracy may be bolstered. In one example of a text based search, a user provides words that describe what it is they are looking for and objects with matching keywords are returned.
A variety of image based features - essentially sets of numbers that best represent the object in the region of interest in an image - can be utilised. For simple objects, e.g. representing a square, this is trivial. A square may be characterised by the (X,Y) co-ordinates of the centre of the square, the width and (possibly) the colour. However most real life objects have irregular shapes and are composed of many different colours; the appearance of the objects (shape and colour) might also be dependent upon viewpoint.
Objects closer to the camera will likely have much more detail than objects further away in the background (a problem related to scale). Furthermore, objects in the real world are not static - they change position over time. The feature-numbers used to represent an object in one image are likely to change as the object moves. Therefore features that characterise objects robustly across scale, viewpoint change and illumination changes are particularly favoured.
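A hedged sketch of such a query by image content is given below: descriptors extracted from the selected region are matched against each stored item and the closest item is returned. OpenCV, the ratio-test threshold and the match-count score are assumptions for the example rather than the implementation of the system.

```python
# Sketch of "query by image content": match descriptors from the selected region against
# each stored item and return the item with the most distinctive matches.
import cv2

def best_matching_item(query_descriptors, database):
    """database: iterable of (item_info, stored_descriptors) pairs."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    best_item, best_count = None, 0
    for item_info, stored in database:
        pairs = matcher.knnMatch(query_descriptors, stored, k=2)
        # Lowe's ratio test keeps only distinctive matches, giving some robustness to
        # clutter, scale and viewpoint change.
        good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) > best_count:
            best_item, best_count = item_info, len(good)
    return best_item, best_count
```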
The items to be identified can initially be categorised into two broad classes: people and non-people. The people class would represent all instances of performers, whilst the non-people class would represent other objects. The features used to encode these classes will differ, and for the people class, a system that detects faces may advantageously be employed. That said, face detection need not be problematic for the system, as in some preferred embodiments the user will be responsible for locating the face of interest in the image. In examples where the image location is selected by the user, a Viola-Jones face detector (Viola and Jones, 2004) could be used to verify that the region of interest contains a face, and to obtain the bounding box of the face. As discussed in more detail elsewhere, it is possible to narrow the domain of the search by noting the time the query is generated and the channel that it is derived from; these factors will be used to retrieve a cast list or analogous data for the television programme, where it is available. The search will then comprise identifying the performers in the database for the character from the cast that matches the supplied features.
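A brief sketch of this verification step using the Viola-Jones detector bundled with OpenCV is shown below; the cascade file name and detection parameters are assumptions for illustration.

```python
# Sketch: verify that a selected region contains a face using OpenCV's Haar cascade
# (Viola-Jones) detector, and obtain the bounding box(es) of any faces found.
import cv2

_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_bounding_boxes(roi_bgr):
    """Return a list of (x, y, w, h) face boxes found inside the selected region."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return list(faces)

# If the returned list is non-empty the query can be routed to the "people" part of the
# database; otherwise the non-people feature pipeline is used instead.
```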
If constraints can be imposed on the search, or if information additional to the video is included, the speed and efficiency of the search can be increased. Aids to the search can include the labelling of footage (including searching for text in the footage itself - i.e. talking heads are often introduced in a documentary via a label), text inputs and/or an acoustic speech recogniser. The underlying idea is that faces that occur frequently with textual names will likely correspond with one another. These will all serve to counteract the difficulties in obtaining features that reliably characterise faces across pose, illumination, age, expression and so on.
In order to reliably identify faces or objects across a wide variety of poses and illumination changes, boosted or multiple instance classifiers trained to identify features can advantageously be used. By combining weak classifiers together, a reliable and robust classifier may be formed.
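By way of illustration, the sketch below boosts a set of weak decision-stump classifiers over a combined feature vector using scikit-learn; the random training data, feature layout and choice of scikit-learn are assumptions made only to show the boosting idea.

```python
# Sketch: combine many weak classifiers into one boosted classifier (scikit-learn assumed).
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# X: one row per training example, columns concatenating the individual feature types
# (e.g. SIFT statistics, colour histograms, GLOH summaries); y: 1 if the example shows
# the target person or object, 0 otherwise. Random values stand in for real data here.
X = np.random.rand(200, 64)
y = np.random.randint(0, 2, size=200)

# AdaBoost's default weak learner is a one-level decision tree ("stump"); boosting
# re-weights the training examples so that later stumps focus on earlier mistakes.
booster = AdaBoostClassifier(n_estimators=50)
booster.fit(X, y)
print(booster.predict(X[:5]))
```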
With regard to representing objects and places, numerous features have been proposed to represent general objects for recognition and retrieval from a database. These include, but are not limited to, shape-based features (Felzenszwalb, 2001), scale invariant feature transform features (Lowe, 2004), speeded up robust features (Bay, Ess, Tuytelaars, Van Gool, 2008), gradient location and orientation histograms (Mikolajczyk and Schmid, 2005), contrast context histograms (Huang, Chen and Chung, 2008), PCA-SIFT (Ke and Sukthankar, 2004) and data sieves (Moravec, Harvey and Bangham, 2000).
SIFT-based features are amongst the most popular and reliable, but in some circumstances GLOH and PCA-SIFT are either more efficient and speedy or more robust and accurate. As with face recognition, combinations of these state of the art features may be used to improve the overall robustness of the system, again using some form of boosted classifier. A standard manner of imaging products to ease feature extraction and inclusion in the database may be provided for product manufacturers and advertisers, in order to enhance searching.
For example, they could supply an image for each possible orientation, so that when a query image is provided by a user, the item of interest will match at least one of the supplied orientations. There has been work on determining canonical views for object recognition, and this work could be used to provide guidance or rules to obtain the best views to include for particular product types.
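A small sketch of matching a query crop against several supplied orientation images of a product, keeping the best scoring view, is given below; the file names, the use of SIFT descriptors and the match-count score are assumptions for the example.

```python
# Sketch: score a query crop against each supplied orientation image of a product and
# keep the orientation with the most distinctive matches.
import cv2

def best_orientation(query_bgr, orientation_paths):
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    gray = cv2.cvtColor(query_bgr, cv2.COLOR_BGR2GRAY)
    _, q_desc = sift.detectAndCompute(gray, None)
    best_path, best_score = None, 0
    for path in orientation_paths:                  # e.g. front.png, side.png, back.png
        view = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if view is None or q_desc is None:
            continue
        _, v_desc = sift.detectAndCompute(view, None)
        if v_desc is None:
            continue
        pairs = matcher.knnMatch(q_desc, v_desc, k=2)
        good = [p for p in pairs if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]
        if len(good) > best_score:
            best_path, best_score = path, len(good)
    return best_path, best_score
```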
In some embodiments, separating the products component of the system from the places component may be useful, as rules distinguishing particular object types, i.e. products as opposed to places, are developed. Initial focus would be on products, and a set of rules to distinguish places from products would be investigated.
There are a number of applications that build on this basic idea of person and object recognition that may help with adoption of the system or act as supplementary uses.
Firstly, a game centred on character identification could be developed. The player would select a character on a screen and answer a number of questions based on the career history of the selected performer. A bank of questions could be generated and the answers populated automatically on the database. Users can compete with one another, where a list of attempts and high scores is maintained centrally.
Similarly, applications based on the idea of the "six degrees of Kevin Bacon" game could be devised. For example, a game begins when a user selects a character on the screen. A second (randomly selected) person from the database is presented and the goal is either to determine the smallest number of links between these people (in terms of either the performers they have appeared with or the shows they have appeared in), or the largest possible separation between these people. Queries (games played) can be stored centrally, so users can see the scores and games of other users and they can attempt to better the highest scores. An alternative might see a user selecting a person on the screen and retrieving information about the character (as well as the performer). For example, the recent history of the character in the show, and the current significance of the character to the plot, would be displayed. Viewers could use such an application to catch up with information they might have missed on their handheld device; a further advantage is that such an application would not interrupt the viewing of others.
A further possible use is outlined below. Data regarding a product of interest might contain details about its origins, which retail outlets it can be bought from, and other purchasing information, such as costs, age restrictions and so on. Further, the user might be able to "click through" the image to a retail portal, where one can find further things out about the object and buy it - the opportunity, for example, to furnish a given film could be sold to companies as an advanced form of product placement, and by embedding or otherwise supplying information about the product and the retailer, a strong mix of product placement and advertising can be achieved, leading directly to sales opportunities. In a further configuration, information would be displayed about a product or object, and the user would have a means of interacting with the screen which would lead directly to an order, without the user having to go to an external site.

Claims (14)

  1. A method for identifying a region of interest in an image, comprising the steps of: receiving a signal corresponding to a selected region of interest from a displayed video image, querying the content of that region of interest, extracting the data obtained from the query, comparing the extracted information to an information source, determining whether a correspondence exists between the extracted information and the information about items signified by displayed images contained within the information source, and relaying at least a portion of the information about items signified by the displayed image to a user.
  2. A method according to claim 1, wherein the query is effected by using a boosted or multiple instances classifier, combining a plurality of classifiers from the following list: shape-based features, scale invariant feature transform features, speeded up robust features, gradient location and orientation histograms, PCA-SIFT features, contrast context histograms, and data sieves.
  3. A method according to claim 1, wherein the querying of the content of the image is done using a boosted classifier, combining a plurality of classifiers from the following list: scale invariant feature transform features, gradient location and orientation histograms, and PCA-SIFT features.
  4. A method according to any of the preceding claims, comprising the additional step of employing a face detector to verify that a region of interest contains a face, subsequent to the step of selecting a region of interest from a displayed image but prior to the step of querying the content of that image, and wherein the methods of query subsequently employed are determined by whether the face detector detects a face or not.
  5. A method according to any of the preceding claims, further comprising the steps of: noting the time of the generation of the query and/or the source of the image; and attempting to retrieve a list of potential items corresponding to said image on the basis of said time and source data.
  6. A method according to any of the preceding claims, wherein the extracted data further comprises audio data.
  7. A computer programme configured to operate the method of claim 1.
  8. A system for the identification of regions of interest derived from displayed images, comprising: an interface, for allowing a user to select a region of interest from a displayed image; an extractor for extracting information associated with a selected region of interest from the displayed image; an information source, comprising information about items signified by displayed images; a processor, in communication with the information source and the extractor, for comparing the extracted information with the information about items signified by displayed images, in order to identify instances of correspondence between the extracted information and the information about items signified by displayed images; and a relayer, for relaying, in the event of a correspondence being found, at least a portion of the information about items signified by the displayed image to a site.
  9. A system according to claim 8, wherein the information source further comprises information relating to previous occurrences of the display of the item as an image, and wherein the means of identification of the extracted information with an item is done by comparing the extracted information with information relating to the previous occurrences of the item as an image.
  10. A system according to claim 9, wherein the extracted information is stored on a database for subsequent use as information related to previous occurrences of an item as an image.
  11. A system according to any of the preceding claims, wherein the displayed images are derived from a broadcast signal.
  12. A system according to claim 11, wherein the information source is data embedded in the said broadcast signal.
  13. A system according to any of claims 8-11, whereby the information source is a database.
  14. A system according to any of the preceding claims, wherein the region of interest is selected by the user from one or more predefined regions of interest.
  15. A system according to any of claims 8-14, wherein a region of interest is defined by the user's manipulation of the interface.
  16. A system according to any of the preceding claims, further comprising a display, upon which the displayed image is displayed.
  17. A system according to any of the preceding claims, wherein the interface comprises a display with a touch screen.
  18. A system according to any of the preceding claims, comprising a display upon which the displayed image is displayed, and an interface which comprises a secondary display with a touch screen.
  19. A system according to any of claims 16-18, wherein the relayer displays information about items signified by the displayed image on the same display on which the displayed image is displayed.
  20. A system according to any of claims 16-18, wherein the relayer displays information about items signified by the displayed image on a secondary display.
  21. A system according to claim 13, whereby the database stores the data collected by the processor, adding it to the store of information relating to the previous occurrences of the item as an image.
  22. A system according to any of the preceding claims, wherein the extractor is a handheld imaging device.
  23. A system according to claim 22, wherein the extractor is a mobile telephone.
  24. A system according to any of the preceding claims, wherein at least a part of the database is remote from the rest of the system, and wherein said database may be accessed by a plurality of said systems.
  25. A system substantially as described herein, with reference to and as illustrated by any combination of the text and/or drawings.
  26. A method substantially as described herein, with reference to and as illustrated by any combination of the text and/or drawings.

Amendments to the claims have been filed as follows:

CLAIMS

  1. A method for identifying a region of interest in an image, comprising the steps of: receiving a signal corresponding to a selected region of interest of a displayed video image, querying the content of that region of interest, extracting the data obtained from the query, comparing the extracted information to an information source, determining whether a correspondence exists between the extracted information and information about items signified by images contained within the information source, and relaying at least a portion of the information about items signified by the displayed image to a user, in which the method comprises the additional steps of noting the time of the generation of the query and/or the source of the image and attempting to retrieve a list of potential items corresponding to said image on the basis of said time and source data.
  2. A method according to claim 1, wherein the extracted data further comprises audio data.
  3. A computer program configured to operate the method of claim 1 or claim 2.
  4. A system for the identification of regions of interest derived from displayed images, comprising: an interface, for allowing a user to select a region of interest from a displayed image; an extractor for extracting information associated with a selected region of interest from the displayed image; an information source, comprising information about items signified by displayed images; a processor, in communication with the information source and the extractor, for comparing the extracted information with the information about items signified by displayed images, in order to identify instances of correspondence between the extracted information and the information about items signified by displayed images; and a relayer for relaying, in the event of a correspondence being found, at least a portion of the information about items signified by the displayed image to the interface, the system being configured to operate the method of claim 1 or claim 2.
  5. A system according to claim 4, wherein the information source further comprises information relating to previous occurrences of the display of the item as an image, and wherein the means of identification of the extracted information with an item is done by comparing the extracted information with information relating to the previous occurrences of the item as an image.
  6. A system according to either of the preceding claims 4 and 5, wherein the displayed images are derived from a broadcast signal.
  7. A system according to any of the preceding claims 4, 5 and 6, comprising a display upon which the displayed image is displayed, and an interface which comprises a secondary display with a touch screen.
  8. A system according to claim 7, wherein the relayer displays information about items signified by the displayed image on the same display on which the displayed image is displayed.
  9. A system according to claim 7 or claim 8, wherein the relayer displays information about items signified by the displayed image on a secondary display.
  10. A system according to any of claims 7 to 9, wherein the extractor is a mobile telephone.
  11. A system substantially as described herein, with reference to and as illustrated by any combination of the text and/or drawings.
  12. A method substantially as described herein, with reference to any combination of the text and/or drawings.
GB1019578.2A 2010-11-19 2010-11-19 Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest Withdrawn GB2485573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1019578.2A GB2485573A (en) 2010-11-19 2010-11-19 Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1019578.2A GB2485573A (en) 2010-11-19 2010-11-19 Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest

Publications (2)

Publication Number Publication Date
GB201019578D0 GB201019578D0 (en) 2010-12-29
GB2485573A true GB2485573A (en) 2012-05-23

Family

ID=43431679

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1019578.2A Withdrawn GB2485573A (en) 2010-11-19 2010-11-19 Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest

Country Status (1)

Country Link
GB (1) GB2485573A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839074A (en) * 2014-02-24 2014-06-04 西安电子科技大学 Image classification method based on matching of sketch line segment information and space pyramid
US10951923B2 (en) 2018-08-21 2021-03-16 At&T Intellectual Property I, L.P. Method and apparatus for provisioning secondary content based on primary content

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03203000A (en) * 1989-12-29 1991-09-04 Matsushita Electric Ind Co Ltd Automatic road sign recognizing device
EP1168195A2 (en) * 2000-06-23 2002-01-02 NTT DoCoMo, Inc. Information search system
US20030086613A1 (en) * 1999-01-28 2003-05-08 Toshimitsu Kaneko Method of describing object region data, apparatus for generating object region data, video processing apparatus and video processing method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03203000A (en) * 1989-12-29 1991-09-04 Matsushita Electric Ind Co Ltd Automatic road sign recognizing device
US20030086613A1 (en) * 1999-01-28 2003-05-08 Toshimitsu Kaneko Method of describing object region data, apparatus for generating object region data, video processing apparatus and video processing method
EP1168195A2 (en) * 2000-06-23 2002-01-02 NTT DoCoMo, Inc. Information search system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839074A (en) * 2014-02-24 2014-06-04 西安电子科技大学 Image classification method based on matching of sketch line segment information and space pyramid
CN103839074B (en) * 2014-02-24 2017-02-08 西安电子科技大学 Image classification method based on matching of sketch line segment information and space pyramid
US10951923B2 (en) 2018-08-21 2021-03-16 At&T Intellectual Property I, L.P. Method and apparatus for provisioning secondary content based on primary content

Also Published As

Publication number Publication date
GB201019578D0 (en) 2010-12-29

Similar Documents

Publication Publication Date Title
US20220224976A1 (en) Methods for identifying video segments and displaying contextually targeted content on a connected television
US10271098B2 (en) Methods for identifying video segments and displaying contextually targeted content on a connected television
US11443511B2 (en) Systems and methods for presenting supplemental content in augmented reality
US9100701B2 (en) Enhanced video systems and methods
KR101382499B1 (en) Method for tagging video and apparatus for video player using the same
US9015139B2 (en) Systems and methods for performing a search based on a media content snapshot image
KR102114701B1 (en) System and method for recognition of items in media data and delivery of information related thereto
US9197911B2 (en) Method and apparatus for providing interaction packages to users based on metadata associated with content
US20150248918A1 (en) Systems and methods for displaying a user selected object as marked based on its context in a program
KR102246305B1 (en) Augmented media service providing method, apparatus thereof, and system thereof
CN102771115A (en) Method for identifying video segments and displaying contextually targeted content on a connected television
JP2003157288A (en) Method for relating information, terminal equipment, server device, and program
CN113194346A (en) Display device
CN108293140A (en) The detection of public medium section
JP5143592B2 (en) Content reproduction apparatus, content reproduction method, content reproduction system, program, and recording medium
KR20100116412A (en) Apparatus and method for providing advertisement information based on video scene
US20090037387A1 (en) Method for providing contents and system therefor
US20150082344A1 (en) Interior permanent magnet motor
JP4932779B2 (en) Movie-adaptive advertising apparatus and method linked with TV program
CN106713973A (en) Program searching method and device
GB2485573A (en) Identifying a Selected Region of Interest in Video Images, and providing Additional Information Relating to the Region of Interest
KR100669639B1 (en) Video browsing system based on multi level object information
JP2013258638A (en) Information generation system, information generation device, information generation method, and information generation program

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)