WO2012171839A1 - Video navigation through object location - Google Patents

Publication number
WO2012171839A1
Authority
WO
WIPO (PCT)
Prior art keywords
images
sequence
image
navigating
selecting
Prior art date
Application number
PCT/EP2012/060723
Other languages
French (fr)
Inventor
Louis Chevallier
Patrick Perez
Anne Lambert
Original Assignee
Thomson Licensing
Priority date
Filing date
Publication date
Application filed by Thomson Licensing
Priority to RU2014101339A (RU2609071C2)
Priority to CA2839519A (CA2839519A1)
Priority to JP2014515137A (JP6031096B2)
Priority to EP12730823.7A (EP2721528A1)
Priority to KR1020137033446A (KR20140041561A)
Priority to CN201280029819.XA (CN103608813A)
Priority to US14/126,494 (US20140208208A1)
Priority to MX2013014731A (MX2013014731A)
Publication of WO2012171839A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/745 Browsing; Visualisation therefor the internal structure of a single video sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/102 Programmed access in sequence to addressed parts of tracks of operating record carriers
    • G11B27/105 Programmed access in sequence to addressed parts of tracks of operating record carriers of operating discs
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • G11B27/28 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier by using information signals recorded by the same method as the main recording
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/732 Query formulation
    • G06F16/7335 Graphical querying, e.g. query-by-region, query-by-sketch, query-by-trajectory, GUIs for designating a person/face/object as a query predicate
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/19 Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information detectable on the record carrier
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/858 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8583 Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by creating hot-spots

Definitions

  • the present invention relates to a method for navigating in a sequence of images, e.g. in a movie and for interactive rendering of the same, specifically for videos rendered on portable devices that allow easy user interaction, and to an apparatus for conducting the method.
  • object segmentation is known in the art for producing spatial image segmentations, i.e. object boundaries, based on color and texture information.
  • An object is defined quickly by a user using object segmentation technology, just by selecting one or more points within the object.
  • Known algorithms for object segmentation are “graph cut” and “watershed”.
  • Another technology is called “object tracking”. After an object has been defined by its spatial boundary, the object is tracked automatically in the subsequent sequence of images. For object tracking, the object is typically described by its color distribution.
  • a known algorithm for object tracking is “mean shift”. For increased precision and robustness, some algorithms rely on the object appearance structure.
  • SIFT: scale-invariant feature transform
  • Generic object detection technology makes use of machine learning for computing a statistical model of the appearance of the object to be detected. This requires many examples of the objects (ground truth).
  • Models typically rely on SIFT descriptors. The most common machine learning techniques used nowadays include boosting and support vector machines (SVM).
  • SVM: support vector machine
  • face detection is a specific object detection application.
  • the features used are typically filter parameters, more specifically “Haar wavelet” parameters.
  • a well-known implementation relies on cascaded boosted classifiers, e.g. Viola & Jones.
  • a first example is skipping a fixed amount of playback time, e.g. moving forward in the video for 10 or 30 seconds.
  • a second example is to make a jump to the next cut or to the next group of pictures (GOP).
  • GOP: group of pictures
  • the skipping mechanism is oriented according to the video data, not according to the content of the movie. It is not clear for the user what image is displayed at the end of the jump. Further, the length of the interval skipped is short.
  • a third example is that a jump is made to the next scene. A scene is a part of action in a single location in a TV show or movie, composed of a series of shots.
  • This method relies on the number of objects that the system can effectively index. For the time being, there are relatively few detectors compared to the huge variety of objects one can encounter in e.g. an average news video.
  • a method for navigating in a sequence of images comprises the steps of:
  • the first input is a user input or an input from another device that is connected to the device executing the method.
  • the first object is indicated by a symbol, e.g. a cross, a plus or a circle and this symbol is moved instead of the first object itself.
  • the second position is a position on the screen defined by e.g. coordinates.
  • Another way to define the second position is to define the position of the first object in relation to at least one other object in the image. Identifying at least one image in the sequence of images where the first object is close to the second position.
  • the image in the sequence of images is used as a starting point for playback, for which the distance between the two objects is the smallest.
  • for defining the distance between the objects, e.g. the absolute value is used.
  • Another way of defining whether an object is close to another object is to use only the X or Y coordinates, or to weight the distance in the X and Y directions using different weighting factors.
  • a user watching a sequence of images, which is a movie or news program, either broadcast or recorded, navigates through the sequence of images according to the content of the images and does not depend on some fixed structure of the broadcast stream, which is defined mainly for technical reasons. Navigation is made intuitive and more user friendly.
  • the method is performed in real-time so that the user has the feeling of actually moving the object.
  • the user asks for the point in time where the designated object disappears from the screen.
  • the first input for selecting the first object is clicking on the object or drawing a bounding box around the object.
  • the user applies commonly known input methods for a man-machine interface. If an indexing exists, the user is also able to choose the objects by this index from a database.
  • the step of moving the first object to a second position according to a second input includes:
  • the step of identifying further includes identifying at least one image in the sequence of images where the relative position of the destination of the first object is close to the position of the second object.
  • the first object might be the ball
  • the user can move the ball into the direction of the goal as he expects that there is a scene he might be interested in when the ball is close to the goal, because this might be shortly before the team scores or a player kicks the ball over the goal.
  • This kind of navigation by object is completely independent of the coordinates of the screen, but depends on the relative distance of two objects in the image.
  • the position of the destination of the first object being close to the position of the second object also includes that the second object is exactly at the same position as the destination or that the second object overlaps the destination of the moved first object.
  • the size of the objects and their variation over time are considered to define the relative position of the two objects to each other.
  • the user selects an object, e.g. a face, and then zooms the bounding box of the face in order to define the size of the face. Afterwards, the sequence of images is searched for an image in which the face is displayed at this size or a size close to it.
  • This feature is advantageous if, e.g., an interview is played back and the user is interested in the speech of a specific person, assuming that the face of this person covers most of the screen when this person speaks.
  • the further input for selecting the second object is clicking on the object or drawing a bounding box around the object.
  • the user applies commonly known input methods for a man-machine interface. If an indexing exists, the user is also able to choose the objects by this index from a database.
  • For selecting the objects, object segmentation, object detection or face detection is employed.
  • object tracking techniques are used to track the position of this object in the subsequent images of the sequence of images.
  • a key-point technique is also employed for selecting an object.
  • key-point description is used for determining the similarity of objects in different images in the sequence of images.
  • Hierarchical segmentation produces a tree whose nodes and leaves correspond to nested areas of the images. This segmentation is done in advance. If a user selects an object by tapping on a given point of an image, the smallest node containing this point is selected.
  • the node selected with the first tap is considered as father of the node selected with the second tap.
  • the corresponding area is considered to define the object.
  • only a part of the images of the sequence of images are analyzed for identifying at least one image where the object is close to the second position.
  • This part to be analyzed is a certain number of images following the actual image, the certain number of images representing a certain playback time following the currently displayed image.
  • Another way to implement the method is to analyze all following images from the
  • the invention further concerns an apparatus for navigation in a sequence of images according to the above-described method.
  • Fig. 1 shows an apparatus for playback of a sequence of images and for performing the inventive method
  • Fig. 2 shows the inventive method for navigating
  • Fig. 3 shows a flow chart illustrating the inventive method
  • Fig. 4 shows a first example of navigation according to the inventive method
  • Fig. 5 shows a second example of navigation according to the inventive method
  • Fig. 1 schematically depicts a playback device for
  • the playback device includes a screen 1, a TV receiver, HDD, DVD, BD player or the like as source 2 for a sequence of images and a man-machine interface 3.
  • the playback device can also be an apparatus including all functions, e.g. a tablet, where the screen is also used as man-machine interface (touchscreen) and a hard disc or flash disc for storing a movie or documentary is present and a broadcast receiver device is also included into the device.
  • Fig. 2 shows a sequence of images 100, e.g. of a movie, documentary or sports event, comprising multiple images.
  • the image 101 which is currently displayed on the screen, is a starting point for the inventive method. In the first step, the screen view 11 displays this image 101.
  • a first object 12 is selected according to a first input received from the man-machine interface. Then, this first object 12 or a symbol representing this first object is moved to another location 13 on the screen, e.g. by drag and drop according to a second input received by the man-machine interface. On screen view 21, the new location 13 of the first object 12 is illustrated. Then, the method identifies at least one image 102 in the sequence of images 100 in which the first object 12 is at a location 14 that is close to the location 13 where this object has been moved to. In this image, the location 14 has a certain distance 15 to the desired location 13, indicated by the drag and drop movement. This distance 15 is used as a measure for
  • Fig. 3 illustrates the steps which are performed by the method.
  • an object is selected in a displayed image according to a first input.
  • the input is received from a man-machine interface. It is assumed that the selecting process described is performed in a short time period. This ensures that the object appearance does not change too much.
  • an image analysis is performed. The image of the current frame is analyzed and the point of interest, which captures a set of key-points present in the image, is extracted. These key-points are located where strong gradients are present. These key-points are extracted with a description of the surrounding texture. When a position in the image is selected, the key-points around this position are collected.
  • the radius of the area in which key-points are collected is a parameter of the method.
  • the selection of the key-points is assisted by other methods, e.g. by a spatial segmentation.
  • the set of extracted key-points constitutes a description of the selected object.
  • the object is moved to a second position in step 210. This movement is executed according to a second input, which is an input from the man-machine interface. The movement is realized as drag and drop.
  • the method identifies in step 220 at least one image in the sequence of images in which the first object is close to the second position, which is the image
  • step 230 the method jumps to the
  • Fig. 4 shows an example of applying the method when
  • the playback time of the whole show is indicated by an arrow t.
  • the first image is displayed on the screen; the image includes three faces.
  • the user is interested in the person displayed on the left-hand side of the screen and selects the person by drawing a bounding box around the face. Then the user drags the selected object (the face with fancy hair) into the middle of the screen and in addition enlarges the bounding box to indicate that he wants to see this person in the middle of the screen and in a close-up view.
  • an image fulfilling this requirement is searched for in the sequence of images. This image is found at time t2; it is displayed and playback is started at this time t2.
  • FIG. 5 shows an example of applying a method when watching a soccer game.
  • a scene of a game in the middle of the field is shown.
  • There are four players, one of them is close to the ball.
  • the user is interested in a certain situation, e.g. in the next penalty.
  • he selects the ball with the bounding box and tracks the object to the penalty spot to indicate that he wants to see a scene where the ball is exactly at this point.
  • this requirement is fulfilled.
  • a scene is displayed where the ball lies on the penalty spot and a player prepares for kicking a penalty.
  • the game is played back from this scene onwards.
  • the user is able to conveniently navigate to the next scene he is interested in.
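
Among the steps listed above, the collection of key-points around a selected position, with the radius as a parameter of the method, can be illustrated with a minimal sketch. The `KeyPoint` record and the function `collect_keypoints` are hypothetical names introduced for illustration only; the key-point extraction itself (e.g. a SIFT-like detector) is assumed to have run already and is not shown.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class KeyPoint:
    """Hypothetical key-point record: image location plus a texture descriptor."""
    x: float
    y: float
    descriptor: tuple  # description of the surrounding texture (assumed format)


def collect_keypoints(keypoints, position, radius):
    """Return the key-points lying within `radius` of the selected position.

    The returned set is what the text calls the description of the
    selected object; `radius` is the tunable parameter of the method.
    """
    px, py = position
    return [kp for kp in keypoints if hypot(kp.x - px, kp.y - py) <= radius]


if __name__ == "__main__":
    frame_keypoints = [
        KeyPoint(10, 10, ("a",)),
        KeyPoint(14, 13, ("b",)),
        KeyPoint(40, 40, ("c",)),
    ]
    selected = collect_keypoints(frame_keypoints, position=(10, 10), radius=5)
    print([kp.descriptor for kp in selected])
```

A larger radius captures more of the object but also more background key-points, which is presumably why the text mentions assisting the selection with a spatial segmentation.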

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Television Signal Processing For Recording (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Indexing, Searching, Synchronizing, And The Amount Of Synchronization Travel Of Record Carriers (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present invention relates to a method for navigating in a sequence of images. An image is displayed on a screen. A first object of the displayed image is selected at a first position according to a first input. The first object is moved to a second position according to a second input. At least one image is identified in the sequence of images where the first object is close to the second position. Playback of the sequence of images is started beginning at one of the identified images.

Description

Video navigation through object location
The present invention relates to a method for navigating in a sequence of images, e.g. in a movie and for interactive rendering of the same, specifically for videos rendered on portable devices that allow easy user interaction, and to an apparatus for conducting the method.
For video analysis, different technologies exist. A technology called "object segmentation" is known in the art for producing spatial image segmentations, i.e. object boundaries, based on color and texture information. An object is defined quickly by a user using object segmentation technology, just by selecting one or more points within the object. Known algorithms for object segmentation are "graph cut" and "watershed".
Another technology is called "object tracking". After an object has been defined by its spatial boundary, the object is tracked automatically in the subsequent sequence of images. For object tracking, the object is typically described by its color distribution. A known algorithm for object tracking is "mean shift". For increased precision and robustness, some algorithms rely on the object appearance structure. A known descriptor for object tracking is the scale-invariant feature transform (SIFT).
A further technology is called "object detection". Generic object detection technology makes use of machine learning for computing a statistical model of the appearance of the object to be detected. This requires many examples of the objects (ground truth). Automatic object detection is done on new images by using the models. Models typically rely on SIFT descriptors. The most common machine learning techniques used nowadays include boosting and support vector machines (SVM). In addition, face detection is a specific object detection application. In this case, the features used are typically filter parameters, more specifically "Haar wavelet" parameters. A well-known implementation relies on cascaded boosted classifiers, e.g. Viola & Jones.
Users watching video content such as news or documentaries might want to interact with the video by skipping some segment or going directly to some point. This possibility is even more desirable when using a tactile device such as a tablet used for video rendering that makes it easy to interact with the display.
To make this non-linear navigation possible, several means are available on some systems. A first example is skipping a fixed amount of playback time, e.g. moving forward in the video by 10 or 30 seconds. A second example is to jump to the next cut or to the next group of pictures (GOP). These two cases provide a limited semantic level of the underlying analysis. The skipping mechanism is oriented according to the video data, not according to the content of the movie. It is not clear to the user what image is displayed at the end of the jump. Further, the length of the interval skipped is short. A third example is a jump to the next scene. A scene is a part of the action in a single location in a TV show or movie, composed of a series of shots. Skipping a whole scene generally means jumping to a part of the movie where a different action begins, at a different location in the movie. The skipped video portion might be too long; a user might want to move in finer steps. On some systems where in-depth video analysis is available, some objects or persons can even be indexed. Users can then click on these objects or faces when they are visible in the video, and the system can then move to the point where these persons appear again or display additional information on this particular object. This method relies on the number of objects that the system can effectively index. For the time being, there are relatively few detectors compared to the huge variety of objects one can encounter in e.g. an average news video.
It is an object of the invention to propose a method for navigation and an apparatus for conducting the method, which overcomes the limitations outlined above and offers more user friendly and intuitive navigation.
According to the invention, a method for navigating in a sequence of images is proposed. The method comprises the steps of:
- Displaying an image on a screen.
- Selecting a first object of the displayed image at a first position according to a first input. The first input is a user input or an input from another device that is connected to the device executing the method.
- Moving the first object to a second position according to a second input. Alternatively, the first object is indicated by a symbol, e.g. a cross, a plus or a circle, and this symbol is moved instead of the first object itself. The second position is a position on the screen defined by e.g. coordinates. Another way to define the second position is to define the position of the first object in relation to at least one other object in the image.
- Identifying at least one image in the sequence of images where the first object is close to the second position.
- Starting playback of the sequence of images beginning at one of the identified images. The playback is started at the first image identified to fulfil the condition that the first object and the second object are close to each other. Another solution is that the method identifies all images fulfilling this condition and the user selects one of them to start playback from. A further solution is to use as the starting point for playback the image for which the distance between the two objects is the smallest. For defining the distance between the objects, e.g. the absolute value is used. Another way of defining whether an object is close to another object is to use only the X or Y coordinates, or to weight the distance in the X and Y directions using different weighting factors.
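
The identification step and the three playback-start options described above (first matching image, all matching images, or the image with the smallest distance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Track` record, the function names, the closeness `threshold`, and the reading of "absolute value" as the Euclidean norm of the displacement. Object tracking is assumed to have produced one location per image already.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Track:
    """Hypothetical tracker output: location of the first object in one image."""
    frame: int
    x: float
    y: float


def weighted_distance(track, target, wx=1.0, wy=1.0):
    """Distance from the object location to the target (second) position.

    wx = wy = 1 gives the plain Euclidean distance; setting one weight
    to 0 compares only the other coordinate, as the text suggests.
    """
    tx, ty = target
    return hypot(wx * (track.x - tx), wy * (track.y - ty))


def close_frames(tracks, target, threshold, wx=1.0, wy=1.0):
    """All images in which the object is within `threshold` of the target."""
    return [t.frame for t in tracks
            if weighted_distance(t, target, wx, wy) <= threshold]


def first_close_frame(tracks, target, threshold, wx=1.0, wy=1.0):
    """First image fulfilling the closeness condition, or None."""
    frames = close_frames(tracks, target, threshold, wx, wy)
    return frames[0] if frames else None


def closest_frame(tracks, target, wx=1.0, wy=1.0):
    """Image in which the distance to the target is the smallest, or None."""
    if not tracks:
        return None
    return min(tracks, key=lambda t: weighted_distance(t, target, wx, wy)).frame


if __name__ == "__main__":
    tracks = [Track(10, 0, 0), Track(11, 90, 50),
              Track(12, 100, 52), Track(13, 101, 50)]
    target = (100, 50)  # where the user dropped the object
    print(close_frames(tracks, target, threshold=5))
    print(first_close_frame(tracks, target, threshold=5))
    print(closest_frame(tracks, target))
```

Returning the full list from `close_frames` corresponds to the variant in which the user picks the start image; the other two functions pick it automatically.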
The method has the advantage that a user watching a sequence of images, which is a movie or news program, either broadcast or recorded, navigates through the sequence of images according to the content of the images and does not depend on some fixed structure of the broadcast stream, which is defined mainly for technical reasons. Navigation is made intuitive and more user friendly. Preferably, the method is performed in real-time so that the user has the feeling of actually moving the object. By a specific interaction, the user asks for the point in time where the designated object disappears from the screen. The first input for selecting the first object is clicking on the object or drawing a bounding box around the object. Thus, the user applies commonly known input methods for a man-machine interface. If an indexing exists, the user is also able to choose the objects by this index from a database.
According to the invention, the step of moving the first object to a second position according to a second input includes:
- selecting a second object of the displayed image at a third position according to a further input,
- defining a destination of the movement of the first object relative to the second object,
- moving the first object to the destination.
The step of identifying further includes identifying at least one image in the sequence of images where the relative position of the destination of the first object is close to the position of the second object.
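
One way to read the relative variant is to store the destination as an offset from the second object and to compare that offset with the offset observed in each image. The sketch below follows this reading; the function names, the `positions` mapping from image number to the two tracked locations, and the `threshold` are assumptions for illustration and are not taken from the source.

```python
from math import hypot


def relative_offset(first_xy, second_xy):
    """Offset of the first object as seen from the second object."""
    return (first_xy[0] - second_xy[0], first_xy[1] - second_xy[1])


def find_relative_match(positions, desired_offset, threshold):
    """Return the first image whose observed offset is close to the desired one.

    `positions` maps an image number to a pair (first_xy, second_xy) of
    tracked locations. Screen coordinates cancel out, so a camera pan that
    shifts both objects by the same amount does not affect the match.
    """
    for frame in sorted(positions):
        first_xy, second_xy = positions[frame]
        dx, dy = relative_offset(first_xy, second_xy)
        if hypot(dx - desired_offset[0], dy - desired_offset[1]) <= threshold:
            return frame
    return None


if __name__ == "__main__":
    # The user drops the ball at (290, 105) while the goal is at (300, 100).
    desired = relative_offset((290, 105), (300, 100))
    positions = {
        1: ((50, 80), (300, 100)),
        2: ((200, 90), (310, 100)),
        3: ((140, 104), (150, 100)),  # goal has moved on screen, ball is next to it
    }
    print(find_relative_match(positions, desired, threshold=3))
```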
This has the advantage that the user can choose not only a location tied to the physical coordinates of the screen, but also a position where he expects the object to be relative to other objects in the image. For example, in a recorded soccer game, the first object might be the ball, and the user can move the ball toward the goal because he expects a scene of interest when the ball is close to the goal, e.g. shortly before a team scores or a player kicks the ball over the goal. This kind of navigation by object is completely independent of the coordinates of the screen and depends only on the relative distance of two objects in the image. The position of the destination of the first object being close to the position of the second object also includes the cases in which the second object is exactly at the same position as the destination or overlaps the destination of the moved first object. Advantageously, the size of the objects and their variation over time are considered in defining the relative position of the two objects to each other.
A further alternative is that the user selects an object, e.g. a face, and then zooms the bounding box of the face in order to define the size of the face. Afterwards, the sequence of images is searched for an image in which the face is displayed at this size or at a size close to it. This feature is advantageous if, e.g., an interview is played back and the user is interested in the speech of a specific person, assuming that the face of this person covers most of the screen when this person speaks. Thus, an advantage of the invention is that there is an easy method for jumping to a part of the recording where a specific person is interviewed. The first and the second object do not necessarily have to be selected in the same image of the sequence of images.
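The relative-position criterion described above (the moved first object being close to, at, or overlapping the second object) can be illustrated with a minimal sketch. The `Box` type, the function names, the normalisation by the second object's size and the `max_gap` threshold are all assumptions made for illustration; the description does not prescribe a particular distance measure.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def box_gap(a: Box, b: Box) -> float:
    """Shortest distance between two boxes; 0.0 when they touch or overlap."""
    dx = max(a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0)
    dy = max(a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0)
    return hypot(dx, dy)


def relative_gap(first: Box, second: Box) -> float:
    """Gap divided by the second object's size (assumed normalisation),
    so that the measure does not depend on the zoom level of the shot."""
    scale = max(second.w, second.h, 1e-9)
    return box_gap(first, second) / scale


def frames_where_close(tracks, max_gap=0.5):
    """Indices of frames in which both objects are visible and close.

    tracks: one (first_box, second_box) pair per frame; a box is None
    when the object was not found in that frame.
    """
    return [
        i for i, (first, second) in enumerate(tracks)
        if first is not None and second is not None
        and relative_gap(first, second) <= max_gap
    ]
```

In the soccer example, `first` would be the tracked ball and `second` the goal; an overlap yields a gap of zero and therefore always counts as close.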
The further input for selecting the second object is clicking on the object or drawing a bounding box around the object. Thus, the user applies commonly known input methods for a man-machine interface. If an indexing exists, the user is also able to choose the objects by this index from a database.
For selecting the objects, object segmentation, object detection or face detection is employed. When the first object is detected, object tracking techniques are used to track the position of this object in the subsequent images of the sequence of images. A key-point technique is also employed for selecting an object. Further, the key-point description is used for determining the similarity of objects in different images in the sequence of images. A combination of the above-mentioned techniques for selecting, identifying and tracking an object is used.
Hierarchical segmentation produces a tree whose nodes and leaves correspond to nested areas of the images. This segmentation is done in advance. If a user selects an object by tapping on a given point of an image, the smallest node containing this point is selected. If a further tap of the user is received, the node selected with the first tap is considered as parent of the node selected with the second tap. Thus, the corresponding area is considered to define the object.
According to the invention, only a part of the images of the sequence of images is analyzed for identifying at least one image where the object is close to the second position. This part to be analyzed is a certain number of images following the current image, the certain number of images representing a certain playback time following the currently displayed image. Another way to implement the method is to analyze all following images from the currently displayed image or all previous images from the currently displayed image. This is a familiar way for a user to navigate in a sequence of images, as it represents a fast forward or fast backward navigation. According to another implementation of the invention, only I pictures, or only I and P pictures, or all pictures are analyzed for the object based navigation.
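The restriction of the analysis to a part of the sequence, as described above, can be sketched as a simple filter. The `Frame` record, its fields and the parameter names are hypothetical and chosen only for this illustration; the sketch assumes that each frame's picture type and presentation time are already known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    index: int          # position in the sequence of images
    time_s: float       # presentation time in seconds
    picture_type: str   # "I", "P" or "B"


def candidate_frames(frames, current, window_s=None, backward=False,
                     picture_types=("I", "P", "B")):
    """Select the frames to analyse, starting next to the current frame.

    window_s: optional playback time (seconds) limiting the search.
    backward: search previous images instead of following images.
    picture_types: e.g. ("I",) or ("I", "P") to skip costly pictures.
    """
    cur_t = frames[current].time_s
    pool = reversed(frames[:current]) if backward else frames[current + 1:]
    selected = []
    for frame in pool:
        if window_s is not None and abs(frame.time_s - cur_t) > window_s:
            break  # frames are time-ordered, the rest is outside the window
        if frame.picture_type in picture_types:
            selected.append(frame)
    return selected
```

For instance, `candidate_frames(frames, current, window_s=60, picture_types=("I", "P"))` would limit the search to I and P pictures within the next minute of playback.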
The invention further concerns an apparatus for navigation in a sequence of images according to the above described method.
For better understanding the invention shall now be explained in more detail in the following description with reference to the figures. It is understood that the invention is not limited to this exemplary embodiment and that specified features can also expediently be combined and/or modified without departing from the scope of the present invention.
Fig. 1 shows an apparatus for playback of a sequence of images and for performing the inventive method
Fig. 2 shows the inventive method for navigating
Fig. 3 shows a flow chart illustrating the inventive method
Fig. 4 shows a first example of navigation according to the inventive method
Fig. 5 shows a second example of navigation according to the inventive method
Fig. 1 schematically depicts a playback device for displaying a sequence of images. The playback device includes a screen 1, a TV receiver, HDD, DVD, BD player or the like as source 2 for a sequence of images, and a man-machine interface 3. The playback device can also be an apparatus including all functions, e.g. a tablet, in which the screen also serves as man-machine interface (touchscreen), a hard disc or flash disc is present for storing a movie or documentary, and a broadcast receiver is also included.
Fig. 2 shows a sequence of images 100, e.g. of a movie, documentary or sports event, comprising multiple images. The image 101, which is currently displayed on the screen, is a starting point for the inventive method. In the first step, the screen view 11 displays this image 101. A first object 12 is selected according to a first input received from the man-machine interface. Then, this first object 12 or a symbol representing this first object is moved to another location 13 on the screen, e.g. by drag and drop according to a second input received by the man-machine interface. On screen view 21, the new location 13 of the first object 12 is illustrated. Then, the method identifies at least one image 102 in the sequence of images 100 in which the first object 12 is at a location 14 that is close to the location 13 to which this object has been moved. In this image, the location 14 has a certain distance 15 to the desired location 13 indicated by the drag and drop movement. This distance 15 is used as a measure for evaluating how close the desired position and the position in the examined image are. This is illustrated on screen view 31. After identifying the best image according to the user request, this image is displayed on screen view 41. This image has a certain position, shown as image 102, in the sequence of images 100. The sequence of images 100 is played back from this position.
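The use of distance 15 as a closeness measure can be illustrated as follows. This is a minimal sketch that assumes the first object has already been tracked, so that its location 14 is available per image; the function name, the dictionary layout and the optional `max_distance` threshold are hypothetical, and the Euclidean distance is one plausible choice rather than one stated in the description.

```python
from math import hypot


def best_frame(track, target, max_distance=None):
    """Find the image in which the tracked object is closest to the target.

    track: dict mapping frame index -> (x, y) location of the first object
           (location 14); frames without the object are simply absent.
    target: (x, y) location the object was dragged to (location 13).
    Returns (frame_index, distance) or None if no frame qualifies.
    """
    best = None
    for idx in sorted(track):
        x, y = track[idx]
        d = hypot(x - target[0], y - target[1])  # distance 15
        if max_distance is not None and d > max_distance:
            continue
        if best is None or d < best[1]:
            best = (idx, d)  # ties keep the earliest frame
    return best
```

Playback would then start at the returned frame index, which corresponds to image 102 in Fig. 2.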
Fig. 3 illustrates the steps which are performed by the method. In the first step 200, an object is selected in a displayed image according to a first input. The input is received from a man-machine interface. It is assumed that the selecting process described is performed in a short time period. This ensures that the object appearance does not change too much. In order to detect the selected object, an image analysis is performed. The image of the current frame is analyzed and points of interest, i.e. a set of key-points present in the image, are extracted. These key-points are located where strong gradients are present. These key-points are extracted with a description of the surrounding texture. When a position in the image is selected, the key-points around this position are collected. The radius of the area in which key-points are collected is a parameter of the method. The selection of the key-points is assisted by other methods, e.g. by a spatial segmentation. The set of extracted key-points constitutes a description of the selected object.
After selecting the first object, the object is moved to a second position in step 210. This movement is executed according to a second input, which is an input from the man-machine interface. The movement is realized as drag and drop. Then, the method identifies in step 220 at least one image in the sequence of images in which the first object is close to the second position, which is the image location designated by the user. The object similarity in different images is determined by a comparison of the sets of key-points. In step 230, the method jumps to the identified image and playback is started.
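The key-point based object description of step 200 and the similarity comparison of step 220 can be sketched as follows. The sketch assumes that key-points have already been extracted by some detector and are given as ((x, y), descriptor) tuples with numeric descriptors; the function names, the nearest-descriptor matching rule and the `max_descriptor_distance` threshold are assumptions for illustration and are not specified by the description.

```python
from math import dist


def collect_keypoints(keypoints, position, radius):
    """Collect the key-points lying within `radius` of the selected position.

    keypoints: list of ((x, y), descriptor) tuples for one image.
    The returned set serves as the description of the selected object.
    """
    return [kp for kp in keypoints if dist(kp[0], position) <= radius]


def similarity(set_a, set_b, max_descriptor_distance=0.3):
    """Fraction of key-points in set_a that have a similar descriptor in set_b."""
    if not set_a or not set_b:
        return 0.0
    matched = sum(
        1 for _, desc_a in set_a
        if min(dist(desc_a, desc_b) for _, desc_b in set_b)
        <= max_descriptor_distance
    )
    return matched / len(set_a)
```

The radius passed to `collect_keypoints` corresponds to the method parameter mentioned above; a candidate region in another image would be accepted as the same object when `similarity` exceeds a chosen threshold.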
Fig. 4 shows an example of applying the method when watching a talk show, in which multiple people are discussing a selected topic. The playback time of the whole show is indicated by an arrow t. At time t1 the first image, which includes three faces, is displayed on the screen. The user is interested in the person displayed on the left-hand side of the screen and selects the person by drawing a bounding box around the face. Then the user drags the selected object (the face with fancy hair) into the middle of the screen and in addition enlarges the bounding box to indicate that he wants to see this person in the middle of the screen and in a close-up view. Thus, an image fulfilling this requirement is searched for in the sequence of images. This image is found at time t2, where it is displayed and playback is started.
Fig. 5 shows an example of applying the method when watching a soccer game. At time t1 a scene of a game in the middle of the field is shown. There are four players, one of whom is close to the ball. The user is interested in a certain situation, e.g. in the next penalty. Thus, he selects the ball with the bounding box and drags the object to the penalty spot to indicate that he wants to see a scene where the ball is exactly at this point. At time t2, this requirement is fulfilled. A scene is displayed where the ball lies on the penalty spot and a player prepares for kicking a penalty. The game is played back from this scene onwards. Thus, the user is able to conveniently navigate to the next scene he is interested in.
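In the talk-show example, the user's request combines a position and a size, both expressed by the dragged and enlarged bounding box. One way to score candidate images against such a request is the intersection over union (IoU) of the requested box and the tracked face box. IoU is a technique swapped in here for illustration and is not named in the description; the function names and the (x1, y1, x2, y2) box layout are likewise assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def best_matching_frame(face_track, requested_box):
    """Frame whose tracked face box best matches the requested box.

    face_track: dict mapping frame index -> face box in that frame.
    Returns the frame index, or None if no box overlaps the request.
    """
    best_idx, best_score = None, 0.0
    for idx in sorted(face_track):
        score = iou(face_track[idx], requested_box)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx
```

A face shown small in a corner overlaps a large centred request only slightly or not at all, whereas a centred close-up yields an IoU near 1, which matches the intent of the drag-and-enlarge gesture.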

Claims

1. Method for navigating in a sequence of images, comprising the steps of:
- displaying an image on a screen,
- selecting a first object of the displayed image at a first position according to a first input,
- moving the first object to a second position according to a second input,
- identifying at least one image in the sequence of images where the first object is close to the second position, and
- starting playback of the sequence of images beginning at one of the identified images.
2. Method for navigating according to claim 1, wherein the first input for selecting the first object is one of clicking on the object, drawing a bounding box around the object, and choosing the object by an index.
3. Method for navigating according to claim 1 or 2, wherein the second position is defined by coordinates on the screen different from the coordinates of the first position.
4. Method for navigating according to claim 1 or 2, wherein the second position is defined with regard to the second object.
5. Method for navigating according to claim 1, 2 or 4, wherein moving the first object to a second position according to a second input includes:
- selecting a second object of the displayed image at a third position according to a further input,
- defining a destination of the movement of the first object relative to the second object,
- moving the first object to the destination, and
wherein the step of identifying includes identifying at least one image in the sequence of images where the relative position of the destination of the first object is close to the position of the second object.
6. Method for navigating according to claim 5, wherein the further input for selecting the second object is clicking on the object, drawing a bounding box around the object or choosing the object in an index.
7. Method for navigating according to one of claims 1 to 6, wherein the objects are selected by object segmentation, object detection or face detection.
8. Method for navigating according to one of claims 1 to 6, wherein the identifying step includes object tracking for defining the position of the first object in an image of the sequence of images.
9. Method for navigating according to one of claims 1 to 8, wherein key-point technique is used for selecting an object.
10. Method for navigating according to one of claims 1 to 8, wherein key-point technique is used for selecting an object and the key-point description is used for determining the similarity of objects in different images in the sequence of images.
11. Method for navigating according to one of claims 1 to 10, wherein only a part of the images of the sequence of images are analyzed for identifying at least one image where the object is close to the second position.
12. Method for navigating according to claim 11, wherein the part of images of the sequence of images represents one of a certain playback time from the currently displayed image, all following images from the currently displayed image and all previous images from the currently displayed image.
13. Method for navigating according to claim 11 or 12, wherein the part of images of the sequence of images represents one of I pictures, B pictures and P pictures.
14. Apparatus for navigation in a sequence of images, wherein the apparatus implements a method according to one of claims 1 to 13.
PCT/EP2012/060723 2011-06-17 2012-06-06 Video navigation through object location WO2012171839A1 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
RU2014101339A RU2609071C2 (en) 2011-06-17 2012-06-06 Video navigation through object location
CA2839519A CA2839519A1 (en) 2011-06-17 2012-06-06 Video navigation through object location
JP2014515137A JP6031096B2 (en) 2011-06-17 2012-06-06 Video navigation through object position
EP12730823.7A EP2721528A1 (en) 2011-06-17 2012-06-06 Video navigation through object location
KR1020137033446A KR20140041561A (en) 2011-06-17 2012-06-06 Video navigation through object location
CN201280029819.XA CN103608813A (en) 2011-06-17 2012-06-06 Video navigation through object location
US14/126,494 US20140208208A1 (en) 2011-06-17 2012-06-06 Video navigation through object location
MX2013014731A MX2013014731A (en) 2011-06-17 2012-06-06 Video navigation through object location.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP11305767 2011-06-17
EP11305767.3 2011-06-17

Publications (1)

Publication Number Publication Date
WO2012171839A1 (en) 2012-12-20

Family

ID=46420070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2012/060723 WO2012171839A1 (en) 2011-06-17 2012-06-06 Video navigation through object location

Country Status (9)

Country Link
US (1) US20140208208A1 (en)
EP (1) EP2721528A1 (en)
JP (1) JP6031096B2 (en)
KR (1) KR20140041561A (en)
CN (1) CN103608813A (en)
CA (1) CA2839519A1 (en)
MX (1) MX2013014731A (en)
RU (1) RU2609071C2 (en)
WO (1) WO2012171839A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9405770B2 (en) 2014-03-10 2016-08-02 Google Inc. Three dimensional navigation among photos
CN104185086A (en) * 2014-03-28 2014-12-03 无锡天脉聚源传媒科技有限公司 Method and device for providing video information
CN104270676B (en) * 2014-09-28 2019-02-05 联想(北京)有限公司 A kind of information processing method and electronic equipment
JP6142897B2 (en) * 2015-05-15 2017-06-07 カシオ計算機株式会社 Image display device, display control method, and program
KR102474244B1 (en) * 2015-11-20 2022-12-06 삼성전자주식회사 Image display apparatus and operating method for the same
TWI636426B (en) * 2017-08-23 2018-09-21 財團法人國家實驗研究院 Method of tracking a person's face in an image

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090052861A1 (en) * 2007-08-22 2009-02-26 Adobe Systems Incorporated Systems and Methods for Interactive Video Frame Selection
US20100082585A1 (en) * 2008-09-23 2010-04-01 Disney Enterprises, Inc. System and method for visual search in a video media player

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06101018B2 (en) * 1991-08-29 1994-12-12 インターナショナル・ビジネス・マシーンズ・コーポレイション Search of moving image database
JP4226730B2 (en) * 1999-01-28 2009-02-18 株式会社東芝 Object region information generation method, object region information generation device, video information processing method, and information processing device
KR100355382B1 (en) * 2001-01-20 2002-10-12 삼성전자 주식회사 Apparatus and method for generating object label images in video sequence
JP2004240750A (en) * 2003-02-06 2004-08-26 Canon Inc Picture retrieval device
TW200537941A (en) * 2004-01-26 2005-11-16 Koninkl Philips Electronics Nv Replay of media stream from a prior change location
US20080285886A1 (en) * 2005-03-29 2008-11-20 Matthew Emmerson Allen System For Displaying Images
WO2007096003A1 (en) * 2006-02-27 2007-08-30 Robert Bosch Gmbh Trajectory-based video retrieval system, method and computer program
US7787697B2 (en) * 2006-06-09 2010-08-31 Sony Ericsson Mobile Communications Ab Identification of an object in media and of related media objects
US8488839B2 (en) * 2006-11-20 2013-07-16 Videosurf, Inc. Computer program and apparatus for motion-based object extraction and tracking in video
DE102007013811A1 (en) * 2007-03-22 2008-09-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. A method for temporally segmenting a video into video sequences and selecting keyframes for finding image content including subshot detection
US20100281371A1 (en) * 2009-04-30 2010-11-04 Peter Warner Navigation Tool for Video Presentations
JP5163605B2 (en) * 2009-07-14 2013-03-13 パナソニック株式会社 Moving picture reproducing apparatus and moving picture reproducing method
US20110113444A1 (en) * 2009-11-12 2011-05-12 Dragan Popovich Index of video objects
US9171075B2 (en) * 2010-12-30 2015-10-27 Pelco, Inc. Searching recorded video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JOSEF SIVIC ET AL: "Person Spotting: Video Shot Retrieval for Face Sets", IMAGE AND VIDEO RETRIEVAL; [LECTURE NOTES IN COMPUTER SCIENCE; LNCS], SPRINGER-VERLAG, BERLIN/HEIDELBERG, vol. 3568, 4 August 2005 (2005-08-04), pages 226 - 236, XP019012815, ISBN: 978-3-540-27858-0 *

Also Published As

Publication number Publication date
JP2014524170A (en) 2014-09-18
RU2014101339A (en) 2015-07-27
KR20140041561A (en) 2014-04-04
JP6031096B2 (en) 2016-11-24
RU2609071C2 (en) 2017-01-30
US20140208208A1 (en) 2014-07-24
CA2839519A1 (en) 2012-12-20
CN103608813A (en) 2014-02-26
MX2013014731A (en) 2014-02-11
EP2721528A1 (en) 2014-04-23

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 12730823; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2014515137; Country of ref document: JP; Kind code of ref document: A)
WWE WIPO information: entry into national phase (Ref document number: MX/A/2013/014731; Country of ref document: MX)
ENP Entry into the national phase (Ref document number: 2839519; Country of ref document: CA) (Ref document number: 20137033446; Country of ref document: KR; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
WWE WIPO information: entry into national phase (Ref document number: 2012730823; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2014101339; Country of ref document: RU; Kind code of ref document: A)
WWE WIPO information: entry into national phase (Ref document number: 14126494; Country of ref document: US)