WO2002023891A2 - Method for highlighting important information in a video program using visual cues - Google Patents
- Publication number
- WO2002023891A2 (WO 2002/023891 A2), PCT/EP2001/010112
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- cue
- video clip
- preselected
- frames
- video
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/785—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/7857—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using texture
Definitions
- the present invention relates to content-based video retrieval and browsing, and more particularly, to a method for automatically identifying important information or developments in video clips of sports events.
- Video applications call for browsing methods which enable one to browse through a large amount of video material to find clips which are of a certain importance.
- Such applications may include for example, interactive TV and pay-per-view systems.
- Customers who use interactive TV and pay-per-view systems want to see sections of programs before renting them.
- Video browsers enable the customers to find programs of interest.
- Existing browsing methods typically index video content by low-level features such as color, texture, shape and camera motion.
- While low-level features can be useful for certain applications, many other interesting applications require the use of higher level semantic information. Bridging the gap between low-level features and high-level semantic information is not always easy. In most cases where higher level semantic information is required, manual annotation using keywords is used.
- One of the important applications for video archiving and retrieval is for sports such as soccer, football, etc. Accordingly, a method is needed which enables automatic extraction of high level information using low level features.
- the present invention is directed to a method for automatically identifying important developments in video clips of sporting events, especially soccer matches.
- the method comprises detecting sequences of frames in a video clip of a sporting event that have a preselected cue indicative of a possible important development in frames of the video clip immediately preceding the frame sequences having the preselected cue; comparing the number of frames in each of the frame sequences having the cue to a predefined threshold number; and declaring an important development in the frames immediately preceding each frame sequence if the number of frames in that sequence is equal to or greater than the threshold number.
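The detection-and-thresholding step described above can be sketched as follows. This is an illustrative sketch, not part of the patent disclosure; the per-frame boolean cue representation and the sentinel handling are assumptions of this sketch:

```python
def find_important_developments(cue_frames, threshold):
    """Given a per-frame boolean list (True = the frame shows the
    preselected cue), return the index of the frame immediately
    preceding each run of cue frames whose length meets the
    threshold, i.e. the frames declared to hold an important
    development."""
    developments = []
    run_start = None
    # append a False sentinel so the final run is also flushed
    for i, has_cue in enumerate(cue_frames + [False]):
        if has_cue and run_start is None:
            run_start = i
        elif not has_cue and run_start is not None:
            if i - run_start >= threshold:
                # the important development lies just before the cue run
                developments.append(max(run_start - 1, 0))
            run_start = None
    return developments
```

With a threshold of 3, a clip whose frames 1-3 show the cue would yield frame 0 as the location of the important development.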
- the method further involves acquiring the preselected cue from low level features in the image in each frame of the sequence.
- the preselected cue is based on changes in the camera's center of attention. More particularly, when an important development occurs in the video clip, the camera typically focuses on the viewers or players, and thus, the images in the sequence of frames immediately subsequent to the frames with the important development have little or no grass areas.
- Fig. 1 is a flowchart outlining an algorithm that performs an illustrative embodiment of the method of the present invention
- Fig. 2 is a block diagram of a computer for implementing the present invention.
- Fig. 3 is a block diagram of the internal structure of the computer for implementing the present invention.
- the method of the present invention extracts high level information from multiple images or video using low level features in order to achieve advancements in content-based retrieval and browsing. This is accomplished in the present invention by specifying a particular domain of interest and using knowledge specific to that domain to automatically extract high level information based on low level features.
- One especially useful application for the present invention is in highlighting segments of important developments in video clips of sports events, including but not limited to soccer matches and football games. Such video clips typically include video, audio, and textual (close- captioning) information.
- the method of the present invention highlights important developments in a video clip by inferring the developments from one or more cues which are provided from low level features and textual information of the video clip. More particularly, the method detects sequences of frames in the video clip having a certain preselected visual, audible, and/or textual (close captioning) cue. The number of frames in each sequence having the cue(s) is then compared to a predefined threshold number. If the number of frames in a sequence is equal to or greater than the threshold number, an important development is declared in the frames immediately preceding the threshold meeting frame sequence with the cue. It has been found that important developments in video clips of sports events are typically marked with a visual cue which relates to changes in the camera's center of attention.
- the video camera usually focuses on the stadium viewers or the players.
- When the camera focuses on the viewers or players, little or none of the grass of the playing field can be seen in the camera's field of view.
- the method of the present invention detects sequences of frames in the video clip with images that have little or no grass areas of the playing field.
- the number of frames in each sequence is compared to a predefined threshold number. If the number of frames in the sequence is equal to or greater than the threshold number, an important development is declared in the frames immediately preceding the threshold meeting frame sequence that has little or no grass areas.
- the threshold is based on the assumption that if the number of frames in the sequence with little or no grass areas of the playing field is significant, the camera must be focusing on the viewers or the players. Consequently, it is likely that the frames immediately preceding that sequence of frames includes an important development such as the scoring of a goal in the case of a soccer match.
- Fig. 1 shows a flowchart which outlines an illustrative embodiment of an algorithm for performing the method of the present invention as it applies to highlighting segments of important events in a video clip of a soccer match.
- the algorithm in step S1 detects sequences of frames in the video clip in which there are little or no grass areas.
- In step S2, if the number of frames in a sequence is larger than a predefined threshold, then in step S3 an important development is declared in the frames immediately preceding that sequence.
- the algorithm detects green areas which have colors similar to grass.
- the algorithm is trained to differentiate the green colors from the other colors in each frame so that the grass areas in the frame can be identified. This is accomplished using patches from a training set of images of grass areas which have been extracted from the soccer match in the video clip, or from one or more previous soccer matches.
- the algorithm learns from the patches how the grass areas translate into the values of the color green. Given an image in a frame of the video clip, the training is used to judge whether a given pixel in the frame is grass.
- a color histogram of an image is obtained by dividing a color space, such as red, green, and blue, into discrete image colors (called bins) and counting the number of times each discrete color appears by traversing every pixel in the image.
- This normalized histogram can be considered as the probability density function for the class grass, p(pixel value | grass).
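The histogram training described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the bin count of 8 per channel and the tuple-based pixel representation are assumptions of this sketch:

```python
def train_grass_histogram(patches, bins=8):
    """Build a normalized RGB color histogram from training patches of
    grass pixels.  `patches` is an iterable of lists of (r, g, b)
    tuples with channel values in 0-255.  The normalized counts
    approximate p(pixel value | grass)."""
    counts = {}
    total = 0
    for patch in patches:
        for r, g, b in patch:
            # quantize each 0-255 channel into `bins` discrete colors
            key = (r * bins // 256, g * bins // 256, b * bins // 256)
            counts[key] = counts.get(key, 0) + 1
            total += 1
    # normalize so the histogram sums to 1 over all observed bins
    return {key: n / total for key, n in counts.items()}
```

Bins never seen in the training patches are simply absent from the returned dictionary, which corresponds to a probability of zero for those colors.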
- the detection step S1 is accomplished in the algorithm by marking pixels in each frame that have a value of p(pixel value | grass) above a given threshold.
- If only small grass color components are detected for a short period of time in step S2, for example in only three or four frames, then no important event is declared in step S3. However, if small grass color components are detected for a relatively long period of time, for example in 200-300 frames, then an important event is declared in step S3.
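The per-frame cue decision that feeds this frame-count test can be sketched as follows. This is an illustrative sketch; the probability threshold, the grass-area threshold, and the bin count are assumed values, not taken from the patent:

```python
def frame_has_cue(frame_pixels, grass_hist, p_thresh=0.001,
                  area_thresh=0.1, bins=8):
    """Return True when a frame shows the 'little or no grass' cue.
    A pixel counts as grass when its p(pixel value | grass) from the
    trained histogram exceeds p_thresh; the frame carries the cue when
    the fraction of grass pixels falls below area_thresh."""
    grass_pixels = 0
    for r, g, b in frame_pixels:
        # quantize the pixel into the same bins used during training
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        if grass_hist.get(key, 0.0) > p_thresh:
            grass_pixels += 1
    # little or no visible grass suggests the camera has left the field
    return grass_pixels / len(frame_pixels) < area_thresh
```

Running this test on every frame produces the boolean cue sequence whose run lengths are then compared against the 200-300 frame threshold.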
- the results obtained with the algorithm can be further refined using other cues either from the same modality or from other modalities, such as audio or closed captions. Cues from the same modalities or different modalities can be used to confirm the identity of the detected important occurrences or activities and more importantly, to classify the detected important occurrences or activities into semantic classes, such as goals, attempted goals, penalties, injuries, fights between players and the like, and rank them by importance.
- the method of Figure 1 is implemented by computer readable code executed by a data processing apparatus.
- the code may be stored in a memory within the data processing apparatus or read/downloaded from a memory medium such as a CD-ROM or floppy disk.
- hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention.
- the invention can also be implemented, for example, on a computer 30 shown in Fig. 2.
- the computer 30 may include a network connection 31 for interfacing to a data network, such as a variable-bandwidth network or the Internet, and a fax/modem connection 32 for interfacing with other remote sources such as a video or a digital camera (not shown).
- the computer 30 may also include a display for displaying information (including video data) to a user, a keyboard for inputting text and user commands, a mouse for positioning a cursor on the display and for inputting user commands, a disk drive for reading from and writing to floppy disks installed therein, and a CD-ROM drive for accessing information stored on CD-ROM.
- the computer 30 may also have one or more peripheral devices 38 attached thereto for inputting images, or the like, and a printer for outputting images, text, or the like.
- Fig. 3 shows the internal structure of the computer 30 which includes a memory 40 that may include a Random Access Memory (RAM), Read-Only Memory (ROM) and a computer-readable medium such as a hard disk.
- the items stored in the memory 40 include an operating system 41, data 42 and applications 43.
- the operating system 41 may be a windowing operating system, such as UNIX, although the invention may be used with other operating systems as well, such as Microsoft Windows 95.
- the applications stored in the memory 40 include a video coder 44, a video decoder 45 and a frame grabber 46.
- the video coder 44 encodes video data in a conventional manner
- the video decoder 45 decodes video data which has been coded in the conventional manner.
- the frame grabber 46 allows single frames from a video signal stream to be captured and processed.
- the CPU 50 comprises a microprocessor or the like for executing computer readable code, i.e., applications, such as those noted above, out of the memory 40.
- applications may be stored in memory 40 (as noted above) or, alternatively, on a floppy disk in disk drive 36 or a CD-ROM in CD-ROM drive 37.
- the CPU 50 accesses the applications (or other data) stored on a floppy disk via the memory interface 52 and accesses the applications (or other data) stored on a CD-ROM via CD-ROM drive interface 53.
- Input video data may be received through the video interface 54 or the communication interface 51.
- the input video data may be decoded by the video decoder 45.
- Output video data may be coded by the video coder 44 for transmission through the video interface 54 or the communication interface 51.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Television Signal Processing For Recording (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002527199A JP2004509529A (en) | 2000-09-13 | 2001-08-30 | How to use visual cues to highlight important information in video programs |
EP01971992A EP1320992A2 (en) | 2000-09-13 | 2001-08-30 | Method for highlighting important information in a video program using visual cues |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US66091800A | 2000-09-13 | 2000-09-13 | |
US09/660,918 | 2000-09-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2002023891A2 true WO2002023891A2 (en) | 2002-03-21 |
WO2002023891A3 WO2002023891A3 (en) | 2002-05-30 |
Family
ID=24651479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2001/010112 WO2002023891A2 (en) | 2000-09-13 | 2001-08-30 | Method for highlighting important information in a video program using visual cues |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1320992A2 (en) |
JP (1) | JP2004509529A (en) |
WO (1) | WO2002023891A2 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4577774B2 (en) * | 2005-03-08 | 2010-11-10 | Kddi株式会社 | Sports video classification device and log generation device |
JP2011015129A (en) * | 2009-07-01 | 2011-01-20 | Mitsubishi Electric Corp | Image quality adjusting device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3728775B2 (en) * | 1995-08-18 | 2005-12-21 | 株式会社日立製作所 | Method and apparatus for detecting feature scene of moving image |
KR100206804B1 (en) * | 1996-08-29 | 1999-07-01 | 구자홍 | The automatic selection recording method of highlight part |
JPH1155613A (en) * | 1997-07-30 | 1999-02-26 | Hitachi Ltd | Recording and/or reproducing device and recording medium using same device |
AU719329B2 (en) * | 1997-10-03 | 2000-05-04 | Canon Kabushiki Kaisha | Multi-media editing method and apparatus |
KR20010041607A (en) * | 1998-03-04 | 2001-05-25 | 더 트러스티스 오브 콜롬비아 유니버시티 인 더 시티 오브 뉴욕 | Method and system for generating semantic visual templates for image and video retrieval |
US6163510A (en) * | 1998-06-30 | 2000-12-19 | International Business Machines Corporation | Multimedia search and indexing system and method of operation using audio cues with signal thresholds |
-
2001
- 2001-08-30 WO PCT/EP2001/010112 patent/WO2002023891A2/en active Application Filing
- 2001-08-30 EP EP01971992A patent/EP1320992A2/en not_active Withdrawn
- 2001-08-30 JP JP2002527199A patent/JP2004509529A/en active Pending
Non-Patent Citations (2)
Title |
---|
CHANG YUH-LIN E.A.: "Integrated image and speech analysis for content-based video indexing", PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, 17 June 1996 (1996-06-17), pages 306 - 313 |
See also references of EP1320992A2 |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007073349A1 (en) * | 2005-12-19 | 2007-06-28 | Agency For Science, Technology And Research | Method and system for event detection in a video stream |
WO2008154292A1 (en) * | 2007-06-08 | 2008-12-18 | Apple Inc. | Assembling video content |
US9047374B2 (en) | 2007-06-08 | 2015-06-02 | Apple Inc. | Assembling video content |
US9508012B2 (en) | 2014-03-17 | 2016-11-29 | Fujitsu Limited | Extraction method and device |
US9892320B2 (en) | 2014-03-17 | 2018-02-13 | Fujitsu Limited | Method of extracting attack scene from sports footage |
Also Published As
Publication number | Publication date |
---|---|
EP1320992A2 (en) | 2003-06-25 |
JP2004509529A (en) | 2004-03-25 |
WO2002023891A3 (en) | 2002-05-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7339992B2 (en) | System and method for extracting text captions from video and generating video summaries | |
Truong et al. | Scene extraction in motion pictures | |
JP4643829B2 (en) | System and method for analyzing video content using detected text in a video frame | |
US7120873B2 (en) | Summarization of sumo video content | |
JP5420199B2 (en) | Video analysis device, video analysis method, digest automatic creation system and highlight automatic extraction system | |
CN110381366B (en) | Automatic event reporting method, system, server and storage medium | |
US8340498B1 (en) | Extraction of text elements from video content | |
EP2089820B1 (en) | Method and apparatus for generating a summary of a video data stream | |
WO2019007020A1 (en) | Method and device for generating video summary | |
US8051446B1 (en) | Method of creating a semantic video summary using information from secondary sources | |
Snoek et al. | Time interval maximum entropy based event indexing in soccer video | |
WO2002023891A2 (en) | Method for highlighting important information in a video program using visual cues | |
Choroś | Highlights extraction in sports videos based on automatic posture and gesture recognition | |
US20070124678A1 (en) | Method and apparatus for identifying the high level structure of a program | |
Brezeale | Learning video preferences using visual features and closed captions | |
Jung et al. | Player information extraction for semantic annotation in golf videos | |
Bailer et al. | Skimming rushes video using retake detection | |
CN117221669B (en) | Bullet screen generation method and device | |
US11417100B2 (en) | Device and method of generating video synopsis of sports game | |
Lotfi | A Novel Hybrid System Based on Fractal Coding for Soccer Retrieval from Video Database | |
Hsieh et al. | Constructing a bowling information system with video content analysis | |
Gupta | A Survey on Video Content Analysis | |
Brezeale et al. | Learning video preferences from video content | |
Pande | Mapping of Low Level to High Level Audio-Visual Features: A Survey of the Literature | |
Manickam et al. | Fast lead star detection in entertainment videos |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
|
ENP | Entry into the national phase |
Ref country code: JP Ref document number: 2002 527199 Kind code of ref document: A Format of ref document f/p: F |
|
AK | Designated states |
Kind code of ref document: A3 Designated state(s): JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2001971992 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 2001971992 Country of ref document: EP |