WO2009148518A2 - Semantic event detection for digital content records - Google Patents
Semantic event detection for digital content records
- Publication number
- WO2009148518A2 (application PCT/US2009/003160)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- concept
- event
- image
- visual
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5838—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/785—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using colour or luminescence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/7854—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using shape
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7847—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
- G06F16/7857—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content using texture
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
Definitions
- the invention relates to categorizing digital content records, such as digital still images or video.
- the invention relates to categorizing digital content records based on the detection of semantic events.
- the consumer may take images at each of the specific locations, as well as images at each of the locations that are related to other subject categories or events. For example, the consumer may take images of family members at each of the locations, images of specific events at each of the locations, and images of historical buildings at each of the locations.
- the consumer may desire to sort the digital images based on various groupings such as persons, birthdays, museums, etc., and to store the digital images based on the groupings in an electronic album.
- the consumer is currently faced with manually sorting through hundreds of digital still images and video segments in order to identify them with specific events.
- automatic albuming of consumer photos and videos has gained a great deal of interest in recent years.
- One popular approach to automatic albuming is to organize digital images and videos according to events by chronological order and by visual similarities in image content.
- A.C. Loui and A. Savakis, "Automated event clustering and quality screening of consumer pictures for digital albuming," IEEE Trans. on Multimedia, 5(3):390-402, 2003, discusses how a group of digital images can be automatically clustered into events. While basic clustering of images can group images that appear to be related to a single event, it would be desirable to be able to tag semantic meanings to the clustered events in order to improve the automatic albuming process.
- Semantic event detection presents basic problems: first, a practical system needs to be able to process digital still images and videos simultaneously, as both often exist in real consumer image collections; second, a practical system needs to accommodate the diverse semantic content in real consumer collections, thereby making it desirable to provide a system that incorporates generic methods for detecting different semantic events instead of specific individual methods for detecting each specific semantic event; finally, a practical system needs to be robust to prevent errors in identification and classification.
- the invention provides a system and method for semantic event detection in digital image content records.
- an event-level "Bag-of-Features" (BOF) representation is used to model events, and generic semantic events are detected in a concept space instead of an original low-level visual feature space based on the BOF representation.
- an event-level representation is developed where each event is modeled by a BOF feature vector based on which semantic event detectors are directly built.
- the present invention is more robust to the difficult images or mistakenly organized images within events. For example, in any given event, some images may be difficult to classify. These difficult images usually make the decision boundary complex and hard to model.
- a method for facilitating semantic event classification of a group of image records related to an event includes extracting a plurality of visual features from each of the image records, generating a plurality of concept scores for each of the image records using the visual features, wherein each concept score corresponds to a visual concept and each concept score is indicative of a probability that the image record includes the visual concept, generating a feature vector corresponding to the event based on the concept scores of the image records, and supplying the feature vector to an event classifier that identifies at least one semantic event classifier that corresponds to the event.
- the image records may include at least one digital still image and at least one video segment. Accordingly, the system is capable of handling real life consumer image datasets that normally contain both digital still images and video segments.
- the extracting of the plurality of visual features includes extracting a keyframe from the video segment and extracting the plurality of visual features from both the keyframe and the digital still image.
- Initial concept scores are then generated for each keyframe and each digital still image corresponding to each of the extracted visual features.
- Ensemble concept scores are then preferably generated for each keyframe and each digital still image based on the initial concept scores
- the ensemble concept scores are preferably generated by fusing the individual concept scores for each extracted visual feature for a given keyframe or a given digital still image.
- the digital still images and video segments can be tagged to facilitate proper sorting, storage and retrieval of the images and video segments.
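To make the overall flow concrete before the detailed description, the following is a minimal sketch of the detection pipeline, assuming Python. All helper names (sample_keyframes, concept_scores, event_bof_vector) are illustrative placeholders rather than the patent's actual modules; concrete sketches of each step appear alongside the corresponding passages below.

```python
# Hypothetical end-to-end sketch of the semantic event detection pipeline.
# Helper names are placeholders elaborated in later sketches in this document.

def detect_semantic_event(event_records, concept_detectors, codebook,
                          event_svm, threshold=0.5):
    """Return True if a clustered group of image records (one event)
    is classified as the target semantic event, e.g. "wedding"."""
    data_points = []
    for record in event_records:
        if record.is_video:
            frames = sample_keyframes(record.path)   # periodic keyframes
        else:
            frames = [record.image]                  # one digital still image
        # Ensemble concept scores per frame, fused across visual features.
        scores = [concept_scores(f, concept_detectors) for f in frames]
        data_points.append(scores)                   # one data point per record
    bof = event_bof_vector(data_points, codebook)    # event-level BOF vector
    return event_svm.decision_function([bof])[0] > threshold
```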
- Fig. 1 is a schematic block diagram of a semantic event detection system in accordance with the invention;
- Fig. 2 is a flow diagram illustrating processing modules utilized by the semantic event detection system illustrated in Fig. 1;
- Fig. 3 is a flow diagram illustrating processing modules utilized to train the system illustrated in Fig. 1 for semantic event detection;
- Fig. 4 is a flow diagram illustrating processing modules utilized to train concept detectors employed in the system illustrated in Fig. 1;
- Fig. 5 is a table illustrating different semantic events used in a test process, including their detailed definitions;
- Fig. 6 is a graph illustrating a comparison of the results of the system illustrated in Fig. 1 with a conventional approach where the BOF feature vectors are constructed based on original low-level visual features;
- Fig. 7 is a graph comparing the results of the present invention with a baseline event detector and an SVM detector using image-level concept score representation directly (SVM- Direct);
- Fig. 8 is a graph comparing the number of support vectors required by the present invention compared with the SVM-Direct method.
- a visual concept can be defined as an image-content characteristic of an image and is usually semantically represented by words that are broader than the words used to identify a specific event. Accordingly, the visual concepts form a subset of image-content characteristics that can be attributed to a specific event.
- semantic event detectors are built in the concept space instead of in the original low-level feature space.
- Benefits from such an approach include at least two aspects.
- as demonstrated in S. Ebadollahi et al., "Visual event detection using multi-dimensional concept dynamics," IEEE ICME, 2006, concept scores are powerful for modeling semantic events.
- the concept space in the present invention is preferably formed by semantic concept detectors, as described for example in S. F. Chang et al., "Multimodal semantic concept detection for consumer video benchmark," ACM MIR, 2007, trained over a known consumer electronic image dataset of the type described, for example, in A.C. Loui et al., "Kodak consumer video benchmark data set: concept definition and annotation," ACM MIR, 2007.
- These semantic concept detectors play the important role of incorporating additional information from a previous image dataset to help detect semantic events in a current image dataset.
- preferably, the entire dataset is first partitioned into a set of macro-events, and each macro-event is further partitioned into a set of events.
- the partition is preferably based on the capture time of each digital still image or video segment and the color similarity between them, using the previously developed event clustering algorithm described above. For example, let E_i denote the i-th event, which contains m_p^i photos and m_v^i videos, and let I_j^i and V_j^i denote the j-th photo and j-th video in E_i.
- although images can be grouped or clustered into events utilizing this algorithm, the events themselves are not identified or associated with semantic meanings. Accordingly, the goal of the present invention is to tag a specific semantic meaning, i.e., a semantic event S_E such as "wedding" or "birthday", to a specific event E_i and to the image records corresponding to the event.
- each video V_j^i is preferably partitioned into a set of segments v_{j,1}^i, ..., v_{j,m}^i, with each segment having a given length (for example, five seconds).
- keyframes are then periodically sampled from the video segments (for example, every half second).
- let I_{j,k,l}^i be the l-th keyframe in the k-th segment v_{j,k}^i.
- each keyframe I_{j,k,l}^i can also be represented by a feature vector f(I_{j,k,l}^i) in the concept space in the same manner as a digital still image. It will be understood that different sampling rates than those illustrated above may be readily employed.
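For illustration, the segment partitioning and keyframe sampling might be implemented with OpenCV roughly as follows; the five-second segment length and half-second sampling period are the example values from the text, not fixed requirements.

```python
import cv2

def sample_keyframes(video_path, segment_len=5.0, sample_period=0.5):
    """Partition a video into fixed-length segments and periodically sample
    keyframes from each; returns {segment_index: [frames]}."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unreported
    frames_per_sample = max(1, int(round(fps * sample_period)))
    frames_per_segment = max(1, int(round(fps * segment_len)))
    segments, index = {}, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % frames_per_sample == 0:
            segments.setdefault(index // frames_per_segment, []).append(frame)
        index += 1
    cap.release()
    return segments
```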
- Both digital still images and video segments can be defined as data points represented by x.
- Semantic event detection is then performed based on these data points and the corresponding feature vectors developed from the concept scores.
- The BOF representation has been proven effective for detecting generic concepts in images. See, for example, J. Sivic and A. Zisserman, "Video Google: a text retrieval approach to object matching in videos," ICCV, pp. 1470-1477, 2003.
- images are represented by a set of orderless local descriptors.
- through clustering techniques, a middle-level visual vocabulary is constructed where each visual word is formed by a group of local descriptors. Each visual word is considered a robust and denoised visual term for describing images.
- let S_E denote a semantic event, e.g., "wedding",
- and let E_1, ..., E_M denote M events containing this semantic event.
- each E_i is formed by m_p^i photos and m_v^i video segments.
- each concept word can be treated as a pattern of concept co-occurrences that is a common characteristic for describing all the events containing S_E.
- the data points are grouped into concept words using a spectral clustering algorithm. See, for example, A.Y. Ng, M.I. Jordan, and Y. Weiss, "On spectral clustering: analysis and an algorithm," NIPS, 2001.
- Each data point is treated as a set of images, i.e., one image for a digital still image and multiple images for a video segment. Then EMD is used to measure the similarity between two data points (image sets). There are many ways to compute the distance between two image sets, e.g., the maximum/minimum/mean distance between images in the two sets, but these methods are easily influenced by noisy outlier images, while EMD provides a more robust distance metric. EMD finds a minimum weighted distance among all pairwise distances between two image sets subject to weight-normalization constraints; it allows partial matches between data points and can alleviate the influence of outlier images.
- the EMD between two data points is calculated as follows. Assume that there are n_1 and n_2 images in data points X_1 and X_2, respectively.
- the EMD between X_1 and X_2 is a linear combination of the ground distances d(I_p, I_q) weighted by the optimal flows f(I_p, I_q) between all pairs of images I_p ∈ X_1, I_q ∈ X_2: EMD(X_1, X_2) = Σ_p Σ_q f(I_p, I_q) d(I_p, I_q) / Σ_p Σ_q f(I_p, I_q).
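A sketch of this EMD computation, using scipy's generic linear-programming solver and assuming uniform weights over the images in each set (a detail the text leaves open):

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(features1, features2):
    """Earth Mover's Distance between two image sets, each an (n_i, d) array
    of per-image concept-score vectors, with uniform weights per image."""
    n1, n2 = len(features1), len(features2)
    w1 = np.full(n1, 1.0 / n1)
    w2 = np.full(n2, 1.0 / n2)
    d = cdist(features1, features2)            # ground distances d(I_p, I_q)
    # Flow constraints: row sums <= w1, column sums <= w2,
    # and total flow == min(sum w1, sum w2) = 1 here (allows partial match).
    A_ub = np.zeros((n1 + n2, n1 * n2))
    for p in range(n1):
        A_ub[p, p * n2:(p + 1) * n2] = 1.0     # sum_q f(p, q) <= w1[p]
    for q in range(n2):
        A_ub[n1 + q, q::n2] = 1.0              # sum_p f(p, q) <= w2[q]
    b_ub = np.concatenate([w1, w2])
    A_eq = np.ones((1, n1 * n2))
    b_eq = [min(w1.sum(), w2.sum())]
    res = linprog(d.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None))
    return res.fun / b_eq[0]                   # cost normalized by total flow
```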
- Spectral clustering is a technique for finding groups in data sets consisting of similarities between pairs of data points.
- the algorithm developed in Ng et al. is adopted and can be described as follows: given the similarity matrix S(x_i, x_j), form the affinity matrix A; compute the normalized matrix L = D^{-1/2} A D^{-1/2}, where D is the diagonal degree matrix of A; stack the top k eigenvectors of L as the columns of a new matrix and normalize its rows; and finally cluster the rows with k-means, assigning each data point to the cluster of its corresponding row.
- Each data cluster obtained by the spectral clustering algorithm is called a concept word, and all the clusters form a concept vocabulary to represent and detect semantic events.
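A sketch of the codebook construction, assuming scikit-learn's spectral clustering over a precomputed affinity matrix; the conversion from EMD to similarity via S = exp(−EMD/r) is an assumption, since the text does not fix the kernel.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def learn_codebook(data_points, n_words, radius=1.0):
    """Group training data points into concept words via spectral clustering.

    data_points: list of (n_i, d) concept-score arrays (one per photo/segment).
    Returns a list of concept words, each a list of member data points.
    """
    n = len(data_points)
    S = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed similarity kernel over the EMD from the earlier sketch.
            S[i, j] = S[j, i] = np.exp(-emd(data_points[i], data_points[j]) / radius)
    labels = SpectralClustering(n_clusters=n_words,
                                affinity='precomputed').fit_predict(S)
    return [[data_points[i] for i in range(n) if labels[i] == w]
            for w in range(n_words)]
```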
- let W_j denote the j-th word learned for semantic event S_E,
- and let S(x, W_j) denote the similarity of data point x to word W_j, calculated as the maximum similarity between x and the member data points in W_j:
- S(x, W_j) = max_{x_k ∈ W_j} S(x_k, x), where S(x_k, x) is defined in the same way as described above.
- the vector [S(x, W_1), ..., S(x, W_n)]^T can be treated as a BOF feature vector for x.
- assuming event E_t contains m_t data points,
- event E_t can also be represented by a BOF feature vector f_bof(E_t) = [max_{x∈E_t} S(x, W_1), ..., max_{x∈E_t} S(x, W_n)]^T.
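Continuing the sketch, the event-level BOF vector takes, for each concept word, the maximum similarity over the event's data points (reusing the hypothetical emd helper from the earlier sketch):

```python
import numpy as np

def event_bof_vector(event_points, codebook, radius=1.0):
    """f_bof(E) = [max_x S(x, W_1), ..., max_x S(x, W_n)] over data points x in E."""
    def word_similarity(x, word):
        # S(x, W_j): maximum similarity between x and the word's member points.
        return max(np.exp(-emd(x, member) / radius) for member in word)
    return np.array([max(word_similarity(x, word) for x in event_points)
                     for word in codebook])
```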
- based on the BOF feature f_bof, a binary one-vs.-all SVM classifier can be learned to detect semantic event S_E.
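A final sketch of this event-level classifier, assuming scikit-learn (the RBF kernel and argument names are illustrative assumptions; the text specifies only a binary one-vs.-all SVM):

```python
from sklearn.svm import SVC

def train_event_classifier(bof_vectors, event_labels):
    """bof_vectors: (n_events, n_words) training BOF features; event_labels:
    1 for events containing S_E, 0 otherwise. Returns the fitted SVM, whose
    decision value for a new event's BOF vector is the event detection score."""
    return SVC(kernel='rbf').fit(bof_vectors, event_labels)
```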
- the system 100 includes a data processing unit 110, a peripheral unit 120, a user interface unit 130, and a memory unit 140.
- the memory unit 140, the peripheral unit 120, and the user interface unit 130 are communicatively connected to the data processing system 110.
- the data processing system 110 includes one or more data processing devices that implement the processes of the various embodiments of the present invention, including the example processes of Figs. 2-4 described herein.
- the terms "data processing device" or "data processor" are intended to include any type of data processing device, such as a central processing unit ("CPU"), a desktop computer, a laptop computer, a mainframe computer, a personal digital assistant, a Blackberry™, a digital camera, a cellular phone, or any other device for processing data, managing data, or handling data, whether implemented with electrical, magnetic, optical, biological components, or otherwise.
- the memory unit 140 includes one or more memory devices configured to store information, including the information needed to execute the processes of the various embodiments of the present invention, including the example processes of Figs. 2-4 described herein.
- the memory unit 140 may be a distributed processor- accessible memory system including multiple processor-accessible memories communicatively connected to the data processing system 110 via a plurality of computers and/or devices.
- the memory unit 140 need not be a distributed processor-accessible memory system and, consequently, may include one or more processor-accessible memories located within a single data processor or device.
- memory unit is intended to include any processor- accessible data storage device, whether volatile or nonvolatile, electronic, magnetic, optical, or otherwise, including but not limited to, floppy disks, hard disks, Compact Discs, DVDs, flash memories, ROMs, and RAMs or any other digital storage medium.
- the peripheral system 120 may include one or more devices configured to provide digital content records to the data processing system 110.
- the peripheral system 120 may include digital video cameras, cellular phones, regular digital cameras, or other data processors.
- the peripheral system 120 may include the necessary equipment, devices, circuitry, etc. to connect the data processing system 110 to remote sources of data.
- the system 100 may be linked via the Internet to servers upon which datasets are stored.
- the datasets may include datasets of digital content records used to train the system 100 or datasets including digital content records that are to be analyzed by the system 100.
- the data processing system 110 upon receipt of digital content records from a device in the peripheral system 120, may store such digital content records in the processor-accessible memory system 140 for future processing or, if sufficient processing power is available, analyze the digital content records in real time as a received data stream.
- the user interface system 130 may include a mouse, a keyboard, another computer, or any device or combination of devices from which data is input to the data processing system 110.
- the peripheral system 120 is shown separately from the user interface system 130, the peripheral system 120 may be included as part of the user interface system 130.
- the user interface system 130 also may include a display device (e.g., a liquid crystal display), a processor-accessible memory, or any device or combination of devices to which data is output by the data processing system 110.
- such memory may be part of the memory unit 140 even though the user interface system 130 and the memory unit 140 are shown separately in Fig. 1.
- the basic operation of the system will now be described with reference to Fig. 2.
- a new event (E_0) is provided to the system 100 via a data entry module 200, where it is desired that the probability that E_0 belongs to a specific semantic event be determined.
- the data processing unit 110 controls the operation of the peripheral unit 120 to download the data corresponding to E_0 into the memory unit 140.
- Each event contains a plurality of digital content records, in the illustrated example m_{0,p} digital still images and m_{0,v} video segments, that are grouped together according to capture time and color similarity utilizing the previously described clustering method.
- the clustering method can be applied to a dataset of still digital images and video segments prior to submission to the system 100.
- the dataset can be provided to the system 100 and the data entry module 200 can perform the clustering operation as one operational element of the data processing unit 110 in order to generate E 0 .
- a consumer may capture a dataset consisting of one hundred digital still images and videos of a plurality of different events using an electronic camera.
- a memory card from the electronic camera is provided to a card reader unit as part of the peripheral unit 120.
- the data processing unit 110 controls the operation of the peripheral unit 120 to download the dataset from the memory card to the memory unit 140.
- the data processing unit 110 then proceeds to perform the clustering algorithm on the dataset in order to group the digital still images and videos into a plurality of clusters that correspond to the plurality of events.
- the functions of the instructions provided within the data entry module 200 are completed, and a number of digital still images and videos (for example, 10 out of the original 100) are identified as being associated with E_0.
- E_0 has yet to be associated with a particular semantic event such as "wedding".
- a visual feature extraction module 210 is then utilized to obtain keyframes from the video segments in E_0 and to extract visual features from both the keyframes and the digital still images contained in E_0.
- the visual feature extraction module 210 determines grid-based color moments, Gabor texture and edge directional histogram for each of the digital still images and videos. It will be understood, however, that other visual features may be readily employed other than those utilized in the illustrated example.
- the data processing unit 110 performs the necessary keyframe and visual feature extraction utilizing conventional techniques on each of the digital still images and videos contained within E_0 in accordance with the instructions provided within the visual feature extraction module 210. Accordingly, three visual feature representations of each of the ten digital still images and videos corresponding to E_0 are now available for further analysis.
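As an illustration of one of these features, grid-based color moments might be computed roughly as follows (the 5x5 grid is an assumption; the text does not specify the grid size):

```python
import numpy as np

def grid_color_moments(image, grid=5):
    """First three color moments (mean, std, skewness) per channel for each
    cell of a grid x grid partition of an H x W x 3 image array."""
    h, w, c = image.shape
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            cell = image[gy * h // grid:(gy + 1) * h // grid,
                         gx * w // grid:(gx + 1) * w // grid].reshape(-1, c)
            mean = cell.mean(axis=0)
            std = cell.std(axis=0)
            # Sign-preserving cube root of the third central moment.
            third = ((cell - mean) ** 3).mean(axis=0)
            skew = np.sign(third) * np.abs(third) ** (1.0 / 3.0)
            feats.extend(np.concatenate([mean, std, skew]))
    return np.array(feats)  # grid*grid*9 dimensions for an RGB image
```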
- the three visual features extracted by the feature extraction module 210 are utilized by a concept detection module 220 to generate concept scores reflective of the probability that a particular keyframe or digital still image contains specific visual concepts.
- the concept detection module 220 preferably determines the concept scores using a two step process.
- a concept score detection module 222 utilizes twenty-one of the above-described SVM semantic concept detectors (implemented by the data processing unit 110) to generate a concept score based on each individual classifier in each visual feature space for each digital still image and keyframe.
- the individual concept scores are then fused by a fusion module 224 (implemented by the data processing unit 110) to generate an ensemble concept detection score for a particular digital still image or keyframe, thereby reducing the amount of data to be further processed.
- the fusion module 224 first normalizes the different classification outputs from the different features by a sigmoid function 1/(1+exp(−D)), where D is the output of the SVM classifier representing the distance to the decision boundary. Fusion is accomplished by taking the average of the classification outputs from the different visual features for each of the twenty-one concepts to generate the ensemble concept detection score, as in the sketch below.
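A sketch of this fusion step, assuming the per-feature SVM margins are available as an array:

```python
import numpy as np

def ensemble_concept_scores(svm_outputs):
    """svm_outputs: (n_features, n_concepts) array of raw SVM margins D, one
    row per visual feature (e.g. color, texture, edge) and one column per
    concept. Returns the n_concepts ensemble concept detection scores."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(svm_outputs)))  # sigmoid per output
    return probs.mean(axis=0)                               # average over features
```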
- the color feature representation of a first image in the group of ten images may have a 90% probability of containing people, a 5% probability of containing a park and a 5% probability of containing flowers
- the texture feature representation of the first image has a 5% probability of containing people, an 80% probability of containing a park, and a 15% probability of containing flowers
- the edge detection feature of the first image has a 10% probability of containing people, a 50% probability of containing a park, and a 40% probability of containing flowers.
- the probabilities for each of the concepts for each of the visual representations are averaged, such that the ensemble concept score for the first image would be a 35% probability of containing people (the average of the people probabilities for color 90%, texture 5% and edge 10%), a 45% probability of containing a park (the average of the park probabilities for color 5%, texture 80% and edge 50%), and a 20% probability of containing flowers (the average of the flowers probabilities for color 5%, texture 15% and edge 40%).
- the ensemble concept scores are subsequently provided to a BOF module 230, which determines a BOF vector for E_0.
- the BOF feature vector for E_0 is obtained by first determining individual feature vectors for each of the digital still images and video segments contained within E_0 using the ensemble concept score of each respective digital still image and video segment.
- each digital still image or video segment is treated as a data point, and a pairwise similarity between the ensemble concept score of each data point in E_0 and the ensemble concept score of each predetermined positive training data point for a given semantic event (SE), for example "wedding", is then calculated by a similarity detection module 232 using EMD. Effectively, an individual feature vector is obtained for each of the digital still images and video segments contained in E_0.
- a mapping module 234 is then used to map each of the individual feature vectors of E_0 to a codebook of semantic events (previously developed during a training process described in greater detail below), and an event feature vector for E_0 is generated based on the mapped similarities.
- the event feature vector can now be supplied to a classifier module 240.
- the classifier module 240 utilizes an SVM classifier to generate an event detection score for E_0.
- the event detection score represents a final probability that the new event E_0 corresponds to a given semantic event such as "wedding".
- the event detection score is then preferably compared with a predetermined threshold to determine if E_0 should be categorized as a wedding event.
- the predetermined threshold may be varied depending on the level of accuracy required by the system 100 in a given application.
- the tagged still digital images and video segments can be written to an image storage medium via the peripheral unit 120.
- the tagging of the still digital images and video segments with semantic event classifiers provides an additional advantage of enabling the images and video segments to be easily retrieved by search engines.
- Training of the system 100 will now be described with reference to Fig. 3. First, T positive training events E_1, ..., E_T are entered using the data entry module 200. Each event E_t contains m_{t,p} photos and m_{t,v} videos grouped together according to capture time and color similarity by the previously described clustering method.
- the visual extraction module 210 is then used to extract keyframes from video segments, and visual features are extracted from both the keyframes and the digital still images. As in the case of the operation described above, the visual features include grid-based color moments, Gabor texture and edge directional histogram.
- the concept detection module 220 is then used to generate the ensemble concept scores for the keyframes and the digital images as discussed above.
- a BOF learning module 250 is then employed to train the system 100.
- each digital image or video segment is treated as a data point and a pairwise similarity between each pair of data points is calculated by EMD using the similarity detection module 232 previously described.
- a spectral clustering module 252 is used to apply spectral clustering to group the data points into different clusters, with each cluster corresponding to one code word.
- To train a classifier for detecting semantic event SE, all training events E_t (which include both positive and negative training events for SE) are mapped to the above codebook by a mapping module 254 to generate a BOF feature vector for each training event.
- a classifier training module 260 is used to train a binary SVM classifier to detect a specific semantic event SE.
- Fig. 4 depicts the details of the training process for the video concept detectors utilized in the concept score detection module 222.
- N positive training videos from a benchmark consumer video data set are provided via the data entry module 200.
- Keyframes are obtained from videos and visual features are extracted from the keyframes as in the prior examples utilizing the visual feature extraction module 210.
- the visual features include grid-based color moments, Gabor texture and edge directional histogram.
- a concept training module 270 is then used to train the concept detectors. Namely, based on each type of visual feature, each keyframe is represented as a feature vector, and a binary SVM classifier is trained to detect concept C. The discriminant functions of these classifiers over the individual types of features are averaged together to generate the ensemble concept detector for concept C.
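A sketch of this per-concept ensemble training, assuming scikit-learn SVMs (the RBF kernel is an assumption; the text specifies only binary SVM classifiers per feature type whose discriminant functions are averaged):

```python
from sklearn.svm import SVC

def train_ensemble_concept_detector(features_by_type, labels):
    """features_by_type: {'color': X1, 'texture': X2, 'edge': X3}, each an
    (n_keyframes, d) array; labels: binary labels for concept C per keyframe.
    Returns a scoring function that averages the per-feature discriminants."""
    svms = {name: SVC(kernel='rbf').fit(X, labels)
            for name, X in features_by_type.items()}

    def score(sample_by_type):
        # sample_by_type: same keys, each an (n, d) array of test features.
        return sum(svms[name].decision_function(sample_by_type[name])
                   for name in svms) / len(svms)
    return score
```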
- Average precision (AP) was used as the performance metric, which has been used as an official metric for video concept detection. See, for example, NIST, "TREC video retrieval evaluation (TRECVID)," 2001-2006, http://www-nlpir.nist.gov/projects/trecvid. It calculates the average of the precision values at different recall points on the precision-recall curve, and thus evaluates the effectiveness of a classifier in detecting a specific semantic event. When multiple semantic events are considered, the mean of the APs (MAP) is used. To show the effectiveness of the concept score representation in the semantic event detection algorithm, an experiment was done to compare the inventive method with the approach where the BOF feature vectors are constructed based on original low-level visual features.
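For reference, AP over a ranked list of detection scores can be computed as below (equivalent in spirit to sklearn.metrics.average_precision_score):

```python
import numpy as np

def average_precision(scores, labels):
    """Average of the precision values at each recall point (each true
    positive) in the ranking induced by the detector's scores."""
    order = np.argsort(scores)[::-1]             # rank by decreasing score
    hits, precisions = 0, []
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precisions.append(hits / rank)       # precision at this recall point
    return float(np.mean(precisions)) if precisions else 0.0
```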
- Fig. 6 gives the performance comparison. As shown in Fig. 6, both methods consistently outperform random guessing. SE detection with concept scores, however, significantly outperforms SE detection with low-level features in terms of AP over most concepts, and by 20.7% in terms of MAP. This result confirms the power of using previous concept detection models to help detect semantic events. A second experiment was conducted to compare event-level versus image-level representations.
- Fig. 7 gives AP comparison of the different methods.
- the proposed SEDetection performs the best over most of the semantic events and achieves a significant performance improvement of more than 20% over the second-best method on many semantic events such as "wedding", "Christmas", and "school activity". This result confirms the success of the event-level BOF representation.
- Fig. 8 gives a comparison of the number of support vectors required by the different algorithms. Generally, the fewer the support vectors, the simpler the decision boundary.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP09758682A EP2289021B1 (en) | 2008-06-02 | 2009-05-22 | Semantic event detection for digital content records |
| JP2011512451A JP5351958B2 (ja) | 2008-06-02 | 2009-05-22 | Semantic event detection for digital content records |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US5820108P | 2008-06-02 | 2008-06-02 | |
| US61/058,201 | 2008-06-02 | ||
| US12/331,927 US8358856B2 (en) | 2008-06-02 | 2008-12-10 | Semantic event detection for digital content records |
| US12/331,927 | 2008-12-10 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2009148518A2 true WO2009148518A2 (en) | 2009-12-10 |
| WO2009148518A3 WO2009148518A3 (en) | 2010-01-28 |
Family
ID=41379891
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2009/003160 Ceased WO2009148518A2 (en) | 2008-06-02 | 2009-05-22 | Semantic event detection for digital content records |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US8358856B2 (en) |
| EP (1) | EP2289021B1 (en) |
| JP (1) | JP5351958B2 (ja) |
| WO (1) | WO2009148518A2 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2012043337A (ja) * | 2010-08-23 | 2012-03-01 | Nikon Corp | Image processing apparatus, imaging system, image processing method, and program |
Families Citing this family (40)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20100052676A (ko) * | 2008-11-11 | 2010-05-20 | Samsung Electronics Co., Ltd. | Content albuming apparatus and method therefor |
| US8611677B2 (en) * | 2008-11-19 | 2013-12-17 | Intellectual Ventures Fund 83 Llc | Method for event-based semantic classification |
| US8406461B2 (en) | 2010-04-27 | 2013-03-26 | Intellectual Ventures Fund 83 Llc | Automated template layout system |
| US8406460B2 (en) | 2010-04-27 | 2013-03-26 | Intellectual Ventures Fund 83 Llc | Automated template layout method |
| US8970720B2 (en) | 2010-07-26 | 2015-03-03 | Apple Inc. | Automatic digital camera photography mode selection |
| US20130132377A1 (en) * | 2010-08-26 | 2013-05-23 | Zhe Lin | Systems and Methods for Localized Bag-of-Features Retrieval |
| JP5649425B2 (ja) * | 2010-12-06 | 2015-01-07 | Toshiba Corp | Video retrieval device |
| US8923607B1 (en) * | 2010-12-08 | 2014-12-30 | Google Inc. | Learning sports highlights using event detection |
| US8635197B2 (en) | 2011-02-28 | 2014-01-21 | International Business Machines Corporation | Systems and methods for efficient development of a rule-based system using crowd-sourcing |
| US20120275714A1 (en) * | 2011-04-27 | 2012-11-01 | Yuli Gao | Determination of an image selection representative of a storyline |
| US9055276B2 (en) | 2011-07-29 | 2015-06-09 | Apple Inc. | Camera having processing customized for identified persons |
| US8983940B2 (en) | 2011-09-02 | 2015-03-17 | Adobe Systems Incorporated | K-nearest neighbor re-ranking |
| US8634661B2 (en) * | 2011-09-07 | 2014-01-21 | Intellectual Ventures Fund 83 Llc | Event classification method using light source detection |
| US20130058577A1 (en) * | 2011-09-07 | 2013-03-07 | Peter O. Stubler | Event classification method for related digital images |
| US8634660B2 (en) * | 2011-09-07 | 2014-01-21 | Intellectual Ventures Fund 83 Llc | Event classification method using lit candle detection |
| US8805116B2 (en) | 2011-09-17 | 2014-08-12 | Adobe Systems Incorporated | Methods and apparatus for visual search |
| JP2015529365A (ja) * | 2012-09-05 | 2015-10-05 | Element, Inc. | System and method for biometric authentication in connection with camera-equipped devices |
| US8880563B2 (en) | 2012-09-21 | 2014-11-04 | Adobe Systems Incorporated | Image search by query object segmentation |
| US9451335B2 (en) | 2014-04-29 | 2016-09-20 | At&T Intellectual Property I, Lp | Method and apparatus for augmenting media content |
| US9898685B2 (en) | 2014-04-29 | 2018-02-20 | At&T Intellectual Property I, L.P. | Method and apparatus for analyzing media content |
| KR102506826B1 (ko) | 2014-05-13 | 2023-03-06 | Element, Inc. | System and method for electronic key provisioning and access management in connection with mobile devices |
| CN106664207B (zh) | 2014-06-03 | 2019-12-13 | Element Inc. | Attendance verification and management in connection with mobile devices |
| CN105335595A (zh) | 2014-06-30 | 2016-02-17 | Dolby Laboratories Licensing Corp. | Perception-based multimedia processing |
| CN104133917B (zh) * | 2014-08-15 | 2018-08-10 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for classified storage of photos |
| AU2014218444B2 (en) * | 2014-08-29 | 2017-06-15 | Canon Kabushiki Kaisha | Dynamic feature selection for joint probabilistic recognition |
| US10572735B2 (en) * | 2015-03-31 | 2020-02-25 | Beijing Shunyuan Kaihua Technology Limited | Detect sports video highlights for mobile computing devices |
| CN104915685A (zh) * | 2015-07-02 | 2015-09-16 | Beijing Union University | Image representation method based on multi-rectangle partitioning |
| KR102225088B1 (ko) * | 2015-10-26 | 2021-03-08 | SK Telecom Co., Ltd. | Method and apparatus for generating tags based on context information |
| US9961202B2 (en) * | 2015-12-31 | 2018-05-01 | Nice Ltd. | Automated call classification |
| US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
| US20190019107A1 (en) * | 2017-07-12 | 2019-01-17 | Samsung Electronics Co., Ltd. | Method of machine learning by remote storage device and remote storage device employing method of machine learning |
| US10735959B2 (en) | 2017-09-18 | 2020-08-04 | Element Inc. | Methods, systems, and media for detecting spoofing in mobile authentication |
| CN108090199B (zh) * | 2017-12-22 | 2020-02-21 | Zhejiang University | Semantic information extraction and visualization method for large image sets |
| CN113826110B (zh) | 2019-03-12 | 2023-06-20 | Element Inc. | Methods, systems, and media for detecting spoofing of biometric identification using a mobile device |
| US11586861B2 (en) | 2019-09-13 | 2023-02-21 | Toyota Research Institute, Inc. | Embeddings + SVM for teaching traversability |
| CN110781963B (zh) * | 2019-10-28 | 2022-03-04 | Xidian University | Aerial target grouping method based on K-means clustering |
| US11507248B2 (en) | 2019-12-16 | 2022-11-22 | Element Inc. | Methods, systems, and media for anti-spoofing using eye-tracking |
| CN111221984B (zh) * | 2020-01-15 | 2024-03-01 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Multimodal content processing method, apparatus, device, and storage medium |
| EP4113327A4 (en) * | 2020-02-27 | 2023-06-21 | Panasonic Intellectual Property Management Co., Ltd. | IMAGE PROCESSING DEVICE AND IMAGE PROCESSING METHOD |
| US20230215155A1 (en) * | 2022-01-05 | 2023-07-06 | Dell Products L.P. | Label inheritance for soft label generation in information processing system |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6832006B2 (en) * | 2001-07-23 | 2004-12-14 | Eastman Kodak Company | System and method for controlling image compression based on image emphasis |
| US7039239B2 (en) * | 2002-02-07 | 2006-05-02 | Eastman Kodak Company | Method for image region classification using unsupervised and supervised learning |
| US20030233232A1 (en) * | 2002-06-12 | 2003-12-18 | Lucent Technologies Inc. | System and method for measuring domain independence of semantic classes |
| US7383260B2 (en) * | 2004-08-03 | 2008-06-03 | International Business Machines Corporation | Method and apparatus for ontology-based classification of media content |
| US7545978B2 (en) * | 2005-07-01 | 2009-06-09 | International Business Machines Corporation | Methods and apparatus for filtering video packets for large-scale video stream monitoring |
| JP2007317077A (ja) * | 2006-05-29 | 2007-12-06 | Fujifilm Corp | Image classification apparatus, method, and program |
| US8165406B2 (en) * | 2007-12-12 | 2012-04-24 | Microsoft Corp. | Interactive concept learning in image search |
- 2008-12-10 US US12/331,927 patent/US8358856B2/en not_active Expired - Fee Related
- 2009-05-22 WO PCT/US2009/003160 patent/WO2009148518A2/en not_active Ceased
- 2009-05-22 EP EP09758682A patent/EP2289021B1/en not_active Not-in-force
- 2009-05-22 JP JP2011512451A patent/JP5351958B2/ja not_active Expired - Fee Related
Non-Patent Citations (9)
| Title |
|---|
| Amir, A. et al., "IBM Research TRECVID-2003 video retrieval system," Proc. TRECVID Workshop, 2003, XP002555781 * |
| Amir, A. et al., "IBM Research TRECVID-2005 video retrieval system," Proceedings of TRECVID 2005, 2005, pages 1-17, XP002555780 * |
| Dong Xu et al., "Video event recognition using kernel methods with multilevel temporal alignment," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, 20 May 2008, pages 1985-1997, XP011227516, ISSN: 0162-8828 * |
| Haering, N. et al., Visual Event Detection, Kluwer International Series in Video Computing, Kluwer Academic Publishers, Norwell, MA, 1 January 2001, pages I-II, 19, XP002357960, ISBN: 978-0-7923-7436-7 * |
| Lexing Xie et al., "Detecting generic visual events with temporal cues," Fortieth Asilomar Conference on Signals, Systems and Computers (ACSSC '06), 1 October 2006, pages 54-58, XP031081013, ISBN: 978-1-4244-0784-2 * |
| Shahram Ebadollahi et al., "Visual event detection using multi-dimensional concept dynamics," IEEE International Conference on Multimedia and Expo (ICME) 2006, 1 July 2006, pages 881-884, XP031032977, ISBN: 978-1-4244-0366-0, cited in the application * |
| Shih-Fu Chang et al., "Recent advances and challenges of semantic image/video search," 2007 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Honolulu, HI, 15 April 2007, pages IV-1205, XP031464072, ISBN: 978-1-4244-0727-9 * |
| Wei Jiang et al., "Semantic event detection for consumer photo and video collections," IEEE International Conference on Multimedia and Expo (ICME) 2008, 23 June 2008, pages 313-316, XP031312721, ISBN: 978-1-4244-2570-9 * |
| Yuxin Peng et al., "EMD-based video clip retrieval by many-to-many matching," Image and Video Retrieval (CIVR), Lecture Notes in Computer Science, vol. 3568, Springer, 4 August 2005, pages 71-81, XP019012799, ISBN: 978-3-540-27858-0 * |
Also Published As
| Publication number | Publication date |
|---|---|
| US8358856B2 (en) | 2013-01-22 |
| EP2289021B1 (en) | 2013-01-02 |
| JP2011525012A (ja) | 2011-09-08 |
| WO2009148518A3 (en) | 2010-01-28 |
| EP2289021A2 (en) | 2011-03-02 |
| US20090297032A1 (en) | 2009-12-03 |
| JP5351958B2 (ja) | 2013-11-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP2289021B1 (en) | Semantic event detection for digital content records | |
| US8213725B2 (en) | Semantic event detection using cross-domain knowledge | |
| US12222980B2 (en) | Generating congruous metadata for multimedia | |
| Zhou et al. | Recent advance in content-based image retrieval: A literature survey | |
| Yanagawa et al. | Columbia university’s baseline detectors for 374 lscom semantic visual concepts | |
| US8548256B2 (en) | Method for fast scene matching | |
| Amores et al. | Context-based object-class recognition and retrieval by generalized correlograms | |
| Salih et al. | An effective bi-layer content-based image retrieval technique | |
| Mehmood et al. | A novel image retrieval based on a combination of local and global histograms of visual words | |
| JP2016134175A (ja) | ワイルドカードを用いてテキスト−画像クエリを実施するための方法およびシステム | |
| Jiang | Super: Towards real-time event recognition in internet videos | |
| Dharani et al. | Content based image retrieval system using feature classification with modified KNN algorithm | |
| Bekhet et al. | Gender recognition from unconstrained selfie images: a convolutional neural network approach | |
| Mathan Kumar et al. | Multiple kernel scale invariant feature transform and cross indexing for image search and retrieval | |
| Xu et al. | Robust seed localization and growing with deep convolutional features for scene text detection | |
| Imran et al. | Event recognition from photo collections via pagerank | |
| Aly et al. | Scaling object recognition: Benchmark of current state of the art techniques | |
| Jiang et al. | Semantic event detection for consumer photo and video collections | |
| Liu et al. | Determining the best attributes for surveillance video keywords generation | |
| Deselaers et al. | Overview of the ImageCLEF 2007 object retrieval task | |
| Lee et al. | Face Discovery with Social Context. | |
| Watcharapinchai et al. | Dimensionality reduction of SIFT using PCA for object categorization | |
| Diou et al. | Vitalas at trecvid-2008 | |
| Tao | Visual concept detection and real time object detection | |
| Olaode | Application of unsupervised image classification to semantic based image retrieval |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 09758682 Country of ref document: EP Kind code of ref document: A2 |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2011512451 Country of ref document: JP Ref document number: 2009758682 Country of ref document: EP |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |