WO2012141655A1 - In-video product annotation with web information mining - Google Patents

In-video product annotation with web information mining

Info

Publication number
WO2012141655A1
Authority
WO
WIPO (PCT)
Prior art keywords
product
visual
video
images
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SG2012/000127
Other languages
English (en)
French (fr)
Inventor
Tat Seng Chua
Guangda LI
Zheng Lu
Meng Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Singapore
Original Assignee
National University of Singapore
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Singapore
Priority to GB1319882.5A (patent GB2506028B)
Priority to SG2013075056A (patent SG194442A1)
Priority to JP2014505107A (patent JP6049693B2)
Priority to US14/111,149 (patent US9355330B2)
Priority to CN201280027434.XA (patent CN103608826B)
Publication of WO2012141655A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7847: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using low-level visual features of the video content
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/46: Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • Described embodiments relate generally to product annotation in a video and specifically to in-video product annotation using web information mining.
  • Video annotation, also widely known as video concept detection or high-level feature extraction, aims to automatically assign descriptive concepts to video content.
  • High-level concepts include events (e.g., airplane crash and running), scenes (e.g., sundown and beach) and object categories (e.g., car and screen).
  • Annotation of product concepts is of great importance to many applications such
  • the second challenge is the effective visual representation. Bag of Visual Words
  • Scale Invariant Feature Transform SIFT
  • a BoVW histogram is generated to describe the product image.
  • The descriptors of an image describe the whole image rather than the product parts contained in it, and therefore contain a lot of noise for product annotation.
  • Embodiments of the invention provide product annotation in a video to one or more users using product training images from web mining.
  • a computer system provides services of product annotation in a video to one or more users.
  • the system receives a video from a user, where the video includes multiple video frames.
  • The system extracts multiple key frames from the video and generates a visual representation of each key frame.
  • The system compares the visual representation of the key frame with a plurality of product visual signatures, where each visual signature identifies a product.
  • The system collects multiple training images comprising multiple expert product images obtained from an expert product repository, each of which is associated with multiple product images obtained from multiple web resources. Based on the comparison of the visual representation of the key frame and a product visual signature, the system determines whether the key frame contains the product identified by the visual signature of the product.
  • FIG. 1 is a block diagram of a computing environment configured to provide in-video product annotation service to clients.
  • FIG. 2 is a block diagram of an in-video product annotation module to generate product visual signatures and annotate products detected in a video stream.
  • FIG. 3 is an example of collecting training images for an in-video product annotation process according to one embodiment of the invention.
  • FIG. 4 is an example of product images for collectively generating product visual signatures.
  • FIG. 5 is a flow chart of a process for generating a visual signature of a product according to one embodiment of the invention.
  • FIG. 6 is a flow chart of a process for detecting a product and annotating the detected product in one or more video frames of a video stream according to one embodiment of the invention.
  • FIG. 7 is an example of an in-video product annotation system according to one embodiment of the invention.
  • FIG. 8 is an example result of an in-video product annotation process according to one embodiment of the invention.
  • FIG. 1 is a block diagram of a computing environment 100 configured to provide in-video product annotation service to clients 110.
  • Multiple users/viewers use clients 110A-N to provide video streams to an in-video product annotation service 120 and request the in-video product annotation service 120 to annotate the products contained in the video frames of the video streams.
  • The product annotation service 120 stores the video streams and responds to the requests by returning product detection and annotation results to the clients 110.
  • Each client 110 executes a browser 112 for browsing the video streams and product annotation results from the product annotation service 120.
  • Other embodiments can have different configurations.
  • Each client 110 is used by a user to access services provided by the in-video product annotation service 120.
  • A user uses a client 110 to browse a video, request annotation of a product contained in the video and receive the product detection and annotation results from the product annotation service 120.
  • The client 110 can be any type of computer device, such as a personal computer (e.g., desktop, notebook, and laptop), as well as devices such as a mobile telephone or personal digital assistant that has the capability to record video content.
  • The client 110 typically includes a processor, a display device (or output to a display device), a local storage, such as a hard drive or flash memory device, to which the client 110 stores data used by the user in performing tasks, and a network interface for coupling to the in-video product annotation service 120 via the network 130.
  • The network 130 enables communications between the clients 110 and the in-video product annotation service 120.
  • The network 130 is the Internet, and uses standardized internetworking communications technologies and protocols, known now or subsequently developed, that enable the clients 110 to communicate with the in-video product annotation service 120.
  • The network 130 is a cloud computing network and includes one or more components of the in-video product annotation service 120.
  • the visual signature generation stage includes three components: collection of high-quality visual examples of products from a repository, e.g., AMAZON™, expansion of the collected visual examples with Internet product image search results, and generation of visual signature from training examples comprising the high-quality visual examples of products and their
  • the visual signatures of a variety of known products are stored in a product visual signature file.
  • the runtime video processing stage includes two components, feature extraction and product annotation.
  • For an input video stream, the product annotation service 120 identifies a set of key frames of the video stream, and for each key frame, the product annotation service 120 extracts visual features (e.g., Scale Invariant Feature Transform (SIFT) descriptors) and generates a visual representation of the extracted features (e.g., the Bag of Visual Words (BoVW) histograms).
  • The product annotation service 120 performs product annotation by comparing the visual signature of each product stored in the visual signature file with the BoVW histogram of each key frame of the input video.
  • In the embodiment illustrated in FIG. 1, the in-video product annotation service 120 has an in-video product annotation module 102, a video server 104 and a product images database 106.
  • The in-video product annotation module 102 comprises a product visual signature generation module 200 for product visual signature generation and a video processing module 300 for processing input videos from clients 110.
  • The video server 104 stores the video streams received from the clients 110 and annotated video frames of the video streams.
  • The product images database 106 comprises two sub-databases, database 1 (106A) and database 2 (106B), to store high-quality product images obtained from one or more online product merchants, such as AMAZON™, and related product images collected through Internet search.
  • the product images from a known product merchant generally have high visual quality, but the number of them for a given product can be limited.
  • The number of related images of the product obtained through Internet search using various search engines, such as GOOGLE™, can be large, but the images are noisy (e.g., containing textual information unrelated to the product).
  • the product annotation service 120 filters the related product images obtained from the Internet search results based on the high quality product images to generate product visual signatures, and uses the product visual signatures to detect and annotate products in a video stream.
  • The high-quality product images from a known merchant are referred to as "expert product images," and for a given expert product image, its associated images obtained from Internet search are referred to as "extended product images."
  • IN-VIDEO PRODUCT ANNOTATION - VISUAL SIGNATURE GENERATION
  • FIG. 2 is a block diagram of an in-video product annotation module 102 to generate product visual signatures and annotate products detected in a video stream according to one embodiment.
  • the product annotation module 102 comprises a product visual signature generation module 200 and a video processing module 300.
  • the product visual signature generation module 200 includes an expert product images module 210, an extended product images module 220 and a visual signature generation module 230.
  • the video processing module 300 includes a frame extraction module 310, a feature extraction and quantization module 320 and a product annotation module 330.
  • the product visual signature generation module 200 is configured to generate product visual signatures.
  • The expert product images module 210 is configured to collect high-quality visual examples of products (e.g., expert product images in different views, such as frontal, side and back views). In one embodiment, the expert product images module 210 collects the expert product images from AMAZON™ for a variety of consumer products, such as digital cameras, cars and digital phones.
  • The expert product images of a given product are often too few for constructing a good visual signature for the product.
  • The number of expert product images collected from AMAZON™ for a product varies from 1 to 8.
  • The extended product images module 220 is configured to collect, from the Internet, associated images of a product that has one or more expert product images.
  • The product name is used as a search query for associated product images on the Internet using the GOOGLE™ search engine. This process expands the expert product images using the web product image database.
  • the images from the Internet search contain a lot of noise, e.g., text information
  • Before the signature generation module 230 generates visual signatures of products, it re-ranks the extended product images from the Internet search results based on the expert product images. For each expert product image, a pre-determined number of extended product images that are nearest to the expert product image are selected as the result of the filtering. For a given product, the expert product images and the filtered extended product images form a set of positive training images for the product, from which the signature generation module 230 generates a visual signature for the product.
  • the collection of training images of known products can be automated to improve the in-video product annotation system performance.
  • the signature generation module 230 extracts the visual features of the expert product image and its associated extended product images.
  • the visual features of the product images are Bag of Visual Words (BoVW) features.
  • the signature generation module 230 extracts one or more SIFT descriptors on several detected key points or by densely sampling patches of each product image and quantizes the SIFT descriptors into multiple visual words.
  • a BoVW histogram is generated from the quantized SIFT descriptors to describe each image.
  • The signature generation module 230 uses a visual feature detection and extraction method, e.g., the Difference-of-Gaussian method, to extract 128-dimensional SIFT features from a product image and groups the SIFT features into 160,000 clusters with hierarchical k-means.
  • The product image is represented by a 160,000-dimensional BoVW histogram.
  • The signature generation module 230 selects a predetermined number of nearest neighbors from the extended product images associated with the expert product image based on a similarity measure defined in Equation (1) below:
  • the signature generation module 230 obtains kn positive training images for a given product, where k is the number of expert product images and n is the pre-determined nearest neighbors (i.e., the extended product images) of the expert product image.
  • FIG. 3 provides an example of the training data collection process for the digital camera Canon 40D.
  • The product visual signature generation module 200 collects five expert product images 302 of the camera in different views from the online merchant AMAZON™. For each expert product image, the product visual signature generation module 200 searches the Internet using the GOOGLE™ search engine to collect a number of related product images 304. Because the product images obtained from the Internet search can be noisy (e.g., containing text unrelated to the product), the product visual signature generation module 200 filters the related product images based on the expert product images.
  • the product visual signature generation module 200 applies a correlative sparsification described below to reduce the noise by selecting a pre-determined number of nearest neighbors of the product images from the Internet search.
  • the selection of related product image is based on a similarity measure between the related product image and its corresponding expert product image.
  • The product visual signature generation module 200 obtains a set of training examples 306 for digital camera Canon 40D, from which the product visual signature generation module 200 generates a visual signature for digital camera Canon 40D.
  • To effectively annotate a product contained in a product image represented in a high-dimensional feature space, the signature generation module 230 generates a template for annotation by averaging positive training images of the product. In one embodiment, the signature generation module 230 merges the visual representations of multiple training images of a product to generate an accumulated histogram for the product. Because descriptors from the image background introduce noise, the accumulated histogram contains many noisy bins.
  • Equation (2), which fits the L1-regularized least squares optimization problem, where ‖·‖₂ and ‖·‖₁ indicate the 2-norm and 1-norm, respectively.
  • the parameter λ₁ modulates
  • The first term of Equation (2) keeps the obtained signature close to the original one, while the second term minimizes the 1-norm of the obtained visual signature, which makes the signature sparse.
  • the signature generation module 230 generates visual signatures of products collectively.
  • the signature generation module 230 modifies the visual signature generation defined in Equation (2) by adding a graph Laplacian term to Equation (2) as follows:
  • Equation (3) can be solved using an optimization approach. Assuming that all the visual signatures except v_i are fixed, the problem described by Equation (3) can be rewritten as Equation (4) below:
  • the visual signature v is defined by Equation
  • the signature generation module 230 uses an interior-point method to solve the problem defined in Equation (5).
  • the visual signature of a product represents the ground truth of the product, which can be used to determine whether a video frame of a video stream contains the product at runtime.
  • The similarity between two sets of product images is defined by Equation (6), whose terms denote the number of images in each image set and the k-th product in the set P_i, and in which sim(·,·) is the similarity of an image pair from different sets.
  • The similarity measure defined in Equation (6) has the following properties:
  • w_ij = w_ji: the similarity is symmetric
  • The similarity sim(·,·) of an image pair from different sets is calculated from the histogram intersection for the image pair described in Equation (1).
  • The similarity of two products belonging to two different subcategories of a product class (e.g., video games and portable audio/video products under the same product class "electronics") is set to zero.
  • FIG. 4 is an example of three sets of product images for collectively generating product visual signatures according to one embodiment of the invention.
  • The example in FIG. 4 contains three sets of product images for three products: product image set 410 is for digital camera Canon 40D; product image set 420 is for digital camera Nikon D90; and product image set 430 is for video game console Xbox. It is noted that the products Canon 40D and Nikon D90 have very close appearances because they belong to the same class of product. Generating visual signatures of products collectively enables the visual signatures of products to reflect the closeness of images of products of the same class.
  • The signature generation module 230 can derive an iterative process to solve for each v_i by repeatedly updating each v_i.
  • An example pseudo code for iteratively updating the visual signature v is the following:
  • FIG. 5 is a flow chart of a process for generating a visual signature of a product according to one embodiment of the invention.
  • the product visual signature generation module 200 searches 510 for expert product images from a repository for a product and stores 520 the expert product images in a storage place (e.g., the database 1 of the product images database 106 of FIG. 1).
  • the product visual signature generation module 200 collects 530 multiple related product images through web mining (e.g., the Internet search).
  • The product visual signature generation module 200 filters the related product images based on their corresponding expert product images and generates 540 training sample product images from the filtering.
  • Using the training sample product images, the product visual signature generation module 200 generates 550 a visual signature for the product.
  • the product visual signature generation module 200 compiles 560 a visual signature file containing the visual signatures for a variety of products.
  • The in-video product annotation module 102 has a video processing module 300 to annotate products in one or more video frames of a video stream.
  • The video processing module 300 receives a video stream from a client 110 and processes one or more selected video frames of the video stream. For each selected video frame, the video processing module 300 determines whether the video frame contains a known product using the product visual signatures provided by the product visual signature generation module 200.
  • The video processing module 300 includes a video frame extraction module 310, a feature extraction and quantization module 320 and a product annotation module 330.
  • the video frame extraction module 310 receives a video stream consisting of multiple video frames and extracts a number of key frames from the video stream.
  • One way to extract the key frames from the video stream is to select a video frame at a fixed point of the video stream, e.g., extracting a video frame every 5 seconds of the video stream.
  • Other embodiments of the frame extraction module 310 can use different methods, e.g., selecting the first frame of every group of pictures (GOP) of the video stream, to obtain key frames.
  • The feature extraction and quantization module 320 extracts visual features from the key frames of the video stream and quantizes the extracted visual features to generate a visual representation of each key frame.
  • The feature extraction and quantization module 320 uses a Difference-of-Gaussian method to detect keypoints in a key frame, and from each keypoint, the module 320 extracts 128-dimensional SIFT features.
  • the module 320 groups the SIFT features into a number of clusters (e.g., 160,000 clusters) with hierarchical k-means.
  • A key frame is represented by a multi-dimensional bag of visual words histogram (e.g., a 160,000-dimensional BoVW histogram).
  • The product annotation module 330 determines whether a key frame of a video stream contains a known product by comparing the product visual signatures with the visual representation (e.g., the 160,000-dimensional BoVW histogram) of the key frame.
  • the comparison between a product visual signature and the visual representation of the key frame is measured by a product relevance measure defined in Equation (7):
  • the product annotation module 330 determines whether the key frame contains a known product. In one embodiment, the estimated product relevance measure is compared with a threshold value to determine whether the key frame contains a known product.
  • FIG. 6 is a flow chart of a process for detecting a product and annotating the detected product in one or more video frames of a video stream according to one embodiment of the invention.
  • The video processing module 300 receives 610 a video stream from a client 110 and extracts 620 multiple key frames from the video stream. For each key frame, the video processing module 300 extracts 630 visual features (e.g., SIFT features) of the key frame and generates 640 a visual representation of the key frame (e.g., a multi-dimensional BoVW histogram).
  • the video processing module 300 compares 650 the visual representation of the key frame with each visual signature of known products. Based on the comparison, the video processing module 300 determines 660 whether the key frame contains a known product.
  • FIG. 7 is an example of an in-video product annotation system according to one embodiment of the invention.
  • The in-video product annotation system 700 has a product visual signature generation sub-system 701A for generating a visual signature of digital camera Canon G9 702 off-line and a video processing sub-system 701B for processing selected video frames of a video stream 712 for product annotation at runtime.
  • The product visual signature generation sub-system 701A collects one or more expert product images at different views of Canon G9 from AMAZON™ 704 as product visual examples 706. For each expert product image of Canon G9, the product visual signature generation sub-system 701A collects multiple associated product images from GOOGLE™ 710.
  • The product visual signature generation sub-system 701A filters the product images from the Internet search by a correlative sparsification method 708 to reduce the noise.
  • The filtered product images and the associated expert images form a set of training images of Canon G9, and the product visual signature generation sub-system 701A generates a visual signature for Canon G9.
  • the video processing sub-system 701B receives a video stream 712 and extracts
  • The video processing sub-system 701B compares the visual representation of the key frame with the visual signature of Canon G9, and based on the comparison, the video processing sub-system 701B determines whether the key frame contains the digital camera Canon G9. For illustration purposes, assume that the product visual signature file compiled by the product visual signature generation sub-system 701A contains only the visual signature of Canon G9. If the product visual signature file contains visual signatures of more products, the video processing sub-system 701B compares the visual representation of a key frame to each of the visual signatures to determine whether the key frame contains the product identified by the visual signature.
  • a visual representation of the key frame e.g., a BoVW histogram
  • a Difference-of-Gaussian method was used to detect keypoints and from each keypoint 128-dimensional SIFT features were extracted.
  • the SIFT features were grouped into 160,000 clusters with hierarchical k-means.
  • Each product image is represented by a 160,000-dimensional BoVW histogram.
  • There are 300 images used for each product.
  • The parameters λ₁ and λ₂ are empirically set to 5 and 0.05, respectively.
  • The performance results (e.g., MAP results) of different combinations of training data sources and annotation algorithms are demonstrated in Table I below.
  • FIG. 8 is an example of a video frame processed by variations of product annotation algorithm and their corresponding visual words in a video frame.
  • The left column contains the visual signatures (810A, 820A and 830A) generated by different annotation methods.
  • The visual signature 810A is generated by the "No-sparse" method.
  • The visual signature 820A is generated by the "1-norm sparsification" method and the visual signature 830A is generated by the "Correlative sparsification" method.
  • the right column shows the corresponding visual words (810B, 820B and 830B) in a video frame.
  • the examples in FIG. 8 show that sparsification methods are able to remove several noisy bins and thus the obtained visual signatures are better.
  • Another embodiment of the in-video product annotation described above is multimodal product annotation, which makes use of both the text information associated with a video stream and its visual information.
  • The text information associated with a video includes the video title, description and tags.
  • the in-video product annotation performance can be further improved.
  • The experimental results show that a pure text-based method for in-video product annotation achieves a MAP of 0.23, which is worse than the MAP of 0.39 achieved by using only visual information.
  • the MAP measure can be boosted to 0.55.
  • the embodiments of the invention advantageously provide in-video product annotation using correlative sparsification of product visual signatures.
  • Sparse visual signatures of products are generated using expert product images and associated product images from web mining.
  • The noise of the associated product images is reduced by the correlative sparsification.
  • The extended training images of products from the web mining along with the expert product images enable the embodiments of the invention to generate product visual signatures, which are used to efficiently identify products contained in a video frame of a video stream at run time.
  • The embodiments of the invention are computationally efficient. For example, after feature extraction, the cost of annotating a product for a video frame scales with the number of non-zero bins of the visual signature. When annotating a large dataset, an inverted structure can be built by exploiting the sparsity of the product visual signatures. Therefore, the sparsification of the product visual signatures not only improves annotation performance but also reduces computational cost.
  • Reference in the specification to "one embodiment" or to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" or "a preferred embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
  • Certain aspects of the invention include process steps and instructions described herein in the form of a method. It should be noted that the process steps and instructions of the invention can be embodied in software, firmware or hardware, and when embodied in software, can be downloaded to reside on and be operated from different platforms used by a variety of operating systems.
  • the invention also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
  • A computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application-specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
  • The computers referred to in the specification may include a single processor or may be architectures employing multiple-processor designs for increased computing capability.
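
The efficiency point made above, that annotation cost scales with the number of non-zero bins and that an inverted structure can exploit signature sparsity, can be illustrated with a minimal Python sketch. It is not taken from the specification: the dictionary layout, the dot-product score, the sample weights, and the names build_inverted_index and score_frame are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_inverted_index(signatures):
    """Map each visual-word bin to the products whose signature is non-zero there.

    `signatures` is an assumed layout: {product_name: {bin_id: weight}}.
    """
    index = defaultdict(list)
    for product, signature in signatures.items():
        for bin_id, weight in signature.items():
            if weight != 0.0:
                index[bin_id].append((product, weight))
    return index


def score_frame(frame_histogram, index):
    """Accumulate a dot-product score per product (an assumed similarity measure).

    Only bins that are non-zero in both the frame histogram and a product
    signature are visited, so the work grows with signature sparsity rather
    than with the size of the visual vocabulary.
    """
    scores = defaultdict(float)
    for bin_id, count in frame_histogram.items():
        if count == 0:
            continue
        for product, weight in index.get(bin_id, ()):
            scores[product] += weight * count
    return dict(scores)


# Hypothetical sparse signatures and a hypothetical key-frame histogram.
signatures = {
    "camera": {3: 0.5, 17: 0.25},
    "phone": {17: 0.5, 42: 1.0},
}
index = build_inverted_index(signatures)
frame = {3: 4, 17: 2, 99: 1}
scores = score_frame(frame, index)
best = max(scores, key=scores.get)
```

In this sketch, bin 99 of the frame matches no product and is skipped after a single lookup, and bin 42 of the "phone" signature is never visited because the frame does not contain it. A sparser signature therefore means fewer postings to traverse per frame.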

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
PCT/SG2012/000127 2011-04-12 2012-04-11 In-video product annotation with web information mining Ceased WO2012141655A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
GB1319882.5A GB2506028B (en) 2011-04-12 2012-04-11 In-video product annotation with web information mining
SG2013075056A SG194442A1 (en) 2011-04-12 2012-04-11 In-video product annotation with web information mining
JP2014505107A JP6049693B2 (ja) 2011-04-12 2012-04-11 In-video product annotation using web information mining
US14/111,149 US9355330B2 (en) 2011-04-12 2012-04-11 In-video product annotation with web information mining
CN201280027434.XA CN103608826B (zh) 2011-04-12 2012-04-11 In-video product annotation using web information mining

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161474328P 2011-04-12 2011-04-12
US61/474,328 2011-04-12

Publications (1)

Publication Number Publication Date
WO2012141655A1 (en) 2012-10-18

Family

ID=47009585

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SG2012/000127 Ceased WO2012141655A1 (en) 2011-04-12 2012-04-11 In-video product annotation with web information mining

Country Status (6)

Country Link
US (1) US9355330B2
JP (1) JP6049693B2
CN (1) CN103608826B
GB (1) GB2506028B
SG (1) SG194442A1
WO (1) WO2012141655A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015027805A1 (zh) * 2013-08-28 2015-03-05 上海合合信息科技发展有限公司 Method, apparatus, system and client for querying product descriptions

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103608826B (zh) * 2011-04-12 2017-04-05 新加坡国立大学 In-video product annotation using web information mining
US20130297454A1 (en) * 2012-05-03 2013-11-07 Nokia Corporation Method and apparatus for verifying association of users with products and information
US8917908B2 (en) * 2012-07-12 2014-12-23 Palo Alto Research Center Incorporated Distributed object tracking for augmented reality application
US9118886B2 (en) * 2012-07-18 2015-08-25 Hulu, LLC Annotating general objects in video
US9888207B2 (en) 2014-03-17 2018-02-06 Microsoft Technology Licensing, Llc Automatic camera selection
US10284813B2 (en) 2014-03-17 2019-05-07 Microsoft Technology Licensing, Llc Automatic camera selection
US10178346B2 (en) 2014-03-17 2019-01-08 Microsoft Technology Licensing, Llc Highlighting unread messages
US9749585B2 (en) 2014-03-17 2017-08-29 Microsoft Technology Licensing, Llc Highlighting unread messages
CN105373938A (zh) 2014-08-27 2016-03-02 阿里巴巴集团控股有限公司 Method, apparatus and system for identifying products in video images and displaying their information
TWI590173B (zh) * 2015-01-30 2017-07-01 Bravo Ideas Digital Co Ltd Interactive Advertising Approach and Its Interactive System
TWI582710B (zh) * 2015-11-18 2017-05-11 Bravo Ideas Digital Co Ltd The method of recognizing the object of moving image and the interactive film establishment method of automatically intercepting target image
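CN106778449B (zh) * 2015-11-23 2020-09-22 创意点子数位股份有限公司 Object recognition method for moving images and interactive video creation method with automatic capture of target images
CN106845323B (zh) * 2015-12-03 2020-04-28 阿里巴巴集团控股有限公司 Method and apparatus for collecting labeled data, and certificate recognition system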
US10643264B2 (en) * 2016-07-25 2020-05-05 Facebook, Inc. Method and computer readable medium for presentation of content items synchronized with media display
RU2729956C2 (ru) * 2016-09-08 2020-08-13 Гох Су Сиах Object detection from visual search queries
CN107909088B (zh) * 2017-09-27 2022-06-28 百度在线网络技术(北京)有限公司 Method, apparatus, device and computer storage medium for obtaining training samples
US11120070B2 (en) * 2018-05-21 2021-09-14 Microsoft Technology Licensing, Llc System and method for attribute-based visual search over a computer communication network
WO2020227845A1 (en) * 2019-05-10 2020-11-19 Shenzhen Malong Technologies Co., Ltd. Compressed network for product recognition
TW202232437A (zh) 2021-02-09 2022-08-16 阿物科技股份有限公司 Image classification and labeling method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020114522A1 (en) * 2000-12-21 2002-08-22 Rene Seeber System and method for compiling images from a database and comparing the compiled images with known images
US6560281B1 (en) * 1998-02-24 2003-05-06 Xerox Corporation Method and apparatus for generating a condensed version of a video sequence including desired affordances
US20070083815A1 (en) * 2005-10-11 2007-04-12 Alexandre Delorme Method and apparatus for processing a video stream
US7783135B2 (en) * 2005-05-09 2010-08-24 Like.Com System and method for providing objectified image renderings using recognition information from images

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
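JP2003023595A (ja) * 2001-07-06 2003-01-24 Canon Inc Image processing apparatus, method, program, and computer-readable storage medium
JP2003044717A (ja) * 2001-07-27 2003-02-14 Goji Toyokawa Popular-celebrity-image product and service introduction system, popular-celebrity-image product and service introduction method, and computer-readable recording medium storing a program for executing the method
JP4413633B2 (ja) * 2004-01-29 2010-02-10 株式会社ゼータ・ブリッジ Information retrieval system, information retrieval method, information retrieval apparatus, information retrieval program, image recognition apparatus, image recognition method and image recognition program, and sales system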
US20060218578A1 (en) * 2005-03-24 2006-09-28 Amy Kyle Integrated offline product branding method
TWI316690B (en) * 2006-09-05 2009-11-01 Univ Nat Cheng Kung Video annotation method by integrating visual features and frequent patterns
WO2008060580A2 (en) * 2006-11-15 2008-05-22 24Eight Llc Image-based searching apparatus and method
JP5063098B2 (ja) * 2006-12-12 2012-10-31 ヤフー株式会社 Information providing apparatus, information providing method, and computer program
US8374413B2 (en) 2007-12-20 2013-02-12 Wisconsin Alumni Research Foundation Method for prior image constrained image reconstruction
US20110022589A1 (en) * 2008-03-31 2011-01-27 Dolby Laboratories Licensing Corporation Associating information with media content using objects recognized therein
CN103608826B (zh) * 2011-04-12 2017-04-05 新加坡国立大学 In-video product annotation using web information mining

Also Published As

Publication number Publication date
JP6049693B2 (ja) 2016-12-21
CN103608826B (zh) 2017-04-05
SG194442A1 (en) 2013-12-30
US20140029801A1 (en) 2014-01-30
JP2014524058A (ja) 2014-09-18
US9355330B2 (en) 2016-05-31
GB201319882D0 (en) 2013-12-25
GB2506028A (en) 2014-03-19
GB2506028B (en) 2018-11-28
CN103608826A (zh) 2014-02-26

Similar Documents

Publication Publication Date Title
US9355330B2 (en) In-video product annotation with web information mining
US12373490B2 (en) Relevance-based image selection
US10922350B2 (en) Associating still images and videos
US8818916B2 (en) System and method for linking multimedia data elements to web pages
US8533134B1 (en) Graph-based fusion for video classification
US20200012674A1 (en) System and methods thereof for generation of taxonomies based on an analysis of multimedia content elements
US8396286B1 (en) Learning concepts for video annotation
CN107562742B (zh) Image data processing method and apparatus
US20160358025A1 (en) Enriching online videos by content detection, searching, and information aggregation
US20120148149A1 (en) Video key frame extraction using sparse representation
CN101281540A (zh) Apparatus, method and computer program for processing information
US11537636B2 (en) System and method for using multimedia content as search queries
US20130191368A1 (en) System and method for using multimedia content as search queries
Sudha et al. Reducing semantic gap in video retrieval with fusion: A survey
US9208157B1 (en) Spam detection for user-generated multimedia items based on concept clustering
Vadivukarassi et al. A framework of keyword based image retrieval using proposed Hog_Sift feature extraction method from Twitter Dataset
Mane et al. Video classification using SVM
Namala et al. Efficient feature based video retrieval and indexing using pattern change with invariance algorithm
Tsafaris Interactive video search based on online content classification
Zelnik-Manor et al. Personalized Content-Based Categorization of Web Videos
Pawar et al. Interpretation of user intention via textual and visual resemblance

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 12771862; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 14111149; Country of ref document: US)
ENP Entry into the national phase (Ref document number: 2014505107; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 1319882; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20120411)
WWE WIPO information: entry into national phase (Ref document number: 1319882.5; Country of ref document: GB)
122 EP: PCT application non-entry in European phase (Ref document number: 12771862; Country of ref document: EP; Kind code of ref document: A1)