EP1739593B1 - Method and arrangement for generic visual categorization - Google Patents

Method and arrangement for generic visual categorization

Info

Publication number
EP1739593B1
EP1739593B1 (application EP06115147A)
Authority
EP
European Patent Office
Prior art keywords
class
vocabulary
classes
feature vectors
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
EP06115147A
Other languages
English (en)
French (fr)
Other versions
EP1739593A1 (de)
Inventor
Florent Perronnin
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xerox Corp filed Critical Xerox Corp
Publication of EP1739593A1 publication Critical patent/EP1739593A1/de
Application granted granted Critical
Publication of EP1739593B1 publication Critical patent/EP1739593B1/de
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Definitions

  • the following relates generally to methods, apparatus, and articles of manufacture for categorizing images.
  • Generic visual categorization provides access to high-level class information about objects contained in images for managing, searching, and mining such collections. Categorization of image content through generic visual categorization involves generalizing over natural variations in appearance inherent in a category of elements (e.g., objects, animals, etc.), and over viewing and imaging conditions. Unlike categorization methods for individual categories or object types, such as faces or cars, generic visual categorization systems handle multiple object types simultaneously.
  • One existing approach for performing generic visual categorization is an example-based machine learning approach known as the "bag of keypoints" approach, which makes use of a "visual vocabulary” to provide a mid-level characterization of images for bridging the semantic gap between low-level features and high-level concepts.
  • the visual vocabulary is estimated in an unsupervised manner by clustering a set of training samples (i.e., low level features extracted from training images). To characterize an image, each of its feature vectors is assigned to its closest cluster and a single occupancy histogram is built. The image is classified by providing the single occupancy histogram to a set of Support Vector Machine (SVM) classifiers (i.e., one per class), trained in a one versus all manner.
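  • For orientation, the following is a minimal sketch of that baseline bag-of-keypoints pipeline, assuming the low-level descriptors have already been extracted as NumPy arrays and using scikit-learn's k-means and SVC; all function and variable names are illustrative and not taken from the patent.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_visual_vocabulary(descriptors, n_words=200, seed=0):
    """Cluster low-level descriptors (an N x D array) into a visual vocabulary."""
    return KMeans(n_clusters=n_words, random_state=seed).fit(descriptors)

def occupancy_histogram(kmeans, image_descriptors):
    """Assign each descriptor to its closest cluster and build a normalized histogram."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def train_bag_of_keypoints(descriptors_per_image, labels, n_words=200):
    """descriptors_per_image: list of (n_i x D) arrays; labels: one class id per image."""
    vocab = build_visual_vocabulary(np.vstack(descriptors_per_image), n_words)
    X = np.array([occupancy_histogram(vocab, d) for d in descriptors_per_image])
    clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, labels)
    return vocab, clf
```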
  • a produce recognition system based on classification employing a histogram method is described in EP-0 685 814 A2 .
  • the produce recognition system is particularly suitable for usage in shops, in connection with a weighing system, for determining the price of an amount of produce when the price depends on the weight and the type of produce.
  • Document US 2002/0168097 describes a system and method for recognizing markers on printed circuit boards.
  • the system can recognize a plurality of different kinds of markers (in particular: for indicating whether a printed board is defective).
  • a histogram based recognition process or a correlation based recognition process is selected in a first method step. Further, a training process for a classifier is described in the document.
  • the present invention aims to provide for assigning one of a plurality of classes to an input image and a method for training a classifier.
  • Figure 1 illustrates elements of a system for performing image categorization training in accordance with the embodiments disclosed herein;
  • Figure 2 illustrates a flow diagram of operations performed by the system elements shown in Figure 1 ;
  • Figure 3 is an illustrative flow diagram for generating class vocabularies in accordance with the embodiments described herein;
  • Figure 4 illustrates elements of a system for performing image categorization in accordance with the embodiments disclosed herein;
  • Figure 5 illustrates a flow diagram of operations performed by the system elements shown in Figure 4 ;
  • Figure 6 is an illustrative flow diagram for categorizing an input image in accordance with the embodiments described herein.
  • Figures 1 , 2 , and 3 concern the training of an image classifier, which figures are cross-referenced in this section.
  • Figure 1 illustrates elements of a system 100 for performing image classification training in accordance with the embodiments disclosed herein.
  • Figure 2 illustrates a flow diagram of operations performed by the system elements shown in Figure 1 .
  • Figure 3 is an illustrative flow diagram for generating class vocabularies in accordance with the embodiments described herein.
  • the system 100 includes a memory 102 for storing one or more class training sets.
  • Each class training set is made up of one or more images labeled with an identifier of the class.
  • the memory 102 includes a large number of training samples for each class, thereby permitting a distribution of its visual words to be represented in an "average image".
  • the classes may represent elements of an ontology that may be unorganized (e.g., flat) or organized (e.g., in a hierarchy), or a combination of both.
  • the ontology may be formulated using for example the DMOZ ontology (defined at dmoz.org).
  • a key-patch detector 104 identifies key-patches in images of the class training sets stored in memory 102.
  • the key-patch detector 104 should preferably detect repeatable, invariant regions in images. Namely, it is desirable that the key-patch detector 104 be adapted to detect similar feature regions as an object undergoes transformations between images (such as changes in viewpoint, imaging, or lighting).
  • the key-patch detector 104 is a Harris affine detector (as described by Mikolajczyk and Schmid in "An Affine Invariant Interest Point Detector", ECCV, 2002, and "A Performance Evaluation of Local Descriptors", IEEE Conference on Computer Vision and Pattern Recognition, June 2003).
  • the Harris affine detector detects in an image a circular region of points (as illustrated in Figure 1 by circles on image 105) using an iterative two-part process.
  • positions and scales of interest points are determined as local maxima (in position) of a scale-adapted Harris function, and as local extrema in scale of the Laplacian operator.
  • an elliptical (i.e., affine) region is determined, which elliptical region has a size given by the selected scale and a shape given by the eigenvalues of the image's second moment matrix.
  • the first and second parts are then iterated and the elliptical region is kept only if the process converges within a fixed number of iterations.
  • the elliptical region is then mapped to a circular region normalized according to scale, orientation, and illumination.
  • alternate detectors may be used to identify key-patches in images of the class training sets stored in memory 102.
  • Examples of such alternate detectors are set forth by Mikolajczyk, Tuytelaars, Schmid, Zisserman, Matas, Schaffalitzky, Kadir, Van Gool, in "A Comparison Of Affine Region Detectors", International Journal of Computer Vision, 2005 (available on the Internet at http://lear.inrialpes.fr/pubs/ ).
  • a feature description module 106 computes (unordered) feature vectors (as illustrated in Figure 1 at 107) for the key-patches identified at 202.
  • Scale Invariant Feature Transform (SIFT) descriptors (as described by Lowe, in "Object Recognition From Local Scale-Invariant Features", ICCV (International Conference on Computer Vision), 1999) are computed on the (circular) region(s) of an image detected by the key-patch detector 104 (e.g., normalized Harris affine regions).
  • SIFT descriptors are multi-image representations of an image neighborhood. More specifically, SIFT descriptors are Gaussian derivatives computed at eight orientation planes over a four-by-four grid of spatial locations, giving a 128-dimensional vector.
  • alternate descriptors may be computed on the regions of an image detected by the key-patch detector 104. Examples of such alternate descriptors are set forth by K. Mikolajczyk and C. Schmid, in "A Performance Evaluation Of Local Descriptors", Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Madison, Wisconsin, USA, June 2003 .
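  • As a concrete illustration of the detection and description steps (202 and 204), the sketch below extracts keypoints and 128-dimensional SIFT descriptors with OpenCV (version 4.4 or later). The patent's preferred detector is the Harris affine detector, for which OpenCV has no direct equivalent, so the SIFT detector stands in here purely as an assumption of this sketch.

```python
import cv2
import numpy as np

def extract_key_patch_descriptors(image_path):
    """Detect key-patches and compute a 128-D SIFT descriptor for each one."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()  # combined detector and descriptor
    keypoints, descriptors = sift.detectAndCompute(gray, None)
    if descriptors is None:  # no keypoints found
        descriptors = np.zeros((0, 128), dtype=np.float32)
    return keypoints, descriptors
```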
  • a multi-histogram computation module 108 generates histograms from (unordered) feature vectors produced by module 106 by performing 206, 208, 210, and 212, which are discussed in more detail below.
  • the histograms produced by the multi-histogram computation module 108 represent fixed length feature vectors for use with a machine learning classification method employed by classifier training module 110.
  • the machine learning classification method is a Support Vector Machine (SVM) classifier described in more detail below.
  • the classifier trained at 110 may be any discriminative classifier (i.e., a classifier that models class boundaries), such as Fisher kernels (FK), or neural networks.
  • a general vocabulary is estimated by clustering the feature vectors computed at 204. Assuming that the feature vectors may be modeled according to a probability density function (pdf) p, clustering may be performed by maximizing the likelihood function $p(X \mid \lambda_g)$ with respect to the parameters $\lambda_g = \{w_{i,g}, \mu_{i,g}, C_{i,g}\}$ of the general vocabulary defined below, where $X$ is the set of feature vectors $\{x_1, \ldots, x_N\}$ of the training images 102. Further, assuming that the feature vectors are independent, the likelihood function may be defined as $p(X \mid \lambda_g) = \prod_{t=1}^{N} p(x_t \mid \lambda_g)$.
  • the pdf is a Gaussian Mixture Model (GMM) given by $p(x_t \mid \lambda_g) = \sum_{i=1}^{M} w_{i,g}\, p_{i,g}(x_t)$, where M is the number of visual words and each component density $p_{i,g}$ is a Gaussian with mean $\mu_{i,g}$ and covariance matrix $C_{i,g}$.
  • a vocabulary with one Gaussian may be initialized using a set of closed formulas that estimate the parameters of the Gaussian.
  • the Gaussian may be split into two Gaussians by introducing a small perturbation in its mean (for background see, Ananth Sankar, "Experiments With A Gaussian Merging-Splitting Algorithm For HMM Training For Speech Recognition", Proceedings of the 1997 DARPA Broadcast News Transcription and Understanding Workshop, pp. 99-104, 1998 ).
  • EM is iteratively performed until convergence. This process of Gaussian splitting and EM training may then be repeated until a desired number of Gaussians is obtained.
  • each component density p i,g of the GMM corresponds to a visual word as described herein;
  • the mixture weights w i,g are the relative frequencies of the visual words in the visual vocabulary;
  • the mean parameters ⁇ i,g are the "averages" of the visual words;
  • the covariance matrices C i,g reflect the variations of the visual words around their averages.
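  • A minimal sketch of estimating such a general vocabulary is given below, using scikit-learn's GaussianMixture in place of the incremental Gaussian-splitting EM initialization described above (the splitting schedule is deliberately omitted); the diagonal covariance structure is an assumption of this sketch.

```python
from sklearn.mixture import GaussianMixture

def train_general_vocabulary(all_descriptors, n_words=64, seed=0):
    """Fit a GMM over descriptors pooled from all training images.

    Each fitted component is one visual word: gmm.weights_ play the role of w_{i,g},
    gmm.means_ of mu_{i,g}, and gmm.covariances_ of (diagonal) C_{i,g}.
    """
    gmm = GaussianMixture(n_components=n_words, covariance_type="diag",
                          reg_covar=1e-4, random_state=seed)
    return gmm.fit(all_descriptors)
```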
  • an adapted vocabulary is computed for each class of the labeled training set stored at 102.
  • the adapted vocabularies may be estimated using the Maximum A Posteriori (MAP) criterion (for more background on the MAP criterion see the following publication: Gauvain and Lee, "Maximum a Posteriori Estimation for Multivariate Gaussian Mixture Observations of Markov Chains", IEEE Trans. on Speech and Audio Processing, Vol. 2, No. 2, April 1994 ).
  • the adapted vocabulary for a class is computed using the general vocabulary computed at 206 and the feature vectors of the class.
  • the criterion to be maximized in MAP is $p(\lambda_a \mid X) \propto p(X \mid \lambda_a)\, p(\lambda_a)$, where $\lambda_a$ denotes the parameters of the adapted vocabulary.
  • the difference between MAP and ML estimation lies in the assumption of an appropriate prior distribution $p(\lambda_a)$ of the parameters to be estimated.
  • the seed visual vocabulary used to initialize EM is the general visual vocabulary.
  • the adapted vocabularies may be reduced by saving only those Gaussians (i.e., visual words) that have significantly changed compared to the general vocabulary. The significance of the change may be measured using various metrics such as the divergence, the Bhattacharyya distance, or the overlap.
  • the adapted vocabularies may be computed using Maximum Likelihood Linear Regression (MLLR), as for example disclosed by Leggetter and Woodland, "Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models", Computer Speech and Language, issue 9, pp. 171-185, 1995 .
  • in MLLR, an affine transformation is applied to all components of the adapted vocabularies. Storage required for the adapted vocabularies may be reduced using this alternate embodiment if the number of parameters of the transformation is smaller than the number of Gaussian parameters corresponding to the adapted vocabularies.
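  • A sketch of MAP adaptation in the spirit of Gauvain and Lee is shown below; it adapts only the Gaussian means with a relevance factor tau, which is a simplification chosen for this sketch (adapting weights and covariances, and the MLLR alternative, are omitted).

```python
import numpy as np

def map_adapt_means(gmm, class_descriptors, tau=10.0):
    """MAP-adapt the means of a fitted scikit-learn GaussianMixture to one class.

    Returns adapted means with the same shape as gmm.means_; the weights and
    covariances of the general vocabulary are left unchanged in this sketch.
    """
    resp = gmm.predict_proba(class_descriptors)        # (N, M) occupancy probabilities
    n_i = resp.sum(axis=0)                             # soft count per visual word
    ex = resp.T @ class_descriptors / np.maximum(n_i, 1e-10)[:, None]  # E_i[x]
    alpha = (n_i / (n_i + tau))[:, None]               # data/prior interpolation weight
    return alpha * ex + (1.0 - alpha) * gmm.means_
```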
  • a class vocabulary of visual words for each class is computed by merging the general vocabulary computed at 206 and the adaptive vocabulary computed for its respective class at 208.
  • the merging of the general vocabulary and an adaptive vocabulary involves adjusting the weight parameters of the Gaussians to reflect the vocabulary size having doubled, for example by dividing each mixture weight by two so that the weights of the merged vocabulary still sum to one; the other parameters of the Gaussians (e.g., mean and covariance parameters) remain unchanged.
  • the set of parameters of the class vocabulary of visual words for class c is denoted ⁇ c .
  • a histogram is computed for each training image (or alternatively input image 402, as discussed below) for each class by each class-specific histogram calculator 109.
  • Figure 3 illustrates the generation of class vocabularies in greater detail.
  • feature vectors for images of class "cat” at 302A and feature vectors for images of class "dog” at 302B are computed at 204.
  • a general vocabulary 304 is generated using both cat feature vectors 302A and dog feature vectors 302B.
  • an adapted vocabulary is generated for each class at 306A and 306B using a general vocabulary 304 and a set of class feature vectors, 302A and 302B, respectively.
  • each resulting adapted vocabulary 306A and 306B is merged with the general vocabulary 304 to define class vocabularies 308A and 308B, respectively.
  • the general vocabulary 304 includes Gaussians G 1 g , G 2 g , and G 3 g that represent feature densities of all objects detected in the image classes of feature vectors 302A and 302B, such as eyes, ears, and tail.
  • the adapted vocabularies 306A and 306B include Gaussians G 1 a , G 2 a , and G 3 a that represent feature densities of only those objects detected in the image classes of feature vectors 302A and 302B, such as, eyes, ears, and tail, for the classes "cat" and "dog", respectively.
  • the class vocabularies 308A and 308B, which represent the visual words of each class, merge the Gaussians of the general vocabulary 304 with the adapted Gaussians of the adapted vocabularies 306A and 306B, respectively.
  • the class vocabularies illustrate variations between general visual words and adapted visual words in the parameters of the Gaussians.
  • Figure 3 illustrates a shift from left-to-right for the class "cat" of the adapted Gaussians G 1 a , G 2 a , and G 3 a relative to the general Gaussians G 1 g , G 2 g , and G 3 g , whereas for the class "dog" there exists a shift from right-to-left.
  • each class vocabulary is trained on all available animal images (e.g., cats and dogs, which are likely to include visual words of objects in the image such as eyes, ears, and tail), while at the same time each is trained on each class-specific vocabulary (e.g., cat-eyes, cat-ears, cat-tail) by adapting the general vocabulary to class-specific vocabularies.
  • defining class vocabularies as concatenations of the general vocabulary and the corresponding adapted vocabulary permits the occupancy histogram (computed at 212) to remain a constant size for each class.
  • the size of the occupancy histogram may be decreased by a factor of two by merging histograms corresponding to the general and the adapted vocabularies, for example, by computing a visual word-by-word difference ratio.
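  • The sketch below computes such an occupancy histogram over a class vocabulary formed by concatenating the adapted and general vocabularies, halving each mixture weight as discussed above; the diagonal-covariance Gaussians and the parameter-dictionary layout are assumptions of this sketch.

```python
import numpy as np

def _diag_gauss_pdf(x, mean, var):
    """Gaussian density with diagonal covariance, evaluated row-wise on x (N x D)."""
    diff = x - mean
    log_p = -0.5 * (np.sum(diff * diff / var, axis=1) + np.sum(np.log(2.0 * np.pi * var)))
    return np.exp(log_p)

def class_histogram(descriptors, general, adapted):
    """Occupancy histogram over the class vocabulary [adapted words | general words].

    `general` and `adapted` are dicts with keys 'weights', 'means', 'vars', each
    describing M diagonal-covariance Gaussians (names are illustrative only).
    """
    def weighted_likelihoods(vocab):
        return np.stack([0.5 * w * _diag_gauss_pdf(descriptors, m, v)
                         for w, m, v in zip(vocab["weights"], vocab["means"], vocab["vars"])],
                        axis=1)

    lik = np.hstack([weighted_likelihoods(adapted), weighted_likelihoods(general)])  # (N, 2M)
    occ = lik / np.maximum(lik.sum(axis=1, keepdims=True), 1e-12)  # per-descriptor occupancies
    hist = occ.sum(axis=0)
    return hist / max(hist.sum(), 1e-12)  # fixed-length feature vector for the classifier
```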
  • a set of SVM classifiers are trained (by the classifier training module 110) using the computed histograms that serve as fixed length feature vectors for each SVM classifier 111.
  • one SVM is trained for each class or visual category of labeled training images.
  • labeled data (i.e., the histograms computed for each class at 212) is provided to each classifier for adapting a statistical decision procedure for distinguishing between the visual categories.
  • Each SVM classifier 111 in the set of SVM classifiers finds a hyperplane that separates two-class data with maximal margin (for background see Christopher Burges, "A tutorial on Support Vector Machines for Pattern Recognition", in Data Mining and Knowledge Discovery, vol. 2, pp. 121-167, 1998 ).
  • the margin is defined as the distance of the closest training point to the separating hyperplane.
  • $f(h) = \operatorname{sign}(w^{T} h + b)$, where $w$ and $b$ represent the parameters of the hyperplane.
  • the SVM may introduce an error weighting constant C which penalizes misclassification of samples in proportion to their distance from the classification boundary.
  • the SVM may perform a mapping ⁇ from the original data space of X to another feature space. This second feature space may have a high or even infinite dimension.
  • the parameters ⁇ i are typically zero for most i.
  • the sum in equation 7 may be taken only over a select few of the training feature vectors h i .
  • These feature vectors are known as support vectors. It can be shown that the support vectors are those feature vectors lying nearest to the separating hyperplane.
  • the input features h i of equation 7 are the training histograms for each class (at 109) computed at 212 in Figure 2 .
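  • For reference, the form that equation 7 typically takes for a kernel SVM (reconstructed here from the surrounding description, since the equation itself is not reproduced on this page) is
$$f(h) = \operatorname{sign}\Big(\sum_{i} \alpha_i\, y_i\, K(h, h_i) + b\Big), \qquad K(h, h_i) = \Phi(h)^{\top}\Phi(h_i),$$
where $y_i \in \{-1, +1\}$ are the training labels, $K$ is the kernel induced by the mapping $\Phi$, and $\alpha_i \neq 0$ only for the support vectors.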
  • a plurality of SVM classifiers 111 may be trained by module 110 using the class-specific histograms computed by the calculators 109 of module 108, such that each SVM (e.g., SVM 111A, 111B, and 111C) evaluates one class against all other classes. That is, given an m-class problem, m SVMs are trained such that each distinguishes images of some category i from images of all the other m-1 categories j not equal to i.
  • the classifier 111A for classifying whether an image is a cat or not a cat is trained using cat-class-specific histogram 109A, which histogram estimates the occupancy probabilities of feature vectors of images of the cat-class training set.
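  • A minimal one-versus-all training sketch follows, using scikit-learn's LinearSVC; the dictionary layout of histograms and labels is an assumption made for illustration, not prescribed by the patent.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_one_vs_all_svms(histograms_per_class, image_labels, C=1.0):
    """Train one linear SVM per class, each on histograms from that class's vocabulary.

    histograms_per_class: dict class -> (n_images x H) array, histograms of all
                          training images computed against that class's vocabulary.
    image_labels:         length-n_images sequence of class labels, one per image.
    """
    image_labels = np.asarray(image_labels)
    svms = {}
    for cls, X in histograms_per_class.items():
        y = (image_labels == cls).astype(int)  # class cls versus the other m-1 classes
        svms[cls] = LinearSVC(C=C).fit(X, y)
    return svms
```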
  • Figures 4 , 5 , and 6 concern the use of the image classifier trained in Figures 1-3 , which Figures 4-6 are cross-referenced in this section.
  • Figure 4 illustrates elements of a system 400 for performing image classification in accordance with the embodiments disclosed herein. Common elements in Figures 1 and 4 are labeled with similar reference numbers.
  • Figure 5 illustrates a flow diagram of operations performed by the system elements shown in Figure 4 .
  • Figure 6 is an illustrative flow diagram for categorizing an input image in accordance with the embodiments described herein.
  • an input image 402 is received for categorization.
  • key-patches are identified in the input image using key-patch detector 104 that operates on the input image 402 (as at 202 described above for the labeled class training sets 102).
  • feature vectors are computed by the feature detection module 106 for identified key-patches (as at 204 described above).
  • a histogram is computed by the multi-histogram computation module 108 for each class vocabulary (computed at 206, 208, and 210) using the computed feature vectors (as at 212 described above).
  • Each class histogram estimates the occupancy probability of the feature vectors of the input image for each class vocabulary.
  • each SVM 411 of the classifier 410 computes a categorization score from the histogram of its class vocabulary, computed by the corresponding class-specific histogram calculator 109.
  • the categorization score reflects the probability that the detected key-patches belong to one of the labeled class training sets 102.
  • a decision module 412 assigns the input image 402 to have a label (or set of labels) associated with the class (or set of classes) corresponding to the class (or classes) of the SVM(s) 411 producing the highest score(s).
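  • Tying the pieces together, a sketch of this categorization path might look as follows; it reuses the hypothetical extract_key_patch_descriptors and class_histogram helpers sketched earlier and is not a literal rendering of modules 104 through 412.

```python
def categorize_image(image_path, vocabularies, svms):
    """Assign the label of the highest-scoring class-specific SVM to an input image.

    vocabularies: dict class -> (general, adapted) parameter dicts, as used by
                  class_histogram; svms: dict class -> trained one-vs-all SVM.
    """
    _, descriptors = extract_key_patch_descriptors(image_path)
    scores = {}
    for cls, (general, adapted) in vocabularies.items():
        hist = class_histogram(descriptors, general, adapted).reshape(1, -1)
        scores[cls] = float(svms[cls].decision_function(hist)[0])
    best_class = max(scores, key=scores.get)
    return best_class, scores
```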
  • Figure 6 is an illustrative example of categorization performed in accordance with the flow diagram set forth in Figure 5.
  • key-patches are identified in an input image 602 (i.e., more generally at 504).
  • feature vectors 606 are computed for each identified key-patch (i.e., more generally at 506).
  • the set of feature vectors 606 are then used to compute histograms 608 that estimate occupancy probabilities with respect to the adapted vocabulary 614 and the general vocabulary 616, corresponding to each class 618 (i.e., more generally at 508).
  • a score 610 is computed for each histogram 608 (i.e., more generally at 510).
  • a label 612 is assigned to the image to provide the categorization decision based on the best score 610 for the class 618 (i.e., more generally at 512).
  • Figure 6 illustrates how each class-histogram may be split into two sub-histograms of equal size, where the first half of each histogram records the feature densities for the adapted vocabulary 614 and the second half of each histogram records the feature densities of the general vocabulary 616.
  • Figure 6 illustrates that the histogram for the class "cat" has an adapted vocabulary 614 in which the feature densities of feature vectors of key-patches identified in the input image 402 for the adapted vocabulary 614 (i.e., G 1 a , G 2 a , and G 3 a ) are much greater than those of the corresponding general vocabulary 616 ( G 1 g , G 2 g , and G 3 g ), indicating that the key-patches are much closer to cat features (e.g., eyes, ear, tail) than features describing all image classes.
  • the histograms 608 computed for the "bird” and “dog” classes illustrate that the feature densities of feature vectors of key-patches identified in the input image 402 for the adapted vocabulary 614 are much less than that of the corresponding general vocabulary 616, indicating that the key-patches are much closer to features describing all image classes than the specific "bird” or "dog” classes.
  • scoring is performed at 508 and 510 using a two-step procedure, which begins by first determining which visual word of the general vocabulary most closely corresponds to a given feature vector of the input image.
  • the feature vector may, for example, represent features such as an animal's ears, eyes, and tail.
  • a determination is then made as to which adapted vocabulary (or class) that visual word most closely corresponds. For example, if the feature vector is determined to correspond to the "eye" visual word of the general vocabulary, it is subsequently determined whether the "eye" feature vector is more likely to be that of a "cat" or a "dog".
  • in the first step, the two-step scoring procedure may compute, for each feature vector of a given input image, the occupancy probability for only the top K scoring Gaussians in the general vocabulary (while assuming the occupancy probability of any other lower-scoring Gaussian is zero).
  • in the second step, the occupancy probability may be computed for the corresponding top K scoring Gaussians in each adapted vocabulary.
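  • A sketch of this two-step, top-K scoring for a single feature vector is given below; it reuses the hypothetical _diag_gauss_pdf helper and parameter-dictionary layout from the histogram sketch above, and K is a tuning choice assumed here.

```python
import numpy as np

def top_k_occupancy(descriptor, general, adapted, k=5):
    """Rank the general-vocabulary words first, then evaluate only the matching
    top-K words of the adapted vocabulary; all other occupancies are taken as zero."""
    x = descriptor[None, :]
    gen = np.array([w * _diag_gauss_pdf(x, m, v)[0]
                    for w, m, v in zip(general["weights"], general["means"], general["vars"])])
    top = np.argsort(gen)[-k:]  # indices of the top-K scoring general Gaussians
    ada = np.array([adapted["weights"][i] * _diag_gauss_pdf(x, adapted["means"][i],
                                                            adapted["vars"][i])[0]
                    for i in top])
    total = gen[top].sum() + ada.sum()
    occ_general = np.zeros_like(gen)
    occ_adapted = np.zeros_like(gen)
    occ_general[top] = gen[top] / max(total, 1e-12)
    occ_adapted[top] = ada / max(total, 1e-12)
    return occ_adapted, occ_general
```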
  • the method, apparatus and article of manufacture therefor, for generic visual categorization complements a general (visual) vocabulary with adapted (visual) vocabularies that are class (or category) specific. Images are characterized within each class (or category) through a histogram indicating whether the image is better described by the general vocabulary or the class-specific adapted vocabulary.
  • defining a generic vocabulary and adapting it to different class specific categories through histograms based on a mixture of both vocabularies permits the appropriateness of both vocabularies to be captured.
  • a general purpose computer may be used as an apparatus for training and using the elements of the generic visual categorizer shown in Figures 1 and 4 and described herein.
  • the systems 100 and 400 may operate as separate systems or together as a single system.
  • Such a general purpose computer would include hardware and software.
  • the hardware would comprise, for example, memory (ROM, RAM, etc.) (e.g., for storing processing instructions of the categorization system detailed in Figures 2 and 5 ), a processor (i.e., CPU) (e.g., coupled to the memory for executing the processing instructions), persistent storage (e.g., CD-ROM, hard drive, floppy drive, tape drive, etc.), user I/O, and network I/O.
  • the user I/O may include a camera, a microphone, speakers, a keyboard, a pointing device (e.g., pointing stick, mouse, etc.), and the display.
  • the network I/O may for example be coupled to a network such as the Internet.
  • the software of the general purpose computer would include an operating system and application software providing the functions of the generic visual categorization system.
  • Any resulting program(s), having computer-readable program code, may be embodied within one or more computer-usable media such as memory devices or transmitting devices, thereby making a computer program product or article of manufacture according to the embodiment described herein.
  • the terms "article of manufacture” and “computer program product” as used herein are intended to encompass a computer program existent (permanently, temporarily, or transitorily) on any computer-usable medium such as on any memory device or in any transmitting device.
  • Executing program code directly from one medium, storing program code onto a medium, copying the code from one medium to another medium, transmitting the code using a transmitting device, or other equivalent acts may involve the use of a memory or transmitting device which only embodies program code transitorily as a preliminary or final step in making, using, or selling the embodiments as set forth in the claims.
  • Memory devices include, but are not limited to, fixed (hard) disk drives, floppy disks (or diskettes), optical disks, magnetic tape, semiconductor memories such as RAM, ROM, Proms, etc.
  • Transmitting devices include, but are not limited to, the Internet, intranets, electronic bulletin board and message/note exchanges, telephone/modem based network communication, hard-wired/cabled communication network, cellular communication, radio wave communication, satellite communication, and other stationary or mobile network systems/communication links.
  • a machine embodying the embodiments may involve one or more processing systems including, but not limited to, CPU, memory/storage devices, communication links, communication/transmitting devices, servers, I/O devices, or any subcomponents or individual parts of one or more processing systems, including software, firmware, hardware, or any combination or subcombination thereof, which embody the disclosure as set forth in the claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Claims (8)

  1. A method for assigning one of a plurality of classes (618) to an input image (402, 602), comprising the following steps:
    identifying (504) a plurality of key-patches (604) in the input image (402, 602);
    computing (506) a feature vector (606) for each of the plurality of key-patches (604);
    computing (508) a histogram (608); and
    assigning (512) at least one of the plurality of classes (618) to the input image (402, 602);
    characterized by
    defining (206) a general visual vocabulary (304, 616), the general visual vocabulary (304, 616) including a set of visual words (G1 g, G2 g, G3 g), each of the visual words (G1 g, G2 g, G3 g) corresponding to a component probability density function of a statistical mixture model for modeling feature vectors (606) of images;
    defining (208) an adapted visual vocabulary (306A, 306B, 614) for each of the plurality of classes (618), based on the general visual vocabulary (304, 616) and on feature vectors (107) of example images (105) of the respective class;
    defining (210) a class vocabulary (308A, 308B) for each of the plurality of classes (618) by merging the general visual vocabulary (304, 616) and the adapted visual vocabulary (306A, 306B, 614) for the respective class, such that each class vocabulary (308A, 308B) is a concatenation of the general visual vocabulary (304, 616) and the respective adapted visual vocabulary (306A, 306B, 614), and in that
    the step of computing (508) a histogram (608) is carried out for each of the plurality of classes (618) by estimating occupancy probabilities of the feature vectors (606) of the input image (402, 602) for the respective class vocabulary (308A, 308B); and
    the step of assigning (512) at least one of the plurality of classes (618) is carried out using the plurality of computed histograms (608) as input to a classifier (410).
  2. The method of claim 1, wherein the histogram (608) computed for each of the plurality of classes (618) indicates whether the input image (402, 602) is better described by the general visual vocabulary (304, 616) or by the adapted visual vocabulary (306A, 306B, 614) of its corresponding class (618).
  3. The method of claim 2, further comprising:
    computing (202) key-patches in images (105) of class training sets (102), each of the class training sets (102) containing one or more example images (105) of the respective class;
    computing (204) feature vectors (107) for key-patches of the images (105) of the class training sets (102);
    wherein the step of defining (206) a general visual vocabulary (304, 616) includes the step of computing the general visual vocabulary (304, 616) by clustering feature vectors (107) of the images (105) of the class training sets (102); and
    wherein the step of defining (208) an adapted visual vocabulary (306A, 306B, 614) for each class (618) includes the step of computing the adapted visual vocabulary (306A, 306B, 614) for the respective class by estimating occupancy probabilities of the feature vectors (107) of the images (105) of its class training set (102).
  4. The method of claim 3, further comprising training (214) the classifier (410) with histograms (109A, 109B, 109C) that have been computed for each class (618) by estimating occupancy probabilities of the feature vectors (107) of images (105) of the class training set (102) for each class vocabulary (308A, 308B).
  5. An apparatus for assigning one of a plurality of classes (618) to an input image (402, 602), the apparatus comprising:
    a key-patch detector (104) for identifying a plurality of key-patches (604) in the input image (402, 602);
    a feature description module (106) for computing a feature vector (606) for each of the plurality of key-patches (604); and
    a classifier (410) for assigning at least one of the plurality of classes (618) to the input image (402, 602);
    characterized by
    a general visual vocabulary generation module for defining a general visual vocabulary (304, 616), the general visual vocabulary (304, 616) including a set of visual words (G1 g, G2 g, G3 g), each of the visual words (G1 g, G2 g, G3 g) corresponding to a component probability density function of a statistical mixture model for modeling feature vectors (606) of images;
    an adapted visual vocabulary generation module for defining an adapted visual vocabulary (306A, 306B, 614) for each of the plurality of classes (618), based on the general visual vocabulary (304, 616) and on feature vectors (107) of example images (105) of the respective class;
    a vocabulary merging module for defining a class vocabulary (308A, 308B) for each of the plurality of classes (618) by merging the general visual vocabulary (304, 616) and the adapted visual vocabulary (306A, 306B, 614) for the respective class, such that each class vocabulary (308A, 308B) is a concatenation of the general visual vocabulary (304, 616) and the respective adapted visual vocabulary (306A, 306B, 614);
    a multi-histogram computation module (108) for computing a histogram (608) for each of the plurality of classes (618) by estimating occupancy probabilities of the feature vectors (606) of the input image (402, 602) for the respective class vocabulary (308A, 308B), and in that
    the classifier (410) is arranged to use the plurality of computed histograms (608) as input to the classifier (410) in order to assign at least one of the plurality of classes (618).
  6. A method for training a classifier (410), comprising the following steps:
    identifying (202) key-patches in images (105) of a plurality of class training sets (102), each of the class training sets (102) including one or more example images (105) of a respective class;
    computing (204) feature vectors (107) for the identified key-patches;
    computing (206) a general visual vocabulary (304, 616) by clustering the computed feature vectors (107), the general visual vocabulary (304, 616) including a set of visual words (G1 g, G2 g, G3 g), each of the visual words (G1 g, G2 g, G3 g) corresponding to a component probability density function of a statistical mixture model for modeling feature vectors (107, 606) of images (105, 402);
    computing (208) an adapted visual vocabulary (306A, 306B, 614) for each of the plurality of classes (618) using the general visual vocabulary (304, 616) and the feature vectors (107) of the images (105) of the respective class training set (102);
    computing (212) a histogram (109A, 109B, 109C) for each of the plurality of classes (618) by estimating occupancy probabilities of feature vectors (107) of images (105) of the respective class training set (102); and
    training (214) the classifier (410) using the histograms (109A, 109B, 109C) for each of the plurality of classes (618).
  7. The method of claim 6, further comprising categorizing an input image (402, 602) with the image classifier (410),
    wherein the categorizing further comprises:
    identifying (504) a plurality of key-patches (604) in the input image (402, 602);
    computing (506) a feature vector (606) for each of the plurality of key-patches (604);
    computing (508) a histogram (608) for each of the plurality of classes (618) using the plurality of computed feature vectors (606);
    assigning (512) at least one of the plurality of classes (618) to the input image (402, 602) using the plurality of computed histograms (608) as input to the classifier (410).
  8. The method of claim 7, wherein each histogram (608) is computed by estimating occupancy probabilities of the feature vectors (606) of the input image (402, 602).
EP06115147A 2005-06-30 2006-06-08 Method and arrangement for generic visual categorization Active EP1739593B1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/170,496 US7756341B2 (en) 2005-06-30 2005-06-30 Generic visual categorization method and system

Publications (2)

Publication Number Publication Date
EP1739593A1 EP1739593A1 (de) 2007-01-03
EP1739593B1 true EP1739593B1 (de) 2008-08-27

Family

ID=36617282

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06115147A Active EP1739593B1 (de) 2005-06-30 2006-06-08 Method and arrangement for generic visual categorization

Country Status (3)

Country Link
US (1) US7756341B2 (de)
EP (1) EP1739593B1 (de)
DE (1) DE602006002434D1 (de)

Families Citing this family (120)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4721829B2 (ja) * 2005-08-31 2011-07-13 トヨタ自動車株式会社 画像検索方法及び装置
US7899251B2 (en) * 2006-06-05 2011-03-01 Microsoft Corporation Balancing out-of-dictionary and in-dictionary recognition scores
US8467570B2 (en) * 2006-06-14 2013-06-18 Honeywell International Inc. Tracking system with fused motion and object detection
US7724962B2 (en) * 2006-07-07 2010-05-25 Siemens Corporation Context adaptive approach in vehicle detection under various visibility conditions
US7885466B2 (en) * 2006-09-19 2011-02-08 Xerox Corporation Bags of visual context-dependent words for generic visual categorization
US7933454B2 (en) * 2007-06-25 2011-04-26 Xerox Corporation Class-based image enhancement system
US7885794B2 (en) * 2007-11-30 2011-02-08 Xerox Corporation Object comparison, retrieval, and categorization methods and apparatuses
US8321368B2 (en) * 2008-01-23 2012-11-27 Niigata University Identification device, identification method, and identification processing program
US8009921B2 (en) 2008-02-19 2011-08-30 Xerox Corporation Context dependent intelligent thumbnail images
US8340452B2 (en) 2008-03-17 2012-12-25 Xerox Corporation Automatic generation of a photo guide
US9002100B2 (en) * 2008-04-02 2015-04-07 Xerox Corporation Model uncertainty visualization for active learning
US9549713B2 (en) 2008-04-24 2017-01-24 Boston Scientific Scimed, Inc. Methods, systems, and devices for tissue characterization and quantification using intravascular ultrasound signals
WO2009132188A1 (en) * 2008-04-24 2009-10-29 Boston Scientific Scimed, Inc. Methods, systems, and devices for tissue characterization by spectral similarity of intravascular ultrasound signals
US8094947B2 (en) * 2008-05-20 2012-01-10 Xerox Corporation Image visualization through content-based insets
US8285059B2 (en) * 2008-05-20 2012-10-09 Xerox Corporation Method for automatic enhancement of images containing snow
US9066054B2 (en) * 2008-05-27 2015-06-23 Xerox Corporation Image indexed rendering of images for tuning images from single or multiple print engines
US8745478B2 (en) * 2008-07-07 2014-06-03 Xerox Corporation System and method for generating inspiration boards
US8224092B2 (en) * 2008-07-08 2012-07-17 Xerox Corporation Word detection method and system
US8194992B2 (en) * 2008-07-18 2012-06-05 Xerox Corporation System and method for automatic enhancement of seascape images
US8463053B1 (en) * 2008-08-08 2013-06-11 The Research Foundation Of State University Of New York Enhanced max margin learning on multimodal data mining in a multimedia database
US8111923B2 (en) 2008-08-14 2012-02-07 Xerox Corporation System and method for object class localization and semantic class based image segmentation
US8335381B2 (en) * 2008-09-18 2012-12-18 Xerox Corporation Handwritten word spotter using synthesized typed queries
US8463051B2 (en) * 2008-10-16 2013-06-11 Xerox Corporation Modeling images as mixtures of image models
EP2172874B1 (de) 2008-10-06 2012-03-14 Xerox Corporation Modellierung von Bildern als Mischungen aus Bildmodellen
US8537409B2 (en) 2008-10-13 2013-09-17 Xerox Corporation Image summarization by a learning approach
US8254679B2 (en) * 2008-10-13 2012-08-28 Xerox Corporation Content-based image harmonization
US8249343B2 (en) * 2008-10-15 2012-08-21 Xerox Corporation Representing documents with runlength histograms
US9202137B2 (en) * 2008-11-13 2015-12-01 Google Inc. Foreground object detection from multiple images
US8774498B2 (en) * 2009-01-28 2014-07-08 Xerox Corporation Modeling images as sets of weighted features
US8237743B2 (en) * 2009-02-17 2012-08-07 Xerox Corporation Modification of images from a user's album for spot-the-differences
US8175376B2 (en) * 2009-03-09 2012-05-08 Xerox Corporation Framework for image thumbnailing based on visual similarity
US8271871B2 (en) * 2009-04-30 2012-09-18 Xerox Corporation Automated method for alignment of document objects
US8260062B2 (en) * 2009-05-07 2012-09-04 Fuji Xerox Co., Ltd. System and method for identifying document genres
US9405456B2 (en) * 2009-06-08 2016-08-02 Xerox Corporation Manipulation of displayed objects by virtual magnetism
US8380647B2 (en) 2009-08-14 2013-02-19 Xerox Corporation Training a classifier by dimension-wise embedding of training data
US8566349B2 (en) 2009-09-28 2013-10-22 Xerox Corporation Handwritten document categorizer and method of training
US8775424B2 (en) 2010-01-26 2014-07-08 Xerox Corporation System for creative image navigation and exploration
US9233399B2 (en) 2010-02-09 2016-01-12 Xerox Corporation Document separation by document sequence reconstruction based on information capture
JP5174068B2 (ja) * 2010-03-11 2013-04-03 株式会社東芝 信号分類装置
US9652462B2 (en) 2010-04-29 2017-05-16 Google Inc. Identifying responsive resources across still images and videos
CN102893294A (zh) 2010-04-30 2013-01-23 沃康普公司 概率密度函数估计器
US8332429B2 (en) 2010-06-22 2012-12-11 Xerox Corporation Photography assistant and method for assisting a user in photographing landmarks and scenes
US9256799B2 (en) 2010-07-07 2016-02-09 Vucomp, Inc. Marking system for computer-aided detection of breast abnormalities
US8509537B2 (en) 2010-08-05 2013-08-13 Xerox Corporation Learning weights of fonts for typed samples in handwritten keyword spotting
US8532399B2 (en) 2010-08-20 2013-09-10 Xerox Corporation Large scale image classification
US8566746B2 (en) 2010-08-30 2013-10-22 Xerox Corporation Parameterization of a categorizer for adjusting image categorization and retrieval
US8553045B2 (en) 2010-09-24 2013-10-08 Xerox Corporation System and method for image color transfer based on target concepts
US8731317B2 (en) 2010-09-27 2014-05-20 Xerox Corporation Image classification employing image vectors compressed using vector quantization
US8369616B2 (en) 2010-10-20 2013-02-05 Xerox Corporation Chromatic matching game
US8370338B2 (en) 2010-12-03 2013-02-05 Xerox Corporation Large-scale asymmetric comparison computation for binary embeddings
US8447767B2 (en) 2010-12-15 2013-05-21 Xerox Corporation System and method for multimedia information retrieval
US8379974B2 (en) 2010-12-22 2013-02-19 Xerox Corporation Convex clustering for chromatic content modeling
US8532377B2 (en) 2010-12-22 2013-09-10 Xerox Corporation Image ranking based on abstract concepts
US8484245B2 (en) 2011-02-08 2013-07-09 Xerox Corporation Large scale unsupervised hierarchical document categorization using ontological guidance
US9600826B2 (en) 2011-02-28 2017-03-21 Xerox Corporation Local metric learning for tag recommendation in social networks using indexing
US9058611B2 (en) 2011-03-17 2015-06-16 Xerox Corporation System and method for advertising using image search and classification
US20120243751A1 (en) * 2011-03-24 2012-09-27 Zhihong Zheng Baseline face analysis
US8594385B2 (en) 2011-04-19 2013-11-26 Xerox Corporation Predicting the aesthetic value of an image
US8712157B2 (en) 2011-04-19 2014-04-29 Xerox Corporation Image quality assessment
US8774515B2 (en) 2011-04-20 2014-07-08 Xerox Corporation Learning structured prediction models for interactive image labeling
WO2012156774A1 (en) * 2011-05-18 2012-11-22 Ltu Technologies Method and apparatus for detecting visual words which are representative of a specific image category
US8867829B2 (en) 2011-05-26 2014-10-21 Xerox Corporation Method and apparatus for editing color characteristics of electronic image
US8570339B2 (en) 2011-05-26 2013-10-29 Xerox Corporation Modifying color adjustment choices based on image characteristics in an image editing system
US9298982B2 (en) 2011-07-26 2016-03-29 Xerox Corporation System and method for computing the visual profile of a place
US8813111B2 (en) 2011-08-22 2014-08-19 Xerox Corporation Photograph-based game
US8458174B1 (en) * 2011-09-02 2013-06-04 Google Inc. Semantic image label synthesis
US8533204B2 (en) 2011-09-02 2013-09-10 Xerox Corporation Text-based searching of image data
US8983940B2 (en) 2011-09-02 2015-03-17 Adobe Systems Incorporated K-nearest neighbor re-ranking
US8699789B2 (en) 2011-09-12 2014-04-15 Xerox Corporation Document classification using multiple views
US8781255B2 (en) 2011-09-17 2014-07-15 Adobe Systems Incorporated Methods and apparatus for visual search
US8824797B2 (en) 2011-10-03 2014-09-02 Xerox Corporation Graph-based segmentation integrating visible and NIR information
CN103164713B (zh) 2011-12-12 2016-04-06 阿里巴巴集团控股有限公司 图像分类方法和装置
US8489585B2 (en) 2011-12-20 2013-07-16 Xerox Corporation Efficient document processing system and method
US9430563B2 (en) 2012-02-02 2016-08-30 Xerox Corporation Document processing employing probabilistic topic modeling of documents represented as text words transformed to a continuous space
US20130243077A1 (en) * 2012-03-13 2013-09-19 Canon Kabushiki Kaisha Method and apparatus for processing moving image information, and method and apparatus for identifying moving image pattern
US10785545B2 (en) * 2012-04-20 2020-09-22 The Board Of Regents Of The University Of Texas System Systems and methods for simultaneous compression and encryption
US9075824B2 (en) 2012-04-27 2015-07-07 Xerox Corporation Retrieval system and method leveraging category-level labels
US8666992B2 (en) 2012-06-15 2014-03-04 Xerox Corporation Privacy preserving method for querying a remote public service
US8892562B2 (en) 2012-07-26 2014-11-18 Xerox Corporation Categorization of multi-page documents by anisotropic diffusion
US8873812B2 (en) 2012-08-06 2014-10-28 Xerox Corporation Image segmentation using hierarchical unsupervised segmentation and hierarchical classifiers
US8879796B2 (en) 2012-08-23 2014-11-04 Xerox Corporation Region refocusing for data-driven object localization
US8880563B2 (en) 2012-09-21 2014-11-04 Adobe Systems Incorporated Image search by query object segmentation
JP5756443B2 (ja) * 2012-09-25 2015-07-29 日本電信電話株式会社 画像分類装置及び画像識別装置並びにプログラム
MY172808A (en) * 2012-12-13 2019-12-12 Mimos Berhad A method and system for identifying multiple entities in images
US9008429B2 (en) 2013-02-01 2015-04-14 Xerox Corporation Label-embedding for text recognition
US8923608B2 (en) 2013-03-04 2014-12-30 Xerox Corporation Pre-screening training data for classifiers
US8879103B2 (en) 2013-03-04 2014-11-04 Xerox Corporation System and method for highlighting barriers to reducing paper usage
US9158995B2 (en) * 2013-03-14 2015-10-13 Xerox Corporation Data driven localization using task-dependent representations
US9384423B2 (en) 2013-05-28 2016-07-05 Xerox Corporation System and method for OCR output verification
US9411829B2 (en) 2013-06-10 2016-08-09 Yahoo! Inc. Image-based faceted system and method
US10331976B2 (en) * 2013-06-21 2019-06-25 Xerox Corporation Label-embedding view of attribute-based recognition
US9330110B2 (en) 2013-07-17 2016-05-03 Xerox Corporation Image search system and method for personalized photo applications using semantic networks
US9082047B2 (en) 2013-08-20 2015-07-14 Xerox Corporation Learning beautiful and ugly visual attributes
US9412031B2 (en) 2013-10-16 2016-08-09 Xerox Corporation Delayed vehicle identification for privacy enforcement
WO2015089115A1 (en) 2013-12-09 2015-06-18 Nant Holdings Ip, Llc Feature density object classification, systems and methods
US9779284B2 (en) 2013-12-17 2017-10-03 Conduent Business Services, Llc Privacy-preserving evidence in ALPR applications
US9349150B2 (en) 2013-12-26 2016-05-24 Xerox Corporation System and method for multi-task learning for prediction of demand on a system
US9424492B2 (en) 2013-12-27 2016-08-23 Xerox Corporation Weighting scheme for pooling image descriptors
US9436890B2 (en) * 2014-01-23 2016-09-06 Samsung Electronics Co., Ltd. Method of generating feature vector, generating histogram, and learning classifier for recognition of behavior
KR102214922B1 (ko) * 2014-01-23 2021-02-15 삼성전자주식회사 행동 인식을 위한 특징 벡터 생성 방법, 히스토그램 생성 방법, 및 분류기 학습 방법
US9158971B2 (en) 2014-03-03 2015-10-13 Xerox Corporation Self-learning object detectors for unlabeled videos using multi-task learning
US9639806B2 (en) 2014-04-15 2017-05-02 Xerox Corporation System and method for predicting iconicity of an image
JP2015204561A (ja) * 2014-04-15 2015-11-16 株式会社デンソー 情報提示システム、及び、提示装置
US9697439B2 (en) 2014-10-02 2017-07-04 Xerox Corporation Efficient object detection with patch-level window processing
US9298981B1 (en) 2014-10-08 2016-03-29 Xerox Corporation Categorizer assisted capture of customer documents using a mobile device
US9996768B2 (en) * 2014-11-19 2018-06-12 Adobe Systems Incorporated Neural network patch aggregation and statistics
US9443164B2 (en) 2014-12-02 2016-09-13 Xerox Corporation System and method for product identification
US9216591B1 (en) 2014-12-23 2015-12-22 Xerox Corporation Method and system for mutual augmentation of a motivational printing awareness platform and recommendation-enabled printing drivers
US9367763B1 (en) 2015-01-12 2016-06-14 Xerox Corporation Privacy-preserving text to image matching
US9626594B2 (en) 2015-01-21 2017-04-18 Xerox Corporation Method and system to perform text-to-image queries with wildcards
US9600738B2 (en) 2015-04-07 2017-03-21 Xerox Corporation Discriminative embedding of local color names for object retrieval and classification
US9514391B2 (en) * 2015-04-20 2016-12-06 Xerox Corporation Fisher vectors meet neural networks: a hybrid visual classification architecture
US10456105B2 (en) 2015-05-05 2019-10-29 Boston Scientific Scimed, Inc. Systems and methods with a swellable material disposed over a transducer of an ultrasound imaging system
US9443320B1 (en) 2015-05-18 2016-09-13 Xerox Corporation Multi-object tracking with generic object proposals
WO2018119684A1 (zh) * 2016-12-27 2018-07-05 深圳前海达闼云端智能科技有限公司 一种图像识别系统及图像识别方法
WO2018194611A1 (en) * 2017-04-20 2018-10-25 Hewlett-Packard Development Company, L.P. Recommending a photographic filter
WO2019156938A1 (en) * 2018-02-06 2019-08-15 Hrl Laboratories, Llc Machine vision system for recognizing novel objects
US11625557B2 (en) 2018-10-29 2023-04-11 Hrl Laboratories, Llc Process to learn new image classes without labels
US11386636B2 (en) 2019-04-04 2022-07-12 Datalogic Usa, Inc. Image preprocessing for optical character recognition
US10984279B2 (en) * 2019-06-13 2021-04-20 Wipro Limited System and method for machine translation of text

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5546475A (en) 1994-04-29 1996-08-13 International Business Machines Corporation Produce recognition system
US5745601A (en) * 1995-07-31 1998-04-28 Neopath, Inc. Robustness of classification measurement apparatus and method
US5963670A (en) * 1996-02-12 1999-10-05 Massachusetts Institute Of Technology Method and apparatus for classifying and identifying images
US6092059A (en) * 1996-12-27 2000-07-18 Cognex Corporation Automatic classifier for real time inspection and classification
CN1207664C (zh) * 1999-07-27 2005-06-22 国际商业机器公司 对语音识别结果中的错误进行校正的方法和语音识别系统
US6741756B1 (en) * 1999-09-30 2004-05-25 Microsoft Corp. System and method for estimating the orientation of an object
US7110591B2 (en) * 2001-03-28 2006-09-19 Siemens Corporate Research, Inc. System and method for recognizing markers on printed circuit boards
US7006250B2 (en) * 2001-09-27 2006-02-28 Lexmark International, Inc. Method of setting laser power and developer bias in an electrophotographic machine based on an estimated intermediate belt reflectivity
US6879954B2 (en) * 2002-04-22 2005-04-12 Matsushita Electric Industrial Co., Ltd. Pattern matching for large vocabulary speech recognition systems
GB2396001B (en) * 2002-10-09 2005-10-26 Canon Kk Gaze tracking system
US7359572B2 (en) * 2003-03-26 2008-04-15 Microsoft Corporation Automatic analysis and adjustment of digital images with exposure problems
US7286710B2 (en) * 2003-10-01 2007-10-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Coding of a syntax element contained in a pre-coded video signal

Also Published As

Publication number Publication date
US7756341B2 (en) 2010-07-13
US20070005356A1 (en) 2007-01-04
DE602006002434D1 (de) 2008-10-09
EP1739593A1 (de) 2007-01-03

Similar Documents

Publication Publication Date Title
EP1739593B1 (de) Method and arrangement for generic visual categorization
US7680341B2 (en) Generic visual classification with gradient components-based dimensionality enhancement
US8463051B2 (en) Modeling images as mixtures of image models
US11978272B2 (en) Domain adaptation for machine learning models
Perronnin et al. Adapted vocabularies for generic visual categorization
Hoiem et al. Object-based image retrieval using the statistical structure of images
US10430649B2 (en) Text region detection in digital images using image tag filtering
US8165410B2 (en) Bags of visual context-dependent words for generic visual categorization
US7707132B2 (en) User preference techniques for support vector machines in content based image retrieval
US9158995B2 (en) Data driven localization using task-dependent representations
US20140219563A1 (en) Label-embedding for text recognition
US7840059B2 (en) Object recognition using textons and shape filters
US8111923B2 (en) System and method for object class localization and semantic class based image segmentation
US20140056520A1 (en) Region refocusing for data-driven object localization
US20060013475A1 (en) Computer vision system and method employing illumination invariant neural networks
Bouguila A model-based approach for discrete data clustering and feature weighting using MAP and stochastic complexity
Elguebaly et al. Simultaneous high-dimensional clustering and feature selection using asymmetric Gaussian mixture models
EP2172874B1 (de) Modeling images as mixtures of image models
EP3166021A1 (de) Method and apparatus for image search using sparsifying analysis and synthesis operators
Zagoris et al. Text localization using standard deviation analysis of structure elements and support vector machines
Bibi et al. BoVW model based on adaptive local and global visual words modeling and log-based relevance feedback for semantic retrieval of the images
Ates et al. Kernel likelihood estimation for superpixel image parsing
Hentschel et al. Automatic image annotation using a visual dictionary based on reliable image segmentation
Xu et al. Integrated patch model: A generative model for image categorization based on feature selection
Allili et al. Unsupervised feature selection and learning for image segmentation

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK YU

17P Request for examination filed

Effective date: 20070703

17Q First examination report despatched

Effective date: 20070802

AKX Designation fees paid

Designated state(s): DE FR GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 602006002434

Country of ref document: DE

Date of ref document: 20081009

Kind code of ref document: P

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20090528

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 11

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 12

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 13

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602006002434

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G06K0009620000

Ipc: G06V0030190000

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20240521

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240521

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20240522

Year of fee payment: 19