US20100226564A1 - Framework for image thumbnailing based on visual similarity - Google Patents

Framework for image thumbnailing based on visual similarity

Info

Publication number
US20100226564A1
Authority
US
United States
Prior art keywords
image
interest
region
dataset
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/400,277
Other versions
US8175376B2 (en)
Inventor
Luca Marchesotti
Claudio Cifarelli
Gabriela Csurka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xerox Corp
Priority to US12/400,277 (patent US8175376B2)
Assigned to XEROX CORPORATION reassignment XEROX CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: CSURKA, GABRIELA, MARCHESOTTI, LUCA, CIFARELLI, CLAUDIO
Publication of US20100226564A1
Application granted
Publication of US8175376B2
Assigned to CITIBANK, N.A., AS AGENT reassignment CITIBANK, N.A., AS AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: XEROX CORPORATION
Assigned to XEROX CORPORATION reassignment XEROX CORPORATION RELEASE OF SECURITY INTEREST IN PATENTS AT R/F 062740/0214 Assignors: CITIBANK, N.A., AS AGENT
Assigned to CITIBANK, N.A., AS COLLATERAL AGENT reassignment CITIBANK, N.A., AS COLLATERAL AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: XEROX CORPORATION
Assigned to JEFFERIES FINANCE LLC, AS COLLATERAL AGENT reassignment JEFFERIES FINANCE LLC, AS COLLATERAL AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: XEROX CORPORATION
Assigned to CITIBANK, N.A., AS COLLATERAL AGENT reassignment CITIBANK, N.A., AS COLLATERAL AGENT SECURITY INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: XEROX CORPORATION
Current legal status: Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • G06V10/426 Graphical representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Definitions

  • the exemplary embodiment relates to digital image processing. It finds particular application in connection with detection of salient regions and image thumbnailing in natural images based on visual similarity.
  • Image thumbnailing consists of the identification of one or more regions of interest in an input image: for example, salient parts are aggregated in foreground regions, whereas redundant and non informative pixels become part of the background.
  • the range of applications where thumbnailing can be applied is broad, including traditional problems like image compression, image visualizations, adaptive image display in small devices, but also more recent applications like variable data printing, assisted content creation, automatic blogging, and the like.
  • Saliency detection is seen as a simulation or modeling of the human visual attention mechanism. In the field of image processing, it is understood that some parts of an image receive more attention from human observers than others. Saliency refers to the “importance” or “attractiveness” of the visual information in an image. A salient region may describe any relevant part of an image that is a main focus of a typical viewer's attention. Visual saliency models have been used for feature detection and to estimate regions of interest. Many of these methods are based on biological vision models, which aim to estimate which parts of images attract visual attention.
  • Saliency maps can provide richer information about the relevance of features throughout an image. While interest points are generally simplistic corner (Harris) or blob (Laplace) detectors, saliency maps can carry higher level information. Such methods have been designed to model visual attention and have been evaluated by their congruence with fixation data obtained from experiments with eye gaze trackers.
  • saliency maps have been used for object recognition, image categorization, automated image cropping, adaptive image display, and the like.
  • saliency maps have been used to control the sampling density for feature extraction.
  • saliency maps can be used as foreground detection methods to provide regions of interest (ROI) for classification. It has been shown that extracting image features in the locality of ROIs can give better results than sampling features uniformly through the image. A disadvantage is that such methods may miss important context information from the background.
  • saliency detection that is driven by higher level information, such as the observer's goal or the context of the image, is often referred to as top-down saliency detection.
  • Bottom-up strategies are by far the most common and they are advantageous if the low level features represent the salient parts of the image well (e.g., isolated objects, uncluttered background). Top-down methods help when other factors dominate (e.g., the presence of human face), but they are lacking in generality. Hybrid approaches, in general, are designed in a two stage fashion where top-down strategies filter out noisy regions in bottom-up saliency maps.
  • Top-down visual attention processes are considered to be driven by voluntary control, and related to the observer's goal when analyzing a scene. These methods take into account higher order information about the image such as context, structure, etc.
  • Object detection can be seen as a particular case of top-down saliency detection, where the predefined task is given by the object class to be detected (See, Jiebo Luo, “Subject content-based intelligent cropping of digital photos,” in IEEE Intl. Conf. on Multimedia and Expo (2007)).
  • An additional example of a top-down approach is where the system first classifies the image in terms of landscape, close-up, faces, etc. and then applies the most appropriate thumbnailing/cropping strategy (See, G. Ciocca, C. Cusano, F. Gasparini, and R. Schettini, “Self-adaptive image cropping for small display,” in IEEE Intl. Conf. on Consumer Electronics (2007)).
  • a method for detecting a region of interest in an image includes, for each image in a dataset of images for which a region of interest has been respectively established, storing a respective dataset image representation based on features extracted from the image.
  • the method includes generating an original image representation for the original image based on features extracted from the image, identifying a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, training a classifier with information extracted from the established regions of interest of the subset of similar images and, with the trained classifier, identifying a region of interest in the original image.
  • an apparatus for detecting a region of interest in an image includes memory which stores the dataset image representations, and instructions for performing the above-described method.
  • a processor with access to the instructions and dataset image representations executes the instructions.
  • an apparatus for detecting a region of interest in an image includes memory which, for a dataset of images for which a respective region of interest has been established, stores a set of dataset image representations, each dataset image representation being derived from features extracted from a respective one of the images in the dataset.
  • Memory stores instructions which, for an original image for which a region of interest is to be detected, generate an original image representation for the original image based on features extracted from the original image, identify a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, and train a classifier to identify a region of interest in the original image, the classifier being trained with positive and negative examples, each of the positive examples comprising a high level representation based on features extracted from the established region of interest of a respective one of the subset of similar images and each of the negative examples comprising a high level representation based on features extracted from outside the established region of interest of a respective one of the subset of similar images.
  • a method for detecting a region of interest in an image includes storing a set of image representations, each image representation being based on features extracted from patches of a dataset image, where for each dataset image, the patch features are identified as salient or non-salient based on whether or not the patch is within a manually identified region of interest.
  • the method includes generating an original image representation for the original image based on features extracted from patches of the image, computing a distance measure between the original image representation and image representations in the set of image representations to identify a subset of similar image representations from the set of image representations, and training a classifier with positive and negative examples extracted from the images corresponding to subset of similar image representations, the positive examples each being based on the salient patch features of a respective image and the negative examples being based on non-salient patch features of the respective image.
  • with the trained classifier, a region of interest in the original image is identified based on the patch features of the original image.
  • FIG. 1 is a functional block diagram of an apparatus for identifying a region of interest in an image in accordance with one aspect of the exemplary method;
  • FIG. 2 is a flow chart illustrating a method for identifying a region of interest in an image in accordance with one aspect of the exemplary method which may be performed with the apparatus of FIG. 1;
  • FIG. 3 illustrates the images processed during steps of the method;
  • FIG. 4 illustrates substeps of part of the method of FIG. 2;
  • FIG. 5 illustrates substeps of part of the method of FIG. 2;
  • FIG. 6 illustrates patches and windows used in generating a saliency map;
  • FIG. 7 illustrates inputting a salient region into a categorizer which generates a category for the image;
  • FIG. 8 illustrates F-measure values for various saliency detection methods as a function of threshold size;
  • FIG. 9 illustrates Precision, Recall, and F-measure data for an Example comparing the present method (methods A and B, without and with Graph-cut) to comparative methods for saliency detection (methods C, D, E, and F);
  • FIG. 10 illustrates the displacement of a bounding box around the salient region from a manually assigned bounding box for the exemplary method (method B) and comparative methods C, D, E, and F.
  • the exemplary embodiment relates to an apparatus and computer-implemented method and computer program product for detecting saliency in an image, such as a natural image, based on similarity of the original image with images for which visually salient regions of pixels are pre-segmented.
  • the method assumes that images sharing similar visual appearance (as determined by comparing computer-generated content-based representations) share the same salient regions.
  • saliency detection is approached as a binary classification problem where pre-segmented salient/non salient pixels are available to train and test an algorithm.
  • the method allows both context-dependent and context-independent saliency detection within a single framework.
  • the apparatus may be embodied in an electronic processing device, such as the illustrated computer 10 .
  • the electronic processing device 10 may include one or more specific or general purpose computing devices, such as a network server, Internet-based server, desk top computer, laptop computer, personal data assistant (PDA), cellular telephone, or the like.
  • the apparatus 10 includes an input component 12 , an output component 14 , a processor 16 , such as a CPU, and memory 18 .
  • the computer 10 is configured to implement a salient region detector 20 , hosted by the computer 10 , for identifying a salient region or regions of an original input image.
  • the salient region detector 20 may be in the form of software, hardware, or a combination thereof.
  • the exemplary salient region detector 20 is stored in memory 18 (e.g., non-volatile computer memory) and comprises instructions for performing the exemplary method described below with reference to FIG. 2 . These instructions are executed by the processor 16 .
  • a database 22 of previously annotated images (and/or information extracted therefrom) is stored in memory 18 or a separate memory.
  • Components 12 , 14 , 16 , 18 , of the computer 10 may be connected for communication with each other by a data/control bus 24 .
  • Input and output components may be combined or separate components and may include, for example, data input ports, modems, network connections, and the like.
  • the computer 10 is configured for receiving an original image 30 , e.g., via input component 12 , and storing the image 30 in memory, such as a volatile portion of computer memory 18 , while being processed by the salient region detector 20 .
  • the image 30 is transformed by the salient region detector 20 , e.g., by cropping or otherwise identifying a salient region or regions 32 of the image.
  • the computer 10 is also configured for storing and/or outputting the salient region 32 generated for the image 30 by the salient region detector 20 and for outputting a transformed image 34 in which the salient region is identified or which comprises a crop of the original image based on the salient region 32, e.g., by the output component 14.
  • the salient region image data may be cropped from the original image data.
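  • As a minimal sketch of this cropping step (not the patented implementation), the Python below assumes the image is held as a list of pixel rows and that the salient region 32 has already been reduced to a rectangular box. The function name `crop_to_region` and the `(top, left, bottom, right)` box convention are illustrative assumptions.

```python
def crop_to_region(pixels, box):
    """Return the sub-image inside box = (top, left, bottom, right).

    The box convention (end-exclusive bottom/right) is an assumption made
    for this sketch; the source does not prescribe one.
    """
    top, left, bottom, right = box
    return [row[left:right] for row in pixels[top:bottom]]


# Toy 3x4 single-channel image; each number stands in for a pixel value.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
thumbnail = crop_to_region(image, (0, 1, 2, 3))
print(thumbnail)  # [[1, 2], [5, 6]]
```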
  • a classifier 36, incorporated in the salient region detector or in communication with it, is fed by the salient region detector with a subset of the database images (or information extracted therefrom) on which the classifier is trained to identify a salient region in an original image.
  • the computer 10 may include or be in data communication with a display 40 , such as an LCD screen, or other output device for displaying the salient region 32 .
  • the salient region 32 may be further processed, e.g., by incorporation into a document 42 , which is output by the output component 14 , or output to a categorizer 44 .
  • the input image 30 generally includes image data for an array of pixels forming the image.
  • the image data may include colorant values, such as grayscale values, for each of a set of color separations, such as L*a*b* or RGB, or be expressed in another color space in which different colors can be represented.
  • grayscale refers to the optical density value of any single image data channel, however expressed (e.g., L*a*b*, RGB, YCbCr, etc.).
  • the images may be photographs, video images, graphical images (such as freeform drawings, plans, etc.), text images, or combined images which include photographs along with text, and/or graphics, or the like.
  • the images may be received in PDF, JPEG, GIF, JBIG, BMP, TIFF or other common file format used for images and which may optionally be converted to another suitable format prior to processing.
  • Input images may be stored in a virtual portion of memory 18 during processing.
  • color as used herein is intended to broadly encompass any characteristic or combination of characteristics of the image pixels to be employed in the extraction of features.
  • the “color” may be characterized by one, two, or all three of the red, green, and blue pixel coordinates in an RGB color space representation, or by one, two, or all three of the L, a, and b pixel coordinates in an Lab color space representation, or by one or both of the x and y coordinates of a CIE chromaticity representation, or the like.
  • the color may incorporate pixel characteristics such as intensity, hue, brightness, etc.
  • pixel as used herein is intended to denote “picture element” and encompasses image elements of two-dimensional images or of three-dimensional images (which are sometimes also called voxels to emphasize the volumetric nature of the pixels for three-dimensional images).
  • Image 30 can be input from any suitable image source 50 , such as a workstation, database, scanner, or memory storage device, such as a disk, camera memory, memory stick, or the like.
  • the image source 50 may be temporarily or permanently communicatively linked to the computer 10 via a wired or wireless link 52, such as a cable, telephone line, local area network or wide area network, such as the Internet, through a suitable input/output (I/O) connection 12, such as a modem, USB port, or the like.
  • processor 16 may be the computer's central processing unit (CPU).
  • the exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like.
  • any processor capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 2 , can be used to implement the method for generating an image representation.
  • Memory 18 may be in the form of separate memories or combined and may be in the form of any type of tangible computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, holographic memory, or suitable combination thereof.
  • FIG. 3 illustrates graphically the processing of an exemplary image 30 during the method.
  • the method begins at S 100 .
  • a large dataset of pre-segmented images 22 is stored. These are images for which the pixels have been identified as either salient or non-salient, based on human interest.
  • the dataset ideally includes a wide variety of images, including images which are similar in content to the image 30 for which a region of interest is to be detected.
  • the dataset may include at least 100, e.g., at least 1000 images, such as at least about 10,000 images, and can be up to 100,000 or more, each dataset image having an established region of interest.
  • the pre-segmented region(s) of each image can further be associated with a semantic label referring to the content of the region.
  • a set of label types may be defined, such as animals, faces, people, buildings, automobiles, landscapes, flowers, other, and each image manually assigned one or more of these labels, based on its region of interest.
  • image representations are generated for each of the images in the dataset.
  • the representations are generally high level representations which are derived from low level features extracted from the image.
  • the high level representation of each pre-segmented image is based on fusing (e.g., a sum or concatenation) of positive (+ve) and negative (−ve) high level representations, the positive one generated for the salient region (region of interest) of the image, the negative one for the non-salient region (i.e., everywhere except the region of interest).
  • the two high level representations of each of the pre-segmented images may be derived from patch level representations, e.g., Fisher vectors from salient region patches for generating the +ve high level representation and Fisher vectors from patches outside the salient region for the −ve high level representation.
  • S104 may be performed prior to input of image 30 and the computed high level +ve and −ve representations stored in memory 18. At this point, storing of the actual images in the dataset 22 may no longer be necessary. Further details of this step are illustrated in FIG. 4 and are described below.
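  • The two fusing options named above (sum or concatenation) can be sketched as follows. This is an illustration only: the helper names `fuse_sum` and `fuse_concat` are assumptions, and the short lists stand in for the much longer Fisher vectors that would be used in practice.

```python
def fuse_sum(positive, negative):
    """Fuse the +ve and -ve representations by element-wise addition."""
    if len(positive) != len(negative):
        raise ValueError("representations must have the same length to be summed")
    return [p + n for p, n in zip(positive, negative)]


def fuse_concat(positive, negative):
    """Fuse the +ve and -ve representations by concatenation."""
    return list(positive) + list(negative)


# Toy 2-dimensional stand-ins for the +ve (salient) and -ve (background) vectors.
pos = [1.0, 2.0]
neg = [0.5, -1.0]
print(fuse_sum(pos, neg))     # [1.5, 1.0]
print(fuse_concat(pos, neg))  # [1.0, 2.0, 0.5, -1.0]
```

  Summation keeps the signature the same length as a single representation, which allows direct comparison with a signature built from all patches of an unsegmented image; concatenation preserves the foreground/background split at the cost of doubling the length.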
  • an image 30 for which a visually salient region (which may be referred to herein as a region of interest (ROI)) is to be identified is input and stored in memory.
  • a representation of the input image is generated (e.g., by the salient region detector 20 ), based on low level features extracted from patches of the image in a similar manner to that for the pre-segmented images in the data-set except that here, there are no pre-segmented salient regions. Further details of this step are illustrated in FIG. 5 and are described below.
  • a subset K of images in the dataset of pre-segmented images is identified, based on similarity of their high level representations to that of the original image.
  • the K-nearest neighbor images may be retrieved from the annotated dataset 22 by the salient region detector 20 using a simple distance measure, such as the L1 norm distance between the Fisher signature of each dataset image (e.g., a sum of the high level +ve and −ve representations) and the high level representation of the input image (e.g., a sum of all high level patch representations), e.g., as generated using a global visual vocabulary.
  • the subset of K nearest neighbor images is identified in substantially the same way, but in this case, from among those images having pre-segmented regions labeled with the selected semantic label (assuming there are sufficient images in the dataset with pre-segmented regions annotated with the selected label).
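  • The retrieval step can be sketched as a brute-force search under the L1 norm, with an optional restriction to images carrying a selected semantic label. The data layout (dictionaries keyed by image identifier) and the names `l1_distance`, `k_nearest`, `labels`, and `wanted_label` are assumptions made for this illustration, not details given in the source.

```python
def l1_distance(a, b):
    """L1 norm distance between two equal-length signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(query, signatures, k, labels=None, wanted_label=None):
    """Return the ids of the k dataset images closest to the query signature.

    signatures: dict mapping image id -> signature (list of floats).
    labels: optional dict mapping image id -> set of semantic labels.
    wanted_label: if given, only images annotated with this label are considered.
    """
    candidates = [
        image_id
        for image_id in signatures
        if wanted_label is None
        or (labels is not None and wanted_label in labels.get(image_id, ()))
    ]
    candidates.sort(key=lambda image_id: l1_distance(query, signatures[image_id]))
    return candidates[:k]


# Toy 2-dimensional signatures standing in for Fisher signatures.
signatures = {"beach": [0.0, 0.0], "boat": [1.0, 1.0], "city": [5.0, 5.0]}
labels = {"beach": {"landscapes"}, "boat": {"other"}, "city": {"buildings", "landscapes"}}
query = [0.9, 0.9]
print(k_nearest(query, signatures, k=2))  # ['boat', 'beach']
print(k_nearest(query, signatures, k=2, labels=labels, wanted_label="landscapes"))  # ['beach', 'city']
```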
  • a binary classifier 36 is trained using, as positive examples, the representations of the salient regions of the retrieved K-nearest neighbor images (designated by a “+” in FIG. 3), which may all be concatenated or summed to form a single vector. As negative examples, representations of the non-salient background regions are used (designated by a “−” in FIG. 3), which again, may all be concatenated or summed to form a single vector.
  • the same high level representations can be used by any binary classifier, or alternatively other local patch representations can be considered in another embodiment.
  • the trained classifier 36 is used to output a saliency probability for each patch of the original image extracted at S 106 .
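  • The passage does not fix a particular binary classifier, so the sketch below swaps in a deliberately simple one: the positive and negative representations of the retrieved neighbors are each summed into a single prototype vector, and a patch is scored by how much closer (in cosine similarity) it lies to the positive prototype than to the negative one, squashed to a probability with a logistic function. The function names and the `sharpness` parameter are assumptions for illustration.

```python
import math


def sum_vectors(vectors):
    """Element-wise sum of a list of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


def normalize(vector):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_prototypes(positive_vectors, negative_vectors):
    """'Training' here is just summing each class into one normalized vector."""
    return normalize(sum_vectors(positive_vectors)), normalize(sum_vectors(negative_vectors))


def saliency_probability(patch_vector, prototypes, sharpness=5.0):
    """Probability-like saliency score in (0, 1) for one patch representation."""
    foreground, background = prototypes
    patch = normalize(patch_vector)
    margin = dot(patch, foreground) - dot(patch, background)
    return 1.0 / (1.0 + math.exp(-sharpness * margin))


# Toy 2-dimensional patch representations from the K retrieved neighbors.
prototypes = train_prototypes(
    positive_vectors=[[1.0, 0.0], [0.9, 0.1]],
    negative_vectors=[[0.0, 1.0], [0.1, 0.9]],
)
print(saliency_probability([1.0, 0.0], prototypes) > 0.5)  # True
print(saliency_probability([0.0, 1.0], prototypes) > 0.5)  # False
```

  Any classifier that accepts the same positive and negative vectors and returns a per-patch score could be substituted for the prototype rule.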
  • a region of interest of the original image is identified by the salient region detector 20 .
  • This step may include generating a saliency map 56 ( FIG. 3 ).
  • the saliency map may be refined by the salient region detector 20 , e.g., with graph-cut segmentation to refine the salient region, as illustrated at 58 in FIG. 3 .
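  • One simple way to turn per-patch probabilities into a saliency map and a region of interest is sketched below. Each pixel receives the average probability of the patches covering it, and a fixed threshold followed by a bounding box stands in for the graph-cut refinement, which is not reproduced here. The function names, the `(top, left, size, probability)` patch tuples, and the 0.5 threshold are assumptions for illustration.

```python
def saliency_map(height, width, scored_patches):
    """Average the probabilities of all patches covering each pixel.

    scored_patches: iterable of (top, left, size, probability) for square patches.
    Pixels covered by no patch get a saliency of 0.0.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top, left, size, probability in scored_patches:
        for y in range(max(top, 0), min(top + size, height)):
            for x in range(max(left, 0), min(left + size, width)):
                total[y][x] += probability
                count[y][x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]


def bounding_box(smap, threshold=0.5):
    """Smallest (top, left, bottom, right) box, end-exclusive, around pixels >= threshold."""
    salient = [
        (y, x)
        for y, row in enumerate(smap)
        for x, value in enumerate(row)
        if value >= threshold
    ]
    if not salient:
        return None
    ys = [y for y, _ in salient]
    xs = [x for _, x in salient]
    return (min(ys), min(xs), max(ys) + 1, max(xs) + 1)


# Three overlapping 2x2 patches on a 4x4 image.
scored = [(0, 0, 2, 0.9), (1, 1, 2, 0.8), (2, 2, 2, 0.1)]
smap = saliency_map(4, 4, scored)
print(bounding_box(smap))  # (0, 0, 3, 3)
```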
  • the transformed image e.g., a crop of the image based on the salient region or an image in which the salient region is identified by the salient region detector 20 , e.g., by annotations such as HTML tags, is output.
  • further processing may be performed on the transformed image, e.g., the image crop based on the salient region may be displayed or incorporated into a document, e.g., placed in a predetermined placeholder location in a text document or sent to a categorizer 44 for assigning an object class to the image 30 .
  • the method ends at S 124 .
  • the present apparatus and method take advantage of a process which allows image saliency to be learned using (previously annotated) visually similar example images. Additionally, segmentation strategies can be advantageously employed for saliency detection. Further, the method is generic in the sense that it does not need to be tied to any specific category of images (e.g., faces), but allows a more broad concept of visual similarity, while at the same time, being readily adaptable to consideration of context. Finally, while the exemplary method has been described with particular reference to photographic (natural) images, the method is applicable to other types of images, such as medical or text document images, assuming that appropriate annotated data is available.
  • one or more human observers look at each image, e.g., on a computer screen, and identify a salient region (a region which the observer considers to be the most interesting). For example, the user may generate a bounding box which encompasses the salient region. Alternatively, the observer may identify a region or regions of interest by moving the cursor around the region(s) to generate a bounded region, which may then be processed, for example, by automatically creating a bounding box which encompasses the bounded region. In other embodiments, eye gaze data may be employed to identify a region of interest.
  • an eye gaze tracking device tracks eye movements of the observer while viewing the image for a short period of time.
  • the tracking data is superimposed on the image to identify the region of interest.
  • the identified regions/observations of several users may be combined to generate an overall region of interest for the image.
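  • The source does not say how the observations of several users are combined. One plausible rule, shown here purely as an assumption, is a per-pixel majority vote over the bounding boxes drawn by the observers; the function name `combine_boxes` and the `min_votes` parameter are hypothetical.

```python
def combine_boxes(boxes, height, width, min_votes=None):
    """Combine observer bounding boxes into a boolean salient mask by voting.

    boxes: list of (top, left, bottom, right), end-exclusive.
    min_votes: votes needed for a pixel to count as salient; defaults to a
    strict majority of the observers (an assumption, not from the source).
    """
    if min_votes is None:
        min_votes = len(boxes) // 2 + 1
    votes = [[0] * width for _ in range(height)]
    for top, left, bottom, right in boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                votes[y][x] += 1
    return [[votes[y][x] >= min_votes for x in range(width)] for y in range(height)]


# Three observers annotate a 3x3 image; only the centre pixel is chosen by a majority.
observer_boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (1, 1, 2, 2)]
mask = combine_boxes(observer_boxes, 3, 3)
print(mask[1][1])  # True
```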
  • the image 62 can then be segmented into a salient region 64 and a non salient region 66 , based on the identified region of interest.
  • the image may then be annotated with the segmentation information, e.g., by applying a HTML tag or by storing the segmentation in a separate file.
  • the salient region may be associated with a semantic concept (by annotating the salient region or entire image with a label).
  • S 104 may include the following substeps for each image 62 in the dataset 22 :
  • patches 70A, B, C, etc., 72A, B, C, D, and 74 are extracted from the image, e.g., at multiple scales. This is illustrated for a portion of the image 62 in FIG. 6, showing patches (unbroken lines) at three scales by way of example, where the arrows point roughly to the centers of the respective patches.
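  • Multi-scale patch extraction can be sketched as sliding square windows of several sizes over the image. The regular grid, the particular sizes, and the 50% stride are assumptions made for this illustration; the source only states that patches are extracted at multiple scales.

```python
def extract_patches(height, width, sizes=(16, 32, 64), stride_ratio=0.5):
    """List square patches as (top, left, size) on a regular grid at each scale.

    Scales larger than the image are skipped. The grid layout and stride are
    illustrative choices, not taken from the source.
    """
    patches = []
    for size in sizes:
        if size > height or size > width:
            continue
        step = max(1, int(size * stride_ratio))
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                patches.append((top, left, size))
    return patches


# A tiny 4x4 image with two scales: nine 2x2 patches and one 4x4 patch.
patches = extract_patches(4, 4, sizes=(2, 4))
print(len(patches))  # 10
```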
  • a representation of the patch (e.g., a Fisher vector) may be generated, based on the low level features.
  • patches are designated as salient or non salient, depending on whether they are within the pre-segmented region or not.
  • Various methods may be used to determine whether a patch is to be considered to be “within” the salient region.
  • a threshold degree of overlap may be sufficient for a patch to be considered within the salient region.
  • the overlap is computed relative to the area of the patch size, e.g., if 50% or more of the patch is within the salient region, then it is accepted as being within it. If the region of interest is too small, relative to the size of the patch (e.g., ROI is less than 70% of the patch area), then the patch will not be considered.
  • a patch is considered to be within the salient region if its geometric center lies within the salient region. In yet another embodiment, the patch is considered to be within the salient region if it is entirely encompassed by or entirely encompasses the salient region.
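  • Two of the membership rules described above, the overlap threshold and the geometric-center test, can be sketched as follows. Boxes are assumed to be axis-aligned rectangles given as `(top, left, bottom, right)` with end-exclusive bottom and right edges; that convention and the function names are assumptions for illustration.

```python
def overlap_fraction(patch, roi):
    """Fraction of the patch area that lies inside the region of interest."""
    p_top, p_left, p_bottom, p_right = patch
    r_top, r_left, r_bottom, r_right = roi
    overlap_h = max(0, min(p_bottom, r_bottom) - max(p_top, r_top))
    overlap_w = max(0, min(p_right, r_right) - max(p_left, r_left))
    area = (p_bottom - p_top) * (p_right - p_left)
    return (overlap_h * overlap_w) / area if area > 0 else 0.0


def is_salient_by_overlap(patch, roi, min_overlap=0.5):
    """Rule 1: the patch is salient if at least min_overlap of it is inside the ROI."""
    return overlap_fraction(patch, roi) >= min_overlap


def is_salient_by_center(patch, roi):
    """Rule 2: the patch is salient if its geometric center lies inside the ROI."""
    p_top, p_left, p_bottom, p_right = patch
    r_top, r_left, r_bottom, r_right = roi
    center_y = (p_top + p_bottom) / 2
    center_x = (p_left + p_right) / 2
    return r_top <= center_y < r_bottom and r_left <= center_x < r_right


roi = (0, 0, 10, 10)
print(is_salient_by_overlap((2, 2, 8, 8), roi))    # True: fully inside
print(is_salient_by_overlap((5, 5, 15, 15), roi))  # False: only 25% inside
print(is_salient_by_overlap((0, 5, 10, 15), roi))  # True: exactly 50% inside
print(is_salient_by_center((0, 5, 10, 15), roi))   # False: center sits on the ROI edge
```

  The last two calls show that the rules can disagree for a patch straddling the region boundary, which is why the choice of rule affects which patches feed the +ve and −ve representations.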
  • a high level +ve representation of the salient region of the image is extracted, based on the patch representations (e.g., fisher vectors, or simply, low level features) of all the salient patches and a high level ⁇ ve representation of the image is extracted, based on the patch representations (e.g., fisher vectors, or simply, low level features) of all the non-salient patches.
  • patch representations e.g., fisher vectors, or simply, low level features
  • patch representations e.g., fisher vectors, or simply, low level features
  • a high level representation of the image is generated, e.g., as a feature vector, e.g., a Fisher vector-based Image Signature, for example, by concatenation or other function of the +ve and ⁇ ve high level representations (Fisher FG vector and Fisher BG vector).
  • low level features are extracted, e.g., as a features vector.
  • a representation (e.g., Fisher vector) may be generated, based on the extracted low level features.
  • a high level representation of the image is extracted, based on the patch representations or low level features.
  • the high level representation is a vector (e.g., a Fisher vector-based Image Signature) formed by concatenation or other function of the patch level Fisher vectors.
  • a Bag-of-Visual words (BOV) representation of the image as disclosed, for example, in above-mentioned U.S. Pub. Nos. 2007/0005356; 2007/0258648; 2008/0069456; the disclosures of which are incorporated herein by reference, and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints,” ECCV Workshop on Statistical Learning in Computer Vision (2004); also the method of Y.
  • multiple patches are extracted from the image (original or dataset image) at various scales (S 104 a, S 108 a ).
  • low level features are extracted (S 104 b , S 108 b ).
  • the low level features which are extracted from the patches are typically quantitative values that summarize or characterize aspects of the respective patch, such as spatial frequency content, an average intensity, color characteristics (in the case of color images), gradient values, and/or other characteristic values.
  • at least about fifty low level features are extracted from each patch; however, the number of features that can be extracted is not limited to any particular number or type of features; for example, 1000 or 1 million low level features could be extracted, depending on computational capabilities.
  • the low level features include local (e.g., pixel) color statistics, and texture.
  • local RGB statistics e.g., mean and standard deviation
  • texture gradient orientations (representing a change in color) may be computed for each patch as a histogram (SIFT-like features).
  • two (or more) types of low level features, such as color and texture, are separately extracted and the high level representation of the patch or image is based on a combination (e.g., a sum or a concatenation) of two Fisher Vectors, one for each feature type.
  • SIFT descriptors are multi-image representations of an image neighborhood, such as Gaussian derivatives computed at, for example, eight orientation planes over a four-by-four grid of spatial locations, giving a 128-dimensional vector (that is, 128 features per features vector in these embodiments).
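  • As a rough illustration of the two feature types mentioned above, the sketch below computes per-channel RGB statistics and a single magnitude-weighted histogram of gradient orientations for one patch. It assumes NumPy and is deliberately simpler than a SIFT descriptor, which would repeat the orientation histogram over a four-by-four spatial grid to reach 128 dimensions; the function names and the bin count are illustrative choices, not taken from the source.

```python
import numpy as np


def color_features(patch_rgb):
    """Mean and standard deviation of each RGB channel (6 values)."""
    pixels = np.asarray(patch_rgb, dtype=float).reshape(-1, 3)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])


def orientation_histogram(patch_gray, n_bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(np.asarray(patch_gray, dtype=float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)  # orientation in [0, 2*pi)
    hist, _ = np.histogram(angle, bins=n_bins, range=(0, 2 * np.pi),
                           weights=magnitude)
    total = hist.sum()
    # A flat patch has no gradient; return the all-zero histogram unchanged.
    return hist / total if total > 0 else hist
```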
  • Other descriptors or feature extraction algorithms may be employed to extract features from the patches. Examples of some other suitable descriptors are set forth by K. Mikolajczyk and C. Schmid, in “A Performance Evaluation Of Local Descriptors,” Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Madison, Wis., USA, June 2003, which is incorporated in its entirety by reference.
  • a feature vector can be employed to characterize each patch.
  • the feature vector can be a simple concatenation of the low level features.
  • the extracted low level features can be used to generate a high level representation of the patch (e.g., a Fisher vector) (S 104 c , S 108 c ).
  • a visual vocabulary is built for each feature type using Gaussian Mixture Models. Modeling the visual vocabulary in the feature space with a GMM may be performed according to the method described in F. Perronnin, C. Dance, G. Csurka and M. Bressan, “ Adapted Vocabularies for Generic Visual Categorization ,” In ECCV (2006).
  • each patch is then characterized (at S 104 c , S 108 c ) with a gradient vector derived from a generative probability model.
  • the visual vocabulary is modeled by a Gaussian mixture model in a low level feature space where each Gaussian corresponds to a visual word.
  • the GMM vocabulary is trained using maximum likelihood estimation (MLE) considering all or a random subset of the low level descriptors extracted from the annotated dataset 22 .
  • the Fisher gradient vector f_t of the descriptor x_t is then just the concatenation of the partial derivatives in Equations (1) and (2), leading to a 2×D×N dimensional vector, where D is the dimension of the low level feature space. While the Fisher vector is high dimensional, it can be made relatively sparse as only a small number of components have non-negligible values.
  • considering the gradient of the log-likelihood of each patch with respect to the parameters of the Gaussian mixture leads to a high level representation of the patch which is referred to as a Fisher vector.
  • the dimensionality of the Fisher vector can be reduced to a fixed value, such as 50 or 100 dimensions, using principal component analysis.
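  • Equations (1) and (2) are not reproduced in this text. The sketch below therefore assumes the commonly used formulation for a GMM with diagonal covariances, taking the gradient of the log-likelihood with respect to the means and standard deviations, which yields the 2×D×N layout mentioned above when N is taken to be the number of Gaussians. It uses NumPy; all names are illustrative.

```python
import numpy as np


def posteriors(x, weights, means, sigmas):
    """Soft assignment of descriptor x (D,) to each of the N Gaussians."""
    log_p = (np.log(weights)
             - 0.5 * np.sum(np.log(2 * np.pi * sigmas ** 2), axis=1)
             - 0.5 * np.sum(((x - means) / sigmas) ** 2, axis=1))
    log_p -= log_p.max()  # stabilise the exponentials
    p = np.exp(log_p)
    return p / p.sum()


def fisher_vector(x, weights, means, sigmas):
    """Gradient of log p(x) w.r.t. means and std. deviations, length 2*D*N.

    weights: (N,), means: (N, D), sigmas: (N, D) for a diagonal GMM.
    """
    gamma = posteriors(x, weights, means, sigmas)[:, None]  # (N, 1)
    diff = x - means                                        # (N, D)
    d_mu = gamma * diff / sigmas ** 2
    d_sigma = gamma * (diff ** 2 / sigmas ** 3 - 1.0 / sigmas)
    return np.concatenate([d_mu.ravel(), d_sigma.ravel()])
```

  • Because the posterior of a descriptor is close to zero for most Gaussians, most blocks of this vector are near zero, which is consistent with the sparsity noted above.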
  • the two Fisher vectors are concatenated or otherwise combined to form a single high level representation of the patch having a fixed dimensionality.
  • features-based representations can be used to represent each patch, such as a set of features, a two- or more-dimensional array of features, or the like.
  • the high level representation of the original image can then be generated from the patch feature vectors (e.g., the patch Fisher vectors) (S 104 f , S 108 d ).
  • the patches are labeled according to their overlap with the manually designated salient regions. This leads to two sets of low level features, X+ and X−, referring to the set of patches that are considered salient and those which are non-salient.
  • two Fisher vectors, f_X+ and f_X−, are computed. These two vectors are then stored as indexes in the database and are, in the exemplary embodiment, the only required information from the dataset images needed to process a new image.
  • each original image 30 and each of the K nearest neighbor images 62 is represented by a high level representation which is simply the concatenation of two Fisher Vectors, one for texture and one for color, each vector formed by averaging the Fisher Vectors of the patches.
  • This single vector is referred to herein as a Fisher image signature.
  • the patch level Fisher vectors may be otherwise fused, e.g., by concatenation, dot product, or other combination of patch level Fisher vectors to produce an image level Fisher vector.
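  • The averaging-and-concatenation variant of the image signature described above reduces to a few lines. A minimal sketch, assuming NumPy and that the color and texture Fisher vectors of the patches are stacked row-wise in two arrays (the function and argument names are illustrative):

```python
import numpy as np


def image_signature(color_fvs, texture_fvs):
    """Average the patch-level Fisher vectors per feature type, then concatenate."""
    color_mean = np.mean(np.asarray(color_fvs, dtype=float), axis=0)
    texture_mean = np.mean(np.asarray(texture_fvs, dtype=float), axis=0)
    return np.concatenate([color_mean, texture_mean])
```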
  • a Fisher image signature F_Y is computed in a way analogous to the initialization phase, except that all patches of the image are used to compute the signature (S 104 d ).
  • the Fisher image signature is exemplary of types of high level representation which can be used herein.
  • Other image signatures used in the literature for image retrieval may alternatively be used, as discussed above, such as a Bag-of-Visual Words (BOV) representation or Fisher kernel (FK).
  • the most similar images are retrieved from the dataset where, for each image, a manually annotated ROI is available, as described above.
  • the K nearest neighbors are identified, based on the distance metric, where K may be, for example, at least 10, and up to about 50 or 100.
  • a suitable subset contains about 20-30 images, which may represent, for example, less than 20%, e.g., no more than about 10% of the number of images in the dataset, and in one embodiment, no more than about 1% or 0.2% thereof.
  • the retrieval of a set of K images from D which are visually similar to I_n generates a list of signatures <F_X+, F_X−> associated with the K images most similar to I_n.
  • a distance metric is computed between the global Fisher image signature obtained by summing F_X+ and F_X− (or other high level image representation) and that of the original image, F_Y.
  • the K most similar images are retrieved using the Fisher image signature with the normalized L 1 distance measure as described, for example, in S. Clinchant, J.-M. Renders and G.
  • a normalized L1 measure can be used to retrieve similar images:
  • f̂ is the vector f normalized so that its L1 norm is equal to 1
  • f̂_i are the elements of the vector f̂
  • f_X = f_X+ + f_X− (as the set of descriptors in image X is the union of salient and non-salient patches).
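  • The expression that this passage introduces did not survive in this text. Under the definitions given (vectors normalized to unit L1 norm and compared element by element), a normalized L1 measure would plausibly take the form below. The name d, and the choice to write it as a distance rather than as a similarity (its negative), are assumptions.

```latex
d(X, Y) \;=\; \lVert \hat{f}_X - \hat{f}_Y \rVert_1
        \;=\; \sum_i \bigl\lvert \hat{f}_{X,i} - \hat{f}_{Y,i} \bigr\rvert ,
\qquad
\hat{f} = \frac{f}{\lVert f \rVert_1}
```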
  • the distance measure used is the L1-norm distance between the Fisher image signature of each dataset image and that of the input image.
  • other distance measures, such as the Euclidean distance, the chi-squared distance, or the like, may alternatively be used for identifying a subset of similar images from the dataset.
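  • The retrieval step can be sketched as a brute-force nearest neighbor search over the stored signature pairs. This is an illustrative sketch assuming NumPy; it forms the global signature of each dataset image by summing its stored foreground and background signatures, as described above, and ranks images by the L1 distance between L1-normalized vectors. The function names are hypothetical.

```python
import numpy as np


def l1_normalize(f):
    """Scale f so that the sum of absolute values equals 1."""
    f = np.asarray(f, dtype=float)
    norm = np.abs(f).sum()
    return f / norm if norm > 0 else f


def retrieve_k_nearest(query_sig, fg_sigs, bg_sigs, k):
    """Indices and distances of the k dataset images closest to the query.

    fg_sigs, bg_sigs: (M, d) arrays holding the stored F_X+ and F_X- vectors.
    """
    q = l1_normalize(query_sig)
    dists = np.array([
        np.abs(l1_normalize(np.asarray(fg) + np.asarray(bg)) - q).sum()
        for fg, bg in zip(fg_sigs, bg_sigs)
    ])
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]
```

  • For a large dataset, an index structure or approximate search would normally replace the linear scan; the linear scan is kept here only because it makes the distance computation explicit.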
  • the classifier 36 is trained using the Fisher Vector representations of image patches extracted from the retrieved K-nearest neighbor images. For the K-nearest neighbor images retrieved, manually annotated salient regions are available in, e.g., the form of bounding boxes. Therefore in each annotated image, the system considers as positive (i.e. salient) patches, the ones inside the annotated bounding box, and as negative (i.e., non-salient) all the others.
  • for each retrieved image X_j, a Foreground Fisher vector (FG signature) f_X+^j is/has been computed by averaging the Fisher Vectors of the +ve patches, and a Background Fisher Vector (BG signature) f_X−^j is/has been computed by averaging over the −ve patches. Then, all Fisher vectors representing salient regions are collected (summed) and all Fisher vectors representing non-salient regions are collected (summed) over the K most similar retrieved images, leading to a foreground Fisher model and a background Fisher model.
  • the patches are designated as positives only if they are within the salient regions labeled with the target concept. Otherwise they are considered negatives. Therefore, while in the context-independent case the f_X+^j and f_X−^j need not be recomputed (they correspond to the values in the stored signatures <F_X+, F_X−>), in the context-dependent case, these values may be re-computed on-line as the set of positive and negative patches may be different (if multiple objects were designated as salient regions in the image and have different labels).
  • a saliency score is computed based on the foreground Fisher model and on the background Fisher model. For example, a patch x_i is considered salient if its normalized L1 distance to the foreground Fisher model is smaller than its normalized L1 distance to the background Fisher model.
  • the binary classifier score may be replaced with a non-binary score S which is a simple function of the normalized L1 distances.
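  • The foreground and background models and the resulting patch score can be sketched as follows, assuming NumPy. The binary rule follows the text directly. The non-binary score shown here, the background distance minus the foreground distance, is only one hypothetical "simple function" of the two distances; the source's own expression is not reproduced in this text.

```python
import numpy as np


def l1_normalize(f):
    """Scale f so that the sum of absolute values equals 1."""
    f = np.asarray(f, dtype=float)
    norm = np.abs(f).sum()
    return f / norm if norm > 0 else f


def build_models(fg_sigs, bg_sigs):
    """Sum the FG and BG signatures of the K retrieved images."""
    return np.sum(fg_sigs, axis=0), np.sum(bg_sigs, axis=0)


def saliency_score(patch_fv, fg_model, bg_model):
    """Positive when the patch is closer to the foreground model (assumed form)."""
    x = l1_normalize(patch_fv)
    d_fg = np.abs(x - l1_normalize(fg_model)).sum()
    d_bg = np.abs(x - l1_normalize(bg_model)).sum()
    return d_bg - d_fg


def is_salient(patch_fv, fg_model, bg_model):
    """Binary rule: closer to the foreground model than to the background model."""
    return saliency_score(patch_fv, fg_model, bg_model) > 0
```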
  • the value S can be assigned to the center pixel of each region, and the values between these centers can then either be interpolated or propagated with a Gaussian. The latter can be done by averaging over all Gaussian-weighted scores.
  • W is the value at pixel p of the Gaussian centered at the geometric center of each region.
  • a diagonal isotropic covariance matrix may be used, with values (0.6*R)², R² being the size of the region.
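  • A possible reading of the Gaussian propagation is a weighted average in which each region contributes its score with an isotropic Gaussian weight centered on the region. The sketch below assumes NumPy and assumes that the average is normalized by the sum of the weights, which the surviving text does not state explicitly; the 0.6 factor is the example value from the text, and the function name is illustrative.

```python
import numpy as np


def propagate_scores(shape, centers, sizes, scores, ratio=0.6):
    """Spread region scores to pixels with Gaussian weights.

    centers: (row, col) of each region; sizes: side length R of each region
    (area R*R); scores: saliency score S of each region.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    num = np.zeros(shape, dtype=float)
    den = np.zeros(shape, dtype=float)
    for (cy, cx), r, s in zip(centers, sizes, scores):
        var = (ratio * r) ** 2  # isotropic variance (0.6 * R)^2
        w = np.exp(-((rows - cy) ** 2 + (cols - cx) ** 2) / (2.0 * var))
        num += w * s
        den += w
    return num / np.maximum(den, 1e-12)
```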
  • the saliency map is built for the original image by considering N such overlapping sub-windows (shown as 80 A,B,C, etc.) of the same size (e.g., 50 pixels*50 pixels) (a few of these windows 80 are illustrated in FIG. 6 ).
  • the windows may be of the same size or somewhat larger than the smallest patches.
  • a patch is considered to belong to a window if the geometric center of the patch lies within the window. For example, in the case of window 80 E, patches 70 F and 74 are considered to belong to it. Note that this could be done at the patch level rather than using windows 80 . However averaging over several patches gives more stable results.
  • the window's saliency score is computed based on the distance of the window signature (Eqn. (6)) to the Foreground signature (FS) and Background signature (BS), as defined in Eqn. (5), using the (optionally normalized) L1 distance computed as in Eqn. (7).
  • the scores at the window level are projected to the pixels, as described in (Eqn. 8) above (averaging for each pixel, the window saliency scores of the windows containing that pixel).
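  • The window-based construction of the saliency map can be sketched as below, assuming NumPy. Equation (6) is not reproduced in this text, so the window signature is assumed here to be the average of the Fisher vectors of the patches whose centers fall in the window. `score_fn` stands for any function mapping a window signature to a saliency score and is a placeholder, as are the window size and step.

```python
import numpy as np


def window_saliency_map(shape, patch_centers, patch_fvs, score_fn,
                        win=50, step=25):
    """Average, at each pixel, the scores of the windows containing it."""
    total = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    centers = np.asarray(patch_centers)  # (P, 2) as (row, col)
    fvs = np.asarray(patch_fvs, dtype=float)
    for top in range(0, max(shape[0] - win, 0) + 1, step):
        for left in range(0, max(shape[1] - win, 0) + 1, step):
            inside = ((centers[:, 0] >= top) & (centers[:, 0] < top + win) &
                      (centers[:, 1] >= left) & (centers[:, 1] < left + win))
            if not inside.any():
                continue  # no patch center in this window
            signature = fvs[inside].mean(axis=0)
            total[top:top + win, left:left + win] += score_fn(signature)
            count[top:top + win, left:left + win] += 1
    # Pixels covered by no scored window keep a score of 0.
    return total / np.maximum(count, 1)
```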
  • Equation (8) has a low computational cost but it is also a rather simple evaluation of the saliency score.
  • a patch classifier (not shown) could be used to compute a saliency probability map by using the approach described in Gabriela Csurka and Florent Perronnin, “A Simple High Performance Approach to Semantic Segmentation,” British Machine Vision Conference (BMVC), Leeds, UK (September 2008).
  • a patch classifier is trained and the patch probability score for the original image is then propagated from patches to pixels as described in the Csurka and Perronnin reference.
  • the saliency maps obtained by this type of classifier are not necessarily better than those obtained using Eqn. 8.
  • a bounding box may simply be drawn to encompass all (or substantially all) pixels which exceed a threshold probability score which is then designated as the region of interest.
  • Different strategies can be designed to build a thumbnail from this map.
  • One option is to select the bounding box of the biggest or most centered connected component.
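  • The first option, taking the bounding box of the biggest connected component of the thresholded map, can be sketched with a breadth-first flood fill. This sketch assumes NumPy, 4-connectivity, and an inclusive `(x0, y0, x1, y1)` box convention, none of which are specified in the source.

```python
from collections import deque

import numpy as np


def largest_component_bbox(saliency, threshold):
    """Inclusive bounding box (x0, y0, x1, y1) of the largest above-threshold blob."""
    mask = np.asarray(saliency) > threshold
    seen = np.zeros(mask.shape, dtype=bool)
    best, best_size = None, 0
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        queue = deque([(int(r), int(c))])
        seen[r, c] = True
        pixels = []
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not seen[ny, nx]):
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(pixels) > best_size:
            ys, xs = zip(*pixels)
            best_size = len(pixels)
            best = (min(xs), min(ys), max(xs), max(ys))
    return best  # None when no pixel exceeds the threshold
```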
  • Another option is to consider all connected components and retarget them into a single region as proposed in V. Setlur, S. Takagi, R. Raskar, M. Gleicher, and B.
  • refinement techniques may be applied to define an ROI based on the salient pixels which takes further considerations into account (S 118 ).
  • the role of this step is to enhance the precision.
  • the salient regions correspond to isolated objects. Therefore, regions classified as salient can be further refined by taking into account edge constraints.
  • a Graph-Cut segmentation may be used to adjust the borders of the salient region. This approach assumes that the estimated region contains a consistent part of the relevant objects.
  • One suitable method is based on the Graph-Cut algorithms described in Rother, C., Kolmogorov, V., and Blake, A., “Grabcut: Interactive foreground extraction using iterated graph cuts,” In ACM Trans. Graphics ( SIGGRAPH 2004) 23(3), 309-314 (2004).
  • the problem of segmentation is formulated in terms of energy minimization (i.e., max-flow/min-cut).
  • the image is represented as a graph in which each pixel is a node and the edges can represent color similarity between adjacent pixels, as in a Markov Random Field.
  • two extra nodes (starting and ending nodes) are added to the graph and linked to each pixel based on the probability that the pixel belongs to the background or the foreground.
  • the saliency map generated at S 116 is used to build an initial Graph-Cut model.
  • a first Gaussian Mixture Model (GMM) is created for the foreground colors and a second GMM is created for the background colors.
  • FIG. 3 shows an example graph-cut mask 58 created from the ROI mask 56 generated at S 116 .
  • the graph-cut method is performed as follows: First, two thresholds are chosen (one positive, th+, and one negative, th−). This separates the saliency map S into three different regions: pixels u labeled as salient (S(u)>th+), pixels labeled as non-salient (S(u)<th−), and unknown (the others). Two Gaussian Mixture Models (GMMs) are created, one using RGB values of salient (foreground) pixels and one using RGB values of non-salient (background) pixels. Then the following energy is minimized:
  • $$E(L) = \sum_{u \in P} D_u(u) + \sum_{(u,v) \in C} V_{u,v}(u,v) \qquad \text{Eqn. (9)}$$
  • $$V_{u,v}(u,v) = \gamma \sum_{(u,v) \in C} \delta_{l_u,l_v} \exp\left(-\frac{\lVert u - v \rVert^2}{2\beta}\right) \qquad \text{Eqn. (10)}$$
  • if the positively labeled area after Graph-Cut is too small compared with the size of the original image, e.g., less than 5% or less than 10% of its size, the output of step S 116 , i.e., the binarized Saliency Map 56 , is used for identifying an ROI.
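  • Two small pieces of the refinement step lend themselves to a sketch: the three-way split of the saliency map by two thresholds, and the fallback to the binarized saliency map when the Graph-Cut result covers too little of the image. The sketch assumes NumPy; the label encoding (1, 0, -1) and the function names are illustrative, and the energy minimization itself is not shown.

```python
import numpy as np


def trimap(saliency, th_pos, th_neg):
    """1 = salient seed, 0 = non-salient seed, -1 = unknown."""
    saliency = np.asarray(saliency, dtype=float)
    labels = np.full(saliency.shape, -1, dtype=int)
    labels[saliency > th_pos] = 1
    labels[saliency < th_neg] = 0
    return labels


def choose_mask(graphcut_mask, binarized_saliency, min_fraction=0.05):
    """Fall back to the binarized saliency map if the Graph-Cut area is too small."""
    if np.asarray(graphcut_mask, dtype=bool).mean() < min_fraction:
        return binarized_saliency
    return graphcut_mask
```

  • The seed pixels labeled 1 and 0 would supply the RGB samples for the foreground and background color models, while the pixels labeled -1 are the ones whose label the minimization decides.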
  • the ROI may be generated, for example, from the saliency map 58 (or 56 ) by processing the map in order to find the biggest, most centered object based on an analysis of statistics of the saliency map distribution (e.g., center of mass of the distribution, cumulative probability etc.).
  • a rectangular crop (image thumbnail) 90 can then be generated, based on this salient region.
  • the method illustrated in FIGS. 2 , 4 , and 5 may be implemented in a computer program product that may be executed on a computer.
  • the computer program product may be a tangible computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or may be a transmittable carrier wave in which the control program is embodied as a data signal.
  • Computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like, or any other medium from which a computer can read and use.
  • the exemplary method thus described may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like.
  • any device capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIGS. 2 , 4 , and 5 , can be used to implement the automated method for identifying a region of interest in an image.
  • variable data applications such as 1 to 1 personalization and direct mail marketing often employ an image.
  • a document 42 can be created incorporating an appropriately sized crop 90 which incorporates the salient region.
  • the human observers used to annotate the salient regions of the images in the dataset 22 can be selected to represent the target audience.
  • two or more sets of annotators may be used, e.g., one group comprising only females, the other, only males, and separate sets of image signatures stored for each group.
  • the K nearest neighbors may be different, depending on which set of signatures is used.
  • Variable data printing is not the only application of the exemplary system and apparatus.
  • Other applications such as image and document asset management or document image/photograph set visualization, and the like can also benefit.
  • a crop 90 of the original image based on the salient region, can be used for a thumbnail which is displayed in place of the original image, allowing a user to select images of interest from a large group of images, based on the interesting parts.
  • the thumbnail (crop) 90 can be fed to a categorizer 44 for categorizing the image based on image content.
  • the categorizer is not confused by including areas of the image which are less likely to be of visual interest.
  • the image crop 90 is fed to a categorizer, which has been trained with training image crops 94 , generated in the same way, but which have each been annotated with a respective class (e.g., dogs, cats, flowers in the exemplary embodiment).
  • the categorizer (which may incorporate a multiclass classifier or a set of binary classifiers, one for each object class) outputs a class 96 for the crop, based on a similarity of features of the image crop to those of the training images.
  • the exemplary method is evaluated by comparing the results with those of four comparative methods for saliency detection:
  • Method A: Exemplary method without Graph-cut.
  • Method B: Exemplary method using Graph-cut, as described above.
  • Method D (ITTI): A classic approach based on Itti theory (See, L. Itti and C. Koch, “A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention,” Vision Research, 40(10-12): 1489-1506, 2000 (hereinafter Itti and Koch 2000)) that leverages neuromorphic models simulating which elements are likely to attract visual attention.
  • the Matlab implementation available at http://www.saliencytoolbox.net/ was employed.
  • Method F (CRF): A learning method (Liu, et al.), based on a Conditional Random Field classifier.
  • part of the dataset described in Liu, et al. (MRSA Dataset) was used to train and test the exemplary method.
  • the dataset was composed of 5000 images labeled by different users with no specific skills in graphic design.
  • the dataset included images of a variety of different subjects. In general, a single object is present in the image with a broad range of backgrounds with fairly homogeneous color or texture.
  • Ground truth data comprising manually annotated regions of interest generated by different users is also available.
  • the users manually selected a rectangle (bounding box) containing the region of interest, which is typically represented by a full object or, in some cases by a subpart of the object (e.g., face).
  • the 5000 images from the MRSA Dataset used in this example had bounding boxes annotated by nine users.
  • the annotations are highly consistent with a very small variance over the nine bounding boxes.
  • the bounding boxes represent approximately 35% of the total area of the image, but this varies over a fairly wide distribution.
  • the distance of the center of mass of the object from the center of the image is, on average, 42 pixels. Again the annotated dataset showed a distribution.
  • for each image in the dataset, a ground truth saliency map g(x,y) has been generated, based on the user annotations (bounding boxes containing salient regions), to evaluate the results. In particular, since the annotations for MRSA are highly consistent, an average of the nine bounding boxes of the various users was used. Maps g(x,y) were generated with pixels inside the rectangular salient region set to 1 and all other pixels set to 0.
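  • Building such a ground truth map from several annotators' boxes can be sketched as follows, assuming NumPy. The box layout `(x0, y0, x1, y1)`, the rounding of the averaged coordinates, and the exclusive right and bottom edges are assumptions of this example.

```python
import numpy as np


def ground_truth_map(shape, boxes):
    """Binary map g with 1 inside the average of the annotated boxes."""
    x0, y0, x1, y1 = np.round(np.mean(boxes, axis=0)).astype(int)
    g = np.zeros(shape, dtype=np.uint8)
    g[y0:y1, x0:x1] = 1  # rows index y, columns index x
    return g
```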
  • Performance was benchmarked using the following measures: BDE (See, D. R. Martin, C. C. Fowlkes and J. Malik, “Learning to detect natural image boundaries using local brightness, color and texture cues,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 26(5) pp. 530-549 (May 2004)) was used for assessing the displacement of the bounding boxes ( FIG. 10 ), and Precision, Recall and F-measure were used to assess the quality of the saliency map.
  • Precision (Pr), Recall (Re) and F-measure (F_α) can be defined according to Liu, et al., as follows:
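  • The definitions themselves are not reproduced in this text. For reference, the form usually attributed to Liu, et al. is given below, where g(x,y) is the binary ground truth map and s(x,y) is the thresholded saliency map. The symbol s and the use of α as the weighting parameter are assumptions; Liu, et al. is recalled as using α = 0.5.

```latex
Pr = \frac{\sum_{x,y} g(x,y)\, s(x,y)}{\sum_{x,y} s(x,y)}, \qquad
Re = \frac{\sum_{x,y} g(x,y)\, s(x,y)}{\sum_{x,y} g(x,y)}, \qquad
F_\alpha = \frac{(1+\alpha)\, Pr \cdot Re}{\alpha\, Pr + Re}
```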
  • FIG. 8 shows the behavior of the F-measure as a function of the threshold on the map.
  • the exemplary method (A and B) can be seen to give a better result than Methods C and E.
  • FIG. 8 shows the improvement that the Graph-Cut stage (Method B) introduces in the proposed method, increasing the F-measure by almost 10% as compared with Method A (without Graph-Cut).
  • for Methods D and F, the thresholding was not applied because the results were taken directly from the Hou, et al. paper.
  • FIG. 9 shows the thresholds selected for the Methods compared.
  • FIG. 10 shows the Bounding Box displacement index. It represents the average distance, in pixels, of the center of the automatically detected Bounding Box from the center of the ground truth Bounding Box. The smaller this value, the more accurate the detected bounding box. As can be seen, the exemplary method using Graph-Cut (Method B) gave the best results.
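  • The displacement index as described here, the average distance between detected and ground truth box centers, can be computed as in the following sketch. Only the Python standard library is used; the box layout `(x0, y0, x1, y1)` and the function names are assumptions.

```python
import math


def box_center(box):
    """Center (x, y) of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def mean_center_displacement(detected, ground_truth):
    """Average distance, in pixels, between paired box centers."""
    dists = []
    for d, g in zip(detected, ground_truth):
        (dx, dy), (gx, gy) = box_center(d), box_center(g)
        dists.append(math.hypot(dx - gx, dy - gy))
    return sum(dists) / len(dists)
```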

Abstract

An apparatus and method for detecting a region of interest in an image are disclosed. Image representations for a set of images that have been manually annotated with regions of interest are stored, along with positive and negative representations of each image, which are derived similarly to the image representations except that they are based on features extracted from patches within the region of interest and outside it, respectively. For an original image for which a region of interest is desired, the stored information for K similar images is automatically retrieved and used to train a classifier. The trained classifier provides, for each patch of the original image, a probability of being in a region of interest, based on extracted features of the patch (represented, for example, as a Fisher vector), which can be used to determine a region of interest in the original image.

Description

    CROSS REFERENCE TO RELATED PATENTS AND APPLICATIONS
  • The following copending applications, the disclosures of which are incorporated herein in their entireties by reference, are mentioned:
  • U.S. patent application Ser. No. 12/250,248, filed Oct. 13, 2008, entitled IMAGE SUMMARIZATION BY A LEARNING APPROACH, by Luca Marchesotti, et al.
  • U.S. application Ser. No. 12/361,235, filed Feb. 5, 2009, entitled MODELING IMAGES AS SETS OF WEIGHTED FEATURES, by Teofilo E. de Campos, et al.
  • U.S. application Ser. No. 12/033,434, filed Feb. 19, 2008, entitled CONTEXT DEPENDENT INTELLIGENT THUMBNAIL IMAGES, by Gabriela Csurka.
  • U.S. application Ser. No. 12/049,520 filed Mar. 17, 2008, entitled AUTOMATIC GENERATION OF A PHOTO GUIDE, by Luca Marchesotti, et al.
  • U.S. patent application Ser. No. 12/123,511, filed May 20, 2008, entitled IMPROVING IMAGE VISUALIZATION THROUGH CONTENT-BASED INSETS, by Luca Marchesotti, et al.
  • U.S. application Ser. No. 12/123,586, filed May 20, 2008, entitled METHOD FOR AUTOMATIC ENHANCEMENT OF IMAGES CONTAINING SNOW, by Luca Marchesotti.
  • U.S. application Ser. No. 12/175,857, filed Jul. 18, 2008, entitled SYSTEM AND METHOD FOR AUTOMATIC ENHANCEMENT OF SEASCAPE IMAGES, by Luca Marchesotti.
  • U.S. application Ser. No. 12/191,579, filed on Aug. 14, 2008, entitled SYSTEM AND METHOD FOR OBJECT CLASS LOCALIZATION AND SEMANTIC CLASS BASED IMAGE SEGMENTATION, by Gabriela Csurka, et al.
  • BACKGROUND
  • The exemplary embodiment relates to digital image processing. It finds particular application in connection with detection of salient regions and image thumbnailing in natural images based on visual similarity.
  • Image thumbnailing consists of the identification of one or more regions of interest in an input image: for example, salient parts are aggregated in foreground regions, whereas redundant and non informative pixels become part of the background. The range of applications where thumbnailing can be applied is broad, including traditional problems like image compression, image visualizations, adaptive image display in small devices, but also more recent applications like variable data printing, assisted content creation, automatic blogging, and the like.
  • Image thumbnailing is strongly related to the detection of salient regions. Saliency detection is seen as a simulation or modeling of the human visual attention mechanism. In the field of image processing, it is understood that some parts of an image receive more attention from human observers than others. Saliency refers to the “importance” or “attractiveness” of the visual information in an image. A salient region may describe any relevant part of an image that is a main focus of a typical viewer's attention. Visual saliency models have been used for feature detection and to estimate regions of interest. Many of these methods are based on biological vision models, which aim to estimate which parts of images attract visual attention. Implementations of these methods in computer systems generally fall into one of two main categories: those that give a number of relevant punctual positions, known as interest (or key-point) detectors, and those that give a more continuous map of relevance, such as saliency maps. Saliency maps can provide richer information about the relevance of features throughout an image. While interest points are generally simplistic corner (Harris) or blob (Laplace) detectors, saliency maps can carry higher level information. Such methods have been designed to model visual attention and have been evaluated by their congruence with fixation data obtained from experiments with eye gaze trackers.
  • Recently, saliency maps have been used for object recognition, image categorization, automated image cropping, adaptive image display, and the like. For example, saliency maps have been used to control the sampling density for feature extraction. Alternatively, saliency maps can be used as foreground detection methods to provide regions of interest (ROI) for classification. It has been shown that extracting image features in the locality of ROIs can give better results than sampling features uniformly through the image. A disadvantage is that such methods may miss important context information from the background.
  • A distinction can be made between a type of saliency detection which aims to detect the most interesting object in an image, irrespective of context (context independent saliency detection) and a concept type of saliency detection in which a specific type of object is searched for in the image.
  • The typical context independent case is often solved by bottom-up methods which seek to detect the most interesting part of the image, without targeting any specific object or concept. Concept type saliency detection is often referred to as top-down saliency detection.
  • Visual saliency and attention have been modeled with three categories of approaches inspired by the human visual system. Bottom-up, stimulus-driven methods are based on intrinsic low-level features such as contrast, color, orientation, and the like. Top-down methods take into account higher order information (context, structure) about the image in the analysis. Hybrid approaches aim to leverage benefits of the other two categories.
  • Bottom-up strategies are by far the most common and they are advantageous if the low level features represent the salient parts of the image well (e.g., isolated objects, uncluttered background). Top-down methods help when other factors dominate (e.g., the presence of a human face), but they are lacking in generality. Hybrid approaches, in general, are designed in a two stage fashion where top-down strategies filter out noisy regions in bottom-up saliency maps.
  • One of example of bottom-up methods is described in L. Itti, C. Koch, E. Niebur, et al., “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259 (1998). In this approach, multi-scale topographic features characterizing color, intensity and texture are extracted and combined with “center-surround” operations to obtain saliency maps. Another method is described in Xiaodi Hou and Liqing Zhang, “Saliency Detection: A Spectral. Residual Approach,” IEEE Conf on Computer Vision & Pattern Recognition (2007). The methods is based on spectral residual of images in the spectral domain that locates salient regions by taking into account the “noise” in the logarithmic magnitude frequency curve of an image.
  • Gao, et al. reformulated the “center-surround” hypothesis in a decision theoretic framework (see, D. Gao and N. Vasconcelos, “Bottom-up saliency is a discriminant process,” Proceedings of IEEE Int'l Conf. on Computer Vision (ICCV), Rio de Janeiro, Brazil (2007); D. Gao, V. Mahadevan and N. Vasconcelos, “The discriminant center-surround hypothesis for bottom-up saliency,” Proc. of Neural Information Processing Systems (NIPS), Vancouver, Canada (2007)). Saliency detection is interpreted as a binary classification problem where saliency is identified with features that discriminate “center” and “surround” regions well.
  • Top-down visual attention processes are considered to be driven by voluntary control, and related to the observer's goal when analyzing a scene. These methods take into account higher order information about the image such as context, structure, etc. Object detection can be seen as a particular case of top-down saliency detection, where the predefined task is given by the object class to be detected (See, Jiebo Luo, “Subject content-based intelligent cropping of digital photos,” in IEEE Intl. Conf. on Multimedia and Expo (2007)).
  • An additional example of a top-down approach is where the system first classifies the image in terms of landscape, close-up, faces, etc. and then applies the most appropriate thumbnailing/cropping strategy (See, G. Ciocca, C. Cusano, F. Gasparini, and R. Schettini, “Self-adaptive image cropping for small display,” in IEEE Intl. Conf. on Consumer Electronics (2007)).
  • Recent hybrid approaches combine bottom-up with classic top-down object detection strategies. One approach blends the Viola-Jones face detector (Jones, M. J., Rehg, J. M., “Statistical Color Models with Application to Skin Detection,” IJCV(46), No. 1, pp. 81-96 (January 2002)) with the Itti classic approach (See, L. Itti and C. Koch, “Computational Modeling of Visual Attention,” Nature Reviews Neuroscience, 2(3): 194-203 (2001), hereinafter “Itti and Koch 2001”). In a similar fashion, Huang, et al. combine their saliency map based on color, shape, and texture with face and text detectors and use a branch and bound algorithm to find optimal solutions efficiently (See, Chen-Hsiu Huang, Chih-Hao Shen, Chun-Hsiang Huang and Ja-Ling Wu, “A MPEG-7 Based Content-aware Album System for Consumer Photographs,” Bulletin of the College of Engineering, NTU, No. 90, pp. 3-24 (February 2004)).
  • Recent approaches suggest that saliency can be learned, either using global features or sufficient manually labelled examples (See, T. Liu, J. Sun, N. Zheng, X. Tang and H. Shum, “Learning to Detect A Salient Object,” CVPR (2007), hereinafter “Liu, et al.”), or directly from human eye movement data through a simple parameter-free approach.
  • In contrast, Z. Wang, B. Li, “A Two-Stage Approach to Saliency Detection in Images,” In ICASSP 2008 IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (March/April 2008) combines spectral residual for bottom-up analysis with features capturing similarity and continuity based on Gestalt principles.
  • Above-mentioned U.S. patent application Ser. No. 12/250,248 detects regions of interest (ROIs) by a learning approach. The method uses the information related to the position and the size of the manually selected ROIs. Above-mentioned U.S. application Ser. No. 12/033,434 also proposes a method for detecting salient parts of an image, but the approach is heavily dependent on the semantic context in which either the image or its thumbnail is used. A visual concept is derived from each image and the ROI that corresponds to that visual concept is sought. Therefore, an image can lead to completely different thumbnails, depending on the context.
  • INCORPORATION BY REFERENCE
  • The following references, the disclosures of which are incorporated herein in their entireties by reference, are mentioned:
  • U.S. Pub. No. 2008/0317358, published Dec. 25, 2008, entitled CLASS-BASED IMAGE ENHANCEMENT SYSTEM, by Marco Bressan, et al., discloses a method for image enhancement, which includes assigning a semantic class to a digital image based on image content, and applying an aesthetic enhancement to the image based on an image quality of the image and the assigned semantic class.
  • U.S. Pub. No. 2007/0005356, entitled GENERIC VISUAL CATEGORIZATION METHOD AND SYSTEM; U.S. Pub. No. 2007/0258648, entitled GENERIC VISUAL CLASSIFICATION WITH GRADIENT COMPONENTS-BASED DIMENSIONALITY ENHANCEMENT; and U.S. Pub. No. 2008/0069456 entitled BAGS OF VISUAL CONTEXT-DEPENDENT WORDS FOR GENERIC VISUAL CATEGORIZATION, all by Florent Perronnin; and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints”, ECCV Workshop on Statistical Learning in Computer Vision, 2004, disclose systems and methods for categorizing images based on content.
  • The following relate to various methods for saliency detection: U.S. Pub. No. 2008/0304740, published Dec. 11, 2008, entitled Salient Object Detection, by Jian Sun, et al.; U.S. Pub. No. 2008/0304708, published Dec. 11, 2008, entitled DEVICE AND METHOD FOR CREATING A SALIENCY MAP OF AN IMAGE, by Olivier Le Meur, et al.; U.S. Pub. No. 2008/0304742, published Dec. 11, 2008, entitled COMBINING MULTIPLE CUES IN A VISUAL OBJECT DETECTOR, by Jonathan H. Connell; U.S. Pub. No. 2006/0093184, published May 4, 2006, entitled IMAGE PROCESSING APPARATUS, by Motofumi Fukui, et al.; and U.S. Pat. No. 7,400,761, issued Jul. 15, 2008, entitled CONTRAST-BASED IMAGE ATTENTION ANALYSIS FRAMEWORK, by Ma, et al.
  • Brief Description
  • In accordance with one aspect of the exemplary embodiment, a method for detecting a region of interest in an image includes, for each image in a dataset of images for which a region of interest has been respectively established, storing a respective dataset image representation based on features extracted from the image. For an original image for which a region of interest is to be detected, the method includes generating an original image representation for the original image based on features extracted from the image, identifying a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, training a classifier with information extracted from the established regions of interest of the subset of similar images and, with the trained classifier, identifying a region of interest in the original image.
  • In another aspect, an apparatus for detecting a region of interest in an image includes memory which stores the dataset image representations, and instructions for performing the above-described method. A processor with access to the instructions and dataset image representations executes the instructions. In another aspect, an apparatus for detecting a region of interest in an image includes memory which, for a dataset of images for which a respective region of interest has been established, stores a set of dataset image representations, each dataset image representation being derived from features extracted from a respective one of the images in the dataset. Memory stores instructions which, for an original image for which a region of interest is to be detected, generate an original image representation for the original image based on features extracted from the original image, identify a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, and train a classifier to identify a region of interest in the original image, the classifier being trained with positive and negative examples, each of the positive examples comprising a high level representation based on features extracted from the established region of interest of a respective one of the subset of similar images and each of the negative examples comprising a high level representation based on features extracted from outside the established region of interest of a respective one of the subset of similar images.
  • In another aspect, a method for detecting a region of interest in an image includes storing a set of image representations, each image representation being based on features extracted from patches of a dataset image, where for each dataset image, the patch features are identified as salient or non-salient based on whether or not the patch is within a manually identified region of interest. For an original image for which a region of interest is to be detected, the method includes generating an original image representation for the original image based on features extracted from patches of the image, computing a distance measure between the original image representation and image representations in the set of image representations to identify a subset of similar image representations from the set of image representations, and training a classifier with positive and negative examples extracted from the images corresponding to the subset of similar image representations, the positive examples each being based on the salient patch features of a respective image and the negative examples being based on non-salient patch features of the respective image. With the trained classifier, a region of interest in the original image is identified based on the patch features of the original image.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a functional block diagram of an apparatus for identifying a region of interest in an image in accordance with one aspect of the exemplary method;
  • FIG. 2 is a flow chart illustrating a method for identifying a region of interest in an image in accordance with one aspect of the exemplary method which may be performed with the apparatus of FIG. 1;
  • FIG. 3 illustrates the images processed during steps of the method;
  • FIG. 4 illustrates substeps of part of the method of FIG. 2;
  • FIG. 5 illustrates substeps of part of the method of FIG. 2;
  • FIG. 6 illustrates patches and windows used in generating a saliency map;
  • FIG. 7 illustrates inputting a salient region into a categorizer which generates a category for the image;
  • FIG. 8 illustrates F-measure values for various saliency detection methods as a function of threshold size;
  • FIG. 9 illustrates Precision, Recall, and F-measure data for an Example comparing the present method (methods A and B, without and with Graph-cut) to comparative methods for saliency detection (methods C,D,E, and F); and
  • FIG. 10 illustrates the displacement of a bounding box around the salient region from a manually assigned bounding box for the exemplary method (method B) and comparative methods C, D, E, and F.
  • DETAILED DESCRIPTION
  • The exemplary embodiment relates to an apparatus and computer-implemented method and computer program product for detecting saliency in an image, such as a natural image, based on similarity of the original image with images for which visually salient regions of pixels are pre-segmented. The method assumes that images sharing similar visual appearance (as determined by comparing computer-generated content-based representations) share the same salient regions. In the exemplary embodiment, saliency detection is approached as a binary classification problem where pre-segmented salient/non salient pixels are available to train and test an algorithm. In one embodiment, the method allows both context dependent and context independent saliency detection within a single framework.
  • With reference to FIG. 1, an exemplary apparatus for salient region detection is illustrated. The apparatus may be embodied in an electronic processing device, such as the illustrated computer 10. In other embodiments, the electronic processing device 10 may include one or more specific or general purpose computing devices, such as a network server, Internet-based server, desk top computer, laptop computer, personal data assistant (PDA), cellular telephone, or the like. The apparatus 10 includes an input component 12, an output component 14, a processor 16, such as a CPU, and memory 18. The computer 10 is configured to implement a salient region detector 20, hosted by the computer 10, for identifying a salient region or regions of an original input image. The salient region detector 20 may be in the form of software, hardware, or a combination thereof. The exemplary salient region detector 20 is stored in memory 18 (e.g., non-volatile computer memory) and comprises instructions for performing the exemplary method described below with reference to FIG. 2. These instructions are executed by the processor 16. A database 22 of previously annotated images (and/or information extracted therefrom) is stored in memory 18 or a separate memory. Components 12, 14, 16, 18 of the computer 10 may be connected for communication with each other by a data/control bus 24. Input and output components may be combined or separate components and may include, for example, data input ports, modems, network connections, and the like.
  • The computer 10 is configured for receiving an original image 30, e.g., via input component 12, and storing the image 30 in memory, such as a volatile portion of computer memory 18, while being processed by the salient region detector 20. The image 30 is transformed by the salient region detector 20, e.g., by cropping or otherwise identifying a salient region or regions 32 of the image. The computer 10 is also configured for storing and/or outputting the salient region 32 generated for the image 30 by the salient region detector 20 and for outputting a transformed image 34 in which the salient region is identified or which comprises a crop of the original image based on the salient region 32, e.g., by the output component 14. In one embodiment, the salient region image data may be cropped from the original image data. A classifier 36, incorporated in the salient region detector or in communication therewith, is fed by the salient region detector with a subset of the database images (or information extracted therefrom) on which the classifier is trained to identify a salient region in an original image.
  • The computer 10 may include or be in data communication with a display 40, such as an LCD screen, or other output device for displaying the salient region 32. Alternatively or additionally, the salient region 32 may be further processed, e.g., by incorporation into a document 42, which is output by the output component 14, or output to a categorizer 44.
  • The input image 30 generally includes image data for an array of pixels forming the image. The image data may include colorant values, such as grayscale values, for each of a set of color separations, such as L*a*b* or RGB, or be expressed in another color space in which different colors can be represented. In general, “grayscale” refers to the optical density value of any single image data channel, however expressed (e.g., L*a*b*, RGB, YCbCr, etc.). The images may be photographs, video images, graphical images (such as freeform drawings, plans, etc.), text images, or combined images which include photographs along with text, and/or graphics, or the like. The images may be received in PDF, JPEG, GIF, JBIG, BMP, TIFF or other common file format used for images and which may optionally be converted to another suitable format prior to processing. Input images may be stored in a virtual portion of memory 18 during processing.
  • The term “color” as used herein is intended to broadly encompass any characteristic or combination of characteristics of the image pixels to be employed in the extraction of features. For example, the “color” may be characterized by one, two, or all three of the red, green, and blue pixel coordinates in an RGB color space representation, or by one, two, or all three of the L, a, and b pixel coordinates in an Lab color space representation, or by one or both of the x and y coordinates of a CIE chromaticity representation, or the like. Additionally or alternatively, the color may incorporate pixel characteristics such as intensity, hue, brightness, etc. Moreover, while the method is described herein with illustrative reference to two-dimensional images such as photographs or video frames, it is to be appreciated that these techniques are readily applied to three-dimensional images as well. The term “pixel” as used herein is intended to denote “picture element” and encompasses image elements of two-dimensional images or of three-dimensional images (which are sometimes also called voxels to emphasize the volumetric nature of the pixels for three-dimensional images).
  • Image 30 can be input from any suitable image source 50, such as a workstation, database, scanner, or memory storage device, such as a disk, camera memory, memory stick, or the like. The image source 50 may be temporarily or permanently communicatively linked to the computer 10 via a wired or wireless link 52, such as a cable, telephone line, local area network or wide area network, such as the Internet, through a suitable input/output (I/O) connection 12, such as a modem, USB port, or the like. In the case of a computer 10, processor 16 may be the computer's central processing unit (CPU). However, it is to be appreciated that the exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like. In general, any processor, capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 2, can be used to implement the method for generating an image representation.
  • Memory 18 may be in the form of separate memories or combined and may be in the form of any type of tangible computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, holographic memory, or suitable combination thereof.
  • With reference to FIG. 2, a method for detecting a salient region of an original image is illustrated. FIG. 3 illustrates graphically the processing of an exemplary image 30 during the method.
  • The method begins at S100.
  • At S102, a large dataset of pre-segmented images 22 is stored. These are images for which the pixels have been identified as either salient or non-salient, based on human interest. The dataset ideally includes a wide variety of images, including images which are similar in content to the image 30 for which a region of interest is to be detected. For example, the dataset may include at least 100, e.g., at least 1000 images, such as at least about 10,000 images, and can be up to 100,000 or more, each dataset image having an established region of interest. In one embodiment, for at least some of the images in the dataset, the pre-segmented region(s) of each image can further be associated with a semantic label referring to the content of the region. For example, a set of label types may be defined, such as animals, faces, people, buildings, automobiles, landscapes, flowers, other, and each image manually assigned one or more of these labels, based on its region of interest.
  • At S104, image representations are generated for each of the images in the dataset. The representations are generally high level representations which are derived from low level features extracted from the image. In one embodiment, the high level representation of each pre-segmented image is based on fusing (e.g., a sum or concatenation) of positive (+ve) and negative (−ve) high level representations, the positive one generated for the salient region (region of interest) of the image, the negative one for the non-salient region (i.e., everywhere except the region of interest). The two high level representations of each of the pre-segmented images may be derived from patch level representations, e.g., Fisher vectors from salient region patches for generating the +ve high level representation and Fisher vectors from patches outside the salient region for the −ve high level representation. As will be appreciated, S104 may be performed prior to input of image 30 and the computed high level +ve and −ve representations stored in memory 18. At this point, storing of the actual images in the dataset 22 may no longer be necessary. Further details of this step are illustrated in FIG. 4 and are described below.
  • At S106, an image 30 for which a visually salient region (which may be referred to herein as a region of interest (ROI)) is to be identified is input and stored in memory.
  • At S108, a representation of the input image is generated (e.g., by the salient region detector 20), based on low level features extracted from patches of the image in a similar manner to that for the pre-segmented images in the data-set except that here, there are no pre-segmented salient regions. Further details of this step are illustrated in FIG. 5 and are described below.
  • At S110, a subset K of images in the dataset of pre-segmented images is identified, based on similarity of their high level representations to that of the original image. In particular, the K-nearest neighbor images may be retrieved from the annotated dataset 22 by the salient region detector 20 using a simple distance measure, such as the L1 norm distance between Fisher signatures of each dataset image (e.g., as a sum of the high level +ve and −ve representations) and the high level representation of the input image (e.g., as a sum of all high level patch representations) e.g., as generated using a global visual vocabulary.
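  • By way of illustration only, the retrieval at S110 can be sketched as a brute-force K-nearest neighbor search under the L1 norm. The function names `l1_distance` and `k_nearest`, the image identifiers, and the three-dimensional toy signatures are hypothetical; real Fisher signatures would have far more dimensions, and a large dataset would likely call for an indexed search rather than a full sort.

```python
from typing import Dict, List, Sequence


def l1_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """L1 (city-block) distance between two equal-length signatures."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def k_nearest(query: Sequence[float],
              dataset: Dict[str, Sequence[float]],
              k: int) -> List[str]:
    """Return the ids of the k dataset signatures closest to the query."""
    ranked = sorted(dataset,
                    key=lambda image_id: l1_distance(query, dataset[image_id]))
    return ranked[:k]


# Toy signatures (hypothetical values, not real Fisher vectors).
signatures = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.8, 0.2, 0.1],
}
neighbors = k_nearest([1.0, 0.0, 0.0], signatures, k=2)
print(neighbors)
```

  • In the label-restricted embodiment described next, the same search would simply be run over the subset of `signatures` whose regions carry the selected label.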
  • Where images have been manually annotated with labels, prior to identifying the subset of K images, a user may be prompted to select one of the label types, or this information may be fed to the salient region detector 20 when the image 30 is input. In this embodiment, the subset of K nearest neighbor images is identified in substantially the same way, but in this case, from among those images having pre-segmented regions labeled with the selected semantic label (assuming there are sufficient images in the dataset with pre-segmented regions annotated with the selected label).
  • At S112, a binary classifier 36 is trained using, as positive examples, the representations of the salient regions of the retrieved K-nearest neighbor images (designated by a “+” in FIG. 3), which may all be concatenated or summed to form a single vector. As negative examples, representations of the non-salient background regions are used (designated by a “−” in FIG. 3), which again, may all be concatenated or summed to form a single vector. The same high level representations can be used by any binary classifier, or alternatively other local patch representations can be considered in another embodiment.
  • In the case where it is desired that a context-dependent salient region of the original image be identified, then when there are multiple salient regions in a nearest neighbor image, only the one(s) labeled with the selected label are considered as salient regions and used in generating the +ve representation. The rest of the image is considered non-salient.
  • At S114, the trained classifier 36 is used to output a saliency probability for each patch of the original image extracted at S108.
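  • By way of illustration only, S112 and S114 can be sketched with a logistic regression classifier trained by batch gradient descent. This choice of classifier is an assumption made for the sketch; the text above states only that any binary classifier may be used. The function names, the learning rate, the number of epochs, and the two-dimensional toy patch representations are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples: Sequence[Tuple[Sequence[float], int]],
                   epochs: int = 500,
                   lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss.

    Each example is (representation, label) with label 1 for salient
    (+ve) and 0 for non-salient (-ve).
    """
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in examples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def saliency_probability(x: Sequence[float],
                         w: Sequence[float], b: float) -> float:
    """Probability that a patch representation x is salient."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical representations taken from the K nearest neighbor images.
training = [
    ([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.7, 0.1], 1),   # salient regions
    ([0.1, 0.9], 0), ([0.2, 0.8], 0), ([0.1, 0.7], 0),   # background regions
]
w, b = train_logistic(training)
p_salient = saliency_probability([0.85, 0.15], w, b)
p_background = saliency_probability([0.15, 0.85], w, b)
```

  • Because the classifier is retrained for every original image from only its K neighbors, a small and fast model of this kind is a natural fit, although the sketch makes no claim about the classifier actually used in the exemplary embodiment.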
  • At S116, based on the saliency probabilities, a region of interest of the original image is identified by the salient region detector 20. This step may include generating a saliency map 56 (FIG. 3).
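  • By way of illustration only, one simple way to turn per-patch probabilities into a saliency map is to average, at each pixel, the probabilities of all patches that cover it, and then take the bounding box of the pixels above a threshold. The plain averaging, the 0.5 threshold, and the names `saliency_map` and `bounding_box` are assumptions of this sketch, not details taken from the text.

```python
from typing import List, Optional, Sequence, Tuple

Patch = Tuple[int, int, int, int]  # (left, top, width, height) in pixels


def saliency_map(width: int, height: int,
                 patches: Sequence[Patch],
                 probs: Sequence[float]) -> List[List[float]]:
    """Average, at each pixel, the probabilities of all patches covering it."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (left, top, w, h), p in zip(patches, probs):
        for y in range(max(top, 0), min(top + h, height)):
            for x in range(max(left, 0), min(left + w, width)):
                total[y][x] += p
                count[y][x] += 1
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)] for y in range(height)]


def bounding_box(sal_map: Sequence[Sequence[float]],
                 threshold: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Smallest (left, top, width, height) box enclosing pixels above threshold."""
    points = [(x, y) for y, row in enumerate(sal_map)
              for x, value in enumerate(row) if value > threshold]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Three overlapping 4x4 patches on an 8x8 image (toy values).
patches = [(0, 0, 4, 4), (2, 2, 4, 4), (4, 4, 4, 4)]
probs = [0.2, 0.9, 0.2]
sal = saliency_map(8, 8, patches, probs)
roi = bounding_box(sal, threshold=0.5)
print(roi)
```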
  • At S118 the saliency map may be refined by the salient region detector 20, e.g., with graph-cut segmentation to refine the salient region, as illustrated at 58 in FIG. 3.
  • At S120, the transformed image, e.g., a crop of the image based on the salient region or an image in which the salient region is identified by the salient region detector 20, e.g., by annotations such as HTML tags, is output.
  • At S122, further processing may be performed on the transformed image, e.g., the image crop based on the salient region may be displayed or incorporated into a document, e.g., placed in a predetermined placeholder location in a text document or sent to a categorizer 44 for assigning an object class to the image 30.
  • The method ends at S124.
  • There are several advantages to the exemplary method and apparatus. Unlike prior saliency detection methods which rely solely on the content of the image to generate a saliency map, the present apparatus and method take advantage of a process which allows image saliency to be learned using (previously annotated) visually similar example images. Additionally, segmentation strategies can be advantageously employed for saliency detection. Further, the method is generic in the sense that it does not need to be tied to any specific category of images (e.g., faces), but allows a more broad concept of visual similarity, while at the same time, being readily adaptable to consideration of context. Finally, while the exemplary method has been described with particular reference to photographic (natural) images, the method is applicable to other types of images, such as medical or text document images, assuming that appropriate annotated data is available.
  • Further details of the apparatus and method will now be described.
  • Dataset Image Annotation: (S102)
  • Referring once more to FIG. 1, a variety of methods exist for identifying salient regions 60 for the images 62 in the dataset 22. In one embodiment, one or more human observers look at each image, e.g., on a computer screen, and identify a salient region (a region which the observer considers to be the most interesting). For example, the user may generate a bounding box which encompasses the salient region. Alternatively, the observer may identify a region or regions of interest by moving the cursor around the region(s) to generate a bounded region, which may then be processed, for example, by automatically creating a bounding box which encompasses the bounded region. In other embodiments, eye gaze data may be employed to identify a region of interest. In this embodiment, an eye gaze tracking device tracks eye movements of the observer while viewing the image for a short period of time. The tracking data is superimposed on the image to identify the region of interest. The identified regions/observations of several users may be combined to generate an overall region of interest for the image. The image 62 can then be segmented into a salient region 64 and a non salient region 66, based on the identified region of interest. The image may then be annotated with the segmentation information, e.g., by applying an HTML tag or by storing the segmentation in a separate file. Furthermore, the salient region may be associated with a semantic concept (by annotating the salient region or entire image with a label). Thus, in the exemplary embodiment, the existence of a set D of images {I_1, . . . , I_d, . . . , I_D} representing a wide variety of subjects is assumed for building the dataset. It can also be assumed that each image I_d has been manually annotated by specifying one (or more) rectangular Region of Interest (ROI) per image (e.g., r_d(x, y, w, h), centered at (x, y), with width and height dimensions w and h) or with a more general map containing the annotated salient region(s) and optionally with an associated semantic label.
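  • By way of illustration only, a center-based rectangular ROI (x, y, w, h) of the kind described above can be converted to corner coordinates and to a binary salient/non-salient map as follows. The function names and the convention of testing each pixel at its center are assumptions of this sketch.

```python
from typing import List, Tuple


def roi_to_corners(x: float, y: float,
                   w: float, h: float) -> Tuple[float, float, float, float]:
    """Convert a center-based ROI (x, y, w, h) to (left, top, right, bottom)."""
    return (x - w / 2.0, y - h / 2.0, x + w / 2.0, y + h / 2.0)


def roi_mask(width: int, height: int,
             roi: Tuple[float, float, float, float]) -> List[List[int]]:
    """Binary map: 1 where the pixel center falls inside the ROI, else 0."""
    left, top, right, bottom = roi_to_corners(*roi)
    return [[1 if left <= px + 0.5 < right and top <= py + 0.5 < bottom else 0
             for px in range(width)]
            for py in range(height)]


# A 4x2 ROI centered at (3, 2) inside a 6x4 image.
mask = roi_mask(6, 4, (3.0, 2.0, 4.0, 2.0))
for row in mask:
    print(row)
```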
  • Feature Extraction: (S104, S108)
  • As shown in FIG. 4, S104 may include the following substeps for each image 62 in the dataset 22:
  • At S104 a, patches 70A,B,C, etc., 72A,B,C,D, and 74 are extracted from the image, e.g., at multiple scales. This is illustrated for a portion of the image 62 in FIG. 6, showing patches (unbroken lines) at three scales by way of example, where the arrows point roughly to the centers of the respective patches.
  • At S104 b, for each patch, low level features are extracted.
  • At S104 c, for each patch, a representation of the patch (e.g., a Fisher vector) may be generated, based on the low level features.
  • At S104 d, patches are designated as salient or non salient, depending on whether they are within the pre-segmented region or not. Various methods may be used to determine whether a patch is to be considered to be “within” the salient region. In one embodiment, a threshold degree of overlap may be sufficient for a patch to be considered within the salient region. In the exemplary embodiment, the overlap is computed relative to the area of the patch, e.g., if 50% or more of the patch is within the salient region, then it is accepted as being within it. If the region of interest is too small, relative to the size of the patch (e.g., ROI is less than 70% of the patch area), then the patch will not be considered. In other embodiments, a patch is considered to be within the salient region if its geometric center lies within the salient region. In yet another embodiment, the patch is considered to be within the salient region if it is entirely encompassed by or entirely encompasses the salient region.
  • At S104 e, a high level +ve representation of the salient region of the image is extracted, based on the patch representations (e.g., Fisher vectors, or simply, low level features) of all the salient patches and a high level −ve representation of the image is extracted, based on the patch representations (e.g., Fisher vectors, or simply, low level features) of all the non-salient patches. As noted above, salient patches may be considered to be patches which are at least partially overlapping the salient region 60. These +ve and −ve representations are referred to herein as Fisher FG vector and Fisher BG vector, respectively, even though they do not necessarily correspond to what would be considered as the foreground and background regions of an image.
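  • By way of illustration only, the overlap rule of the exemplary embodiment at S104 d (at least 50% of the patch inside the ROI, and the patch disregarded when the ROI is smaller than 70% of the patch area) can be sketched as follows. The function names and the use of `None` to mark a disregarded patch are assumptions of this sketch.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def area(box: Box) -> float:
    """Area of a box; zero if the box is empty or inverted."""
    left, top, right, bottom = box
    return max(0.0, right - left) * max(0.0, bottom - top)


def intersection(a: Box, b: Box) -> Box:
    """Overlap of two boxes (may be empty, in which case its area is zero)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def label_patch(patch: Box, roi: Box,
                min_overlap: float = 0.5,
                min_roi_ratio: float = 0.7) -> Optional[bool]:
    """Return True (salient), False (non-salient), or None (patch disregarded)."""
    patch_area = area(patch)
    if patch_area <= 0.0:
        raise ValueError("patch must have positive area")
    if area(roi) < min_roi_ratio * patch_area:
        return None  # ROI too small relative to this patch
    return area(intersection(patch, roi)) / patch_area >= min_overlap


roi = (10.0, 10.0, 50.0, 50.0)
print(label_patch((8.0, 8.0, 28.0, 28.0), roi))   # mostly inside the ROI
print(label_patch((0.0, 0.0, 20.0, 20.0), roi))   # mostly outside the ROI
print(label_patch((0.0, 0.0, 60.0, 60.0), roi))   # patch much larger than ROI
```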
  • At S104 f, a high level representation of the image is generated, e.g., as a feature vector, e.g., a Fisher vector-based Image Signature, for example, by concatenation or other function of the +ve and −ve high level representations (Fisher FG vector and Fisher BG vector).
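  • By way of illustration only, S104 e and S104 f can be sketched by summing the patch representations of the salient and non-salient patches separately and then fusing the two results by concatenation or by summation. Summing the patch vectors to form each region-level vector, and the names `sum_vectors` and `image_signature`, are assumptions of this sketch; the toy two-dimensional vectors stand in for real Fisher vectors.

```python
from typing import List, Sequence


def sum_vectors(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise sum of vectors; a zero vector if the list is empty."""
    total = [0.0] * dim
    for vec in vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total


def image_signature(patch_vectors: Sequence[Sequence[float]],
                    salient_flags: Sequence[bool],
                    fuse: str = "concat") -> List[float]:
    """Fuse the +ve (salient) and -ve (non-salient) representations."""
    dim = len(patch_vectors[0])
    fg = sum_vectors([v for v, s in zip(patch_vectors, salient_flags) if s], dim)
    bg = sum_vectors([v for v, s in zip(patch_vectors, salient_flags) if not s], dim)
    if fuse == "concat":
        return fg + bg                        # length 2 * dim
    if fuse == "sum":
        return [a + b for a, b in zip(fg, bg)]  # length dim
    raise ValueError("fuse must be 'concat' or 'sum'")


patch_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
salient_flags = [True, True, False]
sig_concat = image_signature(patch_vectors, salient_flags, fuse="concat")
sig_sum = image_signature(patch_vectors, salient_flags, fuse="sum")
```

  • The summed form has the same dimensionality as a signature built from all patches of an unsegmented image, which is what allows it to be compared directly with the representation of the original image.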
  • A similar procedure may be followed for the original image 30, as shown in FIG. 5: At S108 a, patches are extracted from the image, e.g., at multiple scales.
  • At S108 b, for each patch, low level features are extracted, e.g., as a features vector.
  • At S108 c, for each patch, a representation (e.g., Fisher vector) may be generated, based on the extracted low level features.
  • At S108 d, a high level representation of the image is extracted, based on the patch representations or low level features. In the exemplary embodiment, the high level representation is a vector (e.g., a Fisher vector-based Image Signature) formed by concatenation or other function of the patch level Fisher vectors.
  • While the exemplary embodiment is described herein with respect to Fisher vectors, various methods exist for generation of a high level representation of an image, which may be implemented as an alternative to the high level representation in the exemplary method, e.g., a Bag-of-Visual words (BOV) representation of the image as disclosed, for example, in above-mentioned U.S. Pub. Nos. 2007/0005356; 2007/0258648; 2008/0069456; the disclosures of which are incorporated herein by reference, and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints,” ECCV Workshop on Statistical Learning in Computer Vision (2004); also the method of Y. Liu, D. S. Zhang, G. Lu, W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” in Pattern Recognition, 40 (1) (2007); as well as that of F. Perronnin and C. Dance, “Fisher kernel on visual vocabularies for image categorization,” In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Minneapolis, Minn., USA. (June 2007). This last reference and U.S. Pub. No. 2007/0258648 are collectively referred to herein as “Perronnin and Dance” and describe a Fisher kernel (FK) representation based on Fisher vectors, which is similar in many respects to the Fisher Signature described herein.
  • Further details of the steps S104 and S108 now follow.
  • In the exemplary embodiment, multiple patches are extracted from the image (original or dataset image) at various scales (S104 a, S108 a). For each patch, low level features are extracted (S104 b, S108 b). The low level features which are extracted from the patches are typically quantitative values that summarize or characterize aspects of the respective patch, such as spatial frequency content, an average intensity, color characteristics (in the case of color images), gradient values, and/or other characteristic values. In some embodiments, at least about fifty low level features are extracted from each patch; however, the number of features that can be extracted is not limited to any particular number or type of features. For example, 1000 or 1 million low level features could be extracted, depending on computational capabilities. In the exemplary embodiment, the low level features include local (e.g., pixel) color statistics and texture. For color statistics, local RGB statistics (e.g., mean and standard deviation) may be computed. For texture, gradient orientations (representing a change in color) may be computed for each patch as a histogram (SIFT-like features). In the exemplary embodiment, two (or more) types of low level features, such as color and texture, are separately extracted and the high level representation of the patch or image is based on a combination (e.g., a sum or a concatenation) of two Fisher Vectors, one for each feature type.
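  • By way of illustration only, the local RGB statistics mentioned above might be computed per patch as in the following minimal sketch. The function name `rgb_statistics` and the representation of a patch as a list of (r, g, b) tuples are assumptions made for this example and are not part of the exemplary embodiment.

```python
from statistics import fmean, pstdev

def rgb_statistics(patch):
    """Return [mean_r, mean_g, mean_b, std_r, std_g, std_b] for one patch.

    `patch` is assumed to be a list of (r, g, b) pixel tuples (hypothetical layout).
    """
    channels = list(zip(*patch))          # one tuple of values per color channel
    means = [fmean(c) for c in channels]  # per-channel mean
    stds = [pstdev(c) for c in channels]  # per-channel (population) standard deviation
    return means + stds

# Tiny three-pixel patch used only to exercise the function.
patch = [(10, 20, 30), (20, 20, 50), (30, 20, 70)]
features = rgb_statistics(patch)
```

  • The resulting six values form a color features vector for the patch; a texture descriptor would be computed separately and handled by its own vocabulary.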
  • In other embodiments, Scale Invariant Feature Transform (SIFT) descriptors (as described by Lowe, in “Object Recognition From Local Scale-Invariant Features,” ICCV (International Conference on Computer Vision), 1999) are computed on each patch. SIFT descriptors are multi-image representations of an image neighborhood, such as Gaussian derivatives computed at, for example, eight orientation planes over a four-by-four grid of spatial locations, giving a 128-dimensional vector (that is, 128 features per features vector in these embodiments). Other descriptors or feature extraction algorithms may be employed to extract features from the patches. Examples of some other suitable descriptors are set forth by K. Mikolajczyk and C. Schmid, in “A Performance Evaluation Of Local Descriptors,” Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Madison, Wis., USA, June 2003, which is incorporated in its entirety by reference.
  • A feature vector can be employed to characterize each patch. The feature vector can be a simple concatenation of the low level features. In the exemplary embodiment, the extracted low level features can be used to generate a high level representation of the patch (e.g., a Fisher vector) (S104 c, S108 c). In this embodiment, a visual vocabulary is built for each feature type using Gaussian Mixture Models. Modeling the visual vocabulary in the feature space with a GMM may be performed according to the method described in F. Perronnin, C. Dance, G. Csurka and M. Bressan, “Adapted Vocabularies for Generic Visual Categorization,” In ECCV (2006).
  • Each patch is then characterized (at S104 c, S108 c) with a gradient vector derived from a generative probability model. In the present case, the visual vocabulary is modeled by a Gaussian mixture model in a low level feature space where each Gaussian corresponds to a visual word. Let λ={wi, μi, Σi, i=1 . . . N} denote the set of parameters of the GMM, where N denotes the number of Gaussians and wi, μi and σi are respectively the weight, mean vector, and variance vector represented by the diagonal covariance matrix Σi of Gaussian i. The GMM vocabulary is trained using maximum likelihood estimation (MLE) considering all or a random subset of the low level descriptors extracted from the annotated dataset 22.
  • Given a new low level descriptor xt (such as a color or texture feature vector), the probability that it was generated by the GMM is
  • $$p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i\, p_i(x_t \mid \lambda), \quad \text{where:} \quad p_i(x_t) = \frac{\exp\left\{-\frac{1}{2}(x_t-\mu_i)^{\top}\Sigma_i^{-1}(x_t-\mu_i)\right\}}{(2\pi)^{D/2}\,|\Sigma_i|^{1/2}}$$
  • Perronnin and Dance show that the partial derivatives of the log-likelihood log p(xt|λ) with respect to the GMM parameters can be computed by the following formulas:
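  • $$\frac{\partial \log p(x_t \mid \lambda)}{\partial \mu_i^d} = \gamma_i(x_t)\left[\frac{x_t^d-\mu_i^d}{(\sigma_i^d)^2}\right], \qquad \text{Eqn. (1)}$$
  • $$\frac{\partial \log p(x_t \mid \lambda)}{\partial \sigma_i^d} = \gamma_i(x_t)\left[\frac{(x_t^d-\mu_i^d)^2}{(\sigma_i^d)^3}-\frac{1}{\sigma_i^d}\right]. \qquad \text{Eqn. (2)}$$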
  • where the superscript d denotes the d-th dimension of a vector and γi(xt) is the occupancy probability given by
  • $$\gamma_i(x_t) = \frac{w_i\, p_i(x_t)}{\sum_{j=1}^{N} w_j\, p_j(x_t)}.$$
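  • As a minimal sketch of the mixture likelihood and the occupancy probability defined above, assuming diagonal covariances stored as per-dimension standard deviations, the computation could look as follows. The function names `gaussian_diag` and `occupancy` and the list-based data layout are hypothetical choices made for this example.

```python
import math

def gaussian_diag(x, mu, sigma):
    """Density p_i(x) of a Gaussian with diagonal covariance.

    `sigma` holds the per-dimension standard deviations (assumed layout).
    """
    log_p = 0.0
    for xd, md, sd in zip(x, mu, sigma):
        log_p += -0.5 * ((xd - md) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)
    return math.exp(log_p)

def occupancy(x, weights, mus, sigmas):
    """Return (gamma, p) where gamma[i] is the occupancy probability of
    Gaussian i for descriptor x and p is the mixture likelihood p(x | lambda)."""
    joint = [w * gaussian_diag(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
    total = sum(joint)
    return [j / total for j in joint], total

# Two-word toy vocabulary in a two-dimensional feature space.
weights = [0.5, 0.5]
mus = [[0.0, 0.0], [4.0, 4.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
gamma_near, _ = occupancy([0.0, 0.0], weights, mus, sigmas)
gamma_mid, _ = occupancy([2.0, 2.0], weights, mus, sigmas)
```

  • A descriptor lying on the first mean is assigned almost entirely to the first visual word, while one lying midway between the two means is shared equally.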
  • In the exemplary embodiment, only the gradient with respect to the mean and standard deviation is used as it was shown in Perronnin and Dance that the gradient with respect to the mixture weights does not contain significant information. The Fisher gradient vector ft of the descriptor xt is then just the concatenation of the partial derivatives in Equations (1) and (2), leading to a 2×D×N dimensional vector, where D is the dimension of the low level feature space. While the Fisher vector is high dimensional, it can be made relatively sparse as only a small number of components have non-negligible values. In the following description, the Fisher Vector of a set of descriptors X={xt, t=1 . . . T} is defined as the sum of individual Fisher Vectors:
  • $$f_X = \sum_{t=1}^{T} f_t \qquad \text{Eqn. (3)}$$
  • This vector can be directly derived from the independence assumption:
  • $$\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(x_t \mid \lambda)$$
  • of the set's log-likelihood and can be interpreted as the direction in which parameters should be modified to best fit the dataset (see Perronnin and Dance for further details).
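  • The following sketch illustrates how Equations (1) to (3) could be evaluated for a set of descriptors under a diagonal-covariance GMM. It is an illustration under stated assumptions rather than the implementation of the exemplary embodiment: the function name `fisher_vector`, the list-based layout, and the ordering of the concatenation (all mean gradients first, then all standard deviation gradients) are choices made here.

```python
import math

def fisher_vector(descriptors, weights, mus, sigmas):
    """Sum over descriptors of the gradients in Eqns. (1) and (2).

    Returns a flat list of length 2 * D * N (mean part, then sigma part).
    `sigmas` holds per-dimension standard deviations (assumed layout).
    """
    n, dim = len(weights), len(mus[0])
    grad_mu = [[0.0] * dim for _ in range(n)]
    grad_sigma = [[0.0] * dim for _ in range(n)]
    for x in descriptors:
        # w_i * p_i(x) for every Gaussian of the vocabulary
        joint = []
        for w, mu, sg in zip(weights, mus, sigmas):
            log_p = sum(
                -0.5 * ((xd - md) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)
                for xd, md, sd in zip(x, mu, sg)
            )
            joint.append(w * math.exp(log_p))
        total = sum(joint)
        for i in range(n):
            gamma = joint[i] / total              # occupancy probability
            for d in range(dim):
                diff = x[d] - mus[i][d]
                s = sigmas[i][d]
                grad_mu[i][d] += gamma * diff / s ** 2                      # Eqn. (1)
                grad_sigma[i][d] += gamma * (diff ** 2 / s ** 3 - 1.0 / s)  # Eqn. (2)
    return [v for row in grad_mu for v in row] + [v for row in grad_sigma for v in row]

# Single unit Gaussian in one dimension: gradients can be checked by hand.
fv_single = fisher_vector([[2.0]], [1.0], [[0.0]], [[1.0]])
fv_pair = fisher_vector([[1.0], [-1.0]], [1.0], [[0.0]], [[1.0]])
```

  • With a single unit Gaussian, a descriptor at 2 pulls the mean upward and the standard deviation outward, whereas two descriptors at +1 and −1 are already fitted and yield a zero gradient.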
  • Considering the gradient of the log-likelihood of each patch with respect to the parameters of the Gaussian Mixture leads to a high level representation of the patch which is referred to as a Fisher vector. The dimensionality of the Fisher vector can be reduced to a fixed value, such as 50 or 100 dimensions, using principal component analysis. In the exemplary embodiment, since there are two vocabularies, the two Fisher vectors are concatenated or otherwise combined to form a single high level representation of the patch having a fixed dimensionality.
  • As will be appreciated, rather than Fisher vectors, other features-based representations can be used to represent each patch, such as a set of features, a two- or more-dimensional array of features, or the like.
  • The high level representation of the original image (Fisher Image Signature) can then be generated from the patch feature vectors (e.g., the patch Fisher vectors) (S104 f, S108 d).
  • In the case of the dataset images, the patches are labeled according to their overlap with the manually designated salient regions. This leads to two sets of low level features, X+ and X−, referring respectively to the set of patches that are considered salient and the set of those which are non-salient. Using Equation (3), two Fisher vectors fX+ and fX− are computed. These two vectors are then stored as indexes in the database and are, in the exemplary embodiment, the only information from the dataset images needed to process a new image.
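  • A minimal sketch of this labeling step follows, assuming axis-aligned boxes given as (x0, y0, x1, y1) and a hypothetical rule under which a patch is salient when at least half of its area lies inside the annotated region. The threshold `min_overlap` and the function names are assumptions of this example; the source states only that labeling depends on the overlap.

```python
def overlap_fraction(patch, roi):
    """Fraction of the patch area lying inside the annotated region.

    Boxes are (x0, y0, x1, y1) tuples (assumed layout).
    """
    w = min(patch[2], roi[2]) - max(patch[0], roi[0])
    h = min(patch[3], roi[3]) - max(patch[1], roi[1])
    if w <= 0 or h <= 0:
        return 0.0
    area = (patch[2] - patch[0]) * (patch[3] - patch[1])
    return (w * h) / area

def split_patches(patches, roi, min_overlap=0.5):
    """Split patches into (salient, non_salient) lists; min_overlap is hypothetical."""
    salient, non_salient = [], []
    for patch in patches:
        target = salient if overlap_fraction(patch, roi) >= min_overlap else non_salient
        target.append(patch)
    return salient, non_salient

roi = (10, 10, 50, 50)
patches = [(0, 0, 20, 20), (20, 20, 40, 40), (40, 40, 60, 60), (60, 60, 80, 80)]
salient, non_salient = split_patches(patches, roi)
```

  • The descriptors of the two resulting groups would then be accumulated separately, per Equation (3), to give the stored pair of Fisher vectors.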
  • In the exemplary embodiment, each original image 30 and each of the K nearest neighbor images 62 is represented by a high level representation which is simply the concatenation of two Fisher Vectors, one for texture and one for color, each vector formed by averaging the Fisher Vectors of the patches. This single vector is referred to herein as a Fisher image signature. In other embodiments, the patch level Fisher vectors may be otherwise fused, e.g., by concatenation, dot product, or other combination of patch level Fisher vectors to produce an image level Fisher vector.
  • In the exemplary embodiment, initialization proceeds as follows. From each image Id, a set of patches P={p1(d), . . . , ps(d)} is extracted at multiple scales. Each patch is then labeled as salient ps+(d) or non-salient ps−(d) according to its position with respect to the annotated region of interest rd (S104 d). For each image in D, a pair of signatures <F+(d),F−(d)> is created, which is composed, respectively, of the representation of the collection of salient patch descriptors F+(d) and of non-salient patch descriptors F−(d). The pair of signatures is stored in the saliency database 22.
  • For the original image, a Fisher image signature FY is computed in an analogous way with respect to the initialization phase, except that all patches of the image are used to compute the signature (S104 d).
  • As will be appreciated, the Fisher image signature is exemplary of types of high level representation which can be used herein. Other image signatures used in the literature for image retrieval may alternatively be used, as discussed above, such as a Bag-of-Visual Words (BOV) representation or Fisher kernel (FK).
  • Retrieval of Similar Images: (S110)
  • Based on the high level representation of the original image, the most similar images are retrieved from the dataset where, for each image, a manually annotated ROI is available, as described above. The K nearest neighbors are identified, based on the distance metric, where K may be, for example, at least 10, and up to about 50 or 100. In general, performance is not appreciably improved when K is above about 20-30, so a suitable subset contains about 20-30 images, which may represent, for example, less than 20%, e.g., no more than about 10% of the number of images in the dataset, and in one embodiment, no more than about 1% or 0.2% thereof.
  • In the exemplary embodiment, the retrieval of a set of K images from D which are visually similar to In generates a list of signatures <FX+,FX−> associated with the K most similar images to In. For example, for each image in the dataset, a distance metric is computed between the global Fisher image signature obtained by summing FX+ and FX− (or other high level image representation) and that of the original image FY. In one embodiment, the K most similar images are retrieved using the Fisher image signature with the normalized L1 distance measure as described, for example, in S. Clinchant, J.-M. Renders and G. Csurka, “Trans-Media Pseudo-Relevance Feedback Methods in Multimedia Retrieval,” Advances in Multilingual and Multimodal Information Retrieval, 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, Sep. 19-21, 2007, LNCS 5152 (2008).
  • As noted above, a set of local image patches is extracted from the original image and, from these, the descriptor set Y={y1, y2, . . . , yM} and the corresponding Fisher vector fY are computed. To compute the similarities between two images, a normalized L1 measure can be used to retrieve similar images:
  • $$\mathrm{sim}_{NL1}(X,Y) = -\left\|\hat{f}_X-\hat{f}_Y\right\|_{L1} = -\sum_i \left|\hat{f}_X^{\,i}-\hat{f}_Y^{\,i}\right| \qquad \text{Eqn. (4)}$$
  • where $\hat{f}$ is the vector f normalized so that its L1 norm is equal to 1, $\hat{f}^{\,i}$ are the elements of the vector $\hat{f}$, and fX=fX++fX− (as the set of descriptors in image X is the union of salient and non-salient patches). In the exemplary embodiment, the distance measure used is the L1 norm distance between the Fisher Image Signatures of each dataset image and the input image. However, other distance measures, such as Euclidean distance, chi2 distance, or the like, may alternatively be used for identifying a subset of similar images from the dataset.
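  • The retrieval step built on Equation (4) can be sketched as below. The function names and the representation of the database as a list of (image_id, f_plus, f_minus) tuples are hypothetical; only the normalized L1 similarity and the summation of the two stored signatures are taken from the description above.

```python
def l1_normalize(f):
    """Scale f so that its L1 norm equals 1 (zero vectors are returned unchanged)."""
    norm = sum(abs(v) for v in f)
    return [v / norm for v in f] if norm > 0 else list(f)

def sim_nl1(fx, fy):
    """Normalized L1 similarity of Eqn. (4): 0 is most similar, -2 least similar."""
    hx, hy = l1_normalize(fx), l1_normalize(fy)
    return -sum(abs(a - b) for a, b in zip(hx, hy))

def retrieve(query, database, k):
    """Return the ids of the k dataset images most similar to the query signature.

    `database` is assumed to be a list of (image_id, f_plus, f_minus) tuples.
    """
    scored = []
    for image_id, f_plus, f_minus in database:
        f_x = [a + b for a, b in zip(f_plus, f_minus)]  # global signature fX = fX+ + fX-
        scored.append((sim_nl1(f_x, query), image_id))
    scored.sort(reverse=True)                           # highest similarity first
    return [image_id for _, image_id in scored[:k]]

database = [
    ("a", [1.0, 0.0], [0.0, 1.0]),
    ("b", [4.0, 0.0], [0.0, 0.0]),
    ("c", [1.0, 0.0], [0.0, 3.0]),
]
nearest = retrieve([2.0, 2.0], database, k=2)
```

  • Because both vectors are L1-normalized before comparison, the similarity is insensitive to the number of patches contributing to each signature.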
  • Classification: (S112, S114)
  • The classifier 36 is trained using the Fisher Vector representations of image patches extracted from the retrieved K-nearest neighbor images. For the K-nearest neighbor images retrieved, manually annotated salient regions are available, e.g., in the form of bounding boxes. Therefore, in each annotated image, the system considers as positive (i.e., salient) patches the ones inside the annotated bounding box, and as negative (i.e., non-salient) all the others. For each retrieved image Xj, a Foreground Fisher vector (FG signature) $f_{X_j^+}$ is/has been computed by averaging the Fisher Vectors of the +ve patches and a Background Fisher Vector (BG signature) $f_{X_j^-}$ is/has been computed by averaging over the −ve patches. Then, all Fisher vectors representing salient regions are collected (summed) and all Fisher vectors representing non-salient regions are collected (summed) over the K most similar retrieved images, leading to a foreground Fisher model and a background Fisher model:
  • $$f_{FG} = \sum_{j=1}^{K} f_{X_j^+} \quad \text{and} \quad f_{BG} = \sum_{j=1}^{K} f_{X_j^-} \qquad (5)$$
  • In another embodiment, where the aim is context dependent saliency detection, the patches are designated as positives only if they are within the salient regions labeled with the target concept. Otherwise they are considered negatives. Therefore, while in the context-independent case $f_{X_j^+}$ and $f_{X_j^-}$ need not be recomputed (they correspond to the values in the stored signatures <FX+,FX−>), in the context-dependent case, these values may be re-computed on-line as the set of positive and negative patches may be different (if multiple objects were designated as salient regions in the image and have different labels).
  • In the exemplary embodiment, for each original image patch representation (Fisher vector), a saliency score is computed based on the foreground Fisher model and on the background Fisher model. For example, a patch xi is considered salient, if its normalized L1 distance to the foreground Fisher model is smaller than to the background Fisher model:

  • $$\left\|\hat{f}_{x_i}-\hat{f}_{FG}\right\|_{L1}-\left\|\hat{f}_{x_i}-\hat{f}_{BG}\right\|_{L1}<0$$
  • Such a classifier can be too dependent on a single local patch which makes it locally unstable. Therefore, in order to increase the model's robustness, instead of considering a single patch the Fisher vectors may be averaged over a neighborhood N of patches:
  • $$f_{\mathcal{N}} = \sum_{x_i \in \mathcal{N}} f_i \qquad \text{Eqn. (6)}$$
  • Furthermore, the binary classifier score may be replaced with a non-binary score which is a simple function of the normalized L1 distances:
  • $$s(\mathcal{N}) = \left\|\hat{f}_{\mathcal{N}}-\hat{f}_{FG}\right\|_{L1}-\left\|\hat{f}_{\mathcal{N}}-\hat{f}_{BG}\right\|_{L1} \qquad \text{Eqn. (7)}$$
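  • Equations (5) to (7) can be illustrated together with the short sketch below. The helper names are hypothetical. Since every vector is L1-normalized before distances are taken, summing or averaging the patch vectors of a neighborhood gives the same score. The sign convention follows the equation as printed, so that a negative score indicates a neighborhood closer to the foreground model.

```python
def l1_normalize(f):
    """Scale f so that its L1 norm equals 1 (zero vectors are returned unchanged)."""
    norm = sum(abs(v) for v in f)
    return [v / norm for v in f] if norm > 0 else list(f)

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def vector_sum(vectors):
    """Element-wise sum of equally sized vectors (Eqns. (5) and (6))."""
    return [sum(column) for column in zip(*vectors)]

def saliency_score(neighborhood_vectors, f_fg, f_bg):
    """Score of Eqn. (7) as printed: negative means closer to the foreground model."""
    f_n = l1_normalize(vector_sum(neighborhood_vectors))
    return l1_distance(f_n, l1_normalize(f_fg)) - l1_distance(f_n, l1_normalize(f_bg))

# Foreground and background models summed over two retrieved neighbor images.
f_fg = vector_sum([[3.0, 1.0], [5.0, 1.0]])
f_bg = vector_sum([[1.0, 3.0], [1.0, 5.0]])
score_fg_like = saliency_score([[2.0, 0.0], [2.0, 1.0]], f_fg, f_bg)
score_neutral = saliency_score([[1.0, 1.0]], f_fg, f_bg)
```

  • The first neighborhood coincides with the normalized foreground model and receives a strongly negative score, while the second is equidistant from both models and scores approximately zero.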
  • Finally, to build a “saliency map” S, it could be considered that each pixel in the neighborhood region $\mathcal{N}$ takes the value $S_{\mathcal{N}} = S(\mathcal{N})$. However, this may not be a good strategy, especially if overlapping regions are considered (see below). Accordingly, the value $S_{\mathcal{N}}$ can be assigned to the center pixel of each region $\mathcal{N}$, and the values between these centers can then either be interpolated or propagated with a Gaussian. The latter can be done by averaging over all Gaussian weighted scores:
  • $$s(p) = \frac{\sum_{\mathcal{N}} s_{\mathcal{N}}\, w_{\mathcal{N}}(p)}{\sum_{\mathcal{N}} w_{\mathcal{N}}(p)} \qquad \text{Eqn. (8)}$$
  • where $w_{\mathcal{N}}(p)$ is the value at pixel p of the Gaussian centered on the geometrical center of the region $\mathcal{N}$. In one embodiment, a diagonal isotropic covariance matrix may be used, with values (0.6*R)², R² being the size of $\mathcal{N}$.
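  • A minimal sketch of the Gaussian propagation of Equation (8) is given below, assuming each window is described by a hypothetical tuple (cx, cy, R, score) holding its center, side length, and saliency score. The isotropic variance (0.6*R)² follows the embodiment described above; the function name and data layout are assumptions.

```python
import math

def saliency_at_pixel(p, windows, scale=0.6):
    """Gaussian-weighted average of window scores at pixel p = (x, y), per Eqn. (8).

    `windows` is assumed to be a list of (cx, cy, R, score) tuples.
    """
    numerator = denominator = 0.0
    for cx, cy, r, score in windows:
        variance = (scale * r) ** 2
        squared_distance = (p[0] - cx) ** 2 + (p[1] - cy) ** 2
        weight = math.exp(-squared_distance / (2.0 * variance))
        numerator += score * weight
        denominator += weight
    # Guard against pixels so far from every window that all weights underflow.
    return numerator / denominator if denominator > 0 else 0.0

windows = [(25, 25, 50, 1.0), (75, 25, 50, -1.0)]
value_between = saliency_at_pixel((50, 25), windows)
value_on_first = saliency_at_pixel((25, 25), windows)
```

  • A pixel midway between two windows with opposite scores receives a value of zero, while a pixel at the center of the first window is dominated by that window's score.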
  • In the exemplary embodiment, the saliency map is built for the original image by considering N such overlapping sub-windows $\mathcal{N}$ (shown as 80A, B, C, etc.) of the same size (e.g., 50 pixels*50 pixels) (a few of these windows 80 are illustrated in FIG. 6). The windows may be of the same size or somewhat larger than the smallest patches. A patch is considered to belong to a window if the geometric center of the patch lies within the window. For example, in the case of window 80E, patches 70F and 74 are considered to belong to it. Note that this could be done at the patch level rather than using windows 80. However, averaging over several patches gives more stable results.
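  • The membership rule (a patch belongs to a window if its geometric center lies within the window) can be sketched as follows. The box layout (x0, y0, x1, y1), the half-open treatment of the window boundary, and the function names are assumptions of this example.

```python
def patch_center(patch):
    """Geometric center of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = patch
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def patches_in_window(window, patches):
    """Return the patches whose center falls inside the window.

    The window is treated as half-open, [x0, x1) by [y0, y1), so that a center
    on a shared border is assigned to only one of two adjacent windows.
    """
    wx0, wy0, wx1, wy1 = window
    members = []
    for patch in patches:
        cx, cy = patch_center(patch)
        if wx0 <= cx < wx1 and wy0 <= cy < wy1:
            members.append(patch)
    return members

window = (0, 0, 50, 50)
patches = [(10, 10, 30, 30), (40, 40, 80, 80), (0, 0, 100, 100)]
members = patches_in_window(window, patches)
```

  • Patches of any scale can belong to the same window, since only the position of the center is tested.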
  • As noted above, the window's saliency score is computed based on the distance of the window signature (Eqn. (6)) to the Foreground signature (FS) and Background signature (BS), as defined in Eqn. (5), using the (optionally normalized) L1 distance computed as in Eqn. (7). The scores at the window level are projected to the pixels, as described in Eqn. (8) above (averaging, for each pixel, the window saliency scores of the windows containing that pixel).
  • Equation (8) has a low computational cost but it is also a rather simple evaluation of the saliency score. Alternatively, a patch classifier (not shown) could be used to compute a saliency probability map by using the approach described in Gabriela Csurka and Florent Perronnin, “A Simple High Performance Approach to Semantic Segmentation,” British Machine Vision Conference (BMVC), Leeds, UK (September 2008). The main difference from that described in the reference is that instead of using object class labels, a single classifier is used, which is trained to categorize foreground versus background. Based on the labeled Fisher Vectors of +ve and −ve patches, a patch classifier is trained and the patch probability score for the original image is then propagated from patches to pixels as described in the Csurka and Perronnin reference. In practice, the saliency maps obtained by this type of classifier are not necessarily better than that which uses Eqn. 8.
  • ROIs Adjustment and Selection of a Thumbnail: (S118)
  • The aim of this step is to build one or more thumbnails from the saliency map S. In one embodiment, a bounding box may simply be drawn to encompass all (or substantially all) pixels which exceed a threshold probability score which is then designated as the region of interest.
  • A straightforward option is to binarize S, giving, for example, a value of 0 to non-salient pixels and 1 to salient ones. This may be the output of the classifier itself if it has a default threshold th=0 that is supposed to discriminate salient values from non-salient ones. However, by increasing this threshold, more importance can be given to precision, or by decreasing it, to recall. For example, denote the binarized saliency map by SB. Different strategies can be designed to build a thumbnail from this map. One option is to select the bounding box of the biggest or most centered connected component. Another option is to consider all connected components and retarget them into a single region as proposed in V. Setlur, S. Takagi, R. Raskar, M. Gleicher, and B. Gooch, “Automatic image retargeting,” in Mobile and Ubiquitous Multimedia (MUM), 2005. However, a drawback of these simple approaches is that they rely directly on the saliency map, which by its construction is rather smooth and does not take into account the contours of the contained object. Depending on the selected threshold, this may lead either to sectioning the object of interest or to a thumbnail significantly larger than necessary.
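  • The first of these options, binarizing the map and taking the bounding box of the biggest connected component, can be sketched as below. The sketch assumes the map is a list of rows oriented so that values above the threshold are salient, and uses 4-connectivity; both choices, like the function name, are assumptions of this example.

```python
from collections import deque

def largest_component_bbox(saliency, th=0.0):
    """Bounding box (x0, y0, x1, y1) of the largest 4-connected region with
    saliency > th, or None if no pixel exceeds the threshold."""
    rows, cols = len(saliency), len(saliency[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or saliency[r][c] <= th:
                continue
            # Breadth-first flood fill of one connected component.
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] and saliency[ny][nx] > th:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    if not best:
        return None
    ys = [y for y, _ in best]
    xs = [x for _, x in best]
    return (min(xs), min(ys), max(xs), max(ys))

saliency_map = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
bbox = largest_component_bbox(saliency_map)
```

  • Raising `th` shrinks the salient area and so favors precision, while lowering it favors recall, as noted above.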
  • In other embodiments, refinement techniques may be applied to define an ROI based on the salient pixels which takes further considerations into account (S118). The role of this step is to enhance the precision. In general, the salient regions correspond to isolated objects. Therefore, regions classified as salient can be further refined by taking into account edge constraints.
  • In one embodiment, at S118, a Graph-Cut segmentation may be used to adjust the borders of the salient region. This approach assumes that the estimated region contains a consistent part of the relevant objects. One suitable method is based on the Graph-Cut algorithms described in Rother, C., Kolmogorov, V., and Blake, A., “Grabcut: Interactive foreground extraction using iterated graph cuts,” In ACM Trans. Graphics (SIGGRAPH 2004) 23(3), 309-314 (2004).
  • In this approach, the problem of segmentation is formulated in terms of energy minimization (i.e., max-flow/min-cut). The image is represented as a graph in which each pixel is a node and the edges can represent color similarity between adjacent pixels as in a Markov Random Field. In addition, two extra nodes (starting and ending nodes) are added to the graph and linked to each pixel based on the probability that the pixel belongs to background or foreground.
  • In one embodiment, for initializing the Graph-Cut algorithm, the saliency map generated at S116 is used to build an initial Graph-Cut model. In particular, a first Gaussian Mixture Model (GMM) is created for the foreground colors and a second GMM is created for the background colors. Then the algorithm iterates between Graph-Cut binary labeling and GMM updating as in Rother, et al. FIG. 3 shows an example graph-cut mask 58 created from the ROI mask 56 generated at S116.
  • For example, in the exemplary embodiment, the graph-cut method is performed as follows: First, two thresholds are chosen (one positive th+ and one negative th−). This separates the saliency map S into 3 different regions: pixels u labeled as salient (S(u)>th+), pixels labeled as non-salient (S(u)<th−) and unknown (the others). Two Gaussian Mixture Models (GMMs) Ω1 and Ω2 are created, one using RGB values of salient (foreground) pixels and one using RGB values of non-salient (background) pixels. Then the following energy is defined:
  • $$E(L) = \sum_{u \in P} D_u(u) + \sum_{(u,v) \in C} V_{u,v}(u,v) \qquad \text{Eqn. (9)}$$
  • where the data penalty function $D_u(u) = -\log p(u \mid l_u, \Omega_{l_u})$ is the negative log likelihood that the pixel u belongs to the GMM $\Omega_{l_u}$, with $l_u \in \{0,1\}$, and the contrast term is:
  • $$V_{u,v}(u,v) = \gamma \sum_{(u,v) \in C} \delta_{l_u,l_v} \exp\left(-\frac{\|u-v\|^2}{2\beta}\right) \qquad \text{Eqn. (10)}$$
  • with $\delta_{l_u,l_v}=1$ if $l_u=l_v$, C representing 4-way cliques, and $\beta=E(\|u-v\|^2)$, as described in Rother, et al. The energy can be minimized using the min-cut/max-flow algorithms proposed in Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” PAMI, 26, 2004, leading to a binary labeling of the image. Using the new labels, the parameters of the two GMMs are updated (adapted) and, similarly to Rother, et al., the method iterates between energy minimization and GMM updates. No modifications are made to the binary labels. This binary map can be considered as a new saliency map, denoted by SG.
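  • The two-threshold initialization that precedes the energy minimization can be sketched as follows. The label encoding (1 for salient, 0 for non-salient, None for unknown) and the function name `trimap` are hypothetical; the energy minimization itself is not reproduced here.

```python
def trimap(saliency, th_pos, th_neg):
    """Split a saliency map into salient (1), non-salient (0) and unknown (None)
    pixels using a positive threshold th_pos and a negative threshold th_neg."""
    labels = []
    for row in saliency:
        labeled_row = []
        for value in row:
            if value > th_pos:
                labeled_row.append(1)      # seeds the foreground color GMM
            elif value < th_neg:
                labeled_row.append(0)      # seeds the background color GMM
            else:
                labeled_row.append(None)   # left for the graph cut to decide
        labels.append(labeled_row)
    return labels

saliency_map = [[0.9, 0.1], [-0.8, 0.0]]
labels = trimap(saliency_map, th_pos=0.5, th_neg=-0.5)
```

  • The RGB values of the pixels labeled 1 and 0 would then be used to fit the two initial color GMMs.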
  • This method works in most cases. In cases where the method does not work effectively, such as where there are similar colors in the foreground and background regions, the Graph-Cut method can be replaced by an alternative method. Cases not suited to Graph-Cut processing can be automatically detected and the Graph-Cut regions rejected if any of the following is found:
  • 1. All pixels in the image are labeled with the same label.
  • 2. The positively labeled area after Graph-Cut is too small, compared with the size of the original image, e.g., less than 5% or less than 10% of its size.
  • 3. There is too great a divergence between the initialization (binarized Saliency Map 56) and the output of the Graph-Cut 58 (for example, the Graph-Cut region is greater than twice the size or less than 10% of the size of the ROI generated by the saliency map). Where the Graph-Cut results are rejected, the output of step S116, i.e., the binarized Saliency Map 56, is used for identifying an ROI.
  • This can be expressed more generally by the equation
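  • $$S^{*} = \begin{cases} S_G & \text{if } \dfrac{|S_B \cap S_G|}{|S_B \cup S_G|} > th_d \\ S_B & \text{otherwise} \end{cases} \quad \text{with } 0 < th_d < 1 \text{ (for example, } th_d = 0.1\text{)}.$$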
  • When SG is computed the only information used about the saliency is the initialization of the two GMM. Therefore, if there is an important divergence between SG and SB, the initial SB map is more trustworthy.
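  • A minimal sketch of this selection rule follows. It assumes that the ratio in the rule is the area of the intersection of the two binary maps divided by the area of their union, which is an interpretation of the rule rather than a statement from the source; the function name `select_map` is likewise hypothetical.

```python
def select_map(s_b, s_g, th_d=0.1):
    """Keep the Graph-Cut map s_g only if it agrees sufficiently with the
    binarized saliency map s_b; otherwise fall back to s_b.

    Both maps are lists of rows of 0/1 values. Agreement is measured here as
    intersection over union, which is an assumption of this sketch.
    """
    intersection = union = 0
    for row_b, row_g in zip(s_b, s_g):
        for b, g in zip(row_b, row_g):
            if b and g:
                intersection += 1
            if b or g:
                union += 1
    ratio = intersection / union if union else 0.0
    return s_g if ratio > th_d else s_b

s_b = [[1, 1], [0, 0]]
s_g_close = [[1, 0], [0, 0]]
s_g_far = [[0, 0], [0, 1]]
chosen_close = select_map(s_b, s_g_close)
chosen_far = select_map(s_b, s_g_far)
```

  • In the first case the Graph-Cut output overlaps the initialization and is retained; in the second the two maps are disjoint and the binarized saliency map is used instead.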
  • At S120, the ROI may be generated, for example, from the saliency map 58 (or 56) by processing the map in order to find the biggest, most centered object based on an analysis of statistics of the saliency map distribution (e.g., center of mass of the distribution, cumulative probability, etc.). Alternatively, all the detected salient regions may be retargeted into a single thumbnail. A rectangular crop (image thumbnail) 90 can then be generated, based on this salient region.
  • The method illustrated in FIGS. 2, 4, and 5 may be implemented in a computer program product that may be executed on a computer. The computer program product may be a tangible computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or may be a transmittable carrier wave in which the control program is embodied as a data signal. Common forms of computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like, or any other medium from which a computer can read and use.
  • The exemplary method thus described may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like. In general, any device, capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIGS. 2, 4, and 5, can be used to implement the automated method for identifying a region of interest in an image.
  • Applications of the Method and Apparatus
  • The exemplary embodiment finds application in a variety of contexts. For example, variable data applications such as 1 to 1 personalization and direct mail marketing often employ an image. By automated selection of a region of interest 32 using the exemplary method, a document 42 can be created incorporating an appropriately sized crop 90 which incorporates the salient region. In one embodiment, the human observers used to annotate the salient regions of the images in the dataset 22 can be selected to represent the target audience. Or for example, two or more sets of annotators may be used, e.g., one group comprising only females, the other, only males, and separate sets of image signatures stored for each group. Thus, the K nearest neighbors may be different, depending on which set of signatures is used.
  • Variable data printing is not the only application of the exemplary system and apparatus. Other applications, such as image and document asset management or document image/photograph set visualization, and the like can also benefit. For example, a crop 90 of the original image, based on the salient region, can be used for a thumbnail which is displayed in place of the original image, allowing a user to select images of interest from a large group of images, based on the interesting parts.
  • In another embodiment, the thumbnail (crop) 90 can be fed to a categorizer 44 for categorizing the image based on image content. Here the categorizer is not confused by including areas of the image which are less likely to be of visual interest. In one embodiment, illustrated in FIG. 7, the image crop 90 is fed to a categorizer, which has been trained with training image crops 94, generated in the same way, each of which has been annotated with a respective class (e.g., dogs, cats, flowers in the exemplary embodiment). The categorizer (which may incorporate a multiclass classifier or a set of binary classifiers, one for each object class) outputs a class 96 for the crop, based on a similarity of features of the image crop to those of the training images.
  • It has been shown that extracting image features only around ROIs or on segmented foreground gives better results than sampling features uniformly through the image.
  • Without intending to limit the scope of the exemplary embodiment, the following example compares results obtained with the exemplary apparatus described herein with comparative saliency detection methods.
  • EXAMPLE
  • The exemplary method is evaluated by comparing the results with those of four comparative methods for saliency detection:
  • Method A: Exemplary method without Graph-cut.
  • Method B: Exemplary method using Graph-cut, as described above.
  • Method C: based on above-mentioned U.S. patent application Ser. No. 12/250,248. This method generates saliency maps by linearly combining the bounding boxes of the K (with K=50) nearest images in the dataset, given the input image.
  • Method D: (ITTI): A classic approach based on Itti theory (see L. Itti and C. Koch, “A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention,” Vision Research, 40(10-12): 1489-1506, 2000, hereinafter Itti and Koch 2000) that leverages a neuromorphic model simulating which elements are likely to attract visual attention. In the Examples, a Matlab implementation available at http://www.saliencytoolbox.net/ was employed.
  • Method E: (SR): This method is described in X. Hou, L. Zhang, “Saliency Detection: A Spectral Residual Approach,” CVPR, 2007, hereinafter “Hou, et al.” It is based on the analysis of the spectral residual of an image in the spectral domain. In these Examples, a Matlab implementation available at http://bcmi.sjtu.edu.cn/~houxiaodi was employed.
  • Method F: (CRF): A learning method (Liu, et al.), based on a Conditional Random Field classifier.
  • Part of the dataset described in Liu, et al. (MRSA Dataset) was used to train and test the exemplary method. The dataset was composed of 5000 images labeled by different users with no specific skills in graphic design. The dataset included images of a variety of different subjects. In general, a single object is present in the image with a broad range of backgrounds with fairly homogeneous color or texture. The salient region detector was configured to retrieve the K most similar images (with K=50).
  • Ground truth data comprising manually annotated regions of interest generated by different users is also available. The users manually selected a rectangle (bounding box) containing the region of interest, which is typically represented by a full object or, in some cases by a subpart of the object (e.g., face). The 5000 images from the MRSA Dataset used in this example had bounding boxes annotated by nine users. The annotations are highly consistent with a very small variance over the nine bounding boxes. On average, the bounding boxes represent approximately 35% of the total area of the image, but this varies over a fairly wide distribution. Moreover the distance of the center of mass of the object from the center of the image is, on average, 42 pixels. Again the annotated dataset showed a distribution.
  • For each image in the dataset, a ground truth saliency map g(x,y) has been generated to evaluate the results based on user annotations (bounding boxes containing salient regions). In particular, since the annotations for MRSA are highly consistent, an average of the nine bounding boxes of the various users was used. Maps g(x,y) were generated, with rectangular salient regions pixels set to 1 and 0 otherwise.
  • Performance was evaluated using the following measures: BDE (see D. R. Martin, C. C. Fowlkes and J. Malik, “Learning to detect natural image boundaries using local brightness, color and texture cues,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 26(5), pp. 530-549 (May 2004)) was used for assessing the displacement of the bounding boxes (FIG. 10), and Precision, Recall and F-measure were used to assess the quality of the saliency map. In particular, Precision (Pr), Recall (Re) and F-measure (Fα) can be defined according to Liu, et al., as follows:
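  • $$\mathrm{Pr} = \frac{1}{I}\sum_{i}\frac{\sum_{x,y} s_i(x,y)\cdot g_i(x,y)}{\sum_{x,y} s_i(x,y)}, \qquad \mathrm{Re} = \frac{1}{I}\sum_{i}\frac{\sum_{x,y} s_i(x,y)\cdot g_i(x,y)}{\sum_{x,y} g_i(x,y)}, \qquad F_\alpha = \frac{(1+\alpha)\cdot \mathrm{Pr}\cdot \mathrm{Re}}{(\alpha\cdot \mathrm{Pr}) + \mathrm{Re}}$$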
  • The F-measure is the weighted harmonic mean of precision and recall, with α=0.5 (thereby giving more importance to precision than to recall, as in Liu, et al.). If both precision and recall are zero, Fα is set to zero.
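  • As a minimal sketch of the Precision, Recall and F-measure definitions above, the following computes the three values for a set of maps stored as nested lists. Reading s as the detected saliency map and I as the number of images, scoring an image with an empty map as zero, and the function name `precision_recall_f` are assumptions made for illustration.

```python
def precision_recall_f(saliency_maps, truth_maps, alpha=0.5):
    """Average per-image precision and recall, then combine into F_alpha.

    Each map is a list of rows; saliency_maps[i] and truth_maps[i]
    must have the same shape.
    """
    precisions, recalls = [], []
    for s, g in zip(saliency_maps, truth_maps):
        overlap = sum(
            sv * gv
            for s_row, g_row in zip(s, g)
            for sv, gv in zip(s_row, g_row)
        )
        s_total = sum(map(sum, s))
        g_total = sum(map(sum, g))
        # Assumption: an empty map contributes 0 rather than dividing by zero.
        precisions.append(overlap / s_total if s_total else 0.0)
        recalls.append(overlap / g_total if g_total else 0.0)
    pr = sum(precisions) / len(precisions)
    rec = sum(recalls) / len(recalls)
    denom = alpha * pr + rec
    # F_alpha is set to zero when both precision and recall are zero.
    f = (1 + alpha) * pr * rec / denom if denom else 0.0
    return pr, rec, f


# Hypothetical 2x2 example: two pixels detected, one of them correct.
pr, rec, f = precision_recall_f([[[1, 1], [0, 0]]], [[[1, 0], [0, 0]]])
```

  • With α=0.5, a precision of 0.5 and a recall of 1.0 give Fα = 0.6, whereas a precision of 1.0 and a recall of 0.5 give Fα = 0.75, which shows the heavier weight placed on precision.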
  • In the Examples, some of the above mentioned methods (B-E) were tuned by selecting a specific threshold on the maps in order to maximize the F-measure of each one. The behavior of the F-measure as a function of the threshold on the map is shown in FIG. 8. As seen in FIG. 8, the exemplary method (A and B) gives a better result than Methods C and E. Further, FIG. 8 shows the improvement that the Graph-Cut stage (Method B) introduces in the proposed method, increasing the F-measure by almost 10% as compared with Method A (without Graph-Cut). For Methods D and F, the thresholding was not applied because the results were taken directly from the Hou, et al. paper.
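  • The threshold tuning described above might be sketched as a simple sweep over candidate thresholds, keeping the one that gives the highest F-measure. For brevity the sketch scores a single map rather than a whole dataset; the candidate thresholds, the "greater than or equal" binarization rule, and the function names `f_measure` and `best_threshold` are assumptions made for illustration.

```python
def f_measure(binary, truth, alpha=0.5):
    """F_alpha of one binary map against its ground truth map."""
    overlap = sum(
        b * t
        for b_row, t_row in zip(binary, truth)
        for b, t in zip(b_row, t_row)
    )
    if overlap == 0:
        # Precision and recall are both zero, so F_alpha is set to zero.
        return 0.0
    pr = overlap / sum(map(sum, binary))
    rec = overlap / sum(map(sum, truth))
    return (1 + alpha) * pr * rec / (alpha * pr + rec)


def best_threshold(saliency, truth, thresholds):
    """Return (threshold, F_alpha) for the best-scoring candidate."""
    best = (None, -1.0)
    for t in thresholds:
        # Assumption: a pixel is salient when its value is >= the threshold.
        binary = [[1 if v >= t else 0 for v in row] for row in saliency]
        score = f_measure(binary, truth)
        if score > best[1]:
            best = (t, score)
    return best


# Hypothetical 2x2 saliency map with one truly salient pixel.
saliency = [[0.9, 0.6], [0.2, 0.1]]
truth = [[1, 0], [0, 0]]
threshold, score = best_threshold(saliency, truth, [0.1, 0.5, 0.8])
```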
  • FIG. 9 shows the thresholds selected for the Methods compared.
  • All the above mentioned Methods are compared in FIG. 10, where the results obtained in the experiment are shown in more detail. For each method considered, the precision, recall and F-measure are given for its best parameter setting. The CRF and ITTI results are reported from the cited Hou, et al. paper.
  • FIG. 10 shows the Bounding Box displacement index. It represents the average distance, in pixels, of the center of the automatically detected Bounding Box from the center of the ground truth Bounding Box. The smaller this value, the more accurate the detected bounding box. As can be seen, the exemplary method using Graph-Cut (Method B) gave the best results.
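  • A displacement index of this kind might be computed as in the following sketch, which averages the Euclidean distance between box centers over a set of images. The (left, top, right, bottom) box convention and the function names `box_center` and `mean_center_displacement` are assumptions made for illustration.

```python
import math


def box_center(box):
    """Center (x, y) of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def mean_center_displacement(detected_boxes, truth_boxes):
    """Average distance, in pixels, between detected and ground truth centers."""
    distances = [
        math.dist(box_center(d), box_center(t))
        for d, t in zip(detected_boxes, truth_boxes)
    ]
    return sum(distances) / len(distances)


# Hypothetical boxes: the first detection is offset by (3, 4) pixels,
# the second matches its ground truth exactly.
detected = [(0, 0, 10, 10), (5, 5, 15, 15)]
truth = [(3, 4, 13, 14), (5, 5, 15, 15)]
displacement = mean_center_displacement(detected, truth)
```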
  • It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.

Claims (25)

1. A method for detecting a region of interest in an image comprising:
for each image in a dataset of images for which a region of interest has been established respectively, storing a dataset image representation based on features extracted from the image;
for an original image for which a region of interest is to be detected:
generating an original image representation for the original image based on features extracted from the image;
identifying a subset of similar images from the images in the dataset, the identified subset being based on a measure of similarity between the original image representation and respective dataset image representations;
training a classifier with information extracted from the established regions of interest of the subset of similar images;
with the trained classifier, identifying a region of interest in the original image.
2. The method of claim 1, wherein each dataset image representation is based on features of patches extracted from the dataset image and wherein the original image representation is based on features of patches extracted from the original image.
3. The method of claim 2, wherein for each patch, a vector is generated, based on the extracted features.
4. The method of claim 3, wherein the vector comprises a Fisher vector.
5. The method of claim 3, wherein a plurality of types of features is extracted and wherein for each patch, one vector is generated for each of a type of feature extracted.
6. The method of claim 1, wherein the extracted features are selected from the group consisting of color features, texture features, and combinations thereof.
7. The method of claim 1, wherein the established regions of interest are generated from salient regions identified by a set of human observers.
8. The method of claim 1, wherein patches of the dataset images are identified as salient or non-salient, based on whether they are within the established region of interest or not.
9. The method of claim 8, wherein the dataset image representations are each derived from a +ve and a −ve high level representation of the image, the +ve high level representation being based on features of salient patches from and the −ve high level representation is based on features of non-salient patches.
10. The method of claim 9, wherein the classifier is trained with the +ve high level representations and the −ve high level representations of the similar images.
11. The method of claim 1, wherein the information for training the classifier comprises at least one of:
positive examples, comprising information extracted from patches of the identified similar images that are within the established region of interest, and
negative examples comprising information extracted from patches of the identified similar images that are not within the established region of interest.
12. The method of claim 1, wherein the identifying a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation comprises computing a distance metric between the original image and images in the dataset.
13. The method of claim 1, wherein the identified region of interest in the original image is processed for more precisely identifying the region of interest.
14. The method of claim 1, wherein the identified region of interest in the original image is processed with a graph-cut technique.
15. The method of claim 1, further comprising generating a crop based on the region of interest which removes image data from the original image outside a crop area.
16. The method of claim 1, wherein, for each of a plurality of images in the dataset of images, a semantic label is associated with the established region of interest, the semantic label being selected from a set of semantic labels, each relating to a different context, and wherein where a concept is specified, the identifying of the subset of similar images from the images in the dataset considers the semantic labels of the images in selecting a subset of similar images.
17. The method of claim 1, wherein the identifying a region of interest in the original image comprises:
outputting a saliency map from the classifier in which patches of the image at multiple scales are each assigned a saliency value;
partitioning the image into a set of overlapping windows;
assigning each window a saliency score based on the saliency values of patches within the window;
assigning each pixel of the image a saliency value based on the saliency values of windows in which the pixel is located.
18. The method of claim 1, further comprising, outputting the identified region of interest in the original image.
19. The method of claim 18, wherein the outputting includes outputting a crop of the original image based on the identified region of interest.
20. The method of claim 19, further comprising, inputting the crop into a categorizer which has been trained on annotated image crops to identify a class for the original image.
21. A computer program product encoding instructions, which when executed on a computer causes the computer to perform the method of claim 1.
22. An apparatus for detecting a region of interest in an image comprising:
memory which stores:
at least one of a) the dataset image representations and b) feature vectors of patches of the dataset images from which the image representations are able to be generated, and
instructions for performing the method of claim 1; and
a processor with access to the instructions and dataset image representations which executes the instructions.
23. An apparatus for detecting a region of interest in an image comprising:
memory which, for a dataset of images for which a respective region of interest has been established, stores a set of dataset image representations, each dataset image representation being derived from features extracted from a respective one of the images in the dataset;
memory which stores instructions which, for an original image for which a region of interest is to be detected:
generate an original image representation for the original image based on features extracted from the original image;
identify a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation;
train a classifier to identify a region of interest in the original image, the classifier being trained with positive and negative examples, each of the positive examples comprising a high level representation based on features extracted from the established region of interest of a respective one of the subset of similar images and each of the negative examples comprising a high level representation based on features extracted from outside the established region of interest of a respective one of the subset of similar images.
24. A method for detecting a region of interest in an image comprising:
storing a set of image representations, each image representation being based on features extracted from patches of a dataset image, where for each dataset image, the patch features are identified as salient or non-salient based on whether or not the patch is within a manually identified region of interest; and
for an original image for which a region of interest is to be detected:
generating an original image representation for the original image based on features extracted from patches of the image;
computing a distance measure between the original image representation and image representations in the set of image representations to identify a subset of similar image representations from the set of image representations;
training a classifier with positive and negative examples extracted from the images corresponding to subset of similar image representations, the positive examples each being based on the salient patch features of a respective image and the negative examples being based on non-salient patch features of the respective image; and
with the trained classifier, identifying a region of interest in the original image based on the patch features of the original image.
25. The method of claim 24, wherein the patch features are represented by Fisher vectors.
US12/400,277 2009-03-09 2009-03-09 Framework for image thumbnailing based on visual similarity Active 2031-01-20 US8175376B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/400,277 US8175376B2 (en) 2009-03-09 2009-03-09 Framework for image thumbnailing based on visual similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/400,277 US8175376B2 (en) 2009-03-09 2009-03-09 Framework for image thumbnailing based on visual similarity

Publications (2)

Publication Number Publication Date
US20100226564A1 (en) 2010-09-09
US8175376B2 (en) 2012-05-08

Family

ID=42678297

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/400,277 Active 2031-01-20 US8175376B2 (en) 2009-03-09 2009-03-09 Framework for image thumbnailing based on visual similarity

Country Status (1)

Country Link
US (1) US8175376B2 (en)

Cited By (192)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100215098A1 (en) * 2009-02-23 2010-08-26 Mondo Systems, Inc. Apparatus and method for compressing pictures with roi-dependent compression parameters
US20100281361A1 (en) * 2009-04-30 2010-11-04 Xerox Corporation Automated method for alignment of document objects
US20110164815A1 (en) * 2009-11-17 2011-07-07 Samsung Electronics Co., Ltd. Method, device and system for content based image categorization field
US20120027309A1 (en) * 2009-04-14 2012-02-02 Nec Corporation Image signature extraction device
US8175376B2 (en) * 2009-03-09 2012-05-08 Xerox Corporation Framework for image thumbnailing based on visual similarity
WO2012138299A1 (en) * 2011-04-08 2012-10-11 Creative Technology Ltd A method, system and electronic device for at least one of efficient graphic processing and salient based learning
CN102800092A (en) * 2012-07-12 2012-11-28 北方工业大学 Point-to-surface image significance detection
US20120328150A1 (en) * 2011-03-22 2012-12-27 Rochester Institute Of Technology Methods for assisting with object recognition in image sequences and devices thereof
US20130038632A1 (en) * 2011-08-12 2013-02-14 Marcus W. Dillavou System and method for image registration of multiple video streams
US8379981B1 (en) 2011-08-26 2013-02-19 Toyota Motor Engineering & Manufacturing North America, Inc. Segmenting spatiotemporal data based on user gaze data
CN102945378A (en) * 2012-10-23 2013-02-27 西北工业大学 Method for detecting potential target regions of remote sensing image on basis of monitoring method
CN103020993A (en) * 2012-11-28 2013-04-03 杭州电子科技大学 Visual saliency detection method by fusing dual-channel color contrasts
EP2579211A2 (en) 2011-10-03 2013-04-10 Xerox Corporation Graph-based segmentation integrating visible and NIR information
US20130091515A1 (en) * 2011-02-04 2013-04-11 Kotaro Sakata Degree of interest estimating device and degree of interest estimating method
US20130120454A1 (en) * 2009-09-18 2013-05-16 Elya Shechtman Methods and Apparatuses for Generating Thumbnail Summaries for Image Collections
US20130148880A1 (en) * 2011-12-08 2013-06-13 Yahoo! Inc. Image Cropping Using Supervised Learning
US20130148910A1 (en) * 2011-12-12 2013-06-13 Canon Kabushiki Kaisha Method, apparatus and system for identifying distracting elements in an image
CN103198319A (en) * 2013-04-11 2013-07-10 武汉大学 Method of extraction of corner of blurred image in mine shaft environment
US8487959B1 (en) * 2010-08-06 2013-07-16 Google Inc. Generating simulated eye movement traces for visual displays
US8532387B2 (en) 2009-09-04 2013-09-10 Adobe Systems Incorporated Methods and apparatus for procedural directional texture generation
US8560517B2 (en) 2011-07-05 2013-10-15 Microsoft Corporation Object retrieval using visual query context
US8570339B2 (en) 2011-05-26 2013-10-29 Xerox Corporation Modifying color adjustment choices based on image characteristics in an image editing system
US8577182B1 (en) 2010-07-13 2013-11-05 Google Inc. Method and system for automatically cropping images
US20130307762A1 (en) * 2012-05-17 2013-11-21 Nokia Corporation Method and apparatus for attracting a user's gaze to information in a non-intrusive manner
EP2674881A1 (en) 2012-06-15 2013-12-18 Xerox Corporation Privacy preserving method for querying a remote public service
US8619098B2 (en) * 2009-09-18 2013-12-31 Adobe Systems Incorporated Methods and apparatuses for generating co-salient thumbnails for digital images
US8660351B2 (en) * 2011-10-24 2014-02-25 Hewlett-Packard Development Company, L.P. Auto-cropping images using saliency maps
US8675966B2 (en) 2011-09-29 2014-03-18 Hewlett-Packard Development Company, L.P. System and method for saliency map generation
CN103678552A (en) * 2013-12-05 2014-03-26 武汉大学 Remote-sensing image retrieving method and system based on salient regional features
US20140122531A1 (en) * 2012-11-01 2014-05-01 Google Inc. Image comparison process
US20140126782A1 (en) * 2012-11-02 2014-05-08 Sony Corporation Image display apparatus, image display method, and computer program
WO2014092548A1 (en) * 2012-12-13 2014-06-19 Mimos Berhad A method and system for identifying multiple entities in images
US8774517B1 (en) * 2007-06-14 2014-07-08 Hrl Laboratories, Llc System for identifying regions of interest in visual imagery
CN103927758A (en) * 2014-04-30 2014-07-16 重庆大学 Saliency detection method based on contrast ratio and minimum convex hull of angular point
US20140250110A1 (en) * 2011-11-25 2014-09-04 Linjun Yang Image attractiveness based indexing and searching
US20140270350A1 (en) * 2013-03-14 2014-09-18 Xerox Corporation Data driven localization using task-dependent representations
US8861868B2 (en) 2011-08-29 2014-10-14 Adobe-Systems Incorporated Patch-based synthesis techniques
EP2790135A1 (en) 2013-03-04 2014-10-15 Xerox Corporation System and method for highlighting barriers to reducing paper usage
US8867829B2 (en) 2011-05-26 2014-10-21 Xerox Corporation Method and apparatus for editing color characteristics of electronic image
US8873812B2 (en) 2012-08-06 2014-10-28 Xerox Corporation Image segmentation using hierarchical unsupervised segmentation and hierarchical classifiers
US8879796B2 (en) 2012-08-23 2014-11-04 Xerox Corporation Region refocusing for data-driven object localization
US8892562B2 (en) 2012-07-26 2014-11-18 Xerox Corporation Categorization of multi-page documents by anisotropic diffusion
US8917910B2 (en) 2012-01-16 2014-12-23 Xerox Corporation Image segmentation based on approximation of segmentation similarity
US20140376819A1 (en) * 2013-06-21 2014-12-25 Microsoft Corporation Image recognition by image search
US9008429B2 (en) 2013-02-01 2015-04-14 Xerox Corporation Label-embedding for text recognition
EP2863338A2 (en) 2013-10-16 2015-04-22 Xerox Corporation Delayed vehicle identification for privacy enforcement
US20150131899A1 (en) * 2013-11-13 2015-05-14 Canon Kabushiki Kaisha Devices, systems, and methods for learning a discriminant image representation
US20150134688A1 (en) * 2013-11-12 2015-05-14 Pinterest, Inc. Image based search
US20150130838A1 (en) * 2013-11-13 2015-05-14 Sony Corporation Display control device, display control method, and program
US9058611B2 (en) 2011-03-17 2015-06-16 Xerox Corporation System and method for advertising using image search and classification
US20150169982A1 (en) * 2013-12-17 2015-06-18 Canon Kabushiki Kaisha Observer Preference Model
US20150178587A1 (en) * 2012-06-18 2015-06-25 Thomson Licensing Device and a method for color harmonization of an image
US9070182B1 (en) 2010-07-13 2015-06-30 Google Inc. Method and system for automatically cropping images
US9075824B2 (en) 2012-04-27 2015-07-07 Xerox Corporation Retrieval system and method leveraging category-level labels
US9082047B2 (en) 2013-08-20 2015-07-14 Xerox Corporation Learning beautiful and ugly visual attributes
US9104946B2 (en) 2012-10-15 2015-08-11 Canon Kabushiki Kaisha Systems and methods for comparing images
US20150227784A1 (en) * 2014-02-07 2015-08-13 Tata Consultancy Services Limited Object detection system and method
EP2916265A1 (en) 2014-03-03 2015-09-09 Xerox Corporation Self-learning object detectors for unlabeled videos using multi-task learning
US20150262039A1 (en) * 2014-03-13 2015-09-17 Omron Corporation Image processing apparatus and image processing method
US20150294181A1 (en) * 2014-04-15 2015-10-15 Canon Kabushiki Kaisha Object detection apparatus object detection method and storage medium
US20150332605A1 (en) * 2014-05-19 2015-11-19 Thomson Licensing Method for harmonizing colors, corresponding computer program and device
DE102011113154B4 (en) * 2011-09-14 2015-12-03 Airbus Defence and Space GmbH Machine learning method for machine learning of manifestations of objects in images
US9229956B2 (en) 2011-01-10 2016-01-05 Microsoft Technology Licensing, Llc Image retrieval using discriminative visual features
US20160019440A1 (en) * 2014-07-18 2016-01-21 Adobe Systems Incorporated Feature Interpolation
GB2529888A (en) * 2014-09-05 2016-03-09 Apical Ltd A method of image anaysis
US20160104031A1 (en) * 2014-10-14 2016-04-14 Microsoft Technology Licensing, Llc Depth from time of flight camera
CN105513080A (en) * 2015-12-21 2016-04-20 南京邮电大学 Infrared image target salience evaluating method
US9367763B1 (en) 2015-01-12 2016-06-14 Xerox Corporation Privacy-preserving text to image matching
US20160171299A1 (en) * 2014-12-11 2016-06-16 Samsung Electronics Co., Ltd. Apparatus and method for computer aided diagnosis (cad) based on eye movement
US9384423B2 (en) 2013-05-28 2016-07-05 Xerox Corporation System and method for OCR output verification
US20160196662A1 (en) * 2013-08-16 2016-07-07 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for manufacturing virtual fitting model image
CN105760886A (en) * 2016-02-23 2016-07-13 北京联合大学 Image scene multi-object segmentation method based on target identification and saliency detection
EP3048561A1 (en) 2015-01-21 2016-07-27 Xerox Corporation Method and system to perform text-to-image queries with wildcards
US9443164B2 (en) 2014-12-02 2016-09-13 Xerox Corporation System and method for product identification
US9471828B2 (en) 2014-07-28 2016-10-18 Adobe Systems Incorporated Accelerating object detection
US20160360267A1 (en) * 2014-01-14 2016-12-08 Alcatel Lucent Process for increasing the quality of experience for users that watch on their terminals a high definition video stream
US20170046621A1 (en) * 2014-04-30 2017-02-16 Siemens Healthcare Diagnostics Inc. Method and apparatus for performing block retrieval on block to be processed of urine sediment image
US20170060812A1 (en) * 2015-08-31 2017-03-02 Qualtrics, Llc Presenting views of an electronic document
US9600738B2 (en) 2015-04-07 2017-03-21 Xerox Corporation Discriminative embedding of local color names for object retrieval and classification
US9613273B2 (en) * 2015-05-19 2017-04-04 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking
US9639806B2 (en) 2014-04-15 2017-05-02 Xerox Corporation System and method for predicting iconicity of an image
CN106780430A (en) * 2016-11-17 2017-05-31 大连理工大学 A kind of image significance detection method based on surroundedness and Markov model
US9697439B2 (en) 2014-10-02 2017-07-04 Xerox Corporation Efficient object detection with patch-level window processing
US9740949B1 (en) 2007-06-14 2017-08-22 Hrl Laboratories, Llc System and method for detection of objects of interest in imagery
US9779284B2 (en) 2013-12-17 2017-10-03 Conduent Business Services, Llc Privacy-preserving evidence in ALPR applications
US9778351B1 (en) 2007-10-04 2017-10-03 Hrl Laboratories, Llc System for surveillance by integrating radar with a panoramic staring sensor
US9830529B2 (en) 2016-04-26 2017-11-28 Xerox Corporation End-to-end saliency mapping via probability distribution prediction
US9928532B2 (en) 2014-03-04 2018-03-27 Daniel Torres Image based search engine
US9940750B2 (en) 2013-06-27 2018-04-10 Help Lighting, Inc. System and method for role negotiation in multi-reality environments
US9952594B1 (en) 2017-04-07 2018-04-24 TuSimple System and method for traffic data collection using unmanned aerial vehicles (UAVs)
US9953236B1 (en) 2017-03-10 2018-04-24 TuSimple System and method for semantic segmentation using dense upsampling convolution (DUC)
US9959629B2 (en) 2012-05-21 2018-05-01 Help Lighting, Inc. System and method for managing spatiotemporal uncertainty
US10007679B2 (en) 2008-08-08 2018-06-26 The Research Foundation For The State University Of New York Enhanced max margin learning on multimodal data mining in a multimedia database
US10067509B1 (en) 2017-03-10 2018-09-04 TuSimple System and method for occluding contour detection
CN108898136A (en) * 2018-07-04 2018-11-27 安徽大学 A kind of cross-module state image significance detection method
US10147193B2 (en) 2017-03-10 2018-12-04 TuSimple System and method for semantic segmentation using hybrid dilated convolution (HDC)
US20190005659A1 (en) * 2014-09-19 2019-01-03 Brain Corporation Salient features tracking apparatus and methods using visual initialization
US10269055B2 (en) 2015-05-12 2019-04-23 Pinterest, Inc. Matching user provided representations of items with sellers of those items
US10303522B2 (en) 2017-07-01 2019-05-28 TuSimple System and method for distributed graphics processing unit (GPU) computation
US10303956B2 (en) 2017-08-23 2019-05-28 TuSimple System and method for using triplet loss for proposal free instance-wise semantic segmentation for lane detection
US10308242B2 (en) 2017-07-01 2019-06-04 TuSimple System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles
US10311312B2 (en) 2017-08-31 2019-06-04 TuSimple System and method for vehicle occlusion detection
US10360257B2 (en) 2017-08-08 2019-07-23 TuSimple System and method for image annotation
US10387736B2 (en) 2017-09-20 2019-08-20 TuSimple System and method for detecting taillight signals of a vehicle
US10410055B2 (en) 2017-10-05 2019-09-10 TuSimple System and method for aerial video traffic analysis
CN110377204A (en) * 2019-06-30 2019-10-25 华为技术有限公司 A kind of method and electronic equipment generating user's head portrait
US10474790B2 (en) 2017-06-02 2019-11-12 TuSimple Large scale distributed simulation for realistic multiple-agent interactive environments
US10471963B2 (en) 2017-04-07 2019-11-12 TuSimple System and method for transitioning between an autonomous and manual driving mode based on detection of a drivers capacity to control a vehicle
WO2019217562A1 (en) * 2018-05-09 2019-11-14 Figure Eight Technologies, Inc. Aggregated image annotation
US10481044B2 (en) 2017-05-18 2019-11-19 TuSimple Perception simulation for improved autonomous vehicle control
US10493988B2 (en) 2017-07-01 2019-12-03 TuSimple System and method for adaptive cruise control for defensive driving
US10521503B2 (en) 2016-09-23 2019-12-31 Qualtrics, Llc Authenticating a respondent to an electronic survey
US10528823B2 (en) 2017-11-27 2020-01-07 TuSimple System and method for large-scale lane marking detection using multimodal sensor data
US10528851B2 (en) 2017-11-27 2020-01-07 TuSimple System and method for drivable road surface representation generation using multimodal sensor data
US10552979B2 (en) 2017-09-13 2020-02-04 TuSimple Output of a neural network method for deep odometry assisted by static scene optical flow
US10552691B2 (en) 2017-04-25 2020-02-04 TuSimple System and method for vehicle position and velocity estimation based on camera and lidar data
US10558864B2 (en) 2017-05-18 2020-02-11 TuSimple System and method for image localization based on semantic segmentation
US10573044B2 (en) * 2017-11-09 2020-02-25 Adobe Inc. Saliency-based collage generation using digital images
US10607109B2 (en) * 2016-11-16 2020-03-31 Samsung Electronics Co., Ltd. Method and apparatus to perform material recognition and training for material recognition
US10607111B2 (en) * 2018-02-06 2020-03-31 Hrl Laboratories, Llc Machine vision system for recognizing novel objects
US20200128145A1 (en) * 2015-02-13 2020-04-23 Smugmug, Inc. System and method for photo subject display optimization
CN111071152A (en) * 2018-10-19 2020-04-28 图森有限公司 Fisheye image processing system and method
US10649458B2 (en) 2017-09-07 2020-05-12 Tusimple, Inc. Data-driven prediction-based system and method for trajectory planning of autonomous vehicles
US10657390B2 (en) 2017-11-27 2020-05-19 Tusimple, Inc. System and method for large-scale lane marking detection using multimodal sensor data
US10656644B2 (en) 2017-09-07 2020-05-19 Tusimple, Inc. System and method for using human driving patterns to manage speed control for autonomous vehicles
US10666730B2 (en) 2017-10-28 2020-05-26 Tusimple, Inc. Storage architecture for heterogeneous multimedia data
US10671083B2 (en) 2017-09-13 2020-06-02 Tusimple, Inc. Neural network architecture system for deep odometry assisted by static scene optical flow
US10671873B2 (en) 2017-03-10 2020-06-02 Tusimple, Inc. System and method for vehicle wheel detection
US10679269B2 (en) 2015-05-12 2020-06-09 Pinterest, Inc. Item selling on multiple web sites
US10678234B2 (en) 2017-08-24 2020-06-09 Tusimple, Inc. System and method for autonomous vehicle control to minimize energy cost
US10685239B2 (en) 2018-03-18 2020-06-16 Tusimple, Inc. System and method for lateral vehicle detection
US10685244B2 (en) 2018-02-27 2020-06-16 Tusimple, Inc. System and method for online real-time multi-object tracking
US10706549B2 (en) * 2016-12-20 2020-07-07 Kodak Alaris Inc. Iterative method for salient foreground detection and multi-object segmentation
US10706735B2 (en) 2016-10-31 2020-07-07 Qualtrics, Llc Guiding creation of an electronic survey
US10710592B2 (en) 2017-04-07 2020-07-14 Tusimple, Inc. System and method for path planning of autonomous vehicles based on gradient
US10733465B2 (en) 2017-09-20 2020-08-04 Tusimple, Inc. System and method for vehicle taillight state recognition
US10739775B2 (en) 2017-10-28 2020-08-11 Tusimple, Inc. System and method for real world autonomous vehicle trajectory simulation
US10737695B2 (en) 2017-07-01 2020-08-11 Tusimple, Inc. System and method for adaptive cruise control for low speed following
US10752246B2 (en) 2017-07-01 2020-08-25 Tusimple, Inc. System and method for adaptive cruise control with proximate vehicle detection
US10762635B2 (en) 2017-06-14 2020-09-01 Tusimple, Inc. System and method for actively selecting and labeling images for semantic segmentation
US10762673B2 (en) 2017-08-23 2020-09-01 Tusimple, Inc. 3D submap reconstruction system and method for centimeter precision localization using camera-based submap and LiDAR-based global map
US10768626B2 (en) 2017-09-30 2020-09-08 Tusimple, Inc. System and method for providing multiple agents for decision making, trajectory planning, and control for autonomous vehicles
CN111666439A (en) * 2020-05-28 2020-09-15 重庆渝抗医药科技有限公司 Working method for rapidly extracting and dividing medical image big data aiming at cloud environment
US10783381B2 (en) 2017-08-31 2020-09-22 Tusimple, Inc. System and method for vehicle occlusion detection
US10782694B2 (en) 2017-09-07 2020-09-22 Tusimple, Inc. Prediction-based system and method for trajectory planning of autonomous vehicles
US10782693B2 (en) 2017-09-07 2020-09-22 Tusimple, Inc. Prediction-based system and method for trajectory planning of autonomous vehicles
US10812589B2 (en) 2017-10-28 2020-10-20 Tusimple, Inc. Storage architecture for heterogeneous multimedia data
US10816354B2 (en) 2017-08-22 2020-10-27 Tusimple, Inc. Verification module system and method for motion-based lane detection with multiple sensors
CN111936989A (en) * 2018-03-29 2020-11-13 谷歌有限责任公司 Similar medical image search
US10839234B2 (en) 2018-09-12 2020-11-17 Tusimple, Inc. System and method for three-dimensional (3D) object detection
US10860018B2 (en) 2017-11-30 2020-12-08 Tusimple, Inc. System and method for generating simulated vehicles with configured behaviors for analyzing autonomous vehicle motion planners
US10877476B2 (en) 2017-11-30 2020-12-29 Tusimple, Inc. Autonomous vehicle simulation system for analyzing motion planners
CN112329810A (en) * 2020-09-28 2021-02-05 北京师范大学 Image recognition model training method and device based on saliency detection
US10943146B2 (en) * 2016-12-28 2021-03-09 Ancestry.Com Operations Inc. Clustering historical images using a convolutional neural net and labeled data bootstrapping
US10942966B2 (en) 2017-09-22 2021-03-09 Pinterest, Inc. Textual and image based search
US10942271B2 (en) 2018-10-30 2021-03-09 Tusimple, Inc. Determining an angle between a tow vehicle and a trailer
US10953880B2 (en) 2017-09-07 2021-03-23 Tusimple, Inc. System and method for automated lane change control for autonomous vehicles
US10953881B2 (en) 2017-09-07 2021-03-23 Tusimple, Inc. System and method for automated lane change control for autonomous vehicles
US10962979B2 (en) 2017-09-30 2021-03-30 Tusimple, Inc. System and method for multitask processing for autonomous vehicle computation and control
CN112613528A (en) * 2020-12-31 2021-04-06 广东工业大学 Point cloud simplification method and device based on significance variation and storage medium
US10970564B2 (en) 2017-09-30 2021-04-06 Tusimple, Inc. System and method for instance-level lane detection for autonomous vehicle control
US11009365B2 (en) 2018-02-14 2021-05-18 Tusimple, Inc. Lane marking localization
US11009356B2 (en) 2018-02-14 2021-05-18 Tusimple, Inc. Lane marking localization and fusion
US11010874B2 (en) 2018-04-12 2021-05-18 Tusimple, Inc. Images for perception modules of autonomous vehicles
US11029693B2 (en) 2017-08-08 2021-06-08 Tusimple, Inc. Neural network based vehicle dynamics model
US11055343B2 (en) 2015-10-05 2021-07-06 Pinterest, Inc. Dynamic search control invocation and visual search
CN113221715A (en) * 2020-10-31 2021-08-06 嘉应学院 Fire detection and identification method fused with visual attention mechanism
US20210248715A1 (en) * 2019-01-18 2021-08-12 Ramot At Tel-Aviv University Ltd. Method and system for end-to-end image processing
US11104334B2 (en) 2018-05-31 2021-08-31 Tusimple, Inc. System and method for proximate vehicle intention prediction for autonomous vehicles
CN113345052A (en) * 2021-06-11 2021-09-03 山东大学 Classified data multi-view visualization coloring method and system based on similarity significance
US11126653B2 (en) 2017-09-22 2021-09-21 Pinterest, Inc. Mixed type image based search results
US11151393B2 (en) 2017-08-23 2021-10-19 Tusimple, Inc. Feature matching and corresponding refinement and 3D submap position refinement system and method for centimeter precision localization using camera-based submap and LiDAR-based global map
US11182639B2 (en) * 2017-04-16 2021-11-23 Facebook, Inc. Systems and methods for provisioning content
US11222399B2 (en) * 2014-10-09 2022-01-11 Adobe Inc. Image cropping suggestion using multiple saliency maps
US11238374B2 (en) * 2018-08-24 2022-02-01 Htc Corporation Method for verifying training data, training system, and computer readable medium
US11263752B2 (en) * 2019-05-09 2022-03-01 Boe Technology Group Co., Ltd. Computer-implemented method of detecting foreign object on background object in an image, apparatus for detecting foreign object on background object in an image, and computer-program product
US11292480B2 (en) 2018-09-13 2022-04-05 Tusimple, Inc. Remote safe driving methods and systems
US11305782B2 (en) 2018-01-11 2022-04-19 Tusimple, Inc. Monitoring system for autonomous vehicle operation
US11312334B2 (en) 2018-01-09 2022-04-26 Tusimple, Inc. Real-time remote control of vehicles with high redundancy
US20220138950A1 (en) * 2020-11-02 2022-05-05 Adobe Inc. Generating change comparisons during editing of digital images
US11440473B2 (en) * 2018-10-29 2022-09-13 Aisin Corporation Driving assistance apparatus
US11500101B2 (en) 2018-05-02 2022-11-15 Tusimple, Inc. Curb detection by analysis of reflection images
US11580398B2 (en) * 2016-10-14 2023-02-14 KLA-Tencor Corp. Diagnostic systems and methods for deep learning models configured for semiconductor applications
US11587304B2 (en) 2017-03-10 2023-02-21 Tusimple, Inc. System and method for occluding contour detection
US11609946B2 (en) 2015-10-05 2023-03-21 Pinterest, Inc. Dynamic search input selection
US11625557B2 (en) 2018-10-29 2023-04-11 Hrl Laboratories, Llc Process to learn new image classes without labels
US11701931B2 (en) 2020-06-18 2023-07-18 Tusimple, Inc. Angle and orientation measurements for vehicles with multiple drivable sections
US11704692B2 (en) 2016-05-12 2023-07-18 Pinterest, Inc. Promoting representations of items to users on behalf of sellers of those items
US11810322B2 (en) 2020-04-09 2023-11-07 Tusimple, Inc. Camera pose estimation techniques
US11823460B2 (en) 2019-06-14 2023-11-21 Tusimple, Inc. Image fusion for autonomous vehicle operation
US11841735B2 (en) 2017-09-22 2023-12-12 Pinterest, Inc. Object based image search
US11958473B2 (en) 2021-06-17 2024-04-16 Tusimple, Inc. System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8611695B1 (en) * 2009-04-27 2013-12-17 Google Inc. Large scale patch search
US8391634B1 (en) 2009-04-28 2013-03-05 Google Inc. Illumination estimation for images
US9031243B2 (en) * 2009-09-28 2015-05-12 iZotope, Inc. Automatic labeling and control of audio algorithms by audio recognition
DE102009060687A1 (en) * 2009-11-04 2011-05-05 Siemens Aktiengesellschaft Method and device for computer-aided annotation of multimedia data
US8494302B2 (en) * 2010-11-11 2013-07-23 Seiko Epson Corporation Importance filtering for image retargeting
US8798393B2 (en) 2010-12-01 2014-08-05 Google Inc. Removing illumination variation from images
EP2463821A1 (en) * 2010-12-08 2012-06-13 Alcatel Lucent Method and system for segmenting an image
US20120272171A1 (en) * 2011-04-21 2012-10-25 Panasonic Corporation Apparatus, Method and Computer-Implemented Program for Editable Categorization
US9501710B2 (en) * 2012-06-29 2016-11-22 Arizona Board Of Regents, A Body Corporate Of The State Of Arizona, Acting For And On Behalf Of Arizona State University Systems, methods, and media for identifying object characteristics based on fixation points
US9595298B2 (en) 2012-07-18 2017-03-14 Microsoft Technology Licensing, Llc Transforming data to create layouts
CN102968786B (en) * 2012-10-23 2015-08-12 西北工业大学 A kind of non-supervisory remote sensing images potential target method for detecting area
US9626768B2 (en) 2014-09-30 2017-04-18 Microsoft Technology Licensing, Llc Optimizing a visual perspective of media
US10282069B2 (en) 2014-09-30 2019-05-07 Microsoft Technology Licensing, Llc Dynamic presentation of suggested content
US9454712B2 (en) * 2014-10-08 2016-09-27 Adobe Systems Incorporated Saliency map computation
EP3026917A1 (en) 2014-11-27 2016-06-01 Thomson Licensing Methods and apparatus for model-based visual descriptors compression
US9216591B1 (en) 2014-12-23 2015-12-22 Xerox Corporation Method and system for mutual augmentation of a motivational printing awareness platform and recommendation-enabled printing drivers
US10296846B2 (en) * 2015-11-24 2019-05-21 Xerox Corporation Adapted domain specific class means classifier
US10380228B2 (en) 2017-02-10 2019-08-13 Microsoft Technology Licensing, Llc Output generation based on semantic expressions
CN106845457A (en) * 2017-03-02 2017-06-13 西安电子科技大学 Method for detecting infrared puniness target based on spectrum residual error with fuzzy clustering
WO2020066233A1 (en) * 2018-09-28 2020-04-02 富士フイルム株式会社 Learning device, learning device operation program, and learning device operation method
US10929715B2 (en) 2018-12-31 2021-02-23 Robert Bosch Gmbh Semantic segmentation using driver attention information
US11263482B2 (en) 2019-08-09 2022-03-01 Florida Power & Light Company AI image recognition training tool sets
CN113515981A (en) 2020-05-22 2021-10-19 阿里巴巴集团控股有限公司 Identification method, device, equipment and storage medium
US11423265B1 (en) 2020-06-30 2022-08-23 Amazon Technologies, Inc. Content moderation using object detection and image classification

Citations (16)

Publication number Priority date Publication date Assignee Title
US5960111A (en) * 1997-02-10 1999-09-28 At&T Corp Method and apparatus for segmenting images prior to coding
US6151408A (en) * 1995-02-10 2000-11-21 Fuji Photo Film Co., Ltd. Method for separating a desired pattern region from a color image
US6208758B1 (en) * 1991-09-12 2001-03-27 Fuji Photo Film Co., Ltd. Method for learning by a neural network including extracting a target object image for which learning operations are to be carried out
US6711278B1 (en) * 1998-09-10 2004-03-23 Microsoft Corporation Tracking semantic objects in vector image sequences
US20050213810A1 (en) * 2004-03-29 2005-09-29 Kohtaro Sabe Information processing apparatus and method, recording medium, and program
US20050220336A1 (en) * 2004-03-26 2005-10-06 Kohtaro Sabe Information processing apparatus and method, recording medium, and program
US20060093184A1 (en) * 2004-11-04 2006-05-04 Fuji Xerox Co., Ltd. Image processing apparatus
US20070005356A1 (en) * 2005-06-30 2007-01-04 Florent Perronnin Generic visual categorization method and system
US20070258648A1 (en) * 2006-05-05 2007-11-08 Xerox Corporation Generic visual classification with gradient components-based dimensionality enhancement
US20080069456A1 (en) * 2006-09-19 2008-03-20 Xerox Corporation Bags of visual context-dependent words for generic visual categorization
US7400761B2 (en) * 2003-09-30 2008-07-15 Microsoft Corporation Contrast-based image attention analysis framework
US20080240532A1 (en) * 2007-03-30 2008-10-02 Siemens Corporation System and Method for Detection of Fetal Anatomies From Ultrasound Images Using a Constrained Probabilistic Boosting Tree
US20080304740A1 (en) * 2007-06-06 2008-12-11 Microsoft Corporation Salient Object Detection
US20080304742A1 (en) * 2005-02-17 2008-12-11 Connell Jonathan H Combining multiple cues in a visual object detection system
US20080317358A1 (en) * 2007-06-25 2008-12-25 Xerox Corporation Class-based image enhancement system
US7876938B2 (en) * 2005-10-06 2011-01-25 Siemens Medical Solutions Usa, Inc. System and method for whole body landmark detection, segmentation and change quantification in digital images

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP4705959B2 (en) 2005-01-10 2011-06-22 トムソン ライセンシング Apparatus and method for creating image saliency map
US8175376B2 (en) * 2009-03-09 2012-05-08 Xerox Corporation Framework for image thumbnailing based on visual similarity

Patent Citations (19)

Publication number Priority date Publication date Assignee Title
US6208758B1 (en) * 1991-09-12 2001-03-27 Fuji Photo Film Co., Ltd. Method for learning by a neural network including extracting a target object image for which learning operations are to be carried out
US6151408A (en) * 1995-02-10 2000-11-21 Fuji Photo Film Co., Ltd. Method for separating a desired pattern region from a color image
US5960111A (en) * 1997-02-10 1999-09-28 At&T Corp Method and apparatus for segmenting images prior to coding
US6711278B1 (en) * 1998-09-10 2004-03-23 Microsoft Corporation Tracking semantic objects in vector image sequences
US7400761B2 (en) * 2003-09-30 2008-07-15 Microsoft Corporation Contrast-based image attention analysis framework
US20050220336A1 (en) * 2004-03-26 2005-10-06 Kohtaro Sabe Information processing apparatus and method, recording medium, and program
US20050213810A1 (en) * 2004-03-29 2005-09-29 Kohtaro Sabe Information processing apparatus and method, recording medium, and program
US20090175533A1 (en) * 2004-03-29 2009-07-09 Kohtaro Sabe Information processing apparatus and method, recording medium, and program
US7630525B2 (en) * 2004-03-29 2009-12-08 Sony Corporation Information processing apparatus and method, recording medium, and program
US20060093184A1 (en) * 2004-11-04 2006-05-04 Fuji Xerox Co., Ltd. Image processing apparatus
US20080304742A1 (en) * 2005-02-17 2008-12-11 Connell Jonathan H Combining multiple cues in a visual object detection system
US20070005356A1 (en) * 2005-06-30 2007-01-04 Florent Perronnin Generic visual categorization method and system
US7876938B2 (en) * 2005-10-06 2011-01-25 Siemens Medical Solutions Usa, Inc. System and method for whole body landmark detection, segmentation and change quantification in digital images
US20070258648A1 (en) * 2006-05-05 2007-11-08 Xerox Corporation Generic visual classification with gradient components-based dimensionality enhancement
US20080069456A1 (en) * 2006-09-19 2008-03-20 Xerox Corporation Bags of visual context-dependent words for generic visual categorization
US20080240532A1 (en) * 2007-03-30 2008-10-02 Siemens Corporation System and Method for Detection of Fetal Anatomies From Ultrasound Images Using a Constrained Probabilistic Boosting Tree
US7995820B2 (en) * 2007-03-30 2011-08-09 Siemens Medical Solutions Usa, Inc. System and method for detection of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree
US20080304740A1 (en) * 2007-06-06 2008-12-11 Microsoft Corporation Salient Object Detection
US20080317358A1 (en) * 2007-06-25 2008-12-25 Xerox Corporation Class-based image enhancement system

Cited By (308)

Publication number Priority date Publication date Assignee Title
US9740949B1 (en) 2007-06-14 2017-08-22 Hrl Laboratories, Llc System and method for detection of objects of interest in imagery
US8774517B1 (en) * 2007-06-14 2014-07-08 Hrl Laboratories, Llc System for identifying regions of interest in visual imagery
US9778351B1 (en) 2007-10-04 2017-10-03 Hrl Laboratories, Llc System for surveillance by integrating radar with a panoramic staring sensor
US10007679B2 (en) 2008-08-08 2018-06-26 The Research Foundation For The State University Of New York Enhanced max margin learning on multimodal data mining in a multimedia database
US20100215098A1 (en) * 2009-02-23 2010-08-26 Mondo Systems, Inc. Apparatus and method for compressing pictures with roi-dependent compression parameters
US10027966B2 (en) * 2009-02-23 2018-07-17 Mondo Systems, Inc. Apparatus and method for compressing pictures with ROI-dependent compression parameters
US8175376B2 (en) * 2009-03-09 2012-05-08 Xerox Corporation Framework for image thumbnailing based on visual similarity
US20120027309A1 (en) * 2009-04-14 2012-02-02 Nec Corporation Image signature extraction device
US8861871B2 (en) * 2009-04-14 2014-10-14 Nec Corporation Image signature extraction device
US20100281361A1 (en) * 2009-04-30 2010-11-04 Xerox Corporation Automated method for alignment of document objects
US8271871B2 (en) * 2009-04-30 2012-09-18 Xerox Corporation Automated method for alignment of document objects
US8532387B2 (en) 2009-09-04 2013-09-10 Adobe Systems Incorporated Methods and apparatus for procedural directional texture generation
US8787698B2 (en) 2009-09-04 2014-07-22 Adobe Systems Incorporated Methods and apparatus for directional texture generation using image warping
US8599219B2 (en) * 2009-09-18 2013-12-03 Adobe Systems Incorporated Methods and apparatuses for generating thumbnail summaries for image collections
US8619098B2 (en) * 2009-09-18 2013-12-31 Adobe Systems Incorporated Methods and apparatuses for generating co-salient thumbnails for digital images
US20130120454A1 (en) * 2009-09-18 2013-05-16 Elya Shechtman Methods and Apparatuses for Generating Thumbnail Summaries for Image Collections
US20110164815A1 (en) * 2009-11-17 2011-07-07 Samsung Electronics Co., Ltd. Method, device and system for content based image categorization field
US9355432B1 (en) 2010-07-13 2016-05-31 Google Inc. Method and system for automatically cropping images
US9070182B1 (en) 2010-07-13 2015-06-30 Google Inc. Method and system for automatically cropping images
US8577182B1 (en) 2010-07-13 2013-11-05 Google Inc. Method and system for automatically cropping images
US9552622B2 (en) 2010-07-13 2017-01-24 Google Inc. Method and system for automatically cropping images
US8487959B1 (en) * 2010-08-06 2013-07-16 Google Inc. Generating simulated eye movement traces for visual displays
US8933938B2 (en) * 2010-08-06 2015-01-13 Google Inc. Generating simulated eye movement traces for visual displays
US9229956B2 (en) 2011-01-10 2016-01-05 Microsoft Technology Licensing, Llc Image retrieval using discriminative visual features
US20130091515A1 (en) * 2011-02-04 2013-04-11 Kotaro Sakata Degree of interest estimating device and degree of interest estimating method
US9538219B2 (en) * 2011-02-04 2017-01-03 Panasonic Intellectual Property Corporation Of America Degree of interest estimating device and degree of interest estimating method
US9058611B2 (en) 2011-03-17 2015-06-16 Xerox Corporation System and method for advertising using image search and classification
US20120328150A1 (en) * 2011-03-22 2012-12-27 Rochester Institute Of Technology Methods for assisting with object recognition in image sequences and devices thereof
US9785835B2 (en) * 2011-03-22 2017-10-10 Rochester Institute Of Technology Methods for assisting with object recognition in image sequences and devices thereof
US10026198B2 (en) 2011-04-08 2018-07-17 Creative Technology Ltd Method, system and electronic device for at least one of efficient graphic processing and salient based learning
TWI566116B (en) * 2011-04-08 2017-01-11 創新科技有限公司 Electronic device for at least one of efficient graphic processing and salient based learning
CN103597484A (en) * 2011-04-08 2014-02-19 创新科技有限公司 A method, system and electronic device for at least one of efficient graphic processing and salient based learning
WO2012138299A1 (en) * 2011-04-08 2012-10-11 Creative Technology Ltd A method, system and electronic device for at least one of efficient graphic processing and salient based learning
US8867829B2 (en) 2011-05-26 2014-10-21 Xerox Corporation Method and apparatus for editing color characteristics of electronic image
US8570339B2 (en) 2011-05-26 2013-10-29 Xerox Corporation Modifying color adjustment choices based on image characteristics in an image editing system
US8560517B2 (en) 2011-07-05 2013-10-15 Microsoft Corporation Object retrieval using visual query context
US10622111B2 (en) 2011-08-12 2020-04-14 Help Lightning, Inc. System and method for image registration of multiple video streams
US20130038632A1 (en) * 2011-08-12 2013-02-14 Marcus W. Dillavou System and method for image registration of multiple video streams
US9886552B2 (en) * 2011-08-12 2018-02-06 Help Lightning, Inc. System and method for image registration of multiple video streams
US10181361B2 (en) 2011-08-12 2019-01-15 Help Lightning, Inc. System and method for image registration of multiple video streams
US8379981B1 (en) 2011-08-26 2013-02-19 Toyota Motor Engineering & Manufacturing North America, Inc. Segmenting spatiotemporal data based on user gaze data
US9317773B2 (en) 2011-08-29 2016-04-19 Adobe Systems Incorporated Patch-based synthesis techniques using color and color gradient voting
US8861868B2 (en) 2011-08-29 2014-10-14 Adobe Systems Incorporated Patch-based synthesis techniques
DE102011113154B4 (en) * 2011-09-14 2015-12-03 Airbus Defence and Space GmbH Machine learning method for machine learning of manifestations of objects in images
US9361543B2 (en) 2011-09-14 2016-06-07 Airbus Defence and Space GmbH Automatic learning method for the automatic learning of forms of appearance of objects in images
US8675966B2 (en) 2011-09-29 2014-03-18 Hewlett-Packard Development Company, L.P. System and method for saliency map generation
US8824797B2 (en) 2011-10-03 2014-09-02 Xerox Corporation Graph-based segmentation integrating visible and NIR information
EP2579211A2 (en) 2011-10-03 2013-04-10 Xerox Corporation Graph-based segmentation integrating visible and NIR information
US8660351B2 (en) * 2011-10-24 2014-02-25 Hewlett-Packard Development Company, L.P. Auto-cropping images using saliency maps
US20140250110A1 (en) * 2011-11-25 2014-09-04 Linjun Yang Image attractiveness based indexing and searching
US8938116B2 (en) * 2011-12-08 2015-01-20 Yahoo! Inc. Image cropping using supervised learning
US20150131900A1 (en) * 2011-12-08 2015-05-14 Yahoo! Inc. Image Cropping Using Supervised Learning
US9177207B2 (en) * 2011-12-08 2015-11-03 Zynga Inc. Image cropping using supervised learning
US20130148880A1 (en) * 2011-12-08 2013-06-13 Yahoo! Inc. Image Cropping Using Supervised Learning
US8929680B2 (en) * 2011-12-12 2015-01-06 Canon Kabushiki Kaisha Method, apparatus and system for identifying distracting elements in an image
US20130148910A1 (en) * 2011-12-12 2013-06-13 Canon Kabushiki Kaisha Method, apparatus and system for identifying distracting elements in an image
US8917910B2 (en) 2012-01-16 2014-12-23 Xerox Corporation Image segmentation based on approximation of segmentation similarity
US9075824B2 (en) 2012-04-27 2015-07-07 Xerox Corporation Retrieval system and method leveraging category-level labels
US9030505B2 (en) * 2012-05-17 2015-05-12 Nokia Technologies Oy Method and apparatus for attracting a user's gaze to information in a non-intrusive manner
US20130307762A1 (en) * 2012-05-17 2013-11-21 Nokia Corporation Method and apparatus for attracting a user's gaze to information in a non-intrusive manner
US9959629B2 (en) 2012-05-21 2018-05-01 Help Lightning, Inc. System and method for managing spatiotemporal uncertainty
EP2674881A1 (en) 2012-06-15 2013-12-18 Xerox Corporation Privacy preserving method for querying a remote public service
US8666992B2 (en) * 2012-06-15 2014-03-04 Xerox Corporation Privacy preserving method for querying a remote public service
US20150178587A1 (en) * 2012-06-18 2015-06-25 Thomson Licensing Device and a method for color harmonization of an image
CN102800092A (en) * 2012-07-12 2012-11-28 北方工业大学 Point-to-surface image significance detection
US8892562B2 (en) 2012-07-26 2014-11-18 Xerox Corporation Categorization of multi-page documents by anisotropic diffusion
US8873812B2 (en) 2012-08-06 2014-10-28 Xerox Corporation Image segmentation using hierarchical unsupervised segmentation and hierarchical classifiers
EP2701098A3 (en) * 2012-08-23 2015-06-03 Xerox Corporation Region refocusing for data-driven object localization
US8879796B2 (en) 2012-08-23 2014-11-04 Xerox Corporation Region refocusing for data-driven object localization
US9104946B2 (en) 2012-10-15 2015-08-11 Canon Kabushiki Kaisha Systems and methods for comparing images
CN102945378A (en) * 2012-10-23 2013-02-27 西北工业大学 Method for detecting potential target regions of remote sensing image on basis of monitoring method
US9418079B2 (en) * 2012-11-01 2016-08-16 Google Inc. Image comparison process
US20140122531A1 (en) * 2012-11-01 2014-05-01 Google Inc. Image comparison process
US20140126782A1 (en) * 2012-11-02 2014-05-08 Sony Corporation Image display apparatus, image display method, and computer program
CN103020993B (en) * 2012-11-28 2015-06-17 杭州电子科技大学 Visual saliency detection method by fusing dual-channel color contrasts
CN103020993A (en) * 2012-11-28 2013-04-03 杭州电子科技大学 Visual saliency detection method by fusing dual-channel color contrasts
WO2014092548A1 (en) * 2012-12-13 2014-06-19 Mimos Berhad A method and system for identifying multiple entities in images
US9008429B2 (en) 2013-02-01 2015-04-14 Xerox Corporation Label-embedding for text recognition
US8879103B2 (en) 2013-03-04 2014-11-04 Xerox Corporation System and method for highlighting barriers to reducing paper usage
EP2790135A1 (en) 2013-03-04 2014-10-15 Xerox Corporation System and method for highlighting barriers to reducing paper usage
US9158995B2 (en) * 2013-03-14 2015-10-13 Xerox Corporation Data driven localization using task-dependent representations
US20140270350A1 (en) * 2013-03-14 2014-09-18 Xerox Corporation Data driven localization using task-dependent representations
CN103198319A (en) * 2013-04-11 2013-07-10 武汉大学 Method of extraction of corner of blurred image in mine shaft environment
US9384423B2 (en) 2013-05-28 2016-07-05 Xerox Corporation System and method for OCR output verification
US20140376819A1 (en) * 2013-06-21 2014-12-25 Microsoft Corporation Image recognition by image search
US9754177B2 (en) * 2013-06-21 2017-09-05 Microsoft Technology Licensing, Llc Identifying objects within an image
US10482673B2 (en) 2013-06-27 2019-11-19 Help Lightning, Inc. System and method for role negotiation in multi-reality environments
US9940750B2 (en) 2013-06-27 2018-04-10 Help Lightning, Inc. System and method for role negotiation in multi-reality environments
US20160196662A1 (en) * 2013-08-16 2016-07-07 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and device for manufacturing virtual fitting model image
US9082047B2 (en) 2013-08-20 2015-07-14 Xerox Corporation Learning beautiful and ugly visual attributes
EP2863338A2 (en) 2013-10-16 2015-04-22 Xerox Corporation Delayed vehicle identification for privacy enforcement
US9412031B2 (en) 2013-10-16 2016-08-09 Xerox Corporation Delayed vehicle identification for privacy enforcement
US11436272B2 (en) 2013-11-12 2022-09-06 Pinterest, Inc. Object based image based search
US20150134688A1 (en) * 2013-11-12 2015-05-14 Pinterest, Inc. Image based search
US10515110B2 (en) * 2013-11-12 2019-12-24 Pinterest, Inc. Image based search
US10832448B2 (en) 2013-11-13 2020-11-10 Sony Corporation Display control device, display control method, and program
US20150130838A1 (en) * 2013-11-13 2015-05-14 Sony Corporation Display control device, display control method, and program
US9275306B2 (en) * 2013-11-13 2016-03-01 Canon Kabushiki Kaisha Devices, systems, and methods for learning a discriminant image representation
US20150131899A1 (en) * 2013-11-13 2015-05-14 Canon Kabushiki Kaisha Devices, systems, and methods for learning a discriminant image representation
US10115210B2 (en) * 2013-11-13 2018-10-30 Sony Corporation Display control device, display control method, and program
CN103678552A (en) * 2013-12-05 2014-03-26 武汉大学 Remote-sensing image retrieving method and system based on salient regional features
US20150169982A1 (en) * 2013-12-17 2015-06-18 Canon Kabushiki Kaisha Observer Preference Model
US9779284B2 (en) 2013-12-17 2017-10-03 Conduent Business Services, Llc Privacy-preserving evidence in ALPR applications
US9558423B2 (en) * 2013-12-17 2017-01-31 Canon Kabushiki Kaisha Observer preference model
US20160360267A1 (en) * 2014-01-14 2016-12-08 Alcatel Lucent Process for increasing the quality of experience for users that watch on their terminals a high definition video stream
US9430701B2 (en) * 2014-02-07 2016-08-30 Tata Consultancy Services Limited Object detection system and method
US20150227784A1 (en) * 2014-02-07 2015-08-13 Tata Consultancy Services Limited Object detection system and method
US9158971B2 (en) 2014-03-03 2015-10-13 Xerox Corporation Self-learning object detectors for unlabeled videos using multi-task learning
EP2916265A1 (en) 2014-03-03 2015-09-09 Xerox Corporation Self-learning object detectors for unlabeled videos using multi-task learning
US9928532B2 (en) 2014-03-04 2018-03-27 Daniel Torres Image based search engine
US20150262039A1 (en) * 2014-03-13 2015-09-17 Omron Corporation Image processing apparatus and image processing method
US9600746B2 (en) * 2014-03-13 2017-03-21 Omron Corporation Image processing apparatus and image processing method
US20170236030A1 (en) * 2014-04-15 2017-08-17 Canon Kabushiki Kaisha Object detection apparatus, object detection method, and storage medium
US20150294181A1 (en) * 2014-04-15 2015-10-15 Canon Kabushiki Kaisha Object detection apparatus, object detection method, and storage medium
US9672439B2 (en) * 2014-04-15 2017-06-06 Canon Kabushiki Kaisha Object detection apparatus, object detection method, and storage medium
US10643100B2 (en) * 2014-04-15 2020-05-05 Canon Kabushiki Kaisha Object detection apparatus, object detection method, and storage medium
US9639806B2 (en) 2014-04-15 2017-05-02 Xerox Corporation System and method for predicting iconicity of an image
US20170046621A1 (en) * 2014-04-30 2017-02-16 Siemens Healthcare Diagnostics Inc. Method and apparatus for performing block retrieval on block to be processed of urine sediment image
US11386340B2 (en) * 2014-04-30 2022-07-12 Siemens Healthcare Diagnostic Inc. Method and apparatus for performing block retrieval on block to be processed of urine sediment image
CN103927758A (en) * 2014-04-30 2014-07-16 重庆大学 Saliency detection method based on contrast ratio and minimum convex hull of angular point
US10748069B2 (en) * 2014-04-30 2020-08-18 Siemens Healthcare Diagnostics Inc. Method and apparatus for performing block retrieval on block to be processed of urine sediment image
US20150332605A1 (en) * 2014-05-19 2015-11-19 Thomson Licensing Method for harmonizing colors, corresponding computer program and device
US9761152B2 (en) * 2014-05-19 2017-09-12 Thomson Licensing Method for harmonizing colors, corresponding computer program and device
US9734434B2 (en) 2014-07-18 2017-08-15 Adobe Systems Incorporated Feature interpolation
US9424484B2 (en) * 2014-07-18 2016-08-23 Adobe Systems Incorporated Feature interpolation
US20160019440A1 (en) * 2014-07-18 2016-01-21 Adobe Systems Incorporated Feature Interpolation
US10043057B2 (en) 2014-07-28 2018-08-07 Adobe Systems Incorporated Accelerating object detection
US9471828B2 (en) 2014-07-28 2016-10-18 Adobe Systems Incorporated Accelerating object detection
US9858677B2 (en) 2014-09-05 2018-01-02 Apical Ltd. Method of image analysis
GB2529888A (en) * 2014-09-05 2016-03-09 Apical Ltd A method of image analysis
CN105404884A (en) * 2014-09-05 2016-03-16 顶级公司 Image analysis method
GB2529888B (en) * 2014-09-05 2020-09-23 Apical Ltd A method of image analysis
US20190005659A1 (en) * 2014-09-19 2019-01-03 Brain Corporation Salient features tracking apparatus and methods using visual initialization
US9697439B2 (en) 2014-10-02 2017-07-04 Xerox Corporation Efficient object detection with patch-level window processing
US11222399B2 (en) * 2014-10-09 2022-01-11 Adobe Inc. Image cropping suggestion using multiple saliency maps
US9773155B2 (en) * 2014-10-14 2017-09-26 Microsoft Technology Licensing, Llc Depth from time of flight camera
US20160104031A1 (en) * 2014-10-14 2016-04-14 Microsoft Technology Licensing, Llc Depth from time of flight camera
US10311282B2 (en) 2014-10-14 2019-06-04 Microsoft Technology Licensing, Llc Depth from time of flight camera
US9443164B2 (en) 2014-12-02 2016-09-13 Xerox Corporation System and method for product identification
US20160171299A1 (en) * 2014-12-11 2016-06-16 Samsung Electronics Co., Ltd. Apparatus and method for computer aided diagnosis (cad) based on eye movement
US9818029B2 (en) * 2014-12-11 2017-11-14 Samsung Electronics Co., Ltd. Apparatus and method for computer aided diagnosis (CAD) based on eye movement
US9367763B1 (en) 2015-01-12 2016-06-14 Xerox Corporation Privacy-preserving text to image matching
EP3048561A1 (en) 2015-01-21 2016-07-27 Xerox Corporation Method and system to perform text-to-image queries with wildcards
US9626594B2 (en) 2015-01-21 2017-04-18 Xerox Corporation Method and system to perform text-to-image queries with wildcards
US20200128145A1 (en) * 2015-02-13 2020-04-23 Smugmug, Inc. System and method for photo subject display optimization
US11743402B2 (en) * 2015-02-13 2023-08-29 Awes.Me, Inc. System and method for photo subject display optimization
US9600738B2 (en) 2015-04-07 2017-03-21 Xerox Corporation Discriminative embedding of local color names for object retrieval and classification
US11935102B2 (en) 2015-05-12 2024-03-19 Pinterest, Inc. Matching user provided representations of items with sellers of those items
US10269055B2 (en) 2015-05-12 2019-04-23 Pinterest, Inc. Matching user provided representations of items with sellers of those items
US11443357B2 (en) 2015-05-12 2022-09-13 Pinterest, Inc. Matching user provided representations of items with sellers of those items
US10679269B2 (en) 2015-05-12 2020-06-09 Pinterest, Inc. Item selling on multiple web sites
US10210421B2 (en) * 2015-05-19 2019-02-19 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking
US20170185860A1 (en) * 2015-05-19 2017-06-29 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking
US9613273B2 (en) * 2015-05-19 2017-04-04 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking
US10049085B2 (en) * 2015-08-31 2018-08-14 Qualtrics, Llc Presenting views of an electronic document
US20170060812A1 (en) * 2015-08-31 2017-03-02 Qualtrics, Llc Presenting views of an electronic document
US10430497B2 (en) 2015-08-31 2019-10-01 Qualtrics, Llc Presenting views of an electronic document
US11113448B2 (en) 2015-08-31 2021-09-07 Qualtrics, Llc Presenting views of an electronic document
US11055343B2 (en) 2015-10-05 2021-07-06 Pinterest, Inc. Dynamic search control invocation and visual search
US11609946B2 (en) 2015-10-05 2023-03-21 Pinterest, Inc. Dynamic search input selection
CN105513080A (en) * 2015-12-21 2016-04-20 南京邮电大学 Infrared image target salience evaluating method
CN105760886A (en) * 2016-02-23 2016-07-13 北京联合大学 Image scene multi-object segmentation method based on target identification and saliency detection
US9830529B2 (en) 2016-04-26 2017-11-28 Xerox Corporation End-to-end saliency mapping via probability distribution prediction
US11704692B2 (en) 2016-05-12 2023-07-18 Pinterest, Inc. Promoting representations of items to users on behalf of sellers of those items
US10521503B2 (en) 2016-09-23 2019-12-31 Qualtrics, Llc Authenticating a respondent to an electronic survey
US11017166B2 (en) 2016-09-23 2021-05-25 Qualtrics, Llc Authenticating a respondent to an electronic survey
US11580398B2 (en) * 2016-10-14 2023-02-14 KLA-Tencor Corp. Diagnostic systems and methods for deep learning models configured for semiconductor applications
US11568754B2 (en) 2016-10-31 2023-01-31 Qualtrics, Llc Guiding creation of an electronic survey
US10909868B2 (en) 2016-10-31 2021-02-02 Qualtrics, Llc Guiding creation of an electronic survey
US10706735B2 (en) 2016-10-31 2020-07-07 Qualtrics, Llc Guiding creation of an electronic survey
US10607109B2 (en) * 2016-11-16 2020-03-31 Samsung Electronics Co., Ltd. Method and apparatus to perform material recognition and training for material recognition
CN106780430A (en) * 2016-11-17 2017-05-31 大连理工大学 Image saliency detection method based on surroundedness and Markov model
US10706549B2 (en) * 2016-12-20 2020-07-07 Kodak Alaris Inc. Iterative method for salient foreground detection and multi-object segmentation
US11120556B2 (en) * 2016-12-20 2021-09-14 Kodak Alaris Inc. Iterative method for salient foreground detection and multi-object segmentation
US10943146B2 (en) * 2016-12-28 2021-03-09 Ancestry.Com Operations Inc. Clustering historical images using a convolutional neural net and labeled data bootstrapping
US11721091B2 (en) 2016-12-28 2023-08-08 Ancestry.Com Operations Inc. Clustering historical images using a convolutional neural net and labeled data bootstrapping
US11501513B2 (en) 2017-03-10 2022-11-15 Tusimple, Inc. System and method for vehicle wheel detection
US10671873B2 (en) 2017-03-10 2020-06-02 Tusimple, Inc. System and method for vehicle wheel detection
US11587304B2 (en) 2017-03-10 2023-02-21 Tusimple, Inc. System and method for occluding contour detection
US10147193B2 (en) 2017-03-10 2018-12-04 TuSimple System and method for semantic segmentation using hybrid dilated convolution (HDC)
US9953236B1 (en) 2017-03-10 2018-04-24 TuSimple System and method for semantic segmentation using dense upsampling convolution (DUC)
US10067509B1 (en) 2017-03-10 2018-09-04 TuSimple System and method for occluding contour detection
US11673557B2 (en) 2017-04-07 2023-06-13 Tusimple, Inc. System and method for path planning of autonomous vehicles based on gradient
US9952594B1 (en) 2017-04-07 2018-04-24 TuSimple System and method for traffic data collection using unmanned aerial vehicles (UAVs)
US10471963B2 (en) 2017-04-07 2019-11-12 TuSimple System and method for transitioning between an autonomous and manual driving mode based on detection of a drivers capacity to control a vehicle
US10710592B2 (en) 2017-04-07 2020-07-14 Tusimple, Inc. System and method for path planning of autonomous vehicles based on gradient
US11182639B2 (en) * 2017-04-16 2021-11-23 Facebook, Inc. Systems and methods for provisioning content
US11557128B2 (en) 2017-04-25 2023-01-17 Tusimple, Inc. System and method for vehicle position and velocity estimation based on camera and LIDAR data
US11928868B2 (en) 2017-04-25 2024-03-12 Tusimple, Inc. System and method for vehicle position and velocity estimation based on camera and LIDAR data
US10552691B2 (en) 2017-04-25 2020-02-04 TuSimple System and method for vehicle position and velocity estimation based on camera and lidar data
US10481044B2 (en) 2017-05-18 2019-11-19 TuSimple Perception simulation for improved autonomous vehicle control
US10558864B2 (en) 2017-05-18 2020-02-11 TuSimple System and method for image localization based on semantic segmentation
US10830669B2 (en) 2017-05-18 2020-11-10 Tusimple, Inc. Perception simulation for improved autonomous vehicle control
US10867188B2 (en) 2017-05-18 2020-12-15 Tusimple, Inc. System and method for image localization based on semantic segmentation
US11885712B2 (en) 2017-05-18 2024-01-30 Tusimple, Inc. Perception simulation for improved autonomous vehicle control
US10474790B2 (en) 2017-06-02 2019-11-12 TuSimple Large scale distributed simulation for realistic multiple-agent interactive environments
US10762635B2 (en) 2017-06-14 2020-09-01 Tusimple, Inc. System and method for actively selecting and labeling images for semantic segmentation
US10752246B2 (en) 2017-07-01 2020-08-25 Tusimple, Inc. System and method for adaptive cruise control with proximate vehicle detection
US10308242B2 (en) 2017-07-01 2019-06-04 TuSimple System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles
US10737695B2 (en) 2017-07-01 2020-08-11 Tusimple, Inc. System and method for adaptive cruise control for low speed following
US11040710B2 (en) 2017-07-01 2021-06-22 Tusimple, Inc. System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles
US11753008B2 (en) 2017-07-01 2023-09-12 Tusimple, Inc. System and method for adaptive cruise control with proximate vehicle detection
US10493988B2 (en) 2017-07-01 2019-12-03 TuSimple System and method for adaptive cruise control for defensive driving
US10303522B2 (en) 2017-07-01 2019-05-28 TuSimple System and method for distributed graphics processing unit (GPU) computation
US11029693B2 (en) 2017-08-08 2021-06-08 Tusimple, Inc. Neural network based vehicle dynamics model
US11550329B2 (en) 2017-08-08 2023-01-10 Tusimple, Inc. Neural network based vehicle dynamics model
US10360257B2 (en) 2017-08-08 2019-07-23 TuSimple System and method for image annotation
US10816354B2 (en) 2017-08-22 2020-10-27 Tusimple, Inc. Verification module system and method for motion-based lane detection with multiple sensors
US11573095B2 (en) 2017-08-22 2023-02-07 Tusimple, Inc. Verification module system and method for motion-based lane detection with multiple sensors
US11874130B2 (en) 2017-08-22 2024-01-16 Tusimple, Inc. Verification module system and method for motion-based lane detection with multiple sensors
US11846510B2 (en) 2017-08-23 2023-12-19 Tusimple, Inc. Feature matching and correspondence refinement and 3D submap position refinement system and method for centimeter precision localization using camera-based submap and LiDAR-based global map
US10303956B2 (en) 2017-08-23 2019-05-28 TuSimple System and method for using triplet loss for proposal free instance-wise semantic segmentation for lane detection
US10762673B2 (en) 2017-08-23 2020-09-01 Tusimple, Inc. 3D submap reconstruction system and method for centimeter precision localization using camera-based submap and LiDAR-based global map
US11151393B2 (en) 2017-08-23 2021-10-19 Tusimple, Inc. Feature matching and corresponding refinement and 3D submap position refinement system and method for centimeter precision localization using camera-based submap and LiDAR-based global map
US10678234B2 (en) 2017-08-24 2020-06-09 Tusimple, Inc. System and method for autonomous vehicle control to minimize energy cost
US11886183B2 (en) 2017-08-24 2024-01-30 Tusimple, Inc. System and method for autonomous vehicle control to minimize energy cost
US11366467B2 (en) 2017-08-24 2022-06-21 Tusimple, Inc. System and method for autonomous vehicle control to minimize energy cost
US10783381B2 (en) 2017-08-31 2020-09-22 Tusimple, Inc. System and method for vehicle occlusion detection
US10311312B2 (en) 2017-08-31 2019-06-04 TuSimple System and method for vehicle occlusion detection
US11745736B2 (en) 2017-08-31 2023-09-05 Tusimple, Inc. System and method for vehicle occlusion detection
US10656644B2 (en) 2017-09-07 2020-05-19 Tusimple, Inc. System and method for using human driving patterns to manage speed control for autonomous vehicles
US10649458B2 (en) 2017-09-07 2020-05-12 Tusimple, Inc. Data-driven prediction-based system and method for trajectory planning of autonomous vehicles
US10953880B2 (en) 2017-09-07 2021-03-23 Tusimple, Inc. System and method for automated lane change control for autonomous vehicles
US10953881B2 (en) 2017-09-07 2021-03-23 Tusimple, Inc. System and method for automated lane change control for autonomous vehicles
US10782693B2 (en) 2017-09-07 2020-09-22 Tusimple, Inc. Prediction-based system and method for trajectory planning of autonomous vehicles
US11853071B2 (en) 2017-09-07 2023-12-26 Tusimple, Inc. Data-driven prediction-based system and method for trajectory planning of autonomous vehicles
US11294375B2 (en) 2017-09-07 2022-04-05 Tusimple, Inc. System and method for using human driving patterns to manage speed control for autonomous vehicles
US10782694B2 (en) 2017-09-07 2020-09-22 Tusimple, Inc. Prediction-based system and method for trajectory planning of autonomous vehicles
US11892846B2 (en) 2017-09-07 2024-02-06 Tusimple, Inc. Prediction-based system and method for trajectory planning of autonomous vehicles
US10552979B2 (en) 2017-09-13 2020-02-04 TuSimple Output of a neural network method for deep odometry assisted by static scene optical flow
US10671083B2 (en) 2017-09-13 2020-06-02 Tusimple, Inc. Neural network architecture system for deep odometry assisted by static scene optical flow
US10733465B2 (en) 2017-09-20 2020-08-04 Tusimple, Inc. System and method for vehicle taillight state recognition
US11328164B2 (en) 2017-09-20 2022-05-10 Tusimple, Inc. System and method for vehicle taillight state recognition
US11734563B2 (en) 2017-09-20 2023-08-22 Tusimple, Inc. System and method for vehicle taillight state recognition
US10387736B2 (en) 2017-09-20 2019-08-20 TuSimple System and method for detecting taillight signals of a vehicle
US11126653B2 (en) 2017-09-22 2021-09-21 Pinterest, Inc. Mixed type image based search results
US11620331B2 (en) 2017-09-22 2023-04-04 Pinterest, Inc. Textual and image based search
US11841735B2 (en) 2017-09-22 2023-12-12 Pinterest, Inc. Object based image search
US10942966B2 (en) 2017-09-22 2021-03-09 Pinterest, Inc. Textual and image based search
US11853883B2 (en) 2017-09-30 2023-12-26 Tusimple, Inc. System and method for instance-level lane detection for autonomous vehicle control
US11500387B2 (en) 2017-09-30 2022-11-15 Tusimple, Inc. System and method for providing multiple agents for decision making, trajectory planning, and control for autonomous vehicles
US10970564B2 (en) 2017-09-30 2021-04-06 Tusimple, Inc. System and method for instance-level lane detection for autonomous vehicle control
US10962979B2 (en) 2017-09-30 2021-03-30 Tusimple, Inc. System and method for multitask processing for autonomous vehicle computation and control
US10768626B2 (en) 2017-09-30 2020-09-08 Tusimple, Inc. System and method for providing multiple agents for decision making, trajectory planning, and control for autonomous vehicles
US10410055B2 (en) 2017-10-05 2019-09-10 TuSimple System and method for aerial video traffic analysis
US10739775B2 (en) 2017-10-28 2020-08-11 Tusimple, Inc. System and method for real world autonomous vehicle trajectory simulation
US10666730B2 (en) 2017-10-28 2020-05-26 Tusimple, Inc. Storage architecture for heterogeneous multimedia data
US11853072B2 (en) * 2017-10-28 2023-12-26 Tusimple, Inc. System and method for real world autonomous vehicle trajectory simulation
US20230004165A1 (en) * 2017-10-28 2023-01-05 Tusimple, Inc. System and method for real world autonomous vehicle trajectory simulation
US10812589B2 (en) 2017-10-28 2020-10-20 Tusimple, Inc. Storage architecture for heterogeneous multimedia data
US11435748B2 (en) 2017-10-28 2022-09-06 Tusimple, Inc. System and method for real world autonomous vehicle trajectory simulation
US10573044B2 (en) * 2017-11-09 2020-02-25 Adobe Inc. Saliency-based collage generation using digital images
US10657390B2 (en) 2017-11-27 2020-05-19 Tusimple, Inc. System and method for large-scale lane marking detection using multimodal sensor data
US10528823B2 (en) 2017-11-27 2020-01-07 TuSimple System and method for large-scale lane marking detection using multimodal sensor data
US10528851B2 (en) 2017-11-27 2020-01-07 TuSimple System and method for drivable road surface representation generation using multimodal sensor data
US11580754B2 (en) 2017-11-27 2023-02-14 Tusimple, Inc. System and method for large-scale lane marking detection using multimodal sensor data
US10860018B2 (en) 2017-11-30 2020-12-08 Tusimple, Inc. System and method for generating simulated vehicles with configured behaviors for analyzing autonomous vehicle motion planners
US10877476B2 (en) 2017-11-30 2020-12-29 Tusimple, Inc. Autonomous vehicle simulation system for analyzing motion planners
US11681292B2 (en) 2017-11-30 2023-06-20 Tusimple, Inc. System and method for generating simulated vehicles with configured behaviors for analyzing autonomous vehicle motion planners
US11782440B2 (en) 2017-11-30 2023-10-10 Tusimple, Inc. Autonomous vehicle simulation system for analyzing motion planners
US11312334B2 (en) 2018-01-09 2022-04-26 Tusimple, Inc. Real-time remote control of vehicles with high redundancy
US11305782B2 (en) 2018-01-11 2022-04-19 Tusimple, Inc. Monitoring system for autonomous vehicle operation
US10607111B2 (en) * 2018-02-06 2020-03-31 Hrl Laboratories, Llc Machine vision system for recognizing novel objects
US11740093B2 (en) 2018-02-14 2023-08-29 Tusimple, Inc. Lane marking localization and fusion
US11852498B2 (en) 2018-02-14 2023-12-26 Tusimple, Inc. Lane marking localization
US11009365B2 (en) 2018-02-14 2021-05-18 Tusimple, Inc. Lane marking localization
US11009356B2 (en) 2018-02-14 2021-05-18 Tusimple, Inc. Lane marking localization and fusion
US11295146B2 (en) 2018-02-27 2022-04-05 Tusimple, Inc. System and method for online real-time multi-object tracking
US11830205B2 (en) 2018-02-27 2023-11-28 Tusimple, Inc. System and method for online real-time multi-object tracking
US10685244B2 (en) 2018-02-27 2020-06-16 Tusimple, Inc. System and method for online real-time multi-object tracking
US11074462B2 (en) 2018-03-18 2021-07-27 Tusimple, Inc. System and method for lateral vehicle detection
US11610406B2 (en) 2018-03-18 2023-03-21 Tusimple, Inc. System and method for lateral vehicle detection
US10685239B2 (en) 2018-03-18 2020-06-16 Tusimple, Inc. System and method for lateral vehicle detection
CN111936989A (en) * 2018-03-29 2020-11-13 谷歌有限责任公司 Similar medical image search
US11694308B2 (en) 2018-04-12 2023-07-04 Tusimple, Inc. Images for perception modules of autonomous vehicles
US11010874B2 (en) 2018-04-12 2021-05-18 Tusimple, Inc. Images for perception modules of autonomous vehicles
US11500101B2 (en) 2018-05-02 2022-11-15 Tusimple, Inc. Curb detection by analysis of reflection images
WO2019217562A1 (en) * 2018-05-09 2019-11-14 Figure Eight Technologies, Inc. Aggregated image annotation
US11017266B2 (en) * 2018-05-09 2021-05-25 Figure Eight Technologies, Inc. Aggregated image annotation
US11948082B2 (en) 2018-05-31 2024-04-02 Tusimple, Inc. System and method for proximate vehicle intention prediction for autonomous vehicles
US11104334B2 (en) 2018-05-31 2021-08-31 Tusimple, Inc. System and method for proximate vehicle intention prediction for autonomous vehicles
CN108898136A (en) * 2018-07-04 2018-11-27 安徽大学 Cross-modal image saliency detection method
US11238374B2 (en) * 2018-08-24 2022-02-01 Htc Corporation Method for verifying training data, training system, and computer readable medium
US10839234B2 (en) 2018-09-12 2020-11-17 Tusimple, Inc. System and method for three-dimensional (3D) object detection
US11727691B2 (en) 2018-09-12 2023-08-15 Tusimple, Inc. System and method for three-dimensional (3D) object detection
US11292480B2 (en) 2018-09-13 2022-04-05 Tusimple, Inc. Remote safe driving methods and systems
CN111071152A (en) * 2018-10-19 2020-04-28 图森有限公司 Fisheye image processing system and method
US11935210B2 (en) 2018-10-19 2024-03-19 Tusimple, Inc. System and method for fisheye image processing
US10796402B2 (en) 2018-10-19 2020-10-06 Tusimple, Inc. System and method for fisheye image processing
US11625557B2 (en) 2018-10-29 2023-04-11 Hrl Laboratories, Llc Process to learn new image classes without labels
US11440473B2 (en) * 2018-10-29 2022-09-13 Aisin Corporation Driving assistance apparatus
US11714192B2 (en) 2018-10-30 2023-08-01 Tusimple, Inc. Determining an angle between a tow vehicle and a trailer
US10942271B2 (en) 2018-10-30 2021-03-09 Tusimple, Inc. Determining an angle between a tow vehicle and a trailer
US20210248715A1 (en) * 2019-01-18 2021-08-12 Ramot At Tel-Aviv University Ltd. Method and system for end-to-end image processing
US11263752B2 (en) * 2019-05-09 2022-03-01 Boe Technology Group Co., Ltd. Computer-implemented method of detecting foreign object on background object in an image, apparatus for detecting foreign object on background object in an image, and computer-program product
US11823460B2 (en) 2019-06-14 2023-11-21 Tusimple, Inc. Image fusion for autonomous vehicle operation
WO2021000841A1 (en) * 2019-06-30 2021-01-07 华为技术有限公司 Method for generating user profile photo, and electronic device
US11914850B2 (en) 2019-06-30 2024-02-27 Huawei Technologies Co., Ltd. User profile picture generation method and electronic device
CN110377204A (en) * 2019-06-30 2019-10-25 华为技术有限公司 Method and electronic device for generating a user profile picture
US11810322B2 (en) 2020-04-09 2023-11-07 Tusimple, Inc. Camera pose estimation techniques
CN111666439A (en) * 2020-05-28 2020-09-15 重庆渝抗医药科技有限公司 Working method for rapidly extracting and dividing medical image big data aiming at cloud environment
US11701931B2 (en) 2020-06-18 2023-07-18 Tusimple, Inc. Angle and orientation measurements for vehicles with multiple drivable sections
CN112329810A (en) * 2020-09-28 2021-02-05 北京师范大学 Image recognition model training method and device based on saliency detection
CN113221715A (en) * 2020-10-31 2021-08-06 嘉应学院 Fire detection and identification method fused with visual attention mechanism
US20220138950A1 (en) * 2020-11-02 2022-05-05 Adobe Inc. Generating change comparisons during editing of digital images
CN112613528A (en) * 2020-12-31 2021-04-06 广东工业大学 Point cloud simplification method and device based on significance variation and storage medium
CN113345052A (en) * 2021-06-11 2021-09-03 山东大学 Classified data multi-view visualization coloring method and system based on similarity significance
US11958473B2 (en) 2021-06-17 2024-04-16 Tusimple, Inc. System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles

Also Published As

Publication number Publication date
US8175376B2 (en) 2012-05-08

Similar Documents

Publication Publication Date Title
US8175376B2 (en) Framework for image thumbnailing based on visual similarity
US8537409B2 (en) Image summarization by a learning approach
US8111923B2 (en) System and method for object class localization and semantic class based image segmentation
Marchesotti et al. A framework for visual saliency detection with applications to image thumbnailing
US9430719B2 (en) System and method for providing objectified image renderings using recognition information from images
US8879796B2 (en) Region refocusing for data-driven object localization
US8009921B2 (en) Context dependent intelligent thumbnail images
US8837820B2 (en) Image selection based on photographic style
US8897505B2 (en) System and method for enabling the use of captured images through recognition
US7809722B2 (en) System and method for enabling search and retrieval from image files based on recognized information
US7809192B2 (en) System and method for recognizing objects from images and identifying relevancy amongst images and information
US9158995B2 (en) Data driven localization using task-dependent representations
US8594385B2 (en) Predicting the aesthetic value of an image
US8917910B2 (en) Image segmentation based on approximation of segmentation similarity
WO2006122164A2 (en) System and method for enabling the use of captured images through recognition
Cavalcanti et al. A survey on automatic techniques for enhancement and analysis of digital photography
Wang Integrated content-aware image retargeting system
Chen et al. An efficient framework for location-based scene matching in image databases
Yang et al. An automatic object retrieval framework for complex background
Gavilan et al. Mobile image retrieval using morphological color segmentation
Cooray Enhancing Person Annotation for Personal Photo Management Using Content and Context based Technologies
Apostolidis et al. Multimedia Processing Essentials
Moskovchuk et al. Video Metadata Extraction in a Video-Mail System
Wang INTEGRATED CONTENT-AWARE IMAGE
Iqbal Important Person Detection from Multiple Videos

Legal Events

Date Code Title Description
AS Assignment

Owner name: XEROX CORPORATION, CONNECTICUT

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARCHESOTTI, LUCA;CIFARELLI, CLAUDIO;CSURKA, GABRIELA;SIGNING DATES FROM 20090402 TO 20090414;REEL/FRAME:022559/0093

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

AS Assignment

Owner name: CITIBANK, N.A., AS AGENT, DELAWARE

Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:062740/0214

Effective date: 20221107

AS Assignment

Owner name: XEROX CORPORATION, CONNECTICUT

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT R/F 062740/0214;ASSIGNOR:CITIBANK, N.A., AS AGENT;REEL/FRAME:063694/0122

Effective date: 20230517

AS Assignment

Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:064760/0389

Effective date: 20230621

AS Assignment

Owner name: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:065628/0019

Effective date: 20231117

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNOR:XEROX CORPORATION;REEL/FRAME:066741/0001

Effective date: 20240206