US20100226564A1 - Framework for image thumbnailing based on visual similarity - Google Patents
Framework for image thumbnailing based on visual similarity
- Publication number
- US20100226564A1 (application number US12/400,277)
- Authority
- US
- United States
- Prior art keywords
- image
- interest
- region
- dataset
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5838—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/422—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
- G06V10/426—Graphical representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
Definitions
- the exemplary embodiment relates to digital image processing. It finds particular application in connection with detection of salient regions and image thumbnailing in natural images based on visual similarity.
- Image thumbnailing consists of the identification of one or more regions of interest in an input image: for example, salient parts are aggregated in foreground regions, whereas redundant and non informative pixels become part of the background.
- the range of applications where thumbnailing can be applied is broad, including traditional problems like image compression, image visualizations, adaptive image display in small devices, but also more recent applications like variable data printing, assisted content creation, automatic blogging, and the like.
- Saliency detection is seen as a simulation or modeling of the human visual attention mechanism. In the field of image processing, it is understood that some parts of an image receive more attention from human observers than others. Saliency refers to the “importance” or “attractiveness” of the visual information in an image. A salient region may describe any relevant part of an image that is a main focus of a typical viewer's attention. Visual saliency models have been used for feature detection and to estimate regions of interest. Many of these methods are based on biological vision models, which aim to estimate which parts of images attract visual attention.
- Saliency maps can provide richer information about the relevance of features throughout an image. While interest points are generally simplistic corner (Harris) or blob (Laplace) detectors, saliency maps can carry higher level information. Such methods have been designed to model visual attention and have been evaluated by their congruence with fixation data obtained from experiments with eye gaze trackers.
- saliency maps have been used for object recognition, image categorization, automated image cropping, adaptive image display, and the like.
- saliency maps have been used to control the sampling density for feature extraction.
- saliency maps can be used as foreground detection methods to provide regions of interest (ROI) for classification. It has been shown that extracting image features in the locality of ROIs can give better results than sampling features uniformly through the image. A disadvantage is that such methods may miss important context information from the background.
- Saliency detection that takes into account higher order information, such as the observer's goal or the image context, is often referred to as top-down saliency detection.
- Bottom-up strategies are by far the most common and they are advantageous if the low level features represent the salient parts of the image well (e.g., isolated objects, uncluttered background). Top-down methods help when other factors dominate (e.g., the presence of human face), but they are lacking in generality. Hybrid approaches, in general, are designed in a two stage fashion where top-down strategies filter out noisy regions in bottom-up saliency maps.
- Top-down visual attention processes are considered to be driven by voluntary control, and related to the observer's goal when analyzing a scene. These methods take into account higher order information about the image such as context, structure, etc.
- Object detection can be seen as a particular case of top-down saliency detection, where the predefined task is given by the object class to be detected (See, Jiebo Luo, “Subject content-based intelligent cropping of digital photos,” in IEEE Intl. Conf. on Multimedia and Expo (2007)).
- An additional example of a top-down approach is where the system first classifies the image in terms of landscape, close-up, faces, etc. and then applies the most appropriate thumbnailing/cropping strategy (See, G. Ciocca, C. Cusano, F. Gasparini, and R. Schettini, “Self-adaptive image cropping for small display,” in IEEE Intl. Conf. on Consumer Electronics (2007)).
- a method for detecting a region of interest in an image includes, for each image in a dataset of images for which a region of interest has been respectively established, storing a respective dataset image representation based on features extracted from the image.
- the method includes generating an original image representation for the original image based on features extracted from the image, identifying a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, training a classifier with information extracted from the established regions of interest of the subset of similar images and, with the trained classifier, identifying a region of interest in the original image.
- an apparatus for detecting a region of interest in an image includes memory which stores the dataset image representations, and instructions for performing the above-described method.
- a processor with access to the instructions and dataset image representations executes the instructions.
- an apparatus for detecting a region of interest in an image includes memory which, for a dataset of images for which a respective region of interest has been established, stores a set of dataset image representations, each dataset image representation being derived from features extracted from a respective one of the images in the dataset.
- Memory stores instructions which, for an original image for which a region of interest is to be detected, generate an original image representation for the original image based on features extracted from the original image, identify a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, and train a classifier to identify a region of interest in the original image, the classifier being trained with positive and negative examples, each of the positive examples comprising a high level representation based on features extracted from the established region of interest of a respective one of the subset of similar images and each of the negative examples comprising a high level representation based on features extracted from outside the established region of interest of a respective one of the subset of similar images.
- a method for detecting a region of interest in an image includes storing a set of image representations, each image representation being based on features extracted from patches of a dataset image, where for each dataset image, the patch features are identified as salient or non-salient based on whether or not the patch is within a manually identified region of interest.
- the method includes generating an original image representation for the original image based on features extracted from patches of the image, computing a distance measure between the original image representation and image representations in the set of image representations to identify a subset of similar image representations from the set of image representations, and training a classifier with positive and negative examples extracted from the images corresponding to subset of similar image representations, the positive examples each being based on the salient patch features of a respective image and the negative examples being based on non-salient patch features of the respective image.
- With the trained classifier, a region of interest in the original image is identified based on the patch features of the original image.
- FIG. 1 is a functional block diagram of an apparatus for identifying a region of interest in an image in accordance with one aspect of the exemplary method
- FIG. 2 is a flow chart illustrating a method for identifying a region of interest in an image in accordance with one aspect of the exemplary method which may be performed with the apparatus of FIG. 1 ;
- FIG. 3 illustrates the images processed during steps of the method
- FIG. 4 illustrates substeps of part of the method of FIG. 2 ;
- FIG. 5 illustrates substeps of part of the method of FIG. 2 ;
- FIG. 6 illustrates patches and windows used in generating a saliency map
- FIG. 7 illustrates inputting a salient region into categorizer which generates a category for the image
- FIG. 8 illustrates F-measure values for various saliency detection methods as a function of threshold size
- FIG. 9 illustrates Precision, Recall, and F-measure data for an Example comparing the present method (methods A and B, without and with Graph-cut) to comparative methods for saliency detection (methods C,D,E, and F);
- FIG. 10 illustrates the displacement of a bounding box around the salient region from a manually assigned bounding box for the exemplary method (method B) and comparative methods C, D, E, and F.
- the exemplary embodiment relates to an apparatus and computer-implemented method and computer program product for detecting saliency in an image, such as a natural image, based on similarity of the original image with images for which visually salient regions of pixels are pre-segmented.
- the method assumes that images sharing similar visual appearance (as determined by comparing computer-generated content-based representations) share the same salient regions.
- saliency detection is approached as a binary classification problem where pre-segmented salient/non salient pixels are available to train and test an algorithm.
- the method allows both context-dependent and context-independent saliency detection within a single framework.
- the apparatus may be embodied in an electronic processing device, such as the illustrated computer 10 .
- the electronic processing device 10 may include one or more specific or general purpose computing devices, such as a network server, Internet-based server, desk top computer, laptop computer, personal data assistant (PDA), cellular telephone, or the like.
- the apparatus 10 includes an input component 12 , an output component 14 , a processor 16 , such as a CPU, and memory 18 .
- the computer 10 is configured to implement a salient region detector 20 , hosted by the computer 10 , for identifying a salient region or regions of an original input image.
- the salient region detector 20 may be in the form of software, hardware, or a combination thereof.
- the exemplary salient region detector 20 is stored in memory 18 (e.g., non-volatile computer memory) and comprises instructions for performing the exemplary method described below with reference to FIG. 2 . These instructions are executed by the processor 16 .
- a database 22 of previously annotated images (and/or information extracted therefrom) is stored in memory 18 or a separate memory.
- Components 12 , 14 , 16 , 18 , of the computer 10 may be connected for communication with each other by a data/control bus 24 .
- Input and output components may be combined or separate components and may include, for example, data input ports, modems, network connections, and the like.
- the computer 10 is configured for receiving an original image 30 , e.g., via input component 12 , and storing the image 30 in memory, such as a volatile portion of computer memory 18 , while being processed by the salient region detector 20 .
- the image 30 is transformed by the salient region detector 20 , e.g., by cropping or otherwise identifying a salient region or regions 32 of the image.
- the computer 10 is also configured for storing and/or outputting the salient region 32 generated for the image 30 by the salient region detector 20 and for outputting a transformed image 34 in which the salient region is identified or which comprises a crop of the original image based on the salient region 32 , e.g., by the output component 14 .
- the salient region image data may be cropped from the original image data.
- a classifier 36 , incorporated in the salient region detector or in communication with it, is fed by the salient region detector with a subset of the database images (or information extracted therefrom) on which the classifier is trained to identify a salient region in an original image.
- the computer 10 may include or be in data communication with a display 40 , such as an LCD screen, or other output device for displaying the salient region 32 .
- the salient region 32 may be further processed, e.g., by incorporation into a document 42 , which is output by the output component 14 , or output to a categorizer 44 .
- the input image 30 generally includes image data for an array of pixels forming the image.
- the image data may include colorant values, such as grayscale values, for each of a set of color separations, such as L*a*b* or RGB, or be expressed in another color space in which different colors can be represented.
- grayscale refers to the optical density value of any single image data channel, however expressed (e.g., L*a*b*, RGB, YCbCr, etc.).
- the images may be photographs, video images, graphical images (such as freeform drawings, plans, etc.), text images, or combined images which include photographs along with text, and/or graphics, or the like.
- the images may be received in PDF, JPEG, GIF, JBIG, BMP, TIFF or other common file format used for images and which may optionally be converted to another suitable format prior to processing.
- Input images may be stored in a virtual portion of memory 18 during processing.
- color as used herein is intended to broadly encompass any characteristic or combination of characteristics of the image pixels to be employed in the extraction of features.
- the “color” may be characterized by one, two, or all three of the red, green, and blue pixel coordinates in an RGB color space representation, or by one, two, or all three of the L, a, and b pixel coordinates in an Lab color space representation, or by one or both of the x and y coordinates of a CIE chromaticity representation, or the like.
- the color may incorporate pixel characteristics such as intensity, hue, brightness, etc.
- pixel as used herein is intended to denote “picture element” and encompasses image elements of two-dimensional images or of three-dimensional images (which are sometimes also called voxels to emphasize the volumetric nature of the pixels for three-dimensional images).
- Image 30 can be input from any suitable image source 50 , such as a workstation, database, scanner, or memory storage device, such as a disk, camera memory, memory stick, or the like.
- the image source 50 may be temporarily or permanently communicatively linked to the computer 10 via a wired or wireless link 52 , such as a cable, telephone line, local area network or wide area network, such as the Internet, through a suitable input/output (I/O) connection 12 , such as a modem, USB port, or the like.
- processor 16 may be the computer's central processing unit (CPU).
- the exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like.
- any processor capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 2 , can be used to implement the method for generating an image representation.
- Memory 18 may be in the form of separate memories or combined and may be in the form of any type of tangible computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, holographic memory, or suitable combination thereof.
- FIG. 3 illustrates graphically the processing of an exemplary image 30 during the method.
- the method begins at S 100 .
- a large dataset of pre-segmented images 22 is stored. These are images for which the pixels have been identified as either salient or non-salient, based on human interest.
- the dataset ideally includes a wide variety of images, including images which are similar in content to the image 30 for which a region of interest is to be detected.
- the dataset may include at least 100, e.g., at least 1000 images, such as at least about 10,000 images, and can be up to 100,000 or more, each dataset image having an established region of interest.
- the pre-segmented region(s) of each image can further be associated with a semantic label referring to the content of the region.
- a set of label types may be defined, such as animals, faces, people, buildings, automobiles, landscapes, flowers, other, and each image manually assigned one or more of these labels, based on its region of interest.
- image representations are generated for each of the images in the dataset.
- the representations are generally high level representations which are derived from low level features extracted from the image.
- the high level representation of each pre-segmented image is based on fusing (e.g., a sum or concatenation) of positive (+ve) and negative (−ve) high level representations, the positive one generated for the salient region (region of interest) of the image, the negative one for the non-salient region (i.e., everywhere except the region of interest).
- the two high level representations of each of the pre-segmented images may be derived from patch level representations, e.g., Fisher vectors from salient region patches for generating the +ve high level representation and Fisher vectors from patches outside the salient region for the −ve high level representation.
- S 104 may be performed prior to input of image 30 and the computed high level +ve and −ve representations stored in memory 18 . At this point, storing of the actual images in the dataset 22 may no longer be necessary. Further details of this step are illustrated in FIG. 4 and are described below.
- an image 30 for which a visually salient region (which may be referred to herein as a region of interest (ROI)) is to be identified is input and stored in memory.
- a representation of the input image is generated (e.g., by the salient region detector 20 ), based on low level features extracted from patches of the image in a similar manner to that for the pre-segmented images in the data-set except that here, there are no pre-segmented salient regions. Further details of this step are illustrated in FIG. 5 and are described below.
- a subset K of images in the dataset of pre-segmented images is identified, based on similarity of their high level representations to that of the original image.
- the K-nearest neighbor images may be retrieved from the annotated dataset 22 by the salient region detector 20 using a simple distance measure, such as the L1 norm distance between Fisher signatures of each dataset image (e.g., as a sum of the high level +ve and −ve representations) and the high level representation of the input image (e.g., as a sum of all high level patch representations), e.g., as generated using a global visual vocabulary.
- the subset of K nearest neighbor images is identified in substantially the same way, but in this case, from among those images having pre-segmented regions labeled with the selected semantic label (assuming there are sufficient images in the dataset with pre-segmented regions annotated with the selected label).
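- The retrieval step described above can be sketched as a brute-force nearest neighbor search under the L1 norm. This is only an illustrative sketch: the function names `l1_distance` and `k_nearest` and the toy three-dimensional signatures are assumptions made here for illustration, whereas the signatures in the described method would be Fisher-vector-based image representations.

```python
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (city-block) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(query: Sequence[float],
              dataset: Sequence[Sequence[float]],
              k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k dataset vectors closest to query."""
    scored = [(i, l1_distance(query, rep)) for i, rep in enumerate(dataset)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy signatures (hypothetical values standing in for image signatures).
dataset_signatures = [[0.0, 1.0, 2.0], [5.0, 5.0, 5.0], [0.5, 1.0, 1.5]]
query_signature = [0.0, 1.0, 2.0]
neighbors = k_nearest(query_signature, dataset_signatures, k=2)
```

- For the context-dependent case, the same search could simply be run over the sub-list of dataset signatures whose regions carry the selected semantic label.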
- a binary classifier 36 is trained using, as positive examples, the representations of the salient regions of the retrieved K-nearest neighbor images (designated by a “+” in FIG. 3 ), which may all be concatenated or summed to form a single vector. As negative examples, representations of the non-salient background regions are used (designated by a “−” in FIG. 3 ), which again, may all be concatenated or summed to form a single vector.
- the same high level representations can be used by any binary classifier, or alternatively other local patch representations can be considered in another embodiment.
- the trained classifier 36 is used to output a saliency probability for each patch of the original image extracted at S 106 .
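- As a rough sketch of the training and scoring steps, the example below fits a small logistic regression to positive and negative example vectors and then outputs a probability for each patch vector. The text does not prescribe a particular binary classifier, so the choice of logistic regression, the names `train_logistic` and `saliency_probability`, the learning rate, the epoch count, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(positives: Sequence[Sequence[float]],
                   negatives: Sequence[Sequence[float]],
                   epochs: int = 200,
                   lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    examples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in examples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def saliency_probability(w: Sequence[float], b: float,
                         patch_vector: Sequence[float]) -> float:
    """Probability that a patch representation belongs to the salient class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, patch_vector)) + b)


# Toy stand-ins for the +ve and -ve representations of the retrieved neighbors.
positives = [[1.0, 0.0], [0.9, 0.1]]
negatives = [[0.0, 1.0], [0.1, 0.9]]
w, b = train_logistic(positives, negatives)
patch_probs = [saliency_probability(w, b, p) for p in ([1.0, 0.0], [0.0, 1.0])]
```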
- a region of interest of the original image is identified by the salient region detector 20 .
- This step may include generating a saliency map 56 ( FIG. 3 ).
- the saliency map may be refined by the salient region detector 20 , e.g., with graph-cut segmentation to refine the salient region, as illustrated at 58 in FIG. 3 .
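- One simple way to turn per-patch probabilities into a saliency map is to average, at each pixel, the probabilities of all patches that cover it, and then threshold the map to obtain a bounding box. The sketch below illustrates that idea only; the averaging rule, the threshold, and the names `saliency_map` and `bounding_box` are assumptions made here, and the graph-cut refinement mentioned above is not shown.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with x1 and y1 exclusive


def saliency_map(width: int, height: int,
                 patches: Sequence[Box],
                 probs: Sequence[float]) -> List[List[float]]:
    """Per-pixel mean of the probabilities of all patches covering the pixel."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (x0, y0, x1, y1), p in zip(patches, probs):
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                total[y][x] += p
                count[y][x] += 1
    # Pixels covered by no patch get a saliency of 0.0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)] for y in range(height)]


def bounding_box(smap: Sequence[Sequence[float]],
                 threshold: float) -> Optional[Box]:
    """Smallest box enclosing every pixel whose saliency is >= threshold."""
    points = [(x, y) for y, row in enumerate(smap)
              for x, value in enumerate(row) if value >= threshold]
    if not points:
        return None
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


# Toy 4x4 image with three overlapping 2x2 patches (hypothetical values).
patches = [(0, 0, 2, 2), (1, 1, 3, 3), (2, 2, 4, 4)]
probs = [0.1, 0.9, 0.8]
smap = saliency_map(4, 4, patches, probs)
roi = bounding_box(smap, threshold=0.6)
```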
- the transformed image, e.g., a crop of the image based on the salient region or an image in which the salient region is identified by the salient region detector 20 , e.g., by annotations such as HTML tags, is output.
- further processing may be performed on the transformed image, e.g., the image crop based on the salient region may be displayed or incorporated into a document, e.g., placed in a predetermined placeholder location in a text document or sent to a categorizer 44 for assigning an object class to the image 30 .
- the method ends at S 124 .
- the present apparatus and method take advantage of a process which allows image saliency to be learned using (previously annotated) visually similar example images. Additionally, segmentation strategies can be advantageously employed for saliency detection. Further, the method is generic in the sense that it does not need to be tied to any specific category of images (e.g., faces), but allows a more broad concept of visual similarity, while at the same time, being readily adaptable to consideration of context. Finally, while the exemplary method has been described with particular reference to photographic (natural) images, the method is applicable to other types of images, such as medical or text document images, assuming that appropriate annotated data is available.
- one or more human observers look at each image, e.g., on a computer screen, and identify a salient region (a region which the observer considers to be the most interesting). For example, the user may generate a bounding box which encompasses the salient region. Alternatively, the observer may identify a region or regions of interest by moving the cursor around the region(s) to generate a bounded region, which may then be processed, for example, by automatically creating a bounding box which encompasses the bounded region. In other embodiments, eye gaze data may be employed to identify a region of interest.
- an eye gaze tracking device tracks eye movements of the observer while viewing the image for a short period of time.
- the tracking data is superimposed on the image to identify the region of interest.
- the identified regions/observations of several users may be combined to generate an overall region of interest for the image.
- the image 62 can then be segmented into a salient region 64 and a non salient region 66 , based on the identified region of interest.
- the image may then be annotated with the segmentation information, e.g., by applying an HTML tag or by storing the segmentation in a separate file.
- the salient region may be associated with a semantic concept (by annotating the salient region or entire image with a label).
- S 104 may include the following substeps for each image 62 in the dataset 22 :
- patches 70 A,B,C, etc., 72 A,B,C,D, and 74 are extracted from the image, e.g., at multiple scales. This is illustrated for a portion of the image 62 in FIG. 6 , showing patches (unbroken lines) at three scales by way of example, where the arrows point roughly to the centers of the respective patches.
- a representation of the patch (e.g., a Fisher vector) may be generated, based on the low level features.
- patches are designated as salient or non salient, depending on whether they are within the pre-segmented region or not.
- Various methods may be used to determine whether a patch is to be considered to be “within” the salient region.
- a threshold degree of overlap may be sufficient for a patch to be considered within the salient region.
- the overlap is computed relative to the area of the patch, e.g., if 50% or more of the patch is within the salient region, then the patch is accepted as being within it. If the region of interest is too small relative to the size of the patch (e.g., the ROI is less than 70% of the patch area), then the patch will not be considered.
- a patch is considered to be within the salient region if its geometric center lies within the salient region. In yet another embodiment, the patch is considered to be within the salient region if it is entirely encompassed by or entirely encompasses the salient region.
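The overlap-based and center-based tests described above can be sketched as follows. This is a minimal illustration, assuming that both the patch and the salient region are axis-aligned rectangles given as (x0, y0, x1, y1); the function names are hypothetical, and the 50% and 70% defaults are simply the example values mentioned in the text.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (x0, y0, x1, y1); 0 if empty."""
    x0, y0, x1, y1 = r
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection(a, b):
    """Intersection rectangle of a and b (may have zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))


def is_salient_by_overlap(patch, roi, min_overlap=0.5, min_roi_ratio=0.7):
    """Overlap rule: returns True/False, or None when the ROI is too
    small relative to the patch and the patch is not considered."""
    if rect_area(roi) < min_roi_ratio * rect_area(patch):
        return None
    overlap = rect_area(intersection(patch, roi))
    return overlap >= min_overlap * rect_area(patch)


def is_salient_by_center(patch, roi):
    """Center rule: the geometric center of the patch lies in the ROI."""
    cx = (patch[0] + patch[2]) / 2.0
    cy = (patch[1] + patch[3]) / 2.0
    return roi[0] <= cx <= roi[2] and roi[1] <= cy <= roi[3]
```

Returning None for the "not considered" case keeps such patches out of both the salient and the non-salient sets; this is one possible convention rather than a requirement of the method.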
- a high level +ve representation of the salient region of the image is extracted, based on the patch representations (e.g., Fisher vectors, or simply, low level features) of all the salient patches, and a high level −ve representation of the image is extracted, based on the patch representations (e.g., Fisher vectors, or simply, low level features) of all the non-salient patches.
- a high level representation of the image is generated, e.g., as a feature vector, e.g., a Fisher vector-based Image Signature, for example, by concatenation or other function of the +ve and −ve high level representations (Fisher FG vector and Fisher BG vector).
- low level features are extracted, e.g., as a features vector.
- a representation (e.g., Fisher vector) may be generated, based on the extracted low level features.
- a high level representation of the image is extracted, based on the patch representations or low level features.
- the high level representation is a vector (e.g., a Fisher vector-based Image Signature) formed by concatenation or other function of the patch level Fisher vectors.
- a Bag-of-Visual words (BOV) representation of the image as disclosed, for example, in above-mentioned U.S. Pub. Nos. 2007/0005356; 2007/0258648; 2008/0069456; the disclosures of which are incorporated herein by reference, and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints,” ECCV Workshop on Statistical Learning in Computer Vision (2004); also the method of Y.
- multiple patches are extracted from the image (original or dataset image) at various scales (S 104 a, S 108 a ).
- low level features are extracted (S 104 b , S 108 b ).
- the low level features which are extracted from the patches are typically quantitative values that summarize or characterize aspects of the respective patch, such as spatial frequency content, an average intensity, color characteristics (in the case of color images), gradient values, and/or other characteristic values.
- at least about fifty low level features are extracted from each patch; however, the number of features that can be extracted is not limited to any particular number or type of features. For example, 1000 or 1 million low level features could be extracted, depending on computational capabilities.
- the low level features include local (e.g., pixel) color statistics, and texture.
- local RGB statistics e.g., mean and standard deviation
- texture gradient orientations (representing a change in color) may be computed for each patch as a histogram (SIFT-like features).
- two (or more) types of low level features, such as color and texture, are separately extracted and the high level representation of the patch or image is based on a combination (e.g., a sum or a concatenation) of two Fisher Vectors, one for each feature type.
- SIFT descriptors are multi-image representations of an image neighborhood, such as Gaussian derivatives computed at, for example, eight orientation planes over a four-by-four grid of spatial locations, giving a 128-dimensional vector (that is, 128 features per features vector in these embodiments).
- Other descriptors or feature extraction algorithms may be employed to extract features from the patches. Examples of some other suitable descriptors are set forth by K. Mikolajczyk and C. Schmid, in “A Performance Evaluation Of Local Descriptors,” Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Madison, Wis., USA, June 2003, which is incorporated in its entirety by reference.
- a feature vector can be employed to characterize each patch.
- the feature vector can be a simple concatenation of the low level features.
- the extracted low level features can be used to generate a high level representation of the patch (e.g., a Fisher vector) (S 104 c , S 108 c ).
- a visual vocabulary is built for each feature type using Gaussian Mixture Models. Modeling the visual vocabulary in the feature space with a GMM may be performed according to the method described in F. Perronnin, C. Dance, G. Csurka and M. Bressan, “ Adapted Vocabularies for Generic Visual Categorization ,” In ECCV (2006).
- each patch is then characterized (at S 104 c , S 108 c ) with a gradient vector derived from a generative probability model.
- the visual vocabulary is modeled by a Gaussian mixture model in a low level feature space where each Gaussian corresponds to a visual word.
- the GMM vocabulary is trained using maximum likelihood estimation (MLE) considering all or a random subset of the low level descriptors extracted from the annotated dataset 22 .
- the Fisher gradient vector f_t of the descriptor x_t is then just the concatenation of the partial derivatives in Equations (1) and (2), leading to a 2×D×N dimensional vector, where D is the dimension of the low level feature space. While the Fisher vector is high dimensional, it can be made relatively sparse as only a small number of components have non-negligible values.
- Considering the gradient log-likelihood of each patch with respect to the parameters of the Gaussian Mixture leads to a high level representation of the patch which is referred to as a Fisher vector.
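As an illustration of the gradient representation described above, the following minimal sketch computes a Fisher vector for a single descriptor under a GMM with diagonal covariances. Equations (1) and (2) are not reproduced in this section, so the partial derivatives used here (with respect to the Gaussian means and standard deviations) are the commonly used forms and should be read as an assumption; the function names and any parameter values are hypothetical, and no Fisher information normalization is applied.

```python
import math


def posteriors(x, weights, means, sigmas):
    """Soft assignment of descriptor x to each Gaussian (diagonal GMM)."""
    logs = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xd, m, s in zip(x, mu, sd):
            ll += -0.5 * math.log(2 * math.pi * s * s)
            ll -= (xd - m) ** 2 / (2 * s * s)
        logs.append(ll)
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(x, weights, means, sigmas):
    """Concatenate gradients w.r.t. means and standard deviations.

    With N Gaussians in a D-dimensional space the result has
    2 * D * N components.
    """
    gammas = posteriors(x, weights, means, sigmas)
    grad_mu, grad_sigma = [], []
    for g, mu, sd in zip(gammas, means, sigmas):
        for xd, m, s in zip(x, mu, sd):
            grad_mu.append(g * (xd - m) / (s * s))
            grad_sigma.append(g * ((xd - m) ** 2 / s ** 3 - 1.0 / s))
    return grad_mu + grad_sigma
```

Because the posterior of a descriptor is typically concentrated on a few Gaussians, most components of the resulting vector are close to zero, which matches the sparsity remark above.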
- the dimensionality of the Fisher vector can be reduced to a fixed value, such as 50 or 100 dimensions, using principal component analysis.
- the two Fisher vectors are concatenated or otherwise combined to form a single high level representation of the patch having a fixed dimensionality.
- features-based representations can be used to represent each patch, such as a set of features, a two- or more-dimensional array of features, or the like.
- the high level representation of the original image can then be generated from the patch feature vectors (e.g., the patch Fisher vectors) (S 104 f , S 108 d ).
- the patches are labeled according to their overlap with the manually designated salient regions. This leads to two sets of low level features, X+ and X−, referring to the set of patches that are considered salient and those which are non-salient.
- two Fisher vectors f_X+ and f_X− are computed. These two vectors are then stored as indexes in the database and are, in the exemplary embodiment, the only required information from the dataset images needed to process a new image.
- each original image 30 and each of the K nearest neighbor images 62 is represented by a high level representation which is simply the concatenation of two Fisher Vectors, one for texture and one for color, each vector formed by averaging the Fisher Vectors of the patches.
- This single vector is referred to herein as a Fisher image signature.
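The construction of the Fisher image signature described above (average the patch-level Fisher vectors per feature type, then concatenate) can be sketched in a few lines. The function names are hypothetical, and the patch vectors are assumed to be plain lists of equal length within each feature type.

```python
def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fisher_image_signature(texture_fvs, color_fvs):
    """Concatenate the averaged texture and color Fisher vectors."""
    return mean_vector(texture_fvs) + mean_vector(color_fvs)
```

For example, two texture vectors [1, 2] and [3, 4] and two color vectors [0, 2] and [2, 0] give the signature [2.0, 3.0, 1.0, 1.0].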
- the patch level Fisher vectors may be otherwise fused, e.g., by concatenation, dot product, or other combination of patch level Fisher vectors to produce an image level Fisher vector.
- a Fisher image signature F Y is computed in an analogous way with respect to the initialization phase, except that all patches of the image are used to compute the signature (S 104 d ).
- the Fisher image signature is exemplary of types of high level representation which can be used herein.
- Other image signatures used in the literature for image retrieval may alternatively be used, as discussed above, such as a Bag-of-Visual Words (BOV) representation or Fisher kernel (FK).
- the most similar images are retrieved from the dataset where, for each image, a manually annotated ROI is available, as described above.
- the K nearest neighbors are identified, based on the distance metric, where K may be, for example, at least 10, and up to about 50 or 100.
- a suitable subset contains about 20-30 images, which may represent, for example, less than 20%, e.g., no more than about 10% of the number of images in the dataset, and in one embodiment, no more than about 1 % or 0.2% thereof.
- the retrieval of a set of K images from D which are visually similar to I_n generates a list of signatures <F_X+, F_X−> associated with the K images most similar to I_n.
- a distance metric is computed between the global Fisher image signature obtained by summing F_X+ and F_X− (or other high level image representation) and that of the original image, F_Y.
- the K most similar images are retrieved using the Fisher image signature with the normalized L 1 distance measure as described, for example, in S. Clinchant, J.-M. Renders and G.
- a normalized L1 measure can be used to retrieve similar images:
- f̂ is the vector f normalized to have an L1 norm equal to 1,
- f̂_i are the elements of the vector f̂, and
- f_X = f_X+ + f_X− (as the set of descriptors in image X is the union of salient and non-salient patches).
- the distance measure used is the L1 norm distance between the Fisher Image Signatures of each dataset image and the input image.
- other distance measures, such as Euclidean distance, chi-squared distance, or the like, may alternatively be used for identifying a subset of similar images from the dataset.
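The retrieval step can be illustrated with the following minimal sketch. It assumes that the normalized L1 measure is the L1 distance between the two signatures after each has been scaled to unit L1 norm, and that each dataset entry stores the pair of salient and non-salient vectors; the function names and the `(image_id, f_plus, f_minus)` layout are hypothetical.

```python
def l1_normalize(f):
    """Scale f so that the sum of absolute values equals 1."""
    total = sum(abs(v) for v in f)
    return [v / total for v in f] if total > 0 else list(f)


def normalized_l1_distance(f, g):
    """L1 distance between the L1-normalized versions of f and g."""
    fn, gn = l1_normalize(f), l1_normalize(g)
    return sum(abs(a - b) for a, b in zip(fn, gn))


def retrieve_k_nearest(query_signature, dataset, k):
    """Return the ids of the k dataset images closest to the query.

    dataset: list of (image_id, f_plus, f_minus); the global signature
    of each dataset image is taken as f_plus + f_minus.
    """
    scored = []
    for image_id, f_plus, f_minus in dataset:
        f_x = [p + m for p, m in zip(f_plus, f_minus)]
        scored.append((normalized_l1_distance(query_signature, f_x), image_id))
    scored.sort()
    return [image_id for _, image_id in scored[:k]]
```

A linear scan is used here for clarity; for a large dataset an index structure could be substituted without changing the distance measure.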
- the classifier 36 is trained using the Fisher Vector representations of image patches extracted from the retrieved K-nearest neighbor images. For the K-nearest neighbor images retrieved, manually annotated salient regions are available in, e.g., the form of bounding boxes. Therefore in each annotated image, the system considers as positive (i.e. salient) patches, the ones inside the annotated bounding box, and as negative (i.e., non-salient) all the others.
- For each retrieved image X_j, a Foreground Fisher vector (FG signature) f_X+^j is/has been computed by averaging the Fisher Vectors of the +ve patches and a Background Fisher Vector (BG signature) f_X−^j is/has been computed by averaging over the −ve patches. Then, all Fisher vectors representing salient regions are collected (summed) and all Fisher vectors representing non-salient regions are collected (summed) over the K most similar retrieved images, leading to a foreground Fisher model and a background Fisher model:
- the patches are designated as positives only if they are within the salient regions labeled with the target concept. Otherwise they are considered negatives. Therefore, while in the context-independent case the f_X+^j and f_X−^j need not be recomputed (they correspond to the values in the stored signatures <F_X+, F_X−>), in the context-dependent case, these values may be re-computed on-line as the set of positive and negative patches may be different (if multiple objects were designated as salient regions in the image and have different labels).
- a saliency score is computed based on the foreground Fisher model and on the background Fisher model. For example, a patch x_i is considered salient if its normalized L1 distance to the foreground Fisher model is smaller than to the background Fisher model:
- the binary classifier score may be replaced with a non-binary score S which is a simple function of the normalized L1 distances.
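The foreground/background comparison described above can be sketched as follows. The binary rule follows the text directly. The non-binary score is not reproduced in this section, so the difference of the two distances used below is only an assumed example of "a simple function of the normalized L1 distances"; all function names are hypothetical.

```python
def l1_normalize(f):
    """Scale f so that the sum of absolute values equals 1."""
    total = sum(abs(v) for v in f)
    return [v / total for v in f] if total > 0 else list(f)


def normalized_l1_distance(f, g):
    """L1 distance between the L1-normalized versions of f and g."""
    fn, gn = l1_normalize(f), l1_normalize(g)
    return sum(abs(a - b) for a, b in zip(fn, gn))


def build_models(neighbor_signatures):
    """Sum the (f_plus, f_minus) pairs of the K retrieved images."""
    fg = [sum(col) for col in zip(*[s[0] for s in neighbor_signatures])]
    bg = [sum(col) for col in zip(*[s[1] for s in neighbor_signatures])]
    return fg, bg


def is_salient(patch_fv, fg, bg):
    """Binary rule: closer to the foreground model than the background."""
    return normalized_l1_distance(patch_fv, fg) < normalized_l1_distance(patch_fv, bg)


def saliency_score(patch_fv, fg, bg):
    """Assumed non-binary score: positive when closer to the foreground."""
    return normalized_l1_distance(patch_fv, bg) - normalized_l1_distance(patch_fv, fg)
```

With this convention, thresholding the score at zero reproduces the binary decision.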
- the value S can be assigned to the center pixel of each region, and the values between these centers can then either be interpolated or propagated using Gaussian weighting. The latter can be done by averaging over all Gaussian weighted scores:
- W is the value at pixel p of the Gaussian centered on the geometrical center of the region.
- a diagonal isotropic covariance matrix may be used, with values (0.6*R)², R² being the size of the region.
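The Gaussian propagation of region scores to pixels can be sketched as below. The exact averaging formula is not reproduced in this section, so this sketch assumes a weighted average in which each region's score is weighted by an isotropic Gaussian with variance (0.6*R)² centered on the region, normalized by the sum of the weights; the function names and the `(center_x, center_y, R, score)` layout are hypothetical.

```python
import math


def gaussian_weight(pixel, center, r):
    """Unnormalized isotropic Gaussian with variance (0.6 * r) ** 2."""
    variance = (0.6 * r) ** 2
    d2 = (pixel[0] - center[0]) ** 2 + (pixel[1] - center[1]) ** 2
    return math.exp(-d2 / (2.0 * variance))


def propagate_scores(width, height, regions):
    """Build a saliency map (list of rows) from scored regions.

    regions: list of (center_x, center_y, R, score), where R is the
    side length of the (square) region.
    """
    saliency = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            num = den = 0.0
            for cx, cy, r, score in regions:
                w = gaussian_weight((x, y), (cx, cy), r)
                num += w * score
                den += w
            saliency[y][x] = num / den if den > 0 else 0.0
    return saliency
```

Normalizing by the sum of weights keeps the map in the same range as the input scores; dividing by the number of regions instead would be an equally plausible reading of "averaging".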
- the saliency map is built for the original image by considering N such overlapping sub-windows (shown as 80 A,B,C, etc.) of the same size (e.g., 50 pixels*50 pixels) (a few of these windows 80 are illustrated in FIG. 6 ).
- the windows may be of the same size or somewhat larger than the smallest patches.
- a patch is considered to belong to a window if the geometric center of the patch lies within the window. For example, in the case of window 80 E, patches 70 F and 74 are considered to belong to it. Note that this could be done at the patch level rather than using windows 80 . However averaging over several patches gives more stable results.
- the window's saliency score is computed based on the distance of the window signature (Eqn. (6)) to the Foreground signature (FS) and Background signature (BS), as defined in Eqn. (5), using the (optionally normalized) L1 distance computed as in Eqn. (7).
- the scores at the window level are projected to the pixels, as described in Eqn. (8) above (averaging, for each pixel, the window saliency scores of the windows containing that pixel).
- Equation (8) has a low computational cost but it is also a rather simple evaluation of the saliency score.
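The projection of window scores to pixels, as characterized above (each pixel receives the average score of the windows containing it), can be sketched as follows. Windows are assumed to be given as (x0, y0, x1, y1, score) with exclusive upper bounds; the function name and this layout are hypothetical.

```python
def project_windows_to_pixels(width, height, windows):
    """Average, for each pixel, the scores of the windows covering it."""
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1, score in windows:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                total[y][x] += score
                count[y][x] += 1
    # Pixels covered by no window are given a score of 0.
    return [[total[y][x] / count[y][x] if count[y][x] else 0.0
             for x in range(width)]
            for y in range(height)]
```

The cost is proportional to the total window area, which is consistent with the low computational cost noted above.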
- a patch classifier (not shown) could be used to compute a saliency probability map by using the approach described in Gabriela Csurka and Florent Perronnin, “A Simple High Performance Approach to Semantic Segmentation,” British Machine Vision Conference (BMVC), Leeds, UK (September 2008).
- a patch classifier is trained and the patch probability score for the original image is then propagated from patches to pixels as described in the Csurka and Perronnin reference.
- the saliency maps obtained by this type of classifier are not necessarily better than those obtained using Eqn. (8).
- a bounding box may simply be drawn to encompass all (or substantially all) pixels which exceed a threshold probability score, and the bounded area is then designated as the region of interest.
- Different strategies can be designed to build a thumbnail from this map.
- One option is to select the bounding box of the biggest or most centered connected component.
- Another option is to consider all connected components and retarget them into a single region as proposed in V. Setlur, S. Takagi, R. Raskar, M. Gleicher, and B.
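The first option above (the bounding box of the biggest connected component) can be sketched with a breadth-first flood fill over the thresholded map. The saliency map is assumed to be a list of rows, 4-connectivity is an assumption, and the function name is hypothetical.

```python
from collections import deque


def largest_component_bbox(saliency, threshold):
    """Bounding box (x0, y0, x1, y1), inclusive, of the largest
    4-connected component of pixels above threshold, or None."""
    h, w = len(saliency), len(saliency[0])
    seen = [[False] * w for _ in range(h)]
    best_size, best_box = 0, None
    for sy in range(h):
        for sx in range(w):
            if seen[sy][sx] or saliency[sy][sx] <= threshold:
                continue
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            size, x0, x1, y0, y1 = 0, sx, sx, sy, sy
            while queue:
                x, y = queue.popleft()
                size += 1
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and saliency[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if size > best_size:
                best_size, best_box = size, (x0, y0, x1, y1)
    return best_box
```

Selecting the most centered component instead would only change the comparison at the end of the loop, e.g., ranking components by the distance of their center to the image center.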
- refinement techniques may be applied to define an ROI based on the salient pixels which takes further considerations into account (S 118 ).
- the role of this step is to enhance the precision.
- the salient regions correspond to isolated objects. Therefore, regions classified as salient can be further refined by taking into account edge constraints.
- a Graph-Cut segmentation may be used to adjust the borders of the salient region. This approach assumes that the estimated region contains a consistent part of the relevant objects.
- One suitable method is based on the Graph-Cut algorithms described in Rother, C., Kolmogorov, V., and Blake, A., “Grabcut: Interactive foreground extraction using iterated graph cuts,” In ACM Trans. Graphics ( SIGGRAPH 2004) 23(3), 309-314 (2004).
- the problem of segmentation is formulated in terms of energy minimization (i.e., max-flow/min-cut).
- the image is represented as a graph in which each pixel is a node and the edges can represent color similarity between adjacent pixels, as in a Markov Random Field.
- two extra nodes, a starting node and an ending node, are added to the graph and linked to each pixel based on the probability that the pixel belongs to the background or the foreground.
- the saliency map generated at S 116 is used to build an initial Graph-Cut model.
- a first Gaussian Mixture Model (GMM) is created for the foreground colors and a second GMM is created for the background colors.
- FIG. 3 shows an example graph-cut mask 58 created from the ROI mask 56 generated at S 116 .
- the graph-cut method is performed as follows: First, two thresholds are chosen (one positive, th+, and one negative, th−). This separates the saliency map S into three different regions: pixels u labeled as salient (S(u) > th+), pixels labeled as non-salient (S(u) < th−), and unknown pixels (the others). Two Gaussian Mixture Models (GMMs) are created, one using the RGB values of the salient (foreground) pixels and one using the RGB values of the non-salient (background) pixels. Then the following energy:
- $E(L) = \sum_{u \in P} D_u(u) + \sum_{(u,v) \in C} V_{u,v}(u,v)$ Eqn. (9)
- $V_{u,v}(u,v) = \gamma \sum_{(u,v) \in C} \delta_{l_u, l_v} \exp\left(-\frac{\|u - v\|^2}{2\beta}\right)$ Eqn. (10)
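The thresholding step that seeds the Graph-Cut model can be sketched as follows: the saliency map is split into salient, non-salient, and unknown pixels using the two thresholds th+ and th−. The label values (1, 0, and -1) and the function names are hypothetical conventions chosen for this illustration; the energy minimization itself is not shown.

```python
SALIENT, NON_SALIENT, UNKNOWN = 1, 0, -1


def build_trimap(saliency, th_pos, th_neg):
    """Label each pixel u: salient if S(u) > th_pos, non-salient if
    S(u) < th_neg, unknown otherwise."""
    trimap = []
    for row in saliency:
        labels = []
        for s in row:
            if s > th_pos:
                labels.append(SALIENT)
            elif s < th_neg:
                labels.append(NON_SALIENT)
            else:
                labels.append(UNKNOWN)
        trimap.append(labels)
    return trimap


def pixels_with_label(trimap, label):
    """Coordinates (x, y) of the pixels carrying the given label, e.g.,
    to collect the RGB samples used to fit each color GMM."""
    return [(x, y) for y, row in enumerate(trimap)
            for x, value in enumerate(row) if value == label]
```

The salient and non-salient coordinate lists would then supply the foreground and background color samples, while the unknown pixels are the ones whose labels the energy minimization decides.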
- if the positively labeled area after Graph-Cut is too small compared with the size of the original image (e.g., less than 5% or less than 10% of its size), the output of step S 116 , i.e., the binarized Saliency Map 56 , is used for identifying an ROI.
- the ROI may be generated, for example, from the saliency map 58 (or 56 ) by processing the map in order to find the biggest, most centered object based on an analysis of statistics of the saliency map distribution (e.g., center of mass of the distribution, cumulative probability etc.).
- a rectangular crop (image thumbnail) 90 can then be generated, based on this salient region.
- the method illustrated in FIGS. 2 , 4 , and 5 may be implemented in a computer program product that may be executed on a computer.
- the computer program product may be a tangible computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or may be a transmittable carrier wave in which the control program is embodied as a data signal.
- Computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like, or any other medium from which a computer can read and use.
- the exemplary method thus described may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like.
- any device capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIGS. 2 , 4 , and 5 , can be used to implement the automated method for identifying a region of interest in an image.
- variable data applications such as 1 to 1 personalization and direct mail marketing often employ an image.
- a document 42 can be created incorporating an appropriately sized crop 90 which incorporates the salient region.
- the human observers used to annotate the salient regions of the images in the dataset 22 can be selected to represent the target audience.
- two or more sets of annotators may be used, e.g., one group comprising only females, the other, only males, and separate sets of image signatures stored for each group.
- the K nearest neighbors may be different, depending on which set of signatures is used.
- Variable data printing is not the only application of the exemplary system and apparatus.
- Other applications such as image and document asset management or document image/photograph set visualization, and the like can also benefit.
- a crop 90 of the original image based on the salient region, can be used for a thumbnail which is displayed in place of the original image, allowing a user to select images of interest from a large group of images, based on the interesting parts.
- the thumbnail (crop) 90 can be fed to a categorizer 44 for categorizing the image based on image content.
- the categorizer is not confused by including areas of the image which are less likely to be of visual interest.
- the image crop 90 is fed to a categorizer, which has been trained with training image crops 94 , generated in the same way, but which have each been annotated with a respective class (e.g., dogs, cats, flowers in the exemplary embodiment).
- the categorizer (which may incorporate a multiclass classifier or a set of binary classifiers, one for each object class) outputs a class 96 for the crop, based on a similarity of features of the image crop to those of the training images.
- the exemplary method is evaluated by comparing the results with those of four comparative methods for saliency detection:
- Method A Exemplary method without Graph-cut.
- Method B Exemplary method using Graph-cut, as described above.
- Method D (ITTI): A classic approach based on Itti theory (see L. Itti and C. Koch, “A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention,” Vision Research, 40(10-12): 1489-1506, 2000 (hereinafter Itti and Koch 2000)) that leverages a neuromorphic model simulating which elements are likely to attract visual attention.
- A Matlab implementation available at http://www.saliencytoolbox.net/ was employed.
- Method F (CRF): A learning method (Liu, et al.), based on a Conditional Random Field classifier.
- Part of the dataset described in Liu, et al. (MRSA Dataset) was used to train and test the exemplary method.
- the dataset was composed of 5000 images labeled by different users with no specific skills in graphic design.
- the dataset included images of a variety of different subjects. In general, a single object is present in the image with a broad range of backgrounds with fairly homogeneous color or texture.
- Ground truth data comprising manually annotated regions of interest generated by different users is also available.
- the users manually selected a rectangle (bounding box) containing the region of interest, which is typically represented by a full object or, in some cases by a subpart of the object (e.g., face).
- the 5000 images from the MRSA Dataset used in this example had bounding boxes annotated by nine users.
- the annotations are highly consistent with a very small variance over the nine bounding boxes.
- the bounding boxes represent approximately 35% of the total area of the image, but this varies over a fairly wide distribution.
- the distance of the center of mass of the object from the center of the image is, on average, 42 pixels. Again the annotated dataset showed a distribution.
- For each image in the dataset, a ground truth saliency map g(x,y) has been generated to evaluate the results based on user annotations (bounding boxes containing salient regions). In particular, since the annotations for MRSA are highly consistent, an average of the nine bounding boxes of the various users was used. Maps g(x,y) were generated, with the pixels of the rectangular salient region set to 1 and all other pixels set to 0.
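The ground truth map construction can be illustrated as follows. The text states only that an average of the nine bounding boxes was used, so averaging the four box coordinates (and rounding to the nearest pixel) is an assumption made for this sketch; boxes are taken as (x0, y0, x1, y1) with exclusive upper bounds, and the function names are hypothetical.

```python
def average_box(boxes):
    """Coordinate-wise average of the annotators' bounding boxes."""
    n = len(boxes)
    return tuple(int(round(sum(b[i] for b in boxes) / n)) for i in range(4))


def ground_truth_map(width, height, boxes):
    """Binary map g(x, y): 1 inside the averaged box, 0 elsewhere."""
    x0, y0, x1, y1 = average_box(boxes)
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0
             for x in range(width)]
            for y in range(height)]
```

The map is indexed as `g[y][x]`, i.e., as a list of rows.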
- Performance was evaluated using the following measures: BDE (see D. R. Martin, C. C. Fowlkes and J. Malik, “Learning to detect natural image boundaries using local brightness, color and texture cues,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 26(5), pp. 530-549 (May 2004)) was used for assessing the displacement of the bounding boxes ( FIG. 10 ), and Precision, Recall and F-measure were used to assess the quality of the saliency map.
- Precision (Pr), Recall (Re) and F-measure (F_α) can be defined according to Liu, et al., as follows:
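The formulas themselves are not reproduced in this section. As a hedged illustration, the sketch below uses the usual map-based definitions, which are assumed here rather than taken from the text: precision is the overlap between the ground truth map g and the binarized detection map s divided by the detected area, recall is the same overlap divided by the ground truth area, and the F-measure is a weighted harmonic mean with a weight `alpha` (0.5 is used as a hypothetical default). The function name is also hypothetical.

```python
def precision_recall_f(g, s, alpha=0.5):
    """Precision, recall and F-measure of a binary detection map s
    against a binary ground truth map g (both lists of rows of 0/1)."""
    overlap = sum(gv * sv for g_row, s_row in zip(g, s)
                  for gv, sv in zip(g_row, s_row))
    detected = sum(sum(row) for row in s)
    truth = sum(sum(row) for row in g)
    precision = overlap / detected if detected else 0.0
    recall = overlap / truth if truth else 0.0
    denom = alpha * precision + recall
    f_measure = (1 + alpha) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure
```

Sweeping the binarization threshold applied to the saliency map and recomputing these values gives a curve of the F-measure as a function of the threshold.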
- FIG. 8 shows the behavior of the F-measure as a function of the threshold on the map.
- the exemplary method (A and B) can be seen to give a better result than Methods C and E.
- FIG. 8 shows the improvement that the Graph-Cut stage (Method B) introduces in the proposed method, increasing the F-measure by almost 10% as compared with Method A (without Graph-Cut).
- For Methods D and F, thresholding was not applied because the results were taken directly from the Hou, et al. paper.
- FIG. 9 shows the thresholds selected for the Methods compared.
- FIG. 10 shows the Bounding Box displacement index. It represents the average distance, in pixels, of the center of the automatically detected Bounding Box from the center of the ground truth Bounding Box. The smaller this value the more accurate is the bounding box detected. As can be seen, the exemplary method using Graph-Cut (Method B) gave the best results.
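The displacement index described above (the average distance, in pixels, between the centers of the detected and ground truth bounding boxes) can be computed as in the following sketch. Boxes are assumed to be (x0, y0, x1, y1) tuples and the function names are hypothetical.

```python
import math


def box_center(box):
    """Geometric center of a box (x0, y0, x1, y1)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def displacement(detected, truth):
    """Euclidean distance, in pixels, between the two box centers."""
    (dx, dy), (tx, ty) = box_center(detected), box_center(truth)
    return math.hypot(dx - tx, dy - ty)


def mean_displacement(pairs):
    """Average displacement over (detected, truth) box pairs."""
    return sum(displacement(d, t) for d, t in pairs) / len(pairs)
```

Note that this index measures only how well the box is centered; it does not penalize a box of the wrong size, which is why it is reported alongside the precision, recall and F-measure values.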
Abstract
Description
- The following copending applications, the disclosures of which are incorporated herein in their entireties by reference, are mentioned:
- U.S. patent application Ser. No. 12/250,248, filed Oct. 13, 2008, entitled IMAGE SUMMARIZATION BY A LEARNING APPROACH, by Luca Marchesotti, et al.
- U.S. application Ser. No. 12/361,235, filed Feb. 5, 2009, entitled MODELING IMAGES AS SETS OF WEIGHTED FEATURES, by Teofilo E. de Campos, et al.
- U.S. application Ser. No. 12/033,434, filed Feb. 19, 2008, entitled CONTEXT DEPENDENT INTELLIGENT THUMBNAIL IMAGES, by Gabriela Csurka.
- U.S. application Ser. No. 12/049,520 filed Mar. 17, 2008, entitled AUTOMATIC GENERATION OF A PHOTO GUIDE, by Luca Marchesotti, et al.
- U.S. patent application Ser. No. 12/123,511, filed May 20, 2008, entitled IMPROVING IMAGE VISUALIZATION THROUGH CONTENT-BASED INSETS, by Luca Marchesotti, et al.
- U.S. application Ser. No. 12/123,586, filed May 20, 2008, entitled METHOD FOR AUTOMATIC ENHANCEMENT OF IMAGES CONTAINING SNOW, by Luca Marchesotti.
- U.S. application Ser. No. 12/175,857, filed Jul. 18, 2008, entitled SYSTEM AND METHOD FOR AUTOMATIC ENHANCEMENT OF SEASCAPE IMAGES, by Luca Marchesotti.
- U.S. application Ser. No. 12/191,579, filed on Aug. 14, 2008, entitled SYSTEM AND METHOD FOR OBJECT CLASS LOCALIZATION AND SEMANTIC CLASS BASED IMAGE SEGMENTATION, by Gabriela Csurka, et al.
- The exemplary embodiment relates to digital image processing. It finds particular application in connection with detection of salient regions and image thumbnailing in natural images based on visual similarity.
- Image thumbnailing consists of the identification of one or more regions of interest in an input image: for example, salient parts are aggregated in foreground regions, whereas redundant and non informative pixels become part of the background. The range of applications where thumbnailing can be applied is broad, including traditional problems like image compression, image visualizations, adaptive image display in small devices, but also more recent applications like variable data printing, assisted content creation, automatic blogging, and the like.
- Image thumbnailing is strongly related to the detection of salient regions. Saliency detection is seen as a simulation or modeling of the human visual attention mechanism. In the field of image processing, it is understood that some parts of an image receive more attention from human observers than others. Saliency refers to the “importance” or “attractiveness” of the visual information in an image. A salient region may describe any relevant part of an image that is a main focus of a typical viewer's attention. Visual saliency models have been used for feature detection and to estimate regions of interest. Many of these methods are based on biological vision models, which aim to estimate which parts of images attract visual attention. Implementations of these methods in computer systems generally fall into one of two main categories: those that give a number of relevant punctual positions, known as interest (or key-point) detectors, and those that give a more continuous map of relevance, such as saliency maps. Saliency maps can provide richer information about the relevance of features throughout an image. While interest points are generally simplistic corner (Harris) or blob (Laplace) detectors, saliency maps can carry higher level information. Such methods have been designed to model visual attention and have been evaluated by their congruence with fixation data obtained from experiments with eye gaze trackers.
- Recently, saliency maps have been used for object recognition, image categorization, automated image cropping, adaptive image display, and the like. For example, saliency maps have been used to control the sampling density for feature extraction. Alternatively, saliency maps can be used as foreground detection methods to provide regions of interest (ROI) for classification. It has been shown that extracting image features in the locality of ROIs can give better results than sampling features uniformly through the image. A disadvantage is that such methods may miss important context information from the background.
- A distinction can be made between a type of saliency detection which aims to detect the most interesting object in an image, irrespective of context (context independent saliency detection), and a concept type of saliency detection in which a specific type of object is searched for in the image.
- The typical context independent case is often solved by bottom-up methods which seek to detect the most interesting part of the image, without targeting any specific object or concept. Concept type saliency detection is often referred to as top-down saliency detection.
- Visual saliency and attention have been modeled with three categories of approaches inspired by the human visual system. Bottom-up, stimulus-driven methods are based on intrinsic low-level features such as contrast, color, orientation, and the like. Top-down methods take into account higher order information (context, structure) about the image in the analysis. Hybrid approaches aim to leverage benefits of the other two categories.
- Bottom-up strategies are by far the most common and they are advantageous if the low level features represent the salient parts of the image well (e.g., isolated objects, uncluttered background). Top-down methods help when other factors dominate (e.g., the presence of a human face), but they are lacking in generality. Hybrid approaches, in general, are designed in a two stage fashion where top-down strategies filter out noisy regions in bottom-up saliency maps.
- One example of a bottom-up method is described in L. Itti, C. Koch, E. Niebur, et al., “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254-1259 (1998). In this approach, multi-scale topographic features characterizing color, intensity and texture are extracted and combined with “center-surround” operations to obtain saliency maps. Another method is described in Xiaodi Hou and Liqing Zhang, “Saliency Detection: A Spectral Residual Approach,” IEEE Conf. on Computer Vision & Pattern Recognition (2007). This method is based on the spectral residual of an image in the spectral domain and locates salient regions by taking into account the “noise” in the logarithmic magnitude frequency curve of the image.
- Gao, et al. reformulated the “center-surround” hypothesis in a decision theoretic framework (see D. Gao and N. Vasconcelos, “Bottom-up saliency is a discriminant process,” Proceedings of IEEE Int'l Conf. on Computer Vision (ICCV), Rio de Janeiro, Brazil (2007); D. Gao, V. Mahadevan and N. Vasconcelos, “The discriminant center-surround hypothesis for bottom-up saliency,” Proc. of Neural Information Processing Systems (NIPS), Vancouver, Canada (2007)). Saliency detection is interpreted as a binary classification problem where saliency is identified with features that discriminate “center” and “surround” regions well.
- Top-down visual attention processes are considered to be driven by voluntary control, and related to the observer's goal when analyzing a scene. These methods take into account higher order information about the image such as context, structure, etc. Object detection can be seen as a particular case of top-down saliency detection, where the predefined task is given by the object class to be detected (See, Jiebo Luo, “Subject content-based intelligent cropping of digital photos,” in IEEE Intl. Conf. on Multimedia and Expo (2007)).
- An additional example of a top-down approach is where the system first classifies the image in terms of landscape, close-up, faces, etc. and then applies the most appropriate thumbnailing/cropping strategy (See, G. Ciocca, C. Cusano, F. Gasparini, and R. Schettini, “Self-adaptive image cropping for small display,” in IEEE Intl. Conf. on Consumer Electronics (2007)).
- Recent hybrid approaches combine bottom-up with classic top-down object detection strategies. One approach blends the Viola-Jones face detector (Jones, M. J., Rehg, J. M., “Statistical Color Models with Application to Skin Detection,” IJCV(46), No. 1, pp. 81-96 (January 2002)) with the Itti classic approach (See, L. Itti and C. Koch, “Computational Modeling of Visual Attention,” Nature Reviews Neuroscience, 2(3): 194-203 (2001), hereinafter “Itti and Koch 2001”). In a similar fashion, Huang, et al. combine their saliency map based on color, shape, and texture with face and text detectors and use a branch and bound algorithm to find optimal solutions efficiently (See, Chen-Hsiu Huang, Chih-Hao Shen, Chun-Hsiang Huang and Ja-Ling Wu, “A MPEG-7 Based Content-aware Album System for Consumer Photographs,” Bulletin of the College of Engineering, NTU, No. 90, pp. 3-24 (February 2004)).
- Recent approaches suggest that saliency can be learned, either using global features or sufficient manually labelled examples (See, T. Liu, J. Sun, N. Zheng, X. Tang and H. Shum, “Learning to Detect A Salient Object,” CVPR (2007), hereinafter “Liu, et al.”), or directly from human eye movement data through a simple parameter-free approach.
- In contrast, Z. Wang, B. Li, “A Two-Stage Approach to Saliency Detection in Images,” In ICASSP 2008 IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) (March/April 2008) combines spectral residual for bottom-up analysis with features capturing similarity and continuity based on Gestalt principles.
- Above-mentioned U.S. patent application Ser. No. 12/250,248 detects regions of interest (ROIs) by a learning approach. The method uses the information related to the position and the size of the manually selected ROIs. Above-mentioned U.S. application Ser. No. 12/033,434 also proposes a method for detecting salient parts of an image, but the approach is heavily dependent on the semantic context in which either the image or its thumbnail is used. A visual concept is derived from each image and the ROI that corresponds to that visual concept is sought. Therefore, an image can lead to completely different thumbnails, depending on the context.
- The following references, the disclosures of which are incorporated herein in their entireties by reference, are mentioned:
- U.S. Pub. No. 2008/0317358, published Dec. 25, 2008, entitled CLASS-BASED IMAGE ENHANCEMENT SYSTEM, by Marco Bressan, et al., discloses a method for image enhancement, which includes assigning a semantic class to a digital image based on image content, and applying an aesthetic enhancement to the image based on an image quality of the image and the assigned semantic class.
- U.S. Pub. No. 2007/0005356, entitled GENERIC VISUAL CATEGORIZATION METHOD AND SYSTEM; U.S. Pub. No. 2007/0258648, entitled GENERIC VISUAL CLASSIFICATION WITH GRADIENT COMPONENTS-BASED DIMENSIONALITY ENHANCEMENT; and U.S. Pub. No. 2008/0069456 entitled BAGS OF VISUAL CONTEXT-DEPENDENT WORDS FOR GENERIC VISUAL CATEGORIZATION, all by Florent Perronnin; and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints”, ECCV Workshop on Statistical Learning in Computer Vision, 2004, disclose systems and methods for categorizing images based on content.
- The following relate to various methods for saliency detection: U.S. Pub. No. 2008/0304740, published Dec. 11, 2008, entitled Salient Object Detection, by Jian Sun, et al.; U.S. Pub. No. 2008/0304708, published Dec. 11, 2008, entitled DEVICE AND METHOD FOR CREATING A SALIENCY MAP OF AN IMAGE, by Olivier Le Meur, et al.; U.S. Pub. No. 2008/0304742, published Dec. 11, 2008, entitled COMBINING MULTIPLE CUES IN A VISUAL OBJECT DETECTOR, by Jonathan H. Connell; U.S. Pub. No. 2006/0093184, published May 4, 2006, entitled IMAGE PROCESSING APPARATUS, by Motofumi Fukui, et al.; and U.S. Pat. No. 7,400,761, issued Jul. 15, 2008, entitled CONTRAST-BASED IMAGE ATTENTION ANALYSIS FRAMEWORK, by Ma, et al.
- In accordance with one aspect of the exemplary embodiment, a method for detecting a region of interest in an image includes, for each image in a dataset of images for which a region of interest has been respectively established, storing a respective dataset image representation based on features extracted from the image. For an original image for which a region of interest is to be detected, the method includes generating an original image representation for the original image based on features extracted from the image, identifying a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, training a classifier with information extracted from the established regions of interest of the subset of similar images and, with the trained classifier, identifying a region of interest in the original image.
- In another aspect, an apparatus for detecting a region of interest in an image includes memory which stores the dataset image representations, and instructions for performing the above-described method. A processor with access to the instructions and dataset image representations executes the instructions. In another aspect, an apparatus for detecting a region of interest in an image includes memory which, for a dataset of images for which a respective region of interest has been established, stores a set of dataset image representations, each dataset image representation being derived from features extracted from a respective one of the images in the dataset. Memory stores instructions which, for an original image for which a region of interest is to be detected, generate an original image representation for the original image based on features extracted from the original image, identify a subset of similar images in the dataset, based on a measure of similarity between the original image representation and each dataset image representation, and train a classifier to identify a region of interest in the original image, the classifier being trained with positive and negative examples, each of the positive examples comprising a high level representation based on features extracted from the established region of interest of a respective one of the subset of similar images and each of the negative examples comprising a high level representation based on features extracted from outside the established region of interest of a respective one of the subset of similar images.
- In another aspect, a method for detecting a region of interest in an image includes storing a set of image representations, each image representation being based on features extracted from patches of a dataset image, where for each dataset image, the patch features are identified as salient or non-salient based on whether or not the patch is within a manually identified region of interest. For an original image for which a region of interest is to be detected, the method includes generating an original image representation for the original image based on features extracted from patches of the image, computing a distance measure between the original image representation and image representations in the set of image representations to identify a subset of similar image representations from the set of image representations, and training a classifier with positive and negative examples extracted from the images corresponding to the subset of similar image representations, the positive examples each being based on the salient patch features of a respective image and the negative examples being based on non-salient patch features of the respective image. With the trained classifier, a region of interest in the original image is identified based on the patch features of the original image.
- FIG. 1 is a functional block diagram of an apparatus for identifying a region of interest in an image in accordance with one aspect of the exemplary method;
- FIG. 2 is a flow chart illustrating a method for identifying a region of interest in an image in accordance with one aspect of the exemplary method which may be performed with the apparatus of FIG. 1;
- FIG. 3 illustrates the images processed during steps of the method;
- FIG. 4 illustrates substeps of part of the method of FIG. 2;
- FIG. 5 illustrates substeps of part of the method of FIG. 2;
- FIG. 6 illustrates patches and windows used in generating a saliency map;
- FIG. 7 illustrates inputting a salient region into a categorizer which generates a category for the image;
- FIG. 8 illustrates F-measure values for various saliency detection methods as a function of threshold size;
- FIG. 9 illustrates Precision, Recall, and F-measure data for an Example comparing the present method (methods A and B, without and with Graph-cut) to comparative methods for saliency detection (methods C, D, E, and F); and
- FIG. 10 illustrates the displacement of a bounding box around the salient region from a manually assigned bounding box for the exemplary method (method B) and comparative methods C, D, E, and F.
- The exemplary embodiment relates to an apparatus and computer-implemented method and computer program product for detecting saliency in an image, such as a natural image, based on similarity of the original image with images for which visually salient regions of pixels are pre-segmented. The method assumes that images sharing similar visual appearance (as determined by comparing computer-generated content-based representations) share the same salient regions. In the exemplary embodiment, saliency detection is approached as a binary classification problem where pre-segmented salient/non-salient pixels are available to train and test an algorithm. In one embodiment, the method allows both context dependent and context independent saliency detection within a single framework.
- With reference to FIG. 1, an exemplary apparatus for salient region detection is illustrated. The apparatus may be embodied in an electronic processing device, such as the illustrated computer 10. In other embodiments, the electronic processing device 10 may include one or more specific or general purpose computing devices, such as a network server, Internet-based server, desktop computer, laptop computer, personal data assistant (PDA), cellular telephone, or the like. The apparatus 10 includes an input component 12, an output component 14, a processor 16, such as a CPU, and memory 18. The computer 10 is configured to implement a salient region detector 20, hosted by the computer 10, for identifying a salient region or regions of an original input image. The salient region detector 20 may be in the form of software, hardware, or a combination thereof. The exemplary salient region detector 20 is stored in memory 18 (e.g., non-volatile computer memory) and comprises instructions for performing the exemplary method described below with reference to FIG. 2. These instructions are executed by the processor 16. A database 22 of previously annotated images (and/or information extracted therefrom) is stored in memory 18 or a separate memory. Components of the computer 10 may be connected for communication with each other by a data/control bus 24. Input and output components may be combined or separate components and may include, for example, data input ports, modems, network connections, and the like.
- The computer 10 is configured for receiving an original image 30, e.g., via input component 12, and storing the image 30 in memory, such as a volatile portion of computer memory 18, while being processed by the salient region detector 20. The image 30 is transformed by the salient region detector 20, e.g., by cropping or otherwise identifying a salient region or regions 32 of the image. The computer 10 is also configured for storing and/or outputting the salient region 32 generated for the image 30 by the salient region detector 20 and for outputting a transformed image 34 in which the salient region is identified or which comprises a crop of the original image based on the salient region 32, e.g., by the output component 14. In one embodiment, the salient region image data may be cropped from the original image data. A classifier 36, incorporated in the salient region detector or in communication with it, is fed by the salient region detector with a subset of the database images (or information extracted therefrom) on which the classifier is trained to identify a salient region in an original image.
- The computer 10 may include or be in data communication with a display 40, such as an LCD screen, or other output device for displaying the salient region 32. Alternatively or additionally, the salient region 32 may be further processed, e.g., by incorporation into a document 42, which is output by the output component 14, or output to a categorizer 44.
- The input image 30 generally includes image data for an array of pixels forming the image. The image data may include colorant values, such as grayscale values, for each of a set of color separations, such as L*a*b* or RGB, or be expressed in another color space in which different colors can be represented. In general, “grayscale” refers to the optical density value of any single image data channel, however expressed (e.g., L*a*b*, RGB, YCbCr, etc.). The images may be photographs, video images, graphical images (such as freeform drawings, plans, etc.), text images, or combined images which include photographs along with text, and/or graphics, or the like. The images may be received in PDF, JPEG, GIF, JBIG, BMP, TIFF or other common file format used for images and which may optionally be converted to another suitable format prior to processing. Input images may be stored in a virtual portion of memory 18 during processing.
- The term “color” as used herein is intended to broadly encompass any characteristic or combination of characteristics of the image pixels to be employed in the extraction of features. For example, the “color” may be characterized by one, two, or all three of the red, green, and blue pixel coordinates in an RGB color space representation, or by one, two, or all three of the L, a, and b pixel coordinates in an Lab color space representation, or by one or both of the x and y coordinates of a CIE chromaticity representation, or the like. Additionally or alternatively, the color may incorporate pixel characteristics such as intensity, hue, brightness, etc. Moreover, while the method is described herein with illustrative reference to two-dimensional images such as photographs or video frames, it is to be appreciated that these techniques are readily applied to three-dimensional images as well. The term “pixel” as used herein is intended to denote “picture element” and encompasses image elements of two-dimensional images or of three-dimensional images (which are sometimes also called voxels to emphasize the volumetric nature of the pixels for three-dimensional images).
- Image 30 can be input from any suitable image source 50, such as a workstation, database, scanner, or memory storage device, such as a disk, camera memory, memory stick, or the like. The image source 50 may be temporarily or permanently communicatively linked to the computer 10 via a wired or wireless link 52, such as a cable, telephone line, local area network or wide area network, such as the Internet, through a suitable input/output (I/O) connection 12, such as a modem, USB port, or the like. In the case of a computer 10, processor 16 may be the computer's central processing unit (CPU). However, it is to be appreciated that the exemplary method may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like. In general, any processor capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIG. 2 can be used to implement the method for generating an image representation.
- Memory 18 may be in the form of separate memories or combined and may be in the form of any type of tangible computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, holographic memory, or suitable combination thereof.
- With reference to FIG. 2, a method for detecting a salient region of an original image is illustrated. FIG. 3 illustrates graphically the processing of an exemplary image 30 during the method.
- The method begins at S100.
- At S102, a large dataset of pre-segmented images 22 is stored. These are images for which the pixels have been identified as either salient or non-salient, based on human interest. The dataset ideally includes a wide variety of images, including images which are similar in content to the image 30 for which a region of interest is to be detected. For example, the dataset may include at least 100, e.g., at least 1000 images, such as at least about 10,000 images, and can be up to 100,000 or more, each dataset image having an established region of interest. In one embodiment, for at least some of the images in the dataset, the pre-segmented region(s) of each image can further be associated with a semantic label referring to the content of the region. For example, a set of label types may be defined, such as animals, faces, people, buildings, automobiles, landscapes, flowers, other, and each image manually assigned one or more of these labels, based on its region of interest.
- At S104, image representations are generated for each of the images in the dataset. The representations are generally high level representations which are derived from low level features extracted from the image. In one embodiment, the high level representation of each pre-segmented image is based on fusing (e.g., a sum or concatenation) of positive (+ve) and negative (−ve) high level representations, the positive one generated for the salient region (region of interest) of the image, the negative one for the non-salient region (i.e., everywhere except the region of interest). The two high level representations of each of the pre-segmented images may be derived from patch level representations, e.g., Fisher vectors from salient region patches for generating the +ve high level representation and Fisher vectors from patches outside the salient region for the −ve high level representation. As will be appreciated, S104 may be performed prior to input of image 30 and the computed high level +ve and −ve representations stored in memory 18. At this point, storing of the actual images in the dataset 22 may no longer be necessary. Further details of this step are illustrated in FIG. 4 and are described below.
- At S106, an image 30 for which a visually salient region (which may be referred to herein as a region of interest (ROI)) is to be identified is input and stored in memory.
- At S108, a representation of the input image is generated (e.g., by the salient region detector 20), based on low level features extracted from patches of the image in a similar manner to that for the pre-segmented images in the dataset, except that here, there are no pre-segmented salient regions. Further details of this step are illustrated in FIG. 5 and are described below.
- At S110, a subset K of images in the dataset of pre-segmented images is identified, based on similarity of their high level representations to that of the original image. In particular, the K-nearest neighbor images may be retrieved from the annotated dataset 22 by the salient region detector 20 using a simple distance measure, such as the L1 norm distance between Fisher signatures of each dataset image (e.g., as a sum of the high level +ve and −ve representations) and the high level representation of the input image (e.g., as a sum of all high level patch representations), e.g., as generated using a global visual vocabulary.
- Where images have been manually annotated with labels, prior to identifying the subset of K images, a user may be prompted to select one of the label types, or this information may be fed to the salient region detector 20 when the image 30 is input. In this embodiment, the subset of K nearest neighbor images is identified in substantially the same way, but in this case, from among those images having pre-segmented regions labeled with the selected semantic label (assuming there are sufficient images in the dataset with pre-segmented regions annotated with the selected label).
- At S112, a binary classifier 36 is trained using, as positive examples, the representations of the salient regions of the retrieved K-nearest neighbor images (designated by a “+” in FIG. 3), which may all be concatenated or summed to form a single vector. As negative examples, representations of the non-salient background regions are used (designated by a “−” in FIG. 3), which, again, may all be concatenated or summed to form a single vector. The same high level representations can be used by any binary classifier, or alternatively other local patch representations can be considered in another embodiment.
- In the case where it is desired that a context-dependent salient region of the original image be identified, then when there are multiple salient regions in a nearest neighbor image, only the one(s) labeled with the selected label are considered as salient regions and used in generating the +ve representation. The rest of the image is considered non-salient.
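By way of non-limiting illustration, the training of a binary classifier on positive and negative examples, and its use to score a further representation, may be sketched in Python as follows. Logistic regression fitted by gradient descent is used here only as a stand-in, since any binary classifier may be employed; the function names, learning rate, number of epochs, and the toy two-dimensional vectors (which stand in for the +ve and −ve high level representations) are assumptions made for this sketch.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_classifier(positives, negatives, epochs=500, lr=0.1):
    """Fit weights and a bias by batch gradient descent on the log loss.

    positives / negatives: lists of equal-length feature vectors.
    Returns (weights, bias).
    """
    examples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    dim = len(examples[0][0])
    n = len(examples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(score) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def saliency_probability(model, x):
    """Probability that the representation x belongs to the salient class."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy vectors standing in for representations taken from the
# K nearest neighbor images (hypothetical values).
positives = [[1.0, 0.1], [0.9, 0.2]]
negatives = [[0.1, 1.0], [0.2, 0.8]]
model = train_binary_classifier(positives, negatives)
```

With the toy data above, `saliency_probability(model, [1.0, 0.0])` is greater than 0.5 and `saliency_probability(model, [0.0, 1.0])` is less than 0.5.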
- At S114, the trained classifier 36 is used to output a saliency probability for each patch of the original image extracted at S108.
- At S116, based on the saliency probabilities, a region of interest of the original image is identified by the salient region detector 20. This step may include generating a saliency map 56 (FIG. 3).
- At S118, the saliency map may be refined by the salient region detector 20, e.g., with graph-cut segmentation to refine the salient region, as illustrated at 58 in FIG. 3.
- At S120, the transformed image, e.g., a crop of the image based on the salient region or an image in which the salient region is identified by the salient region detector 20, e.g., by annotations such as HTML tags, is output.
- At S122, further processing may be performed on the transformed image, e.g., the image crop based on the salient region may be displayed or incorporated into a document, e.g., placed in a predetermined placeholder location in a text document or sent to a categorizer 44 for assigning an object class to the image 30.
- The method ends at S124.
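By way of illustration only, the retrieval of the K nearest neighbor images at S110 using an L1 norm distance may be sketched as follows. The function names and the toy three-dimensional signatures are assumptions made for this sketch; in practice the vectors would be the image signatures computed at S104 and S108.

```python
def l1_distance(u, v):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def k_nearest(query, dataset_signatures, k):
    """Return the indices of the k dataset signatures closest to the query."""
    ranked = sorted(
        range(len(dataset_signatures)),
        key=lambda i: l1_distance(query, dataset_signatures[i]),
    )
    return ranked[:k]


# Hypothetical signatures for four dataset images and one input image.
signatures = [
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.3],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.0, 0.0]
neighbours = k_nearest(query, signatures, k=2)  # indices 1 and 2
```

A linear scan is adequate for a sketch; for a dataset of the size contemplated at S102, an indexing structure could be substituted without changing the result.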
- There are several advantages to the exemplary method and apparatus. Unlike prior saliency detection methods which rely solely on the content of the image to generate a saliency map, the present apparatus and method take advantage of a process which allows image saliency to be learned using (previously annotated) visually similar example images. Additionally, segmentation strategies can be advantageously employed for saliency detection. Further, the method is generic in the sense that it does not need to be tied to any specific category of images (e.g., faces), but allows a more broad concept of visual similarity, while at the same time, being readily adaptable to consideration of context. Finally, while the exemplary method has been described with particular reference to photographic (natural) images, the method is applicable to other types of images, such as medical or text document images, assuming that appropriate annotated data is available.
- Further details of the apparatus and method will now be described.
- Referring once more to FIG. 1, a variety of methods exist for identifying salient regions 60 for the images 62 in the dataset 22. In one embodiment, one or more human observers look at each image, e.g., on a computer screen, and identify a salient region (a region which the observer considers to be the most interesting). For example, the user may generate a bounding box which encompasses the salient region. Alternatively, the observer may identify a region or regions of interest by moving the cursor around the region(s) to generate a bounded region, which may then be processed, for example, by automatically creating a bounding box which encompasses the bounded region. In other embodiments, eye gaze data may be employed to identify a region of interest. In this embodiment, an eye gaze tracking device tracks eye movements of the observer while viewing the image for a short period of time. The tracking data is superimposed on the image to identify the region of interest. The identified regions/observations of several users may be combined to generate an overall region of interest for the image. The image 62 can then be segmented into a salient region 64 and a non-salient region 66, based on the identified region of interest. The image may then be annotated with the segmentation information, e.g., by applying an HTML tag or by storing the segmentation in a separate file. Furthermore, the salient region may be associated with a semantic concept (by annotating the salient region or entire image with a label). Thus, in the exemplary embodiment, the existence of a set D of images {I1, . . . , Id, . . . , ID} representing a wide variety of subjects is assumed for building the dataset. It can also be assumed that each image Id has been manually annotated by specifying one (or more) rectangular Region of Interest (ROI) per image (e.g., rd(x, y, w, h), centered at (x, y), with width and height dimensions w and h) or with a more general map containing the annotated salient region(s) and optionally with an associated semantic label.
- As shown in FIG. 4, S104 may include the following substeps for each image 62 in the dataset 22:
- At S104 a, patches 70A, B, C, etc., 72A, B, C, D, and 74 are extracted from the image, e.g., at multiple scales. This is illustrated for a portion of the image 62 in FIG. 6, showing patches (unbroken lines) at three scales by way of example, where the arrows point roughly to the centers of the respective patches.
- At S104 b, for each patch, low level features are extracted.
- At S104 c, for each patch, a representation of the patch (e.g., a Fisher vector) may be generated, based on the low level features.
- At S104 d, patches are designated as salient or non-salient, depending on whether they are within the pre-segmented region or not. Various methods may be used to determine whether a patch is to be considered to be “within” the salient region. In one embodiment, a threshold degree of overlap may be sufficient for a patch to be considered within the salient region. In the exemplary embodiment, the overlap is computed relative to the area of the patch, e.g., if 50% or more of the patch is within the salient region, then it is accepted as being within it. If the region of interest is too small, relative to the size of the patch (e.g., the ROI is less than 70% of the patch area), then the patch will not be considered. In other embodiments, a patch is considered to be within the salient region if its geometric center lies within the salient region. In yet another embodiment, the patch is considered to be within the salient region if it is entirely encompassed by or entirely encompasses the salient region.
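By way of illustration only, the overlap-based designation of S104 d may be sketched as follows. The corner-based box format (left, top, right, bottom), the function names, and the use of `None` for a patch that is not considered are assumptions made for this sketch; the 50% and 70% thresholds are the example values given above.

```python
def area(box):
    """Area of an axis-aligned box given as (left, top, right, bottom)."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def intersection_area(a, b):
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    left = max(a[0], b[0])
    top = max(a[1], b[1])
    right = min(a[2], b[2])
    bottom = min(a[3], b[3])
    return area((left, top, right, bottom))


def designate_patch(patch, roi, min_overlap=0.5, min_roi_ratio=0.7):
    """Return "salient", "non-salient", or None if the patch is skipped."""
    patch_area = area(patch)
    if patch_area == 0:
        return None
    # ROI too small relative to the patch: the patch is not considered.
    if area(roi) < min_roi_ratio * patch_area:
        return None
    # Overlap is measured relative to the area of the patch.
    overlap = intersection_area(patch, roi) / patch_area
    return "salient" if overlap >= min_overlap else "non-salient"


roi = (10, 10, 50, 50)
inside = designate_patch((20, 20, 40, 40), roi)      # fully inside the ROI
corner = designate_patch((40, 40, 60, 60), roi)      # 25% of patch overlaps
too_large = designate_patch((0, 0, 100, 100), roi)   # ROI < 70% of patch
```

The alternative embodiments (geometric center inside the region, or full containment) would replace only the final overlap test.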
- At S104 e, a high level +ve representation of the salient region of the image is extracted, based on the patch representations (e.g., Fisher vectors, or simply, low level features) of all the salient patches, and a high level −ve representation of the image is extracted, based on the patch representations (e.g., Fisher vectors, or simply, low level features) of all the non-salient patches. As noted above, salient patches may be considered to be patches which are at least partially overlapping the salient region 60. These +ve and −ve representations are referred to herein as Fisher FG vector and Fisher BG vector, respectively, even though they do not necessarily correspond to what would be considered as the foreground and background regions of an image.
- At S104 f, a high level representation of the image is generated, e.g., as a feature vector, e.g., a Fisher vector-based Image Signature, for example, by concatenation or other function of the +ve and −ve high level representations (Fisher FG vector and Fisher BG vector).
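By way of illustration only, substeps S104 e and S104 f may be sketched as follows. Summation is assumed here as the function that aggregates patch representations into the +ve and −ve vectors, and concatenation is used for the image signature; the function names and the toy two-dimensional patch vectors are assumptions made for this sketch.

```python
def sum_vectors(vectors, dim):
    """Element-wise sum of a list of vectors (all zeros if the list is empty)."""
    total = [0.0] * dim
    for v in vectors:
        for i, value in enumerate(v):
            total[i] += value
    return total


def image_signature(patch_vectors, salient_flags):
    """Build the +ve (FG) vector, the -ve (BG) vector and their concatenation.

    patch_vectors: one representation per patch.
    salient_flags: True where the patch was designated salient.
    """
    dim = len(patch_vectors[0])
    fg = sum_vectors(
        [v for v, s in zip(patch_vectors, salient_flags) if s], dim)
    bg = sum_vectors(
        [v for v, s in zip(patch_vectors, salient_flags) if not s], dim)
    return fg, bg, fg + bg  # list "+" concatenates FG and BG


# Hypothetical patch representations: two salient patches, one non-salient.
patch_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
salient_flags = [True, True, False]
fg, bg, signature = image_signature(patch_vectors, salient_flags)
```

For the toy input, `fg` is `[1.5, 0.5]`, `bg` is `[0.0, 2.0]`, and `signature` is `[1.5, 0.5, 0.0, 2.0]`.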
- A similar procedure may be followed for the original image 30, as shown in FIG. 5:
- At S108 a, patches are extracted from the image, e.g., at multiple scales.
- At S108 b, for each patch, low level features are extracted, e.g., as a features vector.
- At S108 c, for each patch, a representation (e.g., Fisher vector) may be generated, based on the extracted low level features.
- At S108 d, a high level representation of the image is extracted, based on the patch representations or low level features. In the exemplary embodiment, the high level representation is a vector (e.g., a Fisher vector-based Image Signature) formed by concatenation or other function of the patch level Fisher vectors.
- While the exemplary embodiment is described herein with respect to Fisher vectors, various methods exist for generation of a high level representation of an image, which may be implemented as an alternative to the high level representation in the exemplary method, e.g., a Bag-of-Visual words (BOV) representation of the image as disclosed, for example, in above-mentioned U.S. Pub. Nos. 2007/0005356; 2007/0258648; 2008/0069456; the disclosures of which are incorporated herein by reference, and G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, “Visual Categorization with Bags of Keypoints,” ECCV Workshop on Statistical Learning in Computer Vision (2004); also the method of Y. Liu, D. S. Zhang, G. Lu, W.-Y. Ma, “A survey of content-based image retrieval with high-level semantics,” in Pattern Recognition, 40 (1) (2007); as well as that of F. Perronnin and C. Dance, “Fisher kernel on visual vocabularies for image categorization,” In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Minneapolis, Minn., USA. (June 2007). This last reference and U.S. Pub. No. 2007/0258648 are collectively referred to herein as “Perronnin and Dance” and describe a Fisher kernel (FK) representation based on Fisher vectors, which is similar in many respects to the Fisher Signature described herein.
- Further details of the steps S104 and S108 now follow.
- In the exemplary embodiment, multiple patches are extracted from the image (original or dataset image) at various scales (S104 a, S108 a). For each patch, low level features are extracted (S104 b, S108 b). The low level features which are extracted from the patches are typically quantitative values that summarize or characterize aspects of the respective patch, such as spatial frequency content, an average intensity, color characteristics (in the case of color images), gradient values, and/or other characteristic values. In some embodiments, at least about fifty low level features are extracted from each patch; however, the number of features that can be extracted is not limited to any particular number or type of features; for example, 1000 or 1 million low level features could be extracted, depending on computational capabilities. In the exemplary embodiment, the low level features include local (e.g., pixel) color statistics and texture. For color statistics, local RGB statistics (e.g., mean and standard deviation) may be computed. For texture, gradient orientations (representing a change in color) may be computed for each patch as a histogram (SIFT-like features). In the exemplary embodiment, two (or more) types of low level features, such as color and texture, are separately extracted and the high level representation of the patch or image is based on a combination (e.g., a sum or a concatenation) of two Fisher Vectors, one for each feature type.
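- By way of illustration only, the extraction of multi-scale patches and of local RGB statistics may be sketched as follows. The regular sampling grid, the patch sizes (16 and 32 pixels), the half-patch stride, and the function names `extract_patches` and `color_statistics` are assumptions made for this sketch and are not specified in the description.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Yield (row, col, patch) for square patches sampled on a regular grid."""
    height, width = image.shape[:2]
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            yield r, c, image[r:r + size, c:c + size]

def color_statistics(patch):
    """Concatenate the per-channel mean and standard deviation of a patch."""
    pixels = patch.reshape(-1, patch.shape[-1]).astype(np.float64)
    return np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])

# Hypothetical 64x64 RGB image with a constant red channel.
image = np.zeros((64, 64, 3), dtype=np.uint8)
image[:, :, 0] = 200

# Two scales (assumed): 16- and 32-pixel patches with a half-patch stride.
features = [color_statistics(patch)
            for size in (16, 32)
            for _, _, patch in extract_patches(image, size, size // 2)]
```

Each patch yields a six-dimensional color descriptor here; a texture descriptor (e.g., a gradient orientation histogram) would be computed separately and given its own visual vocabulary.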
- In other embodiments, Scale Invariant Feature Transform (SIFT) descriptors (as described by Lowe, in “Object Recognition From Local Scale-Invariant Features,” ICCV (International Conference on Computer Vision), 1999) are computed on each patch. SIFT descriptors are multi-image representations of an image neighborhood, such as Gaussian derivatives computed at, for example, eight orientation planes over a four-by-four grid of spatial locations, giving a 128-dimensional vector (that is, 128 features per features vector in these embodiments). Other descriptors or feature extraction algorithms may be employed to extract features from the patches. Examples of some other suitable descriptors are set forth by K. Mikolajczyk and C. Schmid, in “A Performance Evaluation Of Local Descriptors,” Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Madison, Wis., USA, June 2003, which is incorporated in its entirety by reference.
- A feature vector can be employed to characterize each patch. The feature vector can be a simple concatenation of the low level features. In the exemplary embodiment, the extracted low level features can be used to generate a high level representation of the patch (e.g., a Fisher vector) (S104 c, S108 c). In this embodiment, a visual vocabulary is built for each feature type using Gaussian Mixture Models. Modeling the visual vocabulary in the feature space with a GMM may be performed according to the method described in F. Perronnin, C. Dance, G. Csurka and M. Bressan, “Adapted Vocabularies for Generic Visual Categorization,” In ECCV (2006).
- Each patch is then characterized (at S104 c, S108 c) with a gradient vector derived from a generative probability model. In the present case, the visual vocabulary is modeled by a Gaussian mixture model in a low level feature space where each Gaussian corresponds to a visual word. Let λ={wi,μi,σi,i=1 . . . N} denote the set of parameters of the GMM, where N denotes the number of Gaussians and wi, μi and σi are respectively the weight, mean vector, and variance vector represented by the diagonal covariance matrix Σi of Gaussian i. The GMM vocabulary is trained using maximum likelihood estimation (MLE) considering all or a random subset of the low level descriptors extracted from the annotated dataset 22.
- Given a new low level descriptor xt (such as a color or texture feature vector), the probability that it was generated by the GMM is
- $$p(x_t \mid \lambda) = \sum_{i=1}^{N} w_i \, p_i(x_t \mid \lambda)$$
- Perronnin and Dance show that the partial derivatives of the log-likelihood log p(xt|λ) with respect to the GMM parameters can be computed by the following formulas:
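- $$\frac{\partial \log p(x_t \mid \lambda)}{\partial \mu_i^d} = \gamma_i(x_t) \left[ \frac{x_t^d - \mu_i^d}{(\sigma_i^d)^2} \right] \qquad (1)$$
- $$\frac{\partial \log p(x_t \mid \lambda)}{\partial \sigma_i^d} = \gamma_i(x_t) \left[ \frac{(x_t^d - \mu_i^d)^2}{(\sigma_i^d)^3} - \frac{1}{\sigma_i^d} \right] \qquad (2)$$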
- where the superscript d denotes the d-th dimension of a vector and γi(xt) is the occupancy probability given by
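- $$\gamma_i(x_t) = \frac{w_i \, p_i(x_t \mid \lambda)}{\sum_{j=1}^{N} w_j \, p_j(x_t \mid \lambda)}$$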
- In the exemplary embodiment, only the gradient with respect to the mean and standard deviation is used as it was shown in Perronnin and Dance that the gradient with respect to the mixture weights does not contain significant information. The Fisher gradient vector ft of the descriptor xt is then just the concatenation of the partial derivatives in Equations (1) and (2), leading to a 2×D×N dimensional vector, where D is the dimension of the low level feature space. While the Fisher vector is high dimensional, it can be made relatively sparse as only a small number of components have non-negligible values. In the following description, the Fisher Vector of a set of descriptors X={xt, t=1 . . . T} is defined as the sum of individual Fisher Vectors:
- $$f_X = \sum_{t=1}^{T} f_{x_t} \qquad (3)$$
- This vector can be directly derived from the independence assumption:
- $$\log p(X \mid \lambda) = \sum_{t=1}^{T} \log p(x_t \mid \lambda)$$
- of the set's log-likelihood and can be interpreted as the direction in which parameters should be modified to best fit the dataset (see Perronnin and Dance for further details).
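- As a non-limiting sketch, the Fisher vector of a set of descriptors may be computed for a diagonal-covariance GMM as shown below. The sketch assumes the standard gradient expressions of Perronnin and Dance with respect to the means and standard deviations, treats `sigma` as a per-dimension standard deviation, and sums the per-descriptor gradients; the function name `fisher_vector` and the array layout are hypothetical.

```python
import numpy as np

def fisher_vector(X, w, mu, sigma):
    """Sum of per-descriptor Fisher gradient vectors for a diagonal GMM.

    X: (T, D) descriptors; w: (N,) mixture weights;
    mu, sigma: (N, D) means and standard deviations (assumed layout).
    Returns a vector of length 2 * D * N.
    """
    X = np.atleast_2d(X)
    diff = X[:, None, :] - mu[None, :, :]                     # (T, N, D)
    # Log of each weighted Gaussian density, log(w_i * p_i(x_t)).
    log_p = (-0.5 * np.sum((diff / sigma) ** 2, axis=2)
             - np.sum(np.log(sigma), axis=1)
             - 0.5 * X.shape[1] * np.log(2.0 * np.pi)
             + np.log(w))                                     # (T, N)
    log_p -= log_p.max(axis=1, keepdims=True)                 # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                 # occupancy probabilities
    g = gamma[:, :, None]
    d_mu = np.sum(g * diff / sigma ** 2, axis=0)              # gradient w.r.t. means
    d_sigma = np.sum(g * (diff ** 2 / sigma ** 3 - 1.0 / sigma), axis=0)
    return np.concatenate([d_mu.ravel(), d_sigma.ravel()])

# Hypothetical two-word vocabulary in a two-dimensional feature space.
w = np.array([0.4, 0.6])
mu = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = np.array([[1.0, 2.0], [0.5, 1.0]])
X = np.array([[0.1, 0.2], [0.9, 1.1], [0.5, 0.5]])
f_X = fisher_vector(X, w, mu, sigma)                          # length 2 * 2 * 2 = 8
```

Because the set-level vector is a sum, the Fisher vector of a union of disjoint descriptor sets is the sum of their Fisher vectors, which is what allows salient and non-salient signatures to be stored separately and recombined.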
- Considering the gradient of the log-likelihood of each patch with respect to the parameters of the Gaussian Mixture leads to a high level representation of the patch which is referred to as a Fisher vector. The dimensionality of the Fisher vector can be reduced to a fixed value, such as 50 or 100 dimensions, using principal component analysis. In the exemplary embodiment, since there are two vocabularies, the two Fisher vectors are concatenated or otherwise combined to form a single high level representation of the patch having a fixed dimensionality.
- As will be appreciated, rather than Fisher vectors, other features-based representations can be used to represent each patch, such as a set of features, a two- or more-dimensional array of features, or the like.
- The high level representation of the original image (Fisher Image Signature) can then be generated from the patch feature vectors (e.g., the patch Fisher vectors) (S104 f, S108 d).
- In the case of the dataset images, the patches are labeled according to their overlap with the manually designated salient regions. This leads to two sets of low level features X+ and X− referring to the set of patches that are considered salient and those which are non-salient. Using equation (3), two Fisher vectors fX+ and fX− are computed. These two vectors are then stored as indexes in the database and are, in the exemplary embodiment, the only required information from the dataset images needed to process a new image.
- In the exemplary embodiment, each original image 30 and each of the K nearest neighbor images 62 is represented by a high level representation which is simply the concatenation of two Fisher Vectors, one for texture and one for color, each vector formed by averaging the Fisher Vectors of the patches. This single vector is referred to herein as a Fisher image signature. In other embodiments, the patch level Fisher vectors may be otherwise fused, e.g., by concatenation, dot product, or other combination of patch level Fisher vectors to produce an image level Fisher vector.
- In the exemplary embodiment, initialization proceeds as follows. From each image Id a set of patches P={p1(d), . . . , ps(d)} is extracted at multiple scales. Each patch is then labeled as salient ps +(d) or non-salient ps −(d) according to its position with respect to the annotated region of interest rd (S104 d). For each image in D a pair of signatures <F+(d),F−(d)> is created, which is composed of the representations of the collection of salient patch descriptors F+(d) and of non-salient patch descriptors F−(d), respectively. The pair of signatures is stored in the saliency database 22.
- For the original image, a Fisher image signature FY is computed in a manner analogous to the initialization phase, except that all patches of the image are used to compute the signature (S104 d).
- As will be appreciated, the Fisher image signature is exemplary of types of high level representation which can be used herein. Other image signatures used in the literature for image retrieval may alternatively be used, as discussed above, such as a Bag-of-Visual Words (BOV) representation or Fisher kernel (FK).
- Based on the high level representation of the original image, the most similar images are retrieved from the dataset where, for each image, a manually annotated ROI is available, as described above. The K nearest neighbors are identified, based on the distance metric, where K may be, for example, at least 10, and up to about 50 or 100. In general, performance is not appreciably improved when K is above about 20-30, so a suitable subset contains about 20-30 images, which may represent, for example, less than 20%, e.g., no more than about 10% of the number of images in the dataset, and in one embodiment, no more than about 1% or 0.2% thereof.
- In the exemplary embodiment, the retrieval of a set of K images from D which are visually similar to In generates a list of signatures <FX+,FX−> associated with the K most similar images to In. For example, for each image in the dataset, a distance metric is computed between the global Fisher image signature obtained by summing FX+ and FX− (or other high level image representation) and that of the original image FY. In one embodiment, the K most similar images are retrieved using the Fisher image signature with the normalized L1 distance measure as described, for example, in S. Clinchant, J.-M. Renders and G. Csurka, “Trans-Media Pseudo-Relevance Feedback Methods in Multimedia Retrieval,” Advances in Multilingual and Multimodal Information Retrieval, 8th Workshop of the Cross-Language Evaluation Forum, CLEF 2007, Budapest, Hungary, Sep. 19-21, 2007, LNCS 5152 (2008).
- As noted above, a set of local image patches are extracted from the original image and for each one, the descriptor set Y=y1,y2, . . . yM and the corresponding Fisher vector fY are computed. To compute the similarities between two images, a normalized L1 measure can be used to retrieve similar images:
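- $$\mathrm{sim}(X, Y) = -\left\| \hat{f}_X - \hat{f}_Y \right\|_{L_1} = -\sum_i \left| \hat{f}_{X,i} - \hat{f}_{Y,i} \right|$$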
- where $\hat{f}$ is the vector f normalized to have an L1 norm equal to 1, $\hat{f}_i$ are the elements of the vector $\hat{f}$, and fX=fX++fX− (as the set of descriptors in image X is the union of salient and non-salient patches). In the exemplary embodiment, the distance measure used is the L1 norm distance between the Fisher Image Signatures of each dataset image and the input image. However, other distance measures, such as Euclidean distance, chi-squared (χ2) distance, or the like, may alternatively be used for identifying a subset of similar images from the dataset.
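- The retrieval step may be sketched, purely as an example, as follows. The sketch assumes that the similarity is the negative L1 distance between L1-normalized signatures and that each dataset image is indexed by its pair of signatures (fX+, fX−); the names `l1_normalize`, `similarity`, and `k_nearest` are hypothetical.

```python
import numpy as np

def l1_normalize(f):
    """Scale a vector so that its L1 norm equals 1 (zero vectors are returned unchanged)."""
    norm = np.abs(f).sum()
    return f / norm if norm > 0 else f

def similarity(f_x, f_y):
    """Negative L1 distance between the L1-normalized signatures."""
    return -np.abs(l1_normalize(f_x) - l1_normalize(f_y)).sum()

def k_nearest(f_query, index, k):
    """Indices of the k dataset images most similar to the query signature.

    index: list of (f_plus, f_minus) signature pairs, one per dataset image.
    The global signature of a dataset image is taken as f_plus + f_minus.
    """
    sims = [similarity(f_plus + f_minus, f_query) for f_plus, f_minus in index]
    order = np.argsort(sims)[::-1]                 # most similar first
    return [int(i) for i in order[:k]]

# Hypothetical two-dimensional signatures for three dataset images.
index = [(np.array([1.0, 0.0]), np.array([0.0, 0.0])),
         (np.array([0.0, 1.0]), np.array([0.0, 1.0])),
         (np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
nearest = k_nearest(np.array([0.0, 3.0]), index, k=2)
```

Only the stored signature pairs are touched at query time, so the dataset images themselves need not be reloaded.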
- The classifier 36 is trained using the Fisher Vector representations of image patches extracted from the retrieved K-nearest neighbor images. For the K-nearest neighbor images retrieved, manually annotated salient regions are available, e.g., in the form of bounding boxes. Therefore, in each annotated image, the system considers as positive (i.e., salient) the patches inside the annotated bounding box, and as negative (i.e., non-salient) all the others. For each retrieved image Xj, a Foreground Fisher vector (FG signature) fX+j is/has been computed by averaging the Fisher Vectors of the +ve patches and a Background Fisher Vector (BG signature) fX−j is/has been computed by averaging over the −ve patches. Then, all Fisher vectors representing salient regions are collected (summed) and all Fisher vectors representing non-salient regions are collected (summed) over the K most similar retrieved images, leading to a foreground Fisher model and a background Fisher model:
- $$f_{FG} = \sum_{j=1}^{K} f_{X_j^+}, \qquad f_{BG} = \sum_{j=1}^{K} f_{X_j^-} \qquad (5)$$
- In another embodiment, where the aim is context dependent saliency detection, the patches are designated as positives only if they are within the salient regions labeled with the target concept. Otherwise they are considered negatives. Therefore, while in the context-independent case the fX+j and fX−j need not be recomputed (they correspond to the values in the stored signatures <FX+,FX−>), in the context-dependent case, these values may be re-computed on-line as the set of positive and negative patches may be different (if multiple objects were designated as salient regions in the image and have different labels).
- In the exemplary embodiment, for each original image patch representation (Fisher vector), a saliency score is computed based on the foreground Fisher model and on the background Fisher model. For example, a patch xi is considered salient if its normalized L1 distance to the foreground Fisher model is smaller than to the background Fisher model:
- $$\left\| \hat{f}_{x_i} - \hat{f}_{FG} \right\|_{L_1} - \left\| \hat{f}_{x_i} - \hat{f}_{BG} \right\|_{L_1} < 0$$
- Such a classifier can be too dependent on a single local patch, which makes it locally unstable. Therefore, in order to increase the model's robustness, instead of considering a single patch, the Fisher vectors may be averaged over a neighborhood N of patches:
- $$f_{\mathcal{N}} = \frac{1}{|\mathcal{N}|} \sum_{x_i \in \mathcal{N}} f_{x_i} \qquad (6)$$
- Furthermore, the binary classifier score may be replaced with a non-binary score which is a simple function of the normalized L1 distances:
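- $$S(\mathcal{N}) = \left\| \hat{f}_{\mathcal{N}} - \hat{f}_{BG} \right\|_{L_1} - \left\| \hat{f}_{\mathcal{N}} - \hat{f}_{FG} \right\|_{L_1} \qquad (7)$$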
- Finally, to build a “saliency map” S, it could be considered that each pixel in the neighborhood region takes the value S=S(N). However, this may not be a good strategy, especially if overlapping regions are considered (see below). Accordingly, the value S(N) can be assigned to the center pixel of each region, and the values between these centers can then be obtained either by interpolation or by a Gaussian propagation of these values. The latter can be done by averaging over all Gaussian weighted scores:
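- $$S(u) = \frac{\sum_{\mathcal{N}} S(\mathcal{N}) \, w_{\mathcal{N}}(u)}{\sum_{\mathcal{N}} w_{\mathcal{N}}(u)} \qquad (8)$$ where $w_{\mathcal{N}}(u)$ is the Gaussian weight of pixel u relative to the center of region $\mathcal{N}$.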
- In the exemplary embodiment, the saliency map is built for the original image by considering N such overlapping sub-windows (shown as 80A, B, C, etc.) of the same size (e.g., 50 pixels × 50 pixels) (a few of these windows 80 are illustrated in FIG. 6). The windows may be of the same size as, or somewhat larger than, the smallest patches. A patch is considered to belong to a window if the geometric center of the patch lies within the window. For example, in the case of window 80E, patches 70F and 74 are considered to belong to it. Note that this could be done at the patch level rather than using windows 80. However, averaging over several patches gives more stable results.
- As noted above, the window's saliency score is computed based on the distance of the window signature (Eqn. (6)) to the Foreground signature (FS) and Background signature (BS), as defined in Eqn. (5), using the (optionally normalized) L1 distance computed as in Eqn. (7). The scores at the window level are projected to the pixels, as described in Eqn. (8) above (averaging, for each pixel, the window saliency scores of the windows containing that pixel).
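- The window-level scoring and its projection to pixels may be sketched as follows, by way of example only. The sketch assumes that the non-binary score is the L1 distance to the background model minus the L1 distance to the foreground model (so that positive values indicate saliency), and that the Gaussian weights are isotropic with a single bandwidth; both choices, as well as the name `saliency_map` and the input layout, are assumptions of this sketch.

```python
import numpy as np

def l1_normalize(f):
    """Scale a vector so that its L1 norm equals 1 (zero vectors are returned unchanged)."""
    norm = np.abs(f).sum()
    return f / norm if norm > 0 else f

def l1_distance(a, b):
    """L1 distance between the L1-normalized versions of two vectors."""
    return np.abs(l1_normalize(a) - l1_normalize(b)).sum()

def saliency_map(shape, windows, neighbors, bandwidth):
    """Gaussian-weighted average of window scores at every pixel.

    shape: (height, width) of the original image.
    windows: list of (center_row, center_col, patch_fvs), patch_fvs being an
             array of the Fisher vectors of the patches belonging to the window.
    neighbors: list of (f_plus, f_minus) signatures of the retrieved images.
    """
    f_fg = np.sum([f_plus for f_plus, _ in neighbors], axis=0)    # foreground model
    f_bg = np.sum([f_minus for _, f_minus in neighbors], axis=0)  # background model
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    weighted_scores = np.zeros(shape)
    weights = np.zeros(shape)
    for r, c, patch_fvs in windows:
        f_win = np.mean(patch_fvs, axis=0)                        # window signature
        score = l1_distance(f_win, f_bg) - l1_distance(f_win, f_fg)
        w = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * bandwidth ** 2))
        weighted_scores += w * score
        weights += w
    return weighted_scores / np.maximum(weights, np.finfo(float).tiny)

# Hypothetical example: one foreground-like and one background-like window.
neighbors = [(np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
windows = [(2, 2, np.array([[1.0, 0.0], [2.0, 0.0]])),
           (2, 7, np.array([[0.0, 1.0]]))]
S = saliency_map((5, 10), windows, neighbors, bandwidth=2.0)
```

Pixels near the foreground-like window receive positive values and pixels near the background-like window receive negative values, with a smooth transition in between.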
- Equation (8) has a low computational cost but it is also a rather simple evaluation of the saliency score. Alternatively, a patch classifier (not shown) could be used to compute a saliency probability map by using the approach described in Gabriela Csurka and Florent Perronnin, “A Simple High Performance Approach to Semantic Segmentation,” British Machine Vision Conference (BMVC), Leeds, UK (September 2008). The main difference from that described in the reference is that instead of using object class labels, a single classifier is used, which is trained to categorize foreground versus background. Based on the labeled Fisher Vectors of +ve and −ve patches, a patch classifier is trained and the patch probability score for the original image is then propagated from patches to pixels as described in the Csurka and Perronnin reference. In practice, the saliency maps obtained by this type of classifier are not necessarily better than that which uses Eqn. 8.
- The aim of this step is to build one or more thumbnails from the saliency map S. In one embodiment, a bounding box may simply be drawn to encompass all (or substantially all) pixels which exceed a threshold probability score which is then designated as the region of interest.
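- For illustration, the simple bounding box option may be sketched as below. The inclusive (top, left, bottom, right) coordinate convention, the default threshold of 0, and the function name `bounding_box` are assumptions of this sketch.

```python
import numpy as np

def bounding_box(saliency, threshold=0.0):
    """Smallest box enclosing all pixels whose score exceeds the threshold.

    Returns (top, left, bottom, right) with inclusive coordinates,
    or None when no pixel exceeds the threshold.
    """
    mask = saliency > threshold
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing salient pixels
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing salient pixels
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Hypothetical saliency map with one salient block.
S = -np.ones((6, 8))
S[2:4, 3:6] = 1.0
roi = bounding_box(S)
```

Raising the threshold shrinks the box (favoring precision), while lowering it enlarges the box (favoring recall).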
- A straightforward option is to binarize S, giving, for example, a value of 0 to non-salient pixels and 1 to salient ones. This may be the output of the classifier itself if it has a default threshold th=0 that is supposed to discriminate salient values from non-salient ones. However, by increasing this threshold, more importance can be given to precision, or by decreasing it, to recall. Denote the binarized saliency map by SB. Different strategies can be designed to build a thumbnail from this map. One option is to select the bounding box of the biggest or most centered connected component. Another option is to consider all connected components and retarget them into a single region as proposed in V. Setlur, S. Takagi, R. Raskar, M. Gleicher, and B. Gooch, “Automatic image retargeting,” in Mobile and Ubiquitous Multimedia (MUM), 2005. However, a drawback of these simple approaches is that they rely directly on the saliency map, which by its construction is rather smooth and does not take into account the contours of the contained object. Depending on the selected threshold, this may lead either to sectioning the object of interest or to a thumbnail significantly larger than necessary.
- In other embodiments, refinement techniques may be applied to define an ROI based on the salient pixels which takes further considerations into account (S118). The role of this step is to enhance the precision. In general, the salient regions correspond to isolated objects. Therefore, regions classified as salient can be further refined by taking into account edge constraints.
- In one embodiment, at S118, a Graph-Cut segmentation may be used to adjust the borders of the salient region. This approach assumes that the estimated region contains a consistent part of the relevant objects. One suitable method is based on the Graph-Cut algorithms described in Rother, C., Kolmogorov, V., and Blake, A., “Grabcut: Interactive foreground extraction using iterated graph cuts,” In ACM Trans. Graphics (SIGGRAPH 2004) 23(3), 309-314 (2004).
- In this approach, the problem of segmentation is formulated in terms of energy minimization (i.e., max-flow/min-cut). The image is represented as a graph in which each pixel is a node and the edges can represent color similarity between adjacent pixels as in a Markov Random Field. In addition, two extra nodes (starting and ending nodes) are added to the graph and linked to each pixel based on the probability that the pixel belongs to background or foreground.
- In one embodiment, for initializing the Graph-Cut algorithm, the saliency map generated at S116 is used to build an initial Graph-Cut model. In particular, a first Gaussian Mixture Model (GMM) is created for the foreground colors and a second GMM is created for the background colors. Then the algorithm iterates between Graph-Cut binary labeling and GMM updating as in Rother, et al. FIG. 3 shows an example graph-cut mask 58 created from the ROI mask 56 generated at S116.
- For example, in the exemplary embodiment, the graph-cut method is performed as follows: First, two thresholds are chosen (one positive th+ and one negative th−). This separates the saliency map S into 3 different regions: pixels u labeled as salient (S(u)>th+), pixels labeled as non-salient (S(u)<th−) and unknown (the others). Two Gaussian Mixture Models (GMMs) Ω1 and Ω2 are created, one using RGB values of salient (foreground) pixels and one using RGB values of non-salient (background) pixels. Then the following energy is considered:
-
- where the data penalty function Du(u)=−log p(u|lu, Ωlu) is the negative log likelihood that the pixel u belongs to the GMM Ωlu, with lu ∈ {0,1}, and the contrast term:
-
- With δlu,lv=1 if lu=lv, C representing 4-way cliques, and β=E(∥u−v∥²), as described in Rother, et al. The energy can be minimized using the min-cut/max-flow algorithms proposed in Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” PAMI, 26, 2004, leading to a binary labeling of the image. Using the new labels, the parameters of the two GMMs are updated (adapted) and, similarly to Rother, et al., the method iterates between energy minimization and GMM updates. No modifications are made to the binary labels. This binary map can be considered as a new saliency map, denoted by SG.
- This method works in most cases. In cases where the method does not work effectively, such as where there are similar colors in the foreground and background regions, the Graph-Cut method can be replaced by an alternative method. Cases not suited to Graph-Cut processing can be automatically detected and the Graph-Cut regions rejected if any of the following is found:
- 1. All pixels in the image are labeled with the same label.
- 2. The positively labeled area after Graph-Cut is too small, compared with the size of the original image, e.g., less than 5% or less than 10% of its size.
- 3. There is too great a divergence between the initialization (binarized Saliency Map 56) and the output of the Graph-Cut 58 (for example, the Graph-Cut region is greater than twice the size or less than 10% of the size of the ROI generated by the saliency map).
- Where the Graph-Cut results are rejected, the output of step S116, i.e., the binarized Saliency Map 56, is used for identifying an ROI.
- This can be expressed more generally by the equation
-
- When SG is computed, the only information used about the saliency is the initialization of the two GMMs. Therefore, if there is a substantial divergence between SG and SB, the initial SB map is more trustworthy.
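- The three rejection tests listed above may be sketched as follows. The thresholds (5% of the image area, twice the size, and 10% of the size) are taken from the examples given in the text; the function name `choose_map` and the representation of SB and SG as binary arrays are assumptions of this sketch.

```python
import numpy as np

def choose_map(s_b, s_g, min_fraction=0.05, max_growth=2.0, min_shrink=0.10):
    """Return the Graph-Cut map s_g unless a rejection rule applies, else s_b.

    s_b: binarized saliency map used for initialization (nonzero = salient).
    s_g: binary map output by the Graph-Cut stage.
    """
    area_g = int(np.count_nonzero(s_g))
    area_b = int(np.count_nonzero(s_b))
    # Rule 1: all pixels carry the same label.
    if area_g == 0 or area_g == s_g.size:
        return s_b
    # Rule 2: the positively labeled area is too small relative to the image.
    if area_g < min_fraction * s_g.size:
        return s_b
    # Rule 3: the output diverges too much from the initialization.
    if area_g > max_growth * area_b or area_g < min_shrink * area_b:
        return s_b
    return s_g

# Hypothetical 10x10 maps: a 4x4 initialization and a 5x5 Graph-Cut region.
s_b = np.zeros((10, 10), dtype=np.uint8)
s_b[3:7, 3:7] = 1
s_g = np.zeros((10, 10), dtype=np.uint8)
s_g[3:8, 3:8] = 1
selected = choose_map(s_b, s_g)
```

In this example the Graph-Cut region (25 pixels) is neither too small nor more than twice the initialization (16 pixels), so the Graph-Cut map is retained.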
- At S120, the ROI may be generated, for example, from the saliency map 58 (or 56) by processing the map in order to find the biggest, most centered object based on an analysis of statistics of the saliency map distribution (e.g., center of mass of the distribution, cumulative probability, etc.). Alternatively, all the detected salient regions may be considered and retargeted into a single thumbnail. A rectangular crop (image thumbnail) 90 can then be generated, based on this salient region.
- The method illustrated in FIGS. 2, 4, and 5 may be implemented in a computer program product that may be executed on a computer. The computer program product may be a tangible computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or may be a transmittable carrier wave in which the control program is embodied as a data signal. Common forms of computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like, or any other medium from which a computer can read and use.
- The exemplary method thus described may be implemented on one or more general purpose computers, special purpose computer(s), a programmed microprocessor or microcontroller and peripheral integrated circuit elements, an ASIC or other integrated circuit, a digital signal processor, a hardwired electronic or logic circuit such as a discrete element circuit, a programmable logic device such as a PLD, PLA, FPGA, or PAL, or the like. In general, any device capable of implementing a finite state machine that is in turn capable of implementing the flowchart shown in FIGS. 2, 4, and 5 can be used to implement the automated method for identifying a region of interest in an image.
- The exemplary embodiment finds application in a variety of contexts. For example, variable data applications such as 1 to 1 personalization and direct mail marketing often employ an image. By automated selection of a region of interest 32 using the exemplary method, a document 42 can be created incorporating an appropriately sized crop 90 which incorporates the salient region. In one embodiment, the human observers used to annotate the salient regions of the images in the dataset 22 can be selected to represent the target audience. Or, for example, two or more sets of annotators may be used, e.g., one group comprising only females, the other, only males, and separate sets of image signatures stored for each group. Thus, the K nearest neighbors may be different, depending on which set of signatures is used.
- Variable data printing is not the only application of the exemplary system and apparatus. Other applications, such as image and document asset management or document image/photograph set visualization, and the like can also benefit. For example, a crop 90 of the original image, based on the salient region, can be used for a thumbnail which is displayed in place of the original image, allowing a user to select images of interest from a large group of images, based on the interesting parts.
- In another embodiment, the thumbnail (crop) 90 can be fed to a categorizer 44 for categorizing the image based on image content. Here the categorizer is not confused by including areas of the image which are less likely to be of visual interest. In one embodiment, illustrated in FIG. 7, the image crop 90 is fed to a categorizer, which has been trained with training image crops 94, generated in the same way, but which have been annotated with a respective class (e.g., dogs, cats, flowers in the exemplary embodiment). The categorizer (which may incorporate a multiclass classifier or a set of binary classifiers, one for each object class) outputs a class 96 for the crop, based on a similarity of features of the image crop to those of the training images.
- It has been shown that extracting image features only around ROIs or on segmented foreground gives better results than sampling features uniformly through the image.
- Without intending to limit the scope of the exemplary embodiment, the following example compares results obtained with the exemplary apparatus described herein with comparative saliency detection methods.
- The exemplary method is evaluated by comparing the results with those of four comparative methods for saliency detection:
- Method A: Exemplary method without Graph-cut.
- Method B: Exemplary method using Graph-cut, as described above.
- Method C: based on above-mentioned U.S. patent application Ser. No. 12/250,248. This method generates saliency maps by linearly combining the bounding boxes of the K (with K=50) nearest images in the dataset, given the input image.
- Method D: (ITTI): A classic approach based on Itti theory (See, L. Itti and C. Koch, “A Saliency-Based Search Mechanism for Overt and Covert Shifts of Visual Attention,” Vision Research, 40(10-12): 1489-1506, 2000 (hereinafter Itti and Koch 2000)) that leverages neuromorphic models simulating which elements are likely to attract visual attention. In the Examples, a Matlab implementation available at http://www.saliencytoolbox.net/ was employed.
- Method E: (SR): This method is described in X. Hou, L. Zhang, “Saliency Detection: A Spectral Residual Approach,” CVPR, 2007, hereinafter “Hou, et al.” It is based on the analysis of the spectral residual of an image in the spectral domain. In these Examples, a Matlab implementation available at http://bcmi.sjtu.edu.cn/~houxiaodi was employed.
- Method F: (CRF): A learning method (Liu, et al.), based on a Conditional Random Field classifier.
- Part of the dataset described in Liu, et al. (MRSA Dataset) was used to train and test the exemplary method. The dataset was composed of 5000 images labeled by different users with no specific skills in graphic design. The dataset included images of a variety of different subjects. In general, a single object is present in the image with a broad range of backgrounds with fairly homogeneous color or texture. The salient region detector was configured to retrieve the K most similar images (with K=50).
- Ground truth data comprising manually annotated regions of interest generated by different users is also available. The users manually selected a rectangle (bounding box) containing the region of interest, which is typically represented by a full object or, in some cases by a subpart of the object (e.g., face). The 5000 images from the MRSA Dataset used in this example had bounding boxes annotated by nine users. The annotations are highly consistent with a very small variance over the nine bounding boxes. On average, the bounding boxes represent approximately 35% of the total area of the image, but this varies over a fairly wide distribution. Moreover the distance of the center of mass of the object from the center of the image is, on average, 42 pixels. Again the annotated dataset showed a distribution.
- For each image in the dataset, a ground truth saliency map g(x,y) has been generated to evaluate the results based on user annotations (bounding boxes containing salient regions). In particular, since the annotations for MRSA are highly consistent, an average of the nine bounding boxes of the various users was used. Maps g(x,y) were generated, with rectangular salient regions pixels set to 1 and 0 otherwise.
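- One possible way of generating such a ground truth map is sketched below. The sketch assumes that "an average of the nine bounding boxes" means a coordinate-wise average of the box corners, and that boxes are given as inclusive (top, left, bottom, right) pixel coordinates; the function name `ground_truth_map` is hypothetical.

```python
import numpy as np

def ground_truth_map(shape, boxes):
    """Binary map g with 1 inside the coordinate-wise average box and 0 elsewhere.

    shape: (height, width) of the image.
    boxes: list of (top, left, bottom, right) annotations, inclusive coordinates.
    """
    top, left, bottom, right = np.rint(np.mean(boxes, axis=0)).astype(int)
    g = np.zeros(shape, dtype=np.uint8)
    g[top:bottom + 1, left:right + 1] = 1
    return g

# Hypothetical annotations from three users on a 10x12 image.
boxes = [(2, 2, 5, 6), (3, 3, 6, 7), (4, 4, 7, 8)]
g = ground_truth_map((10, 12), boxes)
```

With highly consistent annotations, as reported for this dataset, the averaged box differs little from any individual annotation.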
- Performance was evaluated using the following measures: BDE (See, D. R. Martin, C. C. Fowlkes and J. Malik, “Learning to detect natural image boundaries using local brightness, color and texture cues,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 26(5) pp. 530-549 (May 2004)) was used for assessing the displacement of the bounding boxes (FIG. 10), and Precision, Recall and F-measure were used to assess the quality of the saliency map. In particular, Precision (Pr), Recall (Re) and F-measure (Fα) can be defined according to Liu, et al., as follows:
- $$Pr = \frac{\sum_{x,y} g(x,y)\, s(x,y)}{\sum_{x,y} s(x,y)}, \qquad Re = \frac{\sum_{x,y} g(x,y)\, s(x,y)}{\sum_{x,y} g(x,y)}, \qquad F_\alpha = \frac{(1+\alpha)\, Pr \cdot Re}{\alpha\, Pr + Re}$$ where s(x,y) denotes the detected saliency map.
- The F-measure is the weighted harmonic mean of precision and recall, with α=0.5 (thereby giving more importance to precision than to recall, as in Liu, et al.). If both precision and recall are zero, Fα is set to zero.
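- These measures may be computed, for example, as sketched below. The sketch assumes the usual definitions attributed to Liu, et al. (precision as the overlap divided by the detected area, recall as the overlap divided by the ground truth area, and Fα=(1+α)·Pr·Re/(α·Pr+Re)); the function name `precision_recall_f` is hypothetical.

```python
import numpy as np

def precision_recall_f(g, s, alpha=0.5):
    """Precision, recall and F-measure of a detected map s against ground truth g."""
    g = np.asarray(g, dtype=np.float64)
    s = np.asarray(s, dtype=np.float64)
    overlap = float((g * s).sum())
    precision = overlap / float(s.sum()) if s.sum() > 0 else 0.0
    recall = overlap / float(g.sum()) if g.sum() > 0 else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0          # F is set to zero by convention
    f_measure = (1.0 + alpha) * precision * recall / (alpha * precision + recall)
    return precision, recall, f_measure

# Hypothetical 4x4 maps: the detection covers the ground truth plus extra pixels.
g = np.zeros((4, 4))
g[0:2, 0:2] = 1
s = np.zeros((4, 4))
s[0:2, 0:4] = 1
pr, re, f = precision_recall_f(g, s)
```

Here half of the detected pixels are correct and all ground truth pixels are found, so the F-measure is pulled toward the lower precision value.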
- In the Examples, some of the above mentioned methods (B-E) were tuned by selecting a specific threshold on the maps in order to maximize the F-measure of each one. The behavior of the F-measure as a function of the threshold on the map is shown in FIG. 8. As seen in FIG. 8, the exemplary method (A and B) gives a better result than Methods C and E. Further, FIG. 8 shows the improvement that the Graph-Cut stage (Method B) introduces in the proposed method, increasing the F-measure by almost 10% as compared with Method A (without Graph-Cut). For Methods D and F, the thresholding was not applied because the results were taken directly from the Hou, et al. paper.
- FIG. 9 shows the thresholds selected for the Methods compared.
- All the above mentioned Methods are compared in FIG. 10, where the results obtained in the experiment are shown in more detail. For each method considered, the precision, recall and F-measure are given for its best parameter setting. The CRF and ITTI results have been reported from the cited Hou, et al. paper.
- FIG. 10 shows the Bounding Box displacement index. It represents the average distance, in pixels, of the center of the automatically detected Bounding Box from the center of the ground truth Bounding Box. The smaller this value, the more accurate is the bounding box detected. As can be seen, the exemplary method using Graph-Cut (Method B) gave the best results.
- It will be appreciated that various of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/400,277 US8175376B2 (en) | 2009-03-09 | 2009-03-09 | Framework for image thumbnailing based on visual similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
US20100226564A1 (en) | 2010-09-09 |
US8175376B2 (en) | 2012-05-08 |
Family
ID=42678297
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/400,277 Active 2031-01-20 US8175376B2 (en) | 2009-03-09 | 2009-03-09 | Framework for image thumbnailing based on visual similarity |
Country Status (1)
Country | Link |
---|---|
US (1) | US8175376B2 (en) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4705959B2 (en) | 2005-01-10 | 2011-06-22 | トムソン ライセンシング | Apparatus and method for creating image saliency map |
US8175376B2 (en) * | 2009-03-09 | 2012-05-08 | Xerox Corporation | Framework for image thumbnailing based on visual similarity |
- 2009-03-09: US application US12/400,277, patent US8175376B2 (status: Active)
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6208758B1 (en) * | 1991-09-12 | 2001-03-27 | Fuji Photo Film Co., Ltd. | Method for learning by a neural network including extracting a target object image for which learning operations are to be carried out |
US6151408A (en) * | 1995-02-10 | 2000-11-21 | Fuji Photo Film Co., Ltd. | Method for separating a desired pattern region from a color image |
US5960111A (en) * | 1997-02-10 | 1999-09-28 | At&T Corp | Method and apparatus for segmenting images prior to coding |
US6711278B1 (en) * | 1998-09-10 | 2004-03-23 | Microsoft Corporation | Tracking semantic objects in vector image sequences |
US7400761B2 (en) * | 2003-09-30 | 2008-07-15 | Microsoft Corporation | Contrast-based image attention analysis framework |
US20050220336A1 (en) * | 2004-03-26 | 2005-10-06 | Kohtaro Sabe | Information processing apparatus and method, recording medium, and program |
US20050213810A1 (en) * | 2004-03-29 | 2005-09-29 | Kohtaro Sabe | Information processing apparatus and method, recording medium, and program |
US20090175533A1 (en) * | 2004-03-29 | 2009-07-09 | Kohtaro Sabe | Information processing apparatus and method, recording medium, and program |
US7630525B2 (en) * | 2004-03-29 | 2009-12-08 | Sony Corporation | Information processing apparatus and method, recording medium, and program |
US20060093184A1 (en) * | 2004-11-04 | 2006-05-04 | Fuji Xerox Co., Ltd. | Image processing apparatus |
US20080304742A1 (en) * | 2005-02-17 | 2008-12-11 | Connell Jonathan H | Combining multiple cues in a visual object detection system |
US20070005356A1 (en) * | 2005-06-30 | 2007-01-04 | Florent Perronnin | Generic visual categorization method and system |
US7876938B2 (en) * | 2005-10-06 | 2011-01-25 | Siemens Medical Solutions Usa, Inc. | System and method for whole body landmark detection, segmentation and change quantification in digital images |
US20070258648A1 (en) * | 2006-05-05 | 2007-11-08 | Xerox Corporation | Generic visual classification with gradient components-based dimensionality enhancement |
US20080069456A1 (en) * | 2006-09-19 | 2008-03-20 | Xerox Corporation | Bags of visual context-dependent words for generic visual categorization |
US20080240532A1 (en) * | 2007-03-30 | 2008-10-02 | Siemens Corporation | System and Method for Detection of Fetal Anatomies From Ultrasound Images Using a Constrained Probabilistic Boosting Tree |
US7995820B2 (en) * | 2007-03-30 | 2011-08-09 | Siemens Medical Solutions Usa, Inc. | System and method for detection of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree |
US20080304740A1 (en) * | 2007-06-06 | 2008-12-11 | Microsoft Corporation | Salient Object Detection |
US20080317358A1 (en) * | 2007-06-25 | 2008-12-25 | Xerox Corporation | Class-based image enhancement system |
Cited By (308)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9740949B1 (en) | 2007-06-14 | 2017-08-22 | Hrl Laboratories, Llc | System and method for detection of objects of interest in imagery |
US8774517B1 (en) * | 2007-06-14 | 2014-07-08 | Hrl Laboratories, Llc | System for identifying regions of interest in visual imagery |
US9778351B1 (en) | 2007-10-04 | 2017-10-03 | Hrl Laboratories, Llc | System for surveillance by integrating radar with a panoramic staring sensor |
US10007679B2 (en) | 2008-08-08 | 2018-06-26 | The Research Foundation For The State University Of New York | Enhanced max margin learning on multimodal data mining in a multimedia database |
US20100215098A1 (en) * | 2009-02-23 | 2010-08-26 | Mondo Systems, Inc. | Apparatus and method for compressing pictures with roi-dependent compression parameters |
US10027966B2 (en) * | 2009-02-23 | 2018-07-17 | Mondo Systems, Inc. | Apparatus and method for compressing pictures with ROI-dependent compression parameters |
US8175376B2 (en) * | 2009-03-09 | 2012-05-08 | Xerox Corporation | Framework for image thumbnailing based on visual similarity |
US20120027309A1 (en) * | 2009-04-14 | 2012-02-02 | Nec Corporation | Image signature extraction device |
US8861871B2 (en) * | 2009-04-14 | 2014-10-14 | Nec Corporation | Image signature extraction device |
US20100281361A1 (en) * | 2009-04-30 | 2010-11-04 | Xerox Corporation | Automated method for alignment of document objects |
US8271871B2 (en) * | 2009-04-30 | 2012-09-18 | Xerox Corporation | Automated method for alignment of document objects |
US8532387B2 (en) | 2009-09-04 | 2013-09-10 | Adobe Systems Incorporated | Methods and apparatus for procedural directional texture generation |
US8787698B2 (en) | 2009-09-04 | 2014-07-22 | Adobe Systems Incorporated | Methods and apparatus for directional texture generation using image warping |
US8599219B2 (en) * | 2009-09-18 | 2013-12-03 | Adobe Systems Incorporated | Methods and apparatuses for generating thumbnail summaries for image collections |
US8619098B2 (en) * | 2009-09-18 | 2013-12-31 | Adobe Systems Incorporated | Methods and apparatuses for generating co-salient thumbnails for digital images |
US20130120454A1 (en) * | 2009-09-18 | 2013-05-16 | Elya Shechtman | Methods and Apparatuses for Generating Thumbnail Summaries for Image Collections |
US20110164815A1 (en) * | 2009-11-17 | 2011-07-07 | Samsung Electronics Co., Ltd. | Method, device and system for content based image categorization field |
US9355432B1 (en) | 2010-07-13 | 2016-05-31 | Google Inc. | Method and system for automatically cropping images |
US9070182B1 (en) | 2010-07-13 | 2015-06-30 | Google Inc. | Method and system for automatically cropping images |
US8577182B1 (en) | 2010-07-13 | 2013-11-05 | Google Inc. | Method and system for automatically cropping images |
US9552622B2 (en) | 2010-07-13 | 2017-01-24 | Google Inc. | Method and system for automatically cropping images |
US8487959B1 (en) * | 2010-08-06 | 2013-07-16 | Google Inc. | Generating simulated eye movement traces for visual displays |
US8933938B2 (en) * | 2010-08-06 | 2015-01-13 | Google Inc. | Generating simulated eye movement traces for visual displays |
US9229956B2 (en) | 2011-01-10 | 2016-01-05 | Microsoft Technology Licensing, Llc | Image retrieval using discriminative visual features |
US20130091515A1 (en) * | 2011-02-04 | 2013-04-11 | Kotaro Sakata | Degree of interest estimating device and degree of interest estimating method |
US9538219B2 (en) * | 2011-02-04 | 2017-01-03 | Panasonic Intellectual Property Corporation Of America | Degree of interest estimating device and degree of interest estimating method |
US9058611B2 (en) | 2011-03-17 | 2015-06-16 | Xerox Corporation | System and method for advertising using image search and classification |
US20120328150A1 (en) * | 2011-03-22 | 2012-12-27 | Rochester Institute Of Technology | Methods for assisting with object recognition in image sequences and devices thereof |
US9785835B2 (en) * | 2011-03-22 | 2017-10-10 | Rochester Institute Of Technology | Methods for assisting with object recognition in image sequences and devices thereof |
US10026198B2 (en) | 2011-04-08 | 2018-07-17 | Creative Technology Ltd | Method, system and electronic device for at least one of efficient graphic processing and salient based learning |
TWI566116B (en) * | 2011-04-08 | 2017-01-11 | 創新科技有限公司 | Electronic device for at least one of efficient graphic processing and salient based learning |
CN103597484A (en) * | 2011-04-08 | 2014-02-19 | 创新科技有限公司 | A method, system and electronic device for at least one of efficient graphic processing and salient based learning |
WO2012138299A1 (en) * | 2011-04-08 | 2012-10-11 | Creative Technology Ltd | A method, system and electronic device for at least one of efficient graphic processing and salient based learning |
US8867829B2 (en) | 2011-05-26 | 2014-10-21 | Xerox Corporation | Method and apparatus for editing color characteristics of electronic image |
US8570339B2 (en) | 2011-05-26 | 2013-10-29 | Xerox Corporation | Modifying color adjustment choices based on image characteristics in an image editing system |
US8560517B2 (en) | 2011-07-05 | 2013-10-15 | Microsoft Corporation | Object retrieval using visual query context |
US10622111B2 (en) | 2011-08-12 | 2020-04-14 | Help Lightning, Inc. | System and method for image registration of multiple video streams |
US20130038632A1 (en) * | 2011-08-12 | 2013-02-14 | Marcus W. Dillavou | System and method for image registration of multiple video streams |
US9886552B2 (en) * | 2011-08-12 | 2018-02-06 | Help Lighting, Inc. | System and method for image registration of multiple video streams |
US10181361B2 (en) | 2011-08-12 | 2019-01-15 | Help Lightning, Inc. | System and method for image registration of multiple video streams |
US8379981B1 (en) | 2011-08-26 | 2013-02-19 | Toyota Motor Engineering & Manufacturing North America, Inc. | Segmenting spatiotemporal data based on user gaze data |
US9317773B2 (en) | 2011-08-29 | 2016-04-19 | Adobe Systems Incorporated | Patch-based synthesis techniques using color and color gradient voting |
US8861868B2 (en) | 2011-08-29 | 2014-10-14 | Adobe-Systems Incorporated | Patch-based synthesis techniques |
DE102011113154B4 (en) * | 2011-09-14 | 2015-12-03 | Airbus Defence and Space GmbH | Machine learning method for machine learning of manifestations of objects in images |
US9361543B2 (en) | 2011-09-14 | 2016-06-07 | Airbus Defence and Space GmbH | Automatic learning method for the automatic learning of forms of appearance of objects in images |
US8675966B2 (en) | 2011-09-29 | 2014-03-18 | Hewlett-Packard Development Company, L.P. | System and method for saliency map generation |
US8824797B2 (en) | 2011-10-03 | 2014-09-02 | Xerox Corporation | Graph-based segmentation integrating visible and NIR information |
EP2579211A2 (en) | 2011-10-03 | 2013-04-10 | Xerox Corporation | Graph-based segmentation integrating visible and NIR information |
US8660351B2 (en) * | 2011-10-24 | 2014-02-25 | Hewlett-Packard Development Company, L.P. | Auto-cropping images using saliency maps |
US20140250110A1 (en) * | 2011-11-25 | 2014-09-04 | Linjun Yang | Image attractiveness based indexing and searching |
US8938116B2 (en) * | 2011-12-08 | 2015-01-20 | Yahoo! Inc. | Image cropping using supervised learning |
US20150131900A1 (en) * | 2011-12-08 | 2015-05-14 | Yahoo! Inc. | Image Cropping Using Supervised Learning |
US9177207B2 (en) * | 2011-12-08 | 2015-11-03 | Zynga Inc. | Image cropping using supervised learning |
US20130148880A1 (en) * | 2011-12-08 | 2013-06-13 | Yahoo! Inc. | Image Cropping Using Supervised Learning |
US8929680B2 (en) * | 2011-12-12 | 2015-01-06 | Canon Kabushiki Kaisha | Method, apparatus and system for identifying distracting elements in an image |
US20130148910A1 (en) * | 2011-12-12 | 2013-06-13 | Canon Kabushiki Kaisha | Method, apparatus and system for identifying distracting elements in an image |
US8917910B2 (en) | 2012-01-16 | 2014-12-23 | Xerox Corporation | Image segmentation based on approximation of segmentation similarity |
US9075824B2 (en) | 2012-04-27 | 2015-07-07 | Xerox Corporation | Retrieval system and method leveraging category-level labels |
US9030505B2 (en) * | 2012-05-17 | 2015-05-12 | Nokia Technologies Oy | Method and apparatus for attracting a user's gaze to information in a non-intrusive manner |
US20130307762A1 (en) * | 2012-05-17 | 2013-11-21 | Nokia Corporation | Method and apparatus for attracting a user's gaze to information in a non-intrusive manner |
US9959629B2 (en) | 2012-05-21 | 2018-05-01 | Help Lighting, Inc. | System and method for managing spatiotemporal uncertainty |
EP2674881A1 (en) | 2012-06-15 | 2013-12-18 | Xerox Corporation | Privacy preserving method for querying a remote public service |
US8666992B2 (en) * | 2012-06-15 | 2014-03-04 | Xerox Corporation | Privacy preserving method for querying a remote public service |
US20150178587A1 (en) * | 2012-06-18 | 2015-06-25 | Thomson Licensing | Device and a method for color harmonization of an image |
CN102800092A (en) * | 2012-07-12 | 2012-11-28 | 北方工业大学 | Point-to-surface image significance detection |
US8892562B2 (en) | 2012-07-26 | 2014-11-18 | Xerox Corporation | Categorization of multi-page documents by anisotropic diffusion |
US8873812B2 (en) | 2012-08-06 | 2014-10-28 | Xerox Corporation | Image segmentation using hierarchical unsupervised segmentation and hierarchical classifiers |
EP2701098A3 (en) * | 2012-08-23 | 2015-06-03 | Xerox Corporation | Region refocusing for data-driven object localization |
US8879796B2 (en) | 2012-08-23 | 2014-11-04 | Xerox Corporation | Region refocusing for data-driven object localization |
US9104946B2 (en) | 2012-10-15 | 2015-08-11 | Canon Kabushiki Kaisha | Systems and methods for comparing images |
CN102945378A (en) * | 2012-10-23 | 2013-02-27 | 西北工业大学 | Method for detecting potential target regions of remote sensing image on basis of monitoring method |
US9418079B2 (en) * | 2012-11-01 | 2016-08-16 | Google Inc. | Image comparison process |
US20140122531A1 (en) * | 2012-11-01 | 2014-05-01 | Google Inc. | Image comparison process |
US20140126782A1 (en) * | 2012-11-02 | 2014-05-08 | Sony Corporation | Image display apparatus, image display method, and computer program |
CN103020993B (en) * | 2012-11-28 | 2015-06-17 | 杭州电子科技大学 | Visual saliency detection method by fusing dual-channel color contrasts |
CN103020993A (en) * | 2012-11-28 | 2013-04-03 | 杭州电子科技大学 | Visual saliency detection method by fusing dual-channel color contrasts |
WO2014092548A1 (en) * | 2012-12-13 | 2014-06-19 | Mimos Berhad | A method and system for identifying multiple entities in images |
US9008429B2 (en) | 2013-02-01 | 2015-04-14 | Xerox Corporation | Label-embedding for text recognition |
US8879103B2 (en) | 2013-03-04 | 2014-11-04 | Xerox Corporation | System and method for highlighting barriers to reducing paper usage |
EP2790135A1 (en) | 2013-03-04 | 2014-10-15 | Xerox Corporation | System and method for highlighting barriers to reducing paper usage |
US9158995B2 (en) * | 2013-03-14 | 2015-10-13 | Xerox Corporation | Data driven localization using task-dependent representations |
US20140270350A1 (en) * | 2013-03-14 | 2014-09-18 | Xerox Corporation | Data driven localization using task-dependent representations |
CN103198319A (en) * | 2013-04-11 | 2013-07-10 | 武汉大学 | Method of extraction of corner of blurred image in mine shaft environment |
US9384423B2 (en) | 2013-05-28 | 2016-07-05 | Xerox Corporation | System and method for OCR output verification |
US20140376819A1 (en) * | 2013-06-21 | 2014-12-25 | Microsoft Corporation | Image recognition by image search |
US9754177B2 (en) * | 2013-06-21 | 2017-09-05 | Microsoft Technology Licensing, Llc | Identifying objects within an image |
US10482673B2 (en) | 2013-06-27 | 2019-11-19 | Help Lightning, Inc. | System and method for role negotiation in multi-reality environments |
US9940750B2 (en) | 2013-06-27 | 2018-04-10 | Help Lighting, Inc. | System and method for role negotiation in multi-reality environments |
US20160196662A1 (en) * | 2013-08-16 | 2016-07-07 | Beijing Jingdong Shangke Information Technology Co., Ltd. | Method and device for manufacturing virtual fitting model image |
US9082047B2 (en) | 2013-08-20 | 2015-07-14 | Xerox Corporation | Learning beautiful and ugly visual attributes |
EP2863338A2 (en) | 2013-10-16 | 2015-04-22 | Xerox Corporation | Delayed vehicle identification for privacy enforcement |
US9412031B2 (en) | 2013-10-16 | 2016-08-09 | Xerox Corporation | Delayed vehicle identification for privacy enforcement |
US11436272B2 (en) | 2013-11-12 | 2022-09-06 | Pinterest, Inc. | Object based image based search |
US20150134688A1 (en) * | 2013-11-12 | 2015-05-14 | Pinterest, Inc. | Image based search |
US10515110B2 (en) * | 2013-11-12 | 2019-12-24 | Pinterest, Inc. | Image based search |
US10832448B2 (en) | 2013-11-13 | 2020-11-10 | Sony Corporation | Display control device, display control method, and program |
US20150130838A1 (en) * | 2013-11-13 | 2015-05-14 | Sony Corporation | Display control device, display control method, and program |
US9275306B2 (en) * | 2013-11-13 | 2016-03-01 | Canon Kabushiki Kaisha | Devices, systems, and methods for learning a discriminant image representation |
US20150131899A1 (en) * | 2013-11-13 | 2015-05-14 | Canon Kabushiki Kaisha | Devices, systems, and methods for learning a discriminant image representation |
US10115210B2 (en) * | 2013-11-13 | 2018-10-30 | Sony Corporation | Display control device, display control method, and program |
CN103678552A (en) * | 2013-12-05 | 2014-03-26 | 武汉大学 | Remote-sensing image retrieving method and system based on salient regional features |
US20150169982A1 (en) * | 2013-12-17 | 2015-06-18 | Canon Kabushiki Kaisha | Observer Preference Model |
US9779284B2 (en) | 2013-12-17 | 2017-10-03 | Conduent Business Services, Llc | Privacy-preserving evidence in ALPR applications |
US9558423B2 (en) * | 2013-12-17 | 2017-01-31 | Canon Kabushiki Kaisha | Observer preference model |
US20160360267A1 (en) * | 2014-01-14 | 2016-12-08 | Alcatel Lucent | Process for increasing the quality of experience for users that watch on their terminals a high definition video stream |
US9430701B2 (en) * | 2014-02-07 | 2016-08-30 | Tata Consultancy Services Limited | Object detection system and method |
US20150227784A1 (en) * | 2014-02-07 | 2015-08-13 | Tata Consultancy Services Limited | Object detection system and method |
US9158971B2 (en) | 2014-03-03 | 2015-10-13 | Xerox Corporation | Self-learning object detectors for unlabeled videos using multi-task learning |
EP2916265A1 (en) | 2014-03-03 | 2015-09-09 | Xerox Corporation | Self-learning object detectors for unlabeled videos using multi-task learning |
US9928532B2 (en) | 2014-03-04 | 2018-03-27 | Daniel Torres | Image based search engine |
US20150262039A1 (en) * | 2014-03-13 | 2015-09-17 | Omron Corporation | Image processing apparatus and image processing method |
US9600746B2 (en) * | 2014-03-13 | 2017-03-21 | Omron Corporation | Image processing apparatus and image processing method |
US20170236030A1 (en) * | 2014-04-15 | 2017-08-17 | Canon Kabushiki Kaisha | Object detection apparatus, object detection method, and storage medium |
US20150294181A1 (en) * | 2014-04-15 | 2015-10-15 | Canon Kabushiki Kaisha | Object detection apparatus object detection method and storage medium |
US9672439B2 (en) * | 2014-04-15 | 2017-06-06 | Canon Kabushiki Kaisha | Object detection apparatus object detection method and storage medium |
US10643100B2 (en) * | 2014-04-15 | 2020-05-05 | Canon Kabushiki Kaisha | Object detection apparatus, object detection method, and storage medium |
US9639806B2 (en) | 2014-04-15 | 2017-05-02 | Xerox Corporation | System and method for predicting iconicity of an image |
US20170046621A1 (en) * | 2014-04-30 | 2017-02-16 | Siemens Healthcare Diagnostics Inc. | Method and apparatus for performing block retrieval on block to be processed of urine sediment image |
US11386340B2 (en) * | 2014-04-30 | 2022-07-12 | Siemens Healthcare Diagnostic Inc. | Method and apparatus for performing block retrieval on block to be processed of urine sediment image |
CN103927758A (en) * | 2014-04-30 | 2014-07-16 | 重庆大学 | Saliency detection method based on contrast ratio and minimum convex hull of angular point |
US10748069B2 (en) * | 2014-04-30 | 2020-08-18 | Siemens Healthcare Diagnostics Inc. | Method and apparatus for performing block retrieval on block to be processed of urine sediment image |
US20150332605A1 (en) * | 2014-05-19 | 2015-11-19 | Thomson Licensing | Method for harmonizing colors, corresponding computer program and device |
US9761152B2 (en) * | 2014-05-19 | 2017-09-12 | Thomson Licensing | Method for harmonizing colors, corresponding computer program and device |
US9734434B2 (en) | 2014-07-18 | 2017-08-15 | Adobe Systems Incorporated | Feature interpolation |
US9424484B2 (en) * | 2014-07-18 | 2016-08-23 | Adobe Systems Incorporated | Feature interpolation |
US20160019440A1 (en) * | 2014-07-18 | 2016-01-21 | Adobe Systems Incorporated | Feature Interpolation |
US10043057B2 (en) | 2014-07-28 | 2018-08-07 | Adobe Systems Incorporated | Accelerating object detection |
US9471828B2 (en) | 2014-07-28 | 2016-10-18 | Adobe Systems Incorporated | Accelerating object detection |
US9858677B2 (en) | 2014-09-05 | 2018-01-02 | Apical Ltd. | Method of image analysis |
GB2529888A (en) * | 2014-09-05 | 2016-03-09 | Apical Ltd | A method of image analysis |
CN105404884A (en) * | 2014-09-05 | 2016-03-16 | 顶级公司 | Image analysis method |
GB2529888B (en) * | 2014-09-05 | 2020-09-23 | Apical Ltd | A method of image analysis |
US20190005659A1 (en) * | 2014-09-19 | 2019-01-03 | Brain Corporation | Salient features tracking apparatus and methods using visual initialization |
US9697439B2 (en) | 2014-10-02 | 2017-07-04 | Xerox Corporation | Efficient object detection with patch-level window processing |
US11222399B2 (en) * | 2014-10-09 | 2022-01-11 | Adobe Inc. | Image cropping suggestion using multiple saliency maps |
US9773155B2 (en) * | 2014-10-14 | 2017-09-26 | Microsoft Technology Licensing, Llc | Depth from time of flight camera |
US20160104031A1 (en) * | 2014-10-14 | 2016-04-14 | Microsoft Technology Licensing, Llc | Depth from time of flight camera |
US10311282B2 (en) | 2014-10-14 | 2019-06-04 | Microsoft Technology Licensing, Llc | Depth from time of flight camera |
US9443164B2 (en) | 2014-12-02 | 2016-09-13 | Xerox Corporation | System and method for product identification |
US20160171299A1 (en) * | 2014-12-11 | 2016-06-16 | Samsung Electronics Co., Ltd. | Apparatus and method for computer aided diagnosis (cad) based on eye movement |
US9818029B2 (en) * | 2014-12-11 | 2017-11-14 | Samsung Electronics Co., Ltd. | Apparatus and method for computer aided diagnosis (CAD) based on eye movement |
US9367763B1 (en) | 2015-01-12 | 2016-06-14 | Xerox Corporation | Privacy-preserving text to image matching |
EP3048561A1 (en) | 2015-01-21 | 2016-07-27 | Xerox Corporation | Method and system to perform text-to-image queries with wildcards |
US9626594B2 (en) | 2015-01-21 | 2017-04-18 | Xerox Corporation | Method and system to perform text-to-image queries with wildcards |
US20200128145A1 (en) * | 2015-02-13 | 2020-04-23 | Smugmug, Inc. | System and method for photo subject display optimization |
US11743402B2 (en) * | 2015-02-13 | 2023-08-29 | Awes.Me, Inc. | System and method for photo subject display optimization |
US9600738B2 (en) | 2015-04-07 | 2017-03-21 | Xerox Corporation | Discriminative embedding of local color names for object retrieval and classification |
US11935102B2 (en) | 2015-05-12 | 2024-03-19 | Pinterest, Inc. | Matching user provided representations of items with sellers of those items |
US10269055B2 (en) | 2015-05-12 | 2019-04-23 | Pinterest, Inc. | Matching user provided representations of items with sellers of those items |
US11443357B2 (en) | 2015-05-12 | 2022-09-13 | Pinterest, Inc. | Matching user provided representations of items with sellers of those items |
US10679269B2 (en) | 2015-05-12 | 2020-06-09 | Pinterest, Inc. | Item selling on multiple web sites |
US10210421B2 (en) * | 2015-05-19 | 2019-02-19 | Toyota Motor Engineering & Manufacturing North America, Inc. | Apparatus and method for object tracking |
US20170185860A1 (en) * | 2015-05-19 | 2017-06-29 | Toyota Motor Engineering & Manufacturing North America, Inc. | Apparatus and method for object tracking |
US9613273B2 (en) * | 2015-05-19 | 2017-04-04 | Toyota Motor Engineering & Manufacturing North America, Inc. | Apparatus and method for object tracking |
US10049085B2 (en) * | 2015-08-31 | 2018-08-14 | Qualtrics, Llc | Presenting views of an electronic document |
US20170060812A1 (en) * | 2015-08-31 | 2017-03-02 | Qualtrics, Llc | Presenting views of an electronic document |
US10430497B2 (en) | 2015-08-31 | 2019-10-01 | Qualtrics, Llc | Presenting views of an electronic document |
US11113448B2 (en) | 2015-08-31 | 2021-09-07 | Qualtrics, Llc | Presenting views of an electronic document |
US11055343B2 (en) | 2015-10-05 | 2021-07-06 | Pinterest, Inc. | Dynamic search control invocation and visual search |
US11609946B2 (en) | 2015-10-05 | 2023-03-21 | Pinterest, Inc. | Dynamic search input selection |
CN105513080A (en) * | 2015-12-21 | 2016-04-20 | 南京邮电大学 | Infrared image target salience evaluating method |
CN105760886A (en) * | 2016-02-23 | 2016-07-13 | 北京联合大学 | Image scene multi-object segmentation method based on target identification and saliency detection |
US9830529B2 (en) | 2016-04-26 | 2017-11-28 | Xerox Corporation | End-to-end saliency mapping via probability distribution prediction |
US11704692B2 (en) | 2016-05-12 | 2023-07-18 | Pinterest, Inc. | Promoting representations of items to users on behalf of sellers of those items |
US10521503B2 (en) | 2016-09-23 | 2019-12-31 | Qualtrics, Llc | Authenticating a respondent to an electronic survey |
US11017166B2 (en) | 2016-09-23 | 2021-05-25 | Qualtrics, Llc | Authenticating a respondent to an electronic survey |
US11580398B2 (en) * | 2016-10-14 | 2023-02-14 | KLA-Tenor Corp. | Diagnostic systems and methods for deep learning models configured for semiconductor applications |
US11568754B2 (en) | 2016-10-31 | 2023-01-31 | Qualtrics, Llc | Guiding creation of an electronic survey |
US10909868B2 (en) | 2016-10-31 | 2021-02-02 | Qualtrics, Llc | Guiding creation of an electronic survey |
US10706735B2 (en) | 2016-10-31 | 2020-07-07 | Qualtrics, Llc | Guiding creation of an electronic survey |
US10607109B2 (en) * | 2016-11-16 | 2020-03-31 | Samsung Electronics Co., Ltd. | Method and apparatus to perform material recognition and training for material recognition |
CN106780430A (en) * | 2016-11-17 | 2017-05-31 | 大连理工大学 | A kind of image significance detection method based on surroundedness and Markov model |
US10706549B2 (en) * | 2016-12-20 | 2020-07-07 | Kodak Alaris Inc. | Iterative method for salient foreground detection and multi-object segmentation |
US11120556B2 (en) * | 2016-12-20 | 2021-09-14 | Kodak Alaris Inc. | Iterative method for salient foreground detection and multi-object segmentation |
US10943146B2 (en) * | 2016-12-28 | 2021-03-09 | Ancestry.Com Operations Inc. | Clustering historical images using a convolutional neural net and labeled data bootstrapping |
US11721091B2 (en) | 2016-12-28 | 2023-08-08 | Ancestry.Com Operations Inc. | Clustering historical images using a convolutional neural net and labeled data bootstrapping |
US11501513B2 (en) | 2017-03-10 | 2022-11-15 | Tusimple, Inc. | System and method for vehicle wheel detection |
US10671873B2 (en) | 2017-03-10 | 2020-06-02 | Tusimple, Inc. | System and method for vehicle wheel detection |
US11587304B2 (en) | 2017-03-10 | 2023-02-21 | Tusimple, Inc. | System and method for occluding contour detection |
US10147193B2 (en) | 2017-03-10 | 2018-12-04 | TuSimple | System and method for semantic segmentation using hybrid dilated convolution (HDC) |
US9953236B1 (en) | 2017-03-10 | 2018-04-24 | TuSimple | System and method for semantic segmentation using dense upsampling convolution (DUC) |
US10067509B1 (en) | 2017-03-10 | 2018-09-04 | TuSimple | System and method for occluding contour detection |
US11673557B2 (en) | 2017-04-07 | 2023-06-13 | Tusimple, Inc. | System and method for path planning of autonomous vehicles based on gradient |
US9952594B1 (en) | 2017-04-07 | 2018-04-24 | TuSimple | System and method for traffic data collection using unmanned aerial vehicles (UAVs) |
US10471963B2 (en) | 2017-04-07 | 2019-11-12 | TuSimple | System and method for transitioning between an autonomous and manual driving mode based on detection of a drivers capacity to control a vehicle |
US10710592B2 (en) | 2017-04-07 | 2020-07-14 | Tusimple, Inc. | System and method for path planning of autonomous vehicles based on gradient |
US11182639B2 (en) * | 2017-04-16 | 2021-11-23 | Facebook, Inc. | Systems and methods for provisioning content |
US11557128B2 (en) | 2017-04-25 | 2023-01-17 | Tusimple, Inc. | System and method for vehicle position and velocity estimation based on camera and LIDAR data |
US11928868B2 (en) | 2017-04-25 | 2024-03-12 | Tusimple, Inc. | System and method for vehicle position and velocity estimation based on camera and LIDAR data |
US10552691B2 (en) | 2017-04-25 | 2020-02-04 | TuSimple | System and method for vehicle position and velocity estimation based on camera and lidar data |
US10481044B2 (en) | 2017-05-18 | 2019-11-19 | TuSimple | Perception simulation for improved autonomous vehicle control |
US10558864B2 (en) | 2017-05-18 | 2020-02-11 | TuSimple | System and method for image localization based on semantic segmentation |
US10830669B2 (en) | 2017-05-18 | 2020-11-10 | Tusimple, Inc. | Perception simulation for improved autonomous vehicle control |
US10867188B2 (en) | 2017-05-18 | 2020-12-15 | Tusimple, Inc. | System and method for image localization based on semantic segmentation |
US11885712B2 (en) | 2017-05-18 | 2024-01-30 | Tusimple, Inc. | Perception simulation for improved autonomous vehicle control |
US10474790B2 (en) | 2017-06-02 | 2019-11-12 | TuSimple | Large scale distributed simulation for realistic multiple-agent interactive environments |
US10762635B2 (en) | 2017-06-14 | 2020-09-01 | Tusimple, Inc. | System and method for actively selecting and labeling images for semantic segmentation |
US10752246B2 (en) | 2017-07-01 | 2020-08-25 | Tusimple, Inc. | System and method for adaptive cruise control with proximate vehicle detection |
US10308242B2 (en) | 2017-07-01 | 2019-06-04 | TuSimple | System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles |
US10737695B2 (en) | 2017-07-01 | 2020-08-11 | Tusimple, Inc. | System and method for adaptive cruise control for low speed following |
US11040710B2 (en) | 2017-07-01 | 2021-06-22 | Tusimple, Inc. | System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles |
US11753008B2 (en) | 2017-07-01 | 2023-09-12 | Tusimple, Inc. | System and method for adaptive cruise control with proximate vehicle detection |
US10493988B2 (en) | 2017-07-01 | 2019-12-03 | TuSimple | System and method for adaptive cruise control for defensive driving |
US10303522B2 (en) | 2017-07-01 | 2019-05-28 | TuSimple | System and method for distributed graphics processing unit (GPU) computation |
US11029693B2 (en) | 2017-08-08 | 2021-06-08 | Tusimple, Inc. | Neural network based vehicle dynamics model |
US11550329B2 (en) | 2017-08-08 | 2023-01-10 | Tusimple, Inc. | Neural network based vehicle dynamics model |
US10360257B2 (en) | 2017-08-08 | 2019-07-23 | TuSimple | System and method for image annotation |
US10816354B2 (en) | 2017-08-22 | 2020-10-27 | Tusimple, Inc. | Verification module system and method for motion-based lane detection with multiple sensors |
US11573095B2 (en) | 2017-08-22 | 2023-02-07 | Tusimple, Inc. | Verification module system and method for motion-based lane detection with multiple sensors |
US11874130B2 (en) | 2017-08-22 | 2024-01-16 | Tusimple, Inc. | Verification module system and method for motion-based lane detection with multiple sensors |
US11846510B2 (en) | 2017-08-23 | 2023-12-19 | Tusimple, Inc. | Feature matching and correspondence refinement and 3D submap position refinement system and method for centimeter precision localization using camera-based submap and LiDAR-based global map |
US10303956B2 (en) | 2017-08-23 | 2019-05-28 | TuSimple | System and method for using triplet loss for proposal free instance-wise semantic segmentation for lane detection |
US10762673B2 (en) | 2017-08-23 | 2020-09-01 | Tusimple, Inc. | 3D submap reconstruction system and method for centimeter precision localization using camera-based submap and LiDAR-based global map |
US11151393B2 (en) | 2017-08-23 | 2021-10-19 | Tusimple, Inc. | Feature matching and corresponding refinement and 3D submap position refinement system and method for centimeter precision localization using camera-based submap and LiDAR-based global map |
US10678234B2 (en) | 2017-08-24 | 2020-06-09 | Tusimple, Inc. | System and method for autonomous vehicle control to minimize energy cost |
US11886183B2 (en) | 2017-08-24 | 2024-01-30 | Tusimple, Inc. | System and method for autonomous vehicle control to minimize energy cost |
US11366467B2 (en) | 2017-08-24 | 2022-06-21 | Tusimple, Inc. | System and method for autonomous vehicle control to minimize energy cost |
US10783381B2 (en) | 2017-08-31 | 2020-09-22 | Tusimple, Inc. | System and method for vehicle occlusion detection |
US10311312B2 (en) | 2017-08-31 | 2019-06-04 | TuSimple | System and method for vehicle occlusion detection |
US11745736B2 (en) | 2017-08-31 | 2023-09-05 | Tusimple, Inc. | System and method for vehicle occlusion detection |
US10656644B2 (en) | 2017-09-07 | 2020-05-19 | Tusimple, Inc. | System and method for using human driving patterns to manage speed control for autonomous vehicles |
US10649458B2 (en) | 2017-09-07 | 2020-05-12 | Tusimple, Inc. | Data-driven prediction-based system and method for trajectory planning of autonomous vehicles |
US10953880B2 (en) | 2017-09-07 | 2021-03-23 | Tusimple, Inc. | System and method for automated lane change control for autonomous vehicles |
US10953881B2 (en) | 2017-09-07 | 2021-03-23 | Tusimple, Inc. | System and method for automated lane change control for autonomous vehicles |
US10782693B2 (en) | 2017-09-07 | 2020-09-22 | Tusimple, Inc. | Prediction-based system and method for trajectory planning of autonomous vehicles |
US11853071B2 (en) | 2017-09-07 | 2023-12-26 | Tusimple, Inc. | Data-driven prediction-based system and method for trajectory planning of autonomous vehicles |
US11294375B2 (en) | 2017-09-07 | 2022-04-05 | Tusimple, Inc. | System and method for using human driving patterns to manage speed control for autonomous vehicles |
US10782694B2 (en) | 2017-09-07 | 2020-09-22 | Tusimple, Inc. | Prediction-based system and method for trajectory planning of autonomous vehicles |
US11892846B2 (en) | 2017-09-07 | 2024-02-06 | Tusimple, Inc. | Prediction-based system and method for trajectory planning of autonomous vehicles |
US10552979B2 (en) | 2017-09-13 | 2020-02-04 | TuSimple | Output of a neural network method for deep odometry assisted by static scene optical flow |
US10671083B2 (en) | 2017-09-13 | 2020-06-02 | Tusimple, Inc. | Neural network architecture system for deep odometry assisted by static scene optical flow |
US10733465B2 (en) | 2017-09-20 | 2020-08-04 | Tusimple, Inc. | System and method for vehicle taillight state recognition |
US11328164B2 (en) | 2017-09-20 | 2022-05-10 | Tusimple, Inc. | System and method for vehicle taillight state recognition |
US11734563B2 (en) | 2017-09-20 | 2023-08-22 | Tusimple, Inc. | System and method for vehicle taillight state recognition |
US10387736B2 (en) | 2017-09-20 | 2019-08-20 | TuSimple | System and method for detecting taillight signals of a vehicle |
US11126653B2 (en) | 2017-09-22 | 2021-09-21 | Pinterest, Inc. | Mixed type image based search results |
US11620331B2 (en) | 2017-09-22 | 2023-04-04 | Pinterest, Inc. | Textual and image based search |
US11841735B2 (en) | 2017-09-22 | 2023-12-12 | Pinterest, Inc. | Object based image search |
US10942966B2 (en) | 2017-09-22 | 2021-03-09 | Pinterest, Inc. | Textual and image based search |
US11853883B2 (en) | 2017-09-30 | 2023-12-26 | Tusimple, Inc. | System and method for instance-level lane detection for autonomous vehicle control |
US11500387B2 (en) | 2017-09-30 | 2022-11-15 | Tusimple, Inc. | System and method for providing multiple agents for decision making, trajectory planning, and control for autonomous vehicles |
US10970564B2 (en) | 2017-09-30 | 2021-04-06 | Tusimple, Inc. | System and method for instance-level lane detection for autonomous vehicle control |
US10962979B2 (en) | 2017-09-30 | 2021-03-30 | Tusimple, Inc. | System and method for multitask processing for autonomous vehicle computation and control |
US10768626B2 (en) | 2017-09-30 | 2020-09-08 | Tusimple, Inc. | System and method for providing multiple agents for decision making, trajectory planning, and control for autonomous vehicles |
US10410055B2 (en) | 2017-10-05 | 2019-09-10 | TuSimple | System and method for aerial video traffic analysis |
US10739775B2 (en) | 2017-10-28 | 2020-08-11 | Tusimple, Inc. | System and method for real world autonomous vehicle trajectory simulation |
US10666730B2 (en) | 2017-10-28 | 2020-05-26 | Tusimple, Inc. | Storage architecture for heterogeneous multimedia data |
US11853072B2 (en) * | 2017-10-28 | 2023-12-26 | Tusimple, Inc. | System and method for real world autonomous vehicle trajectory simulation |
US20230004165A1 (en) * | 2017-10-28 | 2023-01-05 | Tusimple, Inc. | System and method for real world autonomous vehicle trajectory simulation |
US10812589B2 (en) | 2017-10-28 | 2020-10-20 | Tusimple, Inc. | Storage architecture for heterogeneous multimedia data |
US11435748B2 (en) | 2017-10-28 | 2022-09-06 | Tusimple, Inc. | System and method for real world autonomous vehicle trajectory simulation |
US10573044B2 (en) * | 2017-11-09 | 2020-02-25 | Adobe Inc. | Saliency-based collage generation using digital images |
US10657390B2 (en) | 2017-11-27 | 2020-05-19 | Tusimple, Inc. | System and method for large-scale lane marking detection using multimodal sensor data |
US10528823B2 (en) | 2017-11-27 | 2020-01-07 | TuSimple | System and method for large-scale lane marking detection using multimodal sensor data |
US10528851B2 (en) | 2017-11-27 | 2020-01-07 | TuSimple | System and method for drivable road surface representation generation using multimodal sensor data |
US11580754B2 (en) | 2017-11-27 | 2023-02-14 | Tusimple, Inc. | System and method for large-scale lane marking detection using multimodal sensor data |
US10860018B2 (en) | 2017-11-30 | 2020-12-08 | Tusimple, Inc. | System and method for generating simulated vehicles with configured behaviors for analyzing autonomous vehicle motion planners |
US10877476B2 (en) | 2017-11-30 | 2020-12-29 | Tusimple, Inc. | Autonomous vehicle simulation system for analyzing motion planners |
US11681292B2 (en) | 2017-11-30 | 2023-06-20 | Tusimple, Inc. | System and method for generating simulated vehicles with configured behaviors for analyzing autonomous vehicle motion planners |
US11782440B2 (en) | 2017-11-30 | 2023-10-10 | Tusimple, Inc. | Autonomous vehicle simulation system for analyzing motion planners |
US11312334B2 (en) | 2018-01-09 | 2022-04-26 | Tusimple, Inc. | Real-time remote control of vehicles with high redundancy |
US11305782B2 (en) | 2018-01-11 | 2022-04-19 | Tusimple, Inc. | Monitoring system for autonomous vehicle operation |
US10607111B2 (en) * | 2018-02-06 | 2020-03-31 | Hrl Laboratories, Llc | Machine vision system for recognizing novel objects |
US11740093B2 (en) | 2018-02-14 | 2023-08-29 | Tusimple, Inc. | Lane marking localization and fusion |
US11852498B2 (en) | 2018-02-14 | 2023-12-26 | Tusimple, Inc. | Lane marking localization |
US11009365B2 (en) | 2018-02-14 | 2021-05-18 | Tusimple, Inc. | Lane marking localization |
US11009356B2 (en) | 2018-02-14 | 2021-05-18 | Tusimple, Inc. | Lane marking localization and fusion |
US11295146B2 (en) | 2018-02-27 | 2022-04-05 | Tusimple, Inc. | System and method for online real-time multi-object tracking |
US11830205B2 (en) | 2018-02-27 | 2023-11-28 | Tusimple, Inc. | System and method for online real-time multi-object tracking |
US10685244B2 (en) | 2018-02-27 | 2020-06-16 | Tusimple, Inc. | System and method for online real-time multi-object tracking |
US11074462B2 (en) | 2018-03-18 | 2021-07-27 | Tusimple, Inc. | System and method for lateral vehicle detection |
US11610406B2 (en) | 2018-03-18 | 2023-03-21 | Tusimple, Inc. | System and method for lateral vehicle detection |
US10685239B2 (en) | 2018-03-18 | 2020-06-16 | Tusimple, Inc. | System and method for lateral vehicle detection |
CN111936989A (en) * | 2018-03-29 | 2020-11-13 | 谷歌有限责任公司 | Similar medical image search |
US11694308B2 (en) | 2018-04-12 | 2023-07-04 | Tusimple, Inc. | Images for perception modules of autonomous vehicles |
US11010874B2 (en) | 2018-04-12 | 2021-05-18 | Tusimple, Inc. | Images for perception modules of autonomous vehicles |
US11500101B2 (en) | 2018-05-02 | 2022-11-15 | Tusimple, Inc. | Curb detection by analysis of reflection images |
WO2019217562A1 (en) * | 2018-05-09 | 2019-11-14 | Figure Eight Technologies, Inc. | Aggregated image annotation |
US11017266B2 (en) * | 2018-05-09 | 2021-05-25 | Figure Eight Technologies, Inc. | Aggregated image annotation |
US11948082B2 (en) | 2018-05-31 | 2024-04-02 | Tusimple, Inc. | System and method for proximate vehicle intention prediction for autonomous vehicles |
US11104334B2 (en) | 2018-05-31 | 2021-08-31 | Tusimple, Inc. | System and method for proximate vehicle intention prediction for autonomous vehicles |
CN108898136A (en) * | 2018-07-04 | 2018-11-27 | 安徽大学 | Cross-modal image saliency detection method |
US11238374B2 (en) * | 2018-08-24 | 2022-02-01 | Htc Corporation | Method for verifying training data, training system, and computer readable medium |
US10839234B2 (en) | 2018-09-12 | 2020-11-17 | Tusimple, Inc. | System and method for three-dimensional (3D) object detection |
US11727691B2 (en) | 2018-09-12 | 2023-08-15 | Tusimple, Inc. | System and method for three-dimensional (3D) object detection |
US11292480B2 (en) | 2018-09-13 | 2022-04-05 | Tusimple, Inc. | Remote safe driving methods and systems |
CN111071152A (en) * | 2018-10-19 | 2020-04-28 | 图森有限公司 | Fisheye image processing system and method |
US11935210B2 (en) | 2018-10-19 | 2024-03-19 | Tusimple, Inc. | System and method for fisheye image processing |
US10796402B2 (en) | 2018-10-19 | 2020-10-06 | Tusimple, Inc. | System and method for fisheye image processing |
US11625557B2 (en) | 2018-10-29 | 2023-04-11 | Hrl Laboratories, Llc | Process to learn new image classes without labels |
US11440473B2 (en) * | 2018-10-29 | 2022-09-13 | Aisin Corporation | Driving assistance apparatus |
US11714192B2 (en) | 2018-10-30 | 2023-08-01 | Tusimple, Inc. | Determining an angle between a tow vehicle and a trailer |
US10942271B2 (en) | 2018-10-30 | 2021-03-09 | Tusimple, Inc. | Determining an angle between a tow vehicle and a trailer |
US20210248715A1 (en) * | 2019-01-18 | 2021-08-12 | Ramot At Tel-Aviv University Ltd. | Method and system for end-to-end image processing |
US11263752B2 (en) * | 2019-05-09 | 2022-03-01 | Boe Technology Group Co., Ltd. | Computer-implemented method of detecting foreign object on background object in an image, apparatus for detecting foreign object on background object in an image, and computer-program product |
US11823460B2 (en) | 2019-06-14 | 2023-11-21 | Tusimple, Inc. | Image fusion for autonomous vehicle operation |
WO2021000841A1 (en) * | 2019-06-30 | 2021-01-07 | 华为技术有限公司 | Method for generating user profile photo, and electronic device |
US11914850B2 (en) | 2019-06-30 | 2024-02-27 | Huawei Technologies Co., Ltd. | User profile picture generation method and electronic device |
CN110377204A (en) * | 2019-06-30 | 2019-10-25 | 华为技术有限公司 | Method and electronic device for generating a user profile picture |
US11810322B2 (en) | 2020-04-09 | 2023-11-07 | Tusimple, Inc. | Camera pose estimation techniques |
CN111666439A (en) * | 2020-05-28 | 2020-09-15 | 重庆渝抗医药科技有限公司 | Working method for rapidly extracting and dividing medical image big data aiming at cloud environment |
US11701931B2 (en) | 2020-06-18 | 2023-07-18 | Tusimple, Inc. | Angle and orientation measurements for vehicles with multiple drivable sections |
CN112329810A (en) * | 2020-09-28 | 2021-02-05 | 北京师范大学 | Image recognition model training method and device based on saliency detection |
CN113221715A (en) * | 2020-10-31 | 2021-08-06 | 嘉应学院 | Fire detection and identification method fused with visual attention mechanism |
US20220138950A1 (en) * | 2020-11-02 | 2022-05-05 | Adobe Inc. | Generating change comparisons during editing of digital images |
CN112613528A (en) * | 2020-12-31 | 2021-04-06 | 广东工业大学 | Point cloud simplification method and device based on significance variation and storage medium |
CN113345052A (en) * | 2021-06-11 | 2021-09-03 | 山东大学 | Classified data multi-view visualization coloring method and system based on similarity significance |
US11958473B2 (en) | 2021-06-17 | 2024-04-16 | Tusimple, Inc. | System and method for using human driving patterns to detect and correct abnormal driving behaviors of autonomous vehicles |
Also Published As
Publication number | Publication date |
---|---|
US8175376B2 (en) | 2012-05-08 |
Similar Documents
Publication | Title |
---|---|
US8175376B2 (en) | Framework for image thumbnailing based on visual similarity |
US8537409B2 (en) | Image summarization by a learning approach |
US8111923B2 (en) | System and method for object class localization and semantic class based image segmentation |
Marchesotti et al. | A framework for visual saliency detection with applications to image thumbnailing |
US9430719B2 (en) | System and method for providing objectified image renderings using recognition information from images |
US8879796B2 (en) | Region refocusing for data-driven object localization |
US8009921B2 (en) | Context dependent intelligent thumbnail images |
US8837820B2 (en) | Image selection based on photographic style |
US8897505B2 (en) | System and method for enabling the use of captured images through recognition |
US7809722B2 (en) | System and method for enabling search and retrieval from image files based on recognized information |
US7809192B2 (en) | System and method for recognizing objects from images and identifying relevancy amongst images and information |
US9158995B2 (en) | Data driven localization using task-dependent representations |
US8594385B2 (en) | Predicting the aesthetic value of an image |
US8917910B2 (en) | Image segmentation based on approximation of segmentation similarity |
WO2006122164A2 (en) | System and method for enabling the use of captured images through recognition |
Cavalcanti et al. | A survey on automatic techniques for enhancement and analysis of digital photography |
Wang | Integrated content-aware image retargeting system |
Chen et al. | An efficient framework for location-based scene matching in image databases |
Yang et al. | An automatic object retrieval framework for complex background |
Gavilan et al. | Mobile image retrieval using morphological color segmentation |
Cooray | Enhancing Person Annotation for Personal Photo Management Using Content and Context based Technologies |
Apostolidis et al. | Multimedia Processing Essentials |
Moskovchuk et al. | Video Metadata Extraction in a Video-Mail System |
Wang | INTEGRATED CONTENT-AWARE IMAGE |
Iqbal | Important Person Detection from Multiple Videos |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: XEROX CORPORATION, CONNECTICUT; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MARCHESOTTI, LUCA; CIFARELLI, CLAUDIO; CSURKA, GABRIELA; SIGNING DATES FROM 20090402 TO 20090414; REEL/FRAME: 022559/0093 |
FEPP | Fee payment procedure | Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |
AS | Assignment | Owner name: CITIBANK, N.A., AS AGENT, DELAWARE; Free format text: SECURITY INTEREST; ASSIGNOR: XEROX CORPORATION; REEL/FRAME: 062740/0214; Effective date: 20221107 |
AS | Assignment | Owner name: XEROX CORPORATION, CONNECTICUT; Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT R/F 062740/0214; ASSIGNOR: CITIBANK, N.A., AS AGENT; REEL/FRAME: 063694/0122; Effective date: 20230517 |
AS | Assignment | Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK; Free format text: SECURITY INTEREST; ASSIGNOR: XEROX CORPORATION; REEL/FRAME: 064760/0389; Effective date: 20230621 |
AS | Assignment | Owner name: JEFFERIES FINANCE LLC, AS COLLATERAL AGENT, NEW YORK; Free format text: SECURITY INTEREST; ASSIGNOR: XEROX CORPORATION; REEL/FRAME: 065628/0019; Effective date: 20231117 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK; Free format text: SECURITY INTEREST; ASSIGNOR: XEROX CORPORATION; REEL/FRAME: 066741/0001; Effective date: 20240206 |