US20160300121A1 - Neural network image representation - Google Patents

Neural network image representation

Info

Publication number
US20160300121A1
Authority
US
United States
Prior art keywords
input image
interest points
feature maps
image
values
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/188,729
Inventor
Michael Chertok
Alexander LORBERT
Adi Pinhas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Superfish Ltd
Original Assignee
Superfish Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Superfish Ltd filed Critical Superfish Ltd
Priority to US15/188,729
Assigned to SUPERFISH LTD. Assignment of assignors interest (see document for details). Assignors: LORBERT, ALEXANDER; PINHAS, ADI; CHERTOK, MICHAEL
Publication of US20160300121A1
Legal status: Abandoned (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/422Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation for representing the structure of the pattern or shape of an object therefor
    • G06V10/426Graphical representations
    • G06K9/627
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06KGRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404Methods for optical code recognition
    • G06K7/146Methods for optical code recognition the method including quality enhancement steps
    • G06K7/1482Methods for optical code recognition the method including quality enhancement steps using fuzzy logic or natural solvers, such as neural networks, genetic algorithms and simulated annealing
    • G06K9/469
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/02Computing arrangements based on specific mathematical models using fuzzy logic
    • G06N7/04Physical realisation
    • G06N7/046Implementation by means of a neural network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the disclosed technique relates to image representation in general, and to methods and systems for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, in particular.
  • CNN can learn to produce multiscale representations of an image.
  • the features extracted by the convolutional neural networks are features that are pertinent to the image on which the convolutional network is applied.
  • the CNN of this publication includes eight learned layers (five convolutional layers and three fully-connected layers).
  • the pooling layers in this publication include overlapping tiles covering their respective input in an overlapping manner.
  • the detailed CNN is employed for image classification.
  • An article by Zeiler et al., entitled “Visualizing and Understanding Convolutional Networks” published on http://arxiv.org/abs/1311.2901v3, is directed to a visualization technique that gives insight into the function of intermediate feature layers of a CNN.
  • the visualization technique shows a plausible and interpretable input pattern (situated in the original input image space) that gives rise to a given activation in the feature maps.
  • the visualization technique employs a multi-layered de-convolutional network.
  • a de-convolutional network employs the same components as a convolutional network (e.g., filtering and pooling) but in reverse.
  • this article describes mapping detected features in the produced feature maps to the image space of the input image.
  • the de-convolutional networks are employed as a probe of an already trained convolutional network.
  • the system includes an analysis component and a classification component.
  • the analysis component analyzes image characteristics of an image that includes an average color value.
  • the classification component includes a self-organizing map (e.g., Kohonen neural network) for classifying the image relative to a second image based on classification information computed from the average color value.
  • a method for representing an input image includes the steps of applying a trained neural network on the input image, selecting a plurality of feature maps of an output of at least one selected layer of the trained neural network, determining a location corresponding to each of the plurality of feature maps in an image space of the input image, and defining a plurality of interest points of the input image for representing said input image.
  • the feature maps are selected according to values attributed thereto by the trained neural network.
  • the interest points are defined based on the determined locations corresponding to the feature maps.
  • FIGS. 1A and 1B are schematic illustrations of a convolutional neural network, constructed and operative in accordance with an embodiment of the disclosed technique
  • FIG. 2 is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique;
  • FIG. 3 is a schematic illustration of a system for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique.
  • the disclosed technique overcomes the disadvantages of the prior art by providing a method and a system for representing an input image as a set of interest points (or key points) detected by applying a trained Neural Network (e.g., a Convolutional Neural Network—CNN) on the input image.
  • the input image is run through the trained CNN and the most prominent extracted features (i.e., salient features) of the layers of the trained CNN are back-projected onto the image space of the original input image.
  • the back-projected features are all combined into a single intensity map, or heat map.
  • Interest points are extracted from the heat map. Each interest point is defined by a distinct location in the image space of the input image, and can be associated with a respective descriptor. Furthermore, the geometric relations between the extracted interest points are determined according to the locations of the interest points.
  • the input image can be represented as a graph according to the extracted interest points and the geometric relations between the interest points.
  • the graph representation of the input image can then be employed for various visual tasks, such as determining image similarity, similarity-based image search, and the like.
  • the features detected by applying the trained CNN on the input image are features that are relevant to the input image. That is, the input image is expressed through the features that are attributed with the greatest values, and which can therefore be considered as most pertinent to the image.
  • the input image might be better expressed by the features learned and detected by the CNN, than by predetermined conventional features not adapted specifically to the analyzed input image.
  • these high value features represent the input image in an optimized manner and can provide better results when employed for various visual tasks (as compared to conventional features).
  • the disclosed technique represents an image by employing key points (interest points) that correspond to multi-scale salient features of the image as detected by the CNN.
  • FIGS. 1A and 1B are schematic illustrations of a Convolutional Neural Network (CNN), generally referenced 10 , constructed and operative in accordance with an embodiment of the disclosed technique.
  • FIG. 1A depicts an overview of CNN 10
  • FIG. 1B depicts a selected convolutional layer of CNN 10 .
  • CNN 10 includes an input image 12, followed by first and second convolutional layers 14 and 18 with respective outputs 16 and 20. It is noted that CNN 10 can include more, or fewer, convolutional layers. The output of second convolutional layer 20 is then vectorized in vectorizing layer 22. A vectorization output 24 is fed into a layered, fully connected, neural network (not referenced). In the example set forth in FIG. 1A, in the fully connected neural network of CNN 10 there are three fully connected layers 26, 30 and 34; more, or fewer, layers are possible.
  • Each of fully connected layers 26 , 30 and 34 comprises a variable number of linear, or affine, operators potentially followed by a nonlinear activation function.
  • the last fully connected layer 34 is typically a normalization layer so that the final elements of an output vector 36 are bounded in some fixed, interpretable range.
  • the parameters of each convolutional layer and each fully connected layer are set during a training (i.e., learning) period of CNN 10 .
  • each input to a convolutional layer is a multichannel feature map 52 that is represented by a three-dimensional (3D) matrix.
  • a color input image may contain the various color intensity channels.
  • the depth dimension of the input 3D matrix, representing feature map 52, is defined by the number of channels of multichannel feature map 52.
  • the 3D matrix could be an M ⁇ N ⁇ 3 matrix (i.e., the depth dimension has a value of three).
  • the horizontal and vertical dimensions of 3D matrix 52 (i.e., the height and width of matrix 52) are defined by the respective dimensions of the input image.
  • the input is convolved with filters 54 that are set in the training stage of CNN 10 . While each of filters 54 has the same depth as input feature map 52 , the horizontal and vertical dimensions of the filter may vary. Each of the filters 54 is convolved with the layer input 52 to generate a two-dimensional (2D) matrix 56 .
  • an optional max pooling operation 58 is applied to produce feature maps 60 .
  • the output of convolutional layer 56 enters max pooling layer 58 (i.e., performing the max pooling operation) whose outputs are feature maps 60 .
  • These 2D feature maps 60 are then stacked to yield a 3D output matrix 62 .
  • Both convolution and max pooling operations contain various strides (or incremental steps) by which the respective input is horizontally and vertically traversed.
  • Each of convolutional layer outputs 16 and 20 , and fully connected layer outputs 28 , 32 , and 36 details the image structures (i.e., features) that best matched the filters of the respective layer, thereby “detecting” those image structures.
  • each of convolutional layer outputs 16 and 20 , and fully connected layer outputs 28 , 32 , and 36 detects image structures in an escalating manner such that the deeper layers detect features of greater complexity.
  • the first convolutional layer 14 detects edges
  • the second convolutional layer 18 which is deeper than first layer 14 , may detect object attributes, such as curvature and texture.
  • CNN 10 can include other numbers of convolutional layers, such as a single layer, four layers, five layers and the like.
  • Max pooling layer 58 selects the input feature maps of greatest value (i.e., indicating that the filters that produced those largest feature map values can serve as salient feature detectors). Max pooling layer 58 demarcates its input into a set of overlapping or non-overlapping tiles and for each such tile, outputs the maximum value. Thus, max-pooling layer 58 reduces the computational cost for deeper layers (i.e., max pooling layer 58 serves as a sub-sampling or down-sampling layer).
  • a convolution layer can be augmented with rectified linear operation and a max pooling layer 58 can be augmented with normalization (e.g., local response normalization—as described, for example, in the Krizhevsky article referenced in the background section herein above).
  • max pooling layer 58 can be replaced by another feature-pooling layer, such as average pooling layer, a quantile pooling layer, or rank pooling layer.
  • Fully connected layers 26 , 30 , and 34 operate as a Multilayer Perceptron (MLP).
  • CNN 10 includes two convolutional layers and three fully connected layers.
  • the disclosed technique can be implemented by employing CNNs having more, or fewer, layers (e.g., three convolutional layers and five fully connected layers).
  • other parameters and characteristics of the CNN can be adapted according to the specific task, available resources, user preferences, the training set, the input image, and the like.
  • the disclosed technique is also applicable to other types of artificial neural networks (besides CNNs).
  • the salient features detected by the neural network are regions, or patches, of the input image which are attributed with high values when convolved with the filters of the neural network.
  • the salient features can vary between simple corners to semantic object parts, such as an eye of a person, a whole head or face, or a car wheel, depending on the input image.
  • FIG. 2 is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique.
  • the CNN may include convolutional layers and fully connected layers.
  • CNN 10 is received after being trained with a selected training set.
  • the trained CNN is applied on an input image.
  • the input image may, or may not, be related to the training set employed for training the neural network. That is, there is no requirement to use a training image, or to use an image from an image class found in the training set.
  • the input image conforms to the expected input dimensions of the trained CNN. As such, the input image may require resizing and cropping, for example, for adapting it to the input dimensions of the CNN.
  • a pixel-based mean image, as determined in the training phase (i.e., the mean image of the image training set), may be subtracted from the input image.
  • input image 12 is inputted into CNN 10 as a multichannel feature map represented by a 3D matrix. In general, the input image has to undergo the same (or similar) preprocessing, which was applied to every image when training the neural network.
  • a plurality of feature maps from the output of the layers of the neural network are selected according to their values.
  • the feature maps are produced in response to convolution of the various filters with the layer input.
  • feature maps that are attributed with the top ranked values are selected. That is, the highest valued feature maps at the output of the convolutional layer (or the fully connected layer) are selected.
  • the highest valued feature maps can be selected at any stage following the convolution operation, for example prior to max pooling (i.e., even if the convolutional layer includes the optional max pooling operation).
  • the applied filters of the layers of the trained CNN serve as feature detectors that detect the locations of the layer input that have high correspondence with the filters.
  • the feature maps having the top ranked values (i.e., also referred to as top ranked feature maps or top ranked values) represent the locations within the layer input that showed the greatest correspondence to the applied filters.
  • the top ranked values represent salient features of the layer input as detected by the filter detectors of the respective layer.
  • the top ranked values can be selected “on the fly” during application of the trained CNN on the input image. That is, as a convolutional layer processes its respective input and produces respective output, the largest output values are selected.
  • the top ranked values can be selected such that a selected percentage or quantity of values is selected (e.g., the upper 15% or the largest 1000 values), or can be selected such that only values exceeding a threshold are selected. With reference to FIG. 1B , the greatest values of layer output 62 are selected.
  • each selected top ranked value selected for each layer of the CNN, is mapped back to the image space of the original image.
  • the back-projection of the top ranked values to the image space of the input image is performed, for example, by employing a de-convolutional network.
  • the back-projection is performed by a simple backpropagation (e.g., neural network technique used for training, as described, for example, in the Simonyan article referenced in the background section herein above).
  • to approximately invert the convolutional step we may use any technique from the Blind Source Separation field, for example, a sparsity-based approach.
  • a matched filter approach can be employed for inverting the convolutional step.
  • the stored masks can be used to place the max values in their appropriate input locations (i.e., zeroes are placed by default).
  • any technique for mapping the selected high valued feature maps back to the image space of the input image can be applied.
  • the method of the disclosed technique can involve tracking all potential features (i.e., image patches or image regions detected by the neural network) throughout the network, thereby avoiding the need for back-projecting the features. For example, a selected image patch at the input to the first layer is tracked and the value attributed to that image patch by each of the filters of the first layer is recorded. Thus, the output of the first layer that is associated with the selected image patch is known.
  • the output of the first layer, associated with the selected image patch, that enters the second layer as input, is tracked, and so forth. Thereby, the output of each subsequent layer that is associated with the selected image patch is determined.
  • the selected highest (top ranked) values are back-projected to the image space of input image 12 .
  • a plurality of interest points of the input image are defined based on the locations corresponding to the selected feature maps.
  • Each interest point is associated with a distinct position within the image space of the input image.
  • the geometric relations between the interest points (e.g., the distances and/or the angles between the interest points) can be determined according to the location of each interest point.
  • a descriptor can be determined for each interest point.
  • the descriptor of an interest point provides further information about the interest point. For example, in case the interest points are employed for determining image similarity, an interest point of a first image should not be compared to an interest point of a second image, having a completely different descriptor. In this manner, computational resources can be saved during image similarity determination, and other visual tasks related thereto.
  • the locations determined in the back-projection step are defined as the interest points of the input image.
  • the method continues in procedure 114 .
  • a subset of the back-projected locations are employed as interest points for representing the input image.
  • the selected subset of interest points should preferably correspond to the more prominent features detected by the different layers of the CNN.
  • the method of the disclosed technique may include additional sub-steps 110 and 112 as detailed herein below.
  • the locations corresponding to the selected feature maps are combined into a heat map.
  • the heat map includes the selected top ranked values, each located in a location determined in the back-projection process.
  • the heat map combines values representing salient features extracted from all layers of the CNN (i.e., features of various scale levels).
  • a respective heat map is generated for each layer of the network.
  • key points detected by each layer can be selected separately.
  • knowledge of the scale level of each key point can be maintained and each layer can be represented separately.
  • the selected highest values (i.e., the locations corresponding to the selected feature maps attributed with the top ranked values) are combined into a heat map.
  • Each selected value is located in its respective location within the image space of input image 12 as determined by back-projection.
  • a plurality of interest points are extracted from the heat map (or heat maps).
  • the interest points can be, for example, the peaks in the intensity map (e.g., global peaks or local peaks).
  • the interest points are the centers of the densest portions of the heat map.
  • any intensity based method for selecting key points out of the locations determined by back-projection of the detected salient features can be employed.
  • the extracted interest points are employed for representing the input image for performing various visual tasks. With reference to FIG. 1A , interest points are extracted from the heat map, and can be employed for representing input image 12 .
  • the input image is represented as a graph according to the extracted interest points and the geometric relations between them.
  • the geometric relations between the interest points can be, for example, the distance between pairs of points and the angles between triplets of points.
  • the graph image representation maintains data respective of the geometric relations between the interest points and thereby, can improve the results of various visual tasks, such as similarity based image search.
  • procedure 114 is optional and the method can stop after procedure 112 (or even after procedure 108 ) and represent the image as a set of key points (interest points).
  • input image 12 is represented as a graph according to the extracted interest points and the geometric relations between the interest points.
  • FIG. 3 is a schematic illustration of a system, generally referenced 150 , for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique.
  • System 150 includes a CNN trainer 152, a CNN executer 154, a top ranked values selector 156, a feature back-projector 158, a heat map generator 160, an interest point extractor 162, an image representer 164, and a storage device 168.
  • Storage device 168 is coupled with each of CNN trainer 152 , CNN executer 154 , top ranked values selector 156 , feature back-projector 158 , heat map generator 160 , interest point extractor 162 , and image representer 164 for enabling the different components of system 150 to store and retrieve data. It is noted that all components except storage device 168 can be embedded on a single processing device or on an array of processing devices connected there-between. For example, components 152 - 164 are all embedded on a single graphics processing unit (GPU) 166 , or a single Central Processing Unit (CPU) 166 .
  • Storage device 168 can be any storage device, such as a magnetic storage device (e.g., Hard Disc Drive, HDD), an optical storage device, and the like.
  • CNN trainer 152 retrieves a CNN architecture and a training image data set from storage device 168 or from another external data source. CNN trainer 152 executes the CNN on the images of the training image data set, and accordingly trains the CNN to detect features pertinent to the images of the training image data set. CNN trainer 152 stores the trained CNN on storage device 168.
  • CNN executer 154 retrieves the trained CNN from storage device 168 and further retrieves an input image to be represented as a graph according to interest points detected by applying the trained CNN on the input image. CNN executer applies the trained CNN to the input image.
  • top ranked values selector 156 selects the top ranked values produced in response to the convolution of the various filters applied on the input to the respective layer.
  • the top ranked values indicate that the filter that produced the high value is pertinent to the input image and therefore should be included in the image graph representation.
  • Feature back-projector 158 retrieves the top ranked values and performs back-projection for each top ranked value.
  • feature back-projector maps the top ranked value onto a respective location in the image space of the input image. That is, feature back-projector 158 determines for each selected value the location in the input image that when convolved with a respective filter of a respective convolutional layer produced the selected high value.
  • Heat map generator 160 combines all back-projected top ranked values into a single heat map including each back-projected value positioned at its respective location within the image space of the input image, as determined by feature back-projector 158 .
  • Interest point extractor 162 extracts interest points (e.g., intensity based interest points) from the heat map produced by heat map generator 160 .
  • Each extracted interest point is associated with a location within the image space of the input image (e.g., the coordinates of the interest point). Additionally, the interest point extractor can also determine a descriptor for each of the extracted interest points.
  • Image representer 164 represents the input image as a graph based on the extracted interest points and the geometric relations between the interest points (e.g., distance and angles between interest points) as determined according to the location of the extracted interest points.
  • the method and system of the disclosed technique were exemplified by a CNN.
  • the disclosed technique is not limited to CNNs only, and is applicable to other artificial neural networks as well.
  • the neural network e.g., a feed-forward neural network, or any other configuration of artificial neural network
  • High value features detected by the nodes of the network are mapped back to the image space of the input image, and key points (interest points) are selected therefrom.
  • only a subset of the detected features activates subsequent nodes (or is employed for detecting key points), for reducing computational cost and/or for filtering out features that are less pertinent.
  • the key points are employed for representing the input image for performing various visual tasks. In this manner, the input image is represented by features learned and detected by the neural network that are better suited for representing the input image than conventional features (not specifically adapted to the input image).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Automation & Control Theory (AREA)
  • Fuzzy Systems (AREA)
  • Quality & Reliability (AREA)
  • Electromagnetism (AREA)
  • Toxicology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Image Analysis (AREA)

Abstract

A method for representing an input image, the method including the steps of applying a trained neural network (NN) on the input image, selecting a plurality of feature maps, determining a location of each of the feature maps in an image space of the input image, defining a plurality of interest points of the input image, representing the input image as a graph according to the interest points and geometric relations between the interest points, and employing the graph for performing a visual task, the graph including a plurality of vertices and edges, and maintaining the data respective of the geometric relations, the feature maps being selected of an output of at least one selected layer of the trained NN according to values attributed to the feature maps by the trained NN, the interest points of the input image being defined based on the locations corresponding to the feature maps.

Description

  • This application is a Continuation of U.S. application Ser. No. 14/676,404, filed 1 Apr. 2015, which claims benefit of Serial No. 231862, filed 1 Apr. 2014 in Israel and which applications are incorporated herein by reference. To the extent appropriate, a claim of priority is made to the above disclosed applications.
  • FIELD OF THE DISCLOSED TECHNIQUE
  • The disclosed technique relates to image representation in general, and to methods and systems for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, in particular.
  • BACKGROUND OF THE DISCLOSED TECHNIQUE
  • For many visual tasks, the manner in which the image is represented can have a substantial effect on both the performance and the results of the visual task. Convolutional neural networks (CNN), as known in the art, can learn to produce multiscale representations of an image. The features extracted by the convolutional neural networks are features that are pertinent to the image on which the convolutional network is applied.
  • An article by Krizhevsky et al., entitled “ImageNet Classification with Deep Convolutional Neural Networks” published in the proceedings from the conference on Neural Information Processing Systems 2012, describes the architecture and operation of a deep convolutional neural network. The CNN of this publication includes eight learned layers (five convolutional layers and three fully-connected layers). The pooling layers in this publication include overlapping tiles covering their respective input in an overlapping manner. The detailed CNN is employed for image classification.
  • An article by Zeiler et al., entitled “Visualizing and Understanding Convolutional Networks” published on http://arxiv.org/abs/1311.2901v3, is directed to a visualization technique that gives insight into the function of intermediate feature layers of a CNN. The visualization technique shows a plausible and interpretable input pattern (situated in the original input image space) that gives rise to a given activation in the feature maps. The visualization technique employs a multi-layered de-convolutional network. A de-convolutional network employs the same components as a convolutional network (e.g., filtering and pooling) but in reverse. Thus, this article describes mapping detected features in the produced feature maps to the image space of the input image. In this article, the de-convolutional networks are employed as a probe of an already trained convolutional network.
  • An article by Simonyan et al., entitled “Deep Inside Convolutional Networks: Visualizing Image Classification Models and Saliency Maps” published on http://arxiv.org/abs/1312.6034, is directed to visualization of image classification models, learnt using deep Convolutional Networks (ConvNets). This article describes two visualization techniques. The first one generates an image for maximizing the class score based on computing the gradient of the class score with respect to the input image. The second one involves computing a class saliency map, specific to a given image and class.
  • Reference is now made to US Patent Application Publication Number 2010/0266200 to Atallah et al., entitled “Image Analysis through Neural Network Using Image Average Color”. This publication is directed to a computer-implemented image system. The system includes an analysis component and a classification component. The analysis component analyzes image characteristics of an image that includes an average color value. The classification component includes a self-organizing map (e.g., Kohonen neural network) for classifying the image relative to a second image based on classification information computed from the average color value.
  • SUMMARY OF THE PRESENT DISCLOSED TECHNIQUE
  • It is an object of the disclosed technique to provide a novel method and system for representing an input image as a set of interest points detected by applying a trained Neural Network (NN) on the input image. In accordance with an embodiment of the disclosed technique, there is thus provided a method for representing an input image. The method includes the steps of applying a trained neural network on the input image, selecting a plurality of feature maps of an output of at least one selected layer of the trained neural network, determining a location corresponding to each of the plurality of feature maps in an image space of the input image, and defining a plurality of interest points of the input image for representing said input image. The feature maps are selected according to values attributed thereto by the trained neural network. The interest points are defined based on the determined locations corresponding to the feature maps.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The disclosed technique will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:
  • FIGS. 1A and 1B are schematic illustrations of a convolutional neural network, constructed and operative in accordance with an embodiment of the disclosed technique;
  • FIG. 2 is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique; and
  • FIG. 3 is a schematic illustration of a system for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique.
  • DETAILED DESCRIPTION OF THE EMBODIMENTS
  • The disclosed technique overcomes the disadvantages of the prior art by providing a method and a system for representing an input image as a set of interest points (or key points) detected by applying a trained Neural Network (e.g., a Convolutional Neural Network—CNN) on the input image. The input image is run through the trained CNN and the most prominent extracted features (i.e., salient features) of the layers of the trained CNN are back-projected onto the image space of the original input image. The back-projected features are all combined into a single intensity map, or heat map. Interest points are extracted from the heat map. Each interest point is defined by a distinct location in the image space of the input image, and can be associated with a respective descriptor. Furthermore, the geometric relations between the extracted interest points are determined according to the locations of the interest points.
  • Thereafter, the input image can be represented as a graph according to the extracted interest points and the geometric relations between the interest points. The graph representation of the input image can then be employed for various visual tasks, such as determining image similarity, similarity-based image search, and the like.
  • It is noted that the features detected by applying the trained CNN on the input image are features that are relevant to the input image. That is, the input image is expressed through the features that are attributed with the greatest values, and which can therefore be considered as most pertinent to the image. In particular, the input image might be better expressed by the features learned and detected by the CNN, than by predetermined conventional features not adapted specifically to the analyzed input image. Thus, these high value features represent the input image in an optimized manner and can provide better results when employed for various visual tasks (as compared to conventional features). To sum up, the disclosed technique represents an image by employing key points (interest points) that correspond to multi-scale salient features of the image as detected by the CNN.
  • Reference is now made to FIGS. 1A and 1B, which are schematic illustrations of a Convolutional Neural Network (CNN), generally referenced 10, constructed and operative in accordance with an embodiment of the disclosed technique. FIG. 1A depicts an overview of CNN 10, and FIG. 1B depicts a selected convolutional layer of CNN 10.
  • With reference to FIG. 1A, CNN 10 includes an input image 12, followed by first and second convolutional layers 14 and 18 with respective outputs 16 and 20. It is noted that CNN 10 can include more, or fewer, convolutional layers. The output of second convolutional layer 20 is then vectorized in vectorizing layer 22. A vectorization output 24 is fed into a layered, fully connected, neural network (not referenced). In the example set forth in FIG. 1A, in the fully connected neural network of CNN 10 there are three fully connected layers 26, 30 and 34; more, or fewer, layers are possible.
  • Each of fully connected layers 26, 30 and 34 comprises a variable number of linear, or affine, operators potentially followed by a nonlinear activation function. The last fully connected layer 34 is typically a normalization layer so that the final elements of an output vector 36 are bounded in some fixed, interpretable range. The parameters of each convolutional layer and each fully connected layer are set during a training (i.e., learning) period of CNN 10.
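The following minimal sketch (not taken from the patent) illustrates this structure: each fully connected layer is an affine operator optionally followed by a nonlinearity, and the last layer applies a normalization (a softmax is assumed here) so that the output vector is bounded; all layer sizes are illustrative assumptions.

```python
import numpy as np

def fully_connected(x, weights, bias, activation=None):
    """One fully connected layer: an affine operator, optionally followed by a nonlinearity."""
    y = weights @ x + bias
    return activation(y) if activation is not None else y

def relu(v):
    return np.maximum(v, 0.0)

def softmax(v):
    """Normalization so the final output vector is bounded in a fixed, interpretable range."""
    e = np.exp(v - v.max())
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=4096)                       # vectorized output of the last convolutional layer
W1, b1 = 0.01 * rng.normal(size=(1024, 4096)), np.zeros(1024)
W2, b2 = 0.01 * rng.normal(size=(256, 1024)), np.zeros(256)
W3, b3 = 0.01 * rng.normal(size=(10, 256)), np.zeros(10)

h = fully_connected(x, W1, b1, relu)            # first fully connected layer
h = fully_connected(h, W2, b2, relu)            # second fully connected layer
out = fully_connected(h, W3, b3, softmax)       # last layer: normalized output vector
print(out.shape, round(float(out.sum()), 3))    # (10,) 1.0
```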
  • The structure and operation of each of the convolutional layers and the fully connected layers is further detailed in the following paragraphs. With reference to FIG. 1B, each input to a convolutional layer is a multichannel feature map 52 that is represented by a three-dimensional (3D) matrix. For example, a color input image may contain the various color intensity channels. The depth dimension of the input 3D matrix, representing feature map 52, is defined by the channels of multichannel feature map 52. For instance, for an input image having three color channels, the 3D matrix could be an M×N×3 matrix (i.e., the depth dimension has a value of three). The horizontal and vertical dimensions of 3D matrix 52 (i.e., the height and width of matrix 52) are defined by the respective dimensions of the input image.
  • The input is convolved with filters 54 that are set in the training stage of CNN 10. While each of filters 54 has the same depth as input feature map 52, the horizontal and vertical dimensions of the filter may vary. Each of the filters 54 is convolved with the layer input 52 to generate a two-dimensional (2D) matrix 56.
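As a rough illustration of this convolution step (a sketch with assumed sizes, not the patent's implementation), the snippet below slides a filter of the same depth as the input over an M×N×3 input and produces one 2D matrix per filter.

```python
import numpy as np

def convolve_3d_filter(feature_map, filt, stride=1):
    """Slide a filter (same depth as the input) over the height and width of a
    multichannel feature map (H x W x C) and return one 2D matrix."""
    H, W, C = feature_map.shape
    fh, fw, fc = filt.shape
    assert fc == C, "filter depth must match the input depth"
    out_h = (H - fh) // stride + 1
    out_w = (W - fw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i*stride:i*stride+fh, j*stride:j*stride+fw, :]
            out[i, j] = float(np.sum(patch * filt))   # response of this patch to the filter
    return out

rng = np.random.default_rng(1)
image = rng.random((32, 32, 3))                 # an M x N x 3 colour input, here 32 x 32
filters = rng.normal(size=(8, 5, 5, 3))         # eight illustrative 5 x 5 x 3 learned filters
maps_2d = [convolve_3d_filter(image, f) for f in filters]
print(maps_2d[0].shape)                         # (28, 28): one 2D matrix per filter
```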
  • Subsequently, an optional max pooling operation 58 is applied to produce feature maps 60. In other words, the output of convolutional layer 56 enters max pooling layer 58 (i.e., performing the max pooling operation) whose outputs are feature maps 60. These 2D feature maps 60 are then stacked to yield a 3D output matrix 62. Both convolution and max pooling operations contain various strides (or incremental steps) by which the respective input is horizontally and vertically traversed.
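A minimal sketch of the optional max pooling step and of stacking the per-filter 2D maps into a 3D output matrix follows; the tile size and stride are illustrative assumptions.

```python
import numpy as np

def max_pool_2d(feature_map, tile=2, stride=2):
    """Demarcate the input into tiles and output the maximum of each tile;
    stride == tile gives non-overlapping tiles, stride < tile overlapping ones."""
    H, W = feature_map.shape
    out_h = (H - tile) // stride + 1
    out_w = (W - tile) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feature_map[i*stride:i*stride+tile, j*stride:j*stride+tile].max()
    return out

rng = np.random.default_rng(2)
maps_2d = [rng.random((28, 28)) for _ in range(8)]   # the per-filter 2D convolution outputs
pooled = [max_pool_2d(m) for m in maps_2d]           # optional max pooling (sub-sampling)
output_3d = np.stack(pooled, axis=-1)                # 2D feature maps stacked into a 3D output matrix
print(output_3d.shape)                               # (14, 14, 8)
```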
  • Each of convolutional layer outputs 16 and 20, and fully connected layer outputs 28, 32, and 36, details the image structures (i.e., features) that best matched the filters of the respective layer, thereby “detecting” those image structures. In general, each of convolutional layer outputs 16 and 20, and fully connected layer outputs 28, 32, and 36, detects image structures in an escalating manner such that the deeper layers detect features of greater complexity. For example, it has been empirically demonstrated that the first convolutional layer 14 detects edges, and the second convolutional layer 18, which is deeper than first layer 14, may detect object attributes, such as curvature and texture. It is noted that CNN 10 (FIG. 1A) can include other numbers of convolutional layers, such as a single layer, four layers, five layers and the like.
  • Max pooling layer 58 selects the input feature maps of greatest value (i.e., indicating that the filters that produced those largest feature map values can serve as salient feature detectors). Max pooling layer 58 demarcates its input into a set of overlapping or non-overlapping tiles and for each such tile, outputs the maximum value. Thus, max-pooling layer 58 reduces the computational cost for deeper layers (i.e., max pooling layer 58 serves as a sub-sampling or down-sampling layer).
  • It is noted that a convolution layer can be augmented with rectified linear operation and a max pooling layer 58 can be augmented with normalization (e.g., local response normalization—as described, for example, in the Krizhevsky article referenced in the background section herein above). Alternatively, max pooling layer 58 can be replaced by another feature-pooling layer, such as average pooling layer, a quantile pooling layer, or rank pooling layer. Fully connected layers 26, 30, and 34 operate as a Multilayer Perceptron (MLP).
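The pooling substitution mentioned above can be sketched by making the tile reduction a parameter; the rectification offset and the 0.75 quantile below are only illustrative examples, not values from the patent.

```python
import numpy as np

def pool_2d(feature_map, reduce_fn, tile=2, stride=2):
    """Generic feature-pooling layer: reduce_fn is applied to every tile of the input."""
    H, W = feature_map.shape
    out = np.empty(((H - tile) // stride + 1, (W - tile) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = reduce_fn(feature_map[i*stride:i*stride+tile, j*stride:j*stride+tile])
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
rectified = np.maximum(x - 8.0, 0.0)                                 # rectified linear operation
max_pooled = pool_2d(rectified, np.max)                              # max pooling
average_pooled = pool_2d(x, np.mean)                                 # average pooling variant
quantile_pooled = pool_2d(x, lambda t: np.quantile(t, 0.75))         # a quantile pooling variant
print(max_pooled, average_pooled, quantile_pooled, sep="\n")
```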
  • In the example set forth in FIGS. 1A and 1B, CNN 10 includes two convolutional layers and three fully connected layers. However, the disclosed technique can be implemented by employing CNNs having more, or fewer, layers (e.g., three convolutional layers and five fully connected layers). Moreover, other parameters and characteristics of the CNN can be adapted according to the specific task, available resources, user preferences, the training set, the input image, and the like. Additionally, the disclosed technique is also applicable to other types of artificial neural networks (besides CNNs).
  • It is noted that the salient features detected by the neural network are regions, or patches, of the input image which are attributed with high values when convolved with the filters of the neural network. For example, the salient features can vary between simple corners to semantic object parts, such as an eye of a person, a whole head or face, or a car wheel, depending on the input image.
  • Reference is now made to FIG. 2, which is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique. In procedure 100, a trained Neural Network (e.g., a trained Convolutional Neural Network—CNN) is received. The CNN may include convolutional layers and fully connected layers. With reference to FIG. 1A, CNN 10 is received after being trained with a selected training set.
  • In procedure 102, the trained CNN is applied on an input image. The input image may, or may not, be related to the training set employed for training the neural network. That is, there is no requirement to use a training image, or to use an image from an image class found in the training set. The input image conforms to the expected input dimensions of the trained CNN. As such, the input image may require resizing and cropping, for example, for adapting it to the input dimensions of the CNN. Additionally, a pixel-based mean image, as determined in the training phase (i.e., mean image of the image training set), may be subtracted from the input image. With reference to FIG. 1A, input image 12 is inputted into CNN 10 as a multichannel feature map represented by a 3D matrix. In general, the input image has to undergo the same (or similar) preprocessing, which was applied to every image when training the neural network.
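A sketch of such input preprocessing, assuming a 224×224 expected input and a nearest-neighbour resize (any resize/crop scheme matching the training-time preprocessing could be used), might look as follows.

```python
import numpy as np

def resize_nearest(image, target_h, target_w):
    """Nearest-neighbour resize to the trained CNN's expected input height and width."""
    H, W = image.shape[:2]
    rows = np.arange(target_h) * H // target_h
    cols = np.arange(target_w) * W // target_w
    return image[rows][:, cols]

def preprocess(image, mean_image):
    """Adapt an arbitrary image to the network input and repeat the training-time
    preprocessing: resize to the expected dimensions, then subtract the pixel-based mean image."""
    target_h, target_w = mean_image.shape[:2]
    return resize_nearest(image, target_h, target_w) - mean_image

rng = np.random.default_rng(3)
raw = rng.random((300, 480, 3))                  # an input image of arbitrary size
mean_image = np.full((224, 224, 3), 0.5)         # assumed pixel-based mean image of the training set
net_input = preprocess(raw, mean_image)
print(net_input.shape)                           # (224, 224, 3): ready to feed into the trained CNN
```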
  • In procedure 104, a plurality of feature maps from the output of the layers of the neural network are selected according to their values. The feature maps are produced in response to convolution of the various filters with the layer input. In particular, for each layer of the trained CNN, feature maps that are attributed with the top ranked values are selected. That is, the highest valued feature maps at the output of the convolutional layer (or the fully connected layer) are selected. Alternatively, the highest valued feature maps can be selected at any stage following the convolution operation, for example prior to max pooling (i.e., even if the convolutional layer includes the optional max pooling operation).
  • The applied filters of the layers of the trained CNN serve as feature detectors that detect the locations of the layer input that have high correspondence with the filters. The feature maps having the top ranked values (i.e., also referred to as top ranked feature maps or top ranked values) represent the locations within the layer input that showed the greatest correspondence to the applied filters. Thus, the top ranked values represent salient features of the layer input as detected by the filter detectors of the respective layer.
  • It is noted that the top ranked values can be selected “on the fly” during application of the trained CNN on the input image. That is, as a convolutional layer processes its respective input and produces respective output, the largest output values are selected. The top ranked values can be selected such that a selected percentage or quantity of values is selected (e.g., the upper 15% or the largest 1000 values), or can be selected such that only values exceeding a threshold are selected. With reference to FIG. 1B, the greatest values of layer output 62 are selected.
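The selection rules just described (a percentage, a fixed quantity, or a threshold) can be sketched as below; the function name and default values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def select_top_values(layer_output, top_fraction=None, top_k=None, threshold=None):
    """Return (locations, values) of the top ranked activations of one layer output.
    Exactly one rule is applied: a fraction (e.g. the upper 15%), a fixed count
    (e.g. the largest 1000 values), or an absolute threshold."""
    flat = layer_output.ravel()
    if top_fraction is not None:
        top_k = max(1, int(round(top_fraction * flat.size)))
    if top_k is not None:
        order = np.argsort(flat)[::-1][:top_k]        # indices of the largest values
    else:
        order = np.flatnonzero(flat > threshold)      # indices of values exceeding the threshold
    locations = np.stack(np.unravel_index(order, layer_output.shape), axis=-1)
    return locations, flat[order]

rng = np.random.default_rng(4)
layer_output = rng.random((14, 14, 8))                          # 3D output matrix of one layer
locs, vals = select_top_values(layer_output, top_fraction=0.15) # e.g. keep the upper 15%
print(locs.shape, float(vals.min()))
```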
  • In procedure 106, the locations corresponding to the selected feature maps (i.e., feature maps having the top ranked values) in an image space of the input image, are determined. The determination of these locations within the image space of the input image is also referred to herein as back-projection of the features that are represented by the selected top ranked values. In other words, in the back-projection process, each selected top ranked value (i.e., high value feature map), selected for each layer of the CNN, is mapped back to the image space of the original image.
  • The back-projection of the top ranked values to the image space of the input image is performed, for example, by employing a de-convolutional network. Alternatively, the back-projection is performed by a simple backpropagation (e.g., neural network technique used for training, as described, for example, in the Simonyan article referenced in the background section herein above). In particular, and as described, for example, in the Zeiler article referenced in the background section herein above, to approximately invert the convolutional step we may use any technique from the Blind Source Separation field, for example, a sparsity-based approach. Alternatively, a matched filter approach can be employed for inverting the convolutional step. To approximately invert the max pooling operation the stored masks can be used to place the max values in their appropriate input locations (i.e., zeroes are placed by default). Generally, any technique for mapping the selected high valued feature maps back to the image space of the input image can be applied. For example, the method of the disclosed technique can involve tracking all potential features (i.e., image patches or image regions detected by the neural network) throughout the network, thereby avoiding the need for back-projecting the features. For example, a selected image patch at the input to the first layer is tracked and the value attributed to that image patch by each of the filters of the first layer is recorded. Thus, the output of the first layer that is associated with the selected image patch is known. Similarly, the output of the first layer, associated with the selected image patch, that enters the second layer as input, is tracked, and so forth. Thereby, the output of each subsequent layer that is associated with the selected image patch is determined. With reference to FIG. 1A, the selected highest (top ranked) values are back-projected to the image space of input image 12.
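One piece of this back-projection chain, approximately inverting the max pooling operation with stored masks, can be sketched as below; inverting the convolution step (e.g., with a matched filter or a sparsity-based approach) would be handled separately. The code is an illustrative assumption, not the patent's implementation.

```python
import numpy as np

def max_pool_with_mask(feature_map, tile=2, stride=2):
    """Max pooling that also stores, per tile, the input location of the maximum
    (the "mask"), so that the pooling step can later be approximately inverted."""
    H, W = feature_map.shape
    out_h, out_w = (H - tile) // stride + 1, (W - tile) // stride + 1
    pooled = np.empty((out_h, out_w))
    mask = np.empty((out_h, out_w, 2), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i*stride:i*stride+tile, j*stride:j*stride+tile]
            r, c = np.unravel_index(int(np.argmax(patch)), patch.shape)
            pooled[i, j] = patch[r, c]
            mask[i, j] = (i*stride + r, j*stride + c)
    return pooled, mask

def max_unpool(pooled, mask, input_shape):
    """Approximate inverse of max pooling: each pooled value is placed back at its
    stored input location; every other location defaults to zero."""
    recovered = np.zeros(input_shape)
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            r, c = mask[i, j]
            recovered[r, c] = pooled[i, j]
    return recovered

rng = np.random.default_rng(5)
conv_out = rng.random((28, 28))                     # output of a convolution step
pooled, mask = max_pool_with_mask(conv_out)
back = max_unpool(pooled, mask, conv_out.shape)     # one layer's step back toward the image space
print(int((back > 0).sum()))                        # only the max locations carry values
```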
  • In procedure 108, a plurality of interest points of the input image are defined based on the locations corresponding to the selected feature maps. Each interest point is associated with a distinct position within the image space of the input image. Thus, the geometric relations between the interest points (e.g., the distances and/or the angles between the interest points) can be determined according to the location of each interest point. Additionally, a descriptor can be determined for each interest point. The descriptor of an interest point provides further information about the interest point. For example, in case the interest points are employed for determining image similarity, an interest point of a first image should not be compared to an interest point of a second image, having a completely different descriptor. In this manner, computational resources can be saved during image similarity determination, and other visual tasks related thereto.
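A sketch of the geometric relations that can be derived from the interest point locations (pairwise distances and angles between triplets) is shown below; the point coordinates and the descriptors are placeholders, not values from the patent.

```python
import numpy as np

def pairwise_distances(points):
    """Distances between every pair of interest points (rows are (row, col) image locations)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def angle_at_vertex(a, b, c):
    """Angle (in radians) at point b formed by the triplet of interest points (a, b, c)."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

points = np.array([[10.0, 12.0], [40.0, 15.0], [25.0, 60.0]])   # interest point locations in image space
descriptors = np.eye(3)                                         # placeholder per-point descriptors
distances = pairwise_distances(points)
angle = angle_at_vertex(points[0], points[1], points[2])
print(distances.round(1), round(np.degrees(angle), 1))
```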
  • In accordance with the simplest (though not the most cost-effective) embodiment of the disclosed technique, the locations determined in the back-projection step are defined as the interest points of the input image. In this case, after procedure 108, the method continues in procedure 114. However, for reducing the number of interest points (i.e., thereby reducing the computational cost of the visual task performed based on the representation of the input image), only a subset of the back-projected locations is employed as interest points for representing the input image. Furthermore, the selected subset of interest points should preferably correspond to the more prominent features detected by the different layers of the CNN. Thus, for choosing the interest points that correspond to the highest back-projected values (i.e., corresponding to the most prominent salient features detected by the different layers of the CNN), the method of the disclosed technique may include additional sub-steps 110 and 112 as detailed herein below.
  • In procedure 110, the locations corresponding to the selected feature maps are combined into a heat map. The heat map includes the selected top ranked values, each located in a location determined in the back-projection process. Thereby, the heat map combines values representing salient features extracted from all layers of the CNN (i.e., features of various scale levels). Alternatively, a respective heat map is generated for each layer of the network. Thus, key points detected by each layer can be selected separately. Thereby, for example, knowledge of the scale level of each key point can be maintained and each layer can be represented separately. With reference to FIG. 1A, the selected highest values (i.e., the locations corresponding to the selected feature maps attributed with the top ranked values) are combined into a heat map. Each selected value is located in its respective location within the image space of input image 12 as determined by back-projection.
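A minimal, assumed sketch of combining the back-projected top ranked values into a single heat map follows; per-layer heat maps would simply accumulate into separate arrays.

```python
import numpy as np

def build_heat_map(image_shape, back_projections):
    """Accumulate the back-projected top ranked values, each at its location in the
    image space of the input image, into a single intensity (heat) map."""
    heat = np.zeros(image_shape)
    for (row, col), value in back_projections:
        heat[row, col] += value             # values from all layers (all scale levels) are combined
    return heat

# back-projected (location, value) pairs collected from the selected feature maps of all layers
back_projections = [((50, 60), 3.2), ((51, 61), 2.7), ((120, 200), 4.1)]
heat_map = build_heat_map((224, 224), back_projections)
print(float(heat_map.max()))
```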
  • In procedure 112, a plurality of interest points are extracted from the heat map (or heat maps). The interest points can be, for example, the peaks of the heat map (e.g., global peaks or local peaks). Alternatively, the interest points are the centers of the densest portions of the heat map. Generally, any intensity-based method for selecting key points out of the locations determined by back-projection of the detected salient features can be employed. The extracted interest points are employed for representing the input image for performing various visual tasks. With reference to FIG. 1A, interest points are extracted from the heat map, and can be employed for representing input image 12.
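The following illustrative sketch extracts local intensity peaks of the heat map as interest points, one of the options mentioned above. The neighborhood size and minimum intensity are assumed parameters, and `scipy.ndimage.maximum_filter` is used merely as a convenient local-maximum test.

```python
import numpy as np
from scipy import ndimage

def extract_peaks(heat_map, neighborhood=15, min_value=0.0):
    """Pick local peaks of the heat map as interest points: a pixel is a peak if it
    equals the maximum of its neighborhood and exceeds a minimum intensity."""
    local_max = ndimage.maximum_filter(heat_map, size=neighborhood)
    peaks = (heat_map == local_max) & (heat_map > min_value)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(-heat_map[ys, xs])  # strongest peaks first
    return list(zip(ys[order], xs[order]))
```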
  • In procedure 114, the input image is represented as a graph according to the extracted interest points and the geometric relations between them. The geometric relations between the interest points can be, for example, the distance between pairs of points and the angles between triplets of points. The graph image representation maintains data respective of the geometric relations between the interest points and thereby, can improve the results of various visual tasks, such as similarity based image search. It is noted that procedure 114 is optional and the method can stop after procedure 112 (or even after procedure 108) and represent the image as a set of key points (interest points). With reference to FIG. 1A, input image 12 is represented as a graph according to the extracted interest points and the geometric relations between the interest points.
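A non-limiting sketch of procedure 114, representing the input image as a graph whose vertices are the extracted interest points and whose edges store pairwise distances, with a helper for the angle formed by a triplet of points. The use of `networkx` and the complete-graph topology are assumptions made only for illustration.

```python
import itertools
import math
import networkx as nx

def image_graph(interest_points, descriptors=None):
    """Build a graph whose vertices are interest points (with optional descriptors)
    and whose edges store the pairwise Euclidean distances between points."""
    g = nx.Graph()
    for idx, (y, x) in enumerate(interest_points):
        g.add_node(idx, pos=(y, x),
                   descriptor=None if descriptors is None else descriptors[idx])
    for i, j in itertools.combinations(range(len(interest_points)), 2):
        (yi, xi), (yj, xj) = interest_points[i], interest_points[j]
        g.add_edge(i, j, distance=math.hypot(yi - yj, xi - xj))
    return g

def triplet_angle(p_a, p_b, p_c):
    """Angle (in radians) at point p_b formed by the triplet (p_a, p_b, p_c)."""
    v1 = (p_a[0] - p_b[0], p_a[1] - p_b[1])
    v2 = (p_c[0] - p_b[0], p_c[1] - p_b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.acos(max(-1.0, min(1.0, dot / (norm + 1e-12))))
```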
  • Reference is now made to FIG. 3, which is a schematic illustration of a system, generally referenced 150, for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique. System 150 includes a CNN trainer 152, a CNN executer 154, a top ranked values selector 156, a feature back-projector 158, a heat map generator 160, an interest point extractor 162, an image representer 164, and a storage device 168.
  • Storage device 168 is coupled with each of CNN trainer 152, CNN executer 154, top ranked values selector 156, feature back-projector 158, heat map generator 160, interest point extractor 162, and image representer 164 for enabling the different components of system 150 to store and retrieve data. It is noted that all components except storage device 168 can be embedded on a single processing device or on an array of processing devices connected therebetween. For example, components 152-164 are all embedded on a single graphics processing unit (GPU) 166, or a single central processing unit (CPU) 166. Storage device 168 can be any storage device, such as a magnetic storage device (e.g., hard disk drive—HDD), an optical storage device, and the like.
  • CNN trainer 152 retrieves a CNN architecture and a training image data set from storage device 168 or from another external data source. CNN trainer 152 executes the CNN on the images of the training image data set, and accordingly trains the CNN to detect features pertinent to the images of the training image data set. CNN trainer 152 stores the trained CNN on storage device 168.
  • CNN executer 154 retrieves the trained CNN from storage device 168 and further retrieves an input image to be represented as a graph according to interest points detected by applying the trained CNN on the input image. CNN executer applies the trained CNN to the input image.
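As a purely illustrative sketch (the disclosed technique is not tied to any particular CNN architecture or framework), the following PyTorch code applies a pretrained, off-the-shelf CNN to an input image while recording the output feature maps of each convolutional layer via forward hooks, so that top ranked values can subsequently be selected from them. The function name and the choice of VGG-16 are assumptions for the example.

```python
import torch
import torchvision

def run_cnn_and_collect_feature_maps(image_tensor):
    """Apply a trained CNN to an input image and record the feature maps produced
    by every convolutional layer. image_tensor: 3 x H x W, already normalized."""
    # Any trained network can be used; pretrained VGG-16 serves only as an example.
    model = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.DEFAULT).eval()
    feature_maps = {}

    def make_hook(name):
        def hook(_module, _inputs, output):
            feature_maps[name] = output.detach()
        return hook

    handles = []
    for name, module in model.features.named_children():
        if isinstance(module, torch.nn.Conv2d):
            handles.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(image_tensor.unsqueeze(0))

    for h in handles:
        h.remove()
    return feature_maps
```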
  • During execution of the trained CNN, top ranked values selector 156 selects the top ranked values produced by convolving the various filters of each layer with the input to that layer. A top ranked value indicates that the filter which produced it is pertinent to the input image, and that the feature it detects should therefore be reflected in the image graph representation.
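A minimal sketch of the selection performed by top ranked values selector 156, following the criteria recited in claim 2 herein below: values exceeding a threshold, the N highest values, or the upper P % of values. The function name and the convention of returning feature-map indices are assumptions for illustration.

```python
import numpy as np

def select_top_ranked(values, n_highest=None, threshold=None, top_percent=None):
    """Select top ranked values by exactly one criterion: the N highest values,
    values exceeding a threshold, or the upper P% of values.
    values: 1-D array of per-feature-map responses; returns selected indices."""
    values = np.asarray(values)
    if n_highest is not None:
        return np.argsort(-values)[:n_highest]
    if threshold is not None:
        return np.nonzero(values > threshold)[0]
    if top_percent is not None:
        cutoff = np.percentile(values, 100.0 - top_percent)
        return np.nonzero(values >= cutoff)[0]
    raise ValueError("specify one selection criterion")
```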
  • Feature back-projector 158 retrieves the top ranked values and performs back-projection for each top ranked value. In other words, for each selected top ranked value, feature back-projector 158 maps the top ranked value onto a respective location in the image space of the input image. That is, feature back-projector 158 determines, for each selected value, the location in the input image that, when convolved with the respective filter of the respective convolutional layer, produced the selected high value. Heat map generator 160 combines all back-projected top ranked values into a single heat map including each back-projected value positioned at its respective location within the image space of the input image, as determined by feature back-projector 158.
  • Interest point extractor 162 extracts interest points (e.g., intensity-based interest points) from the heat map produced by heat map generator 160. Each extracted interest point is associated with a location within the image space of the input image (e.g., the coordinates of the interest point). Additionally, interest point extractor 162 can also determine a descriptor for each of the extracted interest points. Image representer 164 represents the input image as a graph based on the extracted interest points and the geometric relations between the interest points (e.g., distances and angles between interest points) as determined according to the locations of the extracted interest points.
  • In the examples set forth herein above with reference to FIGS. 1A, 1B, 2 and 3, the method and system of the disclosed technique were exemplified by a CNN. However, the disclosed technique is not limited to CNNs only, and is applicable to other artificial neural networks as well. In such cases, the neural network (e.g., a feed-forward neural network, or any other configuration of artificial neural network) is applied onto an input image. High valued features detected by the nodes of the network are mapped back to the image space of the input image, and key points (interest points) are selected therefrom. Optionally, only a subset of the detected features activates subsequent nodes (or is employed for detecting key points), thereby reducing computational cost and/or filtering out features that are less pertinent. The key points are employed for representing the input image for performing various visual tasks. In this manner, the input image is represented by features learned and detected by the neural network, which are better suited for representing the input image than conventional features (not specifically adapted to the input image).
  • It will be appreciated by persons skilled in the art that the disclosed technique is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the disclosed technique is defined only by the claims, which follow.

Claims (6)

1. A method for representing an input image, the method comprising the following procedures:
applying a trained neural network on said input image;
selecting a plurality of feature maps of an output of at least one selected layer of said trained neural network according to values attributed to said plurality of feature maps by said trained neural network;
for each of said plurality of feature maps, determining a location corresponding thereto in an image space of said input image;
defining a plurality of interest points of said input image, based on said locations corresponding to said plurality of feature maps;
representing said input image as a graph according to said plurality of interest points and according to geometric relations between interest points of said plurality of interest points; and
employing said graph for performing a visual task,
wherein said graph comprises a plurality of vertices and edges; and
wherein said graph maintains data respective of said geometric relations between interest points.
2. The method of claim 1, wherein said plurality of feature maps are selected according to a selected criterion of the list consisting of:
said values attributed to said plurality of feature maps exceed a threshold;
said values attributed to said plurality of feature maps being the N highest values; and
said values attributed to said plurality of feature maps being in the upper P % of values,
wherein N and P are selected numerical values.
3. The method of claim 1, wherein said procedure of defining said plurality of interest points comprises the sub-procedures of:
combining said locations corresponding to said plurality of feature maps into at least one heat map; and
extracting said plurality of interest points from said at least one heat map.
4. The method of claim 3, wherein each interest point of said plurality of interest points being an intensity peak of said at least one heat map.
5. The method of claim 3, wherein each interest point of said plurality of interest points being a center of a region of said at least one heat map having high density of said locations corresponding to said plurality of feature maps, and wherein said region of said at least one heat map having high density of said locations being selected from the list consisting of:
regions having density value exceeding a threshold;
N regions having the highest density values; and
regions in the upper P % of density values,
wherein N and P are selected numerical values.
6. The method of claim 1, further comprising the procedure of associating each one of said plurality of interest points with a respective descriptor before said procedure of representing said input image as a graph.
US15/188,729 2014-04-01 2016-06-21 Neural network image representation Abandoned US20160300121A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/188,729 US20160300121A1 (en) 2014-04-01 2016-06-21 Neural network image representation

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IL231862 2014-04-01
IL231862A IL231862A (en) 2014-04-01 2014-04-01 Neural network image representation
US14/676,404 US9396415B2 (en) 2014-04-01 2015-04-01 Neural network image representation
US15/188,729 US20160300121A1 (en) 2014-04-01 2016-06-21 Neural network image representation

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/676,404 Continuation US9396415B2 (en) 2014-04-01 2015-04-01 Neural network image representation

Publications (1)

Publication Number Publication Date
US20160300121A1 true US20160300121A1 (en) 2016-10-13

Family

ID=51418161

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/676,404 Expired - Fee Related US9396415B2 (en) 2014-04-01 2015-04-01 Neural network image representation
US15/188,729 Abandoned US20160300121A1 (en) 2014-04-01 2016-06-21 Neural network image representation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/676,404 Expired - Fee Related US9396415B2 (en) 2014-04-01 2015-04-01 Neural network image representation

Country Status (2)

Country Link
US (2) US9396415B2 (en)
IL (1) IL231862A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018113261A1 (en) * 2016-12-22 2018-06-28 深圳光启合众科技有限公司 Target object recognition method and apparatus, and robot
WO2018221863A1 (en) * 2017-05-31 2018-12-06 Samsung Electronics Co., Ltd. Method and device for processing multi-channel feature map images
CN109063854A (en) * 2018-08-23 2018-12-21 河南中裕广恒科技股份有限公司 Intelligent O&M cloud platform system and its control method
WO2019099515A1 (en) * 2017-11-14 2019-05-23 Magic Leap, Inc. Fully convolutional interest point detection and description via homographic adaptation
CN109961083A (en) * 2017-12-14 2019-07-02 安讯士有限公司 For convolutional neural networks to be applied to the method and image procossing entity of image
US10769821B2 (en) 2017-07-25 2020-09-08 Nutech Company Limited Method and device for reconstructing CT image and storage medium
WO2020236624A1 (en) * 2019-05-17 2020-11-26 Magic Leap, Inc. Methods and apparatuses for corner detection using neural network and corner detector
US11119915B2 (en) 2018-02-08 2021-09-14 Samsung Electronics Co., Ltd. Dynamic memory mapping for neural networks
WO2021185379A1 (en) * 2020-03-20 2021-09-23 长沙智能驾驶研究院有限公司 Dense target detection method and system
US20210350175A1 (en) * 2020-05-07 2021-11-11 Adobe Inc. Key-value memory network for predicting time-series metrics of target entities
US11270147B1 (en) 2020-10-05 2022-03-08 International Business Machines Corporation Action-object recognition in cluttered video scenes using text
US11348275B2 (en) * 2017-11-21 2022-05-31 Beijing Sensetime Technology Development Co. Ltd. Methods and apparatuses for determining bounding box of target object, media, and devices
US11423252B1 (en) 2021-04-29 2022-08-23 International Business Machines Corporation Object dataset creation or modification using labeled action-object videos
US20220414145A1 (en) * 2019-08-16 2022-12-29 The Toronto-Dominion Bank Automated image retrieval with graph neural network
US11797603B2 (en) 2020-05-01 2023-10-24 Magic Leap, Inc. Image descriptor network with imposed hierarchical normalization

Families Citing this family (83)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9542626B2 (en) * 2013-09-06 2017-01-10 Toyota Jidosha Kabushiki Kaisha Augmenting layer-based object detection with deep convolutional neural networks
US9953425B2 (en) 2014-07-30 2018-04-24 Adobe Systems Incorporated Learning image categorization using related attributes
US9536293B2 (en) * 2014-07-30 2017-01-03 Adobe Systems Incorporated Image assessment using deep convolutional neural networks
US9418458B2 (en) * 2015-01-05 2016-08-16 Superfish Ltd. Graph image representation from convolutional neural networks
US10872230B2 (en) * 2015-03-27 2020-12-22 Intel Corporation Low-cost face recognition using Gaussian receptive field features
US9436895B1 (en) * 2015-04-03 2016-09-06 Mitsubishi Electric Research Laboratories, Inc. Method for determining similarity of objects represented in images
AU2016261487B2 (en) * 2015-05-11 2020-11-05 Magic Leap, Inc. Devices, methods and systems for biometric user recognition utilizing neural networks
US20160350336A1 (en) * 2015-05-31 2016-12-01 Allyke, Inc. Automated image searching, exploration and discovery
EP3262569A1 (en) * 2015-06-05 2018-01-03 Google, Inc. Spatial transformer modules
US9971940B1 (en) * 2015-08-10 2018-05-15 Google Llc Automatic learning of a video matching system
US10255040B2 (en) * 2017-05-11 2019-04-09 Veridium Ip Limited System and method for biometric identification
US11329980B2 (en) 2015-08-21 2022-05-10 Veridium Ip Limited System and method for biometric protocol standards
US9852492B2 (en) * 2015-09-18 2017-12-26 Yahoo Holdings, Inc. Face detection
US11074492B2 (en) * 2015-10-07 2021-07-27 Altera Corporation Method and apparatus for performing different types of convolution operations with the same processing elements
CA2972183C (en) * 2015-12-14 2018-03-27 Motion Metrics International Corp. Method and apparatus for identifying fragmented material portions within an image
US10460231B2 (en) * 2015-12-29 2019-10-29 Samsung Electronics Co., Ltd. Method and apparatus of neural network based image signal processor
US20170236057A1 (en) * 2016-02-16 2017-08-17 Carnegie Mellon University, A Pennsylvania Non-Profit Corporation System and Method for Face Detection and Landmark Localization
US9916522B2 (en) * 2016-03-11 2018-03-13 Kabushiki Kaisha Toshiba Training constrained deconvolutional networks for road scene semantic segmentation
WO2017156547A1 (en) 2016-03-11 2017-09-14 Magic Leap, Inc. Structure learning in convolutional neural networks
US11461919B2 (en) 2016-04-21 2022-10-04 Ramot At Tel Aviv University Ltd. Cascaded neural network
GB2549554A (en) * 2016-04-21 2017-10-25 Ramot At Tel-Aviv Univ Ltd Method and system for detecting an object in an image
US10303977B2 (en) 2016-06-28 2019-05-28 Conduent Business Services, Llc System and method for expanding and training convolutional neural networks for large size input images
CN109661194B (en) 2016-07-14 2022-02-25 奇跃公司 Iris boundary estimation using corneal curvature
EP3485425B1 (en) 2016-07-14 2023-08-23 Magic Leap, Inc. Deep neural network for iris identification
CN106251338B (en) * 2016-07-20 2019-04-30 北京旷视科技有限公司 Target integrity detection method and device
WO2018014109A1 (en) * 2016-07-22 2018-01-25 9206868 Canada Inc. System and method for analyzing and searching for features associated with objects
CA3034644A1 (en) 2016-08-22 2018-03-01 Magic Leap, Inc. Augmented reality display device with deep learning sensors
EP3300002A1 (en) 2016-09-22 2018-03-28 Styria medijski servisi d.o.o. Method for determining the similarity of digital images
RU2016138608A (en) 2016-09-29 2018-03-30 Мэджик Лип, Инк. Neural network for eye image segmentation and image quality evaluation
KR102216019B1 (en) 2016-10-04 2021-02-15 매직 립, 인코포레이티드 Efficient data layouts for convolutional neural networks
US10552709B2 (en) * 2016-10-05 2020-02-04 Ecole Polytechnique Federale De Lausanne (Epfl) Method, system, and device for learned invariant feature transform for computer images
CN106529578A (en) * 2016-10-20 2017-03-22 中山大学 Vehicle brand model fine identification method and system based on depth learning
US10339651B2 (en) * 2016-10-28 2019-07-02 International Business Machines Corporation Simultaneous feature extraction and dictionary learning using deep learning architectures for characterization of images of heterogeneous tissue samples
US10573040B2 (en) * 2016-11-08 2020-02-25 Adobe Inc. Image modification using detected symmetry
US10733505B2 (en) 2016-11-10 2020-08-04 Google Llc Performing kernel striding in hardware
JP6854344B2 (en) 2016-11-15 2021-04-07 マジック リープ, インコーポレイテッドMagic Leap,Inc. Deep machine learning system for rectangular parallelepiped detection
US10360494B2 (en) * 2016-11-30 2019-07-23 Altumview Systems Inc. Convolutional neural network (CNN) system based on resolution-limited small-scale CNN modules
KR20230070318A (en) 2016-12-05 2023-05-22 매직 립, 인코포레이티드 Virual user input controls in a mixed reality environment
US11010431B2 (en) * 2016-12-30 2021-05-18 Samsung Electronics Co., Ltd. Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD
US10546231B2 (en) * 2017-01-23 2020-01-28 Fotonation Limited Method for synthesizing a neural network
US10198655B2 (en) * 2017-01-24 2019-02-05 Ford Global Technologies, Llc Object detection using recurrent neural network and concatenated feature map
TWI617993B (en) * 2017-03-03 2018-03-11 財團法人資訊工業策進會 Recognition system and recognition method
GB201703602D0 (en) * 2017-03-07 2017-04-19 Selerio Ltd Multi-Modal image search
AU2018236433B2 (en) 2017-03-17 2022-03-03 Magic Leap, Inc. Room layout estimation methods and techniques
US10496699B2 (en) * 2017-03-20 2019-12-03 Adobe Inc. Topic association and tagging for dense images
US10147019B2 (en) * 2017-03-20 2018-12-04 Sap Se Small object detection
CN107203765B (en) * 2017-03-30 2023-08-25 腾讯科技(上海)有限公司 Sensitive image detection method and device
US10783393B2 (en) 2017-06-20 2020-09-22 Nvidia Corporation Semi-supervised learning for landmark localization
CN109214238B (en) * 2017-06-30 2022-06-28 阿波罗智能技术(北京)有限公司 Multi-target tracking method, device, equipment and storage medium
US10380413B2 (en) 2017-07-13 2019-08-13 Robert Bosch Gmbh System and method for pose-invariant face alignment
KR102666475B1 (en) 2017-07-26 2024-05-21 매직 립, 인코포레이티드 Training a neural network with representations of user interface devices
US10275646B2 (en) * 2017-08-03 2019-04-30 Gyrfalcon Technology Inc. Motion recognition via a two-dimensional symbol having multiple ideograms contained therein
US10521661B2 (en) 2017-09-01 2019-12-31 Magic Leap, Inc. Detailed eye shape model for robust biometric applications
US10268205B2 (en) * 2017-09-13 2019-04-23 TuSimple Training and testing of a neural network method for deep odometry assisted by static scene optical flow
CA3068481A1 (en) 2017-09-20 2019-03-28 Magic Leap, Inc. Personalized neural network for eye tracking
CN107680088A (en) * 2017-09-30 2018-02-09 百度在线网络技术(北京)有限公司 Method and apparatus for analyzing medical image
US11093832B2 (en) 2017-10-19 2021-08-17 International Business Machines Corporation Pruning redundant neurons and kernels of deep convolutional neural networks
US10535155B2 (en) * 2017-10-24 2020-01-14 Toyota Motor Engineering & Manufacturing North America, Inc. Systems and methods for articulated pose estimation
IL273991B2 (en) 2017-10-26 2023-11-01 Magic Leap Inc Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks
EP3741109B1 (en) 2018-01-17 2024-04-24 Magic Leap, Inc. Eye center of rotation determination, depth plane selection, and render camera positioning in display systems
CN108399382A (en) 2018-02-13 2018-08-14 阿里巴巴集团控股有限公司 Vehicle insurance image processing method and device
US10855986B2 (en) 2018-05-29 2020-12-01 Qualcomm Incorporated Bandwidth compression for neural network systems
US10671891B2 (en) 2018-07-19 2020-06-02 International Business Machines Corporation Reducing computational costs of deep reinforcement learning by gated convolutional neural network
US11567336B2 (en) 2018-07-24 2023-01-31 Magic Leap, Inc. Display systems and methods for determining registration between display and eyes of user
CN109101919B (en) * 2018-08-03 2022-05-10 北京字节跳动网络技术有限公司 Method and apparatus for generating information
US10936907B2 (en) 2018-08-10 2021-03-02 Buffalo Automation Group Inc. Training a deep learning system for maritime applications
US10782691B2 (en) 2018-08-10 2020-09-22 Buffalo Automation Group Inc. Deep learning and intelligent sensing system integration
US10984262B2 (en) * 2018-10-08 2021-04-20 StradVision, Inc. Learning method and testing method for monitoring blind spot of vehicle, and learning device and testing device using the same
JP7357674B2 (en) * 2018-10-24 2023-10-06 クライメイト、リミテッド、ライアビリティー、カンパニー Plant disease infection detection with improved machine learning
US20210406695A1 (en) * 2018-11-06 2021-12-30 Emory University Systems and Methods for Training an Autoencoder Neural Network Using Sparse Data
US10977548B2 (en) 2018-12-05 2021-04-13 Bank Of America Corporation Generation of capsule neural networks for enhancing image processing platforms
KR102046113B1 (en) * 2019-03-19 2019-11-18 주식회사 루닛 Machine-learning method for neural network and apparatus thereof
US11410016B2 (en) 2019-04-26 2022-08-09 Alibaba Group Holding Limited Selective performance of deterministic computations for neural networks
WO2020236993A1 (en) 2019-05-21 2020-11-26 Magic Leap, Inc. Hand pose estimation
CN110211164B (en) * 2019-06-05 2021-05-07 中德(珠海)人工智能研究院有限公司 Picture processing method of characteristic point operator based on neural network learning basic graph
CN114424147A (en) 2019-07-16 2022-04-29 奇跃公司 Determining eye rotation center using one or more eye tracking cameras
CN111368853A (en) * 2020-02-04 2020-07-03 清华珠三角研究院 Label construction method, system, device and storage medium
CN111310509A (en) * 2020-03-12 2020-06-19 北京大学 Real-time bar code detection system and method based on logistics waybill
CN113435233B (en) * 2020-03-23 2024-03-05 北京金山云网络技术有限公司 Pornographic image recognition method and system and electronic equipment
US11500939B2 (en) * 2020-04-21 2022-11-15 Adobe Inc. Unified framework for multi-modal similarity search
US11532147B2 (en) * 2020-09-25 2022-12-20 Microsoft Technology Licensing, Llc Diagnostic tool for deep learning similarity models
CN113205481A (en) * 2021-03-19 2021-08-03 浙江科技学院 Salient object detection method based on stepped progressive neural network
US20240338491A1 (en) * 2021-08-16 2024-10-10 The Regents Of The University Of California A design automation methodology based on graph neural networks to model integrated circuits and mitigate hardware security threats

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004055735A1 (en) * 2002-12-16 2004-07-01 Canon Kabushiki Kaisha Pattern identification method, device thereof, and program thereof
JP4532915B2 (en) * 2004-01-29 2010-08-25 キヤノン株式会社 Pattern recognition learning method, pattern recognition learning device, image input device, computer program, and computer-readable recording medium
JP4546157B2 (en) * 2004-06-03 2010-09-15 キヤノン株式会社 Information processing method, information processing apparatus, and imaging apparatus
WO2008133951A2 (en) * 2007-04-24 2008-11-06 Massachusetts Institute Of Technology Method and apparatus for image processing
US8135202B2 (en) * 2008-06-02 2012-03-13 Nec Laboratories America, Inc. Automated method and system for nuclear analysis of biopsy images
US8428348B2 (en) 2009-04-15 2013-04-23 Microsoft Corporation Image analysis through neural network using image average color

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229263A (en) * 2016-12-22 2018-06-29 深圳光启合众科技有限公司 The recognition methods of target object and device, robot
WO2018113261A1 (en) * 2016-12-22 2018-06-28 深圳光启合众科技有限公司 Target object recognition method and apparatus, and robot
US10733767B2 (en) 2017-05-31 2020-08-04 Samsung Electronics Co., Ltd. Method and device for processing multi-channel feature map images
WO2018221863A1 (en) * 2017-05-31 2018-12-06 Samsung Electronics Co., Ltd. Method and device for processing multi-channel feature map images
KR20180131073A (en) * 2017-05-31 2018-12-10 삼성전자주식회사 Method and apparatus for processing multiple-channel feature map images
KR102301232B1 (en) 2017-05-31 2021-09-10 삼성전자주식회사 Method and apparatus for processing multiple-channel feature map images
US10769821B2 (en) 2017-07-25 2020-09-08 Nutech Company Limited Method and device for reconstructing CT image and storage medium
US10977554B2 (en) * 2017-11-14 2021-04-13 Magic Leap, Inc. Fully convolutional interest point detection and description via homographic adaptation
US11537894B2 (en) * 2017-11-14 2022-12-27 Magic Leap, Inc. Fully convolutional interest point detection and description via homographic adaptation
WO2019099515A1 (en) * 2017-11-14 2019-05-23 Magic Leap, Inc. Fully convolutional interest point detection and description via homographic adaptation
AU2018369757B2 (en) * 2017-11-14 2023-10-12 Magic Leap, Inc. Fully convolutional interest point detection and description via homographic adaptation
US11348275B2 (en) * 2017-11-21 2022-05-31 Beijing Sensetime Technology Development Co. Ltd. Methods and apparatuses for determining bounding box of target object, media, and devices
CN109961083A (en) * 2017-12-14 2019-07-02 安讯士有限公司 For convolutional neural networks to be applied to the method and image procossing entity of image
US11119915B2 (en) 2018-02-08 2021-09-14 Samsung Electronics Co., Ltd. Dynamic memory mapping for neural networks
CN109063854A (en) * 2018-08-23 2018-12-21 河南中裕广恒科技股份有限公司 Intelligent O&M cloud platform system and its control method
WO2020236624A1 (en) * 2019-05-17 2020-11-26 Magic Leap, Inc. Methods and apparatuses for corner detection using neural network and corner detector
US12007564B2 (en) 2019-05-17 2024-06-11 Magic Leap, Inc. Methods and apparatuses for corner detection using neural network and corner detector
JP7422785B2 (en) 2019-05-17 2024-01-26 マジック リープ, インコーポレイテッド Method and apparatus for angle detection using neural networks and angle detectors
JP2022532238A (en) * 2019-05-17 2022-07-13 マジック リープ, インコーポレイテッド Methods and equipment for angle detection using neural networks and angle detectors
US11686941B2 (en) 2019-05-17 2023-06-27 Magic Leap, Inc. Methods and apparatuses for corner detection using neural network and corner detector
US20220414145A1 (en) * 2019-08-16 2022-12-29 The Toronto-Dominion Bank Automated image retrieval with graph neural network
US11809486B2 (en) * 2019-08-16 2023-11-07 The Toronto-Dominion Bank Automated image retrieval with graph neural network
WO2021185379A1 (en) * 2020-03-20 2021-09-23 长沙智能驾驶研究院有限公司 Dense target detection method and system
US11797603B2 (en) 2020-05-01 2023-10-24 Magic Leap, Inc. Image descriptor network with imposed hierarchical normalization
US12072927B2 (en) 2020-05-01 2024-08-27 Magic Leap, Inc. Image descriptor network with imposed hierarchical normalization
US11501107B2 (en) * 2020-05-07 2022-11-15 Adobe Inc. Key-value memory network for predicting time-series metrics of target entities
US11694165B2 (en) 2020-05-07 2023-07-04 Adobe Inc. Key-value memory network for predicting time-series metrics of target entities
US20210350175A1 (en) * 2020-05-07 2021-11-11 Adobe Inc. Key-value memory network for predicting time-series metrics of target entities
US11270147B1 (en) 2020-10-05 2022-03-08 International Business Machines Corporation Action-object recognition in cluttered video scenes using text
US11928849B2 (en) 2020-10-05 2024-03-12 International Business Machines Corporation Action-object recognition in cluttered video scenes using text
US11423252B1 (en) 2021-04-29 2022-08-23 International Business Machines Corporation Object dataset creation or modification using labeled action-object videos

Also Published As

Publication number Publication date
US9396415B2 (en) 2016-07-19
IL231862A0 (en) 2014-08-31
IL231862A (en) 2015-04-30
US20150278642A1 (en) 2015-10-01

Similar Documents

Publication Publication Date Title
US9396415B2 (en) Neural network image representation
US9418458B2 (en) Graph image representation from convolutional neural networks
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
Von Stumberg et al. Gn-net: The gauss-newton loss for multi-weather relocalization
CN107871106B (en) Face detection method and device
Silberman et al. Instance segmentation of indoor scenes using a coverage loss
Christlein et al. An evaluation of popular copy-move forgery detection approaches
WO2020170014A1 (en) Object counting and instance segmentation using neural network architectures with image-level supervision
US20160196479A1 (en) Image similarity as a function of weighted descriptor similarities derived from neural networks
WO2016054779A1 (en) Spatial pyramid pooling networks for image processing
EP3813661A1 (en) Human pose analysis system and method
JP6905079B2 (en) Detection and representation of objects in images
CN105139004A (en) Face expression identification method based on video sequences
JP7225731B2 (en) Imaging multivariable data sequences
Galety et al. Marking attendance using modern face recognition (fr): Deep learning using the opencv method
Le et al. Co-localization with category-consistent features and geodesic distance propagation
Zhang et al. Multiresolution attention extractor for small object detection
Manzoor et al. Ancient coin classification based on recent trends of deep learning.
Sreenivas et al. Modified deep belief network based human emotion recognition with multiscale features from video sequences
CN114565752A (en) Image weak supervision target detection method based on class-agnostic foreground mining
Mocanu et al. Multimodal convolutional neural network for object detection using rgb-d images
Rasouli et al. Integrating three mechanisms of visual attention for active visual search
Lee et al. Where to look: Visual attention estimation in road scene video for safe driving
Seemanthini et al. Small human group detection and validation using pyramidal histogram of oriented gradients and gray level run length method
Manimurugan et al. HLASwin-T-ACoat-Net based Underwater Object Detection

Legal Events

Date Code Title Description
AS Assignment

Owner name: SUPERFISH LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHERTOK, MICHAEL;LORBERT, ALEXANDER;PINHAS, ADI;SIGNING DATES FROM 20150512 TO 20150513;REEL/FRAME:038977/0420

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION