US20160300121A1 - Neural network image representation
- Publication number
- US20160300121A1 (U.S. application Ser. No. 15/188,729)
- Authority
- US
- United States
- Prior art keywords
- input image
- interest points
- feature maps
- image
- values
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06V10/426—Graphical representations
- G06K9/627
- G06F18/2413—Classification techniques relating to the classification model based on distances to training or reference patterns
- G06K7/1482—Methods for optical code recognition including quality enhancement steps using fuzzy logic or natural solvers, such as neural networks, genetic algorithms and simulated annealing
- G06K9/469
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
- G06N7/046—Implementation by means of a neural network
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the disclosed technique relates to image representation in general, and to methods and systems for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, in particular.
- Another prior art system includes an analysis component and a classification component.
- The analysis component analyzes image characteristics of an image, including an average color value.
- The classification component includes a self-organizing map (e.g., a Kohonen neural network) for classifying the image relative to a second image based on classification information computed from the average color value.
- a method for representing an input image includes the steps of applying a trained neural network on the input image, selecting a plurality of feature maps of an output of at least one selected layer of the trained neural network, determining a location corresponding to each of the plurality of feature maps in an image space of the input image, and defining a plurality of interest points of the input image for representing said input image.
- the feature maps are selected according to values attributed thereto by the trained neural network.
- the interest points are defined based on the determined locations corresponding to the feature maps.
- FIGS. 1A and 1B are schematic illustrations of a convolutional neural network, constructed and operative in accordance with an embodiment of the disclosed technique;
- FIG. 2 is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique;
- FIG. 3 is a schematic illustration of a system for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique.
- the disclosed technique overcomes the disadvantages of the prior art by providing a method and a system for representing an input image as a set of interest points (or key points) detected by applying a trained Neural Network (e.g., a Convolutional Neural Network—CNN) on the input image.
- the input image is run through the trained CNN and the most prominent extracted features (i.e., salient features) of the layers of the trained CNN are back-projected onto the image space of the original input image.
- the back-projected features are all combined into a single intensity map, or heat map.
- Interest points are extracted from the heat map. Each interest point is defined by a distinct location in the image space of the input image, and can be associated with a respective descriptor. Furthermore, the geometric relations between the extracted interest points are determined according to the locations of the interest points.
- the input image can be represented as a graph according to the extracted interest points and the geometric relations between the interest points.
- the graph representation of the input image can then be employed for various visual tasks, such as determining image similarity, similarity-based image search, and the like.
- the features detected by applying the trained CNN on the input image are features that are relevant to the input image. That is, the input image is expressed through the features that are attributed with the greatest values, and which can therefore be considered as most pertinent to the image.
- the input image might be better expressed by the features learned and detected by the CNN, than by predetermined conventional features not adapted specifically to the analyzed input image.
- these high value features represent the input image in an optimized manner and can provide better results when employed for various visual tasks (as compared to conventional features).
- the disclosed technique represents an image by employing key points (interest points) that correspond to multi-scale salient features of the image as detected by the CNN.
- FIGS. 1A and 1B are schematic illustrations of a Convolutional Neural Network (CNN), generally referenced 10 , constructed and operative in accordance with an embodiment of the disclosed technique.
- FIG. 1A depicts an overview of CNN 10
- FIG. 1B depicts a selected convolutional layer of CNN 10 .
- CNN 10 includes an input image 12 , followed by first and second convolutional layers 14 and 18 with respective outputs 16 and 20 . It is noted that CNN 10 can include more or fewer convolutional layers. The output 20 of second convolutional layer 18 is then vectorized in vectorizing layer 22 . A vectorization output 24 is fed into a layered, fully connected, neural network (not referenced). In the example set forth in FIG. 1A , the fully connected neural network of CNN 10 has three fully connected layers 26 , 30 and 34 ; more or fewer layers are possible.
- Each of fully connected layers 26 , 30 and 34 comprises a variable number of linear, or affine, operators potentially followed by a nonlinear activation function.
- the last fully connected layer 34 is typically a normalization layer so that the final elements of an output vector 36 are bounded in some fixed, interpretable range.
- the parameters of each convolutional layer and each fully connected layer are set during a training (i.e., learning) period of CNN 10 .
- each input to a convolutional layer is a multichannel feature map 52 that is represented by a three-dimensional (3D) matrix.
- a color input image may contain the various color intensity channels.
- the depth dimension of the input 3D matrix, representing feature map 52 is defined by the channels of multichannel feature map 52 .
- the 3D matrix could be an M ⁇ N ⁇ 3 matrix (i.e., the depth dimension has a value of three).
- the horizontal and vertical dimensions of 3D matrix 52 (i.e., the height and width of matrix 52 ) are defined by the respective dimensions of the input image.
- the input is convolved with filters 54 that are set in the training stage of CNN 10 . While each of filters 54 has the same depth as input feature map 52 , the horizontal and vertical dimensions of the filter may vary. Each of the filters 54 is convolved with the layer input 52 to generate a two-dimensional (2D) matrix 56 .
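The per-filter convolution just described can be sketched in NumPy. The function below is a hypothetical illustration only (valid-mode correlation, stride 1, no bias term, no padding), not the patent's implementation; each filter spans the full depth of the input and yields one 2D matrix, and the 2D matrices are then stacked into a 3D output:

```python
import numpy as np

def convolve_layer_input(feature_map, filt):
    """Correlate a multichannel input (H, W, C) with one filter (h, w, C),
    producing a single 2D output matrix (valid mode, stride 1)."""
    H, W, C = feature_map.shape
    h, w, _ = filt.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # sum over the full depth, as each filter spans all input channels
            out[i, j] = np.sum(feature_map[i:i + h, j:j + w, :] * filt)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))            # e.g., an M x N x 3 color input
filters = rng.standard_normal((4, 5, 5, 3))   # four 5x5x3 filters (set in training)
maps_2d = [convolve_layer_input(x, f) for f in filters]  # four 2D matrices
stacked = np.stack(maps_2d, axis=-1)          # stacked into a 3D output matrix
```

With an 8x8x3 input and 5x5x3 filters, each 2D matrix is 4x4, so `stacked` has shape (4, 4, 4): the depth of a layer's output equals the number of filters, not the input depth.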
- an optional max pooling operation 58 is applied to produce feature maps 60 .
- the output of convolutional layer 56 enters max pooling layer 58 (i.e., performing the max pooling operation) whose outputs are feature maps 60 .
- These 2D feature maps 60 are then stacked to yield a 3D output matrix 62 .
- Both the convolution and max pooling operations are applied with configurable strides (or incremental steps) by which the respective input is traversed horizontally and vertically.
- Each of convolutional layer outputs 16 and 20 , and fully connected layer outputs 28 , 32 , and 36 details the image structures (i.e., features) that best matched the filters of the respective layer, thereby “detecting” those image structures.
- each of convolutional layer outputs 16 and 20 , and fully connected layer outputs 28 , 32 , and 36 detects image structures in an escalating manner such that the deeper layers detect features of greater complexity.
- the first convolutional layer 14 detects edges
- the second convolutional layer 18 which is deeper than first layer 14 , may detect object attributes, such as curvature and texture.
- CNN 10 can include other numbers of convolutional layers, such as a single layer, four layers, five layers and the like.
- Max pooling layer 58 selects the input feature maps of greatest value (i.e., indicating that the filters that produced those largest feature map values can serve as salient feature detectors). Max pooling layer 58 demarcates its input into a set of overlapping or non-overlapping tiles and for each such tile, outputs the maximum value. Thus, max-pooling layer 58 reduces the computational cost for deeper layers (i.e., max pooling layer 58 serves as a sub-sampling or down-sampling layer).
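The tiling behavior of a max pooling layer can be sketched as follows. This sketch assumes non-overlapping 2x2 tiles with stride equal to the tile size (the text permits overlapping tiles as well), and additionally records an argmax mask per tile, which becomes useful later when max values are placed back in their input locations:

```python
import numpy as np

def max_pool(feature_map, tile=2):
    """Max-pool a 2D feature map over non-overlapping tile x tile tiles,
    also recording an argmax mask for later unpooling."""
    H, W = feature_map.shape
    Hp, Wp = H // tile, W // tile          # sub-sampled (down-sampled) size
    pooled = np.zeros((Hp, Wp))
    mask = np.zeros_like(feature_map, dtype=bool)
    for i in range(Hp):
        for j in range(Wp):
            window = feature_map[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            pooled[i, j] = window.max()    # output the maximum of each tile
            # remember which input location held the maximum
            r, c = np.unravel_index(window.argmax(), window.shape)
            mask[i*tile + r, j*tile + c] = True
    return pooled, mask

fm = np.array([[1., 3., 0., 2.],
               [4., 2., 1., 0.],
               [0., 1., 5., 6.],
               [2., 0., 7., 1.]])
pooled, mask = max_pool(fm)
# pooled == [[4., 2.], [2., 7.]]
```

The halved output size illustrates how the pooling layer reduces the computational cost for deeper layers.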
- a convolution layer can be augmented with rectified linear operation and a max pooling layer 58 can be augmented with normalization (e.g., local response normalization—as described, for example, in the Krizhevsky article referenced in the background section herein above).
- max pooling layer 58 can be replaced by another feature-pooling layer, such as an average pooling layer, a quantile pooling layer, or a rank pooling layer.
- Fully connected layers 26 , 30 , and 34 operate as a Multilayer Perceptron (MLP).
- CNN 10 includes two convolutional layers and three fully connected layers.
- the disclosed technique can be implemented by employing CNNs having more, or fewer, layers (e.g., three convolutional layers and five fully connected layers).
- other parameters and characteristics of the CNN can be adapted according to the specific task, available resources, user preferences, the training set, the input image, and the like.
- the disclosed technique is also applicable to other types of artificial neural networks (besides CNNs).
- the salient features detected by the neural network are regions, or patches, of the input image which are attributed with high values when convolved with the filters of the neural network.
- the salient features can vary between simple corners to semantic object parts, such as an eye of a person, a whole head or face, or a car wheel, depending on the input image.
- FIG. 2 is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique.
- In a first procedure, a trained Neural Network (e.g., a trained Convolutional Neural Network—CNN) is received.
- the CNN may include convolutional layers and fully connected layers.
- With reference to FIG. 1A , CNN 10 is received after being trained with a selected training set.
- the trained CNN is applied on an input image.
- the input image may, or may not, be related to the training set employed for training the neural network. That is, there is no requirement to use a training image, or to use an image from an image class found in the training set.
- the input image conforms to the expected input dimensions of the trained CNN. As such, the input image may require resizing and cropping, for example, for adapting it to the input dimensions of the CNN.
- Optionally, a pixel-based mean image, as determined in the training phase (i.e., the mean image of the image training set), is subtracted from the input image.
- With reference to FIG. 1A , input image 12 is inputted into CNN 10 as a multichannel feature map represented by a 3D matrix. In general, the input image has to undergo the same (or similar) preprocessing which was applied to every image when training the neural network.
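The resizing, cropping, and training-time preprocessing mentioned above might look like this sketch. The 224x224 target size, the center crop, and the nearest-neighbor resize are assumptions modeled on common CNN pipelines, not values given in the text:

```python
import numpy as np

def preprocess(image, mean_image, size=224):
    """Center-crop to a square, nearest-neighbor resize to the CNN's
    expected input size, then subtract the training-set mean image."""
    H, W, _ = image.shape
    s = min(H, W)
    top, left = (H - s) // 2, (W - s) // 2
    cropped = image[top:top + s, left:left + s, :]
    idx = np.arange(size) * s // size          # nearest-neighbor source rows/cols
    resized = cropped[idx][:, idx, :]
    return resized - mean_image                # same preprocessing as in training

img = np.random.default_rng(1).random((300, 400, 3))   # arbitrary input size
mean = np.zeros((224, 224, 3))                         # stand-in mean image
x = preprocess(img, mean)                              # shape (224, 224, 3)
```

Whatever the actual resizing method, the point is that inference-time preprocessing must mirror the training-time preprocessing.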
- a plurality of feature maps from the output of the layers of the neural network are selected according to their values.
- the feature maps are produced in response to convolution of the various filters with the layer input.
- feature maps that are attributed with the top ranked values are selected. That is, the highest valued feature maps at the output of the convolutional layer (or the fully connected layer) are selected.
- the highest valued feature maps can be selected at any stage following the convolution operation, for example prior to max pooling (i.e., even if the convolutional layer includes the optional max pooling operation).
- the applied filters of the layers of the trained CNN serve as feature detectors that detect the locations of the layer input that have high correspondence with the filters.
- the feature maps having the top ranked values (i.e., also referred to as top ranked feature maps or top ranked values) represent the locations within the layer input that showed the greatest correspondence to the applied filters.
- the top ranked values represent salient features of the layer input as detected by the filter detectors of the respective layer.
- the top ranked values can be selected “on the fly” during application of the trained CNN on the input image. That is, as a convolutional layer processes its respective input and produces respective output, the largest output values are selected.
- the top ranked values can be selected such that a selected percentage or quantity of values is selected (e.g., the upper 15% or the largest 1000 values), or can be selected such that only values exceeding a threshold are selected. With reference to FIG. 1B , the greatest values of layer output 62 are selected.
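The three selection policies just listed (a percentage, a fixed quantity, or a threshold) can be sketched as follows. Scoring each feature map by its maximum activation is an assumption for illustration; the text does not fix the scoring statistic:

```python
import numpy as np

def select_top_ranked(feature_maps, top_percent=None, top_k=None, threshold=None):
    """Return indices of feature maps whose score (here: max activation)
    is top-ranked under exactly one of the three selection policies."""
    scores = np.array([fm.max() for fm in feature_maps])
    order = [int(i) for i in np.argsort(scores)[::-1]]   # indices, highest first
    if top_percent is not None:       # e.g., the upper 15%
        return order[:max(1, int(len(scores) * top_percent / 100))]
    if top_k is not None:             # e.g., the largest 1000 values
        return order[:top_k]
    return [i for i in order if scores[i] > threshold]   # threshold policy

maps = [np.full((4, 4), v) for v in (0.1, 0.9, 0.5, 0.7)]
print(select_top_ranked(maps, top_k=2))        # → [1, 3]
print(select_top_ranked(maps, threshold=0.6))  # → [1, 3]
```

Because selection needs only the layer's output values, it can indeed run "on the fly" as each layer is computed.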
- each selected top ranked value (i.e., each high value feature map), selected for each layer of the CNN, is mapped back to the image space of the original image.
- the back-projection of the top ranked values to the image space of the input image is performed, for example, by employing a de-convolutional network.
- alternatively, the back-projection is performed by simple backpropagation (i.e., the neural network technique used for training, as described, for example, in the Simonyan article referenced in the background section herein above).
- to approximately invert the convolutional step we may use any technique from the Blind Source Separation field, for example, a sparsity-based approach.
- a matched filter approach can be employed for inverting the convolutional step.
- further alternatively, masks recording the locations of the maxima selected during the max pooling operation can be stored; the stored masks can then be used to place the max values in their appropriate input locations (i.e., zeroes are placed by default).
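The mask-based placement can be sketched as follows. This pairs with a pooling pass that recorded an argmax mask (the mask format is an assumption): each pooled value is scattered back to the recorded input location, and zeroes fill everywhere else:

```python
import numpy as np

def unpool_with_mask(pooled, mask, tile=2):
    """Scatter pooled max values back to their recorded input locations;
    all other locations default to zero."""
    out = np.zeros(mask.shape)
    rows, cols = np.nonzero(mask)        # locations recorded during pooling
    for r, c in zip(rows, cols):
        out[r, c] = pooled[r // tile, c // tile]
    return out

pooled = np.array([[4., 2.],
                   [2., 7.]])
mask = np.zeros((4, 4), dtype=bool)      # argmax positions from the pooling pass
mask[1, 0] = mask[0, 3] = mask[3, 0] = mask[3, 2] = True
restored = unpool_with_mask(pooled, mask)
# restored has 4 at (1,0), 2 at (0,3), 2 at (3,0), 7 at (3,2), zeros elsewhere
```

Chaining such unpooling steps (together with an approximate inverse of each convolution) is what carries a high layer value back to a location in the input image space.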
- any technique for mapping the selected high valued feature maps back to the image space of the input image can be applied.
- the method of the disclosed technique can involve tracking all potential features (i.e., image patches or image regions detected by the neural network) throughout the network, thereby avoiding the need for back-projecting the features. For example, a selected image patch at the input to the first layer is tracked and the value attributed to that image patch by each of the filters of the first layer is recorded. Thus, the output of the first layer that is associated with the selected image patch is known.
- the output of the first layer, associated with the selected image patch, that enters the second layer as input, is tracked, and so forth. Thereby, the output of each subsequent layer that is associated with the selected image patch is determined.
- the selected highest (top ranked) values are back-projected to the image space of input image 12 .
- a plurality of interest points of the input image are defined based on the locations corresponding to the selected feature maps.
- Each interest point is associated with a distinct position within the image space of the input image.
- the geometric relations between the interest points (e.g., the distances and/or the angles between the interest points) are determined according to the locations of the interest points.
- a descriptor can be determined for each interest point.
- the descriptor of an interest point provides further information about the interest point. For example, in case the interest points are employed for determining image similarity, an interest point of a first image should not be compared to an interest point of a second image that has a completely different descriptor. In this manner, computational resources can be saved during image similarity determination and other visual tasks related thereto.
- the locations determined in the back-projection step are defined as the interest points of the input image.
- the method continues in procedure 114 .
- a subset of the back-projected locations are employed as interest points for representing the input image.
- the selected subset of interest points should preferably correspond to the more prominent features detected by the different layers of the CNN.
- the method of the disclosed technique may include additional sub-steps 110 and 112 as detailed herein below.
- the locations corresponding to the selected feature maps are combined into a heat map.
- the heat map includes the selected top ranked values, each located in a location determined in the back-projection process.
- the heat map combines values representing salient features extracted from all layers of the CNN (i.e., features of various scale levels).
- a respective heat map is generated for each layer of the network.
- key points detected by each layer can be selected separately.
- knowledge of the scale level of each key point can be maintained and each layer can be represented separately.
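The combined-versus-per-layer heat map options above can be sketched as follows. The dictionary layout and the summation rule are assumptions for illustration; the text does not specify how back-projected values are aggregated:

```python
import numpy as np

def combine_heat_maps(back_projections, per_layer=False):
    """back_projections maps a layer index to a list of 2D back-projected
    value maps, all already in the input-image space. Returns either one
    combined heat map or a separate heat map per layer."""
    layer_maps = {layer: np.sum(maps, axis=0)
                  for layer, maps in back_projections.items()}
    if per_layer:
        return layer_maps                # maintain the scale level of each layer
    return np.sum(list(layer_maps.values()), axis=0)   # all scale levels combined

bps = {0: [np.ones((4, 4)), np.ones((4, 4))],   # two back-projected maps, layer 0
       1: [2.0 * np.ones((4, 4))]}              # one back-projected map, layer 1
combined = combine_heat_maps(bps)
separate = combine_heat_maps(bps, per_layer=True)
```

The `per_layer=True` path corresponds to selecting key points separately per layer, so each key point's scale level is preserved.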
- the selected highest values (i.e., the locations corresponding to the selected feature maps attributed with the top ranked values) are combined into a heat map. Each selected value is located in its respective location within the image space of input image 12 , as determined by back-projection.
- a plurality of interest points are extracted from the heat map (or heat maps).
- the interest points can be, for example, the peaks in the intensity map (e.g., global peaks or local peaks).
- the interest points are the centers of the densest portions of the heat map.
- any intensity based method for selecting key points out of the locations determined by back-projection of the detected salient features can be employed.
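Peak extraction from the heat map can be sketched as a local-maximum test. The 3x3 neighborhood, the strictness requirement, and the minimum-value cutoff are assumptions; any intensity-based key point selector would do:

```python
import numpy as np

def local_peaks(heat_map, min_value=0.0):
    """Return (row, col) locations that are strict maxima of their
    3x3 neighborhood and exceed min_value (border pixels are skipped)."""
    H, W = heat_map.shape
    peaks = []
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            window = heat_map[i - 1:i + 2, j - 1:j + 2]
            v = heat_map[i, j]
            # strict, unique maximum of the neighborhood
            if v > min_value and v == window.max() and (window == v).sum() == 1:
                peaks.append((i, j))
    return peaks

hm = np.zeros((6, 6))
hm[1, 1] = 5.0    # one salient back-projected feature
hm[4, 4] = 3.0    # another, weaker one
print(local_peaks(hm))  # → [(1, 1), (4, 4)]
```

For the densest-portion variant described above, a mean-shift or clustering step over the heat map mass could replace the local-maximum test.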
- the extracted interest points are employed for representing the input image for performing various visual tasks. With reference to FIG. 1A , interest points are extracted from the heat map, and can be employed for representing input image 12 .
- the input image is represented as a graph according to the extracted interest points and the geometric relations between them.
- the geometric relations between the interest points can be, for example, the distance between pairs of points and the angles between triplets of points.
- the graph image representation maintains data respective of the geometric relations between the interest points and thereby, can improve the results of various visual tasks, such as similarity based image search.
- procedure 114 is optional and the method can stop after procedure 112 (or even after procedure 108 ) and represent the image as a set of key points (interest points).
- input image 12 is represented as a graph according to the extracted interest points and the geometric relations between the interest points.
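The graph representation, with distances between pairs of points and angles between triplets, can be sketched as follows. Storing the angle at the middle point of each triplet is an assumption about the exact convention, which the text leaves open:

```python
import math
from itertools import combinations

def build_graph(points):
    """Represent an image as a graph: vertices are interest points, edges carry
    pairwise distances, and each triplet carries the angle at its middle point."""
    edges = {}
    for a, b in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[a], points[b]
        edges[(a, b)] = math.hypot(x2 - x1, y2 - y1)
    angles = {}
    for a, b, c in combinations(range(len(points)), 3):
        # angle at vertex b, formed by the rays toward points a and c
        v1 = (points[a][0] - points[b][0], points[a][1] - points[b][1])
        v2 = (points[c][0] - points[b][0], points[c][1] - points[b][1])
        cosang = (v1[0]*v2[0] + v1[1]*v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        angles[(a, b, c)] = math.degrees(math.acos(max(-1.0, min(1.0, cosang))))
    return edges, angles

pts = [(0, 0), (3, 0), (0, 4)]          # three extracted interest points
edges, angles = build_graph(pts)
# edges[(0, 1)] == 3.0, edges[(0, 2)] == 4.0, edges[(1, 2)] == 5.0
```

Because distances and angles are stored explicitly, two such graphs can be compared for similarity-based image search without re-deriving geometry from raw coordinates.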
- FIG. 3 is a schematic illustration of a system, generally referenced 150 , for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique.
- System 150 includes a CNN trainer 152 , a CNN executer 154 , a top ranked values selector 156 , a feature back-projector 158 , a heat map generator 160 , an interest point extractor 162 , an image representer 164 , and a storage device 168 .
- Storage device 168 is coupled with each of CNN trainer 152 , CNN executer 154 , top ranked values selector 156 , feature back-projector 158 , heat map generator 160 , interest point extractor 162 , and image representer 164 for enabling the different components of system 150 to store and retrieve data. It is noted that all components except storage device 168 can be embedded on a single processing device or on an array of interconnected processing devices. For example, components 152 - 164 may all be embedded on a single Graphics Processing Unit (GPU) 166 , or on a single Central Processing Unit (CPU) 166 .
- Storage device 168 can be any storage device, such as a magnetic storage device (e.g., a Hard Disk Drive—HDD), an optical storage device, and the like.
- CNN trainer 152 retrieves a CNN architecture and a training image data set from storage device 168 or from another external data source. CNN trainer 152 executes the CNN on the images of the training image data set, and accordingly trains the CNN to detect features pertinent to those images. CNN trainer 152 stores the trained CNN on storage device 168 .
- CNN executer 154 retrieves the trained CNN from storage device 168 and further retrieves an input image to be represented as a graph according to interest points detected by applying the trained CNN on the input image. CNN executer applies the trained CNN to the input image.
- top ranked values selector 156 selects the top ranked values produced in response to the convolution of the various filters applied on the input to the respective layer.
- the top ranked values indicate that the filter that produced the high value is pertinent to the input image and therefore should be included in the image graph representation.
- Feature back-projector 158 retrieves the top ranked values and performs back-projection for each top ranked value.
- feature back-projector maps the top ranked value onto a respective location in the image space of the input image. That is, feature back-projector 158 determines for each selected value the location in the input image that when convolved with a respective filter of a respective convolutional layer produced the selected high value.
- Heat map generator 160 combines all back-projected top ranked values into a single heat map including each back-projected value positioned at its respective location within the image space of the input image, as determined by feature back-projector 158 .
- Interest point extractor 162 extracts interest points (e.g., intensity based interest points) from the heat map produced by heat map generator 160 .
- Each extracted interest point is associated with a location within the image space of the input image (e.g., the coordinates of the interest point). Additionally, the interest point extractor can also determine a descriptor for each of the extracted interest points.
- Image representer 164 represents the input image as a graph based on the extracted interest points and the geometric relations between the interest points (e.g., distance and angles between interest points) as determined according to the location of the extracted interest points.
- the method and system of the disclosed technique were exemplified by a CNN.
- the disclosed technique is not limited to CNNs only, and is applicable to other artificial neural networks as well.
- High value features detected by the nodes of the neural network (e.g., a feed-forward neural network, or any other configuration of artificial neural network) are mapped back to the image space of the input image, and key points (interest points) are selected therefrom.
- Optionally, only a subset of the detected features activates subsequent nodes (or is employed for detecting key points), for reducing computational cost and/or for filtering out features that are less pertinent.
- the key points are employed for representing the input image for performing various visual tasks. In this manner, the input image is represented by features learned and detected by the neural network that are better suited for representing the input image than conventional features (not specifically adapted to the input image).
Abstract
A method for representing an input image, the method including the steps of applying a trained neural network (NN) on the input image, selecting a plurality of feature maps, determining a location of each of the feature maps in an image space of the input image, defining a plurality of interest points of the input image, representing the input image as a graph according to the interest points and geometric relations between the interest points, and employing the graph for performing a visual task, the graph including a plurality of vertices and edges, and maintaining the data respective of the geometric relations, the feature maps being selected of an output of at least one selected layer of the trained NN according to values attributed to the feature maps by the trained NN, the interest points of the input image being defined based on the locations corresponding to the feature maps.
Description
- This application is a Continuation of U.S. application Ser. No. 14/676,404, filed 1 Apr. 2015, which claims benefit of Serial No. 231862, filed 1 Apr. 2014 in Israel and which applications are incorporated herein by reference. To the extent appropriate, a claim of priority is made to the above disclosed applications.
- The disclosed technique relates to image representation in general, and to methods and systems for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, in particular.
- For many visual tasks, the manner in which an image is represented can have a substantial effect on both the performance and the results of the visual task. Convolutional neural networks (CNNs), as known in the art, can learn to produce multiscale representations of an image. The features extracted by a convolutional neural network are pertinent to the image on which the network is applied.
- An article by Krizhevsky et al., entitled "ImageNet Classification with Deep Convolutional Neural Networks" published in the proceedings from the conference on Neural Information Processing Systems 2012, describes the architecture and operation of a deep convolutional neural network. The CNN of this publication includes eight learned layers (five convolutional layers and three fully-connected layers). The pooling layers in this publication employ tiles that cover their respective input in an overlapping manner. The detailed CNN is employed for image classification.
- An article by Zeiler et al., entitled “Visualizing and Understanding Convolutional Networks” published on http://arxiv.org/abs/1311.2901v3, is directed to a visualization technique that gives insight into the function of intermediate feature layers of a CNN. The visualization technique shows a plausible and interpretable input pattern (situated in the original input image space) that gives rise to a given activation in the feature maps. The visualization technique employs a multi-layered de-convolutional network. A de-convolutional network employs the same components as a convolutional network (e.g., filtering and pooling) but in reverse. Thus, this article describes mapping detected features in the produced feature maps to the image space of the input image. In this article, the de-convolutional networks are employed as a probe of an already trained convolutional network.
- An article by Simonyan et al., entitled “Deep Inside Convolutional Networks: Visualizing Image Classification Models and Saliency Maps” published on http://arxiv.org/abs/1312.6034, is directed to visualization of image classification models, learnt using deep Convolutional Networks (ConvNets). This article describes two visualization techniques. The first one generates an image for maximizing the class score based on computing the gradient of the class score with respect to the input image. The second one involves computing a class saliency map, specific to a given image and class.
- Reference is now made to US Patent Application Publication Number 2010/0266200 to Atallah et al., and entitled "Image Analysis through Neural Network Using Image Average Color". This publication is directed to a computer-implemented image system. The system includes an analysis component and a classification component. The analysis component analyzes image characteristics of an image, including an average color value. The classification component includes a self-organizing map (e.g., Kohonen neural network) for classifying the image relative to a second image based on classification information computed from the average color value.
- It is an object of the disclosed technique to provide a novel method and system for representing an input image as a set of interest points detected by applying a trained Neural Network (NN) on the input image. In accordance with an embodiment of the disclosed technique, there is thus provided a method for representing an input image. The method includes the steps of applying a trained neural network on the input image, selecting a plurality of feature maps of an output of at least one selected layer of the trained neural network, determining a location corresponding to each of the plurality of feature maps in an image space of the input image, and defining a plurality of interest points of the input image for representing said input image. The feature maps are selected according to values attributed thereto by the trained neural network. The interest points are defined based on the determined locations corresponding to the feature maps.
- The disclosed technique will be understood and appreciated more fully from the following detailed description taken in conjunction with the drawings in which:
- FIGS. 1A and 1B are schematic illustrations of a convolutional neural network, constructed and operative in accordance with an embodiment of the disclosed technique;
- FIG. 2 is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique; and -
FIG. 3 is a schematic illustration of a system for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique. - The disclosed technique overcomes the disadvantages of the prior art by providing a method and a system for representing an input image as a set of interest points (or key points) detected by applying a trained Neural Network (e.g., a Convolutional Neural Network—CNN) on the input image. The input image is run through the trained CNN and the most prominent extracted features (i.e., salient features) of the layers of the trained CNN are back-projected onto the image space of the original input image. The back-projected features are all combined into a single intensity map, or heat map. Interest points are extracted from the heat map. Each interest point is defined by a distinct location in the image space of the input image, and can be associated with a respective descriptor. Furthermore, the geometric relations between the extracted interest points are determined according to the locations of the interest points.
- Thereafter, the input image can be represented as a graph according to the extracted interest points and the geometric relations between the interest points. The graph representation of the input image can then be employed for various visual tasks, such as determining image similarity, similarity-based image search, and the like.
- It is noted that the features detected by applying the trained CNN on the input image are features that are relevant to the input image. That is, the input image is expressed through the features that are attributed with the greatest values, and which can therefore be considered as most pertinent to the image. In particular, the input image might be better expressed by the features learned and detected by the CNN, than by predetermined conventional features not adapted specifically to the analyzed input image. Thus, these high value features represent the input image in an optimized manner and can provide better results when employed for various visual tasks (as compared to conventional features). To sum up, the disclosed technique represents an image by employing key points (interest points) that correspond to multi-scale salient features of the image as detected by the CNN.
- Reference is now made to
FIGS. 1A and 1B , which are schematic illustrations of a Convolutional Neural Network (CNN), generally referenced 10, constructed and operative in accordance with an embodiment of the disclosed technique. FIG. 1A depicts an overview of CNN 10, and FIG. 1B depicts a selected convolutional layer of CNN 10. - With reference to
FIG. 1A , CNN 10 includes an input image 12, followed by first and second convolutional layers 14 and 18 with their respective outputs 16 and 20. The output 20 of the second convolutional layer is then vectorized in vectorizing layer 22. A vectorization output 24 is fed into a layered, fully connected, neural network (not referenced). In the example set forth in FIG. 1A , the fully connected neural network of CNN 10 includes three fully connected layers. - Each of the fully connected layers produces a respective output (outputs 28, 32, and 36). The last fully connected layer 34 is typically a normalization layer so that the final elements of an output vector 36 are bounded in some fixed, interpretable range. The parameters of each convolutional layer and each fully connected layer are set during a training (i.e., learning) period of CNN 10. - The structure and operation of each of the convolutional layers and the fully connected layers is further detailed in the following paragraphs. With reference to
FIG. 1B , each input to a convolutional layer is a multichannel feature map 52 that is represented by a three-dimensional (3D) matrix. For example, a color input image may contain the various color intensity channels. The depth dimension of the input 3D matrix, representing feature map 52, is defined by the channels of multichannel feature map 52. For instance, for an input image having three color channels, the 3D matrix could be an M×N×3 matrix (i.e., the depth dimension has a value of three). The horizontal and vertical dimensions of 3D matrix 52 (i.e., the height and width of matrix 52) are defined by the respective dimensions of the input image. - The input is convolved with
filters 54 that are set in the training stage of CNN 10. While each of filters 54 has the same depth as input feature map 52, the horizontal and vertical dimensions of the filter may vary. Each of filters 54 is convolved with layer input 52 to generate a two-dimensional (2D) matrix 56. - Subsequently, an optional
max pooling operation 58 is applied to produce feature maps 60. In other words, the output 56 of the convolutional layer enters max pooling layer 58 (i.e., performing the max pooling operation), whose outputs are feature maps 60. These 2D feature maps 60 are then stacked to yield a 3D output matrix 62. Both convolution and max pooling operations involve various strides (or incremental steps) by which the respective input is horizontally and vertically traversed. - Each of convolutional layer outputs 16 and 20, and fully connected layer outputs 28, 32, and 36, details the image structures (i.e., features) that best matched the filters of the respective layer, thereby "detecting" those image structures. In general, each of convolutional layer outputs 16 and 20, and fully connected layer outputs 28, 32, and 36, detects image structures in an escalating manner such that the deeper layers detect features of greater complexity. For example, it has been empirically demonstrated that the first
convolutional layer 14 detects edges, and the second convolutional layer 18, which is deeper than first layer 14, may detect object attributes, such as curvature and texture. It is noted that CNN 10 (FIG. 1A ) can include other numbers of convolutional layers, such as a single layer, four layers, five layers, and the like. -
Max pooling layer 58 selects the input feature map values of greatest magnitude (i.e., indicating that the filters that produced those largest feature map values can serve as salient feature detectors). Max pooling layer 58 demarcates its input into a set of overlapping or non-overlapping tiles and, for each such tile, outputs the maximum value. Thus, max pooling layer 58 reduces the computational cost for deeper layers (i.e., max pooling layer 58 serves as a sub-sampling or down-sampling layer). - It is noted that a convolution layer can be augmented with a rectified linear operation and a
max pooling layer 58 can be augmented with normalization (e.g., local response normalization, as described, for example, in the Krizhevsky article referenced in the background section herein above). Alternatively, max pooling layer 58 can be replaced by another feature-pooling layer, such as an average pooling layer, a quantile pooling layer, or a rank pooling layer. Fully connected layers - In the example set forth in
FIGS. 1A and 1B , CNN 10 includes two convolutional layers and three fully connected layers. However, the disclosed technique can be implemented by employing CNNs having more, or fewer, layers (e.g., three convolutional layers and five fully connected layers). Moreover, other parameters and characteristics of the CNN can be adapted according to the specific task, available resources, user preferences, the training set, the input image, and the like. Additionally, the disclosed technique is also applicable to other types of artificial neural networks (besides CNNs). - It is noted that the salient features detected by the neural network are regions, or patches, of the input image which are attributed with high values when convolved with the filters of the neural network. For example, the salient features can range from simple corners to semantic object parts, such as an eye of a person, a whole head or face, or a car wheel, depending on the input image.
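By way of illustration only (and not as part of the claimed method), the convolution and max pooling operations described above with reference to FIG. 1B can be sketched in NumPy as follows; the shapes, strides, and function names are assumptions chosen for clarity:

```python
import numpy as np

def conv_layer(x, filters, stride=1):
    """Valid convolution of a multichannel input (H, W, C) with a bank of
    filters (K, fh, fw, C); each filter has the same depth as the input,
    and each produces one 2D output map, stacked into a 3D output."""
    K, fh, fw, C = filters.shape
    H, W, _ = x.shape
    oh = (H - fh) // stride + 1
    ow = (W - fw) // stride + 1
    out = np.zeros((oh, ow, K))
    for k in range(K):
        for i in range(oh):
            for j in range(ow):
                patch = x[i*stride:i*stride+fh, j*stride:j*stride+fw, :]
                out[i, j, k] = np.sum(patch * filters[k])
    return out

def max_pool(x, tile=2, stride=2):
    """Max pooling per channel over tiles traversed with the given stride
    (overlapping when stride < tile, non-overlapping when stride == tile)."""
    H, W, C = x.shape
    oh = (H - tile) // stride + 1
    ow = (W - tile) // stride + 1
    out = np.zeros((oh, ow, C))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*stride:i*stride+tile,
                          j*stride:j*stride+tile, :].max(axis=(0, 1))
    return out
```

For example, an 8×8×3 input convolved with four 3×3×3 filters yields a 6×6×4 output, which a 2×2 max pooling with stride 2 reduces to 3×3×4.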
- Reference is now made to
FIG. 2 , which is a schematic illustration of a method for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, operative in accordance with another embodiment of the disclosed technique. In procedure 100, a trained Neural Network (e.g., a trained Convolutional Neural Network, or CNN) is received. The CNN may include convolutional layers and fully connected layers. With reference to FIG. 1A , CNN 10 is received after being trained with a selected training set. - In
procedure 102, the trained CNN is applied on an input image. The input image may, or may not, be related to the training set employed for training the neural network. That is, there is no requirement to use a training image, or to use an image from an image class found in the training set. The input image conforms to the expected input dimensions of the trained CNN. As such, the input image may require resizing and cropping, for example, for adapting it to the input dimensions of the CNN. Additionally, a pixel-based mean image, as determined in the training phase (i.e., mean image of the image training set), may be subtracted from the input image. With reference to FIG. 1A , input image 12 is inputted into CNN 10 as a multichannel feature map represented by a 3D matrix. In general, the input image has to undergo the same (or similar) preprocessing that was applied to every image when training the neural network. - In
procedure 104, a plurality of feature maps from the output of the layers of the neural network are selected according to their values. The feature maps are produced in response to convolution of the various filters with the layer input. In particular, for each layer of the trained CNN, the feature maps that are attributed with the top ranked values are selected. That is, the highest valued feature maps at the output of the convolutional layer (or the fully connected layer) are selected. Alternatively, the highest valued feature maps can be selected at any stage following the convolution operation, for example prior to max pooling (i.e., even if the convolutional layer includes the optional max pooling operation). - The applied filters of the layers of the trained CNN serve as feature detectors that detect the locations of the layer input that have high correspondence with the filters. The feature maps having the top ranked values (i.e., also referred to as top ranked feature maps or top ranked values) represent the locations within the layer input that showed the greatest correspondence to the applied filters. Thus, the top ranked values represent salient features of the layer input as detected by the filter detectors of the respective layer.
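A minimal sketch of this selection of top ranked feature-map values (by threshold, by the N highest values, or by the upper P% of values) might look as follows; the function name and interface are illustrative assumptions rather than part of the disclosed system:

```python
import numpy as np

def top_ranked_indices(values, n=None, percent=None, threshold=None):
    """Return indices of top ranked feature-map values, selected either by
    a fixed threshold, by the N highest values, or by the upper P% of values."""
    values = np.asarray(values, dtype=float)
    if threshold is not None:
        return np.flatnonzero(values > threshold)
    if percent is not None:
        n = max(1, int(round(values.size * percent / 100.0)))
    order = np.argsort(values)[::-1]  # indices sorted by descending value
    return order[:n]
```

For instance, `top_ranked_indices([5, 1, 9, 3], n=2)` selects the indices of the two largest values (9 and 5).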
- It is noted that the top ranked values can be selected “on the fly” during application of the trained CNN on the input image. That is, as a convolutional layer processes its respective input and produces respective output, the largest output values are selected. The top ranked values can be selected such that a selected percentage or quantity of values is selected (e.g., the upper 15% or the largest 1000 values), or can be selected such that only values exceeding a threshold are selected. With reference to
FIG. 1B , the greatest values of layer output 62 are selected. - In
procedure 106, the locations corresponding to the selected feature maps (i.e., feature maps having the top ranked values) in an image space of the input image are determined. The determination of these locations within the image space of the input image is also referred to herein as back-projection of the features that are represented by the selected top ranked values. In other words, in the back-projection process, each selected top ranked value (i.e., high value feature map), selected for each layer of the CNN, is mapped back to the image space of the original image. - The back-projection of the top ranked values to the image space of the input image is performed, for example, by employing a de-convolutional network. Alternatively, the back-projection is performed by a simple backpropagation (i.e., the neural network technique used for training, as described, for example, in the Simonyan article referenced in the background section herein above). In particular, and as described, for example, in the Zeiler article referenced in the background section herein above, to approximately invert the convolutional step we may use any technique from the Blind Source Separation field, for example, a sparsity-based approach. Alternatively, a matched filter approach can be employed for inverting the convolutional step. To approximately invert the max pooling operation, the stored masks can be used to place the max values in their appropriate input locations (i.e., zeroes are placed by default). Generally, any technique for mapping the selected high valued feature maps back to the image space of the input image can be applied. For example, the method of the disclosed technique can involve tracking all potential features (i.e., image patches or image regions detected by the neural network) throughout the network, thereby avoiding the need for back-projecting the features.
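The approximate inversion of the max pooling operation by means of stored masks, as described above, can be sketched for a single 2D channel as follows (an illustrative assumption of non-overlapping tiles; zeroes are placed by default, as noted):

```python
import numpy as np

def max_pool_with_mask(x, tile=2):
    """Non-overlapping max pooling over a 2D map that also records the
    location of each tile's maximum, i.e., the mask needed for unpooling."""
    h, w = x.shape
    oh, ow = h // tile, w // tile
    pooled = np.zeros((oh, ow))
    mask = np.zeros_like(x, dtype=bool)
    for i in range(oh):
        for j in range(ow):
            t = x[i*tile:(i+1)*tile, j*tile:(j+1)*tile]
            r, c = np.unravel_index(np.argmax(t), t.shape)
            pooled[i, j] = t[r, c]
            mask[i*tile + r, j*tile + c] = True
    return pooled, mask

def unpool(pooled, mask, tile=2):
    """Approximate inverse of max pooling: each max value is placed back
    at its recorded input location; all other entries default to zero."""
    out = np.zeros(mask.shape)
    oh, ow = pooled.shape
    for i in range(oh):
        for j in range(ow):
            r, c = np.argwhere(mask[i*tile:(i+1)*tile,
                                    j*tile:(j+1)*tile])[0]
            out[i*tile + r, j*tile + c] = pooled[i, j]
    return out
```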
For example, a selected image patch at the input to the first layer is tracked and the value attributed to that image patch by each of the filters of the first layer is recorded. Thus, the output of the first layer that is associated with the selected image patch is known. Similarly, the output of the first layer, associated with the selected image patch, that enters the second layer as input, is tracked, and so forth. Thereby, the output of each subsequent layer that is associated with the selected image patch is determined. With reference to
FIG. 1A , the selected highest (top ranked) values are back-projected to the image space of input image 12. - In
procedure 108, a plurality of interest points of the input image are defined based on the locations corresponding to the selected feature maps. Each interest point is associated with a distinct position within the image space of the input image. Thus, the geometric relations between the interest points (e.g., the distances and/or the angles between the interest points) can be determined according to the location of each interest point. Additionally, a descriptor can be determined for each interest point. The descriptor of an interest point provides further information about the interest point. For example, in case the interest points are employed for determining image similarity, an interest point of a first image should not be compared to an interest point of a second image having a completely different descriptor. In this manner, computational resources can be saved during image similarity determination, and other visual tasks related thereto. - In accordance with the simplest (though not the most cost effective) embodiment of the disclosed technique, the locations determined in the back-projection step are defined as the interest points of the input image. In this case, after
procedure 108, the method continues in procedure 114. However, for reducing the number of interest points (i.e., thereby reducing the computational cost of the visual task performed based on the representation of the input image), only a subset of the back-projected locations are employed as interest points for representing the input image. Furthermore, the selected subset of interest points should preferably correspond to the more prominent features detected by the different layers of the CNN. Thus, for choosing the interest points that correspond to the highest back-projected values (i.e., corresponding to the most prominent salient features detected by the different layers of the CNN), the method of the disclosed technique may include additional sub-steps - In
procedure 110, the locations corresponding to the selected feature maps are combined into a heat map. The heat map includes the selected top ranked values, each located in a location determined in the back-projection process. Thereby, the heat map combines values representing salient features extracted from all layers of the CNN (i.e., features of various scale levels). Alternatively, a respective heat map is generated for each layer of the network. Thus, key points detected by each layer can be selected separately. Thereby, for example, knowledge of the scale level of each key point can be maintained and each layer can be represented separately. With reference to FIG. 1A , the selected highest values (i.e., the locations corresponding to the selected feature maps attributed with the top ranked values) are combined into a heat map. Each selected value is located in its respective location within the image space of input image 12 as determined by back-projection. - In
procedure 112, a plurality of interest points are extracted from the heat map (or heat maps). The interest points can be, for example, the peaks in the intensity map (e.g., global peaks or local peaks). Alternatively, the interest points are the centers of the densest portions of the heat map. Generally, any intensity based method for selecting key points out of the locations determined by back-projection of the detected salient features can be employed. The extracted interest points are employed for representing the input image for performing various visual tasks. With reference to FIG. 1A , interest points are extracted from the heat map, and can be employed for representing input image 12. - In
procedure 114, the input image is represented as a graph according to the extracted interest points and the geometric relations between them. The geometric relations between the interest points can be, for example, the distance between pairs of points and the angles between triplets of points. The graph image representation maintains data respective of the geometric relations between the interest points and, thereby, can improve the results of various visual tasks, such as similarity based image search. It is noted that procedure 114 is optional and the method can stop after procedure 112 (or even after procedure 108) and represent the image as a set of key points (interest points). With reference to FIG. 1A , input image 12 is represented as a graph according to the extracted interest points and the geometric relations between the interest points. - Reference is now made to
FIG. 3 , which is a schematic illustration of a system, generally referenced 150, for representing an input image as a graph according to interest points detected by applying a trained convolutional neural network on the input image, constructed and operative in accordance with a further embodiment of the disclosed technique. System 150 includes a CNN trainer 152, a CNN executer 154, a top ranked values selector 156, a feature back-projector 158, a heat map generator 160, an interest point extractor 162, an image representer 164, and a storage device 168. -
Storage device 168 is coupled with each of CNN trainer 152, CNN executer 154, top ranked values selector 156, feature back-projector 158, heat map generator 160, interest point extractor 162, and image representer 164 for enabling the different components of system 150 to store and retrieve data. It is noted that all components except storage device 168 can be embedded on a single processing device or on an array of processing devices connected there-between. For example, components 152-164 are all embedded on a single graphics processing unit (GPU) 166, or a single Central Processing Unit (CPU) 166. Storage device 168 can be any storage device, such as a magnetic storage device (e.g., a Hard Disc Drive, or HDD), an optic storage device, and the like. -
CNN trainer 152 retrieves a CNN architecture and a training image data set from storage device 168 or from another external data source. CNN trainer 152 executes the CNN on the images of the training image data set, and accordingly trains the CNN to detect features pertinent to those images. CNN trainer 152 stores the trained CNN on storage device 168. -
CNN executer 154 retrieves the trained CNN from storage device 168 and further retrieves an input image to be represented as a graph according to interest points detected by applying the trained CNN on the input image. CNN executer 154 applies the trained CNN to the input image. - During execution of the trained CNN, top ranked
values selector 156 selects the top ranked values produced in response to the convolution of the various filters applied on the input to the respective layer. A top ranked value indicates that the filter that produced it is pertinent to the input image and should therefore be included in the image graph representation. - Feature back-projector 158 retrieves the top ranked values and performs back-projection for each top ranked value. In other words, for each selected top ranked value, feature back-projector 158 maps the top ranked value onto a respective location in the image space of the input image. That is, feature back-projector 158 determines, for each selected value, the location in the input image that, when convolved with a respective filter of a respective convolutional layer, produced the selected high value. Heat map generator 160 combines all back-projected top ranked values into a single heat map including each back-projected value positioned at its respective location within the image space of the input image, as determined by feature back-projector 158. -
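The heat-map accumulation performed by the heat map generator and the subsequent intensity-based peak extraction can be sketched as follows; the (row, column, value) interface and the 3×3 peak window are illustrative assumptions:

```python
import numpy as np

def build_heat_map(shape, backprojections):
    """Accumulate back-projected top ranked values into a single heat map
    over the image space; `backprojections` holds (row, col, value) triples."""
    heat = np.zeros(shape)
    for r, c, v in backprojections:
        heat[r, c] += v
    return heat

def local_peaks(heat, min_value=0.0):
    """Extract interest points as strict local intensity peaks: pixels above
    `min_value` that exceed every neighbour in their 3x3 window."""
    peaks = []
    h, w = heat.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = heat[i-1:i+2, j-1:j+2]
            if (heat[i, j] > min_value and heat[i, j] == window.max()
                    and (window == heat[i, j]).sum() == 1):
                peaks.append((i, j))
    return peaks
```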
Interest point extractor 162 extracts interest points (e.g., intensity based interest points) from the heat map produced by heat map generator 160. Each extracted interest point is associated with a location within the image space of the input image (e.g., the coordinates of the interest point). Additionally, the interest point extractor can also determine a descriptor for each of the extracted interest points. Image representer 164 represents the input image as a graph based on the extracted interest points and the geometric relations between the interest points (e.g., distances and angles between interest points) as determined according to the locations of the extracted interest points. - In the examples set forth herein above with reference to
FIGS. 1A, 1B, 2 and 3, the method and system of the disclosed technique were exemplified by a CNN. However, the disclosed technique is not limited to CNNs only, and is applicable to other artificial neural networks as well. In such cases, the neural network (e.g., a feed-forward neural network, or any other configuration of artificial neural network) is applied onto an input image. High value features detected by the nodes of the network are mapped back to the image space of the input image, and key points (interest points) are selected therefrom. Optionally, only a subset of the detected features activate subsequent nodes (or are employed for detecting key points), for reducing computational cost and/or for filtering out features that are less pertinent. The key points are employed for representing the input image for performing various visual tasks. In this manner, the input image is represented by features learned and detected by the neural network that are better suited for representing the input image than conventional features (not specifically adapted to the input image). - It will be appreciated by persons skilled in the art that the disclosed technique is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the disclosed technique is defined only by the claims, which follow.
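As an illustrative sketch (not part of the claims), the graph representation of procedure 114, with interest points as vertices, pairwise Euclidean distances as edge data, and angles computable for point triplets, might be implemented as:

```python
import math
from itertools import combinations

def build_graph(points):
    """Represent an image as a graph: vertices are interest-point
    coordinates, edges carry pairwise Euclidean distances."""
    vertices = list(points)
    edges = {(i, j): math.dist(p, q)
             for (i, p), (j, q) in combinations(enumerate(vertices), 2)}
    return vertices, edges

def angle_at(a, b, c):
    """Angle (in radians) at vertex b for a triplet of interest points."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.acos(dot / (math.hypot(*v1) * math.hypot(*v2)))
```

The vertex list and distance-labeled edges preserve the geometric relations that the graph representation is required to maintain.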
Claims (6)
1. A method for representing an input image, the method comprising the following procedures:
applying a trained neural network on said input image;
selecting a plurality of feature maps of an output of at least one selected layer of said trained neural network according to values attributed to said plurality of feature maps by said trained neural network;
for each of said plurality of feature maps, determining a location corresponding thereto in an image space of said input image;
defining a plurality of interest points of said input image, based on said locations corresponding to said plurality of feature maps;
representing said input image as a graph according to said plurality of interest points and according to geometric relations between interest points of said plurality of interest points; and
employing said graph for performing a visual task,
wherein said graph comprises a plurality of vertices and edges; and
wherein said graph maintains data respective of said geometric relations between interest points.
2. The method of claim 1 , wherein said plurality of feature maps are selected according to a selected criterion of the list consisting of:
said values attributed to said plurality of feature maps exceed a threshold;
said values attributed to said plurality of feature maps being the N highest values; and
said values attributed to said plurality of feature maps being in the upper P % of values,
wherein N and P are selected numerical values.
3. The method of claim 1 , wherein said procedure of defining said plurality of interest points comprises the sub-procedures of:
combining said locations corresponding to said plurality of feature maps into at least one heat map; and
extracting said plurality of interest points from said at least one heat map.
4. The method of claim 3 , wherein each interest point of said plurality of interest points being an intensity peak of said at least one heat map.
5. The method of claim 3 , wherein each interest point of said plurality of interest points being a center of a region of said at least one heat map having high density of said locations corresponding to said plurality of feature maps, and wherein said region of said at least one heat map having high density of said locations being selected from the list consisting of:
regions having density value exceeding a threshold;
N regions having the highest density values; and
regions in the upper P % of density values,
wherein N and P are selected numerical values.
6. The method of claim 1 , further comprising the procedure of associating each one of said plurality of interest points with a respective descriptor before said procedure of representing said input image as a graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/188,729 US20160300121A1 (en) | 2014-04-01 | 2016-06-21 | Neural network image representation |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL231862 | 2014-04-01 | ||
IL231862A IL231862A (en) | 2014-04-01 | 2014-04-01 | Neural network image representation |
US14/676,404 US9396415B2 (en) | 2014-04-01 | 2015-04-01 | Neural network image representation |
US15/188,729 US20160300121A1 (en) | 2014-04-01 | 2016-06-21 | Neural network image representation |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/676,404 Continuation US9396415B2 (en) | 2014-04-01 | 2015-04-01 | Neural network image representation |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160300121A1 true US20160300121A1 (en) | 2016-10-13 |
Family
ID=51418161
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/676,404 Expired - Fee Related US9396415B2 (en) | 2014-04-01 | 2015-04-01 | Neural network image representation |
US15/188,729 Abandoned US20160300121A1 (en) | 2014-04-01 | 2016-06-21 | Neural network image representation |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/676,404 Expired - Fee Related US9396415B2 (en) | 2014-04-01 | 2015-04-01 | Neural network image representation |
Country Status (2)
Country | Link |
---|---|
US (2) | US9396415B2 (en) |
IL (1) | IL231862A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018113261A1 (en) * | 2016-12-22 | 2018-06-28 | 深圳光启合众科技有限公司 | Target object recognition method and apparatus, and robot |
WO2018221863A1 (en) * | 2017-05-31 | 2018-12-06 | Samsung Electronics Co., Ltd. | Method and device for processing multi-channel feature map images |
CN109063854A (en) * | 2018-08-23 | 2018-12-21 | 河南中裕广恒科技股份有限公司 | Intelligent O&M cloud platform system and its control method |
WO2019099515A1 (en) * | 2017-11-14 | 2019-05-23 | Magic Leap, Inc. | Fully convolutional interest point detection and description via homographic adaptation |
CN109961083A (en) * | 2017-12-14 | 2019-07-02 | 安讯士有限公司 | For convolutional neural networks to be applied to the method and image procossing entity of image |
US10769821B2 (en) | 2017-07-25 | 2020-09-08 | Nutech Company Limited | Method and device for reconstructing CT image and storage medium |
WO2020236624A1 (en) * | 2019-05-17 | 2020-11-26 | Magic Leap, Inc. | Methods and apparatuses for corner detection using neural network and corner detector |
US11119915B2 (en) | 2018-02-08 | 2021-09-14 | Samsung Electronics Co., Ltd. | Dynamic memory mapping for neural networks |
WO2021185379A1 (en) * | 2020-03-20 | 2021-09-23 | 长沙智能驾驶研究院有限公司 | Dense target detection method and system |
US20210350175A1 (en) * | 2020-05-07 | 2021-11-11 | Adobe Inc. | Key-value memory network for predicting time-series metrics of target entities |
US11270147B1 (en) | 2020-10-05 | 2022-03-08 | International Business Machines Corporation | Action-object recognition in cluttered video scenes using text |
US11348275B2 (en) * | 2017-11-21 | 2022-05-31 | Beijing Sensetime Technology Development Co. Ltd. | Methods and apparatuses for determining bounding box of target object, media, and devices |
US11423252B1 (en) | 2021-04-29 | 2022-08-23 | International Business Machines Corporation | Object dataset creation or modification using labeled action-object videos |
US20220414145A1 (en) * | 2019-08-16 | 2022-12-29 | The Toronto-Dominion Bank | Automated image retrieval with graph neural network |
US11797603B2 (en) | 2020-05-01 | 2023-10-24 | Magic Leap, Inc. | Image descriptor network with imposed hierarchical normalization |
Families Citing this family (83)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9542626B2 (en) * | 2013-09-06 | 2017-01-10 | Toyota Jidosha Kabushiki Kaisha | Augmenting layer-based object detection with deep convolutional neural networks |
US9953425B2 (en) | 2014-07-30 | 2018-04-24 | Adobe Systems Incorporated | Learning image categorization using related attributes |
US9536293B2 (en) * | 2014-07-30 | 2017-01-03 | Adobe Systems Incorporated | Image assessment using deep convolutional neural networks |
US9418458B2 (en) * | 2015-01-05 | 2016-08-16 | Superfish Ltd. | Graph image representation from convolutional neural networks |
US10872230B2 (en) * | 2015-03-27 | 2020-12-22 | Intel Corporation | Low-cost face recognition using Gaussian receptive field features |
US9436895B1 (en) * | 2015-04-03 | 2016-09-06 | Mitsubishi Electric Research Laboratories, Inc. | Method for determining similarity of objects represented in images |
AU2016261487B2 (en) * | 2015-05-11 | 2020-11-05 | Magic Leap, Inc. | Devices, methods and systems for biometric user recognition utilizing neural networks |
US20160350336A1 (en) * | 2015-05-31 | 2016-12-01 | Allyke, Inc. | Automated image searching, exploration and discovery |
EP3262569A1 (en) * | 2015-06-05 | 2018-01-03 | Google, Inc. | Spatial transformer modules |
US9971940B1 (en) * | 2015-08-10 | 2018-05-15 | Google Llc | Automatic learning of a video matching system |
US10255040B2 (en) * | 2017-05-11 | 2019-04-09 | Veridium Ip Limited | System and method for biometric identification |
US11329980B2 (en) | 2015-08-21 | 2022-05-10 | Veridium Ip Limited | System and method for biometric protocol standards |
US9852492B2 (en) * | 2015-09-18 | 2017-12-26 | Yahoo Holdings, Inc. | Face detection |
US11074492B2 (en) * | 2015-10-07 | 2021-07-27 | Altera Corporation | Method and apparatus for performing different types of convolution operations with the same processing elements |
CA2972183C (en) * | 2015-12-14 | 2018-03-27 | Motion Metrics International Corp. | Method and apparatus for identifying fragmented material portions within an image |
US10460231B2 (en) * | 2015-12-29 | 2019-10-29 | Samsung Electronics Co., Ltd. | Method and apparatus of neural network based image signal processor |
US20170236057A1 (en) * | 2016-02-16 | 2017-08-17 | Carnegie Mellon University, A Pennsylvania Non-Profit Corporation | System and Method for Face Detection and Landmark Localization |
US9916522B2 (en) * | 2016-03-11 | 2018-03-13 | Kabushiki Kaisha Toshiba | Training constrained deconvolutional networks for road scene semantic segmentation |
WO2017156547A1 (en) | 2016-03-11 | 2017-09-14 | Magic Leap, Inc. | Structure learning in convolutional neural networks |
US11461919B2 (en) | 2016-04-21 | 2022-10-04 | Ramot At Tel Aviv University Ltd. | Cascaded neural network |
GB2549554A (en) * | 2016-04-21 | 2017-10-25 | Ramot At Tel-Aviv Univ Ltd | Method and system for detecting an object in an image |
US10303977B2 (en) | 2016-06-28 | 2019-05-28 | Conduent Business Services, Llc | System and method for expanding and training convolutional neural networks for large size input images |
CN109661194B (en) | 2016-07-14 | 2022-02-25 | 奇跃公司 | Iris boundary estimation using corneal curvature |
EP3485425B1 (en) | 2016-07-14 | 2023-08-23 | Magic Leap, Inc. | Deep neural network for iris identification |
CN106251338B (en) * | 2016-07-20 | 2019-04-30 | 北京旷视科技有限公司 | Target integrity detection method and device |
WO2018014109A1 (en) * | 2016-07-22 | 2018-01-25 | 9206868 Canada Inc. | System and method for analyzing and searching for features associated with objects |
CA3034644A1 (en) | 2016-08-22 | 2018-03-01 | Magic Leap, Inc. | Augmented reality display device with deep learning sensors |
EP3300002A1 (en) | 2016-09-22 | 2018-03-28 | Styria medijski servisi d.o.o. | Method for determining the similarity of digital images |
RU2016138608A | 2016-09-29 | 2018-03-30 | Magic Leap, Inc. | Neural network for eye image segmentation and image quality assessment |
KR102216019B1 (en) | 2016-10-04 | 2021-02-15 | 매직 립, 인코포레이티드 | Efficient data layouts for convolutional neural networks |
US10552709B2 (en) * | 2016-10-05 | 2020-02-04 | Ecole Polytechnique Federale De Lausanne (Epfl) | Method, system, and device for learned invariant feature transform for computer images |
CN106529578A (en) * | 2016-10-20 | 2017-03-22 | 中山大学 | Vehicle brand model fine identification method and system based on depth learning |
US10339651B2 (en) * | 2016-10-28 | 2019-07-02 | International Business Machines Corporation | Simultaneous feature extraction and dictionary learning using deep learning architectures for characterization of images of heterogeneous tissue samples |
US10573040B2 (en) * | 2016-11-08 | 2020-02-25 | Adobe Inc. | Image modification using detected symmetry |
US10733505B2 (en) | 2016-11-10 | 2020-08-04 | Google Llc | Performing kernel striding in hardware |
JP6854344B2 (en) | 2016-11-15 | 2021-04-07 | マジック リープ, インコーポレイテッドMagic Leap,Inc. | Deep machine learning system for rectangular parallelepiped detection |
US10360494B2 (en) * | 2016-11-30 | 2019-07-23 | Altumview Systems Inc. | Convolutional neural network (CNN) system based on resolution-limited small-scale CNN modules |
KR20230070318A (en) | 2016-12-05 | 2023-05-22 | 매직 립, 인코포레이티드 | Virual user input controls in a mixed reality environment |
US11010431B2 (en) * | 2016-12-30 | 2021-05-18 | Samsung Electronics Co., Ltd. | Method and apparatus for supporting machine learning algorithms and data pattern matching in ethernet SSD |
US10546231B2 (en) * | 2017-01-23 | 2020-01-28 | Fotonation Limited | Method for synthesizing a neural network |
US10198655B2 (en) * | 2017-01-24 | 2019-02-05 | Ford Global Technologies, Llc | Object detection using recurrent neural network and concatenated feature map |
TWI617993B (en) * | 2017-03-03 | 2018-03-11 | 財團法人資訊工業策進會 | Recognition system and recognition method |
GB201703602D0 (en) * | 2017-03-07 | 2017-04-19 | Selerio Ltd | Multi-Modal image search |
AU2018236433B2 (en) | 2017-03-17 | 2022-03-03 | Magic Leap, Inc. | Room layout estimation methods and techniques |
US10496699B2 (en) * | 2017-03-20 | 2019-12-03 | Adobe Inc. | Topic association and tagging for dense images |
US10147019B2 (en) * | 2017-03-20 | 2018-12-04 | Sap Se | Small object detection |
CN107203765B (en) * | 2017-03-30 | 2023-08-25 | 腾讯科技(上海)有限公司 | Sensitive image detection method and device |
US10783393B2 (en) | 2017-06-20 | 2020-09-22 | Nvidia Corporation | Semi-supervised learning for landmark localization |
CN109214238B (en) * | 2017-06-30 | 2022-06-28 | 阿波罗智能技术(北京)有限公司 | Multi-target tracking method, device, equipment and storage medium |
US10380413B2 (en) | 2017-07-13 | 2019-08-13 | Robert Bosch Gmbh | System and method for pose-invariant face alignment |
KR102666475B1 (en) | 2017-07-26 | 2024-05-21 | 매직 립, 인코포레이티드 | Training a neural network with representations of user interface devices |
US10275646B2 (en) * | 2017-08-03 | 2019-04-30 | Gyrfalcon Technology Inc. | Motion recognition via a two-dimensional symbol having multiple ideograms contained therein |
US10521661B2 (en) | 2017-09-01 | 2019-12-31 | Magic Leap, Inc. | Detailed eye shape model for robust biometric applications |
US10268205B2 (en) * | 2017-09-13 | 2019-04-23 | TuSimple | Training and testing of a neural network method for deep odometry assisted by static scene optical flow |
CA3068481A1 (en) | 2017-09-20 | 2019-03-28 | Magic Leap, Inc. | Personalized neural network for eye tracking |
CN107680088A (en) * | 2017-09-30 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Method and apparatus for analyzing medical image |
US11093832B2 (en) | 2017-10-19 | 2021-08-17 | International Business Machines Corporation | Pruning redundant neurons and kernels of deep convolutional neural networks |
US10535155B2 (en) * | 2017-10-24 | 2020-01-14 | Toyota Motor Engineering & Manufacturing North America, Inc. | Systems and methods for articulated pose estimation |
IL273991B2 (en) | 2017-10-26 | 2023-11-01 | Magic Leap Inc | Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks |
EP3741109B1 (en) | 2018-01-17 | 2024-04-24 | Magic Leap, Inc. | Eye center of rotation determination, depth plane selection, and render camera positioning in display systems |
CN108399382A (en) | 2018-02-13 | 2018-08-14 | 阿里巴巴集团控股有限公司 | Vehicle insurance image processing method and device |
US10855986B2 (en) | 2018-05-29 | 2020-12-01 | Qualcomm Incorporated | Bandwidth compression for neural network systems |
US10671891B2 (en) | 2018-07-19 | 2020-06-02 | International Business Machines Corporation | Reducing computational costs of deep reinforcement learning by gated convolutional neural network |
US11567336B2 (en) | 2018-07-24 | 2023-01-31 | Magic Leap, Inc. | Display systems and methods for determining registration between display and eyes of user |
CN109101919B (en) * | 2018-08-03 | 2022-05-10 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
US10936907B2 (en) | 2018-08-10 | 2021-03-02 | Buffalo Automation Group Inc. | Training a deep learning system for maritime applications |
US10782691B2 (en) | 2018-08-10 | 2020-09-22 | Buffalo Automation Group Inc. | Deep learning and intelligent sensing system integration |
US10984262B2 (en) * | 2018-10-08 | 2021-04-20 | StradVision, Inc. | Learning method and testing method for monitoring blind spot of vehicle, and learning device and testing device using the same |
JP7357674B2 (en) * | 2018-10-24 | 2023-10-06 | クライメイト、リミテッド、ライアビリティー、カンパニー | Plant disease infection detection with improved machine learning |
US20210406695A1 (en) * | 2018-11-06 | 2021-12-30 | Emory University | Systems and Methods for Training an Autoencoder Neural Network Using Sparse Data |
US10977548B2 (en) | 2018-12-05 | 2021-04-13 | Bank Of America Corporation | Generation of capsule neural networks for enhancing image processing platforms |
KR102046113B1 (en) * | 2019-03-19 | 2019-11-18 | 주식회사 루닛 | Machine-learning method for neural network and apparatus thereof |
US11410016B2 (en) | 2019-04-26 | 2022-08-09 | Alibaba Group Holding Limited | Selective performance of deterministic computations for neural networks |
WO2020236993A1 (en) | 2019-05-21 | 2020-11-26 | Magic Leap, Inc. | Hand pose estimation |
CN110211164B (en) * | 2019-06-05 | 2021-05-07 | 中德(珠海)人工智能研究院有限公司 | Picture processing method of characteristic point operator based on neural network learning basic graph |
CN114424147A (en) | 2019-07-16 | 2022-04-29 | 奇跃公司 | Determining eye rotation center using one or more eye tracking cameras |
CN111368853A (en) * | 2020-02-04 | 2020-07-03 | 清华珠三角研究院 | Label construction method, system, device and storage medium |
CN111310509A (en) * | 2020-03-12 | 2020-06-19 | 北京大学 | Real-time bar code detection system and method based on logistics waybill |
CN113435233B (en) * | 2020-03-23 | 2024-03-05 | 北京金山云网络技术有限公司 | Pornographic image recognition method and system and electronic equipment |
US11500939B2 (en) * | 2020-04-21 | 2022-11-15 | Adobe Inc. | Unified framework for multi-modal similarity search |
US11532147B2 (en) * | 2020-09-25 | 2022-12-20 | Microsoft Technology Licensing, Llc | Diagnostic tool for deep learning similarity models |
CN113205481A (en) * | 2021-03-19 | 2021-08-03 | 浙江科技学院 | Salient object detection method based on stepped progressive neural network |
US20240338491A1 (en) * | 2021-08-16 | 2024-10-10 | The Regents Of The University Of California | A design automation methodology based on graph neural networks to model integrated circuits and mitigate hardware security threats |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2004055735A1 (en) * | 2002-12-16 | 2004-07-01 | Canon Kabushiki Kaisha | Pattern identification method, device thereof, and program thereof |
JP4532915B2 (en) * | 2004-01-29 | 2010-08-25 | キヤノン株式会社 | Pattern recognition learning method, pattern recognition learning device, image input device, computer program, and computer-readable recording medium |
JP4546157B2 (en) * | 2004-06-03 | 2010-09-15 | キヤノン株式会社 | Information processing method, information processing apparatus, and imaging apparatus |
WO2008133951A2 (en) * | 2007-04-24 | 2008-11-06 | Massachusetts Institute Of Technology | Method and apparatus for image processing |
US8135202B2 (en) * | 2008-06-02 | 2012-03-13 | Nec Laboratories America, Inc. | Automated method and system for nuclear analysis of biopsy images |
US8428348B2 (en) | 2009-04-15 | 2013-04-23 | Microsoft Corporation | Image analysis through neural network using image average color |
- 2014-04-01: IL IL231862A patent/IL231862A/en, not_active (IP Right Cessation)
- 2015-04-01: US US14/676,404 patent/US9396415B2/en, not_active (Expired - Fee Related)
- 2016-06-21: US US15/188,729 patent/US20160300121A1/en, not_active (Abandoned)
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229263A (en) * | 2016-12-22 | 2018-06-29 | 深圳光启合众科技有限公司 | The recognition methods of target object and device, robot |
WO2018113261A1 (en) * | 2016-12-22 | 2018-06-28 | 深圳光启合众科技有限公司 | Target object recognition method and apparatus, and robot |
US10733767B2 (en) | 2017-05-31 | 2020-08-04 | Samsung Electronics Co., Ltd. | Method and device for processing multi-channel feature map images |
WO2018221863A1 (en) * | 2017-05-31 | 2018-12-06 | Samsung Electronics Co., Ltd. | Method and device for processing multi-channel feature map images |
KR20180131073A (en) * | 2017-05-31 | 2018-12-10 | 삼성전자주식회사 | Method and apparatus for processing multiple-channel feature map images |
KR102301232B1 (en) | 2017-05-31 | 2021-09-10 | 삼성전자주식회사 | Method and apparatus for processing multiple-channel feature map images |
US10769821B2 (en) | 2017-07-25 | 2020-09-08 | Nutech Company Limited | Method and device for reconstructing CT image and storage medium |
US10977554B2 (en) * | 2017-11-14 | 2021-04-13 | Magic Leap, Inc. | Fully convolutional interest point detection and description via homographic adaptation |
US11537894B2 (en) * | 2017-11-14 | 2022-12-27 | Magic Leap, Inc. | Fully convolutional interest point detection and description via homographic adaptation |
WO2019099515A1 (en) * | 2017-11-14 | 2019-05-23 | Magic Leap, Inc. | Fully convolutional interest point detection and description via homographic adaptation |
AU2018369757B2 (en) * | 2017-11-14 | 2023-10-12 | Magic Leap, Inc. | Fully convolutional interest point detection and description via homographic adaptation |
US11348275B2 (en) * | 2017-11-21 | 2022-05-31 | Beijing Sensetime Technology Development Co. Ltd. | Methods and apparatuses for determining bounding box of target object, media, and devices |
CN109961083A (en) * | 2017-12-14 | 2019-07-02 | 安讯士有限公司 | For convolutional neural networks to be applied to the method and image procossing entity of image |
US11119915B2 (en) | 2018-02-08 | 2021-09-14 | Samsung Electronics Co., Ltd. | Dynamic memory mapping for neural networks |
CN109063854A (en) * | 2018-08-23 | 2018-12-21 | 河南中裕广恒科技股份有限公司 | Intelligent O&M cloud platform system and its control method |
WO2020236624A1 (en) * | 2019-05-17 | 2020-11-26 | Magic Leap, Inc. | Methods and apparatuses for corner detection using neural network and corner detector |
US12007564B2 (en) | 2019-05-17 | 2024-06-11 | Magic Leap, Inc. | Methods and apparatuses for corner detection using neural network and corner detector |
JP7422785B2 (en) | 2019-05-17 | 2024-01-26 | マジック リープ, インコーポレイテッド | Method and apparatus for angle detection using neural networks and angle detectors |
JP2022532238A (en) * | 2019-05-17 | 2022-07-13 | マジック リープ, インコーポレイテッド | Methods and equipment for angle detection using neural networks and angle detectors |
US11686941B2 (en) | 2019-05-17 | 2023-06-27 | Magic Leap, Inc. | Methods and apparatuses for corner detection using neural network and corner detector |
US20220414145A1 (en) * | 2019-08-16 | 2022-12-29 | The Toronto-Dominion Bank | Automated image retrieval with graph neural network |
US11809486B2 (en) * | 2019-08-16 | 2023-11-07 | The Toronto-Dominion Bank | Automated image retrieval with graph neural network |
WO2021185379A1 (en) * | 2020-03-20 | 2021-09-23 | 长沙智能驾驶研究院有限公司 | Dense target detection method and system |
US11797603B2 (en) | 2020-05-01 | 2023-10-24 | Magic Leap, Inc. | Image descriptor network with imposed hierarchical normalization |
US12072927B2 (en) | 2020-05-01 | 2024-08-27 | Magic Leap, Inc. | Image descriptor network with imposed hierarchical normalization |
US11501107B2 (en) * | 2020-05-07 | 2022-11-15 | Adobe Inc. | Key-value memory network for predicting time-series metrics of target entities |
US11694165B2 (en) | 2020-05-07 | 2023-07-04 | Adobe Inc. | Key-value memory network for predicting time-series metrics of target entities |
US20210350175A1 (en) * | 2020-05-07 | 2021-11-11 | Adobe Inc. | Key-value memory network for predicting time-series metrics of target entities |
US11270147B1 (en) | 2020-10-05 | 2022-03-08 | International Business Machines Corporation | Action-object recognition in cluttered video scenes using text |
US11928849B2 (en) | 2020-10-05 | 2024-03-12 | International Business Machines Corporation | Action-object recognition in cluttered video scenes using text |
US11423252B1 (en) | 2021-04-29 | 2022-08-23 | International Business Machines Corporation | Object dataset creation or modification using labeled action-object videos |
Also Published As
Publication number | Publication date |
---|---|
US9396415B2 (en) | 2016-07-19 |
IL231862A0 (en) | 2014-08-31 |
IL231862A (en) | 2015-04-30 |
US20150278642A1 (en) | 2015-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9396415B2 (en) | Neural network image representation | |
US9418458B2 (en) | Graph image representation from convolutional neural networks | |
CN110532920B (en) | Face recognition method for small-quantity data set based on FaceNet method | |
Von Stumberg et al. | Gn-net: The gauss-newton loss for multi-weather relocalization | |
CN107871106B (en) | Face detection method and device | |
Silberman et al. | Instance segmentation of indoor scenes using a coverage loss | |
Christlein et al. | An evaluation of popular copy-move forgery detection approaches | |
WO2020170014A1 (en) | Object counting and instance segmentation using neural network architectures with image-level supervision | |
US20160196479A1 (en) | Image similarity as a function of weighted descriptor similarities derived from neural networks | |
WO2016054779A1 (en) | Spatial pyramid pooling networks for image processing | |
EP3813661A1 (en) | Human pose analysis system and method | |
JP6905079B2 (en) | Detection and representation of objects in images | |
CN105139004A (en) | Face expression identification method based on video sequences | |
JP7225731B2 (en) | Imaging multivariable data sequences | |
Galety et al. | Marking attendance using modern face recognition (fr): Deep learning using the opencv method | |
Le et al. | Co-localization with category-consistent features and geodesic distance propagation | |
Zhang et al. | Multiresolution attention extractor for small object detection | |
Manzoor et al. | Ancient coin classification based on recent trends of deep learning. | |
Sreenivas et al. | Modified deep belief network based human emotion recognition with multiscale features from video sequences | |
CN114565752A (en) | Image weak supervision target detection method based on class-agnostic foreground mining | |
Mocanu et al. | Multimodal convolutional neural network for object detection using rgb-d images | |
Rasouli et al. | Integrating three mechanisms of visual attention for active visual search | |
Lee et al. | Where to look: Visual attention estimation in road scene video for safe driving | |
Seemanthini et al. | Small human group detection and validation using pyramidal histogram of oriented gradients and gray level run length method | |
Manimurugan et al. | HLASwin-T-ACoat-Net based Underwater Object Detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: SUPERFISH LTD., ISRAEL. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHERTOK, MICHAEL;LORBERT, ALEXANDER;PINHAS, ADI;SIGNING DATES FROM 20150512 TO 20150513;REEL/FRAME:038977/0420 |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |