US20180204062A1 - Systems and methods for image processing - Google Patents

Systems and methods for image processing

Info

Publication number
US20180204062A1
US20180204062A1
Authority
US
United States
Prior art keywords
images
query image
image
feature vector
dcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/879,343
Inventor
Vignesh KRISHNAKUMAR
Hariprasad Prayagai Sridharasingan
Adarsh Amarendra Tadimari
Saivenkatesh A.
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hyperverge Inc
Original Assignee
Hyperverge Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hyperverge Inc filed Critical Hyperverge Inc
Priority to US15/879,343 priority Critical patent/US20180204062A1/en
Assigned to HYPERVERGE INC. reassignment HYPERVERGE INC. NUNC PRO TUNC ASSIGNMENT (SEE DOCUMENT FOR DETAILS). Assignors: A, SAIVENKATESH, KRISHNAKUMAR, VIGNESH, SRIDHARASINGAN, HARIPRASAD PRAYAGAI, TADIMARI, ADARSH AMARENDRA
Publication of US20180204062A1 publication Critical patent/US20180204062A1/en
Abandoned legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06K9/00684
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06K9/00624
    • G06K9/4609
    • G06K9/4619
    • G06K9/4671
    • G06K9/6232
    • G06K9/6277
    • G06K9/66
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19127Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/192Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
    • G06V30/194References adjustable by an adaptive method, e.g. learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • the present invention relates to image processing systems and methods and more particularly to systems and methods for image scene classification and similarity matching.
  • Image processing is widely gaining popularity with the increasing usage of digital images for various purposes. Particularly, one of the most important areas in image processing relates to scene classification, which deals with the problem of understanding the context of what is captured in an image. Understanding a holistic view of an image is a relatively difficult task due to the lack of text labels that represent the content present in such images.
  • Existing systems and methods for scene recognition have a number of drawbacks and limitations. Existing solutions treat indoor and outdoor scene recognition as two different problems due to the significant variation in appearance characteristics between indoor and outdoor images. It has largely been perceived that different kinds of features would be required to discriminate between indoor scenes and outdoor scenes. This is highly inefficient since different systems and methods are required to be deployed for scene recognition of indoor and outdoor scenes. Current indoor scene recognition systems and methods use part-based models that look for characteristic objects in an image to determine the scene, which results in the inaccurate assumption that similar-looking objects, distributed spatially in a similar manner, constitute the same scene.
  • the current solutions are unable to effectively address the problem of overfitting caused by the use of real world image datasets (as input to these systems) that capture a lot of intra-class variation, i.e. the significant variation in appearance characteristics of images within each class.
  • existing solutions use hand crafted features to discriminate between images/scenes. However, features that are good for discriminating between some classes may not be good for other classes.
  • Existing approaches to image/scene recognition are incapable of continuously learning from or adapting to the increasing number of images uploaded to the internet every day.
  • Another important aspect of image processing relates to similarity matching of images.
  • With the growing requirement for image recognition technologies, the need for scalable recognition techniques that can handle a large number of classes and continuously learn from internet scale images has become very evident. Unlike searching for textual data, searching for images similar to a particular image is a challenging task. The number of images uploaded on the World Wide Web is increasing every day and it has become extremely difficult to incorporate these newly added images into the search database of existing similarity matching techniques.
  • existing image recognition solutions use hand crafted features to discriminate between images. A major disadvantage of such systems is that they result in a large reconstruction error, i.e. reconstruction of images using these hand crafted features is likely to produce an image very different from the original image.
  • an object of the present invention to provide systems and methods for image processing that facilitate scene recognition while minimizing false positives. It is another object of the present invention to facilitate large scale recognition of indoor and outdoor scenes. Another object of the invention is to provide systems and methods for scene classification that help in achieving translational invariance. Yet another object of the invention is to provide image processing systems and methods that efficiently provide images similar to a query image. Another object of the invention is to facilitate similarity matching that minimizes reconstruction error.
  • one aspect of the present disclosure relates to a method for scene classification for an image.
  • the method begins with receiving at least one image and classifying said image into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image.
  • the method further comprises validating the at least one intermediate output category based on said extracted characteristic features and providing a scene classification based on said validation.
  • Another aspect of the invention relates to a system for scene classification of an image, the system comprising a receiver unit for receiving at least one image for classification and a base classifier, associated with said receiver unit, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image.
  • the system further comprises a binary classifier associated with said base classifier, for providing a scene classification for said at least one image based on validation of the at least one intermediate output category, wherein said validation is based on said extracted characteristic features.
  • Yet another aspect of the disclosure relates to a method for providing images similar to a query image from within a set of images.
  • the method comprises receiving a query image from the user and providing said query image as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image.
  • the method further comprises reducing the dimensionality of the extracted feature vector to form a reduced dimensional feature vector and subsequently splitting the reduced dimensional feature vector into a plurality of query image segments.
  • each of said query image segments is compared with segments of the set of images and images similar to said query image are provided based on this comparison.
  • Yet another aspect of the invention relates to a system for providing images similar to a query image from within a set of images.
  • the system comprises an input module for receiving a query image from the user and a base classifier associated with said input module, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN extracts a feature vector of said query image.
  • the system further comprises a reduction module associated with said base classifier, for reducing dimensionality of said feature vector to form a reduced dimensional feature vector and splitting the reduced dimensional feature vector into a plurality of query image segments.
  • the system also comprises a comparison module associated with said reduction module, for providing images similar to said query image, based on a comparison between each of said query image segments and segments of the set of images.
  • FIG. 1 illustrates a block diagram of a system for scene classification, in accordance with exemplary embodiments of the present invention.
  • FIG. 2 illustrates a general example of a neural network.
  • FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention.
  • FIG. 4 illustrates an exemplary list of categories/classes, in accordance with exemplary embodiments of the present invention.
  • FIG. 5 illustrates a method for scene classification, in accordance with exemplary embodiments of the present invention.
  • FIG. 6 illustrates a system for similarity matching, in accordance with exemplary embodiments of the present invention.
  • FIG. 7 illustrates a method for similarity matching, in accordance with exemplary embodiments of the present invention.
  • the system for scene classification, in accordance with example embodiments of the present disclosure, comprises a transceiver unit 102 associated with a base classifier 104 and a binary classifier 106, all connected to a data repository 108.
  • the transceiver unit 102 is configured to receive at least one image for classification, wherein the image may be received from a user or any other system.
  • the transceiver unit 102 is configured to pre-process this information and provide it to the base classifier 104 .
  • the base classifier 104 comprises a deep convolutional neural network (DCNN), wherein the DCNN is configured to determine at least one intermediate output category for said image and extract one or more characteristic features thereof.
  • the intermediate output category may be chosen from a set of pre-defined output categories.
  • the intermediate output category determined by the DCNN is one of an indoor category, an outdoor category and a combination thereof.
  • the extracted one or more characteristic features may be the output feature vector produced by the first fully connected layer of the DCNN.
  • the DCNN is a large neural network configured to deal with two dimensional input data such as an image and perform 2D convolutional operation on such input. The DCNN has been discussed in detail with reference to FIG. 3 .
  • the binary classifier 106 associated with said base classifier 104 is configured to validate the intermediate output category provided by the base classifier based on the extracted characteristic features of an image and provide a scene classification for the image based on said validation.
  • the invention encompasses a binary classifier capable of predicting that the image does not belong to any scene classification.
  • the invention encompasses a binary classifier capable of assigning more than one scene category/classification to an image received from the user.
  • the system comprises one binary classifier for each of the intermediate output categories pre-defined in the system.
  • the invention encompasses a binary classifier that is a Support Vector Machine (SVM).
  • the data repository 108 is configured to store the intermediate output category and the extracted characteristic features of the image received from the base classifier 104 .
  • the data repository 108 is further configured to store the scene classification provided by the binary classifier 106 .
  • FIG. 2 illustrates a simplified example of an artificial neural network comprising multiple nodes and connections between them.
  • the nodes 202 (i) are referred to as input nodes and are configured to receive raw data in the form of inputs that trigger the nodes they are connected to.
  • Each connection as shown in FIG. 2 has a corresponding value referred to as weights, wherein these weights may indicate the importance of these connections/nodes.
  • when a node is triggered by two or more nodes, the input taken by the node depends upon the weights of the connections from which the node is triggered.
  • the nodes 204 ( j ) are referred to as intermediate/hidden nodes and are configured to accept input from one or more input nodes and provide output to one or more hidden nodes and/or output nodes 206 ( k ).
  • the input nodes 202 ( i ) form an input layer
  • the hidden nodes 204 ( j ) form one or more hidden layers
  • the output nodes 206 ( k ) form an output layer.
  • although only one hidden layer is shown in FIG. 2, it will be appreciated that any number of hidden layers may be implemented in the artificial neural network depending upon the complexity of the decision to be made, the dimensionality of the input data and the size of the dataset used for training.
  • in a deep neural network, a large number of hidden layers are stacked one above the other, wherein each layer computes a non-linear transformation of the outputs from the previous layer.
  • the deep neural network is required to be trained or customized, wherein a large set of inputs is provided to the neural network, outputs for said inputs are computed and the network weights are adjusted based on the error, if any, in the outputs.
  • the training of neural networks encompassed by the invention is performed using backpropagation, wherein random weights/values are assigned to each connection, followed by computing a set of outputs for a given set of inputs using said random weights.
  • a desired output of each of the inputs is defined and is compared with the calculated output, wherein the difference between the two values may be referred to as an error in the network.
  • weights for each layer are adjusted based on the computed error for each layer.
  • FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention.
  • Deep Convolutional Neural Networks are neural nets that are specifically designed to deal with 2D input data and their patterns of translation, scale and distortion variances.
  • the DCNN is configured to perform a 2D convolution operation where the input to a neuron/node is the output of a 2D kernel operating on a local window of neurons from the previous layer. The same 2D kernel is applied throughout the layer, resulting in weights being shared by multiple connections in the network.
  • the DCNN comprises several types of layers, such as convolution layers, pooling layers and fully connected layers.
  • the convolution layer is configured to perform individual convolution operations and send the output of each such operation to one or more nodes in the next layer. In a preferred embodiment, the output of the convolution operations is transferred to only some nodes in the next layer.
  • the pooling layer is configured to perform aggregate operations like max/average over neighboring windows of outputs. The pooling layer has no weights and just aggregates values over the receptive field.
  • a fully connected layer is the traditional layer type where each node is connected to every node in the next layer.
  • the DCNN networks may be trained using techniques such as rectified linear units, local response normalization, and parallelization on a GPU.
  • the weight update rule is as follows:
  • v_(i+1) = 0.9·v_i - 0.0005·ε·w_i - ε·⟨∂L/∂w |_(w_i)⟩_(D_i)
  • wherein i is the iteration number, w is the weight vector, ε is the learning rate, D_i is the training data sampled for training in iteration i, ⟨∂L/∂w |_(w_i)⟩_(D_i) is the average over D_i of the derivative of the objective L with respect to w evaluated at w_i, and v is the weight update.
  • the DCNN comprises 5 convolution layers, 3 pooling layers and 3 fully connected layers.
  • the input layer is configured to read/receive raw RGB data from an image and the output layer is configured to produce intermediate output category probabilities as outputs.
  • the first convolution layer has 96 kernels of size 11×11×3
  • the second convolution layer has 256 kernels of size 5×5×96
  • the third convolution layer has 384 kernels of size 3×3×256
  • the fourth convolution layer has 384 kernels of size 3×3×384
  • the fifth convolution layer has 256 kernels of size 3×3×384. All three pooling layers use a kernel of size 3×3 with a stride of 2 and the two fully connected layers comprise 4096 nodes/neurons each.
  • the system for similarity matching, i.e. for providing images similar to a query image from within a set of images, comprises an input/output module 602 connected to a base classifier 604, which is further connected to a reduction module 606 and a comparison module 608, all connected to a central database 610.
  • the I/O module 602 is configured to receive a query image, wherein the query image may be received from a user input or as part of a request made by another system.
  • the I/O module 602 is also configured to store the query image into the central database 610 and provide the same to the base classifier 604, wherein the base classifier 604 comprises a deep convolutional neural network (DCNN).
  • the DCNN extracts a feature vector of said query image and provides it to the reduction module 606 , wherein in an embodiment the extracted feature vector is a 4096 dimensional vector.
  • the base classifier 604 stores this extracted feature vector in the central database 610 .
  • the reduction module 606 comprises an auto-encoder (not shown in FIG. 6) configured to reduce the dimensionality of said feature vector to form a reduced dimensional feature vector.
  • the auto encoder converts the 4096-dimensional feature vector into a 128-dimensional reduced feature vector.
  • the auto encoder is an artificial neural network trained using supervised error backpropagation in a manner the same as, or at least substantially similar to, that discussed hereinabove.
  • the auto encoder is configured to adjust/learn a set of weights that minimize the reconstruction error, wherein reconstruction error captures the similarity between the image reconstructed from a reduced dimensional vector and the original query image. Minimizing the reconstruction error ensures that similar vectors in higher dimensional space are also similar in lower dimensional space and vice versa.
  • the auto encoder was trained with the network architecture consisting of the following sequence of dimensions: 4096-1024-512-128-32-128-512-1024.
  • the reduction module 606 is further configured to split the reduced dimensional feature vector into a plurality of query image segments and provide the same to the comparison module 608 .
  • the reduction module 606 splits the 128 bit reduced dimensional feature vector into 16 segments of 8 bits each.
  • the comparison module 608 is configured to compare the query image segments with the segments of set of images stored in the central database 610 and provide images similar to said query image based on this comparison, wherein the query image itself is excluded from the set of images used for this comparison.
  • the comparison module 608 is further configured to compare the original query image with the similar images and re-rank the similar images based on this comparison. In other words, similar images computed on the basis of the comparison using smaller reduced dimensionality feature vectors are re-ranked by the comparison module 608 based on a comparison between the feature vector of the query image and the feature vectors of the similar images.
  • the central database 610 is configured to store a large set of images indexed as 8 bit segments using hash tables. For every image in the set of images, the central database is configured to store a 128-bit representation of the image computed by the reduction module 606 , the original 4096 dimensional feature vector of the image extracted by the base classifier 604 and the image itself. In an embodiment, the 128-bit vector is split into 16 segments of 8 bits each.
  • the central database 610 is configured to maintain 16 hash tables, one for each segment, wherein the hash table corresponding to segment “i” uses the integer computed from those 8 bits as the hash key and the image as the value.
  • the central database 610 is further configured to store the input query images, the feature vectors, the reduced dimensionality feature vectors, comparison results and any other data/information used or processed by the system or any modules/components thereof.
  • the method for scene classification for an image begins at step 502, wherein at least one image is received for classification, pursuant to which it is pre-processed to alter the size and/or dimensionality of the image, extract the RGB data from said image, etc.
  • the received and/or pre-processed image is classified into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. Extracting one or more characteristic features of said image includes extracting characteristic features from a fully connected layer of the DCNN. Determining an intermediate output category includes passing the input image to the DCNN, computing features in each layer of said DCNN and propagating said features to the subsequent layers of the DCNN; at least one non-linear transformation of the received image is computed during this step.
  • the intermediate output category determined at step 504 is validated based on the extracted characteristic features.
  • Validating the intermediate output category includes passing the characteristic features of the image as an input to the binary classifier, wherein if the binary classifier returns a positive output then the scene classification for the input image is the same as the predicted intermediate output category; otherwise, the scene classification for the input image is ‘none of the pre-defined scene classes’. Based on said validation, a scene classification is provided for said image.
  • the method also encompasses reducing overfitting during the process of scene classification, wherein overfitting may be reduced by performing data augmentation, i.e. replicating data/images through the introduction of small variations and adding these replications to the training dataset.
  • data augmentation is performed by extracting multiple patches of 227×227 from the training image of 256×256.
  • data augmentation is performed by performing PCA on the R, G, B pixel values over the training data for each pixel and extracting the top 16 principal components. Subsequently, these components are multiplied with a random small factor and the weighted principal components are added to each image to get more replications.
  • the invention also encompasses reducing overfitting by dropping the inputs from certain randomly chosen nodes during training.
  • when an image is processed, the outputs of all neurons are taken in but they are multiplied with a factor to account for dropout; for instance, if 50% of the inputs are dropped out, the outputs of all neurons are multiplied with a factor of 0.5 to weigh their contributions in the training process.
  • the method also encompasses reducing overfitting by early termination wherein the learning rate is dropped by a factor of 10 whenever the training error increases or is constant for a sustained number of iterations.
  • the method for providing images similar to a query image from within a set of images begins with receiving a query image at step 702 .
  • the received input query image is provided as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image.
  • the extracted feature vector is an output of a fully connected layer of the DCNN.
  • the dimensionality of said feature vector is reduced to form a reduced dimensional feature vector.
  • the invention encompasses reducing dimensionality by providing the feature vector extracted by the DCNN as an input to the Deep Belief Network (DBN), wherein the DBN comprises layer-wise Restricted Boltzmann Machines (RBM).
  • the layer-wise RBMs are built for the following configurations: 4096 to 1024, 1024 to 512, 512 to 128 and 128 to 32 dimensions, and are trained through a contrastive divergence approach.
  • the invention also encompasses reducing dimensionality by providing the feature vector as an input to an auto encoder artificial neural network, wherein the auto encoder converts the feature vector into a reduced dimensionality feature vector.
  • the invention encompasses use of a hashing scheme to index all the images stored in the database, i.e. the set of images within which similarity search is to be performed.
  • Each image stored in the database is split into 16 segments of 8-bits, wherein each segment is stored in a hash table.
  • the reduced dimensional feature vector of the query image is also split/divided into a plurality of segments, preferably into 16 segments of 8-bits each, and each such segment is provided to one of the 16 hash tables.
  • query image segments are compared with segments of the set of images at step 710 . Comparing query image segments with segments of set of images includes providing each segment to one node, wherein each node looks up its specific hash table and returns images matching the 8 bit segment that was sent to that node.
  • Step 710 provides images similar to said query image based on this comparison, wherein similar images are those for which the distance from the query image is less than a pre-defined threshold.
  • the similar images computed in the above step are then re-ranked based on their distance from the query image in the original dimensional space.
  • hamming distance is used as a distance measure. This distance search is first performed in the reduced dimensional space (such as 128 bit dimensional space) and then the obtained results are re-ranked according to the distances in the original dimensional space (such as the 4096 bit dimensional space).
  • the disclosed methods and systems may be implemented on a Graphics Processing Unit (GPU).
  • the systems are implemented on an NVIDIA Tesla M2070 GPU card with 2880 CUDA cores, an Intel Xeon X5675 CPU and 5375 MB of Video RAM.

Abstract

Efficient image processing systems and methods for image scene classification and similarity matching are disclosed. The image processing systems encompassed by this disclosure use a deep convolutional neural network to facilitate scene classification by recognizing the context of an image, thereby enabling searches for similar images. These methods and systems are scalable to a large set of images and achieve higher performance than current state-of-the-art techniques.

Description

    FIELD OF THE INVENTION
  • The present invention relates to image processing systems and methods and more particularly to systems and methods for image scene classification and similarity matching.
  • BACKGROUND
  • The following description of related art is intended to provide background information pertaining to the field of the invention. This section may include certain aspects of the art that may be related to various aspects of the present disclosure. However, it should be appreciated that this section is to be used only to enhance the understanding of the reader with respect to the present disclosure, and not as an admission of prior art.
  • Image processing is widely gaining popularity with the increasing usage of digital images for various purposes. Particularly, one of the most important areas in image processing relates to scene classification, which deals with the problem of understanding the context of what is captured in an image. Understanding a holistic view of an image is a relatively difficult task due to the lack of text labels that represent the content present in such images. Existing systems and methods for scene recognition have a number of drawbacks and limitations. Existing solutions treat indoor and outdoor scene recognition as two different problems due to the significant variation in appearance characteristics between indoor and outdoor images. It has largely been perceived that different kinds of features would be required to discriminate between indoor scenes and outdoor scenes. This is highly inefficient since different systems and methods are required to be deployed for scene recognition of indoor and outdoor scenes. Current indoor scene recognition systems and methods use part-based models that look for characteristic objects in an image to determine the scene, which results in the inaccurate assumption that similar-looking objects, distributed spatially in a similar manner, constitute the same scene.
  • Further, the current solutions are unable to effectively address the problem of overfitting caused by the use of real world image datasets (as input to these systems) that capture a lot of intra-class variation, i.e. the significant variation in appearance characteristics of images within each class. Furthermore, existing solutions use hand crafted features to discriminate between images/scenes. However, features that are good for discriminating between some classes may not be good for other classes. Existing approaches to image/scene recognition are incapable of continuously learning from or adapting to the increasing number of images uploaded to the internet every day.
  • Another important aspect of image processing relates to similarity matching of images. With the growing requirement for image recognition technologies, the need for scalable recognition techniques that can handle a large number of classes and continuously learn from internet scale images has become very evident. Unlike searching for textual data, searching for images similar to a particular image is a challenging task. The number of images uploaded on the World Wide Web is increasing every day and it has become extremely difficult to incorporate these newly added images into the search database of existing similarity matching techniques. As discussed above, existing image recognition solutions use hand crafted features to discriminate between images. A major disadvantage of such systems is that they result in a large reconstruction error, i.e. reconstruction of images using these hand crafted features is likely to produce an image very different from the original image.
  • Thus, there is a need for building improved and scalable image processing systems for scene classification and similarity matching that are capable of handling a large number of images/classes.
  • SUMMARY
  • This section is provided to introduce certain objects and aspects of the disclosed methods and systems in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
  • In view of the shortcomings of existing image processing systems and methods, as discussed in the background section, it is an object of the present invention to provide systems and methods for image processing that facilitate scene recognition while minimizing false positives. It is another object of the present invention to facilitate large scale recognition of indoor and outdoor scenes. Another object of the invention is to provide systems and methods for scene classification that help in achieving translational invariance. Yet another object of the invention is to provide image processing systems and methods that efficiently provide images similar to a query image. Another object of the invention is to facilitate similarity matching that minimizes reconstruction error.
  • In view of these and other objects, one aspect of the present disclosure relates to a method for scene classification for an image. The method begins with receiving at least one image and classifying said image into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. The method further comprises validating the at least one intermediate output category based on said extracted characteristic features and providing a scene classification based on said validation.
  • Another aspect of the invention relates to a system for scene classification of an image, the system comprising a receiver unit for receiving at least one image for classification and a base classifier, associated with said receiver unit, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. The system further comprises a binary classifier associated with said base classifier, for providing a scene classification for said at least one image based on validation of the at least one intermediate output category, wherein said validation is based on said extracted characteristic features.
  • Yet another aspect of the disclosure relates to a method for providing images similar to a query image from within a set of images. The method comprises receiving a query image from the user and providing said query image as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image. The method further comprises reducing the dimensionality of the extracted feature vector to form a reduced dimensional feature vector and subsequently splitting the reduced dimensional feature vector into a plurality of query image segments. Lastly, each of said query image segments is compared with segments of the set of images and images similar to said query image are provided based on this comparison.
  • Yet another aspect of the invention relates to a system for providing images similar to a query image from within a set of images. The system comprises an input module for receiving a query image from the user and a base classifier associated with said input module, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN extracts a feature vector of said query image. The system further comprises a reduction module associated with said base classifier, for reducing dimensionality of said feature vector to form a reduced dimensional feature vector and splitting the reduced dimensional feature vector into a plurality of query image segments. The system also comprises a comparison module associated with said reduction module, for providing images similar to said query image, based on a comparison between each of said query image segments and segments of the set of images.
  • BRIEF DESCRIPTION OF DRAWINGS
  • The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems in which like reference numerals refer to the same parts throughout the different drawings. Some drawings may indicate the components using block diagrams. It will be appreciated that disclosure of such block diagrams includes disclosure of the internal sub-components of these components as discussed in the detailed description.
  • FIG. 1 illustrates a block diagram of a system for scene classification, in accordance with exemplary embodiments of the present invention.
  • FIG. 2 illustrates a general example of a neural network.
  • FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention.
  • FIG. 4 illustrates an exemplary list of categories/classes, in accordance with exemplary embodiments of the present invention.
  • FIG. 5 illustrates a method for scene classification, in accordance with exemplary embodiments of the present invention.
  • FIG. 6 illustrates a system for similarity matching, in accordance with exemplary embodiments of the present invention.
  • FIG. 7 illustrates a method for similarity matching, in accordance with exemplary embodiments of the present invention.
  • The foregoing will be apparent from the following more detailed description of example embodiments of the present disclosure, as illustrated in the accompanying drawings.
  • DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
  • In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that the disclosed embodiments may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. Although headings are provided, information related to a particular heading, but not found in the section having that heading, may also be found elsewhere in the specification. Further, information provided under a particular heading may not necessarily be a part of only the section having that heading.
  • System Overview
  • Systems and methods for image processing in accordance with example embodiments of the present disclosure are described. In general, the systems and methods disclosed herein facilitate scene classification and similarity matching for an image. As shown in FIG. 1, the system for scene classification, in accordance with example embodiments of the present disclosure, comprises a transceiver unit 102 associated with a base classifier 104 and a binary classifier 106, all connected to a data repository 108. The transceiver unit 102 is configured to receive at least one image for classification, wherein the image may be received from a user or any other system. The transceiver unit 102 is configured to pre-process this information and provide it to the base classifier 104.
  • The base classifier 104 comprises a deep convolutional neural network (DCNN), wherein the DCNN is configured to determine at least one intermediate output category for said image and extract one or more characteristic features thereof. The intermediate output category may be chosen from a set of pre-defined output categories. The intermediate output category determined by the DCNN is one of an indoor category, an outdoor category and a combination thereof. In an embodiment, the extracted one or more characteristic features may be the output feature vector produced by the first fully connected layer of the DCNN. The DCNN is a large neural network configured to deal with two dimensional input data, such as an image, and perform 2D convolution operations on such input. The DCNN has been discussed in detail with reference to FIG. 3.
  • The binary classifier 106 associated with said base classifier 104 is configured to validate the intermediate output category provided by the base classifier based on the extracted characteristic features of an image and provide a scene classification for the image based on said validation. The invention encompasses a binary classifier capable of predicting that the image does not belong to any scene classification. The invention also encompasses a binary classifier capable of assigning more than one scene category/classification to an image received from the user. In a preferred embodiment, the system comprises one binary classifier for each of the intermediate output categories pre-defined in the system. The invention encompasses a binary classifier that is a Support Vector Machine (SVM).
  • The data repository 108 is configured to store the intermediate output category and the extracted characteristic features of the image received from the base classifier 104. The data repository 108 is further configured to store the scene classification provided by the binary classifier 106.
  • FIG. 2 illustrates a simplified example of an artificial neural network comprising multiple nodes and connections between them. The nodes 202 (i) are referred to as input nodes and are configured to receive raw data in the form of inputs that trigger the nodes they are connected to. Each connection as shown in FIG. 2 has a corresponding value referred to as a weight, wherein these weights may indicate the importance of these connections/nodes. In an embodiment, when a node is triggered by two or more nodes, the input taken by the node depends upon the weights of the connections from which the node is triggered. The nodes 204 (j) are referred to as intermediate/hidden nodes and are configured to accept input from one or more input nodes and provide output to one or more hidden nodes and/or output nodes 206 (k). The input nodes 202 (i) form an input layer, the hidden nodes 204 (j) form one or more hidden layers and the output nodes 206 (k) form an output layer. Although only one hidden layer has been shown in FIG. 2, it will be appreciated that any number of hidden layers may be implemented in the artificial neural network depending upon the complexity of the decision to be made, the dimensionality of the input data and the size of the dataset used for training. In a deep neural network, a large number of hidden layers are stacked one above the other, wherein each layer computes a non-linear transformation of the outputs from the previous layer.
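  • As an illustrative aside (not part of the patent text), the small numpy sketch below shows the forward computation such a network performs: each layer multiplies its inputs by the connection weights and applies a non-linear transformation before passing the result to the next layer.

```python
import numpy as np

def relu(x):
    # Simple non-linearity applied at each hidden layer
    return np.maximum(0.0, x)

def forward(x, layers):
    """Propagate an input vector through a list of (weights, biases) pairs."""
    activation = x
    for weights, biases in layers[:-1]:
        activation = relu(weights @ activation + biases)   # hidden layers
    weights, biases = layers[-1]
    return weights @ activation + biases                   # output layer (raw scores)

# Toy network: 4 input nodes -> 8 hidden nodes -> 3 output nodes
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(8, 4)), np.zeros(8)),
          (rng.normal(size=(3, 8)), np.zeros(3))]
print(forward(rng.normal(size=4), layers))
```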
  • To facilitate generation/prediction of outputs with minimal error, deep neural networks are required to be trained or customized, wherein a large set of inputs is provided to the neural network, outputs for said inputs are computed and the network weights are adjusted based on the error, if any, in the outputs. In an embodiment, the training of neural networks encompassed by the invention is performed using backpropagation, wherein random weights/values are assigned to each connection, followed by computing a set of outputs for a given set of inputs using said random weights. A desired output for each of the inputs is defined and is compared with the calculated output, wherein the difference between the two values may be referred to as an error in the network. Subsequently, weights for each layer are adjusted based on the computed error for each layer. Thus, using this method of backpropagation, a network can be appropriately trained to minimize errors in the output.
  • FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention. Deep Convolutional Neural Networks are neural nets that are specifically designed to deal with 2D input data and their patterns of translation, scale and distortion variances. The DCNN is configured to perform a 2D convolution operation where the input to a neuron/node is the output of a 2D kernel operating on a local window of neurons from the previous layer. The same 2D kernel is applied throughout the layer, resulting in weights being shared by multiple connections in the network.
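  • The weight sharing described above can be sketched in a few lines of Python (illustrative only; kernel values, stride and the absence of padding are arbitrary assumptions): a single 2D kernel is slid over the whole input, so every output node is computed from the same small set of shared weights applied to a local window.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution with a single shared kernel (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            # Every output location re-uses the same kernel weights
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)           # a simple vertical-edge detector
print(conv2d(np.random.rand(8, 8), edge_kernel).shape)   # -> (6, 6)
```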
  • As shown in FIG. 3, the DCNN comprises several types of layers, such as convolution layers, pooling layers and fully connected layers. The convolution layer is configured to perform individual convolution operations and send the output of each such operation to one or more nodes in the next layer. In a preferred embodiment, the output of the convolution operations is transferred to only some nodes in the next layer. The pooling layer is configured to perform aggregate operations like max/average over neighboring windows of outputs. The pooling layer has no weights and just aggregates values over the receptive field. A fully connected layer is the traditional layer type where each node is connected to every node in the next layer.
  • In addition to the backpropagation technique for training as discussed above, the DCNN networks may be trained using techniques such as rectified linear units, local response normalization, and parallelization on a GPU. In a preferred embodiment, the weight update rule is as follows:
  • v_(i+1) = 0.9·v_i - 0.0005·ε·w_i - ε·⟨∂L/∂w |_(w_i)⟩_(D_i)
  • Wherein i is the iteration number, w is the weight vector, ε is the learning rate, D_i is the training data sampled for training in iteration i, ⟨∂L/∂w |_(w_i)⟩_(D_i) is the average over D_i of the derivative of the objective L with respect to w evaluated at w_i, and v is the weight update.
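  • The update rule can be sketched directly in code as below (illustrative only). Here grad stands for the mini-batch average of ∂L/∂w evaluated at the current weights, and the companion step w_(i+1) = w_i + v_(i+1) is the conventional way of applying a momentum update; it is assumed here rather than stated in the text.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr):
    """One update: v_{i+1} = 0.9*v_i - 0.0005*lr*w_i - lr*grad ; w_{i+1} = w_i + v_{i+1}."""
    v_next = 0.9 * v - 0.0005 * lr * w - lr * grad   # momentum + weight decay + gradient term
    w_next = w + v_next
    return w_next, v_next

# Toy usage with a random gradient
w = np.random.randn(10)
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, v, grad=np.random.randn(10), lr=0.01)
```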
  • In an exemplary embodiment, as shown in FIG. 3, the DCNN comprises 5 convolution layers, 3 pooling layers and 3 fully connected layers. The input layer is configured to read/receive raw RGB data from an image and the output layer is configured to produce intermediate output category probabilities as outputs. As shown, in a preferred embodiment, the first convolution layer has 96 kernels of size 11×11×3, the second convolution layer has 256 kernels of size 5×5×96, the third convolution layer has 384 kernels of size 3×3×256, the fourth convolution layer has 384 kernels of size 3×3×384 and the fifth convolution layer has 256 kernels of size 3×3×384. All three pooling layers use a kernel of size 3×3 with a stride of 2 and the two fully connected layers comprise 4096 nodes/neurons each.
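  • A minimal PyTorch sketch of this layer configuration is given below for illustration. Strides, padding, dropout placement and the number of output categories are not specified above and are assumed here to mirror the well-known AlexNet design; the sketch is not a definitive reproduction of the claimed network.

```python
import torch
import torch.nn as nn

class SceneDCNN(nn.Module):
    """AlexNet-style DCNN: 5 conv layers, 3 max-pool layers, 3 fully connected layers."""
    def __init__(self, num_categories=20):            # assumed; set to the number of scene categories
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, num_categories),           # intermediate output category scores
        )

    def forward(self, x):
        x = self.features(x)                           # expects 227x227 RGB input
        x = torch.flatten(x, 1)
        return self.classifier(x)

logits = SceneDCNN()(torch.randn(1, 3, 227, 227))      # -> shape (1, num_categories)
```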
  • As shown in FIG. 6, the system for similarity matching, i.e. for providing images similar to a query image from within a set of images, comprises an input/output module 602 connected to a base classifier 604, which is further connected to a reduction module 606 and a comparison module 608, all connected to a central database 610. The I/O module 602 is configured to receive a query image, wherein the query image may be received from a user input or as part of a request made by another system. The I/O module 602 is also configured to store the query image into the central database 610 and provide the same to the base classifier 604, wherein the base classifier 604 comprises a deep convolutional neural network (DCNN). The DCNN extracts a feature vector of said query image and provides it to the reduction module 606, wherein in an embodiment the extracted feature vector is a 4096-dimensional vector. The base classifier 604 stores this extracted feature vector in the central database 610.
  • The reduction module 606 comprises an auto-encoder (not shown in FIG. 6) configured to reduce the dimensionality of said feature vector to form a reduced dimensional feature vector. In an embodiment, the auto encoder converts the 4096-dimensional feature vector into a 128-dimensional reduced feature vector. The auto encoder is an artificial neural network trained using supervised error backpropagation in a manner the same as, or at least substantially similar to, that discussed hereinabove. The auto encoder is configured to adjust/learn a set of weights that minimize the reconstruction error, wherein the reconstruction error captures the similarity between the image reconstructed from a reduced dimensional vector and the original query image. Minimizing the reconstruction error ensures that similar vectors in higher dimensional space are also similar in lower dimensional space and vice versa. In an embodiment, the auto encoder was trained with a network architecture consisting of the following sequence of dimensions: 4096-1024-512-128-32-128-512-1024.
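  • A simplified sketch of such an auto encoder is shown below (illustrative only): it compresses the 4096-dimensional DCNN feature vector to a 128-dimensional code and is trained to minimize the reconstruction error. The exact layer sequence quoted above (which also includes a 32-unit layer) and the final reconstruction layer back to 4096 dimensions are assumptions of this sketch rather than a literal reproduction.

```python
import torch
import torch.nn as nn

class FeatureAutoEncoder(nn.Module):
    """Compresses a 4096-d DCNN feature vector to a 128-d code and reconstructs it."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(4096, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.Sigmoid(),          # 128-d code, values in (0, 1)
        )
        self.decoder = nn.Sequential(
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 4096),                      # reconstruction of the input feature
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

# Training minimizes the reconstruction error between input features and their reconstruction
model = FeatureAutoEncoder()
features = torch.randn(8, 4096)
reconstruction, code = model(features)
loss = nn.functional.mse_loss(reconstruction, features)
loss.backward()
```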
  • The reduction module 606 is further configured to split the reduced dimensional feature vector into a plurality of query image segments and provide the same to the comparison module 608. In a preferred embodiment, the reduction module 606 splits the 128 bit reduced dimensional feature vector into 16 segments of 8 bits each.
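  • One plausible way to realize this splitting is sketched below; the thresholding used to turn the 128-dimensional code into 128 bits is an assumption, as the text does not spell out how the code is binarized.

```python
import numpy as np

def to_segments(code, n_segments=16):
    """Threshold a 128-d code to 128 bits and pack them into 16 integer segments of 8 bits."""
    bits = (np.asarray(code) > 0.5).astype(np.uint8)          # 128 binary values
    segments = []
    for chunk in np.split(bits, n_segments):                  # 16 chunks of 8 bits each
        value = 0
        for bit in chunk:
            value = (value << 1) | int(bit)                   # pack 8 bits into one integer (0..255)
        segments.append(value)
    return segments

print(to_segments(np.random.rand(128)))   # 16 integer segments, each between 0 and 255
```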
  • The comparison module 608 is configured to compare the query image segments with the segments of set of images stored in the central database 610 and provide images similar to said query image based on this comparison, wherein the query image itself is excluded from the set of images used for this comparison. The comparison module 608 is further configured to compare the original query image with the similar images and re-rank the similar images based on this comparison. In other words, similar images computed on the basis of the comparison using smaller reduced dimensionality feature vectors are re-ranked by the comparison module 608 based on a comparison between the feature vector of the query image and the feature vectors of the similar images.
  • The central database 610 is configured to store a large set of images indexed as 8 bit segments using hash tables. For every image in the set of images, the central database is configured to store a 128-bit representation of the image computed by the reduction module 606, the original 4096 dimensional feature vector of the image extracted by the base classifier 604 and the image itself. In an embodiment, the 128-bit vector is split into 16 segments of 8 bits each. The central database 610 is configured to maintain 16 hash tables, one for each segment, wherein the hash table corresponding to segment “i” uses the integer computed from those 8 bits as the hash key and the image as the value. The central database 610 is further configured to store the input query images, the feature vectors, the reduced dimensionality feature vectors, comparison results and any other data/information used or processed by the system or any modules/components thereof.
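  • A minimal in-memory sketch of this 16-hash-table index is shown below (hypothetical class and method names; a production system would persist these structures in the central database 610):

```python
from collections import defaultdict

class SegmentIndex:
    """16 hash tables; table i maps the integer value of segment i to the images containing it."""
    def __init__(self, n_segments=16):
        self.tables = [defaultdict(set) for _ in range(n_segments)]

    def add(self, image_id, segments):
        # segments: the 16 8-bit integers computed for one stored image
        for i, value in enumerate(segments):
            self.tables[i][value].add(image_id)

    def candidates(self, query_segments):
        # Any image sharing at least one 8-bit segment with the query is a candidate match
        found = set()
        for i, value in enumerate(query_segments):
            found |= self.tables[i][value]
        return found

index = SegmentIndex()
index.add("img_001", [17, 250, 3] + [0] * 13)
print(index.candidates([17, 9, 3] + [1] * 13))   # -> {'img_001'} (matches on segments 0 and 2)
```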
  • Method Overview
  • As shown in FIG. 5, the method for scene classification for an image, in accordance with example embodiments of the present invention, begins at step 502, wherein at least one image is received for classification, pursuant to which it is pre-processed to alter the size and/or dimensionality of the image, extract the RGB data from said image, etc. Next, at step 504, the received and/or pre-processed image is classified into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. Extracting one or more characteristic features of said image includes extracting characteristic features from a fully connected layer of the DCNN. Determining an intermediate output category includes passing the input image to the DCNN, computing features in each layer of said DCNN and propagating said features to the subsequent layers of the DCNN; at least one non-linear transformation of the received image is computed during this step.
  • Subsequently, at step 506, the intermediate output category determined at step 504 is validated based on the extracted characteristic features. Validating the intermediate output category includes passing the characteristic features of the image as an input to the binary classifier, wherein if the binary classifier returns a positive output then the scene classification for the input image is the same as the predicted intermediate output category; otherwise, the scene classification for the input image is ‘none of the pre-defined scene classes’. Based on said validation, a scene classification is provided for said image.
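  • The validation step could be realized with one binary SVM per pre-defined category, as sketched below with scikit-learn (an illustrative assumption, not the patent's own implementation; the category names and training data are placeholders):

```python
import numpy as np
from sklearn.svm import LinearSVC

# One binary SVM per pre-defined intermediate output category, trained on DCNN features
validators = {"beach": LinearSVC(), "kitchen": LinearSVC()}
for category, svm in validators.items():
    X = np.random.randn(200, 4096)                 # placeholder DCNN feature vectors
    y = np.random.randint(0, 2, size=200)          # 1 = image truly belongs to `category`
    svm.fit(X, y)

def validate(features, predicted_category):
    """Return the final scene class: the predicted category if its SVM agrees, else 'none'."""
    svm = validators[predicted_category]
    is_valid = svm.predict(features.reshape(1, -1))[0] == 1
    return predicted_category if is_valid else "none of the pre-defined scene classes"

print(validate(np.random.randn(4096), "beach"))
```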
  • The method also encompasses reducing overfitting during the process of scene classification, wherein overfitting may be reduced by performing data augmentation, i.e. replicating data/images through the introduction of small variations and adding these replications to the training dataset. In an embodiment, data augmentation is performed by extracting multiple patches of 227×227 from a training image of 256×256. In another embodiment, data augmentation is performed by performing PCA on the R, G, B values of each pixel over the training data and extracting the top 16 principal components. Subsequently, these components are multiplied with small random factors and the weighted principal components are added to each image to obtain more replications. The invention also encompasses reducing overfitting by dropout, i.e. dropping the inputs from certain randomly chosen nodes during training. In an embodiment, when an image is processed, the outputs of all neurons are taken but are multiplied with a factor to account for dropout; for instance, if 50% of the inputs are dropped out, the outputs of all neurons are multiplied with a factor of 0.5 to weigh their contributions in the training process. The method also encompasses reducing overfitting by early termination, wherein the learning rate is dropped by a factor of 10 whenever the training error increases or remains constant for a sustained number of iterations.
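The augmentation steps described above might be sketched as follows; the random crop and the 3-component RGB PCA jitter are a simplified illustration of the described scheme, not the exact training pipeline:

```python
import numpy as np

def random_crop(image, crop=227):
    """Extract a random 227x227 patch from a 256x256 training image (H, W, 3 array)."""
    h, w = image.shape[:2]
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    return image[top:top + crop, left:left + crop]

def pca_color_jitter(image, eigvecs, eigvals, sigma=0.1):
    """Add small random multiples of the RGB principal components to every pixel.

    eigvecs (3x3) and eigvals (3,) come from PCA over R, G, B values of the training
    set; this is a 3-component simplification of the scheme described above.
    """
    alphas = np.random.normal(0.0, sigma, size=eigvals.shape)
    shift = eigvecs @ (alphas * eigvals)   # one RGB offset vector, broadcast to all pixels
    return np.clip(image.astype(np.float32) + shift, 0, 255).astype(np.uint8)
```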
  • As shown in FIG. 7, the method for providing images similar to a query image from within a set of images, in accordance with example embodiments of the invention, begins with receiving a query image at step 702. Next, at step 704, the received input query image is provided as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image. In an embodiment, the extracted feature vector is an output of a fully connected layer of the DCNN.
  • At step 706, the dimensionality of said feature vector is reduced to form a reduced dimensional feature vector. The invention encompasses reducing dimensionality by providing the feature vector extracted by the DCNN as an input to a Deep Belief Network (DBN), wherein the DBN comprises layer-wise Restricted Boltzmann Machines (RBMs). In an embodiment, the layer-wise RBMs are built for the following configurations: 4096 to 1024, 1024 to 512, 512 to 128 and 128 to 32 dimensions, and are trained using the contrastive divergence approach. The invention also encompasses reducing dimensionality by providing the feature vector as an input to an auto-encoder artificial neural network, wherein the auto-encoder converts the feature vector into a reduced dimensionality feature vector.
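As one illustration of the auto-encoder route, a PyTorch encoder following the stated 4096→1024→512→128 configuration is sketched below; the mirrored decoder and the thresholding to a 128-bit code are assumptions, and in the DBN embodiment an RBM-pretrained stack trained by contrastive divergence would take the place of this plain autoencoder:

```python
import torch
import torch.nn as nn

class ReductionAutoencoder(nn.Module):
    """4096 -> 1024 -> 512 -> 128 encoder with a mirrored decoder (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(4096, 1024), nn.ReLU(),
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.Sigmoid(),   # outputs in (0, 1), later thresholded to bits
        )
        self.decoder = nn.Sequential(
            nn.Linear(128, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 4096),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def to_binary_code(model, feature_vector):
    """Threshold the 128-d encoding to obtain a 128-bit representation."""
    with torch.no_grad():
        _, z = model(feature_vector)
    return (z > 0.5).int()
```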
  • As discussed in the system overview, the invention encompasses use of a hashing scheme to index all the images stored in the database, i.e. the set of images within which the similarity search is to be performed. The reduced dimensional feature vector of each image stored in the database is split into 16 segments of 8 bits each, wherein each segment is stored in a hash table. At step 708, the reduced dimensional feature vector of the query image is likewise split into a plurality of segments, preferably into 16 segments of 8 bits each, and each such segment is provided to one of the 16 hash tables. Subsequently, these query image segments are compared with the segments of the set of images at step 710. Comparing query image segments with segments of the set of images includes providing each segment to one node, wherein each node looks up its specific hash table and returns the images matching the 8-bit segment that was sent to that node.
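Continuing the illustrative SegmentIndex sketch above, a coarse lookup might distribute one query segment to each table (or node) and collect every image that matches on at least one segment; the Hamming threshold below is an assumed parameter, not a value fixed by the specification:

```python
def lookup_similar(index, query_segments, max_hamming=10):
    """Return candidate image ids whose 128-bit codes lie within a Hamming threshold.

    index: a SegmentIndex as sketched above
    query_segments: the 16 segment keys of the query image's reduced feature vector
    """
    results = []
    for image_id in index.candidates(query_segments):
        stored = index.codes[image_id]
        # Hamming distance over the bit patterns of the 16 segment keys.
        dist = sum(bin(a ^ b).count("1") for a, b in zip(query_segments, stored))
        if dist <= max_hamming:
            results.append((image_id, dist))
    return sorted(results, key=lambda t: t[1])
```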
  • Step 710 provides images similar to said query image based on this comparison, wherein similar images are those for which the distance from the query image is less than a pre-defined threshold.
  • At step 712, the similar images computed in the above step are re-ranked based on their distance from the query image in the original dimensional space. In an embodiment, the Hamming distance is used as the distance measure. The distance search is first performed in the reduced dimensional space (such as the 128-bit space) and the obtained results are then re-ranked according to the distances in the original dimensional space (such as the 4096-dimensional space).
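A sketch of this re-ranking step under the same assumptions: candidates found by the coarse Hamming search are re-scored against the original 4096-dimensional feature vectors (Euclidean distance is an illustrative choice for the original space, as the specification does not fix one):

```python
import numpy as np

def rerank(index, query_vector, candidates):
    """Re-rank candidate images by distance to the query in the 4096-d space.

    candidates: the (image_id, hamming_distance) list from the coarse search.
    """
    rescored = []
    for image_id, _ in candidates:
        stored = np.asarray(index.full_vectors[image_id], dtype=np.float32)
        rescored.append((image_id, float(np.linalg.norm(query_vector - stored))))
    return sorted(rescored, key=lambda t: t[1])
```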
  • The disclosed methods and systems may be implemented on a Graphics Processing Unit (GPU). In an embodiment, the systems are implemented on an NVIDIA Tesla M2070 GPU card with 2880 CUDA cores, an Intel Xeon X5675 CPU and 5375 MB of video RAM.
  • While various embodiments of the image processing methods and systems have been disclosed, it should be apparent to those skilled in the art that many more modifications, besides those described, are possible without departing from the inventive concepts herein. The embodiments, therefore, are not to be restricted. Moreover, in interpreting the disclosure, all terms should be understood in the broadest possible manner consistent with the context. It will be appreciated by those skilled in the art that any of the aforementioned steps and/or system modules may be suitably replaced, reordered, or removed, and additional steps and/or system modules may be inserted. Further, the invention encompasses embodiments implemented in hardware, software, or a combination thereof.

Claims (7)

We claim:
1-9. (canceled)
10. A method for providing images similar to a query image from within a set of images, the method comprising:
receiving a query image;
providing said query image as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image;
reducing dimensionality of said feature vector to form a reduced dimensional feature vector;
splitting the reduced dimensional feature vector into a plurality of query image segments; and
providing images similar to said query image based on a comparison between each of said query image segments and segments of the set of images.
11. The method of claim 10 wherein reducing dimensionality of said feature vector comprises providing said feature vector as an input to an auto-encoder, wherein the auto-encoder processes said feature vector to form a reduced dimensional feature vector.
12. The method of claim 10 further comprising ranking the one or more similar images based on a distance between the query image and the similar images.
13. The method of claim 10 wherein the distance between the query image and the provided similar images is less than a predefined threshold.
14. A system for providing images similar to a query image from within a set of images, the system comprising:
an input module for receiving a query image;
a base classifier associated with said input module, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN extracts a feature vector of said query image;
a reduction module associated with said base classifier, for reducing dimensionality of said feature vector to form a reduced dimensional feature vector and splitting the reduced dimensional feature vector into a plurality of query image segments; and
a comparison module associated with said reduction module, for providing images similar to said query image based on a comparison between each of said query image segments and segments of the set of images.
15. The system of claim 14 further comprising a central database storing the set of images in the form of a plurality of image segments, wherein each segment is stored in a hash table.
US15/879,343 2015-06-03 2018-01-24 Systems and methods for image processing Abandoned US20180204062A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/879,343 US20180204062A1 (en) 2015-06-03 2018-01-24 Systems and methods for image processing

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201562170451P 2015-06-03 2015-06-03
US15/172,139 US10095950B2 (en) 2015-06-03 2016-06-02 Systems and methods for image processing
US15/879,343 US20180204062A1 (en) 2015-06-03 2018-01-24 Systems and methods for image processing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US15/172,139 Division US10095950B2 (en) 2015-06-03 2016-06-02 Systems and methods for image processing

Publications (1)

Publication Number Publication Date
US20180204062A1 true US20180204062A1 (en) 2018-07-19

Family

ID=57451115

Family Applications (2)

Application Number Title Priority Date Filing Date
US15/172,139 Active US10095950B2 (en) 2015-06-03 2016-06-02 Systems and methods for image processing
US15/879,343 Abandoned US20180204062A1 (en) 2015-06-03 2018-01-24 Systems and methods for image processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US15/172,139 Active US10095950B2 (en) 2015-06-03 2016-06-02 Systems and methods for image processing

Country Status (1)

Country Link
US (2) US10095950B2 (en)

Families Citing this family (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9753949B1 (en) 2016-03-14 2017-09-05 Shutterstock, Inc. Region-specific image download probability modeling
CN106021364B (en) * 2016-05-10 2017-12-12 百度在线网络技术(北京)有限公司 Foundation, image searching method and the device of picture searching dependency prediction model
DE102016223484B4 (en) * 2016-11-25 2021-04-15 Fujitsu Limited Determine Similarities in Computer Software Codes for Performance Analysis
KR102242623B1 (en) * 2016-12-15 2021-04-20 주식회사 엘지유플러스 Method and Apparatus For Processing Purchase Request Using The Wire-Wireless Communication Network
CN106599926A (en) * 2016-12-20 2017-04-26 上海寒武纪信息科技有限公司 Expression picture pushing method and system
KR101900180B1 (en) 2017-01-11 2018-09-18 포항공과대학교 산학협력단 Imgae analysis method for extracting feature of image and apparatus therefor
US10163043B2 (en) * 2017-03-31 2018-12-25 Clarifai, Inc. System and method for facilitating logo-recognition training of a recognition model
CN106886801B (en) 2017-04-14 2021-12-17 北京图森智途科技有限公司 Image semantic segmentation method and device
CN107220325A (en) * 2017-05-22 2017-09-29 华中科技大学 A kind of similar icon search methods of APP based on convolutional neural networks and system
US11126894B2 (en) * 2017-06-05 2021-09-21 Siemens Aktiengesellschaft Method and apparatus for analysing an image
CN107274378B (en) * 2017-07-25 2020-04-03 江西理工大学 Image fuzzy type identification and parameter setting method based on fusion memory CNN
CN109420622A (en) * 2017-08-27 2019-03-05 南京理工大学 Tobacco leaf method for sorting based on convolutional neural networks
GB2566257A (en) * 2017-08-29 2019-03-13 Sky Cp Ltd System and method for content discovery
EP3457324A1 (en) 2017-09-15 2019-03-20 Axis AB Method for locating one or more candidate digital images being likely candidates for depicting an object
US10970552B2 (en) * 2017-09-28 2021-04-06 Gopro, Inc. Scene classification for image processing
CN108399409B (en) * 2018-01-19 2019-06-18 北京达佳互联信息技术有限公司 Image classification method, device and terminal
CN108829692B (en) * 2018-04-09 2019-12-20 华中科技大学 Flower image retrieval method based on convolutional neural network
US10769261B2 (en) * 2018-05-09 2020-09-08 Futurewei Technologies, Inc. User image verification
CN108765425B (en) * 2018-05-15 2022-04-22 深圳大学 Image segmentation method and device, computer equipment and storage medium
US11409994B2 (en) 2018-05-15 2022-08-09 Shenzhen University Methods for image segmentation, computer devices, and storage mediums
CN110580487A (en) * 2018-06-08 2019-12-17 Oppo广东移动通信有限公司 Neural network training method, neural network construction method, image processing method and device
CN109033172B (en) * 2018-06-21 2021-12-17 西安理工大学 Image retrieval method for deep learning and approximate target positioning
CN108983187B (en) * 2018-07-11 2022-07-15 西安电子科技大学 Online radar target identification method based on EWC
CN109446887B (en) * 2018-09-10 2022-03-25 易诚高科(大连)科技有限公司 Image scene description generation method for subjective evaluation of image quality
CN109597851B (en) * 2018-09-26 2023-03-21 创新先进技术有限公司 Feature extraction method and device based on incidence relation
US11222233B2 (en) 2018-09-26 2022-01-11 Samsung Electronics Co., Ltd. Method and apparatus for multi-category image recognition
RU2706960C1 (en) * 2019-01-25 2019-11-22 Самсунг Электроникс Ко., Лтд. Computationally efficient multi-class image recognition using successive analysis of neural network features
KR20200051278A (en) 2018-11-05 2020-05-13 삼성전자주식회사 Method of managing task in artificial neural network and system comprising the same
CN109685116B (en) * 2018-11-30 2022-12-30 腾讯科技(深圳)有限公司 Image description information generation method and device and electronic device
US20200192932A1 (en) * 2018-12-13 2020-06-18 Sap Se On-demand variable feature extraction in database environments
CN109639739B (en) * 2019-01-30 2020-05-19 大连理工大学 Abnormal flow detection method based on automatic encoder network
WO2020181098A1 (en) * 2019-03-05 2020-09-10 Memorial Sloan Kettering Cancer Center Systems and methods for image classification using visual dictionaries
CN110188827B (en) * 2019-05-29 2020-11-03 创意信息技术股份有限公司 Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN110287800B (en) * 2019-05-29 2022-08-16 河海大学 Remote sensing image scene classification method based on SGSE-GAN
CN110211127B (en) * 2019-08-01 2019-11-26 成都考拉悠然科技有限公司 Image partition method based on bicoherence network
CN110689077A (en) * 2019-09-29 2020-01-14 福建师范大学 Novel digital image classification method
US11645611B1 (en) * 2019-12-23 2023-05-09 Blue Yonder Group, Inc. System and method of decoding supply chain signatures
US11487808B2 (en) 2020-02-17 2022-11-01 Wipro Limited Method and system for performing an optimized image search
CN111428785B (en) * 2020-03-23 2023-04-07 厦门大学 Puffer individual identification method based on deep learning
CN111428739B (en) * 2020-04-14 2023-08-25 图觉(广州)智能科技有限公司 High-precision image semantic segmentation method with continuous learning capability
KR102208685B1 (en) * 2020-07-23 2021-01-28 주식회사 어반베이스 Apparatus and method for developing space analysis model based on data augmentation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8644624B2 (en) * 2009-07-28 2014-02-04 Samsung Electronics Co., Ltd. System and method for indoor-outdoor scene classification
US8873840B2 (en) * 2010-12-03 2014-10-28 Microsoft Corporation Reducing false detection rate using local pattern based post-filter
US9524450B2 (en) * 2015-03-04 2016-12-20 Accenture Global Services Limited Digital image processing using convolutional neural networks

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190236774A1 (en) * 2018-01-30 2019-08-01 General Electric Company Systems and methods for capturing deep learning training data from imaging systems
US10679346B2 (en) * 2018-01-30 2020-06-09 General Electric Company Systems and methods for capturing deep learning training data from imaging systems
CN109241316A (en) * 2018-08-30 2019-01-18 北京旷视科技有限公司 Image search method, device, electronic equipment and storage medium
WO2020093210A1 (en) * 2018-11-05 2020-05-14 中国科学院计算技术研究所 Scene segmentation method and system based on contenxtual information guidance
CN109886303A (en) * 2019-01-21 2019-06-14 武汉大学 A kind of TrAdaboost sample migration aviation image classification method based on particle group optimizing
CN110163286A (en) * 2019-05-24 2019-08-23 常熟理工学院 Hybrid pooling-based domain adaptive image classification method
WO2021057046A1 (en) * 2019-09-24 2021-04-01 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image hash for fast photo search
WO2021115115A1 (en) * 2019-12-09 2021-06-17 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Zero-shot dynamic embeddings for photo search

Also Published As

Publication number Publication date
US10095950B2 (en) 2018-10-09
US20160358024A1 (en) 2016-12-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: HYPERVERGE INC., CALIFORNIA

Free format text: NUNC PRO TUNC ASSIGNMENT;ASSIGNORS:KRISHNAKUMAR, VIGNESH;SRIDHARASINGAN, HARIPRASAD PRAYAGAI;TADIMARI, ADARSH AMARENDRA;AND OTHERS;REEL/FRAME:045024/0289

Effective date: 20160602

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION