US20180204062A1 - Systems and methods for image processing - Google Patents
- Publication number
- US20180204062A1 (application US15/879,343)
- Authority
- United States (US)
- Prior art keywords
- images
- query image
- image
- feature vector
- dcnn
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G06K9/00684
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06K9/00624; G06K9/4609; G06K9/4619; G06K9/4671; G06K9/6232; G06K9/6277; G06K9/66
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19127—Extracting features by transforming the feature space, e.g. multidimensional scaling; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/192—Recognition using electronic means using simultaneous comparisons or correlations of the image signals with a plurality of references
- G06V30/194—References adjustable by an adaptive method, e.g. learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Definitions
- the present invention relates to image processing systems and methods and more particularly to systems and methods for image scene classification and similarity matching.
- Image processing is widely gaining popularity with the increasing usage of digital images for various purposes. Particularly, one of the most important areas in image processing relates to scene classification that deals with the problem of understanding the context of what is captured in an image. Understanding a holistic view of an image is a relatively difficult task due to the lack of text labels that represent the content present in them.
- Existing systems and methods for scene recognition have a number of drawbacks and limitations. Existing solutions treat indoor and outdoor scene recognition as two different problems due to the significant variation in appearance characteristics between indoor and outdoor images. It has largely been perceived that different kinds of features would be required to discriminate between indoor scenes and outdoor scenes. This is highly inefficient, since different systems and methods must be deployed for scene recognition of indoor and outdoor scenes. Current indoor scene recognition systems and methods use part-based models that look for characteristic objects in an image to determine the scene, which results in the inaccurate assumption that similar-looking objects distributed spatially in a similar manner constitute the same scene.
- the current solutions are unable to effectively address the problem of overfitting caused by the use of real world image datasets (as input to these systems) that capture a lot of intra-class variation, i.e. the significant variation in appearance characteristics of images within each class.
- existing solutions use hand crafted features to discriminate between images/scenes. However, features that are good for discriminating between some classes may not be good for other classes.
- Existing approaches to image/scene recognition are incapable of continuously learning from or adapting to the increasing number of images uploaded to the internet every day.
- Another important aspect of image processing relates to similarity matching of images.
- With the growing requirement for image recognition technologies, the need for scalable recognition techniques that can handle a large number of classes and continuously learn from internet-scale images has become very evident. Unlike searching for textual data, searching for images similar to a particular image is a challenging task. The number of images uploaded on the World Wide Web is increasing every day, and it has become extremely difficult to incorporate these newly added images into the search database of existing similarity matching techniques.
- existing image recognition solutions use hand crafted features to discriminate between images. A major disadvantage of such systems is that they result in a large reconstruction error, i.e. reconstruction of images using these hand crafted features is likely to produce an image very different from the original image.
- It is an object of the present invention to provide systems and methods for image processing that facilitate scene recognition while minimizing false positives. It is another object of the present invention to facilitate large-scale recognition of indoor and outdoor scenes. Another object of the invention is to provide systems and methods for scene classification that help in achieving translational invariance. Yet another object of the invention is to provide image processing systems and methods that efficiently provide images similar to a query image. Another object of the invention is to facilitate similarity matching that minimizes reconstruction error.
- one aspect of the present disclosure relates to a method for scene classification for an image.
- the method begins with receiving at least one image and classifying said image into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image.
- the method further comprises validating the at least one intermediate output category based on said extracted characteristic features and providing a scene classification based on said validation.
- Another aspect of the invention relates to a system for scene classification of an image, the system comprising a receiver unit for receiving at least one image for classification and a base classifier, associated with said receiver unit, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image.
- the system further comprises a binary classifier associated with said base classifier, for providing a scene classification for said at least one image based on validation of the at least one intermediate output category, wherein said validation is based on said extracted characteristic features.
- Yet another aspect of the disclosure relates to a method for providing images similar to a query image from within a set of images.
- the method comprises receiving a query image from the user and providing said query image as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image.
- the method further comprises reducing the dimensionality of the extracted feature vector to form a reduced dimensional feature vector and subsequently splitting the reduced dimensional feature vector into a plurality of query image segments.
- each of said query image segments is compared with segments of the set of images, and images similar to said query image are provided based on this comparison.
- Yet another aspect of the invention relates to a system for providing images similar to a query image from within a set of images.
- the system comprises an input module for receiving a query image from the user and a base classifier associated with said input module, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN extracts a feature vector of said query image.
- the system further comprises a reduction module associated with said base classifier, for reducing dimensionality of said feature vector to form a reduced dimensional feature vector and splitting the reduced dimensional feature vector into a plurality of query image segments.
- the system also comprises a comparison module associated with said reduction module, for providing images similar to said query image, based on a comparison between each of said query image segments and segments of the set of images.
- FIG. 1 illustrates a block diagram of a system for scene classification, in accordance with exemplary embodiments of the present invention.
- FIG. 2 illustrates a general example of a neural network.
- FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention.
- FIG. 4 illustrates an exemplary list of categories/classes, in accordance with exemplary embodiments of the present invention.
- FIG. 5 illustrates a method for scene classification, in accordance with exemplary embodiments of the present invention.
- FIG. 6 illustrates a system for similarity matching, in accordance with exemplary embodiments of the present invention.
- FIG. 7 illustrates a method for similarity matching, in accordance with exemplary embodiments of the present invention.
- the system for scene classification, in accordance with example embodiments of the present disclosure, comprises a transceiver unit 102 associated with a base classifier 104 and a binary classifier 106 , all connected to a data repository 108 .
- the transceiver unit 102 is configured to receive at least one image for classification, wherein the image may be received from a user or any other system.
- the transceiver unit 102 is configured to pre-process this information and provide it to the base classifier 104 .
- the base classifier 104 comprises a deep convolutional neural network (DCNN), wherein the DCNN is configured to determine at least one intermediate output category for said image and extract one or more characteristic features thereof.
- the intermediate output category may be chosen from a set of pre-defined output categories.
- the intermediate output category determined by the DCNN is one of an indoor category, an outdoor category, or a combination thereof.
- the extracted one or more characteristic features may be the output feature vector produced by the first fully connected layer of the DCNN.
- the DCNN is a large neural network configured to deal with two dimensional input data such as an image and perform 2D convolutional operation on such input. The DCNN has been discussed in detail with reference to FIG. 3 .
- the binary classifier 106 associated with said base classifier 104 is configured to validate the intermediate output category provided by the base classifier based on the extracted characteristic features of an image and provide a scene classification for the image based on said validation.
- the invention encompasses a binary classifier capable of predicting that the image does not belong to any scene classification.
- the invention encompasses a binary classifier capable of predicting more than one scene category/classification for an image received from the user.
- the system comprises one binary classifier for each of the intermediate output categories pre-defined in the system.
- the invention encompasses a binary classifier that is a Support Vector Machine (SVM).
- the data repository 108 is configured to store the intermediate output category and the extracted characteristic features of the image received from the base classifier 104 .
- the data repository 108 is further configured to store the scene classification provided by the binary classifier 106 .
- FIG. 2 illustrates a simplified example of an artificial neural network comprising multiple nodes and connections between them.
- the nodes 202 ( i ) are referred to as input nodes and are configured to receive raw data in the form of inputs that trigger the nodes they are connected to.
- Each connection as shown in FIG. 2 has a corresponding value referred to as weights, wherein these weights may indicate the importance of these connections/nodes.
- when a node is triggered by two or more nodes, the input taken by the node depends upon the weights of the connections from which the node is triggered.
- the nodes 204 ( j ) are referred to as intermediate/hidden nodes and are configured to accept input from one or more input nodes and provide output to one or more hidden nodes and/or output nodes 206 ( k ).
- the input nodes 202 ( i ) form an input layer
- the hidden nodes 204 ( j ) form one or more hidden layers
- the output nodes 206 ( k ) form an output layer.
- with reference to FIG. 2 , it will be appreciated that any number of hidden layers may be implemented in the artificial neural network depending upon the complexity of the decision to be made, the dimensionality of the input data and the size of the dataset used for training.
- in deep neural networks, a large number of hidden layers are stacked one above the other, wherein each layer computes a non-linear transformation of the outputs from the previous layer.
- deep neural networks are required to be trained or customized, wherein a large set of inputs is provided to the neural network, outputs for said inputs are computed and the network weights are adjusted based on the error, if any, in the outputs.
- the training of neural networks encompassed by the invention is performed using backpropagation, wherein random weights/values are assigned to each connection, followed by computing a set of outputs for a given set of inputs using said random weights.
- a desired output of each of the inputs is defined and is compared with the calculated output, wherein the difference between the two values may be referred to as an error in the network.
- weights for each layer are adjusted based on the computed error for each layer.
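The training loop described above can be sketched for a single sigmoid neuron on a toy dataset. This is a deliberate simplification of the networks in this disclosure; the dataset (logical OR), learning rate and epoch count are illustrative only:

```python
# Minimal backpropagation sketch: assign random weights, compute outputs,
# compare against the desired outputs, and adjust weights based on the error.
import math, random

random.seed(0)

# Toy dataset: learn logical OR (inputs, desired output).
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w = [random.uniform(-1, 1) for _ in range(2)]   # random initial weights
b = random.uniform(-1, 1)                       # random initial bias
lr = 1.0                                        # learning rate

def forward(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))           # sigmoid activation

def mse():
    return sum((forward(x) - y) ** 2 for x, y in data) / len(data)

loss_before = mse()
for _ in range(2000):
    for x, y in data:
        out = forward(x)
        # error term d(loss)/dz for squared error with a sigmoid output
        delta = (out - y) * out * (1.0 - out)
        w[0] -= lr * delta * x[0]               # adjust weights based on error
        w[1] -= lr * delta * x[1]
        b    -= lr * delta
loss_after = mse()
```

After training, the network's error on the toy dataset has decreased and the neuron separates the OR cases.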
- FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention.
- Deep Convolutional Neural Networks are neural nets that are specifically designed to deal with 2D input data and its patterns of translation, scale and distortion variance.
- the DCNN is configured to perform a 2D convolution operation where the input to a neuron/node is the output of a 2D kernel operating on a local window of neurons from the previous layer. The same 2D kernel is applied throughout the layer, resulting in weights being shared by multiple connections in the network.
- the DCNN comprises several types of layers, such as convolution layers, pooling layers and fully connected layers.
- the convolution layer is configured to perform individual convolution operations and send the output of each such operation to one or more nodes in the next layer. In a preferred embodiment, the output of the convolution operations is transferred to only some nodes in the next layer.
- the pooling layer is configured to perform aggregate operations like max/average over neighboring windows of outputs. The pooling layer has no weights and just aggregates values over the receptive field.
- a fully connected layer is the traditional layer type where each node is connected to every node in the next layer.
- the DCNN networks may be trained using techniques such as rectified linear units, local response normalization, and parallelization on a GPU.
- the weight update rule is as follows:
- v_{i+1} = 0.9 v_i − 0.0005 ε w_i − ε ⟨∂L/∂w⟩_{D_i}
- where:
- i is the iteration number
- w is the weight vector
- ε is the learning rate
- D_i is the training data sampled for training in iteration i
- ⟨∂L/∂w⟩_{D_i} is the average over D_i of the derivative of the loss L with respect to w
- v is the weight update, the updated weights being w_{i+1} = w_i + v_{i+1}
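A minimal sketch of this weight update rule, treating ε as the learning rate and using purely illustrative weight and gradient values:

```python
# One step of the momentum + weight-decay update quoted above:
#   v <- 0.9*v - 0.0005*eps*w - eps*grad;  w <- w + v
def update(w, v, grad, epsilon=0.01):
    v_next = [0.9 * vi - 0.0005 * epsilon * wi - epsilon * gi
              for vi, wi, gi in zip(v, w, grad)]
    w_next = [wi + vni for wi, vni in zip(w, v_next)]
    return w_next, v_next

w, v = [1.0, -2.0], [0.0, 0.0]
grad = [0.5, -0.5]      # stands in for the average gradient over batch D_i
w, v = update(w, v, grad)
```

With these numbers the first weight moves from 1.0 to roughly 0.994995: the 0.0005 term shrinks the weight slightly (weight decay) while the gradient term does the main correction.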
- the DCNN comprises 5 convolution layers, 3 pooling layers and 3 fully connected layers.
- the input layer is configured to read/receive raw RGB data from an image and the output layer is configured to produce probabilities for the intermediate output categories as outputs.
- the first convolution layer has 96 kernels of size 11×11×3
- the second convolution layer has 256 kernels of size 5×5×96
- the third convolution layer has 384 kernels of size 3×3×256
- the fourth convolution layer has 384 kernels of size 3×3×384
- the fifth convolution layer has 256 kernels of size 3×3×384. All three pooling layers use a kernel of size 3×3 with a stride of 2, and the first two fully connected layers comprise 4096 nodes/neurons each.
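The kernel sizes listed above imply the following per-layer parameter counts (number of kernels times kernel volume, plus one bias per kernel, assuming the usual one-bias-per-kernel convention):

```python
# Parameter counts implied by the convolution layer sizes listed above.
def conv_params(n_kernels, kh, kw, depth):
    return n_kernels * (kh * kw * depth) + n_kernels  # weights + biases

layers = {
    "conv1": conv_params(96, 11, 11, 3),    # 96 kernels of 11x11x3
    "conv2": conv_params(256, 5, 5, 96),
    "conv3": conv_params(384, 3, 3, 256),
    "conv4": conv_params(384, 3, 3, 384),
    "conv5": conv_params(256, 3, 3, 384),
}
# conv1: 96*363 + 96 = 34,944 parameters, and so on for the later layers.
```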
- the system for similarity matching, i.e. for providing images similar to a query image from within a set of images, comprises an input/output module 602 connected to a base classifier 604 , further connected to a reduction module 606 and a comparison module 608 , all connected to a central database 610 .
- the I/O module 602 is configured to receive a query image, wherein the query image may be received from a user input or as part of a request made by another system.
- the I/O module 602 is also configured to store the query image into the central database 610 and provide the same to the base classifier 604 , wherein the base classifier 604 comprises a deep convolutional neural network (DCNN).
- the DCNN extracts a feature vector of said query image and provides it to the reduction module 606 , wherein in an embodiment the extracted feature vector is a 4096 dimensional vector.
- the base classifier 604 stores this extracted feature vector in the central database 610 .
- the reduction module 606 comprises an auto-encoder (not shown in FIG. 6 ) configured to reduce the dimensionality of said feature vector to form a reduced dimensional feature vector.
- the auto encoder converts the 4096-dimensional feature vector into a 128-dimensional reduced feature vector.
- the auto encoder is an artificial neural network trained using supervised error backpropagation, in a manner the same as or at least substantially similar to that discussed hereinabove.
- the auto encoder is configured to adjust/learn a set of weights that minimize the reconstruction error, wherein reconstruction error captures the similarity between the image reconstructed from a reduced dimensional vector and the original query image. Minimizing the reconstruction error ensures that similar vectors in higher dimensional space are also similar in lower dimensional space and vice versa.
- the auto encoder was trained with the network architecture consisting of the following sequence of dimensions: 4096-1024-512-128-32-128-512-1024.
- the reduction module 606 is further configured to split the reduced dimensional feature vector into a plurality of query image segments and provide the same to the comparison module 608 .
- the reduction module 606 splits the 128 bit reduced dimensional feature vector into 16 segments of 8 bits each.
- the comparison module 608 is configured to compare the query image segments with the segments of set of images stored in the central database 610 and provide images similar to said query image based on this comparison, wherein the query image itself is excluded from the set of images used for this comparison.
- the comparison module 608 is further configured to compare the original query image with the similar images and re-rank the similar images based on this comparison. In other words, similar images computed on the basis of the comparison using smaller reduced dimensionality feature vectors are re-ranked by the comparison module 608 based on a comparison between the feature vector of the query image and the feature vectors of the similar images.
- the central database 610 is configured to store a large set of images indexed as 8 bit segments using hash tables. For every image in the set of images, the central database is configured to store a 128-bit representation of the image computed by the reduction module 606 , the original 4096 dimensional feature vector of the image extracted by the base classifier 604 and the image itself. In an embodiment, the 128-bit vector is split into 16 segments of 8 bits each.
- the central database 610 is configured to maintain 16 hash tables, one for each segment, wherein the hash table corresponding to segment “i” uses the integer computed from those 8 bits as the hash key and the image as the value.
- the central database 610 is further configured to store the input query images, the feature vectors, the reduced dimensionality feature vectors, comparison results and any other data/information used or processed by the system or any modules/components thereof.
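The segment-hashing scheme described above can be sketched as follows. Each 128-bit reduced vector is split into 16 segments of 8 bits, and segment i is indexed in hash table i using the segment's integer value as the key. The image identifiers and bit vectors here are illustrative:

```python
# Split a 128-bit integer into 16 one-byte segments (most significant first).
def split_segments(bits128):
    return [(bits128 >> (8 * (15 - i))) & 0xFF for i in range(16)]

# Index an image: segment i goes into hash table i, keyed by its byte value.
def index_image(tables, image_id, bits128):
    for i, seg in enumerate(split_segments(bits128)):
        tables[i].setdefault(seg, set()).add(image_id)

# Lookup: each query segment is looked up in its own table, and any image
# matching at least one segment becomes a candidate.
def lookup(tables, bits128):
    matches = set()
    for i, seg in enumerate(split_segments(bits128)):
        matches |= tables[i].get(seg, set())
    return matches

tables = [dict() for _ in range(16)]          # one hash table per segment
index_image(tables, "img_a", 0x000102030405060708090A0B0C0D0E0F)
index_image(tables, "img_b", 0x00FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF)
hits = lookup(tables, 0x000102030405060708090A0B0C0D0EFF)
```

Both stored images share at least one segment with the query, so both are returned as candidates for the later distance-based ranking.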
- the method for scene classification for an image begins at step 502 , wherein at least one image is received for classification, pursuant to which it is pre-processed to alter the size and/or dimensionality of the image, extract the RGB data from said image, etc.
- the received and/or pre-processed image is classified into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. Extracting one or more characteristic features of said image includes extracting characteristic features from a fully connected layer of the DCNN. Determining an intermediate output category includes passing the input image to the DCNN, computing features in each layer of said DCNN and propagating said features to the subsequent layers of the DCNN; at least one non-linear transformation of the received image is computed during this step.
- the intermediate output category determined at step 504 is validated based on the extracted characteristic features.
- Validating the intermediate output category includes passing the characteristic features of the image as an input to the binary classifier, wherein if the binary classifier returns a positive output then the scene classification for the input image is the same as the predicted intermediate output category; otherwise the scene classification for the input image is ‘none of the pre-defined scene classes’. Based on said validation, a scene classification is provided for said image.
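The validation step just described can be sketched as follows. The stub classifiers and the feature dictionary below stand in for the per-category binary SVMs and the DCNN features, and are purely illustrative:

```python
# Validate an intermediate category: a positive result from that category's
# binary classifier keeps the prediction, otherwise the image is labelled
# as belonging to none of the pre-defined scene classes.
def validate(intermediate_category, features, binary_classifiers):
    clf = binary_classifiers[intermediate_category]
    if clf(features):
        return intermediate_category
    return "none of the pre-defined scene classes"

# Illustrative stubs standing in for trained per-category SVMs.
classifiers = {
    "indoor":  lambda f: f["indoor_score"] > 0.5,
    "outdoor": lambda f: f["indoor_score"] <= 0.5,
}

label = validate("indoor", {"indoor_score": 0.9}, classifiers)     # accepted
rejected = validate("indoor", {"indoor_score": 0.1}, classifiers)  # rejected
```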
- the method also encompasses reducing overfitting during the process of scene classification, wherein overfitting may be reduced by performing data augmentation, i.e. replicating data/images through the introduction of small variations and adding these replications to the training dataset.
- data augmentation is performed by extracting multiple patches of 227×227 from the training image of 256×256.
- data augmentation is performed by performing PCA on R, G, B pixel values over the training data for each pixel and extracting the top 16 principal components. Subsequently, these components are multiplied with a random small factor and the weighted principal components are added to each image to get more replications.
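As a sketch of the patch-extraction step, the following enumerates 227×227 crop offsets within a 256×256 image. The particular choice of four corner crops plus a centre crop is an assumption; the disclosure only states that multiple patches are extracted:

```python
# Top-left offsets for 227x227 crops inside a 256x256 training image.
def crop_offsets(image_size=256, patch_size=227):
    m = image_size - patch_size          # maximum valid offset: 29
    c = m // 2                           # centred offset: 14
    # Four corners plus the centre (one common cropping scheme).
    return [(0, 0), (0, m), (m, 0), (m, m), (c, c)]

offsets = crop_offsets()
```

Any offset between 0 and 29 along each axis yields a valid patch, so up to 900 distinct crops are possible per image; the five listed here are one conventional subset.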
- the invention also encompasses reducing overfitting by dropping the inputs from certain randomly chosen nodes during training.
- the outputs of all neurons are taken in but are multiplied by a factor to account for dropout; for instance, if 50% of the inputs are dropped out, the outputs of all neurons are multiplied by a factor of 0.5 to weigh their contributions in the training process.
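The dropout compensation described above can be sketched as follows, interpreting the 0.5 factor as the standard dropout output scaling (multiply every output by the keep probability so expected contributions match):

```python
# Scale neuron outputs by the keep probability to compensate for dropout.
def scale_for_dropout(outputs, drop_fraction=0.5):
    keep = 1.0 - drop_fraction
    return [o * keep for o in outputs]

scaled = scale_for_dropout([2.0, 4.0, 6.0])
```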
- the method also encompasses reducing overfitting by early termination wherein the learning rate is dropped by a factor of 10 whenever the training error increases or is constant for a sustained number of iterations.
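The schedule just described can be sketched as follows. The patience threshold of 3 iterations and the error values are illustrative assumptions; the disclosure states only that the learning rate is dropped by a factor of 10 when the training error rises or stays constant:

```python
# Drop the learning rate by a factor of 10 whenever the training error
# has not improved for `patience` consecutive iterations.
def schedule(errors, lr=0.01, patience=3, factor=10.0):
    best, stale = float("inf"), 0
    for e in errors:
        if e < best:
            best, stale = e, 0           # error improved: reset the counter
        else:
            stale += 1                   # error flat or rising
            if stale >= patience:
                lr /= factor             # drop the learning rate
                stale = 0
    return lr

final_lr = schedule([1.0, 0.8, 0.8, 0.8, 0.8, 0.7])
```

Here the error stalls at 0.8 for three iterations, so the rate drops once from 0.01 to 0.001.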
- the method for providing images similar to a query image from within a set of images begins with receiving a query image at step 702 .
- the received input query image is provided as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image.
- the extracted feature vector is an output of a fully connected layer of the DCNN.
- the dimensionality of said feature vector is reduced to form a reduced dimensional feature vector.
- the invention encompasses reducing dimensionality by providing the feature vector extracted by the DCNN as an input to a Deep Belief Network (DBN), wherein the DBN comprises layer-wise Restricted Boltzmann Machines (RBMs).
- the layer-wise RBMs are built for the following configurations: 4096 to 1024, 1024 to 512, 512 to 128 and 128 to 32 dimensions, and are trained through the contrastive divergence approach.
- the invention also encompasses reducing dimensionality by providing the feature vector as an input to an auto encoder artificial neural network, wherein the auto encoder converts the feature vector into a reduced dimensionality feature vector.
- the invention encompasses use of a hashing scheme to index all the images stored in the database, i.e. the set of images within which similarity search is to be performed.
- The reduced dimensional feature vector of each image stored in the database is split into 16 segments of 8 bits, wherein each segment is stored in a hash table.
- the reduced dimensional feature vector of the query image is also split/divided into a plurality of segments, preferably into 16 segments of 8-bits each, and each such segment is provided to one of the 16 hash tables.
- query image segments are compared with segments of the set of images at step 710 . Comparing query image segments with segments of the set of images includes providing each segment to one node, wherein each node looks up its specific hash table and returns images matching the 8-bit segment that was sent to that node.
- Step 710 provides images similar to said query image based on this comparison, wherein similar images are those for which the distance from the query image is less than a pre-defined threshold.
- the similar images computed in the above step are then re-ranked based on their distance from the query image in the original dimensional space.
- hamming distance is used as a distance measure. This distance search is first performed in the reduced dimensional space (such as 128 bit dimensional space) and then the obtained results are re-ranked according to the distances in the original dimensional space (such as the 4096 bit dimensional space).
- the disclosed methods and systems may be implemented on a Graphics Processing Unit (GPU).
- GPU Graphics Processing Unit
- the systems are implemented on NVIDIA Tesla M2070 GPU card with 2880 CUDA cores, an Intel Xeon X5675 CPU and 5375 MB of Video RAM.
Abstract
Efficient image processing systems and methods for image scene classification and similarity matching are disclosed. The image processing systems encompassed by this disclosure use a deep convolutional neural network to facilitate scene classification by recognizing the context of an image and thereby enabling searches for similar images. These methods and systems are scalable to a large set of images and achieve a higher performance compared to the current state of the art techniques.
Description
- The present invention relates to image processing systems and methods and more particularly to systems and methods for image scene classification and similarity matching.
- The following description of related art is intended to provide background information pertaining to the field of the invention. This section may include certain aspects of the art that may be related to various aspects of the present disclosure. However, this section is intended only to enhance the reader's understanding of the present disclosure, and not as an admission of prior art.
- Image processing is gaining popularity with the increasing use of digital images for various purposes. In particular, one of the most important areas in image processing is scene classification, which deals with the problem of understanding the context of what is captured in an image. Understanding a holistic view of an image is a relatively difficult task due to the lack of text labels that represent the content present in the image. Existing systems and methods for scene recognition have a number of drawbacks and limitations. Existing solutions treat indoor and outdoor scene recognition as two different problems due to the significant variation in appearance characteristics between indoor and outdoor images. It has largely been perceived that different kinds of features would be required to discriminate between indoor scenes and outdoor scenes. This is highly inefficient, since different systems and methods must be deployed for recognition of indoor and outdoor scenes. Current indoor scene recognition systems and methods use part-based models that look for characteristic objects in an image to determine the scene, which rests on the inaccurate assumption that similar-looking objects, distributed spatially in a similar manner, constitute the same scene.
- Further, the current solutions are unable to effectively address the problem of overfitting caused by the use of real world image datasets (as input to these systems) that capture a lot of intra-class variation, i.e. the significant variation in appearance characteristics of images within each class. Furthermore, existing solutions use hand crafted features to discriminate between images/scenes. However, features that are good for discriminating between some classes may not be good for other classes. Existing approaches to image/scene recognition are incapable of continuously learning from or adapting to the increasing number of images uploaded to the internet every day.
- Another important aspect of image processing relates to similarity matching of images. With the growing requirement for image recognition technologies, the need for scalable recognition techniques that can handle a large number of classes and continuously learn from internet-scale images has become very evident. Unlike searching for textual data, searching for images similar to a particular image is a challenging task. The number of images uploaded to the World Wide Web increases every day, and it has become extremely difficult to incorporate these newly added images into the search databases of existing similarity matching techniques. As discussed above, existing image recognition solutions use hand-crafted features to discriminate between images. A major disadvantage of such systems is that they result in a large reconstruction error, i.e. reconstruction of images using these hand-crafted features is likely to produce an image very different from the original image.
- Thus, there is a need for building improved and scalable image processing systems for scene classification and similarity matching that are capable of handling a large number of images/classes.
- This section is provided to introduce certain objects and aspects of the disclosed methods and systems in a simplified form that are further described below in the detailed description. This summary is not intended to identify the key features or the scope of the claimed subject matter.
- In view of the shortcomings of existing image processing systems and methods, as discussed in the background section, it is an object of the present invention to provide systems and methods for image processing that facilitate scene recognition while minimizing false positives. It is another object of the present invention to facilitate large-scale recognition of indoor and outdoor scenes. Another object of the invention is to provide systems and methods for scene classification that help in achieving translational invariance. Yet another object of the invention is to provide image processing systems and methods that efficiently provide images similar to a query image. Another object of the invention is to facilitate similarity matching that minimizes reconstruction error.
- In view of these and other objects, one aspect of the present disclosure relates to a method for scene classification for an image. The method begins with receiving at least one image and classifying said image into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. The method further comprises validating the at least one intermediate output category based on said extracted characteristic features and providing a scene classification based on said validation.
- Another aspect of the invention relates to a system for scene classification of an image, the system comprising a receiver unit for receiving at least one image for classification and a base classifier, associated with said receiver unit, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. The system further comprises a binary classifier associated with said base classifier, for providing a scene classification for said at least one image based on validation of the at least one intermediate output category, wherein said validation is based on said extracted characteristic features.
- Yet another aspect of the disclosure relates to a method for providing images similar to a query image from within a set of images. The method comprises receiving a query image from the user and providing said query image as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image. The method further comprises reducing the dimensionality of the extracted feature vector to form a reduced dimensional feature vector and subsequently splitting the reduced dimensional feature vector into a plurality of query image segments. Lastly, each of said query image segments is compared with segments of the set of images and images similar to said query image are provided based on this comparison.
- Yet another aspect of the invention relates to a system for providing images similar to a query image from within a set of images. The system comprises an input module for receiving a query image from the user and a base classifier associated with said input module, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN extracts a feature vector of said query image. The system further comprises a reduction module associated with said base classifier, for reducing dimensionality of said feature vector to form a reduced dimensional feature vector and splitting the reduced dimensional feature vector into a plurality of query image segments. The system also comprises a comparison module associated with said reduction module, for providing images similar to said query image, based on a comparison between each of said query image segments and segments of the set of images.
- The accompanying drawings, which are incorporated herein and constitute a part of this disclosure, illustrate exemplary embodiments of the disclosed methods and systems, in which like reference numerals refer to the same parts throughout the different drawings. Some drawings may indicate the components using block diagrams. It will be appreciated that disclosure of such block diagrams includes disclosure of the internal sub-components of these components, as discussed in the detailed description.
-
FIG. 1 illustrates a block diagram of a system for scene classification, in accordance with exemplary embodiments of the present invention. -
FIG. 2 illustrates a general example of a neural network. -
FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention. -
FIG. 4 illustrates an exemplary list of categories/classes, in accordance with exemplary embodiments of the present invention. -
FIG. 5 illustrates a method for scene classification, in accordance with exemplary embodiments of the present invention. -
FIG. 6 illustrates a system for similarity matching, in accordance with exemplary embodiments of the present invention. -
FIG. 7 illustrates a method for similarity matching, in accordance with exemplary embodiments of the present invention. - The foregoing will be apparent from the following more detailed description of example embodiments of the present disclosure, as illustrated in the accompanying drawings.
- In the following description, for the purposes of explanation, various specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be apparent, however, that the disclosed embodiments may be practiced without these specific details. Several features described hereafter can each be used independently of one another or with any combination of other features. Although headings are provided, information related to a particular heading, but not found in the section having that heading, may also be found elsewhere in the specification. Further, information provided under a particular heading may not necessarily be a part of only the section having that heading.
- Systems and methods for image processing in accordance with example embodiments of the present disclosure are described. In general, the systems and methods disclosed herein facilitate scene classification and similarity matching for an image. As shown in
FIG. 1 , the system for scene classification, in accordance with example embodiments of the present disclosure, comprises a transceiver unit 102 associated with a base classifier 104 and a binary classifier 106, all connected to a data repository 108. The transceiver unit 102 is configured to receive at least one image for classification, wherein the image may be received from a user or any other system. The transceiver unit 102 is configured to pre-process this information and provide it to the base classifier 104. - The
base classifier 104 comprises a deep convolutional neural network (DCNN), wherein the DCNN is configured to determine at least one intermediate output category for said image and extract one or more characteristic features thereof. The intermediate output category may be chosen from a set of pre-defined output categories. The intermediate output category determined by the DCNN is one of an indoor category, an outdoor category and a combination thereof. In an embodiment, the extracted one or more characteristic features may be the output feature vector produced by the first fully connected layer of the DCNN. The DCNN is a large neural network configured to deal with two-dimensional input data such as an image and perform a 2D convolution operation on such input. The DCNN is discussed in detail with reference to FIG. 3 . - The
binary classifier 106 associated with said base classifier 104 is configured to validate the intermediate output category provided by the base classifier based on the extracted characteristic features of an image and provide a scene classification for the image based on said validation. The invention encompasses a binary classifier capable of predicting that the image does not belong to any scene classification. The invention encompasses a binary classifier capable of assigning more than one scene category/classification to an image received from the user. In a preferred embodiment, the system comprises one binary classifier for each of the intermediate output categories pre-defined in the system. The invention encompasses a binary classifier that is a Support Vector Machine (SVM). - The
data repository 108 is configured to store the intermediate output category and the extracted characteristic features of the image received from the base classifier 104 . The data repository 108 is further configured to store the scene classification provided by the binary classifier 106 . -
FIG. 2 illustrates a simplified example of an artificial neural network comprising multiple nodes and connections between them. The nodes 202 (i) are referred to as input nodes and are configured to receive raw data in the form of inputs that trigger the nodes they are connected to. Each connection as shown in FIG. 2 has a corresponding value referred to as a weight, wherein these weights may indicate the importance of these connections/nodes. In an embodiment, when a node is triggered by two or more nodes, the input taken by the node depends upon the weights of the connections from which the node is triggered. The nodes 204 (j) are referred to as intermediate/hidden nodes and are configured to accept input from one or more input nodes and provide output to one or more hidden nodes and/or output nodes 206 (k). The input nodes 202 (i) form an input layer, the hidden nodes 204 (j) form one or more hidden layers and the output nodes 206 (k) form an output layer. Although only one hidden layer has been shown in FIG. 2 , it will be appreciated that any number of hidden layers may be implemented in the artificial neural network depending upon the complexity of the decision to be made, the dimensionality of the input data and the size of the dataset used for training. In a deep neural network, a large number of hidden layers are stacked one above the other, wherein each layer computes a non-linear transformation of the outputs from the previous layer. - To facilitate generation/prediction of outputs with minimal error, deep neural networks are required to be trained or customized, wherein a large set of inputs is provided to the neural network, outputs for said inputs are computed and the network weights are adjusted based on the error, if any, in the outputs.
In an embodiment, the training of neural networks encompassed by the invention is performed using backpropagation, wherein random weights/values are assigned to each connection, followed by computing a set of outputs for a given set of inputs using said random weights. A desired output for each of the inputs is defined and is compared with the calculated output, wherein the difference between the two values may be referred to as the error in the network. Subsequently, the weights for each layer are adjusted based on the computed error for that layer. Thus, using this method of backpropagation, a network can be appropriately trained to minimize errors in the output.
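The training procedure described above can be sketched in miniature with a single sigmoid neuron learning logical AND; the variable names and the learning rate are illustrative only, not taken from the patent:

```python
import math
import random

def sigmoid(x):
    """Standard logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

# One sigmoid neuron with two inputs: random initial weights, forward pass,
# error computation, then weight adjustment, as described above.
random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = random.uniform(-1, 1)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

def total_error():
    """Sum of squared differences between desired and calculated outputs."""
    return sum((sigmoid(w[0] * x[0] + w[1] * x[1] + b) - t) ** 2
               for x, t in data)

before = total_error()
lr = 0.5
for _ in range(5000):
    for x, t in data:
        y = sigmoid(w[0] * x[0] + w[1] * x[1] + b)   # forward pass
        delta = (y - t) * y * (1 - y)                # error signal
        w[0] -= lr * delta * x[0]                    # adjust weights to
        w[1] -= lr * delta * x[1]                    # reduce the error
        b -= lr * delta
after = total_error()
print(round(before, 4), round(after, 4))  # error shrinks as training proceeds
```

A deep network repeats the same adjustment layer by layer, propagating the error signal backwards from the output layer.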
-
FIG. 3 illustrates an architectural representation of the configuration of a deep convolutional neural network, in accordance with exemplary embodiments of the present invention. Deep Convolutional Neural Networks are neural nets that are specifically designed to deal with 2D input data and their patterns of translation, scale and distortion variances. The DCNN is configured to perform a 2D convolution operation where the input to a neuron/node is the output of a 2D kernel operating on a local window of neurons from the previous layer. The same 2D kernel is applied throughout the layer, resulting in weights being shared by multiple connections in the network. - As shown in
FIG. 3 , the DCNN comprises several types of layers, such as convolution layers, pooling layers and fully connected layers. The convolution layer is configured to perform individual convolution operations and send the output of each such operation to one or more nodes in the next layer. In a preferred embodiment, the output of the convolution operations is transferred to only some nodes in the next layer. The pooling layer is configured to perform aggregate operations, like max/average, over neighboring windows of outputs. The pooling layer has no weights and simply aggregates values over the receptive field. A fully connected layer is the traditional layer type where each node is connected to every node in the next layer. - In addition to the backpropagation technique for training as discussed above, DCNNs may be trained using techniques such as rectified linear units, local response normalization, and parallelization on a GPU. In a preferred embodiment, the weight update rule is as follows:
-
- Wherein i is the iteration number, w is the weight vector, Di is the training data sampled for training in iteration i, and v is the weight update.
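The update rule itself is not reproduced in the text above; a standard momentum-based stochastic gradient descent rule consistent with the symbols defined here — with momentum coefficient μ and learning rate ε, both assumed rather than taken from the source — takes the form:

```latex
v_{i+1} = \mu\, v_i \;-\; \varepsilon \left\langle \frac{\partial L}{\partial w} \bigg|_{w_i} \right\rangle_{D_i},
\qquad
w_{i+1} = w_i + v_{i+1}
```

where the angle brackets denote the average of the gradient of the loss L over the batch Di sampled in iteration i.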
- In an exemplary embodiment, as shown in
FIG. 3 , the DCNN comprises 5 convolution layers, 3 pooling layers and 3 fully connected layers. The input layer is configured to read/receive raw RGB data from an image and the output layer is configured to produce intermediate output category probabilities as outputs. As shown, in a preferred embodiment, the first convolution layer has 96 kernels of size 11×11×3, the second convolution layer has 256 kernels of size 5×5×96, the third convolution layer has 384 kernels of size 3×3×256, the fourth convolution layer has 384 kernels of size 3×3×384 and the fifth convolution layer has 256 kernels of size 3×3×384. All three pooling layers use a kernel of size 3×3 with a stride of 2 and the first two fully connected layers comprise 4096 nodes/neurons each. - As shown in
FIG. 6 , the system for similarity matching, i.e. for providing images similar to a query image from within a set of images, comprises an input/output module 602 connected to a base classifier 604 , further connected to a reduction module 606 and a comparison module 608 , all connected to a central database 610 . The I/O module 602 is configured to receive a query image, wherein the query image may be received from a user input or as part of a request made by another system. The I/O module 602 is also configured to store the query image in the central database 610 and provide the same to the base classifier 604 , wherein the base classifier 604 comprises a deep convolutional neural network (DCNN). The DCNN extracts a feature vector of said query image and provides it to the reduction module 606 , wherein in an embodiment the extracted feature vector is a 4096-dimensional vector. The base classifier 604 stores this extracted feature vector in the central database 610 . - The
reduction module 606 comprises an auto-encoder (not shown in FIG. 6 ) configured to reduce the dimensionality of said feature vector to form a reduced dimensional feature vector. In an embodiment, the auto-encoder converts the 4096-dimensional feature vector into a 128-dimensional reduced feature vector. The auto-encoder is an artificial neural network trained by supervised error backpropagation in the same or a substantially similar manner to that discussed hereinabove. The auto-encoder is configured to adjust/learn a set of weights that minimizes the reconstruction error, wherein the reconstruction error captures the similarity between the image reconstructed from a reduced dimensional vector and the original query image. Minimizing the reconstruction error ensures that similar vectors in the higher dimensional space are also similar in the lower dimensional space and vice versa. In an embodiment, the auto-encoder was trained with a network architecture consisting of the following sequence of dimensions: 4096-1024-512-128-32-128-512-1024. - The
reduction module 606 is further configured to split the reduced dimensional feature vector into a plurality of query image segments and provide the same to the comparison module 608 . In a preferred embodiment, the reduction module 606 splits the 128-bit reduced dimensional feature vector into 16 segments of 8 bits each. - The
comparison module 608 is configured to compare the query image segments with the segments of the set of images stored in the central database 610 and provide images similar to said query image based on this comparison, wherein the query image itself is excluded from the set of images used for this comparison. The comparison module 608 is further configured to compare the original query image with the similar images and re-rank the similar images based on this comparison. In other words, similar images computed on the basis of the comparison using the smaller, reduced dimensionality feature vectors are re-ranked by the comparison module 608 based on a comparison between the feature vector of the query image and the feature vectors of the similar images. - The
central database 610 is configured to store a large set of images indexed as 8-bit segments using hash tables. For every image in the set of images, the central database is configured to store a 128-bit representation of the image computed by the reduction module 606 , the original 4096-dimensional feature vector of the image extracted by the base classifier 604 and the image itself. In an embodiment, the 128-bit vector is split into 16 segments of 8 bits each. The central database 610 is configured to maintain 16 hash tables, one for each segment, wherein the hash table corresponding to segment “i” uses the integer computed from those 8 bits as the hash key and the image as the value. The central database 610 is further configured to store the input query images, the feature vectors, the reduced dimensionality feature vectors, comparison results and any other data/information used or processed by the system or any modules/components thereof. - As shown in
FIG. 5 , the method for scene classification for an image, in accordance with example embodiments of the present invention, begins at step 502 , wherein at least one image is received for classification, pursuant to which it is pre-processed, e.g. to alter the size or dimensionality of the image, to extract the RGB data from said image, etc. Next, at step 504 , the received and/or pre-processed image is classified into at least one category using a deep convolutional neural network (DCNN), wherein the DCNN determines at least one intermediate output category for said image and extracts one or more characteristic features of said image. Extracting one or more characteristic features of said image includes extracting characteristic features from a fully connected layer of the DCNN. Determining an intermediate output category includes passing the input image to the DCNN, computing features in each layer of said DCNN and propagating said features to the subsequent layers of the DCNN; at least one non-linear transformation of the received image is computed during this step. - Subsequently, at
step 506 , the intermediate output category determined at step 504 is validated based on the extracted characteristic features. Validating the intermediate output category includes passing the characteristic features of the image as an input to the binary classifier, wherein if the binary classifier returns a positive output then the scene classification for the input image is the same as the predicted intermediate output category; otherwise the scene classification for the input image is ‘none of the pre-defined scene classes’. Based on said validation, a scene classification is provided for said image. - The method also encompasses reducing overfitting during the process of scene classification, wherein the overfitting may be reduced by performing data augmentation, i.e. replicating data/images through the introduction of small variations and adding these replications to the training dataset. In an embodiment, data augmentation is performed by extracting multiple patches of 227×227 from the training image of 256×256. In another embodiment, data augmentation is performed by performing PCA on the R, G, B pixel values over the training data for each pixel and extracting the top 16 principal components. Subsequently, these components are multiplied by a small random factor and the weighted principal components are added to each image to get more replications. The invention also encompasses reducing overfitting by dropping the inputs from certain randomly chosen nodes during training. In an embodiment, when an image is processed, the outputs of all neurons are taken in but are multiplied by a factor to account for dropout; for instance, if 50% of the inputs are dropped out, the outputs of all neurons are multiplied by a factor of 0.5 to weigh their contributions in the training process.
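The dropout behaviour just described can be sketched as follows; the function names are illustrative, not the patent's:

```python
import random

def dropout_train(activations, drop_prob, rng):
    """Training pass: each node's output is dropped (zeroed) with probability
    drop_prob, so a different thinned network is trained each time."""
    return [0.0 if rng.random() < drop_prob else a for a in activations]

def dropout_infer(activations, drop_prob):
    """Processing an image after training: every node is kept, but outputs
    are multiplied by the keep probability -- e.g. by 0.5 when 50% of inputs
    were dropped -- to weigh their contributions."""
    return [a * (1.0 - drop_prob) for a in activations]

rng = random.Random(0)
acts = [0.2, 0.8, 0.5, 1.0]
print(dropout_train(acts, 0.5, rng))  # a random subset of activations zeroed
print(dropout_infer(acts, 0.5))       # [0.1, 0.4, 0.25, 0.5]
```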
The method also encompasses reducing overfitting by early termination wherein the learning rate is dropped by a factor of 10 whenever the training error increases or is constant for a sustained number of iterations.
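The convolution and pooling operations on which the DCNN of FIG. 3 is built can be sketched in miniature; the function names and toy sizes below are illustrative, not the patent's implementation:

```python
def conv2d_valid(img, kernel):
    """'Valid' 2-D convolution: each output node is fed by a kernel applied
    to a local window of the previous layer, and the same kernel (shared
    weights) slides over the entire input."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

def max_pool(fmap, k, stride):
    """Max pooling: aggregates values over neighbouring windows; it has no
    weights of its own."""
    oh = (len(fmap) - k) // stride + 1
    ow = (len(fmap[0]) - k) // stride + 1
    return [[max(fmap[i * stride + a][j * stride + b]
                 for a in range(k) for b in range(k))
             for j in range(ow)]
            for i in range(oh)]

img = [[1, 2, 3, 0],
       [0, 1, 2, 3],
       [3, 0, 1, 2],
       [2, 3, 0, 1]]
feature_map = conv2d_valid(img, [[1, 1], [1, 1]])  # 4x4 input -> 3x3 map
pooled = max_pool(feature_map, k=2, stride=2)      # 3x3 map -> 1x1 output

# Weight sharing keeps parameter counts small: the first convolution layer
# described above (96 kernels of 11x11x3) has only 96 * 11 * 11 * 3 weights.
first_layer_weights = 96 * 11 * 11 * 3
print(feature_map, pooled, first_layer_weights)
```

Because the kernel is shared across positions, a feature learned at one location is detected at every location, which is what gives the network its translational invariance.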
- As shown in
FIG. 7 , the method for providing images similar to a query image from within a set of images, in accordance with example embodiments of the invention, begins with receiving a query image at step 702 . Next, at step 704 , the received input query image is provided as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image. In an embodiment, the extracted feature vector is an output of a fully connected layer of the DCNN. - At
step 706 , the dimensionality of said feature vector is reduced to form a reduced dimensional feature vector. The invention encompasses reducing dimensionality by providing the feature vector extracted by the DCNN as an input to a Deep Belief Network (DBN), wherein the DBN comprises layer-wise Restricted Boltzmann Machines (RBMs). In an embodiment, the layer-wise RBMs are built for the following configurations: 4096 to 1024, 1024 to 512, 512 to 128 and 128 to 32 dimensions, and are trained through the contrastive divergence approach. The invention also encompasses reducing dimensionality by providing the feature vector as an input to an auto-encoder artificial neural network, wherein the auto-encoder converts the feature vector into a reduced dimensionality feature vector. - As discussed in the system overview, the invention encompasses the use of a hashing scheme to index all the images stored in the database, i.e. the set of images within which the similarity search is to be performed. Each image stored in the database is split into 16 segments of 8 bits, wherein each segment is stored in a hash table. At
step 708 , the reduced dimensional feature vector of the query image is also split/divided into a plurality of segments, preferably into 16 segments of 8 bits each, and each such segment is provided to one of the 16 hash tables. Subsequently, these query image segments are compared with segments of the set of images at step 710 . Comparing query image segments with segments of the set of images includes providing each segment to one node, wherein each node looks up its specific hash table and returns images matching the 8-bit segment that was sent to that node. - Step 710 provides images similar to said query image based on this comparison, wherein similar images are those for which the distance from the query image is less than a pre-defined threshold.
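One way to realize the segment hashing and lookup of steps 708-710 is sketched below; the names and toy codes are illustrative, not the patent's implementation:

```python
def split_segments(code, n_segments=16, seg_bits=8):
    """Split a 128-bit binary code (held as an int) into 16 8-bit keys."""
    mask = (1 << seg_bits) - 1
    return [(code >> (seg_bits * i)) & mask for i in range(n_segments)]

def build_index(images):
    """images maps image_id -> 128-bit code. One hash table per segment: the
    8-bit integer is the hash key, the matching image ids are the values."""
    tables = [{} for _ in range(16)]
    for image_id, code in images.items():
        for i, seg in enumerate(split_segments(code)):
            tables[i].setdefault(seg, set()).add(image_id)
    return tables

def lookup(tables, query_code):
    """Each query segment goes to one node/table; any image returned by at
    least one table becomes a candidate for the distance comparison."""
    candidates = set()
    for i, seg in enumerate(split_segments(query_code)):
        candidates |= tables[i].get(seg, set())
    return candidates

images = {
    "a": (1 << 128) - 1,                      # all bits set
    "b": 0,                                   # all bits clear
    "c": 0x00FF00FF00FF00FF00FF00FF00FF00FF,  # alternating bytes
}
tables = build_index(images)
print(sorted(lookup(tables, (1 << 128) - 1)))  # ['a', 'c']
```

Each of the 16 lookups is an O(1) hash-table access, so candidate retrieval stays fast even for a very large set of images.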
- At
step 712 , the similar images computed in the above step are then re-ranked based on their distance from the query image in the original dimensional space. In an embodiment, Hamming distance is used as the distance measure. This distance search is first performed in the reduced dimensional space (such as the 128-bit space) and the obtained results are then re-ranked according to the distances in the original dimensional space (such as the 4096-dimensional space). - The disclosed methods and systems may be implemented on a Graphics Processing Unit (GPU). In an embodiment, the systems are implemented on an NVIDIA Tesla M2070 GPU card with 2880 CUDA cores, an Intel Xeon X5675 CPU and 5375 MB of video RAM.
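The two-stage search of steps 710-712 — threshold filtering in the reduced space, then re-ranking in the original space — can be sketched as follows, with short toy codes standing in for the 128-bit and 4096-dimensional vectors (names are illustrative):

```python
def hamming(a, b):
    """Hamming distance between two binary codes held as integers."""
    return bin(a ^ b).count("1")

def first_pass(query_code, db, threshold):
    """Search in the reduced dimensional space: keep every image whose
    reduced code is within `threshold` of the query's reduced code."""
    return [name for name, (reduced, _) in db.items()
            if hamming(query_code, reduced) <= threshold]

def rerank(query_full, candidates, db):
    """Re-rank the candidates by distance in the original dimensional space
    (the full-length code stored alongside each image)."""
    return sorted(candidates, key=lambda n: hamming(query_full, db[n][1]))

# db maps image name -> (reduced code, original full-dimensional code).
db = {
    "coast":  (0b1110, 0b11111111),
    "beach":  (0b1111, 0b11110000),
    "office": (0b0000, 0b00001111),
}
candidates = first_pass(0b1111, db, threshold=1)  # 'coast' and 'beach' pass
print(rerank(0b11110000, candidates, db))         # ['beach', 'coast']
```

The cheap reduced-space pass prunes the database to a small candidate set, so the more expensive original-space comparison only touches a handful of images.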
- While various embodiments of the image processing methods and systems have been disclosed, it should be apparent to those skilled in the art that many more modifications, besides those described, are possible without departing from the inventive concepts herein. The embodiments, therefore, are not to be restricted. Moreover, in interpreting the disclosure, all terms should be understood in the broadest possible manner consistent with the context. It will be appreciated by those skilled in the art that any of the aforementioned steps and/or system modules may be suitably replaced, reordered, or removed, and additional steps and/or system modules may be inserted. Further, the invention encompasses embodiments in hardware, software, or a combination thereof.
Claims (7)
1-9. (canceled)
10. A method for providing images similar to a query image from within a set of images, the method comprising:
receiving a query image;
providing said query image as an input to a deep convolutional neural network (DCNN), wherein the DCNN extracts a feature vector of said query image;
reducing dimensionality of said feature vector to form a reduced dimensional feature vector;
splitting the reduced dimensional feature vector into a plurality of query image segments; and
providing images similar to said query image based on a comparison between each of said query image segments and segments of the set of images.
11. The method of claim 10 wherein reducing dimensionality of said feature vector comprises providing said feature vector as an input to an auto-encoder, wherein the auto-encoder processes said feature vector to form a reduced dimensional feature vector.
12. The method of claim 10 further comprising ranking the one or more similar images based on a distance between the query image and the similar images.
13. The method of claim 10 wherein the distance between the query image and the provided similar images is less than a predefined threshold.
14. A system for providing images similar to a query image from within a set of images, the system comprising:
an input module for receiving a query image;
a base classifier associated with said input module, wherein the base classifier comprises a deep convolutional neural network (DCNN), and wherein the DCNN extracts a feature vector of said query image;
a reduction module associated with said base classifier, for reducing dimensionality of said feature vector to form a reduced dimensional feature vector and splitting the reduced dimensional feature vector into a plurality of query image segments; and
a comparison module associated with said reduction module, for providing images similar to said query image based on a comparison between each of said query image segments and segments of the set of images.
15. The system of claim 14 further comprising a central database storing the set of images in the form of a plurality of image segments, wherein each segment is stored in a hash table.
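Claim 15 stores each image segment in a hash table. One common reading, offered here only as an assumed interpretation since the claim does not fix the hashing scheme, is a multi-index structure: one hash table per segment position, so a query retrieves candidate images whose corresponding segment collides with the query's. The sign-based quantizer is a hypothetical stand-in for a learned binary code.

```python
from collections import defaultdict
import numpy as np

N_SEGMENTS = 4

def to_key(segment):
    """Quantize a float segment into a hashable key. A sign pattern is a
    hypothetical stand-in; the claim does not specify the hash function."""
    return tuple(np.sign(segment).astype(int))

class SegmentIndex:
    """One hash table per segment position: an image becomes a candidate
    match when any of its segments collides with the query's segment."""
    def __init__(self):
        self.tables = [defaultdict(set) for _ in range(N_SEGMENTS)]

    def add(self, name, segments):
        for table, seg in zip(self.tables, segments):
            table[to_key(seg)].add(name)

    def candidates(self, segments):
        found = set()
        for table, seg in zip(self.tables, segments):
            found |= table.get(to_key(seg), set())
        return found

index = SegmentIndex()
segs = [np.array([1.0, -2.0]), np.array([0.5, 0.5]),
        np.array([-1.0, 1.0]), np.array([2.0, -0.5])]
index.add("cat.jpg", segs)
print(index.candidates(segs))  # {'cat.jpg'}
```

Candidates returned this way would then be ranked by actual distance, as in claims 12 and 13, so a hash collision only needs to be a cheap pre-filter.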
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/879,343 US20180204062A1 (en) | 2015-06-03 | 2018-01-24 | Systems and methods for image processing |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562170451P | 2015-06-03 | 2015-06-03 | |
US15/172,139 US10095950B2 (en) | 2015-06-03 | 2016-06-02 | Systems and methods for image processing |
US15/879,343 US20180204062A1 (en) | 2015-06-03 | 2018-01-24 | Systems and methods for image processing |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/172,139 Division US10095950B2 (en) | 2015-06-03 | 2016-06-02 | Systems and methods for image processing |
Publications (1)
Publication Number | Publication Date |
---|---|
US20180204062A1 true US20180204062A1 (en) | 2018-07-19 |
Family
ID=57451115
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/172,139 Active US10095950B2 (en) | 2015-06-03 | 2016-06-02 | Systems and methods for image processing |
US15/879,343 Abandoned US20180204062A1 (en) | 2015-06-03 | 2018-01-24 | Systems and methods for image processing |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/172,139 Active US10095950B2 (en) | 2015-06-03 | 2016-06-02 | Systems and methods for image processing |
Country Status (1)
Country | Link |
---|---|
US (2) | US10095950B2 (en) |
Families Citing this family (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9753949B1 (en) | 2016-03-14 | 2017-09-05 | Shutterstock, Inc. | Region-specific image download probability modeling |
CN106021364B (en) * | 2016-05-10 | 2017-12-12 | 百度在线网络技术(北京)有限公司 | Foundation, image searching method and the device of picture searching dependency prediction model |
DE102016223484B4 (en) * | 2016-11-25 | 2021-04-15 | Fujitsu Limited | Determine Similarities in Computer Software Codes for Performance Analysis |
KR102242623B1 (en) * | 2016-12-15 | 2021-04-20 | 주식회사 엘지유플러스 | Method and Apparatus For Processing Purchase Request Using The Wire-Wireless Communication Network |
CN106599926A (en) * | 2016-12-20 | 2017-04-26 | 上海寒武纪信息科技有限公司 | Expression picture pushing method and system |
KR101900180B1 (en) | 2017-01-11 | 2018-09-18 | 포항공과대학교 산학협력단 | Imgae analysis method for extracting feature of image and apparatus therefor |
US10163043B2 (en) * | 2017-03-31 | 2018-12-25 | Clarifai, Inc. | System and method for facilitating logo-recognition training of a recognition model |
CN106886801B (en) | 2017-04-14 | 2021-12-17 | 北京图森智途科技有限公司 | Image semantic segmentation method and device |
CN107220325A (en) * | 2017-05-22 | 2017-09-29 | 华中科技大学 | A kind of similar icon search methods of APP based on convolutional neural networks and system |
US11126894B2 (en) * | 2017-06-05 | 2021-09-21 | Siemens Aktiengesellschaft | Method and apparatus for analysing an image |
CN107274378B (en) * | 2017-07-25 | 2020-04-03 | 江西理工大学 | Image fuzzy type identification and parameter setting method based on fusion memory CNN |
CN109420622A (en) * | 2017-08-27 | 2019-03-05 | 南京理工大学 | Tobacco leaf method for sorting based on convolutional neural networks |
GB2566257A (en) * | 2017-08-29 | 2019-03-13 | Sky Cp Ltd | System and method for content discovery |
EP3457324A1 (en) | 2017-09-15 | 2019-03-20 | Axis AB | Method for locating one or more candidate digital images being likely candidates for depicting an object |
US10970552B2 (en) * | 2017-09-28 | 2021-04-06 | Gopro, Inc. | Scene classification for image processing |
CN108399409B (en) * | 2018-01-19 | 2019-06-18 | 北京达佳互联信息技术有限公司 | Image classification method, device and terminal |
CN108829692B (en) * | 2018-04-09 | 2019-12-20 | 华中科技大学 | Flower image retrieval method based on convolutional neural network |
US10769261B2 (en) * | 2018-05-09 | 2020-09-08 | Futurewei Technologies, Inc. | User image verification |
CN108765425B (en) * | 2018-05-15 | 2022-04-22 | 深圳大学 | Image segmentation method and device, computer equipment and storage medium |
US11409994B2 (en) | 2018-05-15 | 2022-08-09 | Shenzhen University | Methods for image segmentation, computer devices, and storage mediums |
CN110580487A (en) * | 2018-06-08 | 2019-12-17 | Oppo广东移动通信有限公司 | Neural network training method, neural network construction method, image processing method and device |
CN109033172B (en) * | 2018-06-21 | 2021-12-17 | 西安理工大学 | Image retrieval method for deep learning and approximate target positioning |
CN108983187B (en) * | 2018-07-11 | 2022-07-15 | 西安电子科技大学 | Online radar target identification method based on EWC |
CN109446887B (en) * | 2018-09-10 | 2022-03-25 | 易诚高科(大连)科技有限公司 | Image scene description generation method for subjective evaluation of image quality |
CN109597851B (en) * | 2018-09-26 | 2023-03-21 | 创新先进技术有限公司 | Feature extraction method and device based on incidence relation |
US11222233B2 (en) | 2018-09-26 | 2022-01-11 | Samsung Electronics Co., Ltd. | Method and apparatus for multi-category image recognition |
RU2706960C1 (en) * | 2019-01-25 | 2019-11-22 | Самсунг Электроникс Ко., Лтд. | Computationally efficient multi-class image recognition using successive analysis of neural network features |
KR20200051278A (en) | 2018-11-05 | 2020-05-13 | 삼성전자주식회사 | Method of managing task in artificial neural network and system comprising the same |
CN109685116B (en) * | 2018-11-30 | 2022-12-30 | 腾讯科技(深圳)有限公司 | Image description information generation method and device and electronic device |
US20200192932A1 (en) * | 2018-12-13 | 2020-06-18 | Sap Se | On-demand variable feature extraction in database environments |
CN109639739B (en) * | 2019-01-30 | 2020-05-19 | 大连理工大学 | Abnormal flow detection method based on automatic encoder network |
WO2020181098A1 (en) * | 2019-03-05 | 2020-09-10 | Memorial Sloan Kettering Cancer Center | Systems and methods for image classification using visual dictionaries |
CN110188827B (en) * | 2019-05-29 | 2020-11-03 | 创意信息技术股份有限公司 | Scene recognition method based on convolutional neural network and recursive automatic encoder model |
CN110287800B (en) * | 2019-05-29 | 2022-08-16 | 河海大学 | Remote sensing image scene classification method based on SGSE-GAN |
CN110211127B (en) * | 2019-08-01 | 2019-11-26 | 成都考拉悠然科技有限公司 | Image partition method based on bicoherence network |
CN110689077A (en) * | 2019-09-29 | 2020-01-14 | 福建师范大学 | Novel digital image classification method |
US11645611B1 (en) * | 2019-12-23 | 2023-05-09 | Blue Yonder Group, Inc. | System and method of decoding supply chain signatures |
US11487808B2 (en) | 2020-02-17 | 2022-11-01 | Wipro Limited | Method and system for performing an optimized image search |
CN111428785B (en) * | 2020-03-23 | 2023-04-07 | 厦门大学 | Puffer individual identification method based on deep learning |
CN111428739B (en) * | 2020-04-14 | 2023-08-25 | 图觉(广州)智能科技有限公司 | High-precision image semantic segmentation method with continuous learning capability |
KR102208685B1 (en) * | 2020-07-23 | 2021-01-28 | 주식회사 어반베이스 | Apparatus and method for developing space analysis model based on data augmentation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8644624B2 (en) * | 2009-07-28 | 2014-02-04 | Samsung Electronics Co., Ltd. | System and method for indoor-outdoor scene classification |
US8873840B2 (en) * | 2010-12-03 | 2014-10-28 | Microsoft Corporation | Reducing false detection rate using local pattern based post-filter |
US9524450B2 (en) * | 2015-03-04 | 2016-12-20 | Accenture Global Services Limited | Digital image processing using convolutional neural networks |
2016
- 2016-06-02: US application US15/172,139, granted as US10095950B2 (Active)
2018
- 2018-01-24: US application US15/879,343, published as US20180204062A1 (Abandoned)
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190236774A1 (en) * | 2018-01-30 | 2019-08-01 | General Electric Company | Systems and methods for capturing deep learning training data from imaging systems |
US10679346B2 (en) * | 2018-01-30 | 2020-06-09 | General Electric Company | Systems and methods for capturing deep learning training data from imaging systems |
CN109241316A (en) * | 2018-08-30 | 2019-01-18 | 北京旷视科技有限公司 | Image search method, device, electronic equipment and storage medium |
WO2020093210A1 (en) * | 2018-11-05 | 2020-05-14 | 中国科学院计算技术研究所 | Scene segmentation method and system based on contenxtual information guidance |
CN109886303A (en) * | 2019-01-21 | 2019-06-14 | 武汉大学 | A kind of TrAdaboost sample migration aviation image classification method based on particle group optimizing |
CN110163286A (en) * | 2019-05-24 | 2019-08-23 | 常熟理工学院 | Hybrid pooling-based domain adaptive image classification method |
WO2021057046A1 (en) * | 2019-09-24 | 2021-04-01 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Image hash for fast photo search |
WO2021115115A1 (en) * | 2019-12-09 | 2021-06-17 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Zero-shot dynamic embeddings for photo search |
Also Published As
Publication number | Publication date |
---|---|
US10095950B2 (en) | 2018-10-09 |
US20160358024A1 (en) | 2016-12-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10095950B2 (en) | Systems and methods for image processing | |
WO2021042828A1 (en) | Neural network model compression method and apparatus, and storage medium and chip | |
US10438091B2 (en) | Method and apparatus for recognizing image content | |
US20220044094A1 (en) | Method and apparatus for constructing network structure optimizer, and computer-readable storage medium | |
US9400918B2 (en) | Compact face representation | |
CN111198959B (en) | Two-stage image retrieval method based on convolutional neural network | |
Lai et al. | Instance-aware hashing for multi-label image retrieval | |
CN111079639B (en) | Method, device, equipment and storage medium for constructing garbage image classification model | |
Pan et al. | Deepfood: Automatic multi-class classification of food ingredients using deep learning | |
Faraki et al. | Fisher tensors for classifying human epithelial cells | |
Bhateja et al. | Iris recognition based on sparse representation and k-nearest subspace with genetic algorithm | |
JP7250126B2 (en) | Computer architecture for artificial image generation using autoencoders | |
CN111091175A (en) | Neural network model training method, neural network model classification method, neural network model training device and electronic equipment | |
CN108804617B (en) | Domain term extraction method, device, terminal equipment and storage medium | |
CN111444765B (en) | Image re-identification method, training method of related model, related device and equipment | |
US11593619B2 (en) | Computer architecture for multiplier-less machine learning | |
Chadha et al. | Voronoi-based compact image descriptors: Efficient region-of-interest retrieval with VLAD and deep-learning-based descriptors | |
CN112507800A (en) | Pedestrian multi-attribute cooperative identification method based on channel attention mechanism and light convolutional neural network | |
EP4115321A1 (en) | Systems and methods for fine tuning image classification neural networks | |
CN114491115B (en) | Multi-model fusion integrated image retrieval method based on deep hash | |
WO2022063076A1 (en) | Adversarial example identification method and apparatus | |
CN108229505A (en) | Image classification method based on FISHER multistage dictionary learnings | |
CN111026887A (en) | Cross-media retrieval method and system | |
Okokpujie et al. | Predictive modeling of trait-aging invariant face recognition system using machine learning | |
Kopčan et al. | Anomaly detection using Autoencoders and Deep Convolution Generative Adversarial Networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: HYPERVERGE INC., CALIFORNIA Free format text: NUNC PRO TUNC ASSIGNMENT;ASSIGNORS:KRISHNAKUMAR, VIGNESH;SRIDHARASINGAN, HARIPRASAD PRAYAGAI;TADIMARI, ADARSH AMARENDRA;AND OTHERS;REEL/FRAME:045024/0289 Effective date: 20160602 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |