AU2021101517A4 - A system for object recognition for visual surveillance - Google Patents
A system for object recognition for visual surveillance
- Publication number
- AU2021101517A4
- Authority
- AU
- Australia
- Prior art keywords
- features
- image
- images
- sift
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/467—Encoded features or binary features, e.g. local binary patterns [LBP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure relates to a system for real-time object recognition for visual surveillance. The system uses the benchmark image dataset Caltech-101, a pre-processing module, a feature extraction module, a feature selection module, a locality preserving projection, a feature vector, and four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest, and XGB classifiers for recognizing object(s). The system is examined on a fusion of various features extracted through a pre-trained deep learning model (VGG19), SIFT, and the Haralick texture method. The objective of the system is to improve the computation speed and recognition accuracy of the object recognition system. The fusion of deep and local features for 2D object recognition is considered in the present disclosure.
Figure 3
F1: SIFT
F2: VGG19
F3: Haralick (without Saliency)
F4: Haralick (with Saliency)
F1+F4: SIFT + Haralick (with Saliency)
F2+F4: VGG19 + Haralick (with Saliency)
F1+F2+F4: VGG19 + SIFT + Haralick (with Saliency)
Figure 4
Description
A system for object recognition for visual surveillance
The present disclosure relates to a system for real-time object recognition for visual surveillance.
Object recognition/image classification is among the most popular applications of computer vision. The task of an object recognition system is to identify the object in an image and assign a class/label to it. The system works much like the human brain: just as the brain learns objects as it encounters them and stores that information for future identification, the object recognition system is first trained with image features and their associated classes, and test images are then used to predict the class of candidate images. The efficiency of the system depends strongly on the size of the dataset, because a larger dataset allows more features of each image to be extracted. The features of an object may be its color, shape, texture, or spatial layout information. The image needs to be pre-processed before feature extraction so that all the relevant and useful information can be extracted from it. Some feature extraction methods implicitly perform the steps needed to handle these issues, while others require the image to be pre-processed explicitly. Various feature extraction algorithms are used in image classification, such as Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), Oriented FAST and Rotated BRIEF (ORB), color histogram, Gabor texture, and Haralick texture descriptors. However, as many researchers have shown, an image cannot be reliably identified using any single feature, so combinations of these features have been proposed and have achieved good results. Nowadays, deep learning features are becoming very popular for object recognition because they capture low-level information from each pixel of the image. Designing a deep learning model from scratch is very time consuming owing to the required computational power, GPU hardware, and large amount of data, so many researchers have experimented with various pre-trained deep learning models for feature extraction.
A combined regional and texture feature extraction approach for image classification has been proposed. Shape-based features were extracted using statistical moments and image moments and then described using the Histogram of Oriented Gradients (HOG) for image classification, while the Local Binary Pattern (LBP) algorithm was used to extract texture features. The size of this combined feature set was optimized using the principal component analysis algorithm, and a Support Vector Machine (SVM) classification algorithm was used for classification. The system was evaluated on the Corel-100, Caltech-101, and Caltech-256 datasets and outperformed other methods. In another study, the behavior of features extracted through a deep CNN pre-trained model (VGG19) with a machine learning classifier (SVM) was examined on three datasets (mit67, flowers102, and cub200), analyzing the behavior of individual features of each class and the intra-/inter-class properties. A Bag-of-Features approach for image classification has also been presented in which features are extracted using a deep convolutional neural network based on salient object detection. The saliency map was used as a pre-processing step to extract the relevant features from the image; this reduced the redundancy in the extracted features, mitigated over-fitting, and improved recognition accuracy. Further, a multi-class linear SVM classifier was implemented to classify the features.
A Bag-of-Features approach using SIFT and HOG feature descriptors to classify objects has been proposed. The experiment was conducted on a few categories of the Caltech-101 dataset and the Scene-15 dataset. The k-means clustering algorithm was used to reduce the size of the feature vector for fast and efficient computation, forming a visual dictionary for the classification process. A state-of-the-art SVM classifier was adopted to classify the images, and the model achieved good performance on these datasets. A further approach to image classification combines deep residual features from a pre-trained model with supervised machine learning algorithms: pre-trained ResNet-152 and GoogLeNet convolutional neural networks were used for feature extraction on several food datasets (Food-5K, Food-11, RawFooT-DB, and Food-101), and the extracted features were then used to train various machine learning algorithms, i.e., Naive Bayes, Artificial Neural Network (ANN), Support Vector Machine (SVM), and Random Forest. A content-based image retrieval system based on a combination of color and texture descriptors has also been proposed; a fusion of color histogram, Discrete Wavelet Transform, and Edge Histogram texture descriptors proved more efficient than a single feature extractor, and the Manhattan distance measure was used to match and retrieve images similar to the query image.
The conclusion of that work was that the Random Forest classification algorithm achieved the highest accuracy among the compared classifiers. A fusion of handcrafted and deep features for a content-based image retrieval system has also been proposed, in which a k-NN based feature selection step is applied to the extracted features to reduce the dimensions of the feature map, and the similarity between the images in the dataset and the query image is computed using the Euclidean distance.
However, various challenges affect the performance of a recognition system, such as noise, scale, rotation, and translation variations, and illumination effects in the image, which degrade the performance of the system. In order to overcome the aforementioned drawbacks, there is a need for a system for real-time object recognition for visual surveillance.
The present disclosure relates to a system for real-time object recognition for visual surveillance. This is accomplished by extracting various features of the object that help in the identification of the proper class for the object, irrespective of other classes. The system is examined on a fusion of various features extracted through a pre-trained deep learning model (VGG19), SIFT, and the Haralick texture method. Further, various machine learning classification methods, i.e. Gaussian Naive Bayes, Decision Tree, Random Forest, and the XGB classifier, are used to classify the objects into various classes according to the extracted features. The experiment was conducted on the benchmark image dataset Caltech-101, which contains a collection of noisy, rotated, and differently scaled images. Various other performance measurement parameters such as precision, recall, F1-Score, area under curve, false positive rate, root mean square error, and CPU execution time are also evaluated.
The present disclosure seeks to provide a system for real-time object recognition for visual surveillance. The system comprises: a benchmark image dataset Caltech-101 containing a total of 9168 images that are grouped into 101 categories and 1 background scene category; a pre-processing module for differentiating visual features of the image and generating a meaningful representation of the image; a feature extraction module for extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also applied to the images to extract each salient region; a feature selection module employing the K-means clustering technique for selecting key features of the image and removing irrelevant features of the image; a locality preserving projection (LPP) for reducing the dimensions of the selected features of the image; wherein a feature vector is formed by combining the features extracted from all three feature extraction methods; and at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers for recognizing object(s).
The present disclosure also seeks to provide a method for real-time object recognition for visual surveillance. The method comprises: differentiating visual features of the image of a benchmark image dataset Caltech-101 and generating a meaningful representation of the image; extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also applied to the images to extract each salient region; selecting key features of the image and removing irrelevant features of the image; reducing the dimensions of the selected features of the image using a locality preserving projection (LPP); forming a feature vector by combining the features extracted from all three feature extraction methods; and recognizing object(s) using at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers.
An objective of the present disclosure is to develop a system for real-time object recognition for visual surveillance.
Another object of the present disclosure is to extract various features of the object that help in the identification of the proper class for the object, irrespective of other classes.
Another object of the present disclosure is to use various machine learning classification methods to classify the objects into various classes according to the extracted features.
Another object of the present disclosure is to evaluate various performance measurement parameters.
Yet, another object of the present disclosure is to improve the computation speed and recognition accuracy of the object recognition system.
To further clarify the advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a block diagram of a system for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a flow chart of a method for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure;
Figure 3 illustrates a Block diagram of the proposed system in accordance with an embodiment of the present disclosure;
Figure 4 illustrates a tabular form of feature extraction techniques in accordance with an embodiment of the present disclosure;
Figure 5 illustrates a tabular form of classifier wise recognition accuracy in accordance with an embodiment of the present disclosure;
Figure 6 illustrates a tabular form of classifier wise precision rate in accordance with an embodiment of the present disclosure;
Figure 7 illustrates a tabular form of classifier wise recall rate in accordance with an embodiment of the present disclosure;
Figure 8 illustrates a tabular form of classifier wise F-Score in accordance with an embodiment of the present disclosure;
Figure 9 illustrates a tabular form of classifier wise area under curve in accordance with an embodiment of the present disclosure;
Figure 10 illustrates a tabular form of classifier wise false positive rate in accordance with an embodiment of the present disclosure;
Figure 11 illustrates a tabular form of classifier wise root mean square error in accordance with an embodiment of the present disclosure;
Figure 12 illustrates a tabular form of classifier wise CPU elapsed time in ms in accordance with an embodiment of the present disclosure;
Figure 13 illustrates classifier wise recognition accuracy in accordance with an embodiment of the present disclosure;
Figure 14 illustrates classifier wise root mean square error in accordance with an embodiment of the present disclosure;
Figure 15 illustrates classifier wise false positive rate in accordance with an embodiment of the present disclosure;
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Figure 1 illustrates a block diagram of a system for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure. The system 100 includes a benchmark image dataset (Caltech-101) unit 102. The dataset contains a total of 9168 images that are grouped into 101 categories and 1 background scene category.
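By way of a non-limiting illustration only, and assuming the standard Caltech-101 directory layout (one sub-folder per category plus a background folder), the dataset unit could be populated as sketched below; the folder name and file pattern are assumptions, not details fixed by this disclosure.

```python
# Hypothetical loader for the Caltech-101 dataset unit, assuming the standard
# folder-per-category layout (101 object categories plus one background folder).
from pathlib import Path

def load_caltech101(root="101_ObjectCategories"):   # assumed folder name
    image_paths, labels = [], []
    categories = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    for label, category in enumerate(categories):
        for img_path in sorted((Path(root) / category).glob("*.jpg")):
            image_paths.append(str(img_path))
            labels.append(label)
    return image_paths, labels, categories

# e.g. paths, y, names = load_caltech101()  # ~9168 images across 102 folders
```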
In an embodiment, a pre-processing module unit 104 is used for differentiating visual features of the image and generating a meaningful representation of the image.
In an embodiment, a feature extraction module unit 106 is used for extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also applied to the images to extract each salient region.
In an embodiment, a feature selection module unit 108 employs the K-means clustering technique for selecting key features of the image and removing irrelevant features of the image.
In an embodiment, a locality preserving projection (LPP) unit 110 is used for reducing the dimensions of the selected features of the image.
In an embodiment, a feature vector unit 112 forms a feature vector by combining the features extracted from all three feature extraction methods.
In an embodiment, a classification model unit 114 uses at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers for recognizing object(s).
Figure 2 illustrates a flow chart of a method for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure. At step 202 the method 200 includes, differentiating visual features of the image of a benchmark image dataset Caltech-101 and generating a meaningful representation of the image. The Caltech-101 dataset contains a total of 9168 images that are grouped into 101 categories and 1 background scene category.
At step 204 the method 200 includes, extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also applied to the images to extract each salient region. The three feature extraction methods are the pre-trained VGG19 model, SIFT, and the Haralick Texture Descriptor. All of these methods require individual pre-processing before implementation. The color images are processed using the pre-processing method of the Keras toolkit and then input to the pre-trained VGG19 model. For the SIFT algorithm, the color images are transformed into gray scale. The Haralick Texture descriptor extracts information using the gray level co-occurrence matrix (GLCM), so the images are converted into gray scale for that method as well. The images in the Caltech-101 dataset are noisy and have varying illumination, so saliency maps are applied to the images to differentiate the visual features of the image and produce a meaningful representation of it. Further, threshold maps are applied to the images to extract each salient region.
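As a hedged illustration of this three-branch feature extraction stage (Keras/TensorFlow for VGG19, OpenCV for SIFT and mahotas for Haralick are assumed tool choices, not requirements of the disclosure), a minimal sketch could look as follows.

```python
# Assumed sketch of the three feature-extraction branches; not the exact implementation.
import cv2
import numpy as np
import mahotas
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image as keras_image

# Pre-trained VGG19 without its classifier head; the flattened last pooling
# output has 7*7*512 = 25088 values, matching the feature count in the text.
vgg19 = VGG19(weights="imagenet", include_top=False, pooling=None)

def vgg19_features(path):
    img = keras_image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
    return vgg19.predict(x, verbose=0).flatten()            # 25088-dim vector

def sift_features(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)           # SIFT works on gray scale
    _, desc = cv2.SIFT_create().detectAndCompute(gray, None)
    return desc                                              # (num_keypoints, 128)

def haralick_features(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)            # GLCM needs gray levels
    return mahotas.features.haralick(gray).mean(axis=0)      # 13 texture statistics
```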
At step 206 the method 200 includes, selecting key features of the image and removing irrelevant features of the image. After pre-processing of the images, the feature extraction algorithms are applied to the processed images; they extract a set of features from the images, which are stored as a feature vector. The pre-trained deep model VGG19 extracts 25088 features of an image, the SIFT algorithm extracts 128 features of an image, and the Haralick texture algorithm extracts 13 features of the image. Because VGG19 and SIFT produce a large number of features, both algorithms need a feature selection step to select the key features of an image. Such a feature selection algorithm removes the irrelevant features of the image and makes the recognition process faster and more accurate. In the experiment, 64 key features are selected using the k-means clustering algorithm, which uses the Euclidean distance to process the features.
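One common way to realize a k-means based selection of 64 key features is a bag-of-visual-words style encoding: cluster the local descriptors into 64 centres and describe each image by its 64-bin assignment histogram. The sketch below follows that interpretation under stated assumptions and is not necessarily the exact procedure used in the experiment.

```python
# Hypothetical k-means based reduction to 64 key features per image
# (a bag-of-visual-words style encoding; assumed interpretation of the text).
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptor_stacks, n_words=64, seed=0):
    """Cluster all local descriptors (e.g., SIFT) into 64 visual words."""
    all_desc = np.vstack([d for d in descriptor_stacks if d is not None])
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(all_desc)

def encode_image(descriptors, codebook):
    """Represent one image as a 64-bin histogram of visual-word assignments."""
    n_words = codebook.n_clusters
    if descriptors is None or len(descriptors) == 0:
        return np.zeros(n_words)
    words = codebook.predict(descriptors)           # Euclidean distance internally
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()                        # normalise to unit sum
```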
At step 208 the method 200 includes, reducing dimensions of the selected features of the image using a locality preserving projection (LPP). Eight dimensions are computed from features of size 64 using LPP.
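LPP is not part of the core scikit-learn API, so a minimal self-contained version is sketched below from the standard formulation (heat-kernel affinity over a k-nearest-neighbour graph followed by a generalised eigenproblem); the neighbourhood size and kernel width are assumptions, not parameters fixed by the disclosure.

```python
# Minimal Locality Preserving Projection (LPP), assuming the standard formulation:
# build a k-NN affinity graph, then solve X^T L X a = lambda X^T D X a and keep
# the eigenvectors associated with the smallest eigenvalues.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=8, n_neighbors=5, t=1.0):
    # Heat-kernel weights on the symmetrised k-NN graph
    dist = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
    W = np.exp(-dist**2 / t) * (dist > 0)
    W = np.maximum(W, W.T)                        # make the affinity symmetric
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian

    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])   # small ridge for numerical stability
    eigvals, eigvecs = eigh(A, B)                 # generalised problem, ascending order
    proj = eigvecs[:, :n_components]              # smallest eigenvalues first
    return X @ proj                               # (n_samples, n_components) embedding

# e.g. reduced = lpp(selected_features_64, n_components=8)  # 64 -> 8 dimensions
```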
At step 210 the method 200 includes, forming a feature vector by combining the features extracted from all three feature extraction methods.
At step 212 the method 200 includes, recognizing object(s) using at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers. Four models are trained using the four classifiers: Gaussian Naive Bayes, Decision Tree, Random Forest, and the XGB classifier. To train the models, a data-partitioning strategy is used: a standard 70:30 partitioning in which 70% of the images from each class are used for training and the remaining 30% are used for testing.
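A possible realisation of this fusion-and-training stage is sketched below; the array names are placeholders for the reduced feature blocks described above, and scikit-learn/XGBoost are assumed tool choices.

```python
# Assumed sketch of feature fusion and the four-classifier training stage.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# Fused feature vector: concatenate the reduced VGG19, SIFT and Haralick features
X = np.hstack([vgg19_reduced, sift_reduced, haralick_feats])   # placeholder arrays
y = labels                                                      # integer class ids

# Standard 70:30 split, stratified so each class keeps the 70/30 proportion
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="mlogloss"),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```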
Figure 3 illustrates a block diagram of the proposed system in accordance with an embodiment of the present disclosure. The figure shows the architecture of the proposed system, whose objective is to improve the computation speed and recognition accuracy of the object recognition system. Features are extracted using three feature extraction methods, i.e. the pre-trained VGG19 model, SIFT, and the Haralick Texture Descriptor. All of these methods require individual pre-processing before implementation: the color images are processed using the pre-processing method of the Keras toolkit and then input to the pre-trained VGG19 model; for the SIFT algorithm, the color images are transformed into gray scale; and the Haralick Texture descriptor extracts information using the gray level co-occurrence matrix (GLCM), so the images are also converted into gray scale for that branch. The images in the Caltech-101 dataset are noisy and have varying illumination, so saliency maps (as proposed by Montabone et al. (2010)) are applied to the images to differentiate the visual features of the image and produce a meaningful representation of it. Further, threshold maps are applied to the images to extract each salient region. After pre-processing, the feature extraction algorithms are applied to the processed images; they extract a set of features from the images, which are stored as a feature vector. The pre-trained deep model VGG19 extracts 25088 features of an image, the SIFT algorithm extracts 128 features of an image, and the Haralick texture algorithm extracts 13 features of the image. Because VGG19 and SIFT produce a large number of features, both algorithms need a feature selection step to select the key features of an image; such a feature selection algorithm removes the irrelevant features of the image and makes the recognition process faster and more accurate. In the experiment, 64 key features are selected using the k-means clustering algorithm, which uses the Euclidean distance to process the features. Locality Preserving Projection (LPP) is then used to reduce the dimensions of the selected features; in the experiment, eight dimensions are computed from the 64 selected features using LPP. Next, a feature vector is formed by combining the features extracted from all three feature extraction methods. Finally, four models are trained using four classifiers: Gaussian Naive Bayes, Decision Tree, Random Forest, and the XGB classifier. A data-partitioning strategy is used to train the models; a standard 70:30 partitioning is applied in which 70% of the images from each class are used for training and the remaining 30% are used for testing.
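For the saliency-map and threshold-map pre-processing described above, OpenCV's fine-grained static saliency (which follows Montabone and Soto, 2010) combined with Otsu thresholding is one plausible realisation; the sketch below is an assumption of how that step could be implemented, not the exact method of the disclosure.

```python
# Assumed sketch of the saliency-map and threshold-map pre-processing step,
# using OpenCV's fine-grained static saliency (opencv-contrib).
import cv2

def salient_region(path):
    img = cv2.imread(path)
    saliency = cv2.saliency.StaticSaliencyFineGrained_create()
    ok, sal_map = saliency.computeSaliency(img)            # float map in [0, 1]
    sal_map = (sal_map * 255).astype("uint8")
    # Threshold map: keep only the salient regions of the image
    _, mask = cv2.threshold(sal_map, 0, 255,
                            cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    return cv2.bitwise_and(img, img, mask=mask)            # masked (salient) image
```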
Figure 4 illustrates a tabular form of feature extraction techniques in accordance with an embodiment of the present disclosure. Features of an object are the properties of the object which distinguish it from other objects. The more features of an object that are available, the easier it becomes for the system to recognize the object image. Features of an object may be shape, color, texture or spatial layout information of the object. Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), Local Binary Pattern (LBP), Binary Robust Invariant Scalable Keypoints (BRISK), and the Haralick Texture Descriptor are some examples of local feature extraction algorithms.
Figure 5 illustrates a tabular form of classifier wise recognition accuracy in accordance with an embodiment of the present disclosure. The results reveal that a combination of SIFT, VGG19 and the Haralick Texture descriptor achieved an accuracy of 99.07%, which is the highest accuracy obtained, using the Random Forest classification algorithm. The decision tree classification algorithm also achieved 99.03% accuracy. The mathematical definition of accuracy over a few classes (n) is given below:
$$\text{Accuracy (ACC)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of correct predictions of the } i\text{th class}}{\text{Total number of actual values}}$$
Figure 6 illustrates a tabular form of classifier wise precision rate in accordance with an embodiment of the present disclosure. The table shows the comparison of the precision rate. Here, the decision tree classifier presents the highest result for precision, i.e. 99.13%. The mathematical definition of precision over a few classes (n) is given below:
$$\text{Precision (P)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of correct predictions of the } i\text{th class}}{\text{Total number of predicted values}}$$
Figure 7 illustrates a tabular form of classifier wise recall rate in accordance with an embodiment of the present disclosure. The results for recall are presented in this figure; recall is highest using random forest (99.07%). The mathematical definition of recall over a few classes (n) is given below:
$$\text{Recall (R)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of correct predictions of the } i\text{th class}}{\text{Total number of actual values}}$$
Figure 8 illustrates a tabular form of classifier wise F-Score in accordance with an embodiment of the present disclosure. Here, the decision tree classifier presents the highest results for F-Score, i.e. 99.13% and 99.05%. The mathematical definition of F-Score over a few classes (n) is given below:
$$\text{F1-score (F)} = \frac{1}{n}\sum_{i=1}^{n}\frac{2\,P_i R_i}{P_i + R_i}$$
Figure 9 illustrates a tabular form of classifier wise area under curve in accordance with an embodiment of the present disclosure. In the results for the area under curve, the random forest classifier achieved the highest result, i.e. 99.53%. The area under curve (AUC) summarizes the overall performance of the classification algorithm and corresponds to the total proportion of correctly predicted values.
Figure 10 illustrates a tabular form of classifier wise false positive rate in accordance with an embodiment of the present disclosure. It is a comparative analysis of the false positive rate. Random Forest shows the best result for the false positive rate (0.02%). Random forest achieves the best results for all parameters among the compared methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor. The mathematical definition of the false positive rate over a few classes (n) is given below:
$$\text{False Positive Rate (FPR)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of incorrect predictions of the } i\text{th class}}{\text{Total number of incorrect actual and predicted values}}$$
Figure 11 illustrates a tabular form of classifier wise root mean square error in accordance with an embodiment of the present disclosure. It is a comparative analysis of the root mean square error, which is lowest using the Decision Tree and Naive Bayes classifiers, i.e. 5.42%. The decision tree classifier achieves the best results for all parameters among the compared methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor. The root mean square error (RMSE) measures the average difference between the actual values and the predicted values of each class.
Figure 12 illustrates a tabular form of classifier wise CPU elapsed time in ms in accordance with an embodiment of the present disclosure. It is a comparative analysis of CPU elapsed time, which is lowest using the Decision Tree and Naive Bayes classifiers, i.e. approximately zero minutes. The decision tree classifier achieves the best results for all parameters among the compared methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor. CPU elapsed time (T) specifies the total elapsed time used by the classification algorithm; the time is expressed in minutes.
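The per-class (macro-averaged) metrics defined above, together with RMSE and elapsed time, can be reproduced with standard scikit-learn utilities; the helper below is an assumed sketch of such an evaluation routine (the false positive rate would be derived from the confusion matrix and is omitted here for brevity).

```python
# Assumed sketch of the per-class (macro-averaged) evaluation metrics described above.
import time
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, mean_squared_error)

def evaluate(clf, X_test, y_test):
    start = time.time()
    y_pred = clf.predict(X_test)
    elapsed = time.time() - start                      # elapsed prediction time (s)
    metrics = {
        "accuracy":  accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "recall":    recall_score(y_test, y_pred, average="macro", zero_division=0),
        "f1":        f1_score(y_test, y_pred, average="macro", zero_division=0),
        "rmse":      np.sqrt(mean_squared_error(y_test, y_pred)),
        "time_s":    elapsed,
    }
    # Multi-class AUC needs per-class scores, e.g. predict_proba, one-vs-rest
    if hasattr(clf, "predict_proba"):
        metrics["auc"] = roc_auc_score(y_test, clf.predict_proba(X_test),
                                       multi_class="ovr", average="macro")
    return metrics
```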
Figure 13 illustrates classifier wise recognition accuracy in accordance with an embodiment of the present disclosure. The graph shows a comparative analysis of the proposed feature extraction approach across these classification algorithms based on recognition accuracy. The results reveal that a combination of SIFT, VGG19 and the Haralick Texture descriptor achieved an accuracy of 99.07%, which is the highest accuracy obtained, using the Random Forest classification algorithm. The decision tree classification algorithm also achieved 99.03% accuracy.
Figure 14 illustrates classifier wise root mean square error in accordance with an embodiment of the present disclosure. The root mean square error is lowest using the Decision Tree and Naive Bayes classifiers, i.e. 5.42%. The decision tree classifier achieves the best results for all parameters among the compared methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor.
Figure 15 illustrates classifier wise false positive rate in accordance with an embodiment of the present disclosure. Random Forest shows the best result for the false positive rate (0.02%). Random forest achieves the best results for all parameters among the compared methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor.
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.
Claims (10)
- WE CLAIM 1. A system for real-time object recognition for visual surveillance, the system comprises: a benchmark image dataset Caltech-101 containing a total of 9168 images that are grouped into 101 categories and 1 background scene category; a pre-processing module for differentiating visual features of the image and generating a meaningful representation of the image; a feature extraction module for extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also applied to the images to extract each salient region; a feature selection module employing a K-means clustering technique for selecting key features of the image and removing irrelevant features of the image; a locality preserving projection (LPP) for reducing the dimensions of the selected features of the image; wherein a feature vector is formed by combining the features extracted from all three feature extraction methods; and at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers for recognizing object(s).
- 2. The system as claimed in claim 1, wherein the three feature extraction methods are VGG19 model, SIFT and Haralick Texture Descriptor.
- 3. The system as claimed in claim 2, wherein the color images are processed using the pre-processing method of the Keras toolkit and then input to the pre-trained VGG19 model, wherein for the SIFT technique the color images are transformed into gray scale, and wherein the Haralick texture descriptor extracts information using a gray level co-occurrence matrix (GLCM), so the images are converted into gray scale.
- 4. The system as claimed in claim 1, wherein a saliency-map-based pre-processing module is used to remove noise and illumination variations from the images in the Caltech-101 dataset.
- 5. The system as claimed in claim 1, wherein the feature extraction module extracts the set of features from the images and stores them as a feature vector.
- 6. The system as claimed in claim 1, wherein the pre-trained deep model VGG19 extracts 25088 features of an image, the SIFT technique extracts 128 features of an image, and the Haralick texture technique extracts 13 features of the image.
- 7. The system as claimed in claim 1, wherein, in the experiment, 64 key features are selected using the k-means clustering technique, which uses the Euclidean distance to process the features.
- 8. The system as claimed in claim 1, wherein a data-partitioning strategy is used to train the model, wherein a standard 70:30 partitioning is used in which 70% of the images from each class are used for training and the remaining 30% are used for testing.
- 9. The system as claimed in claim 1, wherein the at least four classification models are used for prediction on the test data.
- 10. A method for real-time object recognition for visual surveillance, the method comprises: differentiating visual features of the image of a benchmark image dataset Caltech-101 and generating a meaningful representation of the image; extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also applied to the images to extract each salient region; selecting key features of the image and removing irrelevant features of the image; reducing the dimensions of the selected features of the image using a locality preserving projection (LPP); forming a feature vector by combining the features extracted from all three feature extraction methods; and recognizing object(s) using at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021101517A AU2021101517A4 (en) | 2021-03-25 | 2021-03-25 | A system for object recognition for visual surveillance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021101517A AU2021101517A4 (en) | 2021-03-25 | 2021-03-25 | A system for object recognition for visual surveillance |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021101517A4 true AU2021101517A4 (en) | 2021-05-13 |
Family
ID=75829145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2021101517A Ceased AU2021101517A4 (en) | 2021-03-25 | 2021-03-25 | A system for object recognition for visual surveillance |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021101517A4 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113569910A (en) * | 2021-06-25 | 2021-10-29 | 石化盈科信息技术有限责任公司 | Account type identification method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bansal et al. | 2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors | |
Gkelios et al. | Deep convolutional features for image retrieval | |
CN105518668B (en) | Content-based image retrieval | |
Yi et al. | Feature representations for scene text character recognition: A comparative study | |
Leibe et al. | Efficient clustering and matching for object class recognition. | |
US8452108B2 (en) | Systems and methods for image recognition using graph-based pattern matching | |
EP2657857A1 (en) | Method for binary classification of a query image | |
CN107480620B (en) | Remote sensing image automatic target identification method based on heterogeneous feature fusion | |
CN107633065B (en) | Identification method based on hand-drawn sketch | |
CN106909895B (en) | Gesture recognition method based on random projection multi-kernel learning | |
Li et al. | Image classification based on SIFT and SVM | |
Sampath et al. | Fuzzy-based multi-kernel spherical support vector machine for effective handwritten character recognition | |
Sampath et al. | Decision tree and deep learning based probabilistic model for character recognition | |
CN112163114B (en) | Image retrieval method based on feature fusion | |
Aslam et al. | Image classification based on mid-level feature fusion | |
Roy et al. | Word searching in scene image and video frame in multi-script scenario using dynamic shape coding | |
CN115203408A (en) | Intelligent labeling method for multi-modal test data | |
AU2021101517A4 (en) | A system for object recognition for visual surveillance | |
Kaur et al. | Cattle identification system: a comparative analysis of SIFT, SURF and ORB feature descriptors | |
John et al. | A multi-modal cbir framework with image segregation using autoencoders and deep learning-based pseudo-labeling | |
Villamizar et al. | Online learning and detection of faces with low human supervision | |
Elsayed et al. | Hand gesture recognition based on dimensionality reduction of histogram of oriented gradients | |
Mehri et al. | A pixel labeling approach for historical digitized books | |
Ghosh et al. | Efficient indexing for query by string text retrieval | |
Rahman et al. | Similarity searching in image retrieval with statistical distance measures and supervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGI | Letters patent sealed or granted (innovation patent) | ||
MK22 | Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry |