AU2021101517A4 - A system for object recognition for visual surveillance - Google Patents

A system for object recognition for visual surveillance

Info

Publication number
AU2021101517A4
Authority
AU
Australia
Prior art keywords
features
image
images
sift
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021101517A
Inventor
Monika Bansal
Munish Kumar
Ajay Mittal
Monika Sachdeva
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2021101517A priority Critical patent/AU2021101517A4/en
Application granted granted Critical
Publication of AU2021101517A4 publication Critical patent/AU2021101517A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/467Encoded features or binary features, e.g. local binary patterns [LBP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a system for real-time object recognition for visual surveillance. The system uses the benchmark image dataset Caltech-101, a pre-processing module, a feature extraction module, a feature selection module, a locality preserving projection, a feature vector, and four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifiers for recognizing object(s). The system is examined on a fusion of various features extracted through the pre-trained VGG19 deep learning model, SIFT and the Haralick Texture method. The objective of the system is to improve the computation speed and recognition accuracy of the object recognition system. The fusion of deep and local features for 2D object recognition is considered in the present disclosure.

Description

Figure 3: block diagram of the proposed system - input image dataset, saliency-based pre-processing, feature extraction through pre-trained VGG19, SIFT and Haralick Texture, k-means feature selection, and classification.
Figure 4: feature extraction techniques - F1: SIFT; F2: VGG19; F3: Haralick (without Saliency); F4: Haralick (with Saliency); F1+F4: SIFT+Haralick (with Saliency); F2+F4: VGG19+Haralick (with Saliency); F1+F2+F4: VGG19+SIFT+Haralick (with Saliency).
A system for object recognition for visual surveillance
FIELD OF THE INVENTION
The present disclosure relates to a system for real-time object recognition for visual surveillance.
BACKGROUND OF THE INVENTION
Object recognition/image classification is among the most popular applications of computer vision. The task of an object recognition system is to identify the object in an image and assign a class/label to it. The system works much like the human brain, which learns about objects as it encounters them and stores that information for future identification. The object recognition system follows the same process: initially the system is trained with the features of images and their associated classes, and then test images are used to predict the class of candidate images. The efficiency of the system depends strongly on the size of the dataset, because a larger dataset helps to extract more features of the image. The features of an object may be the color, shape, texture or spatial layout information of the object. There is a need to pre-process the image before feature extraction so that all the relevant and useful information can be extracted from the image. Some feature extraction methods implicitly perform the steps necessary to remove these issues, but other methods require the image to be pre-processed explicitly. Various feature extraction algorithms are used in image classification, such as Scale Invariant Feature Transform (SIFT), Speed-Up Robust Feature (SURF), Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), Oriented FAST and Rotated BRIEF (ORB), Color Histogram, Gabor texture and Haralick texture descriptors. However, as many researchers have shown, an image cannot be reliably identified using any single feature, so combinations of these features have been proposed and have achieved good results. Nowadays, deep learning features are becoming very popular for object recognition as they obtain low-level information from each pixel of the image. Designing a deep learning model from scratch is very time consuming because of the high computational power, GPU systems and huge amount of data required. Therefore, many researchers have experimented with various pre-trained deep learning models for feature extraction.
A combined regional and texture feature extraction approach for image classification was proposed. Shape-based features were extracted using statistical moments and image moments. These features were then described using the Histogram of Oriented Gradients (HOG) for image classification, and the Local Binary Pattern (LBP) algorithm was used to extract texture features. The size of this combined set of features was optimized using the principal component analysis algorithm. For classification, the Support Vector Machine (SVM) algorithm was considered. The system was evaluated on the Corel-100, Caltech-101 and Caltech-256 datasets and outperformed other methods. The behavior of features extracted through a deep CNN pre-trained model (VGG19) with a machine learning classifier (SVM) on three datasets - mit67, flowers102 and cub200 - was studied, and the behavior of individual features of each class as well as the intra-/inter-class properties were analyzed. A Bag-of-Features approach for image classification was also presented, in which the features are extracted using a deep convolutional neural network based on salient object detection. The saliency map was used as a pre-processing step to extract the relevant features from the image. This process reduced the redundancy in the extracted features, which overcame the problem of overfitting and improved recognition accuracy. Further, a multi-class linear SVM classifier was implemented to classify the features.
A Bag-of-Features approach using SIFT and HOG feature descriptors to classify objects was proposed. The experiment was conducted on a few categories of the Caltech-101 dataset and the Scene-15 dataset. The k-means clustering algorithm was used to reduce the size of the feature vector for fast and efficient computation; the resulting clusters form a visual dictionary for the images in the classification process. For classification, a state-of-the-art SVM classifier was adopted, and the model achieved good performance on these datasets. A new approach for image classification by combining deep residual features from a pre-trained model with supervised machine learning algorithms was presented. In this work, the pre-trained ResNet-152 and GoogLeNet convolutional neural networks were used for feature extraction on various food datasets - Food 5K, Food-11, Raw FooT-DB and Food-101 - and the extracted features were used to train various machine learning algorithms, i.e., Naive Bayes, Artificial Neural Network (ANN), Support Vector Machine (SVM) and Random Forest. A new approach for a content-based image retrieval system based on the combination of color and texture descriptors was also proposed. A fusion of the color histogram, the Discrete Wavelet Transform and the Edge Histogram Texture descriptor was adopted for the experiment and proved more efficient than any single feature extractor. Further, the Manhattan distance measure was used to match and retrieve images similar to the query image.
The conclusion was that the Random Forest classification algorithm achieved the highest accuracy among the classifiers considered. A fusion of handcrafted and deep features for a content-based image retrieval system was also proposed. Further, a k-NN based feature selection algorithm was applied to the extracted features to reduce the dimensions of the feature map, and the similarity between the images in the dataset and the query image was computed using the Euclidean distance.
However, various challenges affect the performance of a recognition system, such as noise, variations in scale, rotation, translation and illumination in the image, which degrade its performance. In order to overcome the aforementioned drawbacks, there is a need for a system for real-time object recognition for visual surveillance.
SUMMARY OF THE INVENTION
The present disclosure relates to a system for real-time object recognition for visual surveillance. This is accomplished by extracting various features of the object that help in identifying the proper class for the object irrespective of other classes. The system is examined on a fusion of various features extracted through the pre-trained VGG19 deep learning model, SIFT and the Haralick Texture method. Further, various machine learning classification methods, i.e. Gaussian Naive Bayes, Decision Tree, Random Forest and XGB classifier, are used to classify the objects into various classes according to the extracted features. The experiment was conducted on the benchmark image dataset Caltech-101, which contains a collection of noisy, rotated and differently scaled images. Various other performance measurement parameters, such as precision, recall, F1-Score, area under curve, false positive rate, root mean square error and CPU execution time, are also evaluated.
The present disclosure seeks to provide a system for real-time object recognition for visual surveillance. The system comprises: a benchmark image dataset Caltech-101 containing a total of 9168 images that are grouped into 101 categories and 1 background scene category; a pre-processing module for differentiating visual features of the image and generating a meaningful representation of the image; a feature extraction module for extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also enforced to the images to extract each salient region; a feature selection module employing a K-means clustering technique for selecting key features of the image and removing irrelevant features of the image; a locality preserving projection (LPP) for reducing dimensions of the selected features of the image, wherein a feature vector is formed by combining the features extracted from all three feature extraction methods; and at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifiers for recognizing object(s).
The present disclosure also seeks to provide a method for real-time object recognition for visual surveillance. The method comprises: differentiating visual features of the image of a benchmark image dataset Caltech-101 and generating a meaningful representation of the image; extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also enforced to the images to extract each salient region; selecting key features of the image and removing irrelevant features of the image; reducing dimensions of the selected features of the image using a locality preserving projection (LPP); forming a feature vector by combining the features extracted from all three feature extraction methods; and recognizing object(s) using at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifiers.
An objective of the present disclosure is to develop a system for real-time object recognition for visual surveillance.
Another object of the present disclosure is to extract various features of the object that help in identifying the proper class for the object irrespective of other classes.
Another object of the present disclosure is to use various machine learning classification methods to classify the objects in various classes according to the extracted features.
Another object of the present disclosure is to evaluate various performance measurement parameters.
Yet, another object of the present disclosure is to improve the computation speed and recognition accuracy of the object recognition system.
To further clarify the advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a block diagram of a system for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a flow chart of a method for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure;
Figure 3 illustrates a Block diagram of the proposed system in accordance with an embodiment of the present disclosure;
Figure 4 illustrates a tabular form of feature extraction techniques in accordance with an embodiment of the present disclosure;
Figure 5 illustrates a tabular form of classifier wise recognition accuracy in accordance with an embodiment of the present disclosure;
Figure 6 illustrates a tabular form of classifier wise precision rate in accordance with an embodiment of the present disclosure;
Figure 7 illustrates a tabular form of classifier wise recall rate in accordance with an embodiment of the present disclosure;
Figure 8 illustrates a tabular form of classifier wise F-Score in accordance with an embodiment of the present disclosure;
Figure 9 illustrates a tabular form of classifier wise area under curve in accordance with an embodiment of the present disclosure;
Figure 10 illustrates a tabular form of classifier wise false positive rate in accordance with an embodiment of the present disclosure;
Figure 11 illustrates a tabular form of classifier wise root mean square error in accordance with an embodiment of the present disclosure;
Figure 12 illustrates a tabular form of classifier wise CPU elapse time in ms in accordance with an embodiment of the present disclosure;
Figure 13 illustrates classifier wise recognition accuracy in accordance with an embodiment of the present disclosure;
Figure 14 illustrates classifier wise root mean square error in accordance with an embodiment of the present disclosure;
Figure 15 illustrates classifier wise false positive rate in accordance with an embodiment of the present disclosure;
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.
DETAILED DESCRIPTION
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components proceeded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Figure 1 illustrates a block diagram of a system for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure. The system 100 includes a benchmark image dataset (Caltech-101) unit 102. The dataset contains a total of 9168 images that are grouped into 101 categories and 1 background scene category.
In an embodiment, a pre-processing module unit 104 is used for differentiating visual features of the image and generating a meaningful representation of the image.
In an embodiment, a feature extraction module unit 106 is used for extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also enforced to the images to extract each salient region.
In an embodiment, a feature selection module unit 108 employs a K-means clustering technique for selecting key features of the image and removing irrelevant features of the image.
In an embodiment, a locality preserving projection (LPP) unit 110 is used for reducing dimensions of the selected features of the image.
In an embodiment, a feature vector unit 112 is formed by combining the features extracted from all three feature extraction methods.
In an embodiment, a classification model unit 114 uses at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifiers for recognizing object(s).
Figure 2 illustrates a flow chart of a method for real-time object recognition for visual surveillance in accordance with an embodiment of the present disclosure. At step 202, the method 200 includes differentiating visual features of the image of a benchmark image dataset Caltech-101 and generating a meaningful representation of the image. The dataset Caltech-101 contains a total of 9168 images that are grouped into 101 categories and 1 background scene category.
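A minimal sketch of this saliency-based pre-processing step is given below, assuming OpenCV's contrib saliency module; the particular saliency implementation and the Otsu threshold are illustrative assumptions, since the disclosure does not fix these settings.

```python
# Illustrative saliency-based pre-processing (assumes opencv-contrib-python is installed).
# The specific saliency detector and threshold settings are assumptions for demonstration only.
import cv2

def preprocess_with_saliency(image_path):
    image = cv2.imread(image_path)

    # Fine-grained static saliency highlights visually distinctive regions of the image.
    saliency = cv2.saliency.StaticSaliencyFineGrained_create()
    ok, saliency_map = saliency.computeSaliency(image)
    saliency_map = (saliency_map * 255).astype("uint8")

    # A threshold map isolates each salient region from the background.
    _, threshold_map = cv2.threshold(
        saliency_map, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    # Keep only the salient part of the original image for later feature extraction.
    salient_region = cv2.bitwise_and(image, image, mask=threshold_map)
    return salient_region, saliency_map, threshold_map
```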
At step 204, the method 200 includes extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also enforced to the images to extract each salient region. The three feature extraction methods are the pre-trained VGG19 model, SIFT and the Haralick Texture Descriptor. All these methods require individual pre-processing before implementation. The color images are processed using the pre-processing method of the Keras toolkit and then input to the pre-trained VGG19 model. For the SIFT algorithm, the color images are transformed into gray scale. The Haralick Texture descriptor extracts information using the gray level co-occurrence matrix (GLCM), so the images are also converted into gray scale. The images in the Caltech-101 dataset are noisy and have varying illumination, so saliency maps are applied to the images to differentiate the visual features of the image and make a meaningful representation of the image. Further, threshold maps are also enforced to the images to extract each salient region.
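A compact sketch of the three feature extractors follows, assuming TensorFlow/Keras for VGG19, OpenCV for SIFT and the mahotas library for the Haralick texture features; how the patented system wires these standard calls together is an assumption.

```python
# Illustrative feature extraction with the three methods named above.
import cv2
import numpy as np
import mahotas
from tensorflow.keras.applications.vgg19 import VGG19, preprocess_input
from tensorflow.keras.preprocessing import image as keras_image

# VGG19 without its classification head; the final pooled block gives 7x7x512 = 25088 values.
vgg19 = VGG19(weights="imagenet", include_top=False)

def vgg19_features(image_path):
    img = keras_image.load_img(image_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))
    return vgg19.predict(x).flatten()          # 25088-dimensional feature vector

def sift_descriptors(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(gray, None)
    return descriptors                          # N x 128 keypoint descriptors

def haralick_features(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Haralick statistics computed from the gray level co-occurrence matrix (GLCM),
    # averaged over the four directions to give 13 values.
    return mahotas.features.haralick(gray).mean(axis=0)
```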
At step 206, the method 200 includes selecting key features of the image and removing irrelevant features of the image. After pre-processing, feature extraction algorithms are applied to the processed images; they extract a set of features from the images, which are stored as a feature vector. The pre-trained deep model VGG19 extracts 25088 features of an image, the SIFT algorithm extracts 128 features of an image, and the Haralick texture algorithm extracts 13 features of the image. Since VGG19 and SIFT obtain a large number of features, both algorithms need a feature selection algorithm to select the key features of an image. Such a feature selection algorithm removes the irrelevant features of the image and makes the recognition process faster and more accurate. In the experiment, 64 key features are selected using the k-means clustering algorithm, which uses the Euclidean distance to process the features.
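The feature selection step can be approximated with scikit-learn's k-means as sketched below. Treating the 64 cluster centres as a visual vocabulary and encoding each image against them is an assumption about how the 64 key features are formed; the disclosure only states that 64 key features are selected by k-means with Euclidean distance.

```python
# Illustrative k-means based feature selection (bag-of-visual-words style encoding).
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, n_clusters=64, seed=0):
    """Cluster descriptors pooled from the training images into 64 key features.
    scikit-learn's KMeans uses Euclidean distance, as stated in the description."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans.fit(np.vstack(all_descriptors))
    return kmeans

def encode_image(descriptors, kmeans):
    """Represent one image as a 64-bin histogram of descriptor-to-centre assignments."""
    hist = np.bincount(kmeans.predict(descriptors), minlength=kmeans.n_clusters)
    return hist.astype(float) / max(hist.sum(), 1)
```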
At step 208, the method 200 includes reducing dimensions of the selected features of the image using a locality preserving projection (LPP). Eight dimensions are computed from the 64 selected features using LPP.
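Locality Preserving Projection is not part of scikit-learn, so the sketch below implements the standard LPP formulation directly (k-nearest-neighbour affinity graph, heat-kernel weights, generalised eigenproblem). The neighbourhood size and kernel width are assumptions; the disclosure only fixes the output dimensionality of eight.

```python
# Minimal Locality Preserving Projection (LPP): project 64-dimensional features to 8 dimensions.
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import kneighbors_graph

def lpp(X, n_components=8, n_neighbors=5, t=1.0):
    # Heat-kernel weights on a symmetrised k-nearest-neighbour graph.
    dist = kneighbors_graph(X, n_neighbors, mode="distance").toarray()
    W = np.where(dist > 0, np.exp(-dist ** 2 / t), 0.0)
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian

    # Solve the generalised eigenproblem X^T L X a = lambda X^T D X a and keep
    # the eigenvectors associated with the smallest eigenvalues.
    A = X.T @ L @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])   # small regulariser for numerical stability
    _, eigvecs = eigh(A, B)
    projection = eigvecs[:, :n_components]        # 64 x 8 projection matrix
    return X @ projection
```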
At step 210, the method 200 includes forming a feature vector by combining the features extracted from all three feature extraction methods.
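A minimal sketch of this fusion step, assuming each extractor has already been reduced to its low-dimensional representation, is simple concatenation:

```python
# Illustrative feature fusion: one fixed-length vector per image.
import numpy as np

def fuse_features(vgg19_vec, sift_vec, haralick_vec):
    """Concatenate the per-method feature vectors (e.g. 8 + 8 + 8 values after LPP)."""
    return np.concatenate([vgg19_vec, sift_vec, haralick_vec])
```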
At step 212, the method 200 includes recognizing object(s) using at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifiers. Four models are trained using these four classifiers: Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifier. To train the models, a data-partitioning strategy is used: a standard 70:30 partitioning in which 70% of the images from each class are used for training and the remaining 30% are used for testing.
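Training the four classifiers on the fused features can be sketched with scikit-learn and the xgboost package as below; the default hyper-parameters and integer class labels are assumptions, since the disclosure does not list them.

```python
# Illustrative training of the four classifiers on the fused feature vectors.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

def train_models(X, y):
    # Standard 70:30 split, stratified so that 70% of each class goes to training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=0)

    models = {
        "Gaussian Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Random Forest": RandomForestClassifier(random_state=0),
        "XGB": XGBClassifier(),   # expects integer labels 0..K-1
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, "test accuracy:", model.score(X_test, y_test))
    return models, (X_test, y_test)
```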
Figure 3 illustrates a block diagram of the proposed system in accordance with an embodiment of the present disclosure. The figure shows the architecture of the proposed system, whose objective is to improve the computation speed and recognition accuracy of the object recognition system. First, the input images are pre-processed, and then features are extracted using three feature extraction methods, i.e. the pre-trained VGG19 model, SIFT and the Haralick Texture Descriptor. All these methods require individual pre-processing before implementation. The color images are processed using the pre-processing method of the Keras toolkit and then input to the pre-trained VGG19 model. For the SIFT algorithm, the color images are transformed into gray scale. The Haralick Texture descriptor extracts information using the gray level co-occurrence matrix (GLCM), so the images are also converted into gray scale. The images in the Caltech-101 dataset are noisy and have varying illumination, so saliency maps (proposed by Montabone et al. (2010)) are applied to the images to differentiate the visual features of the image and make a meaningful representation of the image. Further, threshold maps are also enforced to the images to extract each salient region. After pre-processing, feature extraction algorithms are applied to the processed images; they extract a set of features from the images, which are stored as a feature vector. The pre-trained deep model VGG19 extracts 25088 features of an image, the SIFT algorithm extracts 128 features of an image, and the Haralick texture algorithm extracts 13 features of the image. Since VGG19 and SIFT obtain a large number of features, both algorithms need a feature selection algorithm that selects the key features of an image, removes the irrelevant features, and makes the recognition process faster and more accurate. In the experiment, 64 key features are selected using the k-means clustering algorithm, which uses the Euclidean distance to process the features. Then a Locality Preserving Projection (LPP) is used to reduce the dimensions of the features selected by the above-mentioned method; in the experiment, eight dimensions are computed from the 64 selected features using LPP. Next, a feature vector is formed by combining the features extracted from all three feature extraction methods. Finally, four models are trained using four classifiers: Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifier. To train the models, a data-partitioning strategy is used: a standard 70:30 partitioning in which 70% of the images from each class are used for training and the remaining 30% are used for testing.
Figure 4 illustrates a tabular form of feature extraction techniques in accordance with an embodiment of the present disclosure. Features of an object are the properties that distinguish it from other objects. The more features of an object are available, the easier it becomes for the system to recognize the object image. Features of an object may be shape, color, texture or spatial layout information of the object. Scale Invariant Feature Transform (SIFT), Speed-Up Robust Feature (SURF), Local Binary Pattern (LBP), Binary Robust Invariant Scalable Keypoints (BRISK) and the Haralick Texture Descriptor are some examples of local feature extraction algorithms.
Figure 5 illustrates a tabular form of classifier wise recognition accuracy in accordance with an embodiment of the present disclosure. The results reveal that the combination of SIFT, VGG19 and the Haralick Texture descriptor achieved an accuracy of 99.07%, the highest accuracy, obtained using the Random Forest classification algorithm. The decision tree classification algorithm also achieved 99.03% accuracy. The mathematical definition of accuracy over a few classes (n) is given below:
$$\mathrm{Accuracy\;(ACC)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of correct predictions of the } i\text{-th class}}{\text{Total number of actual values}}$$
Figure 6 illustrates a tabular form of classifier wise precision rate in accordance with an embodiment of the present disclosure. The table shows the comparison of precision rates. Here, the decision tree classifier presents the highest precision, i.e. 99.13%. The mathematical definition of precision over a few classes (n) is given below:
$$\mathrm{Precision\;(P)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of correct predictions of the } i\text{-th class}}{\text{Total number of predicted values}}$$
Figure 7 illustrates a tabular form of classifier wise recall rate in accordance with an embodiment of the present disclosure. The results for recall are presented in this figure; recall is highest using random forest (99.07%). The mathematical definition of recall over a few classes (n) is given below:
$$\mathrm{Recall\;(R)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of correct predictions of the } i\text{-th class}}{\text{Total number of actual values}}$$
Figure 8 illustrates a tabular form of classifier wise F-Score in accordance with an embodiment of the present disclosure. Here, the decision tree classifier presents the highest results for F-Score, i.e. 99.13% and 99.05%. The mathematical definition of the F-Score over a few classes (n) is given below:
$$\mathrm{F1\text{-}score\;(F)} = \frac{1}{n}\sum_{i=1}^{n}\frac{2 \times P_i \times R_i}{P_i + R_i}$$
Figure 9 illustrates a tabular form of classifier wise area under curve in accordance with an embodiment of the present disclosure. For the area under curve, the random forest classifier achieved the highest result, i.e. 99.53%. The area under curve (AUC) summarizes the overall performance of the classification algorithm, corresponding to the total proportion of correctly predicted values.
Figure 10 illustrates a tabular form of classifier wise false positive rate in accordance with an embodiment of the present disclosure. It is a comparative analysis of the false positive rate. Random Forest shows the best result for the false positive rate (0.02%). The random forest classifier achieves the best results for all parameters among the methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor. The mathematical definition of the false positive rate over a few classes (n) is given below:
$$\mathrm{False\;Positive\;Rate\;(FPR)} = \frac{1}{n}\sum_{i=1}^{n}\frac{\text{Number of incorrect predictions of the } i\text{-th class}}{\text{Total number of incorrect actual and predicted values}}$$
Figure 11 illustrates a tabular form of classifier wise root mean square error in accordance with an embodiment of the present disclosure. It is a comparative analysis of the root mean square error, which shows the best results using the Decision Tree and Naive Bayes classifiers, i.e. 5.42%. The decision tree classifier achieves the best results for all parameters among the methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor. The root mean square error (RMSE) measures the average difference between the actual values and the predicted values of each class.
Figure 12 illustrates a tabular form of classifier wise CPU elapsed time in ms in accordance with an embodiment of the present disclosure. It is a comparative analysis of CPU elapsed time, which shows the best results using the Decision Tree and Naive Bayes classifiers, i.e. approximately zero minutes. The decision tree classifier achieves the best results for all parameters among the methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor. CPU elapsed time (T) specifies the total elapsed time used by the classification algorithm. The time is expressed in minutes.
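The evaluation metrics reported in Figures 5-12 can be reproduced for a trained model roughly as follows. The macro-averaging mirrors the per-class definitions given above, while computing the RMSE over label indices and the FPR from the confusion matrix are assumptions about how the reported values are obtained.

```python
# Illustrative evaluation of a trained classifier with the metrics reported above.
import time
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix,
                             mean_squared_error)

def evaluate(model, X_test, y_test):
    start = time.process_time()
    y_pred = model.predict(X_test)
    elapsed = time.process_time() - start        # CPU time spent on prediction

    cm = confusion_matrix(y_test, y_pred)
    fp = cm.sum(axis=0) - np.diag(cm)            # false positives per class
    tn = cm.sum() - (cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm))

    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_test, y_pred, average="macro", zero_division=0),
        "auc": roc_auc_score(y_test, model.predict_proba(X_test),
                             multi_class="ovr", average="macro"),
        "fpr": np.mean(fp / (fp + tn)),          # macro-averaged false positive rate
        "rmse": np.sqrt(mean_squared_error(y_test, y_pred)),
        "cpu_time": elapsed,
    }
```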
Figure 13 illustrates classifier wise recognition accuracy in accordance with an embodiment of the present disclosure. The graph shows a comparative analysis of the proposed feature extraction approach across these classification algorithms based on recognition accuracy. The results reveal that the combination of SIFT, VGG19 and the Haralick Texture descriptor achieved an accuracy of 99.07%, the highest accuracy, obtained using the Random Forest classification algorithm. The decision tree classification algorithm also achieved 99.03% accuracy.
Figure 14 illustrates classifier wise root mean square error in accordance with an embodiment of the present disclosure. The root mean square error shows the best results using the Decision Tree and Naive Bayes classifiers, i.e. 5.42%. The decision tree classifier achieves the best results for all parameters among the methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor.
Figure 15 illustrates classifier wise false positive rate in accordance with an embodiment of the present disclosure. Random Forest shows the best result for the false positive rate (0.02%). The random forest classifier achieves the best results for all parameters among the methods when applied to the combination of SIFT, VGG19 and the Haralick Texture descriptor.
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims (10)

  1. A system for real-time object recognition for visual surveillance, the system comprises:
    a benchmark image dataset Caltech-101 containing a total of 9168 images that are grouped into 101 categories and 1 background scene category; a pre-processing module for differentiating visual features of the image and generating a meaningful representation of the image; a feature extraction module for extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also enforced to the images to extract each salient region; a feature selection module employing a K-means clustering technique for selecting key features of the image and removing irrelevant features of the image; a locality preserving projection (LPP) for reducing dimensions of the selected features of the image, wherein a feature vector is formed by combining the features extracted from all three feature extraction methods; and at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifiers for recognizing object(s).
  2. The system as claimed in claim 1, wherein the three feature extraction methods are the VGG19 model, SIFT and the Haralick Texture Descriptor.
  3. The system as claimed in claim 2, wherein the color images are processed using the pre-processing method of the Keras toolkit and then input to the pre-trained VGG19 model, wherein for the SIFT technique the color images are transformed into gray scale, and wherein the Haralick texture descriptor extracts the information using the gray level co-occurrence matrix (GLCM), so the images are converted into gray scale.
  4. The system as claimed in claim 1, wherein a saliency maps based pre-processing module is used to remove noise and illumination variation from the images in the Caltech-101 dataset.
  5. The system as claimed in claim 1, wherein the feature extraction module extracts the set of features from the images and stores them as a feature vector.
  6. The system as claimed in claim 1, wherein the pre-trained deep model VGG19 extracts 25088 features of an image, the SIFT technique extracts 128 features of an image and the Haralick texture technique extracts 13 features of the image.
  7. The system as claimed in claim 1, wherein, in the experiment, 64 key features are selected using the k-means clustering technique through the Euclidean distance to process the features.
  8. The system as claimed in claim 1, wherein a data-partitioning strategy is used to train the model, wherein a standard 70:30 partitioning is used in which 70% of the images from each class are used for training and the remaining 30% are used for testing.
  9. The system as claimed in claim 1, wherein the at least four classification models are used to predict on the test data.
  10. A method for real-time object recognition for visual surveillance, the method comprises:
    differentiating visual features of the image of a benchmark image dataset Caltech-101 and generating a meaningful representation of the image; extracting a set of features from the images using three feature extraction methods, wherein threshold maps are also enforced to the images to extract each salient region; selecting key features of the image and removing irrelevant features of the image; reducing dimensions of the selected features of the image using a locality preserving projection (LPP); forming a feature vector by combining the features extracted from all three feature extraction methods; and recognizing object(s) using at least four classification models trained using Gaussian Naive Bayes, Decision Tree, Random Forest and XGB Classifiers.
AU2021101517A 2021-03-25 2021-03-25 A system for object recognition for visual surveillance Ceased AU2021101517A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021101517A AU2021101517A4 (en) 2021-03-25 2021-03-25 A system for object recognition for visual surveillance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021101517A AU2021101517A4 (en) 2021-03-25 2021-03-25 A system for object recognition for visual surveillance

Publications (1)

Publication Number Publication Date
AU2021101517A4 true AU2021101517A4 (en) 2021-05-13

Family

ID=75829145

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021101517A Ceased AU2021101517A4 (en) 2021-03-25 2021-03-25 A system for object recognition for visual surveillance

Country Status (1)

Country Link
AU (1) AU2021101517A4 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569910A (en) * 2021-06-25 2021-10-29 石化盈科信息技术有限责任公司 Account type identification method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Bansal et al. 2D object recognition: a comparative analysis of SIFT, SURF and ORB feature descriptors
CN105518668B (en) Content-based image retrieval
Yi et al. Feature representations for scene text character recognition: A comparative study
Leibe et al. Efficient clustering and matching for object class recognition.
US8452108B2 (en) Systems and methods for image recognition using graph-based pattern matching
Gkelios et al. Deep convolutional features for image retrieval
EP2657857A1 (en) Method for binary classification of a query image
CN107480620B (en) Remote sensing image automatic target identification method based on heterogeneous feature fusion
CN107633065B (en) Identification method based on hand-drawn sketch
CN106909895B (en) Gesture recognition method based on random projection multi-kernel learning
Shrein Fingerprint classification using convolutional neural networks and ridge orientation images
Sampath et al. Decision tree and deep learning based probabilistic model for character recognition
Aslam et al. Image classification based on mid-level feature fusion
Song et al. Robust and parallel Uyghur text localization in complex background images
Roy et al. Word searching in scene image and video frame in multi-script scenario using dynamic shape coding
CN115203408A (en) Intelligent labeling method for multi-modal test data
Kaur et al. Cattle identification system: a comparative analysis of SIFT, SURF and ORB feature descriptors
Villamizar et al. Online learning and detection of faces with low human supervision
AU2021101517A4 (en) A system for object recognition for visual surveillance
Elsayed et al. Hand gesture recognition based on dimensionality reduction of histogram of oriented gradients
Mehri et al. A pixel labeling approach for historical digitized books
John et al. A multi-modal cbir framework with image segregation using autoencoders and deep learning-based pseudo-labeling
Rahman et al. Similarity searching in image retrieval with statistical distance measures and supervised learning
Ghosh et al. Efficient indexing for query by string text retrieval
Brilhador et al. Combining texture and shape descriptors for bioimages classification: a case of study in ImageCLEF dataset

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry