AU2021103865A4 - Forensic Tool for the Semantic-Based Image Retrieval System - Google Patents

Forensic Tool for the Semantic-Based Image Retrieval System

Info

Publication number
AU2021103865A4
AU2021103865A4
Authority
AU
Australia
Prior art keywords
semantic
layer
cyber
image
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021103865A
Inventor
Sreenivasa Reddy E.
Ramesh Babu P.
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2021103865A priority Critical patent/AU2021103865A4/en
Application granted granted Critical
Publication of AU2021103865A4 publication Critical patent/AU2021103865A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Forensic Tool for the Semantic-Based Image Retrieval System. Computer forensics, network forensics, and internet forensics are all part of cyber forensics. In cyber forensics, digital photographs are frequently used to capture criminal images, fingerprints, and images of crime scenes, among other things. Because today's cyber forensic systems are not well equipped to handle large amounts of image data, obtaining image evidence to convict a criminal is a major challenge; most evidence is provided in the form of raw semantics. With the help of these semantics, cyber forensic investigators are often faced with the task of manually evaluating a massive amount of digital image data in order to uncover direct evidence. As a result, the semantic-based image retrieval (SBIR) system is the most recent and best alternative for addressing this flaw. The major goal of this study is to create a new framework of semantic-based image retrieval for cyber forensic tools. For cyber-related forensic tools, we present a deep learning framework based on a convolutional neural network (GoogLeNet) for recognising distinct facial expressions from the Yale facial image database. The framework is very effective for classification and detection based on semantics or verbal descriptions. After training, the network achieved a reasonable accuracy of 86.25 percent.

[Fig. 1 (schematic): input images undergo semantic feature extraction and convolution based on DWT; the input feature, given in the form of semantics, is matched with features learned by GoogLeNet against the image database, and the output is decoded based on semantics. Fig. 2 (schematic): input layer, hidden layer 1, hidden layer 2, output layer.]

Description

[Fig. 1 (schematic): input images undergo semantic feature extraction and convolution based on DWT; the input feature is matched with features learned by GoogLeNet against the image database, and the output is decoded based on semantics. The input feature for image retrieval is supplied in the form of semantics.]

[Fig. 2 (schematic): input layer, hidden layer 1, hidden layer 2, output layer.]
TITLE OF THE INVENTION Forensic Tool for the Semantic-Based Image Retrieval System
FIELD OF THE INVENTION
[001]. The present disclosure generally relates to the development of a Forensic Tool for the Semantic-Based Image Retrieval System.
BACKGROUND OF THE INVENTION
[001]. Cyber forensics includes the areas of computer forensics, network forensics, and internet forensics. Digital images are commonly used in cyber forensics to collect criminal images, fingerprints, images of crime events, and so on. Since current cyber forensic tools are not well equipped to handle large volumes of image data, obtaining image evidence to prosecute a criminal becomes a major challenge; most of the evidence is available in the form of raw semantics. Cyber forensic investigators often face the challenge of manually examining an enormous quantity of digital image information to identify direct evidence with the assistance of these semantics. A semantic-based image retrieval (SBIR) system is therefore the most recent and best option to solve this drawback.
[002]. The main objective of this invention is to design a novel semantic-based image retrieval framework for cyber forensic tools.
SUMMARY OF THE INVENTION
[003]. Cyber forensics includes the areas of computer forensics, network forensics, and internet forensics. Digital images are commonly used in cyber forensics to collect criminal images, fingerprints, images of crime events, and so on. Since current cyber forensic tools are not well equipped to handle large volumes of image data, obtaining image evidence to prosecute a criminal becomes a major challenge; most of the evidence is available in the form of raw semantics. Cyber forensic investigators often face the challenge of manually examining an enormous quantity of digital image information to identify direct evidence with the assistance of these semantics. A semantic-based image retrieval (SBIR) system is therefore the most recent and best option to solve this drawback. The main purpose of this invention is to design a novel semantic-based image retrieval framework for cyber forensic tools.
[004]. This invention presents a deep learning framework based on a convolutional neural network (GoogLeNet) for recognising distinct facial expressions from the Yale facial image database for cyber-associated forensic tools. The presented framework is very effective for classification and detection based on semantics or verbal descriptions. The network achieved an accuracy of 86.25% after training.
DETAILED DESCRIPTION OF THE INVENTION
[005]. The general public has become increasingly reliant on computers, networks, and internet-related technologies, which, while generally beneficial, have turned out to be a curse in some cases. The digital world is becoming a focal point for criminal acts such as web defacement, vandalism, and cyber warfare. A rapidly growing discipline in response is cyber forensics, which is concerned with the investigation of cyber crimes in order to obtain potential evidence. Cyber forensics includes computer, network, and internet forensics.
[006]. Examining digital photos found on the target media is typically a part of cyber forensic investigations. A digital image, or simply image, is any picture that can be made, copied, and saved in digital form. A picture can be represented as a vector graphic or a raster graphic. Vector graphics are visuals formed from mathematical expressions (lines and shapes of 2D or 3D images). Raster graphics are images taken or scanned from a sample.
[007]. The semantics of an image are the spoken facts about the image, its pixels, and their attributes; put simply, the semantics of an image is the meaning of the picture. In forensic examination, picture retrieval based on image semantics is critical.
[008]. In forensic examination, facial image retrieval currently poses several challenges. Facial retrieval is the process of extracting, from a collection of photographs, the facial images that are relevant to the user's needs. SBIR is based on a set of spoken facts that are linked to an image, and the retrieval system is expected to present images from the database that match those verbal facts.
[009]. The use of phrases like oval face, lips, and nose in the verbal fact or description of a person's visual image is almost invariably semantic in character. SBIR is concerned with retrieving images based on facial attributes, such as spoken facts about a person's nose, eyes, and lips, rather than raw image data, and then comparing the composite image to the photos in the data set. Facial recognition technologies are in significant demand in forensic investigations of criminal identities and other applications. To retrieve the target face from a dataset, many law enforcement programmes must integrate the soft biometric aspects of the face.
[0010]. Using computer vision, particularly convolutional neural networks (CNNs), we can discern distinct facial expressions based on attributes such as happy, sad, surprise, happy face with glasses, sad face with spectacles, and sad face with moustache. GoogLeNet, a pre-trained CNN, has the potential to be especially useful here.
[0011]. GoogLeNet has been trained on more than a million photos and learns rich feature representations of various images. It accepts an image as input and outputs a label for it. The face is one of the most important parts of the body for defining a person's low-level and high-level features. Below we give a brief explanation of a CNN experiment using facial feature identification.
[0012]. About DWT: The discrete wavelet transform (DWT) is a transformation technique that uses a discrete set of scales for image compression in accordance with specific rules. In other words, this technique breaks the signal down into a set of mutually orthogonal wavelets; it is sometimes described as a discrete-time continuous wavelet transform, implemented for discrete time series.
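To make the decomposition concrete, here is a minimal sketch of a single-level 2-D DWT using the PyWavelets library; the Haar wavelet and the random stand-in image are illustrative assumptions, since the description does not name a specific wavelet.

```python
# A minimal sketch of single-level 2-D DWT decomposition with PyWavelets.
import numpy as np
import pywt

# Toy grayscale "image" standing in for a face photograph.
image = np.random.rand(224, 224)

# dwt2 splits the image into an approximation (low-frequency) band and
# three detail (high-frequency) bands: horizontal, vertical, diagonal.
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')
print(cA.shape)  # (112, 112): each band is half the size per axis

# The transform is invertible: idwt2 reconstructs the input exactly
# (up to floating-point precision), which is the lossless property
# discussed below.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
assert np.allclose(image, reconstructed)
```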
[0013]. The wavelet can be built from a scaling function that defines its scaling characteristics. The requirement that the scaling functions be orthogonal to their discrete translations imposes certain mathematical conditions on them, which are listed in most references.
[0014]. Discrete wavelet transform (DWT) algorithms have a strong place in signal processing across many research and commercial fields. As the DWT offers octave-scale frequency information as well as the spatial timing of the analysed signal, it is constantly used to address increasingly sophisticated problems.
[0015]. Much image compression and image analysis uses the DWT for images; JPEG 2000 is one implementation of the 2-D DWT. The heart of the algorithm is to decompose the image into its DWT elements and then build trees of the extracted DWT coefficients to determine which elements can be omitted before saving the image. We can eliminate extraneous data in this manner, and there is the further advantage that the DWT itself is lossless.
[0016]. Whichever filters JPEG 2000 uses, we can be sure that the standard supports lossless operation, which means the original information can be reconstructed without any artifacts or quantization errors. JPEG 2000 also has a lossy option, where the file size can be reduced further by eliminating more of the DWT coefficients in a way that is imperceptible in typical use.
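The coefficient-discarding idea behind the lossy mode can be sketched as follows; the db2 wavelet, the three decomposition levels, and the hard threshold value are assumptions for illustration, not values taken from JPEG 2000 or from this description.

```python
# A sketch of lossy compression by zeroing small DWT detail coefficients.
import numpy as np
import pywt

image = np.random.rand(224, 224)

# Multi-level decomposition: one approximation band plus detail bands
# (horizontal, vertical, diagonal) for each of the three levels.
coeffs = pywt.wavedec2(image, 'db2', level=3)

# Zero out detail coefficients with small magnitude; the approximation
# band (coeffs[0]) is kept intact.
threshold = 0.1
compressed = [coeffs[0]] + [
    tuple(pywt.threshold(band, threshold, mode='hard') for band in detail)
    for detail in coeffs[1:]
]

# Lossy reconstruction: close to the original, minus the discarded detail.
approx = pywt.waverec2(compressed, 'db2')
```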
[0017]. About GoogLeNet: GoogLeNet was introduced at the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC 2014) competition, where it was designed by researchers at Google. GoogLeNet achieved a top-5 error rate of 6.67 percent, quite close to human-level performance. GoogLeNet is a prime example of a convolutional neural network (CNN).
[0018]. It is often referred to as an ImageNet pre-trained model and has proved to be a very powerful one. It initially contained 22 layers and has since been enhanced. Learning with GoogLeNet is typically much quicker and easier than training a network from scratch. GoogLeNet generally handles classification, feature extraction, and transfer learning activities, and it introduces a novel building block known as the inception module.
[0019]. In a single layer, the inception module combines multiple kinds of "feature extractors". These extractors indirectly help improve network performance, because during training the network itself has many choices for achieving the task: it can convolve the input or pool it directly. The final architecture stacks multiple such modules, and the training process is also organised differently in GoogLeNet, since several of the upper layers have their own auxiliary output layers. This supports parallel as well as joint training. The network is designed for machine efficiency, so it can run for individual users on computers with limited resources, especially limited memory.
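The parallel "feature extractors" can be sketched in a few lines of PyTorch; the branch channel counts below follow one published GoogLeNet configuration and are illustrative, not mandated by the description above.

```python
# A minimal sketch of an inception module: 1x1, 3x3, 5x5 convolutions
# and max pooling applied to the same input in parallel, then
# concatenated along the channel axis.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),            # 1x1 reduction
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        # Every branch sees the same input; padding keeps the spatial
        # size, so the outputs can be concatenated channel-wise.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

module = InceptionModule(192)
out = module(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28]): 64+128+32+32 channels
```

The design choice this illustrates: rather than committing to one filter size per layer, the network learns which branch is useful for each feature, which is why the module "has many choices to achieve the task".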
[0020]. Problem Definition: As the number of accessible digital images increases significantly, the capacity to search across digital images becomes ever more important. Moreover, describing any available image, and determining which digital images are similar to other digital images, becomes inconvenient for individuals. The current CBIR technique uses the visual content of images to find and retrieve images from the relevant databases.
[0021]. Face images usually differ from other CBIR images because, in their general structure, facial images are complex and nonlinear. Current image retrieval techniques mainly use low-level features such as colour, texture, and shape. Pictorial features are acquired automatically, using image processing methods to represent the raw material of an image. The CBIR method generally produces retrieval based on comparable colour, and shape-based image retrieval produces images that have the same shape. Consequently, present CBIR systems that use low-level features for general-purpose retrieval of facial images are not efficient, particularly if the user's query is a verbal fact.
[0022]. Such low-level features do not locate the semantic elements of a face. In fact, people tend to use verbal descriptions of semantic characteristics to specify what they are looking for, and find it difficult to use low-level features. A biometric-security CBIR system developed in the past few years applied low-level features to facial images. That technique obtained precise face image retrieval with approximately 95 percent accuracy; however, the method does not apply to real-time information.
[0023]. In addition, low-level feature-based CBIR does not generate precise outcomes for face images of the same individuals with varying attributes such as glasses, no glasses, moustache, and no moustache. Current technologies therefore present some disadvantages in retrieving facial images using semantic queries or verbal descriptions; that is, semantic features are not considered during retrieval.
[0024]. Proposed Methodology: We describe an accurate model for a semantics-based image retrieval system as a cyber-forensic tool, using a CNN-based pre-trained deep learning network (GoogLeNet) and the discrete wavelet transform (DWT) domain. The proposed method investigates the possibility and potential benefits of applying convolutional neural network (CNN) filters in the DWT domain to image recognition. We first apply the discrete wavelet transform to face images to extract features. Finally, classification is performed by deep learning through GoogLeNet, a pre-trained neural network that has been used in several areas such as classification and decision-making.
[0025]. This proposed method retrieves more related features from the given input image. The overall process diagram of the architecture is presented in figure 1.
[0026]. We now go through basic information on the important concepts of DWT and GoogLeNet, which play a vital role in the proposed methodology of an efficient framework for a semantic-based facial image retrieval system as a cyber-forensic tool.
[0027]. Algorithm of deep CNN for Proposed Framework:
Step 1. An input image layer with input dimension 224×224×3 and zero-centre normalization.
Step 2. A 2-D convolution layer with a given filter size.
Step 3. The 3rd layer is a ReLU layer, which applies a threshold operation to each input element, setting any value below zero to zero.
Step 4. The 4th layer is a max pooling layer, which performs down-sampling by splitting the input into rectangular pooling regions and computing the maximum of each region.
Step 5. A cross-channel normalization layer: a local response normalization layer that performs channel-wise normalization.
Step 6. Another 2-D convolution layer with a given filter size.
Step 7. ReLU and convolution layers repeat up to the 9th layer.
Step 8. Convolution layers arranged as inception modules are stacked linearly for better performance.
Step 9. The last 4 of the 144 layers are: an average pooling 2-D layer, a dropout layer, a fully connected layer, and a softmax layer (applying the softmax function to the input) with a classification output layer.
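A simplified PyTorch sketch of steps 1 to 6 (the network stem) follows; the filter counts come from GoogLeNet's published architecture, which the step list above leaves unspecified, and the stem is abbreviated relative to the full network.

```python
# Sketch of the stem described in steps 1-6.
import torch
import torch.nn as nn

stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),   # step 2
    nn.ReLU(),                                               # step 3
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),        # step 4
    nn.LocalResponseNorm(size=5),                            # step 5
    nn.Conv2d(64, 192, kernel_size=3, stride=1, padding=1),  # step 6
    nn.ReLU(),
)

x = torch.randn(1, 3, 224, 224)   # step 1: 224x224x3 input
print(stem(x).shape)              # torch.Size([1, 192, 56, 56])
```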
[0028]. In the training phase, the neural network is trained using features extracted with the discrete wavelet transform, which is used to compute the wavelet decomposition. In the testing phase, to test and classify an unknown image, the wavelet transform of the unknown image is first calculated and all features are extracted. Low-level features are minor details of the image, such as lines or dots, that can be picked up by, say, a convolutional filter (for really low-level things) or SIFT or HOG (for more abstract things like edges). High-level features are designed to detect entities and larger shapes in the image on top of the low-level features. Convolutional neural networks use both types: the first couple of convolutional layers learn filters for finding lines, dots, and curves, while the later layers learn to recognise common objects and shapes.
[0029]. The second step is to apply GoogLeNet to the extracted features with the desired values and test it to determine the object class of a given unknown image.
[0030]. Algorithm:
Step 1: Read the input image and resize it (if not of size 224×224).
Step 2: Apply image pre-processing.
Step 3: Calculate the wavelet transform of the given input image.
Step 4: Extract low-level and high-level features from the discrete wavelet transform.
Step 5: For classification, use the pre-trained convolutional neural network (GoogLeNet).
Step 6: Test the given image. Stop.
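These six steps can be sketched end to end as follows; the query file name, the 'haar' wavelet, and the way the DWT features sit alongside the GoogLeNet classification are assumptions, since the description does not give implementation-level detail on how the two are fused (a recent torchvision with the weights API is also assumed).

```python
# End-to-end sketch of the retrieval algorithm above.
import numpy as np
import pywt
import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),                 # Step 1: resize if needed
    transforms.ToTensor(),                         # Step 2: pre-processing
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def dwt_features(img):
    """Steps 3-4: wavelet transform; approximation band = coarse
    (low-level) content, detail bands = fine (high-level) structure."""
    gray = np.array(img.convert('L').resize((224, 224)), dtype=float)
    cA, (cH, cV, cD) = pywt.dwt2(gray, 'haar')
    return cA, (cH, cV, cD)

net = models.googlenet(weights='DEFAULT')          # Step 5: pre-trained CNN
net.eval()

img = Image.open('query_face.jpg').convert('RGB')  # hypothetical query image
low, high = dwt_features(img)
with torch.no_grad():                              # Step 6: test the image
    scores = net(preprocess(img).unsqueeze(0))
predicted = scores.argmax(dim=1)
```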
[0031]. CNN Model Phase: This phase has four layers: an input layer, hidden layer 1, hidden layer 2, and an output layer.
[0032]. Input Layer: The input layer provides the entry point for incoming data, so it needs to match the format or "shape" of the expected input. For example, an RGB image 224 pixels high and 224 pixels wide might require an input layer of 150,528 nodes organised into a 3-D structure (224×224×3). In such a structure, each node represents the red, green, or blue value of a given pixel.
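A two-line check of the node count quoted above:

```python
# Sanity check: a 224x224 RGB input holds one value per channel per pixel.
import torch

x = torch.zeros(224, 224, 3)   # height x width x RGB
print(x.numel())               # 150528 = 224 * 224 * 3
```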
[0033]. Hidden Layer: Hidden layers are so called because they sit between the input and output layers and have no contact with the "outside world". Their function is to extract features from the input data and use them to correlate a given input with the correct output. A CNN can have multiple hidden layers. In Figure 2, hidden layer 1 captures low-level features (edges, circles) and hidden layer 2 captures high-level features (nose, lips, and mouth).
[0034]. A drawback of deep learning is that the feature representations in hidden layers are not always human-readable like the example above. This means it can be extremely difficult to gain insight into why a deep learning ANN delivers a specific result, especially if the CNN works in more than three dimensions.
[0035]. Output Layer: The output layer provides the CNN's end result and is organised according to the use case you are working on. For example, if you wanted a CNN to recognise 10 different objects in images, you might want 10 output nodes, each representing one of the objects you are trying to find. The final score from each output node then indicates whether or not the associated object has been found by the CNN.
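As a sketch of this idea, a ten-way output head might look as follows; the 1024-dimensional feature width matches GoogLeNet's pooled output and is assumed here for illustration.

```python
# Ten output nodes, one per object; softmax converts raw scores into
# confidences that sum to one.
import torch
import torch.nn as nn

head = nn.Sequential(nn.Linear(1024, 10), nn.Softmax(dim=1))

features = torch.randn(1, 1024)   # pooled features for one image
confidences = head(features)      # shape (1, 10), one score per object
print(confidences.sum())          # ~1.0: a probability distribution
```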
[0036]. Datasets Considered: We trained and tested on the Yale dataset. The dataset has 16 categorised folders, named big eyes, small eyes, glasses, no glasses, happy, sad, wink, long nose, short nose, moustache, no moustache, sleepy, surprise, square face, round face, and oval face. All images are resized to 224×224 in pre-processing. There are 167 images in total in the dataset, of which 70% are used as training data and 30% as validation data. The Yale facial database is approximately 6.4 MB in size.
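The 70/30 split described above might be implemented as follows; the folder name yale_faces is a hypothetical placeholder for a directory with one sub-folder per category.

```python
# Sketch of the 70/30 train/validation split using torchvision.
import torch
from torchvision import datasets, transforms

dataset = datasets.ImageFolder(
    'yale_faces',                              # hypothetical path
    transform=transforms.Compose([
        transforms.Resize((224, 224)),         # resize in pre-processing
        transforms.ToTensor(),
    ]))

n_train = int(0.7 * len(dataset))              # 70% training data
train_set, val_set = torch.utils.data.random_split(
    dataset, [n_train, len(dataset) - n_train])  # remaining 30% validation
```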
[0037]. Network Layer Graph (GoogLeNet): The network contains 144 layers with 170 connections; the layer graph can be visualised by plotting it. The first component of the layer graph is the input: images of dimension 224-by-224-by-3, where 3 is the number of colour channels. This dimension is required by the first layer. The image input layer is the first element of the network's Layers property.
[0038]. Image features are acquired from the network's convolution layers; the final classification layers use them to categorise the input image. GoogLeNet's last two layers, the loss3-classifier layer and the classification output layer, hold the information on how to merge the features the network extracts into class scores, predicted labels, and a loss value. To retrain a pre-trained network to classify new images, these two layers must be substituted with new layers tailored to the current data set.
[0039]. First, find the names of the two layers to substitute; a supporting function can be used to discover the layers to replace. Afterwards, substitute a new fully connected layer whose number of outputs equals the number of categories in the new data set.
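The description above appears to follow a MATLAB workflow, so the PyTorch equivalent below is an assumption; the substitution amounts to replacing the final fully connected layer.

```python
# Sketch of retraining the classification head: load the pre-trained
# network and swap in a layer sized for the new categories.
import torch.nn as nn
from torchvision import models

num_classes = 16                           # categories in the new data set
net = models.googlenet(weights='DEFAULT')  # pre-trained on ImageNet

# Replace the ImageNet 1000-way classifier; only this layer starts
# from freshly initialised weights.
net.fc = nn.Linear(net.fc.in_features, num_classes)
```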
[0040]. Experimental Results: The proposed facial image retrieval framework is analysed using the Yale facial dataset. This dataset is made up of 167 images in total, covering facial features such as happy, sad, wink, surprise, oval face, round face, moustache face with glasses, sad face with glasses, sad face with no glasses, and so on.
[0041]. Network Training: The network requires images of size 224×224×3 for the training process, and additional augmentation operations are performed on the training images: flipping them randomly about the vertical axis, translating them by up to 30 pixels, and scaling them by up to 10 percent horizontally and vertically. Data augmentation helps prevent the network from overfitting and memorising the exact details of the training images.
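A sketch of these augmentation operations using torchvision transforms; the exact mapping of the 30-pixel translation and 10 percent scaling onto RandomAffine's fractional parameters is an assumption.

```python
# Sketch of the described training-time augmentation.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),    # flip about the vertical axis
    transforms.RandomAffine(
        degrees=0,                        # no rotation is described
        translate=(30 / 224, 30 / 224),   # shifts of up to 30 pixels
        scale=(0.9, 1.1)),                # scaling of up to 10 percent
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```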
[0042]. Results: The training process takes 35 minutes and reaches an accuracy of 86.25%. The training cycle uses 5 iterations per epoch, with a maximum of 100 iterations. We tested single as well as multiple features; as a single-feature test, we performed image retrieval for the happy expression.
[0043]. This invention concerns the design of an efficient framework for a semantic-based facial image retrieval system as a cyber-forensic tool using CNN-based deep learning. Since current cyber forensic tools are not equipped with effective semantic-based image retrieval methods, there is a strong need for novel methods and frameworks in cyber forensics tool development from the perspective of image retrieval.
[0044]. The proposed CNN deep-learning-based semantic facial expression image retrieval system can also help solve many real-time problems using sentiment analysis; for example, in automatic driving, the driver's mood can be monitored so that accidents can be avoided. Finally, after training the 144-layer deep network, we obtained our best result with 86.25% accuracy.
[0045]. We hope that this invention will address the requirements of novice scientists and learners who are actively engaged in CNN-based deep learning, cyber forensics, and semantic-based image retrieval systems.

Claims (5)

  1. We claim that the present disclosure relates to the development of a Forensic Tool for the Semantic-Based Image Retrieval System.
  2. We claim that in this proposed invention, a CNN deep-learning-based semantic facial expression image retrieval system was used.
  3. As claimed in claim 2, the deep-learning-based semantic facial expression image retrieval system helps in solving many real-time problems using sentiment analysis.
  4. We claim that this proposed invention will help in automatic driving.
  5. As claimed in claim 4, this invention will help in avoiding accidents by monitoring the mood of the driver.
AU2021103865A 2021-07-05 2021-07-05 Forensic Tool for the Semantic-Based Image Retrieval System Ceased AU2021103865A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021103865A AU2021103865A4 (en) 2021-07-05 2021-07-05 Forensic Tool for the Semantic-Based Image Retrieval System

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021103865A AU2021103865A4 (en) 2021-07-05 2021-07-05 Forensic Tool for the Semantic-Based Image Retrieval System

Publications (1)

Publication Number Publication Date
AU2021103865A4 true AU2021103865A4 (en) 2021-09-09

Family

ID=77563800

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021103865A Ceased AU2021103865A4 (en) 2021-07-05 2021-07-05 Forensic Tool for the Semantic-Based Image Retrieval System

Country Status (1)

Country Link
AU (1) AU2021103865A4 (en)

Similar Documents

Publication Publication Date Title
Takalkar et al. Image based facial micro-expression recognition using deep learning on small datasets
CN109409222B (en) Multi-view facial expression recognition method based on mobile terminal
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
Zhang et al. Multimodal learning for facial expression recognition
Ramprasath et al. Image classification using convolutional neural networks
Chen et al. Convolution neural network for automatic facial expression recognition
Anand et al. An improved local binary patterns histograms techniques for face recognition for real time application
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
KR20190128933A (en) Emotion recognition apparatus and method based on spatiotemporal attention
Bachay et al. Hybrid Deep Learning Model Based on Autoencoder and CNN for Palmprint Authentication.
CN113076905B (en) Emotion recognition method based on context interaction relation
Wan et al. A facial recognition system for matching computerized composite sketches to facial photos using human visual system algorithms
Kumar et al. Facial emotion recognition and detection using cnn
Ramesh Babu et al. A novel framework design for semantic based image retrieval as a cyber forensic tool
AU2021103865A4 (en) Forensic Tool for the Semantic-Based Image Retrieval System
Özyurt et al. A new method for classification of images using convolutional neural network based on Dwt-Svd perceptual hash function
Singh et al. Performance analysis of ELA-CNN model for image forgery detection
CN109165551B (en) Expression recognition method for adaptively weighting and fusing significance structure tensor and LBP characteristics
CN115393901A (en) Cross-modal pedestrian re-identification method and computer readable storage medium
CN114187632A (en) Facial expression recognition method and device based on graph convolution neural network
Yavuz et al. Automatic lipreading with principle component analysis
Moran Classifying emotion using convolutional neural networks
CN116958615A (en) Picture identification method, device, equipment and medium
CN114663910A (en) Multi-mode learning state analysis system

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry