AU2021107299A4 - A system for deep neural network based handwritten digit classification for low resource bengali script - Google Patents

A system for deep neural network based handwritten digit classification for low resource Bengali script

Info

Publication number
AU2021107299A4
AU2021107299A4 AU2021107299A AU2021107299A AU2021107299A4 AU 2021107299 A4 AU2021107299 A4 AU 2021107299A4 AU 2021107299 A AU2021107299 A AU 2021107299A AU 2021107299 A AU2021107299 A AU 2021107299A AU 2021107299 A4 AU2021107299 A4 AU 2021107299A4
Authority
AU
Australia
Prior art keywords
image
filter
images
parameters
convolutional layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021107299A
Inventor
Amitava Choudhury
Abhijit Kumar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to AU2021107299A priority Critical patent/AU2021107299A4/en
Application granted granted Critical
Publication of AU2021107299A4 publication Critical patent/AU2021107299A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • G06V30/36Matching; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • G06V30/333Preprocessing; Feature extraction

Abstract

The system comprises an acquisition module for receiving images of handwritten Bangla numbers; a pre-processing module for applying random resizing, cropping, and random vertical and horizontal flipping to the received images, wherein the images are then normalized to have zero mean and unit variance, and wherein interpolation is employed for predicting new pixel values of the resultant image; a convolutional layer equipped with a filter for scanning a particular region of the image to produce a feature map, wherein the parameters in the filter are learnable and are shared across the convolutional layer, that is, the filter has the same weights throughout the layer, which reduces the number of parameters to optimize while making the convergence faster; and an activation function operating on the weighted sum of the inputs along with the bias, wherein based on this weighted sum, it is decided whether a node fires or not.

Description

A SYSTEM FOR DEEP NEURAL NETWORK BASED HANDWRITTEN DIGIT CLASSIFICATION FOR LOW RESOURCE BENGALI SCRIPT
FIELD OF THE INVENTION
The present disclosure relates to a system for deep neural network based handwritten digit classification for low resource Bengali script.
BACKGROUND OF THE INVENTION
Numeral systems represent numeric digits in a well-defined and understandable manner. In India, numerous regional numeral systems are used in day-to-day life alongside the widely used Western numeral system. Handwritten text is more prevalent in India than digitized script. Handwritten character recognition is gaining a lot of attention in the academic world. A large number of languages are spoken and written in India, and each regional numeral system brings a different challenge as far as character recognition is concerned. By 2019, the projected number of internet users was 627 million out of a population of about 1.37 billion. This shows that only about 46% of the Indian population is connected to the web. The remaining part of the population has no access to the Internet and relies on handwritten methods for official and personal work. This large population generates a huge volume of handwritten data and thereby creates a need for automated systems capable of recognizing and indexing the characters. The problem becomes more acute because most Indian languages are under-resourced.
Bangla is an Indo-Aryan script used in both the Bengali and Assamese languages. It is the official language of Bangladesh and is the second most widely spoken language of India. In India, the language is in common use, especially in the states of West Bengal, Tripura, and Assam. Worldwide, it is spoken by about 260 million people and is the 6th most spoken language in the world. Bangla's cultural significance and importance is therefore undeniable.
In view of the foregoing discussion, there is a clear need for a system for deep neural network based handwritten digit classification for low resource Bengali script.
SUMMARY OF THE INVENTION
The present disclosure seeks to provide a system for deep neural network based handwritten digit classification for robust recognition of Bengali numeric digits.
In an embodiment, a system for deep neural network based handwritten digit classification for low resource Bengali script is disclosed. The system includes an acquisition module for receiving images of handwritten Bangla numbers. The system further includes a pre-processing module for applying random resizing, cropping, and random vertical and horizontal flipping to the received images, wherein the images are then normalized to have zero mean and unit variance. Interpolation is employed for predicting new pixel values of the resultant image. The system further includes a convolutional layer equipped with a filter for scanning a particular region of the image to produce a feature map, wherein the parameters in the filter are learnable and are shared across the convolutional layer, that is, the filter has the same weights throughout the layer, which reduces the number of parameters to optimize while making the convergence faster. The system further includes an activation function operating on the weighted sum of the inputs along with the bias, wherein based on this weighted sum, it is decided whether a node fires or not.
In an embodiment, a pre-processing technique that randomly varies the properties of the original image is used, which allows the model to predict characters in situations with low brightness, differing colors of the background/paper and the ink used, and differing saturation of the image.
In an embodiment, brightness is chosen from a uniform distribution lying between 1 and 3, contrast and saturation are chosen from a uniform distribution between 1 and 3, and the hue of the image is allowed to vary over the range -0.1 to 0.5.
In an embodiment, interpolation is achieved by predicting the value of a given pixel by looking at the pixels neighboring the pixel in question and using the values of the neighboring pixels to predict value of the current pixel.
In an embodiment, bicubic interpolation is employed to provide a sharper image with reduced interpolation artifacts like Aliasing, Blurring, etc.
In an embodiment, element-wise multiplication is performed between the pixels of the image within the receptive field of the filter, and the weights of the filter.
In an embodiment, the Fast Fourier transform turns the convolution operation into element-wise multiplication, reducing computation.
In an embodiment, the first convolution layers extract simple features such as edges, and subsequent convolution layers learn more intricate and abstract patterns from the image.
In an embodiment, the activation functions are Rectified Linear Units, pooling, loss function, gradient descent, and regularization.
An object of the present disclosure is to develop a recognition system based on transfer learning for robust recognition of Bengali numeric digits.
To further clarify advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which is illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a block diagram of a system for deep neural network based handwritten digit classification for low resource Bengali script in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a working block diagram of ResNet-18 in accordance with an embodiment of the present disclosure; and
Figure 3 illustrates a graphical representation of training and validation accuracy and training and validation loss in accordance with an embodiment of the present disclosure.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not have necessarily been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
DETAILED DESCRIPTION
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components proceeded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a block diagram of a system for deep neural network based handwritten digit classification for low resource Bengali script is illustrated in accordance with an embodiment of the present disclosure. The system 100 includes an acquisition module 102 for receiving images of handwritten Bangla numbers.
In an embodiment, a pre-processing module 104 is configured with the acquisition module 102 for applying random resizing, cropping, and random vertical and horizontal flipping to the received images, wherein the images are then normalized to have zero mean and unit variance. Interpolation is employed for predicting new pixel values of the resultant image.
In an embodiment, a convolutional Layer 106 is equipped with a filter for scanning a particular region of the image to produce a feature map, wherein the parameters in the filter are learnable and are shared across the convolutional Layer 106, that is, the filter has the same weights throughout the convolutional Layer 106, which reduces the number of parameters to optimize while making the convergence faster.
In an embodiment, an activation function 108 is engaged with the convolutional Layer 106 of a convolutional neural network and operates on the weighted sum of the inputs along with the bias, wherein based on this weighted sum, it is decided whether a node fires or not.
In an embodiment, a pre-processing technique that randomly varies the properties of the original image is used, which allows the model to predict characters in situations with low brightness, differing colors of the background/paper and the ink used, and differing saturation of the image.
In an embodiment, brightness is chosen from a uniform distribution lying between 1 and 3, contrast and saturation are chosen from a uniform distribution between 1 and 3, and the hue of the image is allowed to vary over the range -0.1 to 0.5.
In an embodiment, interpolation is achieved by predicting the value of a given pixel by looking at the pixels neighboring the pixel in question and using the values of the neighboring pixels to predict value of the current pixel.
In an embodiment, bicubic interpolation is employed to provide a sharper image with reduced interpolation artifacts like Aliasing, Blurring, etc.
In an embodiment, element-wise multiplication is performed between the pixels of the image within the receptive field of the filter, and the weights of the filter.
In an embodiment, the Fast Fourier transform turns the convolution operation into element-wise multiplication, reducing computation.
In an embodiment, the first convolution layers extract simple features such as edges, and subsequent convolution layers learn more intricate and abstract patterns from the image.
In an embodiment, the activation functions are Rectified Linear Units, pooling, loss function, gradient descent, and regularization.
The dataset used in this study consists of 6000 images of handwritten Bangla numbers. The images are all 32x32 pixels, and each image is a unique handwritten variant of the required Bangla numeral. The dataset has been divided into training, validation and testing subsets to aid the training process of the neural network. The training set is used by the model to learn the features of an image and the corresponding classifications in a supervised learning paradigm. This split contains ten classes, one for each digit of the Bangla numeral system. Each class contains 420 unique handwritten characters for that numeral. The second split is the validation split, which is used to tune hyperparameters like the learning rate, number of epochs, and so forth. This split also contains ten classes, each having 162 images. Finally, the test split has been created to check the performance of the model. This subset has ten classes with 10 images in each class.
Pre-processing is an important aspect of any neural network-based classifier. The pre-processing techniques applied herein are discussed below:
Random resize, crop, and flip
Data augmentation is an important means of increasing the size of a dataset. It also proves useful in providing more vantage points for the model to learn from, which helps the framework generalize. It has been shown that small distortions that are visibly hard to distinguish can lead to incorrect or no predictions. For all these reasons, random resizing, cropping, and random vertical and horizontal flipping have been performed on the dataset. The augmentation randomly crops the original image. The crop size has been set to about 0.08 to 1.0 times the original image. A random aspect ratio has also been chosen, between 3/4 and 4/3 of that of the original image. These settings have been verified to provide good results in image recognition tasks using CNNs. After these steps the image is resized back to the image size required by the CNN.
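The crop-and-flip step described above can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in the study: it assumes the 0.08 to 1.0 range refers to the fraction of the image area, samples the aspect ratio log-uniformly, and uses hypothetical function names (`sample_crop_box`, `crop_and_flip`). Resizing the crop back to the network input size is left to the interpolation step.

```python
import math
import random


def sample_crop_box(height, width, rng, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), attempts=10):
    """Sample (top, left, crop_h, crop_w) for a random resized crop."""
    area = height * width
    for _ in range(attempts):
        # Assumption: the scale range is a fraction of the image area.
        target_area = area * rng.uniform(*scale)
        # Sample the aspect ratio log-uniformly between 3/4 and 4/3.
        aspect = math.exp(rng.uniform(math.log(ratio[0]), math.log(ratio[1])))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: use the whole image if no valid box was found.
    return 0, 0, height, width


def crop_and_flip(image, box, flip_h, flip_v):
    """Crop a list-of-rows image, then optionally flip it horizontally/vertically."""
    top, left, crop_h, crop_w = box
    rows = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    if flip_h:
        rows = [row[::-1] for row in rows]
    if flip_v:
        rows = rows[::-1]
    return rows
```

In use, a box would be drawn per image with `sample_crop_box(32, 32, random.Random())` and the two flip flags would each be drawn with some probability, for example 0.5 (an assumed value; the description does not state it).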
Random changes in brightness, saturation, hue, and contrast
In order to create a framework capable of recognizing Bengali handwritten characters in a number of situations, a pre-processing technique that randomly varies the properties of the original image is used. This augmentation allows the model to predict characters in situations with low brightness, differing colors of the background/paper and the ink used, and differing saturation of the image. The brightness is chosen from a uniform distribution lying between 1 and 3. Similarly, contrast and saturation are chosen from a uniform distribution between 1 and 3. The hue of the image is allowed to vary over the range -0.1 to 0.5. This process makes the proposed model more robust.
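A minimal sketch of how such jitter factors might be drawn and applied is given below. The ranges are the ones quoted above; the function names, the [0, 1] pixel scale, and the particular brightness and contrast formulas are assumptions made for illustration only.

```python
import random


def sample_jitter(rng):
    """Draw one set of colour-jitter factors from the ranges quoted above."""
    return {
        "brightness": rng.uniform(1.0, 3.0),
        "contrast": rng.uniform(1.0, 3.0),
        "saturation": rng.uniform(1.0, 3.0),
        "hue": rng.uniform(-0.1, 0.5),
    }


def adjust_brightness(pixels, factor):
    """Scale pixel intensities in [0, 1] by `factor`, clipping at 1.0."""
    return [[min(1.0, value * factor) for value in row] for row in pixels]


def adjust_contrast(pixels, factor):
    """Move each pixel away from the image mean by `factor`, clipping to [0, 1]."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [
        [min(1.0, max(0.0, mean + factor * (value - mean))) for value in row]
        for row in pixels
    ]
```

Saturation and hue changes need a colour-space conversion and are omitted here to keep the sketch short.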
Normalize
Normalization is the process of centering an image. In an image, the ranges of the features in different color channels can be vastly different. This could cause some features to dominate over others based only on numerical magnitude rather than the importance of the feature. Normalization converts the images to have zero mean and unit variance. Normalization has been shown to increase the accuracy of neural network-based classifiers like CNNs. In this study, the mean and standard deviation are calculated over all images for each color channel. Then each channel's value is modified by using the following formula:
$$\widetilde{\text{input}}_{ch} = \frac{\text{input}_{ch} - \text{mean}_{ch}}{\text{stddev}_{ch}}$$
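The per-channel normalization can be sketched as follows. The nested-list image layout (image, channel, row, value), the function names, and the small `eps` guard against division by zero are assumptions for illustration.

```python
import math


def channel_stats(images):
    """Return per-channel (mean, stddev) over a list of images.

    Each image is a list of channels; each channel is a list of rows.
    """
    stats = []
    for ch in range(len(images[0])):
        values = [v for img in images for row in img[ch] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        stats.append((mean, math.sqrt(var)))
    return stats


def normalize(image, stats, eps=1e-8):
    """Apply (input - mean) / stddev channel by channel."""
    return [
        [[(v - mean) / (std + eps) for v in row] for row in channel]
        for channel, (mean, std) in zip(image, stats)
    ]
```

After this step the pooled values of every channel have approximately zero mean and unit variance over the dataset on which the statistics were computed.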
Interpolation
Interpolation is the technique by which new data points are predicted when the range between which the new data point is to be predicted is known. When cropping and resizing images, interpolation aids in the prediction of new pixel values of the resultant image. This is achieved by predicting the value of a given pixel by looking at the pixels neighboring the pixel in question and using the values of the neighboring pixels to predict value of the current pixel. In this study, Bicubic interpolation has been used. This technique provides a sharper image with reduced interpolation artifacts like Aliasing, Blurring, etc. It produces sharper images when compared to Linear and Bilinear interpolation. Instead of considering 4 pixels in the nearest 2x2 matrix of neighboring pixels like in Bilinear interpolation, Bicubic interpolation uses 16 pixels in the nearest 4x4 matrix of neighboring pixels. This increases the number of calculations but provides smoother images.
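The 4x4-neighbourhood computation can be illustrated with the sketch below. It assumes the Keys cubic convolution kernel with a = -0.5 and replicated border pixels; the description does not say which bicubic variant was used, so both choices, and the function names, are assumptions.

```python
import math


def cubic_weight(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0


def bicubic_sample(image, y, x):
    """Interpolate a grayscale list-of-rows image at real-valued (y, x)."""
    height, width = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    # Visit the 4x4 neighbourhood around the sample point.
    for dy in range(-1, 3):
        for dx in range(-1, 3):
            # Clamp indices so that border pixels are replicated.
            row = min(max(y0 + dy, 0), height - 1)
            col = min(max(x0 + dx, 0), width - 1)
            weight = cubic_weight(y - (y0 + dy)) * cubic_weight(x - (x0 + dx))
            total += weight * image[row][col]
    return total


def resize_bicubic(image, new_h, new_w):
    """Resize by sampling the source at the centre of every target pixel."""
    height, width = len(image), len(image[0])
    return [
        [
            bicubic_sample(image, (i + 0.5) * height / new_h - 0.5, (j + 0.5) * width / new_w - 0.5)
            for j in range(new_w)
        ]
        for i in range(new_h)
    ]
```

Each output pixel is a weighted sum of 16 source pixels, which is where the extra cost relative to the 4 pixels of bilinear interpolation comes from.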
Convolutional Neural Networks
Convolutional Neural Networks have proven to be quite robust in their feature extraction. This makes them useful in image recognition, natural language processing, etc.
Convolution
The convolutional Layer 106 is the first layer in a Convolutional Neural Network, and it takes the handwritten image as input. Receiving an entire image with all of its pixel values as input, as done in the fully connected layers of an Artificial Neural Network, not only drastically increases the computation overhead but also introduces irrelevant features into the learning. This negatively impacts the model's performance and generalization. Convolutional layers instead use a filter to scan a particular region of the image. This region is known as the receptive field. The parameters in the filter are learnable, and these parameters are shared across the convolutional Layer 106; this means that within a convolutional Layer 106 the filter has the same weights at every position. This reduces the number of parameters to optimize while making the convergence faster. Element-wise multiplication is performed between the pixels of the image within the receptive field of the filter and the weights of the filter. A feature map is produced as the output of the convolutional Layer 106. The Fast Fourier transform turns the convolution operation into element-wise multiplication, reducing computation. The formula used for the convolution operation is
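$$\text{featuremap} = \text{input} * \text{kernel} = \mathcal{F}^{-1}\left(\sqrt{2\pi}\,\mathcal{F}[\text{input}]\,\mathcal{F}[\text{kernel}]\right) \tag{2}$$
In equation (2), the convolution operation is denoted by $*$. $\mathcal{F}$ is the Fourier transform, whereas $\mathcal{F}^{-1}$ is the inverse Fourier transform. $\sqrt{2\pi}$ is the normalization constant.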
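The equivalence stated in equation (2) can be checked numerically with a short sketch. For brevity it uses a naive one-dimensional discrete Fourier transform rather than a fast transform, and circular (wrap-around) convolution with the kernel zero-padded to the signal length; these simplifications and the function names are assumptions. With the discrete transform convention used here, the normalization constant of equation (2) is replaced by the 1/n factor in the inverse transform.

```python
import cmath


def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (unnormalised forward, 1/n inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def circular_convolve(signal, kernel):
    """Direct circular convolution: sum over t of signal[t] * kernel[(i - t) mod n]."""
    n = len(signal)
    return [sum(signal[t] * kernel[(i - t) % n] for t in range(n)) for i in range(n)]


def fourier_convolve(signal, kernel):
    """Same result via element-wise multiplication in the frequency domain."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [value.real for value in dft(product, inverse=True)]
```

The saving in computation only materialises when the transform itself is fast (O(n log n)); the quadratic transform above is for checking the identity, not for speed.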
Convolution layers extract features from the image. The first layers learn simple features such as edges, and subsequent convolution layers learn more intricate and abstract patterns from the image.
The size of the output produced as a result of the convolution operation is given by the formula (for each dimension)
$$\text{dimension}_{\text{out}} = \left\lfloor \frac{\text{dimension} + 2p - k}{s} \right\rfloor + 1 \tag{3}$$
In equation (3), dimension is the length of the image along the dimension in question (height or width). p is the padding applied to the image. Padding is the process of adding zeroes along the height and width of the image. Without padding, the kernel lands on the corners much less frequently than on the pixels in the center, which skews the learning of the network. Furthermore, without padding the feature map shrinks after each convolution operation, which would hinder the stacking of layers. For all of these reasons padding is performed on the images. k is the kernel size. s is the stride length, that is, the distance between successive kernel positions.
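The following sketch ties equation (3) to the sliding-filter operation itself. It assumes a single-channel image, one square kernel, and zero padding, and it omits the kernel flip, as is conventional for convolutional layers; the function names are hypothetical.

```python
def conv_output_size(dimension, kernel, padding=0, stride=1):
    """Equation (3): floor((dimension + 2p - k) / s) + 1."""
    return (dimension + 2 * padding - kernel) // stride + 1


def conv2d(image, kernel, padding=0, stride=1):
    """Slide one shared kernel over a zero-padded grayscale image."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    # Zero-pad the image on all four sides.
    padded = [[0.0] * (width + 2 * padding) for _ in range(height + 2 * padding)]
    for r in range(height):
        for c in range(width):
            padded[r + padding][c + padding] = image[r][c]
    out_h = conv_output_size(height, k, padding, stride)
    out_w = conv_output_size(width, k, padding, stride)
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the receptive field and the filter weights, summed.
            row.append(sum(
                padded[i * stride + u][j * stride + v] * kernel[u][v]
                for u in range(k) for v in range(k)
            ))
        feature_map.append(row)
    return feature_map
```

For a 32x32 input, a 3x3 kernel with p = 1 and s = 1 keeps the size at 32, while the same kernel with p = 0 gives 30, which illustrates why padding is needed to stack many layers.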
Activation function
Activation functions 108 are applied to the weighted sum of the inputs along with the bias. Based on this weighted sum, it is decided whether a node fires or not. Activation functions 108 can be linear or non-linear. Non-linear activation functions 108 are used to allow for more complex learning by the network. Some activation functions 108 used in the network are:
Rectified Linear Units (ReLU)
Rectified Linear Units activation function clamps the negative value at 0. Basically, for values of x that are less than 0, the output becomes zero. For values of x greater than 0, a linear function is produced as an output. The function used for the implementation of ReLU is computationally cheaper than Activation functions 108 like tanh and sigmoid, which involve expensive operations like exponentiation. The formula that lies behind rectified linear units is:
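$$h^{(i)} = \max\left(w^{(i)T}x,\, 0\right) = \begin{cases} w^{(i)T}x, & \text{if } w^{(i)T}x > 0 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
In equation (4), $h^{(i)}$ gives the activation of a hidden layer, $w^{(i)}$ is the weight matrix of the hidden layer, and $x$ is the input.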
ReLU faces an issue: for negative inputs the output is zero, and so is the gradient, so optimization techniques will not update that neuron. In other words, the gradient is propagated backwards only through units whose output was positive during the forward pass. To combat these issues with ReLU, leaky ReLUs have been proposed.
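The difference between ReLU and leaky ReLU can be made concrete with the short sketch below. The negative slope of 0.01 and the function names are assumed for illustration; the description does not state a value.

```python
def pre_activation(weights, inputs, bias=0.0):
    """Weighted sum of the inputs plus the bias, i.e. w^T x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def relu(z):
    """max(z, 0): pass positive pre-activations, clamp the rest to zero."""
    return max(z, 0.0)


def relu_grad(z):
    """Derivative of ReLU; zero for z <= 0, so such a unit receives no update."""
    return 1.0 if z > 0 else 0.0


def leaky_relu(z, slope=0.01):
    """Leaky ReLU keeps a small slope for negative inputs (slope is an assumed value)."""
    return z if z > 0 else slope * z


def leaky_relu_grad(z, slope=0.01):
    """Derivative of leaky ReLU; never exactly zero, so the unit can still learn."""
    return 1.0 if z > 0 else slope
```

Only comparisons and multiplications are involved, which is why these functions are cheaper to evaluate than tanh or sigmoid.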
Pooling
Pooling layers perform down sampling and help in dimensionality reduction which aids in achieving translational invariance. This layer also helps in avoiding over fitting by making the network's learning more general. Like the convolution operation, pooling too has hyperparameters like filter size, stride, and padding. There are two types of pooling operations, Max pooling and Global Average pooling. In Max pooling a filter is applied on the feature map. The filter is then moved all over the feature map with the value specified by the stride.
$$a_j = \max_{N \times N}\left(a_i^{n \times n}\, u(n, n)\right) \tag{7}$$
Equation (7) specifies the max pooling operation; it finds the maximum value encountered by the filter. Here, u(n,n) is the filter applied on the feature map.
The output dimensions are given by:
$$\text{dimension}_{\text{out}} = \left\lfloor \frac{\text{dimension} - k}{s} \right\rfloor + 1 \tag{8}$$
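A minimal sketch of max pooling without padding, together with the output-size formula of equation (8), is shown below. The 2x2 window with stride 2 is an assumed, commonly used setting, and the function names are hypothetical.

```python
def pool_output_size(dimension, kernel, stride):
    """Equation (8): floor((dimension - k) / s) + 1."""
    return (dimension - kernel) // stride + 1


def max_pool2d(feature_map, kernel=2, stride=2):
    """Keep the maximum value inside each window of the feature map."""
    out_h = pool_output_size(len(feature_map), kernel, stride)
    out_w = pool_output_size(len(feature_map[0]), kernel, stride)
    return [
        [
            max(
                feature_map[i * stride + u][j * stride + v]
                for u in range(kernel) for v in range(kernel)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]
```

With these settings a 32x32 feature map is reduced to 16x16, halving each spatial dimension.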
Loss Function
For a network to learn, it is important to first evaluate how distant from the actual value the predictions are. To do this in a quantitative manner, loss functions are used. Easily differentiable functions are chosen as loss functions to ease the task of back propagation. In this method, Cross-Entropy loss has been used as the loss function.
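$$\text{Loss} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^{T} x_i + b_j}} \tag{9}$$
In equation (9), $W$ is the weight matrix, $b$ is the bias, $x_i$ is the $i$-th training sample, $y_i$ is the class of the $i$-th training sample, $N$ is the total number of samples, $n$ is the number of classes, and $W_j$ and $W_{y_i}$ are the $j$-th and $y_i$-th columns of the weight matrix.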
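The loss of equation (9) can be sketched as follows. The code stores the weights as one row per class rather than one column per class, and subtracts the largest score before exponentiating for numerical stability; both are implementation choices assumed here, as are the function names.

```python
import math


def linear_logits(weights, bias, x):
    """Class scores W_j^T x + b_j, with weights stored as one row per class."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits_batch, labels):
    """Mean negative log-probability of the true class under softmax."""
    total = 0.0
    for logits, label in zip(logits_batch, labels):
        # Subtract the maximum logit for numerical stability (log-sum-exp trick).
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_norm - logits[label]
    return total / len(labels)
```

For ten classes with identical scores the loss equals log(10), which is a useful sanity check for an untrained ten-digit classifier.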
Gradient Descent
Gradient Descent is performed on the learnable parameters of the network. In this operation, the parameters $P$ are varied by a small change $\delta P \ll P$. The small variation is chosen in such a manner that the loss of the network reduces. In this method, Stochastic Gradient Descent (SGD) has been used. In SGD, the parameters are updated for each training example, which reduces redundant computation and increases the speed of learning.
$$P = P - \eta \cdot \nabla_P J\left(P;\, x^{(i)};\, y^{(i)}\right) \tag{10}$$
In equation (10), $P$ are the parameters, $\eta$ is the learning rate, $J(P; x^{(i)}; y^{(i)})$ is the loss function, $x^{(i)}$ is the $i$-th training example, $y^{(i)}$ is the label of the $i$-th training example, and $\nabla_P J$ is the gradient of the loss function with respect to the parameters.
SGD faces difficulty in finding the local minima of an error space characterized by difference in "steepness" across different dimensions. In such scenarios, SGD makes slower progress towards the minima and tends to oscillate. Momentum diminishes this oscillation and increases the speed of SGD in the required direction:
$$v_t = \gamma v_{t-1} + \eta \nabla_P J(P) \tag{11}$$
$$P = P - v_t$$
In equation (11), $\gamma$ is the momentum term, and in this method it has been set to 0.9. The learning rate has been set to 0.001 in this study. The learning rate is made to decay by a factor of 0.1 after every 7 epochs. Decaying the learning rate leads to faster convergence to the local minima and higher accuracy.
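The update rule and the learning-rate schedule can be sketched on a toy problem. The momentum of 0.9, the initial learning rate of 0.001, and the decay by 0.1 every 7 epochs are the values quoted above; the quadratic loss, the number of epochs and steps, and the function names are assumptions chosen only to show the mechanics, and the toy gradient is exact rather than computed per training example.

```python
def step_decay(base_lr, epoch, step_size=7, factor=0.1):
    """Learning rate after multiplying by `factor` every `step_size` epochs."""
    return base_lr * factor ** (epoch // step_size)


def momentum_step(params, velocity, grads, lr, gamma=0.9):
    """One update of equation (11): v = gamma * v + lr * grad, then P = P - v."""
    new_velocity = [gamma * v + lr * g for v, g in zip(velocity, grads)]
    new_params = [p - v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def minimise_quadratic(epochs=21, steps_per_epoch=50):
    """Toy run on J(P) = 0.5 * (10 * p0^2 + p1^2), a loss with uneven steepness."""
    params, velocity = [1.0, 1.0], [0.0, 0.0]
    for epoch in range(epochs):
        lr = step_decay(0.001, epoch)
        for _ in range(steps_per_epoch):
            grads = [10.0 * params[0], params[1]]
            params, velocity = momentum_step(params, velocity, grads, lr)
    return params
```

The velocity term accumulates gradients that keep pointing the same way, so progress along the shallow direction (p1 here) is faster than with plain gradient descent at the same learning rate.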
Regularization
A major problem faced while training CNNs is over fitting. Over fitting leads to good performance on the training set but extremely poor performance on the validation set. The network, in this state, learns the training data too well and loses all capability to generalize. To combat this problem regularization techniques like L2, L1, Dropout, etc. are used. In this study Dropout has been used for regularization.
In Dropout, co-adaptations are reduced by randomly dropping some connections in the network during training. Because of this, the availability of any particular hidden neuron is not guaranteed, so the network cannot rely on it.
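A minimal sketch of dropout applied to a vector of activations is given below. It uses the "inverted" formulation, in which the surviving activations are rescaled during training so that nothing needs to change at inference time; this formulation, the drop probability passed in, and the function name are assumptions, since the description does not give these details.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob, rescale the rest."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]
```

A different random subset of units is silenced on every call during training, while at inference (`training=False`) all units are kept unchanged.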
Pre trained Networks
ImageNet
ImageNet is the top performer of the ILSVRC 2010. It consists of eight layers in total, five of which are convolutional and three of which are fully connected. Finally, the softmax function is used to output the class scores. The activation function used is the Rectified Linear Unit (ReLU). To prevent over fitting, data augmentation techniques and Dropout are used. The number of parameters is about 60 million. Owing to its smaller size and smaller number of parameters, it is easier to train in comparison to VGGNet. This light weight comes at the cost of accuracy.
Figure 2 illustrates a working block diagram of ResNet-18 in accordance with an embodiment of the present disclosure. ResNet seeks to solve the problem of loss in accuracy as the network becomes deeper. This problem of vanishing gradients and degradation of accuracy is dealt with by means of skip or shortcut connections in the ResNet model. The figure gives a diagrammatic representation of the residual block. Instead of approximating a function directly, the layers try to approximate a residual function. Formally, if F(x) is the function that the layers are trying to approximate, and x is the input, the residual function is denoted by R(x) = F(x) - x, and the original function to approximate now becomes R(x) + x.
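The shortcut connection can be sketched in a few lines. Here `residual_fn` stands in for the stacked layers that compute R(x); the two example branches and all names are hypothetical and serve only to show that the block outputs R(x) + x.

```python
def residual_block(x, residual_fn):
    """Return R(x) + x, where residual_fn plays the role of the stacked layers."""
    return [r + xi for r, xi in zip(residual_fn(x), x)]


def zero_residual(x):
    """A residual branch whose weights are all zero, so R(x) = 0."""
    return [0.0 for _ in x]


def small_residual(x):
    """Hypothetical learned correction standing in for the convolutional layers."""
    return [0.1 * xi for xi in x]
```

When the residual branch outputs zero the block reduces to the identity mapping, so adding more blocks need not make the network worse, and the shortcut gives gradients a direct path to earlier layers.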
Figure 3 illustrates a graphical representation of training and validation accuracy and training and validation loss in accordance with an embodiment of the present disclosure. The 18-layer variant of the residual network, ResNet-18, has been used. It contains eighteen layers, seventeen of which are convolutional layers, followed by one fully connected layer which produces the final output. A Batch Normalization layer is present after each convolutional layer. Batch Normalization is used for the normalization of the inputs inside the network. Every mini batch is normalized to a unit standard deviation and a mean of zero. The images are resized to 24 x 24 pixels. Data augmentation techniques like random cropping and random changes in brightness and saturation, along with several affine transformations, are applied as discussed in the earlier sections. Two approaches are used for the recognition task. In one approach the pre-trained ResNet-18 model is fine-tuned on our dataset. In this approach all weights are updatable, and it is the architecture of ResNet-18 that is used. The final fully connected layer is replaced to match our dataset. A decaying learning rate is used to improve the performance. Regularization techniques like Dropout are also used. This approach yielded an accuracy of 96%. The accuracy for each character is shown in Figure 3. The model performed exceptionally well with the digits , 2, 3, 4, 5, 6, and 8, producing an accuracy of 100%. The model did not perform well with the digits 9 and 1, perhaps due to greater ambiguity in their structure.
In the second approach, the ResNet-18 model is used as a feature extractor, and hence the weights of the underlying network are not allowed to change. Only the final fully connected layer is fine-tuned on the dataset. The same data augmentation techniques are applied as in the previous approach. A decaying learning rate is also used. This approach yielded an accuracy of 60%, which is considerably worse than the fine-tuned approach.
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims (9)

WE CLAIM:
1. A system for deep neural network based handwritten digit classification for low resource Bengali script, the system comprises:
an acquisition module for receiving images of handwritten Bangla numbers;
a pre-processing module for applying random resizing, cropping, and random vertical and horizontal flipping to the received images, wherein the images are thereby normalized to have zero mean and unit variance, and wherein interpolation is employed for predicting new pixel values of the resultant image;
a convolutional layer equipped with a filter for scanning a particular region of the image to produce a feature map, wherein parameters in the filter are learnable and are shared across the convolutional layer, such that the filter has the same weights throughout the layer, which reduces the number of parameters to optimize while making the convergence faster; and
an activation function for computing a weighted sum of the inputs along with the bias, wherein based on this weighted sum, it is decided whether a node fires or not.
2. The system as claimed in claim 1, wherein a pre-processing technique that randomly varies the properties of the original image is used, which allows characters to be predicted in situations with low brightness, differing colors of the background/paper and the ink used, and differing saturation of the image.
3. The system as claimed in claim 1, wherein brightness is chosen from a uniform distribution lying between 1 and 3, contrast and saturation are chosen from a uniform distribution lying between 1 and 3, and hue of the image is allowed to be changed from the distribution -0.1 to 0.5.
4. The system as claimed in claim 1, wherein interpolation is achieved by predicting the value of a given pixel by looking at the pixels neighboring the pixel in question and using the values of the neighboring pixels to predict value of the current pixel.
5. The system as claimed in claim 1, wherein bicubic interpolation is employed to provide a sharper image with reduced interpolation artifacts such as aliasing, blurring, etc.
6. The system as claimed in claim 1, wherein element-wise multiplication is performed between the pixels of the image within the receptive field of the filter, and the weights of the filter.
7. The system as claimed in claim 1, wherein a Fast Fourier transform turns the convolution operation into element-wise multiplication, reducing computation.
8. The system as claimed in claim 1, wherein convolution layers extract simple features like edges, etc., and subsequent convolution layers learn more intricate and abstract patterns from the image.
9. The system as claimed in claim 1, wherein the activation functions are Rectified Linear Units, pooling, loss function, gradient descent, and regularization.
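By way of illustration only, and not as part of the claims, the convolutional layer and activation function recited above can be sketched in plain Python. A single filter with shared weights slides over the image; at each position the pixels in the receptive field are multiplied element-wise by the filter weights and summed with a bias, and a Rectified Linear Unit then decides whether each node fires. The function names, the toy image, and the edge-detecting filter are assumptions chosen for the example.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared filter over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the receptive field by the filter weights,
            # then sum the products together with the bias.
            s = bias
            for a in range(kh):
                for b in range(kw):
                    s += image[i + a][j + b] * kernel[a][b]
            row.append(s)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Rectified Linear Unit: a node fires only if its weighted sum is positive."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy image with a dark-to-bright vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_filter = [[-1, 1],
               [-1, 1]]  # responds to dark-to-bright transitions
activated = relu(conv2d_valid(image, edge_filter))
```

Because the same four filter weights are reused at every position, the number of learnable parameters does not grow with the image size. The feature map here responds only at the column where the edge lies.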
Figure 1: Acquisition Module (102), Pre-Processing Module (104), Convolutional Layer (106), Activation Function (108)
Figure 2
Figure 3
AU2021107299A 2021-08-25 2021-08-25 A system for deep neural network based handwritten digit classification for low resource bengali script Ceased AU2021107299A4 (en)

Publications (1)

Publication Number Publication Date
AU2021107299A4 true AU2021107299A4 (en) 2022-01-06

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry