AU2021107299A4 - A system for deep neural network based handwritten digit classification for low resource bengali script
- Publication number
- AU2021107299A4
- Authority
- AU
- Australia
- Prior art keywords
- image
- filter
- images
- parameters
- convolutional layer
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/36—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/333—Preprocessing; Feature extraction
Abstract
The system comprises an acquisition module for receiving images of handwritten Bangla numbers; a pre-processing module that applies random resizing, cropping, and random vertical and horizontal flipping to the received images, after which the images are normalized to have zero mean and unit variance, wherein interpolation is employed for predicting new pixel values of the resultant image; a convolutional layer equipped with a filter for scanning a particular region of the image to produce a feature map, wherein the parameters in the filter are learnable and are shared across the convolutional layer, so that the filter has the same weights at every position, which reduces the number of parameters to optimize while making convergence faster; and an activation function applied to the weighted sum of the inputs plus the bias, wherein based on this weighted sum it is decided whether a node fires.
Description
The present disclosure relates to a system for deep neural network based handwritten digit classification for low resource Bengali script.
Numeral systems represent numeric digits in a well-defined and understandable manner. In India, numerous regional numeral systems are used in day-to-day life alongside the widely used Western numeral system. Handwriting is more prevalent in India than digitized script. Handwritten character recognition is gaining considerable attention in the academic world. A large number of languages are spoken and written in India, and each regional numeral system brings its own challenges as far as character recognition is concerned. By 2019, the projected number of internet users in India was 627 million out of a population of about 1.37 billion, meaning that only about 46% of the Indian population is connected to the web. The rest of the population has no access to the Internet and relies on handwritten methods for official and personal work. This large population generates a huge volume of handwritten data and thereby creates a need for automated systems capable of recognizing and indexing the characters. The problem is compounded by the fact that most Indian languages are under-resourced.
The Bangla script is used to write both the Bengali and Assamese languages. Bengali, an Indo-Aryan language, is the official language of Bangladesh and the second most widely spoken language of India, where it is in common use especially in the states of West Bengal, Tripura, and Assam. Worldwide, it is spoken by about 260 million people and is the 6th most spoken language in the world, which underlines its cultural significance.
In view of the foregoing discussion, there is a need for a system for deep neural network based handwritten digit classification for low resource Bengali script.
The present disclosure seeks to provide a system for deep neural network based handwritten digit classification for robust recognition of Bengali numeric digits.
In an embodiment, a system for deep neural network based handwritten digit classification for low resource Bengali script is disclosed. The system includes an acquisition module for receiving images of handwritten Bangla numbers. The system further includes a pre-processing module that applies random resizing, cropping, and random vertical and horizontal flipping to the received images, after which the images are normalized to have zero mean and unit variance. Interpolation is employed for predicting new pixel values of the resultant image. The system further includes a convolutional layer equipped with a filter for scanning a particular region of the image to produce a feature map, wherein the parameters in the filter are learnable and are shared across the convolutional layer, so that the filter has the same weights at every position, which reduces the number of parameters to optimize while making convergence faster. The system further includes an activation function applied to the weighted sum of the inputs plus the bias, wherein based on this weighted sum it is decided whether a node fires.
In an embodiment, a pre-processing technique that randomly varies the properties of the original image is used, which allows the model to predict characters under low brightness, differing colors of the background/paper and ink, and varying image saturation.
In an embodiment, the brightness factor is chosen from a uniform distribution between 1 and 3, the contrast and saturation factors are chosen from a uniform distribution between 1 and 3, and the hue of the image is allowed to change within the range -0.1 to 0.5.
In an embodiment, interpolation predicts the value of a given pixel from the values of the pixels neighboring it.
In an embodiment, bicubic interpolation is employed to provide a sharper image with reduced interpolation artifacts such as aliasing and blurring.
In an embodiment, element-wise multiplication is performed between the pixels of the image within the receptive field of the filter and the weights of the filter.
In an embodiment, the fast Fourier transform (FFT) turns the convolution operation into element-wise multiplication in the frequency domain, reducing computation.
In an embodiment, the initial convolution layers extract simple features such as edges, and subsequent convolution layers learn more intricate and abstract patterns from the image.
In an embodiment, the network employs Rectified Linear Units (ReLU) as the activation function, together with pooling, a loss function, gradient descent, and regularization.
An object of the present disclosure is to provide a recognition system based on transfer learning for robust recognition of Bengali numeric digits.
To further clarify advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a block diagram of a system for deep neural network based handwritten digit classification for low resource Bengali script in accordance with an embodiment of the present disclosure; Figure 2 illustrates a working block diagram of ResNet-18 in accordance with an embodiment of the present disclosure; and Figure 3 illustrates a graphical representation of training and validation accuracy and training and validation loss in accordance with an embodiment of the present disclosure.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not have necessarily been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a block diagram of a system for deep neural network based handwritten digit classification for low resource Bengali script is illustrated in accordance with an embodiment of the present disclosure. The system 100 includes an acquisition module 102 for receiving images of handwritten Bangla numbers.
In an embodiment, a pre-processing module 104 is coupled to the acquisition module 102 and applies random resizing, cropping, and random vertical and horizontal flipping to the received images, after which the images are normalized to have zero mean and unit variance. Interpolation is employed for predicting new pixel values of the resultant image.
In an embodiment, a convolutional layer 106 is equipped with a filter for scanning a particular region of the image to produce a feature map. The parameters in the filter are learnable and are shared across the convolutional layer 106; that is, the filter applies the same weights at every position, which reduces the number of parameters to optimize while making convergence faster.
In an embodiment, an activation function 108 is engaged with the convolutional layer 106 of a convolutional neural network. It is applied to the weighted sum of the inputs plus the bias, and based on this weighted sum it is decided whether a node fires.
The dataset used in this study consists of 6000 images of handwritten Bangla numbers. All images are 32x32 pixels, and each image is a unique handwritten variant of a Bangla numeral. The dataset has been divided into training, validation, and test subsets. The training set is used by the model to learn the features of an image and the corresponding classifications in a supervised learning paradigm. This split contains ten classes, one for each Bangla numeral, and each class contains 420 unique handwritten characters for that numeral. The second split is the validation split, which is used to tune hyperparameters such as the learning rate and the number of epochs; it also contains ten classes, each having 162 images. Finally, a test split has been created to check the performance of the model; it has ten classes with 10 images in each class.
Pre-processing is an important aspect of any neural network-based classifier. The pre-processing techniques applied herein are discussed below:
Random resize, crop, and flip
Data augmentation is an important means of increasing the size of a dataset. It also gives the model more vantage points to learn from, which helps the framework generalize. It has been shown that small distortions that are visually hard to distinguish can lead to incorrect predictions or none at all. For these reasons, random resizing, cropping, and random vertical and horizontal flipping have been performed on the dataset. The original image is randomly cropped to an area of about 0.08 to 1.0 times that of the original image, with a random aspect ratio between 3/4 and 4/3. These settings have been verified to provide good results in image recognition tasks using CNNs. After these steps, the image is resized back to the input size required by the CNN.
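The crop-and-flip step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in the study: the helper names `random_resized_crop_box` and `random_flip` are hypothetical, the aspect ratio is sampled log-uniformly (as torchvision's `RandomResizedCrop` does with the same default ranges), the fallback simply keeps the whole image, and each flip is assumed to fire with probability 0.5.

```python
import math
import random

def random_resized_crop_box(width, height, scale=(0.08, 1.0),
                            ratio=(3 / 4, 4 / 3), attempts=10, rng=random):
    """Hypothetical helper: sample a crop box (left, top, w, h) covering a
    random fraction `scale` of the image area with a random aspect ratio."""
    area = width * height
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * rng.uniform(scale[0], scale[1])
        # Log-uniform sampling treats 3/4 and 4/3 symmetrically (assumption).
        aspect = math.exp(rng.uniform(*log_ratio))
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            left = rng.randint(0, width - w)
            top = rng.randint(0, height - h)
            return left, top, w, h
    # Fallback (assumption): keep the whole image if no valid box was found.
    return 0, 0, width, height

def random_flip(image, rng=random):
    """Hypothetical helper: flip a 2-D list of pixels horizontally and/or
    vertically, each with probability 0.5 (assumed probability)."""
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]   # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1]                    # vertical flip
    return image
```

The cropped region would then be resized back to the network's input size, for example with the bicubic interpolation discussed under Interpolation.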
Random changes in brightness, saturation, hue, and contrast
To create a framework capable of recognizing Bengali handwritten characters in a variety of conditions, a pre-processing technique that randomly varies the properties of the original image is used. This augmentation allows the model to predict characters under low brightness, differing colors of the background/paper and ink, and varying image saturation. The brightness factor is chosen from a uniform distribution between 1 and 3. Similarly, the contrast and saturation factors are chosen from a uniform distribution between 1 and 3. The hue of the image is allowed to change within the range -0.1 to 0.5. This process makes the proposed model more robust.
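A minimal sketch of this color augmentation on a list of RGB pixels, assuming channel values in [0, 1]. The sampling ranges come from the text above; everything else is an assumption made for illustration: the helper names are hypothetical, contrast and saturation are implemented as blends toward a gray level using ITU-R BT.601 luma weights (one common choice), and the hue value is treated as a fractional rotation of the HSV hue wheel.

```python
import colorsys
import random

def sample_jitter_factors(rng=random):
    """Hypothetical helper: draw factors from the ranges stated in the text."""
    return {
        "brightness": rng.uniform(1.0, 3.0),
        "contrast": rng.uniform(1.0, 3.0),
        "saturation": rng.uniform(1.0, 3.0),
        "hue": rng.uniform(-0.1, 0.5),
    }

def clamp(v):
    """Keep a channel value inside [0, 1]."""
    return max(0.0, min(1.0, v))

def gray(p):
    """Luma of an (r, g, b) pixel with BT.601 weights (assumption)."""
    return 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]

def jitter_pixels(pixels, f):
    """Apply brightness, contrast, saturation and hue changes in that order."""
    # Brightness: scale every channel by the brightness factor.
    out = [tuple(clamp(c * f["brightness"]) for c in p) for p in pixels]
    # Contrast: push each channel away from the image's mean gray level.
    mean_gray = sum(gray(p) for p in out) / len(out)
    out = [tuple(clamp(mean_gray + f["contrast"] * (c - mean_gray)) for c in p)
           for p in out]
    # Saturation: push each channel away from that pixel's own gray value.
    out = [tuple(clamp(gray(p) + f["saturation"] * (c - gray(p))) for c in p)
           for p in out]
    # Hue: rotate the hue channel in HSV space; hue wraps around at 1.0.
    def shift_hue(p):
        h, s, v = colorsys.rgb_to_hsv(*p)
        return colorsys.hsv_to_rgb((h + f["hue"]) % 1.0, s, v)
    return [shift_hue(p) for p in out]
```

With all factors at their neutral values (1, 1, 1 and a hue shift of 0) the pixels come back unchanged, which is a useful sanity check for any implementation of this step.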
Normalize
Normalization is the process of centering and scaling the pixel values of an image. In an image, the ranges of features in different color channels can be vastly different. This could cause some features to dominate others based only on numerical magnitude rather than the importance of the feature. Normalization converts the images to have zero mean and unit variance, and it has been shown to increase the accuracy of neural network-based classifiers such as CNNs. In this study, the mean and standard deviation are calculated for each color channel across all images. Then each channel's values are modified using the following formula:
$$\text{input}'_{ch} = \frac{\text{input}_{ch} - \text{mean}_{ch}}{\text{stddev}_{ch}}$$
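The per-channel formula can be illustrated with a short sketch. The data layout is an assumption chosen for clarity: each image is a list of channels and each channel is a flat list of values. The statistics use the population standard deviation, and the helper names `channel_stats` and `normalize` are hypothetical.

```python
import statistics

def channel_stats(images):
    """Per-channel mean and standard deviation over the whole dataset."""
    n_channels = len(images[0])
    means, stds = [], []
    for ch in range(n_channels):
        # Pool this channel's values from every image in the dataset.
        values = [v for img in images for v in img[ch]]
        means.append(statistics.fmean(values))
        stds.append(statistics.pstdev(values))
    return means, stds

def normalize(image, means, stds):
    """Apply (input_ch - mean_ch) / stddev_ch to every value of every channel."""
    return [[(v - m) / s for v in channel]
            for channel, m, s in zip(image, means, stds)]
```

After normalization, each channel pooled across the dataset has zero mean and unit standard deviation.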
Interpolation
Interpolation is the technique by which new data points are estimated within the range of a set of known data points. When cropping and resizing images, interpolation predicts the new pixel values of the resultant image: the value of a given pixel is estimated from the values of the pixels neighboring it. In this study, bicubic interpolation has been used. It produces sharper images with fewer interpolation artifacts, such as aliasing and blurring, than linear and bilinear interpolation. Whereas bilinear interpolation considers the 4 pixels in the nearest 2x2 neighborhood, bicubic interpolation uses the 16 pixels in the nearest 4x4 neighborhood. This increases the number of calculations but provides smoother images.
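The 4x4 neighborhood idea can be sketched as follows. The text does not say which cubic kernel is used, so this sketch assumes the widely used Keys cubic convolution kernel with a = -0.5, and it clamps indices so that border pixels reuse edge values. The helper names are hypothetical.

```python
import math

def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is an assumed, common choice."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0

def bicubic_sample(image, x, y):
    """Estimate the value at fractional position (x, y) of a 2-D list `image`
    from the 4x4 block of nearest neighbouring pixels."""
    height, width = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for m in range(-1, 3):          # four neighbouring rows
        for n in range(-1, 3):      # four neighbouring columns
            # Clamp indices so pixels near the border reuse the edge values.
            yy = min(max(y0 + m, 0), height - 1)
            xx = min(max(x0 + n, 0), width - 1)
            w = cubic_weight(x - (x0 + n)) * cubic_weight(y - (y0 + m))
            value += w * image[yy][xx]
    return value
```

At integer positions the kernel weights collapse to 1 for the pixel itself and 0 for its neighbours, so existing pixels are reproduced exactly; between pixels, all 16 neighbours contribute.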
Convolutional Neural Networks
Convolutional Neural Networks have proven to be robust feature extractors, which makes them useful in image recognition, natural language processing, and other tasks.
Convolution
The convolutional layer 106 is the first layer in a convolutional neural network, and it takes the handwritten image as input. Feeding every pixel value of an image into each neuron, as done in the fully connected layers of an artificial neural network, not only drastically increases the computation overhead but also introduces irrelevant features into the learning, which harms the model's performance and generalization. Convolutional layers instead use a filter to scan one particular region of the image at a time; this region is known as the receptive field. The parameters in the filter are learnable and are shared across the convolutional layer 106, meaning the filter applies the same weights at every position. This reduces the number of parameters to optimize while making convergence faster. Element-wise multiplication is performed between the pixels of the image within the receptive field of the filter and the weights of the filter, and the products are summed. A feature map is produced as the output of the convolutional layer 106. The fast Fourier transform turns the convolution operation into element-wise multiplication in the frequency domain, reducing computation. The formula used for the convolution operation is
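$$\text{featuremap} = \text{input} * \text{kernel} = \sqrt{2\pi}\,\mathcal{F}^{-1}\big(\mathcal{F}[\text{input}]\cdot\mathcal{F}[\text{kernel}]\big) \quad (2)$$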
In equation (2), the convolution operation is denoted by $*$, $\mathcal{F}$ is the Fourier transform, $\mathcal{F}^{-1}$ is the inverse Fourier transform, and $\sqrt{2\pi}$ is the normalization constant.
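The convolution theorem behind equation (2) can be checked numerically on a small discrete signal. This sketch makes several assumptions: it works in one dimension, uses a naive discrete Fourier transform in place of an FFT (an FFT computes the same values faster), and uses the unnormalized DFT convention, where the 1/N factor sits on the inverse transform instead of the $\sqrt{2\pi}$ constant of the continuous unitary transform. The discrete theorem yields circular convolution, so the kernel is zero-padded to the signal length. All function names are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (an FFT gives the same result)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * f / n) for k in range(n))
           for f in range(n)]
    # Unnormalized convention: the 1/N factor is applied on the inverse.
    return [v / n for v in out] if inverse else out

def circular_conv_direct(signal, kernel):
    """Circular convolution computed straight from its definition."""
    n = len(signal)
    return [sum(signal[k] * kernel[(i - k) % n] for k in range(n)) for i in range(n)]

def circular_conv_fourier(signal, kernel):
    """Same result via the convolution theorem: transform, multiply, invert."""
    product = [a * b for a, b in zip(dft(signal), dft(kernel))]
    return [v.real for v in dft(product, inverse=True)]

signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.5, 0.0, 0.0]   # kernel zero-padded to the signal length
print(circular_conv_direct(signal, kernel))   # [3.0, 2.5, 4.0, 5.5]
```

The Fourier-domain version agrees with the direct one up to floating-point rounding.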
Convolution layers extract features from the image. The initial layers learn simple features such as edges, and subsequent convolution layers learn more intricate and abstract patterns from the image.
The size of the output produced as a result of the convolution operation is given by the formula (for each dimension)
$$\text{dimension}_{out} = \left\lfloor \frac{\text{dimension} + 2p - k}{s} \right\rfloor + 1 \quad (3)$$
In equation (3), dimension is the length of the image along that dimension (height or width), and $p$ is the padding applied to the image. Padding is the process of adding zeros along the height and width of the image. Without padding, the kernel covers the corner and edge pixels much less frequently than the pixels in the center, which skews the learning of the network. Furthermore, without padding the feature map shrinks after each convolution operation, which would limit how many layers can be stacked. For these reasons, padding is applied to the images. $k$ is the kernel size, and $s$ is the stride length, the distance between successive kernel positions.
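Equation (3) translates directly into integer arithmetic, since the floor becomes integer division for non-negative values. The function name is hypothetical, and the example sizes are chosen only for illustration.

```python
def conv_output_size(dimension, kernel, padding=0, stride=1):
    """Equation (3): floor((dimension + 2p - k) / s) + 1 along one axis."""
    return (dimension + 2 * padding - kernel) // stride + 1

# A 32-pixel side with a 3x3 kernel and padding 1 keeps its size at stride 1
# and is halved at stride 2.
print(conv_output_size(32, 3, padding=1, stride=1))  # 32
print(conv_output_size(32, 3, padding=1, stride=2))  # 16
```

Setting `padding=0` gives the pooling output size of equation (8).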
Activation function
Activation functions 108 are applied to the weighted sum of the inputs plus the bias; based on this value, it is decided whether a node fires. Activation functions 108 can be linear or non-linear, and non-linear activation functions 108 allow the network to learn more complex mappings. The activation functions 108 used in the network include:
Rectified Linear Units (ReLU)
The rectified linear unit activation function clamps negative values at 0: for inputs x less than 0, the output is zero, and for x greater than 0, the output is x itself (a linear function). ReLU is computationally cheaper than activation functions 108 such as tanh and sigmoid, which involve expensive operations like exponentiation. The formula behind rectified linear units is:
$$h^{(i)} = \max\big(w^{(i)T}x,\ 0\big) = \begin{cases} w^{(i)T}x, & \text{if } w^{(i)T}x > 0 \\ 0, & \text{otherwise} \end{cases} \quad (4)$$
In equation (4), $h^{(i)}$ gives the activation of a hidden layer, $w^{(i)}$ is the hidden weight matrix of that layer, and $x$ is the input.
ReLU faces an issue where, for negative inputs, both the output and the gradient are zero, so optimization techniques will not update that neuron. In other words, during backpropagation the gradient flows through a unit only if its output in the forward pass was positive. To combat this issue, leaky ReLUs have been proposed, which give negative inputs a small non-zero slope.
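A small sketch contrasting ReLU with leaky ReLU, including their gradients, shows why a unit with negative input stops learning under plain ReLU. The negative slope of 0.01 is an assumed, commonly used value (the text does not specify one), and the function names are hypothetical.

```python
def relu(x):
    """max(x, 0): negative inputs are clamped to zero."""
    return max(x, 0.0)

def relu_grad(x):
    """ReLU gradient: 1 for positive inputs, 0 otherwise, so no update flows back."""
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    """Leaky ReLU keeps a small slope for negative inputs (0.01 is assumed)."""
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    """Gradient stays non-zero for negative inputs, so the neuron can recover."""
    return 1.0 if x > 0 else slope

def hidden_activation(w, x):
    """Equation (4) for a single unit: max(w^T x, 0)."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)))
```

For an input of -2.0, `relu_grad` returns 0.0 while `leaky_relu_grad` returns 0.01, so only the leaky variant still passes a gradient back.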
Pooling
Pooling layers perform downsampling, which reduces dimensionality and helps achieve translational invariance. This layer also helps avoid overfitting by making the network's learning more general. Like the convolution operation, pooling has hyperparameters such as filter size, stride, and padding. Two types of pooling operation are considered here: max pooling and global average pooling. In max pooling, a filter is applied to the feature map and moved across it in steps specified by the stride; at each position, the maximum value under the filter is kept.
$$a = \max_{N \times N}\big(a^{n \times n}\, u(n,n)\big) \quad (7)$$
Equation (7) specifies the max pooling operation, which finds the maximum value encountered by the filter. Here, $u(n,n)$ is the filter applied to the feature map.
The output dimensions are given by:
$$\text{dimension}_{out} = \left\lfloor \frac{\text{dimension} - k}{s} \right\rfloor + 1 \quad (8)$$
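Max pooling and global average pooling can be sketched on a 2-D list as follows. The 2x2 window with stride 2 is an assumed, typical setting rather than one stated in the text, and the function names are hypothetical. The output size follows equation (8).

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Slide a size x size window with the given stride and keep the maximum."""
    h, w = len(feature_map), len(feature_map[0])
    out_h = (h - size) // stride + 1   # equation (8), no padding
    out_w = (w - size) // stride + 1
    return [[max(feature_map[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]

def global_average_pool(feature_map):
    """Collapse a whole feature map to the mean of its values."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

fm = [[1, 3, 2, 0],
      [4, 2, 1, 5],
      [0, 1, 7, 2],
      [3, 2, 1, 1]]
print(max_pool2d(fm))            # [[4, 5], [3, 7]]
print(global_average_pool(fm))   # 2.1875
```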
Loss Function
For a network to learn, it must first evaluate how far its predictions are from the actual values. Loss functions quantify this distance. Easily differentiable functions are chosen as loss functions to ease backpropagation. In this method, cross-entropy loss has been used as the loss function.
$$\text{Loss} = -\frac{1}{N}\sum_{i=1}^{N} \log\left(\frac{e^{W_{y_i}^T x_i + b_{y_i}}}{\sum_{j=1}^{n} e^{W_j^T x_i + b_j}}\right) \quad (9)$$
In equation (9), $W$ is the weight matrix, $b$ is the bias, $x_i$ is the $i$-th training sample, $y_i$ is the class of the $i$-th training sample, $N$ is the total number of samples, $n$ is the number of classes, and $W_j$ and $W_{y_i}$ are the $j$-th and $y_i$-th columns of the weight matrix.
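Equation (9) can be computed as in the following sketch. Subtracting the largest score before exponentiating is a standard numerical-stability step that leaves the result unchanged; it is an implementation choice and is not mentioned in the text. The function names are hypothetical, and `W` is stored as a list of per-class weight columns to match the notation above.

```python
import math

def class_scores(W, b, x):
    """Scores W_j^T x + b_j, with W stored as a list of per-class columns."""
    return [sum(wk * xk for wk, xk in zip(col, x)) + bj for col, bj in zip(W, b)]

def softmax_cross_entropy(scores, labels):
    """Mean cross-entropy over N samples; scores[i][j] is W_j^T x_i + b_j."""
    total = 0.0
    for row, y in zip(scores, labels):
        m = max(row)  # subtract the max score for numerical stability
        log_sum = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum - row[y]   # -log(softmax probability of the true class)
    return total / len(scores)
```

With equal scores for two classes the loss is log 2, and it approaches 0 as the true class's score dominates the others.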
Gradient Descent
Gradient descent is performed on the learnable parameters of the network. In this operation, the parameters $P$ are varied by a small change $\delta P \ll P$, chosen in such a manner that the loss of the network decreases. In this method, stochastic gradient descent (SGD) has been used. In SGD, the parameters are updated after each training example, which avoids the redundant computation of full-batch gradient descent and increases the speed of learning.
$$P = P - \eta \cdot \nabla_P J\big(P; x^{(i)}; y^{(i)}\big) \quad (10)$$
In equation (10), $P$ are the parameters, $\eta$ is the learning rate, $J(P; x^{(i)}; y^{(i)})$ is the loss function, $x^{(i)}$ is the $i$-th training example, $y^{(i)}$ is the label of the $i$-th training example, and $\nabla_P J$ is the gradient of the loss function.
SGD has difficulty navigating regions of the error surface where the steepness differs greatly across dimensions, which are common around local minima. In such regions, SGD makes slow progress towards the minimum and tends to oscillate. Momentum dampens this oscillation and accelerates SGD in the relevant direction:
$$v_t = \gamma v_{t-1} + \eta \nabla_P J(P), \qquad P = P - v_t \quad (11)$$
In equation (11), $\gamma$ is the momentum term; in this method, it has been set to 0.9. The learning rate has been set to 0.001 in this study and is decayed by a factor of 0.1 after every 7 epochs. Decaying the learning rate leads to faster convergence to a local minimum and higher accuracy.
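Equation (11) and the learning-rate schedule can be sketched on plain lists of parameters. The momentum (0.9), base learning rate (0.001), step (7 epochs), and factor (0.1) come from the text. The function names are hypothetical, and the quadratic toy loss below is only a demonstration, not part of the described method.

```python
def step_decay_lr(epoch, base_lr=0.001, step=7, gamma=0.1):
    """Learning rate multiplied by gamma after every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

def sgd_momentum_step(params, grads, velocity, lr, momentum=0.9):
    """Equation (11): v_t = momentum * v_{t-1} + lr * grad, then P = P - v_t."""
    new_velocity = [momentum * v + lr * g for v, g in zip(velocity, grads)]
    new_params = [p - v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity

# Toy demonstration (assumption): minimise J(P) = P^2, whose gradient is 2P.
p, v = [5.0], [0.0]
for _ in range(400):
    p, v = sgd_momentum_step(p, [2 * p[0]], v, lr=0.05)
```

The schedule keeps the rate at 0.001 for epochs 0 to 6, drops it to 0.0001 for epochs 7 to 13, and so on.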
Regularization
A major problem faced while training CNNs is overfitting, which leads to good performance on the training set but poor performance on the validation set. In this state, the network learns the training data too well and loses its ability to generalize. To combat this problem, regularization techniques such as L2, L1, and Dropout are used. In this study, Dropout has been used for regularization.
In Dropout, co-adaptations between neurons are reduced by randomly dropping some units, together with their connections, during training. Because of this, no particular hidden neuron is guaranteed to be available, so the network cannot rely on any single neuron.
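The following is a minimal sketch of dropout on a list of activations. It uses the "inverted" variant found in common frameworks, where surviving units are scaled by 1 / (1 - p) during training so that nothing changes at inference time. The text does not state the dropout rate or variant, so both the rate p = 0.5 and the scaling are assumptions, and the function name is hypothetical.

```python
import random

def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p while training and
    scale the survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return list(activations)   # inference: every unit is kept as-is
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```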
Pre-trained Networks
AlexNet
AlexNet, the ImageNet classification network, was the top performer of ILSVRC 2012. It consists of eight layers in total, five of which are convolutional and three of which are fully connected. Finally, the softmax function is used to output the class scores. The activation function used is the rectified linear unit (ReLU). To prevent overfitting, data augmentation techniques and Dropout are used. The number of parameters is about 60 million. Owing to its smaller size and fewer parameters, it is easier to train than VGGNet, but this lighter weight comes at the cost of accuracy.
Figure 2 illustrates a working block diagram of ResNet-18 in accordance with an embodiment of the present disclosure. ResNet seeks to solve the problem of loss in accuracy as the network becomes deeper. This problem of vanishing gradients and degradation of accuracy is addressed by skip, or shortcut, connections in the ResNet model.
[Figure: diagrammatic representation of the residual block]
Instead of approximating a function directly, the layers approximate a residual function. Formally, if $F(x)$ is the function that the layers are trying to approximate and $x$ is the input, the residual function is $R(x) = F(x) - x$, so the original function to approximate becomes $F(x) = R(x) + x$.
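The identity $F(x) = R(x) + x$ can be illustrated with a toy residual block on plain vectors. This sketch makes simplifying assumptions: small dense layers stand in for the 3x3 convolutions of ResNet-18's basic block, and batch normalization is omitted. The helper names are hypothetical. The structure shown is two weight layers with a ReLU forming $R(x)$, the skip connection adding $x$, and a final ReLU.

```python
def relu_vec(v):
    """Element-wise ReLU."""
    return [max(a, 0.0) for a in v]

def linear(weights, v):
    """Dense layer without bias; stands in for a convolution (assumption)."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def basic_block(x, w1, w2):
    """Residual block sketch: R(x) = W2 * relu(W1 * x); output = relu(R(x) + x)."""
    r = linear(w2, relu_vec(linear(w1, x)))
    return relu_vec([ri + xi for ri, xi in zip(r, x)])
```

If the layers learn all-zero weights, the block reduces to `relu_vec(x)`. This illustrates why residual layers can easily fall back to an identity-like mapping instead of degrading as depth grows.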
Figure 3 illustrates a graphical representation of training and validation accuracy and training and validation loss in accordance with an embodiment of the present disclosure. The 18-layer variant of the residual network, ResNet-18, has been used. It contains eighteen layers: seventeen convolutional layers followed by one fully connected layer, which produces the final output. A batch normalization layer is present after each convolutional layer. Batch normalization normalizes the inputs to layers inside the network: every mini-batch is normalized to zero mean and unit standard deviation. The images are resized to 24 x 24 pixels. Data augmentation techniques such as random cropping, random changes in brightness and saturation, and several affine transformations are applied as discussed in the earlier sections. Two approaches are used for the recognition task. In the first approach, the pre-trained ResNet-18 model is fine-tuned on our dataset: the ResNet-18 architecture is used and all of its weights are updatable. The final fully connected layer is replaced to match the ten classes of our dataset. A decaying learning rate is used to improve performance, and regularization techniques such as Dropout are also used. This approach yielded an accuracy of 96%. The accuracy for each character is shown in Figure 3. The model performed exceptionally well with the digits , 2, 3, 4, 5, 6, and 8, producing an accuracy of 100%. The model did not perform as well on the digits 9 and 1, perhaps due to greater ambiguity in their structure.
In the second approach, the ResNet-18 model is used as a fixed feature extractor, so the weights of the underlying network are not allowed to change. Only the final fully connected layer is fine-tuned on the dataset. The same data augmentation techniques as in the first approach are applied, and a decaying learning rate is again used. This approach yielded an accuracy of 60%, which is considerably worse than the fine-tuned approach.
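To make the difference between the two approaches concrete, the sketch below counts learnable parameters for ResNet-18 with a ten-class head. It assumes the standard ImageNet-style ResNet-18 layout (as in torchvision) and uses hypothetical helper names; the disclosure itself does not report parameter counts. Convolutions carry no bias because each is followed by Batch Normalization, whose learnable scale and shift are counted; running statistics are buffers rather than parameters, and dropout has no parameters.

```python
from typing import Dict


def conv_params(c_in: int, c_out: int, k: int) -> int:
    # ResNet convolutions have no bias term: each is followed by Batch Normalization.
    return c_in * c_out * k * k


def bn_params(channels: int) -> int:
    # Learnable scale (gamma) and shift (beta) per channel.
    return 2 * channels


def basic_block_params(c_in: int, c_out: int) -> int:
    total = conv_params(c_in, c_out, 3) + bn_params(c_out)
    total += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if c_in != c_out:
        # Projection shortcut (1x1 conv + BN) where the number of channels changes.
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet18_param_counts(num_classes: int) -> Dict[str, int]:
    backbone = conv_params(3, 64, 7) + bn_params(64)  # 7x7 stem convolution
    main_path_convs = 1
    c_in = 64
    for c_out in (64, 128, 256, 512):  # four stages of two basic blocks each
        backbone += basic_block_params(c_in, c_out) + basic_block_params(c_out, c_out)
        main_path_convs += 4  # two 3x3 convolutions per block, two blocks per stage
        c_in = c_out
    head = 512 * num_classes + num_classes  # final fully connected layer (weights + bias)
    return {
        "main_path_convs": main_path_convs,
        "backbone": backbone,
        "head": head,
        "trainable_fine_tune": backbone + head,  # first approach: all weights updatable
        "trainable_feature_extractor": head,     # second approach: backbone frozen
    }


if __name__ == "__main__":
    for name, value in resnet18_param_counts(num_classes=10).items():
        print(f"{name}: {value:,}")
```

Under these assumptions the main path has 17 convolutional layers (the three 1x1 shortcut projections are conventionally not counted), fine-tuning updates about 11.18 million parameters, and the feature-extractor approach trains only the 5,130 parameters of the final fully connected layer.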
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of the processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.
Claims (9)
1. A system for deep neural network based handwritten digit classification for low resource Bengali script, the system comprises:
an acquisition module for receiving images of handwritten Bangla numbers; a pre-processing module for applying random resizing, cropping, and random vertical and horizontal flipping to the received images, wherein the images are normalized to have zero mean and unit variance, and wherein interpolation is employed for predicting new pixel values of the resultant image; a convolutional layer equipped with a filter for scanning a particular region of the image to produce a feature map, wherein the parameters of the filter are learnable and are shared across the convolutional layer, that is, within a convolutional layer the filter has the same weights at every position, which reduces the number of parameters to optimize and makes convergence faster; and an activation function for computing the weighted sum of the inputs along with the bias, wherein this weighted sum decides whether a node fires or not.
2. The system as claimed in claim 1, wherein a pre-processing technique that randomly varies the properties of the original image is used, which allows characters to be predicted in situations with low brightness, differing colors of the background/paper and of the ink used, and varying saturation of the image.
3. The system as claimed in claim 1, wherein brightness is chosen from a uniform distribution between 1 and 3, contrast and saturation are chosen from a uniform distribution between 1 and 3, and the hue of the image is allowed to change according to a distribution between -0.1 and 0.5.
4. The system as claimed in claim 1, wherein interpolation is achieved by predicting the value of a given pixel from the values of the pixels neighboring it.
5. The system as claimed in claim 1, wherein bicubic interpolation is employed to provide a sharper image with fewer interpolation artifacts such as aliasing and blurring.
6. The system as claimed in claim 1, wherein element-wise multiplication is performed between the pixels of the image within the receptive field of the filter, and the weights of the filter.
7. The system as claimed in claim 1, wherein a fast Fourier transform turns the convolution operation into element-wise multiplication in the frequency domain, reducing computation.
8. The system as claimed in claim 1, wherein initial convolution layers extract simple features such as edges, and subsequent convolution layers learn more intricate and abstract patterns from the image.
9. The system as claimed in claim 1, wherein the activation functions are Rectified Linear Units, pooling, loss function, gradient descent, and regularization.
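Claims 1, 6, and 8 describe a filter whose shared weights are multiplied element-wise with the pixels in its receptive field and summed, followed by a Rectified Linear Unit. A minimal plain-Python sketch of that operation follows; the function names and the two-tap edge filter are hypothetical choices for illustration, not taken from the disclosure. As in most deep learning libraries, the kernel is not flipped (strictly a cross-correlation), and only "valid" positions where the filter fits entirely inside the image are computed.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """Slide one shared filter over the image; each output is sum(pixel * weight) + bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("kernel is larger than the image")
    feature_map: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            # Element-wise multiplication over the receptive field, then summation.
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def relu(feature_map: Matrix) -> Matrix:
    """Rectified Linear Unit applied element-wise: negative responses become 0."""
    return [[max(0.0, value) for value in row] for row in feature_map]


if __name__ == "__main__":
    # A dark-to-bright vertical edge between columns 1 and 2.
    image = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]
    edge_filter = [[-1.0, 1.0]]  # hypothetical horizontal-gradient filter
    print(relu(conv2d_valid(image, edge_filter)))  # [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```

The same two weights are reused at every position, so this layer has kh * kw weights plus one bias regardless of image size, which is the parameter sharing recited in claim 1; the filter responds only where the edge lies, the kind of simple feature claim 8 attributes to early layers.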
Figure 1: Acquisition Module 102, Pre-Processing Module 104, Convolutional Layer 106, Activation Function 108
Figure 2
Figure 3
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021107299A AU2021107299A4 (en) | 2021-08-25 | 2021-08-25 | A system for deep neural network based handwritten digit classification for low resource bengali script |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021107299A AU2021107299A4 (en) | 2021-08-25 | 2021-08-25 | A system for deep neural network based handwritten digit classification for low resource bengali script |
Publications (1)
Publication Number | Publication Date |
---|---|
AU2021107299A4 (en) | 2022-01-06 |
Family ID: 78958470
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2021107299A AU2021107299A4 (en), ceased | 2021-08-25 | 2021-08-25 | A system for deep neural network based handwritten digit classification for low resource bengali script |
Country Status (1)
Country | Link |
---|---|
AU (1) | AU2021107299A4 (en) |
- 2021-08-25: AU2021107299A filed in Australia (AU); AU2021107299A4 not active (ceased)
Legal Events
Code | Title |
---|---|
FGI | Letters patent sealed or granted (innovation patent) |
MK22 | Patent ceased section 143a(d), or expired - non-payment of renewal fee or expiry |