AU2021100089A4 - A method to word recognition for the postal automation and a system thereof - Google Patents

A method to word recognition for the postal automation and a system thereof

Info

Publication number
AU2021100089A4
AU2021100089A4 AU2021100089A AU2021100089A AU2021100089A4 AU 2021100089 A4 AU2021100089 A4 AU 2021100089A4 AU 2021100089 A AU2021100089 A AU 2021100089A AU 2021100089 A AU2021100089 A AU 2021100089A AU 2021100089 A4 AU2021100089 A4 AU 2021100089A4
Authority
AU
Australia
Prior art keywords
features
words
word
image
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2021100089A
Inventor
Harmandeep KAUR
Krishan Kumar
Manish Kumar
Munish Kumar
Simpel Rani
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to AU2021100089A
Application granted
Publication of AU2021100089A4
Ceased
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G06V30/424 Postal images, e.g. labels or addresses on parcels or postal envelopes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/186 Extraction of features or characteristics of the image by deriving mathematical or geometrical properties from the whole image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10008 Still image; Photographic image from scanner, fax or copier
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/166 Normalisation of pattern dimensions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/293 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of characters other than Kanji, Hiragana or Katakana
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink

Abstract

ABSTRACT The present disclosure relates to a system and method for word recognition for postal automation using XGBoost. It presents a holistic method based on the extreme gradient boosting technique to recognize offline handwritten Gurumukhi words. In this direction, four state-of-the-art features, namely, zoning, diagonal, intersection & open-end points and peak extent features, have been considered to extract discriminant features from the digital images of handwritten words. The method is evaluated on a public benchmark dataset of Gurumukhi script that comprises 40,000 samples of handwritten words. Based on the extracted features, the words are classified into one of 100 classes using the XGBoost technique. The XGBoost technique attained its best results of accuracy (91.66%), recall (91.66%), precision (91.39%), F1-score (91.14%) and AUC (95.66%) using zoning features, with 90% of the data as the training set and the remaining 10% as the testing set.

Description

A METHOD TO WORD RECOGNITION FOR THE POSTAL AUTOMATION AND A SYSTEM THEREOF
FIELD OF THE INVENTION
The present disclosure relates to text recognition systems. More particularly, the present invention relates to a system and method for word recognition for postal automation using XGBoost.
BACKGROUND OF THE INVENTION
Document image analysis and recognition is one of the significant progressions towards making society paperless. Handwritten word recognition is an emerging field in the domain of document analysis and recognition, which has been a subject of deep research over the past 10 years. Handwritten word recognition is the process to recognize handwritten words (which may be written using any natural language) by the machine. Words can be written using two modes, namely, online mode and offline mode. In online mode, words are written using a pen on a digital tablet where the pen tip directions are noted to recognize the written word. Whereas, in an offline mode, the handwritten word samples are written on a sheet of paper using pen/pencil and then the paper sheet is fed to the scanner to get a digitized image of the document. For recognizing handwritten words, two approaches, namely, segmentation-based approach and segmentation free approach are considered. The segmentation-based approach is also known as analytical approach, which considers the word as a collection of individual characters. Thus, to recognize, initially the word is divided into its individual units (characters) and then the individual characters are recognized using recognition techniques. Sometimes, there is existence of overlapping characters in a word which leads to an issue in its segmentation. This issue can be solved by the segmentation free approach to word recognition, which considers the whole word as an individual entity. As this approach does not consider the individual characters of a word separately, thus there is no issue of explicit segmentation and the whole word is recognized using recognition techniques. This approach is also known as a holistic approach to word recognition. 
There are several applications of the handwritten word recognition system in various areas like automation of handwritten documents, processing of bank cheques, postal automation, signature verification, writer identification, storage of historical documents, document authentication etc.
Gradient boosting has been proposed as a technique with very high predictive capability. Its acceptance has been limited, however, because it takes a long time to train even straightforward models: the technique builds a single decision tree at a time, each one minimizing the errors of all preceding trees in the model. To overcome this issue, a newer technique known as XGBoost has been proposed. In the XGBoost technique, tree construction is parallelized and data is arranged to reduce lookup time, which shortens the training time of models and thus enhances the accuracy of classification. The key factor behind the success of XGBoost is its scalability in all scenarios. Owing to this factor, the system runs more than ten times faster than existing popular solutions on a single machine and scales to billions of examples in distributed settings. Numerous systems-level and algorithmic optimizations account for the scalability of XGBoost. Based on these features, the XGBoost technique has been applied to recognize offline handwritten Gurumukhi words. Gurumukhi script is used to write the Punjabi language, which is the official language of the Punjab state of India. Gurumukhi script comprises 41 consonants, 9 vowels, 2 symbols for nasal sounds, 1 symbol for consonant lengthening and 3 subscript characters. It is written horizontally from left to right.
Traditionally, a lot of work has been done on the recognition of various Indic and non-Indic scripts. For example, a new hierarchical approach was developed to recognize offline handwritten Gurumukhi characters. The authors extracted a set of 105 features from four feature types, namely, horizontal peak extent features, vertical peak extent features, diagonal features, and centroid features. To reduce the dimensionality of the features, various feature selection approaches such as Correlation-based Feature Selection (CFS), Principal Component Analysis (PCA) and Consistency-Based (CON) feature selection were employed. Based on a dataset of 3500 character samples, they attained a recognition rate of 91.80% using PCA and a linear-kernel Support Vector Machine (SVM) classifier. They also proposed two feature extraction techniques, namely, power curve fitting and parabola curve fitting-based features, to recognize offline handwritten Gurumukhi characters. To determine the efficiency of the proposed feature extraction techniques, they also considered existing features, namely, zoning, transition, diagonal, intersection and open-end points, directional, gradient and chain code features. They used the same dataset of 3500 samples of isolated Gurumukhi characters to perform experiments and attained recognition rates of 97.14% and 98.10% using SVM and k-NN, respectively, with the power curve fitting-based features. A holistic approach is proposed for the recognition of offline handwritten English words based on structural features, which finds its application in the recognition of postal addresses. For classification, a Euclidean distance-based k-NN classifier is employed that achieved 90% accuracy on a dataset comprising 300 samples of 30 district names of Karnataka state. A Harmony Search (HS) technique-based feature selection approach is proposed to recognize handwritten Bangla words by reducing the dimensionality of the features of the method presented.
To recognize handwritten Bangla words, a set of 65 elliptical features was considered, giving a recognition rate of 81.37% with an MLP (Multilayer Perceptron) classifier. To eliminate undesired features and select only relevant ones, the HS-based feature selection approach retained only 48 features and reported a recognition rate of 90.29% on the same dataset, which comprises 1020 words handwritten in Bangla script. Compared with two other feature dimensionality reduction techniques, namely, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), the proposed Harmony Search (HS) technique provided better recognition results using a holistic approach. A new technique is proposed to extract features from pre-segmented offline handwritten Gurumukhi characters. The proposed feature extraction technique considered the boundary extent of the character sample to extract the desirable features, which were then reduced using the PCA feature selection approach. The experiments were conducted on a dataset comprising 7000 Gurumukhi character samples using three classifiers, namely, k-NN, SVM and MLP. The best recognition rate of 93.8% was attained using an RBF-kernel SVM classifier and a 5-fold cross-validation approach.
Further, a holistic approach is proposed to recognize offline handwritten Arabic words based on Gabor filters. Two types of features, statistical Gabor features and Gabor descriptors, were computed from the word samples and then integrated with a Bag-of-Features framework to extract the desired features. Based on the extracted features, the words were recognized using an SVM classifier with a linear kernel function. The proposed recognition system achieved its best average recognition rate of 86.44% in experiments on the CENPARMI public dataset comprising Arabic handwritten checks. Various transformation-based techniques are proposed to recognize offline handwritten Gurumukhi characters. The transformation techniques used were discrete cosine transformations (DCT2), discrete wavelet transformations (DWT2), fast Fourier transformations and fan beam transformations. In experiments on a dataset of 10,500 character samples, the authors attained a recognition rate of 95.8% using 5-fold cross-validation and by feeding DCT2 features to a linear-kernel SVM classifier. To address scripts with little training data, a cross-language framework was proposed to recognize words in Indian scripts, employing a zone-wise approach to map characters so that training is performed on one script with a large dataset and testing on other scripts with comparatively fewer samples. The authors considered lexicons of sizes 1921, 1934 and 1953 for Bangla, Devanagari, and Gurumukhi scripts, respectively. Based on experiments, a global mean average precision rate of 66.87% (66.42%) was reported for Devanagari (Bangla) script where the Gurumukhi script was used as the source script, and the results revealed higher script similarity between Bangla and Devanagari than between Bangla and Gurumukhi. An ensemble model is presented that combines the output of SVM classifiers based on three feature types (two handcrafted and one machine-generated), namely, Arnold transform-based features, curvature-based features and Deep Convolutional Neural Network (DCNN) based features, in order to recognize offline handwritten words. To combine the decisions of the three classifiers, three strategies were employed, namely, vote for the majority decision, vote for the strongest decision and vote for the sum of the decisions. The proposed system achieved best recognition rates of 95.23%, 97.16% and 95.07% on the public datasets CENPARMI, ISIHWD and IAM200, respectively.
Furthermore, a novel feature extraction technique named SGCSL (Statistical Geometric Components of Straight Lines) was proposed to recognize offline handwritten Arabic/Persian words. The extracted features were then fed to an SVM classifier for the classification task. The authors attained recognition rates of 67.47%, 80.78%, and 86.22% by evaluating the proposed approach on three public datasets, namely, Iran-cities, IFN/ENIT, and IBN SINA Arabic dataset, respectively. A holistic approach is demonstrated to recognize handwritten Farsi words based on the fusion of three HMM classifiers that were trained separately using three feature sets, namely, image gradient, black-white transitions and contour chain code features. The fused output of the HMMs was then given to an MLP classifier for recognition. The authors evaluated the approach on the "Iranshahr 3" dataset and attained a recognition rate of 89.06%, which was observed to be superior to that of the independent base classifiers. In order to make comparison possible between the proposed approach and the existing approaches in Gurumukhi script, seven benchmark datasets in Gurumukhi script, namely, HWR-Gurmukhi_1.1, HWR-Gurmukhi_1.2, HWR-Gurmukhi_1.3, HWR-Gurmukhi_2.1, HWR-Gurmukhi_2.2, HWR-Gurmukhi_2.3, and HWR-Gurmukhi_3.1, are developed. Each of the HWR-Gurmukhi_1.1, HWR-Gurmukhi_1.2 and HWR-Gurmukhi_1.3 benchmark datasets comprises 3500 character samples, whereas HWR-Gurmukhi_2.1, HWR-Gurmukhi_2.2 and HWR-Gurmukhi_2.3 each comprise 5600 character samples, and the last benchmark dataset, HWR-Gurmukhi_3.1, comprises 7000 character samples.
They also performed experiments on the proposed datasets using k-NN, RBF-SVM, MLP, neural network, decision tree and random forest classifiers by considering existing features, namely, zoning features, diagonal features, intersection & open-end points features, directional features, transition features and centroid features, and reported results based on Precision rate, False Acceptance Rate (FAR), and False Rejection Rate (FRR). An approach to recognize unconstrained offline handwritten words based on the integration of position embeddings with residual networks (ResNets) was developed. The outputs generated by this integration were given as input to bidirectional long short-term memory (BiLSTM) networks for recognition of characters. The authors reported results of 91.97% accuracy and a 1.79% character error rate on two standard datasets, namely, the 2017 ICDAR IEHHR competition and RIMES datasets, respectively. Feature selection plays a significant role in selecting only relevant and discriminant characteristics from the word images. In this direction, a Memetic Algorithm (MA) based wrapper-filter selection approach is proposed to reduce the dimensionality of gradient-based features and modified Statistical and Contour based Features (SCF) in order to recognize offline handwritten Bangla words using a holistic approach. Using an MLP classifier, the proposed approach was evaluated on a dataset comprising 7500 words handwritten in Bangla script and attained 93% recognition accuracy after applying the feature selection approach to a hybrid of the considered features, a 3.33% improvement over the rate attained with the original feature set. The authors also compared the MA-based approach with a Genetic Algorithm (GA) based feature selection approach and showed that MA achieved better recognition accuracy, even though GA selected a smaller number of features.
A GA-based hierarchical feature selection method is proposed to reduce the shape-based (elliptical) and texture-based (gradient) features extracted from the word samples; the feature set was reduced by approximately 28%. The authors evaluated the proposed method on a dataset of 12,000 handwritten Bangla word samples using the MLP classification technique and attained a recognition rate of 95.30%, which is 1.28% higher than the recognition rate attained with the original feature set. A hybrid Convolutional Neural Network (CNN) and XGBoost classifier was developed to recognize handwritten Ethiopian characters, where the CNN was employed to extract features from images and XGBoost was used for recognition and classification. To conduct experiments, the Handwritten Ethiopian Character Recognition (HECR) dataset was proposed, which comprises a mixture of scripts, numerical representations, tonal symbols, special characters, punctuation and combining symbols. The CNN produced an error rate of 0.4630 whereas CNN-XGBoost produced an error rate of 0.1612, which shows the superiority of the hybrid model over the CNN model. An accuracy of 99.84% was attained, which is considered superior based on comparison with some existing approaches.
However, no recognized work has so far been reported on an offline handwritten Gurumukhi word recognition system. Although various word recognition methods are available, the existing methods suffer from overlapping characters in a word, which complicates segmentation. Furthermore, the existing methods take a long time to recognize words without segmentation of the text. In view of the foregoing discussion, there exists a need for a system and method for word recognition for postal automation using XGBoost.
SUMMARY OF THE INVENTION
The present disclosure seeks to provide a holistic system and a method for offline handwritten Gurumukhi word recognition using extreme gradient boosting methodology by considering four state-of-the-art features, namely, zoning, diagonal, intersection & open-end points and peak extent features.
In an embodiment, a system for word recognition for postal automation using xgboost is provided. The system includes a scanner for scanning and thereby digitizing documents upon converting offline handwritten documents into a digital form to generate bitmap image of the documents, wherein the digitized image is stored in the form of bits.
The system includes a pre-processing module for pre-processing the digitized image for reducing variations in writing styles of the documents by means of three basic pre-processing operations including binarization to create the binary image, normalization to provide uniformity to the words and thinning operations to minimize the text width from multiple pixels to a single pixel.
The system includes a feature extraction module for extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with high recognition rate.
The system further includes a classification module in connection with the feature extraction module for classifying extracted features and thereby recognizing words using classification phase upon determining class of the words using extreme gradient boosting technique.
In an embodiment, a training module is used for training the system using a training dataset for predicting the words, wherein time spent on training is based on number of classes present in the dataset.
In another embodiment, a method for word recognition for postal automation using xgboost is provided. The method includes digitizing documents upon converting offline handwritten documents into a digital form using a scanner for scanning the documents to generate bitmap image of the documents, wherein the digitized image is stored in the form of bits. The method includes pre-processing the digitized image for reducing variations in writing styles of the documents by means of three basic pre-processing operations including binarization to create the binary image, normalization to provide uniformity to the words and thinning operations to minimize the text width from multiple pixels to a single pixel.
The method includes extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with high recognition rate.
The method further includes classifying extracted features and thereby recognizing words using classification phase upon determining class of the words using extreme gradient boosting technique.
In an embodiment, to create the binary image, the threshold constant is positioned between higher and lower values which equate to white and black pixels, respectively.
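The thresholding step can be illustrated with a minimal sketch. The threshold value of 128, the 0 to 255 grayscale range, and the choice to encode ink as 1 and background as 0 are assumptions made for illustration; the disclosure only states that the threshold lies between the values for white and black pixels.

```python
def binarize(gray, threshold=128):
    """Return an ink mask: 1 for dark (black) pixels, 0 for light (white) pixels.

    `gray` is a list of rows of 0-255 intensities. The threshold of 128 is an
    assumed midpoint, not a value taken from the disclosure.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


gray = [
    [250, 240, 30],
    [20, 200, 210],
]
binary = binarize(gray)
```

Encoding ink as 1 keeps later feature computations simple, since counting foreground pixels becomes a plain sum.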
In an embodiment, a process to provide uniformity to the words comprises: slicing the words from the binarized image of the document; cropping word to eliminate the white space around the words; and normalizing words to uniform size of 256x64.
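The cropping and size normalization steps might look like the following sketch. It assumes a binary image with ink encoded as 1, reads 256x64 as 256 pixels wide by 64 pixels high, and uses nearest-neighbour resampling; none of these choices is specified in the disclosure, and the function names are hypothetical.

```python
def crop_to_ink(img):
    """Remove empty rows and columns around the ink (1 = ink, 0 = background)."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:  # blank image: nothing to crop
        return img
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def resize_nearest(img, width=256, height=64):
    """Resample to width x height by nearest-neighbour lookup (an assumed method)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


word = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cropped = crop_to_ink(word)
normalized = resize_nearest(cropped)
```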
In an embodiment, a process for extracting zoning features comprises: partitioning the word image into at least four zones and then further partitioning each of the at least four zones into four zones, resulting in a total of 16 zones, wherein each of the 16 zones is further partitioned into 4 zones, resulting in a total of 64 zones; extracting features from each zone and computing features from the pattern characteristics of each considered zone; and normalizing the features to the range 0 to 1 after obtaining the foreground pixels from the n considered zones.
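A minimal sketch of hierarchical zoning follows. It assumes each split is a 2 x 2 grid, uses ink density (foreground pixels divided by zone area) as the normalized feature, and also counts the whole image as a zone so that 1 + 4 + 16 + 64 = 85 values result, matching the 85 features quoted for the diagonal features. These are interpretations for illustration, not details confirmed by the disclosure.

```python
def zone_boxes(width, height, max_level=3):
    """List (x0, y0, x1, y1) boxes for grids of 1, 4, 16 and 64 zones."""
    boxes = []
    for level in range(max_level + 1):
        n = 2 ** level  # n x n grid at this level
        for gy in range(n):
            for gx in range(n):
                boxes.append((gx * width // n, gy * height // n,
                              (gx + 1) * width // n, (gy + 1) * height // n))
    return boxes


def zoning_features(img, max_level=3):
    """Ink density in [0, 1] for every zone (1 = ink, 0 = background)."""
    height, width = len(img), len(img[0])
    feats = []
    for x0, y0, x1, y1 in zone_boxes(width, height, max_level):
        ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
        feats.append(ink / ((x1 - x0) * (y1 - y0)))
    return feats


# Synthetic 256 x 64 image whose left half is ink.
img = [[1 if x < 128 else 0 for x in range(256)] for _ in range(64)]
feats = zoning_features(img)
```

Passing `max_level=3` and dropping the first value would give the 4 + 16 + 64 zones named in the paragraph alone.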
In an embodiment, a process for extracting diagonal features comprises: partitioning the word image into zones; extracting features from the foreground pixels along the diagonals of each considered zone, wherein multiple values are available along the diagonals of each zone; taking the average of the multiple values to acquire a single value corresponding to each zone; and extracting 85 diagonal features after partitioning the word image into zones.
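The diagonal feature of a single zone can be sketched as below. The sketch sums the ink along every anti-diagonal of the zone (pixels sharing the same x + y) and averages those sums into one value; applying it to an assumed hierarchy of 1 + 4 + 16 + 64 zones yields 85 features. The diagonal direction and the zone hierarchy are assumptions for illustration.

```python
def diagonal_feature(zone):
    """Average of the ink sums taken along each anti-diagonal of a zone."""
    h, w = len(zone), len(zone[0])
    sums = [0] * (w + h - 1)  # one accumulator per diagonal
    for y in range(h):
        for x in range(w):
            sums[x + y] += zone[y][x]
    return sum(sums) / len(sums)


def diagonal_features(img, max_level=3):
    """One diagonal feature per zone over grids of 1, 4, 16 and 64 zones."""
    height, width = len(img), len(img[0])
    feats = []
    for level in range(max_level + 1):
        n = 2 ** level
        for gy in range(n):
            for gx in range(n):
                x0, x1 = gx * width // n, (gx + 1) * width // n
                y0, y1 = gy * height // n, (gy + 1) * height // n
                feats.append(diagonal_feature([row[x0:x1] for row in img[y0:y1]]))
    return feats


small = diagonal_feature([[1, 0], [1, 1]])
full = diagonal_features([[1] * 256 for _ in range(64)])
```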
In an embodiment, a process for extracting peak extent features comprises: considering the various zones of the word image and extracting features by taking the sum of the lengths of the peak extents that fit consecutive black pixels along each zone, wherein peak extent features are extracted horizontally as well as vertically, wherein for horizontal peak extent features, the sum of the lengths of the peak extents that fit consecutive black pixels horizontally in each row of a zone is taken, whereas for vertical peak extent features, the sum of the lengths of the peak extents that fit consecutive black pixels vertically in each column of a zone is considered, wherein 170 peak extent features (85 horizontal & 85 vertical peak extent features) are considered herein from the word image.
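One reading of the peak extent computation is sketched below: for each row (or column) of a zone, take the longest run of consecutive ink pixels, then sum those lengths over the zone. Treating the "peak extent" as the longest run per line is an interpretation, and the function names are hypothetical.

```python
def longest_run(line):
    """Length of the longest run of consecutive ink pixels (1s) in a line."""
    best = run = 0
    for px in line:
        run = run + 1 if px else 0
        best = max(best, run)
    return best


def peak_extent(zone):
    """Return (horizontal, vertical) peak extent sums for one zone."""
    horizontal = sum(longest_run(row) for row in zone)
    vertical = sum(longest_run(col) for col in zip(*zone))
    return horizontal, vertical


zone = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
h_extent, v_extent = peak_extent(zone)
```

Computing this pair for each of 85 zones would give the 170 features mentioned above.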
In an embodiment, the gradient boosting technique boosts weak classifiers (learners) and thereafter generates a predictive model in the form of an ensemble of weak learners. Initially, an equal weight is assigned to all training samples; the weight specifies the probability of a record being selected by the decision tree for training, so the probability of selection is equal for all records. After training, the model is ready to predict. After prediction, the weights of whichever records the model classified incorrectly are updated, and the records are fed to the second decision tree. For the second decision tree, the records with maximum weight are selected for training. When the weights are updated for the wrongly predicted results, they are passed to the next decision tree in the same way, and this process continues sequentially, one tree after another, up to the nth decision tree. After all the weak classifiers are combined, the final classifier is generated, which produces the final class of the record.
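The reweighting loop described above can be sketched with one-level decision trees (stumps) on one-dimensional data. The scheme as worded is closest to AdaBoost-style boosting, so the sketch uses the AdaBoost weight update; XGBoost itself instead fits each new tree to gradients of a loss function. The sketch also uses weighted error rather than weighted sampling of records, and the toy data and all names are assumptions for illustration.

```python
import math


def train_stump(xs, ys, weights):
    """Pick the threshold and polarity with the lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def boost(xs, ys, rounds=3):
    """Train `rounds` stumps sequentially, reweighting misclassified records."""
    n = len(xs)
    weights = [1.0 / n] * n  # equal weights at the start
    ensemble = []
    for _ in range(rounds):
        err, thr, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        ensemble.append((alpha, thr, polarity))
        # Misclassified records gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * (polarity if x >= thr else -polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Combine the weak learners by a weighted vote."""
    score = sum(a * (p if x >= t else -p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # no single stump separates these labels
ensemble = boost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy data, each stump alone misclassifies at least two records, while the weighted combination of three stumps classifies all six correctly.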
In an embodiment, the various machine learning approaches like HMM, k-NN, MLP, SVM, DCNN and various feature dimensionality reduction approaches such as GA, PSO and HS are utilized to recognize handwritten words of various scripts.
An object of the present disclosure is to develop a holistic system for word recognition for postal automation using xgboost.
Another object of the present disclosure is to extract features from the complete word image without segmenting the word image into its primitive components (characters).
Another object of the present disclosure is to eliminate all the limitations raised by touching characters, overlapping characters, cursive writing style etc. in the segmentation-based approach.
Another object of the present disclosure is to develop a postal automation system in Gurumukhi script.
Yet another object of the present invention is to deliver an expeditious and cost-effective method for offline handwritten Gurumukhi word recognition using extreme gradient boosting methodology.
To further clarify advantages and features of the present disclosure, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which is illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail with the accompanying drawings.
BRIEF DESCRIPTION OF FIGURES
These and other features, aspects, and advantages of the present disclosure will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 illustrates a schematic block diagram of a system for word recognition for postal automation using XGBoost in accordance with an embodiment of the present disclosure;
Figure 2 illustrates a flow chart of a method for word recognition for postal automation using XGBoost in accordance with an embodiment of the present disclosure;
Figure 3 illustrates a process flow of a system for word recognition for postal automation using XGBoost in accordance with an embodiment of the present disclosure;
Figure 4 illustrates an exemplary profile of the zoning features of a considered Gurumukhi word in accordance with an embodiment of the present disclosure;
Figures 5A, 5B, and 5C illustrate a plurality of exemplary profiles of the peak extent features in accordance with an embodiment of the present disclosure;
Figure 6 illustrates an exemplary profile of a demonstration of extreme gradient boosting in accordance with an embodiment of the present disclosure; and
Figure 7 illustrates an exemplary profile of an area under the curve (AUC) in accordance with an embodiment of the present disclosure.
Further, skilled artisans will appreciate that elements in the drawings are illustrated for simplicity and may not necessarily have been drawn to scale. For example, the flow charts illustrate the method in terms of the most prominent steps involved to help to improve understanding of aspects of the present disclosure. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the drawings by conventional symbols, and the drawings may show only those specific details that are pertinent to understanding the embodiments of the present disclosure so as not to obscure the drawings with details that will be readily apparent to those of ordinary skill in the art having benefit of the description herein.
DETAILED DESCRIPTION
For the purpose of promoting an understanding of the principles of the invention, reference will now be made to the embodiment illustrated in the drawings and specific language will be used to describe the same. It will nevertheless be understood that no limitation of the scope of the invention is thereby intended, such alterations and further modifications in the illustrated system, and such further applications of the principles of the invention as illustrated therein being contemplated as would normally occur to one skilled in the art to which the invention relates.
It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the invention and are not intended to be restrictive thereof.
Reference throughout this specification to "an aspect", "another aspect" or similar language means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of the phrase "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises...a" does not, without more constraints, preclude the existence of other devices or other sub-systems or other elements or other structures or other components or additional devices or additional sub-systems or additional elements or additional structures or additional components.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The system, methods, and examples provided herein are illustrative only and not intended to be limiting.
Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.
Referring to Figure 1, a schematic block diagram of a system for word recognition for postal automation using XGBoost is illustrated in accordance with an embodiment of the present disclosure. The system provides holistic offline handwritten Gurumukhi word recognition based on the XGBoost technique by considering four state-of-the-art features, namely, zoning, diagonal, intersection & open-end points and peak extent features. The system 100 includes a scanner 102 for scanning and thereby digitizing documents, converting offline handwritten documents into a digital form to generate a bitmap image of the documents. The digitized image is stored in the form of bits. The documents are scanned at 300 dpi through the scanner 102.
In an embodiment, a pre-processing module 104 is used for pre-processing the digitized image to reduce variations in the writing styles of the documents by means of three basic pre-processing operations: binarization to create the binary image, normalization to provide uniformity to the words, and thinning to reduce the text width from multiple pixels to a single pixel. Binarization means generating the binary form of the image. To create the binary image, the threshold constant is positioned between the higher and lower values, which equate to white and black pixels, respectively. The normalization operation brings words of different sizes, written by distinct writers with various writing styles, to a uniform size. In the disclosed system, after slicing the words from the binarized image of the document, the cropping operation is applied to eliminate the white space around the word. Then the normalization operation is applied to provide a uniform size of 256x64. This size has been selected due to the horizontal writing style of Gurumukhi script. Thinning reduces the text width from multiple pixels to a single pixel using a parallel thinning technique.
In an embodiment, a feature extraction module 106 is used for extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with a high recognition rate.
In an embodiment, a classification module 108 is in connection with the feature extraction module 106 for classifying the extracted features and thereby recognizing words in the classification phase by determining the class of the words using the extreme gradient boosting technique. In an embodiment, a training module 110 is used for training the system using a training dataset for predicting the words. The time spent on training depends on the number of classes present in the dataset.
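The binarization, cropping and normalization operations described above can be sketched as follows. This is a minimal illustration in plain Python, not the disclosed implementation: the function names, the threshold value of 128 and the nearest-neighbour resize are assumptions made for this example, and the thinning operation is omitted.

```python
def binarize(gray, threshold=128):
    """Map grayscale rows (0-255) to 1 = ink (dark pixel), 0 = background.
    The threshold of 128 is an assumed value between black (0) and white (255)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def crop(binary):
    """Remove the empty rows and columns (white space) around the word."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return binary  # blank image: nothing to crop
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]


def normalize(binary, width=256, height=64):
    """Resize to the uniform 256x64 word size (nearest neighbour, assumed)."""
    src_h, src_w = len(binary), len(binary[0])
    return [[binary[r * src_h // height][c * src_w // width]
             for c in range(width)]
            for r in range(height)]


# Hypothetical 4x6 grayscale patch with a dark 2x2 block of ink.
gray = [[255] * 6,
        [255, 0, 0, 255, 255, 255],
        [255, 0, 0, 255, 255, 255],
        [255] * 6]
word = normalize(crop(binarize(gray)))
```

The resulting `word` has 64 rows of 256 pixels each, matching the wide layout chosen for the horizontal writing style.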
Figure 2 illustrates a flow chart of a method for word recognition for postal automation using XGBoost in accordance with an embodiment of the present disclosure. At step 202, the method 200 includes digitizing documents by converting offline handwritten documents into a digital form using a scanner 102, which scans the documents to generate a bitmap image of the documents. The digitized image is stored in the form of bits.
At step 204, the method 200 includes pre-processing the digitized image for reducing variations in writing styles of the documents by means of three basic pre-processing operations including binarization to create the binary image, normalization to provide uniformity to the words and thinning operations to minimize the text width from multiple pixels to a single pixel.
At step 206, the method 200 includes extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with a high recognition rate.
At step 208, the method 200 includes classifying the extracted features and thereby recognizing words in the classification phase by determining the class of the words using the extreme gradient boosting technique. In an embodiment, to create the binary image, the threshold constant is positioned between the higher and lower values, which equate to white and black pixels, respectively.
In an embodiment, a process to provide uniformity to the words comprises slicing the words from the binarized image of the document. The process further comprises cropping each word to eliminate the white space around it and normalizing the words to a uniform size of 256x64.
In an embodiment, a process for extracting zoning features comprises partitioning the word image into at least four zones and then further partitioning each of the at least four zones into four zones, resulting in a total of 16 zones, wherein each of the 16 zones is further partitioned into 4 zones, resulting in a total of 64 zones. The process further comprises extricating features from each zone, computing the features from the pattern characteristics of each considered zone, and normalizing the features to the range between 0 and 1 after getting the foreground pixels from the n considered zones.
In an embodiment, a process for extracting diagonal features comprises partitioning the word image into zones. The process further comprises extracting features from the foreground pixels along the diagonals of each considered zone, wherein multiple values are available along the diagonals of each zone. The process further comprises taking the average of these multiple values to acquire a single value corresponding to each zone, thereby extricating 85 diagonal features after partitioning the word image into zones.
In an embodiment, a process for extracting peak extent features comprises considering the various zones of the word image and extricating features by taking the sum of the lengths of the peak extents that fit consecutive black pixels along each zone. Peak extent features are extracted horizontally as well as vertically. For horizontal peak extent features, the sum of the lengths of the peak extents that fit consecutive black pixels horizontally in each row of a zone is taken, whereas for vertical peak extent features, the sum of the lengths of the peak extents that fit consecutive black pixels vertically in each column of a zone is taken. A total of 170 peak extent features (85 horizontal and 85 vertical) are considered herein from the word image.
In an embodiment, the gradient boosting technique boosts weak classifiers (learners) and thereafter generates a predicted model in the form of an ensemble of weak learners. Initially, a similar weight is assigned to all training samples; the weight specifies the probability of the record being selected by the decision tree for training. Because the weights are similar, all records are equally likely to be selected. After training, the model is ready to predict. After prediction, the weights of the records that the model classified incorrectly are updated and fed to the second decision tree. For the second decision tree, the records with maximum weight are selected for training; when the weights are updated for the wrongly predicted results, they are passed to the next decision tree. This process continues sequentially, one tree after another, up to the nth decision tree. After combining all the weak classifiers, a new final classifier is generated, which produces the final class of the record.
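One possible reading of the diagonal feature of a single zone is sketched below. The text does not fix how the per-diagonal values are obtained, so this example assumes that the foreground pixels are counted along every diagonal of the zone and that the counts are then averaged; the function name `diagonal_feature` is hypothetical.

```python
def diagonal_feature(zone):
    """Average foreground-pixel count over all diagonals of a binary zone.

    Assumption for this sketch: a diagonal is the set of cells sharing the
    same (row - column) offset, so an h x w zone has h + w - 1 diagonals.
    """
    counts = {}
    for r, row in enumerate(zone):
        for c, px in enumerate(row):
            counts[r - c] = counts.get(r - c, 0) + px
    return sum(counts.values()) / len(counts)


# Hypothetical 2x2 zone: the main diagonal holds 2 pixels, the other two hold 0.
value = diagonal_feature([[1, 0],
                          [0, 1]])
```

Applying such a function to each of the 85 zones would yield one value per zone, which is consistent with the 85 diagonal features stated above.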
In an embodiment, the various machine learning approaches like HMM, k-NN, MLP, SVM, DCNN and various feature dimensionality reduction approaches such as GA, PSO and HS are utilized to recognize handwritten words of various scripts.
Figure 3 illustrates a process flow of a system for word recognition for postal automation using XGBoost in accordance with an embodiment of the present disclosure. The main contribution of the proposed system is a method for offline handwritten Gurumukhi word recognition based on the XGBoost technique, considering four state-of-the-art features, namely, zoning, diagonal, intersection & open-end points and peak extent features. To the best of the authors' knowledge, the XGBoost technique has been applied for the very first time for word recognition in Gurumukhi script. In the past few years, the holistic approach to word recognition has gained a lot of attention due to better results as compared to the segmentation-based approach, so the holistic approach has been utilized in this method. The proposed work has been evaluated on a public benchmark dataset in Gurumukhi script. In Punjab state, postal letters can be written using the Gurumukhi script and, until now, no postal automation system exists for Gurumukhi script. Hence, this method is a step in this direction: it recognizes offline handwritten Gurumukhi words and finds its application in postal automation. The high performance of the XGBoost technique as compared to other techniques motivates the authors to employ it for recognition. The aim of this method is therefore to explore the efficiency of the XGBoost technique for the proposed work.
At step 302, the method 300 includes converting offline handwritten documents into a digital form, which is called digitization. The digitization phase is the first step in processing the documents by a computer system, whereby the digitized image is stored in the form of bits. The digitized image is formed using the scanner 102, which scans the document to generate the bitmap image of the document.
At step 304, the method 300 includes pre-processing the digitized image to reduce the variations in the writing styles. At step 306, the method 300 includes extracting the desirable features of the word image, which is called feature extraction. The feature extraction phase is significant for recognizing word images with a high recognition rate because the word image is classified based on the extracted features. Various features are available to recognize word images, and various combinations of features have been proposed in the literature. In the system, only individual features, namely zoning features, diagonal features, intersection & open-end points features and peak extent features, are considered, without taking their combinations. To extract diagonal features, the word image is initially partitioned into zones. Then, as the name implies, the features are extracted from the foreground pixels along the diagonals of each considered zone. There are multiple values along the diagonals of each zone, so the average of these values is taken to acquire a single value corresponding to each zone. After partitioning the word image into zones as illustrated in Figure 4, the method extricates 85 diagonal features for the proposed work.
At step 308, the method 300 includes recognizing words based on the extracted features in the classification phase. This phase determines the class to which the word belongs. Many classification techniques are available in the literature to classify the characters and words of various Indic and non-Indic scripts. In recent research, the extreme gradient boosting (XGBoost) technique has performed well in boosting the performance of recognition systems in the field of pattern recognition and image processing.
Extreme gradient boosting, or XGBoost, is an ensemble technique that comprises sequential decision trees, so it is also called a sequential ensemble technique. It is based on a gradient boosting technique that boosts weak classifiers (learners) and generates a predicted model in the form of an ensemble of weak learners. In this technique, a similar weight is initially assigned to all training samples. These weights specify the probability of the record being selected by the decision tree for training. Since the weights are similar, all records are equally likely to be selected. After training, the model is ready to predict. After prediction, the weights of the records that the model classified incorrectly are updated and fed to the second decision tree. This model is called a weak classifier because it has not classified all the records correctly. For the second decision tree, the records with maximum weight are selected for training. Thus, weight updating is significant in XGBoost. When the weights are updated for the wrongly predicted results, they are passed to the next decision tree. This process continues sequentially, one tree after another, up to the nth decision tree. After combining all the weak classifiers, a new final classifier is generated, which produces the final class of the record. The final classifier classifies the test sample based on the maximum of similar predictions by the weak classifiers, as described in Figure 6.
In mathematical terms, XGBoost is a tree ensemble technique which combines multiple classification and regression trees. Consider a dataset comprising n samples and m features, $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$). The mathematical delineation of the ensemble technique is given as follows.
$$\hat{y}_i = \sum_{k=1}^{K} h_k(x_i), \quad h_k \in \mathcal{R} \qquad (1)$$
where $K$ denotes the number of trees, $h_k$ represents a function in the functional space $\mathcal{R}$, and $\mathcal{R}$ denotes the set comprising all possible classification and regression trees.
The objective function is delineated as follows:
$$F(\Theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(h_k) \qquad (2)$$
where $l(y_i, \hat{y}_i)$ represents the training loss function and $\Omega(h_k)$ denotes the regularization function. $l$ denotes the differentiable convex loss function that quantifies the deviation between the prediction $\hat{y}_i$ and the target $y_i$. The purpose of the XGBoost technique is to minimize $F(\Theta)$.
The regularization complexity can be described as follows:
$$\Omega(h) = \gamma L + \frac{1}{2} \lambda \lVert \omega \rVert^2 \qquad (3)$$
where $\gamma$ denotes the gamma parameter, $L$ denotes the number of leaves, $\lambda$ denotes the L2 regularization term on weights in the model, and $\omega$ denotes the vector of scores on the leaves.
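To make the roles of the gamma parameter, the number of leaves and the L2 term concrete, the objective and regularization functions can be evaluated numerically as sketched below. The squared-error loss, the factor of one half on the L2 term (as in the standard XGBoost formulation) and the function names are assumptions made for this illustration.

```python
def regularization(leaf_scores, gamma, lam):
    """gamma * (number of leaves) + 0.5 * lam * (sum of squared leaf scores).
    The 0.5 factor follows the standard XGBoost formulation (assumed here)."""
    return gamma * len(leaf_scores) + 0.5 * lam * sum(w * w for w in leaf_scores)


def objective(targets, predictions, trees, gamma=1.0, lam=1.0):
    """Training loss plus the regularization of every tree in the ensemble.
    Squared error is used as an example of a differentiable convex loss."""
    loss = sum((y - p) ** 2 for y, p in zip(targets, predictions))
    return loss + sum(regularization(t, gamma, lam) for t in trees)


# Hypothetical ensemble of one tree with two leaves scoring 1.0 and -2.0.
penalty = regularization([1.0, -2.0], gamma=0.5, lam=2.0)   # 0.5*2 + 0.5*2*5
total = objective([1.0, 0.0], [0.5, 0.0], [[1.0, -2.0]], gamma=0.5, lam=2.0)
```

A larger gamma penalizes trees with many leaves, while a larger lambda shrinks the leaf scores; both discourage over-fitting.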
Because the XGBoost technique generates trees on the basis of the number of labels, the time spent on training depends on the number of classes present in the dataset.
In an embodiment, the XGBoost technique has been employed for recognition due to the following merits. It is an efficient and simple-to-use technique which provides higher performance and accuracy than other techniques. It supports parallel processing and is faster than the Gradient Boosting Machine (GBM). It has inherent L1 and L2 regularization, which addresses the over-fitting problem. Due to this in-built regularization facility, it is also known as a regularized form of GBM. It has an inherent ability to handle missing values. It permits users to perform cross validation at each successive iteration of the boosting process, and hence it is easy to obtain the optimum number of boosting iterations in a single run.
To evaluate the performance of the proposed system, various evaluation parameters have been considered: CPU elapsed time, Accuracy, Precision, Recall, F1-Score and Area Under Curve (AUC). The following four terms need to be considered. True Positive (TP): the observation is true and is predicted to be true. False Negative (FN): the observation is true and is predicted to be false. True Negative (TN): the observation is false and is predicted to be false. False Positive (FP): the observation is false and is predicted to be true.
CPU elapsed time is an evaluation parameter that measures processing speed. The performance of the processor is considered to be inversely proportional to the execution time. It is a suitable measure of processor speed because it depends less on other system components. It is measured in milliseconds (ms). Accuracy is defined as the ratio of the number of correct predictions to the total number of input specimens. In the case of binary classification, accuracy can also be computed in terms of positives and negatives as shown below.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is the proportion of positive identifications that are correct. It is computed as the ratio of the number of correct positive outcomes to the number of predicted positive outcomes.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall is the proportion of actual positives that are identified correctly. It is computed as the ratio of the number of correct positive outcomes to the number of all relevant specimens, that is, all specimens that should have been identified as positive.
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1-Score integrates precision and recall with respect to a certain positive class. It is computed as the harmonic mean of precision and recall as shown below. It has its best value at 1 and its worst value at 0.
$$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
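The four measures defined above can be computed directly from the TP, TN, FP and FN counts. The following sketch uses a hypothetical function name and hypothetical counts chosen only to illustrate the arithmetic.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) for a binary problem."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts: 3 true positives, 4 true negatives,
# 1 false positive and 2 false negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=3, tn=4, fp=1, fn=2)
```

For a multi-class problem such as the 100 word classes considered here, these values would typically be computed per class and then averaged; the averaging scheme is not stated in the text.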
Figure 4 illustrates an exemplary profile of the zoning features of a considered Gurumukhi word in accordance with an embodiment of the present disclosure. To extract zoning features, the word image is partitioned into zones and the features are then extricated from each zone. These features are computed from the pattern characteristics of each considered zone. After getting the foreground pixels from the n considered zones, these are normalized to the range between 0 and 1, which yields a feature set of n elements. In the proposed method, the word image is initially partitioned into 4 zones, and each of the 4 zones is then partitioned into 4 zones, resulting in a total of 16 zones. Each of the 16 zones is further partitioned into 4 zones, resulting in a total of 64 zones, and the individual features, in the form of foreground pixels, are extracted from the respective zones. One feature is taken from the whole word image. Thus, a total of 1+4+16+64 = 85 zoning features are extracted from the word image, as depicted in Figure 4.
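The hierarchical partitioning that leads to 1+4+16+64 = 85 features can be sketched as below. This is an illustration under stated assumptions: each zone is split into 2x2 quadrants, the feature of a zone is its foreground-pixel density (which lies between 0 and 1), and the function names are hypothetical.

```python
def foreground_density(img, top, left, height, width):
    """Fraction of foreground (1) pixels inside one rectangular zone."""
    total = sum(img[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)


def zoning_features(img, levels=3):
    """Densities for the whole image and for 4, 16 and 64 nested zones."""
    h, w = len(img), len(img[0])
    feats = []
    for level in range(levels + 1):
        n = 2 ** level                 # n x n grid of zones at this level
        zh, zw = h // n, w // n        # zone height and width
        for i in range(n):
            for j in range(n):
                feats.append(foreground_density(img, i * zh, j * zw, zh, zw))
    return feats


# Hypothetical 64-row by 256-column word whose top-left quadrant is all ink.
img = [[1 if r < 32 and c < 128 else 0 for c in range(256)] for r in range(64)]
feats = zoning_features(img)
```

With a 256x64 word image, the finest level consists of 64 zones of 32x8 pixels each.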
Figures 5A, 5B, and 5C illustrate a plurality of exemplary profiles of peak extent features in accordance with an embodiment of the present disclosure. Peak extent features were proposed for the recognition of offline handwritten Gurumukhi characters. In the disclosed method, the peak extent features are applied to recognize offline handwritten Gurumukhi words. To extract these features, the various zones of the word image are first considered. The features are then extricated by taking the sum of the lengths of the peak extents that fit consecutive black pixels along each zone. Peak extent features are extracted horizontally as well as vertically. For horizontal peak extent features, the sum of the lengths of the peak extents that fit consecutive black pixels horizontally in each row of a zone is taken, as shown in Figure 5B, whereas for vertical peak extent features, the sum of the lengths of the peak extents that fit consecutive black pixels vertically in each column of a zone is taken, as shown in Figure 5C. For the proposed work, the method considers 170 peak extent features (85 horizontal and 85 vertical) from the word image. Figure 5A illustrates the zoning of the bitmap word image.
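The horizontal and vertical peak extent features of one zone can be sketched as follows. The sketch assumes that the peak extent of a row (or column) is the length of its longest run of consecutive black pixels; the function names are hypothetical.

```python
def longest_run(cells):
    """Length of the longest run of consecutive black (1) pixels."""
    best = run = 0
    for px in cells:
        run = run + 1 if px else 0
        best = max(best, run)
    return best


def horizontal_peak_extent(zone):
    """Sum of the peak extents over every row of the zone."""
    return sum(longest_run(row) for row in zone)


def vertical_peak_extent(zone):
    """Sum of the peak extents over every column of the zone."""
    return sum(longest_run(col) for col in zip(*zone))


# Hypothetical 3x4 binary zone.
zone = [[1, 1, 0, 1],
        [0, 1, 1, 1],
        [0, 0, 0, 0]]
h_feature = horizontal_peak_extent(zone)   # rows contribute 2 + 3 + 0
v_feature = vertical_peak_extent(zone)     # columns contribute 1 + 2 + 1 + 2
```

Computing both values for each of the 85 zones gives the 170 peak extent features stated above.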
Figure 6 illustrates an exemplary profile of a demonstration of extreme gradient boosting in accordance with an embodiment of the present disclosure. The figure demonstrates the sequential ensemble technique described above: similar weights are initially assigned to all training samples, the weights of the records that a decision tree classifies incorrectly are updated and passed to the next decision tree, and the process continues sequentially up to the nth decision tree. After combining all the weak classifiers, the final classifier classifies the test sample based on the maximum of similar predictions by the weak classifiers, as described in Figure 6.
Figure 7 illustrates an exemplary profile of an area under curve (AUC) in accordance with an embodiment of the present disclosure. AUC is employed for binary classification problems. It indicates the probability that the classifier will rank a randomly selected positive sample higher than a randomly selected negative sample. AUC is calculated using the ROC (Receiver Operating Characteristic) curve by plotting the True Positive Rate (TPR) along the y-axis against the False Positive Rate (FPR) along the x-axis (Figure 7), which are explained below. AUC lies in the range [0,1]. The higher the value of the AUC, the better the performance of the model. True Positive Rate (TPR): TPR is the ratio of positive data points that are correctly observed as positive to all positive data points. It is also termed sensitivity.
$$TPR = \frac{TP}{FN + TP}$$
False Positive Rate (FPR): FPR is the ratio of negative data points that are wrongly observed as positive to all negative data points. It equals 1 − specificity.
$$FPR = \frac{FP}{FP + TN}$$
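The ranking interpretation of AUC given above can be checked with a small pairwise computation. The sketch below is illustrative only: the function names, scores and labels are hypothetical, and ties are counted as half a win, which is a common convention assumed here.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from the four confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)


def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for two positive and two negative samples.
area = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])   # 3 of 4 pairs ranked correctly
tpr, fpr = rates(tp=3, fn=1, fp=1, tn=3)
```

This pairwise value equals the area under the ROC curve obtained by sweeping the decision threshold over the scores.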
In an embodiment, a benchmark dataset of Gurumukhi script is used in the system. The creators of this dataset have also presented a survey on word recognition for non-Indic and Indic scripts. The dataset comprises 40,000 samples of 100 place names written by 40 different writers, where each writer has written each word 10 times. Table 1 illustrates a few samples of the dataset written by 4 different writers.
Table 1: Handwritten Gurumukhi word samples
Script Word | W1 | W2 | W3 | W4
[Handwritten word sample images not reproduced in this text.]
In an embodiment, the experiments on the proposed system are presented by considering the mentioned evaluation parameters. For performing experiments on the considered dataset, the dataset has been divided into training and testing sets using three partitioning strategies as depicted in Table 2. In strategy a, 90% of the data is placed in the training set and the remaining 10% in the testing set. Strategy b places 80% of the data in the training set and 20% in the testing set, whereas strategy c places 70% of the data in the training set and the remaining 30% in the testing set.
Table 2: Dataset Partitioning Strategies
Strategy | Training:Testing ratio | Training Set (words) | Testing Set (words)
a | 90:10 | 36,000 | 4,000
b | 80:20 | 32,000 | 8,000
c | 70:30 | 28,000 | 12,000
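The set sizes in Table 2 follow directly from the dataset size of 100 place names x 40 writers x 10 repetitions. A short check, using a hypothetical helper name, is given below.

```python
def split_sizes(total, train_percent):
    """Return (training words, testing words) for a percentage split."""
    train = total * train_percent // 100
    return train, total - train


total_words = 100 * 40 * 10   # place names x writers x repetitions
sizes = {ratio: split_sizes(total_words, ratio) for ratio in (90, 80, 70)}
```

Here `sizes` maps each training percentage to the corresponding pair of training and testing set sizes.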
Experiments on the proposed system have been performed using four feature extraction techniques, namely, zoning features, diagonal features, intersection & open-end points features and peak extent features. The feature-wise results are reported for the XGBoost technique based on six evaluation parameters, namely, CPU elapsed time, Accuracy, Precision, Recall, F1-score and AUC, as discussed below. Performance based on zoning features: using zoning features, the best accuracy, precision, recall, F1-score and AUC of 91.66%, 91.39%, 91.66%, 91.14% and 95.66%, respectively, are achieved with 90% of the data in the training set and 10% in the testing set, as shown in Table 3. Using only zoning features with the XGBoost technique, the best CPU elapsed time is 43.63 ms, based on the 80:20 partitioning strategy. Thus, the best recognition rates are reported using strategy a (90:10).
Table 3: System performance based on zoning features
Partitioning strategy | CPU Elapsed Time (ms) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%)
90:10 | 50.09 | 91.66 | 91.39 | 91.66 | 91.14 | 95.66
80:20 | 43.63 | 91.34 | 91.21 | 91.34 | 91.04 | 95.50
70:30 | 46.33 | 89.93 | 90.01 | 89.93 | 89.84 | 94.76
Performance based on diagonal features: using diagonal features as an input to the XGBoost technique, the method attained the best accuracy and recall rates of 91.30% and an AUC of 95.47% with 90% of the data in the training set and 10% in the testing set. The best precision and F1-score of 91.03% and 90.88%, respectively, are reported for the 80:20 ratio of training and testing sets, as depicted in Table 4. The best CPU elapsed time reported is 42.77 ms using the 70:30 partitioning strategy.
Table 4: System performance based on diagonal features
Partitioning strategy | CPU Elapsed Time (ms) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%)
90:10 | 51.60 | 91.30 | 90.95 | 91.30 | 90.73 | 95.47
80:20 | 46.86 | 91.18 | 91.03 | 91.18 | 90.88 | 95.41
70:30 | 42.77 | 90.00 | 90.06 | 90.00 | 89.92 | 94.80
Performance based on intersection & open-end points features: using intersection & open-end points features with the XGBoost technique, the maximum accuracy and recall rates are 88.37%, and the precision and AUC are 88.40% and 93.94%, respectively, using the 90:10 partitioning strategy. The best F1-score of 87.95% has been reported for the 80:20 partitioning strategy, as illustrated in Table 5. The method attained the best CPU elapsed time of 57.20 ms with 70% of the data in the training set and 30% in the testing set.
Table 5: System Performance based on intersection & open-end points features
Partitioning strategy | CPU Elapsed Time (ms) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%)
90:10 | 73.51 | 88.37 | 88.40 | 88.37 | 87.87 | 93.94
80:20 | 67.28 | 88.31 | 88.24 | 88.31 | 87.95 | 93.91
70:30 | 57.20 | 87.22 | 87.54 | 87.22 | 87.21 | 93.35
Performance based on peak extent features: using peak extent features, the maximum accuracy and recall rates of 86.27%, a precision rate of 86.13%, an F1-score of 85.73% and an AUC of 92.85% have been attained using the 90:10 partitioning strategy. The best CPU elapsed time of 57.23 ms is reported using the 70:30 partitioning strategy, as shown in Table 6.
Table 6: System Performance based on peak extent features
Partitioning strategy | CPU Elapsed Time (ms) | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | AUC (%)
90:10 | 79.16 | 86.27 | 86.13 | 86.27 | 85.73 | 92.85
80:20 | 64.82 | 84.44 | 84.53 | 84.44 | 84.26 | 91.90
70:30 | 57.23 | 83.36 | 83.27 | 83.36 | 83.12 | 91.34
In an embodiment, the proposed work is compared with state-of-the-art work. Various machine learning approaches like HMM, k-NN, MLP, SVM and DCNN and various feature dimensionality reduction approaches such as GA, PSO and HS have been utilized to recognize handwritten words of various scripts. These state-of-the-art approaches have been compared with the proposed approach in terms of recognition accuracy, as depicted in Table 7. Due to the non-availability of word recognition approaches in Gurumukhi script, the proposed approach has also been compared with character recognition approaches in Gurumukhi script, as presented in Table 8.
Table 7: Comparison of the proposed work with existing methodologies
Authors | Dataset | Feature extraction approach | Classification approach | Recognition accuracy
Kessentini et al. | (i) IFN/ENIT, (ii) IRONOFF | Density and contour-based features | HMM | (i) 79.8%, (ii) 89.8%
Patel et al. | 300 English word samples of 30 district names of Karnataka state | Structural features | k-NN | 90%
Das et al. | 1020 Bangla handwritten words | HS based feature selection approach | MLP | 90.29%
Assayony and Mahmoud | CENPARMI | Gabor filters integrated with Bag-of-features | SVM | 86.44%
Gupta et al. | (i) CENPARMI, (ii) ISIHWD, (iii) IAM200 | Arnold transform based features, curvature-based features and DCNN based features | SVM | (i) 95.23%, (ii) 97.16%, (iii) 95.07%
Tavoli et al. | (i) Iran-cities, (ii) IFN/ENIT, (iii) IBN SINA | SGCSL | SVM | (i) 67.47%, (ii) 80.78%, (iii) 86.22%
Arani et al. | Iranshahr 3 | Image gradient, black-white transitions, and contour chain code features | HMM and MLP | 89.06%
Ghosh et al. | 7500 Bangla handwritten words | Gradient-based features and modified SCF; MA based wrapper filter selection approach | MLP | 89.67% (without feature selection), 93% (with feature selection)
Proposed Approach | 40,000 Gurumukhi handwritten words | (i) zoning features, (ii) diagonal features, (iii) intersection & open-end points, (iv) peak extent features | XGBoost | (i) 91.66%, (ii) 91.30%, (iii) 88.37%, (iv) 86.27%
Table 8: Comparison of the proposed work with existing Gurumukhi character recognition methodologies
Authors | Dataset in Gurumukhi script (character samples) | Feature extraction approach | Classification approach | Recognition accuracy
Kumar et al. | 7000 | Horizontal peak extent, vertical peak extent, shadow, centroid features | (i) Linear-SVM, (ii) k-NN, (iii) MLP | (i) 95.62%, (ii) 95.48%, (iii) 94.74%
Kumar et al. | 3500 | Horizontal peak extent, vertical peak extent, diagonal, and centroid features; feature selection approaches: CFS, PCA and CON | SVM | 91.80% (PCA)
Kumar et al. | 3500 | Power curve fitting and parabola curve fitting-based features | (i) SVM, (ii) k-NN | (i) 97.14%, (ii) 98.10%
Kumar et al. | 7000 | Features based on boundary extent of the character sample; PCA | (i) k-NN, (ii) SVM, (iii) MLP | 93.8% (RBF SVM)
Kumar et al. | 10,500 | DCT2, DWT2, fast Fourier transformations and fan beam transformations | SVM | 95.8% (DCT2)
Proposed Approach | 40,000 Gurumukhi handwritten words | (i) zoning features, (ii) diagonal features, (iii) intersection & open-end points, (iv) peak extent features | XGBoost | (i) 91.66%, (ii) 91.30%, (iii) 88.37%, (iv) 86.27%
The comparison with existing approaches to word recognition supports the following inferences. On a benchmark dataset of Gurumukhi script, the proposed method achieved its best word recognition rate of 91.66% using zoning features, which surpasses the recognition rates reported for several state-of-the-art methods in other scripts. It attained recognition rates of 91.30%, 88.37% and 86.27% with diagonal features, intersection & open-end points features and peak extent features, respectively. The proposed method attained a higher recognition rate than the approach of Ghosh et al. when the latter used a hybrid of features without feature selection (89.67%); with feature selection, however, that approach (93%) surpassed the proposed one. Even without feature selection, the proposed approach compares well with Das et al., who used an HS based feature selection approach to reduce a set of 65 elliptical features to 48 features and raised the Bangla handwritten word recognition rate from 81.37% to 90.29%. Kumar et al. reported a recognition rate of 91.80% for Gurumukhi handwritten characters using a PCA feature selection approach, which is very close to the rate attained by the proposed method. Thus, the proposed approach may achieve a better recognition rate with feature selection approaches, which remains an area for further research. Gupta et al. achieved the best recognition rates of 95.23%, 97.16% and 95.07% on three public datasets, namely CENPARMI, ISIHWD and IAM200, respectively. The spike in their recognition results is due to the amalgamation of the outputs of SVM classifiers based on three feature types, namely Arnold transformation-based features, curvature-based features and DCNN based features.
The same amalgamation approach can be applied to the proposed word recognition system in the future. Without the hybrid approach, however, the proposed system performs well relative to the former approach when that approach is tested on Arnold transformation-based features and curvature-based features individually. As shown in Table 8, SVM is the most frequently used classifier for character recognition in Gurumukhi script, and it achieved its best result with the power curve fitting-based features. The comparative analysis indicates that the XGBoost technique provides an effective machine learning model for recognizing handwritten words of Gurumukhi script. The technique performed well with each of the feature sets considered and gave the best result with zoning features under a 90:10 partitioning strategy, in which 90% of the data is used to train the model and the remaining 10% is used to test it.
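The 90:10 partitioning strategy described above can be illustrated with a short sketch. It is not the authors' code: the function name `stratified_split`, the per-class (stratified) splitting, the fixed seed and the balanced layout of 400 samples for each of the 100 classes are all assumptions made for illustration. The source states only the totals (40,000 words, 100 classes, 36,000 training and 4,000 testing words).

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.9, seed=0):
    """Return (train_idx, test_idx), splitting each class by train_fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return train_idx, test_idx


# Hypothetical label vector: 100 classes x 400 samples = 40,000 words.
# The balanced class sizes are an assumption, not a figure from the source.
labels = [cls for cls in range(100) for _ in range(400)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps every place name represented in both partitions, which matters when each of the 100 classes has a limited number of samples.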
The advantages and disadvantages of the proposed approach are set out below as an indicator for further research in the field of word recognition. Advantages of the proposed approach: The proposed approach extracts features from the complete word image without segmenting the word image into its primitive components (characters). It thus avoids the limitations that touching characters, overlapping characters, cursive writing styles and the like impose on segmentation-based approaches. A maximum recognition accuracy of 91.66% has been attained, which is significant in comparison with the state-of-the-art work listed in Tables 7 and 8. To date, no postal automation system exists for Gurumukhi script. The proposed work is an endeavor in this direction, as it recognizes handwritten place names in Gurumukhi script. The approach can be applied to other scripts by training the proposed model on a dataset of that script.
Disadvantages of the proposed method: The training time of XGBoost depends on the number of classes present in the dataset. The proposed approach has been evaluated on 100 classes, so XGBoost training time is comparatively high in this work. The approach may fail for overlapped and touching characters within a word, and it may not work for overlapping words. It has been tested on a limited dataset comprising only 40,000 word samples; the results may improve with a larger training dataset.
Inferences and future directions: A holistic approach to offline handwritten Gurumukhi word recognition based on the XGBoost technique is proposed. Several state-of-the-art features, namely zoning features, diagonal features, intersection & open-end point-based features and peak extent features, are given as input to the XGBoost technique to test the efficacy of the proposed system. The proposed approach finds application in postal automation because it recognizes the handwritten place names of Gurumukhi script present in the public benchmark dataset. Among all the features considered, the zoning features provide the best accuracy and recall rate of 91.66%, a precision rate of 91.39%, an F1-score of 91.14% and an AUC of 95.66%, based on 36,000 training and 4,000 testing words handwritten in Gurumukhi script. For all features except zoning, the least CPU elapsed time is reported for the split with 70% training and 30% testing data. The comparison of the proposed approach with existing approaches in character and word recognition indicates the efficiency of the XGBoost technique for the considered features. In the future, several combinations of the features can be tested as input to the XGBoost technique to assess its efficiency on hybrid features. Moreover, different feature selection techniques can be applied to reduce the dimensionality of the original feature set in order to lessen the burden of classification and to enhance system performance. The proposed approach can also be applied to other North Indian scripts, such as Devanagari, that have a structure similar to that of the Gurumukhi script.
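The evaluation parameters quoted above can be computed as in the sketch below. The source does not say how the per-class precision, recall and F1-score were averaged over the 100 classes, so the support-weighted averaging used here is an assumption. One property makes it a plausible reading: support-weighted recall always equals accuracy, which is consistent with both being reported as 91.66%. The function name `weighted_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1-score."""
    n = len(y_true)
    support = Counter(y_true)      # true samples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        p = correct[cls] / predicted[cls] if predicted[cls] else 0.0
        r = correct[cls] / count
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = count / n         # class share of the test set
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy three-class example (hypothetical data, not from the source).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 2]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Under macro (unweighted) averaging, recall and accuracy would coincide only when every class has the same number of test samples.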
In an embodiment, handwritten word recognition is a challenging task because writing styles vary widely between individuals. Considerable effort has therefore gone into recognizing handwritten words with efficient classifiers based on extracted features that rely on the visual appearance of the handwritten text. Owing to its numerous real-time applications, handwritten word recognition is an important research area that has attracted considerable attention from researchers over the last 10 years. A holistic approach using the eXtreme Gradient Boosting (XGBoost) technique is proposed to recognize offline handwritten Gurumukhi words. To this end, four state-of-the-art feature types, namely zoning, diagonal, intersection & open-end points and peak extent features, are used to extract discriminant features from digital images of handwritten words. The proposed approach is evaluated on a public benchmark dataset of Gurumukhi script that comprises 40,000 samples of handwritten words. Based on the extracted features, each word is classified into one of 100 classes using the XGBoost technique. The effectiveness of the system is assessed using several evaluation parameters: CPU elapsed time, accuracy, precision, recall, F1-score and AUC (Area Under Curve). The XGBoost technique attained its best results of accuracy (91.66%), recall (91.66%), precision (91.39%), F1-score (91.14%) and AUC (95.66%) using zoning features, with 90% of the data as the training set and the remaining 10% as the testing set. A comparison of the proposed approach with existing approaches has also been carried out and indicates that the XGBoost technique performs competitively.
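As an illustration of the zoning features named above, the sketch below computes foreground-pixel densities over a hierarchical partition of a binary word image. The 256x64 image size and the successive split into 4, 16 and 64 zones come from the claims. Including the whole image as an extra zone (giving 1 + 4 + 16 + 64 = 85 values, the count the claims cite for diagonal and peak extent features) and using density as the per-zone value are assumptions. The function name `zoning_features` is hypothetical.

```python
def zoning_features(image, levels=3):
    """Foreground density of every zone in a hierarchical partition.

    image: list of rows, each a list of 0/1 pixels (1 = foreground).
    Level 0 is the whole image; each further level halves every zone
    along both axes, giving 1, 4, 16 and 64 zones for levels 0 to 3.
    """
    height, width = len(image), len(image[0])
    features = []
    for level in range(levels + 1):
        parts = 2 ** level                       # zones per axis
        zone_h, zone_w = height // parts, width // parts
        for zr in range(parts):
            for zc in range(parts):
                rows = range(zr * zone_h, (zr + 1) * zone_h)
                cols = range(zc * zone_w, (zc + 1) * zone_w)
                ink = sum(image[r][c] for r in rows for c in cols)
                # Dividing by the zone area keeps each value in [0, 1].
                features.append(ink / (zone_h * zone_w))
    return features


# Hypothetical 64-row by 256-column word image with one horizontal stroke.
image = [[0] * 256 for _ in range(64)]
image[32] = [1] * 256
features = zoning_features(image)
print(len(features))
```

Because the image dimensions are powers of two, every level divides the image evenly; other sizes would need padding or a different zone-boundary rule.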
The drawings and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.
Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of embodiments is at least as broad as given by the following claims.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.

Claims (10)

WE CLAIM
1. A method for word recognition for postal automation using XGBoost, the method comprising:
digitizing documents upon converting offline handwritten documents into a digital form using a scanner for scanning the documents to generate a bitmap image of the documents, wherein the digitized image is stored in the form of bits;
pre-processing the digitized image for reducing variations in writing styles of the documents by means of three basic pre-processing operations including binarization to create the binary image, normalization to provide uniformity to the words and thinning operations to minimize the text width from multiple pixels to a single pixel;
extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with high recognition rate; and
classifying extracted features and thereby recognizing words using classification phase upon determining class of the words using extreme gradient boosting technique.
2. The method as claimed in claim 1, wherein to create the binary image, the threshold constant is positioned between higher and lower values which equate to white and black pixels, respectively.
3. The method as claimed in claim 1, wherein a process to provide uniformity to the words comprises:
slicing the words from the binarized image of the document; cropping each word to eliminate the white space around it; and normalizing the words to a uniform size of 256x64.
4. The method as claimed in claim 1, wherein a process for extracting zoning features comprises: partitioning the word image into at least four zones and then further partitioning each of the at least four zones into four zones, resulting in a total of 16 zones, wherein each of the 16 zones is further partitioned into 4 zones, resulting in a total of 64 zones; extracting features from each zone and computing features from pattern characteristics of each considered zone; and normalizing features to between 0 and 1 after getting foreground pixels from the n considered zones.
5. The method as claimed in claim 1, wherein a process for extracting diagonal features comprises:
partitioning the word image into zones; extracting features from the foreground pixels along the diagonal of each considered zone, wherein multiple values are available along the diagonal of each zone; taking the average of the multiple values to acquire a single value corresponding to each zone; and extracting 85 diagonal features after partitioning the word image into zones.
6. The method as claimed in claim 1, wherein a process for extracting peak extent features comprises:
considering various zones of the word image and extracting features from the sum of the lengths of the peak extents that fit consecutive black pixels along each zone, wherein peak extent features are extracted horizontally as well as vertically, wherein for horizontal peak extent features the sum of the lengths of the peak extents that fit consecutive black pixels horizontally in each row of a zone is taken, whereas for vertical peak extent features the sum of the lengths of the peak extents that fit consecutive black pixels vertically in each column of a zone is taken, and wherein 170 peak extent features (85 horizontal and 85 vertical) are extracted from the word image.
7. The method as claimed in claim 1, wherein the gradient boosting technique boosts weak classifiers (learners) and thereafter generates a predictive model in the form of an ensemble of weak learners, wherein initially an equal weight is assigned to all training samples, the weight specifying the probability of a record being selected by the decision tree for training, so that all records are equally likely to be selected, wherein after training the model makes predictions and the weights of the records it classifies incorrectly are updated and fed to the second decision tree, wherein the records with maximum weight are selected for training the second decision tree, wherein the weights updated for wrongly predicted records are similarly passed to the next decision tree and the process continues sequentially, one tree after another, up to the nth decision tree, and wherein combining all the weak classifiers generates the final classifier, which outputs the final class of the record.
8. The method as claimed in claim 1, wherein the various machine learning approaches like HMM, k-NN, MLP, SVM, DCNN and various feature dimensionality reduction approaches such as GA, PSO and HS are utilized to recognize handwritten words of various scripts.
9. A system for word recognition for postal automation using XGBoost, the system comprising:
a scanner for scanning and thereby digitizing documents upon converting offline handwritten documents into a digital form to generate a bitmap image of the documents, wherein the digitized image is stored in the form of bits;
a pre-processing module for pre-processing the digitized image for reducing variations in writing styles of the documents by means of three basic pre-processing operations including binarization to create the binary image, normalization to provide uniformity to the words and thinning operations to minimize the text width from multiple pixels to a single pixel;
a feature extraction module for extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with high recognition rate; and
a classification module in connection with the feature extraction module for classifying extracted features and thereby recognizing words using classification phase upon determining class of the words using extreme gradient boosting technique.
10. The system as claimed in claim 1, comprises a training module for training the system using a training dataset for predicting the words, wherein time spent on training is based on number of classes present in the dataset.
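The reweighting loop recited in claim 7 (equal initial weights, then larger weights for misclassified records before the next tree is trained) can be sketched as a single update step. The claim's wording matches how AdaBoost-style boosting is usually described, whereas XGBoost itself fits each new tree to gradients of a loss rather than reweighting samples. The sketch therefore illustrates the claim language only and uses the standard binary AdaBoost update as an assumption. The function name `reweight` and the toy labels are hypothetical.

```python
import math


def reweight(weights, y_true, y_pred):
    """One AdaBoost-style round: raise the weights of misclassified records.

    Returns the normalised weights and the learner's vote (alpha).
    """
    wrong = [t != p for t, p in zip(y_true, y_pred)]
    error = sum(w for w, bad in zip(weights, wrong) if bad) / sum(weights)
    # Clamp so a perfect or fully wrong learner does not divide by zero.
    error = min(max(error, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - error) / error)

    updated = [w * math.exp(alpha if bad else -alpha)
               for w, bad in zip(weights, wrong)]
    total = sum(updated)
    return [w / total for w in updated], alpha


# Ten records with equal initial weight; the first learner gets two wrong.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
weights = [1 / len(y_true)] * len(y_true)
new_weights, alpha = reweight(weights, y_true, y_pred)
print(new_weights, alpha)
```

After the update the two misclassified records carry half of the total weight, so the next learner is pushed to concentrate on them.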
Figure 1: Block diagram of the system, showing Scanner 102, Pre-processing Module 104, Feature Extraction Module 106, Classification Module 108 and Training Module 110.
Figure 2: Flowchart of the method.
202: digitizing documents upon converting offline handwritten documents into a digital form using a scanner for scanning the documents to generate bitmap image of the documents, wherein the digitized image is stored in the form of bits
204: pre-processing the digitized image for reducing variations in writing styles of the documents by means of three basic pre-processing operations including binarization to create the binary image, normalization to provide uniformity to the words and thinning operations to minimize the text width from multiple pixels to a single pixel
206: extracting zoning features, diagonal features, intersection & open-end points features and peak extent features separately from the pre-processed digitized image in a holistic way in order to get the desired feature database for recognizing word images with high recognition rate
208: classifying extracted features and thereby recognizing words using classification phase upon determining class of the words using extreme gradient boosting technique
Figure 3 (reference numerals 304, 306 and 308)
Figure 4
Figures 5A, 5B and 5C
Figure 6
Figure 7
AU2021100089A 2021-01-07 2021-01-07 A method to word recognition for the postal automation and a system thereof Ceased AU2021100089A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2021100089A AU2021100089A4 (en) 2021-01-07 2021-01-07 A method to word recognition for the postal automation and a system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2021100089A AU2021100089A4 (en) 2021-01-07 2021-01-07 A method to word recognition for the postal automation and a system thereof

Publications (1)

Publication Number Publication Date
AU2021100089A4 (en) 2021-04-01

Family

ID=75267730

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2021100089A Ceased AU2021100089A4 (en) 2021-01-07 2021-01-07 A method to word recognition for the postal automation and a system thereof

Country Status (1)

Country Link
AU (1) AU2021100089A4 (en)

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry