SE2251012A1 - System and method for form-filling by character recognition of identity documents - Google Patents

System and method for form-filling by character recognition of identity documents

Info

Publication number
SE2251012A1
Authority
SE
Sweden
Prior art keywords
module
image
character recognition
user
text
Prior art date
Application number
SE2251012A
Inventor
Unadkat Chetan Harshadkumar
Tomar Manvendra Singh
Rishab Bhattacharjee
Narayana Bulusu Sita Rama
Original Assignee
Seamless Distrib Systems Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seamless Distrib Systems Ab filed Critical Seamless Distrib Systems Ab
Priority to SE2251012A priority Critical patent/SE2251012A1/en
Publication of SE2251012A1 publication Critical patent/SE2251012A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/182 Extraction of features or characteristics of the image by coding the contour of the pattern

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

A system for form-filling by character recognition of identity document(s) is disclosed. The system includes a processing subsystem with an image capture assistance module, which receives a signal corresponding to a mode of scanning selected by a first user and an image of the identity document(s) affiliated with a second user upon scanning, and performs adjustment(s) and detection of an area of interest in the image; a text cropping module, which detects text area contour(s) from the area of interest and generates sub-image(s) by cropping a text area from the image; a character recognition module, which performs dynamic quantization on the sub-image(s) to convert them to binary image(s) that are fed to a character recognition engine, and receives the corresponding text; and a form filling module, which maps the text to field(s) in a preferred form, thereby form-filling the preferred form while avoiding any scope for tampering with the information.

Description

SYSTEM AND METHOD FOR FORM-FILLING BY CHARACTER RECOGNITION OF IDENTITY DOCUMENTS

BACKGROUND

[0001] Embodiments of the present disclosure relate to a character recognition system, and more particularly to a system and a method for form-filling by character recognition of one or more identity documents.
[0002] Filling of forms is generally done for collecting relevant and required identity-related information from an applicant for accomplishing a preferred activity such as purchasing a Subscriber Identity Module (SIM) card, performing a banking-related activity, performing a legal activity, and the like. It is typically difficult for a machine to comprehend the language and other information on identity documents because they are designed to be human-readable. Some documents have text information for humans and a QR code for machines as part of their hybrid reading design. However, as the majority of documents lack this feature, data capture frequently depends on the accuracy of the person inputting the information from the identity document, or requires an additional layer of data validation against a photocopy of the original document. Otherwise, the information can be tampered with while the form is being filled out.
[0003] Further, there are multiple systems that can be used for recognizing text from identity documents. However, such systems need expensive servers and high-bandwidth connections between server and client to process images and extract text, thereby making the process time-consuming.
[0004] Hence, there is a need for an improved system and method for form-filling by character recognition of one or more identity documents in order to address the aforementioned issues.

BRIEF DESCRIPTION
[0005] In accordance with an embodiment of the present disclosure, a system for form-filling by character recognition of one or more identity documents is disclosed. The system includes a processing subsystem. The processing subsystem is configured to execute on a network to control bidirectional communications among a plurality of modules. The processing subsystem includes an image capture assistance module configured to receive a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish a preferred activity, upon registration. The image capture assistance module is also configured to receive an image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning. Further, the image capture assistance module is also configured to perform one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning models, based on the mode of scanning selected by the first user. The processing subsystem also includes a text cropping module operatively coupled to the image capture assistance module. The text cropping module is configured to detect one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection machine learning model. The text cropping module is also configured to generate one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours. Further, the processing subsystem also includes a character recognition module operatively coupled to the text cropping module.
The character recognition module is configured to perform dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images. The character recognition module is also configured to feed the one or more binary images to a character recognition engine. The character recognition engine includes a character recognition machine learning model trained to recognize one or more characters in one or more predefined languages using machine learning. Further, the character recognition module is also configured to receive a text for each of the one or more sub-images from the character recognition engine. Furthermore, the processing subsystem also includes a form filling module operatively coupled to the character recognition module. The form filling module is configured to map the text received by the character recognition module to one or more fields in a preferred form for form-filling the preferred form while avoiding any scope for tampering with the information when form-filling.
[0006] In accordance with another embodiment of the present disclosure, a method for form-filling by character recognition of one or more identity documents is disclosed.
The method includes receiving a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish a preferred activity, upon registration. The method also includes receiving an image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning. Further, the method also includes performing one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning models, based on the mode of scanning selected by the first user. Furthermore, the method also includes detecting one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection machine learning model. The method further includes generating one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours. Moreover, the method also includes performing dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images. In addition, the method also includes feeding the one or more binary images to a character recognition engine, wherein the character recognition engine includes a character recognition machine learning model trained to recognize one or more characters in one or more predefined languages using machine learning. The method also includes receiving a text for each of the one or more sub-images from the character recognition engine.
Further, the method includes mapping the text received by the character recognition module to one or more fields in a preferred form for form-filling the preferred form while avoiding any scope for tampering with the information when form-filling.
[0007] To further clarify the advantages and features of the present disclosure, a more particular description of the disclosure will follow by reference to specific embodiments thereof, which are illustrated in the appended figures. It is to be appreciated that these figures depict only typical embodiments of the disclosure and are therefore not to be considered limiting in scope. The disclosure will be described and explained with additional specificity and detail with the appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] The disclosure will be described and explained with additional specificity and detail with the accompanying figures, in which:
[0009] FIG. 1 is a block diagram of a system for form-filling by character recognition of one or more identity documents in accordance with an embodiment of the present disclosure;
[0010] FIG. 2 is a schematic representation of an exemplary embodiment of a system for form-filling by character recognition of one or more identity documents of FIG. 1 in accordance with an embodiment of the present disclosure;
[0011] FIG. 3 is a block diagram of a character recognition computer or a character recognition server in accordance with an embodiment of the present disclosure;
[0012] FIG. 4 (a) is a flow chart representing the steps involved in a method for form-filling by character recognition of one or more identity documents of FIG. 1 in accordance with an embodiment of the present disclosure; and
[0013] FIG. 4 (b) is a flow chart representing the continued steps involved in the method of FIG. 4 (a) in accordance with an embodiment of the present disclosure.
[0014] Further, those skilled in the art will appreciate that elements in the figures are illustrated for simplicity and may not necessarily have been drawn to scale. Furthermore, in terms of the construction of the device, one or more components of the device may have been represented in the figures by conventional symbols, and the figures may show only those specific details that are pertinent to understanding the embodiments of the present disclosure, so as not to obscure the figures with details that will be readily apparent to those skilled in the art having the benefit of the description herein.

DETAILED DESCRIPTION
[0015] For the purpose of promoting an understanding of the principles of the disclosure, reference will now be made to the embodiment illustrated in the figures, and specific language will be used to describe them. It will nevertheless be understood that no limitation of the scope of the disclosure is thereby intended. Such alterations and further modifications in the illustrated system, and such further applications of the principles of the disclosure as would normally occur to those skilled in the art, are to be construed as being within the scope of the present disclosure.
[0016] The terms "comprises", "comprising", or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps but may include other steps not expressly listed or inherent to such a process or method. Similarly, one or more devices or sub-systems or elements or structures or components preceded by "comprises... a" does not, without more constraints, preclude the existence of other devices, sub-systems, elements, structures, components, additional devices, additional sub-systems, additional elements, additional structures or additional components. Appearances of the phrases "in an embodiment", "in another embodiment" and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.
[0017] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art to which this disclosure belongs. The system, methods, and examples provided herein are only illustrative and not intended to be limiting.
[0018] In the following specification and the claims, reference will be made to a number of terms, which shall be defined to have the following meanings. The singular forms “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise.
[0001] Embodiments of the present disclosure relate to a system and a method for form-filling by character recognition of one or more identity documents. Generally, form filling is done to collect relevant and required identity-related information from an applicant for accomplishing a preferred activity. In one embodiment, the preferred activity may be purchasing a Subscriber Identity Module (SIM) card, performing a banking-related activity, performing a legal activity, and the like. Further, as used herein, the term “character recognition” refers to a process which allows computers to recognize written or printed characters such as numbers or letters and to change them into a form that the computer can use. Also, as used herein, the term “identity document” refers to a document issued by a state authority to an individual for providing evidence of the identity of that individual. In one embodiment, the one or more identity documents may include a driver's license, an identity card, a passport, an Aadhaar card, or the like. Further, the system described hereafter in FIG. 1 is the system for form-filling by character recognition of the one or more identity documents.
[0019] FIG. 1 is a block diagram of a system 100 for form-filling by character recognition of one or more identity documents in accordance with an embodiment of the present disclosure. The system 100 includes a processing subsystem 105. In one embodiment, the processing subsystem 105 may be hosted on a server. In one embodiment, the server may include a cloud server. In another embodiment, the server may include a local server. The processing subsystem 105 is configured to execute on a network (not shown in FIG. 1) to control bidirectional communications among a plurality of modules. In one embodiment, the network may include a wired network such as a local area network (LAN). In another embodiment, the network may include a wireless network such as wireless fidelity (Wi-Fi), Bluetooth, Zigbee, near field communication (NFC), infrared communication, or the like.
[0020] Basically, for filling a preferred form by performing character recognition on the one or more identity documents, an image of the corresponding one or more identity documents in good quality may be needed. Therefore, a user authorized for receiving the preferred form upon filling, so that a preferred activity is accomplished, may use the system 100 for doing the same. Further, for assisting the user in capturing an image of good quality, the processing subsystem 105 includes an image capture assistance module 110. The image capture assistance module 110 is configured to receive a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish the preferred activity, upon registration. The image capture assistance module 110 is also configured to receive the image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning. Further, the image capture assistance module 110 is also configured to perform one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning (ML) models, based on the mode of scanning selected by the first user.
[0002] Moreover, for the first user to use the system 100, the first user may have to register with the system 100. Thus, in an embodiment, the processing subsystem 105 may include a registration module (as shown in FIG. 2). The registration module may register the first user upon receiving a plurality of first user details via a first user device. In one embodiment, the plurality of first user details may include at least one of a name, an address, contact details, and the like corresponding to the first user. The plurality of first user details may be stored in a database of the system 100. In one exemplary embodiment, the database may be a local database or a cloud database. Also, in an embodiment, the first user device may be a mobile phone, a tablet, a laptop, or the like.
[0003] In one embodiment, the first user may be the user authorized for receiving the preferred form upon filling, so that the preferred activity is accomplished. In a further embodiment, the second user may be a person willing to get the preferred activity accomplished by the first user, upon providing the one or more identity documents associated with the second user. Further, in an embodiment, the mode of scanning may include a user-controlled mode. In another embodiment, the mode of scanning may include a system-controlled mode.
[0004] In the user-controlled mode, the first user may receive guidance from the system 100 for capturing the image correctly; however, the capture action may be performed by the first user via the first user device. The first user device may be equipped with a camera. In such an embodiment, the one or more adjustments performed by the system 100 via the image capture assistance module 110 may include at least one of an orientation correction, a corner detection, and the like. The orientation correction may correspond to correcting an orientation of the image to a desired orientation needed by one or more downstream systems. The correction may include a minor correction of the orientation with respect to the x and y axes as well as a major correction in multiples of 90°. Further, the corner detection may be performed on the resultant image for obtaining the boundaries of the image. Fine adjustments of the boundaries may be done by the first user. Furthermore, the whole process is repeated by the first user for the backside of the corresponding one or more identity documents.
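The two-stage orientation correction described above (a major correction in multiples of 90° combined with a minor correction about the x and y axes) can be sketched as follows. This is only an illustration under the assumption that the orientation error has already been estimated as a single angle; `decompose_rotation` is a hypothetical helper name, not taken from the disclosure:

```python
def decompose_rotation(angle_deg):
    """Split an estimated orientation error (in degrees) into a coarse
    correction that is a multiple of 90 degrees and a fine residual
    skew in the range [-45, 45] degrees."""
    a = angle_deg % 360.0
    nearest = round(a / 90.0) * 90   # nearest multiple of 90
    fine = a - nearest               # minor residual skew to deskew
    coarse = nearest % 360           # major multiple-of-90 rotation
    return coarse, fine
```

For example, an image estimated to be 95° off would receive a 90° coarse rotation followed by a 5° fine deskew.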
[0005] In the system-controlled mode, the first user may only hold the camera of the first user device in front of the corresponding one or more identity documents. The system 100 detects the corners of the corresponding one or more identity documents, captures the image between the detected coordinates, corrects the orientation, and prompts the first user to rotate the corresponding one or more identity documents. Once the first user flips the corresponding one or more identity documents, the system 100 repeats the corner detection, capture, and correction on the backside as well. During the corner detection process, the system 100 intelligently predicts the position of the fourth corner if the system 100 is able to detect three unique corners with decent accuracy.
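The fourth-corner prediction can be illustrated with plain geometry: for a roughly rectangular document, the two detected corners that are farthest apart lie on a diagonal, and the missing corner completes the parallelogram with the remaining corner. The disclosure does not specify the formula, so this is a sketch of one plausible approach, and `predict_fourth_corner` is a hypothetical name:

```python
from itertools import combinations

def predict_fourth_corner(c1, c2, c3):
    """Predict the missing corner of a rectangle from three detected
    (x, y) corners.  The two known corners farthest apart are diagonal
    to each other; the missing corner is their parallelogram completion
    with the remaining (opposite) corner."""
    pts = [c1, c2, c3]
    a, b = max(combinations(pts, 2),
               key=lambda p: (p[0][0] - p[1][0]) ** 2 + (p[0][1] - p[1][1]) ** 2)
    opposite = next(p for p in pts if p is not a and p is not b)
    return (a[0] + b[0] - opposite[0], a[1] + b[1] - opposite[1])
```

Given three corners of a card at (0, 0), (4, 0), and (0, 3), the predicted fourth corner is (4, 3).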
[0006] Further, the one or more adjustments and the detection of the area of interest in the image of the one or more identity documents are performed using the one or more image-related ML models. As used herein, the term “machine learning” is defined as a type of artificial intelligence (AI) that allows applications to become more accurate at predicting outcomes without being explicitly programmed to do so. In one embodiment, the one or more image-related ML models may include a corner detection ML model, an orientation correction ML model, and the like.
[0007] In such an embodiment, the one or more adjustments performed by the system 100 via the image capture assistance module 110 may also include a document type identification, a boundary adjustment, a front-and-back identification, and the like. Therefore, in an embodiment, the one or more image-related ML models may further include a document type identification ML model, a boundary adjustment ML model, and the like. Moreover, in an embodiment, the processing subsystem 105 may also include a front-and-back identification sub-module (as shown in FIG. 2). The front-and-back identification sub-module may be configured to generate a front-and-back classification ML model by performing a front-and-back classification training with a plurality of identity-related documents using ML. The plurality of identity-related documents may include a corresponding plurality of front sides and a corresponding plurality of back sides having one or more predetermined differences between each other. The front-and-back identification sub-module may also be configured to identify a front side and a back side of the one or more identity documents using the front-and-back classification ML model in real time, based on the mode of scanning selected by the first user.
[0021] Subsequently, the processing subsystem 105 also includes a text cropping module 120 operatively coupled to the image capture assistance module 110. The text cropping module 120 is configured to detect one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection ML model. The text cropping module 120 is also configured to generate one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours. During this process, the system 100 essentially identifies the areas that contain text and eliminates all other irrelevant areas. Then, respective one or more sub-images are generated for each text, each word, each line, or the like.
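The cropping step itself is straightforward once the text detection ML model has reduced each contour to an axis-aligned bounding box. A minimal sketch, under two assumptions the disclosure does not fix: the image is held as a list of pixel rows, and each detected text area is given as an (x, y, width, height) box:

```python
def crop_text_areas(image, boxes):
    """Generate one sub-image per detected text area.  `image` is a list
    of pixel rows; each box is (x, y, w, h) in pixel coordinates derived
    from a text area contour, so irrelevant areas are simply discarded."""
    return [[row[x:x + w] for row in image[y:y + h]] for (x, y, w, h) in boxes]
```

Each returned sub-image can then be handed to the character recognition module independently (per word, per line, and so on).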
[0022] Further, the processing subsystem 105 also includes a character recognition module 125 operatively coupled to the text cropping module 120. The character recognition module 125 is configured to perform dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images. The one or more binary images may include one bit per pixel, with information of 0 or 1 representing black or white pixels respectively. The character recognition module 125 is also configured to feed the one or more binary images to a character recognition engine. The character recognition engine includes a character recognition ML model trained to recognize one or more characters in one or more predefined languages using ML. In one embodiment, the one or more predefined languages that the character recognition ML model is trained with may include at least one of English, Arabic, French, and the like. Further, the character recognition module 125 is also configured to receive a text for each of the one or more sub-images from the character recognition engine.
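The disclosure does not define “dynamic quantization” precisely; one simple adaptive reading is to threshold each sub-image at its own mean intensity, so the binarization adapts to each sub-image's contrast rather than using a fixed global threshold. A sketch under that assumption, producing one bit per pixel with 0 for black and 1 for white as described:

```python
def binarize(sub_image):
    """Convert a grayscale sub-image (rows of 0-255 intensity values) to
    a binary image: pixels at or above the sub-image's own mean intensity
    become 1 (white), all others become 0 (black)."""
    flat = [p for row in sub_image for p in row]
    threshold = sum(flat) / len(flat)   # per-image adaptive threshold
    return [[1 if p >= threshold else 0 for p in row] for row in sub_image]
```

A production system would more likely use a method such as Otsu's thresholding, but the per-image principle is the same.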
[0023] Furthermore, the processing subsystem 105 also includes a form filling module 130 operatively coupled to the character recognition module 125. The form filling module 130 is configured to map the text received by the character recognition module 125 to one or more fields in the preferred form for form-filling the preferred form while avoiding any scope for tampering with the information when form-filling.
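Since the text detection training described later draws and labels rectangles in a given sequence, the recognized strings can be mapped to form fields positionally. A minimal sketch; the field names here are hypothetical, and a real deployment would map by the labels assigned during training:

```python
def fill_form(field_names, recognized_texts):
    """Map OCR output to form fields positionally: the i-th recognized
    string fills the i-th field.  Filling the form directly from the
    recognized text removes the manual re-typing step in which the
    information could otherwise be tampered with."""
    if len(field_names) != len(recognized_texts):
        raise ValueError("field/text count mismatch")
    return dict(zip(field_names, recognized_texts))
```

For example, two recognized strings fill a two-field form in order.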
[0024] In addition, the one or more identity documents may have one or more machine-readable zones. Therefore, the processing subsystem 105 may also include a machine-readable zone reading module (as shown in FIG. 2) operatively coupled to the image capture assistance module 110. The machine-readable zone reading module may be configured to read and interpret the one or more machine-readable zones from the area of interest in the image of the one or more identity documents using an interpretation ML model. In one exemplary embodiment, the one or more machine-readable zones may include Quick Response (QR) codes, bar codes, radio-frequency identification (RFID) tags, and the like.
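Besides QR codes and bar codes, many identity documents carry an ICAO 9303 machine-readable zone (MRZ) of OCR-B text. Interpreting such a zone includes verifying its check digits, which the standard computes by weighting character values cyclically with 7, 3, 1 and taking the sum modulo 10 (digits keep their value, A to Z map to 10 to 35, and the filler “<” counts as 0). This example illustrates MRZ handling in general, not the disclosure's interpretation ML model:

```python
def mrz_check_digit(field):
    """ICAO 9303 check digit: map each character to a value (digits
    as-is, A-Z as 10-35, filler '<' as 0), weight the values by the
    repeating cycle 7, 3, 1, and return the sum modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == "<":
            value = 0                         # filler counts as zero
        else:
            value = ord(ch) - ord("A") + 10   # A=10 ... Z=35
        total += value * weights[i % 3]
    return total % 10
```

The ICAO 9303 specimen document number "L898902C3" has check digit 6, and the specimen birth date "740812" has check digit 2.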
[0025] Additionally, in an embodiment, the system 100 may further include a database management module (as shown in FIG. 2) operatively coupled to the image capture assistance module 110. The database management module may be configured to store the image of the one or more identity documents locally in a device storage associated with the first user device of the first user, upon scanning. The database management module may also be configured to upload the image of the one or more identity documents stored in the device storage to a national identity server as a part of a regulatory process in real time, upon establishing a connection with the corresponding national identity server.
[0026] Moreover, in an embodiment, the processing subsystem 105 may further include an accuracy improvement module (as shown in FIG. 2) operatively coupled to the database management module. The accuracy improvement module may be configured to re-train and update one or more ML models, with the image of the one or more identity documents uploaded to the national identity server in real time, for improving the accuracy of operation of the processing subsystem 105. The one or more ML models are used by the processing subsystem 105 for performing a plurality of operations via the plurality of modules. In one embodiment, the one or more ML models may include the corner detection ML model, the front-and-back classification ML model, the text detection ML model, the character recognition ML model, and the like.
[0027] As used herein, the term “corner detection ML model” refers to an ML model in which the one or more features selected for the model are four small rectangles drawn on the four corners of each of the plurality of identity-related documents from a training data set. These four rectangles are used for reinforced supervised learning of an object detection vision model. A typical dataset of about 300 good-quality images is adequate to obtain a reasonably accurate and small-sized model. Training images should be uniformly oriented, and outliers should be corrected or removed from the training data set.
[0028] Similarly, as used herein, the term “front-and-back classification ML model” refers to an ML model in which the one or more features selected for the model are the front and the back of the one or more identity documents. The one or more images in the training data set should consist of an equal number of front sides and back sides of the plurality of identity-related documents. A typical dataset of about 200 good-quality images is adequate to obtain an accurate and small-sized model. Training images should be uniformly oriented, and outliers should be corrected or removed from the training dataset.
[0029] Further, as used herein, the term “text detection ML model” refers to a critical model in which the one or more features selected for the model are the text areas of each of the one or more images of the plurality of identity-related documents from the training data set. This training can be done only on the desired text of a document. For instance, text such as the name of a country or a document issuer, and labels such as name, age, address, and the like, can be skipped. Hence, the number of the one or more features selected for training this model depends on the number of the one or more fields of text that are preferred for reading from the document. Once the feature set is finalized, rectangles are drawn in a given sequence and labeled systematically in all the one or more images of the training dataset. A typical dataset of about 500 to about 1000 good-quality images is adequate to obtain an accurate model. The accuracy of the model improves with the number of training cycles performed and the range of distribution of text areas on the document. Training images should be uniformly oriented, and outliers should be corrected or removed from the training dataset.
[0030] Furthermore, as used herein, the term “character recognition ML model” refers to a critical model that decides the accuracy of the system 100. This model is language- and font-specific. Hence, the training dataset should have a decent representation of each symbol or character. The one or more features should therefore be selected per symbol or character in the same font, or a font as close as possible to the target document font and language. In the case of documents with symbols from multiple languages, all possible symbols should be included in the feature set, and the training data should be arranged accordingly. Classification models are less accurate since they result in more false positives. Supervised reinforcement training is more effective in this process. It is not necessary to train this model from the identification document data set. It can be trained in multiple cycles for an accurate resultant model.
[0031] FIG. 2 is a schematic representation of an exemplary embodiment of the system 100 for form-filling by character recognition of one or more identity documents of FIG. 1 in accordance with an embodiment of the present disclosure. Consider an example where the system 100 is utilized for filling a form 131 for a person “X” 132 who visits a SIM store 134 for purchasing a SIM card. Suppose the SIM store 134 is using the system 100 for filling the form 131, which is a part of the formalities to be completed to get the SIM card. Then, suppose a sales associate “Y” 138 working in the SIM store 134 performs the activity of filling the form 131 using the system 100 for the person “X” 132. Basically, the sales associate “Y” 138 is registered with the system 100 via the registration module 140 by providing a plurality of personal details via a personalized mobile phone 145. The plurality of personal details is stored in the local storage 142.
[0032] Here, the registration module 140 is located at the processing subsystem 105, which is hosted on the local storage 142 of the personalized mobile phone 145. The processing subsystem 105 is configured to execute on the network to control bidirectional communications among a plurality of modules. Along with the registration module 140, the processing subsystem 105 also includes several other modules, such as the image capture assistance module 110, the text cropping module 120, the character recognition module 125, the form filling module 130, the database management module 160, the accuracy improvement module 170, and the machine-readable zone reading module 180.
[0033] Upon registration, the sales associate “Y” 138 is now authorized to use the system 100 and assist the person “X” 132 in getting the formalities completed for purchasing the SIM card. Considering that the sales associate “Y” 138 selects the mode of scanning to be the system-controlled mode via the personalized mobile phone 145, the system 100 receives the signal for the same via the image capture assistance module 110. Then, the sales associate “Y” 138 scans an Aadhaar card of the person “X” 132 using a camera of the personalized mobile phone 145. Upon scanning, the one or more adjustments and the detection of the area of interest in an image of the Aadhaar card are performed via the image capture assistance module 110. The one or more adjustments and the detection of the area of interest are performed on both a front side and a back side of the image, wherein the front side and the back side of the Aadhaar card are identified via the front-and-back identification sub-module 190 of the image capture assistance module 110. Also, the one or more machine-readable zones are read and interpreted via the machine-readable zone reading module 180.
[0034] Further, the one or more text area contours are detected, and one or more sub-images are generated by cropping a text area from the image, via the text cropping module 120. Lastly, the text for each of the one or more sub-images is generated by performing dynamic quantization and character recognition on the one or more sub-images via the character recognition module 125. Then, the text is mapped to one or more fields in the form, thereby filling the form 131 while eliminating the scope for tampering with information during form-filling.
[0035] The system 100 also stores the image in the local storage 142 of the personalized mobile phone 145 of the sales associate “Y” 138, and then uploads the same to a predetermined national identity server 195 as a part of a regulatory process in real-time, upon establishing a connection with the corresponding predetermined national identity server 195, via the database management module 160. Further, the one or more ML models used by the system 100 are re-trained and updated in real-time, for improving the accuracy of operation of the system 100, via the accuracy improvement module 170.
[0036] FIG. 3 is a block diagram of a character recognition computer or a character recognition server 200 in accordance with an embodiment of the present disclosure. The character recognition server 200 includes processor(s) 210 and memory 220 operatively coupled to the bus 230. The processor(s) 210, as used herein, means any type of computational circuit, such as, but not limited to, a microprocessor, a microcontroller, a complex instruction set computing microprocessor, a reduced instruction set computing microprocessor, a very long instruction word microprocessor, an explicitly parallel instruction computing microprocessor, a digital signal processor, or any other type of processing circuit, or a combination thereof.

[0037] The memory 220 includes several subsystems stored in the form of an executable program which instructs the processor(s) 210 to perform the method steps illustrated in FIG. 1. The memory 220 includes the processing subsystem 105 of FIG. 1. The processing subsystem 105 further has the following modules: an image capture assistance module 110, a text cropping module 120, a character recognition module 125, a form filling module 130, a database management module 160, an accuracy improvement module 170, and a machine-readable zone reading module 180.
[0038] The image capture assistance module 110 is configured to receive a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish a preferred activity, upon registration. The image capture assistance module 110 is also configured to receive an image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning. The image capture assistance module 110 is also configured to perform one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning models, based on the mode of scanning selected by the first user.
[0039] The text cropping module 120 is configured to detect one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection machine learning model. The text cropping module 120 is also configured to generate one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours.
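The contour-detection and cropping steps above can be illustrated with a deliberately simplified, classical stand-in for the text detection ML model: a row-projection profile that locates horizontal bands of ink and crops each band out as a sub-image. This pure-Python sketch only illustrates the data flow; it is not the disclosed model:

```python
# A classical stand-in for the text-detection step: find row ranges that
# contain at least one ink pixel (1), then crop each range as a sub-image.

def find_text_bands(binary):
    """Return (top, bottom) row ranges containing ink."""
    bands, start = [], None
    for y, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # band begins
        elif not has_ink and start is not None:
            bands.append((start, y))       # band ends
            start = None
    if start is not None:
        bands.append((start, len(binary)))
    return bands

def crop(binary, band):
    """Cut out one band as an independent sub-image."""
    top, bottom = band
    return [row[:] for row in binary[top:bottom]]

image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],   # first text band
    [0, 0, 0, 0],
    [1, 1, 0, 0],   # second text band
    [1, 0, 0, 0],
]
bands = find_text_bands(image)
print(bands)                       # [(1, 2), (3, 5)]
print(len(crop(image, bands[1])))  # 2
```

Cropping in this way is what reduces the pixel count handed to the recognizer, as noted later in paragraph [0061].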
[0040] The character recognition module 125 is configured to perform dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images. The character recognition module 125 is also configured to feed the one or more binary images to a character recognition engine. The character recognition engine includes a character recognition machine learning model trained to recognize one or more characters in one or more predefined languages using machine learning. The character recognition module 125 is also configured to receive a text for each of the one or more sub-images from the character recognition engine.
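The dynamic quantization step can be sketched as choosing a threshold from each sub-image itself and binarizing against it. The disclosure does not specify the thresholding rule; the per-sub-image mean used below is an illustrative assumption:

```python
# Sketch of dynamic quantization: a data-dependent threshold is computed per
# sub-image, so bright and dark scans both binarize sensibly. Output uses
# 1 = ink (dark pixel), 0 = background.

def binarize(gray):
    """Map a grayscale sub-image (values 0-255) to a binary image."""
    pixels = [p for row in gray for p in row]
    threshold = sum(pixels) / len(pixels)   # recomputed for every sub-image
    return [[1 if p < threshold else 0 for p in row] for row in gray]

sub_image = [
    [250, 250,  20, 250],
    [250,  30,  25, 250],
]
binary = binarize(sub_image)
print(binary)  # [[0, 0, 1, 0], [0, 1, 1, 0]]
```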
[0041] The form filling module 130 is configured to map the text received by the character recognition module 125 to one or more fields in a preferred form for form-filling the preferred form while eliminating the scope for tampering with information when form-filling.
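Since the text areas are cropped and recognized in a known, labeled sequence, the mapping performed by the form filling module can be as simple as a positional lookup; because the user never re-types the values, there is no opportunity to alter them. The field names below are hypothetical examples, not taken from any specific form:

```python
# Positional mapping of recognized strings to form fields. The sequence
# mirrors the fixed labeling order used when the text areas were cropped.

FIELD_SEQUENCE = ["name", "date_of_birth", "id_number"]

def fill_form(recognized_texts):
    """Pair each recognized string with its form field."""
    if len(recognized_texts) != len(FIELD_SEQUENCE):
        raise ValueError("unexpected number of text areas")
    return dict(zip(FIELD_SEQUENCE, recognized_texts))

form = fill_form(["JANE DOE", "1990-01-01", "1234-5678-9012"])
print(form["name"])  # JANE DOE
```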
[0042] The database management module 160 is configured to store the image of the one or more identity documents locally in a device storage associated with a first user device of the first user, upon scanning. The database management module 160 is also configured to upload the image of the one or more identity documents stored in the device storage to a national identity server as a part of a regulatory process in real-time, upon establishing a connection with the corresponding national identity server.
[0043] The accuracy improvement module 170 is configured to re-train and update one or more machine learning models, with the image of the one or more identity documents uploaded to the national identity server in real-time, for improving an accuracy of operation of the processing subsystem 105. The one or more machine learning models are used by the processing subsystem 105 for performing a plurality of operations via the plurality of modules.
[0044] The machine-readable zone reading module 180 is configured to read and interpret one or more machine-readable zones from the area of interest in the image of the one or more identity documents using an interpretation machine learning model.
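The machine-readable zones mentioned above follow ICAO Doc 9303, in which each MRZ field carries a check digit: characters are valued (digits as themselves, A–Z as 10–35, the filler '<' as 0) and weighted cyclically by 7, 3, 1. A minimal verification helper that an interpretation step could use to validate its own output:

```python
# ICAO Doc 9303 check-digit computation for MRZ fields.

def mrz_check_digit(field):
    """Return the check digit (0-9) for an MRZ field string."""
    def value(c):
        if c.isdigit():
            return int(c)
        if c == "<":
            return 0
        return ord(c) - ord("A") + 10       # A=10 ... Z=35
    weights = (7, 3, 1)                     # applied cyclically
    total = sum(value(c) * weights[i % 3] for i, c in enumerate(field))
    return total % 10

# ICAO 9303 worked example: document number "L898902C3" has check digit 6.
print(mrz_check_digit("L898902C3"))  # 6
```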
[0045] The bus 230, as used herein, refers to internal memory channels or a computer network used to connect computer components and transfer data between them. The bus 230 includes a serial bus or a parallel bus, wherein the serial bus transmits data in a bit-serial format and the parallel bus transmits data across multiple wires. The bus 230, as used herein, may include, but is not limited to, a system bus, an internal bus, an external bus, an expansion bus, a frontside bus, a backside bus, and the like.
[0046] FIG. 4 (a) is a flow chart representing the steps involved in a method 300 for form-filling by character recognition of one or more identity documents of FIG. 1 in accordance with an embodiment of the present disclosure. FIG. 4 (b) is a flow chart representing continued steps involved in the method 300 of FIG. 4 (a) in accordance with an embodiment of the present disclosure. The method 300 includes receiving a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish a preferred activity, upon registration in step 310. In one embodiment, receiving the signal may include receiving the signal via an image capture assistance module 110.
[0047] The method 300 also includes receiving an image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning in step 320. In one embodiment, receiving the image of the one or more identity documents may include receiving the image of the one or more identity documents via the image capture assistance module 110.
[0048] Further, the method 300 also includes performing one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning (ML) models, based on the mode of scanning selected by the first user in step 330. In one embodiment, performing the one or more adjustments and the detection of the area of interest in the image may include performing the one or more adjustments and the detection of the area of interest in the image via the image capture assistance module 110.
[0049] In one exemplary embodiment, the method 300 may further include generating a front-and-back classification ML model by performing a front-and-back classification training with a plurality of identity-related documents using ML. The plurality of identity-related documents may include a corresponding plurality of front sides and a corresponding plurality of back sides having one or more predetermined differences between each other. In such embodiment, generating the front-and-back classification ML model may include generating the front-and-back classification ML model via a front-and-back identification sub-module 190 of the image capture assistance module 110.

[0050] Further, in an embodiment, the method 300 may also include identifying a front side and a back side of the one or more identity documents using a front-and-back classification ML model in real-time, based on the mode of scanning selected by the first user. In such embodiment, identifying the front side and the back side of the one or more identity documents may include identifying the front side and the back side of the one or more identity documents via the front-and-back identification sub-module 190 of the image capture assistance module 110.
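As a toy illustration of what a front-and-back classification model learns to separate, note that the back of many identity cards carries the machine-readable zone; a crude proxy feature such as the share of '<' filler characters in recognized text can already distinguish the sides. The heuristic below is only a stand-in for the trained ML model, and its threshold is an assumption:

```python
# Heuristic side classifier: MRZ lines are dense with '<' filler characters,
# so a high filler ratio suggests the back of the document.

def classify_side(ocr_text):
    filler_ratio = ocr_text.count("<") / max(len(ocr_text), 1)
    return "back" if filler_ratio > 0.1 else "front"

print(classify_side("P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<"))  # back
print(classify_side("REPUBLIC OF UTOPIA  NATIONAL ID"))               # front
```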
[0051] Subsequently, in a specific embodiment, the method 300 may include storing the image of the one or more identity documents locally in a device storage associated with a first user device of the first user, upon scanning. In such embodiment, storing the image of the one or more identity documents locally in the device storage may include storing the image of the one or more identity documents locally in the device storage via a database management module 160.
[0052] The method 300 may further include uploading the image of the one or more identity documents stored in the device storage to a national identity server as a part of a regulatory process in real-time, upon establishing a connection with the corresponding national identity server. In such embodiment, uploading the image of the one or more identity documents stored in the device storage to the national identity server may include uploading the image of the one or more identity documents stored in the device storage to the national identity server via the database management module 160.
[0053] In a further embodiment, the method 300 may include re-training and updating one or more ML models, with the image of the one or more identity documents uploaded to the national identity server in real-time, for improving an accuracy of the operation of the processing subsystem. The one or more ML models are used by the processing subsystem for performing a plurality of operations via the plurality of modules. In such embodiment, re-training and updating the one or more ML models may include re-training and updating the one or more ML models via an accuracy improvement module 170.
[0054] Subsequently, in an embodiment, the method 300 may also include reading and interpreting one or more machine-readable zones from the area of interest in the image of the one or more identity documents using an interpretation ML model. In such embodiment, reading and interpreting the one or more machine-readable zones from the area of interest in the image may include reading and interpreting the one or more machine-readable zones from the area of interest in the image via a machine-readable zone reading module 180.
[0055] Furthermore, the method 300 also includes detecting one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection ML model in step 340. In one embodiment, detecting the one or more text area contours from the area of interest in the image may include detecting the one or more text area contours from the area of interest in the image via a text cropping module 120.
[0056] The method 300 further includes generating one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours in step 350. In one embodiment, generating the one or more sub-images may include generating the one or more sub-images via the text cropping module 120.
[0057] Moreover, the method 300 also includes performing dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images in step 360. In one embodiment, performing the dynamic quantization on the one or more sub-images may include performing the dynamic quantization on the one or more sub-images via a character recognition module 125.
[0058] In addition, the method 300 also includes feeding the one or more binary images to a character recognition engine, wherein the character recognition engine includes a character recognition ML model trained to recognize one or more characters in one or more predefined languages using ML in step 370. In one embodiment, feeding the one or more binary images to the character recognition engine may include feeding the one or more binary images to the character recognition engine via the character recognition module 125.
[0059] The method 300 also includes receiving a text for each of the one or more sub-images from the character recognition engine in step 380. In one embodiment, receiving the text for each of the one or more sub-images may include receiving the text for each of the one or more sub-images via the character recognition module 125.
[0060] Further, the method 300 includes mapping the text received by the character recognition module to one or more fields in a preferred form for form-filling the preferred form while eliminating the scope for tampering with information when form-filling in step 390. In one embodiment, mapping the text received by the character recognition module to the one or more fields in the preferred form may include mapping the text received by the character recognition module to the one or more fields in the preferred form via a form filling module 130.
[0061] Various embodiments of the present disclosure enable form-filling by character recognition of the one or more identity documents while eliminating human intervention. Also, the system reads the information from the one or more identity documents reliably, quickly, and accurately. Further, in the step in which the one or more sub-images are generated for the text to be used for further processing in character recognition, the number of pixels to be processed to identify the text is reduced. Moreover, the system can run on a mobile phone; hence, the reduction in the number of pixels to be processed allows resources such as battery power, processing power, and memory to be optimized for improved overall performance of the system.
[0062] It will be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of the disclosure and are not intended to be restrictive thereof.
[0063] While specific language has been used to describe the disclosure, any limitations arising on account of the same are not intended. As would be apparent to a person skilled in the art, various working modifications may be made to the method in order to implement the inventive concept as taught herein.
[0064] The figures and the foregoing description give examples of embodiments. Those skilled in the art will appreciate that one or more of the described elements may well be combined into a single functional element. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, the order of processes described herein may be changed and is not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of embodiments is by no means limited by these specific examples.

Claims (14)

    Claims
  1. A system for form-filling by character recognition of one or more identity documents, comprising: a processing subsystem configured to execute on a network to control bidirectional communications among a plurality of modules comprising: an image capture assistance module configured to: receive a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish a preferred activity, upon registration; receive an image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning; and perform one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning models, based on the mode of scanning selected by the first user; a text cropping module operatively coupled to the image capture assistance module, wherein the text cropping module is configured to: detect one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection machine learning model; and generate one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours; a character recognition module operatively coupled to the text cropping module, wherein the character recognition module is configured to: perform dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images; feed the one or more binary images to a character recognition engine, wherein the character recognition engine comprises a character recognition machine learning model trained to recognize one or more characters in one or more predefined languages using machine learning; and receive a text for each of the one or more sub-images from the character recognition engine; and a form filling module operatively coupled to the character recognition module, wherein the form filling module is configured to map the text received by the character recognition module to one or more fields in a preferred form for form-filling the preferred form while eliminating the scope for tampering with information when form-filling.
  2. The system of claim 1, wherein the one or more adjustments comprise at least one of an orientation correction, a document type identification, a boundary adjustment, a corner detection, and a front-and-back identification.
  3. The system of claim 1, wherein the one or more predefined languages with which the character recognition machine learning model is trained comprise at least one of English, Arabic, and French.
  4. The system of claim 1, wherein the image capture assistance module comprises a front-and-back identification sub-module configured to: generate a front-and-back classification machine learning model by performing a front-and-back classification training with a plurality of identity-related documents using machine learning, wherein the plurality of identity-related documents comprises a corresponding plurality of front sides and a corresponding plurality of back sides having one or more predetermined differences between each other; and identify a front side and a back side of the one or more identity documents using the front-and-back classification machine learning model in real-time, based on the mode of scanning selected by the first user.
  5. The system of claim 1, wherein the processing subsystem comprises a database management module operatively coupled to the image capture assistance module, wherein the database management module is configured to: store the image of the one or more identity documents locally in a device storage associated with a first user device of the first user, upon scanning; and upload the image of the one or more identity documents stored in the device storage to a national identity server as a part of a regulatory process in real-time, upon establishing a connection with the corresponding national identity server.
  6. The system of claim 5, wherein the processing subsystem comprises an accuracy improvement module operatively coupled to the database management module, wherein the accuracy improvement module is configured to re-train and update one or more machine learning models, with the image of the one or more identity documents uploaded to the national identity server in real-time, for improving an accuracy of operation of the processing subsystem, wherein the one or more machine learning models are used by the processing subsystem for performing a plurality of operations via the plurality of modules.
  7. The system of claim 1, wherein the processing subsystem comprises a machine-readable zone reading module operatively coupled to the image capture assistance module, wherein the machine-readable zone reading module is configured to read and interpret one or more machine-readable zones from the area of interest in the image of the one or more identity documents using an interpretation machine learning model.
  8. A method for form-filling by character recognition of one or more identity documents, comprising: receiving, via an image capture assistance module, a signal corresponding to a mode of scanning selected by a first user upon authorizing the first user for scanning to accomplish a preferred activity, upon registration; receiving, via the image capture assistance module, an image of the one or more identity documents affiliated to a second user when the one or more identity documents are scanned by the first user, based on the mode of scanning selected by the first user and receiving approval from the second user for performing the corresponding scanning; performing, via the image capture assistance module, one or more adjustments and detection of an area of interest in the image of the one or more identity documents using one or more image-related machine learning models, based on the mode of scanning selected by the first user; detecting, via a text cropping module, one or more text area contours from the area of interest in the image of the one or more identity documents using a text detection machine learning model; generating, via the text cropping module, one or more sub-images by cropping a text area from the image of the one or more identity documents based on the detection of the corresponding one or more text area contours; performing, via a character recognition module, dynamic quantization on the one or more sub-images for converting the corresponding one or more sub-images to one or more binary images; feeding, via the character recognition module, the one or more binary images to a character recognition engine, wherein the character recognition engine comprises a character recognition machine learning model trained to recognize one or more characters in one or more predefined languages using machine learning; receiving, via the character recognition module, a text for each of the one or more sub-images from the character recognition engine; and mapping, via a form filling module, the text received by the character recognition module to one or more fields in a preferred form for form-filling the preferred form while eliminating the scope for tampering with information when form-filling.
  9. The method of claim 8, comprising generating, via a front-and-back identification sub-module of the image capture assistance module, a front-and-back classification machine learning model by performing a front-and-back classification training with a plurality of identity-related documents using machine learning, wherein the plurality of identity-related documents comprises a corresponding plurality of front sides and a corresponding plurality of back sides having one or more predetermined differences between each other.
  10. The method of claim 8, comprising identifying, via a front-and-back identification sub-module of the image capture assistance module, a front side and a back side of the one or more identity documents using a front-and-back classification machine learning model in real-time, based on the mode of scanning selected by the first user.
  11. The method of claim 8, comprising storing, via a database management module, the image of the one or more identity documents locally in a device storage associated with a first user device of the first user, upon scanning.
  12. The method of claim 11, comprising uploading, via the database management module, the image of the one or more identity documents stored in the device storage to a national identity server as a part of a regulatory process in real-time, upon establishing a connection with the corresponding national identity server.
  13. The method of claim 12, comprising re-training and updating, via an accuracy improvement module, one or more machine learning models, with the image of the one or more identity documents uploaded to the national identity server in real-time, for improving an accuracy of operation of the processing subsystem, wherein the one or more machine learning models are used by the processing subsystem for performing a plurality of operations via the plurality of modules.
  14. The method of claim 8, comprising reading and interpreting, via a machine-readable zone reading module, one or more machine-readable zones from the area of interest in the image of the one or more identity documents using an interpretation machine learning model.
SE2251012A 2022-08-31 2022-08-31 System and method for form-filling by character recognition of identity documents SE2251012A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
SE2251012A SE2251012A1 (en) 2022-08-31 2022-08-31 System and method for form-filling by character recognition of identity documents


Publications (1)

Publication Number Publication Date
SE2251012A1 true SE2251012A1 (en) 2024-03-01

Family

ID=90469054

Family Applications (1)

Application Number Title Priority Date Filing Date
SE2251012A SE2251012A1 (en) 2022-08-31 2022-08-31 System and method for form-filling by character recognition of identity documents

Country Status (1)

Country Link
SE (1) SE2251012A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1986106A2 (en) * 2007-04-26 2008-10-29 Xerox Corporation Decision criteria for automated form population
US20110255782A1 (en) * 2010-01-15 2011-10-20 Copanion, Inc. Systems and methods for automatically processing electronic documents using multiple image transformation algorithms
US20150078671A1 (en) * 2013-09-19 2015-03-19 IDChecker, Inc. Automated document recognition, identification, and data extraction
US20150095753A1 (en) * 2013-10-01 2015-04-02 Xerox Corporation Methods and systems for filling forms
US20150205777A1 (en) * 2014-01-23 2015-07-23 Xerox Corporation Automated form fill-in via form retrieval
US20210192129A1 (en) * 2020-07-22 2021-06-24 Varun Garg Method, system and cloud server for auto filing an electronic form
