WO2020079605A1 - Device and method for enhancing readability of a low-resolution binary image - Google Patents

Device and method for enhancing readability of a low-resolution binary image

Info

Publication number
WO2020079605A1
Authority
WO
WIPO (PCT)
Prior art keywords
binary image
readability
convolution layer
resolution
patch
Prior art date
Application number
PCT/IB2019/058813
Other languages
French (fr)
Inventor
Ram Krishna Pandey
Angarai Ganesan RAMAKRISHNAN
Original Assignee
Indian Institute Of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Indian Institute Of Science filed Critical Indian Institute Of Science
Publication of WO2020079605A1 publication Critical patent/WO2020079605A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4046 Scaling the whole image or part thereof using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G06V30/1801 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections
    • G06V30/18019 Detecting partial patterns, e.g. edges or contours, or configurations, e.g. loops, corners, strokes or intersections by matching or filtering
    • G06V30/18038 Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters
    • G06V30/18048 Biologically-inspired filters, e.g. difference of Gaussians [DoG], Gabor filters with interaction between the responses of different filters, e.g. cortical complex cells
    • G06V30/18057 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • Embodiments of the present disclosure relate to image processing and, more particularly, to a method and device for enhancing readability of a low-resolution binary image for character recognition.
  • OCR Optical Character Recognition
  • OCR systems aim at converting images of typed text to machine-encoded text. By mimicking human reading process, OCR systems enable a machine to understand the text information in images and recognize specific alphanumeric words, text phrases or sentences.
  • OCR systems trained on high resolution text images may fail to accurately predict text in elusive low-resolution text images. Specifically, low-resolution text images may lack fine details, making it harder for the OCR systems to correctly retrieve textual information from some commonly acquired OCR features.
  • One or more techniques in the art disclose enhancing the resolution of low-resolution text images for character recognition.
  • One such technique includes super resolution imaging which provides an algorithmic solution to the resolution enhancement problem.
  • the super resolution imaging exploits image-specific information.
  • Super-resolution of low-resolution document images is becoming an important pre-requisite for design and development of robust document analysis systems. Without enhancement, a simple binarization will completely remove many strokes. In these conditions, it is virtually impossible to do character recognition as most of the OCRs are designed to work at high resolutions.
  • the task of resolution enhancement is typically to increase spatial resolution, while maintaining the difference between text and background. It can further assist the cause of recognition in low-resolution text images.
  • the present disclosure relates to a method for enhancing readability of a Low Resolution (LR) binary image.
  • the method comprises generating an input LR patch of predefined size from a LR binary image and extracting one or more features associated with the input LR patch, in LR space, using one or more convolution layers.
  • the LR binary image is upscaled to High Resolution (HR) space, by predefined upscaling factor, for enhancing readability of the LR binary image.
  • the resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
  • the present disclosure relates to readability enhancement device for enhancing readability of a Low Resolution (LR) binary image.
  • the readability enhancement device includes a processor and a memory communicatively coupled to the processor.
  • the memory stores processor-executable instructions, which on execution cause the processor to enhance the readability.
  • an input LR patch of predefined size is generated from a LR binary image and one or more features associated with the input LR patch is extracted, in LR space, using one or more convolution layers.
  • the LR binary image is upscaled to High Resolution (HR) space, by predefined upscaling factor, for enhancing readability of the LR binary image.
  • the resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
  • Figure 1 shows exemplary environment of a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure
  • Figure 2 shows a detailed block diagram of a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure
  • Figure 3 shows exemplary representation of training dataset generated for training a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of present disclosure
  • Figures 4a-4f illustrate exemplary embodiments associated with a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of present disclosure
  • Figure 5 shows a flow diagram illustrating method of a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of present disclosure.
  • Figure 6 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure.
  • any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter.
  • any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether or not such computer or processor is explicitly shown.
  • Present disclosure relates to a method and device for enhancing readability of a LR binary image.
  • the readability may be enhanced by using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
  • Said layers are configured to upscale the LR binary image to High Resolution (HR) space by a predefined upscaling factor.
  • Upscaled binary image obtained from the proposed method and device is provided to an Optical Character Recognizer (OCR) system for achieving accurate character recognition.
  • OCR Optical Character Recognizer
  • FIG. 1 shows exemplary environment of a readability enhancement device 101 for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure.
  • the environment 100 may include the readability enhancement device 101 in communication with an OCR system 102 via a communication network 103, and a binary data repository 104.
  • the readability enhancement device 101 may be configured to enhance the readability of a LR binary image inputted to the readability enhancement device 101.
  • the LR binary image provided to the readability enhancement device 101 may be of a very low resolution. In an embodiment, said LR binary image may be scanned with low-resolution settings in a scanner. There may be a need for recognizing characters in the LR binary image.
  • the OCR system 102 associated with the readability enhancement device 101 may be configured to perform the character recognition in the LR binary image.
  • One or more techniques, known to a person skilled in the art, to recognize character in a binary image, may be implemented in the OCR system 102.
  • Prior to providing the LR binary image to the OCR system 102, such LR binary image may be provided to the readability enhancement device 101 for enhancing readability of the LR binary image. Output of the readability enhancement device 101 is then provided to the OCR system 102.
  • the readability enhancement device 101 may be communicatively coupled with the OCR system 102 using the communication network 103.
  • the communication network 103 may include, without limitation, a direct interconnection, Local Area Network (LAN), Wide Area Network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, and the like.
  • the readability enhancement device 101 may be a dedicated server or a cloud-based server, in communication with the OCR system 102.
  • the readability enhancement device 101 may be an integral part of the OCR system 102.
  • when a LR binary image is inputted to the OCR system 102, the readability enhancement device 101, by default, may be configured to perform the steps of the proposed method, to enhance the readability of the LR binary image.
  • a user associated with the OCR system 102 may provide instruction to enhance the readability of the LR binary image.
  • the readability enhancement device 101 may be configured to perform the steps of the proposed method, to enhance the readability of the LR binary image.
  • the readability enhancement device 101 may include a processor 105, I/O interface 106, and a memory 107.
  • the memory 107 may be communicatively coupled to the processor 105.
  • the memory 107 stores instructions, executable by the processor 105, which, on execution, may cause the readability enhancement device 101 to enhance the readability of the LR binary image.
  • the memory 107 may include one or more modules 108 and data 109.
  • the one or more modules 108 may be configured to perform the steps of the present disclosure using the data 109, to enhance the readability.
  • each of the one or more modules 108 may be a hardware unit which may be outside the memory 107 and coupled with the readability enhancement device 101.
  • the readability enhancement device 101 for enhancing the readability, may be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a Personal Computer (PC), a notebook, a smartphone, a tablet, e-book readers, a server, a network server, a cloud-based server and the like.
  • the readability enhancement device 101 may be associated with a plurality of OCR systems. LR binary images to be provided to the plurality of OCR systems may be processed by the readability enhancement device 101, to enhance the readability of the LR binary images.
  • the readability enhancement device 101 is initially configured to receive the LR binary image and generate an input LR patch of predefined size from the LR binary image.
  • the input LR patch may be generated by cropping the LR binary image. In an embodiment, the input LR patch may be generated by resizing the LR binary image.
  • One or more other techniques, known to a person skilled in the art, may be implemented for generating the input LR patch using the LR binary image.
  • the readability enhancement device 101 may be configured to extract one or more features associated with the input LR patch.
  • the one or more features may be extracted in LR space using one or more convolution layers.
  • the LR binary image is upscaled to HR space using the one or more features.
  • the LR binary image is upscaled by a predefined upscaling factor.
  • the resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
  • the readability enhancement device 101 may be configured to map each pixel in the LR space with multiple pixels in HR space, for enhancing readability of the LR binary image.
  • the mapping may be performed using a first transposed convolution layer 402.1 and a second transposed convolution layer 402.2.
  • the first transposed convolution layer 402.1 and the second transposed convolution layer 402.2 may be cascaded with each other.
  • the readability enhancement device 101 may be configured to map each pixel in the LR space with multiple pixels in HR space using a first transposed convolution layer 402.1 and a second transposed convolution layer 402.2.
  • the first transposed convolution layer 402.1 and the second transposed convolution layer 402.2 may be cascaded with each other.
  • the readability enhancement device 101 may be configured to perform convolution on output of the first transposed convolution layer 402.1 using an additional convolution layer.
  • the LR binary image is merged with output of the additional convolution layer to obtain one or more merged features with dimension same as that of the LR binary image. Further, the one or more merged features is merged with output of the second transposed convolution layer 402.2, for enhancing readability of the LR binary image.
  • the readability enhancement device 101 may be configured to map each pixel of the one or more features in the LR space with multiple pixels in HR space using a first transposed convolution layer 402.1. Further, the one or more features is re-arranged in the HR space obtained from the first transposed convolution layer 402.1, using a sub-pixel convolution layer, for enhancing readability of the LR binary image.
  • the readability enhancement device 101 may be configured to perform activation of at least one of the one or more convolution layers, one or more transposed convolution layers, one or more additional convolution layers and the sub-pixel convolution layer, based on corresponding output.
  • at least one of Rectified Linear Unit (ReLU) or Parametric ReLU (PReLU) may be used for performing the activation.
  • the readability enhancement device 101 may be configured to perform training using a training dataset.
  • the training dataset includes plurality of LR-HR patch pairs, generated from plurality of LR and corresponding HR binary images.
  • the plurality of LR and corresponding HR binary images may include a large set of samples of all characters of a specific language of a document, segmented from a collection of very old books with different fonts and sizes of the characters.
  • the readability enhancement device 101 may be associated with the binary data repository 104 storing the plurality of LR and corresponding HR binary images. The readability enhancement device 101 may be configured to retrieve the plurality of LR and corresponding HR binary images, from the binary data repository 104, for generating the training dataset.
  • the readability enhancement device 101 may be configured to generate each of plurality of LR patches, from the LR-HR patch pairs, by performing one or more processing steps on the plurality of LR and corresponding HR binary images.
  • One of the one or more processing steps may include choosing alternate pixels of a corresponding HR patch from one or more HR binary images.
  • Other processing steps may include multiplying the corresponding LR patch with a mask comprising randomly placed zeroes and ones, generating one or more rotated variants of the corresponding HR patch, and scanning the training images at a low scanning resolution.
  • any one of the one or more processing steps may be performed to generate the LR-HR patch pairs.
  • combination of the one or more processing steps may also be performed to generate the LR-HR patch pairs.
  • the readability enhancement device 101 may be configured to generate the plurality of HR patches, from the LR-HR patch pairs, by taking overlapping patches with stride "r" from the plurality of HR binary images.
  • the readability enhancement device 101 may be configured to input the upscaled LR binary image to the OCR system 102, for improving accuracy of character recognition in the LR binary image.
  • the readability enhancement device 101 may be configured to receive and transmit data via the I/O interface 106.
  • Received data may include the LR binary image, plurality of LR and corresponding HR binary images and so on.
  • Transmitted data may include the upscaled LR binary image, which may be transmitted to the OCR system 102.
  • Figure 2 shows a detailed block diagram of the readability enhancement device 101 for enhancing readability of the LR binary image, in accordance with some embodiments of the present disclosure.
  • the data 109 and the one or more modules 108 in the memory 107 of the readability enhancement device 101 are described herein in detail.
  • the one or more modules 108 may include, but are not limited to, an input LR patch generation module 201, a feature extraction module 202, a LR binary image upscale module 203, a training module 204, an activation module 205, an OCR system input module 206, and one or more other modules 207, associated with the readability enhancement device 101.
  • the data 109 in the memory 107 may include LR binary image data 208 (also referred to as LR binary image 208), input LR patch data 209 (also referred to as input LR patch 209), feature data 210 (also referred to as one or more features 210), upscaled LR binary image data 211, training dataset 212, binary image data 213 and other data 214 associated with the readability enhancement device 101.
  • the data 109 in the memory 107 may be processed by the one or more modules 108 of the readability enhancement device 101.
  • the one or more modules 108 may be implemented as dedicated units and when implemented in such a manner, said modules may be configured with the functionality defined in the present disclosure to result in a novel hardware.
  • the term module may refer to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a Field-Programmable Gate Array (FPGA), a Programmable System-on-Chip (PSoC), a combinational logic circuit, and/or other suitable components that provide the described functionality.
  • ASIC Application Specific Integrated Circuit
  • FPGA Field-Programmable Gate Array
  • PSoC Programmable System-on-Chip
  • the one or more modules 108 of the present disclosure function to enhance the readability of the LR binary image 208 for accurate character recognition.
  • the one or more modules 108 along with the data 109, may be implemented in any system, for enhancing the readability.
  • the readability enhancement device 101 proposed in the present disclosure is trained using training dataset 212 for effective enhancement of readability.
  • the training dataset 212 may include approximately 5 million LR-HR patch pairs with diverse low-resolution properties.
  • the training module 204 may be configured to generate the training dataset 212.
  • the training dataset 212 may be stored in the memory 107 in compressed .npz format, with which the training of neural network models of the readability enhancement device 101 is performed.
  • the training dataset 212 may include overlapping patches from the plurality of LR and HR binary images of a document. Such overlapping patches may also be referred to as LR-HR patch pairs.
  • the plurality of LR and HR binary images may be retrieved from the binary data repository 104.
  • the plurality of LR and HR binary images may be stored as binary image data 213, in the memory 107 of the readability enhancement device 101.
  • the LR binary images may be considered as training data and the HR binary images may be considered as Ground Truth (GT).
  • GT Ground Truth
  • LR patches from the LR-HR patch pairs may be generated by taking overlapping patches of stride 1 from the LR binary images.
  • the corresponding HR patches are generated by taking overlapping patches with stride "r" from the HR binary images.
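  • By way of illustration only, the stride-1/stride-"r" pairing described above may be sketched in Python/NumPy as follows (function and array names are illustrative, not taken from the patent):

      import numpy as np

      def extract_patch_pairs(lr_img, hr_img, lr_size=16, r=2):
          # Overlapping LR patches with stride 1; the matching HR patch
          # starts at r times the LR offset and is r times as large.
          hr_size = lr_size * r
          lr_patches, hr_patches = [], []
          h, w = lr_img.shape
          for i in range(h - lr_size + 1):
              for j in range(w - lr_size + 1):
                  lr_patches.append(lr_img[i:i + lr_size, j:j + lr_size])
                  hr_patches.append(hr_img[i * r:i * r + hr_size,
                                           j * r:j * r + hr_size])
          return np.stack(lr_patches), np.stack(hr_patches)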
  • the training dataset 212 may be generated with an assumption that a function that upscales by a predefined upscaling factor of "2" or "4" is being modelled to super-resolve the LR binary image 208.
  • the training dataset 212 may be rich and diverse, generated from plurality of LR images, by performing one or more processing steps.
  • the one or more processing steps may include, but are not limited to, selecting alternate pixels, random deletion of pixels, cropping from character images, direct scanning at low spatial resolution, and so on.
  • a binary document may be scanned at 200 dpi, so that resulting binary images may be considered as HR images for training purposes. Separate copies of these digitized HR document images may be converted to LR by selecting only the alternate pixels in the HR image in both x and y directions. Thus, the LR patches have one-fourth the pixels of the HR patches. Instead of the entire image, such LR-HR patch pairs may be used for training the readability enhancement device 101. In an embodiment, dimensions of the LR-HR patch pairs are 16x16 and 32x32, respectively. Since the same document is scanned and converted to a LR patch, content of the HR binary image is retained in the LR binary image 208, but with a reduction in the number of pixels, and hence in clarity.
  • Let X_LR2 be one of the LR patches, of dimension 16x16, and G_HR2 be its HR ground truth, of dimension 32x32.
  • the two may be related, following the alternate-pixel selection described above, as given in equation (1) below:
        X_LR2(i, j) = G_HR2(2i, 2j), for 0 <= i, j <= 15 ... (1)
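  • As a one-line check of equation (1) (a sketch; array names are illustrative), the alternate-pixel selection may be written in NumPy as:

      import numpy as np

      g_hr2 = np.random.randint(0, 2, (32, 32))  # a 32x32 binary HR patch
      x_lr2 = g_hr2[::2, ::2]                    # X_LR2(i, j) = G_HR2(2i, 2j)
      assert x_lr2.shape == (16, 16)             # one-fourth the pixels of the HR patch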
  • generating the LR-HR patch pairs may include initially generating the LR-HR patch pairs by skipping alternate pixels and then applying a mask, which has randomly distributed ones and zeros, on each of the LR patches. Since the distribution is random, the mask for each LR patch is different, but the dimensions of the mask may be the same as that of the LR patches, i.e., 16x16.
  • the resulting image patch is the pixel-wise multiplication of the mask and the LR image patch.
  • the LR patch may be generated through alternate pixel removal. If X_LR-rand is the patch obtained after applying the mask M_rand on X_LR2, then, as in equation (2) below,
        X_LR-rand = M_rand ⊙ X_LR2 ... (2)
    where ⊙ denotes pixel-wise multiplication.
  • M_rand is a matrix of randomly placed ones and zeros.
  • the ones and zeros may be generated by non-uniform probability distributions such as a Gaussian distribution. It may be observed that there are more discontinuities in the pixel structure of X_LR-rand than in X_LR2. This helps the readability enhancement device 101 to be trained in such a way that it can tackle super-resolution tasks of randomly lost pixel data from a document. The ground truth for these new patches may be denoted as G_HR-rand.
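  • A minimal sketch of this masking step of equation (2) is given below; the uniform mask and the keep-probability are illustrative assumptions, since the patent also mentions non-uniform distributions:

      import numpy as np

      rng = np.random.default_rng(0)
      x_lr2 = rng.integers(0, 2, size=(16, 16))                  # an LR patch (illustrative)
      m_rand = (rng.random((16, 16)) > 0.2).astype(x_lr2.dtype)  # M_rand: random ones and zeros
      x_lr_rand = m_rand * x_lr2                                 # pixel-wise product, equation (2)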
  • ground truth data G_HR-rand for these masked patches may be represented as in equation (4) given below, since the mask alters only the LR patch:
        G_HR-rand = G_HR2 ... (4)
  • the entire training dataset 212 for an upscaling factor of "4" is the combination of the LR data, represented as in equation (5) given below:
        X = {X_LR2, X_char2, X_LR-rand, X_75dpi} ... (5)
  • the GT may be represented as in equation (6) given below:
        G = {G_HR2, G_char2, G_HR-rand, G_300dpi} ... (6)
  • FIG. 3 shows a few character samples from the training dataset 212 generated by the training module 204.
  • 301.1a is a 200 dpi HR patch, represented as G_HR2.
  • 301.1b may be a LR version of 301.1a, which is generated by taking alternate pixels, represented as X_LR2.
  • 301.2a is a 200 dpi HR character, represented as G_char2.
  • 301.2b may be a LR version of 301.2a, represented as X_char2.
  • 301.3a is a HR version of a character, represented as G_HR-rand.
  • 301.3b may be a LR version of 301.3a, which is generated by applying the mask of random ones and zeros, represented as X_LR-rand.
  • 301.4a is a 150 dpi HR image, represented as G_150dpi.
  • 301.4b may be a LR image which is directly scanned at 75 dpi, corresponding to the image in 301.4a, represented as X_75dpi.
  • 301.5a is a 300 dpi HR image, represented as G_300dpi.
  • 301.5b may be a LR image which is directly scanned at 75 dpi, corresponding to the image in 301.5a, represented as X_75dpi.
  • the trained readability enhancement device 101 may be deployed with the OCR system 102 to enhance readability of a LR binary image 208 which is to be scanned by the OCR system 102.
  • the input LR patch generation module 201 may be configured to generate the input LR patch 209 of predefined size.
  • the input LR patch 209 may be generated by cropping the LR binary image 208.
  • the input LR patch 209 may be generated by resizing the LR binary image 208.
  • One or more other techniques, known to a person skilled in the art, may be implemented for generating the input LR patch 209 using the LR binary image 208.
  • the input LR patch 209 may be of size 16x16x1.
  • the feature extraction module 202 may be configured to extract one or more features 210 associated with the input LR patch 209.
  • the one or more features 210 may be extracted in LR space using one or more convolution layers.
  • the LR binary image 208 is upscaled to High Resolution (HR) space using the one or more features 210.
  • the LR binary image 208 is upscaled by predefined upscaling factor. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
  • the LR binary image upscale module 203 may be configured to use the first architecture 400a illustrated in Figure 4a, for upscaling the resolution by the predefined upscaling factor of "2".
  • the first architecture 400a may include first convolution layer 401.1, second convolution layer 401.2, first transposed convolution layer 402.1 and second transposed convolution layer 402.2.
  • the first convolution layer 401.1 and the second convolution layer 401.2 may be used without padding, followed by a first transposed convolution layer 402.1 and a second transposed convolution layer 402.2.
  • a deeper model would consume a lot of time to train and test.
  • the first architecture 400a is proposed to have a balance between performance and speed.
  • the ReLU and PReLU activation functions may be used to evaluate performance of the first architecture 400a.
  • an additional new transposed convolution layer may be included in the first architecture 400a, instead of replicating the first architecture 400a.
  • Such architecture may be used to reduce network depth, while achieving an upscaling factor of "4".
  • a transposed convolution layer used in the proposed system may also be referred to as fractionally strided convolution layer.
  • the transposed convolution layer operates by interchanging the forward and backward passes of the convolution process.
  • the transposed convolution layer may have found its application in semantic segmentation, representation learning, mid-level and high-level feature learning and so on.
  • a function that maps every pixel in the LR space to multiple pixels in the HR space is needed. This may be achieved by introducing the transposed convolution layer after extracting the one or more features 210 in the LR space.
  • the kernel is associated with uneven overlaps with the input feature map when the kernel size (i.e., output window size) is not divisible by the stride (i.e., spacing between the input neurons). Such overlaps occur in two dimensions, resulting in checkerboard-like patterns of varying magnitude.
  • unit stride may be used in the transposed convolution layer, along with increasing kernel sizes for enhancing the readability.
  • by upscaling the LR image using bilinear interpolation and then utilizing the convolution layers for feature computation, occurrence of these checkerboard patterns may be prevented.
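  • The shape arithmetic behind the unit-stride choice above can be verified with a short PyTorch sketch (values illustrative): with unit stride, the kernel size is trivially divisible by the stride, so the overlaps are even and the output simply grows by the kernel size minus one:

      import torch
      import torch.nn as nn

      x = torch.zeros(1, 1, 16, 16)
      up = nn.ConvTranspose2d(1, 1, kernel_size=9, stride=1)  # unit stride, no checkerboard
      print(up(x).shape)  # torch.Size([1, 1, 24, 24]), i.e., 16 + 9 - 1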
  • Table 1 below shows dimensions of intermediate feature maps of the first architecture 400a for an upscaling factor of "2".
  • Table 2 below shows dimensions of intermediate feature maps of the first architecture 400a for an upscaling factor of "4", with an additional transposed convolution layer.
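  • A minimal PyTorch sketch of the first architecture 400a for an upscaling factor of "2" is given below. The kernel sizes, channel counts and activation placement are assumptions chosen only so that a 16x16 input grows to 32x32; the actual dimensions are those of Tables 1 and 2:

      import torch
      import torch.nn as nn

      class FirstArchitecture(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(1, 64, 3)              # no padding: 16 -> 14
              self.conv2 = nn.Conv2d(64, 32, 3)             # 14 -> 12
              self.trconv1 = nn.ConvTranspose2d(32, 16, 9)  # unit stride: 12 -> 20
              self.trconv2 = nn.ConvTranspose2d(16, 1, 13)  # 20 -> 32
              self.act = nn.ReLU()                          # PReLU is the evaluated alternative

          def forward(self, x):
              x = self.act(self.conv1(x))
              x = self.act(self.conv2(x))
              x = self.act(self.trconv1(x))
              return self.trconv2(x)

      assert FirstArchitecture()(torch.zeros(1, 1, 16, 16)).shape == (1, 1, 32, 32)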
  • the LR binary image upscale module 203 may be configured to use the second architecture 400b illustrated in Figure 4b, for upscaling the resolution by the predefined upscaling factor of "4".
  • the second architecture 400b may include the first convolution layer 401.1, the second convolution layer 401.2, the first transposed convolution layer 402.1, the second transposed convolution layer 402.2, a third convolution layer 401.3 and a third transposed convolution layer 402.3.
  • the second architecture 400b includes the first architecture 400a, followed by the third convolution layer 401.3 applied to the output of the first transposed convolution layer 402.1, whose output is then merged with the input LR patch 209. Since the feature output is merged with the input LR patch 209, the dimension of the merged feature map is the same as that of the input.
  • the third transposed convolution layer 402.3 may be used in the end to upscale by two times.
  • the second architecture 400b includes two parallel feature maps, which are merged to obtain the final high-resolution output as shown in Figure 4b.
  • the second architecture 400b may be referred to as Parallel Stream Convolution (PSC) architecture.
  • performance of the second architecture 400b may also be evaluated using at least one of ReLU or PReLU activation functions.
  • the training module 204 may be configured to incorporate residual learning, which is useful when there is a chance of occurrence of exploding/vanishing gradients while training the architecture. Simply stacking more layers does not improve the performance of the architecture, as compared to combining residual blocks of layers. In residual learning, the architecture does not learn the exact pixel-to-pixel correspondence. Instead, the architecture learns the residual output, which consists mostly of zeros or negligible numbers. Thus, the architecture may be trained at a higher learning rate to predict the residuals rather than the actual pixels, while using more layers than the usual CNNs. In the second architecture 400b, a residual connection from the input to one of the intermediate layers is included, instead of typically connecting the input to the final output layer.
  • Table 3 below shows dimensions of intermediate feature maps of the second architecture 400b for an upscaling factor of "4".
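  • A PyTorch sketch of the parallel-stream idea is given below, written for an upscaling factor of "2" so that the shapes stay simple (the patent describes the factor-"4" version; all kernel sizes and channel counts are assumptions standing in for Table 3):

      import torch
      import torch.nn as nn

      class SecondArchitecture(nn.Module):
          def __init__(self):
              super().__init__()
              self.conv1 = nn.Conv2d(1, 64, 3)              # 16 -> 14
              self.conv2 = nn.Conv2d(64, 32, 3)             # 14 -> 12
              self.trconv1 = nn.ConvTranspose2d(32, 16, 9)  # 12 -> 20
              self.trconv2 = nn.ConvTranspose2d(16, 1, 13)  # main stream: 20 -> 32
              self.conv3 = nn.Conv2d(16, 1, 5)              # parallel stream: 20 -> 16
              self.trconv3 = nn.ConvTranspose2d(1, 1, 17)   # 16 -> 32
              self.act = nn.ReLU()

          def forward(self, x):
              t = self.act(self.trconv1(self.act(self.conv2(self.act(self.conv1(x))))))
              main = self.trconv2(t)
              side = self.act(self.conv3(t)) + x  # residual merge with the input LR patch
              return main + self.trconv3(side)    # merge the two parallel streams

      assert SecondArchitecture()(torch.zeros(1, 1, 16, 16)).shape == (1, 1, 32, 32)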
  • an alternative of the second architecture 400b may be proposed with two additional transposed convolution layers.
  • a fourth transposed convolution layer and fifth transposed convolution layer may be cascaded with the remaining transposed convolution layers.
  • Output of the fourth transposed convolution layer (TRCONV4) and the fifth transposed convolution layer (TRCONV5) may be merged to output the upscaled LR binary image enhanced by the upscaling factor of "4".
  • Table 4 below shows dimensions of intermediate feature maps of this alternative of the second architecture 400b for an upscaling factor of "4".
  • the LR binary image upscale module 203 may be configured to use the third architecture 400c illustrated in Figure 4c, for upscaling the resolution by the predefined upscaling factor of "4".
  • the third architecture 400c may include the first convolution layer 401.1, the second convolution layer 401.2, the first transposed convolution layer 402.1 and a sub-pixel convolution layer 403.
  • the second transposed convolution layer of the first architecture 400a is replaced with the sub-pixel convolution layer 403. Since the sub-pixel operation does not include trainable parameters as other upscaling layers do, the computational complexity is less than that of the first architecture 400a.
  • number of feature maps in the layer before sub-pixel convolution may be increased.
  • in the third architecture 400c, only a single sub-pixel layer may be sufficient to upscale by the upscaling factor of "2" or "4".
  • the sub-pixel convolution operation may be used as an alternative to fractionally strided convolution, interpolation and un-pooling methods for increasing the dimensionality.
  • the sub-pixel convolution layer 403 may be a non-trainable layer, since it only implements matrix manipulations to change the feature dimensions and does not have any weights to learn.
  • Table 5 below shows dimensions of intermediate feature maps of the third architecture 400c for an upscaling factor of "2".
  • Table 6 below shows dimensions of intermediate feature maps of the third architecture 400c for an upscaling factor of "4".
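  • A PyTorch sketch of the third architecture 400c is given below; the padding, kernel sizes and channel counts are assumptions standing in for Tables 5 and 6. The layer before the sub-pixel operation must supply r*r feature maps so that the non-trainable rearrangement yields an r-times larger image:

      import torch
      import torch.nn as nn

      class ThirdArchitecture(nn.Module):
          def __init__(self, r=2):
              super().__init__()
              self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
              self.conv2 = nn.Conv2d(64, 32, 3, padding=1)
              self.trconv1 = nn.ConvTranspose2d(32, r * r, 3, padding=1)  # r*r maps, LR size kept
              self.subpixel = nn.PixelShuffle(r)  # non-trainable pixel rearrangement
              self.act = nn.ReLU()

          def forward(self, x):
              x = self.act(self.conv1(x))
              x = self.act(self.conv2(x))
              return self.subpixel(self.act(self.trconv1(x)))

      assert ThirdArchitecture(r=2)(torch.zeros(1, 1, 16, 16)).shape == (1, 1, 32, 32)
      assert ThirdArchitecture(r=4)(torch.zeros(1, 1, 16, 16)).shape == (1, 1, 64, 64)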
  • the activation module 205 may be configured to perform activation of at least one of the one or more convolution layers, one or more transposed convolution layers, one or more additional convolution layers and the sub-pixel convolution layer 403, based on corresponding output.
  • at least one of Rectified Linear Unit (ReLU) or Parametric ReLU (PReLU) may be used for performing the activation.
  • One or more other modules 207 of the readability enhancement device 101 may be configured to compute the loss function for the proposed architectures.
  • Standard Mean Square Error (MSE) function may be used as the loss function to train the architectures.
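  • A sketch of one training step with the MSE loss is given below (the optimizer and learning rate are assumptions; any of the architecture sketches above can stand in for the model):

      import torch
      import torch.nn.functional as F

      model = FirstArchitecture()  # from the sketch above
      opt = torch.optim.Adam(model.parameters(), lr=1e-3)

      def train_step(x_lr, g_hr):
          opt.zero_grad()
          loss = F.mse_loss(model(x_lr), g_hr)  # MSE between upscaled patch and HR ground truth
          loss.backward()
          opt.step()
          return loss.item()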
  • a fourth architecture may be proposed which is a three stream, parallel neural network as shown in Figure 4d.
  • the fourth architecture merges the outputs of the first architecture 400a, the second architecture 400b and the third architecture 400c, to get the final output, as sketched below. It may be observed that the third architecture 400c may contribute more details than the other architectures to the overall output, while training with an upscaling factor of either "2" or "4".
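  • Reusing the sketches above, the three-stream merge may be expressed as follows (summation as the merge operation is an assumption; Figure 4d shows the actual wiring):

      import torch
      import torch.nn as nn

      class FourthArchitecture(nn.Module):
          def __init__(self):
              super().__init__()
              self.a = FirstArchitecture()
              self.b = SecondArchitecture()
              self.c = ThirdArchitecture(r=2)

          def forward(self, x):
              # merge the outputs of the three parallel streams
              return self.a(x) + self.b(x) + self.c(x)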
  • the output of the readability enhancement device 101 which is the upscaled LR binary image may be stored as the upscaled LR binary image data 211, in the memory 107.
  • Figure 4e gives the results of the proposed architectures for a small cropped region of one of the input test images.
  • the left figure in the top panel shows the input, which has been cropped from a Tamil document image and zoomed for the purpose of visualization and beside it is the corresponding zoomed ground truth image.
  • bicubic interpolation is used as a baseline, and the images interpolated by factors of 2 and 4 are given.
  • the second row displays the outputs of the first architecture 400a and its variants.
  • the first image C2 is the two times upscaled output of the first architecture 400a.
  • the second image C4 is the result of four times upscaling of the first architecture 400a.
  • the third result CP2 is the two times upscaled output using PReLU as the activation function of the first architecture 400a.
  • CP4 is the image obtained after four times upscaling using PReLU of the first architecture 400a.
  • Third and fourth rows show the output images of the different variants of the second architecture 400b and the third architecture 400c, respectively.
  • the image R2 is the two times upscaled output of the second architecture 400b.
  • the image R4 is the result of four times upscaling of the second architecture.
  • the third result RP2 is the two times upscaled output using PReLU as the activation function of the second architecture 400b.
  • RP4 is the image obtained after four times upscaling using PReLU of the second architecture 400b.
  • the image S2 is the two times upscaled output of the third architecture 400c.
  • the image S4 is the result of four times upscaling of the third architecture 400c.
  • the third result SP2 is the two times upscaled output using PReLU as the activation function of the third architecture 400c.
  • SP4 is the image obtained after four times upscaling using PReLU of the third architecture 400c.
  • Providing the LR binary image 208 to the OCR system 102 may lead to poor performance of the OCR system 102 on sparsely connected symbols.
  • when a low-quality image is passed to the OCR system 102, since pixels representing a symbol are not properly connected, many symbols may be segmented into multiple pieces during the segmentation stage.
  • Each of these split components may be wrongly classified by the OCR system 102 as one of the known characters, leading to the poor classification of the binary image.
  • the OCR system input module 206 may be configured to input the upscaled LR binary image to the OCR system 102, for improving accuracy of character recognition in the LR binary image 208.
  • Figure 4f shows a part of a test image, its output image and the corresponding text outputs obtained from an online OCR.
  • Image 404.1 shows a poor-quality 75-dpi binary input image, which is not easy even for native Tamil readers to read directly.
  • in the output text given in image 404.2, there are too many errors arising out of the poor image segmentation during the OCR process.
  • Roman and Chinese characters, Indo-Arabic numerals and certain other symbols are wrongly present in the recognized output.
  • Image 404.3 illustrates the relatively high-quality output, with an upscaling factor of "4", produced by the third architecture 400c with PReLU activation.
  • the other data 214 may store data, including temporary data and temporary files, generated by modules for performing the various functions of the readability enhancement device 101.
  • the one or more modules 108 may also include other modules 207 to perform various miscellaneous functionalities of the readability enhancement device 101. It will be appreciated that such modules may be represented as a single module or a combination of different modules.
  • Figure 5 illustrates a flowchart showing an exemplary method to enhance readability of the LR binary image 208, in accordance with some embodiments of present disclosure.
  • the readability enhancement device 101 may be configured to generate the input LR patch 209 of predefined size from the LR binary image 208.
  • One or more techniques, known to a person skilled in the art, may be implemented to generate the input LR patch 209.
  • the readability enhancement device 101 may be configured to extract one or more features 210 associated with the input LR patch 209, in LR space, using one or more convolution layers.
  • two convolution layers, cascaded with each other, may be implemented to extract the one or more features 210.
  • the readability enhancement device 101 may be configured to upscale the LR binary image 208 to the HR space using the one or more features 210 for enhancing readability of the LR binary image 208.
  • the LR binary image 208 may be upscaled by the predefined upscaling factor.
  • the resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
  • the method 500 may include one or more blocks for executing processes in the readability enhancement device 101.
  • the method 500 may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions or implement particular abstract data types.
  • FIG. 6 illustrates a block diagram of an exemplary computer system 600 for implementing embodiments consistent with the present disclosure.
  • the computer system 600 is used to implement the readability enhancement device 101.
  • the computer system 600 may include a central processing unit (“CPU” or“processor”) 602.
  • the processor 602 may include at least one data processor for executing processes in Virtual Storage Area Network.
  • the processor 602 may include specialized processing units such as, integrated system (bus) controllers, memory' management control units, floating point units, graphics processing units, digital signal processing units, etc.
  • the processor 602 may be disposed in communication with one or more input/output (I/O) devices 609 and 610 via I/O interface 601.
  • the I/O interface 601 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc.
  • CDMA code-division multiple access
  • HSPA+ high-speed packet access
  • GSM global system for mobile communications
  • LTE long-term evolution
  • WiMax Worldwide Interoperability for Microwave Access
  • the computer system 600 may communicate with one or more I/O devices 609 and 610.
  • the input devices 609 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, stylus, scanner, storage device, transceiver, video device/source, etc.
  • the output devices 610 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light- emitting diode (LED), plasma, Plasma display panel (PDP), Organic light-emitting diode display (OLED) or the like), audio speaker, etc.
  • CRT cathode ray tube
  • LCD liquid crystal display
  • LED light- emitting diode
  • PDP Plasma display panel
  • OLED Organic light-emitting diode display
  • the computer system 600 may consist of the readability enhancement device 101.
  • the processor 602 may be disposed in communication with the communication network 611 via a network interface 603.
  • the network interface 603 may communicate with the communication network 611.
  • the network interface 603 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc.
  • the communication network 611 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc.
  • connection protocols include, but are not limited to, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc.
  • the communication network 611 includes, but is not limited to, a direct interconnection, an e-commerce network, a peer to peer (P2P) network, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, Wi-Fi, and such.
  • the first network and the second network may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other.
  • HTTP Hypertext Transfer Protocol
  • TCP/IP Transmission Control Protocol/Internet Protocol
  • WAP Wireless Application Protocol
  • the first network and the second network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc.
  • the processor 602 may be disposed in communication with a memory 605 (e.g., RAM, ROM, etc. not shown in Figure 6) via a storage interface 604.
  • the storage interface 604 may connect to memory 605 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as, serial advanced technology attachment (SATA), Integrated Drive Electronics (IDE), IEEE- 1394, Universal Serial Bus (USB), fibre channel, Small Computer Systems Interface (SCSI), etc.
  • the memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
  • the memory 605 may store a collection of program or database components, including, without limitation, user interface 606, an operating system 607, web browser etc.
  • computer system 600 may store user/application data 606, such as, the data, variables, records, etc., as described in this disclosure.
  • databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle® or Sybase®.
  • the computer system 600 may implement a web browser 608 stored program component.
  • the web browser 608 may be a hypertext viewing application, such as Microsoft Internet Explorer, Google Chrome, Mozilla Firefox, Apple Safari, etc. Secure web browsing may be provided using Secure Hypertext Transport Protocol (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), etc. Web browsers 608 may utilize facilities such as AJAX, DHTML, Adobe Flash, JavaScript, Java, Application Programming Interfaces (APIs), etc.
  • the computer system 600 may implement a mail server stored program component.
  • the mail server may be an Internet mail server such as Microsoft Exchange, or the like.
  • the mail server may utilize facilities such as ASP, ActiveX, ANSI C++/C#, Microsoft .NET, CGI scripts, Java, JavaScript, PERL, PHP, Python, WebObjects, etc.
  • the mail server may utilize communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), Microsoft Exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), or the like.
  • IMAP Internet Message Access Protocol
  • MAPI Messaging Application Programming Interface
  • POP Post Office Protocol
  • SMTP Simple Mail Transfer Protocol
  • the computer system 600 may implement a mail client stored program component.
  • the mail client may be a mail viewing application, such as Apple Mail, Microsoft Entourage, Microsoft Outlook, Mozilla Thunderbird, etc.
  • the operating system 607 may facilitate resource management and operation of the computer system 600.
  • operating systems include, without limitation, APPLE MACINTOSH® OS X, UNIX®, UNIX-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION™ (BSD), FREEBSD™, NETBSD™, OPENBSD™, etc.), LINUX DISTRIBUTIONS™ (e.g., RED HAT™, UBUNTU™, KUBUNTU™, etc.), IBM™ OS/2, MICROSOFT™ WINDOWS™ (XP™, VISTA™/7/8/10 etc.), APPLE® IOS™, and the like.
  • a computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored.
  • a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein.
  • the term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., to be non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, non-volatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
  • the described operations may be implemented as a method, system or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof.
  • the described operations may be implemented as code maintained in a "non-transitory computer readable medium", where a processor may read and execute the code from the computer readable medium.
  • the processor is at least one of a microprocessor and a processor capable of processing and executing the queries.
  • a non-transitory computer readable medium may include media such as magnetic storage medium (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, DVDs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, Flash Memory, firmware, programmable logic, etc.), etc.
  • non-transitory computer-readable media may include all computer-readable media except for a transitory signal.
  • the code implementing the described operations may further be implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.).
  • An "article of manufacture" includes non-transitory computer readable medium, and/or hardware logic, in which code may be implemented.
  • a device in which the code implementing the described embodiments of operations is encoded may include a computer readable medium or hardware logic.

Abstract

Embodiments of the present disclosure relate to a device and method for enhancing readability of a low-resolution binary image. Initially, an input LR patch of predefined size is generated from a LR binary image and one or more features associated with the input LR patch are extracted, in LR space, using one or more convolution layers. Further, the LR binary image is upscaled to High Resolution (HR) space, by a predefined upscaling factor, for enhancing readability of the LR binary image. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.

Description

DEVICE AND METHOD FOR ENHANCING READABILITY OF A LOW-RESOLUTION BINARY IMAGE
TECHNICAL FIELD
[001] Embodiments of the present disclosure relate to image processing and, more particularly, to a method and device for enhancing readability of a low-resolution binary image for character recognition.
BACKGROUND
[002] Optical Character Recognition (OCR) systems aim at converting images of typed text to machine-encoded text. By mimicking human reading process, OCR systems enable a machine to understand the text information in images and recognize specific alphanumeric words, text phrases or sentences. OCR systems trained on high resolution text images may fail to accurately predict text in elusive low-resolution text images. Specifically, low-resolution text images may lack fine details, making it harder for the OCR systems to correctly retrieve textual information from some commonly acquired OCR features.
[003] It is generally advisable to scan text documents at a resolution of around 300-600 Dots Per Inch (dpi) on a flatbed scanner for the OCR systems to achieve best performance. Though it is necessary to scan under these settings, there already exist large collections of documents which have been scanned at low resolution and whose original documents have later been destroyed or lost, which prevents scanning them again. Also, scanning at a higher resolution implies representing digitized pixels with a greater number of dots. This takes time, and such files consume a lot of system memory for storage or bandwidth for transmission. Thus, a user is limited to scanning only a few documents in a given period of time and within the available hardware capacity.
[004] One or more techniques in the art disclose enhancing the resolution of low-resolution text images for character recognition. One such technique includes super resolution imaging, which provides an algorithmic solution to the resolution enhancement problem. The super resolution imaging exploits image-specific information. Super-resolution of low-resolution document images is becoming an important pre-requisite for design and development of robust document analysis systems. Without enhancement, a simple binarization will completely remove many strokes. In these conditions, it is virtually impossible to do character recognition, as most of the OCRs are designed to work at high resolutions. The task of resolution enhancement is typically to increase spatial resolution, while maintaining the difference between text and background. It can further assist the cause of recognition in low-resolution text images.
[005] The problem of document super-resolution is a special case of image super-resolution, because document images are pseudo-binary in nature and the regularity of patterns used in such visual language distinguishes the document images from natural scenes. A successful document super-resolution algorithm needs to use text-specific a priori information. Edges are geometrically regular spatial patterns and are among the most noticeable features in document images. The visual quality near edge areas adversely affects our perception of distortion. [006] Super-resolution methods for improving the quality of Low-Resolution (LR) gray document images deal with a down-sampled image of an originally High Resolution (HR) image, and hence do not have real-life application. Different super-resolution methods can be utilized based on the nature of media, namely video, images, and depth maps, for various practical applications such as video information enhancement, medical diagnosis, surveillance, remote sensing, astronomical observations and bio-metric information identification. In order to obtain better super-resolution reconstruction, temporal information in videos and fine structures in depth maps should be captured properly.
[007] The information disclosed in this background of the disclosure section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
SUMMARY
[008] In an embodiment, the present disclosure relates to a method for enhancing readability of a Low Resolution (LR) binary image. The method comprises generating an input LR patch of predefined size from a LR binary image and extracting one or more features associated with the input LR patch, in LR space, using one or more convolution layers. Further, the LR binary image is upscaled to High Resolution (HR) space, by a predefined upscaling factor, for enhancing readability of the LR binary image. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers. [009] In an embodiment, the present disclosure relates to a readability enhancement device for enhancing readability of a Low Resolution (LR) binary image. The readability enhancement device includes a processor and a memory communicatively coupled to the processor. The memory stores processor-executable instructions, which on execution cause the processor to enhance the readability. Initially, an input LR patch of predefined size is generated from a LR binary image and one or more features associated with the input LR patch are extracted, in LR space, using one or more convolution layers. Further, the LR binary image is upscaled to High Resolution (HR) space, by a predefined upscaling factor, for enhancing readability of the LR binary image. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
[0010] The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS
[0011] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of systems and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:
[0012] Figure 1 shows an exemplary environment of a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure;
[0013] Figure 2 shows a detailed block diagram of a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure; [0014] Figure 3 shows an exemplary representation of a training dataset generated for training a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure;
[0015] Figures 4a-4f illustrate exemplary embodiments associated with a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure;
[0016] Figure 5 shows a flow diagram illustrating a method of a readability enhancement device for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure; and
[0017] Figure 6 illustrates a block diagram of an exemplary computer system for implementing embodiments consistent with the present disclosure. [0018] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and executed by a computer or processor, whether or not such computer or processor is explicitly shown.
DETAILED DESCRIPTION
[0019] In the present document, the word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment or implementation of the present subject matter described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
[0020] While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and the scope of the disclosure. [0021] The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by “comprises... a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0022] The terms “includes”, “including”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device, or method that includes a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a system or apparatus preceded by “includes... a” does not, without more constraints, preclude the existence of other elements or additional elements in the system or method.
[0023] In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.
[0024] The present disclosure relates to a method and device for enhancing readability of a LR binary image. The readability may be enhanced by using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers. Said layers are configured to upscale the LR binary image to High Resolution (HR) space by a predefined upscaling factor. The upscaled binary image obtained from the proposed method and device is provided to an Optical Character Recognizer (OCR) system for achieving accurate character recognition.
[0025] Figure 1 shows an exemplary environment of a readability enhancement device 101 for enhancing readability of a LR binary image, in accordance with some embodiments of the present disclosure. The environment 100 may include the readability enhancement device 101 in communication with an OCR system 102 via a communication network 103, and a binary data repository 104. The readability enhancement device 101 may be configured to enhance the readability of a LR binary image inputted to the readability enhancement device 101. The LR binary image provided to the readability enhancement device 101 may be of a very low resolution. In an embodiment, said LR binary image may be scanned with low-resolution settings in a scanner. There may be a need for recognizing characters in the LR binary image. The OCR system 102 associated with the readability enhancement device 101 may be configured to perform the character recognition in the LR binary image. One or more techniques, known to a person skilled in the art, to recognize characters in a binary image, may be implemented in the OCR system 102. Prior to providing the LR binary image to the OCR system 102, such LR binary image may be provided to the readability enhancement device 101 for enhancing readability of the LR binary image. Output of the readability enhancement device
101 may be an upscaled binary image with upscaled resolution and enhanced readability. By using such an upscaled binary image, the accuracy of character recognition in the OCR system
102 may be increased. In an embodiment, the readability enhancement device 101 may be communicatively coupled with the OCR system 102 using the communication network 103. In an embodiment, the communication network 103 may include, without limitation, a direct interconnection, Local Area Network (LAN), Wide Area Network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, and the like. In an embodiment, the readability enhancement device 101 may be a dedicated server or a cloud-based server, in communication with the OCR system 102.
[0026] In an embodiment, the readability enhancement device 101 may be an integral part of the OCR system 102. In an embodiment, when a LR binary image is inputted to the OCR system 102, the readability enhancement device 101, by default, may be configured to perform the steps of the proposed method, to enhance the readability of the LR binary image. In an embodiment, when a LR binary image is inputted to the OCR system 102, a user associated with the OCR system 102 may provide instruction to enhance the readability of the LR binary image. Upon receipt of the instruction, the readability enhancement device 101 may be configured to perform the steps of the proposed method, to enhance the readability of the LR binary image.
[0027] Further, the readability enhancement device 101 may include a processor 105, I/O interface 106, and a memory 107. In some embodiments, the memory 107 may be communicatively coupled to the processor 105. The memory 107 stores instructions, executable by the processor 105, which, on execution, may cause the readability enhancement device 101 to enhance the readability of the LR binary image. In an embodiment, the memory 107 may include one or more modules 108 and data 109. The one or more modules 108 may be configured to perform the steps of the present disclosure using the data 109, to enhance the readability. In an embodiment, each of the one or more modules 108 may be a hardware unit which may be outside the memory 107 and coupled with the readability enhancement device 101. In an embodiment, the readability enhancement device 101, for enhancing the readability, may be implemented in a variety of computing systems, such as a laptop computer, a desktop computer, a Personal Computer (PC), a notebook, a smartphone, a tablet, e-book readers, a server, a network server, a cloud-based server and the like. In an embodiment, the readability enhancement device 101 may be associated with a plurality of OCR systems. LR binary images to be provided to the plurality of OCR systems may be processed by the readability enhancement device 101, to enhance the readability of the LR binary images. [0028] The readability enhancement device 101 is initially configured to receive the LR binary image and generate an input LR patch of predefined size from the LR binary image. In an embodiment, the input LR patch may be generated by cropping the LR binary image. In an embodiment, the input LR patch may be generated by resizing the LR binary image. One or more other techniques, known to a person skilled in the art, may be implemented for generating the input LR patch using the LR binary image.
[0029] Upon generating the input LR patch, the readability enhancement device 101 may be configured to extract one or more features associated with the input LR patch. The one or more features may be extracted in LR space using one or more convolution layers.
[0030] Further, the LR binary image is upscaled to HR space using the one or more features. The LR binary image is upscaled by a predefined upscaling factor. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
[0031] In an embodiment, for upscaling the resolution by the predefined upscaling factor of “2”, the readability enhancement device 101 may be configured to map each pixel in the LR space with multiple pixels in HR space, for enhancing readability of the LR binary image. In an embodiment, the mapping may be performed using a first transposed convolution layer 402.1 and a second transposed convolution layer 402.2. In an embodiment, the first transposed convolution layer 402.1 and the second transposed convolution layer 402.2 may be cascaded with each other.
[0032] In an embodiment, for upscaling the resolution by the predefined upscaling factor of “4”, the readability enhancement device 101 may be configured to map each pixel in the LR space with multiple pixels in HR space using a first transposed convolution layer 402.1 and a second transposed convolution layer 402.2. The first transposed convolution layer 402.1 and the second transposed convolution layer 402.2 may be cascaded with each other. Further, the readability enhancement device 101 may be configured to perform convolution on the output of the first transposed convolution layer 402.1 using an additional convolution layer. The LR binary image is merged with the output of the additional convolution layer to obtain one or more merged features with the same dimension as that of the LR binary image. Further, the one or more merged features are merged with the output of the second transposed convolution layer 402.2, for enhancing readability of the LR binary image.
[0033] In an embodiment, for upscaling the resolution by the predefined upscaling factor of “4”, the readability enhancement device 101 may be configured to map each pixel of the one or more features in the LR space with multiple pixels in HR space using a first transposed convolution layer 402.1. Further, the one or more features are re-arranged in the HR space obtained from the first transposed convolution layer 402.1, using a sub-pixel convolution layer, for enhancing readability of the LR binary image.
[0034] In an embodiment, the readability enhancement device 101 may be configured to perform activation of at least one of the one or more convolution layers, one or more transposed convolution layers, one or more additional convolution layers and the sub-pixel convolution layer, based on corresponding output. In an embodiment, at least one of Rectified Linear Unit (ReLU) or Parametric ReLU (PReLU) may be used for performing the activation.
[0035] In an embodiment, the readability enhancement device 101 may be configured to perform training using a training dataset. In an embodiment, the training dataset includes a plurality of LR-HR patch pairs, generated from a plurality of LR and corresponding HR binary images. In an embodiment, the plurality of LR and corresponding HR binary images may include a large set of samples of all characters of a specific language of a document, segmented from a collection of very old books with different fonts and sizes of the characters. In an embodiment, the readability enhancement device 101 may be associated with the binary data repository 104 storing the plurality of LR and corresponding HR binary images. The readability enhancement device 101 may be configured to retrieve the plurality of LR and corresponding HR binary images from the binary data repository 104, for generating the training dataset.
[0036] In an embodiment, the readability enhancement device 101 may be configured to generate each of a plurality of LR patches, from the LR-HR patch pairs, by performing one or more processing steps on the plurality of LR and corresponding HR binary images. One of the one or more processing steps may include choosing alternate pixels of a corresponding HR patch from one or more HR binary images. Other processing steps may include multiplying the corresponding LR patch with a mask comprising randomly placed zeroes and ones, generating one or more rotated variants of the corresponding HR patch, and scanning the training images at a low scanning resolution. In an embodiment, any one of the one or more processing steps may be performed to generate the LR-HR patch pairs. In an embodiment, a combination of the one or more processing steps may also be performed to generate the LR-HR patch pairs.
[0037] In an embodiment, when the predefined upscaling factor for the LR binary image is “r”, the readability enhancement device 101 may be configured to generate the plurality of HR patches, from the LR-HR patch pairs, by taking overlapping patches with stride “r” from the plurality of HR binary images.
[0038] In an embodiment, the readability enhancement device 101 may be configured to input the upscaled LR binary image to the OCR system 102, for improving accuracy of character recognition in the LR binary image.
[0039] In an embodiment, the readability enhancement device 101 may be configured to receive and transmit data via the I/O interface 106. Received data may include the LR binary image, the plurality of LR and corresponding HR binary images and so on. Transmitted data may include the upscaled LR binary image, which may be transmitted to the OCR system 102.
[0040] Figure 2 shows a detailed block diagram of the readability enhancement device 101 for enhancing readability of the LR binary image, in accordance with some embodiments of the present disclosure. [0041] The data 109 and the one or more modules 108 in the memory 107 of the readability enhancement device 101 are described herein in detail.
[0042] In one implementation, the one or more modules 108 may include, but are not limited to, an input LR patch generation module 201, a feature extraction module 202, a LR binary image upscale module 203, a training module 204, an activation module 205, an OCR system input module 206, and one or more other modules 207, associated with the readability enhancement device 101. [0043] In an embodiment, the data 109 in the memory 107 may include LR binary image data
208 (also referred to as LR binary image 208), input LR patch data 209 (also referred to as input LR patch 209), feature data 210 (also referred to as one or more features 210), upscaled LR binary image data 211, training dataset 212, binary image data 213 and other data 214 associated with the readability enhancement device 101.
[0044] In an embodiment, the data 109 in the memory 107 may be processed by the one or more modules 108 of the readability enhancement device 101. In an embodiment, the one or more modules 108 may be implemented as dedicated units and, when implemented in such a manner, said modules may be configured with the functionality defined in the present disclosure to result in novel hardware. As used herein, the term module may refer to an Application Specific Integrated Circuit (ASIC), an electronic circuit, a Field-Programmable Gate Array (FPGA), a Programmable System-on-Chip (PSoC), a combinational logic circuit, and/or other suitable components that provide the described functionality.
[0045] The one or more modules 108 of the present disclosure function to enhance the readability of the LR binary image 208 for accurate character recognition. The one or more modules 108 along with the data 109, may be implemented in any system, for enhancing the readability.
[0046] The readability enhancement device 101 proposed in the present disclosure is trained using the training dataset 212 for effective enhancement of readability. In an embodiment, the training dataset 212 may include approximately 5 million LR-HR patch pairs with diverse low-resolution properties. The training module 204 may be configured to generate the training dataset 212. In an embodiment, the training dataset 212 may be stored in the memory 107 in compressed .npz format, with which the training of the neural network models of the readability enhancement device 101 is performed.
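The snippet below is a minimal sketch, not taken from the patent text, of how LR-HR patch pairs could be packed into and read back from a compressed .npz archive; the array shapes, file name and function names are assumptions for illustration.

```python
import numpy as np

def save_patch_pairs(lr_patches, hr_patches, path="train_patches.npz"):
    # lr_patches: (N, 16, 16) binary array; hr_patches: (N, 32, 32) binary array.
    # savez_compressed writes a compressed .npz archive, as described in the text.
    np.savez_compressed(path, lr=lr_patches, hr=hr_patches)

def load_patch_pairs(path="train_patches.npz"):
    data = np.load(path)
    return data["lr"], data["hr"]
```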
[0047] The training dataset 212 may include overlapping patches from the plurality of LR and HR binary images of a document. Such overlapping patches may also be referred to as LR-HR patch pairs. In an embodiment, the plurality of LR and HR binary images may be retrieved from the binary data repository 104. The plurality of LR and HR binary images may be stored as binary image data 213, in the memory 107 of the readability enhancement device 101. The LR binary images may be considered as training data and the HR binary images may be considered as Ground Truth (GT). In an embodiment, LR patches from the LR-HR patch pairs may be generated by taking overlapping patches of stride 1 from the LR binary images. If an upscaling factor of “r” is required from the readability enhancement device 101, the corresponding HR patches are generated by taking overlapping patches with stride “r” from the HR binary images. In an embodiment, the training dataset 212 may be generated with the assumption that a function that upscales by a predefined upscaling factor of “2” or “4” is being modelled to super-resolve the LR binary image 208. The training dataset 212 may be rich and diverse, generated from a plurality of LR images by performing one or more processing steps. In an embodiment, the one or more processing steps may include, but are not limited to, taking alternate pixels, random deletion of pixels, cropping from character images, direct scanning at low spatial resolution and so on.
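As an illustrative sketch of the stride-1/stride-r pairing described above, the patch extraction could be implemented as follows; the NumPy-based implementation, function name and default patch size are assumptions.

```python
import numpy as np

def extract_patch_pairs(lr_img, hr_img, lr_size=16, r=2):
    # Overlapping LR patches are taken with stride 1; the corresponding
    # HR patches are taken with stride r, so the LR patch at (y, x)
    # pairs with the HR patch at (r*y, r*x).
    hr_size = lr_size * r
    lr_patches, hr_patches = [], []
    for y in range(lr_img.shape[0] - lr_size + 1):
        for x in range(lr_img.shape[1] - lr_size + 1):
            lr_patches.append(lr_img[y:y + lr_size, x:x + lr_size])
            hr_patches.append(hr_img[r * y:r * y + hr_size, r * x:r * x + hr_size])
    return np.stack(lr_patches), np.stack(hr_patches)
```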
[0048] For the upscaling factor of “2”, in an embodiment, a binary document may be scanned at 200 dpi, so that the resulting binary images may be considered as HR images for training purposes. Separate copies of these digitized HR document images may be converted to LR by selecting only the alternate pixels in the HR image in both x and y directions. Thus, the LR patches have one-fourth the pixels of the HR patches. Instead of entire images, such LR-HR patch pairs may be used for training the readability enhancement device 101. In an embodiment, the dimensions of the LR-HR patch pairs are 16×16 and 32×32, respectively. Since the same document is scanned and converted to a LR patch, the content of the HR binary image is retained in the LR binary image 208, but with a reduction in the number of pixels, and hence in clarity.
[0049] Let X_LR2 be one of the LR patches, of dimension 16×16, and G_HR2 be its HR ground truth, of dimension 32×32. The two may be related as given in equation (1) below:
X_LR2(x, y) = G_HR2(2x, 2y) ... (1)
where x and y are the co-ordinates of the binary X_LR2 image.
[0050] Since alternate pixels are considered, the dimensions of G_HR2 may be ensured to be even. Picking alternate pixels to create the LR images may be considered as the scanner skipping the alternate pixels of the original image. As a result, a loss in the structure or shape of the symbols may be observed.
[0051] Additionally, generating the LR-HR patch pairs may include initially generating the LR-HR patch pairs by skipping alternate pixels and then applying a mask, which has randomly distributed ones and zeros, on each of the LR patches. Since the distribution is random, the mask for each LR patch is different, but the dimensions of the mask may be the same as those of the LR patches, i.e., 16×16. The resulting image patch is the pixel-wise multiplication of the mask and the LR image patch. Using equation (1), the LR patch may be generated through alternate pixel removal. If X_LR-rand is the patch obtained after applying the mask M_rand on X_LR2, then
X_LR-rand = X_LR2 . M_rand ... (2)
where the dot operator (.) represents the element-wise multiplication of X_LR2 and M_rand. [0052] M_rand is a matrix of randomly placed ones and zeros. In an embodiment, the ones and zeros may be generated by non-uniform probability distributions such as a Gaussian distribution. More discontinuities may be observed in the pixel structure of X_LR-rand than in X_LR2. This helps the readability enhancement device 101 to be trained in such a way that it can tackle super-resolution of documents with randomly lost pixel data. The ground truth for these new patches may be denoted as G_HR2-rand.
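A minimal sketch of the two LR-patch generation steps above, alternate-pixel selection per equation (1) and random masking per equation (2), follows; the mask-retention probability and the NumPy-based implementation are assumptions, since the text only requires randomly placed ones and zeros.

```python
import numpy as np

def make_lr_variants(hr_patch, keep_prob=0.8, rng=None):
    # hr_patch: 32x32 binary ground-truth patch G_HR2.
    rng = rng or np.random.default_rng()
    # Equation (1): keep alternate pixels in both x and y -> 16x16 X_LR2.
    x_lr2 = hr_patch[::2, ::2]
    # Equation (2): element-wise product with a random 0/1 mask M_rand.
    m_rand = (rng.random(x_lr2.shape) < keep_prob).astype(hr_patch.dtype)
    x_lr_rand = x_lr2 * m_rand
    return x_lr2, x_lr_rand
```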
[0053] In an embodiment, in order to specifically improve the resolution of the characters and thus to further enhance the performance of the OCR system 102, individual character data may be utilized. This data facilitates improvement in pixel connectivity between the strokes of the symbols in the high-resolution output. These character LR patches may be denoted as X_char2 and their ground truth as G_char2. Finally, to generalize the upscaling function and to make it independent of the input resolution, font and thickness of the symbols in the document image, an additional dataset may be generated from images scanned at 75 and 150 dpi, respectively. This additional dataset may be denoted as X_75dpi and G_150dpi. The entire training dataset 212 required for the upscaling factor of “2” is thus the combination of all the LR data and may be represented as in equation (3) given below:
X_LR = {X_LR2, X_LR-rand, X_char2, X_75dpi} ... (3)
[0054] Further, the ground truth data G_HR may be represented as in equation (4) given below:
G_HR = {G_HR2, G_HR2-rand, G_char2, G_150dpi} ... (4)
[0055] For the upscaling factor of “4”, the same processing steps may be used, but the 200 dpi images may be replaced by 300 dpi images as the GT. The entire training dataset 212 for the upscaling factor of “4” is the combination of LR data represented as in equation (5) given below:
X_LR = {X_LR4, X_LR4-rand, X_char4, X_75dpi} ... (5)
[0056] The GT may be represented as in equation (6) given below:
G_HR = {G_HR4, G_HR4-rand, G_char4, G_300dpi} ... (6)
[0057] Figure 3 shows a few character samples from the training dataset 212 generated by the training module 204. 301.1a is a 200 dpi HR patch, G_HR2. 301.1b may be the LR version of 301.1a, generated by taking alternate pixels, X_LR2. 301.2a is a 200 dpi HR character, G_char2. 301.2b may be the LR version of 301.2a, represented as X_char2. 301.3a is the HR version of a character, G_HR2-rand. 301.3b may be the LR version of 301.3a, generated by applying the mask of random ones and zeros, represented as X_LR-rand. 301.4a is a 150 dpi HR image, represented as G_150dpi. 301.4b may be the LR image directly scanned at 75 dpi, corresponding to the image in 301.4a, represented as X_75dpi. 301.5a is a 300 dpi HR image, represented as G_300dpi. 301.5b may be the LR image directly scanned at 75 dpi, corresponding to the image in 301.5a, represented as X_75dpi.
[0058] The trained readability enhancement device 101 may be deployed with the OCR system 102 to enhance readability of a LR binary image 208 which is to be scanned by the OCR system 102. From the LR binary image 208, the input LR patch generation module 201 may be configured to generate the input LR patch 209 of predefined size. In an embodiment, the input LR patch 209 may be generated by cropping the LR binary image 208. In an embodiment, the input LR patch 209 may be generated by resizing the LR binary image 208. One or more other techniques, known to a person skilled in the art, may be implemented for generating the input LR patch 209 using the LR binary image 208. In an embodiment, the input LR patch 209 may be of size 16x16x1.
[0059] Upon generating the input LR patch 209, the feature extraction module 202 may be configured to extract one or more features 210 associated with the input LR patch 209. The one or more features 210 may be extracted in LR space using one or more convolution layers. Further, the LR binary image 208 is upscaled to High Resolution (HR) space using the one or more features 210. The LR binary image 208 is upscaled by a predefined upscaling factor. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
[0060] In an embodiment, the LR binary image upscale module 203 may be configured to use the first architecture 400a illustrated in Figure 4a, for upscaling the resolution by the predefined upscaling factor of “2”. The first architecture 400a may include a first convolution layer 401.1, a second convolution layer 401.2, a first transposed convolution layer 402.1 and a second transposed convolution layer 402.2. The first convolution layer 401.1 and the second convolution layer 401.2 may be used without padding, followed by the first transposed convolution layer 402.1 and the second transposed convolution layer 402.2. A deeper model would consume a lot of time to train and test. Hence, the first architecture 400a is proposed to have a balance between performance and speed. In an embodiment, the ReLU and PReLU activation functions may be used to evaluate the performance of the first architecture 400a. In an embodiment, in order to upscale by the upscaling factor of “4”, an additional new transposed convolution layer may be included in the first architecture 400a, instead of replicating the first architecture 400a. Such an architecture may be used to reduce the network depth, while achieving an upscaling factor of “4”. [0061] A transposed convolution layer used in the proposed system may also be referred to as a fractionally strided convolution layer. The transposed convolution layer operates by interchanging the forward and backward passes of the convolution process. The transposed convolution layer has found application in semantic segmentation, representation learning, mid-level and high-level feature learning and so on. To enhance the resolution, a function is needed that maps every pixel in the LR space to multiple pixels in the HR space. This may be achieved by introducing the transposed convolution layer after extracting the one or more features 210 in the LR space. In an embodiment, the kernel overlaps unevenly with the input feature map when the kernel size, i.e., the output window size, is not divisible by the stride, i.e., the spacing between the input neurons. Such overlaps occur in two dimensions, resulting in checkerboard-like patterns of varying magnitude. To tackle this issue, unit stride may be used in the transposed convolution layer, along with increasing kernel sizes, for enhancing the readability. In an embodiment, by upscaling the LR image using bilinear interpolation and then utilizing the convolution layers for feature computation, the occurrence of these checkerboard patterns may be prevented.
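The following sketch illustrates the topology of the first architecture 400a for a 2x upscaling: two unpadded convolution layers for LR-space feature extraction, followed by two cascaded unit-stride transposed convolution layers. PyTorch itself, the channel counts and the kernel sizes are assumptions (the patent's tables of feature-map dimensions are not reproduced here); only the layer ordering and the unit strides follow the text.

```python
import torch
import torch.nn as nn

class FirstArchitecture(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extraction in LR space, without padding: 16 -> 14 -> 12.
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3), nn.ReLU(),   # CONV1
            nn.Conv2d(64, 32, kernel_size=3), nn.ReLU(),  # CONV2
        )
        # Unit-stride transposed convolutions with growing kernel sizes
        # (used in the text to avoid checkerboard artefacts): 12 -> 20 -> 32.
        self.upscale = nn.Sequential(
            nn.ConvTranspose2d(32, 16, kernel_size=9), nn.ReLU(),  # TRCONV1
            nn.ConvTranspose2d(16, 1, kernel_size=13),             # TRCONV2
        )

    def forward(self, x):                       # x: (N, 1, 16, 16) input LR patch
        return self.upscale(self.features(x))  # (N, 1, 32, 32) HR output
```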
[0062] Table 1 below shows dimensions of intermediate feature maps of the first architecture 400a for an upscaling factor of “2”.
[Table 1 is not reproduced in the source text.]
[0063] Table 2 below shows dimensions of intermediate feature maps of the first architecture 400a for an upscaling factor of “4”, with an additional transposed convolution layer.
[Table 2 is not reproduced in the source text.]
[0064] In an embodiment, the LR binary image upscale module 203 may be configured to use the second architecture 400b illustrated in Figure 4b, for upscaling the resolution by the predefined upscaling factor of “4”. The second architecture 400b may include the first convolution layer 401.1, the second convolution layer 401.2, the first transposed convolution layer 402.1, the second transposed convolution layer 402.2, a third convolution layer 401.3 and a third transposed convolution layer 402.3. The second architecture 400b includes the first architecture 400a, followed by the third convolution layer 401.3 applied to the output of the first transposed convolution layer 402.1, whose output is then merged with the input LR patch 209. Since the feature output is merged with the input LR patch 209, the dimension of the merged feature map is the same as that of the input. Therefore, the third transposed convolution layer 402.3 may be used at the end to upscale by two times. The second architecture 400b includes two parallel feature maps, which are merged to obtain the final high-resolution output as shown in Figure 4b. In an embodiment, the second architecture 400b may be referred to as a Parallel Stream Convolution (PSC) architecture. In an embodiment, the performance of the second architecture 400b may also be evaluated using at least one of the ReLU or PReLU activation functions.
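A hedged sketch of the Parallel Stream Convolution topology is given below. For readability it realizes a 2x output; the channel counts, kernel sizes and the use of addition as the merge operation are assumptions, since the corresponding tables are not reproduced. Only the two parallel streams and the residual merge with the input LR patch follow the text.

```python
import torch
import torch.nn as nn

class ParallelStreamConv(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 64, 3), nn.PReLU())    # 16 -> 14
        self.conv2 = nn.Sequential(nn.Conv2d(64, 32, 3), nn.PReLU())   # 14 -> 12
        self.trconv1 = nn.Sequential(nn.ConvTranspose2d(32, 16, 9), nn.PReLU())  # 12 -> 20
        self.trconv2 = nn.ConvTranspose2d(16, 1, 13)  # 20 -> 32, direct stream
        self.conv3 = nn.Conv2d(16, 1, 5)              # 20 -> 16, back to the input size
        self.trconv3 = nn.ConvTranspose2d(1, 1, 17)   # 16 -> 32, residual stream

    def forward(self, x):                 # x: (N, 1, 16, 16)
        f = self.trconv1(self.conv2(self.conv1(x)))
        stream1 = self.trconv2(f)         # direct upscaling stream
        merged = self.conv3(f) + x        # residual merge with the input LR patch
        stream2 = self.trconv3(merged)
        return stream1 + stream2          # merge the two parallel streams
```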
[0065] In an embodiment, the training module 204 may be configured to incorporate residual learning, which is useful when there is a chance of occurrence of exploding/vanishing gradients while training the architecture. Simply stacking more layers does not improve the performance of the architecture, as compared to combining residual blocks of layers. In residual learning, the architecture does not learn the exact pixel-to-pixel correspondence. Instead, the architecture learns the residual output, which consists mostly of zeros or negligible numbers. Thus, the architecture may be trained at a higher learning rate to predict the residuals rather than the actual pixels, while using more layers than usual CNNs. In the second architecture 400b, a residual connection from the input to one of the intermediate layers is included, instead of the typical connection from the input to the final output layer. Since the input image is combined with an intermediate feature tensor, it is sufficient for the architecture to learn the extra set of features that are required for efficient upscaling. Thus, the need to learn the redundant features already present in the input image is obviated. Here, the effectiveness of using residual connections between the intermediate features is shown, instead of initially upscaling the input image and combining it with the CNN model's final output.
[0066] Table 3 below shows dimensions of intermediate feature maps of the second architecture 400b for an upscaling factor of “4”.
[Table 3 is not reproduced in the source text.]
[0067] In an embodiment, an alternative of the second architecture 400b may be proposed with two additional transposed convolution layers. In this alternative, a fourth transposed convolution layer and a fifth transposed convolution layer may be cascaded with the remaining transposed convolution layers. The outputs of the fourth transposed convolution layer (TRCONV4) and the fifth transposed convolution layer (TRCONV5) may be merged to output an upscaled binary image enhanced by the upscaling factor of “4”.
[0068] Table 4 below shows dimensions of intermediate feature maps of the alternative of the second architecture 400b for an upscaling factor of “4”.
[Table 4 is not reproduced in the source text.]
[0069] In an embodiment, the LR binary image upscale module 203 may be configured to use the third architecture 400c illustrated in Figure 4c, for upscaling the resolution by the predefined upscaling factor of “4”. The third architecture 400c may include the first convolution layer 401.1, the second convolution layer 401.2, the first transposed convolution layer 402.1 and a sub-pixel convolution layer 403. In the third architecture 400c, the second transposed convolution layer of the first architecture 400a is replaced with the sub-pixel convolution layer 403. Since the sub-pixel operation does not include trainable parameters, as other upscaling layers do, the computational complexity is less than that of the first architecture 400a. In an embodiment, to achieve further upscaling of the LR binary image 208, the number of feature maps in the layer before the sub-pixel convolution may be increased. Using the third architecture 400c, only a single sub-pixel layer may be sufficient to upscale by the upscaling factor of “2” or “4”.
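A minimal sketch of the third architecture 400c follows: feature extraction, one transposed convolution producing r*r feature maps, and a single sub-pixel (pixel-shuffle) layer that rearranges them into the HR output. The channel counts, kernel sizes and use of padding are assumptions; the r*r-channel/pixel-shuffle relationship and the single sub-pixel layer follow the text.

```python
import torch
import torch.nn as nn

class SubPixelUpscaler(nn.Module):
    def __init__(self, r=2):    # r: upscaling factor ("2" or "4")
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.PReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.PReLU(),
            # r*r feature maps feed the sub-pixel layer; increasing this
            # count is how further upscaling is achieved.
            nn.ConvTranspose2d(32, r * r, 3, padding=1), nn.PReLU(),
        )
        self.shuffle = nn.PixelShuffle(r)   # non-trainable rearrangement

    def forward(self, x):       # (N, 1, H, W) -> (N, 1, r*H, r*W)
        return self.shuffle(self.body(x))
```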
[0070] In an embodiment, the sub-pixel convolution operation may be an alternative to the fractionally strided convolution, interpolation and un-pooling methods for increasing the dimensionality. The sub-pixel convolution layer 403 may be a non-trainable layer, since it only implements matrix manipulations to change the feature dimensions and does not have any weights to learn. [0071] Table 5 below shows dimensions of intermediate feature maps of the third architecture
400c for an upscaling factor of “2”.
[Table 5 is not reproduced in the source text.]
[0072] Table 6 below shows dimensions of intermediate feature maps of the third architecture 400c for an upscaling factor of “4”.
[Table 6 is not reproduced in the source text.]
[0073] In an embodiment, the activation module 205 may be configured to perform activation of at least one of the one or more convolution layers, one or more transposed convolution layers, one or more additional convolution layers and the sub-pixel convolution layer 403, based on corresponding output. In an embodiment, at least one of Rectified Linear Unit (ReLU) or Parametric ReLU (PReLU) may be used for performing the activation.
[0074] One or more other modules 207 of the readability enhancement device 101 may be configured to compute the loss function for the proposed architectures. The standard Mean Square Error (MSE) function may be used as the loss function to train the architectures.
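A minimal training-step sketch with the MSE loss named above is shown below; the stand-in model, optimizer choice and learning rate are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Stand-in 2x model; any of the architecture sketches above could be substituted.
model = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.PixelShuffle(2))
criterion = nn.MSELoss()                                   # standard MSE loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer settings

def train_step(lr_batch, hr_batch):
    # lr_batch: (N, 1, 16, 16) LR patches; hr_batch: (N, 1, 32, 32) ground truth.
    optimizer.zero_grad()
    loss = criterion(model(lr_batch), hr_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```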
[0075] In an embodiment, a fourth architecture may be proposed, which is a three-stream, parallel neural network as shown in Figure 4d. The fourth architecture merges the outputs of the first architecture 400a, the second architecture 400b and the third architecture 400c to get the final output. It may be observed that the third architecture 400c contributes more details to the overall output than the other architectures, while training with either the upscaling factor of “2” or “4”. In an embodiment, the output of the readability enhancement device 101, which is the upscaled LR binary image, may be stored as the upscaled LR binary image data 211, in the memory 107. [0076] Figure 4e gives the results of the proposed architectures for a small cropped region of one of the input test images. From Figure 4e, major and minor differences in the character-level predictions of the proposed architectures may be qualitatively observed. The left figure in the top panel shows the input, which has been cropped from a Tamil document image and zoomed for the purpose of visualization, and beside it is the corresponding zoomed ground truth image. For the sake of visual comparison, bicubic interpolation is used as a baseline, and the images interpolated by factors of 2 and 4 are given. The second row displays the outputs of the first architecture 400a and its variants. The first image, C2, is the two times upscaled output of the first architecture 400a. The second image, C4, is the result of four times upscaling of the first architecture 400a. The third result, CP2, is the two times upscaled output using PReLU as the activation function of the first architecture 400a. Similarly, CP4 is the image obtained after four times upscaling using PReLU of the first architecture 400a. The third and fourth rows show the output images of the different variants of the second architecture 400b and the third architecture 400c, respectively. The image R2 is the two times upscaled output of the second architecture 400b. The image R4 is the result of four times upscaling of the second architecture 400b. The third result, RP2, is the two times upscaled output using PReLU as the activation function of the second architecture 400b. Similarly, RP4 is the image obtained after four times upscaling using PReLU of the second architecture 400b. The image S2 is the two times upscaled output of the third architecture 400c. The image S4 is the result of four times upscaling of the third architecture 400c. The third result, SP2, is the two times upscaled output using PReLU as the activation function of the third architecture 400c. Similarly, SP4 is the image obtained after four times upscaling using PReLU of the third architecture 400c.
[0077] Providing the LR binary image 208 directly to the OCR system 102 may lead to poor performance of the OCR system 102 on sparsely connected symbols. When a low-quality image is passed to the OCR system 102, since the pixels representing a symbol are not properly connected, many symbols may be segmented into multiple pieces during the segmentation stage. Each of these split components may be wrongly classified by the OCR system 102 as one of the known characters, leading to poor classification of the binary image. With the proposed architectures and the generation of a diverse training dataset, the challenging task of binary document image super-resolution using deep neural networks is addressed. The OCR system input module 206 may be configured to input the upscaled LR binary image to the OCR system 102, for improving the accuracy of character recognition in the LR binary image 208.
[0078] Figure 4f shows a part of a test image, its output image and the corresponding text outputs obtained from an online OCR. Image 404.1 shows a poor quality 75-dpi binary input image, which is not easy even for native Tamils to read directly. As clearly revealed by the output text given in image 404.2, there are too many errors arising out of the poor image segmentation during the OCR process. Roman and Chinese characters, Indo-Arabic numerals and certain other symbols are wrongly present in the recognized output. Image 404.3 illustrates the relatively high quality, with an upscaling factor of “4”, produced by the third architecture 400c with PReLU activation. It is evident that the human readability of the resultant image 404.3 is high, and that a native Tamil can read the text easily, in spite of some strokes still missing. Accordingly, the text output by the online OCR, shown as image 404.4, is also significantly better, where not even a single Roman character or numeral is present. [0079] The other data 214 may store data, including temporary data and temporary files, generated by the modules for performing the various functions of the readability enhancement device 101. The one or more modules 108 may also include other modules 207 to perform various miscellaneous functionalities of the readability enhancement device 101. It will be appreciated that such modules may be represented as a single module or a combination of different modules.
[0080] Figure 5 illustrates a flowchart showing an exemplary method to enhance readability of the LR binary image 208, in accordance with some embodiments of the present disclosure. [0081] At block 501, the readability enhancement device 101 may be configured to generate the input LR patch 209 of predefined size from the LR binary image 208. One or more techniques, known to a person skilled in the art, may be implemented to generate the input LR patch 209.
[0082] At block 502, the readability enhancement device 101 may be configured to extract one or more features 210 associated with the input LR patch 209, in LR space, using one or more convolution layers. In an embodiment, two convolution layers, cascaded with each other, may be implemented to extract the one or more features 210.
[0083] At block 503, the readability enhancement device 101 may be configured to upscale the LR binary image 208 to the HR space using the one or more features 210 for enhancing readability of the LR binary image 208. The LR binary image 208 may be upscaled by the predefined upscaling factor. The resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers. [0084] As illustrated in Figure 5, the method 500 may include one or more blocks for executing processes in the readability enhancement device 101. The method 500 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions or implement particular abstract data types.
[0085] The order in which the method 500 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method. Additionally, individual blocks may be deleted from the methods without departing from the scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof.
Computing System
[0086] Figure 6 illustrates a block diagram of an exemplary computer system 600 for implementing embodiments consistent with the present disclosure. In an embodiment, the computer system 600 is used to implement the readability enhancement device 101. The computer system 600 may include a central processing unit (“CPU” or “processor”) 602. The processor 602 may include at least one data processor for executing processes in a Virtual Storage Area Network. The processor 602 may include specialized processing units such as integrated system (bus) controllers, memory management control units, floating point units, graphics processing units, digital signal processing units, etc.
[0087] The processor 602 may be disposed in communication with one or more input/output (I/O) devices 609 and 610 via I/O interface 601. The I/O interface 601 may employ communication protocols/methods such as, without limitation, audio, analog, digital, monaural, RCA, stereo, IEEE-1394, serial bus, universal serial bus (USB), infrared, PS/2, BNC, coaxial, component, composite, digital visual interface (DVI), high-definition multimedia interface (HDMI), RF antennas, S-Video, VGA, IEEE 802.11a/b/g/n/x, Bluetooth, cellular (e.g., code-division multiple access (CDMA), high-speed packet access (HSPA+), global system for mobile communications (GSM), long-term evolution (LTE), WiMax, or the like), etc. [0088] Using the I/O interface 601, the computer system 600 may communicate with one or more I/O devices 609 and 610. For example, the input devices 609 may be an antenna, keyboard, mouse, joystick, (infrared) remote control, camera, card reader, fax machine, dongle, biometric reader, microphone, touch screen, touchpad, trackball, stylus, scanner, storage device, transceiver, video device/source, etc. The output devices 610 may be a printer, fax machine, video display (e.g., cathode ray tube (CRT), liquid crystal display (LCD), light-emitting diode (LED), plasma, Plasma display panel (PDP), Organic light-emitting diode display (OLED) or the like), audio speaker, etc.
[0089] In some embodiments, the computer system 600 may consist of the readability enhancement device 101. The processor 602 may be disposed in communication with the communication network 611 via a network interface 603. The network interface 603 may communicate with the communication network 611. The network interface 603 may employ connection protocols including, without limitation, direct connect, Ethernet (e.g., twisted pair 10/100/1000 Base T), transmission control protocol/internet protocol (TCP/IP), token ring, IEEE 802.11a/b/g/n/x, etc. The communication network 611 may include, without limitation, a direct interconnection, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, etc. Using the network interface 603 and the communication network 611, the computer system 600 may communicate with the OCR system 612 for enhancing readability and accurate character recognition.
[0090] The communication network 611 includes, but is not limited to, a direct interconnection, an e-commerce network, a peer-to-peer (P2P) network, local area network (LAN), wide area network (WAN), wireless network (e.g., using Wireless Application Protocol), the Internet, Wi-Fi, and such. The first network and the second network may either be a dedicated network or a shared network, which represents an association of the different types of networks that use a variety of protocols, for example, Hypertext Transfer Protocol (HTTP), Transmission Control Protocol/Internet Protocol (TCP/IP), Wireless Application Protocol (WAP), etc., to communicate with each other. Further, the first network and the second network may include a variety of network devices, including routers, bridges, servers, computing devices, storage devices, etc. [0091] In some embodiments, the processor 602 may be disposed in communication with a memory 605 (e.g., RAM, ROM, etc., not shown in Figure 6) via a storage interface 604. The storage interface 604 may connect to the memory 605 including, without limitation, memory drives, removable disc drives, etc., employing connection protocols such as serial advanced technology attachment (SATA), Integrated Drive Electronics (IDE), IEEE-1394, Universal Serial Bus (USB), fibre channel, Small Computer Systems Interface (SCSI), etc. The memory drives may further include a drum, magnetic disc drive, magneto-optical drive, optical drive, Redundant Array of Independent Discs (RAID), solid-state memory devices, solid-state drives, etc.
[0092] The memory 605 may store a collection of program or database components, including, without limitation, a user interface 606, an operating system 607, a web browser 608, etc. In some embodiments, the computer system 600 may store user/application data, such as the data, variables, records, etc., as described in this disclosure. Such databases may be implemented as fault-tolerant, relational, scalable, secure databases such as Oracle® or Sybase®.
[0093] In some embodiments, the computer system 600 may implement a web browser 608 stored program component. The web browser 608 may be a hypertext viewing application, such as Microsoft Internet Explorer, Google Chrome, Mozilla Firefox, Apple Safari, etc. Secure web browsing may be provided using Secure Hypertext Transport Protocol (HTTPS), Secure Sockets Layer (SSL), Transport Layer Security (TLS), etc. Web browsers 608 may utilize facilities such as AJAX, DHTML, Adobe Flash, JavaScript, Java, Application Programming Interfaces (APIs), etc. In some embodiments, the computer system 600 may implement a mail server stored program component. The mail server may be an Internet mail server such as Microsoft Exchange, or the like. The mail server may utilize facilities such as ASP, ActiveX, ANSI C++/C#, Microsoft .NET, CGI scripts, Java, JavaScript, PERL, PHP, Python, WebObjects, etc. The mail server may utilize communication protocols such as Internet Message Access Protocol (IMAP), Messaging Application Programming Interface (MAPI), Microsoft Exchange, Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), or the like. In some embodiments, the computer system 600 may implement a mail client stored program component. The mail client may be a mail viewing application, such as Apple Mail, Microsoft Entourage, Microsoft Outlook, Mozilla Thunderbird, etc. [0094] The operating system 607 may facilitate resource management and operation of the computer system 600. Examples of operating systems include, without limitation, APPLE MACINTOSH® OS X, UNIX®, UNIX-like system distributions (e.g., BERKELEY SOFTWARE DISTRIBUTION™ (BSD), FREEBSD™, NETBSD™, OPENBSD™, etc.), LINUX DISTRIBUTIONS™ (e.g., RED HAT™, UBUNTU™, KUBUNTU™, etc.), IBM™ OS/2, MICROSOFT™ WINDOWS™ (XP™, VISTA™/7/8, 10 etc.), APPLE® IOS™,
GOOGLE® ANDROID™, BLACKBERRY® OS, or the like.
[0095] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include Random Access Memory (RAM), Read-Only Memory (ROM), volatile memory, non-volatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[0096] The described operations may be implemented as a method, system or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The described operations may be implemented as code maintained in a “non-transitory computer readable medium”, where a processor may read and execute the code from the computer readable medium. The processor is at least one of a microprocessor and a processor capable of processing and executing the queries. A non-transitory computer readable medium may include media such as magnetic storage medium (e.g., hard disk drives, floppy disks, tape, etc.), optical storage (CD-ROMs, DVDs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, Flash Memory, firmware, programmable logic, etc.), etc. Further, non-transitory computer-readable media may include all computer-readable media except for transitory media. The code implementing the described operations may further be implemented in hardware logic (e.g., an integrated circuit chip, Programmable Gate Array (PGA), Application Specific Integrated Circuit (ASIC), etc.). [0097] An “article of manufacture” includes non-transitory computer readable medium and/or hardware logic, in which code may be implemented. A device in which the code implementing the described embodiments of operations is encoded may include a computer readable medium or hardware logic. Of course, those skilled in the art will recognize that many modifications may be made to this configuration without departing from the scope of the invention, and that the article of manufacture may include suitable information bearing medium known in the art.
[0098] The terms “an embodiment”, “embodiment”, “embodiments”, “the embodiment”, “the embodiments”, “one or more embodiments”, “some embodiments”, and “one embodiment” mean “one or more (but not all) embodiments of the invention(s)” unless expressly specified otherwise.
[0099] The terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless expressly specified otherwise.
[00100] The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise. [00101] The terms “a”, “an” and “the” mean “one or more”, unless expressly specified otherwise.
[00102] A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the invention.
[00103] When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the invention need not include the device itself.
[00104] The illustrated operations of Figure 5 show certain events occurring in a certain order. In alternative embodiments, certain operations may be performed in a different order, modified, or removed. Moreover, steps may be added to the above described logic and still conform to the described embodiments. Further, operations described herein may occur sequentially or certain operations may be processed in parallel. Yet further, operations may be performed by a single processing unit or by distributed processing units. [00105] Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based here on. Accordingly, the disclosure of the embodiments of the invention is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
[00106] While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.
[Reference numerals table partially not reproduced in the source text; the remaining entries follow.]
610 Output Devices
611 Communication Network
612 OCR system

Claims

We claim:
1. A method for enhancing readability of a Low Resolution (LR) binary image, comprising:
generating, by a readability enhancement device, an input LR patch of predefined size from a LR binary image;
extracting, by the readability enhancement device, one or more features associated with the input LR patch, in LR space, using one or more convolution layers; and
upscaling, by the readability enhancement device, the LR binary image to High Resolution (HR) space using the one or more features, by a predefined upscaling factor, for enhancing readability of the LR binary image, wherein the resolution is upscaled using at least one or more transposed convolution layers, one or more additional convolution layers and one or more sub-pixel convolution layers.
2. The method as claimed in claim 1 further comprises training the readability enhancement device using a training dataset comprising a plurality of LR-HR patch pairs, generated from a plurality of LR and corresponding HR binary images.
3. The method as claimed in claim 2, wherein each of the plurality of LR patches, from the LR-HR patch pairs, is generated by performing at least one of:
alternating pixels of a corresponding HR patch from one or more HR binary images;
multiplying the corresponding LR patch with a mask comprising randomly placed zeroes and ones;
generating one or more rotated variants of the corresponding HR patch; and
scanning the training images at a low scanning resolution.
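A hedged NumPy sketch of these degradations, reading "alternating pixels" as keeping every second row and column of the HR patch; the drop probability and the right-angle rotation set are assumed values, not claim language.

```python
import numpy as np

def lr_patch_from_hr(hr_patch, drop_prob=0.2, seed=0):
    """Synthesise an LR patch from an HR patch (claims 3/13, hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    lr = hr_patch[::2, ::2]                     # keep alternate pixels -> half size
    mask = rng.random(lr.shape) >= drop_prob    # randomly placed zeroes and ones
    return lr * mask                            # multiply with the random mask

def rotated_variants(hr_patch):
    """Rotated variants of the HR patch; 90/180/270-degree rotations assumed."""
    return [np.rot90(hr_patch, k) for k in (1, 2, 3)]
```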
4. The method as claimed in claim 2, wherein the plurality of LR and corresponding HR binary images comprises a large set of samples of all characters of a specific language of a document, segmented from a collection of very old books with different fonts and sizes of the characters.
5. The method as claimed in claim 2, wherein when the predefined upscaling factor for the LR binary image is “r”, a plurality of HR patches, from the LR-HR patch pairs, are generated by taking overlapping patches with stride “r” from the plurality of HR binary images.
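A short sketch of the stride-“r” extraction; the patch size is an assumed value.

```python
def overlapping_hr_patches(hr_img, patch_size=32, r=2):
    """Overlapping HR patches taken with stride r (claims 5/15, sizes assumed)."""
    H, W = hr_img.shape
    return [hr_img[i:i + patch_size, j:j + patch_size]
            for i in range(0, H - patch_size + 1, r)
            for j in range(0, W - patch_size + 1, r)]
```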
6. The method as claimed in claim 1, wherein upscaling the resolution by the predefined upscaling factor of “2” comprises:
mapping each pixel in the LR space with multiple pixels in HR space using a first transposed convolution layer and a second transposed convolution layer, cascaded with each other, for enhancing readability of the LR binary image.
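One plausible PyTorch reading of the cascade for a factor of “2”: the first transposed convolution performs the 2x expansion and the second refines at stride 1. The stride split, kernel sizes, and widths are assumptions; the claim fixes only the two cascaded transposed convolution layers.

```python
import torch.nn as nn
import torch.nn.functional as F

class UpscaleX2(nn.Module):
    """x2 upscaler via two cascaded transposed convolutions (claims 6/16)."""
    def __init__(self, channels=64):
        super().__init__()
        # stride-2 layer maps each LR pixel to multiple HR pixels (2x grid)
        self.t1 = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        # stride-1 layer refines in HR space and reduces to a single channel
        self.t2 = nn.ConvTranspose2d(channels, 1, 3, stride=1, padding=1)

    def forward(self, feats):                    # feats: (N, channels, H, W)
        return self.t2(F.relu(self.t1(feats)))   # (N, 1, 2H, 2W)
```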
7. The method as claimed in claim 1, wherein upscaling the resolution by the predefined upscaling factor of “4” comprises:
mapping each pixel in the LR space with multiple pixels in HR space using a first transposed convolution layer and a second transposed convolution layer, cascaded with each other;
performing convolution on output of the first transposed convolution layer using an additional convolution layer;
merging the LR binary image with output of the additional convolution layer to obtain one or more merged features with dimension same as that of the LR binary image; and
merging the one or more merged features with output of the second transposed convolution layer, for enhancing readability of the LR binary image.
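The claim names the components but leaves strides and the merge operation open. A sketch of one self-consistent PyTorch wiring, assuming the additional convolution uses stride 2 (so its output matches the LR dimensions), “merging” is element-wise addition, and nearest-neighbour resizing lifts the merged features to the 4x grid for the final merge. All of these choices are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class UpscaleX4Merge(nn.Module):
    """x4 upscaler with feature merging (claims 7/17) -- one hypothetical wiring."""
    def __init__(self, channels=64):
        super().__init__()
        self.t1 = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)  # LR -> 2x
        self.t2 = nn.ConvTranspose2d(channels, 1, 4, stride=2, padding=1)         # 2x -> 4x
        # additional convolution; stride 2 returns the 2x map to LR dimensions
        self.extra = nn.Conv2d(channels, 1, 3, stride=2, padding=1)
        self.out = nn.Conv2d(1, 1, 3, padding=1)

    def forward(self, feats, lr_img):       # feats: (N,C,H,W), lr_img: (N,1,H,W)
        h1 = F.relu(self.t1(feats))         # first transposed convolution (2x)
        h2 = self.t2(h1)                    # second, cascaded with the first (4x)
        merged = lr_img + self.extra(h1)    # merge with the LR image at LR dimensions
        merged_up = F.interpolate(merged, scale_factor=4, mode="nearest")
        return self.out(merged_up + h2)     # merge with the second layer's output
```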
8. The method as claimed in claim 1, wherein upscaling the resolution by the predefined upscaling factor of “4” comprises:
mapping each pixel of the one or more features in the LR space with multiple pixels in HR space using a first transposed convolution layer; and
re-arranging one or more features in the HR space obtained from the first transposed convolution layer, using a sub-pixel convolution layer, for enhancing readability of the LR binary image.
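A sketch of the transposed-convolution-plus-sub-pixel route to 4x: the transposed convolution contributes one 2x step and a PixelShuffle rearrangement the other. The channel-expanding convolution before the shuffle is an assumed detail, since PixelShuffle with factor r consumes r^2 channels per output channel.

```python
import torch.nn as nn
import torch.nn.functional as F

class UpscaleX4SubPixel(nn.Module):
    """x4 via a transposed convolution then sub-pixel rearrangement (claims 8/18)."""
    def __init__(self, channels=64):
        super().__init__()
        self.t1 = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)  # 2x
        self.expand = nn.Conv2d(channels, 4, 3, padding=1)  # 2^2 channels per output pixel
        self.shuffle = nn.PixelShuffle(2)                   # rearrange features: another 2x

    def forward(self, feats):                               # (N, channels, H, W)
        return self.shuffle(self.expand(F.relu(self.t1(feats))))  # (N, 1, 4H, 4W)
```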
9. The method as claimed in claim 1, further comprises:
performing activation of at least one of the one or more convolution layers, one or more transposed convolution layers, one or more additional convolution layers and the sub-pixel convolution layer, based on corresponding output, using at least one of Rectified Linear Unit (ReLU) or Parametric ReLU (PReLU).
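In PyTorch terms the two claimed activations are interchangeable one-liners; the learnable-slope configuration shown is an assumption.

```python
import torch.nn as nn

relu = nn.ReLU()                     # Rectified Linear Unit
prelu = nn.PReLU(num_parameters=1)   # Parametric ReLU with a learnable negative slope
```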
10. The method as claimed in claim 1, further comprises:
inputting the upscaled HR binary image to an Optical Character Recognizer (OCR) system, for improving accuracy of character recognition in the LR binary image.
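A sketch of the final hand-off, using the open-source Tesseract engine purely as a stand-in recogniser; the file name is hypothetical and the claim does not tie the method to any particular OCR system.

```python
from PIL import Image
import pytesseract

hr_image = Image.open("upscaled_output.png")   # upscaled HR binary image (assumed path)
print(pytesseract.image_to_string(hr_image))   # recognised text from the HR image
```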
11. A readability enhancement device for enhancing readability of a Low Resolution (LR) binary image, comprises:
a processor; and
a memory communicatively coupled to the processor, wherein the memory stores processor-executable instructions, which, on execution, cause the processor to:
generate an input LR patch of predefined size from a LR binary image;
extract one or more features associated with the input LR patch, in LR space, using one or more convolution layers; and
upscale the LR binary image to High Resolution (HR) space using the one or more features, by a predefined upscaling factor, for enhancing readability of the LR binary image, wherein the resolution is upscaled using at least one of: one or more transposed convolution layers, one or more additional convolution layers, and one or more sub-pixel convolution layers.
12. The readability enhancement device as claimed in claim 11, further comprises the processor configured to train the readability enhancement device using a training dataset comprising a plurality of LR-HR patch pairs, generated from a plurality of LR and corresponding HR binary images.
13. The readability enhancement device as claimed in claim 12, wherein the processor is configured to generate each of the plurality of LR patches, from the LR-HR patch pairs, by performing at least one of:
alternating pixels of a corresponding HR patch from one or more HR binary images;
multiplying the corresponding LR patch with a mask comprising randomly placed zeroes and ones;
generating one or more rotated variants of the corresponding HR patch; and
scanning the training images at a low scanning resolution.
14. The readability enhancement device as claimed in claim 12, wherein the plurality of LR and corresponding HR binary images comprises a large set of samples of all characters of a specific language of a document, segmented from a collection of very old books with different fonts and sizes of the characters.
15. The readability enhancement device as claimed in claim 12, wherein when the predefined upscaling factor for the LR binary image is “r”, the processor is configured to generate a plurality of HR patches, from the LR-HR patch pairs, by taking overlapping patches with stride “r” from the plurality of HR binary images.
16. The readability enhancement device as claimed in claim 11, wherein the processor is configured to upscale the resolution by the predefined upscaling factor of “2” by:
mapping each pixel in the LR space with multiple pixels in HR space using a first transposed convolution layer and a second transposed convolution layer, cascaded with each other, for enhancing readability of the LR binary image.
17. The readability enhancement device as claimed in claim 11, wherein the processor is configured to upscale the resolution by the predefined upscaling factor of “4” by:
mapping each pixel in the LR space with multiple pixels in HR space using a first transposed convolution layer and a second transposed convolution layer, cascaded with each other;
performing convolution on output of the first transposed convolution layer using an additional convolution layer;
merging the LR binary image with output of the additional convolution layer to obtain one or more merged features with dimension same as that of the LR binary image; and
merging the one or more merged features with output of the second transposed convolution layer, for enhancing readability of the LR binary image.
18. The readability enhancement device as claimed in claim 11, wherein the processor is configured to upscale the resolution by the predefined upscaling factor of “4” by:
mapping each pixel of the one or more features in the LR space with multiple pixels in HR space using a first transposed convolution layer; and
re-arranging one or more features in the HR space obtained from the first transposed convolution layer, using a sub-pixel convolution layer, for enhancing readability of the LR binary image.
19. The readability enhancement device as claimed in claim 11, further comprises the processor configured to:
perform activation of at least one of the one or more convolution layers, one or more transposed convolution layers, one or more additional convolution layers and the sub-pixel convolution layer, based on corresponding output, using at least one of Rectified Linear Unit (ReLU) or Parametric ReLU (PReLU).
20. The readability enhancement device as claimed in claim 11, further comprises the processor configured to:
input the upscaled HR binary image to an Optical Character Recognizer (OCR) system, for improving the accuracy of character recognition in the LR binary image.
PCT/IB2019/058813 2018-10-16 2019-10-16 Device and method for enhancing readability of a low-resolution binary image WO2020079605A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201841030740 2018-10-16
IN201841030740 2018-10-16

Publications (1)

Publication Number Publication Date
WO2020079605A1 (en) 2020-04-23

Family

ID=70283726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/058813 WO2020079605A1 (en) 2018-10-16 2019-10-16 Device and method for enhancing readability of a low-resolution binary image

Country Status (1)

Country Link
WO (1) WO2020079605A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7499588B2 (en) * 2004-05-20 2009-03-03 Microsoft Corporation Low resolution OCR for camera acquired documents
US9934553B2 (en) * 2015-11-06 2018-04-03 Thomason Licensing Method for upscaling an image and apparatus for upscaling an image
CN107369189A (en) * 2017-07-21 2017-11-21 成都信息工程大学 The medical image super resolution ratio reconstruction method of feature based loss

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 19873130; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 19873130; Country of ref document: EP; Kind code of ref document: A1