AU2018229526B2 - Recursive contour merging based detection of text area in an image - Google Patents

Recursive contour merging based detection of text area in an image

Info

Publication number
AU2018229526B2
Authority
AU
Australia
Prior art keywords
input image
text
text areas
contours
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
AU2018229526A
Other versions
AU2018229526A1 (en)
Inventor
Manmath KUMAR
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tata Consultancy Services Ltd
Original Assignee
Tata Consultancy Services Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tata Consultancy Services Ltd filed Critical Tata Consultancy Services Ltd
Publication of AU2018229526A1
Application granted
Publication of AU2018229526B2
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

RECURSIVE CONTOUR MERGING BASED DETECTION OF TEXT AREA IN AN IMAGE

Scanning an entire image to extract text results in unwanted processing and generation of poor output pertaining to non-text portions of the image. Existing computer vision solutions that cater to identifying text area in an image for further extraction of text provide poor output when the image is of poor quality or particularly when the text is warped or skewed. Systems and methods of the present disclosure deduce contours around each figure in an input image and bound them with a rectangle. Based on recursive contour merging, one or more potential text areas are obtained from which one or more candidate text areas are identified for further processing and extraction of text from the input image. (To be published with FIG.2A)

Description

RECURSIVE CONTOUR MERGING BASED DETECTION OF TEXT AREA IN AN IMAGE
PRIORITY CLAIM
[001] The present application claims priority from Indian Patent Application No. 201721032846, filed on 16th September, 2017. The entire contents of the aforementioned application are incorporated herein by reference.
TECHNICAL FIELD
[002] The disclosure herein generally relates to identification of text in graphical images, and particularly to recursive contour merging based detection of text area in an image, wherein the images may be of low quality or have warped or skewed text.
BACKGROUND
[003] Extracting text is a common activity in the field of image processing and extraction of information from images. There are numerous algorithms and Optical Character Recognition (OCR) tools in the market to extract text from images. In the process of extracting text from images, most OCR tools try to convert the entire image to text, which includes non-text sections, leading to junk character generation in the output. To overcome this issue, some algorithms have been developed to find the exact text area(s) in the image and only let the OCR scan through the identified text area(s) to reduce junk characters in the output. A major challenge that needs to be addressed in such algorithms is extracting text from distorted, blurred, poor quality images wherein the text may be of low quality, warped, or skewed.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems.
[005] In an aspect, there is provided a processor implemented method comprising deducing a contour around each figure in an input image to obtain one or more deduced contours and bounding each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle; recursively merging two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, each of the one or more overall bounding contours encompassing at least some of the deduced contours; computing density of the rectangles within the one or more overall potential text areas; and identifying one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles.
[006] In another aspect, there is provided a system comprising: one or more hardware processors; and one or more data storage devices operatively coupled to the one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: deduce a contour around each figure in an input image to obtain one or more deduced contours and bound each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle; recursively merge two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, each of the one or more overall bounding contours encompassing at least some of the deduced contours; compute density of the rectangles within the one or more overall potential text areas; and identify one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles.
[007] In yet another aspect, there is provided a computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: deduce a contour around each figure in an input image to obtain one or more deduced contours and bound each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle; recursively merge two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, each of the one or more overall bounding contours encompassing at least some of the deduced contours; compute density of the rectangles within the one or more overall potential text areas; and identify one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles.
[008] In an embodiment of the present disclosure, the one or more hardware processors are further configured to perform one or more of: de-skewing the input image to correct rotation of the input image; pre-processing the input image to enhance quality of the input image by performing one or more of: converting the input image to grayscale; introducing blurriness, pixel density adjustment and histogram equalization; detecting edges in the pre-processed input image; applying erosion to reduce thickness of the detected edges; and deducing the contour around the detected edges, prior to deducing the contour around each figure in the input image.
[009] In an embodiment of the present disclosure, the pre-defined threshold pixel distance is based on the input image size and text to text distance in pixels.
[010] In an embodiment of the present disclosure, the one or more hardware processors are further configured to identify one or more candidate text areas by: selecting one or more of the one or more potential text areas having density of the rectangles greater than a pre-defined threshold density, wherein the pre-defined threshold density is based on empirical observations; and cropping off the selected one or more potential text areas to identify the one or more candidate text areas for the further processing.
[011] In an embodiment of the present disclosure, the one or more hardware processors are further configured to de-skew the one or more candidate text areas for alignment; apply one or more morphology techniques to enhance text within the aligned one or more candidate text areas; and further process the aligned one or more candidate text areas.
[012] In an embodiment of the present disclosure, the one or more hardware processors are further configured to perform the further processing by an Optical Character Recognition (OCR) engine.
[013] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the embodiments of the present disclosure, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[014] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles.
[015] FIG.1 illustrates an exemplary block diagram of a system for recursive contour merging based detection of text area in an image, in accordance with an embodiment of the present disclosure.
[016] FIG.2A and FIG.2B illustrate an exemplary flow diagram of a computer implemented method for recursive contour merging based detection of text area in an image, in accordance with an embodiment of the present disclosure.
[017] FIG.3A is an illustrative image of an exemplary retail receipt and FIG.3B is an illustrative image of a cropped text area from the image of FIG.3A, in accordance with an embodiment of the present disclosure.
[018] FIG.4 illustrates a grayscale conversion of an exemplary color image, in accordance with an embodiment of the present disclosure.
[019] FIG.5 illustrates blurring of an exemplary image, in accordance with an embodiment of the present disclosure.
[020] FIG.6 illustrates edge detection using Canny's Algorithm, in accordance with an embodiment of the present disclosure.
[021] FIG.7 illustrates erosion transformation of an exemplary image, in accordance with an embodiment of the present disclosure.
[022] FIG.8 illustrates contour detection in an exemplary image, in accordance with an embodiment of the present disclosure.
[023] FIG.9 illustrates bounding rectangles in an exemplary image, in accordance with an embodiment of the present disclosure.
[024] FIG.10 illustrates logic in detection of potential text areas, in accordance with an embodiment of the present disclosure.
[025] FIG.11A illustrates grayscale conversion of an exemplary image, in accordance with an embodiment of the present disclosure.
[026] FIG.11B illustrates brightness and contrast adjustment of the exemplary image of FIG.11A, in accordance with an embodiment of the present disclosure.
[027] FIG.12A illustrates output of edge detection on the exemplary image of FIG.11B using Canny's Algorithm, in accordance with an embodiment of the present disclosure.
[028] FIG.12B illustrates contour detection in the exemplary image of FIG.12A, in accordance with an embodiment of the present disclosure.
[029] FIG.13A illustrates overall bounding contours in the exemplary image of FIG.12B, in accordance with an embodiment of the present disclosure.
[030] FIG.13B illustrates the final image to be considered for further processing by the OCR engine, in accordance with an embodiment of the present disclosure.
[031] It should be appreciated by those skilled in the art that any block diagram herein represents conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.
DETAILED DESCRIPTION
[032] Exemplary embodiments are described with reference to the accompanying drawings. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or like parts. While examples and features of disclosed principles are described herein, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. It is intended that the following detailed description be considered as exemplary only, with the true scope and spirit being indicated by the following claims.
[033] Although there are solutions in the computer vision domain for extracting text from images, the percentage of text detected from low quality images is very low. Some solutions provide very good output for the good quality portion of an image while ignoring the low quality portion. Some solutions are capable of enhancing the image from its initial quality, but at the cost of erasing the text within. Some solutions work well with aligned text in the image, but fail when the text portion is skewed. Systems and methods of the present disclosure aim to address these drawbacks in existing solutions by providing recursive contour merging based detection of text area in an image. The present disclosure provides an adaptive algorithm for text area detection from an image, wherein a hierarchical algorithm structure is provided with different emphasis at each stage. At a first stage, an input image is reduced to a binary edge image. This helps in removing background noise and keeps only a trail of high pixel density change or RGB (red green blue) color component change. Then, out of all the edges found, probable structuring elements for identifying potential text are obtained. In a next stage, all the probable structuring elements are enclosed in bounding rectangles to define a symmetric area around the structuring elements. The rectangles are then clubbed together based on the density of rectangles within a particular threshold pixel distance. All the rectangles which do not meet a pre-defined threshold density or a pre-defined threshold pixel distance are discarded. The resulting clubbed rectangle is the potential text area within the image which contains the text. The coordinates of the deduced rectangle may be used to crop the image, and the resulting image containing only text may be fed to an Optical Character Recognition (OCR) engine to obtain text with minimal junk characters.
[034] Referring now to the drawings, and more particularly to FIGS. 1 through 13, where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments and these embodiments are described in the context of the following exemplary system and method.
[035] FIG.1 illustrates an exemplary block diagram of a system 100 for recursive contour merging based detection of text area in an image, in accordance with an embodiment of the present disclosure. In an embodiment, the system 100 includes one or more processors 104, communication interface device(s) or input/output (I/O) interface(s) 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 that are hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, graphics controllers, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) are configured to fetch and execute computer-readable instructions stored in the memory. In an embodiment, the system 100 can be implemented in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[036] The I/O interface device(s) 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like and can facilitate multiple communications within a wide variety of networks N/W and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In an embodiment, the I/O interface device(s) can include one or more ports for connecting a number of devices to one another or to another server.
[037] The memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes. In an embodiment, one or more modules (not shown) of the system 100 can be stored in the memory 102.
[038] In an embodiment, the system 100 comprises one or more data storage devices or memory 102 operatively coupled to the one or more processors 104 and is configured to store instructions configured for execution of steps of the method 200 by the one or more processors 104.
[039] FIG.2A and FIG.2B illustrate an exemplary flow diagram of a computer implemented method 200 for recursive contour merging based detection of text area in an image, in accordance with an embodiment of the present disclosure. The steps of the method 200 will now be explained in detail with reference to the components of the system 100 of FIG.1. Although process steps, method steps, techniques or the like may be described in a sequential order, such processes, methods and techniques may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
[040] While scanning through an input image, the OCR engine tries to find patterns within the input image resembling text patterns stored in its model. Based on the patterns found in the input image, the OCR deduces corresponding text and provides that as an output. Pattern matching is a probabilistic approach within the OCR engine which results in varying quality output based on the content and quality of the image. The OCR engine gives a very good text scanning output if the image is of good quality or the region of interest (ROI) is defined perfectly. Finding only the actual text within the image and letting the OCR engine scan it is the major challenge. FIG.3A is an illustrative image of an exemplary retail receipt. It may be noted that the exemplary retail receipt is creased, a typical scenario wherein the image is warped with possibly low quality or skewed text because of the crease, and conventional text extraction solutions fail to generate desirable output. The text in the image seems quite readable to human eyes, as human eyes tend to ignore the surrounding background and look directly at the text area to read. Human eyes are naturally tuned to focus on the target automatically, but the same is not true when a computer is involved in the job. When the OCR engine tries to scan, it scans all the pixels, including non-text areas like the corners, and tries to produce unnecessary text out of them. Hence, to prevent the OCR engine from reading through non-text areas, it is necessary to identify the actual text part of the image section that can be passed to the OCR tool for text scanning. FIG.3B is an illustrative image of a cropped text area from the image of FIG.3A, in accordance with an embodiment of the present disclosure. The cropped text area illustrated in FIG.3B, generated in accordance with the present disclosure, may be used as an input to an OCR engine to get better quality text output.
[041] Before initiating the key steps of the method 200 of the present disclosure, certain pre-processing may be performed to enhance the quality of the output. For instance, the input image may be de-skewed to correct rotation thereof. Then, the input image may be converted to grayscale to reduce the complexity of handling different color variations within the input image to only a gray color gradient. Each pixel in a color image has three components, namely R-Red, G-Green, B-Blue, i.e. {R,G,B}, and each component has a numeric range of 0-255. Grayscaling converts all the colors in the color image to a range of Black {0,0,0} -> Gray {128,128,128} -> White {255,255,255}. FIG.4 illustrates a grayscale conversion of an exemplary color image, in accordance with an embodiment of the present disclosure. In an embodiment, an Open Source Computer Vision Library's (OpenCV's) API method that may be used for grayscaling is as given below (with reference to OpenCV Java API V.3.1.0):

    org.opencv.core.Mat grayScaleImage = org.opencv.imgcodecs.Imgcodecs.imread(
            colorImagepath, org.opencv.imgcodecs.Imgcodecs.CV_LOAD_IMAGE_GRAYSCALE);
[042] Further, to reduce the probability of choosing unnecessary noise within the input image as edges, a certain amount of blurring may be done using OpenCV's component for the Gaussian Blur Algorithm, wherein the pixel's gradient is adjusted based on the pixel values of adjacent pixels from all sides. FIG.5 illustrates blurring of an exemplary image, in accordance with an embodiment of the present disclosure. In an embodiment, an OpenCV API method that may be used for blurring is as given below (with reference to OpenCV Java API V.3.1.0):

    org.opencv.imgproc.Imgproc.blur(sourceMat, destinationMat, new Size(n, m));

where sourceMat and destinationMat are of type org.opencv.core.Mat and represent input and output respectively. new Size(n, m) is the range of pixels to be considered to do the blurring, i.e. for a candidate pixel with n, m = 20, a weighted sum over 10 pixels on each side may be considered to achieve a blurring value.
[043] Pixel density adjustment and histogram equalization may also be employed to remove background noise and enhance contrast respectively.
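As an illustrative sketch only (the disclosure names this step but shows no code for it), histogram equalization may be applied to a grayscale Mat via OpenCV's equalizeHist; the variable names grayMat and equalizedMat are assumptions (with reference to OpenCV Java API V.3.1.0):

    // Hedged sketch: contrast enhancement via histogram equalization on a grayscale image
    org.opencv.core.Mat equalizedMat = new org.opencv.core.Mat();
    org.opencv.imgproc.Imgproc.equalizeHist(grayMat, equalizedMat);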
[044] Further, edges may be detected in the pre-processed input image, wherein low to high pixel gradient changes are identified. For instance, Canny's Algorithm for edge detection may be applied using OpenCV's component. FIG.6 illustrates edge detection using Canny's Algorithm, in accordance with an embodiment of the present disclosure. In an embodiment, an OpenCV API method that may be used for edge detection is as given below (with reference to OpenCV Java API V.3.1.0):

    org.opencv.imgproc.Imgproc.Canny(sourceMat, destinationMat, threshold1, threshold2, aperture, L2gradient);

where sourceMat and destinationMat are of type org.opencv.core.Mat and represent input and output respectively; threshold1, threshold2 and aperture are numeric values which decide edge detection intensity; L2gradient is a Boolean, set to true by default in accordance with the present disclosure. Based on different combinations of values for the above inputs, the following values may be considered for optimum results: threshold1 = 50, threshold2 = 150 and aperture = 3.
[045] Once the edges are detected, an erosion technique may be applied to reduce the thickness of the detected edges and remove thin edges considered as noise. Erosion thickens dark color pixels around light colored pixels. FIG.7 illustrates erosion transformation of an exemplary image, in accordance with an embodiment of the present disclosure. In an embodiment, an OpenCV API method that may be used for erosion is as given below (with reference to OpenCV Java API V.3.1.0):

    org.opencv.imgproc.Imgproc.erode(sourceMat, destinationMat, kernel);

wherein kernel is the defined structuring element size in a Mat. In accordance with the present disclosure, the size may be defined as the average size in pixels of the text character (figure or structuring element) against the size of the image in pixels.
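The description does not show how the kernel may be constructed. One plausible sketch uses OpenCV's getStructuringElement with a rectangular structuring element; the 3x3 size is an illustrative assumption, to be tuned against the average character size as noted above:

    // Hedged sketch: building an assumed 3x3 rectangular kernel for the erosion step
    org.opencv.core.Mat kernel = org.opencv.imgproc.Imgproc.getStructuringElement(
            org.opencv.imgproc.Imgproc.MORPH_RECT, new org.opencv.core.Size(3, 3));
    org.opencv.imgproc.Imgproc.erode(sourceMat, destinationMat, kernel);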
[046] Further, OpenCV's component may be used to deduce contours around each of the detected edges. FIG.8 illustrates contour detection in an exemplary image, in accordance with an embodiment of the present disclosure. The contour basically bounds each figure in the input image that constitutes structuring elements for identifying potential text in the input image. In an embodiment, an OpenCV API method that may be used for deducing contours is as given below (with reference to OpenCV Java API V.3.1.0):

    org.opencv.imgproc.Imgproc.findContours(erodeMat, contours, hierarchy,
            org.opencv.imgproc.Imgproc.RETR_CCOMP,
            org.opencv.imgproc.Imgproc.CHAIN_APPROX_SIMPLE, new Point(0, 0));

where erodeMat is the output from the erosion method of OpenCV, contours is a list of org.opencv.core.MatOfPoint, hierarchy is again a Mat object (not used in the current requirement), and RETR_CCOMP and CHAIN_APPROX_SIMPLE are the mode and method respectively in OpenCV to define a contour.
[047] Accordingly, in accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to deduce, at step 202, a contour around each figure in an input image and bound each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle. In an embodiment, an OpenCV API method that may be used for bounding each of the contours with a rectangle is as given below (with reference to OpenCV Java API V.3.1.0):

    org.opencv.core.Rect rect = org.opencv.imgproc.Imgproc.boundingRect(contours.get(idx));

where rect is the rectangle identified bounding each of the contours. Finally, the list of rect is collected and redrawn in the actual image for illustration. FIG.9 illustrates bounding rectangles in an exemplary image, in accordance with an embodiment of the present disclosure.
[048] After the bounding rectangles are identified, in accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to recursively merge, at step 204, two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, wherein each of the one or more overall bounding contours encompasses at least some of the deduced contours. In an embodiment, the pre-defined threshold pixel distance is based on the input image size and text to text distance in pixels. In an embodiment, for the exemplary input image under consideration, the maximum distance to consider between two rectangles to be merged is image height x 0.04, i.e. 4% of the height of the input image. Again, the higher the density of characters (figures) in the image, the smaller the value of the threshold distance. The distance between each pair of individual rectangles is calculated using the respective rectangles' center points. To identify the center (x, y) coordinates of any rectangle, the Pythagorean Theorem may be applied against 1/2 of the length and 1/2 of the breadth of the rectangle. The method in accordance with the present disclosure starts with any one rectangle and tries to mark all the rectangles which are within the pre-defined threshold pixel distance. Once all the possible nearest rectangles are found, the method is recursively called to check again starting with the first marked rectangle. Meanwhile, the minimum x and y axis values out of all the marked rectangles' top left points are recorded. Similarly, the maximum of all x and y axis values out of the marked rectangles' bottom right points are recorded. After all the possible rectangles are traversed, the final bounding rectangle is created based on the top left point (minX, minY) and the bottom right point (maxX, maxY). Once the bounding rectangle is deduced, all the rectangles within the new bounding area are discarded. FIG.10 illustrates the logic in detection of potential text areas, in accordance with an embodiment of the present disclosure. The method of the present disclosure traverses in the numbering sequence shown in FIG.10. After traversing all the rectangles within the pre-defined threshold pixel distance, it finds the minimum and maximum (X,Y) coordinates out of all the rectangles traversed. So a potential text area is defined by RectFinal.TopLeftPoint(minX, minY) and RectFinal.BottomRightPoint(maxX, maxY), indicated by an overall bounding contour.
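The recursive merging logic described above may be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names ContourMerger, mark, mergeCluster and mergeAll are hypothetical, threshold stands for the pre-defined threshold pixel distance (e.g. image height x 0.04), and the restart-on-removal loop is one simple way to keep the recursion safe while the pool of rectangles shrinks:

    import java.util.ArrayList;
    import java.util.List;
    import org.opencv.core.Point;
    import org.opencv.core.Rect;

    public class ContourMerger {

        // Pre-defined threshold pixel distance, e.g. imageHeight * 0.04 per the description
        static double threshold;

        // Center of a rectangle: top-left point plus half of length and half of breadth
        static Point center(Rect r) {
            return new Point(r.x + r.width / 2.0, r.y + r.height / 2.0);
        }

        // Distance between two rectangles, computed between their center points
        static double distance(Rect a, Rect b) {
            Point ca = center(a), cb = center(b);
            return Math.hypot(ca.x - cb.x, ca.y - cb.y);
        }

        // Recursively mark every rectangle reachable from 'seed' within the threshold
        static void mark(Rect seed, List<Rect> pool, List<Rect> marked) {
            int i = 0;
            while (i < pool.size()) {
                if (distance(seed, pool.get(i)) <= threshold) {
                    Rect r = pool.remove(i);
                    marked.add(r);
                    mark(r, pool, marked);  // recurse from the newly marked rectangle
                    i = 0;                  // restart: recursion may have shrunk the pool
                } else {
                    i++;
                }
            }
        }

        // Overall bounding contour of a cluster: (minX, minY) to (maxX, maxY)
        static Rect mergeCluster(List<Rect> cluster) {
            int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
            int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
            for (Rect r : cluster) {
                minX = Math.min(minX, r.x);
                minY = Math.min(minY, r.y);
                maxX = Math.max(maxX, r.x + r.width);
                maxY = Math.max(maxY, r.y + r.height);
            }
            return new Rect(minX, minY, maxX - minX, maxY - minY);
        }

        // Repeatedly seed a cluster, mark its neighbours recursively, and merge it
        static List<Rect> mergeAll(List<Rect> rects) {
            List<Rect> pool = new ArrayList<>(rects);
            List<Rect> merged = new ArrayList<>();
            while (!pool.isEmpty()) {
                Rect seed = pool.remove(0);
                List<Rect> cluster = new ArrayList<>();
                cluster.add(seed);
                mark(seed, pool, cluster);
                merged.add(mergeCluster(cluster));
            }
            return merged;
        }
    }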
[049] The method 200 of the present disclosure is now explained further based on a specific retail receipt image. FIG.3A is an illustrative image of an exemplary retail receipt to be processed based on the method 200 of the present disclosure. To identify the text area within the image, firstly, the image may be pre-processed by grayscaling. FIG.11A illustrates grayscale conversion of the exemplary image of FIG.3A, in accordance with an embodiment of the present disclosure. Further, the brightness and contrast of the image of FIG.11A may be enhanced as illustrated in FIG.11B. Further pre-processing may be performed by applying a blurring effect to remove noise, and edge detection may be performed by applying, say, Canny's algorithm. FIG.12A illustrates the output of edge detection on the exemplary image of FIG.11B using Canny's Algorithm, in accordance with an embodiment of the present disclosure. After the edges are detected, contours are deduced around each figure in the image and are further bound by minimum bounding rectangles as illustrated in FIG.12B. Once the rectangles are identified, two or more rectangles are merged in accordance with the present disclosure to obtain one or more overall bounding contours indicated by bold black rectangles. FIG.13A illustrates two overall bounding contours in the exemplary image of FIG.12B, in accordance with an embodiment of the present disclosure.
[050] In accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to compute, at step 206, density of the rectangles within the one or more overall potential text areas. In accordance with an embodiment of the present disclosure, the one or more processors 104 are configured to identify, at step 208, one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles. In accordance with the present disclosure, the step of identifying one or more candidate text areas comprises: selecting one or more of the one or more potential text areas having density of the rectangles greater than a pre-defined threshold density, wherein the pre-defined threshold density is based on empirical observations; and cropping off the selected one or more potential text areas to identify the one or more candidate text areas for the further processing.
[051] Accordingly, if the density of the rectangles is found to be greater than the threshold density, that text area is considered for processing by the OCR engine, else the identified text area is discarded. FIG.13B illustrates the final image to be considered for further processing by the OCR engine. The cropped section of the entire image as illustrated in FIG.13B contains most of the text and hence post OCR, percentage of noise in the output reduces drastically.
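A hedged sketch of the density check and cropping of steps 206 and 208 follows. The disclosure does not fix a density formula, so the ratio of summed rectangle area to overall contour area used below is an assumption, as are the names density and candidateAreas and the map from each overall bounding contour to the rectangles it encompasses:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.opencv.core.Mat;
    import org.opencv.core.Rect;

    public class CandidateSelector {

        // Assumed density measure: fraction of the overall contour covered by rectangles
        static double density(Rect overall, List<Rect> inner) {
            double covered = 0;
            for (Rect r : inner) {
                covered += r.area();
            }
            return covered / overall.area();
        }

        // Keep potential text areas above the empirical threshold and crop them for OCR
        static List<Mat> candidateAreas(Mat image, Map<Rect, List<Rect>> potential,
                                        double thresholdDensity) {
            List<Mat> candidates = new ArrayList<>();
            for (Map.Entry<Rect, List<Rect>> e : potential.entrySet()) {
                if (density(e.getKey(), e.getValue()) > thresholdDensity) {
                    candidates.add(image.submat(e.getKey()));  // crop via Mat.submat
                }
            }
            return candidates;
        }
    }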
[052] To further enhance output of the OCR, the present disclosure provides post processing methods on the cropped section illustrated in FIG.13B prior to feeding the image to the OCR engine. In accordance with the present disclosure, the one or more processors 104 may be further configured to de-skew, at step 210, the one or more candidate text areas for alignment. For instance, if the cropped section of FIG.13B is skewed, it may be processed further and aligned. In accordance with the present disclosure, the one or more processors 104 may be further configured to apply, at step 212, one or more morphology techniques to enhance text within the aligned one or more candidate text areas. Again, in accordance with the present disclosure, the one or more processors 104 may be further configured to further process, at step 214, the aligned one or more candidate text areas, wherein the further processing comprises processing by an Optical Character Recognition (OCR) engine.
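As one hedged illustration of the morphology step 212 (the disclosure says only "one or more morphology techniques"), a morphological close may be applied to join broken character strokes before OCR; MORPH_CLOSE, the 2x2 kernel, and the names croppedMat and enhancedMat are assumptions (with reference to OpenCV Java API V.3.1.0):

    // Hedged sketch: morphological closing to enhance text in a cropped candidate area
    org.opencv.core.Mat kernel = org.opencv.imgproc.Imgproc.getStructuringElement(
            org.opencv.imgproc.Imgproc.MORPH_RECT, new org.opencv.core.Size(2, 2));
    org.opencv.imgproc.Imgproc.morphologyEx(croppedMat, enhancedMat,
            org.opencv.imgproc.Imgproc.MORPH_CLOSE, kernel);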
[053] Text detection from an image/video is useful in many applications such as vehicle license detection and recognition, retail receipt scanning, medical report transcription, land record scanning, and the like. Systems and methods of the present disclosure facilitate identifying the required sections of an image that should be fed to an OCR engine to extract text with precision. To further enhance the output of the OCR engine, the image may be subjected to one or more pre-processing and post-processing steps of the method of the present disclosure.
[054] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to be within the scope of the claims if they have similar elements that do not differ from the literal language of the claims or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[055] It is to be understood that the scope of the protection is extended to such a program and in addition to a computer-readable means having a message therein; such computer-readable storage means contain program-code means for implementation of one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device which can be programmed including e.g. any kind of computer like a server or a personal computer, or the like, or any combination thereof. The device may also include means which could be e.g. hardware means like e.g. an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or a combination of hardware and software means, e.g. an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, e.g. using a plurality of CPUs.
[056] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include but are not limited to, firmware, resident software, microcode, etc. The functions performed by various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer usable or computer readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[057] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words "comprising," "having," "containing," and "including," and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise.
[058] Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and exclude carrier waves and transient signals, i.e., be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.
[059] It is intended that the disclosure and examples be considered as exemplary only, with a true scope and spirit of disclosed embodiments being indicated by the following claims.

Claims (10)

CLAIMS:
1. A processor implemented method comprising: deducing a contour around each figure in an input image to obtain one or more deduced contours and bounding each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle; recursively merging two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, each of the one or more overall bounding contours encompassing at least some of the deduced contours, wherein the pre-defined threshold pixel distance is based on the input image size and text to text distance in pixels; computing density of the rectangles within the one or more overall potential text areas; and identifying one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles, wherein the step of identifying one or more candidate text areas comprises: selecting one or more of the one or more potential text areas having density of the rectangles greater than a pre-defined threshold density, wherein the pre-defined threshold density is based on empirical observations; and cropping off the selected one or more potential text areas to identify the one or more candidate text areas for further processing.
2. The processor implemented method of claim 1, wherein the step of deducing the contour around each figure in the input image is preceded by one or more of: de-skewing the input image to correct rotation of the input image; pre-processing the input image to enhance quality of the input image by performing one or more of: converting the input image to grayscale; introducing blurriness, pixel density adjustment and histogram equalization; detecting edges in the pre-processed input image; applying erosion to reduce thickness of the detected edges; and deducing the contour around the detected edges.
3. The processor implemented method of claim 1 further comprising: de-skewing the one or more candidate text areas for alignment; applying one or more morphology techniques to enhance text within the aligned one or more candidate text areas; and further processing the aligned one or more candidate text areas.
4. The processor implemented method of claim 1, wherein the further processing comprises processing by an Optical Character Recognition (OCR) engine.
5. A system comprising: one or more data storage devices operatively coupled to one or more hardware processors and configured to store instructions configured for execution by the one or more hardware processors to: deduce a contour around each figure in an input image to obtain one or more deduced contours and bound each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle; recursively merge two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, each of the one or more overall bounding contours encompassing at least some of the deduced contours, wherein the pre-defined threshold pixel distance is based on the input image size and text to text distance in pixels; compute density of the rectangles within the one or more overall potential text areas; and identify one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles, wherein the one or more hardware processors are further configured to identify one or more candidate text areas by: selecting one or more of the one or more potential text areas having density of the rectangles greater than a pre-defined threshold density, wherein the pre-defined threshold density is based on empirical observations; and cropping off the selected one or more potential text areas to identify the one or more candidate text areas for further processing.
6. The system of claim 5, wherein the one or more hardware processors are further configured to perform one or more of: de-skewing the input image to correct rotation of the input image; pre-processing the input image to enhance quality of the input image by performing one or more of: converting the input image to grayscale; introducing blurriness, pixel density adjustment and histogram equalization; detecting edges in the pre-processed input image; applying erosion to reduce thickness of the detected edges; and deducing the contour around the detected edges, prior to deducing the contour around each figure in the input image.
7. The system of claim 6, wherein the one or more hardware processors are further configured to: de-skew the one or more candidate text areas for alignment; apply one or more morphology techniques to enhance text within the aligned one or more candidate text areas; and further process the aligned one or more candidate text areas.
8. The system of claim 6, wherein the one or more hardware processors are further configured to perform the further processing by an Optical Character Recognition (OCR) engine.
9. A computer program product comprising a non-transitory computer readable medium having a computer readable program embodied therein, wherein the computer readable program, when executed on a computing device, causes the computing device to: deduce a contour around each figure in an input image to obtain one or more deduced contours and bound each of the contours with a rectangle, wherein each figure constitutes structuring elements for identifying potential text and the rectangle is a minimum bounding rectangle; recursively merge two or more of the rectangles based on a pre-defined threshold pixel distance to obtain one or more overall bounding contours corresponding to one or more potential text areas within the input image, each of the one or more overall bounding contours encompassing at least some of the deduced contours, wherein the pre-defined threshold pixel distance is based on the input image size and text to text distance in pixels; compute density of the rectangles within the one or more overall potential text areas; and identify one or more candidate text areas for further processing from the one or more potential text areas based on the density of the rectangles, wherein the step of identifying one or more candidate text areas comprises: selecting one or more of the one or more potential text areas having density of the rectangles greater than a pre-defined threshold density, wherein the pre-defined threshold density is based on empirical observations; and cropping off the selected one or more potential text areas to identify the one or more candidate text areas for further processing.
10. The computer program product of claim 9, wherein the computer readable program further causes the computing device to perform: de-skewing of the one or more candidate text areas for alignment; applying one or more morphology techniques to enhance text within the aligned one or more candidate text areas; and further processing the aligned one or more candidate text areas.
AU2018229526A 2017-09-16 2018-09-14 Recursive contour merging based detection of text area in an image Active AU2018229526B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201721032846 2017-09-16
IN201721032846 2017-09-16

Publications (2)

Publication Number Publication Date
AU2018229526A1 (en) 2019-04-04
AU2018229526B2 (en) 2020-07-16

Family

ID=65908646

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018229526A Active AU2018229526B2 (en) 2017-09-16 2018-09-14 Recursive contour merging based detection of text area in an image

Country Status (1)

Country Link
AU (1) AU2018229526B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110796082B (en) * 2019-10-29 2020-11-24 上海眼控科技股份有限公司 Nameplate text detection method and device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NISHINO, T. et al. ‘Extraction of Character String Regions from Scenery Images Based on Contours and Thickness of Characters’, MVA2011 IAPR Conference on Machine Vision Applications, June 13-15, 2011, Nara, Japan *
SRILAKSHMI, R. et al. ‘A New Approach for Text String Detection from Natural Scenes By Grouping & Partition’. International Journal on Recent and Innovation Trends in Computing and Communication. 2013, Vol 1, No 8, pp 652-655 *

Also Published As

Publication number Publication date
AU2018229526A1 (en) 2019-04-04

Similar Documents

Publication Publication Date Title
RU2721188C2 (en) Improved contrast and noise reduction on images obtained from cameras
JP4626886B2 (en) Method and apparatus for locating and extracting captions in digital images
US8965123B2 (en) System and method for processing image for identifying alphanumeric characters present in a series
US7873215B2 (en) Precise identification of text pixels from scanned document images
Farahmand et al. Document image noises and removal methods
Savakis Adaptive document image thresholding using foreground and background clustering
US9171224B2 (en) Method of improving contrast for text extraction and recognition applications
JP2001092919A (en) Method for determining angle of torsion of two- dimensional bar code
US11704925B2 (en) Systems and methods for digitized document image data spillage recovery
Kumar et al. NESP: Nonlinear enhancement and selection of plane for optimal segmentation and recognition of scene word images
US10496894B2 (en) System and method for text localization in images
AU2018229526B2 (en) Recursive contour merging based detection of text area in an image
Shi et al. Image enhancement for degraded binary document images
JP2010074342A (en) Image processing apparatus, image forming apparatus, and program
CN111445402B (en) Image denoising method and device
US20060233452A1 (en) Text enhancement methodology in scanned images of gray-scale documents
CN110134924A (en) Overlay text component extracting method and device, text recognition system and storage medium
Das et al. Adaptive method for multi colored text binarization
Soumya et al. Enhancement and segmentation of historical records
Konya et al. Adaptive methods for robust document image understanding
WO2017088478A1 (en) Number separating method and device
Deivalakshmi A simple system for table extraction irrespective of boundary thickness and removal of detected spurious lines
US11778122B2 (en) Apparatus, method, and storage medium for removing shading dots
US12045953B2 (en) Extracting region of interest from scanned images and determining an associated image type thereof
US12062246B2 (en) Extracting text from an image

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)