US20130194448A1 - Rules for merging blocks of connected components in natural images - Google Patents

Rules for merging blocks of connected components in natural images

Info

Publication number
US20130194448A1
US20130194448A1 (application US13/748,574; US201313748574A)
Authority
US
United States
Prior art keywords
block
projection
merged
test
pixels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/748,574
Inventor
Pawan Kumar Baheti
Ankit Agarwal
Dhananjay Ashok Gore
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qualcomm Inc filed Critical Qualcomm Inc
Priority to US13/748,574 priority Critical patent/US20130194448A1/en
Priority to PCT/US2013/023012 priority patent/WO2013112753A1/en
Priority to PCT/US2013/023003 priority patent/WO2013112746A1/en
Priority to PCT/US2013/022994 priority patent/WO2013112738A1/en
Assigned to QUALCOMM INCORPORATED reassignment QUALCOMM INCORPORATED ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BAHETI, PAWAN KUMAR, GORE, DHANANJAY ASHOK, AGARWAL, ANKIT
Priority to US13/791,188 priority patent/US9064191B2/en
Priority to US13/831,237 priority patent/US9076242B2/en
Priority to IN2654MUN2014 priority patent/IN2014MN02654A/en
Priority to PCT/US2013/051144 priority patent/WO2014015178A1/en
Publication of US20130194448A1 publication Critical patent/US20130194448A1/en
Abandoned legal-status Critical Current

Classifications

    • G06K9/00456
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/18Extraction of features or characteristics of the image
    • G06V30/18086Extraction of features or characteristics of the image by performing operations within image blocks or by using histograms
    • G06V30/18095Summing image-intensity values; Projection and histogram analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • This patent application relates to devices and methods for applying rules (called “clustering rules”) to check whether or not blocks of one or more regions in an image should be merged, prior to classification of the blocks as text or non-text.
  • Identification of text regions in scanned documents is significantly easier than detecting text regions in images of scenes in the real world (also called “natural images”) generated by a handheld camera.
  • OCR: optical character recognition
  • the document image contains a series of lines of text (e.g. 20 lines of text) of a scanned page in a document.
  • Document processing techniques, although successfully used on scanned documents created by optical scanners, generate too many false positives and/or negatives to be practical when used on natural images.
  • detection of text regions in a real world image generated by a handheld camera is performed using different techniques.
  • Image processing techniques of the prior art described above appear to have been developed primarily to identify regions in images that contain text written in the English language. Use of such techniques to identify, in natural images, regions of text in other languages that use different scripts for letters of their alphabets can result in enough false positives and/or negatives to render the techniques impractical.
  • FIG. 1A illustrates a newspaper 100 in the real world in India.
  • a user 110 may use a camera-equipped mobile device (such as a cellular phone) 108 to capture an image 107 of newspaper 100 .
  • Camera captured image 107 may be displayed on a screen 106 of mobile device 108 .
  • In such an image 107 (FIG. 1C), text-containing regions of a camera-captured image may be classified as non-text and vice versa, e.g. due to variations in lighting, color, tilt, focus, etc.
  • One or more prior art criteria that are used by a classifier to identify text in natural images can be relaxed, so that blocks 103 A- 103 D are then classified as text, but on doing so one or more portions of another region 105 ( FIG. 1C ) may coincidentally satisfy the relaxed criteria, and blocks in region 105 may be then mis-classified as text although these blocks contain graphics (e.g., pictures of cars in FIG. 1B ).
  • When a natural image 107 (FIG. 1C) is processed by a prior art method to form rectangular blocks, certain portions of text may be omitted from a rectangular block that is classified as text.
  • pixels in such text portions may be separated from (i.e. not contiguous with) pixels that form the remainder of text in the rectangular block, due to pixels at a boundary of the rectangular block not satisfying a prior art test used to form the rectangular block.
  • Such omission of pixels of a portion of text, from a rectangular block adjacent to the portion is illustrated in FIG. 1C at least twice. See pixels of text to the left of block 103 B, and see pixels of text to the left of block 103 C (in FIG. 1C ).
  • Omission of text portions from rectangular blocks of a natural image can result in errors, when such incomplete blocks are further processed after classification, e.g. by an optical character recognition (OCR) system.
  • an electronic device and method use a camera to capture an image of an environment outside the electronic device followed by identification of blocks that enclose regions of pixels in the image, with each region being initially included in a single block.
  • each region may be identified to have pixels contiguous with one another and including a local extrema (maxima or minima) of intensity in the image, e.g. a maximally stable extremal region (MSER).
  • each block that contains such a region (which may constitute a “connected component”) is tested for presence of a line of pixels binarizable to a common value (“pixel-line-present” block), followed by identification of one or more blocks adjacent thereto which are then tested for merger as follows.
  • One or more processors of several embodiments execute computer instructions (also called “first instructions”) to test for overlap of projections, between a projection of a pixel-line-present block on to a line (e.g. x-axis) and another projection of an adjacent block on to the same line (e.g. x-axis also).
  • An additional test that may be performed prior to merger of two blocks may be based on, for example, relative heights of the two blocks, and/or aspect ratio of either or both blocks, etc.
  • Information on a merged block that is obtained as a result of merging of two or more blocks is stored in memory by one or more processors executing computer instructions (also called “third instructions”).
  • the merged block is then processed further in certain embodiments, e.g. subjected to verification of presence of a pixel line, followed by classification of the merged block as text or non-text.
  • classification of a merged block (with multiple connected components therein) as text or non-text may use one or more predetermined attributes of the merged block, such as location and thickness of a line of pixels binarizable to a common binary value and oriented longitudinally in the merged block (e.g. parallel to, or within a small angle of, whichever side of the block is longer).
  • the just-described classification of the merged block may additionally or alternatively use a number of lines of pixels oriented laterally in the merged block.
  • one or more of: identification of blocks, testing for overlap of projections on to a common line, merger of blocks that satisfy tests, followed by text/non-text classification as described above are performed by one or more processor(s) operatively coupled to memory and configured to execute computer instructions stored in the memory (or in another non-transitory computer readable storage media).
  • one or more non-transitory storage media include a plurality of computer instructions, which when executed, cause one or more processors in a handheld device to perform operations, and these computer instructions include computer instructions to perform one or more of: identification of blocks, testing for overlap of projections on to a common line, merger of blocks that satisfy tests, followed by text/non-text classification described above.
  • one or more acts of the type described above are performed by a mobile device (such as a smart phone) that includes a camera, a memory operatively connected to the camera to receive images therefrom, and at least one processor operatively coupled to the memory to execute computer instructions stored in the memory (or in another non-transitory computer readable storage media).
  • the processor processes an image to check whether two blocks that are adjacent to one another in the image satisfy one or more predetermined rules (e.g. based on geometric attributes of the blocks), and on finding the rule(s) to be satisfied merges the two blocks to generate a merged block, subsequently classifies the merged block as text or non-text, followed by OCR of blocks that are classified as text (in the normal manner).
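  • To make the geometric tests described below concrete, the following minimal Python sketch defines a bounding-box record and a merge helper; the names Block, x1/y1/x2/y2 and merge are illustrative assumptions introduced here (not terms from this application) and are reused by the later sketches in this description.

      from dataclasses import dataclass

      @dataclass
      class Block:
          # Axis-aligned bounding box of one connected component (e.g. an MSER),
          # with y increasing upward: (x1, y1) is bottom-left, (x2, y2) is top-right.
          x1: float
          y1: float
          x2: float
          y2: float
          pixel_line_present: bool = False

      def merge(a: Block, b: Block) -> Block:
          # A merged block is simply the bounding box enclosing both input blocks.
          return Block(min(a.x1, b.x1), min(a.y1, b.y1),
                       max(a.x2, b.x2), max(a.y2, b.y2),
                       pixel_line_present=a.pixel_line_present or b.pixel_line_present)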
  • an apparatus includes several means implemented by logic in hardware or logic in software or a combination thereof, to perform one or more acts described above.
  • FIG. 1A illustrates a newspaper of the prior art, in the real world in India.
  • FIG. 1B illustrates a user using a camera-equipped mobile device of the prior art to capture an image of a newspaper in the real world.
  • FIG. 1C illustrates blocks formed by identifying connected components in a portion of the image of FIG. 1B by use of a prior art method.
  • FIG. 2 illustrates, in a high-level flow chart, various acts performed by a mobile device in a method of identifying regions to merge in some aspects of the described embodiments.
  • FIG. 3A illustrates a memory of a mobile device during application of a predetermined test to detect pixel line presence, in illustrative aspects of the described embodiments.
  • FIG. 3B illustrates, in an intermediate-level flow chart, various acts performed by a mobile device to implement a predetermined test to detect pixel line presence, in some aspects of the described embodiments.
  • FIG. 3C illustrates another projection profile of English text in prior art.
  • FIG. 4A illustrates an example of text in a prior art image.
  • FIGS. 4B-4F illustrate formation of a block by use of the method of FIG. 2 in illustrative aspects of the described embodiments.
  • FIG. 5 illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to identify blocks that can be merged as per operation 230 in FIG. 2 , by application of three sets of rules.
  • FIG. 6A illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to apply a set of rules, to identify blocks to be merged based on certain attributes of modifiers or accent marks.
  • FIGS. 6B, 6C, and 6D illustrate examples of text wherein blocks 621 and 622 are determined to be merged by applying the set of rules as per the method of FIG. 6A.
  • FIG. 6E illustrates another example of text wherein blocks 631 and 632 are determined to be merged by applying the set of rules as per the method of FIG. 6A .
  • FIG. 7A illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to apply a set of rules, to identify blocks to be merged based on certain attributes of broken words.
  • FIG. 7B illustrates the example of text of FIG. 6B wherein blocks 620 , 621 and 623 are determined to be merged by applying the second set of tests as per the method of FIG. 7A .
  • FIG. 8A illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to apply a set of rules, to identify blocks to be merged based on certain attributes of half letters.
  • FIG. 8B illustrates another example of text wherein blocks 821 and 822 are determined to be merged by applying the third set of tests as per the method of FIG. 8A .
  • FIG. 9 illustrates parameters 911 - 915 computed for use in a neural network that performs classification of blocks into text or non-text, as per operation 240 ( FIG. 2 ), performed by a mobile device in some aspects of the described embodiments.
  • FIG. 10 illustrates, in a block diagram, a mobile device including processor and memory of the type described above, in some aspects of the described embodiments.
  • FIG. 11 illustrates, in a block diagram, computer instructions in a memory 1012 of the described embodiments, to perform several of the operations illustrated in FIG. 2 .
  • FIG. 12 illustrates, in a high-level flow chart, various acts performed by a mobile device in an alternative method of identifying regions to merge in some aspects of the described embodiments.
  • a number of regions of an image of a real world scene are initially identified in several aspects of the described embodiments, in the normal manner.
  • a mobile device e.g. a smartphone or a tablet which can be held in a human hand
  • Mobile device 200 of some embodiments includes one or more processors 1013 (FIG. 10).
  • Merger software 141 of some embodiments, when executed by one or more processors, identifies blocks of regions in an image (in memory) that can be merged with one another, as described in U.S. application Ser. No. 13/748,539, Attorney Docket No. Q111559Usos, filed concurrently herewith, entitled “Identifying Regions of Text to Merge In A Natural Image or Video Frame”, which is incorporated herein by reference in its entirety.
  • Blocks that are identified as candidates for merger are thereafter subject to certain predetermined rules (also called clustering rules) as described below, and when these rules are found to be satisfied the blocks are merged, even though it is not known whether the blocks are text or non-text.
  • an image 107 received by a processor 1013 of mobile device 200 in certain described embodiments, as per act 211 in FIG. 2 is a snapshot (in a set of snapshots generated by a digital camera) or a video frame (in a stream of video frames generated by a video camera) or any image stored in memory and retrieved therefrom.
  • image 107 is not generated by an optical scanner of a copier or printer; instead image 107 is generated by a hand-held camera, as noted above.
  • Alternatively, image 107 is generated by an optical scanner of a copier or printer, from printed paper.
  • processor 1013 which is programmed to perform one or more acts described herein is external to mobile device 200 , e.g. included in a server to which mobile device 200 is operatively coupled by a wireless link.
  • processor 1013 in described embodiments identifies, as per act 212 in FIG. 2, a set of regions (also called “blobs”) in image 107 with boundaries that differ in a predetermined manner (as specified in a parameter input to the method) from surrounding pixels in one or more properties, such as intensity and/or color.
  • Some methods that may be used in act 212 first identify a pixel of local minima or maxima (also called “extrema”) of a property (such as intensity) in the image, followed by identifying neighboring pixels that are contiguous with one another and with the identified extrema pixel, within a range of values of the property that is obtained in a predetermined manner, so as to identify in act 212 an MSER region.
  • MSERs that are identified in act 212 of some embodiments are regions that are geometrically contiguous (with any one pixel in the region being reachable from any other pixel in the region by traversal of one or more pixels that contact one another in the region) with monotonic transformation in property values, and invariant to affine transformations (transformations that preserve straight lines and ratios of distances between points on the straight lines). Boundaries of MSERs may be used as connected components in some embodiments described herein, to identify regions of an image, as candidates for recognition as text.
  • regions in image 107 are automatically identified in act 212 based on variation in intensities of pixels by use of a method of the type described by Matas et al., e.g. in an article entitled “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions” Proc. Of British Machine Vision Conference, pages 384-396, published 2002 that is incorporated by reference herein in its entirety.
  • the time taken to identify MSERs in an image can be reduced by use of a method of the type described by Nister, et al., “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp 183-196, published by Springer-Verlag Berlin Heidelberg that is also incorporated by reference herein in its entirety.
  • Another such method is described in, for example, an article entitled “Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions” by Chen et al, IEEE International Conference on Image Processing (ICIP), September 2011 that is incorporated by reference herein in its entirety.
  • Such a lookup table may supply one or more specific combinations of values for the parameters Δ and Max Variation, which are input to an MSER method (also called MSER input parameters).
  • Such a lookup table may be populated ahead of time, with specific values for Δ and Max Variation, e.g. determined by experimentation to generate contours that are appropriate for recognition of text in a natural image, such as value 8 for Δ and value 0.07 for Max Variation.
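  • As one concrete (hedged) illustration of such a parameter lookup, the sketch below uses OpenCV's MSER detector, whose create function accepts delta, min_area, max_area and max_variation positionally; the lookup table, the profile name and the specific min_area/max_area values are assumptions for illustration, not values from this application.

      import cv2

      # Hypothetical lookup table of MSER input parameters (delta, Max Variation),
      # e.g. the example values 8 and 0.07 mentioned above for natural-image text.
      MSER_PARAMS = {"natural_image_text": (8, 0.07)}

      def detect_connected_components(gray_image, profile="natural_image_text"):
          delta, max_variation = MSER_PARAMS[profile]
          # Positional arguments: delta, min_area, max_area, max_variation.
          mser = cv2.MSER_create(delta, 60, 14400, max_variation)
          regions, bboxes = mser.detectRegions(gray_image)  # pixel lists + bounding boxes
          return regions, bboxes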
  • Block 302 is automatically marked as having a pixel line present along straight line 304 by a test in act 222 of some embodiments that compares the number of black pixels occurring along straight line 304 to the number of black pixels occurring along other lines passing through block 302 .
  • act 222 compares the number of pixels along multiple lines parallel to one another in a longitudinal direction of block 302 , for example as discussed below.
  • act 222 determines that a pixel line is present in block 302 along straight line 304 when straight line 304 is found to have the maximum number of black pixels (relative to all the lines tested in block 302 ). In another example, act 222 further checks that the maximum number of black pixels along straight line 304 is larger than a mean of black pixels along the lines being tested by a predetermined amount and if so then block 302 is determined to have a pixel line present therein. The same test or a similar test may be alternatively performed with white pixels in some embodiments of act 222. Moreover, in some embodiments of act 212, the same test or a similar test may be performed on two regions of an image, namely the regions called MSER+ and MSER−, generated by an MSER method (with intensities inverted relative to one another).
  • block 302 is subdivided into rows oriented parallel to the longitudinal direction of block 302 .
  • Some embodiments of act 222 prepare a histogram of counters, based on pixels identified in a list of positions indicative of a region, with one counter being used for each unit of distance (“bin” or “row”) along a height (in a second direction, which is perpendicular to a first direction (e.g. the longitudinal direction)) of block 302 .
  • block 302 is oriented with its longest side along the x-axis, and act 222 is performed by sorting pixels by their y-coordinates followed by binning (e.g. into one bin, or row, per unit of distance along the height of block 302).
  • a result of act 222 in the just-described example is that a pixel line (of black pixels) has been found to be present in block 302 .
  • processor 1013 is programmed to perform an act 223 to mark in a storage element 381 of memory 1012 (by setting a flag), based on a result of act 222 , e.g. that block 302 has a line of pixels present therein (or has no pixel line present, depending on the result).
  • block 302 may be identified in some embodiments as having a pixel line present therein, by including an identifier of block 302 in a list 1501 of identifiers ( FIG. 10 ) in memory 1012 .
  • mobile device 200 uses each block 302 that has been marked as pixel-line-present in act 222 , to start looking for and marking in memory 1012 (e.g. in a list 1502 in FIG. 10 ), any block (pixel-line-present or pixel-line-absent) that is located physically adjacent to the block 302 (which is marked pixel-line-present and has no other block located there-between).
  • blocks 402 , 404 and 405 are marked in a memory as “adjacent” blocks.
  • Act 231 is performed repeatedly in a loop in some embodiments, until all adjacent blocks in an image are identified followed by the act 232 , although other embodiments repeat the act 231 after performance of acts 232 and 233 (described next).
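  • A minimal sketch of act 231 follows; it treats a block as adjacent when its bounding box lies within an assumed distance of the pixel-line-present block's box, and for brevity omits the "no other block located there-between" condition stated above. The helper name adjacent_blocks and the max_gap value are assumptions.

      def adjacent_blocks(first, blocks, max_gap=10):
          # Simplified stand-in for act 231: returns blocks whose bounding boxes lie
          # within max_gap pixels of the first block's bounding box (Block record from
          # the earlier sketch). max_gap is an assumed, not specified, threshold.
          out = []
          for b in blocks:
              if b is first:
                  continue
              dx = max(b.x1 - first.x2, first.x1 - b.x2, 0)  # horizontal gap (0 if overlapping)
              dy = max(b.y1 - first.y2, first.y1 - b.y2, 0)  # vertical gap (0 if overlapping)
              if dx <= max_gap and dy <= max_gap:
                  out.append(b)
          return out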
  • a block which is marked by operation 220 as pixel-line-present may have a region representing a non-text feature in the image, e.g. a branch of a tree, or a light pole.
  • Another block of the image, similarly marked by operation 220 as pixel-line-present, may have a region representing a text feature in the image, e.g. text with the format strike-through (in which a line is drawn through middle of text), or underlining (in which a line is drawn through bottom of text), or shiro-rekha (a headline in Devanagari script). So, operation 220 is performed prior to classification, as text or non-text, of any pixels in the regions that are being processed in operation 220.
  • block 302 which is marked in memory 1012 as “pixel-line-present”, contains an MSER whose boundary may (or may not) form one or more characters of text in certain languages.
  • characters of text may contain and/or may be joined to one another by a line segment formed by pixels in contact with one another and spanning across a word formed by the multiple characters, as illustrated in FIG. 3A . Therefore, a merged block formed by operation 230 which may contain text, or alternatively non-text, is subjected to an operation 240 (also called “verification” operation).
  • Pixel intensities that are used in binarization and in pixel-line-presence test in operation 240 are of all pixels in the merged block, which may include pixels of a pixel-line-present block (which contains a core portion of text) and pixels of an adjacent block which may be a pixel-line-absent block (which contain(s) supplemental portion(s) of text, such as accent marks).
  • pixels in a merged block on which operation 240 is being performed have not yet been classified as text or non-text, hence the pixel-line-presence test may or may not be met by the merged block, e.g. depending whether or not a line of pixels is present therein (based on the blocks being merged).
  • processor(s) 1013 of some embodiments form a feature vector for each sub-block and then decode the feature vector, by comparison to corresponding feature vectors of letters of a predetermined alphabet, to identify one or more characters (e.g. alternative characters for each block, with a probability of each character), and use one or more sequences of the identified characters with a repository of character sequences, to identify and store in memory 1012 (and/or display on a touch-sensitive screen 1001 or a normal screen 1002) a word identified as being present in the merged block.
  • a line segment 305 contains characters of text, rather than details of natural features (such as leaves of a tree or leaves of plants, shrubs, and bushes) that are normally present in a natural image. Presence of text of certain languages in a natural image results in pixels 303 A- 303 N ( FIG. 3A ) that may form a line segment 305 in block 302 .
  • a line segment 305 of pixels that is detected in operation 220 may be oriented longitudinally relative to a block 302 ( FIG. 3A ), or oriented laterally relative to block 302 , or block 302 may contain both longitudinally-oriented lines and laterally-oriented lines of pixels.
  • block 403 has one longitudinal line of black pixels 403 T ( FIG. 4C ) and two lateral lines of black pixels 403 A and 403 B ( FIG. 4B ), while block 404 has three lateral lines of black pixels (not labeled) and one longitudinal line of black pixels (not labeled).
  • a pixel-line-presence test used in act 222 ( FIG. 2 ) of some embodiments may be selected based on a language likely to be found in an image, as per act 202 described next.
  • selection of a pixel-line-presence test is made in act 202 based on a language that is identified in memory 1012 as being used by a user of mobile device 200 (e.g. in user input), or as being used in a geographic location at which mobile device 200 is located in real world (e.g. in a table).
  • memory 1012 in mobile device 200 of some embodiments includes user input wherein the user has explicitly identified the language.
  • the language identified by processor 1013 is Hindi
  • the pixel-line-presence test that is selected in act 202 ( FIG. 2 ) is used identically when each of blocks 402 , 403 and 404 ( FIG. 4B ) is evaluated by act 211 ( FIG. 2 ) to identify presence of a pixel line that is a characteristic of the language Hindi, namely a shiro-rekha (also called “header line”).
  • the pixel-line-presence test that is selected may test for pixels of a common binary value arranged to form a line segment 305 that is aligned with a top side of block 302 and located in an upper portion of block 302 (e.g. located within an upper one-third of the block, as described below in reference to a peak-location preset criterion in reference to FIG. 3A ).
  • a block 403 ( FIG. 4B ) that is marked as pixel-line-present may have one or more adjacent blocks such as block 405 that contain one or more portions of text, such as an accent mark. Due to the image being captured by a camera from a scene, there may be numerous other blocks (not shown in FIG. 4B ) in the image which may have a similar configuration (in pixel intensities and locations), but such other blocks may (or may not) constitute details of natural features (such as leaves of plants, shrubs, and bushes), rather than portions of text.
  • some embodiments compare a current pixel's intensity with the mean intensity of pixels in the block, and also to the mean intensity of pixels in one or more MSERs included in the block, and if the current pixel's intensity is closer to the mean intensity of MSER(s) then the current pixel's binary value is set to 1 (in act 314 in FIG. 3B), else the binary value is set to 0 (in act 315 in FIG. 3B).
  • the just-described binarization technique is just one example, and other embodiments may apply other techniques that are readily apparent in view of this disclosure.
  • the current pixel's intensity may be compared to just a mean intensity across all pixels in block 302, and if the current pixel's intensity exceeds the mean, the current pixel is marked as 1 (in act 314) else the current pixel is marked as 0 (in act 315).
  • mobile device 200 may be programmed to binarize pixels by 1) using pixels in a block to determine a set of one or more thresholds that depend on the embodiment, 2) comparing each pixel in the block with this set of thresholds, and 3) subsequently setting the binarized value to 1 or 0 based on results of the comparison.
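  • A minimal sketch of the first binarization technique described above (closer to the MSER mean versus closer to the block mean) follows; the array names block_pixels and mser_mask are assumptions for illustration.

      import numpy as np

      def binarize_block(block_pixels, mser_mask):
          # block_pixels: 2-D array of gray intensities inside the block's bounding box.
          # mser_mask:    boolean array of the same shape, True where a pixel belongs
          #               to the MSER region(s) included in the block.
          block_mean = block_pixels.mean()
          mser_mean = block_pixels[mser_mask].mean()
          # A pixel is set to 1 when its intensity is closer to the MSER mean than to
          # the block mean, else it is set to 0.
          closer_to_mser = np.abs(block_pixels - mser_mean) <= np.abs(block_pixels - block_mean)
          return closer_to_mser.astype(np.uint8)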
  • mobile device 200 identifies a row Hp (e.g. counted in bin 130 in FIG. 3A) that contains a maximum value Np of all projection counts N[0]-N[450] in block 302, i.e. the value of peak 308 in graph 310 in the form of a histogram of counts of black pixels (alternatively counts of white pixels).
  • mobile device 200 computes a mean Nm, across the projection counts N[ 0 ]-N[ 450 ].
  • mobile device 200 checks (in act 331 ) whether the just-computed values Nm and Np satisfy a preset criterion on intensity of a peak 308 .
  • a peak-intensity preset criterion is, for example, that the peak count Np exceed the mean Nm by a predetermined factor (e.g. Np ≥ 1.75×Nm), and if not then the block 302 is marked as “pixel-line-absent” in act 332 and if so then block 302 may be marked as “pixel-line-present” in act 334 (e.g. in a location in memory 1012 shown in FIG. 3A as storage element 381 ).
  • another act 333 is performed to check whether a property of profile 311 satisfies an additional preset criterion.
  • the additional preset criterion is on a location of peak 308 relative to a span of block 302 in a direction perpendicular to the direction of projection, e.g. relative to height of block 302 .
  • a peak-location preset criterion may check where a row Hp (containing peak 308 ) occurs relative to height H of the text in block 302 .
  • the peak-location preset criterion may be satisfied when Hp/H ≤ r wherein r is a predetermined constant, such as 0.3 or 0.4. Accordingly, presence of a line of pixels is tested in some embodiments within a predetermined range, such as within 30% from an upper end of a block.
  • When one or more such preset criteria are satisfied in act 334, mobile device 200 then marks the block as “pixel-line-present” and otherwise goes to act 332 to mark the block as “pixel-line-absent.”
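  • The two preset criteria just described can be sketched as follows for a block whose rows run along its longer side; the function name, the assumption that row index 0 is the top row, and the default constants (taken from the example values above) are illustrative only.

      import numpy as np

      def pixel_line_present(binary_block, peak_factor=1.75, location_ratio=0.3):
          # binary_block: 2-D array of 0/1 pixels, one row per bin along the block height.
          counts = binary_block.sum(axis=1)      # projection counts N[0]..N[H-1], per row
          peak_row = int(np.argmax(counts))      # row Hp containing peak 308
          peak = counts[peak_row]                # Np
          mean = counts.mean()                   # Nm
          # Peak-intensity criterion: the peak must exceed the mean by a factor.
          if peak < peak_factor * mean:
              return False                       # mark "pixel-line-absent" (act 332)
          # Peak-location criterion: Hp/H within the upper portion of the block
          # (assuming row 0 is the topmost row), e.g. within the upper 30%.
          return peak_row / binary_block.shape[0] <= location_ratio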
  • illustrative preset criteria have been described, other such criteria may be used in other embodiments of act 334 .
  • certain values have been described for two preset criteria, other values and/or other preset criteria may be used in other embodiments.
  • the peak-location preset criterion for Arabic may be 0.4 ≤ Hp/H ≤ 0.6, to test for presence of a peak in a middle 20% region of a block, based on profiles for Arabic text shown and described in an article entitled “Techniques for Language Identification for Hybrid Arabic-English Document Images” by Ahmed M. Elgammal and Mohamed A. Ismail, believed to be published 2001 in Proc. of IEEE 6th International Conference on Document Analysis and Recognition, pages 1100-1104, which is incorporated by reference herein in its entirety. Note that although certain criteria are described for Arabic and English (see next paragraph), other similar criteria may be used for text in other languages wherein a horizontal line is used to interconnect letters of a word, e.g. text in the language Bengali (or Bangla).
  • embodiments may test for presence of two peaks (e.g. as shown in FIG. 3C for English text) in act 334 , so as to mark blocks of MSERs that satisfy the two-peak test, as “pixel-lines-present” or “pixel-lines-absent” followed by merging thereto of adjacent block(s) when certain predetermined rules are satisfied (for the English language), and followed by re-doing the two-peak test, in the manner described herein. Therefore, several such criteria will be readily apparent to the skilled artisan in view of this disclosure, based on one or more methods known in the prior art.
  • Although several embodiments described herein use Devanagari to illustrate certain concepts, these concepts may be applied to languages or scripts other than Devanagari.
  • embodiments described herein may be used to identify characters in Korean, Chinese, Japanese, Greek, Hebrew and/or other languages.
  • processor 1013 of some embodiments checks (in act 335 in FIG. 3B) if all blocks have been marked by one of acts 332 and 334. If the answer in act 335 is no, then some embodiments of processor 1013 check (in act 336) whether a preset time limit has been reached in performing the method illustrated in FIG. 3B and if not return to act 301 and otherwise exit the method. In act 335 if the answer is yes, then processor 1013 goes to act 337 to identify adjacent blocks (as per act 231) that may be merged when a rule in a plurality of predetermined rules (“clustering” rules) is met.
  • FIG. 4A illustrates an example of an image 401 in the prior art.
  • Image 401 is processed by performing a method of some embodiments as described above, and as illustrated in FIGS. 4B-4F , to form a merged block.
  • initialization in act 212 identifies MSERs to form blocks 402-405, and in doing so an accent mark 406 of this example happens not to be identified as part of any MSER, and is therefore not included in any block.
  • a block 402 (also called “first block”) is identified in the example of FIG. 4B to include a first region in the image 401 with a first plurality of pixels (identified by a first set of positions) that are contiguous with one another and include a first local extrema of intensity in the image 401 .
  • a block 403 (also called “second block”) is identified in the example of FIG. 4B to include a second region in the image 401 with a second plurality of pixels (identified by a second set of positions) that are contiguous with one another and include a second local extrema of intensity in the image 401 .
  • each of blocks 402 - 405 illustrated in FIG. 4B is identified in act 212 .
  • an MSER method is used in some embodiments of act 212
  • other embodiments of act 212 use other methods that identify connected components.
  • blocks 402 - 405 are thereafter processed for pixel line presence detection, as described above in reference to operation 220 ( FIG. 2 ).
  • blocks 402, 403 and 404 are tagged (or marked) as being “pixel-line-present” (see FIG. 4B), and block 405 is tagged (or marked) as being “pixel-line-absent.”
  • image 401 has the polarity (or intensity) of its pixels reversed (as would be readily apparent to a skilled artisan), so that white pixels are changed to black and vice versa, and the reversed-polarity version of image 401 is then processed by act 212 (FIG. 2) to identify blocks 411-413 (FIG. 4C). Blocks 411-413 are then processed by act 223 to mark them as pixel-line-absent.
  • each of blocks 402-404 (also called the pixel-line-present blocks) is checked for presence of any adjacent blocks that can be merged. Specifically, on checking the block 402 which is identified as pixel-line-present, for any adjacent blocks, a block 411 (FIG. 4C) is found. Therefore, the blocks 402 and 411 are evaluated by application of one or more clustering rules 503 (FIG. 10) in block merging module 141B (FIG. 10).
  • the clustering rules 503 can be different for different scripts, e.g. depending on language to be recognized, in the different embodiments.
  • Clustering rules 503 to be applied in operation 230 may be pre-selected, e.g. based on external input, such as identification of Devanagari as a script in use in the real world scene wherein the image is taken.
  • the external input may be automatically generated, e.g. based on geographic location of mobile device 200 in a region of India, e.g. by use of an in-built GPS sensor as described above.
  • external input to identify the script and/or the geographic location may be received by manual entry by a user.
  • the identification of a script in use can be done differently in different embodiments.
  • one or more clustering rules 503 are predetermined for use in operation 230 .
  • blocks 402 and 411 are merged with one another, to form block 421 (see FIG. 4D ).
  • block 403 which has been identified as pixel-line-present is checked for any adjacent blocks and block 405 is found. Therefore, the blocks 403 and 405 are evaluated by use of clustering rule(s) 503 in block merging module 141 B ( FIG. 10 ), and the rules are met in this example, so blocks 403 and 405 are merged by block merging module 141 B to form block 422 in FIG. 4D .
  • blocks 404 and 413 are evaluated by use of clustering rule(s) 503 in block merging module 141 B ( FIG. 10 ), and the rules are met in this example, so blocks 404 and 413 are merged by block merging module 141 B to form block 423 (also called “merged” block) in FIG. 4D .
  • Merged blocks that are generated by block merging module 141 B as described above may themselves be further processed in the manner described above in operation 230 .
  • Specifically, block 421 (also called a “merged” block) is used to identify any adjacent block thereto, and block 422 is found.
  • the block 421 (also called “merged” block) and block 422 are evaluated by use of clustering rule(s) in block merging module 141B, and the rules are met in this example, so block 421 (which is a merged block) and block 422 are merged by block merging module 141B to form block 424 (also called “merged” block) in FIG. 4E.
  • block 423 (also called “merged” block) is used to identify any adjacent block, and therefore block 424 (also a “merged” block) is found.
  • the blocks 423 and 424 (both of which are merged blocks) are evaluated by use of clustering rule(s) in block merging module 141 B, and the rules are met in this example, so blocks 423 and 424 are merged by block merging module 141 B to form block 425 in FIG. 4F .
  • Block 425 (also a merged block) is thereafter processed by operation 230 , and on finding no adjacent blocks, it is then processed by operation 240 (see FIG. 2 ) in the normal manner.
  • One or more predetermined rules (“clustering rules”) 503 are used in some embodiments of operation 230 (described above, also called “merger” operation) by block merging module 141 B to decide whether or not to merge a block that is known to have a pixel line present therein (such as block 403 ) in an image, with one or more blocks adjacent to it (such as block 405 ), by performance of a method illustrated in FIG. 5 .
  • three predetermined rules are applied by block merging module 141 B in operations 510 , 520 and 530 respectively, as described below.
  • Although some embodiments perform operation 510 to apply a first rule (which is formulated based on accent marks or modifiers, called ‘maatras’ in Devanagari script), followed by operation 520 to apply a second rule (which is formulated based on a broken word), followed by operation 530 to apply a third rule (which is formulated based on half letters), other embodiments may perform these operations (or other such operations) in a different order relative to one another, or may omit one or more of these operations or perform additional operations, as will be readily apparent, to decide whether or not to merge blocks.
  • When any one rule is satisfied, in a corresponding one of the operations 510, 520 and 530, then operation 230 (also called the “merger” operation) is performed, regardless of whether or not the blocks have text therein.
  • On completion of operation 230, in some embodiments the merged block is itself marked as pixel-line-present by block merging module 141B, and is therefore eligible for selection as the first block in act 501 (followed by act 502 in which an adjacent block is selected as the second block).
  • operation 230 is performed prior to classification and therefore it is not known to processor 1013 at the time of operations 510 , 520 and 530 , whether the blocks that are being merged have pixels that represent text or non-text in the image.
  • act 541 is performed to check if all blocks adjacent to the pixel-line-present block (i.e. first block) have been checked, and if not control returns to act 502 and another block that is adjacent to the first block is then selected as the second block.
  • In act 541, when all blocks that are adjacent to the pixel-line-present block (i.e. first block) have been checked, control transfers to act 542 to check if all pixel-line-present blocks have been checked in the just-described manner and if not, control transfers to act 501 to select another pixel-line-present block as the first block.
  • When all the pixel-line-present blocks have been checked in act 542, control transfers to operation 240 (also called the “verification” operation).
  • Operation 240 is followed by operation 250 wherein classification of merged blocks (as well as unmerged blocks) as text or non-text, is performed (as described above), which is then followed by optical character recognition.
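  • A simplified sketch of the merge loop (acts 501, 502, 541, 542 and operation 230) follows, reusing Block, merge and adjacent_blocks from the sketches above; rules stands for any list of clustering-rule predicates, and the restart-after-merge control flow is an illustrative simplification, not the application's own flow chart.

      def run_merge_loop(blocks, rules):
          # blocks: list of Block records; rules: callables (first, second) -> bool.
          merged_any = True
          while merged_any:                      # repeat until no clustering rule fires
              merged_any = False
              for first in [b for b in blocks if b.pixel_line_present]:   # act 501
                  for second in adjacent_blocks(first, blocks):           # act 502 / 231
                      if any(rule(first, second) for rule in rules):
                          blocks.remove(first)
                          blocks.remove(second)
                          blocks.append(merge(first, second))  # merged block re-enters loop
                          merged_any = True
                          break
                  if merged_any:
                      break
          return blocks                          # then operation 240 (verification) etc.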
  • Operations 510, 520 and 530 of some embodiments check for overlap between projections (see projection overlap rules 503P in FIG. 10) of a pixel-line-present block and its adjacent block on to straight lines that are perpendicular to each other, e.g. x-axis and y-axis.
  • a projection of a block on to a straight line (also called “support”) can be same as a “span” of the block along the straight line, e.g. when the straight line is at an edge of the block.
  • a block's left edge and the y-axis may be coincident, in which case the vertical projection and the vertical span are identical, or the block's bottom edge and the x-axis may be coincident, in which case the horizontal projection and the horizontal span are identical.
  • processor 1013 determines whether or not the blocks include text or non-text regions of the image. Applying clustering rules 503 to blocks that happen to be adjacent, one of which has a pixel line present, but neither of which has yet been classified as text/non-text, enables processor 1013 to generate merged blocks on which verification is performed, followed by classification and OCR which is found to be more successful than in the prior art, as described below.
  • operation 510 performs a test (also called “first test”) to check for a first predetermined percentage of overlap (e.g. 100% overlap) between horizontal projections 621 H and 622 H (on to a straight line, e.g. the horizontal line 625 H) and performs another test (also called “second test”) to check for a second predetermined percentage of overlap (e.g. 0% overlap) between vertical projections 621 V and 622 V (on to an additional straight line, e.g. the vertical line 625 V) of blocks 621 and 622 .
  • blocks 621 and 622 being used in the just-described tests are selected such that they do not themselves overlap one another, as shown in FIG. 6D .
  • operation 520 checks for a third predetermined percentage of overlap (e.g. 0% overlap) between horizontal projections 621 H and 620 H, and a fourth predetermined percentage of overlap (e.g. 100% overlap) between vertical projections 621 V and 620 V, of blocks 621 and 620 that do not themselves overlap one another, as shown in FIG. 7B . Any two blocks do not overlap one another, when pixels of one block are not present in the other block and vice versa, and such blocks are used in tests of these two examples.
  • processor 1013 is programmed to use such clustering rules 503 that are predetermined, as described more completely below, to select two blocks to merge in block merging module 141B, when the two blocks do not overlap one another, regardless of whether the blocks contain text or non-text. Merger of two or more non-overlapping blocks by block merging module 141B, when the predetermined rules are met as just described, results in a merged block on completion of operation 230 (FIG. 2) which is then subjected to verification in operation 240.
  • operation 510 includes acts 611 - 617 , described next.
  • In act 611 (FIG. 6A), mobile device 200 evaluates for merger with one another: a pixel-line-present (“first”) block (e.g. block 621 in FIG. 6B) and another (“second”) block (e.g. block 622 in FIG. 6C) that is adjacent thereto, which do not overlap one another.
  • processor 1013 is programmed (e.g. to implement the projection overlap rule 503P in FIG. 10) to check if a projection (“horizontal projection”) 621H of block 621 (FIG. 6B) and a corresponding horizontal projection 622H of block 622 satisfy a first test for overlap on a straight line (e.g. the x-axis), as described next.
  • a test for overlap of projections of blocks is satisfied in some examples, when a projection of a block 622 that is adjacent (e.g. horizontal projection 622 H on to horizontal line 625 H, or x-axis in FIG. 6C ) is overlapped partially or wholly by a projection of block 621 that is marked pixel-line-present (e.g. horizontal projection 621 H on the x-axis in FIG. 6B ).
  • a 100% horizontal projection overlap condition is tested by block merging module 141B in one example of act 611 by use of x-coordinates x1 and x2 of the bottom-left and bottom-right corners of block 621 that is marked pixel-line-present (which identify the horizontal projection of block 621), and x-coordinates x3 and x4 of the bottom-left and bottom-right corners of block 622 that is adjacent (which identify the horizontal projection of block 622), by checking whether the following condition is met by the x-coordinates of the corners of the two blocks: x1 ≤ x3 ≤ x4 ≤ x2.
  • the just-described condition on overlap of projections is based on geometric attributes of the two blocks subject to the test, namely two specified coordinates (on a coordinate axis, e.g. x-axis) of two specified corners of one block with two specified coordinates of two specified corners of the other block (on the same coordinate axis).
  • the just-described horizontal projection overlap condition of 100% can be satisfied in some situations (as illustrated in FIG. 6D ) wherein the two blocks are aligned with horizontal line 625 H, but this condition is not satisfied in other situations.
  • an angular offset between the blocks and the horizontal line, such as angle 629 in FIG. 6D may be sufficiently large that the 100% overlap condition (described above) is not satisfied.
  • processor 1013 is programmed to implement the block merging module 141 B by using less stringent conditions in some embodiments of a horizontal projection overlap condition (in projection overlap rule 503 P in FIG. 10 ), as follows.
  • a left-partial horizontal projection overlap condition is tested by block merging module 141B in one example of act 611 when x3 ≤ x1 ≤ x4 ≤ x2, and the ratio (x4 − x1)/(x4 − x3) is greater than a predetermined fraction, e.g. 0.5.
  • a right-partial horizontal projection overlap condition is tested by block merging module 141B in another example of act 611, when x1 ≤ x3 ≤ x2 ≤ x4, and the ratio (x2 − x3)/(x4 − x3) is greater than a predetermined fraction, e.g. also 0.5.
  • the just-described two conditions are also based on geometric attributes of the two blocks, as noted above.
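  • The three horizontal-projection overlap conditions just described (100%, left-partial and right-partial) can be sketched as one predicate; the function name and the ordering of the checks are assumptions, and the 0.5 fraction is the example value above.

      def horizontal_overlap_ok(first, second, fraction=0.5):
          # first: pixel-line-present block (x1, x2 of its horizontal projection);
          # second: adjacent block (x3, x4 of its horizontal projection).
          x1, x2 = first.x1, first.x2
          x3, x4 = second.x1, second.x2
          if x1 <= x3 <= x4 <= x2:                       # 100% overlap condition
              return True
          if x3 <= x1 <= x4 <= x2:                       # left-partial overlap condition
              return (x4 - x1) / (x4 - x3) > fraction
          if x1 <= x3 <= x2 <= x4:                       # right-partial overlap condition
              return (x2 - x3) / (x4 - x3) > fraction
          return False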
  • mobile device 200 checks if a first additional projection (e.g. vertical projection) of the first block (e.g. block 621 in FIG. 6B ) and a second additional projection (e.g. vertical projection) of the second block (e.g. block 622 in FIG. 6C ) satisfy a second test for overlap on an additional straight line (e.g. the y-axis) which is perpendicular to the straight line used in act 611 (e.g. the x-axis).
  • When block merging module 141B finds that such a vertical projection overlap test is met, e.g. using only the y-coordinates of the corners of the two blocks (which identify the vertical projections), control transfers from act 612 to act 613.
  • a 0% vertical projection overlap condition is tested by block merging module 141B in one example of act 612 (see FIG. 6D), assuming the second block is located in the image above the first block, by use of the y-coordinate y3 of the bottom-left corner of block 622 (also called the “second block”, i.e. bottom coordinate y3 of vertical projection 622V) and the y-coordinate y2 of the top-right corner of block 621 (also called the “first block”, i.e. top coordinate y2 of vertical projection 621V), as follows: y2 ≤ y3.
  • a less stringent, partial vertical projection overlap condition is tested in another example of act 612 when y2 > y3, if the ratio (y2 − y3)/(y2 − y1) is less than a predetermined fraction, e.g. 0.1.
  • mobile device 200 may check a similar condition under the assumption that the second block is located in the image below the first block, by block merging module 141B using the bottom-left y-coordinate y1 of block 621 (also called “first” block, FIG. 6B) and the top-left y-coordinate y4 of block 622 (also called “second” block, FIG. 6C), as follows: y4 ≤ y1.
  • a less stringent, partial vertical projection overlap condition is tested by block merging module 141B in another example of act 612 when y4 > y1, if the ratio (y4 − y1)/(y2 − y1) is less than a predetermined fraction, e.g. 0.1.
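  • Similarly, the vertical-projection conditions for act 612 (no overlap, or a small allowed overlap when the second block lies above or below the first) can be sketched as follows; deciding above/below by comparing box centres is a simplification introduced here, and 0.1 is the example fraction above.

      def vertical_separation_ok(first, second, fraction=0.1):
          # y1/y2: bottom/top of the first block's vertical projection;
          # y3/y4: bottom/top of the second block's vertical projection.
          y1, y2 = first.y1, first.y2
          y3, y4 = second.y1, second.y2
          if y2 <= y3 or y4 <= y1:                       # strictly 0% vertical overlap
              return True
          height = y2 - y1
          second_above = (y3 + y4) / 2 > (y1 + y2) / 2   # simplification: compare centres
          if second_above:
              return (y2 - y3) / height < fraction       # small overlap, second block above
          return (y4 - y1) / height < fraction           # small overlap, second block below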
  • Another predetermined test in such a clustering rule may cause block merging module 141B to check, as per act 613, that the aspect ratio of the second block (or adjacent block) 622 lies within a predetermined range, e.g. Thresh1 < Length:Breadth of block < Thresh2, wherein Thresh1 and Thresh2 are constants that are empirically determined.
  • the aspect ratio (x4 − x3)/(y4 − y3) is computed by block merging module 141B and then checked against the limits 0.8 and 1.2.
  • the just-described condition is based on a geometric attribute of the second block, namely an aspect ratio of the block.
  • the act 613 does not use any information on the first block (e.g. block 621 , marked as pixel line present), other than the fact that second block (e.g. block 622 ) is adjacent thereto.
  • Yet another predetermined test in a clustering rule may cause block merging module 141 B to check, as per act 614 , that the height of an adjacent (second) block (“Maatra Height”) is within a certain percentage of the height of the pixel-line-present (first) block (“Word Height”).
  • the just-described condition is again based on geometric attributes of the two blocks subject to the test, in this example it is a comparison of heights of the two blocks (i.e. relative heights).
  • For example, the ratio Maatra Height/Word Height (e.g. the ratio of vertical projection 622V to vertical projection 621V in FIG. 6B) is computed and checked against a predetermined limit.
  • Still another first type of clustering rule such as spacing rule 503 S may cause block merging module 141 B to check, as per act 615 performed by mobile device 200 of some embodiments, that the location of the adjacent (second) block (e.g. block 622 in FIG. 6D or block 632 in FIG. 6E ) is above (or below depending on the rule) the pixel-line-present (first) block (e.g. block 621 in FIG. 6D or block 631 in FIG. 6E respectively) within a predetermined distance, e.g. check for less than 10% vertical projection overlap between the two blocks (in addition to more than 90% horizontal projection overlap).
  • the just-described condition is again based on geometric attributes of the two blocks, namely overlap of projections.
  • block merging module 141B checks whether an additional rule is satisfied by a predetermined geometric attribute (e.g. aspect ratio) of at least one block (e.g. a pixel-line-absent block) relative to another block (e.g. a pixel-line-present block). In response to finding that such rules are satisfied, the two blocks are merged in some embodiments.
  • When a result of act 615 is that the adjacent (second) block is located above the pixel-line-present (first) block, mobile device 200 performs act 617 and else performs act 616.
  • block merging module 141 B in mobile device 200 checks if the distance between the adjacent (second) block and the pixel-line-present (first) block is less than Thresh 5 *Word Height, wherein Thresh 5 is an above-block limit (also called “first predetermined limit”) that is predetermined empirically, and Word Height is the height of the pixel-line-present (first) block (e.g. vertical projection 621 V in FIG. 6B ).
  • the just-described condition is once again based on geometric attributes of the two blocks, namely vertical separation (or gap) above the pixel-line-present block.
  • block merging module 141 B in mobile device 200 checks if the distance between the adjacent (second) block and the pixel-line-present (first) block is less than Thresh 6 *Word Height, wherein Thresh 6 is a below-block limit (also called “second predetermined limit”) that is also predetermined empirically. The just-described condition is once again based on geometric attributes of the two blocks, namely vertical separation (or gap) below the pixel-line-present block. If the answer is yes in either of acts 616 and 617 , control transfers to operation 230 , and otherwise control transfers to operation 520 . In some embodiments, the acts 614 and 615 may further check that the adjacent (second) block is marked as pixel-line-absent.
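  • Putting acts 613-617 together, a hedged sketch of the remaining geometric tests of this first (“maatra”) clustering rule follows; Thresh1/Thresh2 use the 0.8/1.2 limits above, while max_height_ratio, thresh5 and thresh6 are not specified in the text and the values below are assumptions for illustration.

      def accent_mark_rule_ok(first, second,
                              thresh1=0.8, thresh2=1.2,   # aspect-ratio limits (act 613)
                              max_height_ratio=0.5,       # Maatra/Word height limit (act 614), assumed
                              thresh5=0.3, thresh6=0.3):  # above/below gap limits (acts 617/616), assumed
          word_height = first.y2 - first.y1               # "Word Height"
          maatra_height = second.y2 - second.y1           # "Maatra Height"
          aspect = (second.x2 - second.x1) / maatra_height
          if not (thresh1 < aspect < thresh2):            # act 613
              return False
          if maatra_height / word_height > max_height_ratio:   # act 614
              return False
          second_above = (second.y1 + second.y2) / 2 > (first.y1 + first.y2) / 2  # act 615
          if second_above:
              gap = second.y1 - first.y2
              return gap < thresh5 * word_height          # act 617
          gap = first.y1 - second.y2
          return gap < thresh6 * word_height              # act 616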
  • a second clustering rule ( FIG. 7A ) is used in some embodiments of operation 520 , based on an assumption that two pixel-line-present blocks that are located adjacent to one another constitute a word, with a broken header line in Hindi (or base line in Arabic) resulting in two separate connected components for the single word.
  • one test in the second clustering rule such as projection overlap rule 503 P may cause block merging module 141 B to check for 0% horizontal projection overlap and 95% vertical projection overlap between the two pixel-line-present blocks (e.g. see acts 711 and 712 in FIG. 7A ). Therefore, in the example shown in FIG. 7B , the horizontal projections 620 H, 621 H and 623 H may be checked by block merging module 141 B for zero overlap among each other, and the vertical projections 620 V, 621 V and 623 V may be checked for 100% overlap with each other.
  • Another second type of clustering rule may cause block merging module 141 B to check that a height difference between the two pixel-line-present blocks, as a percentage of the height of one of the blocks, is less than 5%, e.g. see act 713 in FIG. 7A .
  • the two conditions in this paragraph are also based on geometric attributes of the two blocks, namely projections and height differences.
  • differences between pairs of vertical projections ( 620 V, 621 V), ( 620 V, 623 V) and ( 621 V, 623 V), which are also called vertical spans, may be computed by block merging module 141 B and checked to see if they are less than 5% of one of the spans used to compute the difference, e.g. 620 V, 620 V and 621 V respectively.
  • Yet another second type of clustering rule may cause block merging module 141 B to check, as per act 714 in FIG. 7A , that the horizontal distance of separation between the two line-present blocks, as a percentage of the length of one of the blocks, is less than 5%.
  • differences between pairs of horizontal projections ( 620 H, 621 H) and ( 621 H, 623 H), which are also called horizontal spans, may be computed by block merging module 141 B and checked to see if they are less than 5% of one of the spans used to compute the difference, e.g. 620 H and 621 H respectively.
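  • A minimal sketch of the second clustering rule, reusing the Block type and overlap_fraction helper from the sketch above; the 0%, 95% and 5% figures follow acts 711-714 as described in this paragraph, while the function name is an assumption.

```python
# Illustrative sketch of the second clustering rule (broken header/base line),
# reusing the Block type and overlap_fraction helper defined in the sketch above.
def second_rule_merge(a: Block, b: Block, rel_tol: float = 0.05) -> bool:
    # acts 711/712: no horizontal projection overlap, and at least 95%
    # vertical projection overlap, between the two pixel-line-present blocks
    h_ov = overlap_fraction(a.x_min, a.x_max, b.x_min, b.x_max)
    v_ov = overlap_fraction(a.y_min, a.y_max, b.y_min, b.y_max)
    if h_ov > 0.0 or v_ov < 0.95:
        return False
    # act 713: height difference, as a percentage of one block's height, under 5%
    if abs(a.height - b.height) / float(a.height) >= rel_tol:
        return False
    # act 714: horizontal separation, as a percentage of one block's length, under 5%
    gap = max(a.x_min, b.x_min) - min(a.x_max, b.x_max)
    return gap < rel_tol * a.width
```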
  • a third clustering rule ( FIG. 8A ) checks if a second block constitutes a half letter of text within the pixel-line-present block. For example, a test in the third clustering rule may cause block merging module 141 B to check for 100% horizontal projection overlap and 100% vertical projection overlap between the adjacent block and the line-present block as per acts 811 and 812 , because a broken letter in the language Hindi is normally embedded within a main word. Therefore, this is an example wherein the second block completely overlaps the first (pixel-line-present) block, although as noted above most tests check that the two blocks do not overlap one another.
  • the coordinates of the four corners of block 822 may be checked by block merging module 141 B relative to the coordinates of the four corners of block 821 (also called “first” block) in acts 811 and 812 , to ensure 100% overlap in both horizontal projections (the projection 822 H fully overlaps the projection 821 H) and vertical projections (the projection 822 V fully overlaps the projection 821 V).
  • one more test in the third clustering rule may cause block merging module 141 B to check as per act 813 that a height difference between the first (pixel-line-present) block (e.g. block 821 ) and the second block (e.g. block 822 ) is within a predetermined limit.
  • another test in the third clustering rule may cause block merging module 141 B to check, as per act 814 , that the aspect ratio (i.e. the ratio Length/Breadth) of the second block is between 0.7 and 0.9 (denoting a half-character of smaller width than a single character) while the aspect ratio of the first (pixel-line-present) block is greater than 2 (denoting multiple characters of greater width than a single character).
  • the ratio of the height of projection 822 V (also called “vertical span”) and the width of projection 822 H (also called “horizontal span”) is checked by block merging module 141 B to be between 0.7 and 0.9, and the ratio of the height of projection 821 V (also called “vertical span”) and width of projection 821 H (also called “horizontal span”) is checked to be greater than 2.
  • Still another test in the third clustering rule may cause block merging module 141 B to check, as per act 815 in FIG. 8A , that the center of the second block and the center of the first (pixel-line-present) block have y-coordinates that differ from each other by less than 5%.
  • the difference between the y-coordinate at center of projection 822 V (also called “vertical span”) and the y-coordinate at center of projection 821 V (also called “vertical span”) may be checked by block merging module 141 B to be within 5% of projection 822 V.
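  • A minimal sketch of the third clustering rule, again reusing Block and overlap_fraction from the earlier sketches; the 100% overlap, the 0.7-0.9 and 2.0 aspect-ratio limits and the 5% center offset follow acts 811-815, whereas the 5% height-difference limit in act 813 is an assumption (the limit is not stated above).

```python
# Illustrative sketch of the third clustering rule (half letters embedded in a
# word), reusing Block and overlap_fraction from the earlier sketches.
def third_rule_merge(word: Block, half: Block, rel_tol: float = 0.05) -> bool:
    # acts 811/812: 100% overlap of both projections (half letter inside the word)
    if overlap_fraction(half.x_min, half.x_max, word.x_min, word.x_max) < 1.0:
        return False
    if overlap_fraction(half.y_min, half.y_max, word.y_min, word.y_max) < 1.0:
        return False
    # act 813: height difference within a predetermined limit (assumed 5% here)
    if abs(word.height - half.height) / float(word.height) >= rel_tol:
        return False
    # act 814: aspect ratio (length/breadth) of the half letter in [0.7, 0.9],
    # aspect ratio of the word block greater than 2
    if not 0.7 <= half.width / float(half.height) <= 0.9:
        return False
    if word.width / float(word.height) <= 2.0:
        return False
    # act 815: y-coordinates of the two block centers differ by less than 5%
    c_word = 0.5 * (word.y_min + word.y_max)
    c_half = 0.5 * (half.y_min + half.y_max)
    return abs(c_word - c_half) < rel_tol * half.height
```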
  • classification of blocks into text or non-text is performed by use of a neural network in operation 230 using parameters 911 - 915 illustrated in FIG. 9 , for a merged block.
  • parameter 911 is the relative location of a line 910 (such as a header line or shiro-rekha in text written in Devanagri script), computed as Hp/H (as discussed above in reference to act 333 of FIG. 3B ).
  • parameter 912 is the relative strength of this line 910 , computed as Np/Nm (as discussed above in reference to act 331 of FIG. 3B ).
  • the number of vertical lines 901 , 902 . . . 905 . . . 909 is counted, e.g. as peaks in a vertical projection, and a ratio of this number to the length L (also called width) of the block is another parameter 913 that is also used in the neural network classifier.
  • An example of another attribute of a merged block is indicative of a mean of a number of transitions in a predetermined direction (e.g. the longitudinal direction), from the second binary value to the first binary value (e.g. from value 1 to value 0), in each row in a set of rows.
  • two numbers are counted, namely white-to-black transitions and black-to-white transitions in a predetermined direction, with each number being another attribute of the merged block.
  • Some embodiments use an attribute of the merged block that is indicative of a ratio of (A) a mean of a number of transitions in a predetermined direction (e.g. horizontal direction) from a first binary value (e.g. value 1 for a black colored pixel) to a second binary value (e.g. value 0 for a white colored pixel), in each row in a set of rows in the merged block and (B) a width of the merged block.
  • two numbers of transitions are counted for a subset of rows in the merged block that are located at specified position(s) relative to a position of a peak in the block (e.g. at which the header line of a word of text in Hindi occurs, if present in a pixel-line-present block), as follows.
  • a peak's position (relative to a vertical span of the block) may be used to identify rows in the block that are located below the peak by at least a predetermined distance (e.g. specified as a percentage of the block height) as belonging to the subset.
  • two types of transitions are averaged (namely a number of transitions from value 0 in binary to value 1 in binary and another number of transitions from value 1 in binary to value 0 in binary), and the resulting means (i.e. averages) are used as parameters 914 and 915 which are input to a neural network classifier, e.g. implemented by a processor executing the classifier software 552 used in operation 250 .
  • a neural network classifier is just one example of the type of classifier that can be programmed to use one or more of parameters 911 - 915 in different aspects of the described embodiments.
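  • A minimal sketch of computing parameters 911-915 for a binarized block (a 2-D array of 0/1 values with the header line expected near the top) is shown below. Counting vertical strokes as local maxima of the column-wise projection, and normalizing the transition means by the block width, are assumptions made for illustration; the sketch is not the patent's implementation of the classifier inputs.

```python
# Illustrative sketch of computing classification parameters 911-915 from a
# binarized block, given as a 2-D numpy array of 0/1 integers (1 = pixel of
# the connected component). Not the patent's implementation.
import numpy as np

def block_parameters(block: np.ndarray):
    h, w = block.shape
    row_counts = block.sum(axis=1)              # horizontal projection profile
    peak_row = int(np.argmax(row_counts))       # row of the candidate header line
    Np = float(row_counts[peak_row])
    Nm = float(row_counts.mean())
    p911 = peak_row / float(h)                  # 911: relative line location Hp/H
    p912 = Np / Nm if Nm > 0 else 0.0           # 912: relative line strength Np/Nm
    # 913: count of vertical strokes, estimated here as local maxima of the
    # column-wise projection (an assumption), normalized by the block length L
    col_counts = block.sum(axis=0)
    n_peaks = int(((col_counts[1:-1] > col_counts[:-2]) &
                   (col_counts[1:-1] >= col_counts[2:])).sum())
    p913 = n_peaks / float(w)
    # 914/915: mean number of 0->1 and 1->0 transitions per row, over the rows
    # located below the header-line peak, normalized by width (an assumption)
    rows_below = block[peak_row + 1:, :]
    if rows_below.shape[0] == 0:
        return p911, p912, p913, 0.0, 0.0
    diffs = np.diff(rows_below.astype(np.int8), axis=1)
    p914 = float((diffs == 1).sum(axis=1).mean()) / w    # white-to-black (0 -> 1)
    p915 = float((diffs == -1).sum(axis=1).mean()) / w   # black-to-white (1 -> 0)
    return p911, p912, p913, p914, p915
```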
  • the operation 220 (for pixel line presence detection) and operation 230 are performed assuming that a longitudinal direction of a connected component of text is well-aligned (e.g. within an angular range of +5° and −5°) relative to the longitudinal direction of the block containing that connected component. Accordingly, in such embodiments, blocks in which the respective connected components are misaligned may not be marked as “pixel-line-present” and therefore not be merged with their adjacent “pixel-line-absent” blocks.
  • skew of one or more connected components relative to blocks that contain them may be identified by performing geometric rectification (e.g. re-sizing blocks), and skew correction (of the type performed in operation 240 ).
  • an operation 270 to detect and correct skew is performed in some embodiments as illustrated in FIG. 12 (after initialization operation 210 ), followed by operation 220 .
  • Operation 270 ( FIG. 12 ) may be based on prompting for and receiving user input on tilt or skew in some embodiments, while other embodiments (described in the next paragraph, below) automatically search coarsely, followed by searching finely within a coarsely determined range of tilt angle.
  • operation 270 to determine skew also identifies presence of a line of pixels, and hence acts 221 - 223 are performed as steps within operation 270 .
  • a specific manner in which skew is corrected in operation 270 can be different in different embodiments, and hence is not a critical aspect of many embodiments of operation 220 .
  • processor 1013 is programmed to select blocks based on variance in stroke width and automatically detect skew of selected blocks as follows.
  • Processor 1013 checks whether at a candidate angle, one or more attributes of projection profiles meet at least one test for presence of a straight line of pixels, e.g. test for presence of straight line 304 ( FIG. 3A ) of pixels in block 302 with a common binary value (e.g. pixels of a connected component).
  • Some embodiments detect a peak of the histogram of block 302 at the candidate angle by comparing a highest value Np in the counters to a mean Nm of all values in the counters, e.g. by checking whether the ratio Np/Nm exceeds a predetermined limit (e.g. a ratio greater than 1.75 indicates a peak).
  • a y-coordinate of the peak is compared with a height of the box Hb to determine whether the peak occurs in an upper 30% (or upper 20% or 40% in alternative embodiments) of the block. If so, the candidate angle is selected for use in a voting process, and a counter associated with the candidate angle is incremented.
  • Processor 1013 repeats the process described in this paragraph with additional blocks of the image, and after a sufficient number of such votes have been counted (e.g. 10 votes), the candidate angle of the counter which has the largest number of votes is used as the skew angle, which is then used to automatically correct skew in each block (among the multiple blocks used in the skew computation).
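  • A minimal sketch of the skew search by voting described above: for each candidate angle, a block's pixels are binned into rows measured perpendicular to that angle, and the angle receives a vote when the resulting profile has a strong peak (e.g. ratio greater than 1.75) located in the upper 30% of the block. The rotation details, helper names and the per-block pixel-list representation are assumptions.

```python
# Illustrative sketch of the coarse skew search by voting; not the patent's code.
import math
from collections import Counter

def vote_for_skew(blocks, candidate_angles_deg, ratio_limit=1.75,
                  upper_fraction=0.3, votes_needed=10):
    """blocks: iterable of pixel-position lists [(x, y), ...], one per block."""
    votes = Counter()
    for pixels in blocks:
        for angle in candidate_angles_deg:
            s = math.sin(math.radians(angle))
            c = math.cos(math.radians(angle))
            # row index of each pixel after rotating the block by the candidate angle
            ys = [int(round(-x * s + y * c)) for (x, y) in pixels]
            lo, hi = min(ys), max(ys)
            n_rows = hi - lo + 1
            counts = Counter(y - lo for y in ys)          # histogram over rows
            peak_row, Np = max(counts.items(), key=lambda kv: kv[1])
            Nm = len(ys) / float(n_rows)                  # mean count per row
            # strong peak (Np/Nm above the limit) located in the upper 30%?
            if Np / Nm > ratio_limit and peak_row < upper_fraction * n_rows:
                votes[angle] += 1
        if sum(votes.values()) >= votes_needed:           # enough votes counted
            break
    # the candidate angle with the largest number of votes is the skew estimate
    return votes.most_common(1)[0][0] if votes else 0.0
```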
  • Classification of the type described herein in operation 250 may be implemented using machine learning methods (e.g. neural networks) as described in a webpage at http://en.wikipedia.org/wiki/Machine_learning.
  • Other methods of classification in operation 240 that can also be used are described in, for example the following, each of which is incorporated by reference herein in its entirety:
  • Mobile device 200 may include a camera 1011 to generate an image 107 (or frames of a video, each of which may be image 107 ) of a scene in the real world.
  • Mobile device 200 may further include sensors 1003 , such as accelerometers, gyroscopes, GPS sensor or the like, which may be used to assist in determining the pose (including position and orientation) of the mobile device 200 relative to a real world scene.
  • mobile device 200 may additionally include a graphics engine 1004 , an image processor 1005 , and a position processor.
  • mobile device 200 may include one or more other types of memory such as flash memory (or SD card) 1008 and/or a hard disk and/or an optical disk (also called “secondary memory”) to store data and/or software for loading into memory 1012 (also called “main memory”) and/or for use by processor(s) 1013 .
  • Mobile device 200 may further include a circuit 1010 (e.g. with wireless transmitter and receiver circuitry therein) and/or any other communication interfaces 1009 .
  • a transmitter in circuit 1010 may be an IR or RF transmitter, or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks such as the Internet, WiFi, cellular wireless network or other network.
  • mobile device 200 may be any portable electronic device such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop, camera, smartphone, tablet (such as iPad available from Apple Inc) or other suitable mobile platform that is capable of creating an augmented reality (AR) environment.
  • input to mobile device 200 can be in video mode, where each frame in the video is equivalent to the image input which is used to identify connected components, and to compute a skew metric as described herein. Also, the image used to compute a skew metric as described herein can be fetched from a pre-stored file in a memory 1012 of mobile device 200 .
  • a mobile device 200 of the type described above may include an optical character recognition (OCR) system as well as software that uses “computer vision” techniques.
  • the mobile device 200 may further include, in a user interface, a microphone and a speaker (not labeled) in addition to touch-sensitive screen 1001 or normal screen 1002 for displaying captured images and any text/graphics to augment the images.
  • mobile device 200 may include other elements unrelated to the present disclosure, such as a read-only-memory 1007 which may be used to store firmware for use by processor 1013 .
  • Mobile device 200 of some embodiments includes, in memory 1012 ( FIG. 10 ) computer instructions in the form of software 141 that is used to process an image 107 of a scene of the real world, as follows. Specifically, in such embodiments, a region identifier 141 R ( FIG. 10 ) is coupled to the locations in memory 1012 wherein image 107 is stored. Region identifier 141 R ( FIG. 10 ) is implemented in these embodiments by processor 1013 executing computer instructions to implement any method of identifying MSERs, thereby to generate a set of blocks 302 in memory 1012 .
  • a pixel line presence tester 141 T ( FIG. 10 ) is implemented in several embodiments by processor 1013 executing computer instructions to use any test (e.g. by selecting the test based on user input) to check whether each block, in the set of blocks 302 satisfies the test.
  • a test may be selected by identification of a script of a specific language, designed to identify presence of a line of pixels in each block.
  • Pixel line presence tester 141 T of some embodiments includes a binarization module (not shown) and a histogram generator (also not shown), for use in generating a profile of the number of pixels having a common binary value (relative to one another) and located along each row (or each column), depending on the language identified by user input 141 U.
  • a pixel line presence marker 141 M ( FIG. 10 ) is implemented in several embodiments by processor 1013 executing computer instructions to receive a result from pixel line presence tester 141 T and respond by storing in memory 1012 , e.g. a list 1501 of identifiers of blocks that are marked as “line-present” blocks. Blocks not identified in the list 1501 are treated, in some embodiments of software 141 , as line-absent blocks.
  • an adjacent block identifier 141 A ( FIG. 10 ) is implemented in several embodiments by processor 1013 executing computer instructions to use a block that is marked in list 1501 of identifiers as being line-present, to identify from among the set of blocks 302 , one or more blocks that are located adjacent to the line-present block e.g. as another list 1502 of identifiers of adjacent blocks.
  • processor 1013 on execution of software 141 implements a block merging module 141 B that uses the lists 1501 and 1502 to merge two blocks, and that then supplies a merged block to storage module 141 S.
  • block merging module 141 B implements the clustering rules 503 , including projection overlap rules 503 P, relative heights rules 503 R, aspect ratio rules 503 A and spacing rules 503 S.
  • Storage module 141 S is implemented by execution of software 141 by processor 1013 to store the merged block in memory 1012 , e.g. as a list of positions 504 that identify four corners of each merged block.
  • software 141 may include a classifier software 552 that when executed by processor 1013 classifies unmerged blocks and/or merged blocks as text or non-text (after binarization based on pixel values in image 107 to identify connected components therein), and any block classified as text is supplied to OCR software 551 .
  • Although mobile device 200 shown in FIG. 2 of some embodiments is a hand-held device, other embodiments are implemented by use of one or more parts that are stationary relative to a real world scene whose image is being captured by camera 1011 .
  • When a limit on time spent in processing an image as per the method of FIG. 3B is exceeded, processor 1013 exits the method. On exiting in this manner, processor 1013 may then rotate the image through an angle (automatically, or based on user input, or a combination thereof), and then re-initiate performance of the method illustrated in FIG. 3B .
  • processor 1013 may check for presence of a line of pixels oriented differently (e.g. located in a column in the block) depending on the characteristics of the language of text that may be included in the image.
  • methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in firmware in read-only-memory 1007 ( FIG. 10 ) or software, or hardware or any combination thereof.
  • the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
  • the methodologies may be implemented with modules (e.g
  • Although mobile device 200 shown in FIG. 10 of some embodiments is a hand-held device, other embodiments are implemented by use of form factors that are different, e.g. in certain other embodiments the mobile device 200 is a mobile platform (such as a tablet) while still other embodiments are implemented by use of any electronic device or system.
  • Illustrative embodiments of such an electronic device or system may include multiple physical parts that intercommunicate wirelessly, such as a processor and a memory that are portions of a stationary computer, such as a lap-top computer, a desk-top computer, or a server computer communicating over one or more wireless link(s) with sensors and user input circuitry enclosed in a housing that is small enough to be held in a hand.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

An electronic device and method may capture an image of an environment, followed by identification of blocks of connected components in the image. A test for overlap of spans may be made, between a span of a block selected (e.g. for having a line of pixels) and another span of an adjacent block located above, or below, or to the left, or to the right of the selected block and when satisfied, these two blocks are merged. Blocks may additionally be tested, e.g., for relative heights of the two blocks, and/or aspect ratio of either or both blocks, etc. Classification of a merged block as text or non-text may use attributes of the merged block, such as location of a horizontal pixel line, number of vertical pixel lines, and number of black-white transitions and number of white-black transitions in a subset of rows located below the horizontal pixel line.

Description

    CROSS-REFERENCE TO PROVISIONAL APPLICATIONS
  • This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/590,966 filed on Jan. 26, 2012 and entitled “Identifying Regions Of Text To Merge In A Natural Image or Video Frame”, which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
  • This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/590,983 filed on Jan. 26, 2012 and entitled “Detecting and Correcting Skew In Regions Of Text In Natural Images”, which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
  • This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/590,973 filed on Jan. 26, 2012 and entitled “Rules For Merging Blocks Of Connected Components In Natural Images”, which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
  • This application claims priority under 35 USC §119 (e) from U.S. Provisional Application No. 61/673,703 filed on Jul. 19, 2012 and entitled “Automatic Correction of Skew In Natural Images and Video”, which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
  • CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is also related to U.S. application Ser. No. 13/748,562, Attorney Docket No. Q112726USos, filed concurrently herewith, entitled “Detecting and Correcting Skew In Regions Of Text In Natural Images” which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
  • This application is also related to U.S. application Ser. No. 13/748,539, Attorney Docket No. Q111559USos, filed concurrently herewith, entitled “Identifying Regions of Text to Merge In A Natural Image or Video Frame” which is assigned to the assignee hereof and which is incorporated herein by reference in its entirety.
  • FIELD
  • This patent application relates to devices and methods for applying rules (called “clustering rules”) to check whether or not blocks of one or more regions in an image should be merged, prior to classification of the blocks as text or non-text.
  • BACKGROUND
  • Identification of text regions in documents that are scanned (e.g. by an optical scanner of a printer or copier) is significantly easier than detecting text regions in images generated by a handheld camera, of scenes in the real world (also called “natural images”). Specifically, optical character recognition (OCR) methods of the prior art originate in the field of document processing, wherein the document image contains a series of lines of text (e.g. 20 lines of text) of a scanned page in a document. Document processing techniques, although successfully used on scanned documents created by optical scanners, generate too many false positives and/or negatives so as to be impractical when used on natural images. Hence, detection of text regions in a real world image generated by a handheld camera is performed using different techniques. For additional information on techniques used in the prior art, to identify text regions in natural images, see the following articles that are incorporated by reference herein in their entirety as background:
    • (a) H. Li et al., “Automatic text detection and tracking in digital video,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 147-156, 2000;
    • (b) X. Chen and A. Yuille, “Detecting and reading text in natural scenes,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), 2004, pp. 1-8;
    • (c) S. W. Lee et al., “A new methodology for gray-scale character segmentation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, pp. 1045-1050, October 1996;
    • (d) B. Epshtein et al., “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition (CVPR) 2010, pp. 2963-2970; and
    • (e) A. Jain and B. Yu, “Automatic text location in images and video frames,” Pattern Recognition, vol. 31, no. 12, pp. 2055-2076, 1998.
  • Image processing techniques of the prior art described above appear to be developed primarily to identify regions in images that contain text which is written in the language English. Use of such techniques to identify in natural images, regions of text in other languages that use different scripts for letters of their alphabets can result in false positives and/or negatives so as to render the techniques impractical.
  • FIG. 1A illustrates a newspaper 100 in the real world in India. A user 110 (see FIG. 1B) may use a camera-equipped mobile device (such as a cellular phone) 108 to capture an image 107 of newspaper 100. Camera captured image 107 may be displayed on a screen 106 of mobile device 108. Such an image 107 (FIG. 1C), if processed directly using prior art image processing techniques may result in failure to classify one or more regions 103 as text (see FIG. 1A). Specifically, text-containing regions of a camera-captured image may be classified as non-text and vice versa e.g. due to variations in lighting, color, tilt, focus, etc.
  • Additionally, presence in natural images, of text written in non-English languages, such as Hindi can result in false positives/negatives, when technique(s) developed primarily for identifying text in the English language are used in classification of regions as text/non-text. For example, although blocks in regions that contain text in the English language may be correctly classified to be text (e.g. by a neural network), one or more blocks 103A, 103B, 103C and 103D (FIG. 1C) in a region 103 contain text in Hindi that may be mis-classified as non-text (e.g. even when the neural network has been trained with text in Hindi).
  • One or more prior art criteria that are used by a classifier to identify text in natural images can be relaxed, so that blocks 103A-103D are then classified as text, but on doing so one or more portions of another region 105 (FIG. 1C) may coincidentally satisfy the relaxed criteria, and blocks in region 105 may be then mis-classified as text although these blocks contain graphics (e.g., pictures of cars in FIG. 1B).
  • Moreover, when a natural image 107 (FIG. 1C) is processed by a prior art method to form rectangular blocks, certain portions of text may be omitted from a rectangular block that is classified as text. For example, pixels in such text portions may be separated from (i.e. not contiguous with) pixels that form the remainder of text in the rectangular block, due to pixels at a boundary of the rectangular block not satisfying a prior art test used to form the rectangular block. Such omission of pixels of a portion of text, from a rectangular block adjacent to the portion is illustrated in FIG. 1C at least twice. See pixels of text to the left of block 103B, and see pixels of text to the left of block 103C (in FIG. 1C). Omission of text portions from rectangular blocks of a natural image can result in errors, when such incomplete blocks are further processed after classification, e.g. by an optical character recognition (OCR) system.
  • Accordingly, there is a need to improve the identification of regions of text in a natural image or video frame, as described below.
  • SUMMARY
  • In several aspects of described embodiments, an electronic device and method use a camera to capture an image of an environment outside the electronic device followed by identification of blocks that enclose regions of pixels in the image, with each region being initially included in a single block. Depending on the embodiment, each region may be identified to have pixels contiguous with one another and including a local extrema (maxima or minima) of intensity in the image, e.g. a maximally stable extremal region (MSER). In some embodiments, each block that contains such a region (which may constitute a “connected component”) is tested for presence of a line of pixels binarizable to a common value (“pixel-line-present” block), followed by identification of one or more blocks adjacent thereto which are then tested for merger as follows.
  • One or more processors of several embodiments execute computer instructions (also called “first instructions”) to test for overlap of projections, between a projection of a pixel-line-present block on to a line (e.g. x-axis) and another projection of an adjacent block on to the same line (e.g. x-axis also). When one or more such test(s) for overlap on a line of projections (also called “supports” or “spans”) of blocks is/are satisfied, these two blocks are automatically merged with one another by one or more processors executing computer instructions (also called “second instructions”), although at the time of merger it is not known whether the blocks being merged contain text or non-text. An additional test that may be performed prior to merger of two blocks may be based on, for example, relative heights of the two blocks, and/or aspect ratio of either or both blocks, etc. Information on a merged block that is obtained as a result of merging of two or more blocks is stored in memory by one or more processors executing computer instructions (also called “third instructions”). The merged block is then processed further in certain embodiments, e.g. subject to verification of presence of a pixel line, and followed by classification of the merged block as text or non-text.
  • Depending on the embodiment, classification of a merged block (with multiple connected components therein) as text or non-text may use one or more predetermined attributes of the merged block, such as location and thickness of a line of pixels binarizable to a common binary value and oriented longitudinally in the merged block (e.g. parallel to or within a small angle of, whichever side of the block is longer). The just-described classification of the merged block may additionally or alternatively use: a number of lines of pixels oriented laterally in the merged block (e.g. vertically), and a number of black-white transitions (and number of white-black transitions) in a subset of rows located below the line of pixels, when the line of pixels is located in an upper portion of the merged block (e.g. within 30% or 40% of the merged block's height, measured from a top side of the merged block).
  • In some embodiments, one or more of: identification of blocks, testing for overlap of projections on to a common line, merger of blocks that satisfy tests, followed by text/non-text classification as described above are performed by one or more processor(s) operatively coupled to memory and configured to execute computer instructions stored in the memory (or in another non-transitory computer readable storage media). Moreover, in some embodiments, one or more non-transitory storage media include a plurality of computer instructions, which when executed, cause one or more processors in a handheld device to perform operations, and these computer instructions include computer instructions to perform one or more of: identification of blocks, testing for overlap of projections on to a common line, merger of blocks that satisfy tests, followed by text/non-text classification described above.
  • In certain embodiments, one or more acts of the type described above are performed by a mobile device (such as a smart phone) that includes a camera, a memory operatively connected to the camera to receive images therefrom, and at least one processor operatively coupled to the memory to execute computer instructions stored in the memory (or in another non-transitory computer readable storage media). On execution of the computer instructions, the processor processes an image to check two blocks that are adjacent to one another in the image for satisfying one or more predetermined rules (e.g. based on geometric attributes of the blocks), and on finding the rule(s) to be satisfied merging the two blocks to generate a merged block, subsequently classifying the merged block as text or non-text, followed by OCR of blocks that are classified as text (in the normal manner). In some embodiments, an apparatus includes several means implemented by logic in hardware or logic in software or a combination thereof, to perform one or more acts described above.
  • It is to be understood that several other aspects will become readily apparent to those skilled in the art from the description herein, wherein it is shown and described various embodiments by way of illustration. The drawings and detailed description below are to be regarded as illustrative in nature and not as restrictive.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1A illustrates a newspaper of the prior art, in the real world in India.
  • FIG. 1B illustrates a user using a camera-equipped mobile device of the prior art to capture an image of a newspaper in the real world.
  • FIG. 1C illustrates blocks formed by identifying connected components in a portion of the image of FIG. 1B by use of a prior art method.
  • FIG. 2 illustrates, in a high-level flow chart, various acts performed by a mobile device in a method of identifying regions to merge in some aspects of the described embodiments.
  • FIG. 3A illustrates a memory of a mobile device during application of a predetermined test to detect pixel line presence, in illustrative aspects of the described embodiments.
  • FIG. 3B illustrates, in an intermediate-level flow chart, various acts performed by a mobile device to implement a predetermined test to detect pixel line presence, in some aspects of the described embodiments.
  • FIG. 3C illustrates another projection profile of English text in prior art.
  • FIG. 4A illustrates an example of text in a prior art image.
  • FIGS. 4B-4F illustrate formation of a block by use of the method of FIG. 2 in illustrative aspects of the described embodiments.
  • FIG. 5 illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to identify blocks that can be merged as per operation 230 in FIG. 2, by application of three sets of rules.
  • FIG. 6A illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to apply a set of rules, to identify blocks to be merged based on certain attributes of modifiers or accent marks.
  • FIGS. 6B, 6C, and 6D illustrate examples of text wherein blocks 621 and 622 are determined to be merged by applying the set of rules as per the method of FIG. 6A.
  • FIG. 6E illustrates another example of text wherein blocks 631 and 632 are determined to be merged by applying the set of rules as per the method of FIG. 6A.
  • FIG. 7A illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to apply a set of rules, to identify blocks to be merged based on certain attributes of broken words.
  • FIG. 7B illustrates the example of text of FIG. 6B wherein blocks 620, 621 and 623 are determined to be merged by applying the second set of tests as per the method of FIG. 7A.
  • FIG. 8A illustrates, in a high-level flow chart, various acts performed by a mobile device in some aspects of the described embodiments, to apply a set of rules, to identify blocks to be merged based on certain attributes of half letters.
  • FIG. 8B illustrates another example of text wherein blocks 821 and 822 are determined to be merged by applying the third set of tests as per the method of FIG. 8A.
  • FIG. 9 illustrates parameters 911-915 computed for use in a neural network that performs classification of blocks into text or non-text, as per operation 240 (FIG. 2), performed by a mobile device in some aspects of the described embodiments.
  • FIG. 10 illustrates, in a block diagram, a mobile device including processor and memory of the type described above, in some aspects of the described embodiments.
  • FIG. 11 illustrates, in a block diagram, computer instructions in a memory 1012 of the described embodiments, to perform several of the operations illustrated in FIG. 2.
  • FIG. 12 illustrates, in a high-level flow chart, various acts performed by a mobile device in an alternative method of identifying regions to merge in some aspects of the described embodiments.
  • DETAILED DESCRIPTION
  • A number of regions of an image of a real world scene (e.g. an image 107 of a newspaper 100 in FIG. 1B) are initially identified in several aspects of the described embodiments, in the normal manner. Hence, a mobile device (e.g. a smartphone or a tablet which can be held in a human hand) 200 in certain described embodiments may use a camera 1011 (FIG. 10) included therein, to capture an image of an environment outside the mobile device 200, such as a scene of real world. Mobile device 200 of some embodiments includes one or more processors 1013 (FIG. 10) programmed with software 141 (also called “merger” software), classifier software 552, and OCR software 551 (all of which are stored in a memory 1012, which may be any non-transitory memory that is computer readable). OCR software 551 is used to eventually recognize text in one or more image(s) 107 generated by camera 1011, e.g. by performing Optical Character Recognition (“OCR”). Depending on the embodiment, camera 1011 may be a digital camera that captures still images (also called “snapshots”), or a video camera that generates a video stream of images at a known rate, e.g. 30 frames/second.
  • Merger software 141 of some embodiments, when executed by one or more processors, identifies blocks of regions in an image (in memory) that can be merged with one another, as described in U.S. application Ser. No. 13/748,539, Attorney Docket No. Q111559Usos, filed concurrently herewith, entitled “Identifying Regions of Text to Merge In A Natural Image or Video Frame” which is incorporated herein by reference in its entirety, above. Blocks that are identified as candidates for merger are thereafter subject to certain predetermined rules (also called clustering rules) as described below, and when these rules are found to be satisfied the blocks are merged, even though it is not known whether the blocks are text or non-text.
  • Specifically, an image 107 (e.g. a hand-held camera captured image) received by a processor 1013 of mobile device 200 in certain described embodiments, as per act 211 in FIG. 2, is a snapshot (in a set of snapshots generated by a digital camera) or a video frame (in a stream of video frames generated by a video camera) or any image stored in memory and retrieved therefrom. In many embodiments, image 107 is not generated by an optical scanner of a copier or printer, and instead image 107 is generated by a hand-held camera, as noted above. In alternative embodiments, image 107 is generated by an optical scanner of a copier or printer, from printed paper. Although processor 1013, which performs one or more acts shown in FIG. 2, is included in mobile device 200 of some embodiments, in other embodiments the processor 1013 that is programmed to perform one or more acts described herein is external to mobile device 200, e.g. included in a server to which mobile device 200 is operatively coupled by a wireless link.
  • After receipt of image 107, processor 1013 in described embodiments identifies, as per act 212 in FIG. 2, a set of regions (also called “blobs”) in image 107 with boundaries that differ, in a predetermined manner (as specified in a parameter input to the method), from surrounding pixels in one or more properties, such as intensity and/or color. Some methods that may be used in act 212 first identify a pixel of local minima or maxima (also called “extrema”) of a property (such as intensity) in the image, followed by identifying neighboring pixels that are contiguous with one another and with the identified extrema pixel, within a range of values of the property that is obtained in a predetermined manner, so as to identify in act 212 an MSER region.
  • Specifically, MSERs that are identified in act 212 of some embodiments are regions that are geometrically contiguous (with any one pixel in the region being reachable from any other pixel in the region by traversal of one or more pixels that contact one another in the region) with monotonic transformation in property values, and invariant to affine transformations (transformations that preserve straight lines and ratios of distances between points on the straight lines). Boundaries of MSERs may be used as connected components in some embodiments described herein, to identify regions of an image, as candidates for recognition as text.
  • In several of the described embodiments, regions in image 107 are automatically identified in act 212 based on variation in intensities of pixels by use of a method of the type described by Matas et al., e.g. in an article entitled “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions” Proc. Of British Machine Vision Conference, pages 384-396, published 2002 that is incorporated by reference herein in its entirety. The time taken to identify MSERs in an image can be reduced by use of a method of the type described by Nister, et al., “Linear Time Maximally Stable Extremal Regions”, ECCV, 2008, Part II, LNCS 5303, pp 183-196, published by Springer-Verlag Berlin Heidelberg that is also incorporated by reference herein in its entirety. Another such method is described in, for example, an article entitled “Robust Text Detection In Natural Images With Edge-Enhanced Maximally Stable Extremal Regions” by Chen et al, IEEE International Conference on Image Processing (ICIP), September 2011 that is incorporated by reference herein in its entirety.
  • The current inventors note that prior art methods of the type described by Chen et al. or by Matas et al. or by Nister et al. identify hundreds of MSERs, and sometimes identify thousands of MSERs in an image 107 that includes details of natural features, such as leaves of a tree or leaves of plants, shrubs, and bushes. Hence, use of MSER methods of the type described above result in identification of regions whose number depends on the content within the image 107. Moreover, a specific manner in which pixels of a region differ from surrounding pixels at the boundary identified by such an MSER method may be predetermined in some embodiments by use of a lookup table in memory. Such a lookup table may supply one or more specific combinations of values for the parameters Δ and Max Variation, which are input to an MSER method (also called MSER input parameters). Such a lookup table may be populated ahead of time, with specific values for Δ and Max Variation, e.g. determined by experimentation to generate contours that are appropriate for recognition of text in a natural image, such as value 8 for Δ and value 0.07 for Max Variation.
  • In some embodiments, pixels are identified in a set (which may be implemented in a list) that in turn identifies a region Qi which includes a local extrema of intensity (such as local maxima or local minima) in the image 107. Such a region Qi may be identified in act 212 (FIG. 2) as being maximally stable relative to one or more intensities in a range i−Δ to i+Δ (depending on the embodiment, including the above-described intensity i), each intensity i being used as a threshold (with Δ being a parameter input to an MSER method) in comparisons with intensities of a plurality of pixels included in region Qi to identify respective regions Qi−Δ and Qi+Δ. In some embodiments, a number of pixels in the region Qi remains within a predetermined (e.g. user specified) range relative to changes in intensity i across a range i−Δ to i+Δ, with a local minimum in a ratio [Qi−Δ−Qi+Δ]/Qi occurring at the intensity i. Therefore, the just-described set of positions in certain embodiments is indicative of (or identifies) a region Qi that constitutes an MSER (i.e. a maximally stable extremal region).
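  • For reference, the stability criterion described above may be restated compactly, with |Q| denoting the number of pixels in a region Q: region Qi is maximally stable when the following ratio attains a local minimum at intensity i (this is merely a restatement of the ratio given above, not an addition to it).

$$ q(i) \;=\; \frac{\lvert Q_{i-\Delta}\rvert \;-\; \lvert Q_{i+\Delta}\rvert}{\lvert Q_{i}\rvert} $$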
  • Other methods that can be used to identify such regions in act 212 may be similar or identical to methods for identification of connected components, e.g. as described in an article entitled “Application of Floyd-Warshall Labelling Technique: Identification of Connected Pixel Components In Binary Image” by Hyunkyung Shin and Joong Sang Shin published in Kangweon-Kyungki Math. Jour. 14 (2006), No. 1, pp. 47-55 that is incorporated by reference herein in its entirety, or as described in an article entitled “Fast Connected Component Labeling Algorithm Using A Divide and Conquer Technique” by Jung-Me Park, Carl G. Looney and Hui-Chuan Chen believed to be published in Matrix (2000), Volume: 4, Issue: 1, Publisher: Elsevier LTD, pp 4-7 that is also incorporated by reference herein in its entirety.
  • A specific manner in which regions of an image 107 are identified in act 212 by mobile device 200 in described embodiments can be different, depending on the embodiment. Each region of image 107 that is identified by use of an MSER method of the type described above is represented by act 212 in the form of a list of pixels, with two coordinates for each pixel, namely the x-coordinate and the y-coordinate in two dimensional space (of the image).
  • After identification of regions, each region is initially included in a single rectangular block which may be automatically identified by mobile device 200 of some embodiments in act 212, e.g. as a minimum bounding rectangle of a region, by identification of a largest x-coordinate, a largest y-coordinate, a smallest x-coordinate and a smallest y-coordinate of all pixels within the region. The just-described four coordinates may be used in act 212, or subsequently when needed, to identify the corners of a rectangular block that tightly fits the region. As discussed below, such a block (and therefore its four corners) may be used in checking whether a predetermined rule is satisfied, e.g. by one or more geometric attributes of the block relative to an adjacent block (such as overlap of projection (“support”) on a common line). Also, a block's four sides may need to be identified, in order to identify all pixels in the block and their binarizable values, followed by generation of a profile of counts of pixels of a common binary value. When needed, four corners of a rectangular block that includes a region may be identified, e.g. as follows:
    • (largest x-coordinate, largest y-coordinate),
    • (largest x-coordinate, smallest y-coordinate),
    • (smallest x-coordinate, largest y-coordinate) and
    • (smallest x-coordinate, smallest y-coordinate).
      The above-described acts 211 and 212 are performed, in several embodiments, as part of an initialization operation 210 (FIG. 2) in a manner similar or identical to corresponding operations of the prior art, for example as described above in reference to FIGS. 1A-1C. Accordingly, each block (also called “unmerged block” or “initially identified block”) that is identified at the end of act 212 of some embodiments contains a single region (which may constitute a “connected component”), such as an MSER.
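  • A minimal sketch of the corner computation enumerated above, given a region's list of pixel positions; the helper name bounding_corners is an assumption.

```python
# Illustrative sketch of identifying the four corners of a rectangular block
# that tightly fits a region, given the region's list of pixel positions (x, y).
def bounding_corners(pixels):
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return [
        (max(xs), max(ys)),   # (largest x-coordinate, largest y-coordinate)
        (max(xs), min(ys)),   # (largest x-coordinate, smallest y-coordinate)
        (min(xs), max(ys)),   # (smallest x-coordinate, largest y-coordinate)
        (min(xs), min(ys)),   # (smallest x-coordinate, smallest y-coordinate)
    ]
```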
  • After a set of blocks are identified in act 212, such as block 302 in FIG. 3A, processor 1013 in mobile device 200 of some described embodiments determines in an operation 220 (see FIG. 2), whether or not block 302 has a peak in the number of pixels 303I-303J (FIG. 3A) that can be binarized (i.e. binarizable) to a common value (e.g. to a value 1 or value 0 in a binarized version of block 302), located along a straight line 304 (FIG. 3A) through block 302, which straight line satisfies a test which includes a predetermined condition (e.g. peak is located in the top ⅓rd of the block). In several embodiments, operation 220 is performed in a deterministic manner, i.e. without learning (and without use of a neural network).
  • Processor 1013 of certain embodiments is programmed, in any deterministic manner that will be apparent to the skilled artisan in view of this detailed description, to determine occurrence of pixels that are binarizable to a value 1 (or alternatively to value 0) along a straight line defined by the equation y=mx+c and that satisfy a specific test that is predetermined. For example, some embodiments of processor 1013 may be programmed to simply enter the x,y coordinates of all pixels of a block 302 into such an equation, to determine for how many pixels in the block (that are binarizable to value 1) such an equation is satisfied (e.g. within preset limits). For example, to check if there is a straight line that is oriented parallel to the x-axis in block 302, processor 1013 may be programmed to set the slope m=0, then check if there are any pixels in block 302 at a common y co-ordinate (with a value of the constant c in the above-identified equation), which can be binarized to the value 1 (and then repeat for the value 0). In this manner, processor 1013 may be programmed to use a series of values (e.g. integer values) of constant “c” in the equation, to check for presence of lines parallel to the x-axis, at different values of y-coordinates of pixels in block 302.
  • During operation 220 (of pixel-line-presence detection), processor 1013 of some embodiments performs at least three acts 221-223 as follows. Specifically, in act 221 of several embodiments, processor 1013 is programmed to perform an initial binarization of pixels of a block 302 which is fitted around a region (e.g. the region shown in FIG. 3A ) identified by act 212 (described above). The initial binarizing in act 221 is performed individually, on each unmerged block, as a part of the operation 220 (for pixel-line-presence detection) on the block (e.g. by assigning one of two binary values to each pixel). In some embodiments, all pixels identified as constituting a region, which is represented by the above-described list (e.g. generated by an MSER method, with two coordinates for each pixel in the region) are assigned the value 1 (in binary), and all remaining pixels in a block of this region are assigned the value 0 (in binary). Hence, in some embodiments, all pixels within the block but outside the region are assigned the value 0. The just-described binary values may be switched in other embodiments (e.g. pixels of a region may be assigned the value 0 and pixels in a block that are outside the region assigned the value 1). The binary values assigned in act 221 to pixels in a block are used within operation 220, and these values can be overwritten later, if binarization is performed again after merger (during verification in operation 240, described below).
  • Next, in act 222, processor 1013 is programmed to test for the presence or absence of a straight line passing through positions of the just-described binary-valued pixels (resulting from act 221) in the block. For example, in act 222 of some embodiments, processor 1013 checks whether a test (“pixel-line-presence test” or simply “line-presence test”) is satisfied, for detecting the presence of a line segment 305 (FIG. 3A) formed by pixels 303I . . . 303J of the value 1 (which is a common binary value of all these pixels), within block 302 (FIG. 3A). In several such embodiments, operation 220 checks for presence of pixels 303I . . . 303J (FIG. 3A) of a common binary value (or a common range of grey-scale values) occurring along a straight line 304 that is oriented longitudinally relative to block 302 (e.g. parallel to or within a small angle of, whichever side of block 302 is longer). Such a straight line 304 may be formed in block 302 by, for example, a number of pixels 303A . . . 303N which include several black colored pixels (with intensity of value 1 in binary) that are located in a single row (or in alternative embodiments, located in a group of adjacent rows).
  • Along the straight line 304 shown in FIG. 3A, not all pixels are binarizable to value 1 (representing black color) and instead certain pixels such as pixel 303I and 303J (also called “connected component pixels”) are binarizable to value 1 while pixels 303A and 303N (also called “other pixels”) are binarizable to value 0. Block 302 is automatically marked as having a pixel line present along straight line 304 by a test in act 222 of some embodiments that compares the number of black pixels occurring along straight line 304 to the number of black pixels occurring along other lines passing through block 302. In many embodiments, act 222 compares the number of pixels along multiple lines parallel to one another in a longitudinal direction of block 302, for example as discussed below.
  • In one example, act 222 determines that a pixel line is present in block 302 along straight line 304 when straight line 304 is found to have the maximum number of black pixels (relative to all the lines tested in block 302). In another example, act 222 further checks that the maximum number of black pixels along straight line 304 is larger than a mean of black pixels along the lines being tested by a predetermined amount and if so then block 302 is determined to have a pixel line present therein. The same test or a similar test may be alternatively performed with white pixels in some embodiments of act 222. Moreover, in some embodiments of act 212, the same test or a similar test may be performed on two regions of an image, namely the regions called MSER+ and MSER−, generated by an MSER method (with intensities inverted relative to one another).
  • In some embodiments, block 302 is subdivided into rows oriented parallel to the longitudinal direction of block 302. Some embodiments of act 222 prepare a histogram of counters, based on pixels identified in a list of positions indicative of a region, with one counter being used for each unit of distance (“bin” or “row”) along a height (in a second direction, which is perpendicular to a first direction (e.g. the longitudinal direction)) of block 302. In some embodiments, block 302 is oriented with its longest side along the x-axis, and act 222 is performed by sorting pixels by their y-coordinates followed by binning (e.g. counting the number of pixels) at each intercept on the y-axis (which forms a bin), followed by identifying a counter which has the largest value among counters. Therefore, the identified counter identifies a peak in the histogram, which is followed by checking whether a relative location of the peak (along the y-axis) happens to be within a predetermined range, e.g. top ⅓rd of block height, and if so the pixel-line-presence test is met. So, a result of act 222 in the just-described example is that a pixel line (of black pixels) has been found to be present in block 302.
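  • A minimal sketch of the histogram-based pixel-line-presence test of act 222, for a block oriented with its longest side along the x-axis: pixels of the region are binned by row, the row with the largest count is taken as the peak, and the test is met when the peak is pronounced and lies in the top ⅓rd of the block. The helper name is an assumption, and the 1.75 peak-strength limit is borrowed from the skew-detection discussion above rather than stated for act 222.

```python
# Illustrative sketch of the row-histogram form of the pixel-line-presence
# test (act 222); helper name and the 1.75 peak-strength value are assumptions.
from collections import Counter

def pixel_line_present(region_pixels, block_y_min, block_height,
                       top_fraction=1.0 / 3.0, strength_limit=1.75):
    """region_pixels: list of (x, y) positions of the region within the block."""
    counts = Counter(y - block_y_min for _, y in region_pixels)  # one bin per row
    peak_row, peak_count = max(counts.items(), key=lambda kv: kv[1])
    mean_count = sum(counts.values()) / float(len(counts))
    if peak_count <= strength_limit * mean_count:   # peak not pronounced enough
        return False
    # pixel-line-presence test: the peak must lie within the top 1/3rd of the block
    return peak_row < top_fraction * block_height
```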
  • In several aspects of the described embodiments, processor 1013 is programmed to perform an act 223 to mark in a storage element 381 of memory 1012 (by setting a flag), based on a result of act 222, e.g. that block 302 has a line of pixels present therein (or has no pixel line present, depending on the result). Instead of setting the flag in storage element 381, block 302 may be identified in some embodiments as having a pixel line present therein, by including an identifier of block 302 in a list 1501 of identifiers (FIG. 10) in memory 1012.
  • After performance of act 223 (FIG. 2), processor 1013 may return to act 221, e.g. if the pixel-line-presence test has not yet been applied to any block, in a set of blocks formed by identification of MSERs of the image. Alternatively, act 221 may be performed repeatedly (prior to act 222), for all blocks in the set of blocks, followed by performance of acts 222 and 223 repeatedly (e.g. in a loop), for all blocks in the set of blocks that have been binarized in act 221.
  • After operation 220, processor 1013 of some embodiments performs operation 230 wherein a block 302 which has been marked as pixel-line-present is tested for possible merger with one or more blocks that are adjacent to block 302, e.g. by applying one or more predetermined rules. Processor 1013 of some embodiments is programmed to perform an operation 230 (also called “merger operation”) which includes at least three acts 231, 232 and 233 as follows. In act 231, each block which has no intervening block between itself and a pixel-line-present block, and which is located at less than a specified distance (e.g. half of height of pixel-line-present block), is identified and marked in memory 1012 as “adjacent.”
  • In some embodiments of act 231, mobile device 200 uses each block 302 that has been marked as pixel-line-present in act 222, to start looking for and marking in memory 1012 (e.g. in a list 1502 in FIG. 10), any block (pixel-line-present or pixel-line-absent) that is located physically adjacent to the block 302 (which is marked pixel-line-present and has no other block located there-between). For example, performance of act 231 with block 403 in FIG. 4B as the pixel-line-present block results in blocks 402, 404 and 405 being marked in a memory as “adjacent” blocks. Act 231 is performed repeatedly in a loop in some embodiments, until all adjacent blocks in an image are identified followed by the act 232, although other embodiments repeat the act 231 after performance of acts 232 and 233 (described next).
  • In act 232 of some embodiments, processor 1013 merges a pixel-line-present block with a block adjacent to it, as identified in act 231. On completion of the merger, pixels in the merged block include at least pixels in the pixel-line-present block and pixels in the adjacent block (which may or may not have a pixel line present therein). A specific technique that is used in act 232 to merge two adjacent blocks can be different, depending on the embodiment.
  • In some embodiments, a first list of positions of pixels of a first region 403R in a block 403 (FIG. 4B, also called “first block”) that has been marked as pixel-line-present is merged with a second list of positions of pixels of a second region 405R in a block 405 located above thereto (FIG. 4B, also called “second block”), to obtain a merged list of positions of a block 422 (FIG. 4D). In the block 422 (FIG. 4D, also called “merged block”), pixels of the first region 403R and pixels of the second region 405R (FIG. 4C) do not contact one another, because each region itself constitutes a connected component, unconnected to any other such region. The merged list of positions may then be used to identify the four corners of a rectangular block that tightly fits the merged region, in the manner described above in reference to act 212 (based on largest and smallest x and y coordinates of positions in the merged list). In some embodiments, the positions of the four corners of the merged block are stored in memory 1012, as per act 233. Furthermore, a mean intensity is computed in some embodiments, across all pixels in two blocks being merged and this value is also stored in memory 1012 in act 233, as the mean intensity of the merged block (e.g. for use in identifying binarizable values of pixels therein).
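For illustration only, the merge step just described might be sketched as follows in Python, assuming each region is kept as a list of (x, y) pixel positions together with a grey-scale image array; the function name, data layout and use of NumPy are assumptions of this sketch and not part of the described embodiments.

```python
import numpy as np

def merge_regions(positions_a, positions_b, gray_image):
    """Merge two regions given as lists of (x, y) pixel positions.

    Returns the merged list of positions, the four corners of the rectangle
    that tightly fits the merged region, and the mean intensity across all
    merged pixels (illustrative data layout, not the patent's structures).
    """
    merged = list(positions_a) + list(positions_b)      # merged list of positions
    xs = [x for x, _ in merged]
    ys = [y for _, y in merged]
    # Corners follow from the largest and smallest x and y coordinates.
    corners = {
        "bottom_left":  (min(xs), min(ys)),
        "bottom_right": (max(xs), min(ys)),
        "top_left":     (min(xs), max(ys)),
        "top_right":    (max(xs), max(ys)),
    }
    # Mean intensity over all pixels of both regions, e.g. for later use
    # when binarizing the merged block.
    mean_intensity = float(np.mean([gray_image[y, x] for x, y in merged]))
    return merged, corners, mean_intensity
```

The corners follow directly from the largest and smallest coordinates in the merged list, matching the tightly fitting rectangle described above.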
  • After act 233, processor 1013 of some embodiments returns to act 231 to identify an additional block that is located adjacent to the merged block. The additional block which is adjacent to the merged block (e.g. formed by merging a first block and a second block) may be a third block which has a third region therein. Therefore, in act 232 of some embodiments, processor 1013 merges a merged set of positions of the merged block with an additional set of positions of the third region in the third block. Depending on the image, the additional block which is adjacent to a merged block may itself be another merged block (e.g. formed by merging a third block with a third region therein and a fourth block with a fourth region therein). At least one of the third block and the fourth block is marked as pixel-line-present (for these two blocks to have been merged to form the additional block). Hence, act 232 in this example merges two blocks each of which is itself a merged block. Accordingly, the result of act 232 is a block that includes positions of pixels in each of the first block, the second block, the third block and the fourth block.
  • In some embodiments, act 232 is performed conditionally, only when certain predetermined rules are met by two blocks that are adjacent to one another. Specifically, in such embodiments, whether or not two adjacent blocks can be merged is typically decided by application of one or more rules that are predetermined (called “clustering rules”), which may be based on attributes and characteristics of a specific script, such as Devanagari script. The predetermined rules, although based on properties of a predetermined script of a human language, are applied in operation 230 of some embodiments, regardless of whether the two or more blocks being tested for merger are text or non-text. Different embodiments use different rules in deciding whether or not to merge two blocks, and hence specific rules are not critical to several embodiments. The one or more predetermined rules applied in operation 230 either individually or in combination with one another, to determine whether or not to merge a pixel-line-present block with its adjacent block may be different, depending on the embodiment, and specific examples are described below in reference to FIGS. 4A-4F.
  • As noted above, it is not known to processor 1013, at the time of performance of operation 220, whether any region(s) in a block 403 (FIG. 4B, also called “pixel-line-present” block) on which operation 220 is being performed, happens to be text or non-text. Specifically, operation 230 is performed prior to classification as text or non-text, which is performed in operation 250. Hence, in many embodiments, it is not known, at the time of performance of operation 230, whether two blocks being merged, are text or non-text. More specifically, on completion of operation 220, when a block 403 (FIG. 4B) has just been marked as pixel-line-present, it is not known to processor 1013 whether any region within block 403 is text or non-text.
  • Depending on the content of the image, a block which is marked by operation 220 as pixel-line-present may have a region representing a non-text feature in the image, e.g. a branch of a tree, or a light pole. Another block of the image, similarly marked by operation 220 as pixel-line-present, may have a region representing a text feature in the image, e.g. text with the format strike-through (in which a line is drawn through middle of text), or underlining (in which a line is drawn through bottom of text), or shiro-rekha (a headline in Devanagari script). So, operation 220 is performed prior to classification, as text or non-text, of any pixels in the regions that are being processed in operation 220.
  • In some embodiments, block 302, which is marked in memory 1012 as “pixel-line-present”, contains an MSER whose boundary may (or may not) form one or more characters of text in certain languages. In some languages, characters of text may contain and/or may be joined to one another by a line segment formed by pixels in contact with one another and spanning across a word formed by the multiple characters, as illustrated in FIG. 3A. Therefore, a merged block formed by operation 230 which may contain text, or alternatively non-text, is subjected to an operation 240 (also called “verification” operation).
  • Specifically, in some embodiments, after operation 230, mobile device 200 performs operation 240 which includes several acts that are performed normally prior to OCR, such as geometric rectification of scale (by converting parallelograms into rectangles, re-sizing blocks to a predetermined height, e.g. 48 pixels) and/or detecting and correcting tilt or skew. Hence, depending on the embodiment, a merged block obtained from operation 230 may be subject to skew correction, with or without user input. For example, skew may be detected and corrected via user input as described in U.S. patent application Ser. No. 13/748,562, Attorney Docket No. Q112726USos, filed concurrently herewith, entitled “Detecting and Correcting Skew In Regions Of Text In Natural Images” which is incorporated herein by reference in its entirety, above.
  • Operation 240 (for verification) of several embodiments further includes re-doing the binarization in act 241 (see FIG. 11; initially done in act 221 above), this time for a merged block. Operation 240 of several embodiments additionally includes re-doing the pixel-line-presence test in act 252 (see FIG. 11; initially done in act 222 above), this time for the merged block. Accordingly, some embodiments check whether an additional test is satisfied by the merged block, for presence of the pixels with a common binary value along another straight line passing through the merged block.
  • Pixel intensities that are used in binarization and in pixel-line-presence test in operation 240 (FIG. 2) are of all pixels in the merged block, which may include pixels of a pixel-line-present block (which contains a core portion of text) and pixels of an adjacent block which may be a pixel-line-absent block (which contain(s) supplemental portion(s) of text, such as accent marks). As noted above, pixels in a merged block on which operation 240 is being performed have not yet been classified as text or non-text, hence the pixel-line-presence test may or may not be met by the merged block, e.g. depending on whether or not a line of pixels is present therein (based on the blocks being merged).
  • Accordingly, in several embodiments, binarization and pixel-line-presence test are performed twice, a first time in operation 220 and a second time in operation 240. So, a processor 1013 is programmed with computer instructions in some embodiments, to re-do the binarization and pixel-line-presence test, initially on pixels in at least one block and subsequently on pixels in a merged block (obtained by merging the just-described block and one or more blocks adjacent thereto). Note that at the time of performance of each of operations 220 and 240 it is not known whether or not the pixels (on which the operations are being performed) are text or non-text. This is because classification of pixels as text or non-text in operation 250 is performed after performance of both operations 220 and 240. Performing binarization and pixel-line-presence test twice, while the pixels are not yet classified as text/non-text, as described is believed to improve accuracy subsequently, in operations 250 and 260 (described below).
  • A merged block that passes the pixel-line-presence test in operation 240 is thereafter subject to classification as text or non-text, in an operation 250. Operation 250 may be performed in the normal manner, e.g. by use of a classifier that may include a neural network. Such a neural network may use learning methods of the type described in, for example, U.S. Pat. No. 7,817,855 that is incorporated by reference herein in its entirety. Alternatively, operation 250 may be performed in a deterministic manner, depending on the embodiment.
  • After operation 250, a merged block that is classified as text is processed by an operation 260 to perform optical character recognition (OCR) in the normal manner. Therefore, processor 1013 supplies information related to a merged block (such as coordinates of the four corners) to an OCR system, in some embodiments. During OCR, processor(s) 1013 of certain embodiments obtains a sequence of sub-blocks from the merged block in the normal manner, e.g. by subdividing (or slicing). Sub-blocks may be sliced from a merged block using any known method e.g. based on height of the text region, and a predetermined aspect ratio of characters and/or based on occurrence of spaces outside the boundary of pixels identified as forming an MSER region but within the text region. The result of slicing in operation 260 is a sequence of sub-blocks, and each sub-block is then subject to optical character recognition (OCR).
  • Specifically, in operation 260, processor(s) 1013 of some embodiments form a feature vector for each sub-block and then decode the feature vector, by comparison to corresponding feature vectors of letters of a predetermined alphabet, to identify one or more characters (e.g. alternative characters for each block, with a probability of each character), and use one or more sequences of the identified characters with a repository of character sequences, to identify and store in memory 1012 (and/or display on a touch-sensitive screen 1001 or a normal screen 1002) a word identified as being present in the merged block.
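A minimal sketch of the just-described decoding of a sub-block's feature vector is given below; the Euclidean distance measure, the template dictionary and the function name are illustrative assumptions, since the embodiments do not prescribe a particular comparison method.

```python
import numpy as np

def decode_sub_block(feature_vector, character_templates, top_k=3):
    """Compare a sub-block's feature vector against per-character template
    vectors and return the top-k candidate characters with scores.

    `character_templates` maps a character to its reference feature vector;
    Euclidean distance is used here purely as an illustrative choice.
    """
    v = np.asarray(feature_vector, dtype=float)
    scored = []
    for ch, template in character_templates.items():
        d = float(np.linalg.norm(v - np.asarray(template, dtype=float)))
        scored.append((ch, d))
    scored.sort(key=lambda item: item[1])     # smallest distance first
    return scored[:top_k]                     # alternative characters per sub-block
```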
  • As noted above, it is not known in operation 220, whether or not block 302 (FIG. 3A) which is being checked contains any text, and it is also not known in operation 230 whether any blocks being merged contain text. Therefore, when a pixel-line-present block is merged with an adjacent block, the two blocks may contain pixels that represent a non-text feature in an image of the real world, such as a light pole. Even so, several embodiments of the type described herein are based on an assumption that a block 302 with one or more rows of pixels 303I-303N (FIG. 3A) that form a line segment 305 contains characters of text, rather than details of natural features (such as leaves of a tree or leaves of plants, shrubs, and bushes) that are normally present in a natural image. Presence of text of certain languages in a natural image results in pixels 303A-303N (FIG. 3A) that may form a line segment 305 in block 302.
  • Note however, that even when text is actually contained in block 302, a line segment 305 of pixels that is detected in operation 220 may be oriented longitudinally relative to a block 302 (FIG. 3A), or oriented laterally relative to block 302, or block 302 may contain both longitudinally-oriented lines and laterally-oriented lines of pixels. In an illustrative example, shown in FIG. 4B, block 403 has one longitudinal line of black pixels 403T (FIG. 4C) and two lateral lines of black pixels 403A and 403B (FIG. 4B), while block 404 has three lateral lines of black pixels (not labeled) and one longitudinal line of black pixels (not labeled).
  • Depending on the font and the script of the text in an image, lines of pixels of a common binary value that traverse a block need not be strictly longitudinal or strictly lateral. Instead, a longitudinally-oriented line of pixels can be, but need not be, exactly parallel to the longer sides of the block. So, a longitudinally-oriented line in a block may be at a small angle (e.g. less than 20° or 10°) relative to a top side (or bottom side) of the block, depending on a pose of (i.e. position and orientation of) camera 1011 relative to a scene. When block 302 has its longitudinal direction oriented parallel (or within the small angle) to the x-axis (e.g. after geometric rectification, scaling and tilt correction), a longitudinal pixel line through block 302 has a constant y coordinate, which is tested in some embodiments by setting slope m to zero and using a series of values of constant “c”, as described above.
  • A pixel-line-presence test used in act 222 (FIG. 2) of some embodiments may be selected based on a language likely to be found in an image, as per act 202 described next. In some embodiments, selection of a pixel-line-presence test is made in act 202 based on a language that is identified in memory 1012 as being used by a user of mobile device 200 (e.g. in user input), or as being used in a geographic location at which mobile device 200 is located in real world (e.g. in a table). For example, memory 1012 in mobile device 200 of some embodiments includes user input wherein the user has explicitly identified the language. In another example, memory 1012 includes a table with one entry therein that maps the language Hindi as being used in the city Mumbai, India, and another entry therein that maps the language Arabic as being used in the city Riyadh, Saudi Arabia. Hence, processor 1013 of some embodiments performs a look-up on the just-described table, using a city in which mobile device 200 is located, which is identified as follows: processor 1013 uses bus 1113 to operate an in-built GPS sensor, such as sensor 1003 (FIG. 10) to obtain location coordinates, and then uses the location coordinates with map data (in disk 1008) to identify the city.
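A table lookup of the kind just described could be sketched as follows; the city-to-language entries and the helper names are hypothetical examples, not the contents of the actual table in memory 1012.

```python
# Illustrative table mapping a city (e.g. resolved from GPS coordinates and
# map data) to the language assumed to be in use there; entries and the
# helper name are examples only.
CITY_TO_LANGUAGE = {
    "Mumbai": "Hindi",
    "Riyadh": "Arabic",
}

def select_line_presence_test(city, user_language=None):
    """Pick the language that drives the pixel-line-presence test:
    explicit user input wins, otherwise the city table is consulted."""
    language = user_language or CITY_TO_LANGUAGE.get(city)
    if language == "Hindi":
        return "shiro_rekha_test"      # peak expected in upper third of block
    if language == "Arabic":
        return "baseline_test"         # peak expected in middle of block
    return "default_test"
```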
  • In certain illustrative embodiments, the language identified by processor 1013 is Hindi, and the pixel-line-presence test that is selected in act 202 (FIG. 2) is used identically when each of blocks 402, 403 and 404 (FIG. 4B) is evaluated by act 211 (FIG. 2) to identify presence of a pixel line that is a characteristic of the language Hindi, namely a shiro-rekha (also called “header line”). Hence, the pixel-line-presence test that is selected may test for pixels of a common binary value arranged to form a line segment 305 that is aligned with a top side of block 302 and located in an upper portion of block 302 (e.g. located within an upper one-third of the block, as described below in reference to a peak-location preset criterion in reference to FIG. 3A).
  • On completion of operation 220 (FIG. 2), a block 403 (FIG. 4B) that is marked as pixel-line-present may have one or more adjacent blocks such as block 405 that contain one or more portions of text, such as an accent mark. Due to the image being captured by a camera from a scene, there may be numerous other blocks (not shown in FIG. 4B) in the image which may have a similar configuration (in pixel intensities and locations), but such other blocks may (or may not) constitute details of natural features (such as leaves of plants, shrubs, and bushes), rather than portions of text.
  • Accordingly, accuracy in identifying text regions of a natural image (or video frame) is higher when using blocks that have been merged (based on presence of a pixel line in or between multiple characters) than the accuracy of prior art methods known to the inventors. For example, OCR accuracy of block 425 (FIG. 4F, also called “merged” block) is higher than OCR accuracy when each of blocks 402-405 (FIG. 4B) and blocks 411-413 (FIG. 4C) is OCR processed individually. In some embodiments, operation 240 may start by performing connected component analysis on each block received as input, e.g. so that an accent mark 406 that happens to be not included in any of blocks 402-405 and 411, 412 and 413 is likely to be included in a block that is classified as text and output by operation 240, for input to the OCR system.
  • As noted above in reference to act 222 of operation 220, FIG. 3A illustrates a block 302 of an image in memory 1012 of some embodiments of mobile device 200, wherein a selected test is applied to detect pixel line presence. An example of such a pixel-line-presence test is illustrated in FIG. 3B, and described next. After initialization in acts 311 and 312 (to select a row of pixels 303A-303N in block 302 and select as a current pixel, the pixel 303I in the selected row), the intensity of pixel 303I is checked (in act 313 of FIG. 3B) against one or more criteria that are based on other pixels in block 302. For example, some embodiments compare a current pixel's intensity with the mean intensity of pixels in the block, and also to the mean intensity of pixels in one or more MSERs included in the block, and if the current pixel's intensity is closer to the mean intensity of MSER(s) then the current pixel's binary value is set to 1 (in act 314 in FIG. 3B), else the binary value is set to 0 (in act 315 in FIG. 3B).
  • The just-described binarization technique is just one example, and other embodiments may apply other techniques that are readily apparent in view of this disclosure. In a simpler example, the current pixel's intensity may be compared to just a mean intensity across all pixels in block 302, and if the current pixel's intensity exceeds the mean, the current pixel is marked as 1 (in act 314) else the current pixel is marked as 0 (in act 315). Hence, mobile device 200 may be programmed to binarize pixels by 1) using pixels in a block to determine a set of one or more thresholds that depend on the embodiment, 2) comparing each pixel in the block with this set of thresholds, and 3) subsequently setting the binarized value to 1 or 0 based on results of the comparison.
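A minimal sketch of the binarization just described is shown below, assuming the block is available as a NumPy array of grey-scale intensities and the MSER is given as a list of (x, y) positions; the exact thresholds differ between embodiments.

```python
import numpy as np

def binarize_block(gray_block, mser_positions):
    """Binarize a block by comparing each pixel to the mean intensity of the
    whole block and to the mean intensity of the MSER pixels in the block;
    a pixel closer to the MSER mean is set to 1, otherwise 0.
    (Sketch only; thresholds vary between embodiments.)"""
    block_mean = float(gray_block.mean())
    mser_mean = float(np.mean([gray_block[y, x] for x, y in mser_positions]))
    binary = np.zeros_like(gray_block, dtype=np.uint8)
    for y in range(gray_block.shape[0]):
        for x in range(gray_block.shape[1]):
            p = float(gray_block[y, x])
            if abs(p - mser_mean) < abs(p - block_mean):
                binary[y, x] = 1      # treated as a connected-component pixel
            # else: stays 0 (an "other" pixel)
    return binary
```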
  • On completion of acts 314 and 315, control returns to act 312 to select a next pixel for binarizing, and the above-described acts are repeated when not all pixels in the current row have been visited (as per act 316). When act 316 finds that all pixels in a row of the block have been binarized, the number of pixels with value 1 in binary (e.g. black pixels) in each row “J” of block 302 is counted (as per act 317 in FIG. 3B) and stored in an array 371 (FIG. 3A) of memory 1012, as a projection count N[J]. After projection count N[J] is computed for a current row J, control returns to act 301 to select another row J+1 for generation of the projection count. When all rows have been processed (as per act 318 in FIG. 3B), projection count N[J], if plotted in a graph for human understanding, appears as graph 310 (FIG. 3A) that conceptually shows profile 311 of a histogram, although note that graph 310 is normally not plotted by mobile device 200.
  • Instead, after projection count N[J] is computed for all rows of a block 302 to form the histogram, the looping ends, and control transfers to operation 320 that computes attributes at the level of blocks, e.g. in acts 321 and/or 322. In act 321, mobile device 200 identifies a row Hp that contains a maximum value Np of all projection counts N[0]-N[450] in block 302, i.e. the value of peak 308 in graph 310 in the form of a histogram of counts of black pixels (alternatively counts of white pixels). At this stage, a row Hp (e.g. counted in bin 130 in FIG. 3A) in which peak 308 occurs is also known. Similarly, in act 322, mobile device 200 computes a mean Nm, across the projection counts N[0]-N[450].
  • Thereafter, mobile device 200 checks (in act 331) whether the just-computed values Nm and Np satisfy a preset criterion on intensity of a peak 308. An example of such a peak-intensity preset criterion is Np/Nm≧1.75; if this criterion is not met then the block 302 is marked as “pixel-line-absent” in act 332, and if it is met then block 302 may be marked as “pixel-line-present” in act 334 (e.g. in a location in memory 1012 shown in FIG. 3A as storage element 381). In certain illustrative embodiments, when the preset criterion is met in act 331, another act 333 is performed to check whether a property of profile 311 satisfies an additional preset criterion.
  • In some embodiments, the additional preset criterion is on a location of peak 308 relative to a span of block 302 in a direction perpendicular to the direction of projection, e.g. relative to height of block 302. Specifically, a peak-location preset criterion may check where a row Hp (containing peak 308) occurs relative to height H of the text in block 302. For example, such peak-location preset criterion may be satisfied when Hp/H≦r wherein r is a predetermined constant, such as 0.3 or 0.4. Accordingly, presence of a line of pixels is tested in some embodiments within a predetermined range, such as 30% from an upper end of a block.
  • When one or more such preset criteria are satisfied (e.g. in acts 331 and 333), mobile device 200 then marks the block in act 334 as “pixel-line-present” and otherwise goes to act 332 to mark the block as “pixel-line-absent.” Although illustrative preset criteria have been described, other such criteria may be used in other embodiments. Moreover, although certain values have been described for two preset criteria, other values and/or other preset criteria may be used in other embodiments.
  • Note that a 0.33 value in the peak-location preset criterion described above results in act 334 testing for presence of a peak in the upper ⅓rd region of a block, wherein a pixel line called header line (also called shiro-rekha) is normally present in Hindi language text written in the Devanagari script. However, as will be readily apparent in view of this disclosure, specific preset criteria used in act 334 may be different, e.g. depending on the language and script of text to be detected in an image.
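Putting the projection counts and the two preset criteria together, the pixel-line-presence test of FIG. 3B might be sketched as follows, assuming the block has already been binarized, row index 0 is the top of the block, and the constants 1.75 and 0.3 are used as in the Devanagari example above.

```python
import numpy as np

def pixel_line_present(binary_block, strength_ratio=1.75, location_fraction=0.3):
    """Apply the pixel-line-presence test to a binarized block whose longest
    side lies along the x-axis (row 0 assumed to be the top of the block).

    Each row's count of value-1 pixels forms a projection count N[J]; the
    peak Np must exceed the mean Nm by `strength_ratio`, and the peak row Hp
    must lie within the top `location_fraction` of the block height."""
    counts = binary_block.sum(axis=1)          # projection count N[J] per row J
    peak_row = int(np.argmax(counts))          # Hp: row containing the peak
    peak_value = float(counts[peak_row])       # Np: maximum projection count
    mean_value = float(counts.mean())          # Nm: mean projection count
    if mean_value == 0:
        return False                           # empty block, no pixel line
    peak_strong_enough = (peak_value / mean_value) >= strength_ratio
    peak_near_top = (peak_row / binary_block.shape[0]) <= location_fraction
    return peak_strong_enough and peak_near_top
```

For Arabic, the location check would instead ask whether the peak falls in the middle band of the block (e.g. 0.4 to 0.6), as described in the following paragraphs.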
  • Specifically, in some embodiments, blocks of connected components that contain pixels of text in Arabic are marked as “pixel-line-present” or “pixel-line-absent” in the manner described herein, after applying the following two preset criteria. A first preset criterion for Arabic is the same as the above-described peak-intensity preset criterion for Devanagari (namely Np/Nm>1.75). A second preset criterion for Arabic is a modified form of Devanagari's peak-location preset criterion described above.
  • For example, the peak-location preset criterion for Arabic may be 0.4≦Hp/H≦0.6, to test for presence of a peak in a middle 20% region of a block, based on profiles for Arabic text shown and described in an article entitled “Techniques for Language Identification for Hybrid Arabic-English Document Images” by Ahmed M. Elgammal and Mohamed A. Ismail, believed to be published 2001 in Proc. of IEEE 6th International Conference on Document Analysis and Recognition, pages 1100-1104, which is incorporated by reference herein in its entirety. Note that although certain criteria are described for Arabic and English (see next paragraph), other similar criteria may be used for text in other languages wherein a horizontal line is used to interconnect letters of a word, e.g. text in the language Bengali (or Bangla).
  • Furthermore, other embodiments may test for presence of two peaks (e.g. as shown in FIG. 3C for English text) in act 334, so as to mark blocks of MSERs that satisfy the two-peak test, as “pixel-lines-present” or “pixel-lines-absent” followed by merging thereto of adjacent block(s) when certain predetermined rules are satisfied (for the English language), and followed by re-doing the two-peak test, in the manner described herein. Therefore, several such criteria will be readily apparent to the skilled artisan in view of this disclosure, based on one or more methods known in the prior art.
  • Accordingly, while various examples described herein use Devanagari to illustrate certain concepts, those of skill in the art will appreciate that these concepts may be applied to languages or scripts other than Devanagari. For example, embodiments described herein may be used to identify characters in Korean, Chinese, Japanese, Greek, Hebrew and/or other languages.
  • After marking a block in one of acts 332 and 334, processor 1013 of some embodiments checks (in act 335 in FIG. 3B) whether all blocks have been marked by one of acts 332 and 334. If the answer in act 335 is no, then some embodiments of processor 1013 check (in act 336) whether a preset time limit has been reached in performing the method illustrated in FIG. 3B and if not return to act 301 and otherwise exit the method. If the answer in act 335 is yes, then processor 1013 goes to act 337 to identify adjacent blocks (as per act 231) that may be merged when a rule in a plurality of predetermined rules (“clustering” rules) is met.
  • FIG. 4A illustrates an example of an image 401 in the prior art.
  • Image 401 is processed by performing a method of some embodiments as described above, and as illustrated in FIGS. 4B-4F, to form a merged block. Note that initialization in act 212 identifies MSERs to form blocks 402-405, and in doing so an accent mark 406 of this example happens to be not identified as being included in any MSER, and therefore not included in any block.
  • More specifically, in act 212, a block 402 (also called “first block”) is identified in the example of FIG. 4B to include a first region (depicted in the original as an inline image of a character glyph) in the image 401 with a first plurality of pixels (identified by a first set of positions) that are contiguous with one another and include a first local extremum of intensity in the image 401. Also in act 212, a block 403 (also called “second block”) is identified in the example of FIG. 4B to include a second region (likewise depicted as an inline image) in the image 401 with a second plurality of pixels (identified by a second set of positions) that are contiguous with one another and include a second local extremum of intensity in the image 401. In this manner, each of blocks 402-405 illustrated in FIG. 4B is identified in act 212. Although an MSER method is used in some embodiments of act 212, other embodiments of act 212 use other methods that identify connected components.
  • In several of the described embodiments, blocks 402-405 are thereafter processed for pixel line presence detection, as described above in reference to operation 220 (FIG. 2). On completion of operation 220, blocks 402, 403 and 404 are tagged (or marked) as being “pixel-line-present” (see FIG. 4B), and block 405 is tagged (or marked) as being “pixel-line-absent.”
  • Next, image 401 has the polarity (or intensity) of its pixels reversed (as would be readily apparent to a skilled artisan, so that white pixels are changed to black and vice versa) and the reversed-polarity version of image 401 is then processed by act 212 (FIG. 2) to identify blocks 411-413 (FIG. 4C). Blocks 411-413 are then processed by act 223 to mark them as pixel-line-absent.
  • Next, as per operation 230, each of blocks 402-404 (also called the pixel-line-present blocks) is checked for presence of any adjacent blocks that can be merged. Specifically, on checking the block 402 which is identified as pixel-line-present, for any adjacent blocks, a block 411 (FIG. 4C) is found. Therefore, the blocks 402 and 411 are evaluated by application of one or more clustering rules 503 (FIG. 10) in block merging module 141B (FIG. 10). The clustering rules 503 can be different for different scripts, e.g. depending on language to be recognized, in the different embodiments.
  • Clustering rules 503 to be applied in operation 230 may be pre-selected, e.g. based on external input, such as identification of Devanagri as a script in use in the real world scene wherein the image is taken. The external input may be automatically generated, e.g. based on geographic location of mobile device 200 in a region of India, e.g. by use of an in-built GPS sensor as described above. Alternatively, external input to identify the script and/or the geographic location may be received by manual entry by a user. Hence, the identification of a script in use can be done differently in different embodiments.
  • Based on an externally-identified script, one or more clustering rules 503 (FIG. 10) are predetermined for use in operation 230. In this example, assuming the clustering rules 503 are met, blocks 402 and 411 are merged with one another, to form block 421 (see FIG. 4D). Similarly, block 403 which has been identified as pixel-line-present, is checked for any adjacent blocks and block 405 is found. Therefore, the blocks 403 and 405 are evaluated by use of clustering rule(s) 503 in block merging module 141B (FIG. 10), and the rules are met in this example, so blocks 403 and 405 are merged by block merging module 141B to form block 422 in FIG. 4D. Finally, on checking the block 404 also identified as pixel-line-present for any adjacent blocks, a block 413 is found. Therefore, the blocks 404 and 413 are evaluated by use of clustering rule(s) 503 in block merging module 141B (FIG. 10), and the rules are met in this example, so blocks 404 and 413 are merged by block merging module 141B to form block 423 (also called “merged” block) in FIG. 4D.
  • Merged blocks that are generated by block merging module 141B as described above may themselves be further processed in the manner described above in operation 230. For example, block 421 (also called “merged” block) is used to identify any adjacent block thereto, and block 422 is found. Then, the block 421 (also called “merged” block) and block 422 are evaluated by use of clustering rule(s) in block merging module 141B, and assuming the rules are met in this example, block 421 (which is a merged block) and block 422 are merged by block merging module 141B to form block 424 (also called “merged” block) in FIG. 4E. Similarly, block 423 (also called “merged” block) is used to identify any adjacent block, and block 424 (also a “merged” block) is found. Next, the blocks 423 and 424 (both of which are merged blocks) are evaluated by use of clustering rule(s) in block merging module 141B, and the rules are met in this example, so blocks 423 and 424 are merged by block merging module 141B to form block 425 in FIG. 4F. Block 425 (also a merged block) is thereafter processed by operation 230, and on finding no adjacent blocks, it is then processed by operation 240 (see FIG. 2) in the normal manner.
  • One or more predetermined rules (“clustering rules”) 503 are used in some embodiments of operation 230 (described above, also called “merger” operation) by block merging module 141B to decide whether or not to merge a block that is known to have a pixel line present therein (such as block 403) in an image, with one or more blocks adjacent to it (such as block 405), by performance of a method illustrated in FIG. 5. Specifically, after initialization in acts 501 and 502 to select a pixel-line-present block as the first block, and select as the second block any block that is located adjacent to the pixel-line-present block (i.e. first block), three predetermined rules are applied by block merging module 141B in operations 510, 520 and 530 respectively, as described below.
  • Although a specific order of operations is illustrated in FIG. 5 for some embodiments, namely operation 510 to apply a first rule (which is formulated based on accent marks or modifiers, called ‘maatras’ in Devanagri script), followed by operation 520 to apply a second rule (which is formulated based on a broken word), followed by operation 530 to apply a third rule (which is formulated based on half letters), other embodiments may perform these operations (or other such operations) in a different order relative to one another, or may omit one or more of these operations or perform additional operations, as will be readily apparent, to decide whether or not to merge blocks.
  • In some embodiments, when any one rule is satisfied, in a corresponding one of the operations 510, 520 and 530, then operation 230 (also called “merger” operation) is performed, regardless of whether or not the blocks have text therein. On completion of operation 230, in some embodiments the merged block is itself marked as pixel-line-present by block merging module 141B, and therefore eligible for selection as the first block in act 501 (followed by act 502 in which an adjacent block is selected as the second block). In several embodiments, operation 230 is performed prior to classification and therefore it is not known to processor 1013 at the time of operations 510, 520 and 530, whether the blocks that are being merged have pixels that represent text or non-text in the image. When no rule is found to be satisfied in any of operations 510, 520 and 530, act 541 is performed to check if all blocks adjacent to the pixel-line-present block (i.e. first block) have been checked, and if not control returns to act 502 and another block that is adjacent to the first block is then selected as the second block.
  • In act 541, when all blocks that are adjacent to the pixel-line-present block (i.e. first block) have been checked, control transfers to act 542 to check if all pixel-line-present blocks have been checked in the just-described manner and if not, control transfers to act 501 to select another pixel-line-present block as the first block. When all pixel-line-present blocks have been checked (for merger with their respective adjacent blocks), then control transfers to operation 240 (also called “verification” operation) to continue with further processing, such as geometric rectification of scale and/or tilt, followed by binarization of merged blocks, which is then followed by pixel-line-presence test on merged block(s). Operation 240 is followed by operation 250 wherein classification of merged blocks (as well as unmerged blocks) as text or non-text, is performed (as described above), which is then followed by optical character recognition.
  • Operations 510, 520 and 530 of some embodiments check for overlap between projections (see projection overlap rules 503P in FIG. 10) of a pixel-line-present block and its adjacent block on to straight lines that are perpendicular to each other, e.g. x-axis and y-axis. A projection of a block on to a straight line (also called “support”) can be same as a “span” of the block along the straight line, e.g. when the straight line is at an edge of the block. Hence, a block's left edge and the y-axis may be coincident, in which case the vertical projection and the vertical span are identical, or the block's bottom edge and the x-axis may be coincident, in which case the horizontal projection and the horizontal span are identical.
  • As noted above, at this stage, during performance of operations 510, 520 and 530 prior to operation 240, it is not known to processor 1013 whether or not the blocks include text or non-text regions of the image. Applying clustering rules 503 to blocks that happen to be adjacent, one of which has a pixel line present, but neither of which has yet been classified as text/non-text, enables processor 1013 to generate merged blocks on which verification is performed, followed by classification and OCR which is found to be more successful than in the prior art, as described below.
  • In a first example of applying a clustering rule (e.g. projection overlap rule 503P in FIG. 10), operation 510 performs a test (also called “first test”) to check for a first predetermined percentage of overlap (e.g. 100% overlap) between horizontal projections 621H and 622H (on to a straight line, e.g. the horizontal line 625H) and performs another test (also called “second test”) to check for a second predetermined percentage of overlap (e.g. 0% overlap) between vertical projections 621V and 622V (on to an additional straight line, e.g. the vertical line 625V) of blocks 621 and 622. Note that blocks 621 and 622 being used in the just-described tests are selected such that they do not themselves overlap one another, as shown in FIG. 6D. In a second example of applying a clustering rule (e.g. projection overlap rule 503P in FIG. 10), operation 520 checks for a third predetermined percentage of overlap (e.g. 0% overlap) between horizontal projections 621H and 620H, and a fourth predetermined percentage of overlap (e.g. 100% overlap) between vertical projections 621V and 620V, of blocks 621 and 620 that do not themselves overlap one another, as shown in FIG. 7B. Any two blocks do not overlap one another, when pixels of one block are not present in the other block and vice versa, and such blocks are used in tests of these two examples.
  • Hence, processor 1013 is programmed to use such clustering rules 503 that are predetermined, as described more completely below, to select two blocks to merge in block merging module 141B, when the two blocks do not overlap one another, regardless of whether the blocks contain text or non-text. Merger of two or more non-overlapping blocks by block merging module 141B, when the predetermined rules are met as just described, results in a merged block on completion of operation 230 (FIG. 2) which is then subjected to verification in operation 240.
  • Specifically, in one illustrative example, operation 510 includes acts 611-617, described next. In act 611 (FIG. 6A), mobile device 200 evaluates for merger with one another: a pixel-line-present (“first”) block (e.g. block 621 in FIG. 6B) and another (“second”) block (e.g. block 622 in FIG. 6C) that is adjacent thereto, which do not overlap one another. Hence, processor 1013 is programmed (e.g. to implement the projection overlap rule 503P in FIG. 10) to check if a projection (“horizontal projection”) 621H of block 621 (FIG. 6B) on horizontal line 625H satisfies a test of overlap with a horizontal projection 622H of block 622 (FIG. 6C) on the same horizontal line 625H, e.g. a test for 100% overlap between projections. A test for overlap of projections of blocks is satisfied in some examples, when a projection of a block 622 that is adjacent (e.g. horizontal projection 622H on to horizontal line 625H, or x-axis in FIG. 6C) is overlapped partially or wholly by a projection of block 621 that is marked pixel-line-present (e.g. horizontal projection 621H on the x-axis in FIG. 6B). When such a horizontal projection overlap test, using only x-coordinates, is found to be met by block merging module 141B, control transfers from act 611 to act 612. Three illustrative examples of such a horizontal projection overlap test (e.g. to implement the projection overlap rule 503P in FIG. 10) are described next.
  • A 100% horizontal projection overlap condition is tested by block merging module 141B in one example of act 611 by use of x-coordinates x1 and x2 of bottom left and bottom right corners of block 621 that is marked pixel-line-present (which identify the horizontal projection of block 621), and x-coordinates x3 and x4 of the bottom left and bottom right corners of block 622 that is adjacent (which identify the horizontal projection of block 622), as follows: whether the condition x1<x3<x4<x2 is met by the x-coordinates of the corners of the two blocks. The just-described condition on overlap of projections is based on geometric attributes of the two blocks subject to the test, namely two specified coordinates (on a coordinate axis, e.g. x-axis) of two specified corners of one block with two specified coordinates of two specified corners of the other block (on the same coordinate axis).
  • The just-described horizontal projection overlap condition of 100% can be satisfied in some situations (as illustrated in FIG. 6D) wherein the two blocks are aligned with horizontal line 625H, but this condition is not satisfied in other situations. For example, an angular offset between the blocks and the horizontal line, such as angle 629 in FIG. 6D may be sufficiently large that the 100% overlap condition (described above) is not satisfied. To accommodate such other situations, processor 1013 is programmed to implement the block merging module 141B by using less stringent conditions in some embodiments of a horizontal projection overlap condition (in projection overlap rule 503P in FIG. 10), as follows.
  • A left-partial horizontal projection overlap condition is tested by block merging module 141B in one example of act 611 when x3<x1<x4<x2, and the ratio (x4−x1)/(x4−x3) is greater than a predetermined fraction, e.g. 0.5. A right-partial horizontal projection overlap condition is tested by block merging module 141B in another example of act 611, when x1<x3<x2<x4, and the ratio (x2−x3)/(x4−x3) is greater than a predetermined fraction, e.g. also 0.5. The just-described two conditions are also based on geometric attributes of the two blocks, as noted above.
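The three horizontal projection overlap conditions just described might be combined as in the following sketch, assuming each block is represented by its bottom-left and top-right corner coordinates; the representation and the 0.5 fraction are illustrative.

```python
def horizontal_projection_overlap(first_block, second_block, partial_fraction=0.5):
    """Check the horizontal projection overlap conditions of act 611.

    Each block is given as (x1, y1, x2, y2) with (x1, y1) the bottom-left and
    (x2, y2) the top-right corner. Returns True for 100% overlap of the
    adjacent block's projection, or for a left-/right-partial condition when
    the overlapped fraction exceeds `partial_fraction`."""
    x1, _, x2, _ = first_block      # pixel-line-present block
    x3, _, x4, _ = second_block     # adjacent block
    if x1 < x3 < x4 < x2:                                   # 100% overlap
        return True
    if x3 < x1 < x4 < x2:                                   # left-partial overlap
        return (x4 - x1) / (x4 - x3) > partial_fraction
    if x1 < x3 < x2 < x4:                                   # right-partial overlap
        return (x2 - x3) / (x4 - x3) > partial_fraction
    return False
```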
  • In act 612 (FIG. 6A), mobile device 200 checks if a first additional projection (e.g. vertical projection) of the first block (e.g. block 621 in FIG. 6B) and a second additional projection (e.g. vertical projection) of the second block (e.g. block 622 in FIG. 6C) satisfy a second test for overlap on an additional straight line (e.g. the y-axis) which is perpendicular to the straight line used in act 611 (e.g. the x-axis). When block merging module 141B finds that such a vertical projection overlap test is met, e.g. using only the y-coordinates of the corners of the two blocks (which identify vertical projections), control transfers from act 612 to act 613. Illustrative examples of such a vertical projection overlap test are described next.
  • A 0% vertical projection overlap condition is tested by block merging module 141B in one example of act 612 (see FIG. 6D), assuming the second block is located in the image above the first block, by use of a y-coordinate y3 of bottom-left corner of block 622 (also called “second block”, i.e. bottom coordinate y3 of vertical projection 622V) and the y-coordinate y2 of the top-right corner of block 621 (also called “first block”, i.e. top coordinate y2 of vertical projection 621V), as follows: y2<y3. A less stringent, partial vertical projection overlap condition is tested in another example of act 612 when y2>y3, if the ratio (y2−y3)/(y2−y1) is less than a predetermined fraction, e.g. 0.1.
  • Alternatively, if the above-described conditions are not met in act 612 mobile device 200 may check a similar condition under the assumption that the second block is located in the image below the first block, by block merging module 141B using the bottom-left y-coordinate y1 of block 621 (also called “first” block, FIG. 6B) and the top-left y-coordinate y4 of block 622 (also called “second” block, FIG. 6C), as follows: y4<y1. A less stringent, partial vertical projection overlap condition is tested by block merging module 141B in another example of act 612 when y4>y1, if the ratio (y4−y1)/(y2−y1) is less than a predetermined fraction, e.g. 0.1.
  • Another predetermined test in such a clustering rule, e.g. aspect ratio rule 503A (FIG. 10) may cause block merging module 141B to check as per act 613 that the aspect ratio of the second block (or adjacent block) 622 lies within a predetermined range e.g. Thresh1≦Length:Breadth of block≦Thresh2 wherein Thresh1 and Thresh2 are constants that are empirically determined. In the example illustrated in FIG. 6C, the aspect ratio (x4−x3)/(y4−y3) is computed by block merging module 141B and then checked against the limits 0.8 and 1.2. The just-described condition is based on a geometric attribute of the second block, namely an aspect ratio of the block. Note that the act 613 does not use any information on the first block (e.g. block 621, marked as pixel line present), other than the fact that second block (e.g. block 622) is adjacent thereto.
  • Yet another predetermined test in a clustering rule, such as a relative heights rule 503R (FIG. 10) may cause block merging module 141B to check, as per act 614, that the height of an adjacent (second) block (“Maatra Height”) is within a certain percentage of the height of the pixel-line-present (first) block (“Word Height”). For example, the following condition is checked by block merging module 141B in some embodiments of this test: Thresh3*Word Height≦Maatra Height≦Thresh4*Word Height, wherein Thresh3 and Thresh4 are constants that are empirically determined. The just-described condition is again based on geometric attributes of the two blocks subject to the test, in this example it is a comparison of heights of the two blocks (i.e. relative heights). Note that the ratio Maatra Height/Word Height (e.g. ratio (vertical projection 622V/vertical projection 621V) in FIG. 6B) may be computed by block merging module 141B and checked against the range Thresh3-Thresh4 in some embodiments of act 614. So, an additional test for merger of blocks 403 and 405 is satisfied when the height of block 405 (also called “second” block) is between two predetermined fractions (namely Thresh3 and Thresh4) of the height of block 403 (also called “first” block).
  • Still another first type of clustering rule, such as spacing rule 503S may cause block merging module 141B to check, as per act 615 performed by mobile device 200 of some embodiments, that the location of the adjacent (second) block (e.g. block 622 in FIG. 6D or block 632 in FIG. 6E) is above (or below depending on the rule) the pixel-line-present (first) block (e.g. block 621 in FIG. 6D or block 631 in FIG. 6E respectively) within a predetermined distance, e.g. check for less than 10% vertical projection overlap between the two blocks (in addition to more than 90% horizontal projection overlap). The just-described condition is again based on geometric attributes of the two blocks, namely overlap of projections. Some embodiments of block merging module 141B check whether an additional rule is satisfied by a predetermined geometric attribute (e.g. aspect ratio) of at least one block (e.g. pixel-line-absent block) relative to another block (e.g. pixel-line-present block). In response to finding that such rules are satisfied, the two blocks are merged in some embodiments.
  • When a result of act 615 is that the adjacent (second) block is located above the pixel-line-present (first) block, mobile device 200 performs act 617 and else performs act 616. In act 617, block merging module 141B in mobile device 200 checks if the distance between the adjacent (second) block and the pixel-line-present (first) block is less than Thresh5*Word Height, wherein Thresh5 is an above-block limit (also called “first predetermined limit”) that is predetermined empirically, and Word Height is the height of the pixel-line-present (first) block (e.g. vertical projection 621V in FIG. 6B). The just-described condition is once again based on geometric attributes of the two blocks, namely vertical separation (or gap) above the pixel-line-present block.
  • In act 616, block merging module 141B in mobile device 200 checks if the distance between the adjacent (second) block and the pixel-line-present (first) block is less than Thresh6*Word Height, wherein Thresh6 is a below-block limit (also called “second predetermined limit”) that is also predetermined empirically. The just-described condition is once again based on geometric attributes of the two blocks, namely vertical separation (or gap) below the pixel-line-present block. If the answer is yes in either of acts 616 and 617, control transfers to operation 230, and otherwise control transfers to operation 520. In some embodiments, the acts 614 and 615 may further check that the adjacent (second) block is marked as pixel-line-absent.
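A sketch combining the checks of acts 612-617 for this first clustering rule is given below; the threshold values stand in for Thresh3-Thresh6, which the text leaves to empirical determination, and the corner-tuple representation is an assumption of the sketch.

```python
def maatra_rule_met(word_block, adjacent_block,
                    aspect_limits=(0.8, 1.2),
                    height_limits=(0.2, 0.6),   # stand-ins for Thresh3, Thresh4
                    gap_limit_above=0.3,        # stand-in for Thresh5
                    gap_limit_below=0.3):       # stand-in for Thresh6
    """Illustrative combination of acts 612-617: decide whether an adjacent
    (accent-mark) block may be merged with a pixel-line-present word block.
    Blocks are (x1, y1, x2, y2) tuples of bottom-left and top-right corners."""
    wx1, wy1, wx2, wy2 = word_block
    ax1, ay1, ax2, ay2 = adjacent_block
    word_height = wy2 - wy1
    maatra_height = ay2 - ay1
    if word_height <= 0 or maatra_height <= 0:
        return False
    # Act 613: aspect ratio of the adjacent block within a preset range.
    aspect = (ax2 - ax1) / maatra_height
    aspect_ok = aspect_limits[0] <= aspect <= aspect_limits[1]
    # Act 614: adjacent block height within preset fractions of word height.
    height_ok = (height_limits[0] * word_height
                 <= maatra_height
                 <= height_limits[1] * word_height)
    # Acts 612 and 615-617: no vertical projection overlap, and the vertical
    # gap stays below a limit that depends on above/below placement.
    if ay1 >= wy2:                                  # adjacent block lies above
        gap_ok = (ay1 - wy2) < gap_limit_above * word_height
    elif ay2 <= wy1:                                # adjacent block lies below
        gap_ok = (wy1 - ay2) < gap_limit_below * word_height
    else:                                           # vertical projections overlap
        gap_ok = False
    return aspect_ok and height_ok and gap_ok
```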
  • In operation 520, as noted above, it is not known to processor 1013 at this stage whether the blocks have text or non-text. Even so, a second clustering rule (FIG. 7A) is used in some embodiments of operation 520, based on an assumption that two pixel-line-present blocks that are located adjacent to one another constitute a word, with a broken header line in Hindi (or base line in Arabic) resulting in two separate connected components for the single word.
  • For example, one test in the second clustering rule, such as projection overlap rule 503P may cause block merging module 141B to check for 0% horizontal projection overlap and 95% vertical projection overlap between the two pixel-line-present blocks (e.g. see acts 711 and 712 in FIG. 7A). Therefore, in the example shown in FIG. 7B, the horizontal projections 620H, 621H and 623H may be checked by block merging module 141B for zero overlap among each other, and the vertical projections 620V, 621V and 623V may be checked for 100% overlap with each other. Another second type of clustering rule may cause block merging module 141B to check that a height difference between the two pixel-line-present blocks, as a percentage of the height of one of the blocks is less than 5%, e.g. see act 713 in FIG. 7A. The two conditions in this paragraph are also based on geometric attributes of the two blocks, namely projections and height differences.
  • In the example of FIG. 7B, differences between pairs of vertical projections (620V, 621V), (620V, 623V) and (621V, 623V), which are also called vertical spans, may be computed by block merging module 141B and checked to see if they are less than 5% of one of the spans used to compute the difference, e.g. 620V, 620V and 621V respectively. Yet another second type of clustering rule may cause block merging module 141B to check, as per act 714 in FIG. 7A that the horizontal distance of separation between the two line-present blocks, as a percentage of the length of one of the blocks is less than 5%. In the example of FIG. 7B, differences between pairs of horizontal projections (620H, 621H) and (621H, 623H) which are also called horizontal spans may be computed by block merging module 141B and checked to see if they are less than 5% of one of the spans used to compute the difference, e.g. 620H and 621H respectively.
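The second clustering rule might be sketched as follows, again with blocks represented by corner coordinates; the 5% tolerances follow the example values above, and the 95% vertical-overlap check is included as described.

```python
def broken_word_rule_met(block_a, block_b,
                         height_tolerance=0.05, gap_tolerance=0.05):
    """Illustrative combination of acts 711-714: decide whether two adjacent
    pixel-line-present blocks form one word with a broken header line.
    Blocks are (x1, y1, x2, y2) tuples of bottom-left and top-right corners."""
    ax1, ay1, ax2, ay2 = block_a
    bx1, by1, bx2, by2 = block_b
    height_a, height_b = ay2 - ay1, by2 - by1
    if height_a <= 0 or height_b <= 0:
        return False
    # Act 711: horizontal projections do not overlap.
    no_horizontal_overlap = ax2 <= bx1 or bx2 <= ax1
    # Act 712: vertical projections overlap almost completely (95% here).
    v_overlap = max(0, min(ay2, by2) - max(ay1, by1))
    vertical_overlap_ok = v_overlap >= 0.95 * min(height_a, height_b)
    # Act 713: height difference is a small fraction of one block's height.
    heights_similar = abs(height_a - height_b) / height_a < height_tolerance
    # Act 714: horizontal gap is a small fraction of one block's length.
    gap = max(bx1 - ax2, ax1 - bx2, 0)
    gap_small = gap < gap_tolerance * (ax2 - ax1)
    return (no_horizontal_overlap and vertical_overlap_ok
            and heights_similar and gap_small)
```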
  • A third clustering rule (FIG. 8A) checks if a second block constitutes a half letter of text within the pixel-line-present block. For example a test in a third clustering rule may cause block merging module 141B to check for 100% horizontal projection overlap and 100% vertical projection overlap between the adjacent block and the line-present block as per acts 811 and 812, because a broken letter in the language Hindi is normally embedded within a main word. Therefore, this is an example wherein the second block completely overlaps the first (pixel-line-present) block, although as noted above most tests check that the two blocks do not overlap one another.
  • Specifically, as illustrated in FIG. 8B, the coordinates of the four corners of block 822 (also called “second” block) may be checked by block merging module 141B relative to the coordinates of the four corners of block 821 (also called “first” block) in acts 811 and 812, to ensure 100% overlap in both horizontal projections (the projection 822H fully overlaps the projection 821H) and vertical projections (the projection 822V fully overlaps the projection 821V). As another example, one more test in the third clustering rule may cause block merging module 141B to check as per act 813 that a height difference between the first (pixel-line-present) block (e.g. block 821) and the second block (e.g. block 822), as a percentage of the height of one of the blocks is less than 5%. Specifically, as illustrated in FIG. 8B, the ratio of the height of projection 822V (also called “vertical span”) to the height of projection 821V (also called “vertical span”) is checked to be at least 95% (i.e. the heights differ by less than 5%).
  • Furthermore, another test in the third clustering rule may cause block merging module 141B to check, as per act 814 that the aspect ratio (i.e. the ratio Length/Breadth) of the second block is between 0.7 and 0.9 (denoting a half-character of smaller width than a single character) while the aspect ratio of the first (pixel-line-present) block is greater than 2 (denoting multiple characters of greater width than a single character). In the example of FIG. 8B, the ratio of the height of projection 822V (also called “vertical span”) and the width of projection 822H (also called “horizontal span”) is checked by block merging module 141B to be between 0.7 and 0.9 and the ratio of the height of projection 821V (also called “vertical span”) and width of projection 821H (also called “horizontal span”) is checked to be greater than 2. Still another test in the third clustering rule may cause block merging module 141B to check as per act 815 in FIG. 8A that the center of the second block and the center of the first (pixel-line-present) block have y-coordinates that differ from each other by less than 5%. In the example of FIG. 8B, the difference between the y-coordinate at center of projection 822V (also called “vertical span”) and the y-coordinate at center of projection 821V (also called “vertical span”) may be checked by block merging module 141B to be within 5% of projection 822V.
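A sketch of the third clustering rule (half-letter case) under the same corner-tuple assumption is given below; the 0.7-0.9 and 2.0 aspect-ratio limits and the 5% tolerances follow the example values in the text, while the containment check and the aspect-ratio convention (width divided by height) are simplifying assumptions.

```python
def half_letter_rule_met(word_block, candidate_block,
                         height_tolerance=0.05, center_tolerance=0.05):
    """Illustrative combination of acts 811-815: decide whether a block holding
    a half letter may be merged into a pixel-line-present word block.
    Blocks are (x1, y1, x2, y2) tuples of bottom-left and top-right corners."""
    wx1, wy1, wx2, wy2 = word_block
    cx1, cy1, cx2, cy2 = candidate_block
    w_height, c_height = wy2 - wy1, cy2 - cy1
    if w_height <= 0 or c_height <= 0:
        return False
    # Acts 811-812: candidate projections fully inside the word block's
    # horizontal and vertical projections (simplified containment check).
    contained = wx1 <= cx1 <= cx2 <= wx2 and wy1 <= cy1 <= cy2 <= wy2
    # Act 813: heights differ by less than 5%.
    heights_similar = abs(w_height - c_height) / w_height < height_tolerance
    # Act 814: narrow candidate (0.7-0.9) versus wide word block (> 2),
    # using width divided by height as the aspect ratio here.
    c_aspect = (cx2 - cx1) / c_height
    w_aspect = (wx2 - wx1) / w_height
    aspect_ok = 0.7 <= c_aspect <= 0.9 and w_aspect > 2
    # Act 815: vertical centres differ by less than 5% of candidate height.
    centers_close = (abs((wy1 + wy2) / 2 - (cy1 + cy2) / 2)
                     < center_tolerance * c_height)
    return contained and heights_similar and aspect_ok and centers_close
```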
  • Moreover, in some aspects of the described embodiments, classification of blocks into text or non-text is performed by use of a neural network in operation 230, using parameters 911-915 illustrated in FIG. 9, for a merged block. For example, parameter 911 is the relative location of a line 910 (such as a header line or shiro-rekha in text written in Devanagari script), computed as Hp/H (as discussed above in reference to act 333 of FIG. 3B). Moreover, parameter 912 is the relative strength of this line 910, computed as Np/Nm (as discussed above in reference to act 331 of FIG. 3B). Furthermore, the number of vertical lines 901, 902 . . . 905 . . . 909 (FIG. 9) is counted, e.g. as peaks in a vertical projection, and the ratio of this number to the length L (also called width) of the block is another parameter 913 that is also used in the neural network classifier.
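The following sketch (illustrative only) shows one plausible way to derive quantities analogous to parameters 911-913 from a binarized block; the peak-finding and the threshold used to count vertical lines are simplifications assumed here, not details taken from the described embodiments.

```python
import numpy as np

# Illustrative sketch only: quantities analogous to parameters 911-913 computed
# from a 2-D binary array "block" (1 = text pixel, 0 = background).

def header_line_params(block):
    """Relative location (Hp/H) and relative strength (Np/Nm) of the strongest
    row in the horizontal projection of the block."""
    row_counts = block.sum(axis=1)            # pixels per row
    peak_row = int(np.argmax(row_counts))
    Np, Nm = row_counts[peak_row], row_counts.mean()
    return peak_row / block.shape[0], Np / (Nm + 1e-9)

def vertical_line_density(block, frac=0.8):
    """Quantity analogous to parameter 913: count of columns that look like
    vertical strokes (column count exceeding frac of the block height), divided
    by the block width. The threshold frac is an assumption."""
    col_counts = block.sum(axis=0)            # vertical projection
    n_vertical = int((col_counts > frac * block.shape[0]).sum())
    return n_vertical / block.shape[1]
```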
  • Some embodiments use an attribute of the merged block that is indicative of a ratio of (A) a mean of a number of transitions in a predetermined direction (e.g. horizontal direction), from a first binary value (e.g. value 1 for a black colored pixel) to a second binary value (e.g. value 0 for a white colored pixel), in each row in a set of rows in the merged block and (B) a width of the merged block. Another attribute of the merged block is indicative of another mean of another number of transitions in the predetermined direction, from the second binary value to the first binary value (e.g. from value 1 to value 0), in a row in the set of rows. Specifically, in some embodiments, during classification, two numbers are counted, namely white-to-black transitions and black-to-white transitions in the predetermined direction, with each number providing another attribute of the merged block.
  • In one illustrative example, two numbers of transitions are counted for a subset of rows in the merged block that are located at specified position(s) relative to a position of a peak in the block (e.g. at which the header line of a word of text in Hindi occurs, if present in a pixel-line-present block), as follows. In the illustrative example, a peak's position (relative to a vertical span of the block) may be used to identify rows in the block that are located below the peak by at least a predetermined distance (e.g. specified as a percentage of the block height) as belonging to the subset. In such a subset of rows identified by use of the block's pixel line, in some embodiments, two types of transitions are averaged (namely a number of transitions from value 0 in binary to value 1 in binary, and another number of transitions from value 1 in binary to value 0 in binary), and the resulting means (i.e. averages) are used as parameters 914 and 915, which are input to a neural network classifier, e.g. implemented by a processor executing the classifier software 552 used in operation 250. As noted below, a neural network classifier is just one example of the type of classifier that can be programmed to use one or more of parameters 911-915 in different aspects of the described embodiments.
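A minimal sketch of the transition counts described above follows; the fraction used to select rows below the peak is an assumption, and the function is not the classifier software 552 itself.

```python
import numpy as np

# Illustrative sketch only: mean 0->1 and 1->0 transition counts per row, over a
# subset of rows located below the header-line peak, analogous to parameters
# 914 and 915. "block" is a 2-D binary array; "peak_row" is the peak's row index.

def transition_means(block, peak_row, below_frac=0.1):
    start = min(block.shape[0], peak_row + int(below_frac * block.shape[0]) + 1)
    rows = block[start:]
    if rows.size == 0:
        return 0.0, 0.0
    diffs = np.diff(rows.astype(np.int8), axis=1)
    rises = (diffs == 1).sum(axis=1)      # 0 -> 1 transitions in each row
    falls = (diffs == -1).sum(axis=1)     # 1 -> 0 transitions in each row
    return float(rises.mean()), float(falls.mean())
```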
  • In some embodiments, the operation 220 (for pixel line presence detection) and operation 230 are performed assuming that a longitudinal direction of a connected component of text is well-aligned (e.g. within an angular range of −5° to +5°) relative to the longitudinal direction of the block containing that connected component. Accordingly, in such embodiments, blocks in which the respective connected components are misaligned may not be marked as “pixel-line-present” and therefore may not be merged with their adjacent “pixel-line-absent” blocks.
  • Accordingly, in some embodiments, skew of one or more connected components relative to blocks that contain them may be identified by performing geometric rectification (e.g. re-sizing blocks) and skew correction (of the type performed in operation 240). Specifically, an operation 270 to detect and correct skew is performed in some embodiments as illustrated in FIG. 12 (after initialization operation 210), followed by operation 220. Operation 270 (FIG. 12) may be based on prompting for and receiving user input on tilt or skew in some embodiments, while other embodiments (described in the next paragraph, below) automatically search coarsely, followed by searching finely within a coarsely determined range of tilt angle. Hence, in several embodiments it is the skew-corrected blocks that are subjected to operation 220 (for pixel line presence detection) and operation 230, as described above. In some embodiments, operation 270 to determine skew also identifies presence of a line of pixels, and hence acts 221-223 are performed as steps within operation 270. The specific manner in which skew is corrected in operation 270 can differ between embodiments, and hence is not a critical aspect of many embodiments of operation 220.
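For the automatic variant of operation 270 mentioned above, a coarse-then-fine angle search might look like the following sketch; the angle grids, the search limits, and the scoring function are assumptions introduced only for illustration.

```python
# Illustrative sketch only: coarse search over candidate tilt angles followed by
# a fine search around the best coarse angle. score(angle) is an assumed callable
# that returns how line-like the projection profiles look at that angle.

def coarse_then_fine_search(score, coarse_step=5.0, fine_step=0.5, limit=20.0):
    n = int(limit / coarse_step)
    coarse_angles = [k * coarse_step for k in range(-n, n + 1)]
    best_coarse = max(coarse_angles, key=score)
    fine_angles = [best_coarse + k * fine_step for k in range(-10, 11)]
    return max(fine_angles, key=score)
```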
  • In some embodiments, processor 1013 is programmed to select blocks based on variance in stroke width and automatically detect skew of selected blocks as follows. Processor 1013 checks whether, at a candidate angle, one or more attributes of projection profiles meet at least one test for presence of a straight line of pixels, e.g. a test for presence of straight line 304 (FIG. 3A) of pixels in block 302 with a common binary value (e.g. pixels of a connected component). Some embodiments detect a peak of the histogram of block 302 at the candidate angle by comparing a highest value Np in the counters to a mean Nm of all values in the counters, e.g. by forming a ratio therebetween as Np/Nm, followed by comparing that ratio against a predetermined limit (e.g. ratio >1.75 indicates a peak). When a peak is found (e.g. the predetermined limit is exceeded by the ratio), a y-coordinate of the peak (see Hp in FIG. 3A) is compared with a height of the box Hb to determine whether the peak occurs in an upper 30% (or upper 20% or 40% in alternative embodiments) of the block. If so, the candidate angle is selected for use in a voting process, and a counter associated with the candidate angle is incremented. Processor 1013 repeats the process described in this paragraph with additional blocks of the image, and after a sufficient number of such votes have been counted (e.g. 10 votes), the candidate angle of the counter which has the largest number of votes is used as the skew angle, which is then used to automatically correct skew in each block (among the multiple blocks used in the skew computation).
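The voting procedure just described can be sketched as follows; rotate_block is an assumed helper (e.g. built on an image-rotation routine) and the candidate-angle grid is left to the caller, while the 1.75 peak ratio and the upper-30% test follow the text above.

```python
import numpy as np

# Illustrative sketch only: per-block test at a candidate angle, and voting over
# multiple blocks to estimate a common skew angle.

def block_votes_for(block, angle, rotate_block, peak_ratio=1.75, upper_frac=0.3):
    rotated = rotate_block(block, angle)        # 2-D binary array at this angle
    counts = rotated.sum(axis=1)                # per-row count of text pixels
    Np, Nm = counts.max(), counts.mean()
    if Nm == 0 or Np / Nm <= peak_ratio:
        return False                            # no pronounced peak at this angle
    Hp = int(np.argmax(counts))                 # y-coordinate of the peak
    return Hp < upper_frac * rotated.shape[0]   # peak lies in the upper 30%

def estimate_skew(blocks, candidate_angles, rotate_block):
    votes = {a: 0 for a in candidate_angles}
    for block in blocks:
        for a in candidate_angles:
            if block_votes_for(block, a, rotate_block):
                votes[a] += 1
    return max(votes, key=votes.get)            # angle with the most votes
```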
  • Classification of the type described herein in operation 250 may be implemented using machine learning methods (e.g. neural networks) as described on a webpage at http://en.wikipedia.org/wiki/Machine_learning. Other methods of classification in operation 250 that can also be used are described in, for example, the following, each of which is incorporated by reference herein in its entirety:
    • a. Matteo Pardo and Giorgio Sberveglieri, “Learning From Data: A Tutorial With Emphasis on Modern Pattern Recognition Methods,” IEEE Sensors Journal, vol. 2, no. 3, June 2002;
    • b. Lasse Holmstrom, Petri Koistinen, Jorma Laaksonen and Erkki Oja, “Neural and Statistical Classifiers—Taxonomy and Two Case Studies,” IEEE Transactions on Neural Networks, vol. 8, no. 1, January 1997.
  • Several operations and acts of the type described herein are implemented by a processor 1013 (FIG. 10) that is included in a mobile device 200 capable of identifying blocks of connected components in which a pixel line is present, followed by merger of adjacent blocks. Mobile device 200 may include a camera 1011 to generate an image 107 (or frames of a video, each of which may be image 107) of a scene in the real world. Mobile device 200 may further include sensors 1003, such as accelerometers, gyroscopes, GPS sensor or the like, which may be used to assist in determining the pose (including position and orientation) of the mobile device 200 relative to a real world scene.
  • Also, mobile device 200 may additionally include a graphics engine 1004, an image processor 1005, and a position processor. In addition to memory 1012, mobile device 200 may include one or more other types of memory such as flash memory (or SD card) 1008 and/or a hard disk and/or an optical disk (also called “secondary memory”) to store data and/or software for loading into memory 1012 (also called “main memory”) and/or for use by processor(s) 1013.
  • Mobile device 200 may further include a circuit 1010 (e.g. with wireless transmitter and receiver circuitry therein) and/or any other communication interfaces 1009. A transmitter in circuit 1010 may be an IR or RF transmitter, or a wireless transmitter enabled to transmit one or more signals over one or more types of wireless communication networks, such as the Internet, WiFi, a cellular wireless network or other network.
  • It should be understood that mobile device 200 may be any portable electronic device such as a cellular or other wireless communication device, personal communication system (PCS) device, personal navigation device (PND), Personal Information Manager (PIM), Personal Digital Assistant (PDA), laptop, camera, smartphone, tablet (such as iPad available from Apple Inc) or other suitable mobile platform that is capable of creating an augmented reality (AR) environment.
  • Note that input to mobile device 200 can be in video mode, where each frame in the video is equivalent to the image input which is used to identify connected components, and to compute a skew metric as described herein. Also, the image used to compute a skew metric as described herein can be fetched from a pre-stored file in a memory 1012 of mobile device 200.
  • A mobile device 200 of the type described above may include an optical character recognition (OCR) system as well as software that uses “computer vision” techniques. The mobile device 200 may further include, in a user interface, a microphone and a speaker (not labeled) in addition to touch-sensitive screen 1001 or normal screen 1002 for displaying captured images and any text/graphics to augment the images. Of course, mobile device 200 may include other elements unrelated to the present disclosure, such as a read-only-memory 1007 which may be used to store firmware for use by processor 1013.
  • Mobile device 200 of some embodiments includes, in memory 1012 (FIG. 10) computer instructions in the form of software 141 that is used to process an image 107 of a scene of the real world, as follows. Specifically, in such embodiments, a region identifier 141R (FIG. 10) is coupled to the locations in memory 1012 wherein image 107 is stored. Region identifier 141R (FIG. 10) is implemented in these embodiments by processor 1013 executing computer instructions to implement any method of identifying MSERs, thereby to generate a set of blocks 302 in memory 1012.
  • Furthermore, a pixel line presence tester 141T (FIG. 10) is implemented in several embodiments by processor 1013 executing computer instructions to use any test (e.g. by selecting the test based on user input) to check whether each block in the set of blocks 302 satisfies the test. As noted above, such a test, designed to identify presence of a line of pixels in each block, may be selected by identification of a script of a specific language. Pixel line presence tester 141T of some embodiments includes a binarization module (not shown) and a histogram generator (also not shown), for use in generating a profile of the number of pixels having a common binary value (relative to one another) and located along each row (or each column), depending on the language identified by user input 141U.
  • Moreover, a pixel line presence marker 141M (FIG. 10) is implemented in several embodiments by processor 1013 executing computer instructions to receive a result from pixel line presence tester 141T and respond by storing in memory 1012, e.g. a list 1501 of identifiers of blocks that are marked as “line-present” blocks. Blocks not identified in the list 1501 are treated, in some embodiments of software 141, as line-absent blocks.
  • Furthermore, an adjacent block identifier 141A (FIG. 10) is implemented in several embodiments by processor 1013 executing computer instructions to use a block that is marked in list 1501 of identifiers as being line-present, to identify from among the set of blocks 302 one or more blocks that are located adjacent to the line-present block, e.g. as another list 1502 of identifiers of adjacent blocks. Also, processor 1013 on execution of software 141 implements a block merging module 141B that uses the lists 1501 and 1502 to merge two blocks, and that then supplies a merged block to storage module 141S. As noted above, some embodiments of block merging module 141B implement the clustering rules 503, including projection overlap rules 503P, relative heights rules 503R, aspect ratio rules 503A and spacing rules 503S. Storage module 141S is implemented by execution of software 141 by processor 1013 to store the merged block in memory 1012, e.g. as a list of positions 504 that identify four corners of each merged block.
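To summarize how modules 141R, 141T, 141M, 141A, 141B and 141S relate to one another, the following sketch chains hypothetical placeholder functions in the same order; all function names are assumptions for illustration, and the actual software 141 need not be organized this way.

```python
# Illustrative sketch only: one possible chaining of the modules described above.
# The callables passed in are hypothetical stand-ins for 141R, 141T/141M, 141A,
# 141B (clustering rules 503) and 141S.

def merge_boxes(a, b):
    """Bounding box of the union of two (x, y, w, h) boxes."""
    x, y = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x, y, x2 - x, y2 - y)

def process_image(image, identify_mser_blocks, has_pixel_line,
                  find_adjacent, satisfies_clustering_rules, store):
    blocks = identify_mser_blocks(image)                     # region identifier 141R
    line_present = [b for b in blocks if has_pixel_line(b)]  # tester 141T + marker 141M
    merged = []
    for first in line_present:
        for second in find_adjacent(first, blocks):          # adjacent block identifier 141A
            if satisfies_clustering_rules(first, second):    # block merging module 141B
                merged.append(merge_boxes(first, second))
    store(merged)                                            # storage module 141S
    return merged
```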
  • In some embodiments, software 141 may include classifier software 552 that, when executed by processor 1013, classifies unmerged blocks and/or merged blocks as text or non-text (after binarization based on pixel values in image 107 to identify connected components therein), and any block classified as text is supplied to OCR software 551.
  • Although various aspects are illustrated in connection with specific embodiments for instructional purposes, the described embodiments are not limited thereto. For example, although mobile device 200 shown in FIG. 2 of some embodiments is a hand-held device, other embodiments are implemented by use of one or more parts that are stationary relative to a real world scene whose image is being captured by camera 1011.
  • As noted above, in some embodiments, when a limit on time spent in processing an image as per the method of FIG. 3B is exceeded, processor 1013 exits the method. On exiting in this manner, processor 1013 may then rotate the image through an angle (automatically, or based on user input, or a combination thereof), and then re-initiate performance of the method illustrated in FIG. 3B.
  • Moreover, in certain embodiments, processor 1013 may check for presence of a line of pixels oriented differently (e.g. located in a column in the block) depending on the characteristics of the language of text that may be included in the image.
  • Although a test for pixels arranged in a straight line has been described in some embodiments, as will be readily apparent in view of this detailed description, such a line need not be straight in other embodiments (e.g. a portion of the line inside a block may be wavy, or form an arc of a circle or ellipse).
  • Note that input to mobile device 200 can be in video mode, where each frame in the video is equivalent to the image input which is used to identify blocks of connected components and to check for overlap as described herein. Also, the image used to compute a skew metric as described herein can be fetched from a pre-stored file in a memory 1012 of mobile device 200.
  • Depending on the embodiment, various functions of the type described herein may be implemented in software (executed by one or more processors or processor cores) or in dedicated hardware circuitry or in firmware, or in any combination thereof. Accordingly, depending on the embodiment, any one or more of pixel line presence tester 141T, pixel line presence marker 141M, adjacent block identifier 141A, block merging module 141B (including computer instructions to implement the clustering rules 503, such as projection overlap rules 503P, relative heights rules 503R, aspect ratio rules 503A and spacing rules 503S), storage module 141S and verification module 141V illustrated in FIG. 10 and described above can, but need not necessarily, include one or more microprocessors, embedded processors, controllers, application specific integrated circuits (ASICs), digital signal processors (DSPs), and the like. The term processor is intended to describe the functions implemented by the system rather than specific hardware. Moreover, as used herein the term “memory” refers to any type of non-transitory computer storage medium, including long term, short term, or other memory associated with a mobile platform, and is not to be limited to any particular type of memory or number of memories, or type of media upon which information (such as software and clustering rules) may be stored.
  • Accordingly, in some embodiments, block merging module 141B (including computer instructions to implement the clustering rules 503) implements means for checking whether a first block and a second block that are adjacent to one another and do not overlap are such that a first projection of the first block on a straight line and a second projection of the second block on the straight line satisfy a test for overlap. Moreover, block merging module 141B of several such embodiments additionally implements means for merging the first block and the second block to obtain a merged block, based at least on an outcome of the test for overlap. In certain embodiments, storage module 141S implements means for storing in at least one memory, information related to the merged block, which information is received from block merging module 141B.
  • Hence, methodologies described herein may be implemented by various means depending upon the application. For example, these methodologies may be implemented in firmware in read-only-memory 1007 (FIG. 10) or software, or hardware or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof. For a firmware and/or software implementation, the methodologies may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein.
  • Any machine-readable medium tangibly embodying computer instructions may be used in implementing the methodologies described herein. For example, software 141 (FIG. 10) may include program codes stored in memory 1012 and executed by processor 1013. Memory 1012 may be implemented within or external to the processor 1013. If implemented in firmware and/or software, the functions may be stored as one or more computer instructions or code on non-transitory computer readable medium. Examples include non-transitory computer readable storage media encoded with a data structure (such as a sequence of images) and non-transitory computer readable media encoded with a computer program (such as software 141 that can be executed to perform the method of FIGS. 2, 3B, and 7).
  • One or more non-transitory computer readable media include physical computer storage media. A computer readable medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, non-transitory computer readable storage media can comprise RAM, ROM, Flash Memory, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store program code in the form of software instructions (also called “processor instructions” or “computer instructions”) or data structures and that can be accessed by a computer; disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of one or more non-transitory computer readable storage media.
  • Although certain aspects are illustrated in connection with specific embodiments for instructional purposes, the described embodiments are not limited thereto. Hence, although mobile device 200 shown in FIG. 10 of some embodiments is a hand-held device, other embodiments are implemented by use of form factors that are different, e.g. in certain other embodiments the mobile device 200 is a mobile platform (such as a tablet) while still other embodiments are implemented by use of any electronic device or system. Illustrative embodiments of such an electronic device or system may include multiple physical parts that intercommunicate wirelessly, such as a processor and a memory that are portions of a stationary computer, such as a lap-top computer, a desk-top computer, or a server computer communicating over one or more wireless link(s) with sensors and user input circuitry enclosed in a housing that is small enough to be held in a hand.
  • Various adaptations and modifications may be made without departing from the scope of the described embodiments. Therefore, the spirit and scope of the appended claims should not be limited to the foregoing description.

Claims (25)

1. A method to determine whether to merge blocks of regions of an image, the method comprising:
checking whether a first block and a second block that are located adjacent to one another and not overlapping one another in the image are such that a first projection of the first block on a straight line and a second projection of the second block on the straight line satisfy a test of overlap;
wherein the first block comprises a first region in the image with a first plurality of pixels that are contiguous with one another and comprising a first local extrema of intensity in the image;
wherein the second block comprises a second region in the image with a second plurality of pixels that are contiguous with one another and comprising a second local extrema of intensity in the image;
merging a first set of positions indicative of the first region in the first block with a second set of positions indicative of the second region in the second block to obtain a merged set of positions in a merged block, based at least on an outcome of the test of overlap;
wherein the first region and the second region do not contact one another in the merged block;
wherein the merged block comprises at least the first plurality of pixels in the first block and the second plurality of pixels in the second block; and
storing information related to the merged block in a memory;
wherein at least one of the checking, the merging and the storing are performed by one or more processors.
2. The method of claim 1 wherein:
the second projection is smaller than the first projection; and
the test of overlap is satisfied when the second projection is entirely overlapped by the first projection.
3. The method of claim 1 wherein the test of overlap is hereinafter first test, and the method further comprises:
checking whether a first additional projection and a second additional projection of the first block and the second block respectively on an additional straight line perpendicular to the straight line satisfy a second test;
wherein the merging is further based on at least the outcome of the second test.
4. The method of claim 3 wherein:
the first test is satisfied when a portion of the second projection overlapped by the first projection, is less than a first predetermined percentage; and
the second test is satisfied when the portion of the second additional projection overlapped by the first additional projection, is greater than a second predetermined percentage.
5. The method of claim 1 further comprising:
checking whether an additional test comparing a height of the first block to the height of the second block is satisfied;
wherein the height of the first block and the height of the second block are both in a first direction that is perpendicular to a second direction of the straight line;
wherein the merging is further based on at least the outcome of the additional test.
6. The method of claim 5 wherein:
the second projection is smaller than the first projection; and
the additional test is satisfied when the height of the second block is between two predetermined fractions of the height of the first block.
7. The method of claim 1 further comprising:
checking whether an additional test on an aspect ratio of the second block is satisfied, when the second projection is smaller than the first projection;
wherein the merging is further based on at least the outcome of the additional test.
8. The method of claim 1 further comprising:
checking whether an additional test is satisfied, on a space between the first block and the second block in a direction perpendicular to the straight line;
wherein the merging is further based on at least the outcome of the additional test.
9. The method of claim 8 wherein:
when the second block is located above the first block, the additional test compares a distance of the space to a first predetermined limit thereon; and
when the second block is located below the first block, the additional test compares the distance of the space to a second predetermined limit thereon.
10. The method of claim 1 wherein:
the second projection is smaller than the first projection;
the method further comprises checking whether an additional test is satisfied, for presence of the pixels of a common binary value in a binarized version of the image along an additional straight line passing through at least the first block; and
wherein the merging is further based on at least the outcome of the additional test.
11. The method of claim 1 wherein:
prior to the checking, the first block and the second block are not classified as text or non-text;
the method further comprises verification of the merged block, followed by classifying the merged block as text or non-text.
12. The method of claim 11 wherein:
the verification comprises additionally checking whether an additional test is satisfied by the merged block, for presence of the pixels with a common binary value along another straight line passing through the merged block.
13. The method of claim 12 wherein the additionally checking comprises:
using a peak in a profile obtained by projection of intensities of the pixels in the merged block along a predetermined direction.
14. The method of claim 13 wherein the additionally checking comprises:
using a location of the peak relative to a span of the profile.
15. The method of claim 11 wherein the verification comprises:
binarizing each pixel in the merged block by assigning one of two binary values, based on comparison of an intensity of the pixel with a threshold determined by use of the pixels in the merged block; and
using an attribute of the merged block indicative of a ratio of (A) a mean of a number of transitions in a predetermined direction from a first binary value to a second binary value in each row in a set of rows in the merged block and (B) a width of the merged block.
16. The method of claim 15 wherein the verification further comprises:
using another attribute of the merged block;
wherein said another attribute is indicative of another mean of another number of transitions in the predetermined direction, from the second binary value to the first binary value in said row in the set of rows.
17. The method of claim 11 wherein the classifying comprises:
using an attribute of the merged block indicative of a ratio of (A) a number of vertical lines in the merged block and (B) a length of the merged block.
18. A mobile device comprising:
a camera;
a memory operatively connected to the camera to receive at least an image therefrom;
at least one processor operatively connected to the memory to execute a plurality of computer instructions stored in the memory, to supply information related to a merged block, the merged block being obtained by the at least one processor executing the plurality of computer instructions to merge a first block with a second block that is located adjacent to and not overlapping the first block;
wherein the plurality of computer instructions cause the at least one processor to check whether a first projection of the first block on a straight line and a second projection of the second block on the straight line satisfy a test for overlap; and
wherein the first block and the second block comprise a first region and a second region in the image having pixels contiguous with one another and comprising a local extrema of intensity in the image.
19. The mobile device of claim 18 wherein:
the second projection is smaller than the first projection; and
the test is satisfied when the second projection is entirely overlapped by the first projection.
20. The mobile device of claim 19 wherein the plurality of computer instructions when executed cause the at least one processor to:
further check whether a first additional projection and a second additional projection of the first block and the second block respectively on an additional straight line perpendicular to the straight line satisfy an additional test;
wherein execution of the plurality of computer instructions to merge is based on at least an outcome of the additional test.
21. One or more non-transitory computer readable storage media comprising computer instructions, which when executed in a handheld device, cause one or more processors in the handheld device to perform operations, the computer instructions comprising:
first instructions to check whether a first block and a second block located adjacent to one another in an image and not overlapping one another are such that a first projection of the first block on a straight line and a second projection of the second block on the straight line satisfy a test for overlap; and
wherein the first block and the second block comprise a first region and a second region in the image having pixels contiguous with one another and comprising a local extrema of intensity in the image;
second instructions to merge the first block and the second block to obtain a merged block, based at least on an outcome of the test;
wherein pixels in the merged block comprise at least a first plurality of pixels in the first block and a second plurality of pixels in the second block; and
third instructions to store information related to the merged block in a memory;
wherein one or more of the first instructions, the second instructions, and the third instructions are to be executed by at least one processor among the one or more processors.
22. The one or more non-transitory computer readable storage media of claim 21 wherein:
the second projection is smaller than the first projection; and
the test is satisfied when the second projection is entirely overlapped by the first projection.
23. The one or more non-transitory computer readable storage media of claim 22 wherein the computer instructions further comprise:
instructions to further check whether a first additional projection and a second additional projection of the first block and the second block respectively on an additional straight line perpendicular to the straight line satisfy an additional test;
wherein execution of the second instructions is based on at least the outcome of the additional test.
24. An apparatus for identifying regions of text, the apparatus comprising:
a memory storing an image of an environment outside the apparatus;
means, coupled to the memory, for checking whether a first block and a second block that are adjacent to one another and do not overlap are such that a first projection of the first block on a straight line and a second projection of the second block on the straight line satisfy a test for overlap; and
wherein the first block and the second block comprise a first region and a second region in the image having pixels contiguous with one another and comprising a local extrema of intensity in the image;
means for merging the first block and the second block to obtain a merged block, based at least on an outcome of the test;
wherein pixels in the merged block comprise at least a first plurality of pixels in the first block and a second plurality of pixels in the second block; and
means for storing in at least one memory, information related to the merged block.
25. The apparatus of claim 24 wherein:
the second projection is smaller than the first projection; and
the test is satisfied when the second projection is entirely overlapped by the first projection.
US13/748,574 2012-01-26 2013-01-23 Rules for merging blocks of connected components in natural images Abandoned US20130194448A1 (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
US13/748,574 US20130194448A1 (en) 2012-01-26 2013-01-23 Rules for merging blocks of connected components in natural images
PCT/US2013/023012 WO2013112753A1 (en) 2012-01-26 2013-01-24 Rules for merging blocks of connected components in natural images
PCT/US2013/023003 WO2013112746A1 (en) 2012-01-26 2013-01-24 Detecting and correcting skew in regions of text in natural images
PCT/US2013/022994 WO2013112738A1 (en) 2012-01-26 2013-01-24 Identifying regions of text to merge in a natural image or video frame
US13/791,188 US9064191B2 (en) 2012-01-26 2013-03-08 Lower modifier detection and extraction from devanagari text images to improve OCR performance
US13/831,237 US9076242B2 (en) 2012-07-19 2013-03-14 Automatic correction of skew in natural images and video
IN2654MUN2014 IN2014MN02654A (en) 2012-07-19 2013-07-18
PCT/US2013/051144 WO2014015178A1 (en) 2012-07-19 2013-07-18 Lower modifier detection and extraction from devanagari text images to improve ocr performance

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US201261590966P 2012-01-26 2012-01-26
US201261590973P 2012-01-26 2012-01-26
US201261590983P 2012-01-26 2012-01-26
US201261673703P 2012-07-19 2012-07-19
US13/748,574 US20130194448A1 (en) 2012-01-26 2013-01-23 Rules for merging blocks of connected components in natural images

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/748,539 Continuation-In-Part US9053361B2 (en) 2012-01-26 2013-01-23 Identifying regions of text to merge in a natural image or video frame

Publications (1)

Publication Number Publication Date
US20130194448A1 true US20130194448A1 (en) 2013-08-01

Family

ID=48869893

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/748,574 Abandoned US20130194448A1 (en) 2012-01-26 2013-01-23 Rules for merging blocks of connected components in natural images
US13/748,539 Expired - Fee Related US9053361B2 (en) 2012-01-26 2013-01-23 Identifying regions of text to merge in a natural image or video frame
US13/748,562 Active 2033-03-24 US8831381B2 (en) 2012-01-26 2013-01-23 Detecting and correcting skew in regions of text in natural images

Family Applications After (2)

Application Number Title Priority Date Filing Date
US13/748,539 Expired - Fee Related US9053361B2 (en) 2012-01-26 2013-01-23 Identifying regions of text to merge in a natural image or video frame
US13/748,562 Active 2033-03-24 US8831381B2 (en) 2012-01-26 2013-01-23 Detecting and correcting skew in regions of text in natural images

Country Status (2)

Country Link
US (3) US20130194448A1 (en)
WO (3) WO2013112746A1 (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130051681A1 (en) * 2011-08-29 2013-02-28 Chirag Jain System and method for script and orientation detection of images
US20130266176A1 (en) * 2012-04-10 2013-10-10 Chirag Jain System and method for script and orientation detection of images using artificial neural networks
US20140023271A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Identifying A Maximally Stable Extremal Region (MSER) In An Image By Skipping Comparison Of Pixels In The Region
US20140146042A1 (en) * 2012-11-29 2014-05-29 Samsung Electronics Co., Ltd. Apparatus and method for processing primitive in three-dimensional (3d) graphics rendering system
KR20140070316A (en) * 2012-11-29 2014-06-10 삼성전자주식회사 Method and apparatus for processing primitive in 3 dimensional graphics rendering system
US8831381B2 (en) 2012-01-26 2014-09-09 Qualcomm Incorporated Detecting and correcting skew in regions of text in natural images
CN104199841A (en) * 2014-08-06 2014-12-10 武汉图歌信息技术有限责任公司 Video editing method for generating animation through pictures and splicing and composing animation and video clips
US9047540B2 (en) 2012-07-19 2015-06-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9064191B2 (en) 2012-01-26 2015-06-23 Qualcomm Incorporated Lower modifier detection and extraction from devanagari text images to improve OCR performance
US9076242B2 (en) 2012-07-19 2015-07-07 Qualcomm Incorporated Automatic correction of skew in natural images and video
US9141874B2 (en) 2012-07-19 2015-09-22 Qualcomm Incorporated Feature extraction and use with a probability density function (PDF) divergence metric
US9262699B2 (en) 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
US20160140409A1 (en) * 2014-11-13 2016-05-19 Alcatel Lucent Text classification based on joint complexity and compressed sensing
US9384391B2 (en) * 2014-10-03 2016-07-05 Xerox Corporation Methods and systems for processing documents
CN105844275A (en) * 2016-03-25 2016-08-10 北京云江科技有限公司 Method for positioning text lines in text image
EP3149658A4 (en) * 2014-05-28 2017-11-29 Gracenote Inc. Text detection in video
CN108170806A (en) * 2017-12-28 2018-06-15 东软集团股份有限公司 Sensitive word detection filter method, device and computer equipment
WO2018125926A1 (en) * 2016-12-27 2018-07-05 Datalogic Usa, Inc Robust string text detection for industrial optical character recognition
US10095946B2 (en) * 2016-07-07 2018-10-09 Lockheed Martin Corporation Systems and methods for strike through detection
EP3265960A4 (en) * 2015-03-04 2018-10-10 Au10tix Limited Methods for categorizing input images for use e.g. as a gateway to authentication systems
US20190370594A1 (en) * 2018-06-05 2019-12-05 Microsoft Technology Licensing, Llc Alignment of user input on a screen
CN111768345A (en) * 2020-05-12 2020-10-13 北京奇艺世纪科技有限公司 Method, device and equipment for correcting back image of identity card and storage medium
US10867204B2 (en) * 2019-04-30 2020-12-15 Hulu, LLC Frame level and video level text detection in video
CN115658778A (en) * 2022-07-27 2023-01-31 重庆忽米网络科技有限公司 Excel data source-based data processing method for visual application creation

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012037721A1 (en) * 2010-09-21 2012-03-29 Hewlett-Packard Development Company,L.P. Handwritten character font library
US9251144B2 (en) * 2011-10-19 2016-02-02 Microsoft Technology Licensing, Llc Translating language characters in media content
US9171204B2 (en) 2012-12-12 2015-10-27 Qualcomm Incorporated Method of perspective correction for devanagari text
US9070183B2 (en) 2013-06-28 2015-06-30 Google Inc. Extracting card data with linear and nonlinear transformations
US9245192B2 (en) * 2013-09-20 2016-01-26 Here Global B.V. Ad collateral detection
US9329692B2 (en) 2013-09-27 2016-05-03 Microsoft Technology Licensing, Llc Actionable content displayed on a touch screen
KR20150060338A (en) * 2013-11-26 2015-06-03 삼성전자주식회사 Electronic device and method for recogniting character in electronic device
US9460357B2 (en) 2014-01-08 2016-10-04 Qualcomm Incorporated Processing text images with shadows
JP6270597B2 (en) * 2014-04-04 2018-01-31 キヤノン株式会社 Image forming apparatus
US9342830B2 (en) 2014-07-15 2016-05-17 Google Inc. Classifying open-loop and closed-loop payment cards based on optical character recognition
US9576196B1 (en) 2014-08-20 2017-02-21 Amazon Technologies, Inc. Leveraging image context for improved glyph classification
US9418283B1 (en) * 2014-08-20 2016-08-16 Amazon Technologies, Inc. Image processing using multiple aspect ratios
US9600739B2 (en) * 2014-09-10 2017-03-21 Khalifa University of Science, Technology & Research Architecture for real-time extraction of extended maximally stable extremal regions (X-MSERs)
US9740947B2 (en) 2014-09-10 2017-08-22 Khalifa University of Science and Technology Hardware architecture for linear-time extraction of maximally stable extremal regions (MSERs)
US9489578B2 (en) * 2014-09-10 2016-11-08 Khalifa University Of Science, Technology And Research Hardware architecture for real-time extraction of maximally stable extremal regions (MSERs)
US20160074707A1 (en) * 2014-09-17 2016-03-17 Cambia Health Solutions, Inc. Systems and methods for achieving and maintaining behavioral fitness
US9503612B1 (en) * 2014-10-20 2016-11-22 Evernote Corporation Glare mitigation for dynamic document scanning
US20160196676A1 (en) * 2015-01-02 2016-07-07 Monotype Imaging Inc. Using Character Classes for Font Selection
US10456027B2 (en) 2017-04-26 2019-10-29 Khalifa University of Science and Technology Architecture and method for maximally stable extremal regions (MSERs)-based exudates detection in fundus images for diabetic retinopathy
US10163007B2 (en) * 2017-04-27 2018-12-25 Intuit Inc. Detecting orientation of textual documents on a live camera feed
JP2019097050A (en) * 2017-11-24 2019-06-20 京セラドキュメントソリューションズ株式会社 Image reading device and image reading program
US20190188513A1 (en) * 2017-12-20 2019-06-20 Datalogic Usa Inc. Systems and methods for object deskewing using stereovision or structured light
US20190205634A1 (en) * 2017-12-29 2019-07-04 Idemia Identity & Security USA LLC Capturing Digital Images of Documents
US10373022B1 (en) * 2018-02-28 2019-08-06 Konica Minolta Laboratory U.S.A., Inc. Text image processing using stroke-aware max-min pooling for OCR system employing artificial neural network
US10339622B1 (en) * 2018-03-02 2019-07-02 Capital One Services, Llc Systems and methods for enhancing machine vision object recognition through accumulated classifications
US10824808B2 (en) * 2018-11-20 2020-11-03 Sap Se Robust key value extraction
CN109933756B (en) * 2019-03-22 2022-04-15 腾讯科技(深圳)有限公司 Image file transferring method, device and equipment based on OCR (optical character recognition), and readable storage medium
JP7406884B2 (en) * 2019-06-27 2023-12-28 キヤノン株式会社 Information processing device, program and control method
CN112825079B (en) * 2019-11-20 2024-04-05 北京沃东天骏信息技术有限公司 Information display method and device
US10976974B1 (en) 2019-12-23 2021-04-13 Ricoh Company, Ltd. Defect size detection mechanism
JP7480536B2 (en) * 2020-03-12 2024-05-10 富士フイルムビジネスイノベーション株式会社 Document processing device and program
US11373294B2 (en) 2020-09-28 2022-06-28 Ricoh Company, Ltd. Print defect detection mechanism
US11694376B2 (en) * 2020-10-19 2023-07-04 Adobe Inc. Intuitive 3D transformations for 2D graphics

Family Cites Families (109)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3710321A (en) 1971-01-18 1973-01-09 Ibm Machine recognition of lexical symbols
US4654875A (en) 1983-05-23 1987-03-31 The Research Foundation Of State University Of New York System to achieve automatic recognition of linguistic strings
US5459739A (en) 1992-03-18 1995-10-17 Oclc Online Computer Library Center, Incorporated Merging three optical character recognition outputs for improved precision using a minimum edit distance function
US5335290A (en) * 1992-04-06 1994-08-02 Ricoh Corporation Segmentation of text, picture and lines of a document image
US5321768A (en) 1992-09-22 1994-06-14 The Research Foundation, State University Of New York At Buffalo System for recognizing handwritten character strings containing overlapping and/or broken characters
WO1994027251A1 (en) 1993-05-18 1994-11-24 Massachusetts Institute Of Technology Automated reading system and method
ATE196205T1 (en) * 1993-06-30 2000-09-15 Ibm METHOD FOR SEGMENTING IMAGES AND CLASSIFYING IMAGE ELEMENTS FOR DOCUMENT PROCESSING
JPH07182465A (en) 1993-12-22 1995-07-21 Hitachi Ltd Character recognition method
JP3338537B2 (en) 1993-12-27 2002-10-28 株式会社リコー Image tilt detector
US5519786A (en) 1994-08-09 1996-05-21 Trw Inc. Method and apparatus for implementing a weighted voting scheme for multiple optical character recognition systems
US5805747A (en) 1994-10-04 1998-09-08 Science Applications International Corporation Apparatus and method for OCR character and confidence determination using multiple OCR devices
US5764799A (en) 1995-06-26 1998-06-09 Research Foundation Of State Of State Of New York OCR method and apparatus using image equivalents
JPH0916598A (en) 1995-07-03 1997-01-17 Fujitsu Ltd System and method for character string correction using error pattern
US5844991A (en) 1995-08-07 1998-12-01 The Regents Of The University Of California Script identification from images using cluster-based templates
EP0767581B1 (en) * 1995-09-29 2002-05-22 Hewlett-Packard Company, A Delaware Corporation Image processing apparatus and method
US5835633A (en) 1995-11-20 1998-11-10 International Business Machines Corporation Concurrent two-stage multi-network optical character recognition system
IL121457A (en) 1997-08-03 2004-06-01 Guru Internat Inc Computerized dictionary and thesaurus applications
US7738015B2 (en) 1997-10-09 2010-06-15 Fotonation Vision Limited Red-eye filter method and apparatus
US5978443A (en) 1997-11-10 1999-11-02 General Electric Company Automated removal of background regions from radiographic images
US6473517B1 (en) 1999-09-15 2002-10-29 Siemens Corporate Research, Inc. Character segmentation method for vehicle license plate recognition
US6674919B1 (en) 1999-09-21 2004-01-06 Matsushita Electric Industrial Co., Ltd. Method for determining the skew angle of a two-dimensional barcode
US6687421B1 (en) * 2000-03-17 2004-02-03 International Business Machines Corporation Skew detection of text in a noisy digitized image
US6674900B1 (en) 2000-03-29 2004-01-06 Matsushita Electric Industrial Co., Ltd. Method for extracting titles from digital images
US6954795B2 (en) 2000-04-05 2005-10-11 Matsushita Electric Industrial Co., Ltd. Transmission/reception system and method for data broadcast, and transmission apparatus for data broadcast
US6678415B1 (en) 2000-05-12 2004-01-13 Xerox Corporation Document image decoding using an integrated stochastic language model
US6738512B1 (en) 2000-06-19 2004-05-18 Microsoft Corporation Using shape suppression to identify areas of images that include particular shapes
US6920247B1 (en) 2000-06-27 2005-07-19 Cardiff Software, Inc. Method for optical recognition of a multi-language set of letters with diacritics
US7031553B2 (en) 2000-09-22 2006-04-18 Sri International Method and apparatus for recognizing text in an image sequence of scene imagery
US7738706B2 (en) 2000-09-22 2010-06-15 Sri International Method and apparatus for recognition of symbols in images of three-dimensional scenes
JP4843867B2 (en) 2001-05-10 2011-12-21 ソニー株式会社 Document processing apparatus, document processing method, document processing program, and recording medium
US6873732B2 (en) * 2001-07-09 2005-03-29 Xerox Corporation Method and apparatus for resolving perspective distortion in a document image and for calculating line sums in images
US6915025B2 (en) 2001-11-27 2005-07-05 Microsoft Corporation Automatic image orientation detection based on classification of low-level image features
US7142728B2 (en) 2002-05-17 2006-11-28 Science Applications International Corporation Method and system for extracting information from a document
US7142727B2 (en) 2002-12-17 2006-11-28 Xerox Corporation Non-iterative method of calculating image skew
WO2004077358A1 (en) 2003-02-28 2004-09-10 Cedara Software Corporation Image region segmentation system and method
JP2004280334A (en) 2003-03-14 2004-10-07 Pfu Ltd Image reading device
KR100977713B1 (en) 2003-03-15 2010-08-24 삼성전자주식회사 Device and method for pre-processing in order to recognize characters in images
US7263223B2 (en) * 2003-05-05 2007-08-28 Hewlett-Packard Development Company, L.P. Image manipulation according to pixel type
US7403661B2 (en) 2004-02-12 2008-07-22 Xerox Corporation Systems and methods for generating high compression image data files having multiple foreground planes
US7336813B2 (en) 2004-04-26 2008-02-26 International Business Machines Corporation System and method of determining image skew using connected components
TWI284288B (en) * 2004-06-04 2007-07-21 Benq Corp Text region recognition method, storage medium and system
US7450268B2 (en) * 2004-07-02 2008-11-11 Hewlett-Packard Development Company, L.P. Image reproduction
JP4713107B2 (en) 2004-08-20 2011-06-29 日立オムロンターミナルソリューションズ株式会社 Character string recognition method and device in landscape
US8749839B2 (en) 2005-03-24 2014-06-10 Kofax, Inc. Systems and methods of processing scanned data
JP2007004584A (en) 2005-06-24 2007-01-11 Toshiba Corp Information processor
US7783117B2 (en) 2005-08-12 2010-08-24 Seiko Epson Corporation Systems and methods for generating background and foreground images for document compression
WO2007028166A2 (en) 2005-09-02 2007-03-08 Blindsight, Inc. A system and method for detecting text in real-world color images
KR100745753B1 (en) 2005-11-21 2007-08-02 삼성전자주식회사 Apparatus and method for detecting a text area of a image
US7949186B2 (en) 2006-03-15 2011-05-24 Massachusetts Institute Of Technology Pyramid match kernel and related techniques
EP1840798A1 (en) 2006-03-27 2007-10-03 Sony Deutschland Gmbh Method for classifying digital image data
US7778491B2 (en) 2006-04-10 2010-08-17 Microsoft Corporation Oblique image stitching
US7734065B2 (en) 2006-07-06 2010-06-08 Abbyy Software Ltd. Method of text information recognition from a graphical file with use of dictionaries and other supplementary data
US7724957B2 (en) 2006-07-31 2010-05-25 Microsoft Corporation Two tiered text recognition
US8285082B2 (en) 2006-09-01 2012-10-09 Getty Images, Inc. Automatic identification of digital content related to a block of text, such as a blog entry
JP4909216B2 (en) 2006-09-13 2012-04-04 株式会社キーエンス Character segmentation device, method and program
US20080112614A1 (en) 2006-11-14 2008-05-15 Siemens Corporate Research, Inc. Histogram tile map for gpu based histogram computation
JP4861845B2 (en) * 2007-02-05 2012-01-25 富士通株式会社 Telop character extraction program, recording medium, method and apparatus
JP4886850B2 (en) 2007-06-28 2012-02-29 パナソニック株式会社 Image processing apparatus, image processing method, and program
US7909248B1 (en) 2007-08-17 2011-03-22 Evolution Robotics Retail, Inc. Self checkout with visual recognition
US8014603B2 (en) 2007-08-30 2011-09-06 Xerox Corporation System and method for characterizing handwritten or typed words in a document
GB2453366B (en) 2007-10-04 2011-04-06 Toshiba Res Europ Ltd Automatic speech recognition method and apparatus
KR101589711B1 (en) 2007-10-12 2016-01-28 도요타 모터 유럽 Methods and systems for processing of video data
DE102007052622A1 (en) 2007-11-05 2009-05-07 T-Mobile International Ag Method for image analysis, in particular for a mobile radio device
US8009928B1 (en) 2008-01-23 2011-08-30 A9.Com, Inc. Method and system for detecting and recognizing text in images
JP5125573B2 (en) 2008-02-12 2013-01-23 富士通株式会社 Region extraction program, character recognition program, and character recognition device
GB2458278A (en) 2008-03-11 2009-09-16 Geoffrey Cross A method of recognising signs in images taken from video data
US8064729B2 (en) 2008-04-03 2011-11-22 Seiko Epson Corporation Image skew detection apparatus and methods
US20090317003A1 (en) 2008-06-22 2009-12-24 Andre Heilper Correcting segmentation errors in ocr
US8331680B2 (en) 2008-06-23 2012-12-11 International Business Machines Corporation Method of gray-level optical segmentation and isolation using incremental connected components
US8498487B2 (en) 2008-08-20 2013-07-30 Sri International Content-based matching of videos using local spatio-temporal fingerprints
US8160393B2 (en) 2008-09-18 2012-04-17 Certifi Media Inc. Method for image skew detection
US8559723B2 (en) 2008-09-29 2013-10-15 Microsoft Corporation Letter model and character bigram based language model for handwriting recognition
EP2189926B1 (en) 2008-11-21 2012-09-19 beyo GmbH Method for providing camera-based services using a portable communication device of a user and portable communication device of a user
KR101035744B1 (en) 2008-12-08 2011-05-20 삼성전자주식회사 Apparatus and method for character recognition using camera
WO2010067203A2 (en) 2008-12-08 2010-06-17 Georgios Stylianou Vision assistance using mobile telephone
US8295593B2 (en) 2009-01-07 2012-10-23 Seiko Epson Corporation Method of detecting red-eye objects in digital images using color, structural, and geometric characteristics
US8290302B2 (en) 2009-01-30 2012-10-16 Xerox Corporation Method and system for skew detection of a scanned document using connected components analysis
GB0904274D0 (en) 2009-03-12 2009-04-22 Siemens Medical Solutions Connecting regions of interest in 4D in dynamic, gated and longitudinal studies
JP5075861B2 (en) 2009-03-16 2012-11-21 株式会社東芝 Image processing apparatus and image processing method
JP4772888B2 (en) 2009-03-27 2011-09-14 シャープ株式会社 Image processing apparatus, image forming apparatus, image processing method, program, and recording medium thereof
US8855420B2 (en) 2009-04-09 2014-10-07 France Telecom Descriptor determination in a multimedia content
US8111911B2 (en) * 2009-04-27 2012-02-07 King Abdulaziz City For Science And Technology System and methods for arabic text recognition based on effective arabic text feature extraction
US20110052094A1 (en) 2009-08-28 2011-03-03 Chunyu Gao Skew Correction for Scanned Japanese/English Document Images
US8520983B2 (en) 2009-10-07 2013-08-27 Google Inc. Gesture-based selective text recognition
KR101220709B1 (en) 2010-02-03 2013-01-10 삼성전자주식회사 Search apparatus and method for document mixing hangeul and chinese characters using electronic dictionary
US8526732B2 (en) * 2010-03-10 2013-09-03 Microsoft Corporation Text enhancement of a textual image undergoing optical character recognition
US8391602B2 (en) 2010-04-08 2013-03-05 University Of Calcutta Character recognition
US8571270B2 (en) 2010-05-10 2013-10-29 Microsoft Corporation Segmentation of a word bitmap into individual characters or glyphs during an OCR process
US20110280484A1 (en) 2010-05-12 2011-11-17 Microsoft Corporation Feature design for hmm-based handwriting recognition
US8194983B2 (en) 2010-05-13 2012-06-05 Hussein Khalid Al-Omari Method and system for preprocessing an image for optical character recognition
US8600167B2 (en) 2010-05-21 2013-12-03 Hand Held Products, Inc. System for capturing a document in an image signal
US8189961B2 (en) 2010-06-09 2012-05-29 Microsoft Corporation Techniques in optical character recognition
JP5716328B2 (en) 2010-09-14 2015-05-13 株式会社リコー Information processing apparatus, information processing method, and information processing program
US20120092329A1 (en) 2010-10-13 2012-04-19 Qualcomm Incorporated Text-based 3d augmented reality
US8768062B2 (en) 2010-11-09 2014-07-01 Tata Consulting Services Limited Online script independent recognition of handwritten sub-word units and words
US8542926B2 (en) * 2010-11-19 2013-09-24 Microsoft Corporation Script-agnostic text reflow for document images
AU2010257298B2 (en) 2010-12-17 2014-01-23 Canon Kabushiki Kaisha Finding text regions from coloured image independent of colours
US8942484B2 (en) 2011-09-06 2015-01-27 Qualcomm Incorporated Text detection using image regions
US8611662B2 (en) 2011-11-21 2013-12-17 Nokia Corporation Text detection using multi-layer connected components with histograms
US9064191B2 (en) 2012-01-26 2015-06-23 Qualcomm Incorporated Lower modifier detection and extraction from devanagari text images to improve OCR performance
US20130194448A1 (en) 2012-01-26 2013-08-01 Qualcomm Incorporated Rules for merging blocks of connected components in natural images
EP2677464B1 (en) 2012-05-16 2018-05-02 IMEC vzw Feature detection in numeric data
US9053372B2 (en) 2012-06-28 2015-06-09 Honda Motor Co., Ltd. Road marking detection and recognition
US9014480B2 (en) 2012-07-19 2015-04-21 Qualcomm Incorporated Identifying a maximally stable extremal region (MSER) in an image by skipping comparison of pixels in the region
US9047540B2 (en) 2012-07-19 2015-06-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9262699B2 (en) 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
US9141874B2 (en) 2012-07-19 2015-09-22 Qualcomm Incorporated Feature extraction and use with a probability density function (PDF) divergence metric
US20140023275A1 (en) 2012-07-19 2014-01-23 Qualcomm Incorporated Redundant aspect ratio decoding of devanagari characters
US9317764B2 (en) 2012-12-13 2016-04-19 Qualcomm Incorporated Text image quality based feedback for improving OCR

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8903175B2 (en) * 2011-08-29 2014-12-02 Hewlett-Packard Development Company, L.P. System and method for script and orientation detection of images
US20130051681A1 (en) * 2011-08-29 2013-02-28 Chirag Jain System and method for script and orientation detection of images
US8831381B2 (en) 2012-01-26 2014-09-09 Qualcomm Incorporated Detecting and correcting skew in regions of text in natural images
US9064191B2 (en) 2012-01-26 2015-06-23 Qualcomm Incorporated Lower modifier detection and extraction from devanagari text images to improve OCR performance
US9053361B2 (en) 2012-01-26 2015-06-09 Qualcomm Incorporated Identifying regions of text to merge in a natural image or video frame
US8891822B2 (en) * 2012-04-10 2014-11-18 Hewlett-Packard Development Company, L.P. System and method for script and orientation detection of images using artificial neural networks
US20130266176A1 (en) * 2012-04-10 2013-10-10 Chirag Jain System and method for script and orientation detection of images using artificial neural networks
US20140023271A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Identifying A Maximally Stable Extremal Region (MSER) In An Image By Skipping Comparison Of Pixels In The Region
US9183458B2 (en) * 2012-07-19 2015-11-10 Qualcomm Incorporated Parameter selection and coarse localization of interest regions for MSER processing
US9639783B2 (en) 2012-07-19 2017-05-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US9076242B2 (en) 2012-07-19 2015-07-07 Qualcomm Incorporated Automatic correction of skew in natural images and video
US9047540B2 (en) 2012-07-19 2015-06-02 Qualcomm Incorporated Trellis based word decoder with reverse pass
US20140023270A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Parameter Selection and Coarse Localization of Interest Regions for MSER Processing
US9262699B2 (en) 2012-07-19 2016-02-16 Qualcomm Incorporated Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
US9014480B2 (en) * 2012-07-19 2015-04-21 Qualcomm Incorporated Identifying a maximally stable extremal region (MSER) in an image by skipping comparison of pixels in the region
US9141874B2 (en) 2012-07-19 2015-09-22 Qualcomm Incorporated Feature extraction and use with a probability density function (PDF) divergence metric
US9858709B2 (en) * 2012-11-29 2018-01-02 Samsung Electronics Co., Ltd. Apparatus and method for processing primitive in three-dimensional (3D) graphics rendering system
US20140146042A1 (en) * 2012-11-29 2014-05-29 Samsung Electronics Co., Ltd. Apparatus and method for processing primitive in three-dimensional (3d) graphics rendering system
KR102059578B1 (en) * 2012-11-29 2019-12-27 Samsung Electronics Co., Ltd. Method and apparatus for processing primitive in 3 dimensional graphics rendering system
KR20140070316A (en) * 2012-11-29 2014-06-10 Samsung Electronics Co., Ltd. Method and apparatus for processing primitive in 3 dimensional graphics rendering system
US9876982B2 (en) 2014-05-28 2018-01-23 Gracenote, Inc. Text detection in video
EP3149658A4 (en) * 2014-05-28 2017-11-29 Gracenote Inc. Text detection in video
CN104199841A (en) * 2014-08-06 2014-12-10 武汉图歌信息技术有限责任公司 Video editing method for generating animation through pictures and splicing and composing animation and video clips
US9384391B2 (en) * 2014-10-03 2016-07-05 Xerox Corporation Methods and systems for processing documents
US20160140409A1 (en) * 2014-11-13 2016-05-19 Alcatel Lucent Text classification based on joint complexity and compressed sensing
EP3265960A4 (en) * 2015-03-04 2018-10-10 Au10tix Limited Methods for categorizing input images for use e.g. as a gateway to authentication systems
US10956744B2 (en) 2015-03-04 2021-03-23 Au10Tix Ltd. Methods for categorizing input images for use e.g. as a gateway to authentication systems
CN105844275A (en) * 2016-03-25 2016-08-10 北京云江科技有限公司 Method for positioning text lines in text image
US10095946B2 (en) * 2016-07-07 2018-10-09 Lockheed Martin Corporation Systems and methods for strike through detection
WO2018125926A1 (en) * 2016-12-27 2018-07-05 Datalogic Usa, Inc Robust string text detection for industrial optical character recognition
US10552699B2 (en) 2016-12-27 2020-02-04 Datalogic Usa, Inc. Robust string text detection for industrial optical character recognition
CN108170806A (en) * 2017-12-28 2018-06-15 东软集团股份有限公司 Sensitive word detection filter method, device and computer equipment
US20190370594A1 (en) * 2018-06-05 2019-12-05 Microsoft Technology Licensing, Llc Alignment of user input on a screen
US11017258B2 (en) * 2018-06-05 2021-05-25 Microsoft Technology Licensing, Llc Alignment of user input on a screen
US10867204B2 (en) * 2019-04-30 2020-12-15 Hulu, LLC Frame level and video level text detection in video
CN111768345A (en) * 2020-05-12 2020-10-13 北京奇艺世纪科技有限公司 Method, device and equipment for correcting back image of identity card and storage medium
CN115658778A (en) * 2022-07-27 2023-01-31 重庆忽米网络科技有限公司 Excel data source-based data processing method for visual application creation

Also Published As

Publication number Publication date
WO2013112746A1 (en) 2013-08-01
US20130195376A1 (en) 2013-08-01
WO2013112738A1 (en) 2013-08-01
WO2013112753A1 (en) 2013-08-01
US20130195315A1 (en) 2013-08-01
US8831381B2 (en) 2014-09-09
US9053361B2 (en) 2015-06-09

Similar Documents

Publication Publication Date Title
US9053361B2 (en) Identifying regions of text to merge in a natural image or video frame
US9171204B2 (en) Method of perspective correction for devanagari text
US9317764B2 (en) Text image quality based feedback for improving OCR
US9141874B2 (en) Feature extraction and use with a probability density function (PDF) divergence metric
US9183458B2 (en) Parameter selection and coarse localization of interest regions for MSER processing
US9076242B2 (en) Automatic correction of skew in natural images and video
US8611662B2 (en) Text detection using multi-layer connected components with histograms
US9262699B2 (en) Method of handling complex variants of words through prefix-tree based decoding for Devanagiri OCR
US7813553B2 (en) Image region detection method, recording medium, and device therefor
US9076056B2 (en) Text detection in natural images
US8059868B2 (en) License plate recognition apparatus, license plate recognition method, and computer-readable storage medium
US9171224B2 (en) Method of improving contrast for text extraction and recognition applications
US20140023275A1 (en) Redundant aspect ratio decoding of devanagari characters
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
Tam et al. Quadrilateral Signboard Detection and Text Extraction.
Ikica et al. Influence of image quality on SWT voting-based color reduction method for detecting text in natural scene images

Legal Events

Date Code Title Description
AS Assignment

Owner name: QUALCOMM INCORPORATED, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BAHETI, PAWAN KUMAR;AGARWAL, ANKIT;GORE, DHANANJAY ASHOK;SIGNING DATES FROM 20130201 TO 20130207;REEL/FRAME:029831/0950

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION