CN108509955B - Method, system, and non-transitory computer readable medium for character recognition


Info

Publication number
CN108509955B
Authority
CN
China
Prior art keywords
character
endpoint
determining
segment
bounding box
Prior art date
Legal status
Active
Application number
CN201810161029.6A
Other languages
Chinese (zh)
Other versions
CN108509955A
Inventor
Stuart Guarnieri
Jason James Grams
Current Assignee
Konica Minolta Laboratory USA Inc
Original Assignee
Konica Minolta Laboratory USA Inc
Priority date
Filing date
Publication date
Priority claimed from US15/444,380 (US10579893B2)
Priority claimed from US15/474,512 (US10163004B2)
Application filed by Konica Minolta Laboratory USA Inc
Publication of CN108509955A
Application granted
Publication of CN108509955B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words

Abstract

A method for character recognition. The method comprises the following steps: obtaining a plurality of character segments extracted from an image; determining a first character bounding box having a first set of the plurality of character fragments and a second character bounding box having a second set of the plurality of character fragments; determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an Intelligent Character Recognition (ICR) engine.

Description

Method, system, and non-transitory computer readable medium for character recognition
Technical Field
The present invention relates to character recognition, and more particularly, to methods, systems, and non-transitory computer readable media for character recognition.
Background
The image may be generated by scanning a hardcopy document. Images may also be generated by software applications that transform electronic documents (e.g., word processing documents, slides of a slide show, spreadsheets, web pages, etc.) into an image format (e.g., a bitmap). Regardless of how the image is generated, the image may include hand-drawn text characters. An image with text characters may be stored (i.e., archived) for a considerable length of time before it is retrieved for viewing, printing, analysis, etc.
Intelligent Character Recognition (ICR) is a technique that identifies (i.e., recognizes) text characters in an image and outputs electronically editable versions (e.g., strings) of those text characters. ICR may be performed while the text characters are being hand-drawn, in which case ICR can utilize timing information to correctly recognize the characters. However, if ICR is performed after the text characters have been drawn (e.g., on an archived image), timing information is not available and ICR performance suffers. Nevertheless, users still desire to perform ICR on archived images with hand-drawn characters.
Disclosure of Invention
In general, in one aspect, embodiments of the invention relate to a method for character recognition. The method comprises the following steps: obtaining a plurality of character segments extracted from an image; determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments; determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an Intelligent Character Recognition (ICR) engine.
In general, in one aspect, embodiments of the invention relate to a system for character recognition. The system comprises: a memory; a computer processor coupled to the memory and configured to: obtaining a plurality of character segments extracted from an image; determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments; determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an Intelligent Character Recognition (ICR) engine.
In general, in one aspect, embodiments of the invention relate to a non-transitory Computer Readable Medium (CRM) having computer program code stored thereon. The computer program code, when executed by a computer processor, is for: obtaining a plurality of character segments extracted from an image; determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments; determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an Intelligent Character Recognition (ICR) engine.
In general, in one aspect, embodiments of the invention relate to a method for character recognition. The method comprises the following steps: obtaining a plurality of character segments extracted from an image; determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments; determining an ordering for the first set based on a plurality of texture attributes for the first set; determining a plurality of directions for the first set based on a plurality of brush widths and a plurality of densities for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the ordering for the first set to an Intelligent Character Recognition (ICR) engine.
In general, in one aspect, embodiments of the invention relate to a system for character recognition. The system comprises: a memory; a computer processor coupled to the memory and configured to: obtaining a plurality of character segments extracted from an image; determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments; determining an ordering for the first set based on a plurality of texture attributes for the first set; determining a plurality of directions for the first set based on a plurality of brush widths and a plurality of densities for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the ordering for the first set to an Intelligent Character Recognition (ICR) engine.
In general, in one aspect, embodiments of the invention relate to a non-transitory Computer Readable Medium (CRM) having computer program code stored thereon. The computer program code, when executed by a computer processor, is for: obtaining a plurality of character segments extracted from an image; determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments; determining an ordering for the first set based on a plurality of texture attributes for the first set; determining a plurality of directions for the first set based on a plurality of brush widths and a plurality of densities for the first set; and running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the ordering for the first set to an Intelligent Character Recognition (ICR) engine.
Other aspects of the invention will be apparent from the following description and appended claims.
Drawings
FIG. 1 shows a system in accordance with one or more embodiments of the invention.
Fig. 2, 3, 4A, and 4B illustrate flow diagrams in accordance with one or more embodiments of the invention.
Fig. 5A and 5B illustrate one or more examples in accordance with one or more embodiments of the invention.
FIG. 6 illustrates a computer system in accordance with one or more embodiments of the invention.
Detailed Description
Specific embodiments of the present invention will now be described in detail with reference to the accompanying drawings. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In general, embodiments of the invention provide methods, systems, and non-transitory Computer Readable Media (CRM) for character recognition. In particular, character segments extracted from the image are obtained, and then character bounding boxes are determined for the character segments. These character fragments correspond to hand-drawn text characters in the image. For each character bounding box, the direction and timing attributes (e.g., ordering, drawing duration, etc.) for the set of character fragments in the character bounding box are determined. The ordering of the character fragments may be determined based on the intersection points and texture attributes of the character fragments. One or more directions of the character fragments may be determined based on brush width and/or density.
The set of character fragments, the direction for the set of character fragments, and the timing attributes for the set of character fragments are then submitted to the ICR engine to perform character recognition. In other words, the ICR engine utilizes the orientation and timing attributes to identify hand-drawn text characters in the image. By utilizing the determined direction and the determined timing attributes, the performance of the ICR engine is improved (i.e., the identified text characters are more likely to correctly match the hand-drawn characters in the image).
FIG. 1 shows a system (100) according to one or more embodiments of the invention. As shown in fig. 1, the system (100) has a number of components, including a skeleton extractor (104), a stroke analyzer (106), and an ICR engine (108). Each component (104, 106, 108) may correspond to a Personal Computer (PC), laptop, mobile computing device (e.g., tablet PC, smart phone, etc.), server, mainframe, kiosk, etc., connected together by a network having wired and/or wireless segments. Additionally or alternatively, two or more components (104, 106, 108) may be located on the same hardware device having at least a computer processor and memory.
As shown in fig. 1, an image (102) is input to the system (100). The image (102) may be obtained from a scanner, downloaded from a website, retrieved from a repository, and so forth. The image (102) may be a bitmap. Additionally or alternatively, the image (102) may be in any image format. The image (102) may include one or more hand-drawn text characters. The recognized characters (110) are the output of the system (100). The recognized characters (110) are electronically editable versions (e.g., strings) of the hand-drawn text characters in the image (102).
In one or more embodiments of the invention, the system (100) includes a skeleton extractor (104). The skeleton extractor (104) may be implemented in hardware (i.e., circuitry), software, or any combination thereof. The skeleton extractor (104) is configured to extract and output character segments found in the image (102). This may include performing connected component analysis on the image (102). The skeleton extractor may extract and output character segments one text line at a time. In one or more embodiments of the invention, the skeleton extractor (104) outputs one or more brush widths for each character segment (i.e., the brush width may vary along the length of the character segment), one or more density values for each character segment (i.e., the density may vary along the length of the character segment), one or more texture attributes for each character segment (i.e., the texture attributes may vary along the length of the character segment), and the like.
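By way of illustration only, the following sketch shows one possible representation of the skeleton extractor's per-segment output. The class name, the field names, and the use of sampled RGB values for texture are assumptions made for this example, not part of the disclosure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates

@dataclass
class CharacterFragment:
    """One character fragment output by the skeleton extractor (104).

    Per-sample lists run along the fragment's skeleton, since brush
    width, density, and texture may all vary over the fragment's length.
    """
    points: List[Point]                    # skeleton polyline, in traversal order
    brush_widths: List[float]              # stroke width at each sample
    densities: List[float]                 # ink darkness at each sample (0..1)
    textures: List[Tuple[int, int, int]]   # e.g., sampled RGB fill colors

    @property
    def endpoints(self) -> Tuple[Point, Point]:
        return self.points[0], self.points[-1]

    @property
    def length(self) -> float:
        return sum(
            math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.points, self.points[1:])
        )
```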
In one or more embodiments of the invention, the system (100) includes a stroke analyzer (106). The stroke analyzer (106) may be implemented in hardware (i.e., circuitry), software, or any combination thereof. The stroke analyzer (106) is configured to determine character bounding boxes for the character fragments received from the skeleton extractor (104). The stroke analyzer (106) is further configured to determine a direction for the set of character fragments in each character bounding box and to determine timing attributes (e.g., rendering duration and/or ordering) for the set of character fragments in the character bounding box. The stroke analyzer (106) may determine the direction using at least the brush width and/or density. The stroke analyzer (106) may utilize at least the texture properties to determine timing properties (e.g., ordering). Character fragments that have been assigned a direction, order, and/or additional timing attributes may be referred to as strokes. A stroke may include one or more character fragments. A text character may include one or more strokes.
In one or more embodiments of the invention, the system (100) includes an ICR engine (108). The ICR engine (108) may be implemented in hardware, software, or any combination thereof. The ICR engine (108) takes as input a set of character fragments, directions for the set of character fragments, and timing attributes (e.g., rendering duration, ordering, etc.) for the set of character fragments. The ICR engine (108) identifies and outputs the recognized characters (110) using the set of character fragments, the directions for the set of character fragments, and the timing attributes for the set of character fragments. The use of the determined directions and the determined timing attributes increases the likelihood that the recognized characters (110) correctly match the hand-drawn text characters in the image (102). For example, the character "O" and the character "D" may have similar character segments. However, the manner in which "O" is drawn (i.e., the direction and timing attributes) and the manner in which "D" is drawn are quite different. Thus, the direction and timing attributes may help the ICR engine (108) resolve such ambiguities. The use of the determined directions and the determined timing attributes may also reduce the time required to output the recognized characters (110).
Those skilled in the art having the benefit of this detailed description will appreciate that the recognized characters (110) can be used to generate an electronic document that includes the contents of the image (102) and that is also editable. Those skilled in the art having the benefit of this detailed description will also appreciate that the skeleton extractor (104), stroke analyzer (106), and ICR engine (108) may be customized (specialized) for a particular language or letter/character set. Additionally or alternatively, the skeleton extractor (104), stroke analyzer (106), and ICR engine (108) may be capable of handling multiple languages or letter/character sets.
FIG. 2 shows a flow diagram in accordance with one or more embodiments of the invention. The flow chart depicts a process for character recognition. One or more of the steps in fig. 2 may be performed by a component of system (100), such as stroke analyzer (106), as discussed above with reference to fig. 1. In one or more embodiments of the invention, one or more of the steps shown in fig. 2 may be omitted, repeated, and/or performed in a different order than that shown in fig. 2. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 2.
Initially, character fragments are obtained (step 205). These character fragments may have been extracted from the image by a skeleton extractor that performs connected component analysis. These character fragments may correspond to hand-drawn characters in the image. Further, the image may have been previously generated by scanning a hardcopy document, and/or the image may have been downloaded/retrieved from a website, a repository, etc. In one or more embodiments, the image is a bitmap.
At step 210, a plurality of character bounding boxes is determined. Each character bounding box includes a collection of character fragments. Each character bounding box may correspond to a single text character and/or multiple text characters (e.g., when two or more text characters touch in an image). Determining the character bounding box may effectively entail performing a cluster analysis to determine a plurality of sets, where each set has connected character fragments. Multiple character fragments in a collection may be merged into a new character fragment. The new character fragment is also part of the collection.
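As a minimal sketch of the cluster analysis just described, the following groups touching fragments with a union-find structure and computes one bounding box per group. The touch predicate (i.e., how "connected" is decided) and the CharacterFragment structure from the earlier sketch are assumptions of this example.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)

def character_bounding_boxes(
    fragments: List[CharacterFragment],
    touch: Callable[[CharacterFragment, CharacterFragment], bool],
) -> List[Tuple[Box, List[CharacterFragment]]]:
    """Group connected fragments into per-character sets (cf. step 210)."""
    parent = list(range(len(fragments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Union every pair of fragments the predicate reports as touching.
    for i in range(len(fragments)):
        for j in range(i + 1, len(fragments)):
            if touch(fragments[i], fragments[j]):
                parent[find(i)] = find(j)

    groups: dict = {}
    for i, frag in enumerate(fragments):
        groups.setdefault(find(i), []).append(frag)

    # One bounding box per connected set of fragments.
    boxes = []
    for members in groups.values():
        xs = [x for f in members for (x, _) in f.points]
        ys = [y for f in members for (_, y) in f.points]
        boxes.append(((min(xs), min(ys), max(xs), max(ys)), members))
    return boxes
```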
In step 215, a character bounding box is selected. The character bounding box may be randomly selected. Alternatively, if multiple character segments correspond to lines (i.e., rows, columns, etc.) of text, the character bounding box may be selected from left-to-right, right-to-left, top-to-bottom, etc.
At step 220, the direction and timing properties of the set of character fragments in the selected bounding box are determined. In particular, the direction of each segment may be determined. Further, the ordering of the set of character fragments (i.e., the first rendered character fragment, the second rendered character fragment, the last rendered character fragment, etc.) may be determined. Further, a rendering duration may be determined for the set of character fragments. The drawing duration may correspond to a total time required to hand-draw all of the character fragments in the selected bounding box. In one or more embodiments, the rendering duration also includes a gap time between each character segment (i.e., the time between the end of rendering one character segment and the beginning of the next character segment). Additionally or alternatively, a rendering duration is calculated and maintained for each character segment in the set. Additional details regarding step 220 are provided in fig. 3, 4A, and 4B.
At step 225, character recognition is run based on the set of character segments, the determined direction, and the determined timing attributes. In particular, the set of character fragments, the determined direction, and the determined timing attributes (e.g., ordering, rendering duration) may be sent to an ICR engine that outputs the identified characters. These directional and timing attributes increase the likelihood that the recognized character will correctly match the actual hand-drawn character in the image. These directional and timing attributes may also reduce the time required to output the recognized characters.
At step 230, it is determined whether there are any remaining character bounding boxes that have not yet been processed. When it is determined that additional character bounding boxes need to be processed, processing returns to step 215.
Those skilled in the art having the benefit of this detailed description will appreciate that in the process of FIG. 2, the set of character fragments, the timing attributes for the set of character fragments, and the direction of the character fragments may be provided (i.e., sent) to the ICR engine on a bounding box-by-bounding box basis. Those skilled in the art who have the benefit of this detailed description also will appreciate that the process depicted in FIG. 2 may be repeated for each text line (e.g., row, column, etc.) in the image.
FIG. 3 shows a flow diagram in accordance with one or more embodiments of the invention. The flow chart depicts a process for character recognition. In particular, the flow chart depicts a process for determining the orientation and timing properties of a collection of character fragments. One or more of the steps in fig. 3 may be performed by a component of system (100), such as stroke analyzer (106), as discussed above with reference to fig. 1. The process depicted in fig. 3 may correspond to step 220 in fig. 2. In one or more embodiments of the invention, one or more of the steps shown in fig. 3 may be omitted, repeated, and/or performed in a different order than that shown in fig. 3. Accordingly, the scope of the present invention should not be considered limited to the particular arrangement of steps shown in FIG. 3.
Initially, the ordering of the character segments is determined (step 305). Determining the ordering may include determining which character segment was drawn first, which character segment was drawn second, which character segment was drawn last, etc. The ordering is determined based on assumptions that may depend on the language. For example, it may be assumed that a longer character fragment is drawn before a shorter character fragment. Additionally or alternatively, it may be assumed that character fragments near the left side of the bounding box are drawn before character fragments near the right side of the bounding box. Additionally or alternatively, it may be assumed that character segments near the top of the bounding box are drawn before character segments near the bottom of the bounding box. Additionally or alternatively, it may be assumed that vertical character segments are drawn before horizontal character segments, etc. Additionally or alternatively, it may be assumed that character segments connected by sharp changes in direction (e.g., corners) are drawn before other character segments. One or more of these assumptions may result from observing repetitive behavior among multiple individuals when drawing text characters.
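For illustration, one way to turn such language-dependent assumptions into a total ordering is a heuristic sort key, sketched below for the Latin alphabet. The weights, the normalizations, and the reliance on the CharacterFragment fields from the earlier sketch are assumptions of this example, not values taken from the disclosure.

```python
import math
from typing import List

def ordering_key(fragment: CharacterFragment, box: Box) -> float:
    """Heuristic 'drawn earlier' score for step 305 (smaller = earlier).

    Encodes the assumptions above: fragments nearer the left and top of
    the bounding box, more vertical fragments, and longer fragments are
    scored as drawn earlier.  The weights are purely illustrative.
    """
    x0, y0, x1, y1 = box
    xs = [x for (x, _) in fragment.points]
    ys = [y for (_, y) in fragment.points]
    left = (min(xs) - x0) / max(x1 - x0, 1e-6)      # 0 = at the left edge
    top = (min(ys) - y0) / max(y1 - y0, 1e-6)       # 0 = at the top edge

    (ax, ay), (bx, by) = fragment.endpoints
    angle = math.atan2(abs(bx - ax), abs(by - ay))  # 0 = vertical, pi/2 = horizontal

    diag = math.hypot(x1 - x0, y1 - y0) or 1.0
    longness = fragment.length / diag               # longer fragments score earlier

    return 2.0 * left + 1.0 * top + 0.5 * angle - 0.5 * longness

def order_fragments(fragments: List[CharacterFragment], box: Box):
    return sorted(fragments, key=lambda f: ordering_key(f, box))
```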
In one or more embodiments of the invention, the ordering of two intersecting character segments is determined based on texture attributes (e.g., fill color, fill pattern, etc.), particularly at and near the intersection (discussed below). Fig. 4A and 4B illustrate an example test for determining whether two character fragments have the correct ordering.
At step 310, the clock value is reset for the set of character fragments. The clock value is used to measure the time required to draw one or more character fragments in the collection. A counter may be used to implement the clock value.
In step 315, a character fragment is selected. Character segments may be selected based on the determined ordering (step 305). Additionally or alternatively, the character segments may be randomly selected.
In step 320, the direction of the character fragment is determined. The character segment has two endpoints, and determining the direction of the character segment includes determining which endpoint is the starting endpoint and which endpoint is the ending endpoint.
Those skilled in the art who have the benefit of this detailed description will appreciate that most users are accustomed to using the right hand and dragging a drawing tool (e.g., a pen, pencil, marker, etc.) toward themselves. Thus, determining the direction of a character segment may include selecting a user point representing the user's location while the user was drawing the text character, and then determining the distance between the user point and the two endpoints. The closer endpoint may be designated as the ending endpoint and the farther endpoint may be designated as the starting endpoint.
Additionally or alternatively, character fragments tend to be drawn from left to right and from top to bottom, depending on the long axis of the character fragment: horizontal character fragments are typically drawn from left to right, while vertical character fragments are typically drawn from top to bottom.
Additionally or alternatively, in one or more embodiments, brush width and/or density are used to determine the beginning and ending endpoints of a character segment. In particular, the brush width at the beginning endpoint is often greater than the brush width at the ending endpoint. Similarly, the density at the beginning endpoint is often darker than the density at the ending endpoint. Thus, the endpoint with the greater brush width and/or darker density may be designated as the starting endpoint, while the remaining endpoint is designated as the ending endpoint.
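A minimal sketch of this endpoint test follows, assuming the CharacterFragment samples from the earlier sketch. Comparing only the first and last samples (rather than, say, averaging a few samples near each end) and summing the two differences into a single score are simplifications made for this example.

```python
from typing import Tuple

def fragment_direction(fragment: CharacterFragment) -> Tuple[Point, Point]:
    """Pick (start, end) endpoints per step 320.

    The endpoint with the greater brush width and/or darker density is
    designated the starting endpoint, per the heuristic in the text.
    """
    width_diff = fragment.brush_widths[0] - fragment.brush_widths[-1]
    density_diff = fragment.densities[0] - fragment.densities[-1]

    if width_diff + density_diff >= 0:
        return fragment.points[0], fragment.points[-1]   # already start-to-end
    return fragment.points[-1], fragment.points[0]       # reversed
```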
At step 325, the length of the character fragment is calculated. The length of the character fragment may already have been calculated during a previous step (e.g., step 305), in which case this step may be omitted.
At step 330, the time to draw the character fragment is calculated and the clock value is incremented according to the calculated time. The time for drawing a character fragment is a function of the length of the character fragment and the speed of the writing instrument (e.g., pen, pencil, etc.). The same constant velocity (V_C) may be assumed for all character segments in the set. Additionally or alternatively, different fractions (or multiples) of the constant velocity may be assumed for character segments of different lengths (e.g., 0.25V_C, 0.5V_C, 1.2V_C, 1.8V_C). Further, the speed may be selected based on the ordering of the character segments. For example, a velocity of V_C may be assumed for the first character segment in the sequence, while a velocity of 1.25V_C or 0.7V_C (i.e., a greater or lesser velocity) may be assumed for the last character segment in the sequence. As another example, for all character segments that are neither the first nor the last character segment in the sequence (i.e., the middle character segments), the velocity may be assumed to be the average of the velocity assumed for the first character segment and the velocity assumed for the last character segment. Additionally or alternatively, a different velocity may be assumed for each middle character segment. For example, the velocities assumed for the middle character segments may be interpolated between the velocity assumed for the first character segment and the velocity assumed for the last character segment. Other schemes are possible.
At step 335, it is determined whether there are additional fragments that have not yet been processed. When it is determined that there are additional segments that need to be processed, processing proceeds to step 340. When it is determined that there are no additional fragments that need to be processed, processing proceeds to step 345.
At step 340, the clock value is incremented to account for the time gap between ending the drawing of the selected character fragment and beginning the drawing of the next character fragment. In one or more embodiments, the same time gap is assumed for all consecutive character segments. In one or more embodiments, different time gaps are used between different pairs of consecutive character fragments.
At step 345, a rendering duration is determined based on the clock value. In one or more embodiments, the rendering duration is the current clock value. In one or more embodiments, the rendering duration is the current clock value with one or more adjustments to account for the time gaps.
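The clock loop of steps 310 through 345 can be sketched as follows. The base speed V_C, the 0.7·V_C last-segment speed, the linear interpolation for the middle segments, and the constant gap time are all assumed example values; as noted above, other schemes are possible.

```python
from typing import List, Tuple

BASE_SPEED = 1.0   # assumed constant drawing speed V_C (length units per time unit)
GAP_TIME = 0.2     # assumed pause between consecutive fragments (step 340)

def rendering_duration(ordered: List[CharacterFragment]) -> Tuple[float, List[float]]:
    """Synthesize timing attributes for one bounding box (steps 310-345).

    Returns the total rendering duration (the final clock value) and the
    per-fragment drawing times, both of which the text allows.
    """
    first_speed = 1.0 * BASE_SPEED   # assumed speed for the first fragment
    last_speed = 0.7 * BASE_SPEED    # assumed speed for the last fragment
    n = len(ordered)

    clock = 0.0                      # step 310: reset the clock value
    per_fragment = []
    for i, frag in enumerate(ordered):
        t = i / (n - 1) if n > 1 else 0.0
        speed = first_speed + t * (last_speed - first_speed)  # interpolated
        draw_time = frag.length / speed                       # step 330
        per_fragment.append(draw_time)
        clock += draw_time
        if i < n - 1:
            clock += GAP_TIME        # step 340: gap before the next fragment
    return clock, per_fragment       # step 345: duration is the clock value
```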
Those skilled in the art having the benefit of this detailed description will appreciate that the process illustrated in FIG. 3 may be performed for a set of character fragments in each character bounding box.
In fig. 3, the ordering of the character fragments based on the texture attributes is determined (step 305) and the orientation of the character fragments based on the brush width and/or density is determined (step 320). However, in one or more embodiments of the invention, only the ordering of the character fragments based on the texture attribute is determined (i.e., step 305 is performed, but step 320 is omitted). In one or more embodiments of the invention, only the direction of the character segment based on the brush width and/or density is determined (i.e., step 320 is performed, but step 305 is omitted). In such embodiments, only the ordering of the character fragments or only the orientation of the character fragments is provided to the ICR engine to perform character recognition.
Fig. 4A and 4B illustrate a flow diagram in accordance with one or more embodiments of the invention. The flow chart depicts a test for determining whether two character segments (i.e., character segment A and character segment B) in a single character bounding box have the correct ordering. In one or more embodiments, the test applies to the Latin alphabet. One or more of the steps in fig. 4A and 4B may be performed by a component of the system (100) discussed above with reference to fig. 1, such as the stroke analyzer (106). The process depicted in fig. 4A and 4B may correspond to step 305 in fig. 3. In one or more embodiments of the invention, one or more of the steps shown in fig. 4A and 4B may be omitted, repeated, and/or performed in a different order than that shown in fig. 4A and 4B. Accordingly, the scope of the present invention should not be construed as limited to the particular arrangement of steps illustrated in fig. 4A and 4B.
Initially, assume that character segment A is drawn before character segment B (step 405).
At step 489, it is determined whether segment A and segment B intersect. In other words, at step 489, the intersection of segment A and segment B, if present, is located. When segment A and segment B intersect, processing proceeds to step 491. When segment A and segment B do not intersect, processing proceeds to step 410 (shown in FIG. 4B).
At step 491, texture attributes are computed for segment A and segment B. The texture attributes may correspond to, for example, the fill color of the segment, the fill pattern of the segment, etc. In one or more embodiments, the texture attributes may be homogeneous over the entire length of the character fragment. Additionally or alternatively, the texture attributes may vary along the length of the character fragment. In one or more embodiments of the invention, the texture attributes for a segment may be available from the output of the skeleton extractor (104), or at least may be derived from the output of the skeleton extractor (104).
At step 493, it is determined whether the texture properties of segment A and segment B, particularly near the intersection, are significantly different. For example, if the texture attribute is color, it is determined whether the difference between the RGB color value for segment A and the RGB color value for segment B exceeds a predetermined threshold. When it is determined that the texture attributes are significantly different, processing proceeds to step 495. When it is determined that the texture attributes are not significantly different, processing proceeds to step 410 (shown in FIG. 4B).
At step 495, the intersection texture attributes (i.e., the texture attributes at the intersection of segment A and segment B) are compared to both the texture attributes of segment A near the intersection and the texture attributes of segment B near the intersection.
At step 497, it is determined whether the intersection texture attribute better matches the texture attribute of segment B than the texture attribute of segment A. For example, if the texture attribute is color, it is determined whether the RGB color values at the intersection better match the RGB color values of character segment A or the RGB color values of character segment B. If step 497 is true, this implies that character segment B lies on top of character segment A, and thus character segment A was drawn before character segment B. Thus, the assumption that character segment A was drawn before character segment B is correct (step 440). However, if step 497 is false, this implies that character segment A lies on top of character segment B, and thus character segment B was drawn before character segment A. Thus, the assumption that character segment A was drawn before character segment B is incorrect (step 445) (i.e., character segment B was actually drawn before character segment A).
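A minimal sketch of this texture test (steps 489 through 497) follows, assuming the caller has already located the crossing and sampled an RGB texture value for each segment near the crossing and at the crossing itself. The Euclidean color distance and the threshold value are assumptions of this example; a return of None signals that the geometric test of FIG. 4B should be used instead.

```python
from typing import Optional, Tuple

RGB = Tuple[int, int, int]

def rgb_distance(c1: RGB, c2: RGB) -> float:
    return sum((a - b) ** 2 for a, b in zip(c1, c2)) ** 0.5

def texture_says_a_first(
    tex_a: RGB,                    # segment A's texture near the crossing
    tex_b: RGB,                    # segment B's texture near the crossing
    tex_crossing: Optional[RGB],   # texture at the crossing; None if no crossing
    threshold: float = 30.0,       # assumed "significantly different" cutoff
) -> Optional[bool]:
    """Steps 489-497: True if A was drawn first, False if B was, None if
    the test is inconclusive and FIG. 4B's geometric test applies."""
    if tex_crossing is None:
        return None                                # step 489: no crossing
    if rgb_distance(tex_a, tex_b) <= threshold:
        return None                                # step 493: textures too similar
    # Step 497: if the crossing matches B better, B lies on top of A,
    # so A must have been drawn first.
    return rgb_distance(tex_crossing, tex_b) < rgb_distance(tex_crossing, tex_a)

# Example: A is blue, B is red, and the crossing pixel is nearly red,
# so B lies on top and A was drawn first.
assert texture_says_a_first((0, 0, 255), (255, 0, 0), (250, 5, 5)) is True
```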
Referring now to FIG. 4B, at step 410, various calculations are performed. Specifically: (i) the angle (θ_A) between vertical and character segment A is calculated, and the angle (θ_B) between vertical and character segment B is calculated; (ii) the vertical position (i.e., top position) of character segment A, measured from the top of the character bounding box (VP_A), is calculated, and the vertical position of character segment B (VP_B) is also calculated; and (iii) the leftmost position (LP_A) of character segment A is calculated, and the leftmost position (LP_B) of character segment B is calculated.
At step 415, it is determined whether the absolute value of the difference between θ_A and θ_B is substantial (i.e., exceeds a predetermined threshold). If so, at step 420 it is determined whether character segment A is more vertical than character segment B. If step 420 is true, the assumption that character segment A was drawn before character segment B is correct (step 440). However, if step 420 is false, the assumption is incorrect (step 445) (i.e., character segment B was actually drawn before character segment A).
At step 425, it is determined whether the absolute value of the difference between LP_A and LP_B is substantial (i.e., exceeds a predetermined threshold). If so, at step 430 it is determined whether character segment A is closer to the left side of the character bounding box than character segment B. If step 430 is true, the assumption that character segment A was drawn before character segment B is correct (step 440). However, if step 430 is false, the assumption is incorrect (step 445) (i.e., character segment B was actually drawn before character segment A).
At step 435, it is determined whether character segment A is closer to the top of the character bounding box than character segment B. If step 435 is true, the assumption that character segment A was drawn before character segment B is correct (step 440). However, if step 435 is false, the assumption is incorrect (step 445) (i.e., character segment B was actually drawn before character segment A).
The process depicted in FIGS. 4A and 4B may be repeated for each and every pair of character fragments in the character bounding box. Those skilled in the art who have the benefit of this detailed description will appreciate that when step 440 is reached for each and every pair of character fragments in the character bounding box, the determined ordering for those character fragments is correct.
Although FIG. 4B shows all calculations occurring at step 410, in one or more embodiments of the invention, calculation (ii) is performed only after step 425 is found to be false. Similarly, in one or more embodiments of the invention, calculation (iii) is performed only after step 415 is found to be false.
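The geometric fallback of FIG. 4B, including the lazy evaluation just described, can be sketched as follows. The epsilon thresholds are assumed example values, and the helper functions assume the CharacterFragment structure sketched earlier.

```python
import math

def geometry_says_a_first(
    frag_a: CharacterFragment,
    frag_b: CharacterFragment,
    angle_eps: float = 0.15,   # assumed threshold for step 415 (radians)
    pos_eps: float = 3.0,      # assumed threshold for step 425 (pixels)
) -> bool:
    """FIG. 4B: True if the assumption 'A was drawn before B' is correct."""
    theta_a = angle_from_vertical(frag_a)                 # calculation (i)
    theta_b = angle_from_vertical(frag_b)
    if abs(theta_a - theta_b) > angle_eps:                # step 415
        return theta_a < theta_b                          # step 420: more vertical first

    lp_a, lp_b = leftmost_x(frag_a), leftmost_x(frag_b)   # calculation (iii)
    if abs(lp_a - lp_b) > pos_eps:                        # step 425
        return lp_a < lp_b                                # step 430: leftmost first

    vp_a, vp_b = topmost_y(frag_a), topmost_y(frag_b)     # calculation (ii)
    return vp_a <= vp_b                                   # step 435: topmost first

def angle_from_vertical(frag: CharacterFragment) -> float:
    (ax, ay), (bx, by) = frag.endpoints
    return math.atan2(abs(bx - ax), abs(by - ay))         # 0 = vertical

def leftmost_x(frag: CharacterFragment) -> float:
    return min(x for (x, _) in frag.points)

def topmost_y(frag: CharacterFragment) -> float:
    return min(y for (_, y) in frag.points)
```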
FIG. 5A shows an example in accordance with one or more embodiments of the invention. Fig. 5A shows an image (502) with hand-drawn text characters. The image (502) may have other text characters (not shown). Further, the image (502) may have a plurality of lines of text (not shown). The skeleton extractor may extract character segments (504) from the image (502). As shown in fig. 5A, the character segments Ω, Δ, and Σ have been extracted from the image (502).
The ordering for the character fragments is then determined (506) using one or more of the processes described above. Specifically, it is determined that the character segment Ω is drawn first, the character segment Σ is drawn second, and the character segment Δ is drawn last.
The directions for the character fragments are then determined (508) using one or more of the processes described above. In particular, the character segments Ω and Σ are determined to be drawn from top to bottom (i.e., toward the user). Further, the character segment Δ is determined to be drawn from left to right.
The character fragments (504), the ordering (506), and the direction (508) are sent to the ICR engine to perform character recognition. The ICR engine uses the character fragments (504), the ordering (506), and the orientation (508) to identify the characters. Furthermore, by utilizing the ordering (506) and direction (508) in the character recognition process, the recognized characters will be more likely to correctly match hand-drawn characters from the image (502).
FIG. 5B shows an example in accordance with one or more embodiments of the invention. As shown in fig. 5B, there is a character segment (599) corresponding to a character (not shown). The character segment (599) is extracted from the bitmap image by a skeleton extractor. The character segment (599) includes two endpoints: endpoint A (597) and endpoint B (595). Endpoint A (597) has a larger brush width than endpoint B (595). Endpoint A (597) has a darker density than endpoint B (595). Thus, endpoint A (597) is deemed the beginning endpoint, and endpoint B (595) is deemed the ending endpoint. In other words, by using the brush width and/or density, the character segment (599) is determined to be drawn from left to right. The brush width and/or density may be provided by the skeleton extractor.
Still referring to fig. 5B, there is also a character fragment A (589) and a character fragment B (587). The two character fragments (587, 589) are extracted from the bitmap image by a skeleton extractor. As shown in fig. 5B, character fragment A (589) and character fragment B (587) intersect. The intersection texture attribute (585) matches the texture attribute (583) of character fragment B better than the texture attribute (581) of character fragment A. This implies that character fragment B (587) lies on top of character fragment A (589), and thus character fragment A (589) was drawn before character fragment B (587). In other words, the ordering of the character fragments (587, 589) may be determined based on the texture attributes. These texture attributes may be provided by the skeleton extractor that extracts the character fragments (587, 589).
Various embodiments of the invention may have one or more of the following advantages: the ability to determine the direction of a set of character segments based on brush width and/or density; an ability to determine an ordering of a set of character fragments based on the texture attribute; an ability to determine a rendering duration of a set of character fragments; increasing the ability of the ICR engine to output recognized characters that correctly match characters in the image; the ability to reduce the time required to output recognized characters; the ability to test whether the ordering of the two character segments is correct; the ability to assign constant speed or different speeds to different character segments; and so on.
Embodiments of the invention may be implemented on virtually any type of computing system, regardless of the platform being used. For example, the computing system may be one or more mobile devices (e.g., laptop computers, smart phones, personal digital assistants, tablet computers, or other mobile devices), desktop computers, servers, blades in a server chassis, or any other type of computing device or devices that includes at least minimum processing power, memory, and input and output device(s) to perform one or more embodiments of the invention. For example, as shown in fig. 6, a computing system (600) may include one or more computer processors (602), associated memory (604) (e.g., Random Access Memory (RAM), cache, flash memory, etc.), one or more storage devices (606) (e.g., a hard disk, an optical drive such as a Compact Disk (CD) drive or a Digital Versatile Disk (DVD) drive, a flash memory stick, etc.), and various other elements and functionality. The computer processor(s) (602) may be an integrated circuit for processing instructions. For example, the computer processor(s) may be one or more cores or micro-cores of a processor. The computing system (600) may also include one or more input devices (610), such as a touch screen, keyboard, mouse, microphone, touch pad, electronic pen, or any other type of input device. Further, the computing system (600) may include one or more output devices (608), such as a screen (e.g., a Liquid Crystal Display (LCD), a plasma display, a touch screen, a Cathode Ray Tube (CRT) monitor, a projector, or other display device), a printer, external storage, or any other output device. One or more of the output device(s) may be the same as or different from the input device(s). The computing system (600) may be connected to a network (612) (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) such as the Internet, a mobile network, or any other type of network) via a network interface connection (not shown). The input and output device(s) may be connected to the computer processor(s) (602), memory (604), and storage device(s) (606), either locally or remotely (e.g., via the network (612)). Many different types of computing systems exist, and the aforementioned input and output device(s) may take other forms.
Software instructions in the form of computer readable program code may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, DVD, storage device, disk, tape, flash memory, physical memory, or any other computer readable storage medium to carry out embodiments of the invention. In particular, these software instructions may correspond to computer readable program code which, when executed by a processor(s), is configured to perform embodiments of the invention.
Further, one or more elements of the aforementioned computing system (600) may be located at a remote location and connected to the other elements over a network (612). Furthermore, one or more embodiments of the invention may be implemented on a distributed system having multiple nodes, where each portion of the invention may be located on a different node within the distributed system. In one embodiment of the invention, the node corresponds to a distinct computing device. Alternatively, the node may correspond to a computer processor with associated physical memory. The node may alternatively correspond to a micro-core of a computer processor or a computer processor having shared memory and/or resources.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims (37)

1. A method for character recognition, comprising:
obtaining a plurality of character segments extracted from an image;
determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments;
determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set;
calculating a first angle between a vertical line and a first character segment in the first set;
calculating a second angle between the vertical line and a second character segment in the first set;
determining that the first character fragment is drawn before the second character fragment in response to the first angle being less than the second angle; and
running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an intelligent character recognition (ICR) engine.
2. The method of claim 1, further comprising:
determining a plurality of directions for the second set and a plurality of timing attributes for the second set; and
running character recognition for the second character bounding box by sending the second set, a plurality of directions for the second set, and a plurality of timing attributes for the second set to the ICR engine,
wherein the plurality of character segments are extracted from the image by a skeleton extractor, an
Wherein the plurality of character fragments form a single text line in the image.
3. The method according to claim 1 or 2, further comprising:
calculating a first vertical position of a first character segment in the first set;
calculating a second vertical position of a second character segment in the first set; and
in response to the first vertical position being less than the second vertical position, determining to draw the first character fragment before the second character fragment.
4. The method according to claim 1 or 2, further comprising:
resetting a clock value for the first character bounding box;
calculating a first length of a first character segment in the first set;
increasing a clock value for the first character segment based on the first length;
calculating a second length of a second character segment in the first set; and
increasing a clock value for the second character segment based on the second length,
wherein the rendering duration for the first set is the clock value.
5. The method of claim 4, further comprising:
selecting a first speed for the first character segment based on the first length, wherein increasing the clock value for the first character segment is further based on the first speed; and
selecting a second speed for the second character segment based on the second length, wherein increasing the clock value for the second character segment is further based on the second speed.
6. The method according to claim 1 or 2, further comprising:
determining a first endpoint and a second endpoint of the character segments in the first set;
calculating a first distance from the first endpoint to the user point;
calculating a second distance from the second endpoint to the user point; and
in response to the second distance being less than the first distance, determining that the first endpoint is a starting endpoint.
7. The method according to claim 1 or 2, further comprising: an editable electronic document is generated including the recognized characters output by the ICR engine.
8. A system for character recognition, comprising:
a memory;
a computer processor coupled to the memory and configured to:
obtaining a plurality of character segments extracted from an image;
determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments;
determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set;
calculating a first angle between a vertical line and a first character segment in the first set;
calculating a second angle between the vertical line and a second character segment in the first set;
determining that the first character fragment is drawn before the second character fragment in response to the first angle being less than the second angle; and
running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an intelligent character recognition (ICR) engine.
9. The system of claim 8, wherein the computer processor further:
calculating a first vertical position of a first character segment in the first set;
calculating a second vertical position of a second character segment in the first set; and
in response to the first vertical position being less than the second vertical position, determining to draw the first character fragment before the second character fragment.
10. The system of claim 8 or 9, wherein the computer processor further:
resetting a clock value for the first character bounding box;
calculating a first length of a first character segment in the first set;
increasing a clock value for the first character segment based on the first length;
calculating a second length of a second character segment in the first set; and
increasing a clock value for the second character segment based on the second length,
wherein the rendering duration for the first set is the clock value.
11. The system of claim 10, wherein the computer processor further:
selecting a first speed for the first character segment based on the first length, wherein increasing the clock value for the first character segment is further based on the first speed; and
selecting a second speed for the second character segment based on the second length, wherein increasing the clock value for the second character segment is further based on the second speed.
12. The system of claim 8 or 9, wherein the computer processor further:
determining a first endpoint and a second endpoint of the character segments in the first set;
calculating a first distance from the first endpoint to the user point;
calculating a second distance from the second endpoint to the user point; and
in response to the second distance being less than the first distance, determining that the first endpoint is a starting endpoint.
13. The system of claim 8 or 9, wherein the computer processor further generates an editable electronic document comprising the recognized characters output by the ICR engine.
14. A non-transitory computer readable medium (CRM) storing computer program code for execution by a computer processor for:
obtaining a plurality of character segments extracted from an image;
determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments;
determining a plurality of directions for the first set and a plurality of timing attributes for the first set, wherein the plurality of timing attributes comprises an ordering for the first set and a rendering duration for the first set;
calculating a first angle between a vertical line and a first character segment in the first set;
calculating a second angle between the vertical line and a second character segment in the first set;
determining that the first character fragment is drawn before the second character fragment in response to the first angle being less than the second angle; and
running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the plurality of timing attributes for the first set to an intelligent character recognition (ICR) engine.
15. The non-transitory CRM of claim 14, further storing computer program code executed by the computer processor for:
resetting a clock value for the first character bounding box;
calculating a first length of a first character segment in the first set;
increasing a clock value for the first character segment based on the first length;
calculating a second length of a second character segment in the first set; and
increasing a clock value for the second character segment based on the second length,
wherein the rendering duration for the first set is the clock value.
16. The non-transitory CRM of claim 14 or 15, further storing computer program code for execution by the computer processor for:
determining a first endpoint and a second endpoint of the character segments in the first set;
calculating a first distance from the first endpoint to the user point;
calculating a second distance from the second endpoint to the user point; and
in response to the second distance being less than the first distance, determining that the first endpoint is a starting endpoint.
17. The non-transitory CRM of claim 14 or 15, further storing computer program code for execution by the computer processor for generating an editable electronic document comprising the recognition characters output by the ICR engine.
18. A method for character recognition, comprising:
obtaining a plurality of character segments extracted from an image;
determining a first character bounding box comprising a first set of the plurality of character fragments and a second character bounding box comprising a second set of the plurality of character fragments;
determining an ordering for the first set based on a plurality of texture attributes for the first set;
determining a plurality of directions for the first set based on a plurality of brush widths and a plurality of densities for the first set; and
running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the ordering for the first set to an intelligent character recognition (ICR) engine.
19. The method of claim 18, further comprising:
determining a plurality of directions for the second set and timing attributes for the second set; and
running character recognition for the second character bounding box by sending the second set, a plurality of directions for the second set, and timing attributes for the second set to the ICR engine,
wherein the plurality of character segments are extracted from the image by a skeleton extractor, an
Wherein the plurality of character fragments form a single text line in the image.
20. The method of claim 18 or 19, wherein determining the rank comprises:
locating an intersection of a first character segment and a second character segment in the first set;
determining a texture attribute of the intersection;
comparing the texture attribute of the intersection with a texture attribute of the first character segment and a texture attribute of the second character segment; and
determining that the first character segment is drawn before the second character segment in response to the texture attribute of the intersection matching the texture attribute of the second character segment.
21. The method of claim 20, wherein the texture attribute is color.
22. The method of claim 20, wherein the texture attribute is a fill pattern.
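A minimal sketch of the ordering rule of claims 20-22, assuming each segment exposes a single comparable texture attribute (a color per claim 21, or a fill pattern per claim 22). The None branch for an inconclusive match is an added assumption.

```python
def drawn_first(first_segment, second_segment, intersection_texture, texture_of):
    # The segment whose texture shows at the crossing was drawn last, on
    # top; so a match with the second segment means the first came earlier.
    if intersection_texture == texture_of(second_segment):
        return first_segment
    if intersection_texture == texture_of(first_segment):
        return second_segment
    return None  # neither texture matches: inconclusive

# Example with color as the texture attribute (claim 21).
colors = {"down_stroke": "blue", "cross_stroke": "red"}
print(drawn_first("down_stroke", "cross_stroke", "red", colors.get))  # down_stroke
```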
23. The method of claim 18 or 19, wherein determining the plurality of directions comprises:
determining a first endpoint and a second endpoint of a character segment in the first set;
comparing a brush width at the first endpoint with a brush width at the second endpoint; and
determining that the first endpoint is a starting endpoint in response to the brush width at the first endpoint exceeding the brush width at the second endpoint.
24. The method of claim 18 or 19, wherein determining the plurality of directions comprises:
determining a first endpoint and a second endpoint of a character segment in the first set;
comparing a density at the first endpoint with a density at the second endpoint; and
determining that the first endpoint is a starting endpoint in response to the density at the first endpoint exceeding the density at the second endpoint.
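Claims 23 and 24 share one comparison pattern, sketched below with a generic measure callable standing in for either the brush width (claim 23) or the density (claim 24) sampled at an endpoint. The names and sample values are invented; the intuition that ink accumulates where the pen first touches down is a plausible reading, not something the claims state.

```python
def starting_endpoint(first_endpoint, second_endpoint, measure):
    # The endpoint with the larger measurement (brush width or density)
    # is determined to be the starting endpoint of the character segment.
    if measure(first_endpoint) > measure(second_endpoint):
        return first_endpoint
    return second_endpoint

# Example: hypothetical brush widths sampled at the two endpoints.
brush_width = {(0, 0): 3.5, (0, 10): 2.0}.get
print(starting_endpoint((0, 0), (0, 10), brush_width))  # (0, 0)
```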
25. The method of claim 18 or 19, wherein determining the ordering comprises:
calculating a first angle between a vertical line and a first character segment in the first set;
calculating a second angle between the vertical line and a second character segment in the first set; and
determining that the first character segment is drawn before the second character segment in response to the first angle being less than the second angle.
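A minimal sketch of the angle test of claim 25 (the same limitation appears in claim 14), treating a segment as a pair of 2-D endpoints. Measuring the unsigned angle against a vertical line with atan2 is one natural reading; the claim does not prescribe the formula.

```python
import math

def angle_to_vertical(p, q):
    # 0 degrees for a vertical segment, 90 degrees for a horizontal one.
    dx, dy = q[0] - p[0], q[1] - p[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))

def first_drawn(segment_a, segment_b):
    # The segment forming the smaller angle with the vertical line is
    # determined to be drawn first.
    return min((segment_a, segment_b), key=lambda s: angle_to_vertical(*s))

nearly_vertical = ((0, 0), (1, 10))
nearly_horizontal = ((0, 0), (10, 1))
print(first_drawn(nearly_horizontal, nearly_vertical))  # ((0, 0), (1, 10))
```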
26. The method of claim 18 or 19, further comprising determining a rendering duration for the first set by:
resetting a clock value for the first character bounding box;
calculating a first length of a first character segment in the first set;
increasing the clock value for the first character segment based on the first length;
calculating a second length of a second character segment in the first set; and
increasing the clock value for the second character segment based on the second length,
wherein the rendering duration for the first set is the clock value, and
wherein the rendering duration for the first set is sent to the ICR engine.
27. The method of claim 26, further comprising:
selecting a first speed for the first character segment based on the first length, wherein increasing the clock value for the first character segment is further based on the first speed; and
selecting a second speed for the second character segment based on the second length, wherein increasing the clock value for the second character segment is further based on the second speed.
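One plausible reading of claim 27, sketched below: the per-segment speed is selected from the segment's length (here, long strokes are drawn faster), so the clock advance length/speed grows sublinearly with length. The two-tier speed table and the threshold are invented for illustration.

```python
def select_speed(length, slow=1.0, fast=2.5, threshold=20.0):
    # Hypothetical speed model: segments at or above the threshold
    # length are assumed to be drawn at the faster speed.
    return fast if length >= threshold else slow

def rendering_duration(lengths):
    clock = 0.0
    for length in lengths:
        # Increase the clock based on both the length and the selected speed.
        clock += length / select_speed(length)
    return clock

print(rendering_duration([10.0, 40.0]))  # 10/1.0 + 40/2.5 = 26.0
```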
28. The method of claim 18 or 19, further comprising:
generating an editable electronic document comprising the recognized characters output by the ICR engine.
29. A system for character recognition, comprising:
a memory;
a computer processor coupled to the memory and configured to:
obtain a plurality of character segments extracted from an image;
determine a first character bounding box comprising a first set of the plurality of character segments and a second character bounding box comprising a second set of the plurality of character segments;
determine an ordering for the first set based on a plurality of texture attributes for the first set;
determine a plurality of directions for the first set based on a plurality of brush widths and a plurality of densities for the first set; and
run character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the ordering for the first set to an intelligent character recognition (ICR) engine.
30. The system of claim 29, wherein determining the ordering comprises:
locating an intersection of a first character segment and a second character segment in the first set;
determining a texture attribute of the intersection;
comparing the texture attribute of the intersection with a texture attribute of the first character segment and a texture attribute of the second character segment; and
determining that the first character segment is drawn before the second character segment in response to the texture attribute of the intersection matching the texture attribute of the second character segment.
31. The system of claim 29 or 30, wherein determining the plurality of directions comprises:
determining a first endpoint and a second endpoint of a character segment in the first set;
comparing a brush width at the first endpoint with a brush width at the second endpoint; and
determining that the first endpoint is a starting endpoint in response to the brush width at the first endpoint exceeding the brush width at the second endpoint.
32. The system of claim 29 or 30, wherein determining the plurality of directions comprises:
determining a first endpoint and a second endpoint of a character segment in the first set;
comparing a density at the first endpoint with a density at the second endpoint; and
determining that the first endpoint is a starting endpoint in response to the density at the first endpoint exceeding the density at the second endpoint.
33. The system of claim 29 or 30, wherein the computer processor is further configured to generate an editable electronic document comprising the recognized characters output by the ICR engine.
34. A non-transitory computer readable medium (CRM) storing computer program code executable by a computer processor for:
obtaining a plurality of character segments extracted from an image;
determining a first character bounding box comprising a first set of the plurality of character segments and a second character bounding box comprising a second set of the plurality of character segments;
determining an ordering for the first set based on a plurality of texture attributes for the first set;
determining a plurality of directions for the first set based on a plurality of brush widths and a plurality of densities for the first set; and
running character recognition for the first character bounding box by sending the first set, the plurality of directions for the first set, and the ordering for the first set to an intelligent character recognition (ICR) engine.
35. The non-transitory CRM of claim 34, wherein determining the ordering comprises:
locating an intersection of a first character segment and a second character segment in the first set;
determining a texture attribute of the intersection;
comparing the texture attribute of the intersection with a texture attribute of the first character segment and a texture attribute of the second character segment; and
determining that the first character segment is drawn before the second character segment in response to the texture attribute of the intersection matching the texture attribute of the second character segment.
36. The non-transitory CRM of claim 34 or 35, wherein determining the plurality of directions comprises:
determining a first endpoint and a second endpoint of a character segment in the first set;
comparing a brush width at the first endpoint with a brush width at the second endpoint; and
determining that the first endpoint is a starting endpoint in response to the brush width at the first endpoint exceeding the brush width at the second endpoint.
37. The non-transitory CRM of claim 34 or 35, wherein determining the plurality of directions comprises:
determining a first endpoint and a second endpoint of a character segment in the first set;
comparing a density at the first endpoint with a density at the second endpoint; and
determining that the first endpoint is a starting endpoint in response to the density at the first endpoint exceeding the density at the second endpoint.
CN201810161029.6A 2017-02-28 2018-02-27 Method, system, and non-transitory computer readable medium for character recognition Active CN108509955B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US15/444,380 US10579893B2 (en) 2017-02-28 2017-02-28 Inferring stroke information from an image
US15/444,380 2017-02-28
US15/474,512 2017-03-30
US15/474,512 US10163004B2 (en) 2017-03-30 2017-03-30 Inferring stroke information from an image

Publications (2)

Publication Number Publication Date
CN108509955A (en) 2018-09-07
CN108509955B (en) 2022-04-15

Family

ID=63375738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810161029.6A Active CN108509955B (en) 2017-02-28 2018-02-27 Method, system, and non-transitory computer readable medium for character recognition

Country Status (2)

Country Link
JP (1) JP7071840B2 (en)
CN (1) CN108509955B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5812697A (en) * 1994-06-10 1998-09-22 Nippon Steel Corporation Method and apparatus for recognizing hand-written characters using a weighting dictionary
US6011865A (en) * 1993-05-12 2000-01-04 International Business Machines Corporation Hybrid on-line handwriting recognition and optical character recognition system
EP1454225A1 (en) * 2001-11-30 2004-09-08 Anoto AB Electronic pen and method for recording of handwritten information
CN1932739A (en) * 2005-09-14 2007-03-21 株式会社东芝 Character reader, character reading method, and character reading program
CN101484907A (en) * 2006-07-06 2009-07-15 辛纳普蒂克斯公司 A method and apparatus for recognition of handwritten symbols
CN103080878A (en) * 2010-08-24 2013-05-01 诺基亚公司 Method and apparatus for segmenting strokes of overlapped handwriting into one or more groups
CN104704510A (en) * 2012-10-10 2015-06-10 摩托罗拉解决方案公司 Method and apparatus for identifying a language used in a document and performing ocr recognition based on the language identified

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06266899A (en) * 1993-03-11 1994-09-22 Hitachi Ltd Hand-written character input system
US20070168382A1 (en) * 2006-01-03 2007-07-19 Michael Tillberg Document analysis system for integration of paper records into a searchable electronic database
CN101930545A (en) * 2009-06-24 2010-12-29 夏普株式会社 Handwriting recognition method and device
JP6266899B2 (en) 2013-05-28 2018-01-24 スタンレー電気株式会社 Liquid crystal display
JP2015032279A (en) * 2013-08-06 2015-02-16 株式会社東芝 Electronic apparatus, method and program
JP6464504B6 (en) * 2014-10-01 2019-03-13 Dynabook株式会社 Electronic device, processing method and program
US9904847B2 (en) * 2015-07-10 2018-02-27 Myscript System for recognizing multiple object input and method and product for same

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Progress in Camera-Based Document Image Analysis";David Doermann et al.;《Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR’03)》;20031231;第1-11页 *
"手写体字符识别的研究与应用";刘利娜;《中国优秀硕士学位论文全文数据库信息科技辑》;20100515(第05期);第1-41页 *
"手写体数字识别中的特征提取和特征选择研究";董慧;《中国优秀硕士学位论文全文数据库信息科技辑》;20071115(第05期);第1-61页 *

Also Published As

Publication number Publication date
JP2018152059A (en) 2018-09-27
JP7071840B2 (en) 2022-05-19
CN108509955A (en) 2018-09-07

Similar Documents

Publication Publication Date Title
CN111723807B (en) End-to-end deep learning recognition machine for typing characters and handwriting characters
US9697423B1 (en) Identifying the lines of a table
JP2019091434A (en) Improved font recognition by dynamically weighting multiple deep learning neural networks
JP2020511726A (en) Data extraction from electronic documents
US9842251B2 (en) Bulleted lists
US9934431B2 (en) Producing a flowchart object from an image
US10083218B1 (en) Repairing tables
JP7170773B2 (en) Structured document information marking method, structured document information marking device, electronic device, computer-readable storage medium, and computer program
CN111832396B (en) Method and device for analyzing document layout, electronic equipment and storage medium
US10163004B2 (en) Inferring stroke information from an image
US10579707B2 (en) Method for inferring blocks of text in electronic documents
JP7219011B2 (en) typesetness score for tables
CN108509955B (en) Method, system, and non-transitory computer readable medium for character recognition
US11763064B2 (en) Glyph accessibility and swash control system
US8488886B2 (en) Font matching
CN111414728B (en) Numerical data display method, device, computer equipment and storage medium
US10579893B2 (en) Inferring stroke information from an image
KR20140116777A (en) Display apparatus and Method for outputting text thereof
US10268920B2 (en) Detection of near rectangular cells
CN111510376A (en) Image processing method and device and electronic equipment
US9448982B2 (en) Immediate independent rasterization
CN112926419B (en) Character judgment result processing method and device and electronic equipment
US20150142784A1 (en) Retrieval device and method and computer program product
US11270224B2 (en) Automatic generation of training data for supervised machine learning
CN113886582A (en) Document processing method and device, and data extraction method and device for image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant