WO2022169123A1 - Method for recognizing handwritten data as characters, and device therefor

Info

Publication number: WO2022169123A1
Application number: PCT/KR2022/000402
Authority: WO
Inventors: 정강훈; 최화영; 이상규
Original assignee: 주식회사 네오랩컨버전스
Priority date: 2021-02-04
Filing date: 2022-01-10
Publication date: 2022-08-11
Also published as: KR20220112368A

Abstract

Disclosed are a method for recognizing handwritten data as characters, and a device therefor, the method receiving handwritten data, extracting the rotation angle of the received handwritten data, and rotating the received handwritten data on the basis of the extracted rotation angle to recognize the rotated handwritten data as characters.

Description

Handwriting data character recognition method and device

The present invention relates to a method and apparatus for recognizing characters in handwriting data and, more particularly, to a method and apparatus that rotate handwriting data which is not horizontally aligned, and therefore cannot be recognized, so that character recognition becomes possible, and that rearrange the character-recognized handwriting data at its position before recognition.

Recently, when writing is performed on a predetermined piece of paper with an electronic pen, the electronic pen transmits information related to the writing trace to a predetermined device (e.g., a smartphone or computer), and the device uses the received information to electronically reconstruct and display the writing trace written on the paper. In addition, when writing is performed on the screen of a tablet or smartphone with an input means such as a finger, a smart pen, or a stylus, the writing trace is restored in an application and displayed. In general, in a smartphone, tablet, or computer, a handwriting trace is expressed as a vector graphic and stored as a file, which is displayed on a screen through a predetermined application.

In addition, the handwriting trace displayed on such a device may be converted into text by a character recognition engine and provided to the user.

Handwriting data is unstructured data that varies widely with individual writing habits, such as writing speed, size of characters/vowels, crooked writing, and writing direction. These unstructured characteristics directly reduce the recognition rate of handwriting recognition, depending on the individual's writing habits.

FIGS. 1 and 2 are diagrams illustrating examples of electronically implemented conventional handwriting trajectories.

Referring to FIG. 1, the writing trajectory is aligned horizontally. In this case, the recognition rate is excellent, and the handwriting can be recognized as characters almost as it is.

Referring to FIG. 2, the writing is not aligned horizontally but is inclined in an oblique direction. The currently applied character recognition engine cannot recognize such writing, nor can it recognize the curves or figures in between, so there is a problem in that broken characters are provided to the user due to errors in the character recognition result.

It is an object of the present invention to provide a method and an apparatus for recognizing characters in handwriting data that recognize a displayed or stored handwriting trace and rearrange the recognized handwriting in its original arrangement.

In order to solve this technical problem, the present invention extracts the rotation angle of a displayed or stored writing trajectory, realigns the trajectory to horizontal by the extracted rotation angle to improve character recognition, and then rearranges the character-recognized writing according to its original position and arrangement.

According to the present invention, there is an advantage in that the character recognition rate can be increased through the rotation of the handwriting data for handwriting in any direction.

In addition, by rotating the character-recognized data back and rearranging the handwriting data generated as graphic data according to its coordinates, the overall shape and position of the handwriting data that has undergone the character recognition procedure are expressed in a form similar to the received original handwriting data, which reduces the sense of incongruity with the handwriting recognition result. In addition, since unrecognized handwriting data is re-expressed as a vector graphic, the original handwriting content can be checked even if character recognition fails.

FIG. 3 is a flowchart illustrating a method for recognizing characters in handwriting data according to an embodiment of the present invention.

FIG. 4 is an exemplary diagram of received stroke data.

FIG. 5 is a diagram illustrating an example of a process of dividing a word using the coordinates of the stroke data of FIG. 4.

FIG. 6 is a diagram illustrating an equation according to an embodiment of the present invention.

FIG. 7 is a diagram illustrating an example of a word-separated result according to an embodiment of the present invention.

FIG. 8 is a diagram illustrating an example of an optimization function according to an embodiment of the present invention.

FIGS. 9 to 21 are diagrams illustrating equations according to another embodiment of the present invention.

FIG. 22 is a diagram illustrating an example of an optimization function of stroke data in word units according to an embodiment of the present invention.

FIG. 23 is a diagram illustrating an example in which the handwriting data of FIG. 2 is recognized and rearranged.

FIG. 24 is a block diagram of an apparatus for recognizing handwriting data according to an embodiment of the present invention.

According to an embodiment of the present invention for solving the above technical problem, a method for recognizing a character of handwriting data includes: receiving handwriting data; extracting a rotation angle of the received handwriting data; rotating the received handwriting data based on the extracted rotation angle; and character recognition of the rotated handwriting data.

The extracting of the rotation angle of the received handwriting data may include dividing the received handwriting data into word units; and extracting a rotation angle of the handwriting data separated in units of words.

Dividing the received handwriting data into word units may include dividing the received handwriting data into word units according to weights based on data related to the stroke time of the received handwriting data and on the coordinates of the received handwriting data.

The extracting of the rotation angle of the received writing data may include extracting the rotation angle of the received writing data according to an optimization function calculated using a least-squares method.

The character recognition of the rotated handwriting data may include normalizing a size of the rotated handwriting data; and character recognition of the handwriting data whose size is normalized.

The character recognition of the rotated handwriting data may include: arranging the rotated handwriting data in a virtual page space according to a time sequence in which the handwriting data was written; and character recognition of the arranged handwriting data.

The handwriting data character recognition method may further include outputting the character-recognized handwriting data.

The outputting of the character-recognized writing data may include re-rotating and outputting the character-recognized writing data based on the extracted rotation angle.

The handwriting data character recognition method may further include: extracting unrecognized handwriting data from the received handwriting data; generating the received handwriting data corresponding to the unrecognized handwriting data as graphic data; and superimposing the generated graphic data on the output handwriting data and outputting the result.

The handwriting data character recognition method may further include: extracting handwriting data recognized as a figure from the received handwriting data; generating the handwriting data recognized as a figure as graphic data; and superimposing the generated graphic data on the output handwriting data and outputting the result.

The step of re-rotating and outputting the character-recognized handwriting data based on the extracted rotation angle may include uniformizing the size, horizontal spacing, and inter-character spacing of the character-recognized writing data; and re-rotating and outputting the character-recognized handwriting data having the uniform size, horizontal spacing, and spacing based on the extracted rotation angle.

According to another embodiment of the present invention for solving the above technical problem, a handwriting data character recognition apparatus includes: a receiving unit for receiving handwriting data; a preprocessing unit for extracting a rotation angle of the received handwriting data and rotating the received handwriting data based on the extracted rotation angle; and a character recognition unit for performing character recognition on the rotated handwriting data.

In describing the embodiments of the present specification, if it is determined that a detailed description of a related known configuration or function may obscure the gist of the present specification, the detailed description thereof will be omitted.

In the present invention, the description of "including" a specific configuration does not exclude configurations other than the corresponding configuration, and it means that additional configurations may be included in the practice of the present invention or the scope of the technical spirit of the present invention.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or a single software unit. That is, the components are listed separately for convenience of description. At least two components may be combined into one component, or one component may be divided into a plurality of components that each perform a function, and both such integrated and separated embodiments are included in the scope of the present invention as long as they do not depart from its essence.

In addition, some of the components may not be essential components that perform essential functions of the present invention, but optional components that merely improve performance. The present invention can be implemented with only the components essential to its essence, excluding components used merely to improve performance, and a structure including only the essential components, excluding the optional performance-improving components, is also included in the scope of the present invention.

Also, in this specification, detailed descriptions of well-known functions and configurations that may obscure the gist of the present invention will be omitted.

A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

Referring to FIG. 3 , in step 310, an apparatus (hereinafter, referred to as a 'character recognition apparatus') that performs a method for recognizing text on handwriting data receives handwriting data.

The handwriting data refers to handwriting trace data stored in a file on a smartphone, tablet, or computer, or handwriting trace data currently displayed on a screen. The handwriting data consists of stroke data, each item of which is a continuous set of coordinates. Stroke data represents one stroke or letter in each language. Stroke data may also be classified as a detailed element such as a figure or a picture. Apart from figures, stroke data has no special meaning by itself. It is recorded while an input tool (electronic pen, stylus, smart pen, etc.) is in contact with paper or another medium such as a smartphone or tablet, and a single character or symbol is stored through a sequence of contact and non-contact.

The received handwriting data is data continuously including coordinates and time information, and has various forms in which figures and symbols are mixed.

In step 320, the character recognition apparatus determines whether the handwriting data is a figure. If it is a figure, the process proceeds to step 360; otherwise, the process proceeds to step 330.

Whether handwriting data is a figure may be determined by treating handwriting data that corresponds to a predefined figure as a figure, or by treating handwriting data determined not to be a character as a figure. The criterion for this determination may be established in advance through learning.

In step 330, the character recognition apparatus extracts a rotation angle of the handwriting data.

According to an embodiment of the present invention, the character recognition apparatus separates handwriting data in a page according to a predetermined criterion, and then extracts a rotation angle for each separated unit of handwriting data.

In one embodiment of the present invention, the character recognition apparatus separates handwriting data in a page into word units. A superset in which a collection of stroke data is given a meaning is called a word. A word is meaningful data and serves as the minimum unit for handwriting recognition. Analysis of handwriting data shows that handwriting tends to be nearly uniform within a word unit rather than across a sentence, and that each word is often written at a different position or in a different direction. Accordingly, in the present invention, handwriting data is divided into word units in order to maximize the possibility of character recognition.

Within one word, the changes in writing time and coordinates are small, whereas between words they are large by comparison. Weights are therefore set based on the handwriting data storage time, that is, data related to the stroke time of the handwriting data, and on the coordinates of the received handwriting data.

A method of setting a weight according to an embodiment of the present invention is as follows.

1. Find T_avg, the timestamp average of all handwriting data, that is, the average of the writing time required for each stroke, and T'_avg, the average time difference between handwriting data, that is, the average gap between the end time of one stroke and the start time of the next stroke.

2. Find the time threshold T_th, where T_th = T_avg + T'_avg.

3. Obtain the average coordinate of each stroke and use it as the measured value (x_i, y_i). Find the maximum and minimum of the coordinate values, and take the distance to the farthest value as the radius R_i of the stroke. Known algorithms are used to find the maximum and minimum of the coordinate values.

4. If the time difference T' between the current stroke and the next consecutive stroke is greater than the time threshold T_th, a weight is added. In one embodiment of the present invention, a weight of +2.5 is added.

5. If T'/T_th > 1.7, a weight is added. Here, 1.7 is a value obtained experimentally. In this case as well, in an embodiment of the present invention, a weight of +2.5 is added.

6. After that, the weight of word separation is added based on the coordinates as shown below. The conditions of the coordinate-based processing will be described with reference to FIG. 5 as follows.

FIG. 4 is a diagram of received stroke data, and FIG. 5 is a diagram illustrating a process of dividing a word using the coordinates of the stroke.

(1) Find the distance between the midpoint of the current stroke, taken as the reference, and the midpoint of the previous stroke. The distance l is given by the equation of FIG. 6.

Referring to FIG. 6, (x_i, y_i) is the midpoint of the current stroke, and (x_{i-1}, y_{i-1}) is the midpoint of the previous stroke.

(2) The following is performed using the radius R_i of the current stroke and the radii R_{i-1}, R_{i-2}, R_{i-3}, and R_{i-4} of the previous strokes. The condition R_sum > R_i is checked against R_sum, the sum of the previous stroke radii, and the value of R_sum is updated when the condition is true, for all stroke data (initially, the maximum real value is assigned to R_sum so that R_sum can be calculated). The number of previous strokes, four, was obtained experimentally. In another embodiment of the invention, the number of previous strokes may be changed.

(3) In the case of R_sum > R_i, if l > (R_i + R_{i-1}), the weight is increased by 1. If l > (R_i + R_{i-1}) does not hold, the comparison is repeated with the earlier radius values, as l > (R_i + R_{i-k}), 2 ≤ k ≤ 4, and if even one of these conditions is satisfied, the weight is increased by 1. If none of the coordinate comparison conditions is met, the weight is reduced by 1 so that the words are not separated.

According to the conditions 4 to 6 above, when the weight is 5 or more, the stroke data is divided into word units. The criterion of weight 5 is a numerical value obtained experimentally. In the present invention, as a criterion for separating words, the importance of writing data storage time is set higher than that of the coordinates.
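As a non-limiting illustration, the weighting procedure of conditions 1 to 6 might be sketched as follows. The names `Stroke`, `center_and_radius`, `time_threshold`, `split_weight`, and `split_into_words` are hypothetical and do not appear in the specification. The sketch assumes that the distance of FIG. 6 is the Euclidean distance between stroke midpoints, that the radius is the distance from the average coordinate to the farthest sample, that R_sum is the sum of the radii of up to four previous strokes, and that the weight is evaluated at the boundary between a stroke and the stroke before it. It is one possible reading of the text, not the claimed implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Stroke:
    points: list      # [(x, y), ...] sampled coordinates of one stroke
    t_start: float    # pen-down time
    t_end: float      # pen-up time


def center_and_radius(stroke):
    """Condition 3: average coordinate and distance to the farthest sample."""
    cx = sum(x for x, _ in stroke.points) / len(stroke.points)
    cy = sum(y for _, y in stroke.points) / len(stroke.points)
    radius = max(math.hypot(x - cx, y - cy) for x, y in stroke.points)
    return (cx, cy), radius


def time_threshold(strokes):
    """Conditions 1-2: T_th = T_avg + T'_avg."""
    t_avg = sum(s.t_end - s.t_start for s in strokes) / len(strokes)
    gaps = [b.t_start - a.t_end for a, b in zip(strokes, strokes[1:])]
    gap_avg = sum(gaps) / len(gaps) if gaps else 0.0
    return t_avg + gap_avg


def split_weight(strokes, i, t_th, lookback=4):
    """Weight for starting a new word at stroke i (i >= 1)."""
    gap = strokes[i].t_start - strokes[i - 1].t_end
    weight = 0.0
    if gap > t_th:                      # condition 4
        weight += 2.5
    if t_th > 0 and gap / t_th > 1.7:   # condition 5
        weight += 2.5
    # Condition 6 (assumed reading): compare the midpoint distance with
    # the radii of up to `lookback` previous strokes, nearest first.
    (cx, cy), r_i = center_and_radius(strokes[i])
    prev = [center_and_radius(s) for s in strokes[max(0, i - lookback):i]]
    prev.reverse()
    r_sum = sum(r for _, r in prev)
    if r_sum > r_i:
        (px, py), _ = prev[0]
        dist = math.hypot(cx - px, cy - py)
        if any(dist > r_i + r_k for _, r_k in prev):
            weight += 1.0
        else:
            weight -= 1.0
    return weight


def split_into_words(strokes, threshold=5.0):
    """Group time-ordered strokes into words; split when weight >= threshold."""
    if not strokes:
        return []
    t_th = time_threshold(strokes)
    words = [[strokes[0]]]
    for i in range(1, len(strokes)):
        if split_weight(strokes, i, t_th) >= threshold:
            words.append([])
        words[-1].append(strokes[i])
    return words
```

Because the two time-based conditions contribute 2.5 each and the coordinate condition contributes at most 1, a split in this sketch requires both time conditions to hold, which reflects the statement that storage time is weighted more heavily than the coordinates.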

After separating into word units, the character recognition apparatus obtains the directionality of the stroke data of the word unit, and then extracts the rotation angle of the stroke data of the word unit based on the directionality.

In an embodiment of the present invention, an optimization function representing a correlation is obtained using a least-squares method, and a rotation angle of stroke data in word units is obtained according to the optimization function.

In general data analysis, the correlation of data is analyzed by finding the regularity with which the dependent variable y changes according to the independent variable x. In an embodiment of the present invention, the least-squares method is used for correlation analysis of the unstructured stroke data. The least-squares method is one way of finding the function y=f(x) that expresses the correlation, and it can derive the characteristic function f(x) of the data regardless of the unstructured nature of the stroke data. From the derived characteristic function f(x), the directionality, slope, and scale factor of the word can be calculated, and the words are normalized based on these values.

The least-squares method predicts one criterion variable from one or more predictor variables under a linear assumption, such that the sum of the squared distances between the actual value of the criterion variable and the value predicted under the linear assumption is minimized. When a functional relationship exists between any two economic variables x and y, the least-squares method is generally used to determine the causal relationship quantitatively.

In FIG. 8, the square markers represent the midpoint of each stroke, and the straight line represents the optimization function.

Referring to FIG. 8, the function y=ax+b that is optimized for the measured values (x_i, y_i), the midpoints of the strokes, is derived. The optimization function y=ax+b is the linear function that minimizes the sum of the errors calculated through the least-squares method.

The distance from each measured value (x_1, y_1), (x_2, y_2), …, (x_n, y_n) to the optimization function has a minimum value. The equation of the least-squares method is given in FIG. 9.

As shown in FIG. 8, the sum of squared deviations ε² is referred to as the residual, and the error is given by the equation of FIG. 10.

If the stroke data has a linear relationship, it can be expressed as a linear equation of the form y_real = ax+b, and the error is given by the equation of FIG. 11.

Referring to the equations of FIGS. 10 and 11, a and b take the values that minimize the error ε² when the partial derivative with respect to each becomes 0, as shown in the equations of FIGS. 12 and 13.

The values of a and b that satisfy the equations of FIGS. 12 and 13 are given by the equations of FIGS. 14 and 15.

According to the above equation, it is possible to obtain an optimization function y=ax+b by obtaining a and b.
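As a non-limiting illustration, the coefficients a and b may be computed with the standard closed-form solution of the least-squares normal equations. The function name `fit_line` is hypothetical, and it is an assumption that the equations of FIGS. 14 and 15, which are not reproduced in the text, take this standard form.

```python
def fit_line(points):
    """Return (a, b) of y = a*x + b minimising the sum of squared
    vertical errors over the given (x, y) stroke midpoints."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xx = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # All midpoints share one x value: the best line is vertical
        # and cannot be written as y = a*x + b.
        raise ValueError("x values are all equal; slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b
```

For example, `fit_line([(0, 1), (1, 3), (2, 5)])` yields a slope of 2 and an intercept of 1, since the three points lie exactly on y = 2x + 1.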

The standard deviations σ_x, σ_y, σ_a, and σ_b of x, y, a, and b are given by the equations of FIGS. 16 to 19.

In the equation of FIG. 16, x_avg is the average value of x, and the standard deviations of a and b in the equations of FIGS. 18 and 19 can be obtained from the standard deviation of y in the equation of FIG. 17.

Here, the suitability of the optimization function can be determined through the correlation coefficient. The correlation coefficient indicates how well the optimization function represents the data; the closer it is to 1, the better the function fits. When the correlation coefficient is 1, all data exactly match the optimization function; when it is close to 1, the data do not match exactly but lie close to a straight line; when it is 0, the data coordinates are evenly distributed and do not approach a straight line.

First, y_avg, the average of the measured values y, is calculated, and the squares of the differences between each measured value y and the average y_avg are accumulated. Next, the value y_real corresponding to each measured value x_i is calculated, and the sum of the squares of the differences between the measured values and the values on the straight line is calculated. Thereafter, the suitability of the optimization function is examined as shown in the equation of FIG. 20.

In the equation of FIG. 20, if the optimization function is not suitable, υ = 0; if it is suitable, that is, y_real ≡ y_i for all i, υ = ∑(y_i - y_avg)².

Referring to the equation of FIG. 20, the correlation coefficient r² can be obtained as shown in the equation of FIG. 21.

Referring to FIG. 21, the closer the correlation coefficient r² is to 0, the less suitable the optimization function is, and the closer r² is to 1, the more suitable it is. In this way, the suitability of the optimization function can be determined using the correlation coefficient.
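As a non-limiting illustration, the suitability check may be sketched as follows. The function name `r_squared` is hypothetical, and it is an assumption that υ in FIG. 20 is the total sum of squares minus the residual sum of squares and that r² in FIG. 21 is υ divided by the total sum of squares, which is consistent with the limiting cases described above.

```python
def r_squared(points, a, b):
    """Coefficient of determination of the line y = a*x + b for the
    given (x, y) points: 1.0 for a perfect fit, near 0 for a poor one."""
    n = len(points)
    y_avg = sum(y for _, y in points) / n
    # Accumulated squared differences between each y and the average.
    total = sum((y - y_avg) ** 2 for _, y in points)
    # Accumulated squared differences between each y and the line value.
    residual = sum((y - (a * x + b)) ** 2 for x, y in points)
    if total == 0:
        # All y values are equal, so the ratio is undefined. As a
        # convention of this sketch only, report a perfect fit when the
        # line passes through every point and no fit otherwise.
        return 1.0 if residual == 0 else 0.0
    upsilon = total - residual
    return upsilon / total
```

A caller could, for instance, accept the rotation angle derived from the line only when the returned value exceeds some chosen cutoff; the specification does not state such a cutoff, so any value used would be a design choice.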

FIG. 22 is a diagram illustrating an example of an optimization function of stroke data in word units according to an embodiment of the present invention. In FIG. 22, the red solid line is the optimization function.

When the linear optimization function is obtained from the central coordinates of the stroke data in this way, the rotation angle of the stroke data can be determined from the coefficient a of the variable x, which is the slope of the linear function. For example, when a is -1, the handwriting is tilted by 45° in the clockwise direction.
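As a non-limiting illustration, the conversion from the slope a to a rotation angle may be sketched with the arctangent. The function name `rotation_angle_deg` is hypothetical, and the sign convention (y axis pointing up, positive angles counter-clockwise) is an assumption; with screen coordinates in which y points down, the sign would be reversed.

```python
import math


def rotation_angle_deg(a):
    """Angle in degrees between the line y = a*x + b and the x axis.

    Assumes a y-up coordinate system: a negative result means the
    writing is tilted clockwise, a positive result counter-clockwise.
    """
    return math.degrees(math.atan(a))
```

With this convention a slope of -1 corresponds to -45°, that is, a 45° clockwise tilt, which matches the example in the text.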

In step 340, the character recognition apparatus rotates the received handwriting data to horizontal based on the extracted rotation angle.

The character recognition apparatus removes the angular component of the rotation angle extracted for each word unit. For example, if the rotation angle extracted above is 45°, the data is rotated by -45° to remove the angular component. As a result, the received handwriting data is aligned horizontally and in parallel.
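As a non-limiting illustration, removing the angular component may be sketched as a plane rotation of the word's coordinates by the negative of the extracted angle. The function names `rotate_points` and `level_word` are hypothetical, and rotating about the centroid of the word is an assumption; the specification does not state the pivot.

```python
import math


def rotate_points(points, angle_deg, pivot=None):
    """Rotate (x, y) points counter-clockwise by angle_deg about pivot.

    If pivot is None, the centroid of the points is used (an assumption
    of this sketch).
    """
    if pivot is None:
        pivot = (sum(x for x, _ in points) / len(points),
                 sum(y for _, y in points) / len(points))
    px, py = pivot
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [(px + (x - px) * cos_t - (y - py) * sin_t,
             py + (x - px) * sin_t + (y - py) * cos_t)
            for x, y in points]


def level_word(points, extracted_angle_deg):
    """Rotate a word by the negative of its extracted rotation angle."""
    return rotate_points(points, -extracted_angle_deg)
```

Applying `rotate_points` to the result with the original angle and the same pivot restores the input coordinates, which is the operation needed when the recognized text is later re-rotated for output.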

As another embodiment of the present invention, when one of the extracted rotation angles dominates the entire page, all coordinates of the stroke data may be rotated based on that rotation angle.

Thereafter, in order to increase the accuracy of character recognition, the size of the handwriting may be normalized in word units. The size of each piece of handwriting is normalized based on the height-to-width ratio of the word, so that the handwriting within a word unit has the same scale factor. In addition, the horizontal spacing and the letter spacing of the handwriting data within a word unit may be made uniform, and the spacing between word units may be made uniform as well.

In one embodiment of the present invention, the height-to-width ratio of word-unit handwriting that has been rotated to horizontal should not exceed 1.0. If the ratio is greater than 1.0, the word is oriented vertically, so the word-unit handwriting may additionally be rotated by 90° or 270°.
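As a non-limiting illustration, the ratio check and the size normalization may be sketched with an axis-aligned bounding box. The function names are hypothetical, and scaling every word uniformly to a common target height is an assumption; the specification states only that normalization is based on the height-to-width ratio of the word.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of the (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def height_to_width_ratio(points):
    """Height divided by width of the word's bounding box."""
    min_x, min_y, max_x, max_y = bounding_box(points)
    width, height = max_x - min_x, max_y - min_y
    if width == 0:
        return float("inf") if height > 0 else 0.0
    return height / width


def is_vertical_word(points):
    """True when the ratio exceeds 1.0, i.e. an extra 90 or 270 degree
    rotation may be needed before recognition."""
    return height_to_width_ratio(points) > 1.0


def normalize_size(points, target_height=1.0):
    """Translate the word to the origin and scale it uniformly so that
    its bounding-box height equals target_height (assumed scheme)."""
    min_x, min_y, _, max_y = bounding_box(points)
    height = max_y - min_y
    scale = 1.0 if height == 0 else target_height / height
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]
```

Uniform scaling is chosen here so that the shape of each character is preserved while all words share one scale factor.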

In step 350, the character recognition apparatus recognizes the rotated handwriting data as characters.

The character recognition apparatus arranges the rotated handwriting data in a virtual space, as a data set for handwriting recognition, in the order in which the word units were written, and then recognizes the handwriting data in word units using a character recognition module.
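As a non-limiting illustration, arranging the leveled words in a virtual space in writing order may be sketched as placing them left to right on a single baseline. The function name `layout_in_virtual_space`, the fixed `gap` between words, and the single-row layout are all assumptions of this sketch; the specification states only that the arrangement follows the order of writing.

```python
def layout_in_virtual_space(words, gap=0.5):
    """Place words on one baseline in the order they were written.

    words: list of (t_start, points) pairs, where points is a list of
           (x, y) coordinates already rotated to horizontal.
    Returns the translated point lists, earliest word first.
    """
    placed = []
    cursor_x = 0.0
    for _, points in sorted(words, key=lambda w: w[0]):
        min_x = min(x for x, _ in points)
        max_x = max(x for x, _ in points)
        min_y = min(y for _, y in points)
        # Shift the word so its bounding box starts at (cursor_x, 0).
        placed.append([(x - min_x + cursor_x, y - min_y) for x, y in points])
        cursor_x += (max_x - min_x) + gap
    return placed
```

The resulting point lists could then be handed, word by word, to whatever character recognition module is in use.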

If the character recognition apparatus does not recognize the rotated handwriting data, the process moves to step 360 .

In step 360, the character recognition device generates the handwriting data as graphic data. In an embodiment of the present invention, the character recognition apparatus generates handwriting data recognized as a figure or not recognized as a character as vector graphic data.

In step 370, the character recognition apparatus rearranges the recognized character handwriting data on a screen or a page and outputs it.

The character recognition apparatus generates text by applying a predetermined font to the character-recognized handwriting data. The text is output at the same size as the size normalized when character recognition was performed for each word unit, and with uniform letter spacing and horizontal spacing within each word unit. Thereafter, the character recognition apparatus rearranges the text according to the rotation angle of each word unit and the received coordinates, and outputs the rearranged text on a new page.

In addition, the character recognition apparatus superimposes the handwriting data generated as graphic data on the text according to the coordinates and outputs it.

As a result of this rearrangement, the overall shape and position of the handwriting data that has undergone the character recognition procedure are expressed in a form similar to the received original handwriting data, which reduces the sense of incongruity with the handwriting recognition result. In addition, since unrecognized handwriting data is re-expressed as a vector graphic, the original handwriting content can be checked even if character recognition fails.

In an embodiment of the present invention, the character recognition apparatus may rearrange and output the text in the form of a text box for each word unit. In this case, the handwriting recognition result may be provided to an external document by copying and pasting or the like, or may be saved in a rich format that maintains the original form and order.

Referring to FIG. 24 , the character recognition apparatus 1000 includes a receiver 1010 , a preprocessor 1020 , a character recognition unit 1030 , and an output unit 1040 .

The receiver 1010 receives handwriting data.

The preprocessor 1020 determines whether the handwriting data is a figure. In the case of a figure, vector graphic data is generated according to the corresponding stroke coordinates.

The preprocessor 1020 extracts the rotation angle of the handwriting data.

According to an embodiment of the present invention, the preprocessor 1020 separates the writing data in the page according to a predetermined criterion, and then extracts a rotation angle for each separated unit of the writing data.

In one embodiment of the present invention, the preprocessor 1020 separates the handwriting data in the page into word units. A superset in which a collection of stroke data is given a meaning is called a word. A word is meaningful data and serves as the minimum unit for handwriting recognition. Analysis of handwriting data shows that handwriting tends to be nearly uniform within a word unit rather than across a sentence, and that each word is often written at a different position or in a different direction. Accordingly, in the present invention, handwriting data is divided into word units in order to maximize the possibility of character recognition.

Within one word, the changes in writing time and coordinates are small, whereas between words they are large by comparison. The preprocessor 1020 therefore sets weights based on the writing data storage time, that is, data related to the stroke time of the writing data, and on the coordinates of the received handwriting data.

1. The preprocessor 1020 finds T_avg, the timestamp average of all the writing data, that is, the average of the writing time required for each stroke, and T'_avg, the average time difference between the writing data, that is, the average gap between the end time of one stroke and the start time of the next stroke.

2. The preprocessor 1020 obtains the time threshold T_th, where T_th = T_avg + T'_avg.

3. The preprocessor 1020 obtains the average coordinate of each stroke and uses it as the measured value (x_i, y_i), obtains the maximum and minimum of the coordinate values, and takes the distance to the farthest value as the radius R_i of the stroke. The preprocessor 1020 uses a known algorithm to obtain the maximum and minimum of the coordinate values.

4. The preprocessor 1020 adds a weight when the time difference T' between the current stroke and the next consecutive stroke is greater than the time threshold T_th. In an embodiment of the present invention, the preprocessor 1020 adds a weight of +2.5.

5. The preprocessor 1020 adds a weight when T'/T_th > 1.7. Here, 1.7 is a value obtained experimentally. In this case as well, in an embodiment of the present invention, the preprocessor 1020 adds a weight of +2.5.

6. After that, the preprocessor 1020 adds a weight of word separation based on the coordinates as follows. The conditions of the coordinate-based processing will be described below with reference to FIG. 5 mentioned above.

(1) The preprocessor 1020 obtains the distance between the midpoint of the current stroke, taken as the reference, and the midpoint of the previous stroke. The distance l is given by the equation of FIG. 6. Here, (x_i, y_i) is the midpoint of the current stroke, and (x_{i-1}, y_{i-1}) is the midpoint of the previous stroke.

(2) The preprocessor 1020 performs the following using the radius R_i of the current stroke and the radii R_{i-1}, R_{i-2}, R_{i-3}, and R_{i-4} of the previous strokes. The preprocessor 1020 checks the condition R_sum > R_i against R_sum, the sum of the previous stroke radii, and updates the value of R_sum when the condition is true, for all stroke data (initially, the maximum real value is assigned to R_sum so that R_sum can be calculated). The number of previous strokes, four, was obtained experimentally. In another embodiment of the invention, the number of previous strokes may be changed.

(3) In the case of R_sum > R_i, the preprocessor 1020 increases the weight by 1 when l > (R_i + R_{i-1}). If l > (R_i + R_{i-1}) does not hold, the preprocessor 1020 repeats the comparison with the earlier radius values, as l > (R_i + R_{i-k}), 2 ≤ k ≤ 4, and if at least one of these conditions is satisfied, the weight is increased by 1. If none of the coordinate comparison conditions is met, the preprocessor 1020 reduces the weight by 1 so that the word is not separated.

The preprocessor 1020 separates the stroke data into word units when the weight accumulated under conditions 4 to 6 above is 5 or more. The threshold of 5 is an experimentally obtained value. In the present invention, as a criterion for separating words, the storage time of the writing data is given greater importance than the coordinates.
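As an illustration only, conditions 4 to 6 may be sketched in Python as follows. The function and parameter names are hypothetical, the distance l is assumed to be the Euclidean distance between stroke midpoints (FIG. 6 is not reproduced here), T_th is assumed to be positive, and the R_sum > R_i pre-check of step (2) is left out of the sketch.

```python
import math

TIME_WEIGHT = 2.5        # weight added by each of conditions 4 and 5
RATIO_LIMIT = 1.7        # experimental value from condition 5
SPLIT_THRESHOLD = 5.0    # weight at or above which a word boundary is declared


def midpoint_distance(cur_mid, prev_mid):
    """Distance l between two stroke midpoints (assumed Euclidean)."""
    return math.hypot(cur_mid[0] - prev_mid[0], cur_mid[1] - prev_mid[1])


def word_break_weight(time_gap, t_th, dist, r_cur, prev_radii):
    """Accumulate the weight of conditions 4 to 6 for one stroke boundary.

    prev_radii holds R_{i-1}, R_{i-2}, ... (at most four are used).
    """
    weight = 0.0
    if time_gap > t_th:                  # condition 4
        weight += TIME_WEIGHT
    if time_gap / t_th > RATIO_LIMIT:    # condition 5
        weight += TIME_WEIGHT
    # Condition 6: +1 if the midpoints are farther apart than the sum of the
    # current radius and at least one previous radius, otherwise -1.
    if any(dist > r_cur + r for r in prev_radii[:4]):
        weight += 1.0
    else:
        weight -= 1.0
    return weight


def is_word_break(weight):
    """A word boundary is declared when the weight is 5 or more."""
    return weight >= SPLIT_THRESHOLD
```

In this sketch, a pause satisfying both time conditions followed by a distant stroke gives 2.5 + 2.5 + 1 = 6, which meets the threshold, whereas the same pause followed by a nearby stroke gives 2.5 + 2.5 - 1 = 4, which does not.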

After separating the word units, the preprocessor 1020 acquires the directionality of the stroke data of each word unit and extracts a rotation angle of the word-unit stroke data based on the directionality.

In one embodiment of the present invention, the preprocessor 1020 obtains an optimization function representing a correlation by using the least squares method, and obtains a rotation angle of the stroke data in word units according to the optimization function.

The method for obtaining the optimization function y = ax + b has been described above with reference to the equations in the drawings, so it is not repeated here.

Here, the suitability of the optimization function can be determined from the correlation coefficient. The correlation coefficient indicates how well the optimization function represents the data; the closer it is to 1, the better the fit. When the correlation coefficient is 1, all data lie exactly on the optimization function; when it is close to 1, the data do not lie exactly on it but are close to a straight line; when it is 0, the data coordinates are evenly distributed and do not approach a straight line.
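For reference, a least-squares fit of y = ax + b and its correlation coefficient may be sketched as follows. This is a minimal sketch and not the equations of the drawings: the function name is hypothetical, and taking the rotation angle as the arctangent of the slope a is an assumption of the sketch.

```python
import math


def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) points.

    Returns (a, b, r, angle_deg), where r is the correlation coefficient
    and angle_deg is the inclination of the fitted line (assumed atan(a)).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; y = a*x + b cannot be fitted")
    a = sxy / sxx
    b = mean_y - a * mean_x
    # r is undefined when all y values are equal; None is returned in that case.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else None
    angle_deg = math.degrees(math.atan(a))
    return a, b, r, angle_deg
```

The sign of r follows the sign of the slope, so for handwriting that slopes downward it is the magnitude of r that is compared with 1.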

The preprocessor 1020 rotates the received handwriting data to the horizontal based on the extracted rotation angle.

The preprocessor 1020 removes the angle component of the rotation angle extracted for each word unit. For example, when the extracted rotation angle is 45°, the preprocessor 1020 rotates the word by -45° to remove the angle component. As a result, the received handwriting data are aligned horizontally and parallel to one another.
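The removal of the angle component may be illustrated by the following sketch. The function names are hypothetical, rotating about the centroid of the word is an assumption, and the sketch uses the mathematical convention of a y axis pointing upward with counterclockwise angles positive; in screen coordinates with the y axis pointing downward, the sense of rotation is reversed.

```python
import math


def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points by angle_deg counterclockwise about center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated


def level_word(points, extracted_angle_deg):
    """Remove the extracted angle by rotating the word about its centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return rotate_points(points, -extracted_angle_deg, center=(cx, cy))
```

For example, points lying on a 45° line become points sharing one y coordinate after `level_word(points, 45)`.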

In another embodiment of the present invention, when one rotation angle is dominant across the entire page among the extracted rotation angles, the preprocessor 1020 may rotate all coordinates of the stroke data based on that rotation angle.

Thereafter, in order to increase the accuracy of character recognition, the preprocessor 1020 may normalize the size of the handwriting in units of words. The size of each piece of handwriting is normalized based on the vertical-to-horizontal ratio of the word, so that the handwriting within a word unit shares the same scale factor. In addition, the horizontal spacing and the letter spacing of the handwriting data within a word unit may be made uniform, and the spacing between word units may be made uniform in the same way.

In an embodiment of the present invention, the vertical-to-horizontal ratio of word-unit handwriting that has been rotated to the horizontal should not exceed 1.0. If the ratio is greater than 1.0, the word is oriented vertically, so the preprocessor 1020 may additionally rotate the word-unit handwriting by 90° or 270°.
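The size normalization and the ratio check may be sketched as follows. The sketch assumes that the vertical-to-horizontal ratio is the height of the bounding box divided by its width, and that normalization scales each word uniformly to a common height; the function names and the target height are hypothetical.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def needs_quarter_turn(points):
    """True when the word's height exceeds its width (ratio > 1.0)."""
    min_x, min_y, max_x, max_y = bounding_box(points)
    width, height = max_x - min_x, max_y - min_y
    return height > width


def normalize_height(points, target_height=1.0):
    """Scale a word uniformly so its bounding-box height equals target_height."""
    min_x, min_y, max_x, max_y = bounding_box(points)
    height = max_y - min_y
    if height == 0:
        return list(points)  # flat stroke data: nothing to scale
    scale = target_height / height
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]
```

Comparing height with width directly avoids a division by zero for a word of zero width while giving the same result as testing whether the ratio exceeds 1.0.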

The character recognition unit 1030 recognizes the rotated handwriting data as a character.

The character recognition unit 1030 arranges the rotated handwriting data in a virtual space as a data set for handwriting recognition so as to match the written order in units of words, and then recognizes the handwriting data in units of words as characters.
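One possible way of arranging the word units in a virtual space in written order is sketched below. The left-to-right placement, the fixed gap, and the function name are assumptions of the sketch and are not taken from the embodiment.

```python
def arrange_in_writing_order(words, gap=1.0):
    """Place word units side by side in the order in which they were written.

    words: list of (start_time, points), where points are (x, y) tuples that
    have already been rotated to the horizontal.
    Returns the shifted point lists in writing order.
    """
    placed = []
    cursor_x = 0.0
    for _, points in sorted(words, key=lambda w: w[0]):
        min_x = min(x for x, _ in points)
        max_x = max(x for x, _ in points)
        min_y = min(y for _, y in points)
        # Shift the word so that its bounding box starts at (cursor_x, 0).
        placed.append([(x - min_x + cursor_x, y - min_y) for x, y in points])
        cursor_x += (max_x - min_x) + gap
    return placed
```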

If the character recognition unit 1030 cannot recognize the rotated handwriting data as characters, it generates the word-unit handwriting data as graphic data. In an embodiment of the present invention, the character recognition unit 1030 generates handwriting data that is recognized as a figure, or that is not recognized as a character, as vector graphic data.

The output unit 1040 rearranges the character-recognized handwriting data on a screen or page and outputs it.

The output unit 1040 generates text by applying a predetermined font to the character-recognized handwriting data. The output unit 1040 outputs the text at the same size as the size normalized for each word unit during character recognition, and makes the letter spacing and horizontal spacing of the text uniform within each word unit. Thereafter, the output unit 1040 rearranges the text according to the rotation angle of each word unit and the received coordinates, and outputs the rearranged text on a new page.

Also, the output unit 1040 superimposes the handwriting data generated as graphic data on the text according to the coordinates and outputs it.

In an embodiment of the present invention, the output unit 1040 may rearrange and output the text in the form of a text box for each word unit. In this case, the handwriting recognition result may be provided to an external document through copy and paste or the like, or may be saved in a rich format that maintains the original form and order.
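The per-word output described above may be pictured with the following sketch, in which each word unit becomes either a text box placed at the received coordinates with the extracted rotation angle, or a graphic item when it was not recognized as characters. The class, field, and key names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class WordUnit:
    text: Optional[str]      # recognized text, or None if not recognized as characters
    origin: Point            # received coordinates of the word unit
    angle_deg: float         # rotation angle extracted for the word unit
    size: float              # normalized size used during recognition
    strokes: List[Point]     # original stroke points, kept for graphic output


def build_output(words):
    """Return one output item per word unit: a text box or a vector graphic."""
    items = []
    for w in words:
        if w.text is not None:
            items.append({"kind": "text_box", "text": w.text, "at": w.origin,
                          "rotate_deg": w.angle_deg, "size": w.size})
        else:
            items.append({"kind": "graphic", "points": w.strokes, "at": w.origin})
    return items
```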

The handwriting data character recognition method described above can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes any type of recording medium in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. In addition, the computer-readable recording medium may be distributed across computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, codes, and code segments for implementing the handwriting data character recognition method can be easily inferred by programmers in the art to which the present invention pertains.

The present invention has been described above with reference to preferred embodiments thereof. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in a modified form without departing from its essential characteristics. Therefore, the disclosed embodiments are to be considered in an illustrative rather than a restrictive sense. The scope of the present invention is indicated in the claims rather than in the foregoing description, and all differences within the scope equivalent thereto should be construed as being included in the present invention.

Claims

1. A handwriting data character recognition method comprising:

receiving handwriting data;

extracting a rotation angle of the received handwriting data;

rotating the received handwriting data based on the extracted rotation angle; and

recognizing the rotated handwriting data as a character.
2. The method of claim 1,

wherein extracting the rotation angle of the received handwriting data includes:

separating the received handwriting data into word units; and

extracting a rotation angle of the handwriting data separated into word units.
3. The method of claim 2,

wherein separating the received handwriting data into word units includes:

separating the received handwriting data into word units according to data related to a stroke time of the received handwriting data and a weight based on coordinates of the received handwriting data.
4. The method of claim 1,

wherein extracting the rotation angle of the received handwriting data includes:

extracting a rotation angle of the received handwriting data according to an optimization function calculated using a least-squares method.
5. The method of claim 1,

wherein recognizing the rotated handwriting data as a character includes:

normalizing the size of the rotated handwriting data; and

recognizing the size-normalized handwriting data as a character.
6. The method of claim 1,

wherein recognizing the rotated handwriting data as a character includes:

arranging the rotated handwriting data in a virtual page space according to the time sequence in which the handwriting data was written; and

recognizing the arranged handwriting data as a character.
7. The method of claim 1,

further comprising outputting the character-recognized handwriting data.
8. The method of claim 7,

wherein outputting the character-recognized handwriting data includes:

re-rotating and outputting the character-recognized handwriting data based on the extracted rotation angle.
9. The method of claim 8, further comprising:

extracting handwriting data not recognized as characters from the received handwriting data;

generating the received handwriting data corresponding to the unrecognized handwriting data as graphic data; and

outputting the generated graphic data superimposed on the output handwriting data.
10. The method of claim 8, further comprising:

extracting handwriting data recognized as a figure from among the received handwriting data;

generating the handwriting data recognized as a figure as graphic data; and

outputting the generated graphic data superimposed on the output handwriting data.
11. The method of claim 8,

wherein re-rotating and outputting the character-recognized handwriting data based on the extracted rotation angle includes:

making the size, horizontal spacing, and inter-character spacing of the character-recognized handwriting data uniform; and

re-rotating and outputting, based on the extracted rotation angle, the character-recognized handwriting data whose size, horizontal spacing, and inter-character spacing have been made uniform.
12. A handwriting data character recognition device comprising:

a receiver for receiving handwriting data;

a preprocessor for extracting a rotation angle of the received handwriting data and rotating the received handwriting data based on the extracted rotation angle; and

a character recognition unit for recognizing the rotated handwriting data as a character.
13. The device of claim 12,

wherein the preprocessor separates the received handwriting data into word units and extracts a rotation angle of the handwriting data separated into word units.
14. The device of claim 13,

wherein the preprocessor separates the received handwriting data into word units according to weights based on data related to a stroke time of the received handwriting data and coordinates of the received handwriting data.
15. The device of claim 12,

wherein the preprocessor extracts the rotation angle of the received handwriting data according to an optimization function calculated using a least-squares method.
16. The device of claim 12,

wherein the character recognition unit normalizes the size of the rotated handwriting data and recognizes the size-normalized handwriting data as a character.
17. The device of claim 12,

wherein the character recognition unit arranges the rotated handwriting data in a virtual page space according to the time sequence in which the handwriting data was written, and recognizes the arranged handwriting data as a character.
18. The device of claim 12,

further comprising an output unit for outputting the character-recognized handwriting data.
19. The device of claim 18,

wherein the output unit rotates and outputs the character-recognized handwriting data based on the extracted rotation angle.
20. The device of claim 19,

wherein the preprocessor extracts handwriting data not recognized as characters from the received handwriting data, and generates the received handwriting data corresponding to the unrecognized handwriting data as graphic data; and

the output unit superimposes the generated graphic data on the output handwriting data and outputs the result.
21. The device of claim 19,

wherein the preprocessor extracts handwriting data recognized as a figure from the received handwriting data, and generates the handwriting data recognized as a figure as graphic data; and

the output unit superimposes the generated graphic data on the output handwriting data and outputs the result.
22. The device of claim 19,

wherein the output unit makes the size, horizontal spacing, and inter-character spacing of the character-recognized handwriting data uniform, and re-rotates and outputs, based on the extracted rotation angle, the character-recognized handwriting data whose size, horizontal spacing, and inter-character spacing have been made uniform.