KR20220112368A

KR20220112368A - Method and apparatus of recognizing character of writing data

Info

Publication number: KR20220112368A
Application number: KR1020210015814A
Authority: KR
Inventors: 정강훈; 최화영; 이상규
Original assignee: 주식회사 네오랩컨버전스
Priority date: 2021-02-04
Filing date: 2021-02-04
Publication date: 2022-08-11
Also published as: WO2022169123A1

Abstract

Disclosed are a method and an apparatus for recognizing characters in handwriting data. The method may include receiving handwriting data, extracting a rotation angle of the received handwriting data, rotating the received handwriting data based on the extracted rotation angle, and performing character recognition on the rotated handwriting data. According to the present invention, rotating the handwriting data can increase the character recognition rate regardless of the direction in which the handwriting was written.

Description

Method and apparatus of recognizing character of writing data

The present invention relates to a method and apparatus for recognizing characters in handwriting data. More particularly, it relates to a method and apparatus that rotate handwriting data which cannot be recognized because it is not horizontally aligned, so that character recognition becomes possible, and that relocate the recognized handwriting data to its position before recognition.

A technique has recently come into use in which a user writes on paper with an electronic pen and the pen transmits information about the writing trace to a device (e.g., a smartphone or computer). The device then electronically reconstructs and displays the trace written on the paper based on the received information. Similarly, when a user writes on the screen of a tablet or smartphone with a finger, smart pen, stylus, or other input means, an application reconstructs and displays the writing trace. In general, a smartphone, tablet, or computer represents a handwriting trace as vector graphics, stores it in a file, and displays it on the screen through an application.

The handwriting trace displayed on such a device may also be converted into text by a character recognition engine and provided to the user.

Handwriting data is unstructured data that varies greatly with individual writing habits, such as writing speed, the size of consonants and vowels, crooked writing, and writing direction. Because these variations follow individual habits, they directly lower the recognition rate of handwriting recognition.

FIGS. 1 and 2 are diagrams illustrating examples of conventional, electronically implemented handwriting traces.

Referring to FIG. 1, the handwriting trace is laid out horizontally. In this case the recognition rate is excellent, and the handwriting can be recognized as characters almost as is.

Referring to FIG. 2, the handwriting is not laid out horizontally but is inclined in an oblique direction. Character recognition engines currently in use cannot recognize it. They also cannot recognize the curves or figures drawn in between. As a result, the recognition output contains errors and the user is presented with broken characters.

An object of the present invention is to provide a method and apparatus for recognizing characters in handwriting data. The method and apparatus extract the rotation angle of a displayed or stored handwriting trace and rotate the trace by that angle so that it is horizontally aligned, which improves character recognition. They then rearrange the recognized handwriting according to its original layout.

According to an embodiment of the present invention for achieving the above object, a method for recognizing characters in handwriting data includes the following steps:
- receiving handwriting data;
- extracting a rotation angle of the received handwriting data;
- rotating the received handwriting data based on the extracted rotation angle; and
- performing character recognition on the rotated handwriting data.

Extracting the rotation angle of the received handwriting data may include dividing the received handwriting data into word units and extracting a rotation angle for the handwriting data of each word unit.

Dividing the received handwriting data into word units may be performed according to weights. The weights are based on data related to the stroke times of the received handwriting data and on the coordinates of the received handwriting data.

Extracting the rotation angle of the received handwriting data may include extracting the rotation angle according to an optimization function calculated using the least-squares method.

Performing character recognition on the rotated handwriting data may include normalizing the size of the rotated handwriting data and performing character recognition on the size-normalized handwriting data.

Performing character recognition on the rotated handwriting data may also include arranging the rotated handwriting data in a virtual page space, in the time order in which it was written, and performing character recognition on the arranged handwriting data.

The method may further include outputting the character-recognized handwriting data.

Outputting the character-recognized handwriting data may include re-rotating it based on the extracted rotation angle before outputting it.

The method may further include the following steps:
- extracting, from the received handwriting data, handwriting data that was not recognized as characters;
- generating graphic data from the received handwriting data corresponding to the unrecognized handwriting data; and
- outputting the generated graphic data superimposed on the output handwriting data.

The method may further include the following steps:
- extracting, from the received handwriting data, handwriting data recognized as a figure;
- generating graphic data from the handwriting data recognized as the figure; and
- outputting the generated graphic data superimposed on the output handwriting data.

Re-rotating and outputting the character-recognized handwriting data based on the extracted rotation angle may include two steps. First, the size, horizontal spacing, and character spacing of the character-recognized handwriting data are made uniform. Second, the uniformized handwriting data is re-rotated based on the extracted rotation angle and output.

According to another embodiment of the present invention for achieving the above object, an apparatus for recognizing characters in handwriting data includes:
- a receiving unit that receives handwriting data;
- a preprocessing unit that extracts a rotation angle of the received handwriting data and rotates the received handwriting data based on the extracted rotation angle; and
- a character recognition unit that performs character recognition on the rotated handwriting data.

According to the present invention, the character recognition rate can be increased by rotating the handwriting data, whatever the direction in which the handwriting was written.

In addition, the recognition result is rotated back, and the handwriting data converted into graphic data is relocated according to its coordinates. As a result, the handwriting data that has gone through character recognition keeps an overall shape and position similar to those of the original received handwriting, which reduces users' reluctance toward the handwriting recognition result. Furthermore, unrecognized handwriting is re-expressed as vector graphics, so the original handwritten content remains viewable even when character recognition fails.

FIGS. 1 and 2 are diagrams illustrating examples of conventional, electronically implemented handwriting traces.
FIG. 3 is a flowchart of a method for recognizing characters in handwriting data according to an embodiment of the present invention.
FIG. 4 is an example of received stroke data.
FIG. 5 is a diagram illustrating an example of a process of separating words using the coordinates of the stroke data of FIG. 4.
FIG. 6 is a diagram illustrating an example of a word separation result according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating an example of an optimization function according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating an example of an optimization function for word-unit stroke data according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating an example in which the handwriting data of FIG. 2 has been character-recognized and rearranged.
FIG. 10 is a block diagram of an apparatus for recognizing characters in handwriting data according to an embodiment of the present invention.

In describing the embodiments of this specification, a detailed description of a related known configuration or function is omitted when it is judged likely to obscure the gist of the specification.

In the present invention, stating that something "includes" a particular component does not exclude other components. Additional components may be included in the practice of the present invention or within the scope of its technical spirit.

The components shown in the embodiments of the present invention are depicted independently to represent distinct characteristic functions. This does not mean that each component consists of separate hardware or a single software unit. The components are listed separately only for convenience of description. At least two of them may be combined into one component, or one component may be divided into several components that together perform its function. Integrated and separated embodiments of these components are also within the scope of the present invention as long as they do not depart from its essence.

Some components may not be essential to the core functions of the present invention and may instead be optional components that merely improve performance. The present invention may be implemented with only the components essential to its essence. A structure that includes only those essential components, without the optional performance-improving ones, is also within the scope of the present invention.

In this specification, detailed descriptions of well-known functions and configurations that may obscure the gist of the present invention are likewise omitted.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 3 is a flowchart of a method for recognizing characters in handwriting data according to an embodiment of the present invention.

Referring to FIG. 3, in step 310, a device that performs the method for recognizing characters in handwriting data (hereinafter, the "character recognition apparatus") receives handwriting data.

The handwriting data is handwriting trace data stored in a file on a smartphone, tablet, or computer, or handwriting trace data currently displayed on a screen. It consists of stroke data, each stroke being a continuous sequence of coordinates. In each language, a piece of stroke data represents one stroke or one letter. Stroke data may also be classified as detailed elements of figures, pictures, and the like.

Except in the case of figures, stroke data carries no particular meaning by itself. It is stored while an input tool (an electronic pen, stylus, smart pen, etc.) is in contact with paper or another medium, such as a smartphone or tablet. One character or symbol is stored through a sequence of contact and non-contact (pen-down and pen-up) events.

The received handwriting data is a continuous sequence of coordinate and time information. It takes various forms in which figures and symbols are mixed.
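Each stroke is therefore a time-ordered list of coordinates with timestamps. The minimal Python sketch below shows how the per-stroke quantities used later in the weighting steps can be derived from such a record: writing time, average coordinate, and radius. It assumes a hypothetical `Stroke` class holding `(x, y, t)` tuples; the names are illustrative and are not the storage format used in the patent. The radius is taken here as the distance from the average coordinate to the farthest sampled point, which is one reading of step 3 of the weighting procedure.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical point format: (x, y, t), where t is a timestamp.
Point = Tuple[float, float, float]


@dataclass
class Stroke:
    """One pen-down to pen-up trace, stored as time-ordered points."""
    points: List[Point]

    @property
    def start_time(self) -> float:
        return self.points[0][2]

    @property
    def end_time(self) -> float:
        return self.points[-1][2]

    @property
    def duration(self) -> float:
        # Writing time of this stroke (averaged later to obtain T_avg).
        return self.end_time - self.start_time

    @property
    def center(self) -> Tuple[float, float]:
        # Average coordinate, used as the measured value (x_i, y_i).
        n = len(self.points)
        return (sum(p[0] for p in self.points) / n,
                sum(p[1] for p in self.points) / n)

    @property
    def radius(self) -> float:
        # Assumed reading of R_i: distance from the center to the farthest point.
        cx, cy = self.center
        return max(math.hypot(x - cx, y - cy) for x, y, _ in self.points)
```

For example, `Stroke([(0.0, 0.0, 0.0), (4.0, 0.0, 20.0)]).center` is `(2.0, 0.0)`.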

In step 320, the character recognition apparatus determines whether the handwriting data is a figure. If it is, the process proceeds to step 360; otherwise, it proceeds to step 330.

The apparatus may classify handwriting data as a figure if it corresponds to a predefined figure. Alternatively, it may treat any handwriting data judged not to be characters as a figure. The criteria for identifying figures may be established in advance through learning.

In step 330, the character recognition apparatus extracts the rotation angle of the handwriting data.

According to an embodiment of the present invention, the character recognition apparatus divides the handwriting data in a page according to a predetermined criterion. It then extracts a rotation angle for each divided unit of handwriting data.

In one embodiment of the present invention, the character recognition apparatus divides the handwriting data in a page into word units. A higher-level set that gives meaning to a group of strokes is called a word. A word is meaningful data and is the minimum unit for handwriting recognition. Analysis of handwriting data shows that writing tends to be nearly uniform within a word, more so than across a sentence, and that different words are often written at different positions or in different directions. Therefore, to maximize the chance of successful character recognition, the present invention divides the handwriting data into word units.

Within a single word, the variation in writing time and coordinates is small; between words, it is larger. The apparatus therefore sets weights based on two inputs: the handwriting data storage times, that is, data related to the stroke times, and the coordinates of the received handwriting data.

The weights are set as follows in an embodiment of the present invention.

1. Compute two averages over all the handwriting data:
   - $T_{avg}$, the average writing time per stroke; and
   - $T'_{avg}$, the average time gap between strokes, that is, the average time from the end of one stroke to the start of the next stroke.

2. Compute the time threshold $T_{th} = T_{avg} + T'_{avg}$.

3. Compute the average coordinate of each stroke and use it as the measured value $(x_i, y_i)$. Then find the maximum and minimum coordinate values and take the distance to the farthest of them as the radius $R_i$ of that stroke. A known algorithm is used to find the maximum and minimum coordinate values.

4. If the time gap $T'$ between the current stroke and the next consecutive stroke is greater than the time threshold $T_{th}$, a weight is added. In one embodiment of the present invention, the weight added is +2.5.

5. If the following condition holds, a weight is added: [equation not reproduced in this text]. The value 1.7 in that condition was obtained experimentally. In one embodiment of the present invention, the weight added here is also +2.5.

6. Next, word-separation weights are added based on the coordinates. The conditions for this coordinate-based processing are described below with reference to FIG. 5.

FIG. 4 shows received stroke data, and FIG. 5 illustrates the process of separating words using the stroke coordinates.

(1) Compute the distance $l$ between the center of the current (reference) stroke and the center of the previous stroke, as given by Equation 1.

In Equation 1, $(x_i, y_i)$ is the center of the current stroke and $(x_{i-1}, y_{i-1})$ is the center of the previous stroke.

(2) Use the radius $R_i$ of the current stroke and the radii $R_{i-1}$, $R_{i-2}$, $R_{i-3}$, and $R_{i-4}$ of the previous strokes as follows. Let $R_{sum}$ be the sum of the previous stroke radii. Evaluate the condition $R_{sum} > R_i$ over all the stroke data, and update $R_{sum}$ whenever the condition is true. For the first evaluation, $R_{sum}$ is initialized to the largest representable real value so that it gets computed. The choice of four previous strokes was determined experimentally; in other embodiments of the invention, the number of previous strokes may be changed.

(3) When $R_{sum} > R_i$, the weight is increased by 1 if $l > (R_i + R_{i-1})$. If that condition does not hold, the comparison is repeated with the earlier radii, $l > (R_i + R_{i-n})$ for $2 \le n \le 4$, and the weight is increased by 1 if any of these conditions is satisfied. If none of the coordinate comparison conditions is satisfied, the weight is decreased by 1 so that the words are not separated.

When the weight accumulated under conditions 4 to 6 above is 5 or more, the stroke data is separated into word units. The threshold of 5 was determined experimentally. In the present invention, the handwriting data storage time is given greater importance than the coordinates as a criterion for separating words: each time condition adds 2.5, whereas each coordinate condition changes the weight by only 1.
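Steps 1 to 6 can be sketched in Python as follows. This is a simplified, hypothetical reading under several assumptions:
- Each stroke is summarized as a tuple (start time, end time, center x, center y, radius).
- Equation 1 is taken to be the Euclidean distance.
- The $R_{sum}$ bookkeeping of item (2) is reduced to comparing the sum of up to four previous radii with $R_i$.
- The equation for step 5 is not available, so `placeholder_step5` is an invented stand-in that merely uses the constant 1.7. It should be replaced with the actual condition.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-stroke summary: (start_time, end_time, center_x, center_y, radius).
StrokeInfo = Tuple[float, float, float, float, float]


def placeholder_step5(gap: float, t_avg: float, gap_avg: float) -> bool:
    # HYPOTHETICAL stand-in for the step-5 condition; the source equation
    # is not available, only that it uses the experimental constant 1.7.
    return gap > 1.7 * gap_avg


def split_words(strokes: Sequence[StrokeInfo],
                step5: Callable[[float, float, float], bool] = placeholder_step5,
                time_weight: float = 2.5,
                threshold: float = 5.0,
                history: int = 4) -> List[List[int]]:
    """Group stroke indices into words using time- and coordinate-based weights."""
    if not strokes:
        return []
    n = len(strokes)
    t_avg = sum(e - s for s, e, *_ in strokes) / n            # step 1: T_avg
    gaps = [strokes[i][0] - strokes[i - 1][1] for i in range(1, n)]
    gap_avg = sum(gaps) / len(gaps) if gaps else 0.0           # step 1: T'_avg
    t_th = t_avg + gap_avg                                     # step 2: T_th

    words = [[0]]
    for i in range(1, n):
        gap = gaps[i - 1]
        weight = 0.0
        if gap > t_th:                        # step 4
            weight += time_weight
        if step5(gap, t_avg, gap_avg):        # step 5 (placeholder)
            weight += time_weight
        # Step 6, simplified: distance between consecutive stroke centers.
        _, _, cx, cy, r_i = strokes[i]
        _, _, px, py, _ = strokes[i - 1]
        l = math.hypot(cx - px, cy - py)      # assumed form of Equation 1
        prev_radii = [strokes[i - k][4] for k in range(1, min(history, i) + 1)]
        if sum(prev_radii) > r_i:
            if any(l > r_i + r_prev for r_prev in prev_radii):
                weight += 1
            else:
                weight -= 1
        if weight >= threshold:
            words.append([i])                 # stroke i starts a new word
        else:
            words[-1].append(i)
    return words
```

Each time condition adds 2.5, and the coordinate rule contributes at most +1. A boundary can therefore reach the threshold of 5 only when both time conditions hold, which matches the statement that storage time is weighted more heavily than coordinates. For example, passing `step5=lambda g, t, ga: False` keeps every stroke in one word.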

FIG. 6 is a diagram illustrating an example of a word separation result according to an embodiment of the present invention.

After dividing the data into word units, the character recognition apparatus determines the direction of each word's stroke data. It then extracts the rotation angle of that word's stroke data based on this direction.

In an embodiment of the present invention, an optimization function representing the correlation is obtained using the least-squares method. The rotation angle of each word's stroke data is then obtained from the optimization function.

In general data analysis, correlation is analyzed by finding the regularity with which a dependent variable $y$ changes with an independent variable $x$. In an embodiment of the present invention, the least-squares method is used to analyze the correlation of the unstructured stroke data. The least-squares method is one way to find a function $y = f(x)$ that expresses the correlation, and it can derive the characteristic function $f(x)$ of the data despite the unstructured nature of the stroke data. From the derived $f(x)$, the direction, slope, scale factor, and other properties of a word can be calculated, and the words are normalized based on these values.

The least-squares method applies when one criterion variable is to be predicted from one or more predictor variables under a linear assumption. It chooses the prediction that minimizes the sum of the squared distances between the actual criterion values and the values predicted under the linear assumption. When a functional relationship exists between two variables $x$ and $y$, the least-squares method is commonly used to quantify that relationship.

FIG. 7 is a diagram illustrating an example of an optimization function according to an embodiment of the present invention.

In FIG. 7, the squares mark the center of each stroke, and the straight line represents the optimization function.

Referring to FIG. 7, the optimization function $y = ax + b$ is derived for the measured values $(x_i, y_i)$, which are the centers of the strokes. It is the linear function that minimizes the sum of the squared errors calculated by the least-squares method.

The optimal function is the one whose distances from the measured values $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are minimized in this least-squares sense. The least-squares criterion is expressed by Equation 2.

As shown in FIG. 7, the sum of squared deviations $\varepsilon^2$ is called the residual and is given by Equation 3.

If the stroke data has a linear relationship, it can be expressed by the straight-line equation $y_{real} = ax + b$, and the corresponding error is given by Equation 4.

Referring to Equations 3 and 4, $a$ and $b$ minimize the error $\varepsilon^2$ when the partial derivatives of $\varepsilon^2$ with respect to each of them are zero, as expressed in Equations 5 and 6.

The values of $a$ and $b$ that satisfy Equations 5 and 6 are given by Equations 7 and 8.
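For reference, the standard textbook form of the least-squares line is given below in the symbols used in this description. Equations 4 to 8 are assumed to be equivalent to it, although their exact form in the original may differ.

$$
\varepsilon^2 = \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2, \qquad
\frac{\partial \varepsilon^2}{\partial a} = 0, \qquad
\frac{\partial \varepsilon^2}{\partial b} = 0
$$

$$
a = \frac{n \sum_{i} x_i y_i - \sum_{i} x_i \sum_{i} y_i}{n \sum_{i} x_i^2 - \left(\sum_{i} x_i\right)^2}, \qquad
b = \frac{\sum_{i} y_i - a \sum_{i} x_i}{n}
$$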

Computing $a$ and $b$ from the above equations yields the optimization function $y = ax + b$.
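The closed-form fit can be computed directly from running sums. The sketch below uses a hypothetical function name, `fit_line`. It takes the stroke centers $(x_i, y_i)$ of one word and returns $(a, b)$ using the standard least-squares formulas.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b to stroke center points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        # All x values equal: a vertical word cannot be written as y = ax + b.
        raise ValueError("x values are identical; slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns `(2.0, 1.0)`. Regressing $y$ on $x$ becomes ill-conditioned for words written close to vertically. This sketch simply raises an error in the exactly vertical case; the description does not say how such words are handled.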

The standard deviations $\sigma_x$, $\sigma_y$, $\sigma_a$, and $\sigma_b$ of $x$, $y$, $a$, and $b$ are given by Equations 9 to 12.

In Equation 9, $x_{avg}$ is the average of $x$. The standard deviations of $a$ and $b$ in Equations 11 and 12 can be obtained from the standard deviation of $y$ in Equation 10.

The suitability of the optimization function can be judged by the correlation coefficient, which indicates how well the function represents the data; the closer it is to 1, the better the fit. A correlation coefficient of 1 means that all the data lies exactly on the optimization function. A value close to 1 means that the data does not lie exactly on it but is close to a straight line. A value of 0 means that the data coordinates are spread evenly and do not approximate a straight line.

The suitability check proceeds as follows:
1. Compute $y_{avg}$, the average of the measured values $y$, and accumulate the squared differences between each $y_i$ and $y_{avg}$.
2. Compute the line value $y_{real}$ corresponding to each measured value $x_i$.
3. Compute the sum of the squared differences between the measured values and the line values.
4. Examine the suitability of the optimization function as in Equation 13.

In Equation 13, a perfect fit has $y_{real} \equiv y_i$ for all $i$, so $\upsilon = 0$. An unsuitable fit has $\upsilon = \sum (y_i - y_{avg})^2$.

From Equation 13, the correlation coefficient $r^2$ is obtained as in Equation 14.

Referring to Equation 14, the closer $r^2$ is to 0, the less suitable the optimization function; the closer it is to 1, the more suitable. The correlation coefficient can thus be used to judge the suitability of the optimization function.
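The sketch below assumes that Equation 14 has the usual coefficient-of-determination form, $r^2 = 1 - \upsilon / \sum_i (y_i - y_{avg})^2$, where $\upsilon$ is the sum of squared residuals. This form is consistent with the two limiting cases described for Equation 13. The function name `r_squared` is hypothetical.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float], a: float, b: float) -> float:
    """Coefficient of determination of the line y = a*x + b for the given points."""
    n = len(ys)
    y_avg = sum(ys) / n
    total = sum((y - y_avg) ** 2 for y in ys)                  # spread around the mean
    v = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))   # upsilon: residual sum
    if total == 0:
        # All y values equal: only an exact fit counts as suitable.
        return 1.0 if v == 0 else 0.0
    return 1.0 - v / total
```

A perfect fit returns 1.0. A horizontal line through $y_{avg}$ makes $\upsilon$ equal to the total spread and returns 0.0, matching the two cases described for Equation 13.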

FIG. 8 is a diagram illustrating an example of the optimization function for word-unit stroke data according to an embodiment of the present invention. The solid red line in FIG. 8 is the optimization function.

When a linear optimization function is obtained in this way from the midpoint coordinates of the stroke data, the rotation angle of the stroke data follows from the coefficient $a$ of $x$, because $a$ is the slope of the line. For example, when $a = -1$, the stroke data are inclined 45° clockwise.
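Because $a$ is the slope, the tilt is $\theta = \arctan(a)$. The sketch below assumes a y-up coordinate system in which counterclockwise angles are positive; on a screen whose y-axis points down, the sign would flip. The name `tilt_angle_deg` is hypothetical.

```python
import math


def tilt_angle_deg(a: float) -> float:
    """Tilt of the line y = a*x + b relative to the x-axis, in degrees.

    Assumes a y-up coordinate system: positive = counterclockwise,
    negative = clockwise. The result lies in the open interval (-90, 90).
    """
    return math.degrees(math.atan(a))


# Usage: a slope of -1 corresponds to about -45 degrees, i.e. 45 degrees clockwise.
print(tilt_angle_deg(-1.0))
```

Since the arctangent never reaches ±90°, near-vertical words are left to the aspect-ratio check described below.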

In step 340, the character recognition apparatus rotates the received handwriting data into horizontal alignment based on the extracted rotation angle.

The character recognition apparatus removes the angular component of the rotation angle extracted for each word unit. For example, if the extracted rotation angle is 45°, the word is rotated by -45° to remove the angular component. As a result, the received handwriting data are aligned horizontally.
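Removing the angular component amounts to rotating each word's points by the negative of its measured tilt. In this minimal sketch, the rotation center (the word's centroid) and the function name `deskew_word` are assumptions; the text does not specify the pivot point.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def deskew_word(points: List[Point], angle_deg: float) -> List[Point]:
    """Rotate a word's points by -angle_deg about their centroid (y-up axes),
    cancelling a measured tilt of angle_deg."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    t = math.radians(-angle_deg)  # rotate opposite to the tilt
    c, s = math.cos(t), math.sin(t)
    return [
        (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
        for x, y in points
    ]


# Usage: points on the line y = x are tilted 45 degrees; after deskewing
# they share (up to rounding) the same y coordinate.
level = deskew_word([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)], 45.0)
```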

In another embodiment, when one rotation angle dominates across the entire page, all coordinates of the stroke data may be rotated based on that angle.

Thereafter, to improve the accuracy of character recognition, the size of the handwriting may be normalized word by word. Each piece of handwriting is normalized based on the vertical-to-horizontal ratio of its word, so that all handwriting within a word unit shares the same scale factor. In addition, the horizontal spacing and letter spacing of the handwriting data within a word unit may be made uniform, as may the spacing between word units.

In one embodiment, the vertical-to-horizontal ratio of word-unit handwriting that has been rotated into horizontal alignment should not exceed 1.0. If the ratio is greater than 1.0, the word is oriented vertically, so the word-unit handwriting may additionally be rotated by 90° or 270°.
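The normalization and the vertical-word check can be sketched as below. The target height of 32 units is a hypothetical parameter, and because the text does not say how to choose between 90° and 270°, the sketch always applies a 90° counterclockwise turn for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def aspect_ratio(points: List[Point]) -> float:
    """Vertical-to-horizontal ratio (height / width) of the bounding box."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return height / width if width > 0 else float("inf")


def normalize_word(points: List[Point], target_height: float = 32.0) -> List[Point]:
    """Scale a deskewed word uniformly so its bounding-box height equals
    target_height (hypothetical parameter) and move it to the origin."""
    if aspect_ratio(points) > 1.0:
        # Still taller than wide, so treat the word as vertical and turn it a
        # quarter turn counterclockwise: (x, y) -> (-y, x). Choosing between
        # 90 and 270 degrees would need another cue, not modelled here.
        points = [(-y, x) for x, y in points]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x0, y0 = min(xs), min(ys)
    height = max(ys) - y0
    # One scale factor for the whole word keeps its characters proportional.
    scale = target_height / height if height > 0 else 1.0
    return [((x - x0) * scale, (y - y0) * scale) for x, y in points]
```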

In step 350, the character recognition apparatus performs character recognition on the rotated handwriting data.

The character recognition apparatus places the rotated handwriting data, word by word, in a virtual space as a data set for handwriting recognition, following the order in which the words were written, and then recognizes the word-unit handwriting data using a character recognition module.
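One way to picture the virtual space is a single line on which words are laid out left to right in writing order. This is a minimal sketch under that assumption; the `Word` type, the fixed gap, and the function name are all hypothetical, and the text does not specify the layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Word:
    start_time: float    # timestamp of the word's first stroke
    points: List[Point]  # deskewed, normalized coordinates


def layout_in_writing_order(words: List[Word], gap: float = 10.0) -> List[List[Point]]:
    """Place words left to right on one virtual line in the order they were
    written, separated by a fixed (hypothetical) gap."""
    placed: List[List[Point]] = []
    cursor = 0.0
    for word in sorted(words, key=lambda w: w.start_time):
        x0 = min(x for x, _ in word.points)
        y0 = min(y for _, y in word.points)
        placed.append([(x - x0 + cursor, y - y0) for x, y in word.points])
        cursor += max(x for x, _ in word.points) - x0 + gap
    return placed
```

Sorting by the first stroke's timestamp, rather than by page position, keeps words that were written out of spatial order in the sequence the writer produced them.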

If the character recognition apparatus fails to recognize characters in the rotated handwriting data, the process moves to step 360.

In step 360, the character recognition apparatus converts the handwriting data into graphic data. In an embodiment, handwriting data recognized as a figure or not recognized as characters are converted into vector graphic data.

In step 370, the character recognition apparatus rearranges the character-recognized handwriting data on a screen or page and outputs it.

The character recognition apparatus generates text by applying a predetermined font to the character-recognized handwriting data. For each word unit, the text is output at the same size as the normalized size used during recognition, with uniform letter spacing and horizontal spacing within the word. The apparatus then repositions the text according to each word unit's rotation angle and originally received coordinates, and outputs it on a new page.
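Repositioning a recognized word can be sketched as computing the corners of its text box, rotated back by the word's original tilt about the word's original center. This assumes the same y-up, counterclockwise-positive convention as the sketches above; `placed_text_box` is a hypothetical name.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def placed_text_box(center: Point, width: float, height: float, angle_deg: float) -> List[Point]:
    """Corners of a width x height text box, re-rotated by the word's
    original tilt angle_deg (counterclockwise, y-up) about the word's
    original center."""
    cx, cy = center
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    hw, hh = width / 2, height / 2
    corners = [(-hw, -hh), (hw, -hh), (hw, hh), (-hw, hh)]
    return [(cx + c * x - s * y, cy + s * x + c * y) for x, y in corners]
```

The box size would come from the normalized word size, so the rendered text occupies roughly the same footprint as the original handwriting.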

In addition, the character recognition apparatus overlays the handwriting data converted into graphic data onto the text at their original coordinates and outputs the result.

FIG. 9 is a diagram illustrating an example in which the handwriting data of FIG. 2 have been recognized and rearranged.

As a result of this rearrangement, the overall shape and position of the recognized handwriting data resemble those of the original handwriting, which reduces the user's resistance to the recognition result. In addition, because handwriting that could not be recognized is re-rendered as vector graphics, the original handwritten content remains visible even when character recognition fails.

In an embodiment, the character recognition apparatus may rearrange and output each word unit as a text box. In this case, the recognition result can be provided to external documents by copying and pasting, or saved in a rich-text format that preserves the original layout and order.

FIG. 10 is a block diagram of an apparatus for recognizing characters in handwriting data according to an embodiment of the present invention.

Referring to FIG. 10, the character recognition apparatus 1000 includes a receiver 1010, a preprocessor 1020, a character recognition unit 1030, and an output unit 1040.

The receiver 1010 receives handwriting data.

The preprocessor 1020 determines whether the handwriting data represent a figure. If so, it generates vector graphic data from the corresponding stroke coordinates.

The preprocessor 1020 extracts the rotation angle of the handwriting data.

According to an embodiment, the preprocessor 1020 divides the handwriting data on a page according to a predetermined criterion and then extracts a rotation angle for each resulting unit of handwriting data.

In an embodiment, the preprocessor 1020 divides the handwriting data on a page into word units. A higher-level grouping that gives meaning to a collection of stroke data is called a word. A word is meaningful data and is the smallest unit for handwriting recognition. Analysis of handwriting data shows that writing tends to be nearly uniform within a word, more so than across a sentence, and that different words are often written at different positions or in different directions. Therefore, to maximize the chance of successful character recognition, the present invention divides handwriting data into word units.

Within one word, the displacement in writing time and coordinates is small, whereas between words the changes in writing time and coordinates are larger. The preprocessor 1020 therefore assigns weights based on the time at which the handwriting data were stored (that is, data related to the stroke times) and on the coordinates of the received handwriting data, as follows.

1. The preprocessor 1020 computes $T_{avg}$, the timestamp average over all handwriting data (that is, the average time taken to write each stroke), and $T'_{avg}$, the average time difference between consecutive strokes (that is, the average interval between the end of one stroke and the start of the next).

2. The preprocessor 1020 obtains the time threshold $T_{th} = T_{avg} + T'_{avg}$.

3. The preprocessor 1020 computes the average coordinate of each stroke and uses it as the measured value $(x_i, y_i)$. It then finds the maximum and minimum coordinate values of the stroke and takes the distance to the farthest of them as the stroke's radius $R_i$. A known algorithm is used to find the maximum and minimum.

4. If the time difference $T'$ between the current stroke and the next stroke is greater than the threshold $T_{th}$, the preprocessor 1020 adds a weight. In an embodiment, the added weight is +2.5.

5. The preprocessor 1020 adds a weight if [equation not available in this text] holds. Here, 1.7 is an experimentally obtained constant used in that condition. In an embodiment, this weight is also +2.5.

6. The preprocessor 1020 then adds word-separation weights based on coordinates, as follows. The conditions for this coordinate-based processing are described with reference to FIG. 5, mentioned above.

(1) The preprocessor 1020 computes the distance $l$ between the midpoint of the current (reference) stroke and the midpoint of the previous stroke:

$$l = \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}$$

where $(x_i, y_i)$ is the midpoint of the current stroke and $(x_{i-1}, y_{i-1})$ is the midpoint of the previous stroke.

(2) Using the radius $R_i$ of the current stroke and the radii $R_{i-1}$, $R_{i-2}$, $R_{i-3}$, and $R_{i-4}$ of the previous strokes, the preprocessor 1020 evaluates the condition $R_{sum} > R_i$, where $R_{sum}$ is the sum of the previous strokes' radii. Iterating over all stroke data, it updates $R_{sum}$ whenever the condition is true. (On the first pass, $R_{sum}$ is initialized to the largest representable real number so that it can be computed.) The use of four previous strokes was determined experimentally; in other embodiments, the number of previous strokes may differ.

(3) When $R_{sum} > R_i$, the preprocessor 1020 increases the weight by +1 if $l > R_i + R_{i-1}$. If $l > R_i + R_{i-1}$ does not hold, it repeats the comparison with the earlier radii, $l > R_i + R_{i-n}$ for $2 \le n \le 4$, and increases the weight by +1 if any one of these conditions is satisfied. If none of the coordinate comparison conditions is met, the preprocessor 1020 decreases the weight by 1 so that the word is not split.

If the weight accumulated under conditions 4 to 6 above is 5 or more, the preprocessor 1020 separates the stroke data into word units at that point. The threshold of 5 was determined experimentally. In the present invention, the writing time of the handwriting data is weighted more heavily than the coordinates as a criterion for separating words.
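Steps 1 to 6 can be combined into a rough Python sketch of the word-separation loop. Several parts are assumptions rather than the patent's method: the step 5 condition is not available here, so it is passed in as a caller-supplied function (the lambda in the usage is a hypothetical stand-in); $R_{sum}$ is simplified to the sum of up to four previous radii; the radius is taken as the distance from the stroke's mean point to its farthest point; and no coordinate weight is applied when $R_{sum} \le R_i$. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stroke:
    points: List[Tuple[float, float]]
    start: float  # pen-down time
    end: float    # pen-up time


def center_and_radius(s: Stroke) -> Tuple[float, float, float]:
    """Mean point of a stroke, and the distance from it to the farthest point."""
    cx = sum(x for x, _ in s.points) / len(s.points)
    cy = sum(y for _, y in s.points) / len(s.points)
    r = max(math.hypot(x - cx, y - cy) for x, y in s.points)
    return cx, cy, r


def split_words(
    strokes: List[Stroke],
    condition5: Callable[[float, float, float], bool],  # (gap, T_avg, T'_avg) -> bool
    threshold: float = 5.0,
    lookback: int = 4,
) -> List[List[Stroke]]:
    if not strokes:
        return []
    n = len(strokes)
    # Steps 1-2: average stroke duration, average inter-stroke gap, threshold.
    t_avg = sum(s.end - s.start for s in strokes) / n
    gaps = [strokes[i].start - strokes[i - 1].end for i in range(1, n)]
    t_gap_avg = sum(gaps) / len(gaps) if gaps else 0.0
    t_th = t_avg + t_gap_avg
    # Step 3: midpoint and radius of every stroke.
    geo = [center_and_radius(s) for s in strokes]

    words = [[strokes[0]]]
    for i in range(1, n):
        weight = 0.0
        gap = gaps[i - 1]
        if gap > t_th:                         # step 4
            weight += 2.5
        if condition5(gap, t_avg, t_gap_avg):  # step 5 (caller-supplied)
            weight += 2.5
        # Step 6: distance l between this midpoint and the previous one.
        xi, yi, ri = geo[i]
        xp, yp, _ = geo[i - 1]
        l = math.hypot(xi - xp, yi - yp)
        prev_radii = [geo[i - k][2] for k in range(1, min(lookback, i) + 1)]
        if sum(prev_radii) > ri:               # simplified R_sum > R_i test
            if any(l > ri + rp for rp in prev_radii):
                weight += 1.0
            else:
                weight -= 1.0
        if weight >= threshold:
            words.append([strokes[i]])         # start a new word
        else:
            words[-1].append(strokes[i])
    return words


# Usage: two short words written far apart after a long pause.
strokes = [
    Stroke([(0, 0), (10, 0)], 0.0, 1.0),
    Stroke([(0, 5), (10, 5)], 1.1, 2.1),
    Stroke([(100, 0), (110, 0)], 5.0, 6.0),
    Stroke([(100, 5), (110, 5)], 6.1, 7.1),
]
# Hypothetical stand-in for the step-5 condition.
words = split_words(strokes, lambda gap, t_avg, t_gap_avg: gap > 1.7 * t_avg)
print([len(w) for w in words])  # [2, 2]
```

With these weights, the two timing conditions alone reach 5, and the coordinate check then nudges the total up or down by one, which matches the statement that timing dominates the decision.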

After dividing the data into word units, the preprocessor 1020 determines the directionality of each word's stroke data and extracts the word's rotation angle from that directionality.

In an embodiment, the preprocessor 1020 uses the least-squares method to obtain an optimization function expressing the correlation, and derives the rotation angle of the word-unit stroke data from that function.

The least-squares method is the criterion used when one dependent variable is predicted from one or more predictor variables under a linear assumption: the line is chosen so that the sum of the squared distances between the actual values of the dependent variable and the values predicted by the line is minimized. When a functional relationship exists between two variables $x$ and $y$, the least-squares method is the approach commonly used to quantify that relationship.
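For reference, the standard least-squares criterion for the line $y = ax + b$ and its closed-form solution are shown below. This is the textbook form, assumed (not confirmed here) to correspond to Equations 2 to 8 of the original description.

$$
\begin{aligned}
E(a, b) &= \sum_{i=1}^{n} \bigl(y_i - (a x_i + b)\bigr)^2, \\
\frac{\partial E}{\partial a} = 0,\; \frac{\partial E}{\partial b} = 0
\;&\Rightarrow\;
a = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2},
\qquad
b = \frac{\sum y_i - a \sum x_i}{n}.
\end{aligned}
$$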

The method for obtaining the optimization function $y = ax + b$ was described with Equations 2 to 8 and the accompanying description, so it is not repeated here.

The preprocessor 1020 rotates the received handwriting data into horizontal alignment based on the extracted rotation angle.

The preprocessor 1020 removes the angular component of the rotation angle extracted for each word unit. For example, if the extracted rotation angle is 45°, the preprocessor 1020 rotates the word by -45° to remove the angular component. As a result, the received handwriting data are aligned horizontally.

In another embodiment, when one rotation angle dominates across the entire page, the preprocessor 1020 may rotate all coordinates of the stroke data based on that angle.

Thereafter, to improve the accuracy of character recognition, the preprocessor 1020 may normalize the size of the handwriting word by word. Each piece of handwriting is normalized based on the vertical-to-horizontal ratio of its word, so that all handwriting within a word unit shares the same scale factor. In addition, the horizontal spacing and letter spacing of the handwriting data within a word unit may be made uniform, as may the spacing between word units.

In one embodiment, the vertical-to-horizontal ratio of word-unit handwriting that has been rotated into horizontal alignment should not exceed 1.0. If the ratio is greater than 1.0, the word is oriented vertically, so the preprocessor 1020 may additionally rotate the word-unit handwriting by 90° or 270°.

The character recognition unit 1030 performs character recognition on the rotated handwriting data.

The character recognition unit 1030 places the rotated handwriting data, word by word, in a virtual space as a data set for handwriting recognition, following the order in which the words were written, and then recognizes the characters of each word.

If the character recognition unit 1030 fails to recognize characters in the rotated handwriting data, it converts the word-unit handwriting data into graphic data. In an embodiment, the character recognition unit 1030 converts handwriting data recognized as a figure or not recognized as characters into vector graphic data.

The output unit 1040 rearranges the character-recognized handwriting data on a screen or page and outputs it.

The output unit 1040 generates text by applying a predetermined font to the character-recognized handwriting data. For each word unit, it outputs the text at the same size as the normalized size used during recognition, with uniform letter spacing and horizontal spacing within the word. It then repositions the text according to each word unit's rotation angle and originally received coordinates, and outputs it on a new page.

In addition, the output unit 1040 overlays the handwriting data converted into graphic data onto the text at their original coordinates and outputs the result.

In an embodiment, the output unit 1040 may rearrange and output each word unit as a text box. In this case, the recognition result can be provided to external documents by copying and pasting, or saved in a rich-text format that preserves the original layout and order.

The handwriting data character recognition method described above can also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes any type of recording medium that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the handwriting data character recognition method can be readily inferred by programmers skilled in the art to which the present invention pertains.

So far, the present invention has been described with reference to its preferred embodiments. Those of ordinary skill in the art will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the invention is defined by the claims rather than by the foregoing description, and all differences within the scope of their equivalents should be construed as included in the invention.

Character recognition apparatus: 1000
Receiver: 1010 Preprocessor: 1020
Character recognition unit: 1030 Output unit: 1040

Claims

1. A method of recognizing characters in handwriting data, the method comprising:
receiving handwriting data;
extracting a rotation angle of the received handwriting data;
rotating the received handwriting data based on the extracted rotation angle; and
recognizing characters in the rotated handwriting data.

2. The method of claim 1,
wherein extracting the rotation angle of the received handwriting data comprises:
separating the received handwriting data into word units; and
extracting a rotation angle of the handwriting data separated into word units.

3. The method of claim 2,
wherein separating the received handwriting data into word units comprises:
separating the received handwriting data into word units according to weights based on data related to stroke times of the received handwriting data and on coordinates of the received handwriting data.

4. The method of claim 1,
wherein extracting the rotation angle of the received handwriting data comprises:
extracting the rotation angle of the received handwriting data according to an optimization function calculated using a least-squares method.

5. The method of claim 1,
wherein recognizing characters in the rotated handwriting data comprises:
normalizing the size of the rotated handwriting data; and
recognizing characters in the size-normalized handwriting data.

6. The method of claim 1,
wherein recognizing characters in the rotated handwriting data comprises:
arranging the rotated handwriting data in a virtual page space according to the time sequence in which the handwriting data were written; and
recognizing characters in the arranged handwriting data.

7. The method of claim 1, further comprising:
outputting the character-recognized handwriting data.

8. The method of claim 7,
wherein outputting the character-recognized handwriting data comprises:
re-rotating and outputting the character-recognized handwriting data based on the extracted rotation angle.

9. The method of claim 8, further comprising:
extracting, from the received handwriting data, handwriting data whose characters were not recognized;
generating the received handwriting data corresponding to the unrecognized handwriting data as graphic data; and
outputting the generated graphic data superimposed on the output handwriting data.

10. The method of claim 8, further comprising:
extracting, from the received handwriting data, handwriting data recognized as a figure;
generating the handwriting data recognized as the figure as graphic data; and
outputting the generated graphic data superimposed on the output handwriting data.

11. The method of claim 8,
wherein re-rotating and outputting the character-recognized handwriting data based on the extracted rotation angle comprises:
equalizing the size, horizontal spacing, and letter spacing of the character-recognized handwriting data; and
re-rotating and outputting the character-recognized handwriting data having the equalized size, horizontal spacing, and letter spacing based on the extracted rotation angle.

12. A handwriting data character recognition apparatus, comprising:
a receiver configured to receive handwriting data;
a preprocessor configured to extract a rotation angle of the received handwriting data and to rotate the received handwriting data based on the extracted rotation angle; and
a character recognition unit configured to recognize characters in the rotated handwriting data.

13. The apparatus of claim 12,
wherein the preprocessor separates the received handwriting data into word units and extracts a rotation angle of the handwriting data separated into word units.

14. The apparatus of claim 13,
wherein the preprocessor separates the received handwriting data into word units according to weights based on data related to stroke times of the received handwriting data and on coordinates of the received handwriting data.

15. The apparatus of claim 12,
wherein the preprocessor extracts the rotation angle of the received handwriting data according to an optimization function calculated using a least-squares method.

16. The apparatus of claim 12,
wherein the character recognition unit normalizes the size of the rotated handwriting data and recognizes characters in the size-normalized handwriting data.

17. The apparatus of claim 12,
wherein the character recognition unit arranges the rotated handwriting data in a virtual page space according to the time sequence in which the handwriting data were written, and recognizes characters in the arranged handwriting data.

18. The apparatus of claim 12, further comprising:
an output unit configured to output the character-recognized handwriting data.

19. The apparatus of claim 18,
wherein the output unit rotates and outputs the character-recognized handwriting data based on the extracted rotation angle.

20. The apparatus of claim 19,
wherein the preprocessor extracts unrecognized handwriting data from the received handwriting data and generates the received handwriting data corresponding to the unrecognized handwriting data as graphic data; and
the output unit outputs the generated graphic data superimposed on the output handwriting data.

21. The apparatus of claim 19,
wherein the preprocessor extracts handwriting data recognized as a figure from the received handwriting data and generates the handwriting data recognized as the figure as graphic data; and
the output unit outputs the generated graphic data superimposed on the output handwriting data.

22. The apparatus of claim 19,
wherein the output unit equalizes the size, horizontal spacing, and letter spacing of the character-recognized handwriting data, and re-rotates and outputs the character-recognized handwriting data having the equalized size, horizontal spacing, and letter spacing based on the extracted rotation angle.