KR20040051437A

KR20040051437A - Performance improving method of grapheme based hangul recognizer

Info

Publication number: KR20040051437A
Application number: KR1020020079366A
Authority: KR
Inventors: 임길택; 김호연; 장승익; 남윤석
Original assignee: 한국전자통신연구원
Priority date: 2002-12-12
Filing date: 2002-12-12
Publication date: 2004-06-18
Also published as: KR100479349B1

Abstract

PURPOSE: A method for improving the performance of a grapheme-based character recognizer is provided to prevent character misrecognition caused by misrecognition of a vowel. CONSTITUTION: The grapheme combination pattern of a character image included in a document image is classified (502). Whether the character image contains a final consonant is determined from the classified grapheme combination pattern (503). The graphemes included in the character image are classified according to the determination result. The initial consonant, medial vowel, and/or final consonant are then recognized according to the determination result, with recognition of the medial vowel performed on the basis of the recognition confidence of the initial and/or final consonant. The recognition results for the initial consonant, medial vowel, and final consonant are combined (517).

Description

Method for improving the performance of a grapheme-based character recognizer {PERFORMANCE IMPROVING METHOD OF GRAPHEME BASED HANGUL RECOGNIZER}

The present invention relates to a method for improving the performance of a grapheme-based character recognizer, and in particular to a method that prevents character misrecognition caused by misrecognition of a vowel when recognizing Hangul character images in a document image acquired through an image input device such as a scanner or camera.

Typically, because a Hangul character is composed as a two-dimensional combination of 19 initial consonants, 21 medial vowels, and 27 final consonants, the number of recognition targets is very large, at 11,172 characters.
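The figure of 11,172 follows from 19 × 21 × (27 + 1), where the extra 1 accounts for characters that have no final consonant. The short sketch below reproduces that arithmetic. As an assumption that goes beyond the patent text, it also uses the Unicode Hangul Syllables layout (starting at U+AC00) to map a triple of grapheme indices to a precomposed character; the name `compose_syllable` is illustrative only.

```python
# Grapheme inventory sizes stated in the text.
NUM_INITIALS = 19
NUM_MEDIALS = 21
NUM_FINALS = 27  # a character may also have no final consonant

# One initial, one medial, and one of (27 finals + "no final").
NUM_SYLLABLES = NUM_INITIALS * NUM_MEDIALS * (NUM_FINALS + 1)

# Assumption: Unicode precomposed Hangul syllables start at U+AC00.
HANGUL_BASE = 0xAC00


def compose_syllable(initial: int, medial: int, final: int = 0) -> str:
    """Map 0-based grapheme indices to a precomposed Hangul syllable.

    final == 0 means "no final consonant"; 1..27 select a final consonant.
    """
    if not (0 <= initial < NUM_INITIALS
            and 0 <= medial < NUM_MEDIALS
            and 0 <= final <= NUM_FINALS):
        raise ValueError("grapheme index out of range")
    offset = (initial * NUM_MEDIALS + medial) * (NUM_FINALS + 1) + final
    return chr(HANGUL_BASE + offset)


if __name__ == "__main__":
    print(NUM_SYLLABLES)
    print(compose_syllable(0, 0), compose_syllable(18, 20, 27))
```

A grapheme-level recognizer only has to discriminate among 19, 21, and 27 classes separately, which is the motivation for the grapheme-based approach discussed in this document.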

Moreover, even if recognition is restricted to frequently used characters, there are still roughly 1,000 to 2,000 classes, so building a recognition system with a high recognition rate is very difficult. For this reason, grapheme-level recognition rather than character-level recognition is the mainstream approach to Hangul character recognition.

FIG. 1 illustrates a conventional Hangul character recognition process. A character image to be recognized is received after individual character images have been extracted from a document image acquired by an image input device such as a scanner or camera (step 1), and the combination type of the graphemes that make up the input character image is classified (step 2). Next, a partial image is separated for each grapheme using the grapheme position information obtained during combination type classification (step 3). A feature vector is extracted from each separated grapheme image and used to recognize each grapheme independently (step 4), and the recognized graphemes are combined to obtain the final recognition result (step 5).

Because the conventional method recognizes each grapheme independently and then combines the grapheme recognition results, a character recognition error inevitably occurs if even one of the graphemes that make up the character is misrecognized. Hangul graphemes comprise initial consonants, medial vowels, and final consonants; of these, the medial vowel is misrecognized most frequently.

Looking more closely at the frequently misrecognized medial vowel, the lower recognition rate of vowels compared with consonants can be explained by two causes.

First, vowels carry less redundant information than consonants. Even if a consonant suffers some stroke loss or deformation (that is, even if some information is lost), the original consonant can still be inferred relatively easily.

Vowels, however, are distinguished by a single small stroke, owing to the design principle of the Hangul graphemes. For example, ㅣ, ㅓ, and ㅕ share the same vertical stroke and differ only in the number of short horizontal strokes, so even a slight deformation of the grapheme greatly increases the likelihood of misrecognition. This effect is especially pronounced with unusual fonts and with low-resolution, low-quality characters.

Second, when grapheme images are normalized, vowel images are deformed far more than consonant images. Graphemes are generally normalized linearly or nonlinearly before recognition, and a feature vector of fixed dimension is then extracted and used for recognition. In this process vowels are deformed so much more than consonants that the original vowel is likely to be turned into a completely different vowel, which lowers the vowel recognition rate relative to the consonant recognition rate.

Related character recognition techniques are disclosed in "High-recognition-rate printed Hangul recognition through an improved grapheme recognition method," published in the Journal of the Korean Information Science Society in August 1996; "Hangul character recognition method," registered as No. 0285765 on January 5, 2001; and "Hangul recognition method for low-quality characters containing noise," registered as No. 0239354 on October 20, 1999.

To describe the disclosed prior art in detail, "High-recognition-rate printed Hangul recognition through an improved grapheme recognition method" extracts a feature vector from the input character image, classifies the image into one of six Hangul types, and then separates the graphemes using the type information. To recognize a separated grapheme, a feature vector is extracted for it and fed to a grapheme recognition neural network. Each grapheme is recognized independently of the others, and the final character is recognized by combining the recognized graphemes.

Grapheme separation uses the classified Hangul type information, but because it is difficult to isolate exactly the grapheme region, the region is enlarged so that it fully contains the grapheme even if some noise is included. A mesh feature is then extracted from the separated grapheme for recognition.

The mesh feature is obtained by dividing the recognition target region into subregions and is defined as the number of black pixels in each subregion. A grapheme is recognized by feeding the mesh features of the separated grapheme into the corresponding one of 17 grapheme recognition neural networks. After all graphemes in the character have been recognized, they are combined to complete the final character.
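The mesh feature described above can be sketched as follows. This is a minimal illustration, not the cited paper's implementation: it assumes a binary image stored as nested lists in which 1 marks a black pixel, and a uniform grid whose cell boundaries are found by integer division. The function name `mesh_features` and its parameters are hypothetical.

```python
from typing import List, Sequence


def mesh_features(image: Sequence[Sequence[int]], rows: int, cols: int) -> List[int]:
    """Count black pixels (value 1) in each cell of a rows x cols mesh.

    Cells are listed row by row, from top-left to bottom-right.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if rows <= 0 or cols <= 0 or rows > height or cols > width:
        raise ValueError("mesh must fit inside the image")

    features = []
    for r in range(rows):
        # Integer boundaries keep every pixel in exactly one cell,
        # even when the image size is not a multiple of the grid size.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            count = sum(
                image[y][x]
                for y in range(top, bottom)
                for x in range(left, right)
            )
            features.append(count)
    return features
```

The resulting list has a fixed length of rows × cols regardless of image size, which is what allows it to serve as the input vector of a recognizer.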

Next, the "Hangul character recognition method" decides, for each input character, whether to recognize it at the grapheme level or at the character level. If grapheme-level recognition is chosen, the graphemes of the input character are separated and recognized, and the recognized graphemes are combined to output a character. If character-level recognition is chosen, the input character is recognized as a whole and the recognized character is output. The output character is then stored.

The choice between the grapheme-level and character-level methods is made as follows. Horizontal vowels, vertical vowels, and fragmented strokes are extracted, and the vowel is recognized from the horizontal and vertical vowels. The recognized vowel is then compared with the vowel position, size, and count information of the fragmented strokes. If they agree, the vowel is judged to have been recognized correctly and grapheme-level recognition is performed; if they do not agree, vowel recognition is judged to be difficult and character-level recognition is performed.

The horizontal vowel, vertical vowel, and fragmented stroke information used to decide the recognition method is obtained as follows. A horizontal vowel is extracted by locating the region where a horizontal vowel may exist, extracting the horizontal line within that region, and determining whether upward or downward protrusions are present. A vertical vowel is extracted in a similar way: the region where a vertical vowel may exist is located, the vertical line is extracted, and the left and right protrusions are identified. Fragmented strokes are extracted by detecting the inner and outer contours with a contour tracing algorithm and deriving the fragmented stroke information from them.

Next, the "Hangul recognition method for low-quality characters containing noise" comprises the steps of: separating graphemes from the input character image; extracting meshes from each separated grapheme using the stroke thickness and variation obtained by comparing the strokes that make up the grapheme; extracting stroke variation features within the meshes; recognizing the grapheme by comparing the similarity between the stroke variation feature vector and reference feature vectors; and combining the recognized graphemes.

To extract the meshes from a separated grapheme, the vertical and horizontal strokes of the grapheme are extracted and their positions and sizes are compared with those of neighboring strokes; this is repeated until the stroke thickness and variation reach at least a certain value.

The stroke variation within each mesh is used as the feature vector and compared with the reference feature vectors for similarity. In this process, weights are assigned to the regions being compared so that the similarity used as the recognition criterion is insensitive to changes in stroke position.

The technique thus extracts stroke variation features from meshes with nonlinear spacing that are formed automatically according to stroke variation, computes the distances to neighboring regions, determines similarity from those distances, and then recognizes and combines the graphemes.

As the above review of the prior art shows, each grapheme is still recognized independently and the grapheme recognition results are then combined, so the problem remains that a character recognition error inevitably occurs if even one of the graphemes that make up the character is misrecognized.

Accordingly, the present invention has been devised to solve the above problem. Its object is to provide a method for improving the performance of a grapheme-based character recognizer that prevents character misrecognition caused by vowel misrecognition when recognizing Hangul character images in a document image acquired through an image input device such as a scanner or camera.

To achieve the above object, the method for improving the performance of a grapheme-based character recognizer according to the present invention comprises: a step of classifying the grapheme combination type of a character image included in a document image; a final consonant determination step of determining, based on the classified grapheme combination type, whether the character image contains a final consonant; a grapheme classification step of classifying the graphemes included in the character image based on the result of the final consonant determination step; a grapheme recognition step of recognizing, based on the result of the final consonant determination step, the initial consonant, medial vowel, and/or final consonant classified in the grapheme classification step, wherein recognition of the medial vowel is performed based on the recognition confidence of the initial consonant and/or final consonant; and a grapheme combination step of combining the recognition results of the initial consonant, medial vowel, and/or final consonant recognized in the grapheme recognition step.

To achieve the above object, the present invention is further characterized by a recording medium on which a program implementing the method for improving the performance of a grapheme-based character recognizer is recorded.

FIG. 1 illustrates a conventional Hangul character recognition process;

FIG. 2 is a block diagram of a configuration for carrying out the method for improving the performance of a grapheme-based character recognizer according to the present invention;

FIG. 3 shows the recognition unit of FIG. 2 in detail;

FIG. 4 shows the concatenation of two vectors according to an embodiment of the present invention; and

FIG. 5 is a detailed flowchart of the method for improving the performance of a grapheme-based character recognizer according to the present invention.

<Description of the reference numerals for the main parts of the drawings>

10: character image input unit 20: grapheme combination type classification unit

30: final consonant determination unit 40: image separation unit

50: recognition unit 60: combining unit

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 2 is a block diagram of a configuration for carrying out the method for improving the performance of a grapheme-based character recognizer according to the present invention. It comprises a character image input unit (10), a grapheme combination type classification unit (20), a final consonant determination unit (30), an image separation unit (40), a recognition unit (50), and a combining unit (60).

The character image input unit (10) inputs a Hangul character image, taken from a document image acquired by an image input device such as a scanner (S1) or a camera (S2), to the grapheme combination type classification unit (20).

The grapheme combination type classification unit (20) classifies the combination type of the graphemes that make up the Hangul character image received from the character image input unit (10) and provides the result to the final consonant determination unit (30).

The final consonant determination unit (30) determines, from the classification provided by the grapheme combination type classification unit (20), whether a final consonant is present and provides the result to the image separation unit (40).

If the final consonant determination unit (30) determines that there is no final consonant, the image separation unit (40) separates the images of the initial consonant and the medial vowel. If it determines that there is a final consonant, the image separation unit (40) separates the images of the initial consonant, the medial vowel, and the final consonant. The separated images are provided to the recognition unit (50).

As shown in FIG. 3, the recognition unit (50) includes an initial consonant recognition unit (51), a final consonant recognition unit (52), and a medial vowel recognition unit (53), which respectively recognize the initial consonant, final consonant, and medial vowel separated by the image separation unit (40).

The initial consonant recognition unit (51) computes a feature vector of the initial consonant, uses it to recognize the initial consonant, computes the recognition confidence of each initial consonant, and selects the initial consonant with the highest confidence as the recognition result, which it provides to the combining unit (60).

That is, the recognition confidence of each initial consonant is denoted x_i, and the vector having these as its components is the initial consonant confidence vector X. Here i is the index of an individual initial consonant: i = 1 denotes "ㄱ", i = 2 denotes "ㄲ", and the last index, i = 19, denotes "ㅎ". This is expressed by Equation 1.

$$X = (x_1, x_2, \ldots, x_{N_x}) \tag{1}$$

Here, N_x is the number of initial consonants.
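Selecting the recognition result from the confidence vector X amounts to taking the index of its largest component. The sketch below illustrates this under two assumptions: the initial consonants are listed in standard dictionary order (consistent with the indices "ㄱ", "ㄲ", and "ㅎ" given in the text), and the code uses 0-based indices where the text uses 1-based ones. The names `INITIAL_CONSONANTS` and `select_best` are hypothetical.

```python
from typing import Sequence, Tuple

# Assumption: the 19 initial consonants in standard dictionary order.
INITIAL_CONSONANTS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def select_best(confidences: Sequence[float], labels: Sequence[str]) -> Tuple[str, float]:
    """Return the label with the highest confidence, together with that confidence."""
    if not labels or len(confidences) != len(labels):
        raise ValueError("need one confidence per label")
    best = max(range(len(labels)), key=lambda i: confidences[i])
    return labels[best], confidences[best]


if __name__ == "__main__":
    x = [0.01] * len(INITIAL_CONSONANTS)
    x[1] = 0.9  # the recognizer is most confident about the second class
    print(select_best(x, INITIAL_CONSONANTS))
```

Note that the full vector X, not just the winning label, is what the medial vowel recognizer later consumes.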

The final consonant recognition unit (52) computes a feature vector of the final consonant, uses it to recognize the final consonant, computes the recognition confidence of each final consonant, and selects the final consonant with the highest confidence as the recognition result, which it provides to the combining unit (60).

That is, the recognition confidence of each final consonant is denoted z_i, and the vector having these as its components is the final consonant confidence vector Z. As with the initial consonants, i is the index of an individual final consonant. This is expressed by Equation 2.

$$Z = (z_1, z_2, \ldots, z_{N_z}) \tag{2}$$

Here, N_z is the number of final consonants.

The medial vowel recognition unit (53) computes a feature vector of the medial vowel and recognizes the vowel differently depending on whether a final consonant is present. When there is no final consonant, the initial consonant confidence vector and the medial vowel feature vector are concatenated as shown in FIG. 4. The medial vowel is recognized using the concatenated vector, the recognition confidence of each medial vowel is computed, and the medial vowel with the highest confidence is selected and provided to the combining unit (60).

Here, the medial vowel feature vector is denoted F, and f_i is its i-th feature, a positive value greater than zero.

The initial consonant recognition result is used as a feature in recognizing the medial vowel, and the recognition confidence vector G is generated by Equation 3, where G = (g_1, g_2, \ldots, g_{N_x}).

$$g_i = \frac{x_i}{\sum_{k=1}^{N_x} x_k} \cdot \max_j f_j \tag{3}$$

That is, the recognition confidences x_i of the initial consonants are normalized so that they sum to 1 (each is divided by the sum of all confidence values) and then multiplied by the largest component of the medial vowel feature vector (max f), so that the effective range of the initial consonant confidences matches the effective range of the medial vowel features.

In this way the initial consonant confidences and the medial vowel features are brought to a common scale, yielding a new feature vector that is biased toward neither the confidence values nor the feature values. The rescaled initial consonant confidence g_i is a component of the confidence vector G. The vectors G and F are then concatenated, the recognition confidence of each medial vowel is computed, and the medial vowel with the highest confidence is selected.
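The rescaling and concatenation just described can be sketched as below. This is a minimal illustration based on the prose description (normalize the confidences to sum to 1, then multiply by the largest vowel feature). The function names are hypothetical, and placing G before F in the concatenated vector is an assumption, since FIG. 4 is not reproduced here.

```python
from typing import List, Sequence


def scale_confidences(confidences: Sequence[float], features: Sequence[float]) -> List[float]:
    """Rescale consonant confidences to the range of the vowel features.

    Each confidence is divided by the sum of all confidences and then
    multiplied by the largest vowel feature value.
    """
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must have a positive sum")
    peak = max(features)
    return [c / total * peak for c in confidences]


def concatenate(g: Sequence[float], f: Sequence[float]) -> List[float]:
    """Join the rescaled confidence vector G and the feature vector F."""
    return list(g) + list(f)


if __name__ == "__main__":
    x = [1.0, 3.0]   # toy initial consonant confidences
    f = [0.5, 2.0]   # toy medial vowel features
    g = scale_confidences(x, f)
    print(g, concatenate(g, f))
```

After rescaling, the components of G sum to the largest vowel feature, so neither part of the concatenated vector dominates merely because of its numeric range.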

Next, when there is a final consonant, the initial consonant confidence vector, the final consonant confidence vector, and the medial vowel feature vector are concatenated as shown in FIG. 4. The medial vowel is recognized using the concatenated vector, the recognition confidence of each medial vowel is computed, and the medial vowel with the highest confidence is selected and provided to the combining unit (60).

In other words, the initial and final consonants, whose recognition rates are relatively high, are recognized first, and this information is used in recognizing the medial vowel. The input to medial vowel recognition is the concatenation of the initial consonant confidence vector G, the final consonant confidence vector H (computed in the same way as G), and the medial vowel feature vector F. From this input the recognition confidence of each medial vowel is computed, and the medial vowel with the highest confidence is selected.

The components of the final consonant confidence vector H are given by Equation 4, where H = (h_1, h_2, \ldots, h_{N_z}).

$$h_i = \frac{z_i}{\sum_{k=1}^{N_z} z_k} \cdot \max_j f_j \tag{4}$$

Here, z_i is the recognition confidence of the i-th final consonant, N_z is the number of final consonants, f_j ≥ 0, \max_j f_j is the largest component of the medial vowel feature vector, and h_i is the rescaled confidence, that is, the i-th component of the final consonant confidence vector H.
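Putting the two cases together, the input to the medial vowel recognizer can be assembled as in the following sketch. It assumes the same rescaling for H as for G, as the text states, and assumes the concatenation order G, H, F, which FIG. 4 would confirm or correct. The names `build_vowel_input` and `_scale` are hypothetical.

```python
from typing import List, Optional, Sequence


def _scale(confidences: Sequence[float], features: Sequence[float]) -> List[float]:
    """Normalize confidences to sum to 1, then multiply by the largest feature."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must have a positive sum")
    peak = max(features)
    return [c / total * peak for c in confidences]


def build_vowel_input(
    x: Sequence[float],
    f: Sequence[float],
    z: Optional[Sequence[float]] = None,
) -> List[float]:
    """Assemble the medial vowel recognizer input.

    x: initial consonant confidences, f: medial vowel features,
    z: final consonant confidences, or None when there is no final consonant.
    """
    g = _scale(x, f)
    h = _scale(z, f) if z is not None else []
    return g + h + list(f)


if __name__ == "__main__":
    print(build_vowel_input([1.0, 1.0], [4.0, 1.0]))
    print(build_vowel_input([1.0, 1.0], [4.0, 1.0], z=[2.0, 6.0]))
```

Because the input length differs between the two cases (N_x plus the feature dimension versus N_x plus N_z plus the feature dimension), a practical system would presumably need a separate vowel classifier for each case; the text does not spell this out.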

The combining unit (60) combines the graphemes recognized individually by the recognition unit (50) to produce the final character recognition result.

The method for improving the performance of a grapheme-based character recognizer according to the present invention, having the configuration described above, will now be described in detail with reference to the flowchart of FIG. 5.

First, the character image input unit (10), having obtained a document image from an image input device (for example, a scanner (S1) or a camera (S2)), inputs the Hangul character image present in the document image to the grapheme combination type classification unit (20) (step 501).

The grapheme combination type classification unit (20) classifies the combination type of the graphemes that make up the Hangul character image received from the character image input unit (10) and provides the result to the final consonant determination unit (30) (step 502).

The final consonant determination unit (30) determines, from the classification provided by the grapheme combination type classification unit (20), whether a final consonant is present (step 503).

If it is determined in step 503 that there is no final consonant, the image separation unit (40) separates the images of the initial consonant and the medial vowel, provides the initial consonant to the initial consonant recognition unit (51) in the recognition unit (50), and provides the medial vowel to the medial vowel recognition unit (53) in the recognition unit (50) (step 504).

If, on the other hand, it is determined in step 503 that there is a final consonant, the image separation unit (40) separates the images of the initial consonant, the medial vowel, and the final consonant, provides the initial consonant to the initial consonant recognition unit (51), the medial vowel to the medial vowel recognition unit (53), and the final consonant to the final consonant recognition unit (52) (step 505).

The initial consonant recognition unit (51) computes a feature vector of the initial consonant provided by the image separation unit (40), uses it to recognize the initial consonant, computes the recognition confidence of each initial consonant, and selects the initial consonant with the highest confidence as the recognition result, which it provides to the combining unit (60). That is, the recognition confidence of each initial consonant is denoted x_i, and the vector having these as its components is the initial consonant confidence vector X. Here i is the index of an individual initial consonant: i = 1 denotes "ㄱ", i = 2 denotes "ㄲ", and the last index, i = 19, denotes "ㅎ", as in Equation 1 above (step 506).

The final consonant recognition unit (52) computes a feature vector of the final consonant provided by the image separation unit (40), uses it to recognize the final consonant, computes the recognition confidence of each final consonant, and selects the final consonant with the highest confidence as the recognition result, which it provides to the combining unit (60). That is, the recognition confidence of each final consonant is denoted z_i, and the vector having these as its components is the final consonant confidence vector Z. As with the initial consonants, i is the index of an individual final consonant, as in Equation 2 above (step 507).

The medial vowel recognition unit (53) computes a feature vector of the medial vowel provided by the image separation unit (40) and, because recognition must proceed differently depending on whether a final consonant is present, checks for the presence of a final consonant (step 508).

If the check in step 508 finds no final consonant, the initial consonant confidence vector and the medial vowel feature vector are concatenated as shown in FIG. 4 (step 509). The medial vowel is recognized using the concatenated vector (step 510), the recognition confidence of each medial vowel is computed (step 511), and the medial vowel with the highest confidence is selected and provided to the combining unit (60) (step 512).

Here, the medial vowel feature vector is denoted F, and f_i is the i-th feature of the medial vowel, a positive value greater than zero. The initial consonant recognition result is used as a feature in recognizing the medial vowel, and the recognition confidence vector G is generated by Equation 3 above. That is, the recognition confidences x_i of the initial consonants are normalized so that they sum to 1 (each is divided by the sum of all confidence values) and then multiplied by the largest component of the medial vowel feature vector (max f), so that the effective range of the initial consonant confidences matches the effective range of the medial vowel features.

In this way the initial consonant reliabilities and the medial vowel feature values are brought to a common scale, yielding a new feature vector that is biased toward neither the reliability values nor the feature values. Each rescaled initial consonant reliability g_i is a component of the reliability vector G. The vectors G and F are then concatenated, the recognition reliability of each medial vowel is computed from the concatenated vector, and the medial vowel with the highest reliability is selected.
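The scaling and concatenation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `scale_reliabilities` and `concatenate` and all numeric values are made up for the example. Only the arithmetic (divide each reliability by the sum, multiply by max f, then concatenate G and F) comes from the text.

```python
from typing import List, Sequence


def scale_reliabilities(x: Sequence[float], f: Sequence[float]) -> List[float]:
    """Return G: x normalised to sum to 1, then multiplied by max(f).

    x: recognition reliabilities of the initial consonant candidates.
    f: medial vowel feature values (all positive, per the description).
    """
    total = sum(x)
    if total <= 0:
        raise ValueError("reliabilities must have a positive sum")
    peak = max(f)
    return [x_i / total * peak for x_i in x]


def concatenate(*vectors: Sequence[float]) -> List[float]:
    """Join vectors end to end into one input vector."""
    joined: List[float] = []
    for v in vectors:
        joined.extend(v)
    return joined


# Illustrative values only (hypothetical, not taken from the patent).
x = [2.0, 1.0, 1.0]          # initial consonant reliabilities
f = [0.5, 8.0, 4.0, 2.0]     # medial vowel feature values

g = scale_reliabilities(x, f)    # components now lie in (0, max f]
vowel_input = concatenate(g, f)  # input to medial vowel recognition
```

After scaling, no component of G can exceed max f, which is what keeps the reliability part of the concatenated vector from dominating the feature part, or being dominated by it.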

If the check in step 508 finds a final consonant, the initial consonant reliability vector, the final consonant reliability vector, and the medial vowel feature vector are concatenated as shown in FIG. 4 (step 513). The medial vowel is recognized using the concatenated vector (step 514), the recognition reliability of each medial vowel is computed (step 515), and the medial vowel with the highest reliability is selected and passed to the combining unit 60 (step 516).

That is, the initial and final consonants, which have relatively high recognition rates, are recognized first, and this information is used in recognizing the medial vowel. The input to medial vowel recognition is the concatenation of the initial consonant reliability vector G, the final consonant reliability vector H (computed in the same way as G), and the medial vowel feature vector F. From this input the recognition reliability of each medial vowel is computed, and the medial vowel with the highest reliability is selected. The components of the final consonant recognition reliability vector H are given by Equation 4 above.
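The branch on final consonant presence (steps 508 to 516) might be sketched as below. Everything named here is hypothetical: `build_vowel_input`, `recognise_vowel`, and in particular `scorer`, which stands in for the medial vowel classifier that the text does not specify. The G, H, F ordering follows the order in which the text lists the vectors; the actual layout is the one defined in FIG. 4, which is not reproduced here.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def scale(reliabilities: Sequence[float], f: Sequence[float]) -> List[float]:
    """Normalise reliabilities to sum to 1, then multiply by max(f)."""
    total = sum(reliabilities)
    if total <= 0:
        raise ValueError("reliabilities must have a positive sum")
    peak = max(f)
    return [r / total * peak for r in reliabilities]


def build_vowel_input(x: Sequence[float], f: Sequence[float],
                      z: Optional[Sequence[float]] = None) -> List[float]:
    """Build the medial vowel recogniser input.

    Without a final consonant (z is None): G followed by F.
    With a final consonant: G, then H, then F (ordering is an assumption).
    """
    g = scale(x, f)
    if z is None:
        return g + list(f)
    h = scale(z, f)  # H is computed the same way as G
    return g + h + list(f)


def recognise_vowel(x: Sequence[float], f: Sequence[float],
                    z: Optional[Sequence[float]],
                    scorer: Callable[[List[float]], List[float]]) -> Tuple[int, float]:
    """Score every medial vowel candidate and return (best index, reliability)."""
    scores = scorer(build_vowel_input(x, f, z))
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Toy stand-in for a trained classifier (hypothetical): two vowel classes.
def toy_scorer(v: List[float]) -> List[float]:
    return [v[0], v[-1]]


x = [3.0, 1.0]        # initial consonant reliabilities (illustrative)
z = [1.0, 1.0, 2.0]   # final consonant reliabilities (illustrative)
f = [2.0, 4.0]        # medial vowel feature values (illustrative)

open_syllable_input = build_vowel_input(x, f)       # no final consonant
closed_syllable_input = build_vowel_input(x, f, z)  # with final consonant
best_index, best_score = recognise_vowel(x, f, z, toy_scorer)
```

Note that the two branches produce inputs of different lengths, so in practice each branch would need a recogniser built for its own input size.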

The combining unit 60 combines the graphemes recognized individually by the recognition unit 50 to perform the final character recognition (step 517).
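One way to picture the combining step is to map the three recognized grapheme indices to a precomposed Hangul syllable. The sketch below assumes that the output is a Unicode syllable and that the indices follow Unicode's jamo ordering; the text states neither, and `compose_syllable` is a hypothetical name. The counts match the 19 initial consonants, 21 medial vowels, and 27 final consonants cited earlier in the description, with index 0 reserved for "no final consonant".

```python
NUM_INITIALS = 19   # initial consonants
NUM_MEDIALS = 21    # medial vowels
NUM_FINALS = 28     # 27 final consonants plus index 0 for "no final consonant"
HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable


def compose_syllable(initial: int, medial: int, final: int = 0) -> str:
    """Combine grapheme indices into one precomposed Hangul syllable."""
    if not 0 <= initial < NUM_INITIALS:
        raise ValueError("initial consonant index out of range")
    if not 0 <= medial < NUM_MEDIALS:
        raise ValueError("medial vowel index out of range")
    if not 0 <= final < NUM_FINALS:
        raise ValueError("final consonant index out of range")
    offset = (initial * NUM_MEDIALS + medial) * NUM_FINALS + final
    return chr(HANGUL_BASE + offset)


first = compose_syllable(0, 0)        # no final consonant
closed = compose_syllable(18, 0, 4)   # with a final consonant
```

The index space has 19 × 21 × 28 = 11,172 entries, which is the number of recognition targets the description gives for whole-character recognition.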

As described above, in recognizing Hangul character images present in a document image acquired through an image input device such as a scanner or a camera, the present invention prevents characters from being misrecognized because of a misrecognized vowel. It does so by increasing the redundancy of the information used for medial vowel recognition, which raises the medial vowel recognition rate and consequently the character recognition rate. Specifically, the relatively hard-to-recognize medial vowel is recognized with the help of the initial and final consonant recognition information, which raises the vowel recognition rate. In addition, the recognition reliability values of the initial and final consonants are first normalized to sum to 1 and then multiplied by the maximum medial vowel feature value, so that consonant reliability values and vowel feature values, which otherwise lie in different effective ranges, are placed on a common scale. As a result, medial vowel recognition is biased toward neither the consonant reliability values nor the vowel feature values, and a stable recognition rate is obtained.

Claims

A method for improving the performance of a grapheme-based character recognizer, comprising:

a grapheme combination type classification step of classifying the grapheme combination type of a character image included in a document image;

a final consonant determination step of determining, based on the classified grapheme combination type, whether the character image contains a final consonant;

a grapheme classification step of classifying the graphemes included in the character image based on the result of the final consonant determination step;

a grapheme recognition step of recognizing the initial consonant, the medial vowel, and/or the final consonant classified in the grapheme classification step based on the result of the final consonant determination, wherein the recognition of the medial vowel is executed based on the recognition reliability of the initial consonant and/or the final consonant; and

a grapheme combination step of combining the recognition results of the initial consonant, the medial vowel, and/or the final consonant recognized in the grapheme recognition step.

The method of claim 1,

wherein, in the grapheme recognition step,

if the final consonant determination finds that the character image contains no final consonant, the recognition of the medial vowel is executed based on the recognition reliability of the initial consonant, and

if the character image contains a final consonant, the recognition of the medial vowel is executed based on the recognition reliabilities of the initial consonant and the final consonant.

The method of claim 2,

wherein, in the grapheme recognition step,

the recognition reliabilities of the initial consonant and the final consonant are represented by vectors X and Z, respectively, expressed by the following equations:

Equation 1

$$X = (x_1, x_2, \ldots, x_{N_x})$$

Equation 2

$$Z = (z_1, z_2, \ldots, z_{N_z})$$

where N_x and N_z are the numbers of recognizable initial consonants and final consonants, respectively, and x_i and z_j are the recognition reliabilities of the i-th initial consonant and the j-th final consonant, respectively.

The method of claim 3,

wherein, in the grapheme recognition step,

depending on the result of the final consonant determination, the recognition of the medial vowel is executed based on the recognition reliability vector G of the initial consonant expressed by the following equation:

Equation 3

$$G = (g_1, g_2, \ldots, g_{N_x}), \qquad g_i = \frac{x_i}{\sum_{k=1}^{N_x} x_k}\,\max_{j} f_j$$

where f_j is the recognition reliability of the j-th medial vowel among the recognizable medial vowels.

The method of claim 4,

wherein the recognition of the medial vowel is executed based on the vector obtained by connecting the initial consonant reliability vector G and the medial vowel reliability vector F with a concatenation operator, giving $(g_1, \ldots, g_{N_x}, f_1, \ldots, f_{N_y})$,

where N_y is the number of recognizable medial vowels.

The method of claim 4,

wherein, in the grapheme recognition step,

if the character image contains a final consonant, the recognition of the medial vowel is executed based on the recognition reliability vector G of the initial consonant and the recognition reliability vector H of the final consonant expressed by Equation 4 below:

Equation 4

$$H = (h_1, h_2, \ldots, h_{N_z}), \qquad h_j = \frac{z_j}{\sum_{k=1}^{N_z} z_k}\,\max_{m} f_m$$

The method of claim 6,

wherein the recognition of the medial vowel is executed based on a vector in which the initial consonant reliability vector G, the medial vowel reliability vector F, and the final consonant reliability vector H are connected.

A recording medium on which a program that implements the method of any one of claims 1 to 7 is recorded.