KR100258923B1

KR100258923B1 - Hangeul and english name recognition and error correcting method

Info

Publication number: KR100258923B1
Application number: KR1019940014742A
Authority: KR
Inventors: 김준호; 도정인; 김수형; 이상규
Original assignee: 윤종용; 삼성전자주식회사
Priority date: 1994-06-25
Filing date: 1994-06-25
Publication date: 2000-06-15
Also published as: KR960002067A

Abstract

PURPOSE: A method for recognizing Korean (Hangul) and English names and correcting misrecognition is provided which, when a Korean name and an English name of the same person appear on a document, cross-references the results output by a Korean recognizer and an English recognizer and automatically corrects a misrecognized result. CONSTITUTION: An image that includes the Korean and English names of the same person is input (S21). The image corresponding to the Korean name is extracted from the input image, preprocessed, and supplied to a Korean recognizer (S22), which recognizes the Korean name (S23). The image corresponding to the English name is extracted, preprocessed, and supplied to an English recognizer (S24), which recognizes the English name (S25). The English notations corresponding to the candidate characters from the Korean name recognizer are looked up in a conversion table, and the misrecognition is thereby corrected (S27). The corrected Korean and English names are output (S28, S29).

Description

Korean and English name recognition and misrecognition correction method

FIG. 1 is a flowchart illustrating a conventional method of recognizing a Korean name and an English name.

FIG. 2 is a flowchart illustrating a method of recognizing a Korean name and an English name according to the present invention.

FIG. 3 is a flowchart illustrating a method of correcting misrecognition of a Korean name and an English name according to the present invention.

The present invention relates to a character recognition method, and more particularly, to a method of recognizing the Korean name and the English name of the same person that are input together, and of correcting misrecognition of those names.

As the use of computers has become commonplace, an easy interface with humans is required. Representative means for providing such an interface are input devices and output devices. Of these, the means by which a person directly expresses his or her intention is the input device.

However, most existing computers use a keyboard as the input device, which limits their ability to handle a rapidly increasing amount of information. There is therefore a demand for an environment in which a computer can process information quickly by having functions similar to the human senses, that is, by directly recognizing characters or speech.

Character recognition is a field of pattern recognition. It is carried out through a preprocessing step that performs extraction, normalization, skeletonization, and the like of a character image from an image input through an image input device, a recognition step that recognizes the characters, and a postprocessing step that corrects misrecognized characters using contextual knowledge.

Depending on how the contextual knowledge is represented, the postprocessing step may use a bottom-up method based on a probabilistic representation of the knowledge, a top-down method based on a structural representation of the contextual knowledge, or a hybrid method that combines the two. A representative bottom-up algorithm is the Viterbi algorithm (VA). Representative top-down algorithms are the dictionary look-up algorithm (DLA) and the binary N-gram algorithm (BNA). Known hybrid methods include the dictionary Viterbi algorithm and the predictor-corrector algorithm (PCA).

Meanwhile, most automatic input methods that apply optical character recognition (OCR) technology usually have to recognize more than one language. However, because the number of characters to be recognized and the recognition method differ from language to language, existing automatic input methods are implemented with an independent recognizer for each language.

FIG. 1 is a flowchart illustrating a conventional method of recognizing a Korean name and an English name. In step S11, a document or slip on which the Korean name and the English name of the same person are recorded is input through an image input device such as a scanner, as an image containing both names. In step S12, the Korean name image is extracted from the input image and preprocessed for the Korean recognizer. In step S13, the Korean name is recognized from the preprocessed Korean name image, and in step S14 the recognized Korean name is output.

For the English name, the English name image is extracted from the input image and preprocessed in step S15, the English name is recognized from the preprocessed English name image using the English recognizer in step S16, and the recognized English name is output in step S17.

In general, however, character recognition performance is higher for a language with a small number of characters than for a language with a large number of characters. Recognition performance for English letters may therefore be higher than for Korean characters. If the recognizers used in FIG. 1 perform better on English letters than on Korean characters, as described above, the recognition result for the Korean name may have a higher misrecognition rate than the recognition result for the English name. Conversely, the recognizers used in FIG. 1 may perform better on Korean characters than on English letters, in which case the misrecognition rate of the English recognizer may be higher than that of the Korean recognizer.

To reduce the misrecognition rate that can arise in either situation from using the recognizers independently, the conventional approach has been for a person to correct the results manually by referring to the output of the recognizer for each language.

Accordingly, an object of the present invention is to provide a misrecognition correction method that, when the Korean name and the English name of the same person are present on a document or slip, cross-references the results output by the Korean recognizer and the English recognizer and automatically corrects a misrecognized result.

Another object of the present invention is to provide a name recognition method that uses the above misrecognition correction method.

To achieve the above objects, the misrecognition correction method according to the present invention is a method in which an image of a name written in a first character and a second character is input and misrecognition of the name written in the first character is corrected by referring to the recognition result of the name written in the second character, and preferably comprises: a first step of generating notations in the second character from the candidate characters of the name recognition result of the first character; a second step of deleting, from among the generated second-character notations, those that fall outside the recognition length of the name recognizer of the second character; a third step of removing, from among the generated second-character notations, those having a character that is not in the name recognition candidate set of the second character at the same position; a fourth step of sorting the second-character notations that have passed the third step according to recognition reliability, only when the performance of the name recognizer of the second character is higher than that of the name recognizer of the first character; a fifth step of converting the second-character notations that have passed the fourth step back into the first character; a sixth step of removing any converted first-character notation that is not in the name recognition candidate character set of the first character; a seventh step of sorting according to the recognition reliability of the first character, only when the performance of the name recognizer of the second character is lower than that of the name recognizer of the first character; and a step of selecting, from among the first characters that have passed the seventh step, the candidate character having the highest reliability and the corresponding candidate of the second character as the first-ranked candidates of the first character and the second character, respectively.

The present invention will now be described in detail with reference to the accompanying drawings.

First, the present invention takes as its input an image that contains both the Korean name and the English name of the same person. At least one Korean recognizer and at least one English recognizer are required, each able to output more than one recognition result per character together with the corresponding recognition reliability. Also required are a conversion table between the jamo (letters) of Korean names and their corresponding English notations, and a rule for verifying whether a recognized English name is a legitimate English notation of a Korean name.

In addition, correction by cross-referencing the recognition result of the Korean name and that of the English name is performed not on the whole name as a unit, but on the English letters that correspond to one character of the Korean name. Name postprocessing therefore requires a table showing which English letters correspond to the jamo used in Korean names. Here, name postprocessing is the process of correcting misrecognized characters by cross-referencing the recognition result of the Korean name and that of the English name when a Korean name field and an English name field are both present on a slip.

FIG. 2 is a flowchart illustrating a method of recognizing a Korean name and an English name according to the present invention. In step S21, as in step S11 of FIG. 1, the characters recorded on a slip or document are read with an image input device such as a scanner, and the resulting image, which contains both the Korean name and the English name of the same person, is input.

In step S22, the images corresponding to the Korean name are extracted from the input document image, preprocessed through a process such as normalization, and supplied as the input image of a Korean recognizer (not shown). In step S23, the Korean recognizer (not shown) performs recognition on the input image in the conventional manner and outputs one or more "candidate characters" (ten in the embodiment of the present invention), each with its "recognition reliability".

The recognition reliability is output in a form that depends on the recognition method adopted by the Korean recognizer. For example, when the Korean recognizer adopts recognition by minimum-distance pattern classification, one of the statistical recognition methods, the Euclidean distance between the input character image and a candidate character is output as the reliability of that candidate character. In this case, the smaller the Euclidean distance, the more reliable the candidate character. When the Korean recognizer adopts a probability-based recognition method, the probability that the input image belongs to a candidate character is output as the reliability of that candidate character. In this case, the closer the probability is to 1, the more reliable the candidate character.

Meanwhile, in step S24, the image corresponding to the English name is extracted from the input document image, preprocessed through normalization and the like, and supplied as the input image of the English recognizer. In step S25, the English recognizer outputs one or more "candidate characters" for the input image, each with its "recognition reliability". As described for step S23, the recognition reliability is output in a form determined by the recognition method adopted by the English recognizer.
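The two reliability conventions described above can be sketched as follows. This is a minimal illustration, not the recognizers of the patent: the feature vectors, templates, probabilities, and the function names `rank_by_distance` and `rank_by_probability` are all assumptions made for the example.

```python
import math


def rank_by_distance(feature, templates):
    """Minimum-distance classification: a smaller Euclidean distance
    means a more reliable candidate, so sort in ascending order."""
    scored = [(label, math.dist(feature, vec)) for label, vec in templates.items()]
    return sorted(scored, key=lambda pair: pair[1])


def rank_by_probability(probabilities):
    """Probability-based recognition: a value closer to 1 means a more
    reliable candidate, so sort in descending order."""
    return sorted(probabilities.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical 2-D feature templates for three Hangul characters.
templates = {"원": (0.0, 1.0), "권": (1.0, 1.0), "현": (3.0, 4.0)}
by_distance = rank_by_distance((0.2, 1.0), templates)

# Hypothetical class probabilities for one English letter image.
by_probability = rank_by_probability({"5": 0.2, "S": 0.7, "8": 0.1})
```

Because the two conventions sort in opposite directions, any later step that compares or sums reliabilities has to know which convention a given recognizer uses.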

In step S27, if the first-ranked candidate among the candidate characters supplied by the Korean name recognizer (not shown) and the first-ranked candidate among the candidate characters supplied by the English name recognizer do not match, the English notations corresponding to the candidate characters of the Korean name recognizer are looked up in the conversion table 50 and the misrecognition is corrected.

In step S28, the corrected Korean name is output in the conventional manner, and in step S29, the corrected English name is output in the conventional manner.

FIG. 3 is an operational flowchart of the method of correcting misrecognition of a Korean name and an English name according to the present invention.

First, in step S31, the "Korean name candidate characters" recognized by the Korean recognizer and their "recognition reliabilities" are received as input for misrecognition correction, and the "English name candidate characters" recognized by the English recognizer and their "recognition reliabilities" are likewise received as input for misrecognition correction.

Then, in step S33, it is verified whether the input English name candidate is a legitimate English notation of a Korean name. That is, the conversion table 50 is consulted to verify whether the letter string of the English name candidate represents one character of a legitimate Korean name.

For example, suppose the conversion table 50 contains a verification table such as Table 1. For an English name candidate whose letter string is {S, I, K}, the initial-consonant mapping function outputs 'ㅅ', the medial-vowel mapping function outputs 'ㅣ', and the final-consonant mapping function outputs 'ㄱ', so this candidate is judged to be a legitimate English notation. For an English name candidate whose letter string is {S, Y, I, G}, the initial-consonant mapping function outputs 'ㅅ', the medial-vowel mapping function outputs 'ㅣ', and the final-consonant mapping function outputs 'ㄱ', so this candidate is also judged to be a legitimate English notation.

[Table 1]

However, for an English name candidate whose letter string is {S, Y, A, G}, the initial-consonant mapping function outputs 'ㅅ', the medial-vowel mapping function outputs "not applicable", and the final-consonant mapping function outputs 'ㄱ', so the candidate is judged not to be a legitimate English notation. For an English name candidate whose letter string is {S, Y, I, Q}, the initial-consonant mapping function outputs 'ㅅ', the medial-vowel mapping function outputs 'ㅣ', and the final-consonant mapping function outputs "not applicable", so this candidate is likewise judged not to be a legitimate English notation.

Thus, if any one of the initial-consonant, medial-vowel, and final-consonant mapping functions outputs "not applicable" for a letter string, the candidate is judged to have an illegitimate English notation.
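The verification rule of step S33 can be sketched as three table lookups. Table 1 is not reproduced in this text, so the small tables below are a hypothetical fragment chosen only to reproduce the four examples above. The way the letter string is split (leading consonants, then a run of vowel letters including W and Y, then trailing consonants) is also an assumption, as is the function name `map_to_jamo`.

```python
import re

# Hypothetical fragment of a Table 1 style mapping (English notation -> jamo).
INITIAL = {"": "ㅇ", "S": "ㅅ", "SH": "ㅅ", "K": "ㄱ", "G": "ㄱ"}
MEDIAL = {"I": "ㅣ", "YI": "ㅣ"}
FINAL = {"": "", "K": "ㄱ", "G": "ㄱ"}

# Assumed segmentation: consonants, then vowel letters, then consonants.
SPLIT = re.compile(r"^([^AEIOUWY]*)([AEIOUWY]+)([^AEIOUWY]*)$")


def map_to_jamo(notation):
    """Return (initial, medial, final) jamo, or None if any of the three
    mapping functions is "not applicable" for this notation."""
    match = SPLIT.match(notation)
    if match is None:
        return None
    jamo = (
        INITIAL.get(match.group(1)),
        MEDIAL.get(match.group(2)),
        FINAL.get(match.group(3)),
    )
    # A missing table entry plays the role of "not applicable".
    return None if None in jamo else jamo


results = {s: map_to_jamo(s) for s in ("SIK", "SYIG", "SYAG", "SYIQ")}
```

With this fragment, "SIK" and "SYIG" both map to ('ㅅ', 'ㅣ', 'ㄱ'), while "SYAG" fails on the medial lookup and "SYIQ" fails on the final lookup.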

Next, in step S34, the first-ranked English name candidate is compared with the first-ranked Korean name recognition result to determine whether they are the same. That is, with reference to the conversion table 50, the results of the initial-consonant, medial-vowel, and final-consonant mapping functions for the letter string of the first-ranked English candidate are combined to generate one Korean character. For example, if the letter string of the English name candidate is {S, I, K}, the output of the initial-consonant mapping function is 'ㅅ', the output of the medial-vowel mapping function is 'ㅣ', and the output of the final-consonant mapping function is 'ㄱ', then combining these outputs generates the character '식'. The Korean character generated in this way is compared with the Korean character that is the first-ranked Korean candidate. If they are the same, the misrecognition correction for that character ends so that correction can proceed to the next character.
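Combining the three jamo into one character, as in the '식' example, can be illustrated with the standard Unicode composition arithmetic for precomposed Hangul syllables. The patent does not say how the combination is implemented, so this is only one possible sketch; the function names `compose` and `matches_top_candidate` are assumptions.

```python
# Jamo in Unicode Hangul-syllable order (19 initials, 21 medials, 28 finals).
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
             "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
             "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def compose(initial, medial, final=""):
    """Combine initial, medial, and optional final jamo into one syllable."""
    index = (CHOSEONG.index(initial) * 21 + JUNGSEONG.index(medial)) * 28
    index += JONGSEONG.index(final)
    return chr(0xAC00 + index)


def matches_top_candidate(jamo, top_hangul_candidate):
    """S34-style check: does the syllable built from the first-ranked English
    candidate equal the first-ranked Korean candidate?"""
    return compose(*jamo) == top_hangul_candidate


syllable = compose("ㅅ", "ㅣ", "ㄱ")
```

If `matches_top_candidate` returns true, no correction is needed for that character; otherwise the correction steps that follow are applied.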

If they are not the same, however, steps S35 to S42 are performed to correct the misrecognition.

That is, in step S35, the English notations corresponding to all candidate characters supplied as the Korean name recognition result are generated using the conversion table 50. For example, for the Korean character '식', if the English notations 'S' and 'SH' exist for the initial consonant 'ㅅ', 'I' and 'YI' exist for the medial vowel 'ㅣ', and 'K' and 'G' exist for the final consonant 'ㄱ', then eight English notations are generated, such as 'SIK', 'SIG', 'SYIK', 'SYIG', 'SHYIK', and 'SHYIG'. The conversion table 50 holds Korean name notations and their corresponding English name notations, as in Table 1 above, so that it can be consulted when converting between Korean and English name notations.
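The generation in step S35 amounts to taking every combination of the English notations available for each jamo. A minimal sketch follows, using only the mappings named in the '식' example; the dictionary `TO_ENGLISH` and the function `generate_notations` are hypothetical names, and a real table would cover every jamo used in names.

```python
from itertools import product

# Jamo -> possible English notations, taken from the example in the text.
TO_ENGLISH = {"ㅅ": ["S", "SH"], "ㅣ": ["I", "YI"], "ㄱ": ["K", "G"]}


def generate_notations(jamo):
    """Return every English notation obtainable by picking one option
    for each of the (initial, medial, final) jamo."""
    options = [TO_ENGLISH[j] for j in jamo]
    return ["".join(parts) for parts in product(*options)]


notations = generate_notations(("ㅅ", "ㅣ", "ㄱ"))
print(notations)
```

In the method described, this generation would be repeated for each of the Korean candidate characters (ten in the embodiment), so the pool of notations grows with the number of candidates and the number of alternatives per jamo.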

In step S36, any "generated English notation" whose length differs from the length of the "English notation of the recognized image" is discarded. In step S37, a "generated English notation" is discarded unless every one of its letters is present in the "candidate character set of the English recognizer" at the same position. In step S38, the "generated English notations" that remain after steps S36 and S37 are sorted by the "sum of the reliabilities of the candidate letters" matched in step S37.
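Steps S36 to S38 can be sketched as a length filter, a per-position membership filter, and a sort on summed reliability. The candidate sets and reliability values below are invented for illustration, and reliabilities are assumed to be distances (lower is better); the function name `filter_and_sort` and the `sort_by_reliability` flag are assumptions.

```python
def filter_and_sort(notations, english_candidates, sort_by_reliability=True):
    """english_candidates holds one dict per letter position, mapping each
    candidate letter to its reliability (assumed: lower is more reliable)."""
    length = len(english_candidates)
    kept = []
    for notation in notations:
        # S36: discard notations whose length differs from the recognized one.
        if len(notation) != length:
            continue
        # S37: every letter must be a candidate at the same position.
        pairs = list(zip(notation, english_candidates))
        if not all(letter in cands for letter, cands in pairs):
            continue
        kept.append((sum(cands[letter] for letter, cands in pairs), notation))
    # S38: optional sort on the summed reliability of the matched letters.
    if sort_by_reliability:
        kept.sort(key=lambda item: item[0])
    return [notation for _, notation in kept]


# Hypothetical English recognizer output for a three-letter image.
english_candidates = [
    {"S": 0.10, "5": 0.40},
    {"1": 0.20, "I": 0.30},
    {"G": 0.10, "K": 0.60},
]
generated = ["SIK", "SIG", "SYIK", "SYIG", "SHIK", "SHIG", "SHYIK", "SHYIG"]
ranked = filter_and_sort(generated, english_candidates)
```

Here only 'SIK' and 'SIG' survive both filters, and 'SIG' ranks first because its letters have the smaller summed distance, even though 'I' is only the second-ranked candidate at its position.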

This sorting, however, is optional. If the recognizers in use recognize the entire English notation of a name less well than they recognize a single Korean name character, the sorting is skipped and the process goes directly to step S39. If the relative recognition performance is the opposite, the English notations are sorted by reliability as described above.

In step S39, the "generated English notations" are converted back into Korean using the conversion table 50. In step S40, a converted character is removed if it is not in the "candidate character set of the Korean name recognition result". If it is in the set, it is sorted in step S41 on the basis of the Korean reliability of that candidate character. Like step S38, however, step S41 is optional. If the recognizers in use recognize the entire English notation of a name better than they recognize a single Korean name character, step S41 is skipped and the process goes directly to step S42. If the relative performance is the opposite, the sorting by Korean reliability is performed in step S41.
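The back-conversion, filtering, and conditional sort of steps S39 to S41, followed by the selection of the first-ranked candidate, can be sketched as below. The back-conversion is reduced to a lookup in a hypothetical dictionary, the reliabilities are invented and assumed to be distances (lower is better), and `select_first_candidate` and its parameters are names introduced only for this example.

```python
def select_first_candidate(notations, to_hangul, hangul_candidates, english_is_stronger):
    """notations: English notations left after the English-side filtering,
    already ordered by English reliability when that sort was performed.
    hangul_candidates: Korean candidate -> reliability (assumed: lower is better).
    Returns (hangul, notation) for the first-ranked pair, or None."""
    survivors = []
    for notation in notations:
        hangul = to_hangul.get(notation)      # S39: convert back to Korean
        if hangul in hangul_candidates:       # S40: keep only known candidates
            survivors.append((hangul, notation))
    if not english_is_stronger:               # S41: sort by Korean reliability
        survivors.sort(key=lambda pair: hangul_candidates[pair[0]])
    return survivors[0] if survivors else None


# Hypothetical back-conversion entries and Korean recognizer output.
to_hangul = {"KWON": "권", "KWAN": "관"}
hangul_candidates = {"원": 0.21, "관": 0.30, "권": 0.33}
notations = ["KWON", "KWAN"]  # assumed order after the English-side sort

when_english_stronger = select_first_candidate(
    notations, to_hangul, hangul_candidates, english_is_stronger=True)
when_korean_stronger = select_first_candidate(
    notations, to_hangul, hangul_candidates, english_is_stronger=False)
```

The flag makes the design choice explicit: only one of the two sorts is applied, so the final ranking follows whichever recognizer is considered more reliable.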

In step S42, the Korean candidate character with the highest reliability among the sorted candidates and its corresponding English notation are selected and stored as the first-ranked candidates, and the misrecognition correction for that character ends. This process is applied to every character of the Korean name.

Table 2 is an example illustrating the correction of a misrecognized Korean character using ten English recognition results. When the input Korean image is '권', the correct Korean recognition result appears only as the fourth-ranked candidate, whereas every letter of the corresponding English notation 'KWON' appears as a first-ranked candidate. Accordingly, when all candidate characters of the Korean name recognition result are converted into their corresponding English notations and sorted by the above correction procedure, the notation 'KWON' has the highest reliability. Converting it back into Korean yields a character that is in the candidate character set of the Korean name recognition result, so the first-ranked Korean candidate is corrected from '원' to '권'. In Table 2, the value next to each candidate character is its "recognition reliability"; the lower the value, the higher the reliability.

[Table 2]

Table 3 is an example illustrating the correction of a misrecognized English letter using ten Korean recognition results. For the input image '식', the correct Korean recognition result is the first-ranked candidate. In the corresponding English notation 'SIG', the recognition result 'I' appears only as the second-ranked candidate, but it is corrected to the first-ranked candidate through the above correction procedure.

[Table 3]

It is evident that the method described above, which corrects misrecognition by cross-referencing the recognition results of a Korean name and an English name, can also be applied between any two kinds of characters. In particular, the method can be applied as it is between Korean names, for which recognition performance is relatively high, and names in Chinese characters (Hanja), for which recognition performance is relatively low.

With the misrecognition correction method of the present invention, misrecognition can be corrected easily in accordance with the performance of the recognizers in use. That is, depending on whether the recognizers perform better on Korean or on English, misrecognition can be corrected so that the system operates at the recognition rate of the better-performing side.

Claims

A method of correcting misrecognition of a name, in which an image of a name written in a first character and a second character is input and misrecognition of the name written in the first character is corrected by referring to a recognition result of the name written in the second character, the method comprising: a first step of generating notations of the second character from candidate characters of the name recognition result of the first character; a second step of deleting, from among the generated notations of the second character, those that fall outside the recognition length of the name recognizer of the second character; a third step of removing, from among the generated notations of the second character, those having a character not included in the name recognition candidate set of the second character at the same position; a fourth step of sorting the notations of the second character that have passed the third step according to recognition reliability, only when the performance of the name recognizer of the second character is higher than that of the name recognizer of the first character; a fifth step of converting the notations of the second character that have passed the fourth step back into the first character; a sixth step of removing any converted notation of the first character that is not in the name recognition candidate character set of the first character; a seventh step of sorting according to the recognition reliability of the first character, only when the performance of the name recognizer of the second character is lower than that of the name recognizer of the first character; and a step of selecting, from among the first characters that have passed the seventh step, the candidate character having the highest reliability and the corresponding candidate character of the second character as first-ranked candidates of the first character and the second character, respectively.

The method of claim 1, wherein the first character is Korean and the second character is English.

The method of claim 1, wherein the first character is a Chinese character, and the second character is a Korean character.

A name recognition method in which an image of a name written in a first character and a second character is input and misrecognition of the name in the first character is corrected by referring to a recognition result of the name in the second character, the method comprising: a first step of extracting a name image of the first character from the image of the name and preprocessing it; a second step of recognizing the name of the first character from the preprocessed name image of the first character; a third step of extracting a name image of the second character from the image of the name and preprocessing it; a fourth step of recognizing the name of the second character from the preprocessed name image of the second character; a fifth step of verifying the notation of the name of the second character; a sixth step of comparing the name of the first character recognized in the second step with the verification result of the second character passed on from the fifth step and, if they are not the same, correcting the misrecognition of the first character; and a seventh step of outputting the corrected names of the first character and the second character, respectively.

The method of claim 4, wherein the first character is Korean and the second character is English.

The method of claim 4, wherein the first character is a Chinese character, and the second character is a Korean character.