KR19990015436A

KR19990015436A - Character recognition matching method

Info

Publication number: KR19990015436A
Application number: KR1019970037562A
Authority: KR
Inventors: 이형호
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1997-08-06
Filing date: 1997-08-06
Publication date: 1999-03-05
Also published as: KR100258939B1

Abstract

The present invention relates to character recognition, and more particularly to a method for improving the matching speed of character recognition. The method comprises a first step of inputting a character image, removing noise from the image, and performing normalization; a second step of extracting features from the character resulting from the first step; a third step of calculating the distance between the features of each character from the second step and the features in a dictionary; and a fourth step in which, if the distance value (D) from the third step is larger than an arbitrarily determined value (Th), the distance calculation proceeds to the dictionary entry for the next character, and otherwise the distance calculation is repeated over the number of features (N). According to the invention, when the distance to a dictionary character that bears no similarity to the input character is being calculated, the N features of the character are evaluated A at a time (A being a constant, obtained by experiment, that represents a portion of the features). If the resulting distance exceeds the value (Th) at which recognition is not possible, the input image cannot be the dictionary character currently being evaluated, so the character can be recognized without performing the remaining (N-P) distance calculations (P being a multiple of A).

Description

Character recognition matching method

The present invention relates to character recognition, and more particularly to a method for improving the matching speed of character recognition.

In general, character recognition is a branch of pattern recognition that has been developed as a tool for the automatic input of document information. FIG. 1 is a flowchart showing a general character recognition process, which consists of a preprocessing step (110) that corrects and enhances the character image, a feature extraction step (120), and a feature matching step (130) that outputs a recognition result using the extracted features. That is, when a character image is input, preprocessing such as normalization is performed (step 110), and once preprocessing is complete, N features are extracted (step 120). In the feature matching step (130), the input character is then compared against a dictionary of M characters, each with N features extracted in advance and stored in a database, so that N×M distance calculations are performed to obtain the recognition result. Thus the conventional matching method calculates the distance (D) between the N features of the character image and the features of each character in the dictionary (the feature values obtained during training), and recognizes the character as the one for which D is smallest. That is,

[Equation 1]

where N is the number of features, M is the number of characters to be recognized, Xi is a feature in the dictionary, and Yi is a feature of the input image. Consequently, recognizing the image of a single character against a dictionary of M characters requires N×M distance (D) calculations every time, because each dictionary character is evaluated over all N features, and this makes recognition slow.
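The conventional matching described above can be sketched as follows. This is only an illustration: the text does not show the distance formula itself, so the sketch assumes a sum of absolute differences between Xi and Yi, and the function names and the small dictionary are hypothetical.

```python
def full_distance(dict_features, input_features):
    """Distance D over all N features (assumed metric: sum of |Xi - Yi|)."""
    return sum(abs(x - y) for x, y in zip(dict_features, input_features))


def match_exhaustive(dictionary, input_features):
    """Compare the input against every dictionary character over all N features.

    Returns (best_label, best_distance, feature_comparisons).
    """
    best_label, best_dist, comparisons = None, float("inf"), 0
    for label, feats in dictionary.items():
        d = full_distance(feats, input_features)
        comparisons += len(feats)  # N calculations per dictionary character
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist, comparisons


# Hypothetical dictionary: M = 3 characters, N = 6 features each.
dictionary = {
    "A": [4, 0, 3, 1, 2, 2],
    "B": [0, 4, 1, 3, 0, 4],
    "C": [2, 2, 2, 2, 2, 2],
}
label, dist, comparisons = match_exhaustive(dictionary, [4, 0, 3, 1, 2, 3])
print(label, dist, comparisons)  # A 1 18
```

With M = 3 and N = 6 the loop always performs N×M = 18 feature comparisons, regardless of how dissimilar a dictionary character is from the input.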

The technical problem addressed by the present invention is to provide a method that improves recognition speed by moving on to the distance calculation for the next dictionary character whenever the distance value (Dm) between the feature values of one dictionary character and the feature values obtained from the input image is larger than an arbitrarily determined value (Th).

FIG. 1 is a flowchart showing a general character recognition process.

FIG. 2 is a flowchart of the matching method for character recognition according to the present invention.

FIG. 3 shows the dictionary features used to calculate the distance in FIG. 2.

To solve the above technical problem, the present invention provides a matching method for character recognition, comprising: a first step of inputting a character image, removing noise from the image, and performing normalization; a second step of extracting features from the character resulting from the first step; a third step of calculating the distance between the features of each character from the second step and the features in a dictionary; and a fourth step in which, if the distance value (D) from the third step is larger than an arbitrarily determined value (Th), the distance calculation proceeds to the dictionary entry for the next character, and otherwise the distance calculation is repeated over the number of features (N).

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 2 is a flowchart of the matching method for character recognition according to the present invention, which consists of a preprocessing step (210), a feature extraction step (220), steps for calculating the distance between each feature and the dictionary features (240, 250, 260, 262, 270, 280), and a step (290) of selecting among the distance values calculated for each character.

As shown in FIG. 2, for character recognition a character image is first input and preprocessed by removing noise from the image and normalizing it (step 210). Features are then extracted from the result of the preprocessing step (210) (step 220). Feature extraction here yields the values used for recognition: only those attributes that characterize the character are extracted, so that a given character can be recognized. For example, when the character image is divided into N×M regions, the number of black pixels in one region can serve as a feature value. Next, the distance value (Dm) for each character is initialized (step 230), and, as shown in FIG. 3, the distance is calculated using the N features of the input character and the N features of each dictionary character (M).
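The black-pixel example of a feature value mentioned above could look like the following minimal sketch. It assumes a binary image stored as a list of rows with 1 for a black pixel, and image dimensions that divide evenly by the grid size; the function name and the `rows` and `cols` parameters are hypothetical, chosen to avoid confusion with the N and M used elsewhere for the feature count and the dictionary size.

```python
def grid_features(image, rows, cols):
    """Count black pixels (value 1) in each cell of a rows x cols grid.

    Assumes len(image) is divisible by rows and len(image[0]) by cols.
    Returns one count per cell, in row-major order.
    """
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // rows, width // cols
    features = []
    for r in range(rows):
        for c in range(cols):
            count = sum(
                image[y][x]
                for y in range(r * cell_h, (r + 1) * cell_h)
                for x in range(c * cell_w, (c + 1) * cell_w)
            )
            features.append(count)
    return features


# Hypothetical 4x4 binary character image split into a 2x2 grid.
image = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
features = grid_features(image, rows=2, cols=2)
print(features)  # [3, 0, 0, 4]
```

Each entry of the returned list is one feature value, so a 2×2 grid yields four features for this image.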

[Equation 2]

Equation 2 computes the distance between the first A feature values of the input image (Y) and the corresponding A feature values of the dictionary character (X) (step 240). The distance values (P) up to the n-th feature value are then accumulated to give the cumulative value (Dm) (step 250). The cumulative distance (Dm) from step 250 is compared with a value Th determined in advance by experiment (a distance value at which recognition is not possible) (step 260). If Dm is smaller than Th, the character may still be recognized, so the comparison continues until the count (n) passes the last group of A among the N features of FIG. 3 (step 262); the method checks whether the distance calculation over all N feature values of the character is complete (step 280) and whether the input image has been compared with every character in the dictionary (step 280). If, on the other hand, the cumulative distance (Dm) is larger than Th, the input character is very unlikely to be the dictionary character currently being evaluated, so the remaining distance calculations are skipped and the method jumps directly to step 280. At that point, if m has reached M, the distances for all M dictionary characters have been calculated, so the smallest of the cumulative distance values Dm of Equation 1 is selected and the result is output (step 270); if m has not yet reached M, the method returns to step 230 and continues the distance calculation for each character.
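The early-termination loop of steps 230 to 280 can be sketched roughly as below. Several points are assumptions rather than statements from the text: the distance is taken as a sum of absolute differences (Equation 2 is not shown), the threshold is checked after every block of A features, and a dictionary character abandoned part-way is excluded from the final selection. The names `match_early_exit`, `block_size` (for A) and `threshold` (for Th) are hypothetical.

```python
def match_early_exit(dictionary, input_features, block_size, threshold):
    """Match with early rejection of dissimilar dictionary characters.

    Returns (best_label, best_distance, feature_comparisons).
    """
    best_label, best_dist, comparisons = None, float("inf"), 0
    n_features = len(input_features)
    for label, feats in dictionary.items():
        running = 0  # cumulative distance Dm, reset per character (step 230)
        rejected = False
        for start in range(0, n_features, block_size):
            end = min(start + block_size, n_features)
            # Partial distance P over the next block of A features (step 240).
            partial = sum(abs(feats[i] - input_features[i]) for i in range(start, end))
            comparisons += end - start
            running += partial  # accumulate into Dm (step 250)
            if running > threshold:  # compare Dm with Th (step 260)
                rejected = True  # skip the remaining features of this character
                break
        # Assumption: only fully evaluated characters compete for the minimum.
        if not rejected and running < best_dist:
            best_label, best_dist = label, running
    return best_label, best_dist, comparisons


# Hypothetical dictionary: M = 3 characters, N = 6 features each.
dictionary = {
    "A": [4, 0, 3, 1, 2, 2],
    "B": [0, 4, 1, 3, 0, 4],
    "C": [2, 2, 2, 2, 2, 2],
}
result = match_early_exit(dictionary, [4, 0, 3, 1, 2, 3], block_size=2, threshold=5)
print(result)  # ('A', 1, 12)
```

In this run "B" is abandoned after its first block and "C" after its second, so 12 feature comparisons are performed where an exhaustive pass over 3 characters of 6 features would need 18. If every character is rejected, the sketch returns `None` as the label; the text does not say how that case is handled.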

As described above, according to the present invention, when the distance to a dictionary character that bears no similarity to the input character is being calculated, the N features of the character are evaluated A at a time (A being a constant, obtained by experiment, that represents a portion of the features). If the resulting distance exceeds the value (Th) at which recognition is not possible, the input image cannot be the dictionary character currently being evaluated, so the character can be recognized without performing the remaining (N-P) distance calculations (P being a multiple of A).
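The saving of (N-P) calculations per rejected dictionary character can be tallied with a small helper. This is a sketch with hypothetical names and numbers: `blocks_evaluated` lists, for each dictionary character, how many blocks of A features were computed before the character was either rejected or fully evaluated.

```python
def saved_calculations(n_features, block_size, blocks_evaluated):
    """Return (calculations_done, calculations_saved) across the dictionary.

    For each character, P = blocks * A features were evaluated (capped at N),
    so N - P calculations were skipped.
    """
    evaluated = [min(k * block_size, n_features) for k in blocks_evaluated]  # P per character
    saved = [n_features - p for p in evaluated]  # N - P per character
    return sum(evaluated), sum(saved)


# Hypothetical case: N = 6, A = 2, three dictionary characters that were
# evaluated for 3 blocks (complete), 1 block and 2 blocks respectively.
done, saved = saved_calculations(n_features=6, block_size=2, blocks_evaluated=[3, 1, 2])
print(done, saved)  # 12 6
```

Here P takes the values 6, 2 and 4, all multiples of A = 2, and 6 of the 18 calculations of an exhaustive comparison are avoided.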

Claims

A matching method for character recognition, comprising:

a first step of inputting a character image, removing noise from the image, and performing normalization;

a second step of extracting features from the character resulting from the first step;

a third step of calculating a distance between the features of each character from the second step and the features in a dictionary; and

a fourth step of, if the distance value (D) from the third step is larger than a reference value (Th), proceeding to the distance calculation with the dictionary entry for the next character, and otherwise repeating the distance calculation over the number of features (N) of the character.

The method of claim 1, wherein the reference value (Th) of the fourth step is a distance value at which recognition is unlikely.