KR19990010213A

KR19990010213A - Character Recognition Method with Improved Matching Speed

Info

Publication number: KR19990010213A
Application number: KR1019970032911A
Authority: KR
Inventors: 김준호; 도정인
Original assignee: 윤종용; 삼성전자 주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 1997-07-15
Filing date: 1997-07-15
Publication date: 1999-02-05
Also published as: KR100243220B1

Abstract

The present invention relates to a character recognition method and, more particularly, to a character recognition method with improved matching speed.

The method comprises: a feature vector extraction step of extracting a feature vector from an input character image; a step of sequentially matching the feature vector against the reference vectors of an upper group, consisting of reference vectors whose occurrence frequency is at or above a predetermined value and which are classified separately, and extracting the reference vector with the minimum distance; a step of comparing that minimum distance with a predetermined threshold; a step, performed when the minimum distance is greater than or equal to the threshold, of sequentially matching the feature vector against the reference vectors of a lower group, consisting of reference vectors whose occurrence frequency is at or below the predetermined value and which are classified separately, and extracting the reference vector with the minimum distance; and a step of outputting the character code corresponding to the minimum-distance reference vector of the upper group if the minimum distance is smaller than the threshold, and otherwise the character code corresponding to the minimum-distance reference vector of the lower group.

By reducing the number of matches against reference vectors needed to recognize a character, the invention shortens the matching time for character recognition.

Description

Character Recognition Method with Improved Matching Speed

With the development of digital technology and the transition to an information society, the need to store both newly created and existing documents on digital media is growing rapidly. Storing documents digitally makes information retrieval much easier and allows long-term storage in far less space than paper.

Documents are generally entered into digital media in one of two ways: manually, by a person typing on a keyboard, or automatically, using optical character recognition (OCR).

Manual keyboard entry is slow and labor-intensive, and therefore expensive. Automatic entry by character recognition, by contrast, can store documents quickly.

However, conventional character recognition identifies a character by matching the feature vector of the input character against every reference vector and selecting the reference vector at minimum distance. For documents written in Korean this is costly: more than 10,000 Hangul syllables are possible, the KSC-5601 standard code alone defines 2,350 of them, and Korean documents also mix in Chinese characters (Hanja) and English letters. The number of candidate characters to match is therefore large, which increases the time needed to recognize each character and find its corresponding reference vector.

To solve this problem, the invention classifies reference vectors into those with a high occurrence frequency and those without, and then searches for the reference vector corresponding to an input character starting with the high-frequency group, thereby providing a character recognition method with improved matching speed.

FIG. 1 is a flowchart of the character recognition method with improved matching speed according to the present invention.

FIG. 2 is a block diagram of a character recognition apparatus to which the flowchart of FIG. 1 is applied.

To achieve the above objective, the present invention provides a character recognition method that matches the feature vector of an input character image against predetermined reference vectors and outputs the character code of the reference vector corresponding to the feature vector. The method comprises: a feature vector extraction step of extracting a feature vector from the input character image; an upper-group reference vector matching step of sequentially matching the feature vector against the reference vectors of an upper group, namely those predetermined reference vectors whose occurrence frequency is at or above a predetermined value and which are classified separately, to obtain each distance, and then extracting the reference vector with the minimum distance; a matching distance comparison step of comparing the minimum distance obtained in the upper-group step with a predetermined threshold; a lower-group reference vector matching step, performed when the comparison shows that the matching distance is greater than or equal to the threshold, of sequentially matching the feature vector against the reference vectors of a lower group, namely those predetermined reference vectors whose occurrence frequency is at or below the predetermined value and which are classified separately, to obtain each distance, and then extracting the reference vector with the minimum distance; and a character code output step of outputting, when the matching distance is smaller than the threshold, the character code corresponding to the minimum-distance reference vector of the upper group, and otherwise the character code corresponding to the minimum-distance reference vector found in the lower-group step.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, the character recognition method with improved matching speed according to the present invention comprises: steps 110 and 120 of receiving a character image and extracting a feature vector from it; an upper-group reference vector matching step 130 of sequentially matching the feature vector against the reference vectors of the high-frequency upper group, computing each matching distance, and extracting the reference vector with the minimum distance; a matching distance comparison step 140 of comparing the minimum distance found in the upper group with a threshold; a lower-group reference vector matching step 150 of sequentially matching the feature vector against the reference vectors of the low-frequency lower group, computing each matching distance, and extracting the reference vector with the minimum distance; and a character code output step 160 of outputting the character code corresponding to the extracted reference vector.

In character recognition, the reference vector group is the collection of vectors for every character pattern that may occur, stored in a database so that each input character can be recognized and matched to one of them.

In the present invention, these reference vectors are divided by occurrence frequency: those with a high occurrence frequency form the upper reference vector group, and the rest form the lower reference vector group.

Because occurrence frequency differs from one application domain to another, the division into upper and lower reference vector groups can be learned while character recognition is being performed.
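The patent does not specify how the groups are learned. One plausible reading, sketched below under that assumption, is to count how often each character code is output during recognition and place the most frequent codes in the upper group until they cover a chosen share of all occurrences. The helper name `split_by_frequency` and the `coverage` parameter are hypothetical; the default of 0.8159 is borrowed from the worked example later in this description.

```python
from collections import Counter


def split_by_frequency(observed_codes, all_codes, coverage=0.8159):
    """Hypothetical helper: split character codes into upper and lower groups.

    Codes are ranked by how often they were observed, and the most frequent
    ones are moved into the upper group until the group accounts for at
    least `coverage` of all observed occurrences. Everything else, including
    codes never observed, goes to the lower group.
    """
    counts = Counter(observed_codes)  # missing codes count as 0
    total = sum(counts.values())
    # sorted() is stable (also with reverse=True), so ties keep their
    # order from all_codes.
    ranked = sorted(all_codes, key=lambda code: counts[code], reverse=True)
    upper, covered = [], 0
    for code in ranked:
        if total == 0 or covered / total >= coverage:
            break
        upper.append(code)
        covered += counts[code]
    lower = ranked[len(upper):]  # the upper group is a prefix of `ranked`
    return upper, lower


print(split_by_frequency(list("aaaaabbbc"), ["a", "b", "c", "d"], coverage=0.8))
```

Re-running the split on counts accumulated while recognition is in use is one way the groups could adapt to a particular domain.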

FIG. 2 is a block diagram of a character recognition apparatus to which the flowchart of FIG. 1 is applied. The apparatus comprises: a character image input unit 210 that senses the character image of an input document and inputs it; a reader 220 that analyzes the character image and extracts a feature vector; an input data memory 240 that stores the feature vector of the character image; a database 250 that stores the reference vectors classified into groups according to occurrence frequency; an output data storage unit 260 that stores the character code corresponding to each recognized input character; and a control unit 230 that controls the computation and comparison of matching distances between the feature vector in the input data memory 240 and the grouped reference vectors stored in the database 250.

The flowchart of the present invention is now described with reference to the block diagram of FIG. 2.

In the character image input step 110, the characters of the document to be recognized and stored are received by an optical sensor, and the received character signal is projected onto a grid composed of a predetermined number n of elements.
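The projection method is not specified. A minimal sketch, assuming the sensor output has already been binarized into a 2-D list of 0/1 pixels, divides the image into `rows × cols` cells and records the fraction of ink pixels in each cell, giving n = rows × cols grid elements. The function name `project_to_grid` is illustrative only.

```python
def project_to_grid(image, rows, cols):
    """Project a binary character image onto a rows x cols grid.

    Each grid element receives the fraction of ink pixels (value 1) inside
    its cell, so the grid has n = rows * cols elements. Assumes the image is
    at least rows x cols pixels so that no cell is empty.
    """
    height, width = len(image), len(image[0])
    grid = []
    for i in range(rows):
        r0, r1 = i * height // rows, (i + 1) * height // rows
        row_values = []
        for j in range(cols):
            c0, c1 = j * width // cols, (j + 1) * width // cols
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row_values.append(sum(cell) / len(cell))
        grid.append(row_values)
    return grid


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(project_to_grid(image, 2, 2))  # [[1.0, 0.0], [0.0, 0.75]]
```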

In the feature vector extraction step 120, each of the n grid elements is assigned a value, either binary (1 or 0) or real, representing the intrinsic characteristics of the character image received through the grid. The values of these elements are then combined to form the feature vector.
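Both value types mentioned above can be produced from the same grid. In this sketch the grid holds per-cell ink fractions, and the 0.5 cut-off used for the binary variant is an assumption, not a value from the patent; `feature_vector` is a hypothetical name.

```python
def feature_vector(grid, binary=True, ink_threshold=0.5):
    """Flatten grid element values (row by row) into an n-element feature vector.

    binary=True  -> each element becomes 1 if its ink fraction reaches
                    ink_threshold (an assumed cut-off), else 0.
    binary=False -> the real-valued ink fractions are kept as they are.
    """
    values = [v for row in grid for v in row]
    if binary:
        return [1 if v >= ink_threshold else 0 for v in values]
    return values


grid = [[1.0, 0.0], [0.0, 0.75]]
print(feature_vector(grid))                # [1, 0, 0, 1]
print(feature_vector(grid, binary=False))  # [1.0, 0.0, 0.0, 0.75]
```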

The upper-group reference vector matching step 130 sequentially matches the feature vector extracted above against the vectors of the upper group, which consists of reference vectors with a high occurrence frequency, computing the matching distance for each and extracting the reference vector with the minimum distance.
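The patent does not name a distance measure. The sketch below assumes Hamming distance for binary feature vectors and offers squared Euclidean distance for real-valued ones; a group is modeled, hypothetically, as a list of `(character_code, reference_vector)` pairs scanned in order, as in step 130.

```python
def hamming(a, b):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def squared_euclidean(a, b):
    """Squared Euclidean distance, suitable for real-valued feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_in_group(feature, group, distance=hamming):
    """Match `feature` against each (code, reference_vector) pair in turn.

    Returns (best_code, best_distance, matches_performed). On ties the
    earlier reference vector wins, because only a strictly smaller distance
    replaces the current best.
    """
    best_code, best_dist = None, float("inf")
    for code, ref in group:
        d = distance(feature, ref)
        if d < best_dist:
            best_code, best_dist = code, d
    return best_code, best_dist, len(group)


upper_group = [("가", [1, 0, 0, 1]), ("나", [1, 1, 0, 1]), ("다", [0, 0, 0, 0])]
print(nearest_in_group([1, 1, 0, 1], upper_group))  # ('나', 0, 3)
```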

The matching distance comparison step 140 compares the minimum distance of the reference vector extracted in step 130 with a threshold, defined as the largest matching distance at which a character can still be recognized without error. If the minimum distance is smaller than the threshold, the character corresponding to the matched reference vector can be accepted as the character corresponding to the feature vector. Otherwise, the character corresponding to the minimum-distance reference vector cannot be accepted as the input character.
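How the threshold is obtained is left open. One interpretation of "the largest matching distance at which a character can be recognized without error", assumed here, is to run upper-group matching on labeled samples and take the smallest distance at which a wrong match occurred; every match strictly below that value was correct. Both helper names are hypothetical.

```python
def calibrate_threshold(samples):
    """Estimate the step-140 threshold from labeled upper-group results.

    samples: iterable of (min_distance, was_correct) pairs. The threshold is
    the smallest distance at which a misrecognition occurred, so every
    accepted match (distance < threshold) in the samples was correct.
    If no error was observed, no finite bound is implied.
    """
    error_distances = [d for d, correct in samples if not correct]
    return min(error_distances) if error_distances else float("inf")


def accept_upper(min_distance, threshold):
    """Step 140: accept the upper-group match only if strictly below threshold."""
    return min_distance < threshold


samples = [(1, True), (2, True), (4, False), (3, True), (6, False)]
threshold = calibrate_threshold(samples)
print(threshold, accept_upper(3, threshold), accept_upper(4, threshold))  # 4 True False
```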

The lower-group reference vector matching step 150 is executed when step 140 finds that the minimum matching distance between the feature vector and the upper-group reference vectors is greater than or equal to the threshold. In that case the upper group contains no reference vector that can be recognized as the input character, so the extracted feature vector is sequentially matched against the vectors of the lower group, which consists of reference vectors with a low occurrence frequency, computing each matching distance and extracting the reference vector with the minimum distance.

In the character code output step 160, if step 140 finds that the minimum matching distance obtained from the upper-group reference vectors is smaller than the threshold, the character code corresponding to the minimum-distance reference vector of the upper group is output. If that minimum matching distance is greater than or equal to the threshold, the character code corresponding to the minimum-distance reference vector of the lower group is output.

Through the above steps, a feature vector is extracted from the input character image and matched against the upper-group reference vectors stored in the database to find the reference vector with the minimum distance. If that minimum distance is smaller than the threshold, the reference vector is recognized as the input character and the corresponding character code is output. If the upper group contains no reference vector whose matching distance to the feature vector is below the threshold, the matching distances between the feature vector and the lower-group reference vectors in the database are computed, and the character code of the minimum-distance reference vector is output, so that the input character can still be recognized.
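Steps 130 to 160 can be put together in a short sketch. It assumes binary feature vectors compared by Hamming distance and groups stored as `(character_code, reference_vector)` lists; these representations and the name `recognize` are illustrative, not taken from the patent. As the text specifies, a rejected input outputs the lower group's best match rather than the better of the two groups.

```python
def hamming(a, b):
    """Assumed distance: count of differing positions in binary vectors."""
    return sum(x != y for x, y in zip(a, b))


def nearest(feature, group):
    """Scan (code, reference_vector) pairs in order; return (code, distance)."""
    best_code, best_dist = None, float("inf")
    for code, ref in group:
        d = hamming(feature, ref)
        if d < best_dist:
            best_code, best_dist = code, d
    return best_code, best_dist


def recognize(feature, upper, lower, threshold):
    """Two-stage matching; returns (character_code, matches_performed)."""
    code, dist = nearest(feature, upper)           # step 130
    if dist < threshold:                           # step 140
        return code, len(upper)                    # step 160: upper-group result
    low_code, _ = nearest(feature, lower)          # step 150
    return low_code, len(upper) + len(lower)       # step 160: lower-group result


upper = [("A", [1, 1, 0, 0]), ("B", [0, 0, 1, 1])]
lower = [("C", [1, 0, 1, 0]), ("D", [0, 1, 0, 1])]
print(recognize([1, 1, 0, 0], upper, lower, threshold=1))  # ('A', 2)
print(recognize([1, 0, 1, 0], upper, lower, threshold=1))  # ('C', 4)
```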

Classifying the reference vectors into upper and lower groups by occurrence frequency and matching them in that order yields, in one embodiment, the following effect.

Suppose that, when the 2,350 Hangul characters are ranked by their occurrence frequency in real documents, the top 200 characters account for a cumulative 81.59% of occurrences; that 60% of the input character images belonging to these 200 characters are resolved without passing through the lower-group reference vector matching step 150; and that there is exactly one reference vector for each of the 2,350 characters. The improvement is then as follows.

First, under the conventional method, matching one image of each of the 2,350 characters against all 2,350 reference vectors requires 2,350 × 2,350 = 5,522,500 matches in total. With the multi-stage matching of the present invention, the 120 characters (60% of the 200 high-frequency characters) resolved in the upper group each require 200 matches, while the remaining 80 high-frequency characters and the 2,150 low-frequency characters (the remaining 18.41% of cumulative frequency) each require 2,350 matches. The total is 120 × 200 + 2,230 × 2,350 = 24,000 + 5,240,500 = 5,264,500 matches, a reduction of about 4.67% in the number of matches.
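The per-character count can be checked with a few lines of arithmetic. This sketch uses only the figures stated in the example (2,350 characters, a 200-character upper group, 60% of those resolved in step 140) and counts both the 200 upper-group matches for each accepted character and the full 2,350 matches for every other character.

```python
N = 2350                 # KSC-5601 Hangul characters, one reference vector each
K = 200                  # size of the upper group
resolved = K * 60 // 100  # 120 upper-group characters accepted in step 140

baseline = N * N                                 # every image vs. every reference
two_stage = resolved * K + (N - resolved) * N    # 24,000 + 5,240,500
reduction = 1 - two_stage / baseline

print(baseline, two_stage, f"{reduction:.2%}")  # 5522500 5264500 4.67%
```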

The count above, however, treats every character as equally likely. Occurrence frequency reflects how often each character actually appears in real documents: 60% of the top 81.59% of cumulative frequency, that is 48.95% of all input characters, require 200 matches, and the remaining 51.05% require 2,350 matches. On real documents the total number of matches is therefore reduced by 44.79%.
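Weighting each character by its occurrence frequency gives the expected number of matches per input character. The sketch below reproduces the 48.95% and 44.79% figures from the stated assumptions only.

```python
N, K = 2350, 200
upper_share = 0.8159              # cumulative frequency of the top 200 characters
accepted = 0.6 * upper_share      # share of all inputs resolved in the upper group

# Accepted inputs cost K matches; all others cost the full N matches.
expected = accepted * K + (1 - accepted) * N
reduction = 1 - expected / N

print(f"{accepted:.2%}", round(expected, 1), f"{reduction:.2%}")  # 48.95% 1297.5 44.79%
```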

Accordingly, raising the cumulative occurrence frequency covered by the upper group above the 81.59% used here lowers the rate at which the lower-group reference vector matching step 150 must be executed, which shortens matching time. If it is raised too far, however, the larger upper group takes longer to match, so although the lower group is consulted less often, the overall reduction in matching time shrinks. The occurrence frequency that separates the upper and lower groups should therefore be set experimentally to the value that maximizes the speed of finding the reference vector corresponding to a feature vector.
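The trade-off can be explored numerically. This sketch makes two assumptions that are not in the patent: character frequencies follow a hypothetical Zipf distribution, and the step-140 acceptance rate stays at 60% regardless of the upper-group size k. Under those assumptions it evaluates the expected matches per input for every k and picks the cheapest. With real frequency tables, the same search would correspond to the experimental tuning described above.

```python
def expected_matches(k, coverage, n_total, accept_rate=0.6):
    """Expected reference-vector matches per input character.

    k           -- upper-group size (the k most frequent characters)
    coverage    -- share of input characters that belong to the upper group
    accept_rate -- share of those inputs accepted in step 140 (assumed constant)
    Every input pays k upper-group matches; rejected ones pay n_total - k more.
    """
    accepted = accept_rate * coverage
    return k + (1 - accepted) * (n_total - k)


def zipf_coverage(n_total, s=1.0):
    """coverage[k] = share of occurrences covered by the top k characters,
    under a hypothetical Zipf distribution with exponent s."""
    weights = [1 / rank ** s for rank in range(1, n_total + 1)]
    total = sum(weights)
    coverage, running = [0.0], 0.0
    for w in weights:
        running += w
        coverage.append(running / total)
    return coverage


N = 2350
coverage = zipf_coverage(N)
costs = [expected_matches(k, coverage[k], N) for k in range(N + 1)]
best_k = min(range(N + 1), key=costs.__getitem__)
print(best_k, round(costs[best_k], 1))
```

Note that k = 0 and k = N both cost exactly N matches, so any benefit comes from an intermediate split.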

In the embodiment above, the reference vectors are classified into two groups, upper and lower, according to occurrence frequency. Dividing occurrence frequency more finely into three or more reference vector groups, and matching them sequentially from the group with the highest occurrence frequency to the group with the lowest, can increase matching speed further compared with the two-group arrangement.
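A multi-level cascade generalizes the two-stage scheme. The patent does not say how thresholds apply when there are three or more groups; this sketch assumes, hypothetically, one threshold per group except the last, whose best match is always output. Hamming distance and the `(character_code, reference_vector)` layout are likewise assumptions.

```python
def hamming(a, b):
    """Assumed distance: count of differing positions in binary vectors."""
    return sum(x != y for x, y in zip(a, b))


def cascade_recognize(feature, groups, thresholds):
    """Match groups from highest to lowest occurrence frequency.

    groups     -- list of groups, each a list of (code, reference_vector) pairs
    thresholds -- thresholds[i] applies to groups[i] for all but the last group
    Returns (character_code, matches_performed).
    """
    matches = 0
    for level, group in enumerate(groups):
        best_code, best_dist = None, float("inf")
        for code, ref in group:
            d = hamming(feature, ref)
            matches += 1
            if d < best_dist:
                best_code, best_dist = code, d
        is_last = level == len(groups) - 1
        if is_last or best_dist < thresholds[level]:
            return best_code, matches
    return None, matches  # only reached if `groups` is empty


groups = [
    [("A", [1, 1, 0, 0])],
    [("B", [0, 0, 1, 1])],
    [("C", [1, 0, 1, 0]), ("D", [0, 1, 0, 1])],
]
print(cascade_recognize([0, 0, 1, 1], groups, thresholds=[1, 1]))  # ('B', 2)
```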

As described above, the present invention classifies the reference vectors stored in the database into an upper group of high-frequency reference vectors and a lower group of the remaining reference vectors, first matches the feature vector against the upper group, and matches it against the lower group only when the upper group contains no reference vector that can be recognized as the input character. This reduces the number of matches against reference vectors needed to recognize a character and thereby shortens the matching time for character recognition.

Claims

1. A character recognition method that matches a feature vector of an input character image against predetermined reference vectors and outputs a character code for the reference vector corresponding to the feature vector, the method comprising:

a feature vector extraction step of extracting a feature vector from an input character image;

an upper-group reference vector matching step of sequentially matching the feature vector against the reference vectors of an upper group, which consists of those predetermined reference vectors whose occurrence frequency is greater than or equal to a predetermined value and which are classified separately, to obtain each distance, and then extracting the reference vector having the minimum distance;

a matching distance comparison step of comparing the minimum distance obtained in the upper-group reference vector matching step with a predetermined threshold;

a lower-group reference vector matching step of, when the comparison in the matching distance comparison step shows that the matching distance is greater than or equal to the predetermined threshold, sequentially matching the feature vector against the reference vectors of a lower group, which consists of those predetermined reference vectors whose occurrence frequency is less than or equal to the predetermined value and which are classified separately, to obtain each distance, and then extracting the reference vector having the minimum distance; and

a character code output step of outputting, when the comparison in the matching distance comparison step shows that the matching distance is smaller than the predetermined threshold, the character code corresponding to the reference vector having the minimum distance among the reference vectors of the upper group, and otherwise outputting the character code corresponding to the reference vector having the minimum distance found in the lower-group reference vector matching step.

2. The method of claim 1, wherein the reference vectors of the upper group and of the lower group are classified into at least three groups by subdividing the occurrence frequency, and matching is performed sequentially starting from the reference vectors of the group with the highest occurrence frequency.