KR19990056813A

KR19990056813A - Method for Separating Touching Characters Using the Width/Height Ratio of Individual Characters

Info

Publication number: KR19990056813A
Application number: KR1019970076831A
Authority: KR
Inventors: 고동익
Original assignee: 구자홍; 엘지전자 주식회사
Priority date: 1997-12-29
Filing date: 1997-12-29
Publication date: 1999-07-15

Abstract

The present invention relates to a method for separating touching characters using the width/height ratio of individual characters. As documents have become more complex and documents that mix several typefaces have become common, the conventional approach of physically merging or cutting adjacent characters based on the average character width and the average gap width cannot adapt to variations in typeface.
The present invention therefore comprises: an individual character extraction step of extracting each individual character from a character string; a contact determination step of checking whether each extracted character touches another character; an individual character recognition step of recognizing the characters found not to be touching; an expected separation region extraction step of obtaining the width/height ratio of the recognized characters and using it to extract an expected separation region; and a touching character separation step of separating the touching characters using the extracted expected separation region and then recognizing them. As a result, touching characters can be separated adaptively for a variety of typefaces, the overall processing time can be markedly reduced, and stable recognition results can be obtained even when the document is in poor condition.

Description

Method for Separating Touching Characters Using the Width/Height Ratio of Individual Characters

The present invention relates to a method for separating touching characters using the width/height ratio of individual characters, for use when several characters touch one another during character recognition. In particular, the method estimates the typeface (font) of the text to be recognized, uses the width/height ratio of that typeface to predict where the touching characters should be separated, and then extracts feature values from the predicted region to perform the separation, so that characters in any typeface can be separated flexibly.

FIG. 1 is a flowchart of a conventional character separation method. As shown, it comprises: a character string extraction step that compares the accumulated black pixel counts in the horizontal direction of the input image with a threshold to find the top and bottom positions of each character string (text line) and separates the string; an individual character separation candidate determination step that computes a vertical black pixel histogram for the extracted string and extracts individual characters using the accumulated black pixel counts in the vertical direction; a character spacing estimation step that uses the start and end points of the extracted characters to obtain the average character width and the average gap between characters; and an inter-character merging and splitting step that merges or cuts characters using the spacing information obtained.

The conventional technique configured as above is described in detail below.

When a binary image of a document is input through a scanner, the accumulated black pixel count in the horizontal direction is computed for the document, and this value is compared with a threshold to find the top and bottom positions of each character string, which is then separated from the document.

A vertical black pixel histogram is then computed for the separated character string, and individual characters are extracted using the accumulated black pixel counts.
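The two projection steps described above can be sketched as follows. This is a minimal illustration, assuming the binary image is a list of rows of 0/1 values with 1 meaning a black pixel; the function names (`runs`, `find_lines`, `find_characters`) and the default threshold are hypothetical and do not come from the patent.

```python
def runs(counts, threshold=1):
    """Return (start, end) index pairs, end inclusive, where counts >= threshold."""
    spans, start = [], None
    for i, count in enumerate(counts):
        if count >= threshold and start is None:
            start = i                      # a run of inked rows/columns begins
        elif count < threshold and start is not None:
            spans.append((start, i - 1))   # the run ended on the previous index
            start = None
    if start is not None:
        spans.append((start, len(counts) - 1))
    return spans


def find_lines(image, threshold=1):
    """Horizontal projection: black pixels per row give top/bottom of each string."""
    return runs([sum(row) for row in image], threshold)


def find_characters(image, top, bottom, threshold=1):
    """Vertical projection inside one string: black pixels per column."""
    band = image[top:bottom + 1]
    column_counts = [sum(col) for col in zip(*band)]
    return runs(column_counts, threshold)


# Tiny hand-made image: two text lines, each holding two blobs.
page = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1],
]
```

Characters that touch produce no empty column between them, so this projection returns them as a single span, which is the case the rest of the description addresses.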

Using the start and end points of the extracted individual characters, the average character width and the average gap between characters are obtained.

Once the average width and average gap have been obtained, they are used to merge characters that have been split apart and to split characters that touch.

Here, if the widths of adjacent individual characters are smaller than the average character width and the width of the two characters combined is close to the previously obtained width, the two adjacent characters are merged into one character; if the width of an individual character exceeds the previously obtained average character spacing by a certain amount, it is cut using that spacing information.
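The averaging and the merge rule of the conventional method can be sketched as below. This is only an illustration: character spans are assumed to be inclusive (start, end) column pairs, and the `tolerance` used to decide that a combined width is "close to" the average is a hypothetical parameter, since the text does not state a value.

```python
def average_width_and_gap(spans):
    """Average character width and average gap from inclusive (start, end) spans."""
    widths = [end - start + 1 for start, end in spans]
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(spans, spans[1:])]
    avg_width = sum(widths) / len(widths)
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return avg_width, avg_gap


def should_merge(left, right, avg_width, tolerance=0.2):
    """Merge two neighbours if both are narrower than average and their
    combined extent is close to the average width (tolerance is assumed)."""
    left_width = left[1] - left[0] + 1
    right_width = right[1] - right[0] + 1
    combined = right[1] - left[0] + 1
    return (left_width < avg_width
            and right_width < avg_width
            and abs(combined - avg_width) <= tolerance * avg_width)
```

Because both decisions depend on a single average taken over the whole string, a change of typeface within the document shifts the average and can trigger wrong merges or cuts, which is the weakness discussed next.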

Characters are separated and recognized in this manner.

However, as documents have become more complex and documents that mix several typefaces have become common, the conventional approach of physically merging or cutting adjacent characters using the average character width and the average gap width cannot adapt to variations in typeface.

For example, in a document written in different typefaces, as in regions 1, 2, and 3 of FIG. 2, the characters in all of the regions cannot be separated satisfactorily.

Accordingly, an object of the present invention, devised to solve the above problems of the prior art, is to provide a method for separating touching characters using the width/height ratio of individual characters, in which individual characters are extracted from a character string, the width/height ratio of the typeface is obtained from those extracted characters that are not touching, and this ratio is used to extract the expected separation region of touching characters.

Another object of the present invention is to provide such a method in which the expected separation region of touching characters is extracted using the typeface of the text to be recognized, so that touching characters can be separated adaptively for a variety of typefaces.

Still another object of the present invention is to provide such a method that markedly reduces the overall processing time.

Still another object of the present invention is to provide such a method that ensures stable recognition results even when the document is in poor condition.

FIG. 1 is a flowchart of a conventional character separation method.

FIG. 2 shows examples of typefaces whose width/height ratio differs from that of ordinary typefaces.

FIG. 3 shows an example of a typeface with narrow characters.

FIG. 4 shows an example in which the typeface of region 1 and the typeface of region 2 have markedly different characteristics.

FIG. 5 is a flowchart of the method of the present invention for separating touching characters using the width/height ratio of individual characters.

To achieve the above objects, the present invention is characterized by comprising: an individual character extraction step of extracting each individual character from a character string; a contact determination step of checking whether each extracted character touches another character; an individual character recognition step of recognizing the characters found not to be touching; an expected separation region extraction step of obtaining the width/height ratio of the recognized characters and using it to extract an expected separation region; and a touching character separation step of separating the touching characters using the extracted expected separation region and then recognizing them.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

FIG. 5 is a flowchart of the method of the present invention for separating touching characters using the width/height ratio of individual characters. As shown, the method comprises: an individual character extraction step (S11) of extracting each individual character from a character string; a contact determination step (S12) of checking whether each character extracted in step S11 touches another character; an individual character recognition step (S13) of recognizing the characters found in step S12 not to be touching; an expected separation region extraction step (S14) of obtaining the width/height ratio of the characters recognized in step S13 and using it to extract an expected separation region; and a touching character separation step (S15) of separating the touching characters using the expected separation region extracted in step S14 and then recognizing them.
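The control flow of steps S11 to S15 can be illustrated with the sketch below. It is not the patent's implementation: the blob representation (a dict with `width` and `height`) and the three callables `is_touching`, `recognize`, and `split` are hypothetical placeholders for the contact check, the recognizer, and the ratio-based splitter.

```python
def separate_and_recognize(blobs, is_touching, recognize, split):
    """Walk the blobs produced by S11 and apply S12 to S15 in order.

    blobs       -- extracted connected pieces, each a dict with width/height
    is_touching -- S12: returns True if a blob holds several touching characters
    recognize   -- S13/S15: recognizes a single character blob
    split       -- S14/S15: splits a touching blob given a width/height ratio
    """
    ratio = None          # width/height ratio of the current typeface
    results = []
    for blob in blobs:
        if not is_touching(blob):
            results.append(recognize(blob))            # S13
            ratio = blob["width"] / blob["height"]     # stored for later splits
        elif ratio is not None:
            # S14 + S15: predict the cut positions from the ratio, then recognize
            results.extend(recognize(piece) for piece in split(blob, ratio))
        else:
            results.append(None)   # no isolated character seen yet (assumed fallback)
    return results
```

The fallback for a touching blob that appears before any isolated character is an assumption; the text does not say how that case is handled.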

The operation and effects of the present invention, which consists of the above steps, are described in detail below.

For example, when a character string is input whose characters are narrow and tall and thus differ from ordinary typefaces, as in FIG. 3, individual characters are first extracted from the string, such as those of "Franz" (S11).

After the individual characters are extracted, it is checked whether the first character, F, touches the following character (S12).

If the check shows that the character F stands apart without touching, the character F is recognized (S13).

After the character is recognized, its width/height ratio is computed and stored (S13).

From the width/height ratio thus obtained, it can be seen that the typeface is quite narrow.

Next, the characters following F are read.

If what is read is a group of touching characters, such as "ranz", the height of the touching portion is obtained.

The expected separation region is then extracted by a calculation on the height of the touching portion and the previously obtained width/height ratio of the individual character.

That is, the touching position is predicted (S14).
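The text does not give the exact calculation, but one plausible reading is that the stored ratio multiplied by the height of the touching portion yields an expected character width, and cut positions are expected near multiples of that width. The sketch below follows that assumption; the `margin` window and the choice of the column with the fewest black pixels as the feature for the final cut are likewise hypothetical.

```python
def expected_cut_regions(left, right, height, ratio, margin=0.25):
    """Return (lo, hi) column windows in which a cut between characters is expected.

    left, right -- inclusive column extent of the touching blob
    height      -- height of the touching portion
    ratio       -- stored width/height ratio of an isolated character
    margin      -- half-width of each window as a fraction of the expected width
    """
    char_width = ratio * height                    # expected width of one character
    blob_width = right - left + 1
    n_chars = max(1, round(blob_width / char_width))
    half = margin * char_width
    regions = []
    for k in range(1, n_chars):
        center = left + k * char_width
        regions.append((max(left, int(center - half)),
                        min(right, int(center + half))))
    return regions


def pick_cut(column_counts, region):
    """Within one window, choose the column with the fewest black pixels."""
    lo, hi = region
    return min(range(lo, hi + 1), key=lambda x: column_counts[x])
```

For a blob spanning columns 0 to 39 with height 20 and a stored ratio of 0.5, the expected character width is 10, so three windows are produced around columns 10, 20, and 30.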

The touching characters are separated using the predicted expected separation region, and the separated characters are recognized (S15).

Since "Alt" in the next word is judged to be in the same typeface, its touching characters are separated by applying the width/height ratio already obtained for the individual character.

The width/height ratio is updated for every word. When the word or the line changes and the first touching character appears, its similarity to the previous typeface is evaluated; if the typefaces are similar, the existing value is applied as is, and if they are not, the value is updated again before the touching characters are separated.
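The update rule can be sketched as a small state holder. How similarity between typefaces is judged is not specified in the text, so the relative-difference test and the `similarity_tolerance` value below are assumptions made purely for illustration, as is the class name `RatioTracker`.

```python
class RatioTracker:
    """Keeps the width/height ratio currently in effect for splitting."""

    def __init__(self, similarity_tolerance=0.15):
        self.ratio = None
        self.tolerance = similarity_tolerance   # assumed similarity criterion

    def observe(self, width, height):
        """Feed the size of an isolated character from a new word or line.

        Keeps the stored ratio if the new one is similar, replaces it
        otherwise, and returns the ratio now in effect.
        """
        new_ratio = width / height
        if self.ratio is None:
            self.ratio = new_ratio
        elif abs(new_ratio - self.ratio) > self.tolerance * self.ratio:
            self.ratio = new_ratio              # typeface changed: update
        return self.ratio
```

With this sketch, a character of 10 by 20 pixels sets the ratio to 0.5, a later character of 11 by 20 leaves it unchanged, and a character of 20 by 20 replaces it with 1.0.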

Likewise, when the characteristics of the typeface in region 1 and the typeface in region 2 differ markedly, as in FIG. 4, the width/height ratio obtained in region 1 is replaced by a newly obtained value in region 2.

Accordingly, the present invention extracts individual characters from a character string, obtains the width/height ratio of the typeface from those extracted characters that are not touching, and uses this ratio to extract the expected separation region and separate the touching characters. This makes it possible to separate touching characters adaptively for a variety of typefaces, to markedly reduce the overall processing time, and to ensure stable recognition results even when the document is in poor condition.

Claims

A method for separating touching characters using the width/height ratio of individual characters, comprising: an individual character extraction step of extracting each individual character from a character string; a contact determination step of checking whether each character extracted in the individual character extraction step touches another character; an individual character recognition step of recognizing the characters found not to be touching in the contact determination step; an expected separation region extraction step of obtaining the width/height ratio of the characters recognized in the individual character recognition step and using it to extract an expected separation region; and a touching character separation step of separating the touching characters using the expected separation region extracted in the expected separation region extraction step and then recognizing them.

The method of claim 1, wherein, in the expected separation region extraction step, the width/height ratio of the individual characters is updated for every word.