KR100718185B1

KR100718185B1 - Apparatus for recognizing character and method for operating the apparatus

Info

Publication number: KR100718185B1
Application number: KR1020060009920A
Authority: KR
Inventors: 정철곤; 문영수; 김지연; 김상균; 김진형; 조규태; 이성훈; 김진식; 김민수
Original assignee: 삼성전자주식회사
Priority date: 2006-02-02
Filing date: 2006-02-02
Publication date: 2007-05-15

Abstract

A character recognition method according to the present invention comprises: reading feature information and type information of a given single-character image (hereinafter, character image); generating initial-consonant information, medial-vowel information, or final-consonant information for the character image according to the feature information and type information; generating, according to the initial-consonant, medial-vowel, or final-consonant information, a plurality of pseudo characters for reading the character image, and calculating a non-segmented confidence value for each pseudo character; if the difference between the first-rank non-segmented confidence value (the highest) and the second-rank non-segmented confidence value (the next highest) is greater than or equal to a predetermined threshold, recognizing the pseudo character having the first-rank non-segmented confidence value as the character image; and if that difference is less than the threshold, segmenting the character image into graphemes, calculating a segmented confidence value for each pseudo character, multiplying the segmented and non-segmented confidence values of each pseudo character, and recognizing the pseudo character with the highest resulting confidence as the character image.

Character recognition, grapheme segmentation, grapheme non-segmentation, font, holistic, component

Description

Character recognition apparatus and its method {APPARATUS FOR RECOGNIZING CHARACTER AND METHOD FOR OPERATING THE APPARATUS}

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing an example of a grapheme-segmentation character recognition method according to the prior art.

FIG. 2 is a flowchart showing the overall flow of a grapheme-based character recognition method according to an embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of a character recognition apparatus according to an embodiment of the present invention.

FIG. 4 is a diagram illustrating how the feature information of a character image is generated according to an embodiment of the present invention.

FIG. 5 is a diagram illustrating an initial-consonant recognizer that calculates a score for each initial-consonant class according to an embodiment of the present invention.

FIG. 6 is a diagram illustrating a grapheme-segmented image produced when the grapheme segmentation unit segments a character image according to an embodiment of the present invention.

FIG. 7 is an internal block diagram of a general-purpose computer system that may be employed to implement the character recognition method according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

210: feature information reading step

220: type information reading step

230: non-segmented confidence calculation step

240: step of comparing the difference between the first-rank and second-rank non-segmented confidence values with the threshold

250: segmented confidence calculation step

260: character recognition step

The present invention relates to a grapheme-based character recognition method and apparatus. More specifically, it classifies the caption text of a video into individual characters and recognizes each character by choosing, as the situation requires, between a non-segmented grapheme recognition method, which recognizes the character as it is without separating its graphemes, and an improved segmented grapheme recognition method, which recognizes the character after accurately separating its graphemes, so that characters can be recognized more accurately regardless of font variation or imprecision.

The spread of video media has become so active that it is no exaggeration to say the IT industry has recently entered a war over video services and devices. With the launch of new video services such as satellite DMB broadcasting, terrestrial DMB broadcasting, data broadcasting, and Internet broadcasting, the video service industry is emerging as a blue ocean across all IT fields, including telecommunications, Internet services, and digital equipment.

Satellite and terrestrial DMB broadcasting opened the era of "TV in the hand", and mobile carriers are expanding video services over their own data broadcasts through partnerships with the content industry. Internet portal sites likewise provide users, through their own and affiliated sites, with videos that they produce themselves or obtain through partnerships with content providers.

In addition, TV portals have recently come into service. A TV portal, a precursor to Internet TV, is a service that lets a user download or stream movies, dramas, and other content offered by a portal site as video on demand (VOD) on a PC, laptop, or mobile terminal. Once TPS (Triple Play Service), in which Internet, broadcasting, and telephony are all delivered over the Internet through a broadband converged network, gets fully under way, demand for video content will clearly grow further.

Thus, for the younger generation accustomed to visual culture, video has become a necessity rather than an option, and some assessments hold that video-related industries will become the IT industry's greatest competitive strength. Accordingly, the market for video playback terminals such as DMB terminals and PMPs is expanding day by day.

Mobile terminal manufacturers are competitively releasing satellite DMB phones and terrestrial DMB phones, and MP3 player manufacturers are developing and releasing various PMP models that support DMB broadcasting. Recently, MP3 players have also begun to support video playback by adopting small LCDs, such as 2-inch panels, as displays. These video-capable terminals are expected to evolve into convergence products that support every video service on a single terminal.

As the functions of video services and terminals advance day by day, users' desire for more convenient services grows as well. Users no longer ask a terminal merely to play video; they want video services that support a wider variety of additional functions.

One example is a video summary service, which generates a summary of a video and provides it to a user who, in a busy daily life, has no time to watch a video lasting several hours in full. Because such a service suits the routine of busy modern people who watch video on a mobile terminal while commuting or during short breaks, video summary services are expected to expand gradually.

Such a video summary service or video search service can be implemented using certain image information, and the caption text of a video can serve as that information. For example, caption text information can be used to summarize and search videos such as news or sports.

To implement a video summary and search service that uses caption text, the caption text must be recognized quickly and accurately. Scanning operations such as business card recognition and book scanning with a scanner also presuppose accurate character recognition.

One conventional character recognition method is grapheme-segmentation character recognition, which divides a character into graphemes, recognizes each grapheme, and then combines the recognized graphemes to recognize the character. As shown in FIG. 1(a), recognition at the grapheme level is more robust to font variation than recognition at the whole-character level, so this approach has the advantage of allowing more accurate character recognition.

However, because character fonts vary widely, the grapheme segmentation itself frequently fails. For example, as shown in FIG. 1(b), when the projection profile method is used, the character "세" may be split into the graphemes "ㅅ", "ㅕ", and "ㅏ". Also, as shown in FIG. 1(c), for "뭇" and "믓" the medial vowel and the final consonant are hard to tell apart depending on the font. Thus errors occur frequently in the conventional grapheme-segmentation method, and accurate segmentation is especially hard to expect when separating vowels.

Also, when graphemes or characters are recognized through conventional type classification, the classification relies on stroke extraction, so stroke extraction itself is difficult for fonts with thick strokes. Moreover, the type classification is itself inaccurate and errs frequently, so the overall likelihood of misrecognition is high.

The conventional vowel-structure and rule-based grapheme separation method separates graphemes by extracting horizontal and vertical vowels. For printed type with thick strokes, however, stroke extraction is difficult, and segmentation errors occur frequently.

The conventional grapheme separation method based on background thinning has the drawback that the background thinning algorithm is too sensitive to noise to be applied to noisy video, such as video text.

Given these problems of the prior art, there is a need for a character recognition method that can recognize any character more accurately and quickly, regardless of font variation.

The present invention was devised to improve on the prior art described above. One object is to provide a grapheme-based character recognition method and apparatus that shorten the execution time of character recognition by recognizing a character through a non-segmented grapheme recognition method, which recognizes the character as it is without dividing it into graphemes.

Another object is to provide a grapheme-based character recognition method and apparatus that, when the result of non-segmented grapheme recognition is ambiguous, recognize the character through a segmented grapheme recognition method, which divides the character into graphemes, and then compare that result with the non-segmented recognition result to recognize the character, so that characters are recognized accurately regardless of font variation.

A further object is to provide a grapheme-based character recognition method and apparatus that, when a character is recognized by grapheme segmentation, use the character's type information and curvature information to detect the touching point, that is, the place where a grapheme connects to the vowel, and segment at that point, so that grapheme segmentation is performed more accurately.

To achieve the above objects and solve the problems of the prior art, a character recognition method according to the present invention comprises: reading feature information and type information of a given character image; generating initial-consonant information, medial-vowel information, or final-consonant information for the character image according to the feature information and type information; generating, according to the initial-consonant, medial-vowel, or final-consonant information, a plurality of pseudo characters for reading the character image, and calculating a non-segmented confidence value for each pseudo character; if the difference between the first-rank non-segmented confidence value (the highest) and the second-rank non-segmented confidence value (the next highest) is greater than or equal to a predetermined threshold, recognizing the pseudo character having the first-rank non-segmented confidence value as the character image; and if that difference is less than the threshold, segmenting the character image into graphemes, calculating a segmented confidence value for each pseudo character, multiplying the segmented and non-segmented confidence values of each pseudo character, and recognizing the pseudo character with the highest resulting confidence as the character image.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

First, the overall flow of the grapheme-based character recognition method according to the present invention is briefly described with reference to FIG. 2; the grapheme-based character recognition method of the character recognition apparatus is then described in detail with reference to FIGS. 3 to 5.

FIG. 2 is a flowchart showing the overall flow of a grapheme-based character recognition method according to an embodiment of the present invention.

A character recognition apparatus according to an embodiment of the present invention can recognize characters by either the non-segmented grapheme recognition method or the segmented grapheme recognition method. When a character image, such as a caption character of a video, is input, the apparatus reads the feature information of the character (step 210). The feature information may be implemented as vector information of the pixels contained in each mesh image obtained by dividing the character image into a plurality of mesh images.

The apparatus reads the type information of the character (step 220). The type information of the character image may be set to one of the Hangul types, namely the first through sixth types, or to the seventh type, which covers English letters, digits, and special characters.

According to the feature information and type information, the apparatus generates initial-consonant information, medial-vowel information, or final-consonant information for the character image. According to that information, it generates a plurality of pseudo characters for reading the character image and calculates a non-segmented confidence value for each pseudo character (step 230).
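Step 230 can be illustrated with a minimal Python sketch. It assumes that the initial-consonant, medial-vowel, and final-consonant information are given as per-class scores, and that the non-segmented confidence of a pseudo character is the product of its grapheme scores; the text does not specify that combination rule, so the product is an assumption made only for illustration. The function names `compose` and `pseudo_characters` are hypothetical. The composition arithmetic is the standard Unicode Hangul syllable formula.

```python
from itertools import product

# Standard Unicode ordering of Hangul graphemes (compatibility jamo).
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 initial consonants
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")          # 21 medial vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # "" = no final, then 27 finals


def compose(initial, medial, final=""):
    """Build one Hangul syllable from its graphemes (Unicode composition formula)."""
    index = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28 + FINALS.index(final)
    return chr(0xAC00 + index)


def pseudo_characters(initial_scores, medial_scores, final_scores=None):
    """Return (pseudo character, confidence) pairs, highest confidence first.

    Assumption: confidence = product of the grapheme class scores.
    """
    finals = final_scores or {"": 1.0}  # characters without a final consonant
    candidates = []
    for (i, si), (m, sm), (f, sf) in product(
        initial_scores.items(), medial_scores.items(), finals.items()
    ):
        candidates.append((compose(i, m, f), si * sm * sf))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


ranked = pseudo_characters({"ㄱ": 0.9, "ㄲ": 0.3}, {"ㅏ": 0.8, "ㅑ": 0.4})
```

Here `ranked` holds four pseudo characters, with "가" first; the scores used are made-up values, not results from the invention.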

The apparatus compares the difference between the first-rank non-segmented confidence value, the highest among the pseudo characters, and the second-rank non-segmented confidence value, the next highest, with a predetermined threshold (step 240). If the difference is greater than or equal to the threshold, the pseudo character with the first-rank non-segmented confidence value is recognized as the character image (step 260).

If the comparison in step 240 shows the difference to be less than the threshold, the apparatus segments the character image into graphemes and calculates a segmented confidence value for each pseudo character (step 250). In this case, the apparatus multiplies the segmented and non-segmented confidence values of each pseudo character and recognizes the pseudo character with the highest resulting confidence as the character image (step 260).
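The decision in steps 240 to 260 can be sketched as follows. This is a minimal illustration: the function name `recognize`, the dictionary representation of confidence values, and the callable used to defer segmentation are all assumptions, and the confidence numbers are invented.

```python
def recognize(non_segmented, threshold, segmented_confidence):
    """Pick a pseudo character following steps 240 to 260.

    non_segmented: dict mapping pseudo character -> non-segmented confidence.
    segmented_confidence: callable returning a dict of segmented confidences;
        it is only invoked when the non-segmented result is ambiguous, which is
        where the time saving of the non-segmented path comes from.
    """
    ranked = sorted(non_segmented.items(), key=lambda kv: kv[1], reverse=True)
    best, first = ranked[0]
    if len(ranked) == 1:
        return best
    second = ranked[1][1]
    # Step 240: compare the gap between rank 1 and rank 2 with the threshold.
    if first - second >= threshold:
        return best  # step 260 via the non-segmented path
    # Step 250: ambiguous, so segment into graphemes and score again.
    segmented = segmented_confidence()
    # Step 260: multiply both confidences and take the maximum.
    return max(non_segmented, key=lambda ch: non_segmented[ch] * segmented[ch])


# Clear winner: the segmentation callable is never used.
clear = recognize({"가": 0.9, "갸": 0.3}, 0.1, lambda: {})
# Ambiguous pair: segmentation breaks the tie (0.52*0.3 < 0.50*0.9).
ambiguous = recognize({"뭇": 0.52, "믓": 0.50}, 0.1, lambda: {"뭇": 0.3, "믓": 0.9})
```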

Hereinafter, the configuration of a character recognition apparatus according to an embodiment of the present invention and its grapheme-based character recognition operation are described in detail with reference to FIGS. 3 to 5.

FIG. 3 is a block diagram showing the configuration of a character recognition apparatus according to an embodiment of the present invention.

The character recognition apparatus according to the present invention may be implemented as any one of a PVR (Personal Video Recorder), a home server, a smart mobile server, a DVD player/recorder, a PC, a laptop, a PDA, a digital TV, or a mobile communication terminal.

A character recognition apparatus according to an embodiment of the present invention may comprise a feature information unit 310, a type information unit 320, a non-segmented grapheme recognition unit 330, a character recognition control unit 340, and a segmented grapheme recognition unit 350. The non-segmented grapheme recognition unit 330 may comprise an initial-consonant recognizer 331, a medial-vowel recognizer 332, and a final-consonant recognizer 333. The segmented grapheme recognition unit 350 may comprise a grapheme segmentation unit 351, a grapheme feature information unit 352, an initial-consonant recognizer 353, a medial-vowel recognizer 354, and a final-consonant recognizer 355.

The feature information unit 310 reads the feature information of a given character image among the caption characters of the input video. The feature information may be implemented by dividing the character image into a plurality of mesh images and taking the vector information of the pixels contained in each mesh image. This is described in detail with reference to FIG. 4.

FIG. 4 is a diagram illustrating how the feature information of a character image is generated according to an embodiment of the present invention.

According to an embodiment of the present invention, the feature information unit 310 divides the input character image into a plurality of mesh images. For example, as shown in FIG. 4(a), the feature information unit 310 may divide the character image into a 6*6 grid of 36 mesh images.
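The mesh division can be sketched in a few lines of Python. The sketch assumes the character image is a binarized 2D list of pixel values and that cell boundaries are placed by integer division; both are assumptions for illustration, and `split_into_meshes` is a hypothetical name.

```python
def split_into_meshes(image, rows=6, cols=6):
    """Split a 2D pixel grid into rows*cols mesh images, in row-major order."""
    height, width = len(image), len(image[0])
    meshes = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            meshes.append([row[left:right] for row in image[top:bottom]])
    return meshes


# A 12x12 blank image with one foreground pixel in the top-left corner.
image = [[0] * 12 for _ in range(12)]
image[0][0] = 1
meshes = split_into_meshes(image)  # 36 mesh images of 2x2 pixels each
```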

The feature information unit 310 reads the vector information of the pixels contained in each mesh image. For example, in the 5*3 mesh image 410, one of the 36 mesh images, direction vectors may be set along the boundary between the region where the character is drawn and the region where it is not, as shown in FIG. 4(b).

The feature information unit 310 reads the angle information of each direction vector in the 5*3 mesh image 410. That is, it determines which of the following angles each direction vector belongs to: the first angle (1 to 45 degrees), the second angle (46 to 90 degrees), the third angle (91 to 135 degrees), the fourth angle (136 to 180 degrees), the fifth angle (181 to 225 degrees), the sixth angle (226 to 270 degrees), the seventh angle (271 to 315 degrees), or the eighth angle (316 to 360 degrees).

Once the angle information of the direction vectors has been read, the feature information unit 310 uses it to generate the feature information of the 5*3 mesh image 410, as shown in FIG. 4(c). The feature information comprises a first angle region 421, a second angle region 422, a third angle region 423, a fourth angle region 424, a fifth angle region 425, a sixth angle region 426, a seventh angle region 427, and an eighth angle region 428.

In the example above, of the eight direction vectors in the 5*3 mesh image 410, three fall in the sixth angle, two in the seventh angle, and three in the eighth angle, so the feature information unit 310 may generate the feature information "00000323" for the 5*3 mesh image 410.
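The eight-bin angle histogram can be reproduced with a short Python sketch. The bin boundaries follow the angle ranges given in the text; treating 0 degrees as 360 degrees, the function names, and the concrete angle values are assumptions chosen only so that the example yields three, two, and three vectors in the sixth, seventh, and eighth angles.

```python
import math


def angle_bin(angle_deg):
    """Map an angle to bin 1..8, where bin k covers (45*(k-1), 45*k] degrees."""
    a = angle_deg % 360
    if a == 0:
        a = 360  # assumption: 0 degrees is counted with the eighth angle
    return math.ceil(a / 45)


def mesh_feature(angles):
    """Count direction vectors per angle bin and render the counts as a digit string.

    A count above 9 would need a wider encoding; the example stays below that.
    """
    counts = [0] * 8
    for a in angles:
        counts[angle_bin(a) - 1] += 1
    return "".join(str(n) for n in counts)


# Hypothetical direction-vector angles for one mesh image.
feature = mesh_feature([230, 250, 265, 280, 300, 320, 340, 355])  # "00000323"
```

Concatenating one such 8-digit string per mesh image gives 36 * 8 = 288 digits for a character image divided into 36 mesh images.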

The feature information unit 310 generates feature information for every mesh image in the character image. When the character image is divided into 36 mesh images as above, each contributing 8 digits, the feature information unit 310 may generate 288-digit feature information for the character image.

Returning to FIG. 3, the type information unit 320 reads the type information of the character image. The type information may follow a seven-type classification that adds a type for English letters, digits, and special characters to the six Hangul types. That is, the first through sixth types correspond to Hangul, and the seventh type corresponds to English letters, digits, or special characters.

Hangul characters of the first type include 가, 나, and 다. Hangul characters of the second type include 구, 누, and 도. Hangul characters of the third type include 귀, 뇌, and 뒤. Hangul characters of the fourth type include 간, 넌, and 단. Hangul characters of the fifth type include 군, 논, and 돈. Hangul characters of the sixth type include 귄, 뉜, and 된. The first through third types consist of Hangul characters without a final consonant, and the fourth through sixth types consist of Hangul characters with a final consonant.
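The seven types can be made concrete with a small Python sketch that labels a known character by its Unicode decomposition. Two assumptions are labeled here: first, the grouping of vowels into vertical, horizontal, and compound shapes is inferred from the examples (가, 구, 귀) rather than stated in the text; second, the apparatus itself classifies an image, not a code point, so a function like this hypothetical `hangul_type` would at most serve to produce ground-truth labels for training or checking a classifier.

```python
# Medial-vowel indices in Unicode order: ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ
VERTICAL = {0, 1, 2, 3, 4, 5, 6, 7, 20}  # vowel to the right of the initial, e.g. ㅏ
HORIZONTAL = {8, 12, 13, 17, 18}         # vowel below the initial, e.g. ㅜ
# Remaining indices are compound vowels such as ㅟ or ㅚ.


def hangul_type(ch):
    """Return the type (1..7) of a single character under the seven-type scheme."""
    index = ord(ch) - 0xAC00
    if not 0 <= index < 11172:
        return 7  # not a Hangul syllable: letters, digits, special characters
    medial = (index // 28) % 21
    has_final = index % 28 != 0
    if medial in VERTICAL:
        base = 1
    elif medial in HORIZONTAL:
        base = 2
    else:
        base = 3
    return base + 3 if has_final else base


types = [hangul_type(ch) for ch in "가구귀간군귄A7"]
```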

형식 정보부(320)는 멀티 클래스(Multi-Class) SVM을 이용하여 상기 형식 정 보의 분류를 수행할 수 있다.The type information unit 320 may classify the type information by using a multi-class SVM.

비분할 자소 인식부(330)는 상기 특징치 정보 및 형식 정보를 참조하여 상기 낱자 이미지의 초성 정보, 중성 정보, 또는 종성 정보를 생성한다. 비분할 자소 인식부(330)는 초성 인식기(331)를 통해 상기 낱자 이미지의 초성 정보를 생성하고, 중성 인식기(332)를 통해 상기 낱자 이미지의 중성 정보를 생성하며, 종성 인식기(333)를 통해 상기 낱자 이미지의 종성 정보를 생성한다.The non-segmented phoneme recognition unit 330 generates the initial information, the neutral information, or the final information of the single image by referring to the feature value information and the format information. The non-partitioned phoneme recognizer 330 generates the initial information of the single image through the initial recognizer 331, generates the neutral information of the single image through the neutral recognizer 332, and uses the final recognizer 333. Generates final information of the single image.

한글의 각 낱자는 초성, 중성, 또는 종성 중 어느 하나 이상을 포함하여 구성될 수 있다. 초성은 19개의 클래스로 구분될 수 있고, 중성은 5개 내지 9개의 클래스, 종성은 27개의 클래스로 구분될 수 있다. Each letter of Hangeul may be configured to include any one or more of the initial, neutral, or final. The first class can be divided into 19 classes, the neutral class can be divided into 5 to 9 classes, and the final class can be divided into 27 classes.

초성 인식기(331)는 상기 19개의 초성 클래스에 각각 대응하는 19개의 초성 분류기를 포함한다. 중성 인식기(332)는 상기 5개 내지 9개의 중성 클래스에 각각 대응하는 5개 내지 9개의 중성 분류기를 포함한다. 종성 인식기(333)는 상기 27개의 종성 클래스에 각각 대응하는 27개의 종성 분류기를 포함한다.The first classifier 331 includes 19 first classifiers respectively corresponding to the 19 first class. The neutral recognizer 332 includes five to nine neutral classifiers corresponding to the five to nine neutral classes, respectively. The species identifier 333 includes 27 species classifiers each corresponding to the 27 species classes.

Each initial classifier, medial classifier, and final classifier may hold class feature value information for its own initial class, medial class, or final class. For example, the ㄱ classifier among the initial classifiers may hold feature value information for each of various ㄱ fonts. This information may be set in advance in various ways at the discretion of those skilled in the art.

The non-segmentation grapheme recognition unit 330 feeds the character image to every class classifier and calculates a class score for each one. For example, when the character image is "가", the ㄱ classifier compares the feature value information or format information of "가" with the ㄱ feature value information that it holds, and calculates a ㄱ class score according to their similarity. The ㄱ class score may be included in the initial information of the character image.

As shown in FIG. 5, the ㄱ classifier 511 included in the initial recognizer 510 may calculate a ㄱ class score for the image "가". Likewise, the ㄲ classifier 512 may calculate a ㄲ class score for "가", and the ㅎ classifier 519 may calculate a ㅎ class score for "가". The class score values may be calculated in the order of the ㄱ class score, the ㄲ class score, and the ㅎ class score.

In this way, the non-segmentation grapheme recognition unit 330 may feed the character image to all class classifiers included in the initial recognizer 331, the medial recognizer 332, and the final recognizer 333, and calculate the class score that each class classifier yields for the character image.
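The scoring step can be sketched as follows. The text does not specify how similarity is measured, so this sketch substitutes cosine similarity against per-font template vectors purely for illustration; the names `cosine`, `class_scores`, and `templates`, and all feature values, are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_scores(features: Sequence[float],
                 templates: Dict[str, List[Sequence[float]]]) -> Dict[str, float]:
    """Score the input against every class; each class holds one template
    vector per font, and the best-matching font decides the class score."""
    return {cls: max(cosine(features, font) for font in fonts)
            for cls, fonts in templates.items()}


# Hypothetical three-dimensional feature vectors for two initial classes.
templates = {
    "ㄱ": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],  # two fonts of ㄱ
    "ㅎ": [[0.0, 0.0, 1.0]],                   # one font of ㅎ
}
scores = class_scores([1.0, 0.0, 0.0], templates)
```

Every classifier produces a score for the same input, so the result holds one score per class rather than a single winning class.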

Referring again to FIG. 3, the character recognition control unit 340 generates a plurality of pseudo characters for reading the character image according to the initial information, the medial information, or the final information, and calculates a non-segmentation confidence value for each pseudo character.

The character recognition control unit 340 may convert each of the 19 class scores included in the initial information into a probability value. For example, it may calculate P(ㄱ) as the probability value corresponding to the ㄱ class score. The character recognition control unit 340 may perform this conversion using Equation 1 below.

[Equation 1]

P(W_i | X) = 1 / (1 + exp(a f(x) + b))

- W_i : i-th class classifier

- X : feature value information

- f(x) : class score

- a, b : constants
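Equation 1 can be written directly in Python. The values chosen for a and b below are hypothetical, since the text only states that they are constants; with a negative a, a higher class score maps to a higher probability.

```python
import math


def score_to_probability(score: float, a: float, b: float) -> float:
    """Equation 1: P(W_i | X) = 1 / (1 + exp(a * f(x) + b))."""
    return 1.0 / (1.0 + math.exp(a * score + b))


# Hypothetical constants and class scores, for illustration only.
A, B = -2.0, 0.0
scores = {"ㄱ": 1.5, "ㄲ": 0.2, "ㅎ": -1.0}
probabilities = {cls: score_to_probability(s, A, B) for cls, s in scores.items()}
```

Each probability lies strictly between 0 and 1, which makes the later products of class probabilities comparable across candidates.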

After converting the class scores into class probabilities using Equation 1, the character recognition control unit 340 multiplies the initial class probabilities, medial class probabilities, or final class probabilities together and generates a pseudo character from the combination of classes corresponding to each probability product.

For example, the character recognition control unit 340 may multiply the "ㄱ class probability", an initial class probability, by the "ㅏ class probability", a medial class probability. For this probability product, the character recognition control unit 340 may generate the pseudo character "가". Such products may be computed over all class probabilities, so that every character that can be composed is generated. A probability product of the class probabilities is thus calculated for each generated pseudo character, and this product may be set as the non-segmentation confidence value of that pseudo character.
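A minimal sketch of pseudo character generation follows. It assumes characters without a final consonant and uses the standard Unicode arithmetic for composing Hangul syllables; the function names and the probability values are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"      # Unicode initial order (19)
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # Unicode medial order (21)


def compose(initial: str, medial: str) -> str:
    """Compose a precomposed Hangul syllable with no final consonant."""
    index = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28
    return chr(0xAC00 + index)


def pseudo_characters(initial_probs: Dict[str, float],
                      medial_probs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Pair every initial class with every medial class; the product of the
    two class probabilities is the non-segmentation confidence."""
    candidates = {
        compose(i, m): p_i * p_m
        for (i, p_i), (m, p_m) in product(initial_probs.items(), medial_probs.items())
    }
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)


# Hypothetical class probabilities for a blurry image of "가".
ranked = pseudo_characters({"ㄱ": 0.9, "ㅋ": 0.3}, {"ㅏ": 0.6, "ㅣ": 0.7})
```

With these made-up probabilities the ranking begins with "기" and "가", whose confidences are close together, which is exactly the ambiguous situation the later segmentation step addresses.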

From among all the generated pseudo characters, the character recognition control unit 340 selects the first-rank pseudo character, which has the highest non-segmentation confidence, and the second-rank pseudo character, which has the next highest. It then calculates the difference between the non-segmentation confidence of the first-rank pseudo character and that of the second-rank pseudo character.

When the difference between the first-rank non-segmentation confidence and the second-rank non-segmentation confidence is greater than or equal to a predetermined threshold, the character recognition control unit 340 recognizes the character image as the first-rank pseudo character. A person skilled in the art may set the threshold by analyzing error frequency statistics obtained through experiments.

When the difference between the first-rank non-segmentation confidence and the second-rank non-segmentation confidence is less than the threshold, the character recognition control unit 340 may recognize the character image through the segmentation grapheme recognition unit 350.
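The decision rule can be sketched as a small function. The threshold value and the candidate confidences are hypothetical; the text leaves the threshold to be set experimentally.

```python
from typing import List, Optional, Tuple


def accept_without_segmentation(ranked: List[Tuple[str, float]],
                                threshold: float) -> Optional[str]:
    """Return the first-rank pseudo character when its lead over the
    second-rank one is at least `threshold`; otherwise return None to
    signal that grapheme segmentation is needed."""
    (best_char, best_conf), (_, second_conf) = ranked[0], ranked[1]
    if best_conf - second_conf >= threshold:
        return best_char
    return None


THRESHOLD = 0.2  # hypothetical value
clear = [("가", 0.81), ("기", 0.12)]      # large gap: accept immediately
ambiguous = [("기", 0.63), ("가", 0.54)]  # small gap: fall back to segmentation
clear_result = accept_without_segmentation(clear, THRESHOLD)
ambiguous_result = accept_without_segmentation(ambiguous, THRESHOLD)
```

Only the ambiguous case pays the cost of segmentation, which is where the speed benefit of the two-stage design comes from.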

When the difference between the first-rank non-segmentation confidence and the second-rank non-segmentation confidence is less than the threshold, the segmentation grapheme recognition unit 350 segments the character image into graphemes. That is, it divides the character image into its graphemes through the grapheme segmentation unit 351.

The grapheme segmentation unit 351 sets grapheme segmentation regions based on the format information of the character image, and may divide the character image into an initial image, a medial image, or a final image according to the touching points of the character image contained in each segmentation region.

For example, as shown in FIG. 6, the grapheme segmentation unit 351 performs contour smoothing on the character image "계" 610 to generate a first grapheme segmentation image 620. It then detects the dominant points of the first grapheme segmentation image to generate a second grapheme segmentation image 630. A dominant point is a point of high curvature, that is, a place where the character image bends sharply. The grapheme segmentation unit 351 may then remove neighborhood points from the second grapheme segmentation image and extract a nonlinear cutting path to generate the grapheme segmentation image 640 of the character image.
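Dominant point detection can be illustrated with a simple turning-angle test on a closed contour. The text does not say how curvature is measured, so this is a stand-in technique under stated assumptions: the contour is an ordered list of (x, y) points, and the 45-degree cutoff and the function name `dominant_points` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominant_points(contour: Sequence[Point], min_turn_deg: float = 45.0) -> List[int]:
    """Return indices of points on a closed contour where the direction of
    travel turns by at least `min_turn_deg` degrees."""
    n = len(contour)
    found = []
    for i in range(n):
        x0, y0 = contour[i - 1]        # previous point (wraps around at i = 0)
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]  # next point (wraps around at the end)
        angle_in = math.atan2(y1 - y0, x1 - x0)
        angle_out = math.atan2(y2 - y1, x2 - x1)
        turn = abs(math.degrees(angle_out - angle_in))
        turn = min(turn, 360.0 - turn)  # fold into the range 0..180
        if turn >= min_turn_deg:
            found.append(i)
    return found


# A square traced with one extra point in the middle of each side.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
corners = dominant_points(square)
```

On the square, only the four corner points are reported; the points in the middle of each side lie on straight runs and are skipped.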

After the character image has been segmented into graphemes, the grapheme feature information unit 352 reads the feature value information of each of the initial image, the medial image, or the final image obtained from the character image. This reading may be implemented in the same way as the feature value reading operation of the feature information unit 310 described with reference to FIG. 4.

The segmentation grapheme recognition unit 350 then selects N pseudo characters, in descending order of non-segmentation confidence, from among all the pseudo characters generated by the character recognition control unit 340. It reads the initial class, medial class, or final class contained in each of the N pseudo characters.

Through the initial recognizer 353, the medial recognizer 354, and the final recognizer 355, the segmentation grapheme recognition unit 350 calculates the initial class score, medial class score, or final class score that the initial image, medial image, or final image segmented from the character image has with respect to each of the N pseudo characters.

For example, when the character image is "가" and the N pseudo characters, in order of non-segmentation confidence, are "기, 가, 계, 게, 피, 키, 지, 파, 카, 자", the segmentation grapheme recognition unit 350 calculates class scores of the initial image "ㄱ" and the medial image "ㅏ", segmented from the character image "가", for each of the 10 pseudo characters.

That is, when calculating the class scores of the character image "가" for the first-rank pseudo character "기", the initial recognizer 353 calculates the initial class score of the initial image "ㄱ" through the "ㄱ classifier", and the medial recognizer 354 calculates the medial class score of the medial image "ㅏ" through the "ㅣ classifier".

Likewise, when calculating the class scores of the character image "가" for the tenth-rank pseudo character "자", the initial recognizer 353 calculates the initial class score of the initial image "ㄱ" through the "ㅈ classifier", and the medial recognizer 354 calculates the medial class score of the medial image "ㅏ" through the "ㅏ classifier".

Once the class scores of the character image have been calculated for each of the 10 pseudo characters, the character recognition control unit 340 multiplies the initial class score, the medial class score, or the final class score together for each pseudo character. The product of these class scores may be set as the segmentation confidence of the character image with respect to that pseudo character.

In the above example, the segmentation confidence of the character image "가" with respect to the first-rank pseudo character "기" may be set to the product of the initial class score of the initial image "ㄱ" calculated by the "ㄱ classifier" and the medial class score of the medial image "ㅏ" calculated by the "ㅣ classifier".

Once the segmentation confidence of the character image has been calculated for each pseudo character, the character recognition control unit 340 multiplies the segmentation confidence and the non-segmentation confidence of each of the N pseudo characters to obtain a pseudo character confidence. As with the non-segmentation confidence, the segmentation confidence may be converted into a probability value before being multiplied by the non-segmentation confidence.

The character recognition control unit 340 may then recognize the character image as the pseudo character with the highest pseudo character confidence among the N pseudo characters.
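The combination step can be sketched as follows, assuming both confidences are already expressed as probabilities. The function name `recognize_with_segmentation` and all numeric values are hypothetical.

```python
from typing import Dict, Tuple


def recognize_with_segmentation(non_seg: Dict[str, float],
                                seg: Dict[str, float],
                                n: int) -> Tuple[str, Dict[str, float]]:
    """Keep the N pseudo characters with the highest non-segmentation
    confidence, multiply each by its segmentation confidence, and return
    the winner together with all combined confidences."""
    top_n = sorted(non_seg, key=non_seg.get, reverse=True)[:n]
    combined = {ch: non_seg[ch] * seg[ch] for ch in top_n}
    return max(combined, key=combined.get), combined


# Hypothetical confidences for an image of "가" that first ranked "기" on top.
non_seg = {"기": 0.63, "가": 0.54, "키": 0.21, "카": 0.18}
# Segmentation confidence = initial class score * medial class score, e.g.
# ㄱ scores 0.9 on the initial image, ㅣ scores 0.2 and ㅏ scores 0.8 on the medial.
seg = {"기": 0.9 * 0.2, "가": 0.9 * 0.8, "키": 0.1 * 0.2, "카": 0.1 * 0.8}
best, combined = recognize_with_segmentation(non_seg, seg, n=3)
```

In this made-up case the segmented medial image matches ㅏ far better than ㅣ, so the combined confidence overturns the initial ranking and selects "가".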

In this way, when non-segmentation grapheme recognition leaves so small a difference between the non-segmentation confidences of the first-rank and second-rank pseudo characters that accurate recognition cannot be assured, the character recognition apparatus according to the present invention segments the character image into graphemes and recognizes the character by referring to both the segmentation result and the non-segmentation result, thereby ensuring the accuracy of character recognition.

Conversely, when non-segmentation grapheme recognition yields a difference between the non-segmentation confidences of the first-rank and second-rank pseudo characters that is greater than or equal to the threshold, the apparatus recognizes the character image as the first-rank pseudo character without segmenting it into graphemes, thereby achieving both speed and accuracy in character recognition.

The character recognition method according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal line or a waveguide, that carries a carrier wave transmitting signals specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

FIG. 7 is an internal block diagram of a general-purpose computer system that may be employed to implement the character recognition method according to the present invention.

The computer system 700 includes one or more processors 710 connected to main memory that includes random access memory (RAM) 720 and read-only memory (ROM) 730. The processor 710 is also called a central processing unit (CPU). As is well known in the art, the ROM 730 transfers data and instructions to the CPU unidirectionally, while the RAM 720 is typically used to transfer data and instructions bidirectionally. The RAM 720 and the ROM 730 may include any suitable form of computer-readable medium. The mass storage 740 is coupled bidirectionally to the processor 710 to provide additional data storage capacity, and may be any of the computer-readable recording media described above. The mass storage 740 is used to store programs, data, and the like, and is typically an auxiliary storage device, such as a hard disk, that is slower than main memory. A specific mass storage device such as the CD-ROM 760 may also be used. The processor 710 is connected to one or more input/output interfaces 750, such as a video monitor, trackball, mouse, keyboard, microphone, touchscreen display, card reader, magnetic or paper tape reader, voice or handwriting recognizer, joystick, or other known computer input/output device. Finally, the processor 710 may be connected to a wired or wireless communication network through the network interface 770. The procedures of the method described above may be performed over such a network connection. The devices and tools described above are well known to those skilled in the computer hardware and software arts.

Specific embodiments of the present invention have been described above, but various modifications are of course possible without departing from the scope of the present invention.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims set forth below and their equivalents.

According to the grapheme-based character recognition apparatus and method of the present invention, a character is recognized by a non-segmentation grapheme recognition method that recognizes the character as a whole without dividing it into graphemes, which shortens the time required for character recognition.

In addition, according to the grapheme-based character recognition apparatus and method of the present invention, when the result of non-segmentation grapheme recognition is ambiguous, the character is recognized by a segmentation grapheme recognition method that divides it into graphemes, and the result is combined with the non-segmentation result, so that characters can be recognized accurately regardless of variations in font.

In addition, according to the grapheme-based character recognition apparatus and method of the present invention, when a character is segmented into graphemes for recognition, the touching point, which is the place where a grapheme connects to the vowel, is detected using the format information and curvature information of the character, so that grapheme segmentation can be performed more accurately.

Although the present invention has been described with reference to limited embodiments and drawings, it is not limited to those embodiments, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof fall within the scope of the spirit of the present invention.

Claims

A character recognition method comprising:

reading feature value information and format information of a given character image;

generating initial information, medial information, or final information of the character image according to the feature value information and the format information;

generating a plurality of pseudo characters for reading the character image according to the initial information, the medial information, or the final information, and calculating a non-segmentation confidence value corresponding to each pseudo character;

recognizing the character image as the pseudo character having a first-rank non-segmentation confidence when the difference between the first-rank non-segmentation confidence, which has the highest value among the non-segmentation confidences of the pseudo characters, and a second-rank non-segmentation confidence, which has the next highest value, is greater than or equal to a predetermined threshold; and

when the difference between the first-rank confidence and the second-rank confidence is less than the threshold, segmenting the character image into graphemes to calculate a segmentation confidence corresponding to each pseudo character, multiplying the segmentation confidence by the non-segmentation confidence for each pseudo character, and recognizing the character image as the pseudo character having the highest resulting confidence.

The method of claim 1,

wherein the feature value information of the character image includes vector information of the pixels contained in each of a plurality of mesh images obtained by dividing the character image.

The method of claim 1,

wherein the format information of the character image is set to any one of Hangul formats consisting of a first format to a sixth format, and a seventh format for an English letter, a numeral, or a special character.

The method of claim 1,

wherein generating the initial information, the medial information, or the final information of the character image according to the feature value information and the format information comprises:

calculating an initial class score for each of 19 Hangul initial classes with respect to the initial consonant of the character image, a medial class score for each of 5 to 9 medial classes with respect to the medial vowel of the character image, or a final class score for each of 27 Hangul final classes with respect to the final consonant of the character image; and

converting the initial class scores, the medial class scores, or the final class scores into initial class probabilities, medial class probabilities, or final class probabilities, respectively.

The method of claim 4,

wherein generating the plurality of pseudo characters for reading the character image and calculating the non-segmentation confidence value corresponding to each pseudo character comprises:

multiplying the initial class probabilities, the medial class probabilities, or the final class probabilities by one another, and calculating, for each pseudo character formed by a combination of an initial class, a medial class, or a final class, the product of the corresponding class probabilities.

The method of claim 1,

wherein segmenting the character image into graphemes to calculate the segmentation confidence corresponding to each pseudo character, when the difference between the first-rank non-segmentation confidence and the second-rank non-segmentation confidence is less than the threshold, comprises:

dividing the character image into an initial image, a medial image, or a final image;

reading feature value information of each of the initial image, the medial image, or the final image;

selecting N pseudo characters in descending order of non-segmentation confidence, and reading the initial class, the medial class, or the final class contained in each of the N pseudo characters;

calculating the initial class score, the medial class score, or the final class score that the initial image, the medial image, or the final image has with respect to each of the N pseudo characters; and

multiplying the initial class score, the medial class score, or the final class score together for each of the N pseudo characters to calculate the segmentation confidence of the character image with respect to each of the N pseudo characters.

The method of claim 6,

wherein dividing the character image into an initial image, a medial image, or a final image comprises setting segmentation regions based on the format information of the character image, and dividing the character image into the initial image, the medial image, or the final image according to the touching points of the character image contained in each segmentation region.

A computer-readable recording medium having recorded thereon a program for executing the method of any one of claims 1 to 7.