KR101377601B1

KR101377601B1 - System and method for providing recognition and translation of multiple language in natural scene image using mobile camera

Info

Publication number: KR101377601B1
Application number: KR1020120104762A
Authority: KR
Inventors: 곽희규; 이현주; 김성헌; 나경태
Original assignee: 주식회사 인지소프트
Priority date: 2012-09-20
Filing date: 2012-09-20
Publication date: 2014-03-25

Abstract

The present invention relates to a system and a method for recognizing and translating multilingual text in natural images using a mobile camera. The system includes: a common text recognition/translation engine which extracts, recognizes, and translates text from an image acquired by a mobile camera; and a management tool which manages a recognition/translation database used for recognizing and translating the extracted text, builds a database specialized for each application in which text recognition and translation are performed, containing letters and words, letter combinations, word combinations, and misrecognition patterns, and provides the database to the common text recognition/translation engine. The common text recognition/translation engine provides a framework that applies in common to the applications in which text recognition and translation are performed, and recognizes and translates text using the application-specific database provided by the management tool. According to the present invention, by providing the common engine framework and the management tool, diverse text such as text in natural images can be recognized without separately developing an engine, even if recognition targets vary between applications. In addition, text can be extracted from a natural image even when the image has been degraded by various factors, and the recognition rate does not deteriorate even when the text appears in diverse fonts.
[Reference numerals] (100) Common text recognition/translation engine; (110) Natural image text region extraction unit; (112) Natural image pre-processing unit; (114) Text extraction unit; (116) Text verification unit; (120) Multiple language text recognition unit; (122) Character string recognition unit; (124) Image recognition unit; (130) Multiple language translation unit; (150) Management tool; (155) Database management tool by language; (160) Text recognition performance tuning tool; (165) Text recognition learning tool; (170) Test bed by language; (175) Distribution management tool; (180) Commercial software; (185) Software for a smartphone; (190) Software for a terminal; (195) Package software; (197) Software for a server service

Description

System and method for providing recognition and translation of multiple language in natural scene image using mobile camera

The present invention relates to technology for recognizing text information present in an image, and in particular to a method and system for multilingual text recognition and translation of natural images using a mobile camera, which extract, recognize, and translate specific text information (scene text) present in a natural scene image acquired with a camera.

Conventional text recognition technology has concentrated on scanner-based image generation and document processing. Recently, however, as the mobile terminal environment has developed and users have become more mobile, there is an urgent need to combine image acquisition through a mobile camera with information gathering and services based on text recognition in natural scene images.

In addition, a conventional text recognition engine achieves reasonable recognition performance in a specific application, but its performance drops when the recognition target changes, and a separate engine suited to the new target must then be developed, which is inconvenient and difficult.

The problem to be solved by the present invention is as follows. A conventional text recognition engine performs reasonably well in a specific application, but its performance drops when the recognition target changes, so a separate engine suited to each target had to be developed. The present invention provides a natural-image multilingual text recognition and translation system using a mobile camera that can recognize diverse text, such as text in natural images, without developing a separate engine, even when recognition targets are diverse and vary by application, and that automatically extracts and recognizes text contained in a natural image acquired from a camera and translates it into another language.

Another problem to be solved by the present invention is to provide a natural-image multilingual text recognition and translation method using a mobile camera that can recognize diverse text, such as text in natural images, without developing a separate engine, even when recognition targets are diverse and vary by application, and that automatically extracts and recognizes text contained in a natural image acquired from a camera and translates it into another language.

To achieve the above technical object, a natural-image multilingual text recognition and translation system using a mobile camera according to the present invention includes: a common text recognition/translation engine that extracts, recognizes, and translates text present in an image acquired by a mobile camera; and a management tool that manages a recognition/translation DB used for recognizing and translating the extracted text, builds a database of the letters and words, letter combinations, word combination information, and misrecognition pattern information specialized for the application in which text recognition and translation are performed, and provides the database to the common text recognition/translation engine. The common text recognition/translation engine provides a framework that can be used in common across the applications in which text recognition and translation are performed, and recognizes and translates text using the application-specific database provided by the management tool.

The common text recognition/translation engine includes: a natural image text region extraction unit that searches a natural image captured by a mobile camera for regions containing character strings and extracts the text; a multilingual text recognition unit that recognizes the character strings in the extracted text regions; and a multilingual translation unit that translates the recognized character strings into various languages by searching a multilingual database.

The natural image text region extraction unit includes: a natural image preprocessing unit that converts an input image file into a form suitable for image processing and corrects distortion contained in the image; a text extraction unit that selects candidate text regions within the input image, separates the character string from the background image within the candidate text regions, and then splits the extracted character string into individual characters; and a text verification unit that verifies the text through language-specific text extraction by applying a language model that includes at least word combination information, letter combination information, and misrecognition pattern information.

The multilingual text recognition unit includes a character string recognition unit that recognizes character strings in printed form. The character string recognition unit divides the character string image finely at the expected character boundary points, then combines and recognizes the resulting segments and searches for the optimal combination to recognize the string. Preferably, the multilingual text recognition unit further includes an image recognition unit that recognizes text or logos containing graphic elements, and the image recognition unit includes: a feature extraction unit that extracts corner points and junction points characterizing the image and vectorizes them using surrounding information; and a feature matching unit that matches the vectorized image feature information against the information in an image matching DB to find the most similar image information.
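The feature matching step described above can be illustrated with a minimal sketch. It is not taken from the specification: the in-memory `IMAGE_DB`, the two-dimensional feature vectors, the distance threshold `max_dist`, and the vote-counting rule are all assumptions chosen only to show how vectorized features might be compared against an image matching DB.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, ...]

# Hypothetical image matching DB: image name -> vectorized feature points.
IMAGE_DB: Dict[str, List[Feature]] = {
    "logo_a": [(0.0, 0.0), (1.0, 1.0)],
    "logo_b": [(5.0, 5.0), (6.0, 5.0)],
}


def match_score(query: Sequence[Feature], reference: Sequence[Feature],
                max_dist: float = 0.5) -> int:
    """Count query features whose nearest reference feature is within max_dist."""
    return sum(
        1 for q in query
        if min(math.dist(q, r) for r in reference) <= max_dist
    )


def best_match(query: Sequence[Feature], db: Dict[str, List[Feature]]) -> str:
    """Return the DB entry that collects the most matching features."""
    return max(db, key=lambda name: match_score(query, db[name]))
```

A real feature matching unit would use richer descriptors and a candidate selection stage such as the clustering shown in FIG. 26; the sketch only shows the nearest-feature comparison.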

The multilingual translation unit has: a general-term translation database in which a general word or word combination is matched to a plurality of words or word combinations; and a specialized-term translation database in which a specialized word or word combination is matched one-to-one. When the recognized character string is a general word or word combination, the unit refers to the general-term translation database and translates it into a plurality of words or word combinations. When the recognized character string is a specialized word or word combination, the unit refers to the specialized-term translation database and translates it into a single word or word combination.
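As an illustrative sketch of the two lookup paths, the following assumes both databases are plain in-memory dictionaries. The entries, the names `GENERAL_TERMS` and `SPECIALIZED_TERMS`, and the choice to consult the specialized database first are assumptions; the specification states only that general terms map 1:N and specialized terms map 1:1.

```python
from typing import Dict, List

# Hypothetical general-term DB: one source term -> several candidate translations.
GENERAL_TERMS: Dict[str, List[str]] = {
    "은행": ["bank", "ginkgo nut"],
}

# Hypothetical specialized-term DB: one source term -> exactly one translation.
SPECIALIZED_TERMS: Dict[str, str] = {
    "경복궁": "Gyeongbokgung Palace",
}


def translate(recognized: str) -> List[str]:
    """Return the candidate translations for a recognized string."""
    # Assumed lookup order: a specialized term yields a single fixed translation.
    if recognized in SPECIALIZED_TERMS:
        return [SPECIALIZED_TERMS[recognized]]
    # Otherwise fall back to the general DB, which may yield several candidates.
    return list(GENERAL_TERMS.get(recognized, []))
```

Returning a list in both cases lets a caller treat the 1:1 and 1:N results uniformly, with an empty list signalling that neither database contains the term.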

The management tool additionally provides a performance tuning function and a learning function for generating a text recognition model DB, and includes: a language-specific DB management tool that manages the recognition/translation DB used to recognize and translate the extracted text and provides the registration, modification, deletion, and query functions required for that management; a text recognition performance tuning tool that provides performance tuning; and a text recognition learning tool that provides the learning function for generating the text recognition model DB.

The recognition/translation DB includes at least one of: a font DB that stores the letter information used for training the recognizer, classified by font type; an image matching DB consisting of original image information and extracted feature information; a language model DB consisting of word information (words, word occurrence frequencies, and word combination information), letter information (letters, letter occurrence frequencies, and letter combination information), and misrecognition pattern information (words prone to misrecognition and their misrecognition patterns); a text recognition model DB that manages the list of text recognition models generated by the text recognition engine learning tool; and a translation DB that takes the form of a reference-language word paired with the matching foreign-language word and consists of a reference-language table and the matching foreign-language word tables.

The text recognition learning tool receives the letter information of the font DB and generates a trained text recognition model, and includes: a per-class feature extraction unit that extracts image characteristics using the boundary information of each letter and expresses them as a feature vector; a per-class dimension reduction unit that reconstructs the feature vector by extracting only the important elements of its dimensions, in order to reduce the memory and computation used for training and recognition; and a learning unit that generates a reference vector representing each class from the vectors of the letters belonging to that class.
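The dimension reduction and reference vector steps can be sketched as follows. The sketch makes several assumptions that the specification does not state: feature vectors are taken as already extracted, "important" dimensions are chosen by highest variance over all samples rather than per class, and the reference vector of a class is the mean of its reduced vectors. The function names are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def select_dimensions(samples: Sequence[Vector], keep: int) -> List[int]:
    """Pick the `keep` dimensions with the highest variance (assumed criterion)."""
    dims = range(len(samples[0]))
    variances = [pvariance([s[d] for s in samples]) for d in dims]
    ranked = sorted(dims, key=lambda d: variances[d], reverse=True)
    return sorted(ranked[:keep])


def reduce_vector(vec: Vector, dims: Sequence[int]) -> Vector:
    """Rebuild a feature vector from the selected dimensions only."""
    return [vec[d] for d in dims]


def train(features_by_class: Dict[str, List[Vector]],
          keep: int) -> Tuple[List[int], Dict[str, Vector]]:
    """Return the kept dimensions and one mean reference vector per class."""
    all_samples = [v for vecs in features_by_class.values() for v in vecs]
    dims = select_dimensions(all_samples, keep)
    references: Dict[str, Vector] = {}
    for label, vecs in features_by_class.items():
        reduced = [reduce_vector(v, dims) for v in vecs]
        # Column-wise mean of the reduced vectors of this class.
        references[label] = [sum(col) / len(reduced) for col in zip(*reduced)]
    return dims, references
```

At recognition time, an input letter would be reduced with the same `dims` and compared against each reference vector; that comparison is outside the scope of this sketch.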

The text recognition performance tuning tool uses the DB information received from the language-specific DB management tool and test samples from the specialized field to select the DB and engine tuning information needed for that field, and tunes the engine using the selected DB and engine tuning information.

The natural-image multilingual text recognition and translation system using a mobile camera according to the present invention further includes: a language-specific test bed that receives the optimal tuning DB information and optimal tuning engine information from the text recognition performance tuning tool, reconfigures the engine, and tests it; and a distribution management tool that receives the optimal tuning DB information and optimal tuning engine information from the text recognition performance tuning tool, assembles the engine and DB into a usable form, generates a library suited to the target service environment, and outputs a module for distribution.

To achieve the above technical object, a natural-image multilingual text recognition and translation method using a mobile camera according to the present invention includes: building a database of the letters and words, letter combinations, word combination information, and misrecognition pattern information specialized for the application in which text recognition and translation are performed; extracting, by a text recognition/translation engine that provides a framework applied in common to the applications, a character string image present in an image acquired by a mobile camera; analyzing and recognizing, by the text recognition/translation engine, the extracted character string image using the database and outputting it in text form; and translating, by the text recognition/translation engine, the text into another language using a predetermined database.

The character string image extraction includes: converting the input image file into a form suitable for image processing and correcting distortion contained in the image; selecting candidate text regions within the input image, separating the character string from the background image within the candidate text regions, and then splitting the extracted character string into individual characters; and verifying the text through language-specific text extraction by applying a language model that includes at least word combination information, letter combination information, and misrecognition pattern information.

The character string image analysis and recognition divides the character string image finely at the expected character boundary points, then combines and recognizes the resulting segments and searches for the optimal combination.
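The search for an optimal combination of segments can be sketched with dynamic programming. This is an illustrative assumption, not the method stated in the specification: the segments are represented as strings, the recognizer is a toy lookup table (`toy_recognize`), scores are summed log-confidences, and `max_merge` limits how many adjacent segments may form one character.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# A recognizer takes a run of adjacent segments and returns (character, log score),
# or None when the run does not look like any character.
Recognizer = Callable[[Sequence[str]], Optional[Tuple[str, float]]]

# Toy confidence table standing in for a trained recognizer (hypothetical values).
_TOY_TABLE = {
    ("c",): ("c", 0.3), ("l",): ("l", 0.4), ("c", "l"): ("d", 0.9),
    ("o",): ("o", 0.9), ("g",): ("g", 0.9),
}


def toy_recognize(run: Sequence[str]) -> Optional[Tuple[str, float]]:
    hit = _TOY_TABLE.get(tuple(run))
    return None if hit is None else (hit[0], math.log(hit[1]))


def best_combination(segments: Sequence[str], recognize: Recognizer,
                     max_merge: int = 3) -> Optional[Tuple[float, str]]:
    """Find the grouping of adjacent segments with the highest total log score."""
    n = len(segments)
    # best[i] holds (total score, text) for the first i segments, or None.
    best: List[Optional[Tuple[float, str]]] = [None] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for start in range(max(0, end - max_merge), end):
            prev = best[start]
            if prev is None:
                continue
            result = recognize(segments[start:end])
            if result is None:
                continue
            char, score = result
            candidate = (prev[0] + score, prev[1] + char)
            current = best[end]
            if current is None or candidate[0] > current[0]:
                best[end] = candidate
    return best[n]
```

With the toy table, the over-split pieces `c` and `l` score better merged into `d` than read separately, so `best_combination(["c", "l", "o", "g"], toy_recognize)` yields the text `dog`.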

The character string image recognition and the translation of the recognized character string are performed using at least one of: a font DB that stores the letter information used for training the recognizer, classified by font type; an image matching DB consisting of original image information and extracted feature information; a language model DB consisting of word information (words, word occurrence frequencies, and word combination information), letter information (letters, letter occurrence frequencies, and letter combination information), and misrecognition pattern information (words prone to misrecognition and their misrecognition patterns); a text recognition model DB that manages the list of text recognition models generated by the text recognition engine learning tool; and a translation DB that takes the form of a reference-language word paired with the matching foreign-language word and consists of a reference-language table and the matching foreign-language word tables.

The present invention also provides a processor-readable recording medium on which is recorded a program for executing the above-described invention on a processor.

A conventional text recognition engine performs reasonably well in a specific application, but its performance drops when the recognition target changes, so a separate engine suited to each target had to be developed. The natural-image multilingual text recognition and translation system and method using a mobile camera according to the present invention provide a common engine framework together with a management tool that includes at least learning and performance tuning functions. Even when recognition targets are diverse, as in natural images, and vary by application, the management tool supplies the common engine framework with what is appropriate to the changed target, so diverse text such as text in natural images can be recognized without developing a separate engine.

In addition, the natural-image multilingual text recognition and translation system and method using a mobile camera according to the present invention can extract, recognize, and translate text present in a natural image acquired with a camera. Text recognition in natural images is difficult because the images are often degraded by various factors, but according to the present invention text can be extracted normally from natural images degraded by such factors, and the recognition rate does not drop even when the text appears in diverse fonts.

FIG. 1 is a block diagram showing an embodiment of a natural-image multilingual text recognition system using a mobile camera according to the present invention.
FIG. 2 shows the configuration of the image recognition unit.
FIG. 3 shows the configuration of the recognition/translation DB.
FIG. 4 shows the configuration of the text recognition learning tool.
FIG. 5 is a block diagram showing a more specific embodiment of the configuration of the present invention.
FIG. 6 is a block diagram showing the configuration of the common engine framework.
FIG. 7 is a block diagram showing the configuration of the natural image text extraction engine.
FIG. 8 shows examples of problems that occur in natural images during camera-based image capture.
FIG. 9 illustrates an algorithm for removing reflection effects.
FIG. 10 shows an example in which image merging is required.
FIG. 11 shows a method of registering panoramic images.
FIG. 12 shows an image for which image merging has been completed.
FIG. 13 shows a boundary detection method for a natural image containing shadows.
FIG. 14 shows the process of extracting character candidate regions using MRF.
FIG. 15 shows an example of perspective effect detection.
FIG. 16 shows an example in which character merging is difficult to decide.
FIG. 17 shows an example of character merging after the language model is applied.
FIG. 18 is a block diagram showing the configuration of the character string recognizer.
FIG. 19 shows an example of the over-segmentation recognition process.
FIG. 20 shows examples of the Korean fonts to be collected.
FIG. 21 shows examples of the Latin fonts to be collected.
FIG. 22 shows an example of post-processing of recognition results for a degraded image.
FIG. 23 shows the configuration of image matching.
FIG. 24 shows an example of feature point extraction in image matching.
FIG. 25 shows an example of image matching.
FIG. 26 shows candidate selection through clustering.
FIG. 27 shows a unidirectional 1:N translation table as an example configuration of the Korean and English general-term translation DB, and a bidirectional 1:1 translation table as an example configuration of the Korean and Latin specialized-term translation DB.
FIG. 28 shows the configuration of the management tool.
FIG. 29 shows the configuration of the language-specific DB.
FIG. 30 shows an example of font information and font DB management.
FIG. 31 shows a method of building the font DB.
FIG. 32 is a flowchart of building the language model DB.
FIG. 33 is a flowchart of building the image matching DB.
FIG. 34 shows the configuration of translation DB management.
FIG. 35 shows the configuration of the text recognition engine learning tool.
FIG. 36 shows an example of the feature extraction process.
FIG. 37 is a flowchart of the text recognition performance tuning tool.
FIG. 38 shows the configuration of the language-specific test bed.
FIG. 39 shows the configuration of the distribution management tool.
FIG. 40 shows the configuration of the commercial software.
FIG. 41 is a flowchart showing an embodiment of a natural-image multilingual text recognition and translation method using a mobile camera according to the present invention.
FIG. 42 shows the character string image extraction process of FIG. 41 in more detail.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The embodiments described in this specification and the configurations shown in the drawings are merely preferred embodiments of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications that could replace them may exist at the time of filing.

First, the terms used in the specification and drawings of the present invention are defined. A 'natural image' is an image, captured by a camera, of ordinary text found on various objects such as signboards, signs, and unstructured documents. The text contained in such an image is unconstrained in typeface, size, and layout, and may be heavily distorted by the surrounding environment, such as sunlight or lighting.

The present invention provides an engine that automatically extracts and recognizes text, for example in Korean and Latin-script languages, contained in a natural image acquired from a camera, together with a language model and software management tool that optimize the performance of the engine and a service-specific term translation database (DB).

FIG. 1 is a block diagram showing an embodiment of a natural-image multilingual text recognition system using a mobile camera according to the present invention. The system includes a common text recognition/translation engine (100) and a management tool (150). A commercial software module (180) containing customized modules and apps may be added on top of the common text recognition/translation engine (100).

The common text recognition/translation engine (100) extracts, recognizes, and translates text present in an image acquired by a mobile camera, and includes a natural image text region extraction unit (110), a multilingual text recognition unit (120), and a multilingual translation unit (130).

The management tool (150) manages the recognition/translation DB used for recognizing and translating the extracted text, builds a database of the letters and words, letter combinations, word combination information, and misrecognition pattern information specialized for the application in which text recognition and translation are performed, and provides the database to the common text recognition/translation engine. The common text recognition/translation engine (100) provides a framework that can be used in common across the applications in which text recognition and translation are performed, and recognizes and translates text using the application-specific database provided by the management tool (150).

The natural image character region extraction unit 110 searches a natural image captured by the mobile camera for regions that contain character strings and extracts the characters. It comprises a natural image preprocessing unit 112, a character extraction unit 114, and a character verification unit 116.

The natural image preprocessing unit 112 converts the input image file into a form suitable for image processing and corrects the distortion contained in the image.

The character extraction unit 114 selects character region candidates in the input image, separates the character string from the background within each candidate region, and then splits the extracted string into individual characters.

The character verification unit 116 verifies the extracted characters by applying a language model that includes at least inter-word combination information, inter-character combination information, and misrecognition pattern information, so that characters are extracted in a language-specific manner.

The multilingual character recognition unit 120 recognizes the character strings in the extracted character regions. It comprises a character string recognition unit 122 and an image recognition unit 124.

The character string recognition unit 122 recognizes character strings in printed form. More specifically, it finely splits the string image at the points where character boundaries are expected, then combines and recognizes the resulting segments and searches for the optimal combination to recognize the string.

The image recognition unit 124 recognizes characters or logos that contain graphic elements. As shown in FIG. 2, it comprises a feature extraction unit 200 and a feature matching unit 250.

The feature extraction unit 200 extracts the corner points and junction points that characterize the image and vectorizes them using the surrounding information. The feature matching unit 250 matches the vectorized image feature information against the information in the image matching DB to find the most similar image.

The multilingual translation unit 130 translates the recognized character string into various languages. It has a general-term translation database, in which a general word or word combination is matched to a plurality of words or word combinations, and a specialized-term translation database, in which a specialized word or word combination is matched one-to-one. When the recognized string is a general word or word combination, the unit consults the general-term translation database and translates it into a plurality of words or word combinations; when the recognized string is a specialized word or word combination, the unit consults the specialized-term translation database and translates it into a single word or word combination.
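The two-database lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify a storage format, the sample entries, or which database is consulted first, so the dictionaries `GENERAL_TERMS` and `SPECIALIZED_TERMS`, the function `translate`, and the rule that the specialized database takes priority are all hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data; the patent lists no actual database entries.
# General terms map to several candidate translations (one-to-many).
GENERAL_TERMS: Dict[str, List[str]] = {
    "bank": ["은행", "둑"],
}
# Specialized terms map to exactly one translation (one-to-one).
SPECIALIZED_TERMS: Dict[str, str] = {
    "blood bank": "혈액은행",
}


def translate(recognized: str) -> List[str]:
    """Return the candidate translations for a recognized string.

    Assumption: the specialized database is checked first, so that a
    domain term yields a single result before the general database is tried.
    """
    key = recognized.strip().lower()
    if key in SPECIALIZED_TERMS:
        return [SPECIALIZED_TERMS[key]]
    return list(GENERAL_TERMS.get(key, []))


print(translate("Blood Bank"))  # a single specialized translation
print(translate("bank"))        # several general translations
```

Checking the specialized database first is one plausible way to keep domain vocabulary unambiguous; an implementation could equally classify the string before choosing a database.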

The management tool 150 manages the recognition/translation DB used to recognize and translate the extracted characters, and preferably also provides a performance tuning function and a learning function for generating the character recognition model DB. It comprises a language-specific DB management tool 155, a character recognition performance tuning tool 160, and a character recognition learning tool 165, and preferably further includes a language-specific test bed 170 and a distribution management tool 175.

The language-specific DB management tool 155 manages the recognition/translation DB used to recognize and translate the extracted characters, and provides the registration, modification, deletion, and query functions needed for that management. The character recognition performance tuning tool 160 tunes recognition performance. More specifically, it uses the DB information received from the language-specific DB management tool 155 together with test samples from the specialized field to select the DB and engine tuning information that field requires, and then tunes the engine with the selected DB and engine tuning information.

The character recognition learning tool 165 provides a learning function for generating the character recognition model DB.

FIG. 3 shows the configuration of the recognition/translation DB 300, which preferably comprises a font DB 310, an image matching DB 320, a language model DB 330, a character recognition model DB 340, and a translation DB 350.

The font DB 310 stores the individual character information used to train the recognizer, organized by font type. The image matching DB 320 consists of original image information and extracted feature information. The language model DB 330 contains word information (words, word occurrence frequencies, and inter-word combination information), individual character information (characters, character occurrence frequencies, and inter-character combination information), and misrecognition pattern information (words that are easily misrecognized and the patterns in which they are misrecognized). The character recognition model DB 340 manages the list of character recognition models generated by the character recognition learning tool 165.

The translation DB 350 takes the form of reference-language words paired with matching foreign-language words, and consists of a reference-language table and a table of the matching foreign-language words.

The character recognition learning tool 165 takes the individual character information in the font DB as input and generates a trained character recognition model. As shown in FIG. 4, it preferably comprises a per-class feature extraction unit 400, a per-class dimension reduction unit 420, and a learning unit 440.

The per-class feature extraction unit 400 extracts image characteristics from the boundary (contour) information of each individual character and expresses them as a feature vector. The per-class dimension reduction unit 420 reconstructs the feature vector by keeping only its important elements, in order to reduce the memory and computation required for learning and recognition. The learning unit 440 uses the vectors of the characters in each class to generate a reference vector that represents that class.
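The dimension reduction and reference vector steps can be illustrated with a small sketch. The patent does not say how "important elements" are chosen or how the reference vector is computed, so this example assumes two simple stand-ins: keeping the dimensions with the highest variance, and using the per-class mean as the reference vector. The feature vectors are taken as given; all function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def select_dimensions(samples: Sequence[Vector], keep: int) -> List[int]:
    """Assumed criterion: keep the `keep` dimensions with the largest variance."""
    dims = range(len(samples[0]))
    variances = [pvariance([s[d] for s in samples]) for d in dims]
    ranked = sorted(dims, key=lambda d: variances[d], reverse=True)
    return sorted(ranked[:keep])


def project(vec: Vector, dims: List[int]) -> Vector:
    """Rebuild a feature vector from the selected dimensions only."""
    return [vec[d] for d in dims]


def train(labelled: Dict[str, List[Vector]], keep: int) -> Tuple[List[int], Dict[str, Vector]]:
    """Return the kept dimensions and one reference (mean) vector per class."""
    all_samples = [v for vectors in labelled.values() for v in vectors]
    dims = select_dimensions(all_samples, keep)
    refs = {
        label: [mean(v[d] for v in vectors) for d in dims]
        for label, vectors in labelled.items()
    }
    return dims, refs


def classify(vec: Vector, dims: List[int], refs: Dict[str, Vector]) -> str:
    """Pick the class whose reference vector is nearest in squared distance."""
    reduced = project(vec, dims)
    return min(refs, key=lambda lb: sum((a - b) ** 2 for a, b in zip(reduced, refs[lb])))


samples = {"A": [[0, 0, 5], [0, 1, 5]], "B": [[10, 0, 5], [10, 1, 5]]}
dims, refs = train(samples, keep=2)
print(dims, classify([9, 0, 5], dims, refs))
```

In this toy data the third dimension is constant across all samples, so it is dropped, which is the memory and computation saving the text describes.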

The language-specific test bed 170 receives the optimal tuning DB information and optimal tuning engine information from the character recognition performance tuning tool 160, reconfigures the engine accordingly, and tests it.

The distribution management tool 175 receives the optimal tuning DB information and optimal tuning engine information from the character recognition performance tuning tool 160, assembles the engine and DB into a usable form, builds a library suited to the target service environment, and outputs a module for distribution.

A more specific embodiment of the configuration of the present invention described above is explained below.

FIG. 5 is a block diagram of a more specific embodiment of the configuration described above. It includes an engine common framework 500 and a management tool 550, and commercial software 580 is additionally provided on top of the engine common framework 500.

To build a multilingual recognizer, the natural image multilingual character recognition and translation method and system according to the present invention organizes an engine common framework: the processing common to all languages is developed once, and the parts specific to each language are provided separately.

FIG. 6 is a block diagram of the engine common framework 500. The engine common framework 500 corresponds to the common character recognition/translation engine 100 of FIG. 1. It consists of a natural image character extraction engine 600, corresponding to the natural image character region extraction unit 110 of FIG. 1; a multilingual character recognition engine 640, corresponding to the multilingual character recognition unit 120 of FIG. 1; and a multilingual translation engine 680, corresponding to the multilingual translation unit 130 of FIG. 1.

The natural image character extraction engine 600 searches the natural image acquired by the user for regions that contain character strings.

The multilingual character recognition engine 640 is a module that recognizes the detected string regions and improves the recognition success rate through successive processing steps.

The multilingual translation engine 680 is a module that translates the recognized text into various languages. It has a general-term translation database, in which a general word or word combination is matched to a plurality of words or word combinations, and a specialized-term translation database, in which a specialized word or word combination is matched one-to-one. When the recognized string is a general word or word combination, the engine consults the general-term translation database and translates it into a plurality of words or word combinations; when the recognized string is a specialized word or word combination, the engine consults the specialized-term translation database and translates it into a single word or word combination.

First, the natural image character extraction engine 600 shown in FIG. 6 is described. Conventional image processing engines are scanner-based; because scanned images are subject to fewer sources of distortion than natural images, such engines could recognize images with only a few functions, but they are inadequate for handling the various distortions found in natural images.

FIG. 7 is a block diagram of the natural image character extraction engine 600. The engine removes distortion from the input image during preprocessing, locates the character regions in the image, and produces a string image optimized to raise the recognition rate of the recognition engine.

The natural image character extraction engine 600 covers the entire process from an image acquired from a camera or file to the delivery of string images to the recognition engine. It comprises three modules: a natural image preprocessing engine 700, a character extraction engine 740, and a character verification engine 780.

The natural image preprocessing engine 700 corresponds to the natural image preprocessing unit 112 of FIG. 1. It converts the input image file into a form suitable for image processing and corrects the various distortions contained in the image. In addition, a merging step is performed depending on whether the input images overlap.

The character extraction engine 740 corresponds to the character extraction unit 114 of FIG. 1. It selects candidates in the input image that are likely to be character regions and uses image processing information to make that selection more accurate. Among the candidates, it separates the character string from the background of the actual character regions by a probabilistic method and then splits the extracted string into individual characters.

If the language of a string is unknown during character extraction, it is not possible to confirm whether the characters extracted by the character extraction engine were split correctly for that language. The character verification engine 780 applies a language model so that extraction is performed per language, which raises extraction accuracy.

The natural image preprocessing engine 700 is described in more detail with reference to FIG. 7.

As shown in FIG. 7, the natural image preprocessing engine 700 includes an image codec engine 705, an image optimization engine 710, and an image merging engine 715.

Regarding the image codec engine 705: an 8-megapixel camera shooting at full resolution produces an image of about 3264 x 2448 pixels. Decompressing such an image on a mobile device takes a large amount of memory and time. The image codec engine 705 therefore provides a lightweight codec that uses memory efficiently in a mobile environment, a codec that shortens decoding time in a mobile environment, and a JPEG codec module that decodes directly to a reduced-size image according to a parameter.
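A quick calculation shows why decoding to a reduced size matters at this resolution. The sketch below assumes 3 bytes per pixel (uncompressed RGB) and scale denominators of 1, 2, 4, and 8, which are the reduction factors commonly offered by JPEG decoders; the patent does not state which factors its codec supports, and the function names and the memory budget are hypothetical.

```python
from typing import Sequence


def decoded_bytes(width: int, height: int, scale_denominator: int = 1,
                  bytes_per_pixel: int = 3) -> int:
    """Size of the decoded bitmap when each side is reduced by 1/scale_denominator."""
    scaled_w = -(-width // scale_denominator)   # ceiling division
    scaled_h = -(-height // scale_denominator)
    return scaled_w * scaled_h * bytes_per_pixel


def pick_scale(width: int, height: int, budget_bytes: int,
               choices: Sequence[int] = (1, 2, 4, 8)) -> int:
    """Return the smallest reduction whose decoded bitmap fits in the budget."""
    for denominator in choices:
        if decoded_bytes(width, height, denominator) <= budget_bytes:
            return denominator
    return choices[-1]  # nothing fits: fall back to the strongest reduction


for d in (1, 2, 4, 8):
    print(d, decoded_bytes(3264, 2448, d))
# 1 23970816
# 2 5992704
# 4 1498176
# 8 374544
```

At full size the decoded bitmap needs about 24 MB, while a 1/2 reduction needs about 6 MB, so a hypothetical 8 MiB budget would select a denominator of 2.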

Regarding the image optimization engine 710: in natural image processing, compound distortions make it difficult to recognize the target accurately. As shown in FIG. 8, the problems that arise in natural images captured with a camera include rotation and distortion, blurring, transparency, reflection, occlusion, and shadow.

The image optimization engine 710 is an image enhancement engine for optimally extracting and recognizing the target in a natural image. It corrects images distorted by rotation, skew, perspective, and the like so that they can be recognized. To account for changes in lighting, it also models the input image mathematically as a combination of reflectance and illumination and removes the illumination effect so that character regions can be extracted effectively.
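The patent does not give the mathematical model it uses. One common formulation of a reflectance and illumination model, shown here only as an assumed illustration, treats the observed intensity $I$ as the product of reflectance $R$ and illumination $L$, and estimates the slowly varying illumination by low-pass filtering in the log domain (the Gaussian kernel $G_\sigma$ and the estimate $\hat{R}$ are symbols introduced for this sketch):

```latex
\begin{align}
I(x,y) &= R(x,y)\,L(x,y) \\
\log I(x,y) &= \log R(x,y) + \log L(x,y) \\
\hat{R}(x,y) &= \exp\bigl(\log I(x,y) - (G_\sigma * \log I)(x,y)\bigr)
\end{align}
```

Here $G_\sigma * \log I$ approximates $\log L$ under the assumption that illumination changes smoothly across the image while character strokes change abruptly, so subtracting it leaves an estimate of the reflectance in which text stands out from uneven lighting.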

FIG. 9 illustrates the algorithm for removing the reflection effect. For an image such as the one in FIG. 9, in which shadows have produced a lighting effect, preprocessing restores the actual color of the object so that the character region can be detected. To this end, the image optimization engine 710 includes algorithms for removing blurring, transparency, occlusion, and shadow effects, as well as a binarization algorithm suited to natural images.

Regarding the image merging engine 715: the engine registers images of a strongly curved surface, or images of a target photographed in several parts, into a single planar image so that recognition can be performed in one pass. The image merging engine 715 includes an algorithm for detecting the overlapping region between images and an algorithm for estimating and correcting image rotation. FIG. 10 shows an example in which image merging is needed, FIG. 11 shows a method of registering panoramic images, and FIG. 12 shows an image after merging is complete.

Referring to FIG. 7, after the natural image preprocessing engine 700 preprocesses the natural image, the character extraction engine 740 extracts characters. The character extraction engine 740 is described next.

The character extraction engine 740 detects character regions by searching the natural image for areas likely to contain characters; applies image processing information, that is, image conditions stored in a DB and consulted during character region detection; separates background and character regions in the natural image using an MRF (Markov Random Field) engine; and then extracts each character from the character region. It comprises a character region detection engine 745, an MRF engine 755, and a CC (Connected Component) extraction engine 760.

Character region detection is the most actively studied area in natural image recognition. This indicates that character extraction technology is still at an early stage and that an improved character region detection engine 745 is needed.

The character region detection engine 745 exploits the fact that character regions in a natural image generally show a high density of edges per unit area, and identifies character regions from the distribution of edges in the image. Character region candidates can be identified with an artificial-neural-network-based classifier that judges, from the edge distribution within a window of a given size, whether the window is a character region. The engine 745 then clusters the candidates found by edge analysis using connected component analysis, morphological operations, and the like, and extracts character regions by analyzing the size, position, shape, and other properties of the clustered candidates. FIG. 13 shows the edge detection method applied to a natural image containing shadows.
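The edge-density idea can be sketched in a few lines. Note that this example swaps the neural-network classifier described in the text for a plain density threshold, purely to keep the illustration short; the gradient operator, the window size, the thresholds, and the function names are all assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, values 0..255


def edge_map(img: Image, threshold: int = 50) -> List[List[int]]:
    """Mark a pixel as an edge when the horizontal intensity jump is large."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w - 1):
            if abs(img[y][x + 1] - img[y][x]) >= threshold:
                edges[y][x] = 1
    return edges


def candidate_windows(img: Image, win: int = 4,
                      min_density: float = 0.25) -> List[Tuple[int, int]]:
    """Return (top, left) of non-overlapping windows with high edge density.

    A simple threshold stands in for the neural-network classifier here.
    """
    edges = edge_map(img)
    h, w = len(img), len(img[0])
    found = []
    for top in range(0, h - win + 1, win):
        for left in range(0, w - win + 1, win):
            count = sum(edges[y][x]
                        for y in range(top, top + win)
                        for x in range(left, left + win))
            if count / (win * win) >= min_density:
                found.append((top, left))
    return found


# Left half is flat background; right half alternates dark and bright columns,
# imitating the dense strokes of text.
row = [0, 0, 0, 0, 0, 255, 0, 255]
print(candidate_windows([row[:] for _ in range(4)]))  # [(0, 4)]
```

The windows returned here would then be clustered and filtered by size, position, and shape, as the paragraph above describes.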

To improve the detection rate, it is preferable to use an image processing information DB holding color information, shape information, and the like acquired under various image conditions; using this registered information yields a higher character region candidate detection rate in natural images. Applying the image processing information (750), that is, storing image conditions in a DB and consulting them during detection, also reduces detection time.

Images taken with a digital camera vary widely, and the position, size, and color of the characters cannot be known in advance, so segmenting the characters in a natural image from the rest of the image is difficult. The MRF (Markov Random Field) engine 755 extracts characters from the background by recasting character segmentation as a probabilistic labeling problem: the label of each pixel is chosen as the one with the highest probability given the image. The MRF engine 755 first extracts all character candidates and then uses the relationships between characters to extract the parts that are actual characters, which yields more reliable character candidates. FIG. 14 shows the process of extracting character candidate regions with MRF.
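To make the labeling formulation concrete, the sketch below labels each pixel as text or background by minimizing a simple MRF energy with iterated conditional modes (ICM). The patent does not describe its energy terms or its optimizer, so the squared-distance data term, the neighbor-disagreement penalty `beta`, the class means, and the use of ICM are all assumptions chosen for brevity.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, values 0..255


def icm_segment(img: Image, text_mean: float = 0.0, bg_mean: float = 255.0,
                beta: float = 5000.0, sweeps: int = 5) -> List[List[int]]:
    """Label pixels 1 (text) or 0 (background) with a basic MRF energy.

    Energy per pixel = squared distance to the class mean
                     + beta * number of 4-neighbours with a different label.
    """
    h, w = len(img), len(img[0])
    means = (bg_mean, text_mean)  # index 0 = background, 1 = text
    # Initial labels: nearest class mean, ignoring neighbours.
    labels = [[min((0, 1), key=lambda k: (img[y][x] - means[k]) ** 2)
               for x in range(w)] for y in range(h)]
    for _ in range(sweeps):
        changed = False
        for y in range(h):
            for x in range(w):
                neighbours = [labels[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < h and 0 <= nx < w]

                def energy(k: int) -> float:
                    data = (img[y][x] - means[k]) ** 2
                    smooth = beta * sum(1 for n in neighbours if n != k)
                    return data + smooth

                best = min((0, 1), key=energy)
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:
            break  # converged to a local minimum
    return labels


# An isolated gray speck on a white background is absorbed into the background.
noisy = [[255, 255, 255], [255, 100, 255], [255, 255, 255]]
print(icm_segment(noisy))
```

The smoothness term is what distinguishes this from per-pixel thresholding: a pixel that looks slightly text-like but has no text neighbors is relabeled as background, while pixels inside a solid stroke keep their label.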

The CC (Connected Component) extraction engine 760 splits the extracted string into individual characters. To do so, it labels the extracted string with CCs, the smallest units of a character region, and merges the CCs to reconstruct each character. It then finds the line connecting the reconstructed characters to assemble a complete string and, in order to remove the perspective effect, finds the baseline of the string to determine whether perspective is present. FIG. 15 shows an example of perspective effect detection.
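The labeling and merging steps can be sketched as follows. The connected component labeling is a standard breadth-first search over 4-connected pixels; the merge rule (joining components whose horizontal extents overlap, as with the dot and stem of an "i") is an assumption, since the patent does not state its merge criterion. Function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def connected_components(grid: List[List[int]]) -> List[Box]:
    """Return the bounding box of every 4-connected group of non-zero pixels."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not grid[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_overlapping_columns(boxes: List[Box]) -> List[Box]:
    """Assumed merge rule: join components whose horizontal extents overlap."""
    merged: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        if merged and box[1] <= merged[-1][3]:
            t, l, b, r = merged[-1]
            merged[-1] = (min(t, box[0]), l, max(b, box[2]), max(r, box[3]))
        else:
            merged.append(box)
    return merged


# Column 0 holds a dot above a short stem (like "i"); column 3 holds a bar (like "l").
grid = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(merge_overlapping_columns(connected_components(grid)))
# [(0, 0, 3, 0), (0, 3, 3, 3)]
```

A purely geometric rule like this cannot tell two stacked lines of text from one multi-part character, which is the ambiguity the language-model-based verification step is meant to resolve.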

Referring to FIG. 7, after the natural image preprocessing engine 700 preprocesses the natural image, the character extraction engine 740 extracts characters, and the character verification engine 780 verifies the extracted characters by applying a language model. The character verification engine 780 is described next.

Because a string is assembled by merging CCs, it is available only as a string image and carries no basic information for recognition. When the extracted string consists of two rows, it is hard to tell whether it is the initial consonant and medial vowel of a Hangul syllable, two lines of English, or mixed-language text. A character verification engine 780 that applies a language model is therefore needed to make character merging more complete. FIG. 16 shows an example in which the merging decision is difficult, and FIG. 17 shows an example of character merging after the language model is applied.

As shown in FIG. 17, after the characters are merged and before they are assembled into a string, the character verification engine 780 applies a language model to identify the language of the merged characters and verifies, among other things, whether they were merged in a manner appropriate for that language and whether the text mixes different languages.

Meanwhile, referring to FIG. 6, once characters have been extracted by the natural image character extraction engine 600, the multilingual character recognition engine 640 recognizes them. The multilingual character recognition engine 640 is described next.

The multilingual character recognition engine 640 analyzes and recognizes the string image produced by the natural image character extraction step and outputs the result as text. It includes a string recognizer 645 and an image recognizer 650. The string recognizer 645 recognizes strings in ordinary printed form, and the image recognizer 650 recognizes characters, logos, and the like that contain many graphic elements.

Depending on the type of service, the string recognizer 645 and the image recognizer 650 can be used together or individually. In most situations recognition is performed by the string recognizer 645 alone, and the image recognizer 650 is used in a limited way for logos and other targets that are difficult to recognize as strings.

FIG. 18 is a block diagram of the string recognizer 645, which is described next.

The string recognizer 645 uses over-segmentation-based character recognition. In this approach, the string image is finely split at the points where character boundaries are expected, and the resulting segments are then combined and recognized while searching for the optimal combination. This avoids the failure that occurs when two touching characters end up in a single segment and cannot be recognized correctly. String recognition comprises the following steps: over-segmentation, combination and recognition of the over-segmented segments, post-processing using misrecognition pattern information, and post-processing using inter-word combination information.
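The search for the optimal segment combination can be sketched with dynamic programming. This is an illustration only: the patent does not say how the search is carried out, so the additive log-score, the limit of three segments per character, and the stub recognizer (a lookup table of hypothetical scores) are assumptions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A recognizer takes a group of consecutive segments and returns
# (character, log-score), or None when the group is not a plausible character.
Recognizer = Callable[[Tuple[str, ...]], Optional[Tuple[str, float]]]


def best_combination(segments: Sequence[str], recognize: Recognizer,
                     max_group: int = 3) -> Optional[str]:
    """Find the grouping of consecutive segments with the highest total score."""
    n = len(segments)
    # best[i] holds (total score, decoded text) for the first i segments.
    best: List[Optional[Tuple[float, str]]] = [None] * (n + 1)
    best[0] = (0.0, "")
    for end in range(1, n + 1):
        for size in range(1, min(max_group, end) + 1):
            start = end - size
            if best[start] is None:
                continue
            result = recognize(tuple(segments[start:end]))
            if result is None:
                continue
            char, score = result
            candidate = (best[start][0] + score, best[start][1] + char)
            if best[end] is None or candidate[0] > best[end][0]:
                best[end] = candidate
    return best[n][1] if best[n] else None


# Stub recognizer with hypothetical scores: the pieces "c" and "l" are
# individually plausible, but together they score better as the letter "d".
TABLE = {
    ("c",): ("c", -1.0),
    ("l",): ("l", -1.0),
    ("c", "l"): ("d", -0.4),
    ("o",): ("o", -0.2),
}
print(best_combination(["c", "l", "o"], TABLE.get))  # do
```

In a fuller pipeline, the language model and misrecognition patterns mentioned above would adjust these scores or re-rank the result after the search.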

FIG. 19 shows an example of the over-segmentation recognition process, which has separate over-segmentation procedures for horizontal and vertical writing. In both, the combination and recognition of over-segmented segments uses a language model to improve the accuracy of the combination.

The recognizers include a Korean recognizer, a Latin recognizer, a Chinese recognizer, and a Japanese recognizer.

The Korean string recognition engine is based on the common character recognition framework and can be built through Korean font collection, individual character training, and string recognizer development and tuning.

Because a recognizer is developed by training and tuning on the collected font information, accurate recognition cannot be expected for fonts that were not used in training. Font information is therefore the most important element of the recognizer, and a wide variety of font information is required.

For Korean fonts, existing font information is used for the Myeongjo, Gothic, and Gungseo families, and font families with added graphic elements are additionally collected to widen the range of recognizable fonts. Because the collected font information is extensive, it is managed with a dedicated tool, the font DB management tool. FIG. 20 shows an example of the Korean fonts to be collected.

The Korean single-character recognizer learns the collected characters in order to recognize Korean characters, and training and testing are repeated while the training parameters are adjusted. The training parameters are the parameters used in the training process, that is, in extracting features from each character and in learning from the extracted features, and recognizer performance varies with their values. Building a high-performance learner therefore requires finding suitable parameter values through repeated training and testing. Single-character training receives the collected font information from the font DB management tool and is carried out with the character recognition learning tool.

Based on the engine common framework, the Korean string recognizer includes a segment combination routine, a language model usage routine, and a segment combination post-processing routine suited to Korean string recognition. The Korean segment combination routine and the combination post-processing routine use information on the jaso (grapheme) characteristics of Hangul to improve combination performance, and the language model usage routine likewise works on text split into Hangul jaso units to improve processing accuracy. Using character image samples acquired in the actual usage environment, the characteristics of Korean are analyzed and tuning is carried out so that recognition suits Korean characters.

Like the Korean recognition engine, the Latin string recognition engine is based on the engine common framework and can be built through font collection, single-character training, and development and tuning of the string recognizer. Latin characters consist of the basic English alphabet and the additional Latin characters used in the countries of Europe. The additional Latin characters carry a dot-like diacritic above or below a basic English letter and differ in form from the characters of other languages, so the engine includes routines that take diacritic characteristics into account. FIG. 21 shows an example of the Latin fonts to be collected.

To obtain a single-character recognizer suited to Latin characters, Latin single-character training is carried out by repeating training and testing while the training parameters are adjusted. Based on the common framework, the Latin string recognizer includes a segment combination routine, a language model usage routine, and a segment combination post-processing routine suited to Latin string recognition, and it is tuned to suit Latin characters using actual character image samples.

The Chinese recognizer is described next. First, Chinese fonts are collected. Because Chinese uses many unusual, handwriting-style fonts compared with other languages, collection focuses on unusual fonts in addition to the standard ones, and a Chinese single-character recognizer is then developed.

Chinese and Japanese have far more character classes to recognize than Korean and Latin, so recognition takes a long time. A clustering concept is therefore introduced into the single-character recognizer to shorten recognition time. To avoid comparing the input against every class during recognition, classes of similar shape are grouped into clusters, and only the classes inside the similar clusters are compared.
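The two-stage comparison described above can be sketched as follows. This is an illustration only, assuming each class is represented by one reference vector and that cluster centers are given in advance; the names `build_clusters`, `classify`, and `top_clusters`, the single assignment pass (rather than a full clustering algorithm), and the 2-D toy vectors are assumptions, not details from the specification.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Cluster = Tuple[Vector, List[str]]  # (cluster center, labels of member classes)


def build_clusters(class_refs: Dict[str, Vector],
                   centers: List[Vector]) -> List[Cluster]:
    """Assign every class to its nearest center and drop empty clusters."""
    members: List[List[str]] = [[] for _ in centers]
    for label, ref in class_refs.items():
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist(ref, centers[i]))
        members[nearest].append(label)
    return [(c, m) for c, m in zip(centers, members) if m]


def classify(x: Vector, class_refs: Dict[str, Vector],
             clusters: List[Cluster], top_clusters: int = 1) -> Tuple[str, int]:
    """Compare x only with classes in the closest clusters.
    Returns the winning label and the number of classes actually compared."""
    ranked = sorted(clusters, key=lambda c: math.dist(x, c[0]))
    candidates = [label for _, labels in ranked[:top_clusters]
                  for label in labels]
    best = min(candidates, key=lambda label: math.dist(x, class_refs[label]))
    return best, len(candidates)


# Toy 2-D reference vectors (made-up values).
refs = {"一": (0.0, 0.0), "二": (0.0, 1.0), "日": (10.0, 10.0), "目": (10.0, 11.0)}
clusters = build_clusters(refs, [(0.0, 0.5), (10.0, 10.5)])
label, compared = classify((9.8, 10.9), refs, clusters)
```

Here only two of the four classes are compared. Raising `top_clusters` trades speed for robustness when an input lies near a cluster boundary.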

To create the Chinese string recognizer, a segment combination routine, a language model usage routine, and a segment combination post-processing routine suited to Chinese string recognition are included on the basis of the common framework, and tuning suited to Chinese characters is carried out using actual character image samples.

The Japanese recognizer is described next. First, Japanese fonts are collected and a Japanese single-character recognizer is created. As with Chinese, clustering is introduced to shorten recognition time, and training and testing are repeated to produce an optimal single-character recognizer.

To create the Japanese string recognizer, the segment combination routine, the language model usage routine, and the segment combination post-processing routine are modified to suit Japanese string recognition on the basis of the common framework, and tuning suited to Japanese characters is carried out using actual character image samples.

Referring to FIG. 18, the string recognizer 645 performs multilingual recognition after over-segmentation and then performs recognition post-processing. Recognition post-processing consists of post-processing using misrecognition pattern information, followed by post-processing using inter-word combination information.

Post-processing using misrecognition pattern information and post-processing using inter-word combination information improve recognition performance when a damaged string image is input and a normal result cannot be obtained by the pure string recognition process alone.

Post-processing using misrecognition pattern information corrects the misrecognition of each recognized word using the misrecognition pattern information and word frequency information of the language model, and additionally uses information such as context and character size. Post-processing using inter-word combination information corrects wrong word-to-word combinations using the inter-word combination information of the language model. FIG. 22 shows an example of post-processing a recognition result for a damaged image: when a damaged string image is input, over-segmentation and segment combination recognition are performed, then post-processing with misrecognition patterns, then post-processing with inter-word combination information, so that the misrecognition is corrected. In this process the misrecognition is corrected using the misrecognition pattern information and the inter-word combination information held in the language model.
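The two post-processing passes can be illustrated with a small sketch. It assumes that misrecognition patterns are stored as substring substitutions, that word frequencies and word-pair counts are plain dictionaries, and that the recognizer supplies alternative readings for a word; the function names `correct_word` and `pick_by_word_pair` and all sample data are hypothetical, and context and character size are not modeled.

```python
from typing import Dict, List, Tuple


def correct_word(word: str, patterns: Dict[str, List[str]],
                 word_freq: Dict[str, int]) -> str:
    """Pass 1: if the word is unknown, undo one known misrecognition pattern
    and keep the most frequent dictionary word that results."""
    if word in word_freq:
        return word
    candidates = set()
    for wrong, rights in patterns.items():
        start = word.find(wrong)
        while start != -1:
            for right in rights:
                candidates.add(word[:start] + right + word[start + len(wrong):])
            start = word.find(wrong, start + 1)
    known = sorted(c for c in candidates if c in word_freq)
    return max(known, key=lambda c: word_freq[c]) if known else word


def pick_by_word_pair(prev: str, readings: List[str],
                      pair_freq: Dict[Tuple[str, str], int]) -> str:
    """Pass 2: among alternative readings, prefer the one that most often
    follows the previous word; ties keep the recognizer's first choice."""
    return max(readings, key=lambda w: pair_freq.get((prev, w), 0))


# Made-up sample data.
patterns = {"rn": ["m"], "0": ["O", "o"], "1": ["l"]}
word_freq = {"menu": 50, "coffee": 120, "cold": 30}
pair_freq = {("iced", "coffee"): 40}

fixed = [correct_word(w, patterns, word_freq) for w in ["rnenu", "c0ld", "xyz"]]
chosen = pick_by_word_pair("iced", ["coffer", "coffee"], pair_freq)
```

Words that match no pattern, such as "xyz" above, are left unchanged rather than forced onto a dictionary entry.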

Meanwhile, referring to FIG. 6, once characters have been extracted by the natural image character extraction engine 600, multilingual characters are recognized by the multilingual character engine 640, and image matching is performed at this time. Image matching is described below.

FIG. 23 shows the configuration of image matching. Image matching is used to recognize logos, characters rendered as images, and the like. It consists of feature extraction, which extracts and vectorizes the key parts of an image, and feature matching, which searches the image matching DB with the vectorized feature information to match the features of the image.

Feature extraction vectorizes the key parts that characterize the image well, using information from their surroundings. Elements that characterize an image well include corner points and junction points, and these elements are extracted. The surrounding information of each element is vectorized using, for example, the distribution of pixel values. Because the features used for image matching must cope with changes in image size, rotation, and partial occlusion, local features are preferable to global features. Representative algorithms include SIFT and SURF. SIFT (Scale-Invariant Feature Transform) generates features that are robust to scale, rotation, the affine transformation of a 3D projection, and light intensity, and it underlies camera-based object recognition and image matching technology. SURF (Speeded Up Robust Features) is a sped-up counterpart to SIFT that is fast and accurate and is suitable for use on mobile terminals. FIG. 24 shows an example of feature point extraction in image matching.

Feature matching compares the image feature information with the information in the image matching DB and finds the most similar image information. FIG. 25 shows an example of image matching. Each feature in the pre-registered image matching DB is compared with the feature information extracted from the input image to find the most similar image. Feature comparison in general character recognition often uses boundary-based methods (Discriminative Methods), typified by neural networks, but these methods suit only a small number of classes. Trademark recognition has a large number of target classes, so such methods cannot be used, and centroid-based methods (Generative Methods), which suit many classes, can deliver high performance. In image matching, thousands of feature points are extracted and used per image, and using all of their dimensions makes the computation excessive and greatly lowers recognition speed. Performance is therefore improved by applying techniques that optimize the amount of computation while keeping the recognition rate nearly the same, such as feature dimensionality reduction and match candidate reduction (candidate selection through clustering). FIG. 26 shows candidate selection through clustering.
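As a rough illustration of finding the most similar registered image, the sketch below lets each local descriptor of the input vote for the DB image that owns its nearest descriptor. It is a brute-force toy, assuming descriptors are short numeric tuples and using a hypothetical distance cutoff `max_dist`; the specification does not state the matching rule, and a practical system would add the dimensionality reduction and candidate selection mentioned above.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

Descriptor = Sequence[float]


def match_image(query: List[Descriptor],
                db: Dict[str, List[Descriptor]],
                max_dist: float = 0.5) -> Optional[str]:
    """Return the name of the DB image that collects the most descriptor
    votes, or None when no query descriptor has a close enough neighbor."""
    votes: Counter = Counter()
    for q in query:
        best_name, best_dist = None, max_dist
        for name, descriptors in db.items():
            for d in descriptors:
                distance = math.dist(q, d)
                if distance < best_dist:
                    best_name, best_dist = name, distance
        if best_name is not None:
            votes[best_name] += 1
    return votes.most_common(1)[0][0] if votes else None


# Made-up 2-D descriptors for two registered logos.
image_db = {
    "logo_a": [(0.0, 0.0), (1.0, 1.0)],
    "logo_b": [(5.0, 5.0), (6.0, 6.0)],
}
result = match_image([(0.1, 0.0), (1.0, 0.9), (5.1, 5.0)], image_db)
```

Voting over local descriptors tolerates partial occlusion, since an image can still win when only some of its features are visible.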

Meanwhile, referring to FIG. 6, once characters have been extracted by the natural image character extraction engine 600, multilingual characters are recognized by the multilingual character engine 640, and the recognized characters are translated into other languages by the multilingual translation engine 680. The multilingual translation engine 680 is described below.

The multilingual translation function includes a unidirectional 1:N translation method and a bidirectional 1:1 translation method. The unidirectional 1:N method is used when two or more translated meanings exist, as with general terms where one word translates into several meanings. The bidirectional 1:1 method is used when there is a single translated meaning; in a specialized field one word can generally be translated into one meaning, so this method is used for specialized-term dictionary translation. The multilingual translation engine shortens processing time using table indexing and sorting techniques. FIG. 27 shows a unidirectional 1:N translation table as an example of a Korean-English general-term translation DB, and a bidirectional 1:1 translation table as an example of a Korean-Latin specialized-term translation DB.
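The difference between the two table types can be sketched with in-memory dictionaries. This is a minimal illustration, not the DB layout of FIG. 27: the class names `OneToManyTable` and `OneToOneTable`, their methods, and the sample entries are assumptions, and hash lookup merely stands in for the table indexing mentioned above.

```python
from typing import Dict, List, Optional


class OneToManyTable:
    """Unidirectional 1:N table: one source word, several target meanings."""

    def __init__(self) -> None:
        self._table: Dict[str, List[str]] = {}

    def add(self, source: str, target: str) -> None:
        meanings = self._table.setdefault(source, [])
        if target not in meanings:
            meanings.append(target)

    def translate(self, source: str) -> List[str]:
        return list(self._table.get(source, []))


class OneToOneTable:
    """Bidirectional 1:1 table: each term pairs with exactly one term."""

    def __init__(self) -> None:
        self._forward: Dict[str, str] = {}
        self._backward: Dict[str, str] = {}

    def add(self, a: str, b: str) -> None:
        # Remove stale pairs first so the mapping stays one-to-one.
        old_b = self._forward.pop(a, None)
        if old_b is not None:
            self._backward.pop(old_b, None)
        old_a = self._backward.pop(b, None)
        if old_a is not None:
            self._forward.pop(old_a, None)
        self._forward[a] = b
        self._backward[b] = a

    def forward(self, a: str) -> Optional[str]:
        return self._forward.get(a)

    def backward(self, b: str) -> Optional[str]:
        return self._backward.get(b)


# Illustrative entries only.
general = OneToManyTable()
for meaning in ("pear", "ship", "belly"):
    general.add("배", meaning)

menu_terms = OneToOneTable()
menu_terms.add("김치찌개", "kimchi stew")
```

A specialized table can be queried in either direction with one lookup, while a general-term lookup returns every candidate meaning and leaves the choice to a later step.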

Meanwhile, referring to FIG. 5, a more specific embodiment of the configuration of the present invention includes an engine common framework 500 and a management tool 550, and is additionally provided with commercial software 580 built on the engine common framework 500. The management tool 550 corresponds to the management tool 150 of FIG. 1 and is described below.

The natural image character recognition and translation system and method according to the present invention perform recognition and translation using the DBs required for recognition, and the final commercial SW module is produced and distributed only after engine tuning, testing, and verification.

The process from configuring these lower-level engines to generating the commercial SW module needs to proceed efficiently. A dedicated management tool is therefore needed so that the relevant data are shared effectively at every stage and each piece of data is managed systematically.

FIG. 28 shows the configuration of the management tool 550. The management tool 550 must manage the image matching DB, language model DB, character recognition model DB, and translation DB used in the recognition and translation process. It also needs a performance tuning function, functions for testing and distributing the tuned engine, and a learning function for generating the character recognition model DB.

To support these functions, the management tool 550 comprises a language-specific DB management tool 2800, a character recognition engine learning tool 2820, a character recognition performance tuning tool 2840, a language-specific test bed 2860, and a distribution management tool 2880. The recognition engine requires the language-specific DB management tool 2800, the character recognition engine learning tool 2820, and the character recognition performance tuning tool 2840.

FIG. 29 shows the configuration of the language-specific DB. The language-specific DB management tool manages the font DB, character recognition model DB, language model DB, image matching DB, and translation DB, and provides the registration, modification, deletion, and lookup functions needed to manage them.

Turning to font DB management, the font DB stores the single-character information used for recognizer training, organized by font type. Each item of single-character information consists of a character image and the label of that image. FIG. 30 shows an example of font information and font DB management: font information is classified and managed by language type, font name, font size, and font type. Font information is collected in two forms, real characters and virtual characters. FIG. 31 shows how the font DB is built, covering the addition of real character samples and the addition of virtual character samples. Adding real character samples involves photographing the physical object, segmenting the characters, labeling each character, verifying and correcting the result, and then generating a character information file. Adding virtual character samples involves deciding settings such as font type and font size suited to the specialized field and running an automated program to generate the character information file.

The language-specific DB management tool 2800 also manages the character recognition model DB. Specifically, it manages the list of character recognition models generated by the character recognition engine learning tool.

The language-specific DB management tool 2800 also manages the language model DB, which consists of word information, single-character information, and misrecognition pattern information. Word information is organized and managed as the word, its frequency of occurrence, and inter-word combination information. Single-character information is organized and managed as the character, its frequency of occurrence, and inter-character combination information. Misrecognition pattern information stores the patterns of words that are prone to misrecognition so that misrecognition can be corrected from the pattern information during recognition; it is organized and managed as the word and the misrecognition patterns of that word. FIG. 32 is a flowchart of language model DB construction: word information is added to build the word information DB, single-character information is added to build the single-character information DB, and misrecognition pattern information is added to build the misrecognition pattern information DB.
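The three parts of the language model DB can be pictured as counters filled from sample text. The sketch below is an assumption-laden illustration: the class `LanguageModelDB`, its field and method names, whitespace tokenization, and the use of adjacent pairs as the "combination information" are all choices made for the example, not details given in the specification.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class LanguageModelDB:
    word_freq: Counter = field(default_factory=Counter)   # word -> count
    word_pairs: Counter = field(default_factory=Counter)  # (word, next word) -> count
    char_freq: Counter = field(default_factory=Counter)   # character -> count
    char_pairs: Counter = field(default_factory=Counter)  # (char, next char) -> count
    misrecognition: Dict[str, Set[str]] = field(default_factory=dict)

    def add_text(self, text: str) -> None:
        """Update word and single-character statistics from one sample string."""
        words = text.split()
        self.word_freq.update(words)
        self.word_pairs.update(zip(words, words[1:]))
        for word in words:
            self.char_freq.update(word)
            self.char_pairs.update(zip(word, word[1:]))

    def add_misrecognition(self, word: str, pattern: str) -> None:
        """Record one misread form observed for a word."""
        self.misrecognition.setdefault(word, set()).add(pattern)


lm = LanguageModelDB()
lm.add_text("iced coffee iced tea")
lm.add_misrecognition("coffee", "c0ffee")
```

Keeping the three kinds of information in one object mirrors the way the specification groups them under a single language model DB per language.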

The language-specific DB management tool 2800 also manages the image matching DB, which consists of original image information (image buffer, meaning, category) and feature extraction information (image feature information and the meaning of the image as text). FIG. 33 is a flowchart of image matching DB construction: original image information is added to build the original image DB, and feature extraction information is added to build the image feature DB.

The language-specific DB management tool 2800 also manages the translation DB. Translation information is defined as a word in the reference language and the foreign-language word that matches it, and the translation DB is organized and managed as a reference-language table and a matching foreign-language word table. Depending on the translation method, the translation DB consists of a general-term translation DB that uses the unidirectional 1:N method and a specialized-term translation DB that uses the bidirectional 1:1 method. FIG. 34 shows the configuration of translation DB management, which includes the general-term translation DB and the specialized-term translation DB.

The management tool 550 according to the present invention comprises the language-specific DB management tool 2800, the character recognition engine learning tool 2820, the character recognition performance tuning tool 2840, the language-specific test bed 2860, and the distribution management tool 2880. The character recognition engine learning tool 2820 is described next.

The character recognition engine learning tool 2820 takes the single-character information of the font DB as input and generates a trained character recognition model. FIG. 35 shows the configuration of the character recognition engine learning tool 2820.

Character recognition engine training consists of per-class feature extraction, per-class feature dimensionality reduction, and a learning step. Per-class feature extraction extracts the image characteristics of each character and quantifies the extracted information. The feature used is the commonly used boundary information of the character, and the quantified result is expressed as a vector. FIG. 36 illustrates the feature extraction process: a single-character image is input, feature information is extracted, and the features are vectorized.

Per-class dimensionality reduction reconstructs the feature vector by keeping only the important components of the feature vector extracted in the previous step, in order to reduce the memory and computation used for training and recognition. The widely used FDA (Fisher's Discriminant Analysis) method is used for this purpose.
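To give a feel for what Fisher's discriminant does, the sketch below handles only the simplest case: two classes of 2-D feature vectors reduced to one dimension along the direction proportional to the inverse within-class scatter times the difference of class means. The function names `fisher_direction` and `project` and the sample points are assumptions for illustration; the multi-class, high-dimensional form that a character recognizer would need requires an eigen-decomposition that is not shown here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fisher_direction(class_a: List[Point], class_b: List[Point]) -> Point:
    """Unit vector w proportional to Sw^-1 (mean_a - mean_b) for 2-D data."""

    def mean(points: List[Point]) -> Point:
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def scatter(points: List[Point], m: Point) -> Tuple[float, float, float]:
        sxx = sxy = syy = 0.0
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
        return sxx, sxy, syy

    mean_a, mean_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mean_a), scatter(class_b, mean_b)
    sxx, sxy, syy = sa[0] + sb[0], sa[1] + sb[1], sa[2] + sb[2]
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] applied to (dx, dy).
    wx = (syy * dx - sxy * dy) / det
    wy = (-sxy * dx + sxx * dy) / det
    norm = math.hypot(wx, wy)
    if norm == 0:
        raise ValueError("class means coincide")
    return (wx / norm, wy / norm)


def project(point: Point, w: Point) -> float:
    """Reduce a 2-D feature to one dimension along w."""
    return point[0] * w[0] + point[1] * w[1]


# Made-up data: wide spread along x, but the classes differ only along y.
class_a = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
class_b = [(0.0, 5.0), (4.0, 5.0), (0.0, 6.0), (4.0, 6.0)]
w = fisher_direction(class_a, class_b)
```

The direction found ignores the uninformative x axis even though it carries the most variance, which is the property that makes this kind of reduction useful before classification.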

In the learning step, the vectors of the characters belonging to each class are used to generate a reference vector that represents the class.
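One simple way to obtain a representative vector per class is to average the class's sample vectors and then recognize an input by its nearest reference vector. The specification does not say how the reference vector is computed, so the averaging, the Euclidean distance, and the names `train_reference_vectors` and `recognize` below are assumptions made for this sketch.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_reference_vectors(
        samples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each class into one reference vector."""
    references: Dict[str, List[float]] = {}
    for label, vectors in samples.items():
        if not vectors:
            continue  # a class without samples gets no reference vector
        count = len(vectors)
        references[label] = [sum(column) / count for column in zip(*vectors)]
    return references


def recognize(x: Vector, references: Dict[str, List[float]]) -> str:
    """Return the label whose reference vector is closest to x."""
    return min(references, key=lambda label: math.dist(x, references[label]))


# Made-up 2-D feature vectors for two Hangul characters.
training = {
    "가": [(0.0, 0.0), (2.0, 0.0)],
    "나": [(10.0, 10.0), (10.0, 12.0)],
}
reference_vectors = train_reference_vectors(training)
predicted = recognize((1.5, 0.5), reference_vectors)
```

Storing one vector per class keeps the model small, which matters when the number of classes is large.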

The management tool 550 according to the present invention comprises the language-specific DB management tool 2800, the character recognition engine learning tool 2820, the character recognition performance tuning tool 2840, the language-specific test bed 2860, and the distribution management tool 2880. The character recognition performance tuning tool 2840 is described next.

FIG. 37 is a flowchart of the character recognition performance tuning tool 2840. The character recognition performance tuning tool 2840 is needed to tune the engine efficiently for a specialized field. Using the DB information received from the language-specific DB management tool and test samples from the specialized field, it can select the DBs and the engine tuning information that the field requires. Engine tuning information can be generated by testing images that were either acquired directly in the specialized field or generated virtually. The operator sets each tuning option by hand and visually checks the processed image or the report, repeating this cycle until the optimal settings are found.

Whether each image pre-processing filter is used, and the recognition tuning elements, are set so that only the methods suited to the specialized field are applied, which removes unnecessary processing and raises processing accuracy. The image tuning elements include a filter that removes the blurring effect, a filter that removes the transparency effect, a filter that removes the reflection effect, a filter that removes the occlusion effect, a filter that removes the shadow effect, a binarization algorithm suited to natural images, whether italic processing is applied, and the inter-character spacing setting.

The management tool 550 according to the present invention comprises the language-specific DB management tool 2800, the character recognition engine learning tool 2820, the character recognition performance tuning tool 2840, the language-specific test bed 2860, and the distribution management tool 2880. The language-specific test bed 2860 and the distribution management tool 2880 are described next.

FIG. 38 shows the configuration of the language-specific test bed 2860. The language-specific test bed 2860 is a system that receives the optimally tuned DB information and the optimally tuned engine information from the character recognition performance tuning tool, assembles the actual engine and DBs, and runs tests. It comprises test samples, ground truth information for the samples, and a performance measurement and reporting tool.

The test samples contain image data and are built by acquiring image data from the real environment. The ground truth information is built by an operator who manually enters, for each test sample, the items used for performance measurement, such as the correct recognized text and the character regions.

The performance measurement and reporting tool covers the string extraction rate, the string recognition rate, and the translation success rate, and supports batch processing so that large numbers of test samples can be tested easily.
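A batch measurement of the string recognition rate against ground truth might look like the sketch below. The specification names the metrics but does not define them, so the exact-match definition of the string rate, the edit-distance-based character accuracy, and the names `edit_distance` and `evaluate` are assumptions; extraction rate and translation success rate are not covered.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def evaluate(results: List[Tuple[str, str]]) -> Dict[str, float]:
    """results holds (recognized text, ground truth text) pairs."""
    if not results:
        return {"string_rate": 0.0, "char_accuracy": 0.0}
    exact = sum(recognized == truth for recognized, truth in results)
    total_chars = sum(len(truth) for _, truth in results)
    errors = sum(edit_distance(recognized, truth)
                 for recognized, truth in results)
    char_accuracy = max(0.0, 1 - errors / total_chars) if total_chars else 0.0
    return {"string_rate": exact / len(results), "char_accuracy": char_accuracy}


# Made-up batch of two test samples.
report = evaluate([("menu", "menu"), ("c0ffee", "coffee")])
```

Reporting a character-level figure alongside the exact-match rate helps distinguish near misses from complete failures when tuning settings are compared.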

FIG. 39 shows the configuration of the distribution management tool 2880. The distribution management tool receives the optimally tuned DB information and the optimally tuned engine information from the character recognition performance tuning tool, assembles the engine and DBs in a form that can actually be used, and generates a library suited to the target service environment. Besides library generation, it can also manage distribution information and the version information of distributed libraries.

Meanwhile, FIG. 5 is a block diagram of a more specific embodiment of the configuration of the present invention described above. It includes the engine common framework 500 and the management tool 550, and is additionally provided with commercial software 580 built on the engine common framework 500. The commercial software 580 is described briefly below.

To meet the requirements of client organizations such as portal operators and mobile carriers, customized modules and apps are provided on the basis of the engine technology for processing natural images acquired with a mobile camera and extracting character regions. FIG. 40 shows the configuration of the commercial software. The provided modules consist of smartphone modules such as a signboard recognition system, terminal-embedded modules such as a multilingual translation service, and server & package modules such as a real estate information service.

The signboard recognition system acquires a natural image that contains a signboard and extracts the specific data of that signboard. Character recognition is performed on the extracted data with a lightweight engine and a lightweight DB, and, based on the recognized result, the system presents the store information it finds or serves information on nearby stores to the user.

The multilingual translation service acquires a natural image of, for example, a menu board and extracts the specific data of the menu board. Character recognition is performed on the strings in the menu board with the recognition engine and a specialized recognition DB, and the recognized result is translated by the multilingual translation module into multiple languages (Korean, Latin-script languages, Chinese, Japanese) and served to the user.

The real estate information service is a module that acquires a natural image of a new-address sign, extracts the characters within the sign area of the image, recognizes the extracted characters, attaches additional information to the recognized new-address information, and transmits it to a server, thereby retrieving the corresponding real estate information stored on the server. The module is configured to support real estate information services and various application services. For general users, who mainly check simple information, it is configured as a package so that image acquisition, character extraction and recognition, and server interworking are all handled on the user's terminal. For staff users, who have heavy usage and must check a large amount of information, only image acquisition and server interworking are handled on the terminal: the image is uploaded to the server, the server performs character extraction and recognition, and only the relevant information found by a DB search is transmitted to the terminal.
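
The two deployment forms of the real estate information service can be sketched as a simple routing function. This is only an illustrative sketch: the names `lookup_property`, `recognize_on_terminal`, `recognize_on_server`, the user type labels, and the dummy DB record are assumptions made for the example and do not appear in the specification.

```python
from typing import Callable, Dict, Optional


def lookup_property(image: bytes, user_type: str,
                    recognize_on_terminal: Callable[[bytes], str],
                    recognize_on_server: Callable[[bytes], str],
                    property_db: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Route one request according to the two deployment forms described above."""
    if user_type == "general":
        # Package form: extraction and recognition run on the terminal,
        # and only the recognized address is sent to the server.
        address = recognize_on_terminal(image)
        recognized_on = "terminal"
    elif user_type == "staff":
        # Server form: the terminal uploads the image and the server recognizes it.
        address = recognize_on_server(image)
        recognized_on = "server"
    else:
        raise ValueError(f"unknown user type: {user_type}")
    # In both forms the server-side DB search returns only the matching record.
    return {"recognized_on": recognized_on, "address": address,
            "info": property_db.get(address)}


# Stub recognizer and DB, for illustration only (hypothetical data).
fake_recognizer = lambda image: image.decode("utf-8")
demo_db = {"Sejong-daero 110": "office building (dummy record)"}
general = lookup_property(b"Sejong-daero 110", "general",
                          fake_recognizer, fake_recognizer, demo_db)
staff = lookup_property(b"Sejong-daero 110", "staff",
                        fake_recognizer, fake_recognizer, demo_db)
```

Both calls return the same record; they differ only in where recognition is assumed to run.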

Meanwhile, FIG. 41 is a flowchart of an embodiment of the method for multilingual character recognition and translation of natural images using a mobile camera according to the present invention.

The method for multilingual character recognition and translation of natural images using a mobile camera is described with reference to FIG. 41. First, a database must be built of the characters and words, inter-character combinations, inter-word combination information, and misrecognition pattern information specialized for the application in which character recognition and translation are performed. With the database built, the character recognition/translation engine, which provides a framework common to the applications, extracts the character string image present in the image acquired by the mobile camera (step S4100).

Next, the character recognition/translation engine analyzes and recognizes the extracted string image and outputs it as text (step S4120). In analyzing and recognizing the string image, it is preferable to finely divide the string image at the predicted character boundary points and then combine and recognize the resulting segments to search for the optimal combination. The text is then translated into another language (step S4140).
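
The segment-combination search in step S4120 can be illustrated with a small dynamic program. The specification does not state which search procedure is used, so the following is a sketch under assumptions: `best_segmentation`, the `score` callback (a stand-in for the character recognizer that returns a label and a log-confidence), the `max_merge` limit, and the toy scores are all hypothetical.

```python
from math import inf
from typing import Callable, List, Tuple


def best_segmentation(cuts: List[int],
                      score: Callable[[int, int], Tuple[str, float]],
                      max_merge: int = 3) -> Tuple[str, float]:
    """Find the highest-scoring way to merge adjacent segments into characters.

    cuts  : sorted candidate split positions, including both ends of the string image
    score : score(left, right) -> (label, log_confidence) for the span between two cuts
    """
    n = len(cuts)
    # best[j] = (total score, text) of the best reading of the image up to cuts[j]
    best: List[Tuple[float, str]] = [(-inf, "")] * n
    best[0] = (0.0, "")
    for j in range(1, n):
        # Try merging up to max_merge adjacent segments into one character.
        for i in range(max(0, j - max_merge), j):
            if best[i][0] == -inf:
                continue
            label, s = score(cuts[i], cuts[j])
            total = best[i][0] + s
            if total > best[j][0]:
                best[j] = (total, best[i][1] + label)
    return best[-1][1], best[-1][0]


# Toy recognizer: "W" was over-split into two pieces that each look like "V".
toy_scores = {(0, 10): ("A", -0.1), (10, 20): ("V", -1.0),
              (20, 30): ("V", -1.0), (10, 30): ("W", -0.2)}
toy_score = lambda left, right: toy_scores.get((left, right), ("?", -5.0))
text, total = best_segmentation([0, 10, 20, 30], toy_score)
```

In this toy case the merged reading "AW" outscores "AVV", which is the kind of outcome the over-segment-then-combine approach is meant to produce.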

FIG. 42 shows the string image extraction in more detail. Referring to FIG. 42, the input image file is first converted into a form suitable for image processing, and distortion contained in the image is corrected (step S4200). Character region candidates are selected in the input image (step S4220); for the character regions among the candidates, the background image is separated from the character string, and the extracted character string is separated into individual characters (step S4240). The characters are then verified through language-specific character extraction by applying a language model DB comprising words, word occurrence frequencies, and inter-word combination information; characters, character occurrence frequencies, and inter-character combination information; and misrecognition pattern information (step S4260).
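
One way the language model DB might be applied in the verification of step S4260 is sketched below. The specification does not describe the verification procedure itself, so this is an assumption: a candidate string is expanded with known misrecognition patterns and the most frequent known word is kept. The function `verify` and the sample frequencies and confusion pairs are hypothetical.

```python
from itertools import product
from typing import Dict, List


def verify(candidate: str,
           word_freq: Dict[str, int],
           confusions: Dict[str, List[str]]) -> str:
    """Return the most frequent known word reachable from `candidate`
    by undoing known misrecognition patterns, or `candidate` if none is found."""
    if candidate in word_freq:
        return candidate
    # For each character, consider the character itself plus its known confusions.
    options = [[ch] + confusions.get(ch, []) for ch in candidate]
    best, best_freq = candidate, 0
    # Exhaustive expansion: acceptable for short strings, exponential in general.
    for combo in product(*options):
        word = "".join(combo)
        freq = word_freq.get(word, 0)
        if freq > best_freq:
            best, best_freq = word, freq
    return best


# Hypothetical language-model data for a menu-board application.
demo_word_freq = {"COFFEE": 120, "MENU": 80}
demo_confusions = {"0": ["O"], "1": ["I", "L"], "5": ["S"]}
corrected = verify("C0FFEE", demo_word_freq, demo_confusions)
```

A production system would also use the combination information and frequencies jointly rather than word frequency alone; the sketch keeps only the part needed to show how misrecognition patterns feed the check.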

Here, the string image recognition and the translation of the recognized string use at least one of: a font DB that stores the character information used for training the recognizer, classified by font type; an image matching DB consisting of original image information and feature extraction information; a language model DB consisting of word information (words, word occurrence frequencies, and inter-word combination information), character information (characters, character occurrence frequencies, and inter-character combination information), and misrecognition pattern information (words that are easily misrecognized and their misrecognition patterns); a character recognition model DB that manages the list of character recognition models generated by the character recognition engine learning tool; and a translation DB, in the form of reference-language words and the foreign-language words matched to them, consisting of a reference-language table and the foreign-language word tables matched to it.
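
The five databases listed above could be modeled roughly as follows. The field names, types, and the `translate` helper are assumptions chosen for illustration; the specification names the databases and their contents but not their schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FontDB:
    # font name -> character samples used to train the recognizer
    chars_by_font: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class ImageMatchingDB:
    # original image id -> extracted feature vector
    features: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class LanguageModelDB:
    word_freq: Dict[str, int] = field(default_factory=dict)
    word_pair_freq: Dict[Tuple[str, str], int] = field(default_factory=dict)
    char_freq: Dict[str, int] = field(default_factory=dict)
    char_pair_freq: Dict[Tuple[str, str], int] = field(default_factory=dict)
    # easily misrecognized word -> observed misrecognition patterns
    misrecognition: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class RecognitionModelDB:
    # list of trained character recognition models (identifiers only here)
    models: List[str] = field(default_factory=list)


@dataclass
class TranslationDB:
    reference_words: List[str] = field(default_factory=list)
    # language code -> (reference word -> foreign word)
    foreign_tables: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def translate(self, word: str, lang: str) -> Optional[str]:
        """Look up the foreign word matched to a reference-language word."""
        return self.foreign_tables.get(lang, {}).get(word)


# Hypothetical sample entry.
demo_translation_db = TranslationDB(
    reference_words=["김치찌개"],
    foreign_tables={"en": {"김치찌개": "kimchi stew"}},
)
```

Keeping one foreign-language table per target language mirrors the reference-table-plus-matched-tables layout described in the text.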

The present invention can be implemented as computer-readable code on a computer-readable recording medium, where "computer" includes any device having an information processing function. The computer-readable recording medium includes every kind of recording device in which data readable by a computer system is stored. Examples of computer-readable recording devices include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and equivalent other embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical idea of the appended claims.

100: common character recognition/translation engine
110: natural image character region extraction unit
112: natural image preprocessing unit
114: character extraction unit
116: character verification unit
120: multilingual character recognition unit
122: string recognition unit
124: image recognition unit
130: multilingual translation unit
150: management tool
155: language-specific DB management tool
160: character recognition performance tuning tool
165: character recognition learning tool
170: language-specific test bed
175: distribution management tool
180: commercial software
185: smartphone software
190: terminal-embedded software
195: package software
199: server service software
200: feature extraction unit
250: feature matching unit
300: recognition/translation database
310: font DB
320: image matching DB
330: language model DB
340: character recognition model DB
350: translation DB
400: per-class feature extraction unit
420: per-class dimension reduction unit
440: learning unit

Claims

A natural-image multilingual character recognition system using a mobile camera, comprising:
a common character recognition/translation engine that extracts, recognizes, and translates characters present in an image acquired by a mobile camera; and
a management tool that manages a recognition/translation DB used to recognize and translate the extracted characters, builds a database of the characters and words, inter-character combinations, inter-word combination information, and misrecognition pattern information specialized for each application in which character recognition and translation are performed, and provides the database to the common character recognition/translation engine,
wherein the common character recognition/translation engine provides a framework usable in common across the applications in which character recognition and translation are performed, and recognizes and translates characters using the application-specific database provided by the management tool.

The system of claim 1, wherein the common character recognition/translation engine comprises:
a natural image character region extraction unit that searches a natural image photographed by a mobile camera for a region in which a character string exists and extracts the characters;
a multilingual character recognition unit that recognizes the character string of the extracted character region; and
a multilingual translation unit that searches a multilingual database to translate the recognized character string.

The system of claim 2, wherein the natural image character region extraction unit comprises:
a natural image preprocessing unit that converts the input image file into a form suitable for image processing and corrects distortion contained in the image;
a character extraction unit that selects character region candidates in the input image, separates the background image from the character string for the character regions among the candidates, and separates the extracted character string into individual characters; and
a character verification unit that verifies the characters through language-specific character extraction by applying a language model including at least inter-character combination information, inter-word combination information, and misrecognition pattern information.

The system of claim 2, wherein the multilingual character recognition unit comprises a string recognition unit that recognizes character strings in printed form,
and wherein the string recognition unit finely divides the string image at predicted character boundary points and then combines and recognizes the resulting segments to search for the optimal combination.

The system of claim 4, wherein the multilingual character recognition unit further comprises an image recognition unit that recognizes characters or logos containing graphic elements,
the image recognition unit comprising:
a feature extraction unit that extracts corner points and junction points representing characteristics of the image and vectorizes them using surrounding information; and
a feature matching unit that matches the vectorized corner point and junction point information against the image matching DB to find the most similar image information.

The system of claim 2, wherein the multilingual translation unit comprises:
a general term translation database in which a general word or word combination is matched to a plurality of words or word combinations; and
a specialized term translation database in which a specialized word or word combination is matched one-to-one,
wherein, when the recognized character string is a general word or word combination, it is translated into a plurality of words or word combinations with reference to the general term translation database, and
when the recognized character string is a specialized word or word combination, it is translated into a single word or word combination with reference to the specialized term translation database.

The system of claim 1, wherein the management tool additionally provides a performance tuning function and a learning function for generating a character recognition model DB, and comprises:
a language-specific DB management tool that manages the recognition/translation DB used to recognize and translate the extracted characters and provides the registration, modification, deletion, and query functions required for that management;
a character recognition performance tuning tool that provides the performance tuning function; and
a character recognition learning tool that provides the learning function for generating the character recognition model DB.

The system of claim 7, wherein the recognition/translation DB comprises at least one of:
a font DB that stores the character information used for training the recognizer, classified by font type;
an image matching DB consisting of original image information and feature extraction information;
a language model DB consisting of word information (words, word occurrence frequencies, and inter-word combination information), character information (characters, character occurrence frequencies, and inter-character combination information), and misrecognition pattern information (words that are easily misrecognized and their misrecognition patterns);
a character recognition model DB that manages the list of character recognition models generated by the character recognition learning tool; and
a translation DB, in the form of reference-language words and the foreign-language words matched to them, consisting of a reference-language table and the foreign-language word tables matched to it.

The system of claim 7, wherein the character recognition learning tool generates a trained character recognition model from the character information of the font DB as input, and comprises:
a per-class feature extraction unit that extracts image characteristics using the boundary information of each character and expresses them as a feature vector;
a per-class dimension reduction unit that reconstructs the feature vector by extracting only the important elements of its dimensions, in order to reduce the memory and computation used for learning and recognition; and
a learning unit that generates a reference vector representing each class using the vectors of that class.

The system of claim 7, wherein the character recognition performance tuning tool selects the DB and engine tuning information required for a specialized field, using the DB information received from the language-specific DB management tool and a specialized-field test, and performs engine tuning using the selected DB and engine tuning information.

The system of claim 7, further comprising:
a language-specific test bed that receives optimal tuning DB information and optimal tuning engine information from the character recognition performance tuning tool, reconfigures the engine, and tests it; and
a distribution management tool that receives the optimal tuning DB information and the optimal tuning engine information from the character recognition performance tuning tool, builds a usable engine and DB, generates a library for the environment to be serviced, and outputs a module for distribution.

A natural-image multilingual character recognition method using a mobile camera, comprising:
building a database of the characters and words, inter-character combinations, inter-word combination information, and misrecognition pattern information specialized for the application in which character recognition and translation are performed;
extracting, by a character recognition/translation engine that provides a framework common to the applications, a character string image present in an image acquired by a mobile camera;
analyzing and recognizing, by the character recognition/translation engine, the extracted character string image using the database, and outputting it as text; and
translating, by the character recognition/translation engine, the text into another language using a predetermined database.

The method of claim 12, wherein extracting the character string image comprises:
converting the input image file into a form suitable for image processing and correcting distortion contained in the image;
selecting character region candidates in the input image, separating the background image from the character string for the character regions among the candidates, and separating the extracted character string into individual characters; and
verifying the characters through language-specific character extraction by applying a language model including at least inter-character combination information, inter-word combination information, and misrecognition pattern information.

The method of claim 12, wherein analyzing and recognizing the string image comprises finely dividing the string image at predicted character boundary points and then combining and recognizing the resulting segments to search for the optimal combination.

The method of claim 12, wherein recognizing the string image and translating the recognized string use at least one of:
a font DB that stores the character information used for training the recognizer, classified by font type;
an image matching DB consisting of original image information and feature extraction information;
a language model DB consisting of word information (words, word occurrence frequencies, and inter-word combination information), character information (characters, character occurrence frequencies, and inter-character combination information), and misrecognition pattern information (words that are easily misrecognized and their misrecognition patterns);
a character recognition model DB that manages the list of character recognition models generated by the character recognition learning tool; and
a translation DB, in the form of reference-language words and the foreign-language words matched to them, consisting of a reference-language table and the foreign-language word tables matched to it.