KR102142238B1

KR102142238B1 - Method of extracting text information such as abbreviation, handwriting, atypical word and sentence included in a predetermined image and automatically translating the extraction result into a predetermined language

Info

Publication number: KR102142238B1
Application number: KR1020200023258A
Authority: KR
Inventors: 박남도
Original assignee: 주식회사 엔디소프트
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2020-08-07

Abstract

The present invention relates to a method of extracting text information, such as abbreviations, handwriting, and atypical words and sentences, included in a predetermined image and automatically translating the extraction result into a predetermined language. It discloses a new method that uses various data sets to extract the text information included in the image and then automatically translates it into a specific language. In the present invention, when a user photographs, through a terminal, something on which text in a given language is displayed and transmits the image to a server, the server automatically translates the text in real time into a language set by the user and provides the translation to the terminal. Therefore, when a user travelling abroad photographs menus, signboards, and the like using a terminal equipped with the app proposed in the present invention, the text can be translated into the user's native language and provided in real time.

Description

Method of extracting text information such as abbreviation, handwriting, atypical word and sentence included in a predetermined image and automatically translating the extraction result into a predetermined language

The present invention relates to a method of extracting text information, such as abbreviations, handwriting, and atypical words and sentences, included in a predetermined image and automatically translating the extraction result into a predetermined language. It proposes a new method that uses various data sets to extract the text information included in an image and then automatically translate it into a specific language.

In general, when existing IT technologies were developed, the data sets needed in various fields were built and used, but they had to be updated manually. Manual updating incurs ongoing labor and update costs, and it inevitably raises doubts about the accuracy and precision of AI training on such data sets and about the delay before results are obtained.

For example, suppose a program is developed that identifies a dog's name from a dog image. Existing techniques store a large number of dog images in a database on a server, match the dog image in question against those images (the data set), and return the name of the dog with the highest image similarity.

The problem with this approach is that when a new dog (a mixed breed) appears, images of the new dog must be newly stored on the server.

In addition, when a dog image that is not stored in the data set is matched against the server's data set, either no result is returned or an incorrect result is produced.

To solve these problems, what is needed is a technology and system in which the data set itself is AI-driven so that it can be updated continuously, together with a translation system that uses this system.

1. Patent Application No. 10-2013-0142744, Title of the invention: "Method for recognizing characters by adapting a captured image, and information processing device for performing the method"
2. Patent Application No. 10-2012-0104762, Title of the invention: "System and method for multilingual character recognition and translation of natural images using a mobile camera"

The present invention aims to propose a new method that extracts text information, i.e. character information, from a given image transmitted to a server through a terminal and then automatically translates it into the language the user wants.

A first embodiment of the technical idea of the present invention, the method of extracting text information such as abbreviations, handwriting, atypical words and sentences included in a predetermined image and automatically translating the extraction result into a predetermined language, is characterized by comprising:

transmitting an image captured by a camera of a terminal, or an image stored in the terminal, to a server;

scanning the image at the server;

converting the scanned image into an image of a predetermined resolution set by the server;

extracting color information for each of the pixels constituting the image;

collecting blob information, which is one or more pieces of shape information each composed of a single color;

detecting text line information of the blob information and determining a first language (e.g. Korean, English, or Chinese) corresponding to the blob information;

normalizing the size and height of the blob information;

extracting text information corresponding to the blob information;

selecting, from among the words of the first language (Korean, if Korean was determined above) stored in a language database (e.g. the second data set described in the detailed description), the word with the highest similarity to the text information;

translating the selected word into a second language (e.g. English) different from the first language (Korean); and

transmitting the result in the second language to the terminal.

In the first embodiment of the present invention,

the step of detecting text line information of the blob information and determining the first language corresponding to the blob information is characterized by comprising:

acquiring contour information for each piece of blob information;

probabilistically determining which language's contour information is most similar to each piece of contour information and to combinations of one or more pieces of contour information; and

determining, as the first language, the language judged to have the highest similarity according to the similarity determination.

A second embodiment of the technical idea of the present invention, the method of extracting text information such as abbreviations, handwriting, atypical words and sentences included in a predetermined image and automatically translating the extraction result into a predetermined language, uses a server comprising:

a first data set that manages predetermined search terms and images and periodically accesses a plurality of sites to collect and store first information matching each of the search terms and images;

a second data set that periodically and automatically stores the first information transmitted from the first data set and is thereby updated; and

a third data set that extracts and manages first text information contained in a given image transmitted through a terminal,

the method comprising:

detecting text line information of the blob information and determining a first language corresponding to the blob information;

normalizing the size and height of the blob information;

selecting, from among the words of the first language stored in the second data set, the word with the highest similarity to the text information; and

translating the selected word into a second language different from the first language.

In the second embodiment of the present invention,

when the text information is an abbreviation or neologism that does not exist in the second data set, the first data set searches the plurality of sites for information on the abbreviation or neologism and stores it in the second data set for training.

Carrying out the method proposed by the present invention, which extracts text information such as abbreviations, handwriting, atypical words and sentences included in a predetermined image and automatically translates the extraction result into a predetermined language, can be expected to provide the following effects.

According to the present invention, when a user photographs, through a terminal, something on which text in a given language is displayed in everyday life and transmits the image to the server, the server automatically translates the text in real time into the language set by the user and provides the result to the terminal.

Therefore, when a user travelling abroad photographs menus, signboards, and the like with a terminal on which the app proposed in the present invention is installed, the text can be translated into the user's native language and provided in real time.

In addition, if a TTS function is provided, text in any foreign language can be translated into the native language the user wants and then conveniently listened to through the speaker of the user terminal.

FIG. 1 is a conceptual functional block diagram of a server for carrying out the method proposed in the present invention.
FIG. 2 is a view illustrating an example of a method of obtaining predetermined data information from an image.
FIGS. 3 and 4 are views illustrating the process of obtaining data lines from the blob information of a given image and determining what kind of language the blob information represents.

Hereinafter, the method proposed in the present invention for extracting text information such as abbreviations, handwriting, atypical words and sentences included in a predetermined image and automatically translating the extraction result into a predetermined language will be described with reference to the drawings.

FIG. 1 is a conceptual functional block diagram of a server for carrying out the method proposed in the present invention.

As shown in FIG. 1, the server of the present invention includes a first data set, a second data set, and a third data set. The division into first to third data sets is made only for functional description; any configuration that performs the functions of the present invention should be construed as falling within its scope of protection regardless of the names or the number of data sets. In the present invention, the term "data set" collectively refers to a component that holds data collected for AI training together with a predetermined algorithm for managing and operating that data.

In FIG. 1, the first data set manages and stores predetermined search terms and images, and periodically accesses a plurality of sites (portals such as Daum, Naver, and Google; blogs; and social networks such as Twitter and Facebook) to collect and store information matching each of the search terms and images (hereinafter, the first information).

For example, the predetermined search terms stored in the first data set include the languages of each country (English, Chinese, Japanese, Korean, and so on). Each country's language is taken to include its neologisms and coined words, e.g. newly coined Korean words, Korean abbreviations, and shortened words favored by particular age groups. The predetermined images include all kinds of images (animals, plants, mountains, seas, fruits, buildings, and so on). The search terms and images stored and managed in the first data set can be updated by an administrator at any time.

Using the search terms and images stored in the first data set, the server of the present invention automatically accesses various sites, e.g. Naver, Daum, Google, Yahoo, social networks such as Twitter and Facebook, and various online communities (cafes), and requests a search for information matching the desired search term or image. This is performed periodically or as configured by the administrator. The information collected by these searches is referred to in the present invention as the first information.

Next, the server automatically stores the first information collected by the search in the second data set, thereby updating the second data set.

The second data set can therefore accumulate search results automatically over time without administrator intervention. That is, in practicing the present invention, the second data set can be continuously and automatically updated by an AI technique such as deep learning, and this continuous updating can be expected to improve the accuracy and precision of the text and images that are collected, stored, and managed.
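As a non-limiting illustration, one collection-and-update cycle between the first and second data sets might be sketched as follows. The sketch makes several assumptions that are not in the original text: the second data set is modeled as a plain dictionary, and `fake_portal` and `fake_sns` are hypothetical stand-ins for real site queries.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical type: a function that queries one site for a search term
# and returns the matching snippets (the "first information").
SiteSearch = Callable[[str], List[str]]


def collect_first_information(terms: Iterable[str],
                              sites: Iterable[SiteSearch]) -> Dict[str, List[str]]:
    """Query every site for every managed search term."""
    site_list = list(sites)
    collected: Dict[str, List[str]] = {}
    for term in terms:
        hits: List[str] = []
        for search in site_list:
            hits.extend(search(term))
        collected[term] = hits
    return collected


def update_second_dataset(second: Dict[str, List[str]],
                          first_info: Dict[str, List[str]]) -> int:
    """Merge newly collected information; return how many entries were added."""
    added = 0
    for term, hits in first_info.items():
        known = second.setdefault(term, [])
        for hit in hits:
            if hit not in known:  # skip what an earlier cycle already stored
                known.append(hit)
                added += 1
    return added


# Stand-in "sites", for demonstration only.
def fake_portal(term: str) -> List[str]:
    return [f"portal result for {term}"]


def fake_sns(term: str) -> List[str]:
    return [f"sns post about {term}"]
```

Running the cycle repeatedly with the same results adds nothing new, which is what allows it to be scheduled periodically without administrator intervention.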

The third data set of the present invention works with a terminal on which a predetermined app is installed (a terminal that can connect to the server of the present invention over the Internet, such as a computer, smartphone, or tablet). In particular, the third data set extracts at least one piece of text information (hereinafter, the first text information, for convenience of description) contained in an image transmitted from the terminal (including an image captured in real time by the terminal's camera or an image previously stored in the terminal).

For example, the third data set extracts and collects text information from the image transmitted through the terminal, and the server matches this text information against the second data set to select the word with the highest similarity.

In practicing the present invention, text is extracted from an image transmitted from the terminal as follows.

1. The user transmits an image captured by the terminal's camera, or an image stored in the terminal, to the server. The server then scans the image.

The server converts the scanned image into an image of a predetermined resolution set by the server.

This is to standardize the images transmitted from different terminals.
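As a non-limiting illustration, the conversion to a server-defined resolution could be sketched with nearest-neighbor resampling. The original text does not name a resampling method, so the choice of nearest-neighbor and the function name `resize_nearest` are assumptions made only for this example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels


def resize_nearest(image: Image, width: int, height: int) -> Image:
    """Resample `image` to width x height using nearest-neighbor sampling."""
    src_h = len(image)
    src_w = len(image[0])
    return [
        # Map each target pixel back to the source pixel it falls on.
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]
```

Whatever resolution a terminal sends, every later step then operates on an image of the same size.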

2. The server extracts color information for every pixel of the standardized image (hereinafter, the image).

3. Next, blob information, i.e. shape information composed of a single color, is collected. There will be at least one piece of blob information.

Here, shape information, i.e. blob information, means the contour information of pixels of the same color. The contour information includes information about a given contour line.

For example, the third data set of the present invention extracts at least one shape line by connecting pixels of the same color to one another.

For example, referring to FIGS. 3 and 4, if the image transmitted to the server shows "안", the contour information of the same-color blobs may include, e.g., "ㅇ", "ㅏ", "ㄴ", "아", and "안".
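As a non-limiting illustration, connecting same-color pixels into blobs and taking their outlines could be sketched as a flood fill. The sketch assumes that each pixel's color has already been reduced to an integer index, that `0` is the background, and that pixels are joined through their four direct neighbors; none of these details is specified in the original text.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]  # (row, col)


def find_blobs(grid: List[List[int]], background: int = 0) -> List[Dict]:
    """Group 4-connected pixels of identical color into blobs."""
    rows, cols = len(grid), len(grid[0])
    seen: Set[Coord] = set()
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or grid[r][c] == background:
                continue
            color = grid[r][c]
            pixels = []
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill over same-color neighbors
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            blobs.append({"color": color, "pixels": pixels})
    return blobs


def contour(pixels: List[Coord]) -> List[Coord]:
    """Pixels of a blob that touch at least one non-blob neighbor (its outline)."""
    inside = set(pixels)
    return [
        (y, x) for (y, x) in pixels
        if any(n not in inside
               for n in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))
    ]
```

In this model, each stroke group of a character such as "안" that is drawn in one color and is not touching another group would come out as a separate blob, and neighboring blobs can then be combined into larger candidates.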

4. Next, the server (or the third data set) uses a text line detector to detect which kind of text line the blob information corresponds to.

Here, the text line detector holds line information (hereinafter, text lines) for symbols and digits as well as for the alphabets, vowels, consonants, and words of various languages. The server therefore detects which language's text lines in the text line detector correspond to the contour information of each same-color blob and of each combination of blobs. In this way it determines which language's vowel, consonant, letter, or word the blob information represents.

For example, for "ㅇ", "ㅏ", "ㄴ", "아", "안", and so on, the text line detector of the server can determine that, among the languages stored in the third data set, Hangul is the most similar.

By assessing the similarity as, e.g., Korean 70%, Chinese 1%, and English 29%, a priority order of candidate languages can be set.

Even when text is written atypically, e.g. in handwriting or cursive script, the text lines derived from each blob and from combinations of adjacent blobs are analyzed to determine the most similar language (e.g. for the word "안녕", various combinations such as ㅇ, ㅏ, ㄴ, ㄴ, ㅕ, ㅇ, 아, 안, 녀, and 녕 are derived).

Summarized step by step, the step of detecting text line information of the blob information and determining the language corresponding to the blob information comprises:

determining the language judged to have the highest similarity according to the similarity determination.
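As a non-limiting illustration, the probabilistic ranking of candidate languages could be sketched as follows. The sketch assumes that some upstream detector has already produced a similarity score per language for each blob or blob combination; how those scores are computed is not modeled, and the function name `rank_languages` is hypothetical.

```python
from typing import Dict, List, Tuple


def rank_languages(blob_scores: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sum per-blob similarity scores and normalize them into shares, best first.

    Each dict maps a language name to the similarity that one blob (or one
    combination of blobs) has to that language's text lines.
    """
    totals: Dict[str, float] = {}
    for scores in blob_scores:
        for lang, score in scores.items():
            totals[lang] = totals.get(lang, 0.0) + score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []  # nothing resembled any known language
    shares = {lang: total / grand_total for lang, total in totals.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

With made-up inputs such as `{"Korean": 0.8, "English": 0.2, "Chinese": 0.0}` and `{"Korean": 0.6, "English": 0.38, "Chinese": 0.02}`, the ranking comes out as roughly Korean 70%, English 29%, Chinese 1%, and the first entry is taken as the determined language.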

5. Next, the server uses a pitch detector to normalize the size and height of the text lines.

Because text lines vary widely in size, this brings them to the standard size set by the server.
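As a non-limiting illustration, bringing a blob to a standard height could be sketched as a bounding-box rescale. The original text does not describe how the pitch detector works internally, so this simple scaling and the name `normalize_blob` are assumptions for illustration only.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (row, col)


def normalize_blob(pixels: List[Coord], target_height: int) -> List[Coord]:
    """Shift a blob to the origin and scale it to `target_height` rows,
    keeping its aspect ratio."""
    min_r = min(r for r, _ in pixels)
    min_c = min(c for _, c in pixels)
    height = max(r for r, _ in pixels) - min_r + 1
    scale = target_height / height
    # A set removes duplicates created when several pixels shrink onto one.
    scaled = {
        (int((r - min_r) * scale), int((c - min_c) * scale))
        for r, c in pixels
    }
    return sorted(scaled)
```

Note that enlarging a blob this way leaves gaps between the mapped pixels; a fuller implementation would resample the blob as an image rather than map individual coordinates.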

6. Next, text information corresponding to the blob information is extracted.

That is, after passing through the text line detector and the pitch detector, text information in the specific language corresponding to the blob information is obtained.

7. Next, the server evaluates how similar the text information thus obtained is to the words stored in the language database, e.g. the second data set.

8. When the word with the highest similarity (e.g. COFFEE) has been selected, the server translates it into the language set on the user terminal (e.g. Korean) and transmits the result to the user terminal.
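As a non-limiting illustration, steps 7 and 8 could be sketched with Python's standard `difflib` module. The original text does not specify a similarity measure, so the use of `difflib.get_close_matches`, the toy word list `LEXICON`, and the toy dictionary `EN_TO_KO` are all assumptions made for this example.

```python
import difflib
from typing import Dict, List, Optional


def best_match(text: str, lexicon: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the lexicon word most similar to `text`, or None if nothing is close."""
    matches = difflib.get_close_matches(text, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def translate(word: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Look the selected word up in a bilingual dictionary (toy stand-in for a translator)."""
    return dictionary.get(word)


# Toy data, for illustration only.
LEXICON = ["COFFEE", "TOFFEE", "TEA"]
EN_TO_KO = {"COFFEE": "커피", "TEA": "차"}
```

Here a slightly misread string such as "C0FFEE" (with a zero) still selects "COFFEE", which is then translated before being sent to the terminal.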

The user can therefore check, in real time on the terminal, text information translated into the desired native language simply by photographing a signboard, menu, book, or the like that contains a foreign language.

The method of extracting text information from a specific image is described in more detail below.

As shown in FIG. 2, the features of the text lines extracted from an image in which the word "안녕" is handwritten are analyzed to determine which language they resemble.

That is, blob information (shape information) composed of a single color is derived over the entire pixel area of the image of FIG. 2.

Text lines are extracted from the contour information of the blob information thus derived, the text information is normalized using the pitch detector, and the result is compared with the language data stored in the language database (e.g. the second data set) to extract the word with the highest similarity.

For example, when the text line detected by the third data set is judged to be "안녕", the similarity may be assessed as, e.g., Korean 70%, Chinese 1%, and English 29%, which sets a priority order of candidate languages.

Next, if Korean has the highest probability in that order, the text is compared with the Korean words stored in the second data set, in which newly updated information accumulates, and candidate words are ranked, e.g. "안녕" 90%, "안념" 7%, and "암념" 3%.

In this case, the present invention determines that the text extracted from the image is "안녕", the word with the highest probability.

Using the method according to the present invention has the advantage that the language and meaning of text written in cursive, by hand, or in poor handwriting, which is difficult to recognize with existing techniques, can be identified.

That is, as described above, the present invention derives text lines from the contour information of same-color blobs in the image, first determines which language the text lines belong to, and then determines which word of that language in the language database they resemble. This has the advantage that the language can be determined not only for regular text but also for atypical text information (e.g. handwriting or decorative lettering).

Next, the present invention provides a translation module that automatically translates the finally selected text into another language.

Therefore, with the method provided by the present invention, text appearing in a specific image can be easily translated into another language, and anyone can readily use this technology on a terminal on which an app implementing the present invention is installed.

That is, once a terminal user has installed the translation app provided by the present invention and set his or her native language, when the user photographs, e.g., an image containing a foreign language and transmits it to the server, the server can translate the foreign-language text in the image into the user's native language and provide it.

Installing an app to which the proposed method is applied therefore has the advantage of removing the inconvenience that arises, e.g. when travelling abroad, from not knowing the foreign language.

Meanwhile, if a translation result provided by the present invention looks wrong, the terminal user can select the relevant part and send it to the server, in which case the server can translate that part anew and transmit it to the terminal.

Such cases arise because, for images of handwriting and the like, errors can occur in determining the language or word that the handwriting corresponds to.

In this case, the present invention can retranslate using the language and word that had the next-highest probability when the language and the word were selected in the process described above, and provide the result to the terminal.
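As a non-limiting illustration, falling back to the next most probable candidate after the user flags a result could be sketched as follows. The class name `CandidateQueue` is hypothetical, and the same structure could hold either language candidates or word candidates.

```python
from typing import List, Optional, Tuple


class CandidateQueue:
    """Hold recognition candidates in descending probability and hand out
    the next one each time the user rejects a translation."""

    def __init__(self, candidates: List[Tuple[str, float]]):
        self._ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
        self._index = 0

    def current(self) -> Optional[str]:
        """The candidate currently in use, or None once all are exhausted."""
        if self._index < len(self._ranked):
            return self._ranked[self._index][0]
        return None

    def reject(self) -> Optional[str]:
        """User flagged the result: move on to the next most probable candidate."""
        self._index += 1
        return self.current()
```

With the example probabilities given earlier ("안녕" 90%, "안념" 7%, "암념" 3%), the first rejection would move from "안녕" to "안념", and that word would be retranslated and sent to the terminal.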

한편, 빠른 정보화 시대에 새롭게 생성되는 축약어(줄임말)나 신생어가 이미지에서 검출된 경, 제 2 데이터 셋에 이러한 축약어나 신생어가 저장되어 있지 않는 경우가 있을 수 있다. On the other hand, when a new abbreviation (shortened) or a new word is detected in an image in the fast information age, there may be a case where the abbreviation or new word is not stored in the second data set.

이런 경우 이런 축약어 또는 신생어가 기재되어 있는 이미지가 서버로 입력되는 경우 본 발명에 따른 번역 방법을 실시함에 있어서 정확한 번역 결과를 제공하지 못할 수 있다. In this case, when an image in which these abbreviations or new words are described is input to the server, accurate translation results may not be provided in performing the translation method according to the present invention.

이러한 문제점을 해소하기 위하여 본 발명 서버에서는 제 2 데이터 셋에 저장되어 있지 않은 신생어나 축약어에 대하여 제 1 데이터 셋을 통하여 해당 신생어 또는 축약어의 의미를 찾아 업데이트 하도록 명령을 할 수 있으며, 이에 제 1 데이터 셋은 포털이나, 각종 온라인 사이트들에 자동접속하여 해당 신생어나 축약어에 대한 의미를 취득하여 제 2 데이터 셋을 업그레이드 할 수 있다. In order to solve this problem, the server of the present invention may instruct a new or abbreviated word that is not stored in the second data set to find and update the meaning of the new or abbreviated word through the first data set. The data set can be upgraded to the second data set by automatically accessing a portal or various online sites to acquire the meaning of the new or abbreviated word.

That is, in the present invention, when text information derived from an image is an abbreviation or new word that does not exist in the second data set, the first data set searches the plurality of sites for information on the abbreviation or new word and stores it in the second data set for learning.

Therefore, according to the present invention, even if a newly coined word or abbreviation is detected, it is learned within a short time, so a proper translation result can be provided.
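The update of the second data set for an unknown abbreviation or new word can be sketched as follows. This is a simplified illustration under stated assumptions: the second data set is modelled as an in-memory dictionary, and the site searches performed by the first data set are modelled as stub functions that return a meaning or `None`. The names `resolve_term` and `SiteLookup` and the sample entries are hypothetical, and a real system would query online sources instead of the stubs.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical stand-in for one site search: takes a term, returns its meaning or None.
SiteLookup = Callable[[str], Optional[str]]


def resolve_term(
    term: str,
    second_data_set: Dict[str, str],
    site_lookups: Iterable[SiteLookup],
) -> Optional[str]:
    """Return the meaning of `term`, learning it from the sites if it is unknown."""
    if term in second_data_set:
        return second_data_set[term]
    for lookup in site_lookups:  # role played by the first data set
        meaning = lookup(term)
        if meaning is not None:
            second_data_set[term] = meaning  # update the second data set
            return meaning
    return None  # still unknown, so the translation may be inaccurate


# Stubbed sites with made-up contents.
site_a: SiteLookup = {}.get
site_b: SiteLookup = {"brb": "be right back"}.get

known = {"asap": "as soon as possible"}
meaning = resolve_term("brb", known, [site_a, site_b])
```

After the call, the term is present in `known`, so a later image containing the same abbreviation would be resolved without another site search.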

The present invention described so far, the "method of extracting text information such as abbreviations, handwriting, atypical words and sentences included in a predetermined image and automatically translating the extraction result into a predetermined language", differs from existing technologies in the following respects.

1. As with Naver's Papago or Samsung's Bixby, the subject is recognized through the camera. However, because an artificial-intelligence data set and system are built, no separate data set upgrade is needed, so the associated labor and maintenance costs are significantly reduced.

2. Unlike the services of the above companies, the present invention does not rely on matching against a held database but uses data sets and an artificial intelligence module. It can therefore properly recognize a variety of text that is difficult to recognize with existing technologies, such as handwritten, cursive, and flowing-script text, then translate it into the foreign language the user wants and provide the result.

3. Because the artificial intelligence module uses the CPU and GPU simultaneously in parallel, it can derive results and process requests considerably faster than the services provided by the above companies, assuming the same hardware resources.

4. If the user checks the translation result and is not satisfied with it, the retranslation function keeps providing translation results whose similarity and priority differ from the first one. When the user stops using the retranslation function, the last translation result is given a bonus in similarity and priority, so translation accuracy can improve over time.
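The feedback mechanism in point 4 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how the bonus is computed, so the fixed increment `BONUS`, the class `RetranslationSession`, the function `order`, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List

BONUS = 0.1  # hypothetical increment; the patent gives no value


def order(base_scores: Dict[str, float], bonus: DefaultDict[str, float]) -> List[str]:
    """Rank translations by base score plus any bonus earned from feedback."""
    return sorted(base_scores, key=lambda t: base_scores[t] + bonus[t], reverse=True)


class RetranslationSession:
    """Tracks which ranked translation the user is currently looking at."""

    def __init__(self, ranked: List[str], bonus: DefaultDict[str, float]):
        self.ranked = ranked
        self.bonus = bonus
        self.index = 0

    def current(self) -> str:
        return self.ranked[self.index]

    def retranslate(self) -> str:
        """Move to the next result; stay on the last one when exhausted."""
        if self.index < len(self.ranked) - 1:
            self.index += 1
        return self.current()

    def finish(self) -> str:
        """The user stopped retranslating: reward the result they ended on."""
        accepted = self.current()
        self.bonus[accepted] += BONUS
        return accepted


# Made-up scores for two competing translations of the same word.
base_scores = {"bank (finance)": 0.50, "bank (river)": 0.45}
bonus: DefaultDict[str, float] = defaultdict(float)

session = RetranslationSession(order(base_scores, bonus), bonus)
session.retranslate()          # user rejects the first result
accepted = session.finish()    # user keeps the second result
reordered = order(base_scores, bonus)
```

In this sketch the accepted translation overtakes the originally preferred one on the next request, which is one way the accuracy could improve over time as described.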

Claims

Transmitting an image captured by the camera of the terminal or an image stored in the terminal to the server;
Scanning the image from the server;
Converting the scanned image into an image of a predetermined resolution set by the server;
Extracting color information for each pixel constituting the image;
Collecting blob information which is one or more shape information composed of the same color;
Detecting text line information of the blob information and determining a first language corresponding to the blob information;
Normalizing the size and height of the blob information;
Extracting text information corresponding to the blob information;
Selecting a word having the highest similarity to the text information from words stored in the language database;
Translating the selected word into a second language different from the first language;
And transmitting the second language to the terminal.
Detecting text line information of the blob information and determining a first language corresponding to the blob information includes:
Acquiring contour information of each of the blob information;
Probabilistically determining which language's contour information is similar to each of the contour information, taken individually and in combination;
And determining, as the first language, the language found to have the highest similarity in that determination.
A method, characterized by the above, of extracting text information such as abbreviations, handwriting, atypical words, and sentences included in a predetermined image and automatically translating the extraction result into a predetermined language

delete

A method of extracting text information such as abbreviations, handwriting, atypical words, and sentences included in a predetermined image and then automatically translating the extraction result into a predetermined language, using a server having:
A first data set that manages predetermined search terms and images, periodically accesses a plurality of sites, and collects and stores first information matching each of the predetermined search terms and images,
A second data set that periodically stores and updates the first information transmitted from the first data set,
And a third data set that extracts and manages first text information contained in a predetermined image transmitted through the terminal, the method comprising:
Transmitting an image captured by the camera of the terminal or an image stored in the terminal to the server;
Scanning the image from the server;
Converting the scanned image into an image of a predetermined resolution set by the server;
Extracting color information for each pixel constituting the image;
Collecting blob information which is one or more shape information composed of the same color;
Detecting text line information of the blob information and determining a first language corresponding to the blob information;
Normalizing the size and height of the blob information;
Extracting text information corresponding to the blob information;
Selecting a word having the highest similarity to the text information from words of the first language stored in the second data set;
Translating the selected word into a second language different from the first language;
And transmitting the second language to the terminal.
Detecting text line information of the blob information and determining a first language corresponding to the blob information includes:
Acquiring contour information of each of the blob information;
Probabilistically determining which language's contour information is similar to each of the contour information, taken individually and in combination;
And determining, as the first language, the language found to have the highest similarity in that determination.

delete

The method of claim 3, characterized in that, when the text information is an abbreviation or a new word that does not exist in the second data set, the first data set retrieves information about the abbreviation or the new word from the plurality of sites and stores the information in the second data set for learning.