KR102073644B1

KR102073644B1 - Apparatus for recognizing text included in an image, text recognition method, and recording medium storing a program for performing the method

Info

Publication number: KR102073644B1
Application number: KR1020170175042A
Authority: KR
Inventors: 정원국; 유진선; 장기영; 정치훈
Original assignee: 주식회사 매스프레소
Priority date: 2017-05-04
Filing date: 2017-12-19
Publication date: 2020-02-06
Also published as: KR20180122927A

Abstract

A text recognition apparatus is disclosed. The apparatus includes one or more processors, a memory, and storage. The memory loads a computer program executed by the processor, and the storage stores the computer program and data. The computer program includes a feature extraction module that extracts a feature value from each of a plurality of pixels included in an image; a component determination module that uses the extracted feature values to determine, for each of the plurality of pixels, the recognition components constituting the text included in the image from among a plurality of components; and a text completion module that combines the recognition components determined for each pixel to complete the text.
As a result, text recognition accuracy and speed improve for images in which characters and mathematical expressions are mixed.

Description

Apparatus for recognizing text included in an image, text recognition method, and recording medium storing a program for performing the method

The present invention relates to an apparatus for recognizing text included in an image, a text recognition method, and a recording medium storing a program for executing the text recognition method. More particularly, it relates to a text recognition apparatus, a text recognition method, and a recording medium storing a program for executing the method, all of which recognize text without performing segmentation on the image.

As IT technology advances, a wide range of devices that provide video have appeared, such as TVs, computers, smartphones, tablets, wearable devices, and virtual reality devices. In parallel, various studies are under way on recognizing text present in an image while the image is being provided, and on making use of the recognized text.

Text recognition is performed by an electronic device analyzing an image file. One example of the prior art is region-based recognition. Region-based recognition performs segmentation, which divides the image character by character, determines whether text exists in each divided region, and, if it does, determines whether the text is Korean, English, and so on. Optical character recognition (OCR) matched to the language of the text is then performed on each divided region to recognize the text in the image.

To recognize text in an image through region-based recognition, segmentation must always be performed on the image. Because a region must be separated out for every character, the more characters the image contains, the longer recognition takes. In addition, text can be recognized only after the language contained in each region has been identified, so recognition requires several steps.

Furthermore, when Korean characters, mathematical expressions, and English characters are mixed in an image, characteristics such as height or ligatures differ from character to character, which makes precise segmentation difficult.

In short, text recognition methods according to the prior art suffer from low accuracy and low speed.

Korean Patent Publication No. 2009-0035541

The present invention has been made to solve the problems described above. It provides a method for recognizing text, an apparatus therefor, and a recording medium storing a program that performs the text recognition, which recognize the text included in an image faster and more accurately by omitting segmentation.

According to an embodiment of the present invention for solving the above problems, there is provided a text recognition apparatus comprising: one or more processors; a memory that loads a computer program executed by the processor; and storage that stores the computer program and data, wherein the computer program includes a feature extraction module that extracts a feature value from each of a plurality of pixels included in an image, a component determination module that uses the extracted feature values to determine, for each of the plurality of pixels, a recognition component constituting the text included in the image from among a plurality of components, and a text completion module that combines the recognition components determined for each pixel to complete the text. As a result, text recognition accuracy and speed improve for images in which characters and mathematical expressions are mixed.

The feature extraction module may extract the feature value by applying a first filter to each of the plurality of pixels included in the image. The component determination module may apply a second filter to the feature values extracted with the first filter to calculate, for each pixel, a probability distribution over the plurality of components, indicating how likely each component is to be present at that pixel, and may determine the component selected using the calculated probability distribution as the recognition component of that pixel.
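The two-filter flow described above can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent text: the names `COMPONENTS`, `apply_first_filter`, `apply_second_filter`, and `recognize_components`, the use of a single convolution kernel as the first filter, a per-component linear score followed by softmax as the second filter, and selecting the most probable component.

```python
import math

# Hypothetical component vocabulary; the source only says components may be
# characters, elements of mathematical expressions, or blanks.
COMPONENTS = ["blank", "x", "+"]


def apply_first_filter(image, kernel):
    """Slide a square kernel over the image (zero padding): one feature value per pixel."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    features = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx] * kernel[dy + k][dx + k]
            features[y][x] = total
    return features


def apply_second_filter(feature, weights, biases):
    """Map one feature value to a probability distribution over COMPONENTS (softmax)."""
    scores = [w * feature + b for w, b in zip(weights, biases)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def recognize_components(image, kernel, weights, biases):
    """Pick, for every pixel, the component with the highest probability."""
    labels = []
    for row in apply_first_filter(image, kernel):
        label_row = []
        for feature in row:
            probs = apply_second_filter(feature, weights, biases)
            label_row.append(COMPONENTS[probs.index(max(probs))])
        labels.append(label_row)
    return labels
```

In practice each pixel would carry a vector of feature values and the filters would be learned, but the shape of the computation (filter, per-pixel distribution, per-pixel selection) is the same as in the paragraph above.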

After the recognition component has been determined for each of the plurality of pixels, the component determination module may determine, as a group of pixels, at least one pixel among the plurality of pixels that lies within a predetermined distance of the others and has the same determined recognition component, and may determine that the recognition component exists in that group of pixels.
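One way to picture this grouping is a flood fill over the per-pixel labels. The sketch below is only an illustration under stated assumptions: the function name `group_pixels`, the `max_dist` parameter, the use of a square (Chebyshev) neighbourhood as the "predetermined distance", and the decision to skip pixels labelled `"blank"` are all hypothetical choices, not details given in the source.

```python
from collections import deque


def group_pixels(labels, max_dist=1, ignore="blank"):
    """Group pixels that share a label and lie within max_dist of another group member."""
    h, w = len(labels), len(labels[0])
    seen = [[False] * w for _ in range(h)]
    groups = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or labels[y][x] == ignore:
                continue
            component = labels[y][x]
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                # Visit every pixel inside the square neighbourhood of radius max_dist.
                for ny in range(max(0, cy - max_dist), min(h, cy + max_dist + 1)):
                    for nx in range(max(0, cx - max_dist), min(w, cx + max_dist + 1)):
                        if not seen[ny][nx] and labels[ny][nx] == component:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            groups.append((component, pixels))
    return groups
```

Each returned tuple pairs a component label with the (row, column) pixels in which that component is taken to exist.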

The component determination module may calculate the center coordinates of each recognition component based on the shape of the recognition component determined for each pixel, determine center pixels based on the calculated center coordinates, and determine, as the group of pixels in which each recognition component exists, those pixels whose distance from the corresponding center pixel is at most a reference distance level and whose probability of containing the determined component exceeds a reference probability level.
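The center-pixel rule can be sketched as follows. The source does not say how the center is derived from the component's shape or which distance metric is used, so the centroid, the Euclidean distance, and the names `center_of`, `pixels_around_center`, `ref_dist`, and `ref_prob` are assumptions made for this example.

```python
import math


def center_of(pixels):
    """Centroid of a component's (row, column) pixels, rounded to the nearest pixel."""
    cy = sum(y for y, _ in pixels) / len(pixels)
    cx = sum(x for _, x in pixels) / len(pixels)
    return (round(cy), round(cx))


def pixels_around_center(center, prob_map, ref_dist, ref_prob):
    """Keep pixels within ref_dist of the center whose probability exceeds ref_prob."""
    cy, cx = center
    selected = []
    for y, row in enumerate(prob_map):
        for x, prob in enumerate(row):
            if math.hypot(y - cy, x - cx) <= ref_dist and prob > ref_prob:
                selected.append((y, x))
    return selected
```

Here `prob_map` holds, for every pixel, the probability that the component in question is present, as produced by the second filter.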

The computer program may further include a training module that trains at least one of the feature extraction module, the component determination module, and the text completion module.

The training module may generate a test image and correct answer data.

When the training module passes the test image to the feature extraction module, the feature extraction module applies the first filter to the test image to extract test feature values, and the component determination module applies the second filter to the test feature values to calculate a test probability distribution, which it passes to the training module. The training module compares the correct answer data with the calculated test probability distribution and, if the comparison shows that the two differ by a predetermined level or more, may adjust at least one of the first filter and the second filter.
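The compare-then-adjust step can be sketched for a single pixel. The source does not specify how the difference is measured or how the filters are adjusted, so the maximum absolute difference, the bias-only update, and the names `distribution_gap`, `training_step`, `tolerance`, and `rate` are assumptions chosen to keep the example small.

```python
def distribution_gap(expected, predicted):
    """Largest absolute difference between the answer data and the test distribution."""
    return max(abs(e - p) for e, p in zip(expected, predicted))


def training_step(expected, predicted, biases, tolerance=0.1, rate=0.5):
    """Adjust the second filter's biases only when the gap reaches the tolerance.

    Returns the (possibly updated) biases and a flag saying whether an
    adjustment was made.
    """
    if distribution_gap(expected, predicted) < tolerance:
        return biases, False
    # Nudge each bias toward the answer data; for a softmax output this is the
    # direction that reduces the cross-entropy loss.
    adjusted = [b + rate * (e - p) for b, e, p in zip(biases, expected, predicted)]
    return adjusted, True
```

A full training module would repeat this over many generated test images and would also update the first filter, but the control flow (compare, check against a level, adjust) follows the paragraph above.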

The components may include at least one of a character, an element of a mathematical expression, and a blank.

According to an embodiment of the present invention for solving the above problems, a text recognition method performed by a text recognition apparatus may include: extracting a feature value from each of a plurality of pixels in an image; determining, using the extracted feature values, a recognition component constituting the text included in the image from among a plurality of components for each of the plurality of pixels; and combining the recognition components determined for each pixel to complete the text.
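The final combining step is not detailed at this point in the text. As a purely hypothetical illustration, the sketch below assumes that each recognized component arrives as a `(label, pixels)` pair and that a single line of text can be completed by reading components from left to right; the function name `complete_text` and this ordering rule are assumptions, and real formulas (fractions, exponents) would need a richer layout rule.

```python
def complete_text(groups):
    """Join component labels in left-to-right order of each group's leftmost column."""
    ordered = sorted(groups, key=lambda group: min(x for _, x in group[1]))
    return "".join(label for label, _ in ordered)
```

For example, groups for "+", "x", and "y" found at columns 3, 0, and 5 would be combined in the order x, +, y.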

The extracting of a feature value from each of the plurality of pixels in the image may include extracting the feature value by applying a first filter to each of the plurality of pixels included in the image. The determining of the recognition component may include: applying a second filter to the feature values extracted with the first filter to calculate, for each pixel, a probability distribution over the plurality of components; and determining the component selected using the calculated probability distribution as the recognition component of that pixel.

The determining of the selected component as the recognition component of each pixel may include: after the recognition component has been determined for each of the plurality of pixels, determining, as a group of pixels, at least one pixel among the plurality of pixels that lies within a predetermined distance of the others and has the same determined recognition component; and determining that the recognition component exists in that group of pixels.

The determining of the at least one pixel as a group of pixels may include: calculating the center coordinates of each recognition component based on the shape of the recognition component determined for each pixel; determining center pixels based on the calculated center coordinates; and determining, as the group of pixels in which each recognition component exists, those pixels whose distance from the corresponding center pixel is at most a reference distance level and whose probability of containing the determined component exceeds a reference probability level.

The method may further include training the text recognition apparatus.

The training of text recognition may include generating a test image and correct answer data.

The training of text recognition may include: passing, by the training module, the generated test image to the feature extraction module; extracting, by the feature extraction module, test feature values by applying the first filter to the test image; calculating, by the component determination module, a test probability distribution by applying the second filter to the test feature values and passing it to the training module; comparing the correct answer data with the calculated test probability distribution; and adjusting at least one of the first filter and the second filter if the comparison shows that the correct answer data and the test probability distribution differ by a predetermined level or more.

According to an embodiment of the present invention for solving the above problems, there is provided a computer-readable recording medium on which a program for executing the text recognition method is recorded.

According to an embodiment of the present invention, text recognition accuracy and speed improve for images in which characters and mathematical expressions are mixed.

Furthermore, according to another embodiment of the present invention, text in an image that contains both Korean characters and mathematical expressions can be accurately distinguished and recognized.

In addition, according to still another embodiment of the present invention, machine learning performed on the text recognition apparatus allows text to be recognized faster and more accurately.

FIG. 1 illustrates an example of use of a user terminal according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system that uses recognition of text included in an image according to an embodiment of the present invention.
FIG. 3 is a block diagram of a text recognition apparatus according to an embodiment of the present invention.
FIG. 4 illustrates the modules included in a text recognition program according to an embodiment of the present invention.
FIG. 5 illustrates an example of extracting features included in an image according to an embodiment of the present invention.
FIG. 6 illustrates, in digitized form, the input, the mask, and the output used when extracting features included in an image according to an embodiment of the present invention.
FIG. 7 illustrates an example of extracting a probability distribution according to an embodiment of the present invention.
FIG. 8 illustrates an example of calculating a center pixel and determining pixels within a predetermined distance of the center pixel as a group of pixels according to an embodiment of the present invention.
FIG. 9 is a flowchart of a method of recognizing text included in an image according to an embodiment of the present invention.
FIG. 10 illustrates a text recognition program including a training module according to another embodiment of the present invention.
FIG. 11 is a flowchart of a method of recognizing text included in an image according to another embodiment of the present invention.
FIG. 12 is a block diagram of a text recognition apparatus according to another embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless they are clearly and specifically defined. The terminology used in this specification is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the phrase specifically states otherwise.

In this specification, text means exchangeable information that a person can write and interpret, namely any one of characters, numbers, symbols, figures, and pictures, or a combination thereof. In particular, according to an embodiment of the present invention, text may be the result of image recognition by the text recognition apparatus described below. The idea of the present invention described through the embodiments below may be applied to any apparatus that recognizes text included in an image and makes use of it. The text recognition apparatus according to the present invention may be applied to external devices such as computers, portable terminals, smartphones, servers, and wearable devices, and to various other electronic devices.

FIG. 1 illustrates an example of use of a user terminal according to an embodiment of the present invention.

Referring to FIG. 1, the user terminal 100 photographs a sheet of paper 10 on which text is written, in response to a user's operation, and generates an image 101. The generated image 101 may be displayed on a display unit (not shown) of the user terminal 100. The user terminal 100 may transmit the image 101 to an external device in response to a further user operation. Here, the external device is a computing device capable of communicating with the user terminal 100 over a network. According to an embodiment, the external device may be a server device capable of recognizing text in an image, but embodiments of the present invention are not limited thereto; the external device may also be a terminal device such as a desktop, laptop, smartphone, or tablet.

The image 101 is not limited to one captured directly by the user terminal 100. The image 101 may be received from outside, or may be generated and stored inside the user terminal 100 in response to a user's operation.

The external device recognizes the text in the received image 101 and transmits the recognized text to another terminal device. When the external device receives information 103 corresponding to the recognized text from the other terminal device, it forwards the received information 103 to the user terminal 100. The received information 103 may be displayed on the display unit of the user terminal 100.

As one example, the text on the paper 10 may include a problem, and the information 103 corresponding to the recognized text may include the correct answer to the problem and an explanation of it.

FIG. 2 is a block diagram of a system that uses recognition of text included in an image according to an embodiment of the present invention.

Hereinafter, the configuration and operation of a system that uses recognition of text included in an image according to an embodiment of the present invention will be described in detail with reference to FIG. 2. For convenience of description, this system is hereinafter also referred to simply as the system.

Referring to FIG. 2, the system may include a user terminal 100, a text recognition apparatus 200, and a correct answer provider terminal 300. The user terminal 100, the text recognition apparatus 200, and the correct answer provider terminal 300 are electronic devices capable of transmitting data to and receiving data from one another. To this end, each of the user terminal 100, the text recognition apparatus 200, and the correct answer provider terminal 300 may be, for example, a fixed computing device such as a server or desktop PC, or a mobile computing device such as a laptop, smartphone, tablet PC, portable terminal, or wearable device.

The user terminal 100 transmits an image containing text to the text recognition apparatus 200. In particular, according to an embodiment of the present invention, the text recognition apparatus 200 may be the external device described above with reference to FIG. 1.

The text recognition apparatus 200 recognizes the text included in the image and either transmits the recognized text to the correct answer provider terminal 300 or stores it in a database (DB).

The correct answer provider terminal 300 receives the recognized text from the text recognition apparatus 200. The user of the correct answer provider terminal 300 refers to the recognized text and enters information corresponding to it into the correct answer provider terminal 300. If the recognized text is, for example, a math problem, the corresponding information may be the correct answer to that problem. The correct answer provider terminal 300 transmits data containing the entered information to the text recognition apparatus 200. In particular, according to an embodiment of the present invention, the correct answer provider terminal 300 may be the other terminal device described above with reference to FIG. 1.

The text recognition apparatus 200 transmits, to the user terminal 100, information related to the recognized text that is stored as data and/or information received from the correct answer provider terminal 300. As one example, when the recognized text is a math problem, the text recognition apparatus 200 searches the stored data for the same problem. If a solution to the same problem is included in the data, the text recognition apparatus 200 may transmit the retrieved solution to the user terminal 100. In addition, if problems similar or related to the recognized math problem are found, the text recognition apparatus 200 may transmit them to the user terminal 100 as related information.
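The lookup-then-forward behaviour described above can be sketched as follows. The dictionary standing in for the database, the whitespace normalization used to decide that two problems are "the same", the caching of the provider's answer, and the names `respond_to_recognized_text`, `answer_db`, and `ask_provider` are all assumptions made for illustration; the source does not describe the matching or storage scheme.

```python
def respond_to_recognized_text(text, answer_db, ask_provider):
    """Return a stored answer for the recognized text, or ask the answer provider.

    answer_db    -- dict mapping normalized problem text to stored information
    ask_provider -- callable that forwards the text to the answer provider
                    terminal and returns the information entered there
    """
    key = " ".join(text.split())  # collapse whitespace so trivial variations match
    if key in answer_db:
        return answer_db[key]
    answer = ask_provider(key)
    answer_db[key] = answer  # assumption: keep the answer for later identical problems
    return answer
```

With this arrangement, a problem that has already been answered is served directly from stored data, and only unseen problems reach the answer provider.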

The user terminal 100 provides the user with the information received from the text recognition apparatus 200.

Although not shown in FIG. 2, in addition to the user terminal 100, the text recognition apparatus 200, and the correct answer provider terminal 300, the system may further include a relay server that mediates data transmission, a server for data synchronization, and the like.

In addition, a plurality of user terminals and correct answer provider terminals may be provided so that multiple users can transmit images and multiple users can provide the corresponding information.

The user terminal 100 and the correct answer provider terminal 300 are distinguished by role only for convenience of description; the same terminal device may take on either role as needed.

FIG. 3 is a block diagram of a text recognition apparatus according to an embodiment of the present invention.

Referring to FIG. 3, the text recognition apparatus 200 receives an image containing text, as described above with reference to FIG. 2. The text recognition apparatus 200 recognizes the text in the received image, then transmits the recognized text to the outside and stores it. The text recognition apparatus 200 receives information corresponding to the recognized text from outside, or retrieves information stored in advance in the text recognition apparatus 200, and transmits it to the device that provided the image.

Referring to FIG. 3, the text recognition apparatus 200 includes a processor 201, a network interface 203, a memory 205, and storage 207.

The processor 201 controls the overall operation of the text recognition apparatus 200. The processor 201 includes a CPU (Central Processing Unit), an MPU (Micro Processor Unit), an MCU (Micro Controller Unit), or any other type of processor well known in the technical field of the present invention. The processor 201 also performs the operations of at least one application or program for executing the method according to embodiments of the present invention. The text recognition apparatus 200 may include one or more processors.

The network interface 203 handles communication between the text recognition apparatus 200 and the outside. The network interface 203 allows the text recognition apparatus 200 to communicate with external devices and to transmit and receive data. The transmitted and received data may be of various types, such as voice, image, text, and video, and are not limited to these examples.

The network interface 203 may include a connection unit for wired communication with an external device. The connection unit may transmit and receive signals and data conforming to standards such as HDMI (High-Definition Multimedia Interface), HDMI-CEC (Consumer Electronics Control), USB, and component, and includes at least one connector or terminal corresponding to each of these standards. The text recognition apparatus 200 may perform wired communication with a plurality of servers through a wired local area network (LAN).

The type of communication supported by the network interface 203 is not limited to wired communication. The network interface 203 may include an RF circuit that transmits and receives radio frequency (RF) signals for wireless communication, and may be configured to perform one or more of Wi-Fi, Bluetooth, Zigbee, UWB (Ultra-Wide Band), Wireless USB, and NFC (Near Field Communication).

The memory 205 stores various data, commands, and/or information. At least part of a program may be loaded into the memory 205 from the storage 207. The memory 205 is a readable and writable volatile memory with fast read and write speeds. As one example, the memory 205 may be any one of RAM, DRAM, or SRAM.

The storage 207 is a module for storing information and data received from the outside. The storage 207 must retain data even when the power supplied to the text recognition apparatus 200 is cut off, and may be provided as a writable non-volatile memory so that changes can be reflected. The storage 207 may include a ROM (Read Only Memory), an EPROM (Erasable Programmable ROM), an EEPROM (Electrically Erasable Programmable ROM), a flash memory, a hard disk, a removable disk, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

The storage 207 may store a text recognition program 210 and data 212. The data 212 may include text recognized in images and records exchanged with external devices. As one example, the data 212 may further include various problems together with related information such as each problem's keywords, difficulty, subject, and the school grade in which it is taught. As another example, the data 212 may include an explanation of a problem and information about the answer provider who wrote the explanation.

Hereinafter, the methods according to embodiments of the present invention are described as being performed by the text recognition apparatus 200.

Hereinafter, embodiments of the present invention are described in detail on the basis of the above description of FIGS. 1 to 3.

FIG. 4 illustrates the modules included in a text recognition program according to an embodiment of the present invention.

Referring to FIG. 4, the text recognition program 210 according to an embodiment of the present invention includes a feature extraction module 211, a component determination module 213, and a text completion module 215. The text recognition program 210 may further include a storage module 217 and a control module 219. The function of each component of the text recognition program 210 is organized as a module and may be implemented by being executed by the processor 201 shown in FIG. 3. That is, any operation described below as being performed by a module, such as the feature extraction module 211, the component determination module 213, the text completion module 215, the storage module 217, or the control module 219, refers to a function implemented when the processor 201 executes that module.

The feature extraction module 211 is configured to extract feature values by performing convolution on the received image. Convolution here means image processing in which at least part of the received image is multiplied by the weights of a mask so as to produce an output smaller than the received image. The image may contain various components such as digits, letters, symbols, and blank space. The mask is a software tool for extracting feature values from each component in the image, and is also called a window or a kernel.
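The mask operation described above can be illustrated with a minimal sketch. It assumes a stride of 1 and no padding, and the function name `convolve_valid`, the sample image, and the mask weights are hypothetical values chosen for illustration; the specification does not fix any of them. As in most neural-network code, the mask is applied without flipping.

```python
def convolve_valid(image, mask):
    """Slide `mask` over `image` and return the weighted sum at each position.

    Assumes stride 1 and no padding, so the output is smaller than the input.
    """
    img_h, img_w = len(image), len(image[0])
    m_h, m_w = len(mask), len(mask[0])
    out = []
    for top in range(img_h - m_h + 1):
        row = []
        for left in range(img_w - m_w + 1):
            total = 0
            for i in range(m_h):
                for j in range(m_w):
                    total += image[top + i][left + j] * mask[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x4 binary image containing a mostly vertical stroke.
image = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
# Hypothetical 3x3 mask that responds to straight vertical strokes.
vertical_mask = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]
features = convolve_valid(image, vertical_mask)  # 2x2 output
```

The 4x4 input yields a 2x2 output, and the largest values appear where the vertical stroke lines up with the mask.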

The feature extraction module 211 applies a first filter to input data and outputs the resulting output data. The output data obtained once the first filter has been applied is also referred to as a feature value. Hereinafter, the process of applying the first filter is referred to as first filtering.

The output of feature values by the first filtering may be performed in one or more stages, and a different mask may be applied at each stage. The feature extraction module 211 does not segment the image letter by letter; it performs the first filtering on every pixel of the input data. The masks may be stacked in multiple layers to extract feature values more accurately.

The number of mask layers may correspond to the number of features to be extracted from each component of the image. A feature value is a value corresponding to a feature of a component that may be included in the image. For example, the features of a component include a straight shape, a round shape, and a bent shape. Masks of various layers are provided in the feature extraction module 211 to extract such shape features as feature values. The features are not limited to those described above; various features for recognizing letters in an image may exist. The more mask layers there are, the more varied the extracted feature values, and the more varied the extracted feature values, the more accurately the components of the text in the image are determined.

Hereinafter, for convenience of description, the different weights of the plurality of masks used in the first filtering are collectively referred to as first weights. For example, adjusting the first weights means modifying at least some of the different weights of the plurality of masks.

Hereinafter, the operation of the feature extraction module 211 is described in more detail with reference to FIGS. 5 and 6. FIG. 5 illustrates an example of extracting features included in an image according to an embodiment of the present invention. FIG. 6 also illustrates an example of extracting features included in an image according to an embodiment of the present invention; in particular, FIG. 6 shows, by way of example, a feature included in an image, a mask for extracting the feature, and the machine-level processing that produces the extracted output data.

FIG. 5 shows a process in which a first mask 501 is applied to input data 500, which is an image, to generate first output data 502, and a second mask 503 is applied to the first output data 502 to generate second output data 504. The input data 500 includes more pixels than the first output data 502, and the first output data 502 includes more pixels than the second output data 504. The first mask 501 and the second mask 503 may have different sizes.

To explain FIG. 5 in more detail, FIG. 6 shows input data 601, a first mask 601, and first output data 602 in which example machine-level values have been entered in each pixel. The numbers in the pixels are merely examples provided for convenience of description, and the spirit of the present invention is not limited to the entered numbers, the pixel sizes, or the like.

The first mask 501 and the second mask 503 may each be provided with a plurality of layers having different weights in order to extract various feature values. The number of output data increases with the number of mask layers stacked at each stage. Further stages of feature extraction may be included depending on the size of the input data 500 to be processed. When stages are added, a third mask, a fourth mask, and so on, each having a plurality of layers, may additionally be provided in the feature extraction module 211.
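The way output size and output count change from stage to stage can be worked through with a small sketch. It assumes stride 1 and no padding, and the input size, mask sizes, and layer counts below are hypothetical; the specification does not state them.

```python
def stage_output(in_h, in_w, mask_h, mask_w, num_layers):
    """Return (height, width, number of output maps) after one masking stage.

    Assumes stride 1 and no padding; each mask layer yields one output map.
    """
    out_h = in_h - mask_h + 1
    out_w = in_w - mask_w + 1
    return out_h, out_w, num_layers


# Hypothetical 32x32 input, a 5x5 first mask with 8 layers,
# then a 3x3 second mask with 16 layers.
h, w = 32, 32
shapes = []
for mask_h, mask_w, layers in [(5, 5, 8), (3, 3, 16)]:
    h, w, maps = stage_output(h, w, mask_h, mask_w, layers)
    shapes.append((h, w, maps))
```

Under these assumptions the first output data is 28x28 with 8 maps and the second is 26x26 with 16 maps, which matches the description that each stage shrinks the data while more layers yield more output data.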

The component determination module 213 uses the feature values extracted by the feature extraction module 211 to calculate, for each pixel, the probability that a particular component is present, and based on the result determines the components that make up the text in the image as recognition components. The component determination module 213 is also called a classifier and uses a neural network. Components include, for example, Korean consonants and vowels, mathematical symbols, digits, and English letters. A space is a state in which no letter is present, and according to an embodiment of the present invention it may be treated as one of the components. The component determination module 213 applies a second filter having second weights to the feature values extracted for each pixel to calculate the probability that each component is present at that pixel.
For example, when the image contains the text ab, the component determination module 213 may calculate a high probability of component 'a' for the pixels where a is located, and a high probability of component 'b' for the pixels where b is located. Furthermore, for blank pixels enclosed within the letters a and b, the component determination module 213 refers to the feature values of the surrounding pixels and may therefore also calculate a high probability of components 'a' and 'b' for those blank pixels. At the same time, the component determination module 213 may calculate a low probability of other components for the pixels where a and b are located, and a high probability of the space component for pixels where neither a nor b is located. The component determination module 213 then determines the recognition components using the calculated results.
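The per-pixel probability calculation can be sketched as follows. This is only an illustration under stated assumptions: the second filter is modeled as one weight vector per component followed by an independent sigmoid, which is one way to let several components each receive their own probability at the same pixel. The function name, weights, and feature vector are hypothetical.

```python
import math


def component_probabilities(feature_vector, second_weights):
    """Score each component at one pixel and squash the score into (0, 1).

    `second_weights` maps a component label to a hypothetical weight vector.
    Each component is scored independently, so the values need not sum to 1.
    """
    probs = {}
    for component, weights in second_weights.items():
        score = sum(w * f for w, f in zip(weights, feature_vector))
        probs[component] = 1.0 / (1.0 + math.exp(-score))
    return probs


# Hypothetical two-dimensional feature vector for a pixel on the letter 'a'.
pixel_features = [1.0, 0.0]
second_weights = {
    "a": [2.0, -1.0],
    "space": [-2.0, 1.0],
}
probs = component_probabilities(pixel_features, second_weights)
```

With these made-up weights the pixel receives a high probability for 'a' and a low probability for the space component.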

Hereinafter, the operation of the component determination module 213 is described in more detail with reference to FIGS. 7 and 8.

FIG. 7 illustrates an example of extracting a probability distribution according to an embodiment of the present invention.

FIG. 7 shows an example in which the component determination module 213 calculates a probability distribution using the extracted feature values. The output data used by the component determination module 213 is the final output data of the feature extraction module 211. As described above, extracting feature values from the input data is not limited to the two stages in which the first mask 501 and the second mask 503 of FIG. 5 are applied; for convenience of description, however, the second output data 504 is assumed below to be the final output data.

The component determination module 213 uses the feature values in the second output data 504 to calculate the probability that each component is present at each pixel of the image. To do so, it may apply a second filter with the second weights to the second output data 504. The component determination module 213 generates a layer for each component, and then generates a probability distribution 700 that maps, for each component, the probability of its presence at each pixel.

The extracted probability distribution 700 marks the pixels of the image where each component, such as 'space', 'a', 'b', '2', 'c', '4', 'd', '+', and '=', is likely to be present. The component determination module 213 also calculates probability distributions for components that are not actually in the image, such as 'ㄷ', 'e', 'f', and '<', but these are omitted from the drawing. That is, the component determination module 213 does not calculate probabilities only for the components present in the image; it generates a layer for every component and calculates the probability of its presence. For convenience, the drawing shows a high probability for a component only where that component is actually located, but the invention is not limited to this. A pixel where a particular component is located may have both a probability for that component and probabilities for other components. For example, at a given pixel adjacent to component 'a', the probability of 'a' may be 70% and the probability of 'e' may be 40%.

Based on the calculation result, the component determination module 213 may determine the component with the highest probability at a given pixel to be the recognition component, that is, the component recognized at that pixel.
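Selecting the most probable component per pixel can be sketched as below. The data layout (one probability layer per component, stored as nested lists) and the function name are assumptions made for illustration; the 70% and 40% values echo the example in the description.

```python
def recognition_components(prob_layers):
    """Return, for each pixel, the component label with the highest probability.

    `prob_layers` maps a component label to a 2D grid of probabilities.
    """
    components = list(prob_layers)
    height = len(prob_layers[components[0]])
    width = len(prob_layers[components[0]][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(components, key=lambda c: prob_layers[c][y][x])
            row.append(best)
        result.append(row)
    return result


# Hypothetical 1x3 probability layers for three components.
layers = {
    "a": [[0.7, 0.2, 0.1]],
    "e": [[0.4, 0.1, 0.1]],
    "space": [[0.1, 0.9, 0.8]],
}
recognized = recognition_components(layers)
```

The first pixel is assigned 'a' because 0.7 exceeds 0.4 and 0.1, and the other two pixels are assigned the space component.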

FIG. 8 illustrates an example of calculating a center pixel and determining the pixels within a predetermined distance of the center pixel as a group of pixels, according to an embodiment of the present invention.

The component determination module 213 may determine that a single component is located in a group of pixels 1001, 1011, 1021. To this end, the component determination module 213 calculates the center coordinates of a recognition component based on the shape of the determined recognition component, and determines the center pixel 1000, 1010, 1020 at which the center coordinates lie. The component determination module 213 may then determine, as the group of pixels 1001, 1011, 1021, those pixels whose distance from the center pixel 1000, 1010, 1020 is at or below a predetermined reference distance level and whose probability of containing the determined recognition component exceeds a predetermined reference probability level. The reference distance level may be a distance value set in advance so that the component determination module 213 can determine, from the calculated center pixel, which pixels contain the recognition component. The reference probability level may be a probability value that serves as the criterion when the component determination module 213 determines the component present in the group of pixels to be the recognition component.

For pixels that lie within the reference distance level of the center pixel and whose probability of containing the same component as the recognition component determined at the center pixel exceeds the reference probability level, the component determination module 213 may assign them to the group of pixels even if a probability of another component has also been calculated for them. For the pixels in the group, the component determination module 213 determines the recognition component to be the same component as the recognition component of the center pixel.

This holds even when a pixel in the group has a higher calculated probability for a component other than the recognition component of the center pixel. For example, based on the calculated feature values, the component determination module 213 may calculate a 50% probability of 'a' and a 65% probability of 'e' at a particular pixel. If that pixel is within the predetermined reference distance level of the center pixel 1000 of component 'a', and its probability of 'a' exceeds the reference probability level, the component determination module 213 may determine 'a' to be the recognition component at that pixel even though the probability of 'e' is higher than the probability of 'a'. The component determination module 213 passes the recognition component determined for each pixel to the text completion module 215.
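The grouping rule can be sketched as follows. The sketch takes the center pixel as given, measures distance as Euclidean distance in pixel units, and uses hypothetical threshold values; the specification does not say how distance is measured or which thresholds are used. The 50% and 65% values echo the example above.

```python
import math


def group_pixels(component_layer, center, ref_distance, ref_probability):
    """Collect the pixels that belong to the component at `center`.

    A pixel joins the group when it lies within `ref_distance` of the center
    and its probability for the center's component exceeds `ref_probability`,
    regardless of the probabilities of other components at that pixel.
    """
    cy, cx = center
    group = []
    for y, row in enumerate(component_layer):
        for x, prob in enumerate(row):
            close_enough = math.hypot(y - cy, x - cx) <= ref_distance
            if close_enough and prob > ref_probability:
                group.append((y, x))
    return group


# Hypothetical 1x3 probability layers for 'a' and 'e'.
layer_a = [[0.9, 0.5, 0.1]]
layer_e = [[0.1, 0.65, 0.2]]
group_a = group_pixels(layer_a, center=(0, 0),
                       ref_distance=1.5, ref_probability=0.4)
```

Pixel (0, 1) joins the 'a' group even though 'e' scores 0.65 there against 0.5 for 'a', because it is within the reference distance and its 'a' probability exceeds the reference probability. Pixel (0, 2) is excluded on both criteria.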

That is, the feature extraction module 211 and the component determination module 213 according to an embodiment of the present invention extract feature values for the entire image and use them to calculate the probability that each component is present. The present invention thus omits the process of segmenting the image letter by letter and then determining whether the letter in each resulting segment is Korean, English, or part of a formula. According to an embodiment of the present invention, the letters in the whole image can be extracted and recognized directly, so fewer determinations are needed than in the prior art, which requires a separate determination for each letter after segmentation, and the text in the image can therefore be recognized faster and more accurately.

The text completion module 215 completes the text by combining the recognition components determined by the component determination module 213, that is, by combining recognition components that have not yet been combined. For example, when the recognition components are the Hangul letters (ㄱ, ㅏ, ㅇ), the text completion module 215 combines them to complete the character '강'. Likewise, the text completion module 215 combines recognition components that belong to a formula (signs, fraction bars, sigma, combination notation, exponents, and the like) to complete the actual formula in the image. The text completion module 215 then combines the completed characters and formulas into a single line. In this process, if a character in the line is judged from the surrounding context to be an obvious error, it may be corrected automatically or corrected after asking the user for confirmation. To this end, the text recognition apparatus 200 may include a display and output an interface for obtaining the user's confirmation and correcting the error.
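Combining Hangul letters into a syllable can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. This is a sketch of one well-known way to perform the combination, not necessarily the method used by the text completion module 215; the function name `compose_hangul` is hypothetical.

```python
# Initial consonants, vowels, and final consonants in Unicode order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
         "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
         "ㅋ", "ㅌ", "ㅍ", "ㅎ"]


def compose_hangul(lead, vowel, tail=""):
    """Combine an initial consonant, a vowel, and an optional final consonant.

    Uses the Unicode layout of precomposed syllables, which start at U+AC00
    and are ordered by (lead, vowel, tail) with 21 vowels and 28 tails.
    """
    lead_index = LEADS.index(lead)
    vowel_index = VOWELS.index(vowel)
    tail_index = TAILS.index(tail)
    return chr(0xAC00 + (lead_index * 21 + vowel_index) * 28 + tail_index)


syllable = compose_hangul("ㄱ", "ㅏ", "ㅇ")
```

Here the three recognition components from the example combine into the single character '강'.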

The operation of the text completion module 215 is not limited to the above; it may be implemented to combine the determined recognition components in various ways.

The storage module 217 stores the information that is generated by execution of the text recognition program 210 and exchanged with the outside as data 212 in the storage 207 of the text recognition apparatus 200.

The control module 219 controls the overall operation of the modules of the text recognition program 210.

FIG. 9 is a flowchart of a method of recognizing text included in an image according to an embodiment of the present invention.

First, in operation S500, the text recognition apparatus 200 receives an image from the user terminal 100. The image may have been captured directly by the user terminal 100, received by the user terminal 100 from the outside over the Internet or the like, or created by the user operating the user terminal 100.

Next, in operation S501, the feature extraction module 211 extracts feature values from the image by applying the first filter with the first weights to every pixel of the image.

Next, in operation S502, the component determination module 213 determines the recognition components based on the extracted feature values. The component determination module 213 may use the extracted feature values to calculate, for each pixel, the probability that each component is present, and may determine the component with the highest probability at that pixel to be the recognition component.

Finally, in operation S503, the text completion module 215 completes the text by combining the determined recognition components.

FIG. 10 illustrates a text recognition program including a training module according to another embodiment of the present invention.

The text recognition program 210 according to another embodiment of the present invention may further include a training module 1100 for improving the accuracy of text recognition through learning. In this embodiment, the text recognition program 210 learns to recognize text more accurately through machine learning using the training module 1100.

The training module 1100 trains at least one of the feature extraction module 211, the component determination module 213, and the text completion module 215. The training module 1100 generates training data and provides at least part of it to the feature extraction module 211. The training data includes, for example, a test image and correct answer data. The correct answer data may include, as a correct answer probability distribution, the probability distribution values for each component of the text in the test image, and may include, as correct answer text, the text in the test image. The training module 1100 renders an image containing formulas and Hangul, adds random noise to the image, and generates, for each coordinate of the rendered image, correct answer data for the component present at that coordinate. As another example, the training module 1100 may receive the training data from the outside instead of generating it.
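The generation of a test image with per-coordinate correct answer data can be sketched as below. To stay self-contained, the sketch replaces real glyph rendering with filled rectangles and models noise as random pixel flips; both simplifications, along with the function name and parameters, are assumptions for illustration only.

```python
import random


def make_training_sample(height, width, glyph_boxes, noise_rate, seed=0):
    """Return (test_image, labels) for a synthetic sample.

    `glyph_boxes` maps a component label to a (top, left, bottom, right)
    rectangle that stands in for a rendered glyph. `labels` records, for
    each coordinate, which component is present ('space' if none).
    """
    rng = random.Random(seed)
    image = [[0] * width for _ in range(height)]
    labels = [["space"] * width for _ in range(height)]
    for component, (top, left, bottom, right) in glyph_boxes.items():
        for y in range(top, bottom):
            for x in range(left, right):
                image[y][x] = 1
                labels[y][x] = component
    # Flip a random subset of pixels; the labels stay untouched.
    for y in range(height):
        for x in range(width):
            if rng.random() < noise_rate:
                image[y][x] = 1 - image[y][x]
    return image, labels


test_image, answer_labels = make_training_sample(
    4, 6, {"a": (0, 0, 2, 2), "b": (1, 3, 4, 6)}, noise_rate=0.1)
```

Because noise is applied only to the image, the labels remain the clean per-coordinate ground truth from which a correct answer probability distribution could be derived.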

The feature extraction module 211 and the component determination module 213 may extract test feature values by applying the first filter with the first weights to the test image in the received training data, and may calculate a probability distribution by applying the second filter with the second weights to the extracted test feature values. The probability distribution calculated from the test image is hereinafter referred to as the test probability distribution.

The training module 1100 may compare the test probability distribution with the correct answer probability distribution and train the feature extraction module 211 and the component determination module 213 based on the comparison. As one example, the training module 1100 calculates an error level by comparing the test probability distribution with the correct answer probability distribution, and may adjust the first weights and/or the second weights based on the calculated error level. To this end, a backpropagation algorithm may be applied to the text recognition program 210, including the training module 1100.
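The weight adjustment can be illustrated with a minimal gradient-descent sketch. It covers only a single weight vector standing in for the second weights of one component, uses cross-entropy as the error level, and omits the propagation of the error back into the first weights; the error measure, learning rate, and data are assumptions, since the specification names only backpropagation.

```python
import math


def predict(weights, features):
    """Sigmoid probability for one pixel's feature vector."""
    score = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))


def error_level(weights, pixel_features, targets):
    """Mean cross-entropy between predicted and correct-answer probabilities."""
    total = 0.0
    for features, target in zip(pixel_features, targets):
        p = predict(weights, features)
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(targets)


def train_step(weights, pixel_features, targets, learning_rate):
    """Move the weights one step against the gradient of the error level."""
    grads = [0.0] * len(weights)
    for features, target in zip(pixel_features, targets):
        p = predict(weights, features)
        for k, x in enumerate(features):
            grads[k] += (p - target) * x
    n = len(targets)
    return [w - learning_rate * g / n for w, g in zip(weights, grads)]


# Hypothetical feature vectors for two pixels and the correct-answer
# probability of component 'a' at each of them.
pixel_features = [[1.0, 0.0], [0.0, 1.0]]
targets = [1.0, 0.0]
weights = [0.0, 0.0]
error_before = error_level(weights, pixel_features, targets)
weights = train_step(weights, pixel_features, targets, learning_rate=1.0)
error_after = error_level(weights, pixel_features, targets)
```

After one step the weights become [0.25, -0.25] and the error level drops, which is the behavior the training module relies on when it repeats the comparison and adjustment.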

Further, the text completion module 215 may complete text from the extracted test probability distribution. The training module 1100 may compare the text completed by the text completion module 215 with the correct answer text included in the correct answer data, and may train the text completion module 215 based on the comparison result.

However, the process by which the training module 1100 according to the present invention trains the feature extraction module 211, the component determination module 213, and the text completion module 215 is not limited to the above description. Various known machine learning techniques may be applied to the present invention.

FIG. 11 is a flowchart of a method of recognizing text included in an image, to which a learning process is added, according to another embodiment of the present invention.

The learning process is performed in operations S600 to S602.

First, in operation S600, the training module 1100 receives training data including a test image and correct answer data. According to another example, the training module 1100 may generate the training data itself. The training module 1100 passes the test image to the feature extraction module 211.

Next, in operation S601, the feature extraction module 211 extracts a feature value from the test image, and the component determination module 213 extracts a test probability distribution using the extracted feature value.

Then, in operation S602, the training module 1100 compares the test probability distribution with the correct answer data. Based on the comparison result, it adjusts at least one of the first weight of the first filter of the feature extraction module 211 and the second weight of the second filter of the component determination module 213.

After the above learning process is completed, the text recognition apparatus 200 recognizes text in an image, as described previously, through operations S603 to S606.

First, in operation S603, the text recognition apparatus 200 receives an image.

Next, in operation S604, the feature extraction module 211 extracts a feature value by applying the first filter, which has the first weight adjusted above, to the received image.

Then, in operation S605, the component determination module 213 calculates a probability distribution by applying the second filter, which has the adjusted second weight, to the extracted feature value. The component determination module 213 then determines the recognition component for each pixel using the calculated probability distribution.
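One simple way to picture the per-pixel determination in operation S605 is to take, for each pixel, the component with the highest probability. The sketch below does exactly that. The arg-max rule, the vocabulary `COMPONENTS`, and the function name `determine_components` are assumptions for illustration; the text above only states that the calculated probability distribution is used.

```python
COMPONENTS = [" ", "x", "+", "2"]  # hypothetical component vocabulary

def determine_components(prob_map):
    """Map each pixel's probability distribution to (component, probability)."""
    result = []
    for row in prob_map:
        out_row = []
        for dist in row:
            # Index of the most probable component for this pixel.
            best = max(range(len(dist)), key=dist.__getitem__)
            out_row.append((COMPONENTS[best], dist[best]))
        result.append(out_row)
    return result

# A 2x2 "image" with one probability distribution per pixel.
prob_map = [
    [[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]],
    [[0.05, 0.05, 0.8, 0.1], [0.1, 0.1, 0.1, 0.7]],
]
decided = determine_components(prob_map)
```

Keeping the winning probability alongside each component makes it possible to filter pixels against a probability threshold in a later step.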

Finally, in operation S606, the text completion module 215 combines the determined recognition components to complete the text.

FIG. 12 is a block diagram of a text recognition apparatus according to another embodiment of the present invention.

Referring to FIG. 12, the storage 207 in this embodiment may further include a slicer 1201.

The slicer 1201 is a program stored in the storage 207, and its functions are implemented by being executed by the processor 201 shown in FIG. 3. That is, the operations of the slicer 1201 described below refer to functions implemented through execution by the processor 201. When an image 1200 is delivered from the network interface 203, the slicer 1201 determines the size of the received image 1200. If the received image 1200 is at least a predetermined size, the slicer 1201 divides the image 1200 into pieces of a suitable size to generate image pieces. The slicer 1201 passes the image pieces to the text recognition program 210. In other words, the slicer 1201 cuts the image 1200 and provides the pieces to the text recognition program 210 so that the text recognition program 210 recognizes an excessively large image 1200 in parts rather than all at once.

As an additional embodiment, the slicer 1201 may determine whether a character lies on the boundary along which the image 1200 is cut. If it determines that a character lies on the boundary and would be partially cut, the slicer 1201 may adjust the cut size so that the characters included in the image 1200 are not cut through the middle.
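The boundary adjustment described above might look like the following sketch, which cuts only along vertical lines and moves each cut left to the nearest column that contains no ink. The blank-column test, the threshold, and the names `column_is_blank` and `choose_cuts` are assumptions for illustration; the disclosure does not say how the slicer detects a character on the boundary.

```python
def column_is_blank(image, x, threshold=0.5):
    """True if no pixel in column x is dark enough to belong to a character."""
    return all(row[x] < threshold for row in image)

def choose_cuts(image, max_width):
    """Return column indices where the image is cut, preferring blank columns."""
    width = len(image[0])
    cuts, start = [], 0
    while width - start > max_width:
        cut = start + max_width
        # Move the cut left until it falls on a blank column,
        # so that no character is split by the cut.
        candidate = cut
        while candidate > start + 1 and not column_is_blank(image, candidate):
            candidate -= 1
        if column_is_blank(image, candidate):
            cut = candidate
        # If no blank column was found, fall back to the fixed-width cut.
        cuts.append(cut)
        start = cut
    return cuts

# One-row image: a character in columns 1-3 and another in columns 5-8.
row = [0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0]
image = [row]
cuts = choose_cuts(image, 6)
```

With a maximum piece width of 6, the naive cut at column 6 would split the second character, so the cut is moved to the blank column 4.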

The text recognition program 210 recognizes the text 1210 in each image piece and passes the recognized text to the slicer 1201. How the text recognition program 210 recognizes the text 1210 in the image 1200 has been described above, so the details are omitted here.

The slicer 1201 collects the texts that the text recognition program 210 recognized from the image pieces. The slicer 1201 stores the collected result, the text 1210, in the storage as data 212, or passes it to the network interface 203. The text 1210 passed to the network interface 203 is output to the outside.
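The overall flow of the slicer (divide, recognize each piece, collect) can be sketched as follows. The fixed-width slicing, the function names, and in particular the stand-in recognizer `fake_recognize` are hypothetical and exist only so the example runs; the actual recognition is performed by the text recognition program 210 as described earlier.

```python
def slice_image(image, max_width):
    """Split an image (a list of rows) into pieces no wider than max_width."""
    width = len(image[0])
    if width <= max_width:
        return [image]  # small enough: no slicing needed
    return [
        [row[x:x + max_width] for row in image]
        for x in range(0, width, max_width)
    ]

def recognize_large_image(image, recognize, max_width):
    """Slice the image, recognize each piece in order, and join the results."""
    pieces = slice_image(image, max_width)
    return "".join(recognize(piece) for piece in pieces)

# Stand-in recognizer for demonstration only:
# it treats each pixel value in the first row as a character code.
def fake_recognize(piece):
    return "".join(chr(v) for v in piece[0])

image = [[ord(c) for c in "x+2=5"]]
text = recognize_large_image(image, fake_recognize, max_width=2)
```

Processing the pieces in left-to-right order is what lets the collected text be reassembled by simple concatenation in this sketch.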

According to this embodiment, even when an excessively large image 1200 is received, the text 1210 can be recognized without processing delay.

100 User terminal
200 Text recognition apparatus
201 Processor
203 Network interface
205 Memory
207 Storage
210 Text recognition program
212 Data
300 Correct answer provider terminal

Claims

A text recognition apparatus comprising:
one or more processors;
a memory for loading a computer program executed by the processor; and
a storage for storing the computer program and data,
wherein the computer program comprises:
a feature extraction module for extracting a feature value by applying a first filter to each of a plurality of pixels included in an image;
a component determination module for calculating, by applying a second filter to the extracted feature value, a probability distribution with which each of a plurality of components may be present at each of the plurality of pixels, and for determining a component selected from among the plurality of components using the calculated probability distribution as the recognition component of each pixel constituting text included in the image; and
a text completion module for completing the text by combining the recognition components determined for each pixel,
wherein the component determination module, after the recognition component is determined for each of the plurality of pixels, calculates a center coordinate of each recognition component based on the shape of each recognition component determined for each pixel, determines a center pixel based on the calculated center coordinates,
determines as a pixel group those pixels, among the plurality of pixels, whose distance from the center pixel is less than a reference distance level, which are adjacent to one another within a predetermined distance, for which the probability that the determined recognition component is present is greater than a reference probability level, and whose determined recognition components are the same, and determines that the recognition component is present in the pixel group.

delete

The apparatus of claim 1,
wherein the computer program further comprises a training module for training at least one of the feature extraction module, the component determination module, and the text completion module.

The apparatus of claim 5,
wherein the training module generates a test image and correct answer data.

The apparatus of claim 6,
wherein, when the training module delivers the test image to the feature extraction module,
the feature extraction module extracts a test feature value by applying the first filter to the test image, and the component determination module calculates a test probability distribution by applying the second filter to the test feature value and delivers it to the training module, and
the training module compares the correct answer data with the calculated test probability distribution and, if the correct answer data and the test probability distribution differ by more than a predetermined level, adjusts at least one of the first filter and the second filter based on the comparison result.

The apparatus of claim 1,
wherein the component includes at least one of a character, a component of an expression, and a space.

A text recognition method performed by a text recognition apparatus, the method comprising:
extracting a feature value by applying a first filter to each of a plurality of pixels included in an image;
calculating, by applying a second filter to the extracted feature value, a probability distribution with which each of a plurality of components may be present at each of the plurality of pixels, and determining a component selected from among the plurality of components using the calculated probability distribution as the recognition component of each pixel constituting text included in the image; and
completing the text by combining the recognition components determined for each pixel,
wherein determining the recognition component comprises:
calculating, after the recognition component is determined for each of the plurality of pixels, a center coordinate of each recognition component based on the shape of each recognition component determined for each pixel;
determining a center pixel based on the calculated center coordinates;
determining as a pixel group those pixels, among the plurality of pixels, whose distance from the center pixel is equal to or less than a reference distance level, which are adjacent to one another within a predetermined distance, for which the probability that the determined recognition component is present exceeds a reference probability level, and whose determined recognition components are the same; and
determining that the recognition component is present in the pixel group.

delete

The method of claim 9,
further comprising training the recognition of the text by the text recognition apparatus,
wherein the training includes generating a test image and correct answer data.

The method of claim 14,
wherein training the recognition of the text comprises:
extracting a test feature value by applying the first filter to the generated test image;
calculating a test probability distribution by applying the second filter to the test feature value;
comparing the correct answer data with the calculated test probability distribution; and
adjusting at least one of the first filter and the second filter based on the comparison result if the correct answer data and the test probability distribution differ by more than a predetermined level.

The method of claim 9,
wherein the component includes at least one of a character, a component of an expression, and a space.

A computer-readable recording medium having recorded thereon a program for executing the text recognition method according to any one of claims 9 and 14 to 16.