KR102200608B1 - Apparatus and method for character detection

Info

Publication number: KR102200608B1
Application number: KR1020190080110A
Authority: KR
Inventors: 김성호; 류준환
Original assignee: 영남대학교 산학협력단
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2021-01-08

Abstract

Disclosed are a character detection device and a method thereof. According to one embodiment of the present invention, the character detection device comprises: an image input unit that receives a test image including one or more cursive characters; a detection unit that detects one or more cursive character regions in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters; and an erroneous detection region removing unit that, when a plurality of cursive character regions are detected in the test image, removes an erroneous detection region from among them based on whether the detected cursive character regions overlap one another.

Description

Character detection device and method {APPARATUS AND METHOD FOR CHARACTER DETECTION}

Embodiments of the present invention relate to character detection technology.

Conventional optical character recognition (OCR) techniques include post-processing using a projection method, segmentation-based recognition, detecting individual characters and then combining them into words, and tree-structure-based recognition and detection. However, these existing methods are vulnerable to various kinds of noise.

To address this problem, deep-learning-based OCR techniques have recently been proposed. However, most of them detect and recognize sentences or words rather than individual characters, and they mainly target English, so they are limited in detecting cursive script written in old documents.

Korean Registered Patent Publication No. 10-1777601 (2017.09.06.)

Embodiments of the present invention provide a character detection apparatus and method.

A character detection apparatus according to an embodiment of the present invention includes: an image input unit that receives a test image including one or more cursive characters; a detection unit that detects one or more cursive character areas in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters; and an erroneous detection area removal unit that, when a plurality of cursive character areas are detected in the test image, removes an erroneous detection area from among the plurality of cursive character areas based on whether the detected cursive character areas overlap one another.

The plurality of training images may include scanned images of documents written using one or more cursive characters.

The plurality of training images may also include an image generated by combining at least one of the individual character images for each of a plurality of cursive characters with one of one or more background images that do not include cursive characters.

The character detection model may include a convolutional neural network (CNN) structure.

The CNN structure may include a plurality of first convolution layers that receive a feature map extracted from the test image, and a plurality of second convolution layers that receive the feature map that has passed through the first convolution layers.

The character detection model may also generate each of the detected one or more cursive character areas in the form of a bounding box.

The character detection model may also generate a confidence score for each of the detected one or more cursive character areas.

The confidence score may be the probability that a cursive character is included in the detected cursive character area.

The erroneous detection area removal unit may detect the erroneous detection area based on the size of each of the plurality of cursive character areas and the size of the overlapping area between the plurality of cursive character areas.

The erroneous detection area removal unit may also set one of the plurality of cursive character areas as a reference area based on the confidence score, calculate an Intersection over Union (IOU) value between the reference area and each of the remaining cursive character areas using Equation 1 below, and determine any remaining area whose IOU value is greater than or equal to a preset value to be the erroneous detection area.

[Equation 1: formula not reproduced]

(Here, the three symbols of Equation 1, which are not reproduced in this text, denote the size of the reference area, the size of one of the remaining areas, and a preset value, respectively.)

A character detection method according to an embodiment of the present invention includes: receiving a test image including one or more cursive characters; detecting one or more cursive character areas in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters; and, when a plurality of cursive character areas are detected in the test image, removing an erroneous detection area from among the plurality of cursive character areas based on whether the detected cursive character areas overlap one another.

In the removing of the erroneous detection area, the erroneous detection area may be detected based on the size of each of the plurality of cursive character areas and the size of the overlapping area between the plurality of cursive character areas.

The removing of the erroneous detection area may further include: setting one of the plurality of cursive character areas as a reference area based on the confidence score; calculating an Intersection over Union (IOU) value between the reference area and each of the remaining cursive character areas using Equation 1 below; and determining, using Equation 1, any remaining area whose IOU value is greater than or equal to a preset value to be an erroneous detection area.

[Equation 1: formula not reproduced]

(Here, the three symbols of Equation 1, which are not reproduced in this text, denote the size of the reference area, the size of one of the remaining areas, and a preset value, respectively.)

The character detection method may further include, after removing the erroneous detection area, generating a performance index using Equation 2 below.

[Equation 2: formula not reproduced]

(Here, # of FA = number of true positive detection boxes + number of false positive detection boxes - number of detected ground truth boxes, and # of GT = number of ground truth boxes.)
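The formula of Equation 2 is not reproduced in this text. The following is a minimal sketch, assuming the performance index is the ratio # of FA / # of GT, which is consistent with the name False Positives Per Character (FPPC) used elsewhere in this document. The function names and the example counts are hypothetical.

```python
def count_false_alarms(true_positives: int, false_positives: int,
                       detected_gt: int) -> int:
    """# of FA as defined in the text: TP boxes + FP boxes - detected GT boxes."""
    return true_positives + false_positives - detected_gt


def fppc(true_positives: int, false_positives: int,
         detected_gt: int, num_gt: int) -> float:
    """Assumed form of the index: false alarms divided by ground truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return count_false_alarms(true_positives, false_positives, detected_gt) / num_gt


if __name__ == "__main__":
    # Hypothetical counts for one evaluation run.
    print(fppc(true_positives=90, false_positives=12, detected_gt=88, num_gt=100))
```

Normalizing by the number of ground truth characters rather than by the number of images keeps the index comparable between pages that contain very different numbers of characters.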

According to embodiments of the present invention, cursive areas containing cursive characters can be detected without duplication by taking into account the areas of the bounding boxes generated as cursive area detection results.

In addition, according to embodiments of the present invention, the detection performance for cursive areas containing cursive characters can be improved by training the character detection model on images generated by randomly transforming and then combining a plurality of individual character images (cropped characters) and a plurality of background images that contain no characters.

In addition, according to embodiments of the present invention, erroneous detection performance can be evaluated efficiently by using False Positives Per Character (FPPC), which measures the number of false detections per character, as the evaluation index instead of False Positives Per Image (FPPI), which measures the number of false detections per image.

FIG. 1 is a diagram for explaining a conventional character detection network structure.
FIG. 2 is a diagram for explaining the structure of a character detection network according to an embodiment of the present invention.
FIG. 3 is a block diagram of a character detection apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a dataset according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a convolutional neural network (CNN) structure in a character detection model according to an embodiment of the present invention.
FIG. 6 is a flowchart of a character detection method according to an embodiment of the present invention.
FIG. 7 is a block diagram illustrating a computing environment including a computing device suitable for use in example embodiments.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, devices, and/or systems described herein. However, this is only an example, and the present invention is not limited thereto.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention is omitted when it is determined that it may unnecessarily obscure the subject matter of the present invention. The terms described below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users or operators. Therefore, their definitions should be based on the content of this specification as a whole. The terms used in the detailed description are only for describing embodiments of the present invention and should not be construed as limiting. Unless clearly used otherwise, expressions in the singular include the plural. In this description, expressions such as "including" or "having" are intended to refer to certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof other than those described.

Meanwhile, an embodiment of the present invention may include a program for performing the methods described in this specification on a computer, and a computer-readable recording medium containing the program. The computer-readable recording medium may include program instructions, local data files, local data structures, and the like, alone or in combination. The medium may be specially designed and configured for the present invention, or may be one commonly available in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of the program include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

FIG. 1 is a diagram (100) for explaining the structure of a conventional cursive area detection network. As shown, the conventional cursive area detection network includes multi-scale convolution layers, and the detection result obtained by passing through the multi-scale convolution layers is expressed as a confidence score and an area.

The conventional cursive area detection network uses the classical non-maximum suppression (NMS) algorithm: it sorts the detection results by confidence score, sets the area corresponding to the detection result with the highest confidence score as the reference area, computes the IOU between the reference area and each remaining area using Equation 1 below, and removes the remaining detection results whose IOU value with the reference area exceeds a preset value, while keeping the detection result with the highest confidence score.

[Equation 1: formula not reproduced]

(Here, the two symbols of Equation 1, which are not reproduced in this text, denote the size of the reference area and the size of one of the remaining areas, respectively.)
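The formula of Equation 1 is not reproduced in this text. The following is a minimal sketch of the standard Intersection over Union definition (intersection area divided by union area), assuming each area is an axis-aligned box given as (x, y, w, h) with (x, y) the upper-left corner. The `Box` alias and the function names are hypothetical.

```python
from typing import Tuple

# (x, y, w, h) with (x, y) the upper-left corner; an assumed representation.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; negative widths or heights are treated as empty."""
    _, _, w, h = box
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    """Standard IOU: intersection area divided by union area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = min(ax + aw, bx + bw) - max(ax, bx)
    inter_h = min(ay + ah, by + bh) - max(ay, by)
    inter = max(0.0, inter_w) * max(0.0, inter_h)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
    print(iou((0, 0, 2, 2), (1, 1, 2, 2)))
```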

Because of this structure, the convolution layers used to detect large objects among the multi-scale convolution layers are unnecessary when detecting the cursive areas of cursive characters written in a document. Moreover, because the area of the bounding box representing a detected cursive area is not considered when sorting and removing detection results, duplicate cursive area detections may occur.

FIG. 2 is a diagram (200) for explaining the structure of a cursive area detection network according to an embodiment of the present invention. As shown, the network includes a character detection model trained using a plurality of training images each including one or more cursive characters, and the character detection model includes single-scale convolution layers. The detection result that has passed through the convolution layers is output in the form of a score based on a preset criterion and an area (box) representing the detected cursive character. A modified non-maximum suppression algorithm is then applied: the corresponding scores and areas are paired and sorted, the score-area pair with the highest score is taken as the reference, and, when the result of a preset formula exceeds a preset value, the remaining detection result pairs are removed while the pair with the highest score is kept.

Because of the single-scale convolution layer structure, the cursive areas of cursive characters, which are relatively small compared to general objects (for example, cars or people), can be detected more simply than with existing structures for optical character recognition (OCR), and duplicate detection of cursive areas can be prevented by taking into account the area of the bounding box representing each detected cursive character area.

FIG. 3 is a block diagram of a character detection apparatus 300 according to an embodiment of the present invention.

Referring to FIG. 3, the character detection apparatus 300 according to an embodiment of the present invention includes an image input unit 310, a detection unit 320, and an erroneous detection area removal unit 330.

The image input unit 310 receives a test image including one or more cursive characters.

The detection unit 320 detects one or more cursive character areas in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters.

According to an embodiment of the present invention, the plurality of training images may include scanned images of documents written using one or more cursive characters.

FIG. 4 is a diagram for explaining a dataset 400 according to an embodiment of the present invention. The dataset 400 refers to a set of images, including the plurality of training images, used to train the character detection model in the present invention.

Referring to FIG. 4, the dataset 400 may consist of individual character images (cropped characters) 410 for each of a plurality of cursive characters, one or more images 420 that include cursive characters, and one or more background images 430 that do not include cursive characters.

Specifically, the dataset 400 may further include an image generated by combining at least one of the individual character images 410 for each of the plurality of cursive characters with one of the one or more background images 430 that do not include cursive characters.

More specifically, the dataset 400 may further include an image generated by combining one of the background images 430 that do not include cursive characters, rotated by an arbitrary angle, with one or more of the individual character images 410 whose character size or sharpness has been randomly adjusted.

In this case, the sharpness may be kept at or above a predetermined value so that the one or more individual character images 410 included in the generated image do not disappear.
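The combining step described above can be sketched as follows. This is a simplified illustration under several assumptions that are not in the source: images are plain 2D lists, the background holds gray levels 0 to 255, the character image holds ink values in [0, 1], "sharpness" is interpreted as ink opacity, and the floor `MIN_SHARPNESS` is a hypothetical value. Rotation of the background by an arbitrary angle is omitted for brevity.

```python
import random

MIN_SHARPNESS = 0.4  # hypothetical floor so a pasted character never vanishes


def resize_nearest(glyph, scale):
    """Nearest-neighbour resize of a 2D list of ink values in [0, 1]."""
    src_h, src_w = len(glyph), len(glyph[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    return [[glyph[min(src_h - 1, int(r / scale))][min(src_w - 1, int(c / scale))]
             for c in range(dst_w)]
            for r in range(dst_h)]


def compose(background, glyph, rng, scale_range=(0.5, 2.0)):
    """Paste one randomly scaled character onto a copy of the background.

    Returns the new image and the ground truth box (x, y, w, h).
    """
    scale = rng.uniform(*scale_range)
    # Sharpness (assumed: ink opacity) is random but clamped from below.
    sharpness = max(MIN_SHARPNESS, rng.random())
    patch = resize_nearest(glyph, scale)
    ph, pw = len(patch), len(patch[0])
    bh, bw = len(background), len(background[0])
    if ph > bh or pw > bw:
        raise ValueError("scaled character does not fit in the background")
    top = rng.randint(0, bh - ph)
    left = rng.randint(0, bw - pw)
    out = [row[:] for row in background]  # leave the input background untouched
    for r in range(ph):
        for c in range(pw):
            alpha = patch[r][c] * sharpness
            # Darken the background pixel in proportion to the ink opacity.
            out[top + r][left + c] = round(out[top + r][left + c] * (1 - alpha))
    return out, (left, top, pw, ph)


if __name__ == "__main__":
    bg = [[200] * 20 for _ in range(20)]
    ink = [[1.0] * 4 for _ in range(4)]
    image, box = compose(bg, ink, random.Random(0))
    print(box)
```

Because the paste position and size are chosen by the generator, the ground truth bounding box for each pasted character comes for free, which is what makes this kind of synthetic image convenient for training a detector.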

According to an embodiment of the present invention, the character detection model may include a convolutional neural network (CNN) structure.

FIG. 5 is a diagram for explaining a CNN structure 500 in the character detection model according to an embodiment of the present invention.

Referring to FIG. 5, the CNN structure 500 may include a plurality of first convolution layers 510 that receive a feature map extracted from the test image, and a plurality of second convolution layers 520 that receive the feature map that has passed through the first convolution layers 510.

Specifically, for example, the first convolution layers 510 may consist of seven sub-convolution layers, but the number of sub-convolution layers is not necessarily limited thereto.

Likewise, for example, the second convolution layers 520 may consist of eight sub-convolution layers, but the number of sub-convolution layers is not necessarily limited thereto.

In addition, the feature map that has passed through the first convolution layers 510 may have a smaller scale than before passing through them, as a result of the convolution operations.

According to an embodiment of the present invention, the CNN structure 500 may include a part of the ResNet-34 network as a part of its structure. For example, a feature map that has passed through the ResNet-34 network may pass through seven 3x3 convolution layers, then through one convolution layer with a stride of 2 that halves its size, and then through eight 3x3 convolution layers.
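To make the layer sequence concrete, the sketch below traces only the spatial size of a feature map through seven 3x3 layers, one stride-2 layer, and eight 3x3 layers, using the usual convolution output-size formula. The padding of 1 and the 3x3 kernel of the stride-2 layer are assumptions (the source states neither), and the function names are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_head(height: int, width: int):
    """Return the (height, width) of the feature map after each layer."""
    shapes = [(height, width)]

    def apply(kernel, stride, padding):
        h, w = shapes[-1]
        shapes.append((conv_out(h, kernel, stride, padding),
                       conv_out(w, kernel, stride, padding)))

    for _ in range(7):   # seven 3x3 layers; padding 1 (assumed) keeps the size
        apply(3, 1, 1)
    apply(3, 2, 1)       # one stride-2 layer; 3x3 kernel and padding 1 assumed
    for _ in range(8):   # eight 3x3 layers at the halved resolution
        apply(3, 1, 1)
    return shapes


if __name__ == "__main__":
    for index, shape in enumerate(trace_head(64, 64)):
        print(index, shape)
```

With these assumptions a 64x64 input stays 64x64 through the first seven layers and becomes 32x32 after the stride-2 layer, matching the halving described in the text.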

According to an embodiment of the present invention, the character detection model may calculate a localization loss by computing a mean square error (MSE) through Equations 2 to 4 below.

[Equation 2: formula not reproduced]

[Equation 3: formula not reproduced]

Here, the first symbol (not reproduced in this text) denotes the localization loss, i is the index of a bounding box in which a cursive character is detected, j is the index of a ground truth bounding box, N is the number of default bounding boxes corresponding to ground truth bounding boxes, d is a default bounding box, m is an element of the set consisting of the horizontal coordinate of the upper-left corner of the bounding box (cx), the vertical coordinate of the upper-left corner of the bounding box (cy), the width of the bounding box (w), and the height of the bounding box (h), k is a specific category, l is a predicted bounding box, and g is a ground truth bounding box.

In addition, a further term (its symbol is not reproduced in this text) may be calculated using Equation 4 below.

[Equation 4: formula not reproduced]

However, the character detection model may instead calculate the localization loss using a smooth L1 loss or an L2 loss in place of the mean square error, and the method of calculating the localization loss is not necessarily limited to these.
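Equations 2 to 4 are not reproduced in this text, so the sketch below does not attempt to restate them. It only illustrates the three error measures named above, applied to the four box parameters (cx, cy, w, h) of one predicted box and one ground truth box. The smooth L1 form uses the common threshold of 1, and the L2 loss is taken as the sum of squared differences; both are assumed conventions, and the function names are hypothetical.

```python
from typing import Sequence


def mse_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean of squared differences over the box parameters."""
    diffs = [p - t for p, t in zip(pred, target)]
    return sum(d * d for d in diffs) / len(diffs)


def l2_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences (one common convention for L2 loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def smooth_l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Quadratic for small errors (|d| < 1), linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


if __name__ == "__main__":
    predicted = (10.0, 10.0, 5.0, 5.0)     # hypothetical (cx, cy, w, h)
    ground_truth = (10.0, 12.0, 5.0, 4.0)
    print(mse_loss(predicted, ground_truth))
    print(l2_loss(predicted, ground_truth))
    print(smooth_l1_loss(predicted, ground_truth))
```

The practical difference is sensitivity to outliers: the squared measures grow quadratically with the error, while smooth L1 grows only linearly once the error exceeds the threshold.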

In addition, the character detection model may calculate a confidence loss through Equation 5 below and combine the confidence loss with the localization loss described above through Equation 6 below to derive a loss function for training.

[Equation 5: formula not reproduced]

[Equation 6: formula not reproduced]

Here, the first four symbols (not reproduced in this text) denote, in order, the confidence loss, the loss function for training, the index of a bounding box in which a cursive character is detected, and the index of a bounding box in which no cursive character is detected. In addition, j is the order of a ground truth bounding box, N is the number of default bounding boxes corresponding to ground truth bounding boxes, d is a default bounding box, p is a specific category, l is a predicted bounding box, g is a ground truth bounding box, and c is the confidence score for category p.

In addition, a further term (its symbol is not reproduced in this text) may be calculated using Equation 7 below.

[Equation 7: formula not reproduced]

In addition, according to an embodiment of the present invention, the character detection model may generate a confidence score and a bounding-box result using the feature map that has passed through the first convolution layers 510 and the second convolution layers 520.

Specifically, the character detection model may generate each of the detected one or more cursive character areas in the form of a bounding box. The character detection model may also generate a confidence score for each of the detected one or more cursive character areas.

Here, a bounding box refers to a rectangular section marking an area of the document image in which a cursive character is predicted to exist, and the bounding-box result may be generated by calculating four values: the horizontal coordinate of the upper-left corner of the bounding box, the vertical coordinate of the upper-left corner of the bounding box, the width of the bounding box, and the height of the bounding box. However, the bounding-box result is not necessarily limited to this form, and any method that can specify a rectangular section marking an area of the document image in which a cursive character is predicted to exist may be used to generate it.

In addition, the confidence score may be the probability that a cursive character is included in the detected cursive character area. For example, a confidence score of 0.9 for a detected cursive character area indicates that the bounding box representing that area contains a cursive character with a probability of 90%.

When a plurality of cursive character areas are detected in the test image, the erroneous detection area removal unit 330 removes an erroneous detection area from among the plurality of cursive character areas based on whether the detected cursive character areas overlap one another.

For example, when one of the detected cursive character areas is contained within another, the erroneous detection area removal unit 330 may determine the contained character area to be an erroneous detection area and remove it.
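The containment example above can be sketched as follows, assuming boxes are given as (x, y, w, h) with (x, y) the upper-left corner. The function names are hypothetical, and the tie-breaking rule for identical boxes (keep the first one) is an assumption added so that exact duplicates do not remove each other.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), upper-left corner


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` (edges may touch)."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return (ox <= ix and oy <= iy
            and ix + iw <= ox + ow and iy + ih <= oy + oh)


def remove_contained(boxes: List[Box]) -> List[Box]:
    """Drop every box that is contained in some other box."""
    kept = []
    for i, box in enumerate(boxes):
        contained = any(
            j != i and contains(other, box)
            # For identical boxes, only the later copy counts as contained.
            and (other != box or j < i)
            for j, other in enumerate(boxes)
        )
        if not contained:
            kept.append(box)
    return kept


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (2, 2, 3, 3), (20, 20, 5, 5)]
    print(remove_contained(detections))
```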

According to an embodiment of the present invention, the erroneous detection region removing unit 330 may detect the erroneous detection region based on the size of each of the plurality of cursive character regions and the size of the overlapping area between the cursive character regions.

According to an embodiment of the present invention, the erroneous detection region removing unit 330 may use a modified non-maximum suppression algorithm that takes into account the areas of the bounding boxes generated as result values of the detection unit 320.

In addition, according to an embodiment of the present invention, the erroneous detection region removing unit 330 may set one of the plurality of cursive character regions as a reference region based on the confidence scores, and may use Equation 8 below to calculate an Intersection over Union (IOU) value between the reference region and each of the remaining cursive character regions other than the reference region. Among the remaining regions, a region whose IOU value is greater than or equal to a preset value may be determined to be an erroneous detection region.

[Equation 8]

(Equation image not reproduced; its three symbols denote, in order, the size of the reference region, the size of one of the remaining regions, and the preset value.)

More specifically, the erroneous detection region removing unit 330 may sort the plurality of bounding boxes generated as result values of the detection unit 320 together with their corresponding confidence scores and, taking the bounding box with the highest confidence score as the reference, calculate its IOU value with each of the other bounding boxes using Equation 8 above. The erroneous detection region removing unit 330 may then determine that any other bounding box whose IOU value is greater than or equal to the preset value is a duplicate detection of the same cursive character and remove it, and may repeat this process for all of the remaining bounding boxes.
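The sort, compare, and remove loop described above can be sketched as follows. Because the formula of Equation 8 is not available in this text, the sketch substitutes the standard Intersection over Union; the area-aware variant of the specification may differ. The function names, the `(x, y, width, height)` tuple layout, and the default threshold of 0.5 are all assumptions made for illustration.

```python
def iou(a, b):
    """Standard IoU of two (x, y, width, height) boxes (stand-in for Equation 8)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def suppress_duplicates(boxes, scores, threshold=0.5):
    """Return indices of the boxes kept after non-maximum suppression."""
    # Sort indices by confidence score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        ref = order.pop(0)  # highest-scoring remaining box is the reference
        kept.append(ref)
        # Boxes overlapping the reference at or above the threshold are
        # treated as duplicate detections of the same character and dropped.
        order = [i for i in order if iou(boxes[ref], boxes[i]) < threshold]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.9, 0.8, 0.7]
print(suppress_duplicates(boxes, scores))
```

In this example the second box overlaps the first with an IoU of 81/119, which exceeds the assumed threshold, so only the first and third boxes are kept.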

FIG. 6 is a flowchart illustrating a character detection method according to an embodiment of the present invention.

The method illustrated in FIG. 6 may be performed, for example, by the character detection device 300 illustrated in FIG. 3. Although the flowchart describes the method as a plurality of steps, at least some of the steps may be performed in a different order, combined and performed together with other steps, omitted, divided into sub-steps, or supplemented with one or more steps not shown.

Referring to FIG. 6, the character detection device 300 first receives a test image including one or more cursive characters (610).

The character detection device 300 then detects one or more cursive character regions in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters (620).

In addition, according to an embodiment of the present invention, the plurality of training images may further include an image generated by combining at least one of the individual character images for each of a plurality of cursive characters with one of one or more background images that do not include cursive characters.
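One way such a combined training image could be produced is sketched below. The specification does not state how the character image and the background are merged, so everything here is an assumption: images are grayscale nested lists, pixels darker than a hypothetical `ink_threshold` are treated as ink and copied onto the background, and the paste position doubles as the bounding-box label.

```python
def composite(background, glyph, left, top, ink_threshold=128):
    """Paste the ink pixels of `glyph` onto a copy of `background`.

    Returns the new image and the (x, y, width, height) box of the glyph.
    """
    image = [row[:] for row in background]  # leave the background untouched
    for r, glyph_row in enumerate(glyph):
        for c, value in enumerate(glyph_row):
            if value < ink_threshold:  # dark pixel: assumed to be ink
                image[top + r][left + c] = value
    box = (left, top, len(glyph[0]), len(glyph))
    return image, box


background = [[255] * 4 for _ in range(4)]  # blank 4x4 page
glyph = [[0, 255],
         [255, 0]]                          # tiny 2x2 "character"
image, box = composite(background, glyph, left=1, top=2)
print(image, box)
```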

According to an embodiment of the present invention, as shown in FIG. 5, the character detection model may include a CNN structure 500.

Here, the character detection model may calculate a position detection error by computing the mean squared error using Equations 2 to 4 above.

However, the character detection model may instead calculate the position detection error by computing a smooth L1 error or an L2 error in place of the mean squared error, and the method of calculating the position detection error is not necessarily limited to these.
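To make the difference between the error choices concrete, the sketch below compares a mean squared error with a smooth L1 error on one predicted box. It does not reproduce Equations 2 to 4; the function names are hypothetical, and the smooth L1 form used is the commonly cited one with a transition point of 1 (quadratic below it, linear above it), which is an assumption here.

```python
def mean_squared_error(pred, target):
    """Average of squared differences between predicted and true values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def smooth_l1_error(pred, target):
    """Quadratic for small differences, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)


# Predicted and ground-truth (x, y, width, height) of one box.
pred = [10.0, 20.0, 30.0, 40.0]
target = [10.5, 20.0, 33.0, 40.0]
print(mean_squared_error(pred, target), smooth_l1_error(pred, target))
```

The width error of 3 contributes 9 to the squared sum but only 2.5 to the smooth L1 sum, which illustrates why the latter is less sensitive to outlying coordinates.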

In addition, the character detection model may calculate a confidence error using Equation 5 above and combine it with the position detection error described above using Equation 6 above to derive the error function used for training.

According to an embodiment of the present invention, the CNN structure may include a plurality of first convolution layers that receive a feature map extracted from the test image, and a plurality of second convolution layers that receive the feature map that has passed through the first convolution layers.

According to an embodiment of the present invention, the character detection model may generate confidence scores and bounding-box result values using the feature map that has passed through the first convolution layers 510 and the second convolution layers 520.

Specifically, the character detection model may generate each of the detected one or more cursive character regions in the form of a bounding box. The character detection model may also generate a confidence score for each of the detected one or more cursive character regions.

Here, the confidence score may be the probability that a cursive character is included in each of the detected one or more cursive character regions.

Thereafter, when a plurality of cursive character regions are detected in the test image, the character detection device 300 removes an erroneous detection region from among the plurality of cursive character regions based on whether the detected cursive character regions overlap one another (630).

According to an embodiment of the present invention, the character detection device 300 may use a modified non-maximum suppression algorithm that takes into account the areas of the bounding boxes generated as result values of the step of detecting cursive character regions (620).

In addition, according to an embodiment of the present invention, the character detection device 300 may detect the erroneous detection region based on the size of each of the plurality of cursive character regions and the size of the overlapping area between the cursive character regions.

Specifically, the character detection device 300 may set one of the plurality of cursive character regions as a reference region based on the confidence scores, and may use Equation 9 below to calculate an Intersection over Union (IOU) value between the reference region and each of the remaining cursive character regions other than the reference region. Among the remaining regions, a region whose IOU value is greater than or equal to a preset value may be determined to be an erroneous detection region.

[Equation 9]

(Equation image not reproduced; its three symbols denote, in order, the size of the reference region, the size of one of the remaining regions, and the preset value.)

More specifically, the character detection device 300 may sort the plurality of bounding boxes generated as result values of the detection unit 320 together with their corresponding confidence scores and, taking the bounding box with the highest confidence score as the reference, calculate its IOU value with each of the other bounding boxes using Equation 9 above. Any other bounding box whose IOU value is greater than or equal to the preset value may be determined to be a duplicate detection of the same cursive character and removed, and this process may be repeated for all of the remaining bounding boxes.

The character detection method according to an embodiment of the present invention may further include, after the step of removing the erroneous detection region, a step of generating a performance index.

Specifically, instead of False Positives Per Image (FPPI), the existing false-detection metric that counts false detections per image, character detection performance may be evaluated using False Positives Per Character (FPPC), a false-detection metric that counts false detections per character, calculated using Equation 10 below.

[Equation 10]

$$\mathrm{FPPC} = \frac{\#\ \text{of FA}}{\#\ \text{of GT}}$$

(where # of FA = number of true positives + number of false positives − number of detected ground-truth boxes, and # of GT = number of ground-truth boxes)
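A small worked example may help with the counts defined above. The sketch assumes that FPPC is the ratio of # of FA to # of GT, as the metric's name suggests; the function and parameter names are hypothetical, and the counts are made-up illustrative inputs rather than results from the specification.

```python
def false_positives_per_character(true_positives, false_positives,
                                  detected_gt, total_gt):
    """Compute FPPC as (# of FA) / (# of GT), under the stated assumption.

    # of FA = true positives + false positives - detected ground-truth boxes
    # of GT = total number of ground-truth boxes
    """
    if total_gt <= 0:
        raise ValueError("total_gt must be positive")
    false_alarms = true_positives + false_positives - detected_gt
    return false_alarms / total_gt


# Illustrative counts: 100 labelled characters, 95 of them detected once
# each, plus 12 detections that match no labelled character.
print(false_positives_per_character(95, 12, 95, 100))
```

Normalizing by the number of characters rather than the number of images keeps the metric comparable between pages that contain very different amounts of text.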

FIG. 7 is a block diagram illustrating a computing environment 10 that includes a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be a cursive region detection device.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments described above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to the exemplary embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. The program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interfaces 22 and the network communication interfaces 26 are connected to the communication bus 18. An input/output device 24 may be connected to the other components of the computing device 12 through an input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, trackpad, or the like), a keyboard, a touch input device (a touchpad, touchscreen, or the like), a voice or sound input device, various types of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or may be connected to the computing device 12 as a separate device distinct from it.

Representative embodiments of the present invention have been described in detail above, but those of ordinary skill in the art to which the present invention pertains will understand that various modifications may be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, and should be determined not only by the claims below but also by their equivalents.

10: computing environment
12: computing device
14: processor
16: computer-readable storage medium
18: communication bus
20: program
22: input/output interface
24: input/output device
26: network communication interface
300: character detection device
310: image input unit
320: detection unit
330: erroneous detection region removing unit

Claims

A character detection device comprising:
an image input unit configured to receive a test image including one or more cursive characters;
a detection unit configured to detect one or more cursive character regions in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters; and
an erroneous detection region removing unit configured to, when a plurality of cursive character regions are detected in the test image, remove an erroneous detection region from among the plurality of cursive character regions based on whether the detected cursive character regions overlap one another,
wherein the character detection model generates, for each of the detected one or more cursive character regions, a confidence score that is a probability that a cursive character is included in that region, and
wherein the erroneous detection region removing unit sets one of the plurality of cursive character regions as a reference region based on the confidence score, calculates, using Equation 1 below, an Intersection over Union (IOU) value between the reference region and each of the remaining cursive character regions other than the reference region, and determines a region whose IOU value is greater than or equal to a preset value among the remaining regions to be the erroneous detection region.
[Equation 1]

(Equation image not reproduced; its three symbols denote, in order, the size of the reference region, the size of one of the remaining regions, and the preset value.)

The character detection device of claim 1,
wherein the plurality of training images include an image of a scanned document written in one or more cursive characters.

The character detection device of claim 1,
wherein the plurality of training images include an image generated by combining at least one of individual character images for each of a plurality of cursive characters with one of one or more background images that do not include cursive characters.

The character detection device of claim 1,
wherein the character detection model includes a convolutional neural network (CNN) structure.

The character detection device of claim 4,
wherein the CNN structure includes a plurality of first convolution layers that receive a feature map extracted from the test image, and a plurality of second convolution layers that receive the feature map that has passed through the first convolution layers.

The character detection device of claim 1,
wherein the character detection model generates each of the detected one or more cursive character regions in the form of a bounding box.

delete

A character detection method comprising:
receiving a test image including one or more cursive characters;
detecting one or more cursive character regions in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters; and
when a plurality of cursive character regions are detected in the test image, removing an erroneous detection region from among the plurality of cursive character regions based on whether the detected cursive character regions overlap one another,
wherein the character detection model generates, for each of the detected one or more cursive character regions, a confidence score that is a probability that a cursive character is included in that region, and
wherein removing the erroneous detection region comprises: setting one of the plurality of cursive character regions as a reference region based on the confidence score;
calculating, using Equation 1 below, an Intersection over Union (IOU) value between the reference region and each of the remaining cursive character regions other than the reference region; and
determining, using Equation 1 below, a region whose IOU value is greater than or equal to a preset value among the remaining regions to be the erroneous detection region.
[Equation 1]

(Equation image not reproduced; its three symbols denote, in order, the size of the reference region, the size of one of the remaining regions, and the preset value.)

The character detection method of claim 11,
wherein the plurality of training images include an image of a scanned document written in one or more cursive characters.

The character detection method of claim 11,
wherein the plurality of training images include an image generated by combining at least one of individual character images for each of a plurality of cursive characters with one of one or more background images that do not include cursive characters.

The character detection method of claim 11,
wherein the character detection model includes a convolutional neural network (CNN) structure.

The character detection method of claim 14,
wherein the CNN structure includes a plurality of first convolution layers that receive a feature map extracted from the test image, and a plurality of second convolution layers that receive the feature map that has passed through the first convolution layers.

The character detection method of claim 11,
wherein the character detection model generates each of the detected one or more cursive character regions in the form of a bounding box.

delete

The character detection method of claim 11,
further comprising, after removing the erroneous detection region, generating a performance index using Equation 2 below.
[Equation 2]

$$\mathrm{FPPC} = \frac{\#\ \text{of FA}}{\#\ \text{of GT}}$$

(where # of FA = number of true positive detection boxes + number of false positive detection boxes − number of detected ground-truth boxes, and # of GT = number of ground-truth boxes)

A computer program stored in a non-transitory computer-readable storage medium,
the computer program including one or more instructions that, when executed by a computing device having one or more processors, cause the computing device to:
receive a test image including one or more cursive characters;
detect one or more cursive character regions in the input test image using a character detection model trained on a plurality of training images each including one or more cursive characters; and
when a plurality of cursive character regions are detected in the test image, remove an erroneous detection region from among the plurality of cursive character regions based on whether the detected cursive character regions overlap one another,
wherein the character detection model generates, for each of the detected one or more cursive character regions, a confidence score that is a probability that a cursive character is included in that region, and
wherein the computing device sets one of the plurality of cursive character regions as a reference region based on the confidence score, calculates, using Equation 1 below, an Intersection over Union (IOU) value between the reference region and each of the remaining cursive character regions other than the reference region, and determines a region whose IOU value is greater than or equal to a preset value among the remaining regions to be the erroneous detection region.
[Equation 1]

(Equation image not reproduced; its three symbols denote, in order, the size of the reference region, the size of one of the remaining regions, and the preset value.)