KR101896357B1

KR101896357B1 - Method, device and program for detecting an object

Info

Publication number: KR101896357B1
Application number: KR1020180015537A
Authority: KR
Inventors: 고동운
Original assignee: 주식회사 라디코
Priority date: 2018-02-08
Filing date: 2018-02-08
Publication date: 2018-09-07

Abstract

The present invention provides a method, a device and a program for detecting an object. The method for detecting an object by the device comprises the following steps of: obtaining an image including a detection target object; segmenting the obtained image into a plurality of cells; obtaining information on at least one region estimated to contain the object for each of the plurality of segmented cells by using a plurality of layers included in a network for detecting the object; obtaining reliability for each of the at least one region obtained; and detecting the object based on the obtained reliability and an overlapped region of the at least one region obtained.

Description

METHOD, DEVICE AND PROGRAM FOR DETECTING AN OBJECT

The present invention relates to a method, a device, and a program for detecting an object.

In general, government offices, financial institutions, and companies widely store and manage documents in the form of scanned images. In particular, as the cost of storing and managing documents has risen, image-based document management has been commonplace at most financial institutions since the late 1990s.

Among the personal information that must be protected in documents stored and managed as images, the resident registration number accounts for the largest share.

Resident registration numbers appear in documents in both printed and handwritten form, and because they are recorded on many different forms, it is difficult to specify the form on which a given number appears.

Many companies and financial institutions have introduced systems that automatically detect documents containing personal information in order to protect that information. However, detection is limited to electronic documents such as plain text, Word, and Hangul (HWP) files. Documents managed and distributed as images are not detected, so the risk of a security incident is high.

Patent Document 1 discloses a method of extracting and marking personal information by recognizing text in an image: characters in the image are recognized, and personal information is extracted and then marked based on the recognition result.

Patent Document 2 recognizes text in an image, uses probabilistic similarity to determine the probability that the recognition result is personal information, and then extracts the personal information.

Patent Document 3 classifies the type of a scanned document image by template matching and extracts the personal information located at the position corresponding to the classified document type. That is, it detects personal information through document classification in document images that have a standardized format.

Because Patent Documents 1 and 2 rely on character recognition, they are inefficient for information leakage prevention, which requires fast processing, and their performance is limited by the character recognition rate. Patent Document 3, by contrast, allows fast processing, but because documents must first be classified by template matching or the like, it is of limited use for document images of unspecified format.

A conventional method of detecting objects in an image with a convolutional neural network uses a classifier: a window of a fixed size is generated and then moved across the entire image, with classification performed at each position. In other words, the classifier searches the whole image by sliding a fixed-size window up, down, left, and right. This approach cannot achieve fast processing.
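To give a feel for why the sliding-window approach is slow, the following minimal sketch counts how many times a classifier would have to run. The image size, window size, and stride are hypothetical values chosen for illustration and do not come from the specification.

```python
def count_window_positions(image_w, image_h, window, stride):
    """Number of positions a square window visits when slid over the image."""
    if window > image_w or window > image_h:
        return 0
    across = (image_w - window) // stride + 1  # positions along the width
    down = (image_h - window) // stride + 1    # positions along the height
    return across * down


# Hypothetical scanned page of 1000 x 1400 pixels, 100-pixel window, 10-pixel stride.
positions = count_window_positions(1000, 1400, window=100, stride=10)
```

Each position requires one classifier evaluation, and the count grows further if several window sizes are tried.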

Korean Registered Patent Publication No. 10-1401028 (registered May 22, 2014)
Korean Registered Patent Publication No. 10-1721063 (registered March 23, 2017)
Korean Laid-open Patent Publication No. 10-2015-0130253 (published November 23, 2015)

The problem to be solved by the present invention is to provide a method, a device, and a program for detecting an object.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

According to one aspect of the present invention for solving the above problem, a method by which a device detects an object comprises: obtaining an image including an object to be detected; segmenting the obtained image into a plurality of cells; obtaining, for each of the plurality of segmented cells, information on one or more regions estimated to contain the object, using a plurality of layers included in a network for detecting the object; obtaining a reliability for each of the obtained one or more regions; and detecting the object based on the obtained reliability and an overlapping region of the obtained one or more regions.

The segmenting may include: removing noise from the obtained image; dividing the noise-removed image into one or more blocks; and dividing each of the one or more blocks into the plurality of cells.

The information on the one or more regions may include the center point coordinates, width, and height of the one or more regions, and the center point coordinates, width, and height may be normalized with respect to the width and height of the block, among the one or more divided blocks, to which each of the one or more regions belongs.

The object to be detected may include one or more classes, and obtaining the information on the one or more regions may include obtaining, for each of the one or more classes, information on one or more regions estimated to contain the object.

Detecting the object may include: obtaining, from among the obtained one or more regions, one or more regions having a reliability equal to or greater than a predetermined threshold; and detecting the object using the one or more regions having a reliability equal to or greater than the predetermined threshold.

Detecting the object may also include calculating the position of the region containing the object using the overlapping region of the one or more regions having a reliability equal to or greater than the predetermined threshold.

The plurality of layers included in the network may include an output layer that outputs the information on the one or more regions and the reliability of the one or more regions, and the output layer may include, for each of the plurality of segmented cells, neurons representing the position, size, and reliability of each of the one or more regions and neurons representing the class of the object.

The object to be detected may include at least one of a resident registration number and other personal information contained in the image.

A device according to one aspect of the present invention for solving the above problem includes a memory that stores one or more instructions and a processor that executes the one or more instructions stored in the memory. By executing the one or more instructions, the processor obtains an image including an object to be detected, segments the obtained image into a plurality of cells, obtains, for each of the plurality of segmented cells, information on one or more regions estimated to contain the object using a plurality of layers included in a network for detecting the object, obtains a reliability for each of the obtained one or more regions, and detects the object based on the obtained reliability and an overlapping region of the obtained one or more regions.

A program according to one aspect of the present invention for solving the above problem is combined with a computer, which is hardware, and is stored in a computer-readable recording medium so as to perform the method of claim 1.

Other specific details of the invention are included in the detailed description and drawings.

According to the present invention, resident registration numbers contained in a scanned document image are extracted using a convolutional neural network, so that unstructured personal information contained in the image can be extracted efficiently.

In addition, using an artificial neural network to extract resident registration numbers in a manner robust to unstructured forms and formats can improve the performance of a personal information leakage prevention system.

In addition, when detecting an object with a convolutional neural network, processing speed can be improved by dividing the image into cells.

In addition, processing speed can be improved by limiting the area to be searched during preprocessing.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a diagram illustrating a neural network that a device uses to detect an object according to an embodiment.
FIG. 2 is a flowchart illustrating a method by which a device detects an object according to an embodiment.
FIG. 3 is a block diagram of a device that detects an object according to an embodiment.
FIG. 4 is a block diagram of a processor according to an embodiment.
FIG. 5 is a block diagram of a data learning unit according to an embodiment.
FIG. 6 is a block diagram illustrating a device according to an embodiment.
FIGS. 7 to 11 are diagrams illustrating how a device performs learning for detecting an object and detects an object according to the disclosed embodiments.
FIG. 12 is a block diagram of a data recognition unit according to an embodiment.
FIG. 13 is a block diagram illustrating a device according to an embodiment.

The advantages and features of the present invention, and the manner of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully convey the scope of the invention to those skilled in the art to which it belongs, and the invention is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular form also includes the plural unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements other than those stated. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the mentioned elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are of course not limited by these terms. The terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical idea of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the meaning commonly understood by those skilled in the art to which the present invention belongs. Terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are clearly and specifically defined.

As used herein, the term "unit" or "module" refers to software or a hardware component such as an FPGA or ASIC, and a "unit" or "module" performs certain roles. However, a "unit" or "module" is not limited to software or hardware. A "unit" or "module" may be configured to reside on an addressable storage medium and may be configured to execute on one or more processors. Thus, by way of example, a "unit" or "module" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided within the components and "units" or "modules" may be combined into a smaller number of components and "units" or "modules", or further separated into additional components and "units" or "modules".

Spatially relative terms such as "below", "beneath", "lower", "above", and "upper" may be used to readily describe the relationship between one element and other elements as shown in the drawings. Spatially relative terms should be understood to encompass different orientations of the elements in use or operation in addition to the orientation shown in the drawings. For example, if an element shown in the drawings is turned over, an element described as "below" or "beneath" another element may be placed "above" that element. Thus, the exemplary term "below" can encompass both the downward and upward directions. Elements may also be oriented in other directions, and spatially relative terms may be interpreted according to the orientation.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a neural network that a device uses to detect an object according to an embodiment.

In one embodiment, the device may obtain an image that includes an object. For example, the device may receive an image including the object from an external device, or may capture the object with a camera provided in the device to obtain an image including the object.

A device according to an embodiment may detect an object included in an image using a neural network 100. Here, the neural network 100 may be a set of algorithms that, using the results of statistical machine learning, extracts various kinds of attribute information from an image and detects, identifies, and/or determines objects in the image based on the extracted attribute information.

The neural network 100 may also be implemented as software, an engine, or the like for executing the set of algorithms described above. A neural network implemented as software, an engine, or the like may be executed by a processor in the device (not shown) or by a processor of a server (not shown).

The neural network 100 according to an embodiment can detect objects in an image by abstracting various attributes contained in the image input to the neural network 100. Here, abstracting the attributes in the image may mean detecting attribute information from the image and determining, from among the detected attribute information, the key attributes that can represent an object.

The neural network 100 may include an input layer 110, an output layer 130, and a plurality of layers 122 to 128 between them.

The device can extract attribute information of an image using the neural network 100. The attribute information of an image may include color, edges, polygons, saturation, brightness, color temperature, blur, sharpness, contrast, and the like, but this is only an example, and the attribute information of an image is not limited to these.

The device may be a smartphone, tablet PC, PC, smart TV, mobile phone, personal digital assistant (PDA), laptop, media player, micro server, global positioning system (GPS) device, e-book reader, digital broadcasting terminal, navigation device, kiosk, MP3 player, digital camera, home appliance, or other mobile or non-mobile computing device, but is not limited thereto. The device may also be a wearable device with communication and data processing functions, such as a watch, glasses, a hair band, or a ring.

FIG. 2 is a flowchart illustrating a method by which a device detects an object according to an embodiment.

In step S210, the device obtains an image including an object to be detected.

For example, the object to be detected may be a resident registration number contained in the image, but is not limited thereto.

In one embodiment, the object to be detected includes one or more classes. Here, classes denote different kinds of objects. For example, the object to be detected includes handwritten resident registration numbers and typed (or printed) resident registration numbers. The object to be detected may also include various other kinds of personal information, such as fingerprints, names, and addresses, but is not limited thereto.

In one embodiment, the image may include a scanned or photographed document image.

In one embodiment, the device removes noise from the obtained image. For example, the device may remove unnecessary blank space, portions that contain no information, and spurious pixels introduced during scanning or photographing.

In one embodiment, the device may align the image. For example, the device may rotate or resize the image so that it conforms to a predetermined standard.

In one embodiment, the device may segment the noise-removed, aligned image into one or more blocks.

For example, the device may divide the image into several blocks based on the distribution of black pixels in the image. Because much of a scanned document is blank or otherwise meaningless, using only the resulting blocks for object detection can improve processing speed.
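One simple way to split a page by its black-pixel distribution is a row projection profile. The sketch below is an illustration only: the specification does not state which segmentation rule is used, and the function name, the binary image encoding (1 for black, 0 for white), and the `min_black` parameter are assumptions.

```python
from typing import List, Tuple


def split_rows_into_blocks(image: List[List[int]], min_black: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, that contain black pixels.

    `image` is a binary image where 1 is a black pixel and 0 is a white pixel.
    Consecutive rows with at least `min_black` black pixels form one block.
    """
    blocks = []
    start = None
    for y, row in enumerate(image):
        has_ink = sum(row) >= min_black
        if has_ink and start is None:
            start = y                    # a new block begins
        elif not has_ink and start is not None:
            blocks.append((start, y))    # a blank row closes the block
            start = None
    if start is not None:
        blocks.append((start, len(image)))
    return blocks


page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
blocks = split_rows_into_blocks(page)
```

Only the rows inside the returned ranges would be passed on to detection. The blank rows are skipped, which is the source of the speedup described above.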

In one embodiment, the device may generate a normalized image from the divided blocks. For example, so that both black-and-white and color images can be processed, the device may generate a normalized image that is a three-channel color image with equal width and height.
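The normalization step can be sketched as follows for a grayscale block. This is a minimal illustration under assumptions: nearest-neighbor resampling and the function name `normalize_block` are choices made here, and the specification does not name a resampling method or a target size.

```python
def normalize_block(gray, size):
    """Resize a grayscale block to size x size and replicate it into 3 channels.

    `gray` is a list of rows of intensity values. Nearest-neighbor sampling is
    used for simplicity.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for y in range(size):
        src_y = min(h - 1, y * h // size)      # nearest source row
        row = []
        for x in range(size):
            src_x = min(w - 1, x * w // size)  # nearest source column
            v = gray[src_y][src_x]
            row.append((v, v, v))              # same value in all three channels
        out.append(row)
    return out


block = [[0, 255, 0], [255, 0, 255]]  # 2 rows x 3 columns
norm = normalize_block(block, 4)
```

A color input would skip the channel replication and only be resized, so that both kinds of image reach the network with the same shape.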

In step S220, the device segments the image obtained in step S210 into a plurality of cells.

In one embodiment, the device may divide each of the one or more blocks into which the image was divided in step S210 into a plurality of cells.

A specific method of dividing a block into a plurality of cells is described later.

In step S230, the device uses a plurality of layers included in a network for detecting the object to obtain, for each of the plurality of cells segmented in step S220, information on one or more regions estimated to contain the object to be detected.

In one embodiment, the information on the one or more regions includes the center point coordinates, width, and height of the one or more regions.

In one embodiment, the center point coordinates, width, and height may be normalized with respect to the width and height of the block, among the one or more blocks divided in step S210, to which each of the one or more regions belongs. For example, the center point coordinates, width, and height may be normalized to values between 0 and 1 relative to the width and height of the block.
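The normalization described above can be sketched as follows. The `Region` type and the function name are hypothetical, and pixel coordinates are assumed to be measured from the top-left corner of the block.

```python
from dataclasses import dataclass


@dataclass
class Region:
    cx: float  # center x, as a fraction of block width
    cy: float  # center y, as a fraction of block height
    w: float   # width, as a fraction of block width
    h: float   # height, as a fraction of block height


def normalize_region(left, top, right, bottom, block_w, block_h):
    """Convert a pixel box inside a block to values between 0 and 1."""
    return Region(
        cx=(left + right) / 2 / block_w,
        cy=(top + bottom) / 2 / block_h,
        w=(right - left) / block_w,
        h=(bottom - top) / block_h,
    )


# A 200 x 100 pixel box centered in a 400 x 200 pixel block.
region = normalize_region(100, 50, 300, 150, block_w=400, block_h=200)
```

Because every value is relative to the block, the same network output can describe regions in blocks of different pixel sizes.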

In one embodiment, the device obtains, for each of the one or more classes, information on one or more regions estimated to contain the object to be detected.

The device also obtains a reliability for each of the obtained one or more regions.

In one embodiment, the device obtains, from among the obtained one or more regions, one or more regions having a reliability equal to or greater than a predetermined threshold. The device detects the object using the one or more regions having a reliability equal to or greater than the predetermined threshold.

For example, the device may exclude regions whose reliability is below the predetermined threshold and detect the object by analyzing only the regions whose reliability is equal to or greater than the threshold.
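The threshold filter can be sketched in a few lines. The dictionary layout, the sample values, and the threshold of 0.5 are assumptions for illustration. The specification says only that the threshold is predetermined.

```python
def filter_by_reliability(regions, threshold):
    """Keep only regions whose reliability is at or above the threshold."""
    return [r for r in regions if r["reliability"] >= threshold]


# Hypothetical candidates: box is (cx, cy, w, h) in normalized coordinates.
regions = [
    {"box": (0.50, 0.40, 0.30, 0.05), "reliability": 0.92},
    {"box": (0.20, 0.80, 0.25, 0.04), "reliability": 0.15},
    {"box": (0.52, 0.41, 0.31, 0.05), "reliability": 0.60},
]
kept = filter_by_reliability(regions, threshold=0.5)
```

Discarding low-reliability regions early reduces the number of candidates that the later overlap analysis has to compare.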

In one embodiment, the device detects the object based on the overlapping region of the obtained one or more regions. For example, if many of the obtained regions overlap in some part of the image, the probability that an object will be detected in that part can be considered high. The device can therefore detect the object based on the overlapping region of the one or more regions.

In one embodiment, the device may detect the object based on the overlapping region of the regions having a reliability equal to or greater than the predetermined threshold.
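The specification does not state how the overlapping region is turned into a final position. One common technique for this kind of detector is greedy suppression based on intersection over union (IoU), sketched below as an assumption rather than as the claimed method. The box format (left, top, right, bottom), the function names, and the IoU threshold of 0.5 are all hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress_overlaps(candidates, iou_threshold=0.5):
    """Keep the most reliable box in each group of heavily overlapping boxes."""
    kept = []
    for box, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Drop the box if it overlaps too much with one that is already kept.
        if not any(iou(box, k[0]) >= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


candidates = [
    ((10, 10, 110, 30), 0.9),
    ((12, 11, 112, 31), 0.8),   # nearly the same place as the first box
    ((200, 50, 300, 70), 0.7),  # a separate location
]
detections = suppress_overlaps(candidates)
```

In this example the first two candidates cover almost the same area, so only the more reliable one survives, while the third candidate is kept as a separate detection.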

In one embodiment, the neural network used in the device includes a plurality of layers, including an output layer that outputs the information on the one or more regions and the reliability of the one or more regions.

In one embodiment, the output layer includes, for each of the plurality of segmented cells, neurons representing the position, size, and reliability of each of the one or more regions and neurons representing the class of the object to be detected.
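The following sketch shows one possible layout for such an output layer, in which each cell carries five values per region (x, y, width, height, reliability) followed by one score per class. The ordering of values, the grid size, the number of regions per cell, and the function names are assumptions. The specification states only which quantities the neurons represent.

```python
def output_size(grid, regions_per_cell, num_classes):
    """Total number of output neurons for a grid x grid arrangement of cells."""
    return grid * grid * (regions_per_cell * 5 + num_classes)


def decode_cell(values, regions_per_cell, num_classes):
    """Split one cell's output values into regions and a predicted class index."""
    regions = []
    for i in range(regions_per_cell):
        x, y, w, h, reliability = values[i * 5:(i + 1) * 5]
        regions.append({"x": x, "y": y, "w": w, "h": h, "reliability": reliability})
    start = regions_per_cell * 5
    class_scores = values[start:start + num_classes]
    best_class = max(range(num_classes), key=lambda c: class_scores[c])
    return regions, best_class


# Hypothetical setup: 7 x 7 cells, 2 regions per cell, 2 classes
# (index 0 = handwritten number, index 1 = printed number).
total = output_size(grid=7, regions_per_cell=2, num_classes=2)
cell = [0.5, 0.5, 0.2, 0.1, 0.9,   # region 0
        0.4, 0.6, 0.3, 0.1, 0.2,   # region 1
        0.1, 0.8]                  # class scores
regions, best_class = decode_cell(cell, regions_per_cell=2, num_classes=2)
```

Because all cells are read from a single forward pass, the network is evaluated once per block rather than once per window position.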

FIG. 3 is a block diagram of a device that detects an object according to an embodiment.

The processor 310 may include one or more cores (not shown), a graphics processing unit (not shown), and/or a connection path (for example, a bus) for exchanging signals with other components.

According to an embodiment, the processor 310 may process one or more instructions included in the neural network 100 in parallel.

For example, the processor 310 may use the plurality of layers included in the neural network 100 to obtain attribute information of an image, such as color, edges, polygons, saturation, brightness, color temperature, blur, sharpness, and contrast.

The processor 310 according to an embodiment performs the object detection method described with reference to FIG. 2 by executing one or more instructions stored in the memory.

For example, by executing one or more instructions stored in the memory, the processor 310 obtains an image including an object to be detected, segments the obtained image into a plurality of cells, obtains, for each of the plurality of segmented cells, information on one or more regions estimated to contain the object using a plurality of layers included in a network for detecting the object, obtains a reliability for each of the obtained one or more regions, and detects the object based on the obtained reliability and an overlapping region of the obtained one or more regions.

Meanwhile, the processor 310 may further include a random access memory (RAM, not shown) and a read-only memory (ROM, not shown) that temporarily and/or permanently store signals (or data) processed inside the processor 310. In addition, the processor 310 may be implemented as a system on chip (SoC) including at least one of a graphics processing unit, a RAM, and a ROM.

The memory 320 may store programs (one or more instructions) for the processing and control of the processor 310. The programs stored in the memory 320 may be divided into a plurality of modules according to function. According to an embodiment, the memory 320 may include a data learning unit and a data recognition unit. The data learning unit and the data recognition unit may each independently include a neural network module, or may share one neural network module.

The neural network module may include a plurality of layers. Each of the plurality of layers included in the neural network module may include one or more instructions for detecting at least one piece of attribute information from an image and abstracting the detected attribute information.

For example, each of the first to Nth layers 122 to 128 may include a convolution layer including one or more instructions for extracting attribute information of an image from the image, and/or a pooling layer including one or more instructions for determining a representative value from the extracted image attributes.

Referring to FIG. 4, the processor 310 according to an embodiment may include a data learning unit 410 and a data recognition unit 420.

The data learning unit 410 may learn a criterion for detecting an object included in an image. For example, the data learning unit 410 may train the parameters of at least one layer included in the neural network 100 by using learning data prepared for object detection.

The data recognition unit 420 may detect an object in an image based on the criterion learned by the data learning unit 410.

At least one of the data learning unit 410 and the data recognition unit 420 may be manufactured in the form of at least one hardware chip and mounted on a device. For example, at least one of the data learning unit 410 and the data recognition unit 420 may be manufactured as a dedicated hardware chip for artificial intelligence (AI), or may be manufactured as part of an existing general-purpose processor (for example, a CPU or an application processor) or a graphics-only processor (for example, a GPU) and mounted on the various devices described above.

In this case, the data learning unit 410 and the data recognition unit 420 may be mounted on a single device or on separate devices. For example, one of the data learning unit 410 and the data recognition unit 420 may be included in the device and the other may be included in a server. In addition, over a wired or wireless connection, model information constructed by the data learning unit 410 may be provided to the data recognition unit 420, and data input to the data recognition unit 420 may be provided to the data learning unit 410 as additional learning data.

Meanwhile, at least one of the data learning unit 410 and the data recognition unit 420 may be implemented as a software module. When at least one of the data learning unit 410 and the data recognition unit 420 is implemented as a software module (or a program module including instructions), the software module may be stored in a non-transitory computer-readable recording medium. In this case, the at least one software module may be provided by an operating system (OS) or by a predetermined application. Alternatively, part of the at least one software module may be provided by the OS and the remainder by a predetermined application.

FIG. 5 is a block diagram of a data learning unit according to an embodiment.

Referring to FIG. 5, the data learning unit 410 according to some embodiments may include a data acquisition unit 510, a preprocessing unit 520, a learning data selection unit 530, a model learning unit 540, and a model evaluation unit 550. However, this is only one embodiment; the data learning unit 410 may be configured with fewer components than those listed above, or may additionally include other components.

The data acquisition unit 510 may acquire at least one of an image and a video. Here, a video may consist of a plurality of images. As an example, the data acquisition unit 510 may acquire an image from the device that includes the data learning unit 410, or from an external device capable of communicating with the device that includes the data learning unit 410.

The preprocessing unit 520 may preprocess the acquired image so that it can be used in learning for object detection. Specifically, the preprocessing unit 520 may process the acquired image into a preset format so that the model learning unit 540, described later, can use it for that learning.

The learning data selection unit 530 may select, according to a set criterion, the images required for learning from among the preprocessed images. The selected images may be provided to the model learning unit 540.

The model learning unit 540 may train a data recognition model. For example, the data learning unit 410 may train the parameters of at least one layer included in the neural network 100 by using learning data prepared for object detection.

In addition, the model learning unit 540 may train the data recognition model through, for example, reinforcement learning that uses feedback on whether the object detection result obtained through learning is correct.

In addition, once the data recognition model has been trained, the model learning unit 540 may store the trained data recognition model. In this case, the model learning unit 540 may store the trained data recognition model in the memory of the device that includes the data recognition unit 420, described later. Alternatively, the model learning unit 540 may store the trained data recognition model in the memory of a server connected to the device over a wired or wireless network.

In this case, the memory in which the trained data recognition model is stored may also store, for example, commands or data related to at least one other component of the device. The memory may also store software and/or programs. The programs may include, for example, a kernel, middleware, an application programming interface (API), and/or an application program (or "application").

The model evaluation unit 550 may input evaluation data into the data recognition model and, when the detection result output for the evaluation data does not satisfy a predetermined criterion, cause the model learning unit 540 to train the model again. In this case, the evaluation data may be preset data for evaluating the data recognition model. Here, the evaluation data may include, for example, the match ratio between the objects detected by the data recognition model and the actual objects.
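The evaluation step described above can be sketched as follows. This is only an illustrative sketch: the function names `match_ratio` and `needs_retraining`, the use of class labels as detection results, and the threshold values are assumptions for illustration and are not specified in this description.

```python
from typing import Sequence


def match_ratio(detected: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of evaluation samples whose detected label equals the actual label."""
    if len(detected) != len(actual) or not actual:
        raise ValueError("detected and actual must be non-empty and of equal length")
    matches = sum(1 for d, a in zip(detected, actual) if d == a)
    return matches / len(actual)


def needs_retraining(detected: Sequence[str], actual: Sequence[str],
                     min_ratio: float) -> bool:
    """True when the match ratio falls below the (hypothetical) predetermined criterion."""
    return match_ratio(detected, actual) < min_ratio


# Hypothetical evaluation data: detected class per sample vs. the actual class.
detected = ["printed", "handwritten", "printed", "none"]
actual = ["printed", "printed", "printed", "none"]
ratio = match_ratio(detected, actual)
retrain = needs_retraining(detected, actual, min_ratio=0.9)
```

In this sketch, three of the four samples match, so a criterion of 0.9 would send the model back to the model learning unit, while a criterion of 0.7 would accept it.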

Meanwhile, when there are a plurality of trained data recognition models, the model evaluation unit 550 may evaluate whether each trained data recognition model satisfies the predetermined criterion, and may determine a model that satisfies the criterion as the final data recognition model.

Meanwhile, at least one of the data acquisition unit 510, the preprocessing unit 520, the learning data selection unit 530, the model learning unit 540, and the model evaluation unit 550 in the data learning unit 410 may be manufactured in the form of at least one hardware chip and mounted on a device. For example, at least one of these units may be manufactured as a dedicated hardware chip for artificial intelligence (AI), or may be manufactured as part of an existing general-purpose processor (for example, a CPU or an application processor) or a graphics-only processor (for example, a GPU) and mounted on the various devices described above.

In addition, the data acquisition unit 510, the preprocessing unit 520, the learning data selection unit 530, the model learning unit 540, and the model evaluation unit 550 may be mounted on a single device or on separate devices. For example, some of these units may be included in the device and the rest may be included in a server.

In addition, at least one of the data acquisition unit 510, the preprocessing unit 520, the learning data selection unit 530, the model learning unit 540, and the model evaluation unit 550 may be implemented as a software module. When at least one of these units is implemented as a software module (or a program module including instructions), the software module may be stored in a non-transitory computer-readable recording medium. In this case, the at least one software module may be provided by an operating system (OS) or by a predetermined application. Alternatively, part of the at least one software module may be provided by the OS and the remainder by a predetermined application.

FIG. 6 is a block diagram illustrating a device according to an embodiment.

Referring to FIG. 6, the device 300 includes an image preprocessing unit 330, a learning data editing and generating unit 332, a learning data storage unit 334, a learning unit 336, and a learning result storage unit 338.

In an embodiment, the image preprocessing unit 330, the learning data editing and generating unit 332, the learning data storage unit 334, the learning unit 336, and the learning result storage unit 338 may be understood as one embodiment of the data learning unit 410 shown in FIGS. 4 and 5.

Accordingly, even where it is omitted below, the description already given of the data learning unit 410 of FIGS. 4 and 5 also applies to the device 300 shown in FIG. 6 and to the image preprocessing unit 330, the learning data editing and generating unit 332, the learning data storage unit 334, the learning unit 336, and the learning result storage unit 338 included in the device 300.

Hereinafter, the operation of the device 300 shown in FIG. 6 is described with reference to FIGS. 7 to 11.

FIGS. 7 to 11 are diagrams for explaining a method by which the device performs learning for object detection and detects an object according to the disclosed embodiment.

In an embodiment, the device 300 generates learning data 600 for neural network training, uses it to train the neural network, and stores the training result. For example, the neural network may be a convolution-based neural network.

In an embodiment, the image preprocessing unit 330 removes noise from the scanned document image and then divides the image into a plurality of blocks based on the distribution of black pixels. FIG. 8 shows the blocks 710 to 740 divided from the scanned document image 700.
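One simple way to divide an image by its black-pixel distribution is a horizontal projection profile: count the black pixels in each row and cut wherever enough consecutive rows are empty. The sketch below assumes this technique purely for illustration; the description does not state which segmentation rule the image preprocessing unit 330 uses, and the function name `split_into_row_blocks` and the `min_gap` parameter are hypothetical.

```python
from typing import List, Tuple


def split_into_row_blocks(binary: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized image (1 = black pixel) into row ranges that contain ink.

    Returns (start_row, end_row) pairs with end_row exclusive. A block is closed
    once `min_gap` consecutive rows without any black pixel are seen.
    """
    profile = [sum(row) for row in binary]  # black-pixel count per row
    blocks: List[Tuple[int, int]] = []
    start = None  # first row of the block currently being collected
    gap = 0       # number of consecutive empty rows seen inside the block
    for y, count in enumerate(profile):
        if count > 0:
            if start is None:
                start = y
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                blocks.append((start, y - gap + 1))
                start, gap = None, 0
    if start is not None:
        blocks.append((start, len(profile) - gap))
    return blocks


# Tiny hypothetical image: two inked bands separated by two empty rows.
image = [
    [0, 0],
    [1, 0],
    [1, 1],
    [0, 0],
    [0, 0],
    [0, 1],
    [0, 0],
]
blocks = split_into_row_blocks(image, min_gap=1)
merged = split_into_row_blocks(image, min_gap=3)
```

A larger `min_gap` merges bands that are separated by only a few empty rows, which is one way to keep a multi-line field in a single block; applying the same idea column-wise would give rectangular blocks.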

Each divided block is normalized to serve as the input value of the input layer 110 of the neural network 100. The image preprocessing unit 330 uses the divided blocks to generate the normalized image 610 portion of the learning data 600.

In an embodiment, the normalized image is a three-channel color image, so that both black-and-white and color images can be processed, and is normalized to a predetermined size with equal width and height.

The learning data editing and generating unit 332 receives user input to set an object region on the image and the type (or class) of that region, and uses these to create mask data 620.

For example, as shown in FIG. 9, the mask data 620 consists of the x and y coordinates of the center point 812 of the object region 810 together with the width 814 and height 816 of the region. The coordinates of the center point 812 may be normalized to a value between 0 and 1 relative to the width and height of the segmented block 800.

The width 814 and height 816 of the region may likewise be normalized to a value between 0 and 1 relative to the width and height of the segmented block 800.

Accordingly, the object region information included in the mask data 620 comprises the center point 812, the width 814, and the height 816 of the region, each normalized to a value between 0 and 1.
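The normalization of the mask data can be illustrated with a short sketch. It assumes, for illustration only, that the user-specified region arrives as pixel coordinates of its left, top, right, and bottom edges within the block; the names `MaskBox` and `normalize_box` are hypothetical.

```python
from typing import NamedTuple


class MaskBox(NamedTuple):
    """Object region in normalized form: center point, width and height, all in [0, 1]."""
    cx: float
    cy: float
    w: float
    h: float


def normalize_box(left: float, top: float, right: float, bottom: float,
                  block_w: float, block_h: float) -> MaskBox:
    """Convert pixel edges of a region into center/size values relative to the block."""
    if block_w <= 0 or block_h <= 0:
        raise ValueError("block dimensions must be positive")
    cx = (left + right) / 2.0 / block_w   # center x relative to block width
    cy = (top + bottom) / 2.0 / block_h   # center y relative to block height
    w = (right - left) / block_w          # region width relative to block width
    h = (bottom - top) / block_h          # region height relative to block height
    return MaskBox(cx, cy, w, h)


# Hypothetical region inside a 400 x 200 pixel block.
box = normalize_box(100, 40, 300, 80, block_w=400, block_h=200)
```

Because every value is expressed relative to the block, the same mask data remains valid after the block is rescaled to the input size of the network.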

The mask data 620 is used as the label data for supervised learning when the neural network is trained.

In an embodiment, the learning data storage unit 334 automatically generates and stores a large number of learning data items by applying transformations such as noise addition, rotation, and translation to the generated learning data. The learning data storage unit 334 may be implemented as storage capable of distributed parallel processing.

The learning unit 336 trains the neural network 100 using the generated learning data. In an embodiment, the neural network 100 is trained by supervised learning. Supervised learning is a method in which input data and the corresponding output data are fed to the neural network together, and the weights of the connecting edges are updated so that the network produces the output data corresponding to the input data. For example, the neural network according to the disclosed embodiment may update the connection weights between artificial neurons using the delta rule and error backpropagation learning.
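As a minimal illustration of the delta rule mentioned above, the sketch below updates the weights of a single linear neuron by adding the learning rate times the output error times each input. The single-neuron setting, the function name `delta_rule_step`, and the numeric values are assumptions chosen for illustration; the actual network is convolutional and would be trained with backpropagation through all of its layers.

```python
from typing import List


def delta_rule_step(weights: List[float], inputs: List[float],
                    target: float, lr: float) -> List[float]:
    """One delta-rule update for a linear neuron: w_i <- w_i + lr * (target - output) * x_i."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + lr * error * x for w, x in zip(weights, inputs)]


# Hypothetical training pair: input (1, 2) should produce output 1.
weights = [0.0, 0.0]
inputs = [1.0, 2.0]
first = delta_rule_step(weights, inputs, target=1.0, lr=0.1)

# Repeating the update drives the neuron's output toward the target.
trained = weights
for _ in range(30):
    trained = delta_rule_step(trained, inputs, target=1.0, lr=0.1)
final_output = sum(w * x for w, x in zip(trained, inputs))
```

With these values the error is halved at every step, so the output converges quickly to the target; this is the sense in which the weights are updated until the network outputs the data paired with its input.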

The learning result storage unit 338 stores, as learning result data, the weight values resulting from training the neural network 100 together with the configuration information of the neural network 100.

FIG. 10 is a diagram illustrating a method of detecting an object according to the disclosed embodiment.

In the disclosed embodiment, to improve detection speed, the divided block image 800 is split into cells 802 of a fixed size, and object detection is performed on a per-cell basis.

For example, the device 300 divides the image of the divided block 800 into w*h cells and then extracts, for each cell, n estimated object regions and their reliabilities.

FIG. 11 shows an example of the configuration of the output layer 130.

For example, the output layer 130 is configured as a convolutional neural network that outputs, for each cell, n estimated object regions and a reliability for each region.

In addition, the neurons 900 corresponding to a cell 804, which is one of the divided cells, may consist of n sequential groups, each containing a neuron 901 representing the horizontal coordinate x of the center point of an estimated object region, a neuron 902 representing the vertical coordinate y, a neuron 903 representing the width of the region, a neuron 904 representing the height of the region, and a neuron 905 representing the reliability, followed by neurons 906 and 907 for class discrimination.

The device 300 may then estimate the optimal object region using the overlap between the object regions estimated for each cell and their reliabilities.

Here, the object region 810 has the same structure as the mask data used for learning: the x and y coordinates of the center point 812, the width 814, and the height 816 of the region. Each estimated object region 810 also has a reliability.

Accordingly, the final output layer 130 of the neural network 100 for object region detection is configured as a convolution layer with an output of size w*h*(n*5+c). Here, 5 is the number of values per estimated region (x, y, width, height, and reliability), c is the number of classes to be distinguished, and the number of filters in the convolution layer is n*5+c.
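The output size formula can be checked with a small sketch that also splits one cell's output vector into its n regions and c class values, following the neuron order described for FIG. 11. The grid size 19 x 19, n = 2, and the sample values are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
from typing import List, Sequence, Tuple


def output_filters(n: int, c: int) -> int:
    """Values per cell: n regions of (x, y, width, height, reliability) plus c class values."""
    return n * 5 + c


def output_size(w: int, h: int, n: int, c: int) -> int:
    """Total number of output values for a grid of w*h cells."""
    return w * h * output_filters(n, c)


def decode_cell(values: Sequence[float], n: int, c: int) -> Tuple[List[tuple], List[float]]:
    """Split one cell's output vector into n region tuples and c class values."""
    if len(values) != output_filters(n, c):
        raise ValueError("unexpected cell vector length")
    regions = [tuple(values[i * 5:(i + 1) * 5]) for i in range(n)]
    class_values = list(values[n * 5:])
    return regions, class_values


# Hypothetical configuration: 19 x 19 cells, 2 regions per cell, 2 classes.
W, H, N, C = 19, 19, 2, 2
cell = [0.5, 0.5, 0.4, 0.2, 0.9,   # region 1: x, y, width, height, reliability
        0.1, 0.1, 0.1, 0.1, 0.2,   # region 2
        0.8, 0.2]                  # class values (e.g. printed, handwritten)
regions, class_values = decode_cell(cell, N, C)
total = output_size(W, H, N, C)
```

With c = 2, as in the printed and handwritten resident registration number classes, each cell carries 12 values in this configuration.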

In an embodiment, there may be two classes: resident registration numbers in printed form and resident registration numbers in handwritten form. Further classes may be added to detect other kinds of personal information, such as fingerprints or names, so that personal information in various forms can be detected.

As an example, the input layer 110 may be configured with three channels of size 608 * 608 in order to receive, as its input value, the image normalized by the image preprocessing unit 330.

As an example, the first layer 122 among the plurality of layers 122 to 128 is a convolution layer that receives the input value of the input layer 110 and is configured with a window size of 3 and a filter size of 32.

As an example, the second layer 124 among the plurality of layers 122 to 128 is a pooling layer that receives the output value of the convolution layer and is configured with a downscale size of 2.
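The effect of these example layers on the feature-map size can be worked through with a short sketch. It makes several assumptions that the description does not state: the convolution uses stride 1 with one pixel of zero padding, the "filter size of 32" is read as 32 output channels, and the pooling layer halves each side. The function names are hypothetical.

```python
def conv_output_size(size: int, window: int, stride: int = 1, padding: int = 0) -> int:
    """Side length after a convolution, using the standard output-size formula."""
    return (size + 2 * padding - window) // stride + 1


def pool_output_size(size: int, downscale: int) -> int:
    """Side length after non-overlapping pooling with the given downscale factor."""
    return size // downscale


# Input layer 110: 608 x 608 with 3 channels.
input_shape = (608, 608, 3)

# First layer 122: window 3, assumed stride 1 and padding 1, assumed 32 output channels.
conv_side = conv_output_size(input_shape[0], window=3, stride=1, padding=1)
conv_shape = (conv_side, conv_side, 32)

# Second layer 124: pooling with a downscale size of 2.
pool_side = pool_output_size(conv_side, 2)
pool_shape = (pool_side, pool_side, 32)
```

Under these assumptions the convolution preserves the 608 x 608 resolution and the pooling layer reduces it to 304 x 304, which shows why each additional pooling layer lowers the spatial resolution available for detection.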

The subsequent layers 126 and 128 may be omitted. Depending on the embodiment, the third layer 126 may be used as a convolution layer and the fourth layer 128 may be left unused, in order to prevent the resolution of the detection result from being lowered by the downsampling of a pooling layer.

The output of the output layer 130 has the dimensions of the grid of cells into which the divided image is split for object detection, and is configured to carry the information and reliabilities of the n object regions estimated for each class.

Accordingly, the neural network 100 according to the disclosed embodiment receives the preprocessed image, passes it through each layer, and derives at the output layer 130 the coordinate information and reliability of the regions estimated to be objects in the image. The object detection process is completed by selecting, as extracted regions, only those whose reliability is greater than or equal to a threshold.
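The selection of final regions from reliability and overlap can be sketched with greedy non-maximum suppression, a common technique that is assumed here for illustration; the description states only that overlap and reliability are used, not the exact rule. Regions are given as (x, y, width, height, reliability) with a center point, as in the mask data; the thresholds and function names are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[float, float, float, float, float]  # cx, cy, width, height, reliability


def to_corners(r: Region) -> Tuple[float, float, float, float]:
    """Convert a center-format region to (left, top, right, bottom)."""
    cx, cy, w, h = r[:4]
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou(a: Region, b: Region) -> float:
    """Intersection over union of two regions (0.0 when they do not overlap)."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def select_regions(candidates: List[Region], min_reliability: float,
                   max_overlap: float) -> List[Region]:
    """Drop low-reliability regions, then keep the most reliable of each overlapping group."""
    remaining = sorted((r for r in candidates if r[4] >= min_reliability),
                       key=lambda r: r[4], reverse=True)
    kept: List[Region] = []
    for r in remaining:
        if all(iou(r, k) <= max_overlap for k in kept):
            kept.append(r)
    return kept


# Hypothetical candidates gathered from all cells.
candidates = [
    (0.50, 0.5, 0.4, 0.2, 0.9),
    (0.52, 0.5, 0.4, 0.2, 0.8),   # overlaps heavily with the first region
    (0.20, 0.8, 0.1, 0.1, 0.3),   # below the reliability threshold
    (0.10, 0.1, 0.1, 0.1, 0.7),
]
kept = select_regions(candidates, min_reliability=0.5, max_overlap=0.5)
```

In this example the second candidate is suppressed because it covers nearly the same area as a more reliable one, and the third is discarded by the threshold, leaving two detected regions.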

FIG. 12 is a block diagram of a data recognition unit according to an embodiment.

Referring to FIG. 12, the data recognition unit 420 according to some embodiments may include a data acquisition unit 1010, a preprocessing unit 1020, a recognition data selection unit 1030, a recognition result providing unit 1040, and a model updating unit 1050.

The data acquisition unit 1010 may acquire an image needed for object detection, and the preprocessing unit 1020 may preprocess the acquired image so that it can be used for object detection. Specifically, the preprocessing unit 1020 may process the acquired image into a preset format so that the recognition result providing unit 1040, described later, can use it for object detection. The recognition data selection unit 1030 may select, from the preprocessed data, the images needed for object detection. The selected data may be provided to the recognition result providing unit 1040.

The recognition result providing unit 1040 may detect an object in an image by applying the selected image to the neural network according to an embodiment. Here, the neural network may include a plurality of layers as described above. The method of detecting an object in an image by applying the image to the neural network may correspond to the method described above with reference to FIGS. 1 to 11.

The recognition result providing unit 1040 may provide detection information on at least one object included in the image.

Based on an evaluation of the object detection result provided by the recognition result providing unit 1040, the model updating unit 1050 may provide information on that evaluation to the model learning unit 540 described above with reference to FIG. 5, so that the parameters and the like of one or more layers included in the neural network are updated.

Meanwhile, at least one of the data acquisition unit 1010, the preprocessing unit 1020, the recognition data selection unit 1030, the recognition result providing unit 1040, and the model updating unit 1050 in the data recognition unit 420 may be manufactured in the form of at least one hardware chip and mounted on a device. For example, at least one of these units may be manufactured as a dedicated hardware chip for artificial intelligence, or may be manufactured as part of an existing general-purpose processor (for example, a CPU or an application processor) or a graphics-only processor (for example, a GPU) and mounted on the various devices described above.

In addition, the data acquisition unit 1010, the preprocessing unit 1020, the recognition data selection unit 1030, the recognition result providing unit 1040, and the model updating unit 1050 may be mounted on a single device or on separate devices. For example, some of these units may be included in the device and the rest may be included in a server.

In addition, at least one of the data acquisition unit 1010, the preprocessing unit 1020, the recognition data selection unit 1030, the recognition result providing unit 1040, and the model updating unit 1050 may be implemented as a software module. When at least one of these units is implemented as a software module (or a program module including instructions), the software module may be stored in a non-transitory computer-readable recording medium. In this case, the at least one software module may be provided by an operating system (OS) or by a predetermined application. Alternatively, part of the at least one software module may be provided by the OS and the remainder by a predetermined application.

FIG. 13 is a block diagram illustrating a device according to one embodiment.

Referring to FIG. 13, the device 300 includes a neural network learning result load unit 340, a neural network generation unit 342, an image preprocessing unit 344, a detection unit 346, and a detection result application unit 348.

In one embodiment, the neural network learning result load unit 340, the neural network generation unit 342, the image preprocessing unit 344, the detection unit 346, and the detection result application unit 348 may be understood as one embodiment of the data recognition unit 420 shown in FIGS. 4 and 12.

Accordingly, even where a description is omitted below for the device 300 shown in FIG. 13 and for the neural network learning result load unit 340, the neural network generation unit 342, the image preprocessing unit 344, the detection unit 346, and the detection result application unit 348 included in the device 300, the description already given for the data learning unit 410 of FIGS. 4 and 5 may also apply to the device 300 and to these units.

The neural network learning result load unit 340 reads the learning result data stored through the learning result storage unit 338 of FIG. 6. The information read includes the configuration information of the convolution-based neural network and the weight information between neurons in each layer.

The neural network generation unit 342 generates a neural network for object detection based on the configuration information and weight information of the neural network read by the neural network learning result load unit 340.
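The load-then-generate flow described above can be sketched as follows. This is only an illustration under assumptions: the JSON storage format, the field names (`layers`, `weights`, `in_channels`, `out_channels`, `kernel`), and the functions `load_learning_result` and `build_network` are hypothetical and are not specified in this document.

```python
import json

def load_learning_result(text):
    """Parse stored learning-result data into (layer configs, weights).

    Assumed format: a JSON object with a "layers" list (network
    configuration) and a "weights" mapping from layer name to a flat list.
    """
    data = json.loads(text)
    return data["layers"], data["weights"]

def build_network(layers, weights):
    """Pair each layer's configuration with its weights, checking sizes."""
    network = []
    for cfg in layers:
        w = weights[cfg["name"]]
        expected = cfg["out_channels"] * cfg["in_channels"] * cfg["kernel"] ** 2
        if len(w) != expected:
            raise ValueError(f"weight count mismatch for layer {cfg['name']}")
        network.append({"config": cfg, "weights": w})
    return network

# Hypothetical stored result: one 1x1 convolution, 2 input channels, 1 output channel.
stored = json.dumps({
    "layers": [{"name": "conv1", "in_channels": 2, "out_channels": 1, "kernel": 1}],
    "weights": {"conv1": [0.5, -0.25]},
})
layers, weights = load_learning_result(stored)
network = build_network(layers, weights)
```

Keeping the configuration and the weights together in the stored result is what would let a recognizing device rebuild the network without access to the training side.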

Because it includes the neural network learning result load unit 340 and the neural network generation unit 342, the data recognition unit 420 can operate independently as a device separate from the data learning unit 410.

The image preprocessing unit 344, like the image preprocessing unit 330 of FIG. 6, removes noise from the input image and performs block segmentation and normalization for object detection.
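The three preprocessing steps named above (noise removal, block segmentation, normalization) might look like the minimal sketch below. The specific techniques are assumptions chosen for illustration: a 3x3 median filter for noise removal, a row-wise ink profile for block segmentation, and scaling to [0, 1] for normalization. The image is assumed to be a grayscale list of rows with 255 as the white background.

```python
def median(values):
    """Middle element of an odd-length list of numbers."""
    s = sorted(values)
    return s[len(s) // 2]

def denoise(image):
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out

def split_into_blocks(image, ink_threshold=128):
    """Return (top, bottom) row ranges, bottom exclusive, of runs of rows
    that contain at least one pixel darker than ink_threshold."""
    blocks, start = [], None
    for y, row in enumerate(image):
        has_ink = any(p < ink_threshold for p in row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            blocks.append((start, y))
            start = None
    if start is not None:
        blocks.append((start, len(image)))
    return blocks

def normalize(image):
    """Scale 8-bit pixel values to the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

# Hypothetical page: two inked bands separated by a blank row.
page = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 0, 255, 255],
    [255, 255, 255, 255],
    [0, 255, 255, 255],
    [255, 255, 255, 255],
]
blocks = split_into_blocks(page)
```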

The detection unit 346 performs object detection using the per-block input data generated by the image preprocessing unit 344 and the convolution-based neural network generated by the neural network generation unit 342.
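The abstract states that the object is detected based on the reliability of each candidate region and on the overlap between regions. One common way to combine the two is thresholding followed by non-maximum suppression; the sketch below uses that technique as a stand-in, not as the method of this document. The function names, the box format `(x1, y1, x2, y2)`, and the default thresholds are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def select_regions(candidates, min_reliability=0.5, max_overlap=0.5):
    """Keep reliable regions, dropping any that overlap a better kept one.

    candidates: list of (box, reliability) pairs.
    """
    reliable = sorted((c for c in candidates if c[1] >= min_reliability),
                      key=lambda c: c[1], reverse=True)
    kept = []
    for box, score in reliable:
        if all(iou(box, k[0]) <= max_overlap for k in kept):
            kept.append((box, score))
    return kept

# Hypothetical candidates: two heavily overlapping boxes, one separate box,
# and one box whose reliability is too low.
candidates = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),
    ((20, 20, 30, 30), 0.7),
    ((0, 0, 5, 5), 0.2),
]
detections = select_regions(candidates)
```

Here the second box is suppressed because its overlap with the first (81/119) exceeds `max_overlap`, and the fourth is discarded by the reliability threshold.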

Based on the object detection information generated by the detection unit 346, the detection result application unit 348 calculates the position of the object using the position information of each block segmented from the image.
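As an illustration of this position calculation, the sketch below maps a region expressed relative to its block back to pixel coordinates in the full image. It assumes the representation given in the claims (center point coordinates, width, and height normalized by the block's width and height) and further assumes that the center is measured from the block's top-left corner. The function name `to_image_box` and the tuple layouts are hypothetical.

```python
def to_image_box(region, block):
    """Convert a block-relative region to image pixel coordinates.

    region: (cx, cy, w, h), each normalized by the block's width/height.
    block:  (left, top, width, height) of the block in image pixels.
    Returns (x1, y1, x2, y2) in image pixels.
    """
    cx, cy, w, h = region
    left, top, block_w, block_h = block
    center_x = left + cx * block_w
    center_y = top + cy * block_h
    half_w = w * block_w / 2
    half_h = h * block_h / 2
    return (center_x - half_w, center_y - half_h,
            center_x + half_w, center_y + half_h)

# Hypothetical block at (100, 200), 400x80 pixels, with a region centered in it.
box = to_image_box((0.5, 0.5, 0.5, 0.25), (100, 200, 400, 80))
```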

The steps of a method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in random access memory (RAM), read only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the technical field to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

300: Device
310: Processor
320: Memory

Claims

A method by which a device detects an object, the method comprising:
Acquiring an image including a detection target object;
Segmenting the acquired image into a plurality of cells;
Obtaining, for each of the plurality of segmented cells, information on one or more regions estimated to contain the object, using a plurality of layers included in a network for detecting the object;
Obtaining a reliability for each of the obtained one or more regions; and
Detecting the object based on the obtained reliability and an overlapping region of the obtained one or more regions,
Wherein the segmenting comprises:
Dividing the image into one or more blocks based on a distribution of pixels included in the acquired image; and
Dividing each of the one or more blocks into the plurality of cells.

delete

The method according to claim 1,
Wherein the information on the one or more regions includes center point coordinates, a width, and a height of each of the one or more regions, and
Wherein the center point coordinates, the width, and the height of each of the one or more regions are normalized with respect to the width and height of the block, among the one or more divided blocks, to which that region belongs.

The method according to claim 1,
Wherein the detection target object includes one or more classes, and
Wherein obtaining the information on the one or more regions comprises:
Obtaining, for each of the one or more classes, information on at least one region estimated to contain the object.

The method according to claim 1,
Wherein detecting the object comprises:
Obtaining, from among the obtained one or more regions, one or more regions having a reliability higher than a predetermined threshold; and
Detecting the object using the one or more regions having a reliability higher than the predetermined threshold.

6. The method of claim 5,
Wherein detecting the object comprises:
Calculating a position of a region containing the object using an overlapping region of the one or more regions having a reliability higher than or equal to the predetermined threshold.

The method according to claim 1,
Wherein a plurality of layers included in the network include an output layer for outputting information on the one or more areas and reliability of the one or more areas,
Wherein the output layer comprises:
A neuron representing a location, size and reliability of each of the one or more areas for each of the plurality of divided cells, and a neuron representing a class of the object.

The method according to claim 1,
Wherein the detection target object includes at least one of a resident registration number and other personal information included in the image.

A device for detecting an object, comprising:
A memory storing one or more instructions; and
A processor executing the one or more instructions stored in the memory,
Wherein the processor, by executing the one or more instructions:
Acquires an image including a detection target object,
Segments the acquired image into a plurality of cells,
Obtains, for each of the plurality of segmented cells, information on one or more regions estimated to contain the object, using a plurality of layers included in a network for detecting the object,
Obtains a reliability for each of the obtained one or more regions, and
Detects the object based on the obtained reliability and an overlapping region of the obtained one or more regions,
Wherein, in segmenting the acquired image into the plurality of cells, the processor
Divides the image into one or more blocks based on a distribution of pixels included in the acquired image, and divides each of the one or more blocks into the plurality of cells.

A computer program stored in a computer-readable recording medium which, in combination with a computer that is hardware, performs the method of claim 1.