KR102484316B1

KR102484316B1 - Method and apparatus for configuring learning data set in object recognition

Info

Publication number: KR102484316B1
Application number: KR1020200171073A
Authority: KR
Inventors: 조성욱
Original assignee: 청주대학교 산학협력단
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2023-01-02
Also published as: KR20220081520A

Abstract

A method of constructing a training dataset for object recognition according to an embodiment of the present invention comprises: (a) retrieving an object image stored from a first dataset; (b) randomly retrieving an image from a second dataset; (c) automatically placing the retrieved object image of the first dataset at an arbitrary position within the retrieved random image of the second dataset according to a preset scheme; and (d) storing the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset.

Description

Method and apparatus for configuring learning data set in object recognition

The present invention relates to deep learning and, more particularly, to a method and apparatus for constructing a training dataset when the training data available for object recognition is insufficient.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Object recognition technology based on large-scale deep learning techniques has advanced greatly in recent years. Such techniques require large amounts of data for each class, yet the number of public datasets available for training remains insufficient and their variety is limited.

Consequently, users such as researchers who rely only on datasets published on the Internet or elsewhere cannot obtain enough data for their research, and research conducted with such data is highly error-prone. The reliability of research results obtained this way therefore also suffers.

Meanwhile, addressing this problem by building a dataset manually, or by building one's own dataset, takes considerable time and effort, and the initial cost and labeling cost are also not negligible.

One object of the present invention is to define a framework for a training dataset construction method that resolves the class imbalance caused by classes with markedly too few samples in a training dataset for object recognition.

Another object of the present invention is to automate the construction of training datasets through the framework defined above.

According to an embodiment of the present invention, steps (a), (c), and (d) may be repeated for each image of the second dataset retrieved in step (b) until every object image stored from the first dataset has been used.

According to an embodiment of the present invention, the method may further comprise: randomly retrieving an image from the first dataset; designating a region in the retrieved image and removing the background; extracting an object from the designated region from which the background has been removed; and storing the extracted object at a designated size.

According to an embodiment of the present invention, step (c) may further comprise: extracting four major image features of the object image extracted or stored from the first dataset; dividing the image randomly retrieved from the second dataset into planes; extracting the four major image features of each divided plane image; comparing the feature quantities of the four major image features of the object image from the first dataset with those of each divided plane image from the second dataset to determine whether a reference value is exceeded; if the determination shows that the reference value is exceeded in all of the plane images divided from the second dataset, selecting one of the divided plane images; and inserting the object image of the first dataset into the selected divided plane image of the second dataset.

According to an embodiment of the present invention, the inserting may further comprise: smoothing the boundary region of the object image of the first dataset being inserted; and determining the final insertion position within the selected divided plane image of the second dataset after an object-label-based assessment of the situation.

An apparatus for constructing a training dataset for object recognition according to an embodiment of the present invention comprises a memory and a processor, wherein the processor (a) retrieves an object image stored from a first dataset, (b) randomly retrieves an image from a second dataset, (c) automatically places the retrieved object image of the first dataset at an arbitrary position within the retrieved random image of the second dataset according to a preset scheme, and (d) stores the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset.

According to an embodiment of the present invention, the processor may repeat processes (a), (c), and (d) for each image of the second dataset retrieved in process (b) until every object image stored from the first dataset has been used.

According to an embodiment of the present invention, the processor may randomly retrieve an image from the first dataset, designate a region in the retrieved image and remove the background, extract an object from the designated region from which the background has been removed, and store the extracted object in the memory at a designated size.

According to an embodiment of the present invention, in process (c) the processor may extract four major image features of the object image extracted or stored from the first dataset, divide the image randomly retrieved from the second dataset into planes, extract the four major image features of each divided plane image, and compare the feature quantities of the four major image features of the object image from the first dataset with those of each divided plane image from the second dataset to determine whether a reference value is exceeded. If the determination shows that the reference value is exceeded in all of the plane images divided from the second dataset, the processor may select one of the divided plane images and insert the object image of the first dataset into the selected divided plane image of the second dataset.

According to an embodiment of the present invention, during the insertion the processor may smooth the boundary region of the object image of the first dataset being inserted and, after an object-label-based assessment of the situation, determine the final insertion position within the selected divided plane image of the second dataset.

Various embodiments of the present invention provide the following effects.

First, public training datasets can be combined to build a variety of new training datasets for the research at hand.

Second, the initial cost and labeling cost incurred when building one's own training dataset can be reduced.

FIG. 1 is a configuration diagram of a training dataset construction system for object recognition according to an embodiment of the present invention.
FIG. 2 is a block diagram of a computing device according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a method of constructing a dataset in a computing device according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method of automating object placement in a computing device according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be termed a second component, and similarly a second component may be termed a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that component, but it should be understood that intervening components may be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" should be understood as not precluding the presence or addition of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless explicitly defined in this application, are not to be interpreted in an idealized or excessively formal sense.

This specification discloses various embodiments that define a framework for a training dataset construction method that resolves the class imbalance caused by classes with markedly too few samples in a training dataset for object recognition, and that automate the construction of training datasets through the defined framework.

To aid understanding and for convenience of description, two datasets are used as an example below, but the present invention is not limited thereto and three or more datasets may be used. In this way, a sufficient training dataset can be constructed and secured for use in the required research.

In the embodiments according to the present invention, deep learning details such as the convolutional neural network (CNN) follow known techniques, and a detailed description of them is omitted from this specification.

FIG. 1 is a configuration diagram of a training dataset construction system 10 for object recognition according to an embodiment of the present invention.

FIG. 2 is a block diagram of a computing device 120 according to an embodiment of the present invention.

Referring to FIG. 1, the training dataset construction system 10 for object recognition according to an embodiment of the present invention may include a source 110 and a computing device 120.

The source 110 stores public deep learning training datasets for object recognition. Depending on the embodiment, the source 110 may take various forms, such as a public database or the Internet.

Although FIG. 1 shows only one source, the present invention is not limited thereto, and a plurality of sources may be used for object recognition according to the present invention.

The computing device 120 may acquire and store public deep learning training datasets for object recognition from the source 110 and construct a new training dataset based on the stored training datasets.

Depending on the embodiment, the computing device 120 may be a terminal or a component included in a terminal.

Depending on the embodiment, the computing device 120 may be a server that transmits to a terminal a training dataset newly constructed from public training datasets.

The computing device 120 may construct and store a new deep learning training dataset for object recognition by combining training datasets that it has constructed and stored on the basis of the public training datasets acquired from the source 110. Depending on the embodiment, the computing device 120 may repeat this process on the public and/or stored training datasets to secure enough data for the research at hand. Building its own deep learning training dataset for object recognition in this way reduces not only the initial cost but also the labeling cost.

The computing device 120 may define a framework so that the training dataset construction process for object recognition according to the present invention, described below with reference to FIGS. 3 and 4, is executed automatically, and may build varied and sufficient deep learning training datasets for object recognition through the defined framework.

Referring to FIG. 2, the computing device 120 according to an embodiment of the present invention may include a memory 207 and a processor. Depending on the embodiment, the processor may include a communication interface 201, a region designation/background removal module 202, an object extraction module 203, an object placement module 204, a learning module 205, and a control module 206. However, the present invention is not limited thereto: at least one component not shown may be added or, conversely, some of the illustrated components may be omitted. Depending on the embodiment, two or more of the above components may be implemented as a single component, or a single component may be implemented as two or more.

Depending on the embodiment, the memory 207 shown in FIG. 2 may be an internal memory of the computing device 120, an external memory, or a virtual memory. Although FIG. 2 shows a single memory, a plurality of memories may be used in the present invention. In that case, the memories need not all be internal or all be external, and they need not share the same type or capacity.

The communication interface 201 may provide a communication environment with the source 110 and support data communication.

The processor according to the present invention aims to resolve the imbalance of a class with markedly too few samples in a deep learning training dataset for object recognition. It does so through a framework defined to take another image containing that class, remove its background, and insert the result into part of another training dataset that the processor holds.

Here, the other training dataset may be, for example, a dataset held after the class in question has been acquired.

Each component of FIG. 2 is described in detail together with the flowcharts of FIGS. 3 and 4.

FIG. 3 is a flowchart illustrating a method of constructing a dataset in the computing device 120 according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating a method of automating object placement in the computing device 120 according to an embodiment of the present invention.

A method of constructing a training dataset for object recognition according to an embodiment of the present invention is described below with reference to FIGS. 2 and 3.

The processor may prepare at least two training dataset folders for object recognition in advance. However, the present invention is not limited thereto, and three or more training dataset folders may be used.

First, the following process may be performed on the prepared dataset A folder.

The region designation/background removal module 202 of the processor may select (or retrieve) one image from the dataset A folder, designate a region in the selected image, and remove the background of the designated region (S102). Depending on the embodiment, the image may be selected from the folder at random or according to a preset criterion. Depending on the embodiment, the region designation and the background removal may be based on a graph cut algorithm, but the present invention is not limited thereto.

The object extraction module 203 of the processor may extract an object from the designated region of the selected image from which the background has been removed (S104).

The memory 207 may store the extracted object image at a designated size (S106).
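Steps S104 to S106 can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes the background removal of S102 has already produced a binary foreground mask, represents images as nested lists of grayscale values, and uses nearest-neighbor resampling for the "designated size". The function names `crop_to_mask` and `resize_nearest` are hypothetical.

```python
def crop_to_mask(image, mask):
    """Crop image and mask to the bounding box of the nonzero mask pixels (S104)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no foreground pixels")
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    crop = [row[left:right + 1] for row in image[top:bottom + 1]]
    crop_mask = [row[left:right + 1] for row in mask[top:bottom + 1]]
    return crop, crop_mask


def resize_nearest(grid, out_h, out_w):
    """Resize a 2D grid to out_h x out_w by nearest-neighbor sampling (S106)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


# Usage: a 4x4 image whose foreground is the central 2x2 block.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
obj, obj_mask = crop_to_mask(image, mask)
stored = resize_nearest(obj, 4, 4)  # object stored at a designated 4x4 size
```

Keeping the cropped mask alongside the object is a design choice of this sketch; it lets a later placement step know which pixels belong to the object.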

The processor may retrieve, for training dataset A, the object image stored at the designated size in step S106 (S108).

The processor may retrieve one image from training dataset B (S110). Depending on the embodiment, the retrieval may be random or may follow a preset rule.

The object placement module 204 of the processor may place the previously retrieved object image of training dataset A within the retrieved random image of training dataset B (S112). The placement region in step S112 may be determined automatically; this is described in detail with reference to FIG. 4 and is not repeated here.

The control module 206 of the processor may store in the memory 207 the image in which the object image of dataset A has been automatically placed in the random image of dataset B (S114).

The processor determines whether all of training dataset A has been used for the image retrieved from training dataset B (S116) and repeats the above process until it has.

Depending on the embodiment, the processor may repeat the above process, one retrieved image at a time, for every image that can be retrieved from training dataset B.
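The loop formed by steps S108 to S116 can be sketched as below. This is only an outline under stated assumptions: `build_dataset` and its `place` argument are hypothetical names, `place` stands in for the automatic placement of S112, and "random retrieval" from dataset B is modeled as visiting the backgrounds in shuffled order.

```python
import random


def build_dataset(objects, backgrounds, place, rng=None):
    """Combine every stored object of dataset A with every image of dataset B."""
    if rng is None:
        rng = random.Random()
    order = list(backgrounds)
    rng.shuffle(order)                    # S110: retrieve images of B in random order
    results = []
    for background in order:
        for obj in objects:               # S108, repeated until A is exhausted (S116)
            composed = place(background, obj)   # S112: automatic placement
            results.append(composed)            # S114: store the composed image
    return results


# Usage with strings standing in for images.
results = build_dataset(["a", "b"], ["X", "Y", "Z"],
                        place=lambda bg, obj: bg + obj,
                        rng=random.Random(0))
```

With two objects and three backgrounds, the sketch yields six composed images, which is how a small set of extracted objects multiplies into a larger training set.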

The learning module 205 of the processor may use the training dataset newly constructed through the above process for deep learning training.

FIG. 4 describes in more detail step S112 of FIG. 3, that is, the method of automating object placement in the computing device 120.

Referring to FIG. 4, the processor may extract image features of the object image extracted or stored from training dataset A in step S108 of FIG. 3 (S202). Depending on the embodiment, the extracted image features may be the four major image features: points, lines, surfaces, and textures.

Meanwhile, the processor may divide the image retrieved from training dataset B in step S110 of FIG. 3 (S204). Depending on the embodiment, the processor may divide the retrieved image into four planes. However, the present invention is not limited thereto.
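The plane division of S204 can be illustrated with a small helper. The source does not say how the four planes are laid out, so this sketch assumes a regular grid (2 x 2 for the four-plane case) and returns pixel boxes rather than image data; `split_planes` is a hypothetical name.

```python
def split_planes(height, width, rows=2, cols=2):
    """Return (top, left, bottom, right) boxes that tile a height x width image."""
    if rows < 1 or cols < 1 or rows > height or cols > width:
        raise ValueError("invalid grid for this image size")
    # Integer edges so that the boxes cover every pixel exactly once.
    row_edges = [r * height // rows for r in range(rows + 1)]
    col_edges = [c * width // cols for c in range(cols + 1)]
    return [(row_edges[r], col_edges[c], row_edges[r + 1], col_edges[c + 1])
            for r in range(rows) for c in range(cols)]


# Usage: four planes of a 4x6 image, then a finer 2x4 grid (8 planes).
quadrants = split_planes(4, 6)
eight = split_planes(4, 8, rows=2, cols=4)
```

Bottom and right edges are exclusive, so a box can be used directly as slice bounds.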

The processor may extract image features from each divided plane image (S206). As described below, the four major image features (points, lines, surfaces, and textures) may be extracted so that they correspond to those of training dataset A for the comparison of feature quantities.

The processor may compare the feature quantities of the image features extracted from the object image of training dataset A with those of the image features extracted from the divided plane images of training dataset B (S210) and determine whether a reference value is exceeded (S212).
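The comparison of S210 and S212 might look like the sketch below. The source specifies neither how the four feature quantities are computed nor how they are compared, so everything here is an assumption made for illustration: each image is summarized by one number per feature, and the score is a simple similarity that equals 1.0 when all four quantities match and decreases as they diverge. The names `FEATURES`, `similarity`, and `planes_exceeding` are hypothetical.

```python
FEATURES = ("points", "lines", "surfaces", "textures")


def similarity(obj_features, plane_features):
    """Hypothetical score in (0, 1]; 1.0 means all four quantities are equal."""
    diffs = [abs(obj_features[k] - plane_features[k]) for k in FEATURES]
    return 1.0 / (1.0 + sum(diffs) / len(FEATURES))


def planes_exceeding(obj_features, planes_features, reference):
    """Indices of the plane images whose score exceeds the reference value (S212)."""
    return [i for i, feats in enumerate(planes_features)
            if similarity(obj_features, feats) > reference]


# Usage: one plane matches the object exactly, the other differs on every feature.
obj = {"points": 10, "lines": 4, "surfaces": 2, "textures": 0.5}
planes = [
    {"points": 10, "lines": 4, "surfaces": 2, "textures": 0.5},
    {"points": 14, "lines": 8, "surfaces": 6, "textures": 4.5},
]
candidates = planes_exceeding(obj, planes, reference=0.5)
```

A real system would replace the per-feature numbers with actual detector outputs (for example keypoint or edge counts), which the source leaves unspecified.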

If the determination in step S212 shows that the reference value is exceeded in the plane images divided from training dataset B, the processor may select one of the divided plane images. Depending on the embodiment, the selection may be made sequentially over all of the divided planes. The processor may insert the object image of training dataset A into the selected divided plane image of training dataset B (S216).

The processor may perform smoothing on the boundary region of the object image of training dataset A being inserted (S218).
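One way to picture the insertion and boundary smoothing of S216 and S218 is alpha feathering. The source does not name a smoothing method, so this sketch swaps in a common technique as an assumption: the object's binary mask is box-blurred (pixels outside the mask count as 0) and the blurred values are used as blend weights. Images are nested lists of grayscale values, and `feather` and `paste_smoothed` are hypothetical names.

```python
def feather(mask, radius=1):
    """Box-blur a binary mask, treating pixels outside the mask as 0."""
    h, w = len(mask), len(mask[0])
    size = (2 * radius + 1) ** 2
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            total = sum(mask[rr][cc]
                        for rr in range(max(0, r - radius), min(h, r + radius + 1))
                        for cc in range(max(0, c - radius), min(w, c + radius + 1)))
            row.append(total / size)
        out.append(row)
    return out


def paste_smoothed(background, obj, mask, top, left, radius=1):
    """Blend obj into a copy of background at (top, left) with feathered edges."""
    alpha = feather(mask, radius)
    result = [row[:] for row in background]
    for r, obj_row in enumerate(obj):
        for c, value in enumerate(obj_row):
            a = alpha[r][c]
            result[top + r][left + c] = a * value + (1 - a) * result[top + r][left + c]
    return result


# Usage: a uniform 3x3 object pasted into the middle of a black 5x5 background.
background = [[0] * 5 for _ in range(5)]
obj = [[100] * 3 for _ in range(3)]
mask = [[1] * 3 for _ in range(3)]
blended = paste_smoothed(background, obj, mask, top=1, left=1)
```

The object's center keeps its full value while its border pixels are mixed with the background, which softens the seam that a hard paste would leave.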

After an object-label-based assessment of the situation, the processor may determine the final insertion position of the object image of training dataset A within the selected divided plane image of training dataset B and complete the insertion (S220).

The object placement automation process may be carried out through the above steps.

Depending on the embodiment, the steps after S212 may be performed for each divided plane image.

If none of the divided plane images exceeds the reference value in step S212, the processor may re-divide the image that was divided in step S204 (S214). If the image was divided into four planes in step S204, it is divided into n planes in this case, where n may be four or greater (for example, 8 or 16). Depending on the embodiment, the processor may determine whether the re-division of step S214 has been repeated a set number of times or more (S208) and, if so, perform the above process on an image newly retrieved at random from training dataset B.
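The re-division loop of S212, S214, and S208 can be sketched as a bounded retry. The doubling schedule (4, 8, 16, ...) and the retry cap are assumptions chosen to match the examples in the text, and `score_planes` is a hypothetical callable that divides the current image into n planes and returns one score per plane.

```python
def choose_plane(score_planes, reference, start=4, max_redivisions=3):
    """Return (n, plane_index) for the first plane that passes, or None.

    None signals that the caller should retrieve a new random image
    from dataset B and start over (the S208 branch).
    """
    n = start
    for _ in range(max_redivisions + 1):
        scores = score_planes(n)                      # S204/S214 plus S206 and S210
        passing = [i for i, s in enumerate(scores) if s > reference]   # S212
        if passing:
            return n, passing[0]
        n *= 2                                        # assumed schedule: 4 -> 8 -> 16
    return None


# Usage: only the last plane of a 16-way division scores above the reference.
def fake_scores(n):
    return [0.1] * n if n < 16 else [0.1] * 15 + [0.9]


found = choose_plane(fake_scores, reference=0.5)
gave_up = choose_plane(fake_scores, reference=0.5, max_redivisions=1)
```

Returning `None` instead of raising keeps the "try another background" decision with the caller, which mirrors the flowchart's return to a new random image.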

도 3 내지 4에서는 각 과정을 순차적으로 실행하는 것으로 기재하고 있으나, 이는 본 발명의 일 실시예의 기술 사상을 예시적으로 설명한 것에 불과한 것이다. 다시 말해, 본 발명의 일 실시예가 속하는 기술 분야에서 통상의 지식을 가진 자라면 본 발명의 일 실시예의 본질적인 특성에서 벗어나지 않는 범위에서 도 3 내지 4에 기재된 순서를 변경하여 실행하거나 각 과정 중 하나 이상의 과정을 병렬적으로 실행하는 것으로 다양하게 수정 및 변형하여 적용 가능할 것이므로, 도 3 내지 4는 시계열적인 순서로 한정되는 것은 아니다.Although each process is described as being sequentially executed in FIGS. 3 to 4, this is merely an example of the technical idea of one embodiment of the present invention. In other words, those skilled in the art to which an embodiment of the present invention pertains may change and execute the order described in FIGS. 3 and 4 without departing from the essential characteristics of an embodiment of the present invention, or at least one of each process. Since it will be possible to apply various modifications and variations by executing the process in parallel, FIGS. 3 to 4 are not limited to a time-series order.

Meanwhile, the processes shown in FIGS. 3 and 4 can be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium includes every type of recording device that stores data readable by a computer system. That is, computer-readable recording media include storage media such as magnetic storage media (for example, ROM, floppy disks, and hard disks), optically readable media (for example, CD-ROMs and DVDs), and carrier waves (for example, transmission over the Internet). In addition, the computer-readable recording medium may be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the embodiment pertains could make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present embodiment.

10: Training dataset construction system for object recognition
110: source
120: computing device
201: communication interface
202: area designation / background removal module
203: object extraction module
204: object arrangement module
205: learning module
206: control module
207: memory

Claims

A method for constructing a training dataset for object recognition, performed by a computing device, the method comprising:
(a) calling an object image stored in a first dataset;
(b) randomly calling an image from a second dataset;
(c) automatically placing the called object image of the first dataset at a random position within the called random image of the second dataset according to a preset method; and
(d) storing the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset,
wherein step (c) comprises:
extracting four major image features, namely point, line, plane, and texture features, of the object image extracted from or stored in the first dataset;
dividing the image randomly called from the second dataset into planes;
extracting the four major image features for each divided planar image;
determining whether a reference value is exceeded by comparing the feature amounts of the four major image features of the object image of the first dataset with those of each divided planar image of the second dataset;
selecting one of the divided planar images of the second dataset when, as a result of the determination, the reference value is exceeded in all of the divided planar images of the second dataset; and
inserting the object image of the first dataset into the selected divided planar image of the second dataset.
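The comparison and selection steps of step (c) might be sketched as below. The text does not specify how the four feature amounts are computed or compared, so everything here is an assumption made for illustration: each image is represented by a hypothetical dictionary of four feature amounts normalized to [0, 1], the score is one minus the mean absolute difference, and `require_all` mirrors the claim's condition that the reference value be exceeded in all divided planar images before one is selected.

```python
import random

FEATURES = ("point", "line", "plane", "texture")

def similarity(obj_feat, tile_feat):
    """Assumed score in [0, 1]: 1.0 when all four feature amounts are identical."""
    diff = sum(abs(obj_feat[k] - tile_feat[k]) for k in FEATURES)
    return 1.0 - diff / len(FEATURES)

def select_tile(obj_feat, tile_feats, reference, require_all=True, rng=random):
    """Return the index of one tile chosen at random, or None if selection fails.

    With require_all=True, every tile must exceed the reference value first;
    with require_all=False, the choice is made among the tiles that pass.
    """
    passing = [i for i, f in enumerate(tile_feats)
               if similarity(obj_feat, f) > reference]
    if not passing or (require_all and len(passing) < len(tile_feats)):
        return None
    return rng.choice(passing)
```

A `None` result corresponds to the case in which the image would be re-divided or replaced by another background image.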

The method of claim 1,
wherein steps (a), (c), and (d) are repeated for each image of the second dataset called in step (b) until all object images stored in the first dataset have been used.

The method of claim 1, further comprising:
randomly calling an image from the first dataset;
designating a region in the called image and removing the background;
extracting the object from the designated region from which the background has been removed; and
storing the extracted object at a specified size.
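The object preparation steps above could look roughly like the following sketch. It is only an illustration on small grayscale images stored as lists of rows: the background removal is reduced to comparing pixels against a known background value, and the resize uses nearest-neighbour sampling. Both choices, and all function names, are assumptions; the text does not state which techniques are used.

```python
def crop(image, box):
    """Cut the region (x0, y0, x1, y1) out of an image given as a list of rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]

def object_mask(region, background_value):
    """Assumed background removal: a pixel is 'object' if it differs from the background."""
    return [[v != background_value for v in row] for row in region]

def bounding_box(mask):
    """Smallest box enclosing all object pixels, or None if there are none."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, m in enumerate(row) if m]
    if not ys:
        return None
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1

def resize_nearest(image, out_w, out_h):
    """Nearest-neighbour resize to the specified storage size."""
    in_h, in_w = len(image), len(image[0])
    return [[image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def extract_object(image, region_box, background_value, size):
    """Designate a region, drop the background, extract the object, and resize it."""
    region = crop(image, region_box)
    box = bounding_box(object_mask(region, background_value))
    if box is None:
        return None
    return resize_nearest(crop(region, box), size[0], size[1])
```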

(Deleted)

The method of claim 1,
wherein the inserting comprises:
performing smoothing on a boundary region of the object image of the first dataset to be inserted; and
determining a final insertion position within the selected divided planar image of the second dataset.
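One way to picture the two insertion sub-steps is the sketch below. The smoothing is illustrated with a linear feathering mask that fades the object toward its border, and the final position is drawn at random among the offsets at which the object fits inside the selected tile. Both are assumptions chosen for illustration, as are the function names; the text does not specify the smoothing filter or the positioning rule. Images are grayscale lists of rows.

```python
import random

def feather_alpha(w, h, margin):
    """Blend weights that rise linearly from the border to 1.0 `margin` pixels inside."""
    alpha = []
    for y in range(h):
        row = []
        for x in range(w):
            d = min(x, y, w - 1 - x, h - 1 - y)  # distance to the nearest edge
            row.append(min(1.0, (d + 1) / (margin + 1)))
        alpha.append(row)
    return alpha

def final_position(tile_box, obj_w, obj_h, rng=random):
    """Random top-left corner keeping the object inside the tile, or None if it does not fit."""
    x0, y0, x1, y1 = tile_box
    if obj_w > x1 - x0 or obj_h > y1 - y0:
        return None
    return rng.randint(x0, x1 - obj_w), rng.randint(y0, y1 - obj_h)

def paste(background, obj, alpha, top_left):
    """Blend the object into the background in place, using the feathering weights."""
    ox, oy = top_left
    for y, row in enumerate(obj):
        for x, v in enumerate(row):
            a = alpha[y][x]
            background[oy + y][ox + x] = a * v + (1 - a) * background[oy + y][ox + x]
    return background
```

Because the weights are below 1.0 only near the border, the interior of the object is copied unchanged while its edge is mixed with the background pixels underneath.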

An apparatus for constructing a training dataset for object recognition, comprising:
a memory; and
a processor,
wherein the processor performs:
(a) calling an object image stored in a first dataset;
(b) randomly calling an image from a second dataset;
(c) automatically placing the called object image of the first dataset at a random position within the called random image of the second dataset according to a preset method; and
(d) storing the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset,
and wherein, in step (c), the processor extracts four major image features, namely point, line, plane, and texture features, of the object image extracted from or stored in the first dataset; divides the image randomly called from the second dataset into planes; extracts the four major image features for each divided planar image; determines whether a reference value is exceeded by comparing the feature amounts of the four major image features of the object image of the first dataset with those of each divided planar image of the second dataset; and, when as a result of the determination the reference value is exceeded in all of the divided planar images of the second dataset, selects one of the divided planar images of the second dataset and inserts the object image of the first dataset into the selected divided planar image of the second dataset.

The apparatus of claim 6,
wherein the processor
repeats steps (a), (c), and (d) for each image of the second dataset called in step (b) until all object images stored in the first dataset have been used.

The apparatus of claim 6,
wherein the processor
randomly calls an image from the first dataset,
designates a region in the called image and removes the background,
extracts the object from the designated region from which the background has been removed, and
stores the extracted object in the memory at a specified size.

(Deleted)

The apparatus of claim 6,
wherein the processor, in the inserting,
performs smoothing on a boundary region of the object image of the first dataset to be inserted, and
determines a final insertion position within the selected divided planar image of the second dataset.