KR20220081520A

KR20220081520A - Method and apparatus for configuring learning data set in object recognition

Info

Publication number: KR20220081520A
Application number: KR1020200171073A
Authority: KR
Inventors: 조성욱
Original assignee: 청주대학교 산학협력단
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2022-06-16
Also published as: KR102484316B1

Abstract

A method of constructing a training dataset for object recognition according to an embodiment of the present invention includes: (a) retrieving an object image stored from a first dataset; (b) randomly retrieving an image from a second dataset; (c) automatically placing the retrieved object image of the first dataset at an arbitrary position within the retrieved random image of the second dataset according to a preset method; and (d) storing the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset.

Description

Method and apparatus for configuring learning data set in object recognition

The present invention relates to deep learning, and more particularly to a method and apparatus for constructing a training dataset when the training dataset available for object recognition is insufficient.

The content described in this section merely provides background information for the present embodiment and does not constitute prior art.

Object recognition technology based on large-scale learning techniques such as deep learning has advanced considerably in recent years. As a result, large amounts of data are required for each class, yet the public datasets available for training remain few in number and limited in variety.

Therefore, users such as researchers cannot obtain a dataset sufficient for their research by relying only on datasets published on the Internet, and research conducted with such datasets is prone to error. The reliability of research results obtained in this way is correspondingly low.

Meanwhile, building a dataset manually, or building one's own dataset, in order to address this problem takes considerable time and effort, and the initial cost and labeling cost are not negligible.

One object of the present invention is to define a framework for a training dataset construction method that resolves the imbalance caused by classes that are markedly under-represented in a training dataset for object recognition.

Another object of the present invention is to automate training dataset construction through the framework defined above.

According to an embodiment of the present invention, steps (a), (c), and (d) may be repeated for each image of the second dataset retrieved in step (b) until all object images stored from the first dataset have been used.

According to an embodiment of the present invention, the method may further include: randomly retrieving an image from the first dataset; designating a region in the retrieved image and removing the background; extracting an object from the designated region from which the background has been removed; and storing the extracted object at a specified size.

According to an embodiment of the present invention, step (c) may further include: extracting four major image features from the object image extracted or stored from the first dataset; dividing the image randomly retrieved from the second dataset into planar sub-images; extracting the four major image features from each divided planar image; comparing the feature quantities of the four major image features of the object image from the first dataset with those of each divided planar image from the second dataset to determine whether a reference value is exceeded; selecting one of the divided planar images of the second dataset if, as a result of the determination, the reference value is exceeded in all of the divided planar images of the second dataset; and inserting the object image of the first dataset into the selected divided planar image of the second dataset.

According to an embodiment of the present invention, the inserting may further include: smoothing the boundary region of the object image of the first dataset being inserted; and determining a final insertion position within the selected divided planar image of the second dataset after an object-label-based situation assessment.

An apparatus for constructing a training dataset for object recognition according to an embodiment of the present invention includes a memory and a processor, wherein the processor (a) retrieves an object image stored from a first dataset, (b) randomly retrieves an image from a second dataset, (c) automatically places the retrieved object image of the first dataset at an arbitrary position within the retrieved random image of the second dataset according to a preset method, and (d) stores the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset.

According to an embodiment of the present invention, the processor may repeat operations (a), (c), and (d) for each image of the second dataset retrieved in operation (b) until all object images stored from the first dataset have been used.

According to an embodiment of the present invention, the processor may randomly retrieve an image from the first dataset, designate a region in the retrieved image and remove the background, extract an object from the designated region from which the background has been removed, and store the extracted object in the memory at a specified size.

According to an embodiment of the present invention, in operation (c) the processor may extract four major image features from the object image extracted or stored from the first dataset, divide the image randomly retrieved from the second dataset into planar sub-images, extract the four major image features from each divided planar image, and compare the feature quantities of the four major image features of the object image from the first dataset with those of each divided planar image from the second dataset to determine whether a reference value is exceeded. If, as a result of the determination, the reference value is exceeded in all of the divided planar images of the second dataset, the processor may select one of the divided planar images of the second dataset and insert the object image of the first dataset into the selected divided planar image.

According to an embodiment of the present invention, during the insertion the processor may smooth the boundary region of the object image of the first dataset being inserted and, after an object-label-based situation assessment, determine a final insertion position within the selected divided planar image of the second dataset.

Various embodiments of the present invention provide the following effects.

First, public training datasets can be combined to build a variety of new training datasets for the research at hand.

Second, the initial cost and labeling cost incurred when building one's own training dataset can be reduced.

FIG. 1 is a block diagram of a system for constructing a training dataset for object recognition according to an embodiment of the present invention.
FIG. 2 is a block diagram of a computing device according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a dataset construction method in a computing device according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an automated object placement method in a computing device according to an embodiment of the present invention.

Because the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to the specific embodiments; the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be termed a second element, and similarly a second element may be termed a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, but intervening elements may also be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "include" or "have" should be understood as not precluding the presence or addition of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs.

Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless expressly so defined in the present application.

This specification discloses various embodiments that define a framework for a training dataset construction method that resolves the imbalance of classes that are markedly under-represented in a training dataset for object recognition, and that automate training dataset construction through that framework.

To aid understanding and for convenience of explanation, the description below uses two datasets as an example, but the present invention is not limited thereto and three or more datasets may be used. In this way, a sufficient training dataset can be constructed and secured for use in the research at hand.

For deep learning topics related to the embodiments of the present invention, such as the convolutional neural network (CNN), reference is made to known technology, and a detailed description is omitted from this specification.

FIG. 1 is a block diagram of a training dataset construction system 10 for object recognition according to an embodiment of the present invention.

FIG. 2 is a block diagram of a computing device 120 according to an embodiment of the present invention.

Referring to FIG. 1, the training dataset construction system 10 for object recognition according to an embodiment of the present invention may include a source 110 and a computing device 120.

The source 110 stores public deep learning training datasets for object recognition. Depending on the embodiment, the source 110 may take various forms, such as a public database or the Internet.

Although FIG. 1 shows only one source, the present invention is not limited thereto, and a plurality of sources may be used for object recognition according to the present invention.

The computing device 120 may acquire and store public deep learning training datasets for object recognition from the source 110, and may construct a new training dataset based on the stored training datasets.

Depending on the embodiment, the computing device 120 may be a terminal or a component included in a terminal.

Depending on the embodiment, the computing device 120 may be a server that transmits to a terminal a training dataset newly constructed from public training datasets.

The computing device 120 may construct and store a new deep learning training dataset for object recognition by combining training datasets that were themselves constructed and stored based on the public training datasets acquired from the source 110. Depending on the embodiment, the computing device 120 may repeat this process on the public and/or stored training datasets to secure datasets sufficient for research. Building its own deep learning training dataset for object recognition in this way can reduce not only the initial cost but also the labeling cost.

The computing device 120 may define a framework so that the training dataset construction process for object recognition according to the present invention, described below with reference to FIGS. 3 and 4, is executed automatically, and may build varied and sufficient deep learning training datasets for object recognition through the defined framework.

Referring to FIG. 2, the computing device 120 according to an embodiment of the present invention may include a memory 207 and a processor. Depending on the embodiment, the processor may include a communication interface 201, a region designation/background removal module 202, an object extraction module 203, an object placement module 204, a learning module 205, and a control module 206. However, the present invention is not limited thereto; at least one component not shown may be added, or some of the illustrated components may be omitted. Depending on the embodiment, two or more of the above components may be implemented as a single component, or a single component may be implemented as two or more components.

Depending on the embodiment, the memory 207 shown in FIG. 2 may be an internal memory of the computing device 120, an external memory, or a virtual memory. Although FIG. 2 shows a single memory, a plurality of memories may be used in the present invention. In that case, the memories need not all be internal or all be external, and they need not share the same type, capacity, or other characteristics.

The communication interface 201 may provide a communication environment with the source 110 and support data communication with it.

The processor according to the present invention aims to resolve the imbalance of markedly under-represented classes in a deep learning training dataset for object recognition through a framework defined to remove the background from other images that contain the class in question and then insert the result into part of another training dataset held by the processor.

Here, the other training dataset may be, for example, a dataset held after the class in question has been obtained as described above.

Each component of FIG. 2 is described in detail together with the flowcharts of FIGS. 3 and 4.

FIG. 3 is a flowchart illustrating a dataset construction method in the computing device 120 according to an embodiment of the present invention.

FIG. 4 is a flowchart illustrating an automated object placement method in the computing device 120 according to an embodiment of the present invention.

A method of constructing a training dataset for object recognition according to an embodiment of the present invention is described below with reference to FIGS. 2 and 3.

Here, the processor may prepare in advance at least two training dataset folders for object recognition. However, the present invention is not limited thereto, and three or more training dataset folders may be used.

First, the following process may be performed on the prepared dataset A folder.

The region designation/background removal module 202 of the processor may select (or retrieve) one image from the dataset A folder, designate a region in the selected image, and remove the background of the designated region (S102). Depending on the embodiment, the image may be selected from the folder at random or according to a preset criterion. Depending on the embodiment, the region designation and the background removal for the designated region may be based on a graph cut algorithm, but the present invention is not limited thereto.

The object extraction module 203 of the processor may extract an object from the designated region of the selected image from which the background has been removed (S104).

The memory 207 may store the extracted object image at a specified size (S106).
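By way of non-limiting illustration, steps S104 and S106 can be sketched in plain Python as follows. The sketch assumes that a binary foreground mask has already been produced by the background removal of S102 (for example, by a graph cut step that is not shown); the function names `crop_to_mask` and `resize_nearest`, the list-of-lists image representation, and the use of nearest-neighbour resampling are assumptions made for the example and are not specified in this document.

```python
def crop_to_mask(image, mask):
    """Crop the image to the bounding box of the mask (S104).

    Pixels inside the box that the mask marks as background become None,
    standing in for "transparent".
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    return [
        [image[r][c] if mask[r][c] else None for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]


def resize_nearest(image, out_h, out_w):
    """Resize to the specified size with nearest-neighbour sampling (S106)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]
```

Keeping the mask separate from the pixel data lets the same crop be reused when the object is later blended into a background image.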

The processor may retrieve the object image of training dataset A that was stored at the specified size in step S106 (S108).

The processor may retrieve one image from training dataset B (S110). Depending on the embodiment, the retrieval may be random or may follow a preset rule.

The object placement module 204 of the processor may place the previously retrieved object image of training dataset A within the retrieved random image of training dataset B (S112). In the placement of S112, the placement region may be determined automatically; this is described in detail with reference to FIG. 4 and is not repeated here.

The control module 206 of the processor may store in the memory 207 the image in which the object image of dataset A has been automatically placed in the random image of dataset B (S114).

The processor determines whether all of training dataset A has been used for the image retrieved from training dataset B (S116), and repeats the above process until it has.

Depending on the embodiment, the processor may repeat the above process, one retrieved image at a time, for every image that can be retrieved from training dataset B.
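By way of non-limiting illustration, the repetition described for steps S108 to S116 can be sketched as a nested loop. The function name `build_dataset`, the `place` callable standing in for the placement of S112, and the fixed random seed are assumptions made for the example; the document does not prescribe a particular implementation.

```python
import random


def build_dataset(objects_a, backgrounds_b, place, rng=None):
    """Compose every stored object of dataset A into each image of dataset B.

    `place(background, obj)` is a hypothetical callable that performs the
    automatic placement (S112) and returns the composed image.
    """
    rng = rng or random.Random(0)
    pool = list(backgrounds_b)
    rng.shuffle(pool)                 # S110: retrieve images of B in random order
    composed = []
    for background in pool:
        for obj in objects_a:         # S108: retrieve a stored object image
            composed.append(place(background, obj))  # S112 placement, S114 store
        # S116: the inner loop ends once every object of A has been used
    return composed
```

With this structure the number of composed images is the product of the number of stored objects and the number of background images, which is how a small set of objects for a rare class can be expanded.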

The learning module 205 of the processor may use the training dataset newly constructed through the above process for deep learning training.

FIG. 4 describes in more detail step S112 of FIG. 3, that is, the automated object placement method in the computing device 120.

Referring to FIG. 4, the processor may extract image features from the object image of training dataset A that was extracted or stored and then retrieved in step S108 of FIG. 3 (S202). Depending on the embodiment, the extracted image features may be the four major image features: points, lines, planes, and textures.

Meanwhile, the processor may divide the image retrieved from training dataset B in step S110 of FIG. 3 (S204). Depending on the embodiment, the processor may divide the retrieved image into four planar sub-images. However, the present invention is not limited thereto.
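By way of non-limiting illustration, the division of S204 can be sketched as a regular grid split. The function name `split_grid`, the list-of-lists image representation, and the choice to return each tile together with its top-left offset are assumptions made for the example.

```python
def split_grid(image, rows, cols):
    """Split an image into rows x cols tiles.

    Returns a list of ((top, left), tile) pairs so that a tile can later be
    mapped back to its position in the original image.
    """
    h, w = len(image), len(image[0])
    tiles = []
    for i in range(rows):
        top, bottom = i * h // rows, (i + 1) * h // rows
        for j in range(cols):
            left, right = j * w // cols, (j + 1) * w // cols
            tile = [row[left:right] for row in image[top:bottom]]
            tiles.append(((top, left), tile))
    return tiles
```

Calling `split_grid(image, 2, 2)` gives the four-way division mentioned above; other grid sizes cover the finer divisions.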

The processor may extract image features from each divided planar image (S206). So that the feature quantities can be compared as described below, the features extracted here may be the same four major image features (points, lines, planes, and textures) extracted for training dataset A.

The processor may compare the feature quantities of the image features extracted from the object image of training dataset A with those of the image features extracted from the divided planar images of training dataset B (S210), and determine whether a reference value is exceeded (S212).
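By way of non-limiting illustration, the comparison of S210 and the determination of S212 can be sketched as below. This document does not state how the feature quantities are computed or compared, so the sketch assumes that each image is summarized by a four-element vector (point, line, plane, texture) and that cosine similarity is used as the comparison score; both choices, and the names `cosine_similarity` and `tiles_exceeding`, are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors (assumed comparison score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tiles_exceeding(object_features, tile_features, reference):
    """Return the indices of tiles whose score exceeds the reference value (S212)."""
    return [
        i
        for i, features in enumerate(tile_features)
        if cosine_similarity(object_features, features) > reference
    ]
```

An empty result corresponds to the case in which no divided planar image exceeds the reference value.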

If the determination in step S212 shows that the reference value is exceeded in the divided planar images of training dataset B, the processor may select one of the divided planar images of training dataset B. Depending on the embodiment, the selection may proceed sequentially through all of the divided planes. The processor may insert the object image of training dataset A into the selected divided planar image of training dataset B (S216).

The processor may perform smoothing on the boundary region of the object image of training dataset A being inserted (S218).
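By way of non-limiting illustration, one way to smooth the boundary of an inserted object is to feather its mask and alpha-blend it over the background. The document does not specify the smoothing technique; the 3x3 box average, the names `feather_alpha` and `blend`, and the assumption that the object, mask, and background patch are same-sized grayscale grids are all choices made for the example.

```python
def feather_alpha(mask):
    """Turn a 0/1 object mask into a soft alpha map.

    Each object pixel receives the mean of its 3x3 neighbourhood (pixels
    outside the image count as 0), so alpha falls off toward the object
    boundary. Background pixels keep alpha 0.
    """
    h, w = len(mask), len(mask[0])
    alpha = []
    for r in range(h):
        row = []
        for c in range(w):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        total += mask[rr][cc]
            row.append(mask[r][c] * total / 9.0)
        alpha.append(row)
    return alpha


def blend(background, obj, alpha):
    """Alpha-blend the object over the background pixel by pixel."""
    return [
        [a * o + (1.0 - a) * b for b, o, a in zip(b_row, o_row, a_row)]
        for b_row, o_row, a_row in zip(background, obj, alpha)
    ]
```

Interior object pixels keep their original value, while pixels on the object boundary are mixed with the background, which softens the visible seam.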

After an object-label-based situation assessment, the processor may determine the final insertion position of the object image of training dataset A within the selected divided planar image of training dataset B and complete the insertion (S220).

The automated object placement may be carried out through the above process.

Depending on the embodiment, the steps following S212 may be performed separately for each divided planar image.

If none of the divided planar images exceeds the reference value in step S212, the processor may re-divide the image that was divided in step S204 (S214). If a four-way planar division was performed in step S204, the image is now divided into n planar sub-images, where n is four or more (for example, 8 or 16). Depending on the embodiment, the processor may determine whether the planar re-division of S214 has been repeated at least a certain number of times (S208) and, if so, may perform the above process on an image newly retrieved at random from training dataset B.
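By way of non-limiting illustration, the re-division loop of S214 and the repetition check of S208 can be sketched as follows. The `split` and `accept` callables, the doubling of the division count on each round, and the default limit of three rounds are assumptions made for the example; the document gives 8 and 16 only as examples of n and does not state the repetition limit.

```python
def find_tile(background, split, accept, max_rounds=3, start_parts=4):
    """Search for a tile that passes the reference-value check.

    `split(background, parts)` returns the divided planar images and
    `accept(tile)` applies the check of S212; both are hypothetical
    callables. Returns (parts, tile) on success, or None when the
    repetition limit is reached, in which case the caller would retrieve
    a new random image from dataset B.
    """
    parts = start_parts
    for _ in range(max_rounds):          # S208: bound the number of re-divisions
        tiles = split(background, parts)
        passing = [tile for tile in tiles if accept(tile)]
        if passing:
            return parts, passing[0]
        parts *= 2                       # S214: re-divide more finely
    return None
```

Returning None rather than raising keeps the fallback to a new background image in the hands of the caller.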

Although FIGS. 3 and 4 describe the steps as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present invention. In other words, a person of ordinary skill in the art to which an embodiment of the present invention belongs could, without departing from the essential characteristics of the embodiment, change the order shown in FIGS. 3 and 4 or execute one or more of the steps in parallel, so FIGS. 3 and 4 are not limited to a time-series order.

Meanwhile, the processes illustrated in FIGS. 3 and 4 can be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device in which data readable by a computer system is stored. That is, the computer-readable recording medium includes storage media such as magnetic storage media (for example, ROM, floppy disks, hard disks), optically readable media (for example, CD-ROMs, DVDs) and carrier waves (for example, transmission over the Internet). In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the embodiment pertains could make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present embodiment.

10: learning dataset configuration system for object recognition
110: source
120: computing device
201: communication interface
202: area designation / background removal module
203: object extraction module
204: object placement module
205: learning module
206: control module
207: memory

Claims

A method of constructing a training dataset for object recognition, the method comprising:
(a) calling an object image stored in a first dataset;
(b) randomly calling an image from a second dataset;
(c) automatically placing the called object image of the first dataset at an arbitrary position within the called random image of the second dataset according to a preset method; and
(d) storing the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset.
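
Steps (a) to (d) can be illustrated with a short sketch. It is not the claimed implementation: images are modelled as nested lists of grey values, `None` marks pixels removed with the background, the "preset method" is assumed to be a uniform random position, and the returned bounding box is an added convenience. The names `place_object` and `build_dataset` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Image = List[List[int]]
Sprite = List[List[Optional[int]]]  # None marks a removed background pixel
Box = Tuple[int, int, int, int]     # (left, top, width, height)


def place_object(background: Image, obj: Sprite, rng: random.Random) -> Tuple[Image, Box]:
    """Step (c): paste obj into a copy of background at a random position."""
    bg_h, bg_w = len(background), len(background[0])
    obj_h, obj_w = len(obj), len(obj[0])
    if obj_h > bg_h or obj_w > bg_w:
        raise ValueError("object does not fit inside the background")
    top = rng.randint(0, bg_h - obj_h)    # assumption: uniform random placement
    left = rng.randint(0, bg_w - obj_w)
    out = [row[:] for row in background]  # leave the source image untouched
    for y in range(obj_h):
        for x in range(obj_w):
            if obj[y][x] is not None:
                out[top + y][left + x] = obj[y][x]
    return out, (left, top, obj_w, obj_h)


def build_dataset(
    objects: List[Sprite], backgrounds: List[Image], rng: random.Random
) -> List[Tuple[Image, Box]]:
    """Steps (a)-(d): one composite per stored object image."""
    samples = []
    for obj in objects:                                      # (a) call an object image
        background = rng.choice(backgrounds)                 # (b) random background
        samples.append(place_object(background, obj, rng))   # (c) place, (d) store
    return samples
```

Passing a seeded `random.Random` instance keeps a generated dataset reproducible.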

The method of claim 1,
wherein steps (a), (c) and (d) are performed repeatedly, for each image of the second dataset called in step (b), until all object images stored in the first dataset have been used.

The method of claim 1, further comprising:
randomly calling an image from the first dataset;
designating an area in the called image and removing the background;
extracting an object from the designated area from which the background has been removed; and
storing the extracted object at a specified size.

The method of claim 1,
wherein step (c) further comprises:
extracting four major image features of the object image extracted from or stored in the first dataset;
dividing the image randomly called from the second dataset into planar images;
extracting the four major image features for each divided planar image;
determining whether a reference value is exceeded by comparing the feature amounts of the four image features of the object image of the first dataset with the feature amounts of the four image features of each planar image divided from the second dataset;
selecting one of the divided planar images of the second dataset when, as a result of the determination, the reference value is exceeded in all of the divided planar images of the second dataset; and
inserting the object image of the first dataset into the selected divided planar image of the second dataset.
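
The feature comparison in this claim can be sketched as below. The claim does not name the four image features here, so the four used in this sketch (mean brightness, contrast, and horizontal and vertical edge strength) are purely hypothetical stand-ins, as are the similarity score and the function names `four_features`, `similarity` and `matching_tiles`.

```python
import math
from typing import List, Tuple

Image = List[List[float]]
Features = Tuple[float, float, float, float]


def four_features(img: Image) -> Features:
    """Hypothetical feature set: mean, standard deviation, and mean absolute
    horizontal / vertical pixel differences."""
    pixels = [v for row in img for v in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))
    h_diffs = [abs(row[x + 1] - row[x]) for row in img for x in range(len(row) - 1)]
    v_diffs = [
        abs(img[y + 1][x] - img[y][x])
        for y in range(len(img) - 1)
        for x in range(len(img[0]))
    ]
    h_edge = sum(h_diffs) / len(h_diffs) if h_diffs else 0.0
    v_edge = sum(v_diffs) / len(v_diffs) if v_diffs else 0.0
    return mean, std, h_edge, v_edge


def similarity(a: Features, b: Features) -> float:
    """Assumed score in (0, 1]: 1.0 when the feature amounts are identical."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def matching_tiles(obj: Image, tiles: List[Image], reference_value: float) -> List[int]:
    """Return the indices of tiles whose similarity to obj exceeds the reference value."""
    obj_features = four_features(obj)
    return [
        i
        for i, tile in enumerate(tiles)
        if similarity(obj_features, four_features(tile)) > reference_value
    ]
```

One of the returned indices would then be chosen as the divided planar image that receives the object.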

The method of claim 4,
wherein the inserting step further comprises:
performing smoothing on a boundary region of the inserted object image of the first dataset; and
determining a final insertion position within the selected divided planar image of the second dataset after determining the object-label-based situation.
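
Boundary smoothing of an inserted object might look like the following sketch. The claim does not specify a filter, so the 3x3 mean filter, the rectangular boundary and the name `smooth_boundary` are assumptions chosen only to keep the example small.

```python
from typing import List

Image = List[List[float]]


def smooth_boundary(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Apply a 3x3 mean filter to the border pixels of the inserted rectangle.

    The rectangle (top, left, height, width) is assumed to lie inside img.
    Interior pixels of the rectangle and the rest of the image are unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(top, top + height):
        for x in range(left, left + width):
            on_border = y in (top, top + height - 1) or x in (left, left + width - 1)
            if not on_border:
                continue
            # Average over the neighbourhood, clipped at the image edges.
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(neighbours) / len(neighbours)
    return out
```

Reading from `img` while writing to `out` keeps each smoothed pixel independent of the others.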

An apparatus for constructing a training dataset for object recognition, the apparatus comprising:
a memory; and
a processor,
wherein the processor:
(a) calls an object image stored in a first dataset;
(b) randomly calls an image from a second dataset;
(c) automatically places the called object image of the first dataset at an arbitrary position within the called random image of the second dataset according to a preset method; and
(d) stores the image in which the object image of the first dataset has been automatically placed in the random image of the second dataset.

The apparatus of claim 6,
wherein the processor repeats steps (a), (c) and (d), for each image of the second dataset called in step (b), until all object images stored in the first dataset have been used.

The apparatus of claim 6,
wherein the processor:
randomly calls an image from the first dataset;
designates an area in the called image and removes the background;
extracts an object from the designated area from which the background has been removed; and
stores the extracted object in the memory at a specified size.

The apparatus of claim 6,
wherein the processor, in step (c):
extracts four major image features of the object image extracted from or stored in the first dataset;
divides the image randomly called from the second dataset into planar images;
extracts the four image features for each divided planar image;
determines whether a reference value is exceeded by comparing the feature amounts of the four image features of the object image of the first dataset with the feature amounts of the four image features of each planar image divided from the second dataset;
selects one of the divided planar images of the second dataset when, as a result of the determination, the reference value is exceeded in all of the divided planar images of the second dataset; and
inserts the object image of the first dataset into the selected divided planar image of the second dataset.

The apparatus of claim 9,
wherein the processor, in the inserting:
performs smoothing on a boundary region of the inserted object image of the first dataset; and
determines a final insertion position within the selected divided planar image of the second dataset after determining the object-label-based situation.