KR20230095581A - Method and apparatus for generating training data for object recognition

Info

Publication number: KR20230095581A
Application number: KR1020210185210A
Authority: KR
Inventors: 장재성; 이현규; 김종찬
Original assignee: 국민대학교산학협력단
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2023-06-29

Abstract

A method and apparatus for generating training data for object recognition are disclosed. The method according to one embodiment of the present disclosure comprises: a step of controlling a driving simulation environment based on simulation environment data and controlling a vehicle simulator so that an ego vehicle drives in the driving simulation environment; a data extraction step of collecting simulation information data based on the field of view (FOV) of the ego vehicle while driving; a two-dimensional bounding box generation step of generating, based on the simulation information data, a two-dimensional bounding box of at least one object present within the field of view while the ego vehicle is driving; and a training data generation step of generating a training data image for object recognition based on the bounding box. According to the present invention, the time and cost required to generate data can be reduced.

Description

Method and apparatus for generating training data for object recognition {METHOD AND APPARATUS FOR GENERATING TRAINING DATA FOR OBJECT RECOGNITION}

The present disclosure relates to a method and apparatus for automatically generating training data for object recognition based on a vehicle simulator.

For autonomous driving, developing a camera-based perception system requires an enormous number of training images. Typically, these training images are collected and labeled by hand, which is costly and error-prone.

In addition, because unusual corner cases and unusual weather and lighting conditions cannot be created at will in the real world, it is very difficult to collect a large volume of real driving images with sufficient variety. As a result, the construction of large-scale data sets is mostly carried out by a handful of companies and institutions with significant budgets and resources.

Therefore, in order to generalize the difficult task of generating data sets for recognition systems, a technique is needed that can automatically collect training data and train a learning model for object recognition on the automatically collected training data.

The foregoing background art is technical information that the inventor possessed for deriving the present invention or acquired in the course of deriving it, and is not necessarily known art disclosed to the general public before the filing of the present invention.

Prior Art 1: G. Ros, L. Sellart, J. Materzynska, D. Vazquez, and A. M. Lopez, The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes, in Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 3234-3243.
Prior Art 2: A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, CARLA: An open urban driving simulator, in Proceedings of the 1st Annual Conference on Robot Learning, 2017, pp. 1-16.

An object of an embodiment of the present disclosure is to generate a large amount of training data for object recognition covering various situations, based on a vehicle simulator.

An object of an embodiment of the present disclosure is to conveniently generate training data for object recognition using a driving simulator, and to convert a 3D bounding box obtained from the driving simulator into a 2D bounding box that fits the object as closely as possible.

An object of an embodiment of the present disclosure is to automatically synthesize driving scenes for situations that rarely occur in actual driving scenarios, such as corner cases, so that training data for object recognition can be collected easily, thereby further improving the accuracy of 2D object recognition learning, which finds the class and location of a target object in a 2D camera image.

The objects of the embodiments of the present disclosure are not limited to those mentioned above, and other objects and advantages of the present invention not mentioned can be understood from the following description and will be understood more clearly through the embodiments of the present invention. It will also be seen that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

A method for generating training data for object recognition according to an embodiment of the present disclosure may include: a step of controlling a driving simulation environment based on simulation environment data and controlling a vehicle simulator so that an ego vehicle drives in the driving simulation environment; a data extraction step of collecting simulation information data based on the field of view (FOV) of the ego vehicle while driving; a 2D bounding box generation step of generating, based on the simulation information data, a 2D bounding box of at least one object present within the field of view of the ego vehicle while driving; and a training data generation step of generating a training data image for object recognition of the object based on the bounding box.
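The data extraction step above is restricted to the field of view of the ego vehicle. As a minimal sketch of what such a restriction could look like, the following Python function tests whether an object lies inside a horizontal FOV. The function name, the flat 2D geometry, and the default 90-degree FOV are assumptions made for illustration; the disclosure does not specify how the FOV test is performed.

```python
import math

def in_horizontal_fov(ego_xy, ego_yaw_deg, obj_xy, fov_deg=90.0):
    """Return True if obj_xy lies within the ego vehicle's horizontal FOV.

    Hypothetical helper: positions are (x, y) in a shared ground plane and
    ego_yaw_deg is the heading of the forward camera in degrees.
    """
    dx = obj_xy[0] - ego_xy[0]
    dy = obj_xy[1] - ego_xy[1]
    # Bearing from the ego vehicle to the object, in degrees.
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the angular difference into [-180, 180).
    diff = (bearing - ego_yaw_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

# Keep only the objects that fall inside the FOV.
objects = {"car_ahead": (10.0, 0.0), "car_behind": (-10.0, 0.0)}
visible = [name for name, xy in objects.items()
           if in_horizontal_fov((0.0, 0.0), 0.0, xy)]
```

A real pipeline would also need to account for vertical FOV, range, and occlusion, none of which this sketch covers.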

In addition, other methods and other systems for implementing the present disclosure, and a computer-readable recording medium storing a computer program for executing the method, may be further provided.

Other aspects, features, and advantages beyond those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to an embodiment of the present disclosure, driving simulation for various situations generates a considerable number of realistic driving images, together with training data for object recognition, within a reasonable time, thereby reducing the time and cost required for data generation.

In addition, by conveniently generating training data for object recognition using a driving simulator and converting the 3D bounding box obtained from the driving simulator into a 2D bounding box that fits the object as closely as possible, the accuracy of object recognition learning that uses the generated training data can be improved.

In addition, by using synthetic training images in which abnormal weather and lighting conditions that are difficult to obtain in real driving scenarios are artificially created, object recognition accuracy in the real world, particularly in harsh environments, can be greatly improved.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 shows example images of objects recognized in different driving scenarios under various weather conditions in a driving simulator according to an embodiment of the present disclosure.
FIG. 2 is a diagram schematically illustrating a system environment for generating training data for object recognition according to an embodiment of the present disclosure.
FIG. 3 is a schematic block diagram of an apparatus for generating training data for object recognition according to an embodiment of the present disclosure.
FIG. 4 is a diagram for explaining a process of generating training data for object recognition according to an embodiment of the present disclosure.
FIG. 5 is an exemplary diagram for explaining a post-processing process for generating a final bounding box according to an embodiment of the present disclosure.
FIG. 6 is a heat map of objects generated from 2D image coordinates according to an embodiment of the present disclosure.
FIG. 7 is a scalability evaluation graph of an apparatus for generating training data for object recognition according to an embodiment of the present disclosure.
FIG. 8 is an accuracy evaluation diagram of an apparatus for generating training data for object recognition according to an embodiment of the present disclosure.
FIG. 9 is a training performance evaluation graph of an apparatus for generating training data for object recognition according to an embodiment of the present disclosure.
FIG. 10 is an exemplary diagram schematically illustrating a vehicle to which an apparatus for generating training data for object recognition according to an embodiment of the present disclosure is applied.
FIG. 11 is a flowchart illustrating a method of generating training data for object recognition according to an embodiment of the present disclosure.

Advantages and features of the present disclosure, and methods of achieving them, will become clear with reference to the embodiments described in detail in conjunction with the accompanying drawings.

However, the present disclosure is not limited to the embodiments presented below; it may be implemented in a variety of different forms and should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. The embodiments presented below are provided to make the present disclosure complete and to fully inform those of ordinary skill in the art to which the present disclosure pertains of its scope. In describing the present disclosure, a detailed description of related known technology is omitted where it is judged that it could obscure the gist of the present disclosure.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present disclosure. Singular expressions include plural expressions unless the context clearly dictates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. These terms are used only to distinguish one component from another.

Hereinafter, embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, identical or corresponding components are assigned the same reference numerals, and duplicate descriptions of them are omitted.

FIG. 1 is a diagram schematically illustrating a result of generating training data for object recognition according to an embodiment of the present disclosure.

FIG. 1 shows example images synthesized for four different driving scenarios (by row) under various weather conditions (by column). The bounding boxes in each image, and the images themselves, may be generated automatically according to an embodiment of the present disclosure.
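The layout of FIG. 1 amounts to a grid of simulation runs, one per combination of driving scenario and weather condition. The short Python sketch below enumerates such a grid. The scenario and weather labels are hypothetical placeholders; the source states only that there are four scenarios and several weather conditions.

```python
from itertools import product

# Hypothetical labels, not taken from the disclosure.
SCENARIOS = ["scenario_1", "scenario_2", "scenario_3", "scenario_4"]
WEATHERS = ["clear", "rain", "fog"]

def build_grid(scenarios, weathers):
    """Return one run configuration per (scenario, weather) cell, row by row."""
    return [{"scenario": s, "weather": w} for s, w in product(scenarios, weathers)]

grid = build_grid(SCENARIOS, WEATHERS)
for cell in grid:
    print(cell["scenario"], cell["weather"])
```

Each configuration in the grid could then be passed to the simulator as simulation environment data, so that every cell yields its own set of images and bounding boxes.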

An apparatus for generating training data for object recognition according to an embodiment of the present disclosure may collect information produced by simulation in an autonomous driving simulation server. The autonomous driving simulation server simulates a predefined virtual world (a so-called city), and a data extraction client interacts with the server using an application programming interface (API). Through the API provided by the autonomous driving simulation server, the apparatus can control the town's weather, the ego vehicle, surrounding vehicles, pedestrians, and so on. That is, the information generation software can extract sensor data of the ego vehicle (e.g., from a front camera) and the physical attributes (e.g., position, pose, speed) of surrounding vehicles and pedestrians (hereinafter, objects).
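The extracted information described above (camera data plus per-object position, pose, and speed) can be pictured as one record per simulated frame. The following Python sketch shows one possible layout. All class and field names, the units, and the file name are assumptions for illustration and are not defined by the disclosure or by any particular simulator API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectState:
    """Physical attributes of one surrounding vehicle or pedestrian (hypothetical)."""
    object_id: int
    category: str                          # e.g. "vehicle" or "pedestrian"
    position: Tuple[float, float, float]   # world x, y, z (assumed metres)
    rotation: Tuple[float, float, float]   # pitch, yaw, roll (assumed degrees)
    speed: float                           # assumed metres per second

@dataclass
class FrameRecord:
    """Everything collected for one simulated frame (hypothetical)."""
    frame: int
    weather: str
    camera_image_path: str                 # front-camera image of the ego vehicle
    objects: List[ObjectState] = field(default_factory=list)

record = FrameRecord(frame=0, weather="rain", camera_image_path="frame_000000.png")
record.objects.append(
    ObjectState(1, "vehicle", (12.0, -3.5, 0.0), (0.0, 90.0, 0.0), 8.3)
)
```

Keeping the camera image and the object states in the same record makes it straightforward to pair each image with the bounding boxes derived from it later.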

FIG. 2 is a diagram schematically illustrating a system environment for generating training data for object recognition according to an embodiment of the present disclosure.

Referring to FIG. 2, a system 1 for generating training data for object recognition according to an embodiment may include an apparatus 100 for generating training data for object recognition, a server 200, and a network 300.

In an embodiment, the apparatus 100 for generating training data for object recognition automatically synthesizes training data for 2D object recognition for autonomous driving, and targets the 2D object recognition task of finding the class and location of a target object (e.g., a vehicle or a pedestrian) in a 2D camera image.

Object recognition plays an important role in, for example, advanced driver assistance systems (ADAS) and autonomous driving. Accordingly, in an embodiment, a training data set usable in actual driving scenarios may be generated through simulation and used to train an object recognition model.

Furthermore, in an embodiment, corner cases that rarely occur in actual driving scenarios can be secured so that an object recognition model with better performance can be trained. Thus, in an embodiment, generating training data sets synthesized under various weather and lighting conditions can greatly improve the accuracy of real-world object recognition.

To this end, the apparatus 100 for generating training data for object recognition may broadly consist of a part that performs simulation, a part that collects simulation-related information, and a data generation part that extracts 2D bounding boxes from the collected information. These components are described in detail later.

Meanwhile, in an embodiment, part or all of the apparatus 100 for generating training data for object recognition may be implemented as the server 200. When part of it is implemented as the server 200, the apparatus 100 may receive the results of the simulation performed by the server 200, together with simulation-related information, and perform data generation.

In addition, in an embodiment, the apparatus 100 for generating training data for object recognition may be configured to include a simulator that performs simulation and a module that performs data extraction after the simulator and data generation based on the extracted data. In another embodiment, it may be implemented as a post-processing module that generates data based on data extracted from the simulator.

In an embodiment, the functions of the apparatus 100 for generating training data for object recognition may be implemented in the apparatus 100 itself and/or in the server 200. In this case, the server 200 may be a server for operating the system 1 for generating training data for object recognition, which includes the apparatus 100, or a server that implements part or all of the apparatus 100.

In an embodiment, the server 200 may be a server that controls the operation of the apparatus 100 for generating training data for object recognition throughout the overall process of extracting various information in the simulation stage, accurately converting the 3D bounding boxes of objects in an image into 2D bounding boxes, and generating a training data set for object recognition learning. The training data set may refer to data labeled as ground truth.
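To make the 3D-to-2D conversion concrete, the Python sketch below projects the eight corners of a 3D bounding box through a pinhole camera model and takes the enclosing rectangle. This is a simplified baseline under stated assumptions, not the conversion claimed in the disclosure: it assumes corners already expressed in camera coordinates (x right, y down, z forward), square pixels, and a horizontal FOV, and it does not reproduce the post-processing that FIG. 5 refers to. The function names are hypothetical.

```python
import math

def intrinsics(width, height, fov_deg):
    """Focal length in pixels and principal point for a pinhole camera (assumed model)."""
    f = width / (2.0 * math.tan(math.radians(fov_deg) / 2.0))
    return f, width / 2.0, height / 2.0

def box3d_to_box2d(corners_cam, width, height, fov_deg=90.0):
    """Project 3D box corners (camera coordinates) to an enclosing 2D box.

    Returns (x_min, y_min, x_max, y_max) in pixels, or None if the box is not
    usable. Simplification: a box with any corner behind the camera is dropped.
    """
    f, cx, cy = intrinsics(width, height, fov_deg)
    us, vs = [], []
    for x, y, z in corners_cam:
        if z <= 0.0:
            return None
        us.append(f * x / z + cx)
        vs.append(f * y / z + cy)
    # Clip the enclosing rectangle to the image bounds.
    x_min, x_max = max(0.0, min(us)), min(float(width), max(us))
    y_min, y_max = max(0.0, min(vs)), min(float(height), max(vs))
    if x_min >= x_max or y_min >= y_max:
        return None  # entirely outside the image
    return (x_min, y_min, x_max, y_max)

# A 2 m cube centred on the optical axis, 10 to 12 m in front of the camera.
cube = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (10.0, 12.0)]
box = box3d_to_box2d(cube, width=800, height=600)
```

Because the rectangle encloses the projected 3D box rather than the object's visible pixels, it is generally looser than a hand-drawn label, which is one reason a tighter fit would need further processing.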

The server 200 may also be a database server that provides the data for operating the apparatus 100 for generating training data for object recognition. In addition, the server 200 may include a web server, an application server, or a server that provides a deep learning network.

The server 200 may further include a big data server and an AI server needed to apply various artificial intelligence algorithms, a computation server that performs the computations of various algorithms, and the like.

In this embodiment, the server 200 may include the servers described above or may be networked with them. That is, in this embodiment, the server 200 may include the web server and the AI server above or may be networked with them.

In the system 1 for generating training data for object recognition, the apparatus 100 for generating training data for object recognition and the server 200 may be connected by the network 300. The network 300 may encompass, for example, wired networks such as local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), and integrated service digital networks (ISDNs), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited thereto. The network 300 may also transmit and receive information using short-range communication and/or long-range communication.

In addition, the network 300 may include connections between network elements such as hubs, bridges, routers, switches, and gateways. The network 300 may include one or more connected networks, for example a multi-network environment, including a public network such as the Internet and a private network such as a secure corporate private network. Access to the network 300 may be provided via one or more wired or wireless access networks. Furthermore, the network 300 may support an Internet of Things (IoT) network, in which information is exchanged and processed between distributed components such as things, and/or 5G communication.

FIG. 3 is a block diagram schematically illustrating an apparatus for generating training data for object recognition according to an embodiment of the present disclosure, and FIG. 4 is a diagram for explaining a process of generating training data for object recognition according to an embodiment of the present disclosure.

Referring to FIG. 3, the apparatus 100 for generating training data for object recognition may include a communication unit 110, a user interface 120, a memory 130, and a processor 140.

The communication unit 110 may, in conjunction with the network 300, provide the communication interface needed to deliver signals transmitted and received between external devices in the form of packet data. The communication unit 110 may also be a device that includes the hardware and software needed to transmit and receive signals, such as control signals or data signals, to and from other network devices through wired or wireless connections.

That is, the processor 140 may receive various data or information from an external device connected through the communication unit 110, and may also transmit various data or information to the external device.

In an embodiment, the user interface 120 may include an input interface through which user requests and commands for controlling the operation of the apparatus 100 for generating training data for object recognition (e.g., changing simulation conditions or changing parameters of the data generation algorithm) are entered. That is, the user interface 120 may refer to the client API of the data extraction unit (142 in FIG. 4) described later.

In an embodiment, the user interface 120 may also include an output interface that outputs the result of generating training data for object recognition. That is, the user interface 120 may output results in response to user requests and commands. The input interface and the output interface of the user interface 120 may be implemented in the same interface.

The memory 130 stores various information needed to control (compute) the operation of the apparatus 100 for generating training data for object recognition, can store control software, and may include a volatile or non-volatile recording medium.

The memory 130 is connected to one or more processors 140 through an electrical or internal communication interface, and may store code that, when executed by the processor 140, causes the processor 140 to control the apparatus 100 for generating training data for object recognition.

Here, the memory 130 may include a non-transitory storage medium such as magnetic storage media or flash storage media, or a transitory storage medium such as RAM, but the scope of the present invention is not limited thereto. The memory 130 may include built-in memory and/or external memory, and may include volatile memory such as DRAM, SRAM, or SDRAM; non-volatile memory such as one time programmable ROM (OTPROM), PROM, EPROM, EEPROM, mask ROM, flash ROM, NAND flash memory, or NOR flash memory; a flash drive such as an SSD, a compact flash (CF) card, an SD card, a Micro-SD card, a Mini-SD card, an xD card, or a memory stick; or a storage device such as an HDD. Information related to the algorithm for performing learning according to the present disclosure may be stored in the memory 130. In addition, various information needed within the scope of achieving the object of the present disclosure may be stored in the memory 130, and the information stored in the memory 130 may be updated as it is received from a server or an external device or entered by a user.

The processor 140 may control the overall operation of the apparatus 100 for generating training data for object recognition. Specifically, the processor 140 is connected to the components of the apparatus 100, including the memory 130, and may control the overall operation of the apparatus 100 by executing at least one instruction stored in the memory 130.

The processor 140 may be implemented in various ways. For example, the processor 140 may be implemented as at least one of an application specific integrated circuit (ASIC), an embedded processor, a microprocessor, hardware control logic, a hardware finite state machine (FSM), and a digital signal processor (DSP).

The processor 140, as a kind of central processing unit, may control the operation of the apparatus 100 for generating training data for object recognition by running the control software loaded in the memory 130. The processor 140 may include any type of device capable of processing data. Here, a "processor" may refer to a data processing device embedded in hardware that has physically structured circuitry to perform functions expressed as code or instructions included in a program.

The process by which the processor 140 generates training data for object recognition is described with reference to FIG. 4.

As shown in FIG. 4, the apparatus 100 for generating training data for object recognition may include a simulation unit 141, a data extraction unit 142, and a data generation unit 143.

The simulation unit 141 may control a driving simulation environment based on simulation environment data, and may control a vehicle simulator so that the own vehicle drives in the driving simulation environment. That is, the simulation unit 141 may run the driving simulation and provide simulation information data related to the driving simulation to the data extraction unit 142. In one embodiment, the simulation unit 141 may refer to a vehicle simulator, or to a setting unit that configures the driving environment in the vehicle simulator.

The simulation unit 141 simulates a virtual world (a so-called town) and may control the driving simulation environment according to a simulation environment data change command from the data extraction unit 142. For example, the data extraction unit 142 may change the simulation environment data from a client, using an API for communication.

In one embodiment, simulation in the simulation unit 141 involves three important concepts: the world, actors, and blueprints.

The simulation unit 141 may generate a virtual world using a predefined map. In the virtual world generated in this way, an actor can be any object that plays a role in the simulation, and actors generally move around the town of the virtual world. Actors may include, for example, vehicles, pedestrians, and sensors.

An actor may be placed on the basis of a blueprint, which is a 3D model with animation effects. The simulation unit 141 of one embodiment may provide a set of blueprints that is available by default. The blueprint set may include, for example, various vehicle models, pedestrians of varied appearance (e.g., men, women, children), and various sensors (e.g., camera, lidar, radar).

That is, the simulation unit 141 may create a virtual world containing the required actors and then run the simulation. According to a simulation environment data change command from the data extraction unit 142, it may control the weather and lighting conditions and move the own vehicle, surrounding vehicles, and pedestrians.

Meanwhile, in one embodiment, the processor 140 may control the driving environment of the simulation unit 141 in order to obtain varied training data. That is, the processor 140 may control the vehicle simulator by dynamically changing the driving speed of the own vehicle, or by changing the weather or the time of day in the driving simulation environment.

The processor 140 may also vary data collection according to the driving speed of the own vehicle: when the own vehicle is driving fast, data may be collected more frequently (that is, the collection period may be shortened). In other words, the processor 140 may change the period at which simulation information data is collected based on the driving speed of the own vehicle.
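One way to read the speed-dependent collection period is to sample roughly once per fixed distance travelled. The sketch below is only an illustration under that assumption; the function name `collection_period_s`, the spacing `metres_per_sample`, and the clamping bounds are hypothetical and are not taken from the disclosure.

```python
def collection_period_s(speed_mps: float,
                        metres_per_sample: float = 5.0,
                        min_period_s: float = 0.1,
                        max_period_s: float = 2.0) -> float:
    """Return the time between two samples, in seconds.

    Assumption (not from the disclosure): one sample is wanted about every
    `metres_per_sample` metres, so a faster own vehicle gets a shorter period.
    """
    if speed_mps <= 0.0:
        # A stationary vehicle sees nothing new; fall back to the slowest rate.
        return max_period_s
    period = metres_per_sample / speed_mps
    # Clamp so the period stays within the assumed limits.
    return max(min_period_s, min(max_period_s, period))
```

With these assumed defaults, an own vehicle travelling at 10 m/s would be sampled every 0.5 s, while a stationary one would be sampled every 2 s.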

The data extraction unit 142 may automatically collect simulation information data from the simulation unit 141. In doing so, the data extraction unit 142 may input simulation environment data for data extraction through a client, and collect the simulation information data that results from the input simulation environment data. For example, the simulation environment data for data extraction may include own-vehicle control data, weather and lighting control data, and surrounding-object control data.

In particular, in one embodiment, when simulation information data for a corner case needs to be obtained, the data extraction unit 142 may send the simulation unit 141 a simulation environment data change command so that the driving simulation conditions are changed to that corner case. The simulation unit 141 may then run the simulation according to the simulation environment data changed by the data extraction unit 142, and the data extraction unit 142 may collect the simulation information data produced by that simulation. In addition, the simulation unit 141 may analyze the training data collected and accumulated over a certain period and, when training data for a specific situation is insufficient, control the simulation environment to match that situation. For example, when training data for corner cases is insufficient, the driving route, driving duration, or number of driving runs may be set so that the own vehicle repeatedly drives a route with many corner cases.

While the simulation is running, the data extraction unit 142 may extract simulation information data at a preset period, or in response to events that occur based on user settings. However, the disclosure is not limited thereto, and the data extraction unit 142 may extract simulation information data in real time as time progresses.

The data extraction unit 142 may collect simulation information data based on the field of view (FOV) of the own vehicle while it drives in the driving simulation environment. That is, the data extraction unit 142 may obtain a front image captured by a front camera virtually installed on the own vehicle. The data extraction unit 142 may also obtain a 3D bounding box for at least one object present in the front image, and a semantic segmentation image of the front image.

The data extraction unit 142 may further obtain information on a world coordinate system whose origin is the center of the virtual world, a vehicle coordinate system whose origin is the center of the own vehicle, and an image coordinate system whose origin is the upper-left corner of the front image. That is, in one embodiment, simulation and the extraction of simulation-related data may be performed based on three different coordinate systems: a world coordinate system (3D) with its origin at the center of the town, a vehicle coordinate system (3D) with its origin at the center of the own vehicle, and an image coordinate system (2D) with its origin at the upper-left corner of the image.

In other words, the data extraction unit 142 may extract data while controlling the simulated virtual world in terms of weather and lighting conditions and surrounding vehicles and pedestrians. The data extraction unit 142 may extract the simulation information data needed to generate 2D bounding boxes, such as 3D bounding boxes and classes, camera images, and semantic segmentation images.

A 3D bounding box represents the position and pose of an object in 3D coordinates, and one is available for every object. For example, assuming that n actors are active in the town, the data extraction unit 142 may extract n 3D bounding boxes. A 3D bounding box can be represented by its eight vertices, as the 3D bounding box set [formula image not available], and each point C can be expressed as (x, y, z) in 3D coordinates. The data extraction unit 142 may also extract the class (vehicle or pedestrian) of each actor, which can be denoted as [formula image not available].

The camera image may refer to the 2D RGB data of the virtual front camera installed on the own vehicle. In one embodiment, for example, a front camera that acquires RGB images at a resolution of 960 X 540 may be installed on the own vehicle. The camera image obtained from the front camera can therefore be represented as a 960 X 540 matrix I, where each element I[x][y] represents the color code of the pixel at position (x, y) in 2D image coordinates.

The semantic segmentation image may refer to an image in which, for each camera image, every pixel is classified with a color code that differs by class (for example, blue for vehicles and red for pedestrians). For example, for each RGB image, the corresponding semantic segmentation image may be extracted as a 960 X 540 matrix S, where each element S[x][y] represents the object class of the pixel according to predefined color codes.

While the own vehicle is being driven, the data generation unit 143 may process the raw data from the data extraction unit 142, such as the 3D bounding boxes, camera images, and semantic segmentation images, to find an accurate 2D bounding box for every object and thereby generate training data for object recognition. In other words, the data generation unit 143 may generate, based on the simulation information data, a 2D bounding box of at least one object present within the field of view of the own vehicle while it drives.

Meanwhile, in one embodiment, the generation of training data for object recognition may run as a separate process that operates offline, at a different time from data extraction. Accordingly, the process performed by the data generation unit 143 may be referred to as post-processing. However, the disclosure is not limited thereto, and the training data for object recognition may be generated at the same time as the simulation information data is collected in real time.

FIG. 5 is an exemplary diagram illustrating the post-processing for generating a final bounding box according to an embodiment of the present disclosure.

Referring to FIG. 5, the data generation unit 143 may project the vertex coordinates of a 3D bounding box onto the image coordinate system, and convert the 3D bounding box projected onto the 2D image coordinate system into a 2D bounding box. The data generation unit 143 may then filter out the 2D bounding boxes of occluded objects based on the semantic segmentation image, and generate the final bounding box by reshaping each remaining 2D bounding box, again based on the semantic segmentation image, to fit the shape of its object.

In one embodiment, the final bounding boxes may be saved in a suitable file format to produce the training data for object recognition. For example, a plain-text file format may be used in which each 2D bounding box is given by an object class number, a center point (x, y), a height (h), and a width (w). A JSON (JavaScript Object Notation) based file format may also be used. However, the file format is not limited in this embodiment, and because conversion between formats is simple, the data can be applied to various object recognition algorithms.
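The two file formats mentioned above can be sketched as follows. This is an illustration only: the field order in the text line, the use of pixel units, the JSON key names, and the corner-style input box `(x_min, y_min, x_max, y_max)` are assumptions, since the disclosure does not fix them.

```python
import json


def center_size(box):
    """Convert a corner-style box into (center x, center y, width, height)."""
    x_min, y_min, x_max, y_max = box
    w = x_max - x_min
    h = y_max - y_min
    return x_min + w / 2, y_min + h / 2, w, h


def to_text_line(class_id, box):
    """One plain-text line per box: class number, center point, width, height."""
    cx, cy, w, h = center_size(box)
    return f"{class_id} {cx:g} {cy:g} {w:g} {h:g}"


def to_json(class_id, box):
    """The same information as a JSON object (key names are hypothetical)."""
    cx, cy, w, h = center_size(box)
    return json.dumps({"class": class_id, "x": cx, "y": cy, "w": w, "h": h})
```

Because both functions start from the same `center_size` helper, converting between the two formats amounts to re-serializing the same five values.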

Meanwhile, in one embodiment, when there are several objects in a camera image, the objects may be processed one at a time. The following description therefore takes the extraction of the bounding box of a single object as its example.

As described above, three different coordinate systems may coexist in the simulation environment of one embodiment. The simulation unit 141 may provide the 3D bounding boxes of all objects in the town in vehicle coordinates, while the positions of the own vehicle and the camera may be provided in the world coordinate system.

Accordingly, the data generation unit 143 must convert the eight points of a 3D bounding box, given in vehicle coordinates, into projected points in the 2D image coordinates of the front camera.

To this end, the data generation unit 143 may apply a transformation matrix to each vertex coordinate of the 3D bounding box, based on the center point of the own vehicle in the world coordinate system. In one embodiment, because the center point of the own vehicle in the world coordinate system is known, each point (x, y, z) in the vehicle coordinate system can be converted to the world coordinate system using a transformation matrix.
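The vehicle-to-world step can be illustrated with a homogeneous transformation matrix. The sketch below is a minimal example under stated assumptions: it considers only the heading (yaw) of the own vehicle about the vertical axis plus its center point, and the function name `vehicle_to_world` is hypothetical. The disclosure itself only says that a transformation matrix based on the center point of the own vehicle is applied.

```python
import math


def vehicle_to_world(point, vehicle_center, yaw_deg=0.0):
    """Map a point from vehicle coordinates to world coordinates.

    Assumption: the vehicle pose is its world-frame center plus a yaw angle;
    roll and pitch are ignored in this sketch.
    """
    x, y, z = point
    cx, cy, cz = vehicle_center
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    # 4x4 homogeneous matrix: rotation about the z-axis, then translation.
    m = [
        [c, -s, 0.0, cx],
        [s, c, 0.0, cy],
        [0.0, 0.0, 1.0, cz],
        [0.0, 0.0, 0.0, 1.0],
    ]
    v = [x, y, z, 1.0]
    out = [sum(m[r][k] * v[k] for k in range(4)) for r in range(4)]
    return (out[0], out[1], out[2])
```

Applying this function to each of the eight vertices of a 3D bounding box would give the box in world coordinates.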

Next, based on the position of the front camera in the world coordinate system, the data generation unit 143 may generate temporary 3D camera coordinates whose z-axis is parallel to the FOV (viewing) direction of the front camera. That is, in preparation for the perspective transformation to the viewpoint of the front camera, the data generation unit 143 also incorporates the position of the front camera and generates temporary 3D camera coordinates with the z-axis parallel to the viewing direction of the front camera.
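A temporary camera frame of this kind can be sketched as follows. The axis conventions are assumptions made for illustration and are not stated in the disclosure: the world frame is taken to have x forward, y to the right, and z up, the camera is assumed to be rotated only by a yaw angle, and the camera frame has x to the right, y down, and z along the viewing direction. The function name `world_to_camera` is hypothetical.

```python
import math


def world_to_camera(point, camera_position, camera_yaw_deg=0.0):
    """Express a world point in a camera frame whose z-axis is the view direction.

    Assumed conventions: world x forward, y right, z up; camera x right,
    y down, z forward; camera orientation given by yaw only.
    """
    dx = point[0] - camera_position[0]
    dy = point[1] - camera_position[1]
    dz = point[2] - camera_position[2]
    c = math.cos(math.radians(camera_yaw_deg))
    s = math.sin(math.radians(camera_yaw_deg))
    forward = dx * c + dy * s    # component along the viewing direction
    right = -dx * s + dy * c     # component to the right of the camera
    up = dz                      # component above the camera
    return (right, -up, forward)
```

The third component of the result is the depth along the viewing direction, which is the quantity the subsequent perspective transformation divides by.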

The data generation unit 143 may then perform a perspective transformation on each vertex in the temporary 3D camera coordinates, based on the FOV and image resolution of the front camera.
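The perspective transformation can be illustrated with a standard pinhole camera model, in which the focal length in pixels follows from the horizontal FOV and the image width. This is a sketch under assumptions: the pinhole model, the camera axes (x right, y down, z forward), and the name `project_to_image` are not taken from the disclosure; the defaults of 90 degrees and 960 X 540 match the camera described in the experiments.

```python
import math


def project_to_image(point_cam, fov_deg=90.0, width=960, height=540):
    """Project a camera-frame point (x right, y down, z forward) to pixels.

    Returns None for points at or behind the camera plane, which cannot
    appear in the image.
    """
    x, y, z = point_cam
    if z <= 0:
        return None
    # Focal length in pixels from the horizontal field of view.
    f = width / (2.0 * math.tan(math.radians(fov_deg) / 2.0))
    u = f * x / z + width / 2.0
    v = f * y / z + height / 2.0
    return (u, v)
```

A point straight ahead of the camera lands on the image center, and with a 90-degree FOV a point at 45 degrees to the right lands on the right image edge.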

As shown in FIG. 5(a), the data generation unit 143 may determine the smallest 2D bounding box that contains every vertex of the 3D bounding box.

The eight vertices of each cuboid in the leftmost image of FIG. 5(a) represent a 3D bounding box projected onto 2D image coordinates. The rectangles in the rightmost image of FIG. 5(a) represent the minimum-area bounding box for each cuboid.

As one embodiment, an algorithm for implementing the conversion of a 3D bounding box projected onto the 2D image coordinate system into a 2D bounding box may be outlined as in Table 1 below.

Table 1 shows the process by which a projected 3D bounding box, denoted [formula image not available], is converted into a 2D bounding box, denoted [formula image not available].
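The conversion described for Table 1 amounts to taking the extreme coordinates of the eight projected vertices. The following is a minimal sketch of that idea, not a reproduction of Table 1 itself; the function name `enclosing_box` and the corner-style output `(x_min, y_min, x_max, y_max)` are assumptions.

```python
def enclosing_box(vertices_2d):
    """Smallest axis-aligned rectangle that contains all projected vertices.

    `vertices_2d` is an iterable of (x, y) image coordinates, typically the
    eight projected corners of one 3D bounding box.
    """
    points = list(vertices_2d)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

This sketch does not clip the result to the image borders; whether and how clipping is done is left open here.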

Such 2D bounding boxes may have the problem that some of them belong to objects that are not visible because of occlusion (being hidden by another object). For example, the object for the 3D bounding box in the upper-left corner of FIG. 5(a) is behind a building and should be removed.

The generated 2D bounding boxes may also have the problem that the gap between the bounding box and the object boundary is wide. If a bounding box does not match the boundary of its object as closely as possible, a model trained on that training data may exhibit serious object recognition errors.

Accordingly, the data generation unit 143 may determine whether the object corresponding to a 2D bounding box is occluded, based on the 2D bounding box, the class information of the corresponding object, and the semantic segmentation image. The class information of the object may be extracted by the data extraction unit 142.

The data generation unit 143 may obtain the coordinates of the center point of the 2D bounding box in the image coordinate system. If the class information at the center point coordinates in the semantic segmentation image is determined to be the same as the class of the object corresponding to the 2D bounding box, the data generation unit 143 may determine that the object is not occluded.

In a semantic segmentation image, each pixel is classified with a color code that indicates the class of an object. For example, as shown in FIG. 5(b), vehicles and other objects may be displayed in different colors.

The bounding box shown in the upper-left corner of the leftmost image of FIG. 5(b) belongs to an object that is not actually visible, because a building hides it. In one embodiment, that object may therefore be removed, as in the second image of FIG. 5(b).

As one embodiment, an algorithm for implementing the filtering of the 2D bounding boxes of occluded objects based on the semantic segmentation image may be outlined as in Table 2 below.

Table 2 shows a process for confirming that a valid, visible object exists within a given 2D bounding box. According to Table 2, the data generation unit 143 may use the 2D bounding box, the corresponding object class, and the semantic segmentation image. The data generation unit 143 checks the semantic segmentation image and, if the center pixel of the bounding box is identified as belonging to the given object class, may determine that the object is visible.
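The center-pixel check described for Table 2 can be sketched as below. This is an illustration rather than the table's actual listing: the function name `is_visible`, the corner-style box, and the representation of the semantic segmentation image as a nested list `seg[x][y]` of class labels (following the S[x][y] indexing used earlier) are assumptions.

```python
def is_visible(box, object_class, seg):
    """Return True if the center pixel of `box` carries the object's class.

    `box` is (x_min, y_min, x_max, y_max) in integer pixels; `seg[x][y]` is
    the class label of pixel (x, y) in the semantic segmentation image.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) // 2
    cy = (y_min + y_max) // 2
    # A center outside the image cannot be confirmed as visible.
    if not (0 <= cx < len(seg) and 0 <= cy < len(seg[0])):
        return False
    return seg[cx][cy] == object_class
```

Boxes for which this check fails would be dropped before the final bounding boxes are generated.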

Meanwhile, in one embodiment, pixels other than the center pixel may also be sampled, so that partially occluded objects are also included in the training data.

Referring to FIG. 5(c), the data generation unit 143 may generate the final bounding box by moving each edge of the 2D bounding box toward the center point, on the semantic segmentation image, until that edge reaches a pixel within the class segmentation region of the object of the 2D bounding box.

The leftmost image of FIG. 5(c) shows that there is a considerable gap between the bounding box and the boundary of the actual vehicle. To generate the bounding box more accurately, the data generation unit 143 must therefore fit the bounding box to the exact region occupied by the object. The data generation unit 143 may use the semantic segmentation image for this purpose.

The middle image of FIG. 5(c) shows the process in which each edge of the 2D bounding box moves toward the center point so that the gap between the 2D bounding box and the object is minimized. That is, once each edge of the 2D bounding box reaches a pixel within the class segmentation region of its object, the data generation unit 143 may fix the bounding box to the object, as in the rightmost image of FIG. 5(c).

As one embodiment, an algorithm for implementing the generation of the final bounding box, in which the 2D bounding box is reshaped based on the semantic segmentation image to fit the shape of its object, may be outlined as in Table 3 below.

Table 3 shows the final bounding box generation process, in which the original bounding box [formula image not available] is given as input and the final bounding box [formula image not available] is output.
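The edge-shrinking procedure described for Table 3 can be sketched as follows. It is an illustrative reading, not the table's actual listing: each edge is moved inward one pixel at a time until the row or column it sits on contains a pixel of the object's class. The name `tighten_box`, the corner-style integer box, and the `seg[x][y]` class-label layout are assumptions, and the box is assumed to lie inside the image.

```python
def tighten_box(box, object_class, seg):
    """Shrink `box` until every edge touches a pixel of `object_class`.

    `box` is (x_min, y_min, x_max, y_max) in integer pixels inside the image;
    `seg[x][y]` is the class label of pixel (x, y).
    """
    x_min, y_min, x_max, y_max = box

    def column_has_object(x):
        return any(seg[x][y] == object_class for y in range(y_min, y_max + 1))

    def row_has_object(y):
        return any(seg[x][y] == object_class for x in range(x_min, x_max + 1))

    # Move the left and right edges toward the center.
    while x_min < x_max and not column_has_object(x_min):
        x_min += 1
    while x_max > x_min and not column_has_object(x_max):
        x_max -= 1
    # Move the top and bottom edges toward the center.
    while y_min < y_max and not row_has_object(y_min):
        y_min += 1
    while y_max > y_min and not row_has_object(y_max):
        y_max -= 1
    return (x_min, y_min, x_max, y_max)
```

If the box contains no pixel of the class at all, this sketch collapses it to a degenerate box, so it is meant to run only on boxes that have already passed the occlusion check.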

FIG. 6 is a heat map of objects generated in 2D image coordinates according to an embodiment of the present disclosure, FIG. 7 is a scalability evaluation graph of the apparatus for generating training data for object recognition according to an embodiment of the present disclosure, FIG. 8 is an accuracy evaluation diagram of the apparatus according to an embodiment of the present disclosure, and FIG. 9 is a training performance evaluation graph of the apparatus according to an embodiment of the present disclosure.

Experiments were conducted to verify the performance of the apparatus 100 for generating training data for object recognition of one embodiment, and are described with reference to FIGS. 6 to 9.

The apparatus 100 for generating training data for object recognition of one embodiment may be implemented on top of a Carla simulation server. Hereinafter, the apparatus 100 may be referred to as CarFree.

The apparatus 100 of one embodiment may be implemented as scripts. In one embodiment, the data extraction part may be implemented as three scripts: a first script for placing the required vehicles and pedestrians in the town, a second script for controlling weather and lighting conditions, and a third script for controlling the own vehicle and extracting data from the vehicle simulator.

The apparatus 100 may support two operating modes: periodic data extraction, which continuously retrieves data at the period specified by a command-line option, and event-based data extraction, which is triggered by a key press from the user. Event-based data extraction can be useful when a user happens upon a corner-case scenario while monitoring the simulation.

In one embodiment, the extracted data may be written to image files (camera images and semantic segmentation images) and text files (3D bounding boxes) in a designated directory.

In one embodiment, the apparatus 100 for generating training data for object recognition may be verified in terms of scalability, accuracy, and training performance. For the verification, 5000 training images from four different towns were combined with ground truth data (training data) collected in a real driving environment. During each simulation, 50 random vehicles and 30 random pedestrians were placed from the blueprints of the vehicle simulator. The vehicles and pedestrians were made to move autonomously around the town by an autopilot module.

The own vehicle was deployed with a front camera that has a 90-degree FOV and acquires images at a resolution of 960 X 540. The altitude and azimuth of the sun change rapidly, simulating various lighting conditions. Weather conditions are controlled by changing weather parameters (e.g., cloudiness, precipitation, and wind).

To examine where objects are located in the synthesized images, FIG. 6 shows heat maps over 2D image coordinates. A heat map may be generated by computing the per-pixel intensity of objects. To this end, the bounding boxes covering each pixel may be extracted for vehicles and pedestrians from all of the synthesized images.
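One simple way to compute such a per-pixel intensity is to count, for every pixel, how many bounding boxes cover it across all images. The sketch below assumes that reading; the function name `box_heatmap`, the corner-style integer boxes, and the `heat[x][y]` layout are hypothetical.

```python
def box_heatmap(boxes, width, height):
    """Count how many bounding boxes cover each pixel.

    `boxes` is an iterable of (x_min, y_min, x_max, y_max) in integer pixels,
    gathered over all images for one class. Returns `heat[x][y]` counts.
    """
    heat = [[0] * height for _ in range(width)]
    for x_min, y_min, x_max, y_max in boxes:
        # Clip each box to the image so out-of-range parts are ignored.
        for x in range(max(0, x_min), min(width - 1, x_max) + 1):
            for y in range(max(0, y_min), min(height - 1, y_max) + 1):
                heat[x][y] += 1
    return heat
```

Running it once on the vehicle boxes and once on the pedestrian boxes would yield one map per class, which is how FIG. 6(a) and FIG. 6(b) are distinguished in the description.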

FIG. 6(a) shows that vehicles appear more strongly along the horizon, around the vanishing point. FIG. 6(b) shows that pedestrians are more likely to appear on the left and right sides of the image, because they mainly walk along the sidewalks. However, some pedestrians cross the road at arbitrary points, which appear as bright spots in the central region. The number of objects may vary randomly from image to image. For example, 27% of the images may contain just one object, 36% two objects, 20% three objects, and 18% four or more objects.

Referring to FIG. 7, the average throughput (that is, images per second) of the apparatus 100 for generating training data for object recognition can be compared with that of an inexperienced worker and an experienced worker generating bounding boxes. The y-axis is on a logarithmic scale. FIG. 7 shows that the number of objects in each image strongly affects the labeling performance of human workers, whereas its effect on the process of the apparatus 100 is not significant. In addition, the apparatus 100 processes an average of 5 images per second for images with six objects, which is about 75 times faster than the experienced worker.

As shown in FIG. 8, the bounding boxes generated by the apparatus 100 for generating training data for object recognition have pixel-level accuracy, whereas manual labeling cannot produce such accurate bounding boxes. In addition, very small, distant objects are easily overlooked by the human eye, but the apparatus 100 for generating training data for object recognition does not miss them.

Referring to FIG. 9, the performance of the data set generated by the apparatus 100 for generating training data for object recognition was evaluated on a test set of 200 images randomly selected from a different data set. A YOLO v3 DNN based on the Darknet framework was used as the object recognition model for verifying the performance of the generated data set. In one embodiment, real driving images from the KITTI data set were used for training together with the synthesized images. Models trained with the following image configurations were each evaluated: KITTI only, CarFree only, KITTI+CarFree (random), and KITTI+CarFree (4 or more objects).

Here, KITTI only uses only images from the KITTI database, and CarFree only uses only images generated by the apparatus 100 for generating training data for object recognition. KITTI+CarFree (random) uses KITTI database images as the baseline, to which images generated by the apparatus 100 for generating training data for object recognition are added randomly and incrementally. KITTI+CarFree (4 or more objects) selects random images from the combined data set of KITTI images and images generated by the apparatus 100 for generating training data for object recognition, but considers only images containing at least four objects.
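The four configurations can be sketched as a single selection function. This is a minimal illustration, not the procedure used in the experiment: the record schema (`"source"`, `"boxes"`), the configuration names, and the choice to apply the four-object filter to the generated images being added are all assumptions of this sketch.

```python
import random


def build_training_set(kitti, carfree, config, n_extra, base_size=500, seed=0):
    """Return a list of sample records for one of the four configurations.

    kitti, carfree: lists of dicts with keys "source" and "boxes"
    (hypothetical schema). n_extra: number of images added to the base.
    """
    rng = random.Random(seed)
    kitti = list(kitti)
    rng.shuffle(kitti)
    # Random KITTI images form the shared starting point.
    base, kitti_rest = kitti[:base_size], kitti[base_size:]

    if config == "kitti_only":
        pool = kitti_rest
    elif config == "carfree_only":
        base, pool = [], list(carfree)
    elif config == "kitti_carfree_random":
        pool = list(carfree)
    elif config == "kitti_carfree_4plus":
        # Assumed reading: only generated images with >= 4 objects are added.
        pool = [s for s in carfree if len(s["boxes"]) >= 4]
    else:
        raise ValueError(f"unknown configuration: {config}")

    extra = rng.sample(pool, min(n_extra, len(pool)))
    return base + extra
```

Calling the function repeatedly with a growing `n_extra` would reproduce the idea of adding images step by step and retraining at each step.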

FIG. 9(a) shows how the mean average precision (mAP) of the trained model improves as more training images are added. For comparison, the DNN was first trained using 500 random KITTI images. Images from the KITTI data set or synthesized images were then added to the training set incrementally according to the rule of each configuration. In FIG. 9(a), the x-axis represents the total number of training images, including the base KITTI images and the added images.

As shown in FIG. 9(a), performance generally improved as more images were added. KITTI+CarFree (random) outperforms KITTI only, possibly because the synthesized images are more diverse than the KITTI data set. This trend is even clearer for KITTI+CarFree (4 or more objects), which includes more complex driving scenes.

FIG. 9(b) compares performance on a different evaluation data set consisting of images specifically selected for unusual weather and lighting conditions, such as rainy days, night driving, and sunset driving. In such harsh driving environments, KITTI performs much worse than in the normal driving environment shown in FIG. 9(a). In contrast, adding synthesized images significantly improved performance.

Also, referring to FIG. 9(b), CarFree alone outperforms KITTI as well, because the real-world KITTI data set consists mostly of driving images taken on sunny days, with little diversity. That is, because the data (CarFree) generated by the apparatus 100 for generating training data for object recognition according to an embodiment provides synthetic images with artificial diversity to be used together with real images, DNN performance can be greatly improved.

As evaluated through the above experiments, the apparatus 100 for generating training data for object recognition can generate a large amount of training data (images and bounding boxes) with pixel-level accuracy and without human labor. In addition, because weather and lighting conditions can be freely controlled in the virtual world and complex driving scenes can be composed artificially, a wide variety of synthetic images can be generated. Therefore, the training data generated by the apparatus 100 for generating training data for object recognition may be more effective for machine learning (e.g., DNN) training than the ordinary, monotonous real driving images found in most public autonomous driving data sets.

Meanwhile, FIG. 10 is an exemplary diagram schematically illustrating a vehicle to which an apparatus for generating training data for object recognition according to an embodiment of the present disclosure is applied.

The vehicle 10 may include a camera that captures images of the surroundings, a processor, and a memory that is electrically connected to the processor and stores at least one code executed by the processor.

In one embodiment, the vehicle 10 may receive an image captured by the camera, input the image to a machine-learning-based learning model for object recognition to output information on the recognized objects, and be controlled based on the information on the recognized objects. Here, the learning model may be a learning model trained using, together as training data, training data obtained from a vehicle simulator and images actually captured by a camera of a driving vehicle.

According to the evaluation results of the above experiments, using synthesized images greatly improves the object recognition performance of a state-of-the-art object recognition model. The improvement is even more pronounced when the model is evaluated on test images showing unusual weather conditions, which can be a significant benefit for the safety of autonomous driving systems operating in harsh environments.

That is, when the vehicle 10 is controlled based on the training data generated by the apparatus 100 for generating training data for object recognition according to an embodiment, the vehicle may be driven on the basis of more accurate object recognition.

FIG. 11 is a flowchart illustrating a method for generating training data for object recognition according to an embodiment of the present disclosure.

Referring to FIG. 11, in step S100, the processor 140 controls a driving simulation environment based on simulation environment data and controls the vehicle simulator to drive the own vehicle in the driving simulation environment.

The simulation unit 141 simulates a virtual world (a so-called town) and may control the driving simulation environment according to a simulation environment data change command from the data extraction unit 142. That is, according to the simulation environment data change command from the data extraction unit 142, the simulation unit 141 may control the weather and lighting conditions and move the own vehicle, surrounding vehicles, and pedestrians.
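One way to picture the change command is as a small record in which only the fields to be changed are set. Everything in this sketch is hypothetical: the class names, the field names, and the default values are illustrative and do not correspond to the interface of any particular simulator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnvironmentChange:
    """Hypothetical change command; None means 'leave unchanged'."""
    weather: Optional[str] = None
    sun_altitude_deg: Optional[float] = None
    ego_speed_kmh: Optional[float] = None


@dataclass
class SimulationState:
    """Hypothetical subset of the driving simulation environment."""
    weather: str = "clear"
    sun_altitude_deg: float = 60.0
    ego_speed_kmh: float = 30.0

    def apply(self, change: EnvironmentChange) -> "SimulationState":
        # Copy over only the fields that the command actually sets.
        for name in ("weather", "sun_altitude_deg", "ego_speed_kmh"):
            value = getattr(change, name)
            if value is not None:
                setattr(self, name, value)
        return self
```

For example, `SimulationState().apply(EnvironmentChange(weather="rain", sun_altitude_deg=-10.0))` would switch to a rainy night scene while keeping the driving speed unchanged.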

In step S200, the processor 140 collects simulation information data based on the field of view (FOV) of the own vehicle while it is driving.

That is, the data extraction unit 142 may obtain a front image captured by a front camera virtually installed on the own vehicle. The data extraction unit 142 may also obtain a 3D bounding box for at least one object present in the front image, as well as a semantic segmentation image of the front image.

The data extraction unit 142 may further obtain information on a world coordinate system whose origin is the center of the virtual world, a vehicle coordinate system whose origin is the center of the own vehicle, and an image coordinate system whose origin is the upper left corner of the front image.
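To show how these coordinate systems relate, the sketch below maps a world point to image coordinates with a simple pinhole model. It is a simplified illustration under stated assumptions: the camera is rotated only about the vertical axis (yaw), the world axes are x forward at zero yaw, y to the right, and z up, the intermediate vehicle coordinate system is skipped, and all function names are hypothetical.

```python
import math


def focal_length_px(fov_deg, image_width):
    """Focal length in pixels for a given horizontal FOV and image width."""
    return image_width / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


def world_to_camera(point, cam_pos, cam_yaw_deg):
    """World point -> camera coordinates (x right, y down, z forward).

    Assumed world convention: x forward at yaw 0, y right, z up.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    yaw = math.radians(cam_yaw_deg)
    forward = dx * math.cos(yaw) + dy * math.sin(yaw)
    right = -dx * math.sin(yaw) + dy * math.cos(yaw)
    return (right, -dz, forward)


def camera_to_image(p_cam, fov_deg, width, height):
    """Perspective projection; image origin is the upper left corner."""
    x, y, z = p_cam
    if z <= 0:
        return None  # the point is behind the camera
    f = focal_length_px(fov_deg, width)
    return (f * x / z + width / 2.0, f * y / z + height / 2.0)
```

A point straight ahead of the camera at camera height lands at the image center, and points higher than the camera land above the center because the image y-axis points down.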

In step S300, the processor 140 generates, based on the simulation information data, a 2D bounding box of at least one object present within the field of view of the own vehicle while it is driving.

While the own vehicle is being driven, the data generation unit 143 may process raw data from the data extraction unit 142, such as the 3D bounding boxes, the camera image, and the semantic segmentation image, to find an accurate 2D bounding box for every object and thereby generate training data for object recognition. In other words, the data generation unit 143 may generate, based on the simulation information data, a 2D bounding box of at least one object present within the field of view of the own vehicle while it is driving.
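One simple way to obtain a 2D box from a 3D box, once its eight vertices have been projected into the image, is to take the smallest axis-aligned rectangle containing all of them. The sketch below illustrates that idea; the function name, the rounding to whole pixels, the clipping to the image, and the decision to reject boxes with any vertex behind the camera are assumptions of this sketch.

```python
import math


def enclosing_box(projected, width, height):
    """Smallest axis-aligned 2D box containing all projected vertices.

    projected: list of (u, v) image points, or None for a vertex that
    lies behind the camera. Returns (x_min, y_min, x_max, y_max) in
    whole pixels, or None if no valid box exists inside the image.
    """
    if not projected or any(p is None for p in projected):
        return None  # conservative: skip boxes that straddle the camera plane
    x_min = max(0, math.floor(min(p[0] for p in projected)))
    y_min = max(0, math.floor(min(p[1] for p in projected)))
    x_max = min(width - 1, math.ceil(max(p[0] for p in projected)))
    y_max = min(height - 1, math.ceil(max(p[1] for p in projected)))
    if x_min > x_max or y_min > y_max:
        return None  # the box lies entirely outside the image
    return (x_min, y_min, x_max, y_max)
```

A box produced this way is usually looser than the object itself, which is why a later refinement against the semantic segmentation image is useful.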

The data generation unit 143 may project the vertex coordinates of the 3D bounding box onto the image coordinate system and convert the 3D bounding box projected onto the 2D image coordinate system into a 2D bounding box. The data generation unit 143 may then filter out the 2D bounding boxes of occluded objects based on the semantic segmentation image, and generate a final bounding box by adjusting the shape of each remaining 2D bounding box, based on the semantic segmentation image, to fit the shape of the corresponding object.
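The filtering and fitting steps can be sketched as follows, assuming the semantic segmentation image is available as a 2D grid of class labels. The center-point test and the inward movement of each box edge follow the general idea described in this document, but the function names, the label representation, and the integer center computation are assumptions of this sketch.

```python
def is_unoccluded(box, object_class, seg):
    """Treat the object as visible if the box center has its class label.

    box: (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
    seg: 2D list indexed as seg[y][x], holding one class label per pixel.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) // 2
    cy = (y_min + y_max) // 2
    return seg[cy][cx] == object_class


def tighten(box, object_class, seg):
    """Move each edge toward the center until it touches the object's class."""
    x_min, y_min, x_max, y_max = box

    def row_has(y):
        return any(seg[y][x] == object_class for x in range(x_min, x_max + 1))

    def col_has(x):
        return any(seg[y][x] == object_class for y in range(y_min, y_max + 1))

    # Rows are tightened first; columns then use the reduced row range.
    while y_min < y_max and not row_has(y_min):
        y_min += 1
    while y_max > y_min and not row_has(y_max):
        y_max -= 1
    while x_min < x_max and not col_has(x_min):
        x_min += 1
    while x_max > x_min and not col_has(x_max):
        x_max -= 1
    return (x_min, y_min, x_max, y_max)
```

Because a semantic segmentation image does not separate instances of the same class, a second object of the same class inside the box can stop an edge early; `tighten` is intended to be called only on boxes that pass `is_unoccluded`.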

In step S400, the processor 140 generates a training data image for object recognition of the object based on the bounding box.

That is, the processor 140 may generate accurate bounding boxes for the objects in images of the various simulation situations produced by the simulation unit 141, and thereby generate training data images for object recognition. In particular, training data images for a variety of situations may be generated for object recognition in autonomous driving.

The embodiments according to the present disclosure described above may be implemented in the form of a computer program that can be executed on a computer through various components, and such a computer program may be recorded on a computer-readable medium. The medium may include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Meanwhile, the computer program may be specially designed and configured for the present disclosure, or may be known and available to those skilled in the field of computer software. Examples of the computer program include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

In the specification of the present disclosure (particularly in the claims), the use of the term "the" and similar referring terms may cover both the singular and the plural. In addition, where a range is described in the present disclosure, it includes the invention to which each individual value within the range is applied (unless stated otherwise), and is equivalent to describing each individual value constituting the range in the detailed description of the invention.

Unless an order is explicitly stated for the steps constituting the method according to the present disclosure, or unless stated otherwise, the steps may be performed in any suitable order. The present disclosure is not necessarily limited to the order in which the steps are described. The use of any examples or exemplary terms (e.g., "for example", "etc.") in the present disclosure is simply to describe the present disclosure in detail, and the scope of the present disclosure is not limited by such examples or exemplary terms unless limited by the claims. In addition, those skilled in the art will appreciate that various modifications, combinations, and changes can be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Therefore, the spirit of the present disclosure should not be limited to the embodiments described above, and not only the claims set forth below but also all scopes equal to or equivalently modified from those claims fall within the scope of the spirit of the present disclosure.

1: System for generating training data for object recognition
100: Apparatus for generating training data for object recognition
110: Communication unit
120: User interface
130: Memory
140: Processor
200: Server
300: Network

Claims

A method for automatically generating training data for object recognition based on a vehicle simulator, wherein at least part of each step is performed by a processor, the method comprising:
a simulation step of controlling a driving simulation environment based on simulation environment data and controlling a vehicle simulator to drive an own vehicle in the driving simulation environment;
a data extraction step of collecting simulation information data based on a field of view (FOV) of the own vehicle while driving;
a 2D bounding box generating step of generating a 2D bounding box of at least one object existing within the field of view of the own vehicle while driving based on the simulation information data; and
a training data generation step of generating a training data image for object recognition of the object based on the bounding box,
A method for generating training data for object recognition.

According to claim 1,
The data extraction step,
obtaining a front image captured by a front camera virtually installed in the vehicle;
obtaining a 3D bounding box for at least one object present in the front image; and
Acquiring a semantic segmentation image of the front image,
A method for generating training data for object recognition.

According to claim 2,
The data extraction step,
Acquiring information on a world coordinate system whose origin is the center of the virtual world of the simulation environment, a vehicle coordinate system whose origin is the center of the own vehicle, and an image coordinate system whose origin is the upper left corner of the front image,
A method for generating training data for object recognition.

According to claim 1,
The data extraction step,
Extracting the simulation information data according to a predetermined period or according to an event that occurs based on a user's setting,
A method for generating training data for object recognition.

According to claim 3,
In the step of generating the two-dimensional bounding box,
projecting the coordinates of vertices of the 3D bounding box onto the image coordinate system;
converting the 3D bounding box projected on the image coordinate system into a 2D bounding box;
filtering a 2D bounding box of an occluded object based on the semantic segmentation image; and
Generating a final bounding box by changing the shape of the 2D bounding box to suit the shape of the object corresponding to the 2D bounding box based on the semantic segmentation image,
A method for generating training data for object recognition.

According to claim 5,
The step of projecting onto the image coordinate system,
applying a transformation matrix to each vertex coordinate of the 3D bounding box based on the center point of the own vehicle in the world coordinate system;
generating temporary 3D camera coordinates such that a z-axis is parallel to the FOV direction of the front camera based on the position of the front camera in the world coordinate system; and
Based on the FOV and image resolution of the front camera, performing perspective transformation on each vertex of the temporary 3D camera coordinates,
A method for generating training data for object recognition.

According to claim 6,
The step of transforming into a two-dimensional bounding box,
Determining a 2-dimensional bounding box of a minimum size including all vertices of the 3-dimensional bounding box,
A method for generating training data for object recognition.

According to claim 7,
The data extraction step further comprises obtaining class information of the object,
In the step of filtering the bounding box,
Determining whether the object corresponding to the 2-dimensional bounding box is occluded based on the 2-dimensional bounding box, the class information of the object corresponding to the 2-dimensional bounding box, and the semantic segmentation image,
A method for generating training data for object recognition.

According to claim 8,
The step of filtering the two-dimensional bounding box,
obtaining coordinates of a center point of the 2D bounding box on the image coordinate system; and
determining that the object corresponding to the 2D bounding box is an unoccluded object when the class information at the center point coordinates in the semantic segmentation image is the same as the class of the object corresponding to the 2D bounding box,
A method for generating training data for object recognition.

According to claim 8,
The step of generating the final bounding box,
Moving, on the semantic segmentation image, each boundary of the 2D bounding box in a direction approaching the center point until the boundary reaches a pixel in the class segmentation region of the object of the 2D bounding box,
A method for generating training data for object recognition.

According to claim 1,
Controlling the vehicle simulator,
Dynamically changing the driving speed of the own vehicle or changing the weather or driving time of the driving simulation environment,
A method for generating training data for object recognition.

According to claim 11,
Controlling the vehicle simulator,
Changing a cycle for collecting the simulation information data based on the traveling speed of the own vehicle,
A method for generating training data for object recognition.

An apparatus for automatically generating training data for object recognition based on a vehicle simulator,
Memory; and
a processor coupled with the memory and configured to execute computer readable instructions contained in the memory;
wherein the processor is configured to perform:
a simulation operation of controlling a driving simulation environment based on simulation environment data and controlling a vehicle simulator to drive an own vehicle in the driving simulation environment;
a data extraction operation of collecting simulation information data based on a field of view (FOV) of the own vehicle while driving;
a 2D bounding box generating operation of generating a 2D bounding box of at least one object existing within the field of view of the own vehicle while driving based on the simulation information data; and
a training data generation operation of generating a training data image for object recognition of the object based on the bounding box,
A device for generating training data for object recognition.

According to claim 13,
The data extraction operation,
Obtaining a front image captured by a front camera virtually installed in the vehicle;
Obtaining a 3D bounding box for at least one object present in the front image;
Obtaining a semantic segmentation image of the front image, and
Obtaining information about a world coordinate system whose origin is the center of the virtual world of the simulation environment, a vehicle coordinate system whose origin is the center of the own vehicle, and an image coordinate system whose origin is the upper left corner of the front image,
A device for generating training data for object recognition.

According to claim 14,
The operation of generating the two-dimensional bounding box,
Projecting the coordinates of vertices of the 3-dimensional bounding box onto the image coordinate system;
converting the 3D bounding box projected on the image coordinate system into a 2D bounding box;
Filtering a 2D bounding box of an occluded object based on the semantic segmentation image; and
Based on the semantic segmentation image, generating a final bounding box by changing the shape of the 2D bounding box to suit the shape of the object corresponding to the 2D bounding box,
A device for generating training data for object recognition.

According to claim 15,
The operation of projecting onto the image coordinate system,
Applying a transformation matrix to each vertex coordinate of the 3-dimensional bounding box based on the center point of the own vehicle in the world coordinate system;
Based on the position of the front camera in the world coordinate system, generating temporary 3D camera coordinates such that the z-axis is parallel to the FOV direction of the front camera, and
Based on the FOV and image resolution of the front camera, performing perspective transformation on each vertex of the temporary 3D camera coordinates,
A device for generating training data for object recognition.

According to claim 16,
The operation of converting to the two-dimensional bounding box,
Including an operation of determining a 2-dimensional bounding box of a minimum size including all vertices of the 3-dimensional bounding box,
A device for generating training data for object recognition.

According to claim 17,
The data extraction operation further includes obtaining class information of the object,
The operation of filtering the bounding box,
Determining whether an object corresponding to the 2-dimensional bounding box is occluded based on the 2-dimensional bounding box, the class information of the object corresponding to the 2-dimensional bounding box, and the semantic segmentation image,
A device for generating training data for object recognition.

According to claim 18,
The operation of generating the final bounding box,
Moving, on the semantic segmentation image, each boundary of the 2D bounding box in a direction approaching the center point until the boundary reaches a pixel in the class segmentation region of the object of the 2D bounding box,
A device for generating training data for object recognition.

A vehicle comprising:
a camera that captures surrounding images;
a processor; and
a memory electrically connected to the processor and storing at least one code executed by the processor,
wherein the memory stores code that, when executed by the processor, causes the processor to receive an image captured by the camera, input the image to a machine-learning-based learning model for object recognition, output information on the recognized object, and control the vehicle based on the information on the recognized object, and
wherein the learning model is a learning model trained by using, together as training data, training data obtained from a vehicle simulator and images actually captured by a camera of a driving vehicle.