KR20230156851A

KR20230156851A - Method and system for labeling 3D object

Info

Publication number: KR20230156851A
Application number: KR1020220055936A
Authority: KR
Inventors: 김도윤; 최경택
Original assignee: 위드로봇 주식회사; 대구가톨릭대학교산학협력단
Priority date: 2022-05-06
Filing date: 2022-05-06
Publication date: 2023-11-15

Abstract

A method and system for labeling three-dimensional objects are disclosed. According to one aspect of the present invention, a method for labeling a three-dimensional object may include: storing, by a system, template information including size information for each type of object to be labeled; providing, by the system, a user with a labeling target image based on an original image captured by a camera and, in response, receiving only a labeling component, which is a subset of the components required to specify a three-dimensional bounding box corresponding to the object; and receiving, by the system, from the user geometric identification information of the labeling component within the bounding box, and specifying the remaining components of the bounding box other than the labeling component based on the geometric identification information, the size information of the object included in the template information, and calibration information of the camera.

Description

Method and system for labeling 3D object

The present invention relates to a method and system for labeling three-dimensional objects, and more particularly to a technique for easily and accurately labeling three-dimensional information (e.g., a three-dimensional bounding box) of an object shown in a two-dimensional image.

Recognizing an object in an image and determining its location is one of the most important tasks for autonomous vehicles, intelligent CCTV, and mobile robots, because it directly affects the quality of the services they provide. With advances in deep neural network (DNN) image recognition, the common approach is to find an object in an image as a 2D rectangle (a two-dimensional bounding box) and treat the object as lying inside that rectangle.

With this conventional approach, when the object to be detected lies on substantially the same plane as the vehicle-mounted camera, or is relatively close to it, taking the center of the bounding box, or simply the midpoint of its bottom edge, as the object's position does not introduce a large error.

However, when the camera is not on the same plane as the object, or when the distance between the camera and the vehicle becomes relatively large, a position estimated from the center of the bounding box deviates substantially from the object's actual position.

Therefore, accurate object detection requires recovering three-dimensional information about the object even from a two-dimensional image.

Detecting three-dimensional information requires detecting the object's three-dimensional bounding box, which in turn requires labeling, in many two-dimensional images, the region corresponding to each object as a three-dimensional bounding box.

Labeling a three-dimensional bounding box does not usually require labeling every line segment or point of the box; it is enough to label the minimum set of required components that determine the box (e.g., three line segments or four vertices). Labeling with three line segments may mean labeling two line segments on one face and one line segment on another face, and labeling with four points may mean labeling three points on one face and one point on another face.
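Why three edges or four vertices are enough can be seen directly: once a shared (common) vertex and the three mutually orthogonal edges leaving it are known, every other corner follows by vector addition. The sketch below assumes the labeled edges share one vertex and that coordinates are already three-dimensional; the function names and the tuple representation are illustrative only, not part of the described system.

```python
from itertools import product

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def box_from_edges(common: Vec3, e_w: Vec3, e_l: Vec3, e_h: Vec3) -> list[Vec3]:
    """All 8 corners of the box spanned from `common` by three edge vectors.

    Each corner is common + a*e_w + b*e_l + c*e_h with a, b, c in {0, 1}.
    """
    return [
        tuple(common[i] + a * e_w[i] + b * e_l[i] + c * e_h[i] for i in range(3))
        for a, b, c in product((0, 1), repeat=3)
    ]


def box_from_four_vertices(common: Vec3, width: Vec3, length: Vec3, height: Vec3) -> list[Vec3]:
    """Same box, given the shared vertex and the far ends of its three edges."""
    return box_from_edges(common, sub(width, common), sub(length, common), sub(height, common))


corners = box_from_four_vertices((0, 0, 0), (2, 0, 0), (0, 4, 0), (0, 0, 1.5))
print(len(corners), max(corners))  # 8 (2, 4, 1.5)
```

Four labeled vertices and three labeled edges carry the same information here, which is why either set can serve as the required components.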

Figure 1 is a diagram illustrating a conventional three-dimensional labeling method.

Referring to Figure 1, the figure shows, by way of example, labeling four vertices of a three-dimensional bounding box (e.g., the width, common, length, and height vertices). Labeling in this way has the drawback that precisely locating all four vertices takes a long time.

In addition, images acquired through a camera may be distorted depending on the characteristics of the lens. In particular, a wide-angle or ultra-wide-angle lens with a wide field of view (FOV) covers a wide area but produces relatively severe image distortion, so even when labeling itself is performed accurately, the distortion of the image inevitably introduces errors into the labeled result.
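A small numeric illustration of the problem: under the polynomial radial model commonly used for lens distortion (for example in OpenCV's calibration functions), a point's displacement grows with its distance from the image center, so a straight scene line away from the center comes out curved. The text does not name a distortion model; the model, the coefficient k1 = -0.3 and the normalized coordinates below are assumptions chosen for illustration.

```python
import math


def distort(x: float, y: float, k1: float, k2: float = 0.0) -> tuple[float, float]:
    """Radial polynomial distortion of normalized image coordinates (assumed model)."""
    r2 = x * x + y * y
    s = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * s, y * s


def max_deviation(points: list[tuple[float, float]]) -> float:
    """Largest distance of any point from the chord joining the first and last point."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    return max(abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length for x, y in points)


# A straight horizontal scene line at y = 0.5, sampled from x = -1 to x = 1.
line = [(-1.0 + i / 10, 0.5) for i in range(21)]
bent = [distort(x, y, k1=-0.3) for x, y in line]  # k1 < 0: barrel distortion

print(max_deviation(line), round(max_deviation(bent), 4))  # 0.0 0.15
```

A box corner clicked on such a curved edge inherits that displacement, which is the error the distortion correction described below is meant to remove.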

Korean Patent Publication No. 10-2020-0087354, "Data labeling device and method for autonomous driving"

Accordingly, an object of the present invention is to provide a technique that corrects the distortion of an original image acquired from a camera before labeling, so that accurate labeling is possible even when the image is distorted by the characteristics of the lens.

Another object is to provide a technique that greatly increases labeling efficiency by allowing labeling to be completed when only some components (e.g., one line segment or two vertices) are labeled, rather than all of the required components (e.g., three line segments or four vertices of a three-dimensional bounding box).

According to one aspect of the present invention, a method for labeling a three-dimensional object in a two-dimensional image includes: storing, by a system, template information including size information for each type of object to be labeled; providing, by the system, a user with a labeling target image based on an original image captured by a camera and, in response, receiving only a labeling component, which is a subset of the components required to specify a three-dimensional bounding box corresponding to the object; and receiving, by the system, from the user geometric identification information of the labeling component within the bounding box, and specifying the remaining components of the bounding box other than the labeling component based on the geometric identification information, the size information of the object included in the template information, and calibration information of the camera.
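To make this step concrete, the sketch below completes a box from a single labeled edge. It rests on assumptions that the text does not specify: a pinhole camera with known focal length, principal point, mounting height and pitch over flat ground; a labeled component that is a bottom edge touching the ground; geometric identification information reduced to a hypothetical `side` flag; and template width and height for the chosen object type. The labeled end points are back-projected onto the ground, the template extends them into a box, and the eight corners are projected back into the image.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass
class Camera:
    """Assumed calibration: pinhole with focal length f (px) and principal point
    (cx, cy), mounted h metres above flat ground, pitched down by theta radians,
    no roll. World frame: X right, Y forward, Z up."""
    f: float
    cx: float
    cy: float
    h: float
    theta: float

    def axes(self) -> tuple[Vec3, Vec3, Vec3]:
        """Camera x (right), y (image down) and z (optical axis) in world coordinates."""
        s, c = math.sin(self.theta), math.cos(self.theta)
        return (1.0, 0.0, 0.0), (0.0, -s, -c), (0.0, c, -s)

    def project(self, p: Vec3) -> tuple[float, float]:
        """World point -> pixel (u, v)."""
        d = (p[0], p[1], p[2] - self.h)
        xc, yc, zc = (sum(a * b for a, b in zip(d, axis)) for axis in self.axes())
        return self.cx + self.f * xc / zc, self.cy + self.f * yc / zc

    def to_ground(self, u: float, v: float) -> Vec3:
        """Intersect the viewing ray of pixel (u, v) with the ground plane Z = 0."""
        x, y = (u - self.cx) / self.f, (v - self.cy) / self.f
        xa, ya, za = self.axes()
        ray = tuple(x * a + y * b + c for a, b, c in zip(xa, ya, za))
        if ray[2] >= 0:
            raise ValueError("pixel lies on or above the horizon")
        t = self.h / -ray[2]
        return (t * ray[0], t * ray[1], 0.0)


def complete_box(cam: Camera, p0, p1, width: float, height: float, side: str = "left"):
    """Complete a 3D box from one labeled bottom edge (p0 -> p1, in pixels).

    `side` stands in for the geometric identification information: whether the
    box extends to the left or right of the labeled edge. `width` and `height`
    come from the template entry for the object type.
    """
    a, b = cam.to_ground(*p0), cam.to_ground(*p1)
    dx, dy = b[0] - a[0], b[1] - a[1]
    n = math.hypot(dx, dy)
    sign = 1.0 if side == "left" else -1.0
    nx, ny = -sign * dy / n, sign * dx / n  # unit ground normal to the edge
    bottom = [a, b,
              (b[0] + width * nx, b[1] + width * ny, 0.0),
              (a[0] + width * nx, a[1] + width * ny, 0.0)]
    corners = bottom + [(x, y, height) for x, y, _ in bottom]
    return corners, [cam.project(p) for p in corners]


cam = Camera(f=1000.0, cx=960.0, cy=540.0, h=6.0, theta=math.radians(15))
# Simulate a labeler clicking the two ends of one bottom edge of a car.
p0, p1 = cam.project((1.0, 20.0, 0.0)), cam.project((1.0, 24.5, 0.0))
corners, pixels = complete_box(cam, p0, p1, width=1.8, height=1.5, side="left")
for (x, y, z), (u, v) in zip(corners, pixels):
    print(f"({x:5.2f}, {y:5.2f}, {z:4.2f}) -> ({u:7.1f}, {v:6.1f})")
```

In this sketch the box length is taken from the labeled edge itself; a template length could instead be used to fix the far end point, one of several choices the text leaves open.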

The method may further include receiving, by the system, the original image, and generating, by the system, the labeling target image by correcting lens distortion in the received original image.

The method may further include obtaining, by the system, the camera calibration information based on the generated labeling target image.

Generating the labeling target image by correcting lens distortion in the received original image includes specifying, in the original image, points that should lie on a straight line, and generating, by the system, the labeling target image from the original image using a function that maps the specified points onto a straight line.
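The text leaves the correction function open. As one possible choice, the sketch below assumes the one-parameter division model, with the distortion center at the image center and coordinates already normalized, and picks its parameter by brute-force search so that point sets marked as lying on straight scene lines (lane edges, curbs, building edges) become as straight as possible. The model, the search range and all names are illustrative assumptions.

```python
import math

Point = tuple[float, float]


def undistort(x: float, y: float, lam: float) -> Point:
    """One-parameter division model (assumed): p_u = p_d / (1 + lam * r_d^2)."""
    s = 1.0 + lam * (x * x + y * y)
    return x / s, y / s


def straightness_cost(points: list[Point]) -> float:
    """Scale-free residual of a total-least-squares line fit (0 = perfectly straight)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    # Smallest eigenvalue of the 2x2 scatter matrix, divided by its trace.
    smallest = (sxx + syy) / 2 - math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return smallest / (sxx + syy)


def estimate_lambda(lines: list[list[Point]], lo=-0.5, hi=0.5, steps=1001) -> float:
    """Grid-search the lam that makes every marked point set as straight as possible."""
    def total(lam: float) -> float:
        return sum(straightness_cost([undistort(x, y, lam) for x, y in pts]) for pts in lines)
    return min((lo + (hi - lo) * i / (steps - 1) for i in range(steps)), key=total)


def distort(x: float, y: float, lam: float) -> Point:
    """Inverse of `undistort`; used here only to synthesize test data."""
    r = math.hypot(x, y)
    if r == 0.0 or lam == 0.0:
        return x, y
    rd = (1.0 - math.sqrt(1.0 - 4.0 * lam * r * r)) / (2.0 * lam * r)
    return x * rd / r, y * rd / r


# Three straight scene lines (normalized coordinates) seen through a lens with lam = -0.2.
true_lam = -0.2
scene = [
    [(-0.6 + 0.08 * i, 0.6) for i in range(16)],                            # horizontal
    [(-0.5, -0.6 + 0.08 * i) for i in range(16)],                           # vertical
    [(-0.6 + 0.08 * i, 0.3 * (-0.6 + 0.08 * i) - 0.4) for i in range(16)],  # oblique
]
marked = [[distort(x, y, true_lam) for x, y in line] for line in scene]
print(round(estimate_lambda(marked), 3))
```

Once the parameter is known, the labeling target image could be produced by resampling: for each output pixel, compute the distorted source location with the inverse mapping and interpolate. The grid search is used only for transparency; a 1-D optimizer would serve equally well.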

Obtaining the camera calibration information based on the generated labeling target image may include: specifying, by the system, a plurality of pairs of parallel straight lines running in different directions in the labeling target image; and specifying, by the system, vanishing points based on the pairs of parallel lines and obtaining the camera calibration information, including the focal length of the lens, based on the specified vanishing points.
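A sketch of the vanishing-point step, under assumptions the text does not state: the two chosen directions are orthogonal in the scene (for instance lane lines and a stop line), pixels are square with zero skew, and the principal point c is known (e.g., the image center). Each pair of scene-parallel segments is intersected in homogeneous coordinates, and the focal length then follows from the standard relation that the dot product of (v1 - c) and (v2 - c) equals -f^2. The function names and the synthetic camera are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of 3-vectors; joins two homogeneous points or meets two lines."""
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous image line through pixels p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a, seg_b):
    """Intersection of two image segments that are parallel in the scene."""
    x, y, w = cross(line_through(*seg_a), line_through(*seg_b))
    if abs(w) < 1e-12:
        raise ValueError("segments are parallel in the image; vanishing point at infinity")
    return x / w, y / w


def focal_from_vanishing_points(v1, v2, principal_point):
    """Focal length (px) from vanishing points of two orthogonal scene directions."""
    cx, cy = principal_point
    d = (v1[0] - cx) * (v2[0] - cx) + (v1[1] - cy) * (v2[1] - cy)
    if d >= 0:
        raise ValueError("vanishing points are inconsistent with orthogonal directions")
    return math.sqrt(-d)


# Synthetic check: camera at the origin looking along +z, f = 800 px.
F_TRUE, C = 800.0, (640.0, 360.0)


def project(p):
    return C[0] + F_TRUE * p[0] / p[2], C[1] + F_TRUE * p[1] / p[2]


def segment(start, direction, length=5.0):
    end = tuple(s + length * d for s, d in zip(start, direction))
    return project(start), project(end)


d1, d2 = (1.0, 0.2, 1.0), (-1.0, 0.0, 1.0)  # orthogonal: d1 . d2 = 0
v1 = vanishing_point(segment((0, 1, 10), d1), segment((0, 3, 12), d1))
v2 = vanishing_point(segment((2, 1, 10), d2), segment((2, 3, 14), d2))
print(v1, v2, focal_from_vanishing_points(v1, v2, C))
```

With real images the segments are noisy, so a least-squares intersection over several segments per direction would likely be more robust than a single pair.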

Obtaining the camera calibration information based on the generated labeling target image may include: specifying, by the system, a height object of known height in the labeling target image; and determining, by the system, the camera height based on the known height of the height object and its height in the labeling target image, thereby obtaining the camera calibration information.
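For the height-object route, a minimal sketch under strong assumptions not stated in the text: a level camera with no pitch or roll, flat ground, an object standing upright on the ground, and a known horizon row. Under those assumptions the unknown focal length and object distance cancel out of a simple ratio. A tilted camera would instead need the vertical vanishing point and a cross-ratio formulation. The function name is hypothetical.

```python
def camera_height(obj_height: float, v_top: float, v_bottom: float, v_horizon: float) -> float:
    """Camera height above the ground from an upright object of known height.

    Assumes a level pinhole camera over flat ground, so vertical scene lines
    stay vertical in the image and the horizon is the row v_horizon; image rows
    grow downward. For an object at ground distance Z:
        v_bottom - v_horizon = f * h_cam / Z
        v_bottom - v_top     = f * obj_height / Z
    so h_cam = obj_height * (v_bottom - v_horizon) / (v_bottom - v_top).
    """
    if v_bottom <= v_top:
        raise ValueError("the object's base must be below its top in the image")
    return obj_height * (v_bottom - v_horizon) / (v_bottom - v_top)


# Synthetic check: f = 1000 px, horizon row 540, camera 6 m high,
# a 1.5 m tall object 30 m away -> top row 690, bottom row 740.
print(camera_height(1.5, v_top=690.0, v_bottom=740.0, v_horizon=540.0))  # 6.0
```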

The required components may be the line segments constituting the bounding box, and the labeling component may be one of those line segments.

A method according to another aspect includes: receiving, by a system, an original image captured by a camera; generating, by the system, a labeling target image by correcting lens distortion in the received original image; and obtaining, by the system, camera calibration information based on the generated labeling target image. Obtaining the camera calibration information includes either (a) specifying, by the system, a plurality of pairs of parallel straight lines running in different directions in the labeling target image, specifying vanishing points based on those pairs, and obtaining the camera calibration information, including the focal length of the lens, based on the vanishing points; or (b) specifying, by the system, a height object of known height in the labeling target image and determining the camera height based on the known height of the height object and its height in the labeling target image, thereby obtaining the camera calibration information.

The above method may be implemented by a computer program stored on a computer-readable recording medium.

According to another aspect of the present invention, a system for labeling a three-dimensional object in a two-dimensional image includes a processor and a memory storing a program executed by the processor. By executing the program, the processor stores template information including size information for each type of object to be labeled; provides a user with a labeling target image based on an original image captured by a camera and, in response, receives only a labeling component, which is a subset of the components required to specify a three-dimensional bounding box corresponding to the object; receives from the user geometric identification information of the labeling component within the bounding box; and specifies the remaining components of the bounding box other than the labeling component based on the geometric identification information, the size information of the object included in the template information, and calibration information of the camera.

By executing the program, the processor may receive the original image and generate the labeling target image by correcting lens distortion in the original image.

By executing the program, the processor may specify a plurality of pairs of parallel straight lines running in different directions in the labeling target image, specify vanishing points based on those pairs, and obtain the camera calibration information, including the focal length of the lens, based on the specified vanishing points.

By executing the program, the processor may specify a height object of known height in the labeling target image and determine the camera height based on the known height of the height object and its height in the labeling target image, thereby obtaining the camera calibration information.

A system according to another aspect of the present invention includes a processor and a memory storing a program executed by the processor. By executing the program, the processor receives an original image captured by a camera, generates a labeling target image by correcting lens distortion in the received original image, and obtains camera calibration information based on the generated labeling target image. To do so, the processor either specifies a plurality of pairs of parallel straight lines running in different directions in the labeling target image, specifies vanishing points based on those pairs, and obtains the camera calibration information, including the focal length of the lens, based on the vanishing points; or specifies a height object of known height in the labeling target image and determines the camera height based on the known height of the height object and its height in the labeling target image, thereby obtaining the camera calibration information.

According to the technical idea of the present invention, the distortion of the original image is corrected using the original image obtained from the camera together with prior knowledge (e.g., straight-line components in the image), and labeling is performed on the corrected image, so accurate labeling is possible even when the lens characteristics distort the image. This is particularly effective in environments that use wide-angle lenses to cover a wide area (e.g., cameras for traffic control or autonomous driving).

In addition, labeling can be completed by labeling only some components (e.g., one line segment or two vertices) rather than all of the required components (e.g., three line segments or four vertices of a three-dimensional bounding box), which greatly increases labeling efficiency.

A brief description of each drawing is provided to aid a fuller understanding of the drawings referred to in the detailed description of the present invention.
Figure 1 is a diagram illustrating a conventional three-dimensional labeling method.
Figure 2 is a diagram showing a schematic system configuration for implementing a method for labeling three-dimensional objects according to an embodiment of the present invention.
Figure 3 is a diagram illustrating a schematic physical configuration of a system for implementing a method for labeling three-dimensional objects according to an embodiment of the present invention.
Figures 4 and 5 are schematic flowcharts illustrating a method for labeling three-dimensional objects according to an embodiment of the present invention.
Figure 6 shows an example of an original image and a labeling target image according to an embodiment of the present invention.
Figures 7 and 8 are diagrams illustrating a method for obtaining camera calibration information according to an embodiment of the present invention.
Figures 9 and 10 are diagrams illustrating a three-dimensional labeling tool according to an embodiment of the present invention.
Figure 11 is a diagram showing labeling results for a three-dimensional object according to an embodiment of the present invention.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to the specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the present invention, detailed descriptions of related known technologies are omitted where they might obscure the gist of the invention.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this specification, terms such as "comprise" or "have" designate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Also, in this specification, when one component "transmits" data to another component, the component may transmit the data directly to the other component or via at least one further component. Conversely, when one component "directly transmits" data to another component, the data is transmitted to the other component without passing through any other component.

Figure 2 is a diagram showing a schematic system configuration for implementing a method for labeling three-dimensional objects according to an embodiment of the present invention, and Figure 3 is a diagram illustrating the schematic physical configuration of a system for implementing that method.

Referring first to Figure 2, a system 100 may be provided to implement the method for labeling three-dimensional objects according to an embodiment of the present invention.

According to the technical idea of the present invention, the system 100 may receive an image from a camera 200 and use the received image to perform three-dimensional labeling of the objects it contains.

The system 100 may generate a labeling target image by correcting, in the image acquired from the camera 200 (the original image), the distortion caused by the characteristics of the lens. This can increase labeling accuracy.

In addition, the system 100 may allow a user to perform labeling on the labeling target image. The system 100 can dramatically increase the efficiency of the labeling work by allowing labeling to be completed when only some of the components required for labeling are labeled, rather than all of them.

For example, the components required for labeling may be, as described above, the components that determine a three-dimensional bounding box. As described above, three line segments or four vertices of the three-dimensional bounding box may be needed to specify it. In this case, two of the three line segments may be mutually orthogonal line segments on a first face of the three-dimensional bounding box, and the remaining one may be a line segment perpendicular to the first face. Alternatively, the four vertices may be the endpoints of these three line segments.

Ultimately, labeling a three-dimensional object may amount to specifying its three-dimensional bounding box, which requires at least three line segments or four vertices to be labeled, as described above. The required components may therefore be three line segments or four vertices. However, the system 100 can specify all of the remaining components of the bounding box (e.g., its remaining line segments or vertices) when only some of these required components (e.g., one line segment or two vertices) are labeled. That is, if the user labels only a labeling component (e.g., one line segment), which is a subset of the required components, the remaining components of the bounding box (e.g., all line segments of the bounding box other than the labeled one) can be specified.
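The bookkeeping behind "one labeled edge, the rest specified automatically" can be sketched as follows. A box has 12 edges over 8 corners, so once all eight corners have been determined from the labeled edge, the template size and the calibration, the system only needs to draw the 11 edges the user did not label. The corner ordering, the camera-frame coordinates and the function names below are hypothetical conventions chosen for illustration.

```python
# Hypothetical corner order: 0-3 go around the bottom face, 4-7 around the top
# face, and corner i + 4 sits directly above corner i.
BOX_EDGES = (
    [(i, (i + 1) % 4) for i in range(4)]             # bottom face
    + [(4 + i, 4 + (i + 1) % 4) for i in range(4)]   # top face
    + [(i, i + 4) for i in range(4)]                 # vertical edges
)


def remaining_edges(labeled_edge: tuple[int, int]) -> list[tuple[int, int]]:
    """Edges the system must still fill in after the user labels one edge."""
    key = frozenset(labeled_edge)
    return [e for e in BOX_EDGES if frozenset(e) != key]


def project_edges(corners_cam, edges, f, cx, cy):
    """Project corners given in camera coordinates (x right, y down, z forward)
    with a pinhole model and return one pixel segment per edge."""
    px = [(cx + f * x / z, cy + f * y / z) for x, y, z in corners_cam]
    return [(px[a], px[b]) for a, b in edges]


# A 4.5 m x 1.8 m x 1.5 m box whose base is 1.5 m below the camera centre.
corners_cam = [(-0.8, 1.5, 10.0), (-0.8, 1.5, 14.5), (1.0, 1.5, 14.5), (1.0, 1.5, 10.0),
               (-0.8, 0.0, 10.0), (-0.8, 0.0, 14.5), (1.0, 0.0, 14.5), (1.0, 0.0, 10.0)]
todo = remaining_edges((0, 1))
segments = project_edges(corners_cam, todo, f=1000.0, cx=960.0, cy=540.0)
print(len(BOX_EDGES), len(todo))  # 12 11
```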

So that the remaining components are specified automatically when only a labeling component, which is a subset of the required components, is labeled, the system 100 according to an embodiment of the present invention may store size information for each type of object in advance. In this specification, the set of such size information is defined as template information. The size information may be information that specifies the shape of a three-dimensional object; for example, it may be a default three-dimensional bounding box for each object type, or information from which such a box can be specified.
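As a data structure, the template information can be as simple as a mapping from object type to a default box size. The sketch below is one hypothetical layout; the type names and dimensions are placeholders for illustration only, since the text does not list any values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxSize:
    """Default 3D bounding-box size for one object type, in metres."""
    length: float
    width: float
    height: float


# Template information: object type -> default size.
# The numbers are placeholders for illustration; the text gives no dimensions.
TEMPLATE: dict[str, BoxSize] = {
    "car": BoxSize(length=4.5, width=1.8, height=1.5),
    "bus": BoxSize(length=11.0, width=2.5, height=3.2),
}


def size_for(object_type: str) -> BoxSize:
    """Look up the default size for the object type chosen by the labeler."""
    try:
        return TEMPLATE[object_type]
    except KeyError:
        raise ValueError(f"no template entry for object type {object_type!r}") from None


print(size_for("car"))  # BoxSize(length=4.5, width=1.8, height=1.5)
```

A frozen dataclass keeps template entries immutable, so a lookup during labeling cannot accidentally alter the stored defaults.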

In this specification, the objects to be labeled may be various vehicles. Because size information can be specified in advance for each vehicle type, the remaining components can be specified by labeling only some of the components (e.g., one of the line segments constituting the bounding box) and using the size information for the corresponding vehicle type.

Meanwhile, to specify the remaining components, the system 100 may need to know the camera calibration information. The system 100 can determine the camera calibration information from information in the labeling target image that is known in advance (e.g., which features are straight lines, or the known height of an object). The system 100 can then perform the labeling, that is, generate the three-dimensional bounding box, using the camera calibration information, the size information, and the labeled labeling component.

In this specification, the case where the object to be labeled is a vehicle is described as an example, but a person of ordinary skill in the art will readily appreciate that the technical idea of the present invention is not necessarily limited to this case.

Likewise, the original image acquired by the camera 200 is described as an image of a road by way of example, but it is not limited thereto.

Using the technical idea of the present invention, the positions of objects can be determined accurately in various fields such as autonomous driving, mobile robots, and intelligent CCTV, which can improve the quality of the services provided.

The physical configuration of the system 100 according to an embodiment of the present invention may be as shown in Figure 3.

Referring to Figure 3, the system 100 may be a data processing device having the hardware resources and/or software needed to implement the technical idea of the present invention.

Although Figures 2 and 3 show the system 100 as a single physical device for convenience of explanation, the system 100 does not necessarily mean a single physical component or a single device.

즉, 상기 시스템(100)은 본 발명의 기술적 사상을 구현하기 위해 구비되는 하드웨어 및/또는 소프트웨어가 결합되어 본 명세서에서 정의되는 기능을 수행할 수 있는 장치를 의미할 수 있으며, 필요한 경우에는 서로 이격된 장치에 설치되어 각각의 기능을 수행함으로써 본 발명의 기술적 사상에 따른 상기 시스템(100)이 구현될 수도 있다. 따라서 도 3에 도시된 각각의 구성들은 서로 다른 물리적 장치에 분산되어 구비될 수도 있다. In other words, the system 100 may refer to a device capable of performing the functions defined in this specification by combining hardware and/or software provided to implement the technical idea of the present invention, and if necessary, may be separated from each other. The system 100 according to the technical idea of the present invention may be implemented by being installed in a device and performing each function. Accordingly, each of the components shown in FIG. 3 may be distributed and provided in different physical devices.

또한, 도 2에서는 상기 시스템(100)과 카메라가 별도의 장치처럼 도시하였지만, 필요에 따라 상기 시스템(100)은 카메라에 탑재되어 본 발명의 기술적 사상을 구현할 수도 있고, 이러한 경우에는 엣지 컴퓨팅을 통해 자율주행이나 이동로봇의 제어에 활용될 수 있는 효과가 있다. In addition, in FIG. 2, the system 100 and the camera are shown as separate devices, but if necessary, the system 100 may be mounted on the camera to implement the technical idea of the present invention, and in this case, edge computing is used. It can be used to control autonomous driving or mobile robots.

Physically, the system 100 may have the configuration shown in FIG. 3. The system 100 may include a memory (storage device) 120 and a processor 110. The memory 120 stores a program that implements the technical idea of the present invention, and the processor 110 executes that program. The program can label three-dimensional objects according to the technical idea of the present invention.

A person of ordinary skill in the art will readily appreciate that the processor 110 may go by various names, such as CPU or mobile processor, depending on how the system 100 is implemented. As described above, the system 100 may also be implemented by organically combining a plurality of physical devices. In that case, each physical device may include at least one processor 110 to implement the system 100.

The memory 120 stores the program and may be implemented as any type of storage device that the processor can access to run the program. Depending on the hardware implementation, the memory 120 may be implemented as a plurality of storage devices rather than a single one. The memory 120 may include temporary memory as well as main memory, and may be volatile or non-volatile. The term covers every form of information storage means that can hold the program and from which the processor can run it.

Depending on the embodiment, the system 100 may further include various peripheral devices (peripheral devices 1 to N, 130 and 131). A person of ordinary skill in the art will readily appreciate that these may include, for example, a keyboard, monitor, graphics card, or communication device.

Hereinafter, when this specification states that the system 100 performs a certain function, this may mean that the processor 110 runs the program to perform that function.

The system 100 may itself be the terminal used by the user who performs the labeling. Alternatively, the user may use a separate terminal (not shown), and the system 100 may implement the technical idea of the present invention while communicating with that terminal.

FIGS. 4 and 5 are schematic flowcharts illustrating a method of labeling a three-dimensional object according to an embodiment of the present invention.

First, referring to FIG. 4, the system 100 according to an embodiment of the present invention can store size information for each type of object to be labeled, that is, template information (S100).

When the object is a vehicle, size information may be stored in the system 100 in advance for each vehicle type, such as small passenger car, mid-size passenger car, large passenger car, truck, bus, and van. This size information may be a three-dimensional bounding box or information from which one can be specified, such as width, length, and height.
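The template information can be pictured as a lookup table keyed by object type. The Python sketch below is a minimal illustration under stated assumptions. The names `SizeTemplate`, `TEMPLATES`, and `lookup_template` are hypothetical. Only the bus entry uses values from this description (width 250, length 1200, height 340, from the FIG. 10 example), and the source does not state their unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeTemplate:
    """Hypothetical size record for one object type (units as stored by the system)."""
    width: float
    length: float
    height: float


# Hypothetical template table. Only the "bus" values come from the example in
# this description; other vehicle types (small car, truck, van, ...) would be
# added the same way with their own sizes.
TEMPLATES: dict[str, SizeTemplate] = {
    "bus": SizeTemplate(width=250, length=1200, height=340),
}


def lookup_template(object_type: str) -> SizeTemplate:
    """Return the stored size information for the object type the user selected."""
    try:
        return TEMPLATES[object_type]
    except KeyError:
        raise KeyError(f"no template stored for object type {object_type!r}") from None
```

When the user later selects a class (FIG. 10), a single lookup like `lookup_template("bus")` supplies the width, length, and height used to complete the box.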

The system 100 can then provide the labeling target image to the user (S110). If the user operates the system 100 directly, the system can display the labeling target image on a monitor. If the user uses a separate terminal (not shown), the system 100 can transmit the labeling target image to that terminal.

As described above, the labeling target image may be the original image acquired by the camera 200 with its distortion corrected.

The system 100 provides the labeling target image to the user and, in response, can receive a labeling component from the user (S120).

As described above, the labeling component may be a subset of the labeling required components. For example, the labeling required components may be at least three of the eight line segments constituting the three-dimensional bounding box, or at least four of its eight vertices.

According to an embodiment of the present invention, the labeling component that the system 100 receives may be a single line segment of the three-dimensional bounding box. That is, the system 100 may receive only one line segment. From that one segment alone, the system 100 can specify the three-dimensional bounding box corresponding to the object (S130). In other words, it can determine the remaining components of the box, for example the other line segments or the vertices.

The three-dimensional bounding box can be specified from the labeling component alone (for example, one line segment) without receiving all the required components. This works only if the object's size information and the camera calibration information are known in advance.

To this end, the system 100 can additionally receive the object type selected by the user. For example, as shown in FIG. 10 described later, the user can enter the object type (for example, class: bus) of the object to be labeled.

The system 100 can also receive geometric identification information, that is, information indicating which line segment of the bounding box the user's labeling component corresponds to.

An example is described with reference to FIGS. 9 and 10.

FIGS. 9 and 10 are diagrams illustrating a three-dimensional labeling tool according to an embodiment of the present invention.

First, referring to FIG. 9, a bounding box according to the technical idea of the present invention can be specified by eight line segments, and each line segment can be assigned geometric identification information as shown in FIG. 9.

The system 100 can provide the user with an exemplary bounding box such as the one shown in FIG. 9, together with the geometric identification information of each line segment.

As shown in FIG. 10, the user can then enter the geometric identification information of the labeling line segment (baseline) they drew, that is, which segment of the box it is. As shown in FIG. 10, the user can also enter the object type (for example, the class). The system 100 can then look up the size information for that object type (for example, width 250, length 1200, height 340).

The system 100 can then determine the remaining components of the bounding box, other than the labeling component, from three inputs: the geometric identification information (for example, No. 1), the object's size information from the template information (for example, width 250, length 1200, height 340), and the camera calibration information. In other words, it can specify the bounding box.

That is, the three-dimensional bounding box of the object to be labeled can be specified easily when three conditions hold. First, the distortion of the labeling target image has been corrected. Second, the camera calibration information, that is, the intrinsic and extrinsic parameters of the camera 200, is known. Third, it is known which segment of the bounding box the user's labeling line segment is (its geometric identification information).

Meanwhile, according to the technical idea of the present invention, the system 100 can obtain the camera calibration information from the two-dimensional original image acquired by the camera 200.

An example is described with reference to FIGS. 5 to 8.

Referring to FIG. 5, the system 100 can receive the original image captured by the camera 200 (S200). For this purpose, the system 100 may communicate with the camera 200 over a wired or wireless connection.

The system 100 can then generate the labeling target image by correcting the lens distortion of the received original image (S210).

The system 100 can then obtain the camera calibration information of the camera 200 from the labeling target image, that is, the distortion-corrected image (S220).

The system 100 can use information shown in the original image and the labeling target image both to correct the distortion and to obtain the camera calibration information.

An example is described with reference to FIGS. 6 to 8.

FIG. 6 shows an example of an original image and a labeling target image according to an embodiment of the present invention. FIGS. 7 and 8 are diagrams illustrating a method for obtaining camera calibration information according to an embodiment of the present invention.

Referring first to FIG. 6, the original image acquired by the camera 200 may be distorted by the characteristics of the lens, as shown in FIG. 6A.

To correct this distortion, the system 100 can identify a plurality of points in the original image that should lie on a straight line.

For example, each series of points marked with white dots in the original image of FIG. 6A may be points that should lie on a straight line.

For example, if the original image contains lane markings, the points on one lane marking (for example, 10 to 13, and 14 to 16) should lie on a straight line. The original image may also contain many other such points, for example along the edges of a building.

The system 100 can identify at least one set of such points, each set containing points that should lie on one straight line. The points in one set should therefore project onto a straight line. The system 100 can then compute a function that makes the points in each set project as nearly onto a straight line as possible.
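One way to make "project as nearly onto a straight line as possible" concrete is a total-least-squares score. Each point set is scored by the sum of squared perpendicular distances to its best-fit line, and the scores are summed over all sets. The sketch below assumes this cost; the source does not state which straightness measure the system minimizes, and the function names are hypothetical.

```python
import math


def line_residual(points):
    """Sum of squared perpendicular distances from `points` to their
    best-fit (total least squares) line."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # The smallest eigenvalue of the scatter matrix [[sxx, sxy], [sxy, syy]]
    # equals the residual of the best orthogonal line fit.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return max(half_trace - disc, 0.0)


def straightness_cost(point_sets):
    """Total residual over all sets of points that should be collinear."""
    return sum(line_residual(s) for s in point_sets)
```

Evaluating `straightness_cost` on the corrected coordinates of every designated set, and searching for the correction coefficients that minimize it, is one way to compute the function described above.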

An administrator or user may designate the points that should lie on a straight line. Once the function has been computed from them, the distortion of subsequent original images captured by the camera 200 can be corrected automatically.

The system 100 can compute this function using an equidistant model.

For example, the system 100 can compute the function using the following equations.

[Equation 1]

[Equation 2]

Here, (x_d, y_d) are the coordinates in the distorted image, (x_u, y_u) are the coordinates in the corrected image, and r_d is the distance from the image center (x_c, y_c) to the distorted image coordinates.

The function L(r_u) can then be specified by estimating the coefficients of Equation 2 so that the curves become as straight as possible. For example, the estimate may use terms up to a predetermined order, such as the fourth-order term.

Once the function L(r_u) is specified, the system 100 can use it to correct subsequent original images from the camera 200 and thereby generate labeling target images.
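Equations 1 and 2 are not reproduced here, so the following Python sketch only shows the general shape of such a correction, under an assumed model. Each distorted point is moved radially about the image center (x_c, y_c) by a polynomial factor L(r_d) = 1 + a1*r_d + a2*r_d^2 + a3*r_d^3 + a4*r_d^4, truncated at the fourth-order term. This is a generic polynomial radial model, not necessarily the patent's exact equidistant formulation, and `radial_factor`, `undistort_point`, and the coefficient layout are hypothetical.

```python
import math


def radial_factor(r, coeffs):
    """Assumed polynomial L(r) = 1 + a1*r + a2*r^2 + a3*r^3 + a4*r^4."""
    a1, a2, a3, a4 = coeffs
    return 1.0 + a1 * r + a2 * r**2 + a3 * r**3 + a4 * r**4


def undistort_point(xd, yd, center, coeffs):
    """Map a distorted pixel (xd, yd) to corrected coordinates (xu, yu).

    The point is scaled along the ray from the image center by L(r_d),
    where r_d is its distance from the center in the distorted image.
    """
    xc, yc = center
    r_d = math.hypot(xd - xc, yd - yc)
    scale = radial_factor(r_d, coeffs)
    return xc + scale * (xd - xc), yc + scale * (yd - yc)
```

With all coefficients zero the mapping is the identity, and the image center is always a fixed point. The coefficients would be chosen once, for example by minimizing a straightness cost over the designated point sets, and then reused for every later frame from the same camera.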

FIG. 6B shows an example of a labeling target image produced by correcting, in this way, the distortion of the original image in FIG. 6A.

The system 100 can obtain the camera calibration information from the labeling target image, that is, the image whose lens-induced distortion has been corrected.

According to an embodiment of the present invention, the system 100 can compute the intrinsic parameter (focal length) and/or the extrinsic parameters (rotation R and translation T) of the camera 200. It does so using vanishing points and/or the height of an object whose height is known in advance.

The system 100 can compute two vanishing points from the labeling target image.

To this end, the system 100 can receive line pairs from a user or administrator: one pair of mutually parallel straight line segments in the horizontal direction and one in the vertical direction.

For example, in FIG. 7, the horizontal pair may be the two straight lines (for example, 20 and 21) that connect the end points on either side of the crosswalk.

The vertical pair may be two lane lines (for example, 30 and 31). In other words, a user or administrator designates features in the image that are known in advance to be parallel straight lines, and the system 100 uses them to obtain two vanishing points.
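Each vanishing point is the image intersection of a pair of lines that are parallel in the scene. In homogeneous coordinates, both the line through two points and the intersection of two lines are cross products. That gives the minimal Python sketch below. The function names are hypothetical, and each segment is assumed to be given as a pair of pixel coordinates.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def line_through(p, q):
    """Homogeneous line through image points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(seg_a, seg_b):
    """Intersection of two image segments that are parallel in the scene."""
    l1 = line_through(*seg_a)
    l2 = line_through(*seg_b)
    x, y, w = cross(l1, l2)
    if abs(w) < 1e-12:
        raise ValueError("image lines are parallel; vanishing point at infinity")
    return (x / w, y / w)
```

The crosswalk pair (20, 21) would give one vanishing point and the lane pair (30, 31) the other.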

Once two vanishing points are determined from the line pairs in the labeling target image, the remaining vanishing point can be calculated from the geometric relationship between the vanishing points and the center point.

For example, referring to FIG. 8, the point where the camera's optical axis meets the image plane is called the principal point; for convenience, it is taken to be the image center. As shown in FIG. 8, the center point x0 is the orthocenter of the triangle whose vertices are the vanishing points (x1, x2, x3) of the three axes. Therefore, if two vanishing points and the center point are known, this relationship gives the remaining vanishing point.
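Given the principal point x0 and two vanishing points x1 and x2, the orthocenter condition yields two linear equations for x3. The altitude from x3 through x0 is perpendicular to side x1x2, and the altitude from x1 through x0 is perpendicular to side x2x3. The Python sketch below solves these equations with Cramer's rule. It illustrates the geometric relation only and is not code from the source; the function name is hypothetical.

```python
def third_vanishing_point(x0, x1, x2):
    """Return x3 such that x0 is the orthocenter of triangle (x1, x2, x3).

    Solves the two altitude conditions
        (x3 - x0) . (x2 - x1) = 0   # altitude from x3 passes through x0
        (x1 - x0) . (x3 - x2) = 0   # altitude from x1 passes through x0
    as a 2x2 linear system in x3.
    """
    ax, ay = x2[0] - x1[0], x2[1] - x1[1]
    bx, by = x1[0] - x0[0], x1[1] - x0[1]
    c1 = ax * x0[0] + ay * x0[1]
    c2 = bx * x2[0] + by * x2[1]
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("degenerate configuration: x1 - x0 is parallel to x2 - x1")
    return ((c1 * by - ay * c2) / det, (ax * c2 - bx * c1) / det)
```

For example, with x1 = (-4, 0), x2 = (4, 0), and principal point x0 = (0, 8/3), the function returns x3 of approximately (0, 6).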

Once the three vanishing points are determined, a scale value can be computed for each vanishing point, and from these the focal length of the lens can be calculated.

For example, in the triangle of FIG. 8, Equation 3 below gives the square of the scale λ1 for the vanishing point x1. It is the area of the triangle formed by the principal point x0 and the other two vanishing points x2 and x3 (△x0x2x3), divided by the area of the triangle formed by the three vanishing points (△x1x2x3).

[Equation 3]

$$\lambda_1^{2} = \frac{\operatorname{Area}(\triangle x_0 x_2 x_3)}{\operatorname{Area}(\triangle x_1 x_2 x_3)}$$
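Equation 3 is easy to evaluate with the shoelace formula. The Python sketch below computes λ1² as stated. By analogy, it computes λ2² and λ3² by replacing the corresponding vanishing point with x0; that extension is an assumption. When x0 lies inside the vanishing-point triangle, as the orthocenter of an acute triangle does, the three sub-triangle areas sum to the whole, so the three squared scales sum to 1. The function names are hypothetical.

```python
def triangle_area(a, b, c):
    """Unsigned area of triangle abc (shoelace formula)."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def squared_scales(x0, x1, x2, x3):
    """(lambda1^2, lambda2^2, lambda3^2) as area ratios in the style of Equation 3."""
    total = triangle_area(x1, x2, x3)
    if total == 0:
        raise ValueError("vanishing points are collinear")
    return (triangle_area(x0, x2, x3) / total,
            triangle_area(x0, x1, x3) / total,
            triangle_area(x0, x1, x2) / total)
```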

Once the scale value for each vanishing point has been calculated in this way, the system 100 can calculate the focal length of the lens using, for example, Equation 4.

[Equation 4]

Here, (u₁, v₁), (u₂, v₂), and (u₃, v₃) are the image (uv) coordinates of the three vanishing points obtained above. λ₁, λ₂, and λ₃ are the scale values of the vanishing points, K is the intrinsic matrix, and R is the rotation matrix.
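Equation 4 is not reproduced here. For orthogonal vanishing directions, a widely used closed form is f² = -(x_i - x0)·(x_j - x0) for any two of the three vanishing points. It assumes square pixels, zero skew, and the principal point x0 at the orthocenter. The sketch below uses that standard relation as a stand-in, so it is not necessarily the exact form of Equation 4, and `focal_length` is a hypothetical name.

```python
import math


def focal_length(x0, xi, xj):
    """Focal length (pixels) from principal point x0 and two orthogonal
    vanishing points xi, xj, via f^2 = -(xi - x0) . (xj - x0)."""
    dot = (xi[0] - x0[0]) * (xj[0] - x0[0]) + (xi[1] - x0[1]) * (xj[1] - x0[1])
    if dot >= 0:
        raise ValueError("vanishing points are inconsistent with a real focal length")
    return math.sqrt(-dot)
```

Because x0 is the orthocenter, every pair of vanishing points gives the same f in exact arithmetic. Comparing the pairs is a cheap sanity check on user-entered line pairs.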

In summary, the system 100 can identify several pairs of parallel straight lines running in different directions (horizontal and vertical) in the labeling target image and determine the vanishing points from those pairs. As described above, this includes computing two vanishing points directly from the pairs and deriving the remaining one by calculation.

Based on these vanishing points, the system 100 can obtain the camera calibration information, including the focal length of the lens.

The system 100 can then compute the extrinsic parameters of the camera 200, the rotation matrix R and the translation T, using the following equations.

[Equation 5]

Here, K^-1 denotes the inverse of the intrinsic matrix.
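Equation 5 is not reproduced here. A common way to recover R from three orthogonal vanishing points is to take each column as K^-1·(u_i, v_i, 1)^T normalized to unit length, with K = [[f, 0, u0], [0, f, v0], [0, 0, 1]]. The sketch below follows that common construction. It normalizes each column directly rather than multiplying by the scale values λ_i, so it should be read as an assumption, not the exact form of Equation 5. The sign of each column, and hence the handedness of R, still has to be fixed by convention.

```python
import math


def rotation_from_vanishing_points(vps, f, principal_point):
    """Build R whose i-th column is K^-1 (u_i, v_i, 1)^T normalized to unit length.

    `vps` holds the three vanishing points (u_i, v_i); R is returned as a
    list of three rows.
    """
    u0, v0 = principal_point
    cols = []
    for u, v in vps:
        # K^-1 applied to (u, v, 1) for K = [[f, 0, u0], [0, f, v0], [0, 0, 1]].
        d = ((u - u0) / f, (v - v0) / f, 1.0)
        n = math.sqrt(sum(c * c for c in d))
        cols.append(tuple(c / n for c in d))
    # Transpose the list of columns into a list of rows.
    return [[cols[j][i] for j in range(3)] for i in range(3)]
```

If the vanishing points, focal length, and principal point are mutually consistent, the resulting columns are orthonormal. A large departure from orthonormality points to a badly designated line pair.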

Once the rotation matrix is determined, the translation T can be calculated with Equation 6 below, which requires the camera's mounting height H_c. H_c can be obtained from an object in the labeling target image whose height is known in advance (a height object).

For example, in the labeling target image shown in FIG. 7, the administrator can enter both end points of an object whose height above the ground is known in advance (for example, a traffic light or a barrier arm, 40).

The system 100 can then easily calculate the camera's mounting height H_c from two quantities: the known height of the height object 40 and its height in the labeling target image. The system 100 can then obtain the translation T included in the camera calibration information using Equation 6.

[Equation 6]

Here, T is the translation vector and H_c is the height of the camera in the three-dimensional coordinate system.
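Equation 6 is not reproduced here, but the step before it, recovering H_c from a height object, can be illustrated with a simplified model. The model assumes a level camera (no tilt or roll), so vertical objects stay vertical and the horizon is the image row y_h. Similar triangles then give H_c = h·(y_b - y_h)/(y_b - y_t). Here h is the known height, and y_b and y_t are the image rows of the object's ground contact and top. This level-camera approximation and the function name are assumptions. A tilted camera would need a fuller single-view relation that also uses the vertical vanishing point.

```python
def camera_height_level(known_height, y_bottom, y_top, y_horizon):
    """Estimate camera mounting height H_c from an object of known height.

    Assumes a level pinhole camera with image rows increasing downward.
    For an object at depth Z: y_b - y_h = f*H_c/Z and y_b - y_t = f*h/Z,
    so H_c = h * (y_b - y_h) / (y_b - y_t).
    """
    pixel_height = y_bottom - y_top
    if pixel_height <= 0:
        raise ValueError("the top of the object must appear above its bottom")
    return known_height * (y_bottom - y_horizon) / pixel_height
```

For instance, a 2 m barrier arm whose base is 250 px below the horizon and which is 100 px tall in the image implies a camera height of 5 m under this model.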

Once the camera's intrinsic parameter (focal length) and extrinsic parameters (R, T) have been obtained as described above, the system 100 can compute the projection matrix P as in Equation 7.

[Equation 7]

Here, α is a scale value and R^T is the transpose of the rotation matrix.

(u₀, v₀) are the coordinates of the principal point, (u₁, v₁) are arbitrary coordinates in the image coordinate system, and f is the focal length.

(X_c, Y_c, Z_c) is the position of the camera in the three-dimensional coordinate system.

Once this projection matrix is determined, the system 100 can transform coordinates in the three-dimensional coordinate system into the two-dimensional image coordinate system.
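The 3D-to-2D transformation can be sketched with the usual pinhole composition P = K[R | T]. The sketch assumes the world-to-camera convention X_cam = R·X + T. The variables listed for Equation 7 (R^T and the camera position (X_c, Y_c, Z_c)) suggest the equivalent form X_cam = R^T(X - C). If so, pass R^T as `R` and -R^T·C as `T`. The function name and the nested-list representation are assumptions.

```python
def project_point(K, R, T, X):
    """Project world point X to pixel (u, v) with P = K [R | T].

    K and R are 3x3 nested lists; T and X are length-3 sequences.
    Assumes X_cam = R X + T (world-to-camera).
    """
    x_cam = [sum(R[i][j] * X[j] for j in range(3)) + T[i] for i in range(3)]
    u, v, w = (sum(K[i][j] * x_cam[j] for j in range(3)) for i in range(3))
    if w <= 0:
        raise ValueError("point is behind the camera")
    return u / w, v / w
```

For example, with f = 1000, principal point (320, 240), identity rotation, and zero translation, the point (1, 2, 10) projects to (420, 440).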

In summary, the system 100 can obtain the calibration information of the camera 200 from the labeling target image as described above. It can then generate the three-dimensional bounding box from the user's labeling line segment alone, without receiving all the components otherwise required to specify the box. For this it uses the pre-stored size information of the object and the geometric identification information of that labeling line segment.

An example is described with reference to FIG. 11.

FIG. 11 is a diagram showing the result of labeling a three-dimensional object according to an embodiment of the present invention.

Referring to FIG. 11, the system 100 can provide the labeling target image to the user.

The user can then draw, on the labeling target image, one line segment of the bounding box of the object to be labeled (for example, a bus), that is, the labeling line segment (for example, 50). The system 100 receives this segment.

As described with reference to FIG. 10, the system 100 can also receive from the user the type of the object to be labeled (for example, bus) and the geometric identification information of the labeling line segment (for example, 6).

The system 100 can then extract the size information (width, height, length) for that object type from the template information. From the geometric identification information, it can also determine two things: that the labeling line segment 50 corresponds to the object's length on the bounding box, and which of the box's eight line segments it is.

The system 100 can then transform the labeling component into the three-dimensional coordinate system. Using the height and width information in the size information, it can calculate the remaining components, for example the remaining line segments or the coordinates of the eight vertices.
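This step can be sketched as follows, under several assumptions not stated in the source. The labeled segment is taken to be a bottom edge lying on the ground plane Z = 0 and running along the object's length. Its two image endpoints are back-projected onto that plane through the 3x4 projection matrix P. The box is then grown from the first endpoint using the template length, width, and height. All function names are hypothetical. Which side the width extends to would in practice follow from the geometric identification information; here it is fixed to the left.

```python
def _inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular matrix")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def project(P, X):
    """Project a 3D point with a 3x4 projection matrix P."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][k] * Xh[k] for k in range(4)) for r in range(3))
    return u / w, v / w


def ground_point(P, u, v):
    """Back-project pixel (u, v) onto the ground plane Z = 0.

    On Z = 0, P reduces to the homography H formed by its columns 0, 1, 3.
    """
    Hi = _inv3([[P[r][0], P[r][1], P[r][3]] for r in range(3)])
    X, Y, W = (sum(Hi[r][k] * p for k, p in enumerate((u, v, 1.0))) for r in range(3))
    return (X / W, Y / W, 0.0)


def box_corners(start, end, length, width, height):
    """Eight corners of an upright box whose bottom length edge starts at `start`.

    The direction comes from the labeled segment (start -> end); the extent
    comes from the template sizes. The width is laid out to the left.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    n = (dx * dx + dy * dy) ** 0.5
    ex, ey = dx / n, dy / n          # unit length direction on the ground
    wx, wy = -ey, ex                 # unit width direction (90 degrees left)
    x0, y0 = start[0], start[1]
    bottom = [(x0, y0),
              (x0 + length * ex, y0 + length * ey),
              (x0 + length * ex + width * wx, y0 + length * ey + width * wy),
              (x0 + width * wx, y0 + width * wy)]
    return [(x, y, 0.0) for x, y in bottom] + [(x, y, height) for x, y in bottom]
```

Passing each returned corner through `project` would give the image-space box that is drawn over the object, as in FIG. 11.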

Once the remaining components have been calculated in the three-dimensional coordinate system, they can be transformed into the two-dimensional image coordinate system using the projection matrix. As shown in FIG. 11, this generates the object's three-dimensional bounding box even though the user labeled only the labeling component (for example, 50).

Ultimately, according to the technical idea of the present invention, correcting the distortion of the camera 200 and labeling on the corrected image makes the three-dimensional labels more accurate. The labeling is also completed when only part of the work needed to specify the three-dimensional object is performed, that is, when only the labeling component rather than every required component is labeled. This greatly increases the efficiency of the labeling work.

한편, 본 발명의 실시예에 따른 삼차원 객체의 라벨링 방법은 컴퓨터가 읽을 수 있는 프로그램 명령 형태로 구현되어 컴퓨터로 읽을 수 있는 기록 매체에 저장될 수 있으며, 본 발명의 실시예에 따른 제어 프로그램 및 대상 프로그램도 컴퓨터로 판독 가능한 기록 매체에 저장될 수 있다. 컴퓨터가 읽을 수 있는 기록 매체는 컴퓨터 시스템에 의하여 읽혀질 수 있는 데이터가 저장되는 모든 종류의 기록 장치를 포함한다.Meanwhile, the method for labeling three-dimensional objects according to an embodiment of the present invention can be implemented in the form of computer-readable program instructions and stored in a computer-readable recording medium, and the control program and object according to an embodiment of the present invention Programs may also be stored in a computer-readable recording medium. Computer-readable recording media include all types of recording devices that store data that can be read by a computer system.

기록 매체에 기록되는 프로그램 명령은 본 발명을 위하여 특별히 설계되고 구성된 것들이거나 소프트웨어 분야 당업자에게 공지되어 사용 가능한 것일 수도 있다.Program instructions recorded on the recording medium may be those specifically designed and configured for the present invention, or may be known and available to those skilled in the software field.

컴퓨터로 읽을 수 있는 기록 매체의 예에는 하드 디스크, 플로피 디스크 및 자기 테이프와 같은 자기 매체(magnetic media), CD-ROM, DVD와 같은 광기록 매체(optical media), 플롭티컬 디스크(floptical disk)와 같은 자기-광 매체(magneto-optical media) 및 롬(ROM), 램(RAM), 플래시 메모리 등과 같은 프로그램 명령을 저장하고 수행하도록 특별히 구성된 하드웨어 장치가 포함된다. 또한 컴퓨터가 읽을 수 있는 기록매체는 네트워크로 연결된 컴퓨터 시스템에 분산되어, 분산방식으로 컴퓨터가 읽을 수 있는 코드가 저장되고 실행될 수 있다.Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes, optical media such as CD-ROMs and DVDs, and floptical disks. Includes magneto-optical media such as ROM, RAM, flash memory, and other hardware devices specifically configured to store and execute program instructions. Additionally, computer-readable recording media can be distributed across computer systems connected to a network, so that computer-readable code can be stored and executed in a distributed manner.

프로그램 명령의 예에는 컴파일러에 의해 만들어지는 것과 같은 기계어 코드뿐만 아니라 인터프리터 등을 사용해서 전자적으로 정보를 처리하는 장치, 예를 들어, 컴퓨터에 의해서 실행될 수 있는 고급 언어 코드를 포함한다.Examples of program instructions include not only machine language code such as that created by a compiler, but also high-level language code that can be executed by a device that electronically processes information using an interpreter, for example, a computer.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

1. A method for labeling a three-dimensional object in a two-dimensional image, the method comprising:
storing, by a system, template information including size information for each type of object to be labeled;
providing, by the system, a user with a labeling target image based on an original image captured by a camera and, in response to the providing, receiving only a labeling component, which is a part of the required labeling components for specifying a three-dimensional bounding box corresponding to the object; and
receiving, by the system, from the user geometric identification information of the labeling component within the bounding box, and specifying the remaining components of the bounding box, other than the labeling component, based on the geometric identification information, the size information of the object included in the template information, and calibration information of the camera.

2. The method of claim 1, further comprising:
receiving, by the system, the original image; and
generating, by the system, the labeling target image by correcting lens distortion in the received original image.

3. The method of claim 2, further comprising:
obtaining, by the system, the calibration information of the camera based on the generated labeling target image.

4. The method of claim 2, wherein generating the labeling target image by correcting lens distortion in the received original image comprises:
specifying points in the original image that must lie on a straight line; and
generating, by the system, the labeling target image from the original image using a function that maps the specified points onto a straight line.
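The claim does not name a distortion model or a fitting procedure. As a minimal sketch only, the code below assumes a one-parameter radial "division" model, p_u = c + (p_d - c) / (1 + k r_d^2), with the radius normalized by a hypothetical `norm` (for example, half the image diagonal), and picks k by grid search so that every group of user-marked points becomes as collinear as possible. All function names (`undistort_point`, `distort_point`, `straightness`, `estimate_k`) are illustrative, not the patent's.

```python
import math


def undistort_point(p, k, center, norm):
    """Assumed division model: p_u = c + (p_d - c) / (1 + k * r_d^2),
    where r_d is the distorted radius divided by `norm`."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    s = 1.0 + k * (dx * dx + dy * dy) / (norm * norm)
    return (center[0] + dx / s, center[1] + dy / s)


def distort_point(p, k, center, norm):
    """Closed-form inverse of undistort_point (useful for synthetic test data)."""
    dx, dy = p[0] - center[0], p[1] - center[1]
    r_u = math.hypot(dx, dy) / norm
    disc = 1.0 - 4.0 * k * r_u * r_u
    if disc < 0:
        raise ValueError("point lies outside the model's valid range")
    scale = 2.0 / (1.0 + math.sqrt(disc))  # r_d / r_u, numerically stable root
    return (center[0] + dx * scale, center[1] + dy * scale)


def straightness(points):
    """Scale-invariant deviation from a line: smallest eigenvalue of the 2x2
    scatter matrix divided by its trace (0 for perfectly collinear points)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    trace = sxx + syy
    if trace == 0:
        return 0.0
    lam_min = trace / 2.0 - math.hypot((sxx - syy) / 2.0, sxy)
    return max(lam_min, 0.0) / trace


def estimate_k(lines, center, norm, k_min=-0.4, k_max=0.4, steps=161):
    """Grid-search the k that makes every marked point group most collinear."""
    best_k, best_cost = 0.0, float("inf")
    for i in range(steps):
        k = k_min + (k_max - k_min) * i / (steps - 1)
        cost = sum(
            straightness([undistort_point(p, k, center, norm) for p in line])
            for line in lines
        )
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k
```

Once k is estimated, the labeling target image would be produced by resampling every pixel through the same model (not shown). Lines passing through the distortion center carry no information about k, so the marked lines should lie away from the center. A real system might replace the grid search with a proper optimizer or a multi-parameter model.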

5. The method of claim 3, wherein obtaining the calibration information of the camera based on the generated labeling target image comprises:
specifying, by the system, a plurality of pairs of parallel straight lines formed in different directions in the labeling target image; and
specifying, by the system, vanishing points based on the specified pairs of parallel straight lines, and obtaining the calibration information of the camera, including a focal length of the lens, based on the specified vanishing points.
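One standard way to turn vanishing points into a focal length, sketched here under assumptions the claim does not state: square pixels, zero skew, a known principal point c (for example, the image center), and two line pairs whose 3D directions are mutually orthogonal. With K = [[f, 0, c_x], [0, f, c_y], [0, 0, 1]], orthogonality of K^-1 v1 and K^-1 v2 gives (v1 - c) · (v2 - c) = -f^2. The helper names are hypothetical.

```python
import math


def line_through(p, q):
    """Homogeneous line through image points p and q: (p, 1) x (q, 1)."""
    return (p[1] - q[1], q[0] - p[0], p[0] * q[1] - p[1] * q[0])


def intersect(l1, l2):
    """Intersection of two homogeneous lines, returned as an (x, y) pixel."""
    x = l1[1] * l2[2] - l1[2] * l2[1]
    y = l1[2] * l2[0] - l1[0] * l2[2]
    w = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(w) < 1e-12:
        raise ValueError("lines are parallel in the image (vanishing point at infinity)")
    return (x / w, y / w)


def vanishing_point(seg_a, seg_b):
    """Vanishing point of two image segments that are parallel in 3D."""
    return intersect(line_through(*seg_a), line_through(*seg_b))


def focal_from_orthogonal_vps(v1, v2, principal):
    """Focal length in pixels from the vanishing points of two orthogonal
    3D directions, using (v1 - c) . (v2 - c) = -f^2."""
    d = ((v1[0] - principal[0]) * (v2[0] - principal[0])
         + (v1[1] - principal[1]) * (v2[1] - principal[1]))
    if d >= 0:
        raise ValueError("vanishing points are inconsistent with orthogonal directions")
    return math.sqrt(-d)
```

In practice, each vanishing point would be estimated from more than two segments (for example, by least squares) to reduce sensitivity to labeling noise. With only two segments per direction, a small pixel error near a distant vanishing point can shift f noticeably.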

6. The method of claim 3, wherein obtaining the calibration information of the camera based on the generated labeling target image comprises:
specifying, by the system, a height object whose height is known in advance in the labeling target image; and
obtaining the calibration information of the camera by specifying, by the system, a camera height based on the known height of the height object and its height in the labeling target image.
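A minimal sketch of the camera-height step, assuming (this is not stated in the claim) an undistorted pinhole image, a level camera so that the horizon is the image row y_h, image y growing downward, and an upright height object standing on the ground. A ground point at depth Z projects to y_b = y_h + f·h/Z and the object's top to y_t = y_h + f·(h - H)/Z, so h = H·(y_b - y_h)/(y_b - y_t). For a tilted camera, the exact relation also involves the vertical vanishing point, which this sketch omits. The function names are hypothetical.

```python
def camera_height(obj_height, y_top, y_bottom, y_horizon):
    """Camera height above the ground from one upright object of known height
    (level camera assumed): h = H * (y_bottom - y_horizon) / (y_bottom - y_top)."""
    if y_bottom <= y_horizon:
        raise ValueError("object base must appear below the horizon")
    if y_bottom <= y_top:
        raise ValueError("object top must appear above its base")
    return obj_height * (y_bottom - y_horizon) / (y_bottom - y_top)


def object_height(cam_height, y_top, y_bottom, y_horizon):
    """Inverse relation: height of another upright object once h is known."""
    if y_bottom <= y_horizon:
        raise ValueError("object base must appear below the horizon")
    return cam_height * (y_bottom - y_top) / (y_bottom - y_horizon)
```

For example, an object 1.8 units tall whose base is at row 360 and top at row 216, with the horizon at row 240, gives a camera height of 1.8 × 120 / 144 = 1.5 units. The inverse helper can be used to sanity-check the estimate against a second object of known height.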

7. The method of claim 1, wherein the required labeling components
are the line segments constituting the bounding box, and
the labeling component
is one of the line segments constituting the bounding box.
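To illustrate how a single labeled line segment plus template dimensions and calibration could fix the other seven corners (claims 1 and 7), here is a sketch under several assumptions that the claims do not spell out. The camera is level with focal length f, principal point (cx, cy), and height cam_h (as could be obtained by the preceding calibration steps). The object rests on the ground plane. The geometric identification information says the segment is the bottom edge along the object's length. The box extends away from the camera. All names are hypothetical.

```python
import math


def project(P, f, cx, cy):
    """Pinhole projection of a camera-frame point (X right, Y down, Z forward)."""
    X, Y, Z = P
    return (f * X / Z + cx, f * Y / Z + cy)


def backproject_to_ground(pixel, f, cx, cy, cam_h):
    """Intersect the viewing ray of `pixel` with the ground plane Y = cam_h."""
    u, v = pixel
    dy = (v - cy) / f
    if dy <= 0:
        raise ValueError("pixel is at or above the horizon, so it cannot lie on the ground")
    t = cam_h / dy  # ray parameter, equal to the depth Z of the ground point
    return ((u - cx) / f * t, cam_h, t)


def complete_box(segment, size, f, cx, cy, cam_h):
    """Complete a 3D box from ONE labeled segment (assumed: bottom length edge).

    `size` = (length, width, height) from the object-type template. Returns the
    8 camera-frame corners and their pixels, ordered by (length, width, height).
    """
    length, width, height = size
    p0 = backproject_to_ground(segment[0], f, cx, cy, cam_h)
    p1 = backproject_to_ground(segment[1], f, cx, cy, cam_h)
    ax, az = p1[0] - p0[0], p1[2] - p0[2]
    n = math.hypot(ax, az)
    ax, az = ax / n, az / n                 # unit length direction on the ground
    wx, wz = -az, ax                        # perpendicular on the ground plane
    if wz < 0:                              # extend the box away from the camera
        wx, wz = -wx, -wz
    corners = []
    for i in (0, 1):                        # offset along length
        for j in (0, 1):                    # offset along width
            for k in (0, 1):                # offset upward (negative Y)
                corners.append((p0[0] + i * length * ax + j * width * wx,
                                cam_h - k * height,
                                p0[2] + i * length * az + j * width * wz))
    return corners, [project(c, f, cx, cy) for c in corners]
```

In this sketch the segment supplies only the anchor point and the heading on the ground, while the template length replaces the segment's measured length. A mismatch between the two could be surfaced to the user as a hint that the wrong template or edge was chosen.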

8. A method for labeling a three-dimensional object, comprising:
receiving, by the system, an original image captured by a camera;
generating, by the system, the labeling target image by correcting lens distortion in the received original image; and
obtaining, by the system, the calibration information of the camera based on the generated labeling target image,
wherein obtaining the calibration information of the camera based on the generated labeling target image comprises:
specifying, by the system, a plurality of pairs of parallel straight lines formed in different directions in the labeling target image, specifying vanishing points based on the specified pairs of parallel straight lines, and obtaining the calibration information of the camera, including a focal length of the lens, based on the specified vanishing points; or
specifying, by the system, a height object whose height is known in advance in the labeling target image, and obtaining the calibration information of the camera by specifying a camera height based on the known height of the height object and its height in the labeling target image.

9. A computer program installed in a data processing device and stored on a computer-readable recording medium to perform the method according to any one of claims 1 to 8.

10. A system for labeling a three-dimensional object in a two-dimensional image, the system comprising:
a processor; and
a memory storing a program executed by the processor,
wherein the processor, by executing the program,
stores template information including size information for each type of object to be labeled, and provides a user with a labeling target image based on an original image captured by a camera;
receives, in response to the providing, only a labeling component, which is a part of the required labeling components for specifying a three-dimensional bounding box corresponding to the object; and
receives from the user geometric identification information of the labeling component within the bounding box, and specifies the remaining components of the bounding box, other than the labeling component, based on the geometric identification information, the size information of the object included in the template information, and calibration information of the camera.

11. The system of claim 10, wherein the processor, by executing the program,
receives the original image and generates the labeling target image by correcting lens distortion in the original image.

12. The system of claim 11, wherein the processor, by executing the program,
specifies a plurality of pairs of parallel straight lines formed in different directions in the labeling target image, specifies vanishing points based on the pairs of parallel straight lines, and obtains the calibration information of the camera, including a focal length of the lens, based on the specified vanishing points.

13. The system of claim 11, wherein the processor, by executing the program,
specifies a height object whose height is known in advance in the labeling target image, and obtains the calibration information of the camera by specifying a camera height based on the known height of the height object and its height in the labeling target image.

14. A system comprising:
a processor; and
a memory storing a program executed by the processor,
wherein the processor, by executing the program,
receives the original image captured by the camera;
generates the labeling target image by correcting lens distortion in the received original image; and
obtains the calibration information of the camera based on the generated labeling target image,
either by specifying a plurality of pairs of parallel straight lines formed in different directions in the labeling target image, specifying vanishing points based on the specified pairs of parallel straight lines, and obtaining the calibration information of the camera, including a focal length of the lens, based on the specified vanishing points, or
by specifying a height object whose height is known in advance in the labeling target image and specifying a camera height based on the known height of the height object and its height in the labeling target image.