KR20230108436A - Method and apparatus for generating realistic driving image based on adversarial generation neural network through image context error

Info

Publication number: KR20230108436A
Application number: KR1020220003822A
Authority: KR
Inventors: 변혜란; 전석규; 문철현; 홍기범
Original assignee: 연세대학교 산학협력단
Priority date: 2022-01-11
Filing date: 2022-01-11
Publication date: 2023-07-18

Abstract

The present embodiments provide a method and apparatus for generating realistic driving images, which generate a real driving image from a virtual driving image using an image generation model trained on a total loss function that includes an adversarial loss function, a cycle-consistency loss function, and a context error loss function.

Description

METHOD AND APPARATUS FOR GENERATING REALISTIC DRIVING IMAGE BASED ON ADVERSARIAL GENERATION NEURAL NETWORK THROUGH IMAGE CONTEXT ERROR

The technical field of the present invention relates to a method and apparatus for generating images based on a generative adversarial network.

The contents described in this section merely provide background information on the present embodiments and do not constitute prior art.

Data used for fully supervised learning of object recognition and image segmentation when training autonomous driving models can only be collected in limited environments or is very expensive to collect. Gathering the metadata for each image, such as the exact locations of objects in the image and per-pixel class labels, takes a great deal of time and money.

Virtual environment data, on the other hand, allows images of diverse driving environments to be collected freely and at low cost according to the needs of the collecting organization. With virtual environment data, the exact locations of objects in an image can be obtained quickly from the graphics engine used for development and object rendering.

Korean Patent Publication No. 10-2020-0094656 (2020.08.07)
Korean Patent Publication No. 10-2020-0093425 (2020.08.05)

Embodiments of the present invention generate a real driving image from a virtual driving image using an image generation model trained on a total loss function that includes an adversarial loss function, a cycle-consistency loss function, and a context error loss function. Their main purpose is to reduce the number of network parameters while preserving and improving the quality of the generated images.

Other objects of the present invention that are not explicitly stated may be additionally considered within the scope that can be easily inferred from the following detailed description and its effects.

According to one aspect of the present embodiments, there is provided a method for generating a realistic driving image by a computing device, the method comprising: receiving a virtual driving image; and generating a real driving image from the virtual driving image through a first image generation model, wherein the first image generation model has a network structure including an encoder, a residual block connected to the encoder, and a decoder connected to the residual block.

The first image generation model may be trained to minimize a total loss function including (i) an adversarial loss function calculated with the first image generation model connected to a first discrimination model to form a generative adversarial network and a second image generation model connected to a second discrimination model to form a generative adversarial network, (ii) a cycle-consistency loss function calculated with the first image generation model connected to the second image generation model, and (iii) a context error loss function calculated with the first image generation model connected to a feature extraction model.

The adversarial loss function may include a first discrimination loss function set so that the first discrimination model classifies a real driving image input as real and classifies a real driving image generated from a virtual driving image input as fake.

The adversarial loss function may include a second discrimination loss function set so that the second discrimination model classifies a virtual driving image input as real and classifies a virtual driving image generated from a real driving image input as fake.

The adversarial loss function may include a first generation loss function set so that, for the first image generation model, a virtual driving image generated from a real driving image input is classified as real.

The adversarial loss function may include a second generation loss function set so that, for the second image generation model, a real driving image generated from a virtual driving image input is classified as real.

The cycle-consistency loss function may include a first cycle loss function defined as the difference between (i) the virtual driving image generated by translating, back through the second image generation model, the real driving image that the first image generation model generated from a virtual driving image input, and (ii) the virtual driving image input.

The cycle-consistency loss function may include a second cycle loss function defined as the difference between (i) the real driving image generated by translating, back through the first image generation model, the virtual driving image that the second image generation model generated from a real driving image input, and (ii) the real driving image input.

The context error loss function may include a first context loss function defined as the similarity, computed through the feature extraction model, between (i) an input feature image converted from a virtual driving image input and (ii) an output feature image converted from the real driving image output generated from the virtual driving image input.

The context error loss function may include a second context loss function defined as the similarity, computed through the feature extraction model, between (i) the output feature image converted from the real driving image output and (ii) a target feature image converted from a target driving image for verification.

As described above, according to the embodiments of the present invention, a real driving image is generated from a virtual driving image using an image generation model trained on a total loss function that includes an adversarial loss function, a cycle-consistency loss function, and a context error loss function, which has the effect of reducing the number of network parameters while preserving and improving the quality of the generated images.

Even effects not explicitly mentioned here, namely the effects described in the following specification that are expected from the technical features of the present invention and their provisional effects, are treated as if described in the specification of the present invention.

FIG. 1 is a diagram illustrating a realistic driving image generating apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a virtual driving image processed by the realistic driving image generating apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the first image generation model of the realistic driving image generating apparatus according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the first image generation model, second image generation model, first discrimination model, and second discrimination model applied to the realistic driving image generating apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an operation in which the realistic driving image generating apparatus according to an embodiment of the present invention trains on the adversarial loss function using the first image generation model and the first discrimination model.
FIG. 6 is a diagram illustrating an operation in which the realistic driving image generating apparatus according to an embodiment of the present invention trains on the cycle-consistency loss function using the first image generation model and the second image generation model.
FIG. 7 is a diagram illustrating an operation in which the realistic driving image generating apparatus according to an embodiment of the present invention trains on the context error loss function using the first image generation model and the feature extraction model.
FIG. 8 is a flowchart illustrating a realistic driving image generation method according to another embodiment of the present invention.
FIG. 9 is a diagram illustrating results of simulations performed according to embodiments of the present invention.

In describing the present invention below, detailed descriptions of related known functions are omitted where they are obvious to those skilled in the art and could unnecessarily obscure the gist of the invention, and some embodiments of the present invention are described in detail with reference to exemplary drawings.

Translating high-resolution 1024*564 driving images requires a large number of parameters in the generation network, and on a GPU (Graphics Processing Unit) in a typical environment the network cannot be trained because of insufficient memory.

The driving image generating apparatus according to the present embodiment generates a real driving image from a virtual driving image using an image generation model trained on a total loss function that includes an adversarial loss function, a cycle-consistency loss function, and a context error loss function, and can therefore reduce the number of network parameters while preserving and improving the quality of the generated images.

FIG. 1 is a diagram illustrating a realistic driving image generating apparatus according to an embodiment of the present invention.

The realistic driving image generating apparatus 110 includes at least one processor 120, a computer-readable storage medium 130, and a communication bus 170.

The processor 120 may control the apparatus to operate as the realistic driving image generating apparatus 110. For example, the processor 120 may execute one or more programs stored in the computer-readable storage medium 130. The one or more programs may include one or more computer-executable instructions, and the computer-executable instructions may be configured to cause the realistic driving image generating apparatus 110, when executed by the processor 120, to perform operations according to an exemplary embodiment.

The computer-readable storage medium 130 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. Computer-executable instructions or program code, program data, and/or other suitable forms of information may also be provided through the input/output interface 150 or the communication interface 160. The program 140 stored in the computer-readable storage medium 130 includes a set of instructions executable by the processor 120. In one embodiment, the computer-readable storage medium 130 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the realistic driving image generating apparatus 110 and can store desired information, or a suitable combination thereof.

The communication bus 170 interconnects the various other components of the realistic driving image generating apparatus 110, including the processor 120 and the computer-readable storage medium 130.

The realistic driving image generating apparatus 110 may also include one or more input/output interfaces 150, which provide an interface for one or more input/output devices, and one or more communication interfaces 160. The input/output interface 150 and the communication interface 160 are connected to the communication bus 170. An input/output device (not shown) may be connected to the other components of the realistic driving image generating apparatus 110 through the input/output interface 150.

The realistic driving image generating apparatus receives a virtual driving image and generates a real driving image from the virtual driving image through the first image generation model.

FIG. 2 is a diagram illustrating a virtual driving image processed by the realistic driving image generating apparatus according to an embodiment of the present invention, FIG. 3 is a diagram illustrating the first image generation model of the realistic driving image generating apparatus according to an embodiment of the present invention, and FIG. 4 is a diagram illustrating the first image generation model, second image generation model, first discrimination model, and second discrimination model applied to the realistic driving image generating apparatus according to an embodiment of the present invention.

The first image generation model (G_A->B) is a network that generates a real image from a virtual image.

The second image generation model (G_B->A) is a network that generates a virtual image from a real image.

The first discrimination model (D_A) is a network that discriminates virtual images.

The second discrimination model (D_B) is a network that discriminates real images.

The image generation model has a network structure including an encoder, a residual block connected to the encoder, and a decoder connected to the residual block.

The network model consists of multiple layers connected as a network and may include convolutional layers. A layer may include parameters, and the parameters of a layer include a set of learnable filters. The parameters include weights and/or biases between nodes. The network model updates the network weights in the direction that minimizes the loss function.

The encoder is a network that extracts features from an image. It performs dimensionality reduction.

The decoder is a network that reconstructs an image from features. It performs dimensionality expansion.

The residual block adds its input back to the output of the network structure before passing the result to the next layer, and may have a skip structure that connects the input of a layer directly to the output of the layer. It is a structure that creates a shortcut so that the input value can be added to the output value.
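The encoder, residual block, and decoder arrangement can be illustrated with a minimal sketch. It is written in plain Python on flat lists of numbers rather than image tensors, and every name in it (`residual_block`, `generator`, the toy `encoder` and `decoder` in the usage note) is hypothetical; the patent does not specify layer types, counts, or sizes.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def residual_block(x: Vector, layer: Layer) -> Vector:
    """Apply a layer and add the input back to its output (skip connection)."""
    y = layer(x)
    if len(y) != len(x):
        raise ValueError("layer must preserve the input size for the skip connection")
    return [yi + xi for yi, xi in zip(y, x)]


def generator(x: Vector, encoder: Layer, blocks: Sequence[Layer], decoder: Layer) -> Vector:
    """Encoder -> residual blocks -> decoder, in the order the text describes."""
    h = encoder(x)            # dimensionality reduction
    for layer in blocks:      # each block keeps a shortcut from its input
        h = residual_block(h, layer)
    return decoder(h)         # dimensionality expansion
```

As a toy usage, an encoder that averages neighboring pairs and a decoder that repeats each value twice turn `[1.0, 3.0, 5.0, 7.0]` into a two-element feature and back into a four-element output; a residual layer that returns zeros leaves the feature unchanged because only the shortcut contributes.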

The first image generation model is trained based on a total loss function including an adversarial loss function, a cycle-consistency loss function, and a context error loss function.

The adversarial loss function (L_adv) is calculated with the first image generation model connected to the first discrimination model to form a generative adversarial network and the second image generation model connected to the second discrimination model to form a generative adversarial network.

The cycle-consistency loss function (L_cyc) is calculated with the first image generation model connected to the second image generation model.

The context error loss function (L_ctx) is calculated with the first image generation model connected to the feature extraction model.
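One plausible way to write the total loss is as a weighted sum of the three terms. The text states only that the total loss includes L_adv, L_cyc, and L_ctx; the weights \lambda_{cyc} and \lambda_{ctx} below are assumptions introduced for illustration, and setting both to 1 gives a plain sum.

```latex
\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{ctx}\,\mathcal{L}_{ctx}
```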

The overall training process alternates between a step of training the discrimination networks (D_A, D_B) and a step of training the generation networks (G_A->B, G_B->A).
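The alternation can be sketched as a simple loop. The per-batch granularity and the callables `discriminator_step` and `generator_step` are assumptions made for illustration; the text says only that the two training steps alternate.

```python
from typing import Any, Callable, Iterable, List, Tuple

Step = Callable[[Any, Any], None]


def train_alternating(
    batches: Iterable[Tuple[Any, Any]],
    discriminator_step: Step,
    generator_step: Step,
) -> List[str]:
    """Alternate discriminator and generator updates over (virtual, real) batches.

    Returns the order in which the steps ran, which is handy for inspection.
    """
    order: List[str] = []
    for x_a, x_b in batches:
        discriminator_step(x_a, x_b)  # update D_A and D_B with generators held fixed
        order.append("D")
        generator_step(x_a, x_b)      # update G_A->B and G_B->A with discriminators held fixed
        order.append("G")
    return order
```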

FIG. 5 is a diagram illustrating an operation in which the realistic driving image generating apparatus according to an embodiment of the present invention trains on the adversarial loss function using the first image generation model and the first discrimination model.

The image discrimination network and the generation network are trained alternately.

The image discrimination network is trained to output real if the given image is from the training data set and fake if the image was generated by a network. The image generation network is trained to fool the discrimination network.

The adversarial loss function may include a first discrimination loss function set so that the first discrimination model classifies a real driving image input as real and classifies a real driving image generated from a virtual driving image input as fake.

The adversarial loss function may include a second discrimination loss function set so that the second discrimination model classifies a virtual driving image input as real and classifies a virtual driving image generated from a real driving image input as fake.

The adversarial loss function may include a first generation loss function set so that, for the first image generation model, a virtual driving image generated from a real driving image input is classified as real.

The adversarial loss function may include a second generation loss function set so that, for the second image generation model, a real driving image generated from a virtual driving image input is classified as real.
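The discrimination and generation losses can be sketched as follows. The text does not give their functional form, so this sketch assumes binary cross-entropy on discriminator output probabilities; other forms, such as a least-squares loss, would fit the description equally well. The function names are hypothetical.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], target: float) -> float:
    """Mean binary cross-entropy of predicted probabilities against one label."""
    eps = 1e-12
    total = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
    return total / len(probs)


def discriminator_loss(d_on_dataset: Sequence[float], d_on_generated: Sequence[float]) -> float:
    """Dataset images should score as real (1), generated images as fake (0)."""
    return bce(d_on_dataset, 1.0) + bce(d_on_generated, 0.0)


def generator_loss(d_on_generated: Sequence[float]) -> float:
    """The generator is rewarded when its outputs are scored as real (1)."""
    return bce(d_on_generated, 1.0)
```

Under this assumption, a discriminator that outputs 0.5 for a generated image gives a generator loss of log 2, and the generator loss falls toward zero as the discriminator is fooled more completely.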

FIG. 6 is a diagram illustrating an operation in which the realistic driving image generating apparatus according to an embodiment of the present invention trains on the cycle-consistency loss function using the first image generation model and the second image generation model.

The cycle-consistency loss function computes the error between the resulting image and the original image when a generated image is passed as input through the other generation network. The mean absolute pixel error of the images may be computed.

The cycle-consistency loss function may include a first cycle loss function defined as the difference between (i) the virtual driving image generated by translating, back through the second image generation model, the real driving image that the first image generation model generated from a virtual driving image input, and (ii) the virtual driving image input.

The cycle-consistency loss function may include a second cycle loss function defined as the difference between (i) the real driving image generated by translating, back through the first image generation model, the virtual driving image that the second image generation model generated from a real driving image input, and (ii) the real driving image input.
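The two cycle terms can be sketched with the mean absolute pixel error mentioned above. Images are represented as flat lists of pixel values for brevity, the generators are passed in as plain callables, and summing the two terms with equal weight is an assumption.

```python
from typing import Callable, List, Sequence

Image = List[float]
Generator = Callable[[Image], Image]


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(x_a: Image, x_b: Image, g_ab: Generator, g_ba: Generator) -> float:
    """First term: A -> B -> A against x_a. Second term: B -> A -> B against x_b."""
    first = mean_abs_error(g_ba(g_ab(x_a)), x_a)
    second = mean_abs_error(g_ab(g_ba(x_b)), x_b)
    return first + second
```

If `g_ba` exactly inverts `g_ab`, both round trips reproduce their inputs and the loss is zero; any residual difference after the round trip is penalized in proportion to its mean absolute size.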

FIG. 7 is a diagram illustrating an operation in which the realistic driving image generating apparatus according to an embodiment of the present invention trains on the context error loss function using the first image generation model and the feature extraction model.

The context error loss function uses a pretrained image feature extraction neural network to compute the similarity of the contextual information of the generated image.

The context error loss function may include a first context loss function defined as the similarity, computed through the feature extraction model, between (i) an input feature image converted from a virtual driving image input and (ii) an output feature image converted from the real driving image output generated from the virtual driving image input.

The context error loss function may include a second context loss function defined as the similarity, computed through the feature extraction model, between (i) the output feature image converted from the real driving image output and (ii) a target feature image converted from a target driving image for verification.

F is the feature extraction neural network; F_input is F(x_A), the features of an image from the input domain; F_output is F(G_A->B(x_A)), the features of the image translated to the target domain; and F_target is F(x_B), the features of an image from the target domain.

Each feature has the form of the original image reduced in size by a factor of N. For example, 3 * 564 * 1024 is converted to 512 * 35 * 64.
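The example dimensions are consistent with a reduction factor of N = 16 with rounding down, since 1024 / 16 = 64 and 564 / 16 = 35.25. The sketch below reproduces that arithmetic; the factor of 16, the floor rounding, and the function name are inferred from the example for illustration and are not stated in the text.

```python
from typing import Tuple


def feature_shape(
    height: int, width: int, channels: int = 512, factor: int = 16
) -> Tuple[int, int, int]:
    """Shape (channels, height, width) of a feature map downsampled by `factor`."""
    return (channels, height // factor, width // factor)


print(feature_shape(564, 1024))  # (512, 35, 64)
```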

The input-output context error evaluates whether the translation result preserves the overall scene of the input image well.

The output-target context error evaluates whether the translation result reflects the characteristics of the target domain well. For a real image translated from a virtual image, the translation result is compared with real driving images.
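The two context terms can be sketched on flattened feature vectors. The text defines them through a similarity between features without naming the measure, so this sketch assumes cosine similarity and turns it into a loss as one minus the similarity; both choices, and the function names, are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def context_loss(
    f_input: Sequence[float], f_output: Sequence[float], f_target: Sequence[float]
) -> float:
    """Input-output term plus output-target term, each as 1 - cosine similarity."""
    input_output = 1.0 - cosine_similarity(f_input, f_output)
    output_target = 1.0 - cosine_similarity(f_output, f_target)
    return input_output + output_target
```

Here `f_input`, `f_output`, and `f_target` stand for flattened versions of F(x_A), F(G_A->B(x_A)), and F(x_B). The first term is small when the translated image keeps the scene of the input, and the second is small when it resembles the target domain.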

FIG. 8 is a flowchart illustrating a realistic driving image generation method according to another embodiment of the present invention.

The realistic driving image generation method may be performed by a computing device and operates in the same manner as the realistic driving image generating apparatus.

In step S10, a virtual driving image is received.

In step S20, a real driving image is generated from the virtual driving image through the first image generation model.

The realistic driving image generation method may further include training the first image generation model before the step of generating the real driving image.

FIG. 9 is a diagram illustrating results of simulations performed according to embodiments of the present invention.

As shown in FIG. 9, similar or better performance is achieved even with fewer parameters.

실사 주행 영상 생성 장치는 하드웨어, 펌웨어, 소프트웨어 또는 이들의 조합에 의해 로직회로 내에서 구현될 수 있고, 범용 또는 특정 목적 컴퓨터를 이용하여 구현될 수도 있다. 장치는 고정배선형(Hardwired) 기기, 필드 프로그램 가능한 게이트 어레이(Field Programmable Gate Array, FPGA), 주문형 반도체(Application Specific Integrated Circuit, ASIC) 등을 이용하여 구현될 수 있다. 또한, 장치는 하나 이상의 프로세서 및 컨트롤러를 포함한 시스템온칩(System on Chip, SoC)으로 구현될 수 있다.The live-action driving image generating device may be implemented in a logic circuit by hardware, firmware, software, or a combination thereof, or may be implemented using a general-purpose or special-purpose computer. The device may be implemented using a hardwired device, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or the like. In addition, the device may be implemented as a System on Chip (SoC) including one or more processors and controllers.

실사 주행 영상 생성 장치는 하드웨어적 요소가 마련된 컴퓨팅 디바이스 또는 서버에 소프트웨어, 하드웨어, 또는 이들의 조합하는 형태로 탑재될 수 있다. 컴퓨팅 디바이스 또는 서버는 각종 기기 또는 유무선 통신망과 통신을 수행하기 위한 통신 모뎀 등의 통신장치, 프로그램을 실행하기 위한 데이터를 저장하는 메모리, 프로그램을 실행하여 연산 및 명령하기 위한 마이크로프로세서 등을 전부 또는 일부 포함한 다양한 장치를 의미할 수 있다.The live-action driving image generating apparatus may be installed in software, hardware, or a combination thereof in a computing device or server equipped with hardware elements. A computing device or server includes all or part of a communication device such as a communication modem for communicating with various devices or wired/wireless communication networks, a memory for storing data for executing a program, and a microprocessor for executing calculations and commands by executing a program. It can mean a variety of devices, including

도 8에서는 각각의 과정을 순차적으로 실행하는 것으로 기재하고 있으나 이는 예시적으로 설명한 것에 불과하고, 이 분야의 기술자라면 본 발명의 실시예의 본질적인 특성에서 벗어나지 않는 범위에서 도 8에 기재된 순서를 변경하여 실행하거나 또는 하나 이상의 과정을 병렬적으로 실행하거나 다른 과정을 추가하는 것으로 다양하게 수정 및 변형하여 적용 가능할 것이다.In FIG. 8, it is described that each process is sequentially executed, but this is merely an example, and a person skilled in the art changes and executes the sequence described in FIG. 8 within the range not departing from the essential characteristics of the embodiment of the present invention Alternatively, it will be possible to apply various modifications and variations by executing one or more processes in parallel or adding another process.

Operations according to the present embodiments may be implemented in the form of program instructions executable through various computer means and recorded on a computer-readable medium. A computer-readable medium refers to any medium that participates in providing instructions to a processor for execution. The computer-readable medium may include program instructions, data files, data structures, or a combination thereof. Examples include magnetic media, optical recording media, and memory. The computer program may also be distributed across networked computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present embodiment can be readily inferred by programmers in the technical field to which the present embodiment belongs.

The present embodiments are intended to explain the technical idea of the present embodiment, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

Claims

A method for generating a realistic driving image using a computing device, the method comprising:
receiving a virtual driving image; and
generating a real driving image from the virtual driving image through a first image generation model,
wherein the first image generation model has a network structure including an encoder, a residual block connected to the encoder, and a decoder connected to the residual block.
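The data flow of the claimed network structure (encoder, then residual block, then decoder) can be illustrated with a deliberately tiny sketch. Everything below is a hypothetical stand-in and not the patented model: a 1-D list plays the role of an image, pair averaging plays the role of the encoder, repetition plays the role of the decoder, and the `transform` argument is an assumed placeholder for learned layers.

```python
def encode(signal):
    """Toy encoder: halve the resolution by averaging adjacent pairs."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def residual_block(features, transform):
    """Toy residual block: add the transformed features back onto the input."""
    return [x + transform(x) for x in features]


def decode(features):
    """Toy decoder: restore the resolution by repeating each feature twice."""
    output = []
    for x in features:
        output.extend([x, x])
    return output


def generate(signal):
    """Encoder -> residual block -> decoder, in the order the claim lists them."""
    hidden = encode(signal)
    hidden = residual_block(hidden, lambda x: 0.5 * x)  # placeholder for learned layers
    return decode(hidden)
```

The only point of the sketch is the ordering and the skip connection: the residual block receives the encoder output, adds a transformation of it to itself, and hands the result to the decoder, which returns to the input resolution.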

The method of claim 1,
wherein the first image generation model is trained to minimize a total loss function comprising:
(i) an adversarial loss function calculated by connecting the first image generation model to a first discrimination model to form a generative adversarial network, and connecting a second image generation model to a second discrimination model to form a generative adversarial network;
(ii) a cycle consistency loss function calculated by connecting the first image generation model to the second image generation model; and
(iii) a context error loss function calculated by connecting the first image generation model to a feature extraction model.
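One common way to combine three such terms is a weighted sum. The form below is only a sketch: the claim states that the total loss includes the three terms, and the weights $\lambda_{cyc}$ and $\lambda_{ctx}$ are hypothetical symbols introduced here for illustration.

```latex
\mathcal{L}_{total}
  = \mathcal{L}_{adv}
  + \lambda_{cyc}\,\mathcal{L}_{cyc}
  + \lambda_{ctx}\,\mathcal{L}_{ctx}
```

Here $\mathcal{L}_{adv}$ denotes the adversarial loss, $\mathcal{L}_{cyc}$ the cycle consistency loss, and $\mathcal{L}_{ctx}$ the context error loss.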

The method of claim 2,
wherein the adversarial loss function comprises:
(i) a first discrimination loss function set so that the first discrimination model classifies an input real driving image as real and classifies a real driving image generated from an input virtual driving image as fake;
(ii) a second discrimination loss function set so that the second discrimination model classifies an input virtual driving image as real and classifies a virtual driving image generated from an input real driving image as fake;
(iii) a first generation loss function set so that, for the first image generation model, a virtual driving image generated from an input real driving image is classified as real; and
(iv) a second generation loss function set so that, for the second image generation model, a real driving image generated from an input virtual driving image is classified as real.
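The discrimination and generation losses share one pattern, which the following minimal sketch illustrates. It assumes binary cross-entropy over discriminator outputs in the range (0, 1); the claim does not name a specific loss form, so this choice and the function names `bce`, `discrimination_loss`, and `generation_loss` are assumptions made for illustration.

```python
import math


def bce(prob, label):
    """Binary cross-entropy for a single discriminator output in (0, 1)."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


def discrimination_loss(probs_on_inputs, probs_on_generated):
    """Discriminator objective: input images labeled real (1), generated images fake (0)."""
    real_term = sum(bce(p, 1.0) for p in probs_on_inputs) / len(probs_on_inputs)
    fake_term = sum(bce(p, 0.0) for p in probs_on_generated) / len(probs_on_generated)
    return real_term + fake_term


def generation_loss(probs_on_generated):
    """Generator objective: generated images should be classified as real (1)."""
    return sum(bce(p, 1.0) for p in probs_on_generated) / len(probs_on_generated)
```

The same two functions would be applied once per domain: each discrimination model is scored on input and generated images of its own domain, while the corresponding generation loss is small when the generated images are scored as real.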

The method of claim 2,
wherein the cycle consistency loss function comprises:
a first cycle loss function defined as the difference between (i) a virtual driving image obtained by converting, through the second image generation model, the real driving image that the first image generation model generated from an input virtual driving image and (ii) the input virtual driving image; and
a second cycle loss function defined as the difference between (i) a real driving image obtained by converting, through the first image generation model, the virtual driving image that the second image generation model generated from an input real driving image and (ii) the input real driving image.
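The two round trips described in this claim can be sketched as follows. The claim speaks only of a "difference", so the mean absolute (L1) difference used here is an assumption, as are the flat-list image representation and the toy generators `brighten`, `darken`, and `darken_too_much`, which are hypothetical stand-ins for the two image generation models.

```python
def l1_difference(image_a, image_b):
    """Mean absolute difference between two equally sized flat images."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same size")
    return sum(abs(a - b) for a, b in zip(image_a, image_b)) / len(image_a)


def cycle_losses(virtual_input, real_input, g_virtual_to_real, g_real_to_virtual):
    """Return (first cycle loss, second cycle loss)."""
    # virtual -> real -> virtual, compared with the original virtual input
    generated_real = g_virtual_to_real(virtual_input)
    reconstructed_virtual = g_real_to_virtual(generated_real)
    first = l1_difference(reconstructed_virtual, virtual_input)
    # real -> virtual -> real, compared with the original real input
    generated_virtual = g_real_to_virtual(real_input)
    reconstructed_real = g_virtual_to_real(generated_virtual)
    second = l1_difference(reconstructed_real, real_input)
    return first, second


# Hypothetical toy generators operating on flat pixel lists.
def brighten(image):
    return [p + 0.25 for p in image]


def darken(image):
    return [p - 0.25 for p in image]


def darken_too_much(image):
    return [p - 0.5 for p in image]
```

When the two generators are exact inverses (`brighten` and `darken`), both losses are zero; pairing `brighten` with `darken_too_much` leaves a residual difference in both directions, which is what the cycle consistency term penalizes.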

The method of claim 2,
wherein the context error loss function comprises:
a first context loss function defined as a similarity between (i) an input feature image converted, through the feature extraction model, from an input virtual driving image and (ii) an output feature image converted, through the feature extraction model, from the output real driving image generated from the input virtual driving image; and
a second context loss function defined as a similarity between (i) the output feature image converted, through the feature extraction model, from the output real driving image and (ii) a target feature image converted from a target driving image for verification.
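The structure of the two context loss terms can be sketched as below. The claim defines each term as a similarity between feature images but does not fix the measure, so the use of one minus cosine similarity is an assumption, and `toy_extractor` is a hypothetical stand-in for the feature extraction model (in practice typically a pretrained network).

```python
import math


def cosine_similarity(features_a, features_b):
    """Cosine similarity between two flat feature vectors."""
    dot = sum(a * b for a, b in zip(features_a, features_b))
    norm_a = math.sqrt(sum(a * a for a in features_a))
    norm_b = math.sqrt(sum(b * b for b in features_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature vector as dissimilar
    return dot / (norm_a * norm_b)


def context_losses(virtual_input, generated_real, target_image, extract_features):
    """Return (first context loss, second context loss) as 1 - cosine similarity."""
    input_features = extract_features(virtual_input)
    output_features = extract_features(generated_real)
    target_features = extract_features(target_image)
    first = 1.0 - cosine_similarity(input_features, output_features)
    second = 1.0 - cosine_similarity(output_features, target_features)
    return first, second


def toy_extractor(image):
    """Hypothetical feature extractor: differences between adjacent pixels."""
    return [image[i + 1] - image[i] for i in range(len(image) - 1)]
```

With this toy extractor, a generated image that differs from the input only by a constant brightness offset keeps the same features, so the first term is close to zero, while the second term grows as the generated image's features diverge from those of the target image.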