KR102091643B1

KR102091643B1 - Apparatus for processing image using artificial neural network, method thereof and computer recordable medium storing program to perform the method

Info

Publication number: KR102091643B1
Application number: KR1020180046511A
Authority: KR
Inventors: 권택순; 정윤희; 강동구
Original assignee: (주)이스트소프트; 주식회사 딥아이
Priority date: 2018-04-23
Filing date: 2018-04-23
Publication date: 2020-03-20
Also published as: KR20190122955A

Abstract

The present invention relates to an apparatus for generating a glasses-wearing image using an artificial neural network, a method therefor, and a computer-readable recording medium on which a program for performing the method is recorded. The apparatus includes: a generator that, when an original wearing image is input, generates a processed non-wearing image and a processed glasses mask from the original wearing image through a plurality of operations to which weights are applied; a global discriminator that, when either an original non-wearing image or the processed non-wearing image is input, outputs whether the input image is real or processed; a local discriminator that outputs whether an input local image is real or processed; and a learning unit. The learning unit trains the global discriminator to output that the image is real when the original non-wearing image is input and that it is processed when the processed non-wearing image is input, trains the local discriminator to output that the image is real when an original local image is input and that it is processed when a processed local image is input, and trains the generator so that the global discriminator outputs the input processed non-wearing image as real and the local discriminator outputs the input processed local image as real. The invention also provides the corresponding method and a computer-readable recording medium on which a program for performing the method is recorded.

Description

Apparatus for generating a glasses-wearing image using an artificial neural network, method therefor, and computer-readable recording medium on which a program for performing the method is recorded {Apparatus for processing image using artificial neural network, method thereof and computer recordable medium storing program to perform the method}

The present invention relates to image processing technology and, more particularly, to an apparatus for image processing that synthesizes a glasses-wearing image onto a face image, a method therefor, and a computer-readable recording medium on which a program for performing the method is recorded.

The concept of artificial intelligence first appeared at the Dartmouth Conference held in 1956 by Professor John McCarthy, then at Dartmouth College in the United States, and the field has grown explosively in recent years. In particular, this growth has accelerated since 2015 with the introduction of GPUs that provide fast and powerful parallel processing. The explosive increase in storage capacity and the arrival of the 'big data' era, in which data of every kind (images, text, mapping data, and so on) has become abundant, have also contributed greatly to this growth. In 1956, the pioneers of artificial intelligence ultimately wanted to build complex computers with characteristics similar to human intelligence. Artificial intelligence that thinks like a human, with human senses and reasoning ability, is called 'General AI', but the artificial intelligence that can be built at the current level of technology falls under the concept of 'Narrow AI'. Narrow AI is characterized by being able to perform a specific task, such as image classification for social media or face recognition, at a level exceeding human ability.

Machine learning, meanwhile, basically uses algorithms to analyze data, learns from that analysis, and makes judgments or predictions based on what it has learned. Its ultimate goal is therefore not to hand-code specific instructions for decision criteria into software, but to 'train' the computer itself with large amounts of data and algorithms so that it learns how to perform a task. The artificial neural network, another algorithm devised by early machine learning researchers, was inspired by the biological characteristics of the human brain, in particular the way neurons are connected. Unlike the brain, however, in which any physically nearby neurons can connect to one another, an artificial neural network has fixed layer connections and a fixed direction of data propagation. For example, if an image is cut into many tiles and fed to the first layer of a neural network, the neurons pass the data on to the next layer, and this is repeated until the last layer produces the final output. Each neuron is assigned a weight that indicates the accuracy of its input with respect to the task being performed, and the final output is determined by summing all the weights. Deep learning is a form of artificial intelligence that developed from artificial neural networks, and it learns from data using layers of information input and output similar to the neurons of the brain. The advent of deep learning has made machine learning more practical and expanded the scope of artificial intelligence. Deep learning breaks tasks down in every way that a computer system can support.

[Prior Art Documents]

[Patent Documents]

Korean Patent Laid-Open Publication No. 2013-0103153, published September 23, 2013 (Title: Method and system for virtual fitting of customer-tailored glasses and contact lenses)

An object of the present invention is to provide an apparatus that erases the existing glasses from an image of a person wearing glasses and virtually generates an image of that person wearing new glasses selected by the user, so that a glasses wearer can view a virtual fitting of the selected glasses without taking off the glasses being worn, together with a method therefor and a computer-readable recording medium on which a program for performing the method is recorded.

To achieve the above object, an apparatus for generating a glasses-wearing image using an artificial neural network according to a preferred embodiment of the present invention includes: a generator that, when an original wearing image, which is an image of a person actually wearing glasses, is input, generates through a plurality of operations to which weights are applied a processed non-wearing image, which is an image of the person not wearing glasses processed from the original wearing image, and a processed glasses mask, which indicates the region of the processed non-wearing image obtained by processing the region where the glasses are located in the original wearing image; a global discriminator that, when either an original non-wearing image, which is an image of a person actually not wearing glasses, or the processed non-wearing image is input, outputs whether the input image is real or processed through a plurality of operations in which weights are applied to the input image; a local discriminator that receives either an original local image, which is a partial region of the original non-wearing image, or a processed local image, which is a partial region of the processed non-wearing image, and outputs whether the input local image is real or processed through a plurality of operations in which weights are applied to the input local image; and a learning unit that trains the global discriminator to output that the image is real when the original non-wearing image is input and that it is processed when the processed non-wearing image is input, trains the local discriminator to output that the image is real when the original local image is input and that it is processed when the processed local image is input, and trains the generator so that the global discriminator outputs the input processed non-wearing image as real and the local discriminator outputs the input processed local image as real.
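The local images described above are partial regions of a full image. As a minimal sketch only, assuming an image stored as nested Python lists of RGB tuples and a rectangular region (the function name `crop_local` and the box parameters are hypothetical and do not appear in the specification), extracting such a region could look like this:

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]      # rows of pixels

def crop_local(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the rectangular sub-region that a local discriminator would receive.

    The rectangular box is an assumption; the specification only says that a
    local image is a partial region of the full image.
    """
    if top < 0 or left < 0 or top + height > len(image) or left + width > len(image[0]):
        raise ValueError("crop box falls outside the image")
    return [row[left:left + width] for row in image[top:top + height]]

# 4x4 dummy image in which each pixel encodes its own (row, column) position.
full = [[(r, c, 0) for c in range(4)] for r in range(4)]
local = crop_local(full, top=1, left=1, height=2, width=2)
```

The same helper would apply to either the original non-wearing image or the processed non-wearing image, giving the original local image or the processed local image respectively.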

The processed non-wearing image consists of RGB values corresponding to each pixel of the original wearing image, and the processed glasses mask consists of flag values, one for each pixel of the original wearing image, each indicating whether the pixel belongs to the region obtained by processing the region where the glasses are located in the original wearing image.

The apparatus further includes a camera unit that photographs a user wearing glasses to generate the user's original wearing image, and an image generation unit that inputs the user's original wearing image to the generator to generate the user's processed non-wearing image.

The apparatus further includes a display unit that displays a plurality of glasses images.

The image generation unit synthesizes the glasses image selected by the user from among the plurality of glasses images onto the user's processed non-wearing image to generate a composite wearing image, and displays the generated composite wearing image through the display unit.
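The specification does not state how the selected glasses image is synthesized onto the glasses-free image. As one possible illustration, assuming the glasses image carries an alpha channel and is pasted at a given offset (the function `composite_glasses`, the RGBA layout, and the offset parameters are all hypothetical), the compositing step could be sketched as follows:

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]   # alpha in 0..255; 0 means fully transparent

def composite_glasses(face: List[List[RGB]],
                      glasses: List[List[RGBA]],
                      top: int, left: int) -> List[List[RGB]]:
    """Alpha-blend `glasses` onto a copy of `face` at offset (top, left)."""
    out = [list(row) for row in face]            # leave the input image untouched
    for gy, grow in enumerate(glasses):
        for gx, (r, g, b, a) in enumerate(grow):
            y, x = top + gy, left + gx
            if not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue                          # skip pixels outside the face image
            dst = out[y][x]
            # Integer alpha blend: a=255 keeps the glasses pixel, a=0 keeps the face pixel.
            out[y][x] = tuple((a * s + (255 - a) * d) // 255
                              for s, d in zip((r, g, b), dst))
    return out

face = [[(200, 200, 200)] * 3 for _ in range(2)]  # stand-in for the glasses-free image
frame = [[(0, 0, 0, 255), (0, 0, 0, 0)]]          # one opaque pixel, one transparent pixel
worn = composite_glasses(face, frame, top=0, left=1)
```

In practice the offset would have to be aligned with the face, which this sketch leaves to the caller.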

The learning unit alternately repeats the procedure of training the global discriminator and the local discriminator and the procedure of training the generator.
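The alternation described above can be pictured as a simple loop. The following is only a schematic sketch: the two training procedures are passed in as placeholder callables, the name `alternate_training` is hypothetical, and starting each round with the discriminators is an assumption rather than something the text fixes.

```python
from typing import Callable, List

def alternate_training(rounds: int,
                       train_discriminators: Callable[[], None],
                       train_generator: Callable[[], None]) -> None:
    """Alternate the two training procedures for a fixed number of rounds."""
    for _ in range(rounds):
        train_discriminators()   # modifies only global and local discriminator weights
        train_generator()        # modifies only generator weights

# Placeholder procedures that just record the order in which they are called.
log: List[str] = []
alternate_training(2, lambda: log.append("D"), lambda: log.append("G"))
```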

When training the global discriminator and the local discriminator, the learning unit modifies the weights of the global discriminator so that the global discriminator outputs that the image is processed when the processed non-wearing image is input, and modifies the weights of the local discriminator so that the local discriminator outputs that the image is processed when the processed local image is input.

When training the generator, the learning unit modifies the weights of the generator so that the global discriminator outputs that the image is real when the processed non-wearing image is input and the local discriminator outputs that the image is real when the processed local image is input.
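The two training procedures differ only in which output is treated as the target for a processed image. The specification does not name a loss function; as a sketch under the assumption that each discriminator outputs a probability of "real" and that binary cross-entropy is used (both assumptions, with hypothetical function names), the targets could be expressed as:

```python
import math

REAL, PROCESSED = 1.0, 0.0   # assumed encoding of the discriminator's two outputs

def bce(p: float, target: float) -> float:
    """Binary cross-entropy for one probability; clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(p_on_original: float, p_on_processed: float) -> float:
    """Discriminator step: originals should score REAL, generated images PROCESSED."""
    return bce(p_on_original, REAL) + bce(p_on_processed, PROCESSED)

def generator_loss(p_global_on_processed: float, p_local_on_processed: float) -> float:
    """Generator step: both discriminators should score the generated images REAL."""
    return bce(p_global_on_processed, REAL) + bce(p_local_on_processed, REAL)
```

`discriminator_loss` would be evaluated once for the global discriminator and once for the local discriminator; lowering it corresponds to modifying discriminator weights, while lowering `generator_loss` corresponds to modifying generator weights.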

To achieve the above object, a method for generating a glasses-wearing image using an artificial neural network according to a preferred embodiment of the present invention includes: a step in which a generator, when an original wearing image, which is an image of a person actually wearing glasses, is input, generates through a plurality of operations to which weights are applied a processed non-wearing image, which is an image of the person not wearing glasses processed from the original wearing image, and a processed glasses mask, which indicates the region of the processed non-wearing image obtained by processing the region where the glasses are located in the original wearing image; a step in which a learning unit trains a global discriminator and a local discriminator so that the global discriminator outputs that the image is real when an original non-wearing image, which is an image of a person actually not wearing glasses, is input and that it is processed when the processed non-wearing image is input, and so that the local discriminator outputs that the image is real when an original local image, which is a partial region of the original non-wearing image, is input and that it is processed when a processed local image, which is a partial region of the processed non-wearing image, is input; and a step in which the learning unit trains the generator so that the global discriminator outputs the processed non-wearing image as real and the local discriminator outputs the processed local image as real.

The method further includes a step in which a control unit photographs a user wearing glasses through a camera unit to generate the user's original wearing image, and a step in which an image generation unit inputs the user's original wearing image to the generator to generate the user's processed non-wearing image.

The method further includes a step in which the image generation unit synthesizes a glasses image selected from among a plurality of glasses images onto the user's processed non-wearing image to generate a composite wearing image, and a step in which the image generation unit displays the generated composite wearing image through a display unit.

In particular, the step of training the global discriminator and the local discriminator and the step of training the generator are repeated alternately.

In the step of training the global discriminator and the local discriminator, the learning unit modifies the weights of the global discriminator so that the global discriminator outputs that the image is processed when the processed non-wearing image is input, and modifies the weights of the local discriminator so that the local discriminator outputs that the image is processed when the processed local image is input.

In the step of training the generator, the learning unit modifies the weights of the generator so that the global discriminator outputs that the image is real when the processed non-wearing image is input and the local discriminator outputs that the image is real when the processed local image is input.

To achieve the above object, there is also provided a computer-readable recording medium on which a program for performing the method for generating a glasses-wearing image according to the above-described embodiment of the present invention is recorded.

If new glasses are simply synthesized onto an image of a user wearing glasses without erasing the existing glasses, the worn glasses and the new glasses overlap and the resulting image looks unnatural. The user therefore has to be photographed without glasses before an image with the new glasses is synthesized. However, a glasses wearer with poor eyesight sees objects blurrily without glasses, so to check the image with the new glasses the wearer must either look at a blurry image or put the glasses back on, which is inconvenient. According to the present invention, the glasses are erased from the glasses-wearing image and an image of the user wearing the new glasses is then provided. A glasses wearer therefore does not need to take off the glasses to be photographed, and the inconvenience of putting the glasses back on to check the image with the new glasses is also eliminated.

FIG. 1 is a diagram illustrating the configuration of an apparatus for generating a glasses-wearing image using an artificial neural network according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the detailed configuration of the artificial neural network of the apparatus for generating a glasses-wearing image using an artificial neural network according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the output values of the generator of the artificial neural network according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a method of training the global discriminator and the local discriminator according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method of training the global discriminator and the local discriminator according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a method of training the generator according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method of training the generator according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating a method of generating a glasses-wearing image using an artificial neural network according to an embodiment of the present invention.

Before the detailed description of the present invention, it should be noted that the terms and words used in this specification and the claims should not be interpreted as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that there may be various equivalents and modifications that could replace them at the time of filing this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals wherever possible in the accompanying drawings. Detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each component does not entirely reflect its actual size.

First, the configuration of an apparatus for generating a glasses-wearing image using an artificial neural network according to an embodiment of the present invention will be described. FIG. 1 is a diagram illustrating the configuration of the apparatus for generating a glasses-wearing image using an artificial neural network according to an embodiment of the present invention. FIG. 2 is a diagram illustrating the detailed configuration of the artificial neural network of that apparatus.

Referring to FIG. 1, an apparatus for generating a glasses-wearing image using an artificial neural network according to an embodiment of the present invention (hereinafter referred to as the 'image generating apparatus') includes a camera unit 110, an input unit 120, a display unit 130, a storage unit 140, and a control unit 150.

The camera unit 110 captures an image of the user, that is, an image of the user wearing glasses. The camera unit 110 includes an image sensor. The image sensor receives light reflected from the subject and converts it into an electrical signal, and may be implemented based on a CCD (Charge-Coupled Device), a CMOS (Complementary Metal-Oxide Semiconductor) sensor, or the like. The camera unit 110 may further include an analog-to-digital converter, which converts the electrical signal output from the image sensor into a digital sequence and outputs it to the control unit 150.

The input unit 120 receives the user's key operations for controlling the various functions and operations of the user device 100, generates an input signal, and transmits it to the control unit 150. Examples of the input unit 120 include special keys, a keypad, a keyboard, a mouse, and a trackball. In particular, the input unit 120 may include at least one of a power key for turning the power on and off, character keys, number keys, and direction keys. When the display unit 130 is implemented as a touch screen, the functions of the input unit 120 may be performed by the display unit 130, and if the display unit 130 alone can perform all of these functions, the input unit 120 may be omitted.

The display unit 130 may receive data for screen display from the control unit 150 and display the received data on a screen. The display unit 130 may also visually present menus, data, function setting information, and various other information of the user device 100 to the user. When the display unit 130 is formed as a touch screen, it may perform some or all of the functions of the input unit 120 instead. The display unit 130 may be formed of a liquid crystal display (LCD), organic light emitting diodes (OLED), active matrix organic light emitting diodes (AMOLED), or the like.

The storage unit 140 stores the various data and applications required for the operation of the user device 100, as well as the various data generated by its operation. The storage unit 140 may be storage, memory, or the like. The storage unit 140 may store an operating system (OS) for booting and operating the user device 100 and an application for providing a game according to an embodiment of the present invention. The various data stored in the storage unit 140 may be deleted, changed, or added to according to the user's operation.

The control unit 150 may control the overall operation of the user device 100 and the signal flow between the internal blocks of the user device 100, and may perform a data processing function for processing data. The control unit 150 may be a central processing unit (CPU), an application processor, a graphics processing unit (GPU), or the like.

The control unit 150 includes an artificial neural network 200, a learning unit 300, and an image generation unit 400. The artificial neural network 200, the learning unit 300, and the image generation unit 400 may be implemented in hardware as components of the control unit 150, or may run on the control unit 150 in software. The operation of the control unit 150 including the artificial neural network 200, the learning unit 300, and the image generation unit 400 is described in more detail below.

Although not shown, the user device 100 according to an embodiment of the present invention may also include a storage medium slot into which an external storage medium such as a memory card is inserted to enable data storage, a connection terminal for exchanging data with an external digital device, and a charging terminal. The user device 100 may optionally further include units with additional functions, such as an audio processing unit that inputs or outputs audio signals through a microphone and a speaker, and an MP3 module for playing digital audio. Given the trend toward convergence of digital devices, portable devices vary too widely for every variation to be listed, but those of ordinary skill in the art will readily understand that units equivalent to those mentioned above may additionally be included in the user device 100 according to the present invention.

Next, the artificial neural network 200 according to an embodiment of the present invention will be described in more detail. FIG. 2 is a diagram illustrating the configuration of the artificial neural network according to an embodiment of the present invention. FIG. 3 is a diagram illustrating the output values of the generator of the artificial neural network according to an embodiment of the present invention.

Referring to FIG. 2, the artificial neural network 200 includes a generator 210, a global discriminator 220, and a local discriminator 230.

The generator 210, the global discriminator 220, and the local discriminator 230 may each be an independent artificial neural network. Accordingly, each of the generator 210, the global discriminator 220, and the local discriminator 230 consists of a plurality of layers, and each of the layers includes a plurality of operations to which weights are applied. Examples of the layers include a convolution layer, a deconvolution layer, a pooling layer, and a fully-connected layer. Examples of the operations include a convolution operation, a deconvolution operation, a max-pooling operation, a min-pooling operation, and a soft-max operation. Each of these operations involves weights. For example, convolution and pooling operations use a filter, the filter is a matrix, and the value of each element of the matrix may be a weight.
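To make the role of the filter weights concrete, the following is a minimal pure-Python sketch of a single-channel convolution operation in which every element of the filter matrix acts as a weight. It uses the convention common in neural network libraries (no filter flip, no padding, stride 1); these choices, the function name `conv2d_valid`, and the example values are illustrative assumptions and are not taken from the specification.

```python
from typing import List

Matrix = List[List[float]]

def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide `kernel` over `image` and sum the element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]   # each kernel[i][j] is a weight
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]
kernel = [[1.0, 0.0],
          [0.0, -1.0]]        # hypothetical 2x2 filter; training would adjust these values
feature_map = conv2d_valid(image, kernel)   # [[-4.0, -4.0], [-4.0, -4.0]]
```

Training a network amounts to adjusting the entries of such filter matrices across all layers.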

When the original wearing image 10 is input, the generator 210 outputs a processed unworn image 30 and a processed glasses mask 35 through a plurality of operations to which weights are applied. Here, the original wearing image 10 is an image of a person actually wearing glasses. The processed unworn image 30 is an image, processed from the original wearing image 10, of the person not wearing glasses. The processed glasses mask 35 indicates the region of the processed unworn image 30 that was produced by processing the region where the glasses are located in the original wearing image 10.

Describing this in more detail with reference to FIG. 3, for each pixel of the original wearing image 10 the generator 210 outputs a pixel value (e.g., an RGB value) and a flag value (e.g., bit 0 or 1) through a plurality of operations to which weights are applied. The processed unworn image 30 consists of the pixel values (e.g., RGB values) output for each pixel of the original wearing image 10. The processed glasses mask 35 consists of the flag values (e.g., bit 0 or 1) output for each pixel of the original wearing image 10. For example, the pixel value of pixel P11 and the flag value F1 may be output for pixel P01 of the original wearing image 10, and the pixel value of pixel P12 and the flag value F2 may be output for pixel P02. That is, the flag value indicates whether a pixel was generated from a region where the glasses are located in the original wearing image 10 (pixel P01), as with flag value F1 in FIG. 3, or from a region where the glasses are not located (pixel P02), as with flag value F2. As illustrated, the flag value F1 corresponding to pixel P11 is bit 1, which marks the region of the processed unworn image 30 produced by processing the region where the glasses are located in the original wearing image 10. The flag value F2 corresponding to pixel P12 is bit 0, which marks the remainder of the processed unworn image 30.
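The per-pixel output described above can be pictured as four values per pixel: three for the RGB pixel value and one for the flag. The sketch below, which assumes a nested-list layout and the hypothetical helper name `split_generator_output` (neither appears in the specification), separates such an output into the processed unworn image 30 and the processed glasses mask 35.

```python
def split_generator_output(output):
    """Split rows of (R, G, B, flag) tuples into an RGB image and a 0/1 mask."""
    image = [[(r, g, b) for (r, g, b, _flag) in row] for row in output]
    mask = [[flag for (_r, _g, _b, flag) in row] for row in output]
    return image, mask


# Toy 1x2 output: the first pixel (like P11) came from the glasses region, flag 1;
# the second pixel (like P12) came from outside it, flag 0. Values are made up.
generator_output = [[(200, 170, 150, 1), (90, 60, 40, 0)]]
unworn_image, glasses_mask = split_generator_output(generator_output)
```

Here `unworn_image` holds only pixel values and `glasses_mask` holds only flags, mirroring how image 30 and mask 35 are formed from the same generator output.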

The generator 210 thus outputs not only the processed unworn image 30 but also the processed glasses mask 35 from the original wearing image 10 through a plurality of operations to which weights are applied. Through training, the generator 210 therefore learns, in addition to the three pixel-value channels, namely the R (red), G (green), and B (blue) channels, a further channel for the flag value (mask) indicating whether each pixel belongs to the glasses region. Compared with learning only the three RGB channels, this lets the generator learn more clearly the features of the region the glasses occupied once they are removed, and generate a more natural processed unworn image.

When either an original unworn image or a processed unworn image is input, the global discriminator 220 outputs whether the input image is real or processed, through a plurality of operations in which weights are applied to the input image. Here, the original unworn image is an image of a person who is actually not wearing glasses, and the processed unworn image is an image generated by the generator 210.

The local discriminator 230 receives an original local image, which is a local image extracted from an arbitrary partial region of the original unworn image, or a processed local image, which is a local image extracted from an arbitrary partial region of the processed unworn image, and outputs whether the input local image is real or processed, through a plurality of operations in which weights are applied to the input local image. The position and size of the original local image within the original unworn image, and the position and size of the processed local image within the processed unworn image, are determined randomly.
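The random choice of position and size for a local image can be sketched as follows. The function name `random_local_crop`, the `min_size` parameter, and the uniform sampling are assumptions for illustration; the specification states only that position and size are random.

```python
import random


def random_local_crop(image, rng, min_size=1):
    """Cut a rectangular patch of random size and position out of `image` (a list of rows)."""
    height, width = len(image), len(image[0])
    crop_h = rng.randint(min_size, height)   # random patch height
    crop_w = rng.randint(min_size, width)    # random patch width
    top = rng.randint(0, height - crop_h)    # random position that keeps the patch inside
    left = rng.randint(0, width - crop_w)
    patch = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    return patch, (top, left, crop_h, crop_w)


# Toy 4x6 "image" whose pixels record their own coordinates, to make the crop easy to check.
toy_image = [[(r, c) for c in range(6)] for r in range(4)]
local_image, (top, left, crop_h, crop_w) = random_local_crop(toy_image, random.Random(0))
```

The same helper would apply to an original unworn image 20 (giving an original local image 23) or to a processed unworn image 30 (giving a processed local image 33).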

Next, a method for generating a glasses-wearing image using the artificial neural network according to an embodiment of the present invention will be described in more detail. To use the artificial neural network 200, the present invention first trains it. This training method is described below.

The present invention alternates between training the global discriminator 220 and the local discriminator 230 and training the generator 210.

First, training of the global discriminator 220 and the local discriminator 230 is described. FIG. 4 is a diagram illustrating a training method for the global discriminator and the local discriminator according to an embodiment of the present invention. FIG. 5 is a flowchart illustrating the training method for the global discriminator and the local discriminator according to an embodiment of the present invention.

Referring first to FIG. 4, original images and processed images may be used as training data for the discriminators, that is, the global discriminator 220 and the local discriminator 230. The training data includes the original wearing image 10, the original unworn image 20, the original local image 23, the processed unworn image 30, and the processed local image 33. The original wearing image 10 is an image of a person wearing glasses, and the original unworn image 20 is an image of a person not wearing glasses. The original local image 23 is a part of the original unworn image 20 extracted from it. The original wearing image 10 and the original unworn image 20 are images captured of real people, and the original local image 23 is generated from the original unworn image 20. In contrast, the processed unworn image 30 and the processed local image 33 are images processed from the original wearing image 10 by the generator 210. That is, when the original wearing image 10 is input, the generator 210, through a plurality of operations, removes the glasses from the original wearing image 10, fills the removed portion with the part of the face that would be visible if the glasses were not worn, and outputs the resulting processed unworn image 30. In doing so, the generator 210 outputs an RGB value for each pixel of the original wearing image 10, so the processed unworn image 30 consists of the RGB values output from the generator 210 for each pixel of the original wearing image 10. The generator 210 also generates the processed glasses mask 35, which indicates the portion of the processed unworn image 30 where the glasses were removed and the face was filled in, that is, the region produced by processing the region where the glasses are located in the original wearing image 10. The processed local image 33 is a part of the processed unworn image 30 extracted from it independently of the processed glasses mask 35. That is, the processed local image 33 is generated from the processed unworn image 30.

Referring to FIGS. 4 and 5, in step S110 the learning unit 300 inputs training data to the global discriminator 220 and the local discriminator 230. The learning unit 300 may input either the original unworn image 20 or the processed unworn image 30 to the global discriminator 220 as training data. The learning unit 300 may also extract the original local image 23 from the original unworn image 20, or the processed local image 33 from the processed unworn image 30, and input either of these local images to the local discriminator 230 as training data. The original unworn image 20, and the original local image 23 extracted from it, may be taken from images stored in advance in the storage unit 140. The processed unworn image 30, and the processed local image 33 extracted from it, are produced by inputting the original wearing image 10 to the generator 210.

When the training data is input, in step S120 the global discriminator 220 and the local discriminator 230 each output, through a plurality of operations, an output value indicating whether the input training data is real or processed (fake). The output value includes the probability that the input training data is real and the probability that it is processed. That is, when the original unworn image 20 is input, the global discriminator 220 may output, through a plurality of weighted operations, the probability that the original unworn image 20 is real and the probability that it is processed. Likewise, when the processed unworn image 30 is input, the global discriminator 220 may output the probability that the processed unworn image 30 is real and the probability that it is processed. For example, the output value may be a 93% probability of being real and a 7% probability of being processed.

Similarly, when the original local image 23 or the processed local image 33 is input, the local discriminator 230 outputs, through a plurality of weighted operations, the probability that the input local image is real and the probability that it is processed. For example, the output value may be a 15% probability of being real and an 85% probability of being processed.
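One common way to obtain a pair of probabilities that sum to 100% is to apply a soft-max to two raw scores from the last layer. The specification lists soft-max among its example operations but does not state that the discriminators' final output is computed this way, so the sketch below, including the function name `real_fake_probabilities` and the two-score layout, is an assumption.

```python
import math


def real_fake_probabilities(score_real, score_fake):
    """Two-way soft-max: turn two raw scores into (P(real), P(processed))."""
    largest = max(score_real, score_fake)        # subtract the max for numerical stability
    exp_real = math.exp(score_real - largest)
    exp_fake = math.exp(score_fake - largest)
    total = exp_real + exp_fake
    return exp_real / total, exp_fake / total


# Hypothetical raw scores; a higher first score means the input looks more real.
p_real, p_fake = real_fake_probabilities(2.0, -0.5)
```

Whatever the scores, the two returned values are non-negative and sum to 1, which matches the form of the 93%/7% and 15%/85% examples in the text.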

The goal of training the global discriminator 220 and the local discriminator 230 is for them to judge the original unworn image 20 and the original local image 23, respectively, as real, and to judge the processed unworn image 30 and the processed local image 33, respectively, as processed. Accordingly, a target value is set for each item of training data. For example, for the original unworn image 20 or the original local image 23, the target value may be set to a 100% probability of being real and a 0% probability of being processed. For the processed unworn image 30 or the processed local image 33, the target value may be set to a 0% probability of being real and a 100% probability of being processed.

However, before training is sufficiently complete, the output values of the global discriminator 220 and the local discriminator 230 differ from the target values. Accordingly, in step S130 the learning unit 300 compares the output values of the global discriminator 220 and the local discriminator 230 with the target values, and in step S140 it corrects the weights of the global discriminator 220 and the local discriminator 230 through a back-propagation algorithm so that the difference between the target values and the output values is minimized. In other words, the global discriminator 220 is trained to output real when the original unworn image 20 is input and to output processed when the processed unworn image 30 is input. Likewise, the local discriminator 230 is trained to output real when the original local image 23 is input and to output processed when the processed local image 33 is input.
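The comparison of output values with target values in steps S130 and S140 can be made concrete with a loss function. The specification does not name one, so the cross-entropy used below is an assumed, commonly used choice, and the names `REAL_TARGET`, `FAKE_TARGET`, `cross_entropy`, and `discriminator_loss` are hypothetical. Outputs and targets are written as (probability real, probability processed) pairs.

```python
import math

REAL_TARGET = (1.0, 0.0)   # target for original images: 100% real, 0% processed
FAKE_TARGET = (0.0, 1.0)   # target for generated images: 0% real, 100% processed


def cross_entropy(output, target, eps=1e-12):
    """Difference between an output pair and a target pair; 0 when they agree exactly."""
    return -sum(t * math.log(max(o, eps)) for o, t in zip(output, target))


def discriminator_loss(output_on_original, output_on_processed):
    """Total mismatch a discriminator should reduce by adjusting its own weights."""
    return (cross_entropy(output_on_original, REAL_TARGET)
            + cross_entropy(output_on_processed, FAKE_TARGET))


# The example output values from the text: 93%/7% on an original image,
# 15%/85% on a generated one.
well_trained = discriminator_loss((0.93, 0.07), (0.15, 0.85))
# The same numbers with the inputs swapped, i.e. a discriminator that is mostly wrong.
poorly_trained = discriminator_loss((0.15, 0.85), (0.93, 0.07))
```

Back-propagation would then adjust only the discriminator weights in the direction that lowers this value; `well_trained` is smaller than `poorly_trained`, as expected.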

Note that in the embodiment described with reference to FIGS. 4 and 5, no training that modifies the weights of the generator 210 takes place. As described above, the generator 210 is trained after the global discriminator 220 and the local discriminator 230 have been trained. Training of the generator 210 is described next. FIG. 6 is a diagram illustrating a training method for the generator according to an embodiment of the present invention. FIG. 7 is a flowchart illustrating the training method for the generator according to an embodiment of the present invention.

Referring first to FIG. 6, the original wearing image 10 is used as the training data for the generator 210. As described above, the original wearing image 10 is an image of a person wearing glasses.

Referring to FIG. 7, in step S210 the learning unit 300 inputs the original wearing image 10 to the generator 210. Then, in step S220, the generator 210 outputs the processed unworn image 30 and the processed glasses mask 35 through a plurality of operations, each of which applies weights. The processed unworn image 30 consists of the RGB values output from the generator 210 for each pixel of the original wearing image 10. The processed glasses mask 35 consists of the flag values output from the generator 210 for each pixel of the original wearing image 10. The flag value indicates whether a pixel was generated from a region where the glasses are located in the original wearing image 10 (pixel P01), as with flag value F1 in FIG. 3, or from a region where the glasses are not located (pixel P02), as with flag value F2 in FIG. 3.

Then, in step S230, the learning unit 300 inputs the processed unworn image 30 to the global discriminator 220, extracts the processed local image 33 from the processed unworn image 30, and inputs the extracted processed local image 33 to the local discriminator 230. Accordingly, in step S240, the global discriminator 220 and the local discriminator 230 output, through a plurality of operations to which weights are applied, the probability that the processed unworn image 30 and the processed local image 33, respectively, are real and the probability that they are processed (fake).

The goal of training the generator 210 is for the global discriminator 220 and the local discriminator 230 to judge the images produced by the generator 210, that is, the processed unworn image 30 and the processed local image 33, as real. A target value is set accordingly. For example, in contrast to the training of the discriminators 220 and 230, the target value may be set to a 100% probability of being real and a 0% probability of being processed.

However, before training is sufficiently complete, the output values of the global discriminator 220 and the local discriminator 230 differ from the target values. Accordingly, in step S250 the learning unit 300 compares the output values of the global discriminator 220 and the local discriminator 230 with the target values, and in step S260 it corrects only the weights of the generator 210 through a back-propagation algorithm so that the difference between the target values and the output values is minimized, leaving the weights of the global discriminator 220 and the local discriminator 230 unchanged. In other words, the generator 210 is trained so that, when the processed unworn image 30 it generates is input to the global discriminator 220, the global discriminator 220 outputs that the processed unworn image 30 is real, and when the processed local image 33 is input to the local discriminator 230, the local discriminator 230 outputs that the processed local image 33 is real. Note that in the embodiment described with reference to FIGS. 6 and 7, the weights of the generator 210 are modified, but the weights of the global discriminator 220 and the local discriminator 230 are not.
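The point that only the generator's weights change in this step can be shown with a deliberately tiny stand-in. In the sketch below the "generator" is a single multiplication by `w_g`, the "discriminator" is a sigmoid of a single multiplication by `w_d`, and the loss is the negative log of the probability of being judged real. All of these, along with the function names and the learning rate, are assumptions chosen so the gradient can be written by hand; they are not the networks of the specification.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def generator_step(w_g, w_d, x, learning_rate=0.1):
    """One update of the generator weight; the discriminator weight w_d is only read."""
    generated = w_g * x                      # toy generator output
    p_real = sigmoid(w_d * generated)        # toy discriminator: probability of "real"
    loss = -math.log(p_real)                 # mismatch against the target "100% real"
    # Gradient of the loss with respect to w_g, propagated back through the
    # frozen discriminator: d(-log sigmoid(z))/dz = -(1 - sigmoid(z)), z = w_d * w_g * x.
    grad_w_g = -(1.0 - p_real) * w_d * x
    return w_g - learning_rate * grad_w_g, loss


w_d = 1.0                                    # frozen during generator training
w_g_before = 0.5
w_g_after, loss_before = generator_step(w_g_before, w_d, x=1.0)
_, loss_after = generator_step(w_g_after, w_d, x=1.0)
```

After one step the discriminator rates the generated value as more likely real (`loss_after` is below `loss_before`), while `w_d` is untouched, which is the asymmetry described for steps S250 and S260.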

As described above, the discriminator training procedure described with reference to FIGS. 4 and 5 and the generator training procedure described with reference to FIGS. 6 and 7 are performed alternately. This training is repeated until the weights of the generator 210, the global discriminator 220, and the local discriminator 230 no longer change.
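The alternation and the stopping condition can be sketched as a loop that calls one discriminator update and one generator update per round and stops when the weights stop moving. The function name `train_alternately`, the tolerance `tol` standing in for "no change", the `max_rounds` safety limit, and the toy step functions are all assumptions for illustration.

```python
def train_alternately(discriminator_step, generator_step, weights,
                      tol=1e-6, max_rounds=10000):
    """Alternate the two procedures until no weight changes by more than `tol`."""
    for round_index in range(max_rounds):
        previous = list(weights)
        weights = discriminator_step(weights)   # FIGS. 4 and 5: discriminators only
        weights = generator_step(weights)       # FIGS. 6 and 7: generator only
        largest_change = max(abs(a - b) for a, b in zip(weights, previous))
        if largest_change < tol:
            return weights, round_index + 1
    return weights, max_rounds


# Toy stand-ins: weights[0] plays the discriminator weight, weights[1] the generator weight.
def toy_discriminator_step(weights):
    return [weights[0] + 0.5 * (1.0 - weights[0]), weights[1]]


def toy_generator_step(weights):
    return [weights[0], weights[1] + 0.5 * (2.0 - weights[1])]


final_weights, rounds_used = train_alternately(
    toy_discriminator_step, toy_generator_step, [0.0, 0.0])
```

Each toy step touches only its own weight, so the loop also reflects that the two procedures update disjoint sets of weights.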

The present invention generates a glasses-wearing image using the artificial neural network once it has been sufficiently trained according to the procedures described above with reference to FIGS. 4 to 7. A method of generating such a glasses-wearing image is described below. FIG. 8 is a flowchart illustrating a method of generating a glasses-wearing image using the artificial neural network according to an embodiment of the present invention.

In FIG. 8, the user is assumed to be a person who wears glasses and wants to buy new glasses. In step S310, the control unit 150 may display a plurality of different glasses images on the display unit 130. The user may select any one of the glasses images through the input unit 120 or the display unit 130. When the user makes a selection, in step S320 the control unit 150 may identify the glasses (glasses image) selected by the user through the input unit 120 or the display unit 130.

Next, in step S330, the control unit 150 photographs the user wearing glasses through the camera unit 110 to generate the user's original wearing image 10. Then, in step S340, the image generating unit 400 of the control unit 150 inputs the user's original wearing image 10 to the generator 210 of the artificial neural network 200 and obtains the user's processed unworn image 30 from the generator 210. That is, when the user's original wearing image 10 is input, the generator 210 outputs the user's processed unworn image 30 through a plurality of operations to which weights are applied.

Once the user's processed unworn image 30 is obtained, in step S350 the image generating unit 400 composites the glasses image previously selected by the user onto the user's processed unworn image 30 to generate a composite wearing image. Then, in step S360, the image generating unit 400 displays the composite wearing image on the display unit 130.
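The specification does not say how the selected glasses image is composited in step S350. One simple possibility, shown here purely as an assumption, is per-pixel alpha blending, where the glasses image carries an opacity value per pixel and is already aligned to the face. The function name `composite_glasses` and the RGBA layout are hypothetical.

```python
def composite_glasses(face_image, glasses_rgba):
    """Blend an aligned RGBA glasses image over an RGB face image, pixel by pixel."""
    result = []
    for face_row, glasses_row in zip(face_image, glasses_rgba):
        result_row = []
        for (face_r, face_g, face_b), (gl_r, gl_g, gl_b, alpha) in zip(face_row, glasses_row):
            # alpha = 1.0 shows only the glasses pixel, alpha = 0.0 only the face pixel.
            result_row.append((
                round(alpha * gl_r + (1.0 - alpha) * face_r),
                round(alpha * gl_g + (1.0 - alpha) * face_g),
                round(alpha * gl_b + (1.0 - alpha) * face_b),
            ))
        result.append(result_row)
    return result


# Toy 1x2 example: an opaque frame pixel over the first face pixel, nothing over the second.
unworn = [[(200, 170, 150), (190, 160, 140)]]
glasses = [[(10, 10, 10, 1.0), (0, 0, 0, 0.0)]]
composite_image = composite_glasses(unworn, glasses)
```

In practice the glasses image would first need to be scaled and positioned relative to the face, a step not covered by this sketch.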

Accordingly, the user can see on the display unit 130 how they would look wearing different glasses without taking off their own. In particular, this spares users with poor eyesight the trouble of repeatedly putting on and taking off glasses in order to choose new ones.

The artificial neural network 200 according to an embodiment of the present invention is trained, for each of the pixels making up an image, on three channels, namely the R (red), G (green), and B (blue) channels, and additionally on a channel for a mask indicating whether the pixel belongs to the glasses region. Compared with using only the RGB channels, this allows the network to learn more clearly the shape of the glasses and the features of the region the glasses occupied once they are removed. As a result, a more natural processed unworn image 30 can be generated, and a more natural composite wearing image can be generated when glasses of a new design are composited onto the processed unworn image 30.

The various methods according to the embodiments of the present invention described above may be implemented as programs readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not limiting. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made under the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

110: camera unit
120: input unit
130: display unit
140: storage unit
150: control unit
200: artificial neural network
210: generator
220: global discriminator
230: local discriminator

Claims

An apparatus for generating a glasses-wearing image using an artificial neural network, the apparatus comprising:
a generator that, when an original wearing image, which is an image of a person actually wearing glasses, is input, generates, through a plurality of operations to which weights are applied, a processed unworn image, which is an image of the person not wearing glasses processed from the original wearing image, and a processed glasses mask indicating the region of the processed unworn image produced by processing the region where the glasses are located in the original wearing image;
a global discriminator that, when either an original unworn image, which is an image of a person actually not wearing glasses, or the processed unworn image is input, outputs whether the input image is real or processed, through a plurality of operations in which weights are applied to the input image;
a local discriminator that receives an original local image, which is a partial region of the original unworn image, or a processed local image, which is a partial region of the processed unworn image, and outputs whether the input local image is real or processed, through a plurality of operations in which weights are applied to the input local image; and
a learning unit that trains the global discriminator to output real when the original unworn image is input and to output processed when the processed unworn image is input, trains the local discriminator to output real when the original local image is input and to output processed when the processed local image is input, and trains the generator so that the global discriminator outputs the input processed unworn image as real and the local discriminator outputs the input processed local image as real.

The apparatus of claim 1, wherein
the processed unworn image consists of RGB values corresponding to each pixel of the original wearing image, and
the processed glasses mask consists of flag values, each corresponding to a pixel of the original wearing image and indicating whether that pixel is in the region where the glasses are located in the original wearing image.

The apparatus of claim 1, further comprising:
a camera unit that photographs a user wearing glasses to generate an original worn image of the user; and
an image generating unit that inputs the user's original worn image into the generator to generate a processed unworn image of the user.

The apparatus of claim 3,
further comprising a display unit that displays a plurality of glasses images,
wherein the image generating unit generates a composite worn image by compositing a glasses image, selected from the plurality of glasses images according to the user's selection, onto the user's processed unworn image, and displays the generated composite worn image through the display unit.
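
The claim does not specify how the selected glasses image is composited onto the processed unworn image. One common approach is per-pixel alpha blending, sketched below under the assumptions that the glasses image is already aligned to the face, has the same size, and comes with a per-pixel opacity map. The function name `composite` and the opacity map are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def composite(face: List[List[Pixel]],
              glasses: List[List[Pixel]],
              alpha: List[List[float]]) -> List[List[Pixel]]:
    """Blend per pixel: out = alpha * glasses + (1 - alpha) * face."""
    out = []
    for face_row, glasses_row, alpha_row in zip(face, glasses, alpha):
        out.append([
            tuple(round(a * g + (1.0 - a) * f) for f, g in zip(fp, gp))
            for fp, gp, a in zip(face_row, glasses_row, alpha_row)
        ])
    return out


face = [[(0, 0, 0), (10, 20, 30), (0, 0, 0)]]
glasses = [[(200, 100, 50), (200, 100, 50), (200, 100, 50)]]
alpha = [[1.0, 0.0, 0.5]]  # opaque, transparent, half-transparent
blended = composite(face, glasses, alpha)
```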

The apparatus of claim 1,
wherein the learning unit
alternately repeats a procedure of training the global discriminator and the local discriminator
and a procedure of training the generator.
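
The alternation can be sketched as a simple loop. The claim states only that the two procedures are alternately repeated; the sketch assumes one discriminator step followed by one generator step per iteration, and the names `train_alternating`, `discriminator_step`, and `generator_step` are hypothetical placeholders for the actual weight updates.

```python
from typing import Callable, List, Tuple


def train_alternating(num_iterations: int,
                      discriminator_step: Callable[[int], None],
                      generator_step: Callable[[int], None]) -> None:
    """Alternate the two training procedures, one step each per iteration."""
    for i in range(num_iterations):
        discriminator_step(i)  # update global and local discriminators
        generator_step(i)      # then update the generator


# Stand-in steps that only record the order in which they were called.
log: List[Tuple[str, int]] = []
train_alternating(3,
                  lambda i: log.append(("D", i)),
                  lambda i: log.append(("G", i)))
```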

The apparatus of claim 1,
wherein the learning unit,
when training the global discriminator and the local discriminator, corrects the weights of the global discriminator so that the global discriminator outputs processed when the processed unworn image is input, and corrects the weights of the local discriminator so that the local discriminator outputs processed when the processed local image is input.
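
The claim names no loss function. As an illustration, the sketch below uses binary cross-entropy, a common choice for this kind of discriminator training, and assumes each discriminator outputs a probability that its input is real. The targets (1.0 for real, 0.0 for processed) and the names `bce` and `discriminator_loss` are assumptions. Correcting the weights so that processed inputs are output as processed corresponds to lowering this loss.

```python
import math

REAL, PROCESSED = 1.0, 0.0  # assumed target labels


def bce(p: float, target: float) -> float:
    """Binary cross-entropy for one predicted probability p of 'real'."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(p_on_original: float, p_on_processed: float) -> float:
    """Low when the original input is scored real and the processed input is not."""
    return bce(p_on_original, REAL) + bce(p_on_processed, PROCESSED)


good = discriminator_loss(0.9, 0.1)  # discriminator mostly correct
bad = discriminator_loss(0.1, 0.9)   # discriminator mostly fooled
```

The same function would apply to the global discriminator (full images) and to the local discriminator (local images).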

The apparatus of claim 1,
wherein the learning unit,
when training the generator, corrects the weights of the generator so that the global discriminator outputs real when the processed unworn image is input and the local discriminator outputs real when the processed local image is input.
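
The generator's objective can be sketched as a loss that is small when both discriminators score the generator's outputs as real. The claim does not give a formula; the sketch assumes each discriminator outputs a probability of "real" and sums a negative log term for the global and the local score. The name `generator_loss` and the equal weighting of the two terms are assumptions.

```python
import math


def real_target_loss(p: float) -> float:
    """Penalty that shrinks as the probability of 'real' approaches 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -math.log(p)


def generator_loss(p_global_on_processed: float, p_local_on_processed: float) -> float:
    """Sum of the global and local terms, weighted equally (an assumption)."""
    return real_target_loss(p_global_on_processed) + real_target_loss(p_local_on_processed)


fooling = generator_loss(0.9, 0.9)  # both discriminators say "real"
caught = generator_loss(0.1, 0.1)   # both discriminators say "processed"
```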

A method for generating an image of wearing glasses using an artificial neural network, the method comprising:
generating, by a generator, when an original worn image, which is an image of a person actually wearing glasses, is input, through a plurality of operations to which weights are applied, a processed unworn image, which is an image of the person not wearing glasses processed from the original worn image, and a processed glasses mask indicating the area of the original worn image in which the glasses are located;
outputting, by a global discriminator, when either an original unworn image, which is an image of a person who is actually not wearing glasses, or the processed unworn image is input, through a plurality of operations in which weights are applied to the input image, whether the input image is real or processed;
outputting, by a local discriminator, when either an original local image, which is a partial region of the original unworn image, or a processed local image, which is a partial region of the processed unworn image, is input, through a plurality of operations in which weights are applied to the input local image, whether the input local image is real or processed;
training, by a learning unit, the global discriminator and the local discriminator so that the global discriminator outputs real when the original unworn image is input and outputs processed when the processed unworn image is input, and the local discriminator outputs real when the original local image, which is a partial region of the original unworn image, is input and outputs processed when the processed local image, which is a partial region of the processed unworn image, is input; and
training, by the learning unit, the generator so that the global discriminator outputs real for the processed unworn image and the local discriminator outputs real for the processed local image.

The method of claim 8, wherein
the processed unworn image consists of RGB values corresponding to each pixel of the original worn image, and
the processed glasses mask consists of flag values, each corresponding to a pixel of the original worn image and indicating whether that pixel belongs to the region in which the glasses are located.

The method of claim 8, further comprising:
photographing, by a control unit, a user wearing glasses through a camera unit to generate an original worn image of the user; and
inputting, by an image generating unit, the user's original worn image to the generator to generate a processed unworn image of the user.

The method of claim 10, further comprising:
generating, by the image generating unit, a composite worn image by compositing a glasses image selected from a plurality of glasses images onto the user's processed unworn image; and
displaying, by the image generating unit, the generated composite worn image through a display unit.

The method of claim 8,
wherein the learning unit alternately repeats the step of training the global discriminator and the local discriminator and the step of training the generator.

The method of claim 8,
wherein, in the step of training the global discriminator and the local discriminator,
the learning unit corrects the weights of the global discriminator so that the global discriminator outputs processed when the processed unworn image is input, and corrects the weights of the local discriminator so that the local discriminator outputs processed when the processed local image is input.

The method of claim 8,
wherein, in the step of training the generator,
the learning unit corrects the weights of the generator so that the global discriminator outputs real when the processed unworn image is input and the local discriminator outputs real when the processed local image is input.

A computer-readable recording medium having recorded thereon a program for executing the method for generating an image of wearing glasses according to any one of claims 8 to 14.