KR20220123091A

KR20220123091A - Image processing method, apparatus, device and storage medium

Info

Publication number: KR20220123091A
Application number: KR1020227026471A
Authority: KR
Inventors: 하오 주; 첸이 우; 원옌 우; 천 첸; 차오유 푸; 란 하오
Original assignee: 베이징 센스타임 테크놀로지 디벨롭먼트 컴퍼니 리미티드
Priority date: 2020-10-21
Filing date: 2021-07-26
Publication date: 2022-09-05
Also published as: CN112330530B; WO2022083200A1; CN112330530A; TW202217646A

Abstract

The present invention discloses an image processing method, apparatus, device and storage medium. The method comprises: acquiring a first facial feature of a first face image, the first face image being obtained based on a facial feature of a source image and a target image; mapping the first facial feature to obtain a second facial feature, wherein the distribution of at least some features of the second facial feature matches the feature distribution of the target image; and obtaining a target face image according to the first face image and the second facial feature.

Description

Image processing method, apparatus, device and storage medium

[Cross-Reference to Related Applications]

This patent application claims priority to the Chinese patent application No. "202011135767.7" filed on October 21, 2020, and to the Chinese patent application No. "202011218375.7" filed on November 4, 2020, the entire contents of which are incorporated herein by reference.

The present invention relates to computer vision technology, and more particularly to an image processing method, apparatus, device and storage medium.

Image processing such as face swapping has wide application value in digital entertainment and the film industry; for example, the effect of a stand-in for an actor can be achieved through face swapping.

Currently, face swapping is implemented by migrating the facial features of a source image to the face region in a target image and performing face pose alignment. In scenes where the appearance of the source image differs greatly from that of the target image, the performance of face swapping degrades.

The present invention provides an image processing method.

In a first aspect, an image processing method is provided, the method comprising: acquiring a first facial feature of a first face image, wherein the first face image is obtained based on a facial feature of a source image and a target image; mapping the first facial feature to obtain a second facial feature, wherein the distribution of at least some features of the second facial feature matches the feature distribution of the target image; and obtaining a target face image according to the first face image and the second facial feature.

In any embodiment of the present invention, acquiring the first facial feature of the first face image comprises: acquiring at least one of face coordinate information and face normal vector information of the first face image; performing feature extraction on the first face image through a feature encoding network to obtain encoding feature information; and obtaining the first facial feature of the first face image according to the encoding feature information and the at least one of the face coordinate information and the face normal vector information.

In any embodiment of the present invention, mapping the first facial feature to obtain the second facial feature comprises: mapping, through a feature mapping network, a first feature of a pixel in the first face image to obtain a second feature of the pixel, wherein the distance between the probability distribution of the second feature of the pixel and the probability distribution of a third feature of the corresponding pixel in the target image satisfies a set condition; the first feature of the pixel in the first face image belongs to the first facial feature, and the second feature of the pixel belongs to the second facial feature.

In any embodiment of the present invention, obtaining the target face image according to the first face image and the second facial feature comprises: decoding the second facial feature through a feature decoding network to obtain an image of the face region; and obtaining the target face image according to the image outside the face region in the first face image and the decoded image of the face region.
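The compositing described in the preceding paragraph can be illustrated with a minimal sketch. It assumes that the face region is given as a binary mask and that images are nested lists of pixel values; the names `composite` and `face_mask` are hypothetical and do not appear in this application, which does not specify how the two images are merged.

```python
def composite(first_face_image, decoded_face, face_mask):
    """Take decoded pixels inside the face region and original pixels elsewhere."""
    return [
        [dec if inside else bg for bg, dec, inside in zip(bg_row, dec_row, mask_row)]
        for bg_row, dec_row, mask_row in zip(first_face_image, decoded_face, face_mask)
    ]


# Toy 2x3 single-channel images; the mask marks the (assumed) face region.
first = [[10, 10, 10], [10, 10, 10]]
decoded = [[99, 99, 99], [99, 99, 99]]
mask = [[0, 1, 0], [0, 1, 1]]
target = composite(first, decoded, mask)
```

A hard mask is the simplest choice; a soft (fractional) mask would blend the boundary, but that is likewise an assumption rather than something stated here.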

In any embodiment of the present invention, the encoding feature information comprises n orders of feature information; obtaining the first facial feature of the first face image according to the encoding feature information and at least one of the face coordinate information and the face normal vector information of the first face image comprises: concatenating each of the first M orders of the n orders of feature information with at least one of the face coordinate information and the face normal vector information of the first face image to obtain M orders of concatenated feature information; and obtaining the first facial feature according to the M orders of concatenated feature information and subsequent-order feature information, wherein the subsequent-order feature information comprises the feature information in the encoding feature information other than the first M orders of feature information, n and M are natural numbers, and M<n.

In any embodiment of the present invention, mapping the first facial feature to obtain the second facial feature comprises: mapping the M orders of concatenated feature information to obtain M orders of mapped feature information matching the feature distribution of the target image; and decoding the second facial feature through the feature decoding network to obtain the image of the face region comprises: decoding the M orders of mapped feature information and the subsequent-order feature information to obtain the image of the face region.

In any embodiment of the present invention, the method further comprises performing end-to-end training on the feature encoding network, the feature mapping network and the feature decoding network, wherein, in each epoch of training, the training of the feature encoding network and the feature decoding network and the training of the feature mapping network are performed sequentially.
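The per-epoch schedule described above might be organized as in the following sketch. The order within an epoch (encoder and decoder first, mapping network second) and the callables `update_codec` and `update_mapper` are assumptions made for illustration; the application only states that the two trainings are performed sequentially in each epoch.

```python
def train_end_to_end(num_epochs, update_codec, update_mapper):
    """Run both training phases once per epoch, one after the other."""
    history = []
    for epoch in range(num_epochs):
        # Phase 1 (assumed order): update the feature encoding and decoding networks.
        history.append(update_codec(epoch))
        # Phase 2: update the feature mapping network.
        history.append(update_mapper(epoch))
    return history


# Stand-in update functions that only record which phase ran in which epoch.
history = train_end_to_end(
    2,
    update_codec=lambda epoch: ("codec", epoch),
    update_mapper=lambda epoch: ("mapper", epoch),
)
```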

In any embodiment of the present invention, the feature mapping network is obtained by training an optimized migration network, the optimized migration network comprising the feature mapping network and a distance evaluation network, and the network loss of the training includes a mapping loss, which indicates the difference, as determined by the distance evaluation network, between the probability distribution of the second feature of a pixel in the first face image and the probability distribution of the third feature of the corresponding pixel in the target image.
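To make the notion of a distance between two feature distributions concrete, the sketch below computes the empirical 1-Wasserstein distance between two equally sized sets of scalar samples. This closed-form measure is only a stand-in chosen for illustration: the application uses a learned distance evaluation network and does not name a specific distance, and `wasserstein_1d` is a hypothetical helper.

```python
def wasserstein_1d(samples_a, samples_b):
    """Empirical 1-Wasserstein distance between two equally sized 1-D samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("both sample sets must have the same size")
    pairs = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in pairs) / len(samples_a)


# Mapped features shifted by one unit relative to the target features.
shifted = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
# Same values in a different order: the distributions coincide.
identical = wasserstein_1d([3.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

A mapping loss of this kind decreases as the distribution of mapped features approaches that of the target features, regardless of which individual sample produced which value.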

In any embodiment of the present invention, the feature encoding network and the feature decoding network are obtained by training an appearance migration network, the appearance migration network comprising the feature encoding network and the feature decoding network, and the network loss of the training includes: a first loss indicating the difference between the second feature and the first feature of a pixel in the first face image; and a second loss indicating the difference between the second feature of a pixel in the first face image and the third feature of the corresponding pixel in the target image.
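The two losses above could be combined as in the following sketch. The mean absolute difference and the weights `w_first` and `w_second` are assumptions; the application states only that each loss indicates a difference between features and does not fix the form of that difference or any weighting.

```python
def mean_abs_diff(features_a, features_b):
    """Mean absolute difference between two equally long feature lists."""
    return sum(abs(a - b) for a, b in zip(features_a, features_b)) / len(features_a)


def appearance_loss(first, second, third, w_first=1.0, w_second=1.0):
    """Weighted sum of the first loss (second vs. first) and second loss (second vs. third)."""
    first_loss = mean_abs_diff(second, first)
    second_loss = mean_abs_diff(second, third)
    return w_first * first_loss + w_second * second_loss


# Toy per-pixel features: first (before mapping), second (after mapping), third (target).
total = appearance_loss(first=[1.0, 2.0], second=[2.0, 4.0], third=[2.0, 5.0])
```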

In any embodiment of the present invention, the appearance migration network further comprises a face reconstruction network, the face reconstruction network being used to obtain a reconstructed face image by reconstruction from the facial features of the target image, and the network loss of the training further includes a third loss indicating the difference between the second feature of a pixel in the first face image and a fourth feature of the corresponding pixel in the face reconstruction image output by the face reconstruction network.

In any embodiment of the present invention, the appearance migration network further comprises a discrimination network, and the network loss of the training further includes a fourth loss indicating the difference between the pixel classification result of a mixed image sample determined by the discrimination network and the labeling information of the mixed image sample, wherein the mixed image sample comprises an image obtained by mixing pixels in the target face image with pixels in the target image or the face reconstruction image, and the labeling information indicates whether a pixel is a generated image pixel or a real image pixel.
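A minimal sketch of how a mixed image sample and its per-pixel labels might be built, together with one possible form of the fourth loss. The selection mask `take_real`, the label convention (1 for a real pixel, 0 for a generated pixel) and the use of binary cross-entropy are all assumptions made for illustration; the application does not specify them.

```python
import math


def mix_pixels(generated, real, take_real):
    """Build a mixed image and a label map (1 = real pixel, 0 = generated pixel)."""
    mixed = [
        [r if t else g for g, r, t in zip(g_row, r_row, t_row)]
        for g_row, r_row, t_row in zip(generated, real, take_real)
    ]
    labels = [[1 if t else 0 for t in t_row] for t_row in take_real]
    return mixed, labels


def pixel_bce(predictions, labels):
    """Mean binary cross-entropy between per-pixel predictions in (0, 1) and labels."""
    total, count = 0.0, 0
    for p_row, l_row in zip(predictions, labels):
        for p, label in zip(p_row, l_row):
            total -= math.log(p) if label else math.log(1.0 - p)
            count += 1
    return total / count


generated = [[1, 2], [3, 4]]   # pixels of the target face image (generated)
real = [[9, 8], [7, 6]]        # pixels of the target image (treated as real)
mixed, labels = mix_pixels(generated, real, take_real=[[1, 0], [0, 1]])
# An undecided discriminator that outputs 0.5 everywhere.
loss = pixel_bce([[0.5, 0.5], [0.5, 0.5]], labels)
```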

In a second aspect, an image processing apparatus is provided, the apparatus comprising: a first acquisition unit configured to acquire a first facial feature of a first face image, the first face image being obtained based on a facial feature of a source image and a target image; a mapping unit configured to map the first facial feature to obtain a second facial feature, wherein the distribution of at least some features of the second facial feature matches the feature distribution of the target image; and a second acquisition unit configured to obtain a target face image according to the first face image and the second facial feature.

In a third aspect, an electronic device is provided, the electronic device comprising a memory for storing computer instructions executable on a processor, and a processor configured to perform the image processing method according to any embodiment of the present disclosure when executing the computer instructions.

In a fourth aspect, a storage medium storing a computer program is provided; when the program is executed by a processor, the image processing method according to any embodiment of the present invention is implemented.

In the image processing method, apparatus, device and storage medium according to one or more embodiments of the present invention, the first facial feature of a first face image, obtained based on the facial feature of a source image and the facial feature of a target image, is mapped to obtain a second facial feature in which the distribution of at least some features matches the feature distribution of the target image, which can improve the accuracy of appearance migration from the source image to the target image. Obtaining the target face image according to the first face image and the second facial feature can improve the continuity and consistency between the face region and the other regions in the target face image, and therefore the quality of the target face image.

It is to be understood that the foregoing general description and the following detailed description are merely illustrative and explanatory and are not intended to limit the invention.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and, together with the specification, serve to explain the principles of the invention.
FIG. 1 is a flowchart of an image processing method according to at least one embodiment of the present invention.
FIG. 2 is a schematic diagram of a method for acquiring facial features in an image processing method according to at least one embodiment of the present invention.
FIG. 3 is a schematic diagram of a facial feature mapping method in an image processing method according to at least one embodiment of the present invention.
FIG. 4A is a schematic diagram of a method for training a feature encoding network and a feature decoding network in an image processing method according to at least one embodiment of the present invention.
FIG. 4B is a schematic diagram of another method for training a feature encoding network and a feature decoding network in an image processing method according to at least one embodiment of the present invention.
FIG. 4C is a schematic diagram of yet another method for training a feature encoding network and a feature decoding network in an image processing method according to at least one embodiment of the present invention.
FIG. 5 is a structural schematic diagram of an image processing apparatus according to at least one embodiment of the present invention.
FIG. 6 is a structural diagram of an electronic device according to at least one embodiment of the present invention.

Examples will be described in detail herein, and are illustrated in the drawings. Where the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

In the present invention, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone. In addition, the term "at least one" as used in the present invention denotes any one of several items or any combination of at least two of them; for example, including at least one of A, B and C may indicate including any one or more elements selected from the set consisting of A, B and C.

Embodiments of the present invention may be applied to computer systems/servers, which are operable with numerous other general-purpose or special-purpose computing systems, environments and/or configurations. Examples of such computing systems, environments and/or configurations include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.

FIG. 1 is a flowchart of an image processing method according to at least one embodiment of the present invention. As shown in FIG. 1, the method includes steps 101 to 103.

In step 101, a first facial feature of a first face image is acquired.

In an embodiment of the present invention, the acquired first facial feature of the first face image may include features of various aspects, such as structure and texture; for example, it may include the shape, size and orientation of the face contained in the first face image.

Here, the first face image is obtained based on the facial feature of a source image and a target image.

In an embodiment of the present invention, the source image and the target image both contain a face region, and the first face image may be generated according to the facial features of the face region in the source image and the target image. For example, the first face image may be obtained by migrating the facial features of the face region in the source image to the face region in the target image and aligning the face pose in the source image with the face pose in the target image. The facial features of the source image may be obtained by performing feature extraction on the face region in the source image. By way of example, the face regions in the source image and the target image may be a human face region, an animal face region, a designated region of another virtual object, or the like.

When the first face image is obtained based on the facial features of the source image and the target image, and the skin color or illumination of the source image differs greatly from that of the target image, the face region in the generated first face image may differ markedly in appearance from the regions outside the face region; that is, there may be a visual discontinuity.

In step 102, the first facial feature is mapped to obtain a second facial feature, wherein the distribution of at least some features of the second facial feature matches the feature distribution of the target image.

When the appearance of the source image differs greatly from that of the target image, the feature distributions of the two images generally do not match, and mapping without the guidance of positioning information may cause the appearance to be migrated ambiguously. Here, the positioning information is information indicating the position of the mapped feature.

In an embodiment of the present invention, by mapping the first facial feature, the distribution of at least some of the obtained second facial features matches the feature distribution of the target image, which can improve the accuracy of appearance migration from the facial features of the source image to the target image.
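What it means for a mapped feature distribution to match a target distribution can be illustrated with a hand-written stand-in for the mapping: shifting and scaling one feature channel so that its mean and standard deviation equal those of the target channel. This statistics matching is an assumption chosen for clarity and is not the learned feature mapping network of this application; `match_statistics` is a hypothetical helper.

```python
from statistics import fmean, pstdev


def match_statistics(source_values, target_values):
    """Shift and scale source values so their mean and std equal the target's."""
    source_mean, source_std = fmean(source_values), pstdev(source_values)
    target_mean, target_std = fmean(target_values), pstdev(target_values)
    # Guard against a constant source channel, which has zero spread.
    scale = target_std / source_std if source_std else 0.0
    return [(value - source_mean) * scale + target_mean for value in source_values]


# One channel of first-face features and the same channel of target features.
mapped = match_statistics([0.0, 2.0, 4.0], [10.0, 11.0, 12.0])
```

After the mapping, the channel has the mean and spread of the target channel while each value keeps its relative position within the channel.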

In step 103, a target face image is obtained according to the first face image and the second facial feature.

The target face image may be understood as an image formed by migrating the facial features of the face region in the source image to the face region of the target image and then further optimizing, through feature mapping, the features of the region where the migration took place. Visually, the target face image reflects not only the features of the face region in the source image but also the features of the regions outside the face region in the target image, so that a natural face swapping effect can be obtained.

Because feature mapping yields a second facial feature that is closer to the feature distribution of the target image, the image of the face region obtained from the second facial feature is closer to the target image in appearance, for example in color space distribution. As a result, the target face image obtained from the second facial feature and the first face image no longer suffers from a face region whose appearance is inconsistent and discontinuous with the other regions.

In an embodiment of the present invention, the first facial feature of a first face image, obtained based on the facial feature of a source image and a target image, is mapped to obtain a second facial feature in which the distribution of at least some features matches the feature distribution of the target image, which can improve the accuracy of appearance migration from the facial features of the face region in the source image to the target image. Obtaining the target face image according to the first face image and the second facial feature can improve the continuity and consistency between the face region and the other regions in the target face image, and therefore the quality of the target face image.
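Steps 101 to 103 can be summarized as a three-stage pipeline. The sketch below only shows how data flows between the stages; the callables `extract_features`, `map_features` and `decode_and_merge` are hypothetical placeholders for the networks described in this application, and the toy lambdas stand in for them.

```python
def process_image(first_face_image, extract_features, map_features, decode_and_merge):
    """Chain the three steps: feature extraction, feature mapping, image synthesis."""
    first_feature = extract_features(first_face_image)          # step 101
    second_feature = map_features(first_feature)                # step 102
    return decode_and_merge(first_face_image, second_feature)   # step 103


# Toy stand-ins operating on a flat list of pixel values.
result = process_image(
    [1.0, 2.0, 3.0],
    extract_features=lambda image: [value * 2 for value in image],
    map_features=lambda feature: [value + 1 for value in feature],
    decode_and_merge=lambda image, feature: list(zip(image, feature)),
)
```

Note that step 103 receives both the first face image and the second facial feature, which is what allows the regions outside the face to be carried over unchanged.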

In some embodiments, the first facial feature of the first face image may be acquired in the following manner.

First, at least one of face coordinate information and face normal vector information of the first face image is acquired.

Here, the face coordinate information, such as the Projected Normalized Coordinate Code (PNCC) information of a face point cloud, is used to characterize the geometric information of the face contained in the image; the face normal vector information is used to characterize the illumination direction information of the face.

In one example, a face point cloud may be obtained from an image containing a face by using a 3D face fitting model, for example through 3D Dense Face Alignment (3DDFA), and the face point cloud may be rendered and projected onto a two-dimensional image to obtain the face coordinate information and/or the face normal vector information.
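The projection of a point cloud into a coordinate-code image can be sketched as follows, under several simplifying assumptions: points are already in image coordinates, the projection is orthographic, a larger z value is closer to the camera, and the same point cloud supplies both the positions and the normalized coordinates used as pixel values. The function names are hypothetical, and an actual fitting pipeline would differ in these details.

```python
def normalized_coordinate_code(points):
    """Scale each axis of a 3-D point cloud to [0, 1], one code triple per point."""
    mins = [min(p[d] for p in points) for d in range(3)]
    spans = [(max(p[d] for p in points) - mins[d]) or 1.0 for d in range(3)]
    return [tuple((p[d] - mins[d]) / spans[d] for d in range(3)) for p in points]


def project_pncc(points, height, width):
    """Orthographic projection with a z-buffer; each pixel stores a coordinate code."""
    codes = normalized_coordinate_code(points)
    image = [[(0.0, 0.0, 0.0)] * width for _ in range(height)]
    depth = [[float("-inf")] * width for _ in range(height)]
    for (x, y, z), code in zip(points, codes):
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        if inside and z > depth[row][col]:  # keep the point nearest to the camera
            depth[row][col] = z
            image[row][col] = code
    return image


# Two points land on pixel (row 1, col 2); the one with the larger z is kept.
pncc = project_pncc([(0, 0, 0), (2, 1, 4), (2, 1, 1)], height=2, width=3)
```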

In addition, feature extraction may be performed on the face region in the first face image through a feature encoding network to obtain encoding feature information; that is, the first face image may be encoded through the feature encoding network to obtain high-dimensional features.

The first facial feature of the first face image may be obtained by combining the face coordinate information and/or the face normal vector information with the encoding feature information.

In an embodiment of the present invention, the face coordinate information and/or the face normal vector information of the first face image is combined with the encoding feature information obtained by the feature encoding network to obtain the first facial feature of the first face image, and the first facial feature is mapped to a second facial feature conforming to the feature distribution of the target image, so that the distribution of the face geometric information and/or the illumination direction information of the generated target face image is consistent with that of the target image. In this way, when the skin color or illumination of the source image differs greatly from that of the target image, the problem that the appearance of the face region in the first face image is inconsistent and discontinuous with the other regions can be eliminated.

In some embodiments, the encoding feature information output by the feature encoding network comprises n orders of feature information; that is, n feature maps of different sizes are output. Each of the first M orders of the n orders of feature information is concatenated with at least one of the face coordinate information and the face normal vector information of the first face image to obtain M orders of concatenated feature information, and the first facial feature is obtained according to the M orders of concatenated feature information and subsequent-order feature information. Here, the subsequent-order feature information comprises the feature information in the n orders of feature information other than the first M orders of feature information. In this embodiment, n and M are natural numbers, and M<n.

FIG. 2 is a schematic diagram of a method for acquiring the first facial feature in an image processing method according to at least one embodiment of the present invention. As shown in FIG. 2, for an input image X (for example, the first face image or the target image), on the one hand, encoded feature information of the input image is obtained through a feature encoding network 201, the encoded feature information comprising four levels of feature information. On the other hand, a face point cloud of the face contained in the input image is obtained through a three-dimensional fitting model 202, the face point cloud is rendered to obtain PNCC information and/or normal vector information, and the PNCC information and/or normal vector information is projected to obtain a PNCC image and/or a normal vector image. The PNCC image and/or normal vector image is then scaled to the size of each of the first three levels of feature information, and each scaled image is concatenated with the corresponding level to obtain concatenated feature information. The concatenated feature information and the fourth-level feature information are combined to form the first facial feature of the input image. The structure shown in FIG. 2 for acquiring the first facial feature is referred to as a perceptual encoder.
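The scaling and concatenation step described for FIG. 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the nested-list tensor layout (channels x height x width), the nearest-neighbour resize, and the names `resize_nearest` and `concat_geometry` are assumptions made for the example.

```python
from typing import List

Map = List[List[List[float]]]  # channels x height x width (assumed layout)


def resize_nearest(img: Map, out_h: int, out_w: int) -> Map:
    """Nearest-neighbour resize of a channels-first map."""
    in_h, in_w = len(img[0]), len(img[0][0])
    return [
        [[ch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
         for r in range(out_h)]
        for ch in img
    ]


def concat_geometry(levels: List[Map], geometry: Map, m: int) -> List[Map]:
    """Concatenate the resized geometry channels onto the first m levels."""
    out = []
    for i, feat in enumerate(levels):
        if i < m:
            h, w = len(feat[0]), len(feat[0][0])
            out.append(feat + resize_nearest(geometry, h, w))  # channel concat
        else:
            out.append(feat)  # remaining levels are kept unchanged
    return out


def zeros(c: int, h: int, w: int) -> Map:
    return [[[0.0] * w for _ in range(h)] for _ in range(c)]


# Four hypothetical feature levels and a 3-channel PNCC image.
levels = [zeros(4, 8, 8), zeros(8, 4, 4), zeros(16, 2, 2), zeros(32, 1, 1)]
pncc = zeros(3, 8, 8)
first_feature = concat_geometry(levels, pncc, m=3)
channels = [len(f) for f in first_feature]
```

With these made-up shapes, the first three levels each gain three channels and the fourth level is left as it is.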

In some embodiments, the first facial feature may be mapped to the second facial feature through a feature mapping network.

In one example, the first feature of a pixel in the first face image is mapped through the feature mapping network to obtain a second feature of the pixel, such that the distance between the probability distribution of the second feature of the pixel and the probability distribution of a third feature of the corresponding pixel in the target image satisfies a set condition. Here, the first feature of the pixel in the first face image belongs to the first facial feature, and the second feature of the pixel belongs to the second facial feature.

The distance between the probability distribution of the second feature of the pixel and the probability distribution of the third feature of the corresponding pixel in the target image satisfies the set condition when the distance is smaller than a set threshold or when the distance reaches a minimum. The threshold may be set according to application requirements and is not limited in embodiments of the present invention.
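As a simplified illustration of the threshold condition, the sketch below computes the Wasserstein-1 distance between two equal-size one-dimensional samples, which equals the mean absolute difference of the sorted values, and compares it with a threshold. The one-dimensional setting, the sample values, and the function names are assumptions for the example; the embodiments work with higher-dimensional feature vectors and a learned distance evaluation.

```python
def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1-D empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def satisfies_condition(mapped, target, threshold):
    """Set condition: the distance is smaller than the set threshold."""
    return wasserstein_1d(mapped, target) < threshold


mapped = [0.1, 0.4, 0.5, 0.9]   # hypothetical second-feature values
target = [0.2, 0.4, 0.6, 1.0]   # hypothetical third-feature values
distance = wasserstein_1d(mapped, target)
ok = satisfies_condition(mapped, target, threshold=0.1)
```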

In some embodiments, some of the features included in the first facial feature may be mapped to second facial features that conform to the feature distribution of the target image, while the other features in the first facial feature are passed through directly to obtain identical features.

For example, when the first facial feature includes M levels of concatenated feature information and remaining-level feature information, the M levels of concatenated feature information may be mapped to obtain M levels of mapped feature information that conform to the feature distribution of the target image, while the remaining-level feature information is passed through directly. The second facial feature obtained in this way includes the M levels of mapped feature information and the remaining-level feature information.
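The split between mapped levels and passed-through levels can be sketched as below. The function name `map_partial` and the toy mapping `shift` are hypothetical; a trained feature mapping network would take the place of `shift`.

```python
def map_partial(first_feature, mapping_fn, m):
    """Map the first m levels; pass the remaining levels through unchanged."""
    return [mapping_fn(f) if i < m else f for i, f in enumerate(first_feature)]


def shift(level):
    """Stand-in for a learned mapping: adds 1.0 to every value."""
    return [v + 1.0 for v in level]


first_feature = [[0.0, 1.0], [2.0], [3.0], [4.0, 5.0]]  # four toy levels
second_feature = map_partial(first_feature, shift, m=3)
```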

FIG. 3 is a schematic diagram of a method for mapping the first facial feature in an image processing method according to at least one embodiment of the present invention.

In this example, the feature of a pixel is represented by a feature vector composed of encoded feature information, face coordinate information, and normal vector information. The encoded feature information is, for example, k-dimensional feature information obtained by encoding the pixel space into a hidden-layer space with a convolutional neural network; the face coordinate information is, for example, a three-dimensional PNCC feature; and the normal vector information is, for example, three-dimensional normal vector information. Accordingly, the first feature of a pixel in the first face image may be represented by a first feature vector, the second feature of the pixel by a second feature vector, and the third feature of a pixel in the target image by a third feature vector.

As shown in FIG. 3, a mapping function is first applied to the first feature vector of the pixel at position {m,n} in the first face image to obtain the second feature vector of that pixel. Here, i denotes the feature level and {m,n} denotes the pixel position, where m is the row number and n is the column number. The input of the mapping function is the i-th level feature information of the first face image, and its output is the mapped feature information corresponding to that i-th level feature information.

In one example, an evaluation function may be introduced to determine the probability distribution of the feature vectors of pixels in an image, so that the Wasserstein distance between the second feature vector and the third feature vector of the corresponding pixel in the target image can be determined; that is, the distance between the probability distribution of the second feature vectors of pixels in the first face image and the probability distribution of the third feature vectors of the corresponding pixels in the target image. The mapping function is adjusted so that this distance satisfies the set condition, for example by making it smaller than a set threshold or by minimizing it, so that the distribution of the second feature vectors converges to the distribution of the third feature vectors of the target image. This realizes optimal transport of the first feature vectors of the pixels in the first face image.

The above process can also be formulated as the minimax problem of equation (1).

In equation (1), the expectation terms denote expected values; the evaluation function is subject to a 1-Lipschitz constraint and is used to obtain the probability distribution of the feature vectors of pixels in an image; and the mapping term denotes mapping the first feature vector of a pixel on the basis of transforming the first face image into the target image.
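One form consistent with this description, assuming the standard dual formulation of the Wasserstein-1 distance, is sketched below. It is not the equation as published: the symbols $D$ (evaluation function), $M$ (mapping function), $f_{s\to t}$ (first feature vector of a pixel in the first face image) and $f_{t}$ (third feature vector of the corresponding pixel in the target image) are assumptions introduced for this illustration.

```latex
\min_{M}\;\max_{\lVert D \rVert_{L}\le 1}\;
\mathbb{E}\!\left[ D\!\left(f_{t}\right) \right]
\;-\;
\mathbb{E}\!\left[ D\!\left(M\!\left(f_{s\to t}\right)\right) \right]
```

Under this reading, the inner maximization over 1-Lipschitz functions yields the Wasserstein distance between the two feature distributions, and the outer minimization adjusts the mapping to reduce that distance.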

In embodiments of the present invention, the mapping function may be implemented by the feature mapping network, and the evaluation function may be implemented by an evaluation network.

After the first facial feature of the first face image has been mapped, the second facial feature may be decoded through a feature decoding network to obtain an image of the face region. The target face image is then obtained from the image of the first face image outside the face region and the decoded image of the face region.
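The final compositing step can be illustrated with a hard binary face mask, as in the sketch below. The mask, the single-channel image layout, and the name `composite` are assumptions for the example; the text does not specify how the face region is delimited, and a practical system might blend the boundary softly.

```python
def composite(first_face, decoded_face, face_mask):
    """Take decoded pixels inside the face region, original pixels outside."""
    return [
        [d if m else s for s, d, m in zip(src_row, dec_row, mask_row)]
        for src_row, dec_row, mask_row in zip(first_face, decoded_face, face_mask)
    ]


first_face = [[10, 10, 10], [10, 10, 10]]   # hypothetical first face image
decoded_face = [[99, 99, 99], [99, 99, 99]]  # hypothetical decoder output
face_mask = [[0, 1, 1], [0, 0, 1]]           # 1 marks the face region
target_face = composite(first_face, decoded_face, face_mask)
```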

For example, when the second facial feature obtained by mapping includes M levels of mapped feature information and remaining-level feature information, the M levels of mapped feature information and the remaining-level feature information may be decoded to obtain the image of the face region.

FIG. 4A is a schematic diagram of an image processing method according to at least one embodiment of the present invention. As shown in FIG. 4A, a first face image is obtained from a source image and a target image, and the first face image is input to a perceptual encoder 401 to obtain the first facial feature.

A feature mapping network maps part of the first facial feature to obtain mapped features, while the remaining part of the facial feature is passed through directly (that is, mapped to an identical feature). A face decoder 402 (the feature decoding network) decodes the mapped features and the passed-through facial feature to obtain an image of the face region, and the target face image is obtained from the image of the face region and the image of the first face image outside the face region.

The following describes the training of the feature encoding network, the feature mapping network, and the feature decoding network used in embodiments of the present invention.

In embodiments of the present invention, the feature encoding network, the feature mapping network, and the feature decoding network are trained end to end. Within each training iteration, the training of the feature encoding and feature decoding networks and the training of the feature mapping network are performed in sequence. That is, in one training iteration the feature encoding network and the feature decoding network are first trained jointly, and the feature mapping network is then trained; every training iteration alternates in this way.
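The ordering within each training iteration can be sketched as a simple loop. The two step callbacks are hypothetical placeholders for the actual optimization steps, which are not specified at this level of detail.

```python
def train(num_iterations, step_encoder_decoder, step_mapping):
    """Run both training phases in sequence within every iteration."""
    for it in range(num_iterations):
        step_encoder_decoder(it)  # joint update of encoder and decoder
        step_mapping(it)          # then update of the feature mapping network


log = []
train(
    2,
    step_encoder_decoder=lambda it: log.append(("encoder_decoder", it)),
    step_mapping=lambda it: log.append(("mapping", it)),
)
```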

The training of the feature mapping network is described first.

The feature mapping network is obtained by training an optimal transport network that includes the feature mapping network and a distance evaluation network, where the distance evaluation network is used to determine the distance between the probability distribution of the second features of pixels in the first face image and the probability distribution of the third features of the corresponding pixels in the target image.

The network loss for training the feature mapping network includes a mapping loss, which indicates the difference, determined by the distance evaluation network, between the probability distribution of the second features of pixels in the first face image and the probability distribution of the third features of the corresponding pixels in the target image.

In one example, the parameters of the feature mapping network may be adjusted by minimizing the mapping loss.

In one example, the feature mapping network and the distance evaluation network may be trained alternately. For example, in one training iteration the network parameters of the distance evaluation network are adjusted by maximizing the mapping loss, and the network parameters of the feature mapping network are then adjusted by minimizing the mapping loss; every training iteration alternates in this way.
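The alternation between maximizing and minimizing the same loss can be seen in a deliberately small one-dimensional toy, sketched below under strong assumptions: the evaluation function is linear, D(x) = w·x with |w| ≤ 1 (hence 1-Lipschitz), the mapping is a shift, M(x) = x + b, and the maximization over w is solved in closed form instead of by gradient steps. None of these choices comes from the source; they only make the behaviour easy to follow.

```python
def sign(x):
    return (x > 0) - (x < 0)


def mapping_loss(w, b, mean_src, mean_tgt):
    """E[D(target)] - E[D(M(source))] for D(x) = w*x and M(x) = x + b."""
    return w * mean_tgt - w * (mean_src + b)


mean_src, mean_tgt = 0.0, 2.0   # hypothetical feature means
b, lr = 0.0, 0.1                # mapping shift and its learning rate

for _ in range(100):
    # Evaluation step: maximise the loss over w in [-1, 1] (closed form).
    w = sign(mean_tgt - (mean_src + b))
    # Mapping step: gradient descent on b, using d(loss)/db = -w.
    b -= lr * (-w)

gap = abs((mean_src + b) - mean_tgt)
```

In this toy the shift b moves toward the target mean and then stays within one learning-rate step of it, which mirrors how minimizing the mapping loss pulls the mapped distribution toward the target distribution.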

The mapping loss can be expressed by equation (2).

Here, i is the feature level, m is the row number of a pixel, n is the column number of the pixel, M is the total number of rows, and N is the total number of columns. The expectation term denotes the expected value, and the mapping term denotes mapping the i-th level feature vector of pixel {m,n} of the first face image on the basis of transforming the first face image into the target image.

The training of the feature encoding network and the feature decoding network is described next. This training falls into three cases. In the first case, the feature encoding network and the feature decoding network are trained by themselves, without the aid of other networks. In this case, the feature encoding network and the feature decoding network are jointly referred to as an appearance transfer network, and the network loss for training includes:

a first loss indicating the difference between the second feature and the first feature of a pixel in the first face image; and

a second loss indicating the difference between the second feature of a pixel in the first face image and the third feature of the corresponding pixel in the target image.

In one example, the parameters of the feature encoding network and the feature decoding network may be adjusted by minimizing the first loss and the second loss.

The training loss can be expressed by equation (3).

Here, the image features are obtained using a VGG network. The first loss can reflect the difference in content between the target face image and the first face image, and the second loss can reflect the difference in appearance between the target face image and the target image.
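The two loss terms can be illustrated with a plain mean squared error over feature values, as sketched below. This is a stand-in only: the exact form of equation (3) is not reproduced here, the feature lists are made-up numbers standing for features already extracted by a VGG network, and the function names are hypothetical.

```python
def mse(a, b):
    """Mean squared difference between two equal-length feature lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def appearance_losses(feat_output, feat_first, feat_target):
    """Return (first loss, second loss) for one set of features."""
    first_loss = mse(feat_output, feat_first)    # content difference
    second_loss = mse(feat_output, feat_target)  # appearance difference
    return first_loss, second_loss


feat_output = [1.0, 2.0, 3.0]  # hypothetical features of the target face image
feat_first = [1.0, 2.0, 5.0]   # hypothetical features of the first face image
feat_target = [0.0, 2.0, 3.0]  # hypothetical features of the target image
first_loss, second_loss = appearance_losses(feat_output, feat_first, feat_target)
```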

In the second case, the feature encoding network and the feature decoding network are trained with the aid of a face reconstruction network. In this case, the feature encoding network, the feature decoding network, and the face reconstruction network may be jointly referred to as the appearance transfer network. Here, the face reconstruction network is used to obtain a reconstructed face image by reconstruction from the facial features of the target image.

FIG. 4B is a schematic diagram of a method for training the feature encoding network and the feature decoding network in an image processing method according to at least one embodiment of the present invention. As shown in FIG. 4B, the face reconstruction network includes a perceptual encoder 403 and a face decoder 404 that share weights with the perceptual encoder 401 and the face decoder 402. The facial feature obtained from the target image through the perceptual encoder 403 is passed through directly, and the face decoder 404 decodes this facial feature to obtain a reconstructed image.

In this case, the network loss for training further includes a third loss in addition to the first loss and the second loss. The third loss indicates the difference between the second feature of a pixel in the first face image and a fourth feature of the corresponding pixel in the reconstructed face image output by the face reconstruction network.

The training loss can be expressed by equation (4).

Here, the first loss and the second loss are the same as in equation (3), and the third loss can reflect the difference in appearance between the reconstructed face image and the target image.

In the third case, the feature encoding network and the feature decoding network are trained with the aid of a mixing network and a discriminator network. In this case, the feature encoding network, the feature decoding network, the mixing network, and the discriminator network, and where present the face reconstruction network, may be jointly referred to as the appearance transfer network. Here, the mixing network is used to mix pixels of the target face image with pixels of the target image or of the reconstructed image to obtain a mixed image. The discriminator network is used to determine a classification result for the pixels of the mixed image; that is, whether a pixel belongs to the target face image (a generated image pixel) or to the target image or the reconstructed face image (a real image pixel).

FIG. 4C is a schematic diagram of another method for training the feature encoding network and the feature decoding network in an image processing method according to at least one embodiment of the present invention. As shown in FIG. 4C, the mixing network 405 uses a randomly generated mask to generate a mixed image, for example according to equation (5).

In equation (5), the mixed image is formed from the mask, the target face image, and the target image.
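A per-pixel mix with a random binary mask can be sketched as follows. The convention that a mask value of 1 selects the target face image (generated) pixel and 0 selects the target image (real) pixel is an assumption, as are the function names and the single-channel toy images; equation (5) itself is not reproduced here.

```python
import random


def random_mask(h, w, rng):
    """Random binary mask; 1 selects the generated pixel (assumed convention)."""
    return [[rng.randint(0, 1) for _ in range(w)] for _ in range(h)]


def mix(mask, generated, real):
    """Per-pixel blend: mask * generated + (1 - mask) * real."""
    return [
        [m * g + (1 - m) * r for m, g, r in zip(mask_row, gen_row, real_row)]
        for mask_row, gen_row, real_row in zip(mask, generated, real)
    ]


rng = random.Random(0)
mask = random_mask(2, 3, rng)
generated = [[1.0] * 3 for _ in range(2)]  # stands for the target face image
real = [[0.0] * 3 for _ in range(2)]       # stands for the target image
mixed = mix(mask, generated, real)
labels = mask  # per-pixel labels: 1 = generated pixel, 0 = real pixel
```

Because the mask records where each pixel came from, it can double as the per-pixel label that a discriminator is asked to predict.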

The discriminator network 406 predicts, for each pixel of the mixed image, whether it is a real image pixel or a generated image pixel.

In this case, the training loss can be expressed by equation (6).

Here, E is the expected value, and the remaining terms are the mixed image, the mask, and the MSD function, which predicts the pixel-wise Wasserstein distance on the mixed image. The MSD function is subject to a 1-Lipschitz constraint.

With this training scheme, the parameters of the feature encoding network and the feature decoding network may be adjusted by maximizing the loss for the part judged to be fake (the discriminator loss) and minimizing the loss for the part judged to be real (the generator loss).

In one example, the network loss for training the feature encoding network and the feature decoding network can be expressed by equation (7).

Here, the weighting coefficients may be determined according to the importance of each loss term.
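A weighted combination of loss terms can be sketched as a simple weighted sum. The term names and the numeric values below are hypothetical; which terms enter equation (7) and how they are weighted is not reproduced here.

```python
def total_loss(losses, weights):
    """Weighted sum of named loss terms."""
    if losses.keys() != weights.keys():
        raise ValueError("every loss term needs exactly one weight")
    return sum(weights[name] * losses[name] for name in losses)


losses = {"first": 0.5, "second": 0.25, "third": 1.0, "adversarial": 2.0}
weights = {"first": 1.0, "second": 2.0, "third": 0.5, "adversarial": 0.25}
total = total_loss(losses, weights)
```

Raising the weight of one term makes the optimizer favour reducing that term over the others, which is how the relative importance of each loss is expressed.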

The above describes, with reference to FIGS. 4A to 4C, the training of the neural networks used in the image processing method of embodiments of the present invention. The facial features shown in FIGS. 4A to 4C are examples only and are not limiting. In specific embodiments, the facial features may, as needed, include local features, such as features of the eyes, nose, or mouth; global features of the entire face; or local features, global features, and other facial features together.

FIG. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the apparatus includes: a first acquisition unit 501 configured to acquire a first facial feature of a first face image obtained based on facial features of a source image and on a target image; a mapping unit 502 configured to map the first facial feature to obtain a second facial feature, where the distribution of at least some of the second facial features matches the feature distribution of the target image; and a second acquisition unit configured to obtain a target face image according to the first face image and the second facial feature.
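The division of work among the three units can be sketched as a small pipeline object. The class name, the callable-based design, and the toy stand-in functions are assumptions for illustration only; they do not describe how the units are actually built.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageProcessingApparatus:
    first_acquire: Callable[[Any], Any]        # first acquisition unit 501
    mapping: Callable[[Any], Any]              # mapping unit 502
    second_acquire: Callable[[Any, Any], Any]  # second acquisition unit

    def run(self, first_face_image):
        first_feature = self.first_acquire(first_face_image)
        second_feature = self.mapping(first_feature)
        return self.second_acquire(first_face_image, second_feature)


# Toy stand-ins so the data flow can be followed end to end.
apparatus = ImageProcessingApparatus(
    first_acquire=lambda img: [p + 1 for p in img],
    mapping=lambda feat: [v * 2 for v in feat],
    second_acquire=lambda img, feat: list(zip(img, feat)),
)
result = apparatus.run([1, 2])
```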

In some embodiments, the first acquisition unit is specifically configured to: acquire at least one of face coordinate information and face normal vector information of the first face image; perform feature extraction on the first face image through a feature encoding network to obtain encoded feature information; and obtain the first facial feature of the first face image according to the encoded feature information and at least one of the face coordinate information and the face normal vector information of the first face image.

In some embodiments, the mapping unit is specifically configured to map a first feature of a pixel in the first face image through a feature mapping network to obtain a second feature of the pixel, where the distance between the probability distribution of the second feature of the pixel and the probability distribution of a third feature of the corresponding pixel in the target image satisfies a set condition. Here, the first feature of the pixel in the first face image belongs to the first facial feature, and the second feature of the pixel belongs to the second facial feature.

In some embodiments, the second acquisition unit is specifically configured to: decode the second facial feature through a feature decoding network to obtain an image of the face region; and obtain the target face image according to the image of the first face image outside the face region and the decoded image of the face region.

In some embodiments, the encoded feature information includes n levels of feature information, and obtaining the first facial feature of the first face image according to the encoded feature information and at least one of the face coordinate information and the face normal vector information of the first face image specifically includes: concatenating each of the first M levels of the n levels of feature information with at least one of the face coordinate information and the face normal vector information of the first face image to obtain M levels of concatenated feature information; and obtaining the first facial feature according to the M levels of concatenated feature information and the remaining-level feature information. Here, the remaining-level feature information comprises the feature information in the encoded feature information other than the first M levels, n and M are natural numbers, and M < n.

In some embodiments, when mapping the first facial feature to obtain the second facial feature, the mapping unit is specifically configured to map the M levels of concatenated feature information to obtain M levels of mapped feature information that match the feature distribution of the target image. When decoding the second facial feature through the feature decoding network to obtain the image of the face region, the second acquisition unit is specifically configured to decode the M levels of mapped feature information and the remaining-level feature information to obtain the image of the face region.

In some embodiments, the apparatus further includes a training unit configured to train the feature encoding network, the feature mapping network, and the feature decoding network end to end, where within each training iteration the training of the feature encoding and feature decoding networks and the training of the feature mapping network are performed in sequence.

In some embodiments, the feature mapping network is obtained by training an optimal transport network that includes the feature mapping network and a distance evaluation network. The network loss of the training includes a mapping loss, which indicates the difference, determined by the distance evaluation network, between the probability distribution of the second features of pixels in the first face image and the probability distribution of the third features of the corresponding pixels in the target image.

In some embodiments, the feature encoding network and the feature decoding network are obtained by training an appearance transfer network that includes the feature encoding network and the feature decoding network. The network loss of the training includes: a first loss indicating the difference between the second feature and the first feature of a pixel in the first face image; and a second loss indicating the difference between the second feature of a pixel in the first face image and the third feature of the corresponding pixel in the target image.

In some embodiments, the appearance transfer network further includes a face reconstruction network, which is used to obtain a reconstructed face image by reconstruction from the facial features of the target image. The network loss of the training further includes a third loss indicating the difference between the second feature of a pixel in the first face image and a fourth feature of the corresponding pixel in the reconstructed face image output by the face reconstruction network.

In some embodiments, the appearance transfer network further includes a discriminator network, and the network loss of the training further includes a fourth loss indicating the difference between the pixel classification result determined by the discriminator network for a mixed image sample and the labeling information of the mixed image sample. Here, the mixed image sample includes an image obtained by mixing pixels of the target face image with pixels of the target image or of the reconstructed face image, and the labeling information indicates whether a pixel is a generated image pixel or a real image pixel.

FIG. 6 shows an electronic device provided by at least one embodiment of the present invention. The device includes a memory 601 and a processor 602, where the memory is used to store computer instructions executable on the processor, and the processor is used to execute the image processing method according to any embodiment of this specification when executing the computer instructions.

At least one embodiment of this specification further provides a storage medium storing a computer program; when the program is executed by a processor, the image processing method according to any embodiment of this specification is implemented.

In the embodiments of the present invention, the computer-readable storage medium may take various forms. For example, the machine-readable storage medium may be a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard disk drive), a solid-state drive, any type of storage disc (e.g., a CD or DVD), a similar storage medium, or a combination thereof. In particular, the computer-readable medium may also be paper or another suitable medium on which a program can be printed. With such a medium, the program can be obtained electronically (e.g., by optical scanning), compiled, interpreted, and processed in a suitable manner, and then stored in a computer medium.

The above description covers only preferred embodiments of the present invention and is not intended to limit it. All modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An image processing method, comprising:
acquiring a first facial feature of a first facial image, wherein the first facial image is obtained based on a facial feature of a source image and a target image;
obtaining a second facial feature by mapping the first facial feature, wherein a distribution of at least some features of the second facial feature matches a feature distribution of the target image; and
acquiring a target facial image according to the first facial image and the second facial feature.

2. The method of claim 1,
wherein acquiring the first facial feature of the first facial image comprises:
obtaining at least one of face coordinate information and face normal vector information of the first facial image;
performing feature extraction on the first facial image through a feature encoding network to obtain encoding feature information; and
acquiring the first facial feature of the first facial image according to the encoding feature information and the at least one of the face coordinate information and the face normal vector information of the first facial image.

3. The method of claim 1 or 2,
wherein obtaining the second facial feature by mapping the first facial feature comprises:
performing mapping on a first feature of a pixel in the first facial image through a feature mapping network to obtain a second feature of the pixel, wherein a distance between a probability distribution of the second feature of the pixel and a probability distribution of a third feature of the corresponding pixel in the target image satisfies a set condition; and
wherein the first feature of the pixel in the first facial image belongs to the first facial feature, and the second feature of the pixel belongs to the second facial feature.

4. The method according to any one of claims 1 to 3,
wherein acquiring the target facial image according to the first facial image and the second facial feature comprises:
performing decoding on the second facial feature through a feature decoding network to obtain an image of a facial region; and
acquiring the target facial image according to the image of the first facial image outside the facial region and the image of the facial region obtained by decoding.

5. The method according to any one of claims 2 to 4,
wherein the encoding feature information includes n orders of feature information; and
wherein acquiring the first facial feature of the first facial image according to the encoding feature information and the at least one of the face coordinate information and the face normal vector information of the first facial image comprises:
obtaining M-order connection feature information by connecting each of the first M orders of feature information among the n orders of feature information with at least one of the face coordinate information and the face normal vector information of the first facial image; and
obtaining the first facial feature according to the M-order connection feature information and subsequent feature information, wherein the subsequent feature information includes the feature information in the encoding feature information other than the first M orders of feature information, n and M are natural numbers, and M<n.

6. The method of claim 5,
wherein obtaining the second facial feature by mapping the first facial feature comprises:
performing mapping on the M-order connection feature information to obtain M-order mapping feature information matching the feature distribution of the target image; and
wherein performing decoding on the second facial feature through the feature decoding network to obtain the image of the facial region comprises:
obtaining the image of the facial region by performing decoding on the M-order mapping feature information and the subsequent feature information.

7. The method according to any one of claims 2 to 6,
further comprising performing end-to-end training on the feature encoding network, the feature mapping network, and the feature decoding network, wherein in each training round, the training of the feature encoding network and the feature decoding network and the training of the feature mapping network are performed in sequence.

8. The method of claim 7,
wherein the feature mapping network is obtained by training an optimized migration network, the optimized migration network includes the feature mapping network and a distance estimation network, and the network loss of the training includes:
a mapping loss indicating the difference, as determined by the distance estimation network, between the probability distribution of the second feature of a pixel in the first facial image and the probability distribution of the third feature of the corresponding pixel in the target image.

9. The method of claim 7,
wherein the feature encoding network and the feature decoding network are obtained by training an appearance migration network, the appearance migration network includes the feature encoding network and the feature decoding network, and the network loss of the training includes:
a first loss indicating the difference between the second feature and the first feature of a pixel in the first facial image; and
a second loss indicating the difference between the second feature of a pixel in the first facial image and the third feature of the corresponding pixel in the target image.

10. The method of claim 9,
wherein the appearance migration network further includes a face reconstruction network, the face reconstruction network is used to obtain a reconstructed face image by reconstruction from the facial features of the target image, and the network loss of the training further includes:
a third loss indicating the difference between the second feature of a pixel in the first facial image and the fourth feature of the corresponding pixel in the face reconstruction image output by the face reconstruction network.

11. The method of claim 9 or 10,
wherein the appearance migration network further includes a discrimination network, and the network loss of the training further includes:
a fourth loss indicating the difference between the pixel classification result of a mixed image sample determined by the discrimination network and the labeling information of the mixed image sample,
wherein the mixed image sample includes an image obtained by mixing pixels of the target facial image with pixels of the target image or of the face reconstruction image, and the labeling information indicates a generated-image pixel or a real-image pixel.

12. An image processing device, comprising:
a first acquiring unit, used to acquire a first facial feature of a first facial image, wherein the first facial image is obtained based on a facial feature of a source image and a target image;
a mapping unit, used to map the first facial feature to obtain a second facial feature, wherein a distribution of at least some features of the second facial feature matches a feature distribution of the target image; and
a second acquiring unit, used to acquire a target facial image according to the first facial image and the second facial feature.

13. The device of claim 12,
wherein the first acquiring unit is used to:
acquire at least one of face coordinate information and face normal vector information of the first facial image;
perform feature extraction on the first facial image through a feature encoding network to obtain encoding feature information; and
acquire the first facial feature of the first facial image according to the encoding feature information and the at least one of the face coordinate information and the face normal vector information of the first facial image.

14. The device of claim 12 or 13,
wherein the mapping unit is used to:
map a first feature of a pixel in the first facial image through a feature mapping network to obtain a second feature of the pixel, wherein a distance between a probability distribution of the second feature of the pixel and a probability distribution of a third feature of the corresponding pixel in the target image satisfies a set condition; and
wherein the first feature of the pixel in the first facial image belongs to the first facial feature, and the second feature of the pixel belongs to the second facial feature.

15. The device according to any one of claims 12 to 14,
wherein the second acquiring unit is used to:
perform decoding on the second facial feature through a feature decoding network to obtain an image of a facial region; and
acquire the target facial image according to the image of the first facial image outside the facial region and the image of the facial region obtained by decoding.

16. The device according to any one of claims 12 to 15,
wherein the encoding feature information includes n orders of feature information; and
wherein the first acquiring unit, when used to acquire the first facial feature of the first facial image according to the encoding feature information and the at least one of the face coordinate information and the face normal vector information of the first facial image, is used to:
connect each of the first M orders of feature information among the n orders of feature information with at least one of the face coordinate information and the face normal vector information of the first facial image to obtain M-order connection feature information; and
obtain the first facial feature according to the M-order connection feature information and subsequent feature information, wherein the subsequent feature information includes the feature information in the encoding feature information other than the first M orders of feature information, n and M are natural numbers, and M<n.

17. The device of claim 16,
wherein the mapping unit, when used to map the first facial feature to obtain the second facial feature,
is used to perform mapping on the M-order connection feature information to obtain M-order mapping feature information matching the feature distribution of the target image; and
wherein the second acquiring unit, when used to perform decoding on the second facial feature through the feature decoding network to obtain the image of the facial region,
is used to obtain the image of the facial region by performing decoding on the M-order mapping feature information and the subsequent feature information.

18. The device according to any one of claims 13 to 17,
further comprising a training unit used to perform end-to-end training on the feature encoding network, the feature mapping network, and the feature decoding network, wherein in each training round, the training of the feature encoding network and the feature decoding network and the training of the feature mapping network are performed in sequence.

19. An electronic device, comprising:
a memory and a processor, wherein the memory is used to store computer instructions executable on the processor, and the processor is used to execute the image processing method according to any one of claims 1 to 11 when executing the computer instructions.

20. A storage medium storing a computer program,
wherein, when the program is executed by a processor, the image processing method according to any one of claims 1 to 11 is implemented.