TW202211167A - Training method and training system of generative adversarial network for image cross domain conversion - Google Patents


Info

Publication number
TW202211167A
Authority
TW
Taiwan
Prior art keywords
image
loss value
generative adversarial
multimodal
transformation
Prior art date
Application number
TW109144098A
Other languages
Chinese (zh)
Other versions
TWI840637B (en)
Inventor
林哲聰
范峻
郭宗憲
Original Assignee
財團法人工業技術研究院 (Industrial Technology Research Institute)
Priority date
Filing date
Publication date
Application filed by 財團法人工業技術研究院 (Industrial Technology Research Institute)
Publication of TW202211167A
Application granted granted Critical
Publication of TWI840637B


Landscapes

  • Image Analysis (AREA)

Abstract

A training method and a training system of a generative adversarial network for image cross-domain conversion are provided. The generative adversarial network performs structure-preserving image cross-domain conversion. The training method includes the following steps. A first real image is input to a first generator to obtain a first generated image; the first real image has a first modality and the first generated image has a second modality. The first generated image is input to a second generator to obtain a first reconstructed image. The parameters of the first generator and the second generator are updated according to a plurality of loss values, so as to achieve image multimodal conversion with cyclically consistent or similar structure.

Description

Training method and training system of generative adversarial network for performing image multimodal transformation

The present disclosure relates to a training method and a training system of a generative adversarial network for performing image multimodal transformation.

In recent years, the rise of deep learning has greatly improved the recognition rates of various image recognition technologies. Deep learning models require large amounts of labeled data for training. However, labeled data is the most important asset of an R&D unit or company during product development, and it is never made public. Even when small and medium-sized enterprises are capable of developing deep learning models, limited manpower and resources mean that the amount of data they can label themselves is extremely small, so their productization capability cannot keep pace with major international companies, especially in applications such as on-road object detection, semantic segmentation, and instance segmentation. These applications, particularly semantic segmentation and instance segmentation, require pixel-by-pixel labeling of images and therefore consume enormous labor costs.

After the generative adversarial network (GAN) was proposed, deep learning models gained the ability to generate images, and many subsequent studies evolved to convert images from style (modality) A to style (modality) B. With such powerful image conversion capability, image segmentation and object detection could in principle achieve so-called cross-domain adaptation: a GAN is used to convert labeled training data from the source domain to the target domain, thereby increasing the generality and robustness of segmentation or detection models in the target domain. However, the main reason existing models still cannot achieve this is that, although existing GANs can change image style dramatically, labeled objects or pixels of specific classes often shift position after conversion. The converted images would then have to be re-labeled, which prevents GANs from being applied to style transfer of labeled data.

Therefore, researchers are developing a generative adversarial network for image multimodal transformation that can convert a source-domain image into target-domain images of varying degrees, while the structure of every converted target-domain image remains identical or similar to that of the image before conversion.

The present disclosure relates to a training method and a training system of a generative adversarial network for performing image multimodal transformation.

According to an embodiment of the present disclosure, a training method of a generative adversarial network for performing image multimodal transformation is provided. The generative adversarial network performs structure-preserving image multimodal transformation. The training method includes the following steps. A first real image is input to a first generator to obtain a first generated image; the first real image has a first modality and the first generated image has a second modality. The first generated image is input to a second generator to obtain a first reconstructed image, which has the first modality. The first generated image is input to a first discriminator to calculate a first loss value, and to a first image segmentation network to calculate a second loss value. The first reconstructed image is input to a second discriminator to calculate a third loss value, and to a second image segmentation network to calculate a fourth loss value. The parameters of the first generator and the second generator are then updated, at least according to the first, second, third, and fourth loss values, to train the generative adversarial network performing image multimodal transformation.
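The data flow of these steps can be sketched in miniature. The toy Python below replaces the generators, discriminators, and segmentation networks with hypothetical stand-in functions over lists of pixel intensities; every function body and number here is an illustrative assumption, not the patented networks:

```python
# Toy forward cycle: real image -> G1 -> generated image -> G2 -> reconstruction.
# All components are stand-ins; a real system would use convolutional networks.

def G1(image, noise):                    # first generator: modality A -> B
    return [p + noise for p in image]    # toy "day to night" intensity shift

def G2(image):                           # second generator: modality B -> A
    return [p - 1.0 for p in image]      # assumes G1's shift is about 1.0

def D1(image):                           # stand-in discriminator for modality B
    return 1.0 if sum(image) / len(image) > 1.0 else 0.0

def D2(image):                           # stand-in discriminator for modality A
    return 1.0 if sum(image) / len(image) <= 1.0 else 0.0

def segment(image):                      # stand-in segmentation network:
    mean = sum(image) / len(image)       # mask of above-average pixels,
    return [1 if p > mean else 0 for p in image]  # invariant to a global shift

gt1 = [0.1, 0.9, 0.2]                    # first real image (modality A)
sg1 = segment(gt1)                       # its segmentation ground truth

fk1 = G1(gt1, noise=1.0)                 # first generated image (modality B)
rc1 = G2(fk1)                            # first reconstructed image (modality A)

ls1 = 1.0 - D1(fk1)                      # adversarial loss on the generated image
ls2 = sum(a != b for a, b in zip(segment(fk1), sg1))  # structure loss on FK1
ls3 = 1.0 - D2(rc1)                      # adversarial loss on the reconstruction
ls4 = sum(a != b for a, b in zip(segment(rc1), sg1))  # structure loss on RC1
```

In this toy, all four losses come out zero because the stand-in generator shifts intensities without moving the above-average mask, mimicking the structure preservation the method trains for.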

According to another embodiment of the present disclosure, a training system of a generative adversarial network for performing image multimodal transformation is provided. The generative adversarial network performs structure-preserving image multimodal transformation. The training system includes a first generator, a second generator, a first discriminator, a second discriminator, a first image segmentation network, and a second image segmentation network. The first generator receives a first real image to obtain a first generated image; the first real image has a first modality and the first generated image has a second modality. The second generator receives the first generated image to obtain a first reconstructed image, which has the first modality. The first discriminator receives the first generated image to calculate a first loss value. The second discriminator receives the first reconstructed image to calculate a second loss value. The first image segmentation network receives the first generated image to calculate a third loss value. The second image segmentation network receives the first reconstructed image to calculate a fourth loss value. The parameters of the first generator and the second generator are updated at least according to the first, second, third, and fourth loss values, so as to train the generative adversarial network performing image multimodal transformation.

In order to provide a better understanding of the above and other aspects of the present disclosure, embodiments are described in detail below with reference to the accompanying drawings.

The present disclosure proposes a training method for a generative adversarial network that performs image multimodal transformation. The trained network can perform structure-preserving image multimodal transformation. Taking driving images as an example, nighttime driving images can be augmented from daytime driving images by the trained network and used as training samples for deep learning. The object positions in the augmented nighttime images are identical or similar to those in the daytime images, so no re-labeling is needed. This not only addresses the drop in on-road object detection and recognition rates caused by insufficient labeled data, but also avoids the need for large amounts of labeling.

The proposed training method comprises a forward cycle and a backward cycle. The forward cycle trains the conversion from the first modality to the second modality; the backward cycle trains the conversion from the second modality to the first modality. Combining the two cycles makes the conversion between the first and second modalities accurate in both directions.

In one embodiment, the training method may include only the forward cycle, which is described first below.

Referring to FIG. 1, a block diagram of a training system 100 of a generative adversarial network for performing image multimodal transformation according to an embodiment is shown. The training system 100 of FIG. 1 includes only the forward cycle. The training system 100 includes a first generator G1, a second generator G2, a first discriminator D1, a second discriminator D2, a first image segmentation network S1, and a second image segmentation network S2. Their functions are summarized as follows. The generators G1 and G2 perform modality and style conversion. The discriminators D1 and D2 verify the generated data. The segmentation networks S1 and S2 perform object segmentation and labeling and are, for example, semantic segmentation networks. Each of G1, G2, D1, D2, S1, and S2 is, for example, a circuit, a chip, a circuit board, or a storage device storing program code.

The main purpose of the first generator G1 is to convert a first real image GT1 of a first modality (for example, daytime) into a first generated image FK1 of a second modality (for example, nighttime). After random noise N(z) is injected into G1, the network learns through training so that FK1 passes the verification of the first discriminator D1. In addition, to maintain structural consistency or similarity between FK1 and GT1, the first image segmentation network S1 assists in structural confirmation of FK1, ensuring that the structure is preserved, or remains similar, before and after conversion. Day and night serve above as examples of the first and second modalities; in another embodiment, the two modalities may differ only slightly in style, such as sunny and cloudy weather.

Furthermore, to prevent the converted modality or style from being too strong, the second generator G2 converts FK1 of the second modality back into a first reconstructed image RC1 of the first modality, which must pass the verification of the second discriminator D2. RC1 must also maintain structural consistency or similarity with GT1; the second image segmentation network S2 assists in structural confirmation of RC1, ensuring cyclically consistent or similar structure before and after conversion. The operation of these components is detailed below with reference to a flowchart.

Referring to FIGS. 1 and 2, FIG. 2 is a flowchart of a training method of a generative adversarial network for performing image multimodal transformation according to an embodiment; the method of FIG. 2 includes only the forward cycle. In step S110, as shown in FIG. 1, the first real image GT1 is input to the first generator G1 to obtain the first generated image FK1. GT1 has the first modality and FK1 has the second modality. GT1 is captured by an image sensor, for example an infrared image capture device, a photo-coupled element, a complementary metal-oxide-semiconductor optical sensor, or any combination thereof. Taking a camera that captures driving images as an example, it may be mounted at the front, rear, left, right, or any position able to capture images around the vehicle. The first modality is, for example, daytime; the second modality is, for example, night, rain, sunset, or snow.

As shown in FIG. 1, random noise N(z) is injected into the first generator G1 to obtain the first generated image FK1. Annexed Figure 1 illustrates the original first real image GT1, a road scene captured in daytime; annexed Figure 2 illustrates the simulated nighttime first generated image FK1 obtained after injecting random noise N(z). Injecting different levels of random noise N(z) yields first generated images FK1 of different degrees, such as dusk or late night.
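The idea that the noise level controls the degree of the target modality can be illustrated with a toy sketch. The generator stand-in below simply darkens a day image by the noise magnitude; the functions, the `degree` parameter, and the pixel values are hypothetical illustrations, not the disclosed generator:

```python
import random

def sample_noise(degree, seed=None):
    """Draw z from a zero-mean Gaussian whose spread is 'degree':
    a larger degree pushes the output further toward the target modality."""
    return random.Random(seed).gauss(0.0, degree)

def toy_G1(image, z):
    """Stand-in generator: darkens a day image by |z| to mimic
    different depths of night, clamping intensities at zero."""
    return [max(0.0, p - abs(z)) for p in image]

day = [0.8, 0.9, 0.7]                            # toy daytime image
dusk = toy_G1(day, sample_noise(0.2, seed=7))    # mild modality shift
night = toy_G1(day, sample_noise(0.8, seed=7))   # strong modality shift
```

With the same seed, the larger `degree` scales the same underlying draw, so the strong-shift output is never brighter than the mild-shift one.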

Next, in step S120, as shown in FIG. 1, the first generated image FK1 is input to the second generator G2 to obtain the first reconstructed image RC1, which has the first modality. That is, the first real image GT1 of the first modality is converted into FK1 of the second modality and then converted back into RC1 of the first modality. Annexed Figure 3 illustrates the first reconstructed image RC1.

Then, in step S130, as shown in FIG. 1, FK1 is input to the first discriminator D1 to calculate a first loss value LS1. LS1 represents whether the converted second-modality image FK1 is similar to a second real image GT2 of the second modality (shown in FIG. 3). GT2 does not need to overlap GT1 completely or almost completely; it only needs to be broadly similar.
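The disclosure does not specify the functional form of the first loss value. One conventional choice, assumed here purely for illustration, is the least-squares GAN loss, where the generator is penalized when the discriminator's score on the generated image is far from the "real" label:

```python
def lsgan_generator_loss(d_score):
    """Least-squares adversarial loss for the generator: (D(FK1) - 1)^2,
    zero when the discriminator is fully fooled."""
    return (d_score - 1.0) ** 2

def lsgan_discriminator_loss(d_real, d_fake):
    """Discriminator side: push scores on real images toward 1
    and scores on generated images toward 0."""
    return (d_real - 1.0) ** 2 + d_fake ** 2

ls1 = lsgan_generator_loss(0.7)               # hypothetical D1 score on FK1
d_loss = lsgan_discriminator_loss(0.9, 0.7)   # hypothetical scores on GT2, FK1
```

Other formulations (e.g. the original binary cross-entropy GAN loss) would fit the same slot in the method.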

Next, in step S140, as shown in FIG. 1, FK1 is input to the first image segmentation network S1 to calculate a second loss value LS2. In this step, LS2 is calculated from the segmentation ground truth SG1 of the first real image GT1 and the segmentation of FK1. Annexed Figure 4 illustrates the segmentation ground truth SG1 of GT1, and annexed Figure 5 illustrates the segmentation of FK1. LS2 represents whether the structure of FK1 is consistent with, or similar to, the structure of GT1. Taking LS2 into account during training of the generative adversarial network ensures that the object positions in FK1 generated by G1 do not change, or remain similar.
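The exact form of this structure loss is likewise not given in the source. A common choice, assumed here, is a pixel-wise cross-entropy between the segmentation network's per-pixel class probabilities on the generated image and the ground-truth labels SG1 of the real image; the 4-pixel example and its class names are hypothetical:

```python
import math

def pixelwise_cross_entropy(probs, labels):
    """Mean over pixels of -log p(correct class).

    probs:  per-pixel probability vectors predicted by the segmentation net
    labels: per-pixel ground-truth class indices
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], 1e-12))  # clamp for numerical stability
    return total / len(labels)

# Hypothetical 4-pixel image, two classes (0 = road, 1 = vehicle).
sg1 = [0, 1, 1, 0]                       # ground truth SG1 of the real image
probs_fk1 = [                            # S1's predictions on the generated FK1
    [0.9, 0.1], [0.2, 0.8], [0.1, 0.9], [0.8, 0.2],
]
ls2 = pixelwise_cross_entropy(probs_fk1, sg1)
```

A low LS2 means the generated image still segments into the same layout as the source, which is exactly the structure-preservation signal the update step needs.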

Then, in step S150, as shown in FIG. 1, the first reconstructed image RC1 is input to the second discriminator D2 to calculate a third loss value LS3. LS3 represents whether the converted first-modality image RC1 is similar to the first real image GT1 of the first modality. Taking LS3 into account during training of the generative adversarial network ensures that the second-modality image FK1 generated by G1 can also be correctly restored to the first modality.

Next, in step S160, as shown in FIG. 1, RC1 is input to the second image segmentation network S2 to calculate a fourth loss value LS4. In this step, LS4 is calculated from the segmentation ground truth SG1 of GT1 and the segmentation of RC1. Annexed Figure 6 illustrates the segmentation result of RC1. LS4 represents whether the structure of RC1 is consistent with, or similar to, the structure of GT1. Taking LS4 into account during training of the generative adversarial network ensures that the object positions in RC1 generated by G2 do not change, or remain similar. Steps S130 to S160 are not limited to the order illustrated in FIG. 2; they may be performed simultaneously or in any order, as long as they are performed before step S170.

Then, in step S170, the parameters of the first generator G1 and the second generator G2 are updated to train the generative adversarial network performing image multimodal transformation. In the forward-cycle embodiment, the parameters of G1 and G2 are updated according to the first loss value LS1, the second loss value LS2, the third loss value LS3, and the fourth loss value LS4.
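The disclosure does not state how the four loss values are combined for this update. A conventional choice, assumed here, is a weighted sum minimized with respect to both generators' parameters; the weights and example loss values below are illustrative only:

```python
def generator_objective(ls1, ls2, ls3, ls4,
                        w_adv=1.0, w_seg=10.0, w_cyc_adv=1.0, w_cyc_seg=10.0):
    """Weighted total loss for jointly updating G1 and G2.
    Heavier weights on the two segmentation terms (an assumption)
    push the generators toward structure preservation."""
    return w_adv * ls1 + w_seg * ls2 + w_cyc_adv * ls3 + w_cyc_seg * ls4

# Hypothetical loss values after one forward pass.
total = generator_objective(ls1=0.3, ls2=0.05, ls3=0.2, ls4=0.04)
```

In a gradient-based implementation, G1 and G2 would be updated by descending the gradient of this total with respect to their parameters.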

The above describes the forward cycle. During training of the generative adversarial network, LS1 ensures that the converted second-modality image FK1 is similar to the second-modality second real image GT2 (shown in FIG. 3). LS2 ensures that the object positions of FK1 generated by G1 do not change. LS3 ensures that the second-modality FK1 generated by G1 can be restored to the first modality. LS4 ensures that the object positions of RC1 generated by G2 do not change.

As mentioned above, combining the forward cycle with the backward cycle makes the conversion between the first and second modalities accurate in both directions. The combination of the two cycles is described next.

Referring to FIG. 3, a training system 200 of a generative adversarial network for performing image multimodal transformation according to another embodiment is shown. The training system 200 of FIG. 3 includes both the forward cycle and the backward cycle. It includes the first generator G1, the second generator G2, the first discriminator D1, the second discriminator D2, the first image segmentation network S1, and the second image segmentation network S2. The upper half of FIG. 3 is the forward cycle and the lower half is the backward cycle. The forward cycle is as described above and is not repeated here. The operation of the components is detailed below with reference to a flowchart.

Referring to FIGS. 3 and 4, FIG. 4 is a flowchart of a training method of a generative adversarial network for performing image multimodal transformation according to another embodiment; the method of FIG. 4 includes both the forward cycle and the backward cycle. Steps S110 to S160 form the forward cycle and steps S210 to S270 form the backward cycle. The forward cycle of steps S110 to S160 is as described above and is not repeated here.

In step S210, the second real image GT2 is input to the second generator G2 to obtain a second generated image FK2. GT2 has the second modality and FK2 has the first modality. GT2 is captured by an image sensor, for example an infrared image capture device, a photo-coupled element, a complementary metal-oxide-semiconductor optical sensor, or any combination thereof. The first modality is, for example, daytime; the second modality is, for example, night, rain, sunset, or snow.

Next, in step S220, as shown in FIG. 3, FK2 is input to the first generator G1 to obtain a second reconstructed image RC2, which has the second modality. As shown in FIG. 3, random noise N(z) is injected into G1 to obtain RC2. That is, the second real image GT2 of the second modality is converted into FK2 of the first modality and then converted back into RC2 of the second modality.

Then, in step S230, as shown in FIG. 3, FK2 is input to the second discriminator D2 to calculate a fifth loss value LS5. LS5 represents whether the converted first-modality image FK2 is similar to the first real image GT1 of the first modality.

Next, in step S240, as shown in FIG. 3, FK2 is input to the second image segmentation network S2 to calculate a sixth loss value LS6. In this step, LS6 is calculated from the segmentation ground truth SG2 of the second real image GT2 and the segmentation of FK2. LS6 represents whether the structure of FK2 is consistent with, or similar to, the structure of GT2. Taking LS6 into account during training of the generative adversarial network performing image multimodal transformation ensures that the object positions in FK2 generated by G2 do not change.

Then, in step S250, as shown in FIG. 3, RC2 is input to the first discriminator D1 to calculate a seventh loss value LS7. LS7 represents whether the converted second-modality image RC2 is similar to the second real image GT2 of the second modality. Taking LS7 into account during training of the generative adversarial network ensures that the first-modality FK2 generated by G2 can also be correctly restored to the second modality.

Next, in step S260, as shown in FIG. 3, RC2 is input to the first image segmentation network S1 to calculate an eighth loss value LS8. In this step, LS8 is calculated from the segmentation ground truth SG2 of GT2 and the segmentation of RC2. LS8 represents whether the structure of RC2 is consistent with, or similar to, the structure of GT2. Taking LS8 into account during training of the generative adversarial network ensures that the object positions in RC2 generated by G1 do not change.

然後,在步驟S270中,更新第一生成器G1及第二生成器G2的參數,以訓練執行影像多模態轉換之生成式對抗網路。在前進循環架構與退後循環架構結合之實施例中,第一生成器G1及第二生成器G2的參數依據第一損失值LS1、第二損失值LS2、第三損失值LS3、第四損失值LS4、第五損失值LS5、第六損失值LS6、第七損失值LS7及第八損失值LS8進行更新。Then, in step S270, the parameters of the first generator G1 and the second generator G2 are updated to train a generative adversarial network for performing image multimodal transformation. In the embodiment of the combination of the forward loop structure and the backward loop structure, the parameters of the first generator G1 and the second generator G2 are based on the first loss value LS1, the second loss value LS2, the third loss value LS3, the fourth loss value The value LS4, the fifth loss value LS5, the sixth loss value LS6, the seventh loss value LS7, and the eighth loss value LS8 are updated.
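The parameter update of step S270 folds the eight loss values into a single training objective. A minimal sketch of such a combination is shown below; the equal default weighting is an assumption, since the passage does not state how the individual losses are balanced.

```python
def total_generator_loss(losses, weights=None):
    """Weighted sum of the loss values LS1..LS8 used to update the
    parameters of the first and second generators."""
    if weights is None:
        weights = [1.0] * len(losses)  # assumed equal weighting
    return sum(w * l for w, l in zip(weights, losses))

# Illustrative values for LS1..LS8 from one training step.
ls = [0.5, 0.1, 0.4, 0.1, 0.5, 0.1, 0.4, 0.1]
total = total_generator_loss(ls)
```

In a real training loop this scalar would be backpropagated through both generators (and, with opposite sign on the adversarial terms, through the discriminators).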

上述內容為前進循環架構與退後循環架構的結合。在訓練執行影像多模態轉換之生成式對抗網路的過程中，透過第一損失值LS1可以確保轉換後之第二模態的第一生成影像FK1與第二模態的真實影像相似。透過第二損失值LS2可以確保第一生成器G1所生成之第一生成影像FK1的物件位置不會改變。透過第三損失值LS3可以確保第一生成器G1所生成之第二模態的第一生成影像FK1也能夠還原回第一模態。透過第四損失值LS4可以確保第二生成器G2所生成之第一結構重建影像RC1的物件位置不會改變。透過第五損失值LS5可以確保轉換後之第一模態的第二生成影像FK2與第一模態的第一真實影像GT1相似。透過第六損失值LS6可以確保第二生成影像FK2的結構與第二真實影像GT2的結構一致或近似。透過第七損失值LS7可以確保轉換後之第二模態的第二結構重建影像RC2與第二模態的第二真實影像GT2相似。透過第八損失值LS8可以確保第二結構重建影像RC2的結構與第二真實影像GT2的結構一致或近似。The above describes the combination of the forward cycle architecture and the backward cycle architecture. In the process of training the generative adversarial network that performs image multimodal transformation, the first loss value LS1 ensures that the converted first generated image FK1 of the second modality is similar to a real image of the second modality. The second loss value LS2 ensures that the object positions of the first generated image FK1 generated by the first generator G1 do not change. The third loss value LS3 ensures that the first generated image FK1 of the second modality generated by the first generator G1 can be restored to the first modality. The fourth loss value LS4 ensures that the object positions of the first structure reconstructed image RC1 generated by the second generator G2 do not change. The fifth loss value LS5 ensures that the converted second generated image FK2 of the first modality is similar to the first real image GT1 of the first modality. The sixth loss value LS6 ensures that the structure of the second generated image FK2 is consistent with or similar to the structure of the second real image GT2. The seventh loss value LS7 ensures that the converted second structure reconstructed image RC2 of the second modality is similar to the second real image GT2 of the second modality. The eighth loss value LS8 ensures that the structure of the second structure reconstructed image RC2 is consistent with or similar to the structure of the second real image GT2.

以下更透過網路架構圖詳細說明第一損失值LS1、第二損失值LS2、第三損失值LS3、第四損失值LS4、第五損失值LS5、第六損失值LS6、第七損失值LS7及第八損失值LS8之內容及其計算方式。The contents and calculation methods of the first loss value LS1, the second loss value LS2, the third loss value LS3, the fourth loss value LS4, the fifth loss value LS5, the sixth loss value LS6, the seventh loss value LS7, and the eighth loss value LS8 are described in detail below with reference to the network architecture diagram.

請參照第5圖，其繪示執行影像多模態轉換之生成式對抗網路的網路架構圖。第一模態之第一真實影像 x 輸入至編碼器 E_X 及生成器 G_{X→Y}，並加入隨機雜訊 N(z) 後，輸出第二模態之第一生成影像 ŷ。生成器 G_{X→Y} 的功能在於將第一模態轉換為第二模態。鑑別器 D_Y 負責判斷轉換後的第一生成影像 ŷ 是否與真實的第二真實影像 y 相似。Please refer to FIG. 5, which shows the network architecture diagram of the generative adversarial network performing image multimodal transformation. The first real image x of the first modality is input to the encoder E_X and the generator G_{X→Y}; after the random noise N(z) is added, the first generated image ŷ of the second modality is output. The function of the generator G_{X→Y} is to convert the first modality into the second modality. The discriminator D_Y is responsible for judging whether the converted first generated image ŷ is similar to the real second real image y.

在前進循環架構(forward cycle)中，第一損失值 LS1 的計算方式如以下式(1)：In the forward cycle architecture, the first loss value LS1 is calculated as in the following formula (1):

LS1 = E_{y∼p(y)}[log D_Y(y)] + E_{x∼p(x)}[log(1 − D_Y(ŷ))]，其中 ŷ = G_{X→Y}(E_X(x), N(z)) ………………………(1)

在上式(1)中，生成器 G_{X→Y} 於網路學習中的目的是降低損失，鑑別器 D_Y 的目的則是最大化損失。例如，若鑑別器 D_Y 完美地運作，則上式(1)中 D_Y(y) 會輸出 1，也就是 log D_Y(y) = 0；而對於注入雜訊的生成器 G_{X→Y} 所產生的資料 ŷ 來說，D_Y(ŷ) = 0，所以 log(1 − D_Y(ŷ)) = 0。In the above formula (1), the goal of the generator G_{X→Y} during network learning is to reduce the loss, while the goal of the discriminator D_Y is to maximize the loss. For example, if the discriminator D_Y operates perfectly, then D_Y(y) in formula (1) outputs 1, that is, log D_Y(y) = 0; and for the data ŷ produced by the noise-injected generator G_{X→Y}, D_Y(ŷ) = 0, so log(1 − D_Y(ŷ)) = 0.
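The perfect-discriminator argument above can be verified numerically. The sketch below evaluates the two terms of formula (1) on single samples; the helper name is an illustrative assumption.

```python
import math

def adversarial_loss(d_real, d_fake):
    """Single-sample version of the GAN loss of formula (1):
    log D_Y(y) + log(1 - D_Y(y_hat)).
    d_real, d_fake: discriminator outputs in [0, 1]."""
    eps = 1e-12  # guard against log(0)
    return math.log(max(d_real, eps)) + math.log(max(1.0 - d_fake, eps))

# A perfect discriminator outputs 1 on the real image y and 0 on the
# generated image y_hat, so both log terms vanish.
perfect = adversarial_loss(d_real=1.0, d_fake=0.0)

# A fooled discriminator (0.5 on both) yields a negative loss, which the
# discriminator tries to raise and the generator tries to lower.
fooled = adversarial_loss(d_real=0.5, d_fake=0.5)
```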

上式(1)的計算方式同樣地也可被用在退後循環架構中，以相同的生成器 G_{X→Y} 與鑑別器 D_Y 計算第七損失值 LS7，其方程式為 LS7 = E_{y∼p(y)}[log D_Y(y)] + E[log(1 − D_Y(ỹ))]。The calculation of formula (1) above can likewise be used in the backward cycle architecture to compute the seventh loss value LS7 with the same generator G_{X→Y} and discriminator D_Y; its equation is LS7 = E_{y∼p(y)}[log D_Y(y)] + E[log(1 − D_Y(ỹ))].

其中，第一模態之第二生成影像 x̂ 輸入至編碼器 E_X 及生成器 G_{X→Y}，並加入隨機雜訊 N(z) 後，輸出第二模態之第二結構重建影像 ỹ。使用相同生成器 G_{X→Y} 與鑑別器 D_Y 所產生的第一損失值 LS1 與第七損失值 LS7 主要的差別在於，前者是判斷轉換後之第二模態的第一生成影像 ŷ 是否夠真實，後者是判斷退後循環架構中所重建之第二模態的第二結構重建影像 ỹ 是否夠真實。Here, the second generated image x̂ of the first modality is input to the encoder E_X and the generator G_{X→Y}; after the random noise N(z) is added, the second structure reconstructed image ỹ of the second modality is output. The main difference between the first loss value LS1 and the seventh loss value LS7, both produced with the same generator G_{X→Y} and discriminator D_Y, is that the former judges whether the converted first generated image ŷ of the second modality is realistic enough, while the latter judges whether the second structure reconstructed image ỹ of the second modality reconstructed in the backward cycle architecture is realistic enough.

類似地，在退後循環架構中，使用生成器 G_{Y→X} 與鑑別器 D_X 所產生之第五損失值 LS5 的計算方式如下式(2)：Similarly, in the backward cycle architecture, the fifth loss value LS5 produced with the generator G_{Y→X} and the discriminator D_X is calculated as in the following formula (2):

LS5 = E_{x∼p(x)}[log D_X(x)] + E_{y∼p(y)}[log(1 − D_X(x̂))]，其中 x̂ = G_{Y→X}(E_Y(y)) ……………………………….(2)

其中，第二模態之第二真實影像 y 輸入至編碼器 E_Y 及生成器 G_{Y→X} 後，輸出第一模態之第二生成影像 x̂。生成器 G_{Y→X} 的功能在於將第二模態轉換為第一模態。鑑別器 D_X 負責判斷轉換後的第二生成影像 x̂ 是否與真實的第一真實影像 x 相似。值得一提的是，在第二模態轉換至第一模態時，並不需要加入隨機雜訊 N(z)。Here, the second real image y of the second modality is input to the encoder E_Y and the generator G_{Y→X}, and the second generated image x̂ of the first modality is output. The function of the generator G_{Y→X} is to convert the second modality into the first modality. The discriminator D_X is responsible for judging whether the converted second generated image x̂ is similar to the real first real image x. It is worth mentioning that the random noise N(z) does not need to be added when converting from the second modality to the first modality.

上式(2)的計算方式同樣地也可被用在前進循環架構中，以相同的生成器 G_{Y→X} 與鑑別器 D_X 計算第三損失值 LS3，其方程式為 LS3 = E_{x∼p(x)}[log D_X(x)] + E[log(1 − D_X(x̃))]。The calculation of formula (2) above can likewise be used in the forward cycle architecture to compute the third loss value LS3 with the same generator G_{Y→X} and discriminator D_X; its equation is LS3 = E_{x∼p(x)}[log D_X(x)] + E[log(1 − D_X(x̃))].

此外，前進循環架構之第二損失值 LS2 的計算方式如下式(3)：In addition, the second loss value LS2 of the forward cycle architecture is calculated as in the following formula (3):

LS2 = C(S_Y(E'_Y(ŷ)), s_x) ……………………………………(3)

如第5圖所示，第一生成影像 ŷ 輸入編碼器 E'_Y 及分割器 S_Y 後，獲得影像分割 s_ŷ；第一真實影像 x 之影像分割真值(Ground-Truth) s_x 與影像分割 s_ŷ 利用比對函數 C 進行比對。假使生成器 G_{X→Y} 轉換的第一生成影像 ŷ 真實且結構與第一真實影像 x 一致或近似，則第二損失值 LS2 會相當地小。As shown in FIG. 5, the first generated image ŷ is input to the encoder E'_Y and the segmenter S_Y to obtain the image segmentation s_ŷ; the image segmentation ground truth s_x of the first real image x is compared with the image segmentation s_ŷ using the comparison function C. If the first generated image ŷ converted by the generator G_{X→Y} is realistic and its structure is consistent with or similar to the first real image x, the second loss value LS2 will be quite small.

類似地，退後循環架構之第六損失值 LS6 之計算方式如下式(4)：Similarly, the sixth loss value LS6 of the backward cycle architecture is calculated as in the following formula (4):

LS6 = C(S_X(E'_X(x̂)), s_y) ………………………….(4)

如第5圖所示，第二生成影像 x̂ 輸入編碼器 E'_X 及分割器 S_X 後，獲得影像分割 s_x̂；第二真實影像 y 之影像分割真值(Ground-Truth) s_y 與影像分割 s_x̂ 進行比對。假使生成器 G_{Y→X} 轉換的第二生成影像 x̂ 真實且結構與第二真實影像 y 一致，則第六損失值 LS6 會相當地小。As shown in FIG. 5, the second generated image x̂ is input to the encoder E'_X and the segmenter S_X to obtain the image segmentation s_x̂; the image segmentation ground truth s_y of the second real image y is compared with the image segmentation s_x̂. If the second generated image x̂ converted by the generator G_{Y→X} is realistic and its structure is consistent with the second real image y, the sixth loss value LS6 will be quite small.

此外，前進循環架構中的第四損失值 LS4 之計算方式如下式(5)：In addition, the fourth loss value LS4 in the forward cycle architecture is calculated as in the following formula (5):

LS4 = C(S_X(E'_X(x̃)), s_x) …………………………………..(5)

如第5圖所示，第一結構重建影像 x̃ 輸入編碼器 E'_X 及分割器 S_X 後，獲得影像分割 s_x̃；第一真實影像 x 之影像分割真值(Ground-Truth) s_x 與影像分割 s_x̃ 進行比對。假使兩者結構一致或近似，則第四損失值 LS4 會相當地小。As shown in FIG. 5, the first structure reconstructed image x̃ is input to the encoder E'_X and the segmenter S_X to obtain the image segmentation s_x̃; the image segmentation ground truth s_x of the first real image x is compared with the image segmentation s_x̃. If the two structures are consistent or similar, the fourth loss value LS4 will be quite small.

類似地，退後循環架構之第八損失值 LS8 之計算方式如下式(6)：Similarly, the eighth loss value LS8 of the backward cycle architecture is calculated as in the following formula (6):

LS8 = C(S_Y(E'_Y(ỹ)), s_y) …………………………………….(6)

最後，整個網路的目標函數為八個損失值之總和：Finally, the objective function of the entire network is the sum of the eight loss values:

L_total = LS1 + LS2 + LS3 + LS4 + LS5 + LS6 + LS7 + LS8 ……………………………………(7)

而整個網路最佳化的目標在於從下式(8)中找到最佳的生成器 G_{X→Y}、G_{Y→X}，分割器 S_X、S_Y，編碼器 E_X、E_Y、E'_X、E'_Y，以及鑑別器 D_Y 與 D_X：The goal of optimizing the entire network is to find, from the following formula (8), the optimal generators G_{X→Y} and G_{Y→X}, segmenters S_X and S_Y, encoders E_X, E_Y, E'_X and E'_Y, and discriminators D_Y and D_X:

(G*_{X→Y}, G*_{Y→X}, S*_X, S*_Y, E*_X, E*_Y, E'*_X, E'*_Y) = arg min_{G, S, E} max_{D_X, D_Y} L_total ………………………………………………………………..(8)
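The min-max optimisation of formula (8) alternates a maximisation step for the discriminators with a minimisation step for the generators, encoders, and segmenters. The toy sketch below shows this alternation on a scalar objective; the quadratic objective and the learning rate are illustrative assumptions standing in for the actual networks.

```python
def alternating_minmax_step(g, d, grad_g, grad_d, lr=0.1):
    """One round of min-max training: the discriminator-side parameter d
    ascends its gradient, then the generator-side parameter g descends."""
    d = d + lr * grad_d(g, d)  # maximisation step (discriminators)
    g = g - lr * grad_g(g, d)  # minimisation step (generators/encoders/segmenters)
    return g, d

# Toy objective L(g, d) = (g - 1)**2 - (d - 2)**2: the minimiser drives
# g toward 1 while the maximiser drives d toward 2.
grad_g = lambda g, d: 2.0 * (g - 1.0)
grad_d = lambda g, d: -2.0 * (d - 2.0)
g, d = 0.0, 0.0
for _ in range(100):
    g, d = alternating_minmax_step(g, d, grad_g, grad_d)
```

With a small learning rate both parameters settle at their respective optima, mirroring how the adversarial game is trained in alternating steps.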

在完成整個網路的訓練後，將同一張第一模態的影像與不同的隨機雜訊 N(z) 輸入至生成器 G_{X→Y}，即可產生結構與此第一模態之影像相同的數張第二模態的影像，其中各張第二模態之影像的整體亮度、影像中車輛車燈的明亮度皆不同。After the training of the entire network is completed, inputting the same image of the first modality together with different random noises N(z) to the generator G_{X→Y} produces several images of the second modality whose structure is identical to that of the first-modality image, while the overall brightness of each second-modality image and the brightness of the vehicle headlights in the images differ.
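The inference behaviour described above — one first-modality image combined with several noise samples to produce several second-modality images with identical structure — can be sketched with a toy stand-in for the trained generator. The brightness-shift mapping below is purely illustrative; a trained generator would alter appearance in a far richer way.

```python
import random

def toy_generator(image, z):
    """Hypothetical stand-in for the trained generator: shifts global
    brightness by an amount driven by the noise sample z while leaving
    the spatial layout (object positions) untouched."""
    shift = z * 50.0
    return [[min(255.0, max(0.0, px + shift)) for px in row] for row in image]

# One daytime image, three noise samples -> three variants that differ
# in overall brightness but share the same structure.
day_image = [[100.0, 120.0], [90.0, 110.0]]
rng = random.Random(0)
variants = [toy_generator(day_image, rng.random()) for _ in range(3)]
```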

本揭露之技術可供離線訓練行車物體偵測/語義分割/實例分割模型使用,而這些模型可運用機器學習、深度學習技術實現。The technology of the present disclosure can be used for offline training of driving object detection/semantic segmentation/instance segmentation models, and these models can be implemented using machine learning and deep learning technologies.

根據上述說明,本揭露至少具有下列優點:According to the above description, the present disclosure has at least the following advantages:

本揭露之生成式對抗網路可以轉換各種不同程度的模態（如行車天候），且影像中的標記位置不會改變或僅與原位置近似。The generative adversarial network of the present disclosure can convert between modalities of various degrees (such as driving weather conditions), and the annotated positions in the image do not change or remain close to their original positions.

本揭露所生成之生成影像，由於標記位置不會改變或僅與原位置近似，因此原始真實影像的標記可持續使用，因而可大量降低人工標記成本。Since the annotated positions in the generated images do not change or remain close to their original positions, the annotations of the original real images can be reused, which greatly reduces the cost of manual annotation.

此外，本揭露可以透過一張真實影像及其標記轉換出數張其他模態的生成影像，這些其他模態的生成影像可用於訓練或提升其他模態下物體偵測/語義分割/實例分割模型的辨識率。In addition, the present disclosure can generate several images of other modalities from one real image and its annotations; these generated images of other modalities can be used to train, or to improve the recognition rate of, object detection/semantic segmentation/instance segmentation models under other modalities.

綜上所述,雖然本揭露已以實施例揭露如上,然其並非用以限定本揭露。本揭露所屬技術領域中具有通常知識者,在不脫離本揭露之精神和範圍內,當可作各種之更動與潤飾。因此,本揭露之保護範圍當視後附之申請專利範圍所界定者為準。To sum up, although the present disclosure has been disclosed above with embodiments, it is not intended to limit the present disclosure. Those with ordinary knowledge in the technical field to which the present disclosure pertains can make various changes and modifications without departing from the spirit and scope of the present disclosure. Therefore, the scope of protection of the present disclosure should be determined by the scope of the appended patent application.

100, 200:執行影像多模態轉換之生成式對抗網路的訓練系統 D1:第一鑑別器 D2:第二鑑別器 D_Y, D_X:鑑別器 E_X, E_Y, E'_X, E'_Y:編碼器 FK1:第一生成影像 FK2:第二生成影像 G1:第一生成器 G2:第二生成器 GT1:第一真實影像 GT2:第二真實影像 G_{X→Y}, G_{Y→X}:生成器 LS1:第一損失值 LS2:第二損失值 LS3:第三損失值 LS4:第四損失值 LS5:第五損失值 LS6:第六損失值 LS7:第七損失值 LS8:第八損失值 C:比對函數 N(z):隨機雜訊 S_X, S_Y:分割器 RC1:第一結構重建影像 RC2:第二結構重建影像 S1:第一影像分割網路 S2:第二影像分割網路 S110, S120, S130, S140, S150, S160, S170, S210, S220, S230, S240, S250, S260, S270:步驟 SG1:第一真實影像的影像分割真值 SG2:第二真實影像的影像分割真值 x:第一真實影像 s_x:第一真實影像之影像分割真值 x̂:第二生成影像 s_x̂:第二生成影像的影像分割 x̃:第一結構重建影像 s_x̃:第一結構重建影像的影像分割 y:第二真實影像 s_y:第二真實影像之影像分割真值 ŷ:第一生成影像 s_ŷ:第一生成影像的影像分割 ỹ:第二結構重建影像 s_ỹ:第二結構重建影像的影像分割 100, 200: training system of the generative adversarial network performing image multimodal transformation D1: first discriminator D2: second discriminator D_Y, D_X: discriminators E_X, E_Y, E'_X, E'_Y: encoders FK1: first generated image FK2: second generated image G1: first generator G2: second generator GT1: first real image GT2: second real image G_{X→Y}, G_{Y→X}: generators LS1: first loss value LS2: second loss value LS3: third loss value LS4: fourth loss value LS5: fifth loss value LS6: sixth loss value LS7: seventh loss value LS8: eighth loss value C: comparison function N(z): random noise S_X, S_Y: segmenters RC1: first structure reconstructed image RC2: second structure reconstructed image S1: first image segmentation network S2: second image segmentation network S110, S120, S130, S140, S150, S160, S170, S210, S220, S230, S240, S250, S260, S270: steps SG1: image segmentation ground truth of the first real image SG2: image segmentation ground truth of the second real image x: first real image s_x: image segmentation ground truth of the first real image x̂: second generated image s_x̂: image segmentation of the second generated image x̃: first structure reconstructed image s_x̃: image segmentation of the first structure reconstructed image y: second real image s_y: image segmentation ground truth of the second real image ŷ: first generated image s_ŷ: image segmentation of the first generated image ỹ: second structure reconstructed image s_ỹ: image segmentation of the second structure reconstructed image

第1圖繪示根據一實施例之執行影像多模態轉換之生成式對抗網路的訓練系統的方塊圖。 第2圖繪示根據一實施例執行影像多模態轉換之生成式對抗網路的訓練方法的流程圖。 第3圖繪示根據另一實施例之執行影像多模態轉換之生成式對抗網路的訓練系統的方塊圖。 第4圖繪示根據另一實施例執行影像多模態轉換之生成式對抗網路的訓練方法的流程圖。 第5圖繪示執行影像多模態轉換之生成式對抗網路的網路架構圖。 附圖1示例說明原始的第一真實影像。 附圖2示例說明加入隨機雜訊後,所模擬出夜晚的第一生成影像。 附圖3示例說明第一結構重建影像。 附圖4示例說明第一真實影像之影像分割真值。 附圖5示例說明第一生成影像之影像分割。 附圖6示例說明第一結構重建影像之影像分割結果。FIG. 1 illustrates a block diagram of a training system for a generative adversarial network that performs image multimodal transformation, according to one embodiment. FIG. 2 is a flowchart illustrating a training method of a generative adversarial network for performing image multimodal transformation according to an embodiment. FIG. 3 shows a block diagram of a training system for a generative adversarial network that performs image multimodal transformation according to another embodiment. FIG. 4 is a flowchart illustrating a training method of a generative adversarial network for performing image multimodal transformation according to another embodiment. Figure 5 shows a network architecture diagram of a generative adversarial network that performs image multimodal transformation. Figure 1 illustrates the original first real image. FIG. 2 illustrates an example of the first generated image at night simulated after adding random noise. FIG. 3 illustrates the reconstructed image of the first structure. FIG. 4 illustrates the image segmentation ground truth of the first real image. FIG. 5 illustrates image segmentation of the first generated image. FIG. 6 illustrates the image segmentation result of the reconstructed image of the first structure.

100:執行影像多模態轉換之生成式對抗網路之訓練系統100: A training system for generative adversarial networks performing multimodal transformation of images

D1:第一鑑別器D1: first discriminator

D2:第二鑑別器D2: Second discriminator

FK1:第一生成影像FK1: First generated image

G1:第一生成器G1: First generator

G2:第二生成器G2: Second generator

GT1:第一真實影像GT1: The first real image

LS1:第一損失值LS1: first loss value

LS2:第二損失值LS2: second loss value

LS3:第三損失值LS3: third loss value

LS4:第四損失值LS4: Fourth loss value

N(z):隨機雜訊N(z): random noise

RC1:第一結構重建影像RC1: First Structure Reconstructed Image

S1:第一影像分割網路S1: The first image segmentation network

S2:第二影像分割網路S2: Second Image Segmentation Network

SG1:第一真實影像的影像分割真值SG1: Image segmentation ground truth of the first real image

Claims (26)

一種執行影像多模態轉換之生成式對抗網路的訓練方法,該生成式對抗網路用以進行結構保存之影像多模態轉換,執行影像多模態轉換之該生成式對抗網路的該訓練方法包括: 輸入一第一真實影像至一第一生成器,以獲得一第一生成影像,該第一真實影像具有一第一模態,該第一生成影像具有一第二模態; 輸入該第一生成影像至一第二生成器,以獲得一第一結構重建影像,該第一結構重建影像具有該第一模態; 輸入該第一生成影像至一第一鑑別器,以計算一第一損失值; 輸入該第一生成影像至一第一影像分割網路,以計算一第二損失值; 輸入該第一結構重建影像至一第二鑑別器,以計算一第三損失值; 輸入該第一結構重建影像至一第二影像分割網路,以計算一第四損失值;以及 更新該第一生成器及該第二生成器的參數,以訓練執行影像多模態轉換之該生成式對抗網路,該第一生成器及該第二生成器的參數至少依據該第一損失值、該第二損失值、該第三損失值及該第四損失值進行更新。A method for training a generative adversarial network that performs image multimodal transformation, the generative adversarial network is used to perform image multimodal transformation for structure preservation, and the generative adversarial network that performs image multimodal transformation Training methods include: inputting a first real image to a first generator to obtain a first generated image, the first real image has a first modality, and the first generated image has a second modality; inputting the first generated image to a second generator to obtain a first structural reconstructed image, the first structural reconstructed image having the first modality; inputting the first generated image to a first discriminator to calculate a first loss value; inputting the first generated image to a first image segmentation network to calculate a second loss value; inputting the reconstructed image of the first structure to a second discriminator to calculate a third loss value; inputting the reconstructed image of the first structure to a second image segmentation network to calculate a fourth loss value; and Updating the parameters of the first generator and the second generator to train the generative adversarial network performing image multimodal transformation, the parameters of the first generator and the second generator at least according to the first loss value, the second loss value, the third loss value and the fourth loss value are updated. 
如請求項1所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第一損失值根據一第二真實影像進行計算。The training method of a generative adversarial network for performing image multimodal transformation as claimed in claim 1, wherein the first loss value is calculated according to a second real image. 如請求項1所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第二損失值根據該第一真實影像之一影像分割真值進行計算。The training method of a generative adversarial network for performing image multimodal transformation as claimed in claim 1, wherein the second loss value is calculated according to an image segmentation ground truth value of the first real image. 如請求項1所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第三損失值根據該第一真實影像進行計算。The training method of a generative adversarial network for performing image multimodal transformation as described in claim 1, wherein the third loss value is calculated according to the first real image. 如請求項1所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第四損失值根據該第一真實影像之一影像分割真值進行計算。The training method of a generative adversarial network for performing image multimodal transformation as claimed in claim 1, wherein the fourth loss value is calculated according to an image segmentation ground truth value of the first real image. 
如請求項1所述之執行影像多模態轉換之生成式對抗網路的訓練方法,更包括: 輸入一第二真實影像至該第二生成器,以獲得一第二生成影像,該第二真實影像具有該第二模態,該第二生成影像具有該第一模態; 輸入該第二生成影像至該第一生成器,以獲得一第二結構重建影像,該第二結構重建影像具有該第二模態; 輸入該第二生成影像至該第二鑑別器,以計算一第五損失值; 輸入該第二生成影像至該第二影像分割網路,以計算一第六損失值; 輸入該第二結構重建影像至該第一鑑別器,以計算一第七損失值;以及 輸入該第二結構重建影像至該第一影像分割網路,以計算一第八損失值; 其中該第一生成器及該第二生成器的參數依據該第一損失值、該第二損失值、該第三損失值、該第四損失值、該第五損失值、該第六損失值、該第七損失值及該第八損失值進行更新。The method for training a generative adversarial network for performing image multimodal transformation as described in claim 1, further comprising: inputting a second real image to the second generator to obtain a second generated image, the second real image has the second modality, and the second generated image has the first modality; inputting the second generated image to the first generator to obtain a second structural reconstructed image, the second structural reconstructed image having the second modality; inputting the second generated image to the second discriminator to calculate a fifth loss value; inputting the second generated image to the second image segmentation network to calculate a sixth loss value; inputting the second structural reconstruction image to the first discriminator to calculate a seventh loss value; and inputting the reconstructed image of the second structure to the first image segmentation network to calculate an eighth loss value; The parameters of the first generator and the second generator are based on the first loss value, the second loss value, the third loss value, the fourth loss value, the fifth loss value, and the sixth loss value , the seventh loss value and the eighth loss value are updated. 如請求項6所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第五損失值根據該第一真實影像進行計算。The training method of a generative adversarial network for performing image multimodal transformation as claimed in claim 6, wherein the fifth loss value is calculated according to the first real image. 
如請求項6所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第六損失值根據該第二真實影像之一影像分割真值進行計算。The training method of a generative adversarial network for performing image multimodal transformation as claimed in claim 6, wherein the sixth loss value is calculated according to an image segmentation ground truth value of the second real image. 如請求項6所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第七損失值根據該第二真實影像進行計算。The training method of a generative adversarial network for performing image multimodal transformation as described in claim 6, wherein the seventh loss value is calculated according to the second real image. 如請求項6所述之執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第八損失值根據該第二真實影像之一影像分割真值進行計算。The training method of a generative adversarial network for performing image multimodal transformation as claimed in claim 6, wherein the eighth loss value is calculated according to an image segmentation ground truth value of the second real image. 如申請專利範圍第1項所述執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第一模態與該第二模態包括白天、夜晚、雨天、下雪天。The training method of a generative adversarial network for performing image multi-modal conversion as described in item 1 of the claimed scope, wherein the first modality and the second modality include day, night, rainy days, and snowy days. 如申請專利範圍第1項所述執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第一影像分割網路及該第二影像分割網路為語意分割網路。The training method of a generative adversarial network for performing image multimodal transformation as described in claim 1, wherein the first image segmentation network and the second image segmentation network are semantic segmentation networks. 如申請專利範圍第1項所述執行影像多模態轉換之生成式對抗網路的訓練方法,其中該第一生成器更依據一隨機雜訊,獲得該第一生成影像。The method for training a generative adversarial network for performing image multimodal transformation as described in claim 1, wherein the first generator further obtains the first generated image according to a random noise. 
一種執行影像多模態轉換之生成式對抗網路的訓練系統,該生成式對抗網路用以進行結構保存之影像多模態轉換,執行影像多模態轉換之該生成式對抗網路之該訓練系統包括: 一第一生成器,用以接收一第一真實影像,以獲得一第一生成影像,該第一真實影像具有一第一模態,該第一生成影像具有一第二模態; 一第二生成器,用以接收該第一生成影像,以獲得一第一結構重建影像,該第一結構重建影像具有該第一模態; 一第一鑑別器,用以接收該第一生成影像,以計算一第一損失值; 一第二鑑別器,用以接收該第一結構重建影像,以計算一第二損失值; 一第一影像分割網路,用以接收該第一生成影像,以計算一第三損失值;以及 一第二影像分割網路,用以接收該第一結構重建影像,以計算一第四損失值; 其中該第一生成器及該第二生成器的參數至少依據該第一損失值、該第二損失值、該第三損失值及該第四損失值進行更新,以訓練執行影像多模態轉換之該生成式對抗網路。A training system for a generative adversarial network that performs image multimodal transformation, the generative adversarial network is used to perform structure-preserving image multimodal transformation, and the generative adversarial network that performs image multimodal transformation The training system includes: a first generator for receiving a first real image to obtain a first generated image, the first real image has a first modality, and the first generated image has a second modality; a second generator for receiving the first generated image to obtain a first structural reconstructed image, the first structural reconstructed image having the first modality; a first discriminator for receiving the first generated image to calculate a first loss value; a second discriminator for receiving the reconstructed image of the first structure to calculate a second loss value; a first image segmentation network for receiving the first generated image to calculate a third loss value; and a second image segmentation network for receiving the reconstructed image of the first structure to calculate a fourth loss value; The parameters of the first generator and the second generator are updated according to at least the first loss value, the second loss value, the third loss value and the fourth loss value, so as to train and perform image multimodal conversion The generative adversarial network. 
如請求項14所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第一損失值根據一第二真實影像進行計算。The training system for a generative adversarial network performing image multimodal transformation as recited in claim 14, wherein the first loss value is calculated based on a second real image. 如請求項14所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第二損失值根據該第一真實影像進行計算。The training system for a generative adversarial network performing image multimodal transformation as described in claim 14, wherein the second loss value is calculated based on the first real image. 如請求項14所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第三損失值根據該第一真實影像之一影像分割真值進行計算。The training system for a generative adversarial network performing multimodal transformation of images as recited in claim 14, wherein the third loss value is calculated according to an image segmentation truth value of the first real image. 如請求項14所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第四損失值根據該第一真實影像之一影像分割真值進行計算。The training system of a generative adversarial network for performing image multimodal transformation as recited in claim 14, wherein the fourth loss value is calculated according to an image segmentation truth value of the first real image. 
如請求項14所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中 該第二生成器更接收一第二真實影像,以獲得一第二生成影像,該第二真實影像具有該第二模態,該第二生成影像具有該第一模態; 該第一生成器更接收該第二生成影像,以獲得一第二結構重建影像,該第二結構重建影像具有該第二模態; 該第二鑑別器更接收該第二生成影像,以計算一第五損失值; 該第一鑑別器更接收該第二結構重建影像,以計算一第六損失值; 該第二影像分割網路更接收該第二生成影像,以計算一第七損失值;以及 該第一影像分割網路更接收該第二結構重建影像,以計算一第八損失值; 其中該第一生成器及該第二生成器的參數依據該第一損失值、該第二損失值、該第三損失值、該第四損失值、該第五損失值、該第六損失值、該第七損失值及該第八損失值進行更新,以訓練執行影像多模態轉換之該生成式對抗網路。The training system for performing generative adversarial networks for image multimodal transformation as described in claim 14, wherein The second generator further receives a second real image to obtain a second generated image, the second real image has the second modality, and the second generated image has the first modality; The first generator further receives the second generated image to obtain a second structural reconstructed image, the second structural reconstructed image has the second modality; The second discriminator further receives the second generated image to calculate a fifth loss value; The first discriminator further receives the reconstructed image of the second structure to calculate a sixth loss value; The second image segmentation network further receives the second generated image to calculate a seventh loss value; and The first image segmentation network further receives the reconstructed image of the second structure to calculate an eighth loss value; The parameters of the first generator and the second generator are based on the first loss value, the second loss value, the third loss value, the fourth loss value, the fifth loss value, and the sixth loss value , the seventh loss value, and the eighth loss value are updated to train the generative adversarial network performing image multimodal transformation. 如請求項19所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第五損失值根據該第一真實影像進行計算。The training system for a generative adversarial network performing multimodal transformation of images as described in claim 19, wherein the fifth loss value is calculated based on the first real image. 
如請求項19所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第六損失值根據該第二真實影像進行計算。The training system for a generative adversarial network performing image multimodal transformation as described in claim 19, wherein the sixth loss value is calculated based on the second real image. 如請求項19所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第七損失值根據該第二真實影像之一影像分割真值進行計算。The training system for a generative adversarial network performing image multimodal transformation as recited in claim 19, wherein the seventh loss value is calculated according to an image segmentation ground truth value of the second real image. 如請求項19所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第八損失值根據該第二真實影像之一影像分割真值進行計算。The training system for a generative adversarial network performing image multimodal transformation as recited in claim 19, wherein the eighth loss value is calculated according to an image segmentation ground truth value of the second real image. 如申請專利範圍第14項所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第一模態與該第二模態包括白天、夜晚、晴天、雨天、下雪天。The training system for a generative adversarial network for performing image multi-modal conversion as described in claim 14, wherein the first modality and the second modality include day, night, sunny, rainy, and snowy days . 如申請專利範圍第14項所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第一影像分割網路及該第二影像分割網路為語意分割網路。The training system for a generative adversarial network for performing image multi-modal transformation as described in claim 14, wherein the first image segmentation network and the second image segmentation network are semantic segmentation networks. 如申請專利範圍第14項所述之執行影像多模態轉換之生成式對抗網路的訓練系統,其中該第一生成器更依據一隨機雜訊,獲得該第一生成影像。The training system for a generative adversarial network for performing image multi-modal conversion as described in claim 14, wherein the first generator further obtains the first generated image according to a random noise.
TW109144098A 2020-09-04 2020-12-14 Training method and training system of generative adversarial network for image cross domain conversion TWI840637B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063074494P 2020-09-04 2020-09-04
US63/074,494 2020-09-04

Publications (2)

Publication Number Publication Date
TW202211167A true TW202211167A (en) 2022-03-16
TWI840637B TWI840637B (en) 2024-05-01

