CN115544613A - Multi-modal data-driven urban road layout design automation method - Google Patents

Multi-modal data-driven urban road layout design automation method

Info

Publication number
CN115544613A
CN115544613A
Authority
CN
China
Prior art keywords
data
output
layout design
layer
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211138806.8A
Other languages
Chinese (zh)
Inventor
冯天
张微
杨乐豪
张继灵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211138806.8A priority Critical patent/CN115544613A/en
Publication of CN115544613A publication Critical patent/CN115544613A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/10 Geometric CAD
    • G06F 30/13 Architectural design, e.g. computer-aided architectural design [CAAD] related to design of buildings, bridges, landscapes, production plants or roads
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Architecture (AREA)
  • Civil Engineering (AREA)
  • Structural Engineering (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-modal data-driven urban road layout design automation method. The invention introduces a multi-modal data fusion module into the urban road network layout design task: it fuses three types of data (population density, terrain elevation and land-use classification), inputs the fused result into a conditional generative adversarial network (CGAN), and generates a road layout that matches the distribution of the three data types. Compared with traditional manual design methods, the method requires no domain expertise, is not limited by road templates, and can efficiently generate urban road layouts with diverse characteristics. Compared with a method using only a generative adversarial network (GAN), the method introduces multi-modal geospatial data and compresses and dimension-reduces it, avoiding the high computational cost caused by a large number of parameters while highlighting the key features of the data, so that the generated urban road layout is of higher quality both visually and structurally.

Description

Multi-modal data-driven urban road layout design automation method
Technical Field
The invention relates to the fields of deep learning and computer vision, in particular to a multi-modal data-driven urban road layout design automation method.
Background
In recent years, with the rapid development of deep learning and computer vision, more and more traditional tasks have gained new momentum from deep learning, and urban road layout modeling is one of them. In urban planning, a planned road needs a certain level of connectivity and must be able to carry high traffic volumes. Road layout modeling is also applied in the game industry: players pursue rich and varied game scenes, and instantly generated virtual environments enhance the player experience. Road layout planning likewise plays an important role for autonomous vehicles, which are tested on virtual roads created for a variety of city streets.
Existing automated methods for urban road layout design fall into two categories. The first is procedural modeling, which designs urban road layouts from a rule set customized by a technician. This manual modeling approach is very time-consuming and inefficient, and although it guarantees that the roads satisfy the user-specified constraint rules, it remains inflexible. The second category comprises automatic generation algorithms based on deep learning. A generative adversarial network (GAN) learns a low-dimensional generative model from high-dimensional data containing shape features and then produces a road layout by sampling points in the low-dimensional space, making road layout modeling faster and more efficient. Although the urban road layouts obtained this way are rich and varied, they are often of low quality: road connectivity cannot be guaranteed, and the roads cannot be designed to reflect geospatial information and user requirements.
Therefore, a new deep-learning-based automatic generation algorithm needs to be designed, one that constrains the road layout to match the corresponding geospatial information while preserving efficiency and generation quality.
Disclosure of Invention
The technical problem to be solved by the invention is how to introduce multi-modal geospatial information into a GAN so as to generate high-quality urban road layout design maps that satisfy the corresponding input conditions. The invention provides a multi-modal data-driven urban road layout design automation method.
The invention adopts the following specific technical scheme:
A multi-modal data-driven urban road layout design automation method comprises the following steps: acquiring population density raster data, terrain elevation raster data and land-use raster data of a target area to form multi-modal data; inputting the multi-modal data into a pre-trained multi-modal data fusion module, which compresses and dimension-reduces the different modal data and outputs an encoding map; capturing the geospatial features and road texture features in the encoding map with a pre-trained conditional generative adversarial network (CGAN) module; and finally outputting the corresponding urban road layout design map;
the multi-modal data fusion module is an autoencoder consisting of an encoder and a decoder; the input modal data are concatenated along the feature-channel dimension and fed to the encoder, which outputs a fused encoding map; in the pre-training stage, the encoding map output by the encoder is passed to the decoder, the decoder outputs reconstructed data for each modality, and training drives the reconstructed data toward the original data of each modality; in the inference stage, the encoding map output by the encoder is used directly as the output of the multi-modal data fusion module;
the conditional generative adversarial network module consists of a generator and discriminators; the generator takes the encoding map output by the multi-modal data fusion module as input and outputs a road layout design map; in the pre-training stage, the road layout design map output by the generator is fed to the discriminators, which take the real road network layout map and the encoding map as their other two inputs, output patches at three scales (128 × 128, 32 × 32 and 8 × 8), and judge the authenticity of the input map from shallow-, middle- and deep-layer features, the discriminators and the generator being optimized alternately; in the inference stage, the road layout design map output by the generator is used directly as the output of the conditional generative adversarial network module.
Preferably, the encoder and the decoder in the multi-modal data fusion module both use a U-Net model as a baseline model.
Preferably, in the pre-training stage the multi-modal data fusion module is trained by optimizing the mean absolute error between the reconstructed modal data and the original modal data.
Preferably, in the multi-modal data fusion module, the input population density raster data, terrain elevation raster data and land-use raster data are rasterized images of size 512 × 512, and the encoding map formed by fusing the three rasterized images of different modalities is also of size 512 × 512.
Preferably, the input of the generator is the encoding map output by the encoder; the generator adopts a U-Net structure comprising 8 downsampling modules, 7 upsampling modules and 1 output module; the first 4 downsampling modules are 4 × 4 convolutional layers with a normalization layer and a LeakyReLU activation function, while the last 4 downsampling modules comprise a 4 × 4 convolutional layer with a normalization layer and a LeakyReLU activation function plus a Dropout layer; the feature map is halved in size by each downsampling module; the first 4 upsampling modules comprise a 4 × 4 deconvolution layer with a normalization layer and a ReLU activation function plus a Dropout layer, while the last 3 upsampling modules are 4 × 4 deconvolution layers with a normalization layer and a ReLU activation function; the output module comprises an upsampling layer, which doubles the image size using nearest-neighbor interpolation, and a 3 × 3 convolutional layer with a Tanh activation function; the 8 downsampling modules and the 7 upsampling modules are connected with a skip-connection structure.
Preferably, there are 3 discriminators, all adopting a PatchGAN structure; the inputs of each discriminator are the encoding map output by the encoder, the real road network layout map and the road layout design map output by the generator, each of size 512 × 512; the output maps of the three discriminators are of sizes 128 × 128, 32 × 32 and 8 × 8 respectively, so that the overall authenticity of the input image is judged at the shallow, middle and deep feature scales.
Preferably, the three discriminators are specifically as follows:
the first discriminator consists of 2 downsampling modules and 1 output layer, where each downsampling module comprises a 4 × 4 convolutional layer with a normalization layer and a ReLU activation function, the output layer is a 3 × 3 convolutional layer, and the output image is a single-channel patch of size 128 × 128;
the second discriminator consists of 4 downsampling modules and 1 output layer, where each downsampling module comprises a 4 × 4 convolutional layer with a normalization layer and a ReLU activation function, the output layer is a 3 × 3 convolutional layer, and the output image is a single-channel patch of size 32 × 32;
the third discriminator consists of 6 downsampling modules and 1 output layer, where each downsampling module comprises a 4 × 4 convolutional layer with a normalization layer and a ReLU activation function, the output layer is a 3 × 3 convolutional layer, and the output image is a single-channel patch of size 8 × 8.
Preferably, the loss function used to train the conditional generative adversarial network module is the sum of the losses of the three discriminators, the loss of each discriminator taking the form of an MSE loss.
Compared with the prior art, the invention has the following beneficial effects:
the invention introduces a multi-mode data fusion module in an urban road network layout design task, fuses three data of population density, terrain elevation and land utilization, inputs the data into a CGAN module, and generates a road layout according with population, terrain and land types. Compared with the traditional manual design method, the method does not need professional knowledge in related fields, is not limited by a road template, and can efficiently generate urban road layouts with various characteristics. Compared with a method only using GAN, the method introduces multi-modal geospatial data, compresses and reduces the dimension of the data, avoids the problem of high computational demand caused by a large number of parameters, highlights the key characteristics of the data, and enables the generated urban road layout to have higher quality in vision and structure.
Drawings
FIG. 1 is a diagram of a multi-modal data-driven urban road layout design model architecture;
FIG. 2 is a schematic diagram of a generator configuration;
FIG. 3 is a schematic diagram of a structure of a discriminator;
FIG. 4 is a diagram illustrating reconstruction results of the multi-modal data fusion module after 200 rounds of pre-training;
FIG. 5 is a schematic diagram of an optimization process for generating a road layout plan by a generator;
FIG. 6 is a schematic diagram of a road layout design drawing generation test result.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may, however, be implemented in many other ways than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited by the specific embodiments disclosed below. The technical features in the embodiments of the present invention may be combined with one another provided they do not conflict.
A modality is the way something happens or exists; it can also refer to the source or representation form of information, and "multi-modal" denotes a combination of two or more modalities. Just as people understand unfamiliar concepts through the joint participation of senses such as vision, smell and hearing, a model needs data of different modalities if it is to capture comprehensive semantic information about things. Different modalities have different forms of expression and can capture different characteristics of the same object; multi-modal data therefore exhibit both redundancy and complementarity, and combining them yields richer semantic feature information. Fusing multi-modal data is thus necessary, yet difficult, because different modalities express information differently; the current mainstream approach handles multi-modal data fusion with deep-learning-based methods.
One core of the present invention is the handling of multi-modal data fusion, exemplified here with population density data, terrain elevation data and land-use data as the three modalities and with an autoencoder as the fusion mechanism. It should be noted that the specific form of the multi-modal data processing module is not limited; other encoder-decoder models can be adopted.
Another core of the present invention is an improvement of the conditional generative adversarial network (CGAN) and the embedded combination of the CGAN module with the multi-modal data fusion module. The generator in the CGAN module adopts a U-Net structure; the discriminator adopts a PatchGAN structure together with the multi-scale discriminator idea.
In a preferred embodiment of the present invention, a multi-modal data-driven automated method for urban road layout design is provided; the overall model structure is shown in FIG. 1. The model is divided into an upper and a lower part. The upper part is the multi-modal data fusion module, an autoencoder split into an encoder and a decoder. The input of the autoencoder is data of different modalities; the invention uses raster data of three modalities, namely population density data, terrain elevation data and land-use data. All three are single-channel rasterized images of size 512 × 512, and after concatenation along the feature-channel dimension they form the final input of the multi-modal data fusion module. Inside the module, the encoder fuses the input into an encoding map containing the semantic features of all three modalities; the encoding map is a single-channel 512 × 512 image, carrying the multi-modal semantic information while avoiding the high computational cost of a large number of parameters. In addition, the encoding map can be restored to the original rasterized images by the decoder.
It should be noted that the decoder is used mainly in the model training phase, where it assists the training of the encoder; it does not participate in inference. Therefore, in the pre-training stage the encoding map output by the encoder is fed to the decoder, the decoder outputs reconstructed data for each modality, and training drives the reconstruction toward the original data of each modality; in the inference stage the encoding map output by the encoder is used directly as the output of the multi-modal data fusion module. In this embodiment, the pre-training stage trains the module by minimizing the mean absolute error between the reconstructed and original modal data, i.e. by optimizing the L1 loss between the input image and the restored image output by the decoder.
In this embodiment, the objective function of the multi-modal data fusion model can be expressed as:

$$\mathcal{L}_{rec}^{(i)} = \left\lVert A(x_i) - x_i \right\rVert_1$$

$$\mathcal{L}_{rec} = \frac{1}{N}\sum_{i=1}^{N}\mathcal{L}_{rec}^{(i)}$$

where $A$ is the autoencoder and $x_i$ is a training sample obtained by concatenating the data of all modalities along the feature-channel dimension.
As shown in FIG. 1, the lower part is the CGAN module, consisting of a generator G and discriminators D. The generator takes the single-channel 512 × 512 encoding map output by the encoder as input and outputs a single-channel 512 × 512 road network layout design map. There are 3 discriminators D, all adopting the PatchGAN structure but outputting patches of three different sizes; this arrangement is called a multi-scale discriminator. The three discriminators judge the overall authenticity of the input image at small, medium and large patch sizes. Specifically, each discriminator takes two sets of data as input: one set is the single-channel 512 × 512 encoding map together with the real road layout map, the other is the same encoding map together with the generated road layout design map. The discriminators downsample the two sets of data into patches of three sizes, 128 × 128, 32 × 32 and 8 × 8, representing shallow-, middle- and deep-layer features respectively. As with the decoder above, the discriminators D are used mainly in the training phase to assist the generator G; they do not participate in inference. Therefore, in the pre-training stage of the CGAN module, the road layout design map output by the generator is fed to the discriminators, which take the real road network layout map and the encoding map as their other two inputs, output patches at the three scales 128 × 128, 32 × 32 and 8 × 8, judge the authenticity of the input map from shallow-, middle- and deep-layer features, and are optimized alternately with the generator. In the inference stage, the road layout design map output by the generator is used directly as the output of the CGAN module, i.e. as the final result.
The training process of the CGAN module follows the conventional scheme for conditional generative adversarial networks: the network is trained alternately, first the discriminator network D and then the generator network G. The discriminator loss takes the form of an MSE loss: the patches produced from the real road layout are regressed toward an all-ones matrix of the same size, and the patches produced from the generated road layout design toward an all-zeros matrix of the same size. Since the multi-scale discriminator structure employs three discriminators, the total loss is the sum of the losses of the three discriminators.
In the embodiment, the specific generator and discriminator network structure and parameters in the CGAN module are optimized according to the actual urban road layout design scenario.
Specifically, the generator structure is shown in FIG. 2. The generator adopts a U-Net structure comprising 8 downsampling modules (d1-d8), 7 upsampling modules (u1-u7) and 1 output module (u8). Each downsampling module is a 4 × 4 convolutional layer (with normalization layer and LeakyReLU activation function); each upsampling module comprises a 4 × 4 deconvolution layer (with normalization layer and ReLU activation function). Whether a Dropout layer is used is a per-module choice: in this example, downsampling modules d5-d8 and upsampling modules u1-u4 use a Dropout layer. The output module comprises an upsampling layer (using nearest-neighbor interpolation) and a 3 × 3 convolutional layer (with Tanh activation function). The upsampling and downsampling modules are connected by a skip-connection structure; for example, the output of d2 is fed to d3 and is also concatenated with the output of u6 along the channel dimension to form the input of u7. Compared with a conventional encoder-decoder structure, the skip connections in U-Net bring richer semantic information to the decoder during upsampling. Writing d(x, y) for a downsampling module with x input channels and y output channels, the 8 downsampling module parameters are set as: d(1, 64)-d(64, 128)-d(128, 256)-d(256, 512)-d(512, 512)-d(512, 512)-d(512, 512)-d(512, 512). Writing u(x, y) for an upsampling module with x input channels and y output channels, the 7 upsampling module parameters are set as: u(512, 512)-u(1024, 512)-u(1024, 512)-u(1024, 512)-u(1024, 256)-u(512, 128)-u(256, 64).
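For illustration, a PyTorch sketch of this generator follows. It is one plausible reading of FIG. 2 rather than the patent's reference implementation: BatchNorm stands in for the unspecified normalization layer, and the channel plan follows the pix2pix-style parameter list reconstructed above.

```python
import torch
import torch.nn as nn

def down(cin, cout, dropout=False):
    # 4x4 stride-2 convolution, halving the feature-map size.
    layers = [nn.Conv2d(cin, cout, 4, stride=2, padding=1),
              nn.BatchNorm2d(cout), nn.LeakyReLU(0.2, inplace=True)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

def up(cin, cout, dropout=False):
    # 4x4 stride-2 deconvolution, doubling the feature-map size.
    layers = [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
              nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        chans = [1, 64, 128, 256, 512, 512, 512, 512, 512]
        # d1..d8; the last four (i >= 4) use Dropout.
        self.downs = nn.ModuleList(
            [down(chans[i], chans[i + 1], dropout=(i >= 4)) for i in range(8)])
        up_params = [(512, 512), (1024, 512), (1024, 512), (1024, 512),
                     (1024, 256), (512, 128), (256, 64)]
        # u1..u7; the first four (i < 4) use Dropout.
        self.ups = nn.ModuleList(
            [up(cin, cout, dropout=(i < 4))
             for i, (cin, cout) in enumerate(up_params)])
        # Output module: nearest-neighbor upsample + 3x3 conv + Tanh.
        self.out = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(128, 1, 3, padding=1), nn.Tanh())

    def forward(self, x):
        skips = []
        for d in self.downs:
            x = d(x)
            skips.append(x)                             # d1..d8 outputs
        x = self.ups[0](skips[-1])                      # u1 acts on d8
        for i, u in enumerate(self.ups[1:], start=1):
            x = u(torch.cat([x, skips[7 - i]], dim=1))  # skip-connections
        x = torch.cat([x, skips[0]], dim=1)             # u7 + d1 -> 128 ch
        return self.out(x)

g = Generator()
print(g(torch.rand(1, 1, 512, 512)).shape)  # torch.Size([1, 1, 512, 512])
```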
In addition, the discriminator adopts a multi-scale discriminator structure; this embodiment uses 3 discriminators, a number that can be adjusted as needed. The discriminators differ in their number of downsampling modules but all adopt the PatchGAN structure. FIG. 3 shows the overall discriminator structure with 6 downsampling modules drawn schematically; this is only an exemplary illustration, and the number of downsampling modules of each discriminator is set by its own parameters. The three discriminators judge the overall authenticity of the input image at three scales, corresponding to shallow-, middle- and deep-layer features.
The first discriminator consists of 2 downsampling modules, each comprising a 4 × 4 convolutional layer (with normalization layer and ReLU activation function), and 1 output layer, which is a 3 × 3 convolutional layer. The output image is a single-channel 128 × 128 patch; this discriminator evaluates the authenticity of the shallow features of the input image.
The second discriminator consists of 4 downsampling modules of the same form and 1 output layer, also a 3 × 3 convolutional layer. The output image is a single-channel 32 × 32 patch; this discriminator evaluates the authenticity of the middle-layer features of the input image.
The third discriminator consists of 6 downsampling modules of the same form and 1 output layer, also a 3 × 3 convolutional layer. The output image is a single-channel 8 × 8 patch; this discriminator evaluates the authenticity of the deep features of the input image.
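Since the three discriminators differ only in their number of downsampling modules, they can share one parameterized definition. The sketch below assumes a 2-channel input (the encoding map stacked with a real or generated road map) and a channel width that starts at 64 and doubles per module up to a cap of 512; the patent fixes only the module counts, kernel sizes and output patch sizes, so the widths are illustrative.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """One PatchGAN discriminator; n_down of 2/4/6 yields 128/32/8
    output patches from a 512x512 input."""
    def __init__(self, n_down, base=64):
        super().__init__()
        layers, cin, cout = [], 2, base   # 2 ch: encoding map + road map
        for _ in range(n_down):
            # 4x4 stride-2 conv with normalization and ReLU.
            layers += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                       nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
            cin, cout = cout, min(cout * 2, 512)
        layers.append(nn.Conv2d(cin, 1, 3, padding=1))  # 3x3 output layer
        self.net = nn.Sequential(*layers)

    def forward(self, code_map, road_map):
        return self.net(torch.cat([code_map, road_map], dim=1))

# Multi-scale discriminator: shallow / middle / deep feature scales.
discriminators = nn.ModuleList([PatchDiscriminator(n) for n in (2, 4, 6)])
code = torch.rand(1, 1, 512, 512)
road = torch.rand(1, 1, 512, 512)
for d in discriminators:
    print(d(code, road).shape)  # (1,1,128,128), (1,1,32,32), (1,1,8,8)
```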
In this embodiment, with each discriminator using an MSE (least-squares) loss as stated above, the objective function of the CGAN can be expressed as:

$$\mathcal{L}_{D} = \mathbb{E}_{x,c}\left[(D(x,c)-1)^2\right] + \mathbb{E}_{c}\left[D(G(c),c)^2\right]$$

$$\mathcal{L}_{G} = \mathbb{E}_{c}\left[(D(G(c),c)-1)^2\right]$$
where G is the generator, D is a discriminator, x is the real road network layout map and c is the corresponding encoding map. It is important to note that since there are three discriminators, the total loss function is the sum of the loss terms of the three discriminators. Training with this loss in the CGAN module follows the conventional training scheme of a conditional generative adversarial network and is not repeated here.
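A sketch of these losses follows, reusing the `PatchDiscriminator` modules from the previous sketch: the all-ones and all-zeros targets realize the MSE formulation described above, and summing over the three discriminators realizes the multi-scale total loss.

```python
import torch
import torch.nn.functional as F

def d_loss(discriminators, code, real_road, fake_road):
    """Discriminator loss: MSE toward an all-ones patch for real inputs
    and an all-zeros patch for generated inputs, summed over the three
    discriminators."""
    loss = 0.0
    for d in discriminators:
        p_real = d(code, real_road)
        p_fake = d(code, fake_road.detach())   # do not backprop into G
        loss = loss + F.mse_loss(p_real, torch.ones_like(p_real)) \
                    + F.mse_loss(p_fake, torch.zeros_like(p_fake))
    return loss

def g_loss(discriminators, code, fake_road):
    """Generator loss: drive every discriminator's patch toward ones."""
    loss = 0.0
    for d in discriminators:
        p_fake = d(code, fake_road)
        loss = loss + F.mse_loss(p_fake, torch.ones_like(p_fake))
    return loss
```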
After the multi-modal data fusion model and the CGAN model are trained, the method can be applied in the inference stage as follows: acquire population density raster data, terrain elevation raster data and land-use raster data of the target area to form the multi-modal data; input the multi-modal data into the pre-trained multi-modal data fusion module, which compresses and dimension-reduces the different modal data and outputs an encoding map; the pre-trained CGAN module then captures the geospatial features and road texture features in the encoding map and finally outputs the corresponding urban road layout design map.
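As a minimal illustration of this inference pipeline (`model` and `generator` stand for the pre-trained fusion autoencoder and CGAN generator from the sketches above; the stand-in tensors take the place of real, normalized rasters of the target area):

```python
import torch

# Stand-ins for the three normalized 512 x 512 rasters of the target area.
population = torch.rand(1, 1, 512, 512)
elevation = torch.rand(1, 1, 512, 512)
land_use = torch.rand(1, 1, 512, 512)

with torch.no_grad():
    x = torch.cat([population, elevation, land_use], dim=1)  # (1, 3, 512, 512)
    code = model.encoder(x)        # single-channel 512 x 512 encoding map
    road_layout = generator(code)  # corresponding road layout design map
```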
The multi-modal data-driven automated method for urban road layout design of this embodiment is applied below to a specific case to show the technical effect that the method can achieve.
Examples
The overall process in this embodiment can be divided into four stages: data preprocessing, multi-modal data fusion model training, CGAN model training, and image generation. The specific structures of the multi-modal data fusion model and the CGAN model are as described above and are not repeated.
Step 1, data preprocessing stage
Step 1.1: In this case, the raw data are population density data, terrain elevation data, land-use data and road layout data recorded in raster form. Four raster images form one sample: the first three serve as the input data of the sample, and the road layout data serves as the sample label, i.e. the real road layout. Each raw raster image in a sample is preliminarily preprocessed so that the center latitudes and longitudes are aligned and each is a single-channel grayscale image of size 512 × 512.
Step 1.2: The images are screened to remove meaningless or sparse samples, such as data with zero population, extreme altitude or sparse roads. Finally, dilation is applied to the road layout maps to amplify their features.
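A sketch of this screening and dilation step with OpenCV follows; the file names, screening thresholds and kernel size are illustrative assumptions, not values given by the patent.

```python
import cv2
import numpy as np

# Screening: drop samples with no learnable signal, e.g. zero population
# or an almost empty road raster (the thresholds are assumptions).
population = cv2.imread('population.png', cv2.IMREAD_GRAYSCALE)
road = cv2.imread('road_layout.png', cv2.IMREAD_GRAYSCALE)
keep = population.sum() > 0 and (road > 0).mean() > 0.01

# Dilation: thicken the thin road strokes so their features are easier
# to learn; the 3x3 kernel and single iteration are illustrative values.
kernel = np.ones((3, 3), np.uint8)
road_dilated = cv2.dilate(road, kernel, iterations=1)
```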
Step 2, multi-modal data fusion model training
Step 2.1: The data set is divided into a training data set and a test data set at a ratio of 7:3.
Step 2.2: Batches of training samples with index i, where i ∈ {0, 1, …, N}, are drawn sequentially from the training data set, and the multi-modal data fusion model is trained on each batch. The model employs the autoencoder structure. During training, the reconstruction loss $\mathcal{L}_{rec}^{(i)}$ of each training sample is computed, and the network parameters of the whole model are adjusted by gradient descent based on the total loss $\mathcal{L}_{rec}$ over all training samples in the batch. The learning rate is set to 0.0002 and reduced to one tenth of its value every 100 rounds. FIG. 4 shows the result after 200 rounds of training: the reconstructed images of the autoencoder are almost identical to the inputs.
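This schedule can be expressed, for example, with a step learning-rate scheduler. The sketch below assumes the Adam optimizer (the patent does not name one) and the `FusionAutoencoder` instance `model` from the earlier sketch; stand-in batches take the place of a real raster DataLoader.

```python
import torch

# Stand-in batches of stacked modal rasters (batch of 2, 3 channels).
train_loader = [torch.rand(2, 3, 512, 512) for _ in range(4)]

optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)   # lr = 0.0002
# Multiply the learning rate by 0.1 every 100 rounds (epochs).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

for epoch in range(200):
    for x in train_loader:
        _, recon = model(x)
        loss = torch.nn.functional.l1_loss(recon, x)        # batch L1 loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()   # advance the decay schedule once per round
```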
Step 3, CGAN model training
The encoder of the multi-modal data fusion module pre-trained in step 2 is embedded into the CGAN model. The encoder encodes the raw input data into a single-channel 512 × 512 encoding map that serves as the input of the generator in the CGAN, and the generator produces the corresponding road layout design map from it. The discriminators learn to distinguish the real road layouts from the generated ones. Training the generator and the discriminators is a mutual game and a continuously iterated optimization: the generator learns to fool the discriminators by producing road layout design maps that pass for real, while the discriminators must tell real from fake. During training, the losses $\mathcal{L}_{G}^{(i)}$ and $\mathcal{L}_{D}^{(i)}$ of each training sample are computed, and the network parameters of the whole model are adjusted by gradient descent based on the total loss over all training samples in the batch. The learning rates of the generator and the discriminators are set to 0.0002 and reduced to one tenth every 100 rounds. FIG. 5 shows the optimization process of the generator for the road layout design map: as the discriminators keep improving, the generator evolves with them and learns to generate roads selectively according to the terrain, population and land-use data; with continued optimization the connectivity of the roads gradually improves and broken roads become markedly rarer.
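A sketch of this alternating scheme follows, reusing `model`, the generator, the discriminators and the loss functions from the earlier sketches; Adam and the frozen, pre-trained encoder are assumptions consistent with, but not mandated by, the description.

```python
import torch

# Stand-in data: (stacked modal rasters, real road layout) pairs.
train_loader = [(torch.rand(1, 3, 512, 512), torch.rand(1, 1, 512, 512))]

opt_d = torch.optim.Adam(
    [p for d in discriminators for p in d.parameters()], lr=2e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)

for x, real_road in train_loader:
    with torch.no_grad():
        code = model.encoder(x)     # frozen pre-trained fusion encoder
    fake_road = generator(code)

    # 1) Train the discriminators on real vs. generated layouts.
    loss_d = d_loss(discriminators, code, real_road, fake_road)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Train the generator to fool the updated discriminators.
    loss_g = g_loss(discriminators, code, fake_road)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```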
Step 4, image generation
The test data set is fed through the same pipeline as described above. Even for this unseen test data, the model generates a corresponding road layout design map from the input population density, terrain elevation and land-use data, as shown in FIG. 6.
The above-described embodiments are merely preferred embodiments of the present invention, and are not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical solutions obtained by means of equivalent substitution or equivalent transformation all fall within the protection scope of the present invention.

Claims (8)

1. A multi-modal data-driven urban road layout design automation method, characterized in that: population density raster data, terrain elevation raster data and land-use raster data of a target area are acquired to form multi-modal data; the multi-modal data are input into a pre-trained multi-modal data fusion module, which compresses and dimension-reduces the different modal data and outputs an encoding map; a pre-trained conditional generative adversarial network (CGAN) module captures the geospatial features and road texture features in the encoding map; and finally the corresponding urban road layout design map is output;
the multi-modal data fusion module is an autoencoder consisting of an encoder and a decoder; the input modal data are concatenated along the feature-channel dimension and fed to the encoder, which outputs a fused encoding map; in the pre-training stage, the encoding map output by the encoder is passed to the decoder, the decoder outputs reconstructed data for each modality, and training drives the reconstructed data toward the original data of each modality; in the inference stage, the encoding map output by the encoder is used directly as the output of the multi-modal data fusion module;
the conditional generative adversarial network module consists of a generator and discriminators; the generator takes the encoding map output by the multi-modal data fusion module as input and outputs a road layout design map; in the pre-training stage, the road layout design map output by the generator is fed to the discriminators, which simultaneously take the real road network layout map and the encoding map as their other two inputs, output patches at three scales (128 × 128, 32 × 32 and 8 × 8), and judge the authenticity of the input map from shallow-, middle- and deep-layer features, the discriminators and the generator being optimized alternately; in the inference stage, the road layout design map output by the generator is used directly as the output of the conditional generative adversarial network module.
2. The automated multi-modal data-driven urban road layout design method of claim 1, wherein both the encoder and the decoder in the multi-modal data fusion module employ a U-Net model as a baseline model.
3. The automated multi-modal data-driven urban road layout design method of claim 2, wherein in the pre-training stage the multi-modal data fusion module is trained by optimizing the mean absolute error between the reconstructed data of each modality and the original data of each modality.
4. The automated multi-modal data-driven urban road layout design method according to claim 1, wherein in the multi-modal data fusion module the input population density raster data, terrain elevation raster data and land-use raster data are rasterized images of size 512 × 512, and the encoding map formed by fusing the three rasterized images of different modalities is also of size 512 × 512.
5. The automated multi-modal data-driven urban road layout design method according to claim 1, wherein the input of the generator is the encoding map output by the encoder; the generator adopts a U-Net structure comprising 8 downsampling modules, 7 upsampling modules and 1 output module; the first 4 downsampling modules are 4 × 4 convolutional layers with a normalization layer and a LeakyReLU activation function, while the last 4 downsampling modules comprise a 4 × 4 convolutional layer with a normalization layer and a LeakyReLU activation function plus a Dropout layer; the feature map is halved in size by each downsampling module; the first 4 upsampling modules comprise a 4 × 4 deconvolution layer with a normalization layer and a ReLU activation function plus a Dropout layer, while the last 3 upsampling modules are 4 × 4 deconvolution layers with a normalization layer and a ReLU activation function; the output module comprises an upsampling layer, which doubles the image size using nearest-neighbor interpolation, and a 3 × 3 convolutional layer with a Tanh activation function; the 8 downsampling modules and the 7 upsampling modules are connected with a skip-connection structure.
6. The automated multi-modal data-driven urban road layout design method of claim 4, wherein there are 3 discriminators, each with a PatchGAN structure; the inputs of each discriminator are the encoding map output by the encoder, the real road network layout map and the road layout design map output by the generator, all three inputs being of size 512 × 512; the output patches of the three discriminators are of sizes 128 × 128, 32 × 32 and 8 × 8, respectively, so that the overall authenticity of the input image is judged at the shallow, middle and deep feature scales.
7. The automated multi-modal data-driven urban road layout design method of claim 6, wherein said three discriminators are specifically as follows:
the first discriminator consists of 2 downsampling modules and 1 output layer, where each downsampling module comprises a 4 × 4 convolutional layer with a normalization layer and a ReLU activation function, the output layer is a 3 × 3 convolutional layer, and the output image is a single-channel patch of size 128 × 128;
the second discriminator consists of 4 downsampling modules and 1 output layer, where each downsampling module comprises a 4 × 4 convolutional layer with a normalization layer and a ReLU activation function, the output layer is a 3 × 3 convolutional layer, and the output image is a single-channel patch of size 32 × 32;
the third discriminator consists of 6 downsampling modules and 1 output layer, where each downsampling module comprises a 4 × 4 convolutional layer with a normalization layer and a ReLU activation function, the output layer is a 3 × 3 convolutional layer, and the output image is a single-channel patch of size 8 × 8.
8. The automated multi-modal data-driven urban road layout design method of claim 1, wherein the loss function used to train the conditional generative adversarial network module is the sum of the losses of the three discriminators, the loss of each discriminator taking the form of an MSE loss.
CN202211138806.8A 2022-09-19 2022-09-19 Multi-mode data-driven urban road layout design automation method Pending CN115544613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211138806.8A CN115544613A (en) 2022-09-19 2022-09-19 Multi-mode data-driven urban road layout design automation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211138806.8A CN115544613A (en) 2022-09-19 2022-09-19 Multi-mode data-driven urban road layout design automation method

Publications (1)

Publication Number Publication Date
CN115544613A true CN115544613A (en) 2022-12-30

Family

ID=84728687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211138806.8A Pending CN115544613A (en) 2022-09-19 2022-09-19 Multi-mode data-driven urban road layout design automation method

Country Status (1)

Country Link
CN (1) CN115544613A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116975974A (en) * 2023-08-01 2023-10-31 北京航空航天大学 Shelter hospital layout design method and system based on CGAN network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination