CN115546589A

CN115546589A - Image generation method based on graph neural network

Info

Publication number: CN115546589A
Application number: CN202211503117.2A
Authority: CN
Inventors: 陈培; 张杨康; 李泽健; 孙凌云
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2022-11-29
Filing date: 2022-11-29
Publication date: 2022-12-30
Anticipated expiration: 2042-11-29
Also published as: CN115546589B

Abstract

The invention discloses an image generation method based on a graph neural network, which comprises the steps of constructing a hypergraph through an image feature node set and a corresponding scene topological graph, and constructing a graph neural network on the hypergraph to simultaneously learn semantic features and potential features of an image in the scene topological graph; simulating object interaction in a real scene through four message transfer modes on a graph neural network, and sequentially inputting an image feature set obtained by updating based on a global message transfer mode and a local message transfer mode into a full connection layer and a normalization index function to obtain a generated image code; training the training network model based on the training sample set, and training the training network model by adopting a loss function through generating image codes and real image codes to obtain a graph neural network model; the method can efficiently generate the image with higher visual quality and more correct relationship between the objects.

Description

Image generation method based on graph neural network

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to an image generation method based on a graph neural network.

Background

In recent years, the generation of antagonistic neural Networks (GAN) has made great progress in the field of generating realistic images, which creates high-quality images rich in content from pixel-level images that are indistinguishable from humans. In addition, the image generation method with the condition can make the generated result more controllable and meet the requirements of users, such as: generating images based on the text description, generating human body images based on the skeletal key points, and the like.

In the method for generating the image based on the scene topological graph, each node in the scene topological graph is endowed with a specific semantic meaning, and the nodes represent the relationship between the semantic meanings by using the connection of edges, so that the semantic content and the layout plan of an image can be described, and the semantic content and the layout plan are similar to the form of a human mind map. Therefore, the technology for generating the image by the scene topological graph has important application in the field of human and artificial intelligence cooperative drawing creation.

The existing method for generating images based on scene topology maps involves two phases. In the first stage, the semantic features of the object are obtained by the neural network learning of the graph, and the semantic features are used for determining a semantic segmentation graph of the object, wherein the semantic segmentation graph comprises the coordinate boundary of the object and the rough shape of the object. In the second stage, the existing method generates the final image using a method of generating an image based on a semantic segmentation map. A key challenge of the two-stage based approach is the need to learn semantic features that contain interactions between objects through the graph neural network.

When the graph neural network model fails to capture the interaction of the object or does not incorporate the information of the interaction into the semantic features, the semantic features obtained by that will only contain semantic category information. In this case, each object is generated independently, and the final image is not realistic.

On the other hand, the existing image generation methods ignore the interaction of objects in the image generation phase, i.e. the objects are generated independently and in parallel at this phase without further messaging, which may result in distortion of the objects in the generated image. Therefore, based on the two-stage method, the learning of the interaction information between the objects only exists in the learning stage of the semantic features, which brings a serious burden to the learning of the semantic features.

In order to more accurately capture the interaction between objects, the relationship between the objects needs to be considered in both the semantic feature learning phase and the image generation phase. Therefore, it is necessary to design an image generation method capable of accurately obtaining the relationship between objects and efficiently generating an image with high visual quality.

Disclosure of Invention

The invention provides an image generation method based on a graph neural network, which can efficiently generate an image with higher visual quality and more correct relationship between objects.

An image generation method based on a graph neural network comprises the following steps:

(1) Acquiring a plurality of real images, constructing a scene topological graph based on objects in the real images, inputting the real images into a VQGAN system to obtain real image codes and an image feature node set, constructing a hypergraph through the image feature node set and the corresponding scene topological graph, and constructing a training sample set by the plurality of hypergraphs;

(2) Constructing a training network model, wherein the training network model comprises a message transfer function, an attention mechanism unit, a full connection layer and a normalized exponential function, and the training network model comprises the following steps:

semantic feature message passing mode on scene topological graph: in the scene topological graph, fusing semantic features and edge connecting features of each neighbor node of the scene topological graph nodes through a message transfer function to obtain first neighbor node messages, aggregating each first neighbor node message through an attention mechanism unit, and taking an aggregation result as an updated scene topological graph node semantic feature;

global message passing mode: when the neighbor nodes of the image feature nodes are scene topological graph nodes, a regression network method is adopted to construct a rectangular frame based on each node of the scene topological graph, image feature nodes of objects are arranged in the rectangular frame, each node of the scene topological graph points to the corresponding rectangular frame, the semantic features of the updated scene topological graph nodes and the global edge connecting features connected with the corresponding rectangular frame are fused through a message transfer function, and the aggregate features obtained by the fusion result through an attention mechanism are used as the image features updated in a global message transfer mode;

local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features of the neighbor nodes of the image feature nodes in the rectangular frame and the corresponding connecting edge features through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and taking the aggregation result as the image features updated by adopting a local message transfer mode;

sequentially inputting an image feature set obtained by updating based on a global message transfer mode and a local message transfer mode into a full connection layer and a normalization index function to obtain a generated image code;

(3) Training the training network model based on the training sample set, and training the training network model by adopting a loss function through generating image codes and real image codes to obtain a graph neural network model;

(4) When the method is applied, the scene topological graph is input into the graph neural network model to obtain a generated image code, and the generated image code is input into a decoder of the VQGAN system to generate an image.

Inputting the real image into a VQGAN system to obtain a real image code, wherein the method comprises the following steps:

firstly, a real image is processed by an encoder of a VQGAN system to obtain an initial potential vector combination, the initial potential vector in the initial potential vector combination is compared with a vector dictionary based on a distance nearest principle to obtain a potential vector combination, and the subscript of the potential vector combination is the real image encoding, wherein:

the potential vector

Comprises the following steps:

wherein the content of the first and second substances,

in order to be the initial combination of potential vectors,q("back") is the function of distance to the nearest,z _k is the first in a vector dictionarykThe number of the vectors is such that,nis a dimension of a vector and is,handwrespectively the height and width of the potential vector.

The scene topological graph constructed based on the objects in the real image is characterized in that the nodes of the scene topological graph represent the objects in the real image, the connected edges represent the relation between the objects, and the scene topological graph is composed of primitive ancestors

The composition is as follows:

set of scene topology graph nodesOComprises the following steps:

wherein the content of the first and second substances, o _i is as followsiThe nodes of the topological graph of the scene are,Nthe number of nodes of the scene topological graph is,

a set of object categories;

set of scene topological graph connecting edges

，

As a set of relationship classes, each edge is represented as

，

Is composed of

To (1)

The number of the neighbor nodes is increased,

，

is made byiThe nodes of the scene topology map point to the first

And connecting edges of the nodes of the scene topological graph.

And inputting the scene topological graph into the embedded layer network to obtain the semantic features and the edge connecting features of the scene topological graph nodes.

Fusing semantic features and edge connecting features of each neighbor node of a scene topological graph node through a message transfer function to obtain a first neighbor node message

Comprises the following steps:

wherein the content of the first and second substances,

is as follows

The semantic characteristics of each of the neighboring nodes,

in order to have the characteristic of connecting the edges,

for the information transfer parameter matrix within the scene topology,

，D1 is the dimension of the semantic features of the neighboring nodes, D and 2 is the dimension of the edge connecting feature.

Updating image characteristics corresponding to image characteristic nodes through fusion results

Comprises the following steps:

wherein the content of the first and second substances,

being a feature of a nodev _i Is determined by the node of the neighbor node set,

is a normalized node

To the node

Attention coefficient of (1), W ₁ And W ₂ Respectively, a parameter matrix is formed by the parameter matrixes,GeLUis an activation function.

Image characteristics updated based on global message passing mode

Comprises the following steps:

wherein, the first and the second end of the pipe are connected with each other,

is a firstiUpdated semantic node characteristics

Is transmitted to the firstjIndividual image node characteristics

The message of (2) is transmitted to the mobile terminal,r _g is as followsgA global side-to-side type is provided,

is a parameter matrix of the global edge-connected type,

in order to be a global edge-connected feature,

is a firstiAn updated semantic node characteristic

To image node features

Attention coefficient of (1), W ₁ And W ₂ Respectively, a parameter matrix is formed by the parameter matrixes,

for image node features

The semantic feature of (2) is a neighbor node set.

Sequentially carrying out feedforward neural network and normalized operation on the image features obtained by updating based on the global message transfer mode and the local message transfer mode to obtain final image features;

and sequentially carrying out feed-forward neural network and normalization operation on the semantic features of the nodes of the scene topological graph obtained by updating the semantic feature message transfer mode on the basis of the scene topological graph to obtain the final semantic feature message.

When the neighbor nodes of the image feature nodes are in the current rectangular frame, each image feature node in the rectangular frame points to other image feature nodes, and specific local edges are arranged between the nodesr _l The connection is carried out in such a way that,lan index representing a local edge is determined,

for the first local side-by-side feature,

for image feature nodes

The neighbor node set in the same rectangular frame obtains the updated image feature node through the message transfer function and the attention mechanism

Comprises the following steps:

wherein the content of the first and second substances,

is composed ofjIndividual image feature node

To the first

Individual neighbor node characteristics

a parameter matrix of a first local connected edge type.

When the neighbor nodes of the image feature node are in other rectangular frames, in the scene topological graph,

representing nodes of an object

Passing edge

Node with object

To carry out connection, object node

And object node

Respectively correspond to the position rectangular frames

And

in a rectangular frame

The image feature node in (2) will be associated with the rectangular frame

The image feature node in (2) is also edge-matched

Connection for image-level relational messaging, definition

Are as follows

The image feature nodes in other rectangular frames in which all the image feature nodes are connected with edges are updated through the message transfer function and the attention mechanism of the image feature nodes

Comprises the following steps:

wherein the content of the first and second substances,

is composed ofjIndividual image feature node

To the first

Individual neighbor node characteristics

Attention coefficient of (1), W ₁ And W ₂ Respectively, a parameter matrix is formed by the parameters,

a parameter matrix of a second local edge type,

is as followsjIndividual image feature node

To the first

Individual neighbor node characteristics

The edge connecting feature of (1).

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention constructs a hypergraph based on an input scene topological graph. The method is different from a two-stage learning method required in the prior art, so that the learning efficiency is improved.

(2) The invention provides four message transmission modes on a graph neural network to simulate the interaction of objects in a real scene, wherein the message transmission on a scene topological graph is used for learning semantic features, the message transmission between the semantic features and image features on the scene topological graph is used for controlling the global generation of images, two message transmission modes are arranged between the image features, one mode is used for controlling the learning of local features of the images, the other mode is used for controlling the learning of the relation between different regions of the images, and the relation between the image features corresponds to the relation defined by the scene topological graph, and finally, the quality of the images generated based on the scene topological graph, including the visual quality of the objects and the correctness of the relation between the objects at the image level, is improved.

Drawings

FIG. 1 is a flowchart of a graph neural network-based image generation model method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a graph neural network-based image generation model method according to an embodiment of the present invention;

fig. 3 is a schematic diagram of four message delivery methods according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.

The application provides an image generation method based on a graph neural network, as shown in fig. 1 and fig. 2, comprising:

s1: obtaining an image generation pre-training dataset and a controllable image generation dataset: the samples of the pre-training data set are all composed of real images; the controllable image generation data set comprises a real image and a scene topological graph corresponding to the real image.

S2: and (3) constructing a pre-training system VQGAN through a generative confrontation network based on an image generation pre-training data set: VQGAN expresses the composition of an image in the form of a sequence. Any image

Can be represented as a combination of potential vectors,

whereinnIs the dimension of the potential vector or vectors,HandWto be the height and width of the image,handwthe height and width of the potential vector. Two convolution models of VQGAN learning are respectively encoders

And decoder

Obtaining a learned discrete latent vector dictionary

To represent the image or images of the scene,Krepresenting the size of the dictionary in terms of,z _k is the first in a vector dictionarykA vector.

The encoder is firstly utilized during VQGAN training

Obtaining initial in-vector combinations

Calculated by the distance nearest principle

The potential vector closest to the potential feature in the potential vector dictionary at each position is used as the potential vector of the current positionzComprises the following steps:

in order to be the initial combination of potential vectors,q("back") is the function of distance to the nearest,z _k is the first in a vector dictionarykThe number of the vectors is such that,nis a dimension of a vector and is a function of,handwrespectively the height and width of the potential vector,

. When training, the image reconstructed by the latent vector combination is basically consistent with the original image:

namely:

the real images in the pre-training data set are input into a pre-training system VQGAN, an encoder

Encoding an image into

I.e. by

Of discrete vectors, decoders

And restoring the discrete vectors into the original image. Combining discrete vectorszI.e. by

Record of

And as initial image feature nodes, the number of potential vectors is used for learning of a scene topological graph image generation training system based on a graph neural network.

The hypergraph is constructed through the image feature node set and the corresponding scene topological graph

And constructing a training sample set by a plurality of hypergraphs. Semantic feature nodes of the scene topology represent objects, and edges represent relationships between the objects. Given a set of object classes

And a set of relationship categories

Semantic node-by-tuple of scene topology

Is composed of (a) wherein

Is a collection of object nodes and each object

，

Is a collection of edges, each of which can be represented as

，

，

Is composed ofiThe scene topology node points to the first

The connecting edges of the nodes of the scene topological graph,

is composed of

To (1) a

And (4) each neighbor node.

The method includes the steps that a scene topological graph is input into an embedded network to obtain semantic features of each node of the scene topological graph

And edge characteristics of edges connecting between nodes

In which

Indicating the edge type.

S3: building a training network model, and defining four message transfer models on a graph neural network to simulate the interaction of objects in a scene, wherein the four message transfer models comprise:

s31: semantic feature message passing mode on scene topological graph: as shown in fig. 3 (a), in the scene topology graph, the semantic features and the edge connecting features of each neighbor node of the scene topology graph nodes are fused through a message transfer function to obtain first neighbor node messages, each first neighbor node message is aggregated through an attention mechanism unit, and the semantic features of the scene topology graph nodes are updated through an aggregation result. After the message transmission is finished, the semantic features of the nodes of the topological graph of each scene are further updated by utilizing a feedforward neural network and a normalization operation to obtain final semantic features, so that the feature conversion capability is improved, and the phenomenon of over-smoothness is relieved.

First neighbor node message provided by the application

Comprises the following steps:

wherein the content of the first and second substances,

is as follows

The semantic features of each of the neighboring nodes,

in order to have the characteristic of connecting the edges,

for the information transfer parameter matrix within the scene topology,

Updating image characteristic node pairs through fusion results provided by the applicationCorresponding image characteristics

Comprises the following steps:

wherein the content of the first and second substances,

is a normalized node

To node

Attention coefficient of (1), W ₁ And W ₂ Respectively, a parameter matrix is formed by the parameters,GeLUis an activation function.

The semantic features of the nodes of the topological graph of each scene are further updated by utilizing the feedforward neural network and the normalized operation to obtain the final semantic features

Comprises the following steps:

wherein the content of the first and second substances,LayerNormis a function of the normalization of the signals,

and

is a parameter matrix of a feed-forward neural network,

is an activation function.

S32: global message passing mode: as shown in fig. 3 (b), the global messaging considers information interaction between node semantic feature information and image feature information in the input scene topology map. And when the neighbor nodes of the image feature nodes are scene topological graph nodes, constructing a rectangular frame of the real image based on each node of the scene topological graph by adopting a regression network method. The prior art uses semantic features of nodes to predict the position rectangular box and object shape of an object, and then fills the semantic features into specific position and shape regions. The invention follows the similar object-to-region criterion, firstly defines a regression network of the rectangular frame of the object position to predict each object

At a rectangular position

Wherein

Representing the coordinates of the upper left corner of the rectangular box,

and

respectively representing the width and height of the rectangular box.

Image feature nodes of an object are arranged in the rectangular frame, each node of the scene topological graph points to the corresponding rectangular frame, updated scene topological graph node semantic features and global edge connecting features connected with the corresponding rectangular frame are fused through a message transfer function, image features corresponding to the image feature nodes are updated through a fusion result based on an attention mechanism, and image features corresponding to each updated image feature node are further updated through a feedforward neural network and a normalized operation to obtain final image features

Comprises the following steps:

is a firstiUpdated semantic node characteristics

Is transmitted to the firstjIndividual image node characteristics

The message of (a) is received,r _g is as followsgA global edge-connected type, which is a global edge-connected type,

is a parameter matrix of the global edge-connected type,

in order to be a global edge-connected feature,

is a firstiAn updated semantic node characteristic

To image node features

as a feature of a node of an image

The semantic feature neighbor node set of (1).

S32: local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features of the neighbor nodes of the image feature nodes in the rectangular frame and the corresponding connecting edge features through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and updating the image features corresponding to the image feature nodes through an aggregation result;

when the neighbor nodes of the image feature nodes are in the current rectangular frame, the message transmission mode is defined as a first local message transmission mode, and when the neighbor nodes of the image feature nodes are in other rectangular frames, the message transmission mode is defined as a second local message transmission mode.

First local messaging: as shown in fig. 3 (c), the purpose of local messaging is to learn local visual details of an image so that the generated image has finer granularity of details. Each image feature node is sensitive to its surrounding image feature nodes, and in particular, all image feature nodes within a rectangular box constitute a complete graph, i.e. each image feature node within a rectangular box points to other image feature nodes with specific local edges between themr _l The connection is carried out in such a way that,lan index representing a local edge is determined,

is a local continuous edge feature. Definition of

For image feature nodes

The neighbor nodes in the same rectangular frame obtain the final image characteristic node updated by the first local message transfer mode through the message transfer function, the attention mechanism, the feedforward neural network and the normalization operation

Comprises the following steps:

is composed ofjIndividual image feature node

To the first

Individual neighbor node characteristics

a parameter matrix of a first local continuous type.

Second local messaging: as shown in fig. 3 (d), the second local messaging approach is to model the interrelationship between objects at the image level. And according to the semantic feature message transmission mode on the scene topological graph, transmitting the message according to the defined object and the relation between the objects at the image level. When the neighbor nodes of the image feature node are in other rectangular frames, in the scene topological graph,

representing nodes of an object

Passing edge

Node with object

To carry out connection, object node

And object node

Respectively correspond to the position rectangular frame

And

in a rectangular frame

The image feature node in (1) will be in contact with the rectangular frame

The image feature node in (2) is also edge-matched

Connection to enable image-level relational messaging, definition

Is a sum of

In the method, the image feature nodes in other rectangular frames with edge connection of all the image feature nodes are considered that different rectangular frames have huge number of connecting edges, and the random sampling strategy is adopted to reduce the mapping number of the edges. Obtaining the final image characteristic node updated by the second local message transfer mode through the message transfer function, the attention mechanism, the feedforward neural network and the normalized operation

Comprises the following steps:

is composed ofjIndividual image feature node

To the first

Individual neighbor node characteristics

a parameter matrix of a second local edge type,

is as followsjIndividual image feature node

To the first

Individual neighbor node characteristics

The edge connecting feature of (1).

Finally, the final image node characteristics are obtained through the information transmission mode related to the last three image characteristics

Comprises the following steps:

obtaining a final image node feature set by a plurality of final image node features, and sequentially inputting the final image node feature set into a layer of fully-connected prediction network and a normalization index function (a)softmax) Generating image codes

，

：

Wherein the content of the first and second substances,

to predict parameters of the network.

S4: training the training network model based on the training sample set, and training the training network model by adopting a cross entropy loss function through generating image codes and real image codes to obtain the graph neural network model. And defining a loss function combined with an autoregressive prediction mode, and predicting and generating image codes by using real image codes.

Training phase, inputting real images

As input, the encoder of VQGAN

Encoding and converting images into

Image latent vector ofzFinding potential vectors in a vector dictionary

Subscript of (1), vector dictionary

Constructing real image code by using a plurality of subscripts in the method, and using the real image code as a training label

，BThe number of codes for the real image.

Wherein the content of the first and second substances,

is frontb-1 real image dictionary subscript, i.e. frontb-1 real image encoding, trained so thatbGenerating image coding

And a firstbEncoding of a real image

Probability of proximity

At the maximum, the number of the first,

are graph neural network parameters.

S5: and testing the model trained in the S4.

Testing, inputting any scene topological graph, generating new image dictionary subscripts one by one through the graph neural network model trained in S4 in an autoregressive mode under the condition of not needing the real image dictionary subscripts, generating image codes, and trainingThe difference is that the generated subscript is used for predicting and generating a new dictionary subscript instead of the real dictionary subscript, namely real image coding, and after all image potential vectors are obtained, a decoder of VQGAN is used

And converting the image potential vector corresponding to the subscript into a generated image. Polynomial resampling methods are utilized to obtain different image latent vectors to increase the diversity of the generated images.

Claims

1. An image generation method based on a graph neural network is characterized by comprising the following steps:

(1) Acquiring a plurality of real images, constructing a scene topological graph based on objects in the real images, inputting the real images into a VQGAN system to obtain real image codes and an image characteristic node set, constructing a hypergraph through the image characteristic node set and the corresponding scene topological graph, and constructing a training sample set by a plurality of hypergraphs;

semantic feature message delivery mode on scene topological graph: in a scene topological graph, fusing semantic features and edge connecting features of each neighbor node of the scene topological graph nodes through a message transfer function to obtain first neighbor node messages, aggregating each first neighbor node message through an attention mechanism unit, and taking an aggregation result as an updated scene topological graph node semantic feature;

global message delivery mode: when the neighbor nodes of the image feature nodes are scene topological graph nodes, constructing a rectangular frame based on each node of the scene topological graph by adopting a regression network method, wherein image feature nodes of objects are arranged in the rectangular frame, each node of the scene topological graph points to the corresponding rectangular frame, fusing the semantic features of the updated scene topological graph nodes with the global edge connecting features connected with the corresponding rectangular frame through a message transfer function, and taking the aggregation features obtained by the fusion result through an attention mechanism as the image features updated in a global message transfer mode;

local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features and the corresponding connection edge features of the neighbor nodes of the image feature nodes in the rectangular frame through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and taking the aggregation result as the image feature updated by adopting a local message transfer mode;

2. The method of claim 1, wherein inputting the real image into the VQGAN system to obtain the real image code comprises:

firstly, obtaining an initial potential vector combination of a real image through an encoder of a VQGAN system, comparing an initial potential vector in the initial potential vector combination with a vector dictionary based on a distance nearest principle to obtain a potential vector combination, wherein a subscript of the potential vector combination is real image encoding, and the method comprises the following steps:

the potential vector

Comprises the following steps:

wherein the content of the first and second substances,

in order to be the initial combination of potential vectors,qthe value of the junction is a function of the distance,z _k is the first in a vector dictionarykThe number of the vectors is such that,nis a dimension of a vector and is a function of,handwrespectively the height and width of the potential vector.

3. The method according to claim 1, wherein the scene topology graph constructed based on the objects in the real image is a scene topology graph, nodes of the scene topology graph represent the objects in the real image, and edges represent relationships between the objects, and the scene topology graph is a primitive ancestor

The composition is as follows:

set of scene topology graph nodesOComprises the following steps:

wherein, the first and the second end of the pipe are connected with each other, o _i is a firstiThe nodes of the topological graph of the scene are,Nthe number of nodes of the scene topological graph is,

a set of object categories;

set of scene topology graph connecting edges

，

For a set of relationship classes, each edge is represented as

，

Is composed of

To (1) a

The number of the neighbor nodes is increased,

，

is made byiThe scene topology node points to the first

And connecting edges of the scene topological graph nodes.

4. The method for generating an image based on a graph neural network according to claim 1, wherein the scene topological graph is input into the embedded layer network to obtain semantic features and edge connection features of nodes of the scene topological graph.

5. The graph neural network-based image generation method of claim 3, wherein a first neighbor node message is obtained by fusing semantic features and edge connecting features of each neighbor node of a scene topological graph node through a message transfer function

Comprises the following steps:

is as follows

The semantic features of each of the neighboring nodes,

in order to have the characteristic of connecting the edges,

for the information transfer parameter matrix within the scene topology,

，D1 is the dimension of the semantic features of the neighboring nodes,Dand 2 is the dimension of the edge connecting feature.

6. The method of claim 3, wherein the image features corresponding to the image feature nodes are updated by fusing the results

Comprises the following steps:

wherein the content of the first and second substances,

being a characteristic of a nodev _i Is determined by the node of the neighbor node set,

is a normalized node

To the node

7. The method of claim 1, wherein the image features are updated based on global messaging

Comprises the following steps:

is as followsiUpdated semantic node characteristics

Is transmitted to the firstjIndividual image node characteristics

The message of (2) is transmitted to the mobile terminal,r _g is a firstgA global side-to-side type is provided,

is a parameter matrix of the global edge-connected type,

in order to be a global edge-connected feature,

is as followsiAn updated semantic node characteristic

To image node features

for image node features

The semantic feature neighbor node set of (1).

8. The method for generating an image based on a graph neural network according to claim 1, wherein the image features obtained by updating based on a global message transfer mode and a local message transfer mode are subjected to a feed-forward neural network and a normalization operation in sequence to obtain final image features;

and sequentially carrying out feedforward neural network and normalization operation on the semantic features of the nodes of the scene topological graph obtained by updating the semantic feature message transfer mode on the basis of the scene topological graph to obtain the final semantic feature message.

9. The method of claim 1, wherein when the neighbor nodes of the image feature node are in the current rectangle, each image feature node in the rectangle points to other image feature nodes, and the nodes are connected by a specific local edger _l The connection is carried out in such a way that,lthe index of the partial edge is represented by,

for the first local side-by-side feature,

for image feature nodes

The updated image feature node is obtained through the message transfer function and the attention mechanism of the neighbor node set in the same rectangular frame