CN115546589A - Image generation method based on graph neural network - Google Patents
- Publication number
- CN115546589A (application number CN202211503117.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- node
- nodes
- scene
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an image generation method based on a graph neural network. A hypergraph is constructed from an image feature node set and the corresponding scene topology graph, and a graph neural network is built on the hypergraph to jointly learn the semantic features and the latent features of the image in the scene topology graph. Object interactions in a real scene are simulated through four message passing modes on the graph neural network, and the image feature set updated by the global and local message passing modes is fed in turn through a fully connected layer and a normalized exponential (softmax) function to obtain a generated image encoding. The training network model is trained on a training sample set with a loss function over the generated and real image encodings, yielding the graph neural network model. The method efficiently generates images with higher visual quality and more correct relationships between objects.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image generation method based on a graph neural network.
Background
In recent years, generative adversarial networks (GANs) have made great progress in generating realistic images, producing high-quality, content-rich images that humans cannot distinguish from real ones at the pixel level. In addition, conditional image generation methods make the generated result more controllable and better matched to user requirements, for example generating images from text descriptions or generating human body images from skeletal key points.
In methods that generate images from a scene topology graph, each node of the graph is assigned a specific semantic meaning, and edges between nodes represent the relationships between those semantics. The graph thus describes both the semantic content and the layout plan of an image, much like a human mind map. The technology of generating images from scene topology graphs therefore has important applications in human-AI collaborative drawing creation.
Existing methods for generating images from scene topology graphs involve two stages. In the first stage, the semantic features of each object are learned by a graph neural network and used to determine a semantic segmentation map, which contains the coordinate boundary and rough shape of each object. In the second stage, the final image is generated from the semantic segmentation map. A key challenge of this two-stage approach is that the graph neural network must learn semantic features that capture the interactions between objects.
When the graph neural network model fails to capture object interactions, or does not incorporate interaction information into the semantic features, the resulting semantic features contain only semantic category information. In this case each object is generated independently and the final image is not realistic.
On the other hand, existing methods ignore object interactions during the image generation stage: objects are generated independently and in parallel without further message passing, which can distort objects in the generated image. In a two-stage method, interaction information between objects is therefore learned only during the semantic-feature stage, placing a heavy burden on semantic feature learning.
In order to more accurately capture the interaction between objects, the relationship between the objects needs to be considered in both the semantic feature learning phase and the image generation phase. Therefore, it is necessary to design an image generation method capable of accurately obtaining the relationship between objects and efficiently generating an image with high visual quality.
Disclosure of Invention
The invention provides an image generation method based on a graph neural network, which can efficiently generate an image with higher visual quality and more correct relationship between objects.
An image generation method based on a graph neural network comprises the following steps:
(1) Acquiring a plurality of real images, constructing a scene topological graph based on objects in the real images, inputting the real images into a VQGAN system to obtain real image codes and an image feature node set, constructing a hypergraph through the image feature node set and the corresponding scene topological graph, and constructing a training sample set by the plurality of hypergraphs;
(2) Constructing a training network model, wherein the training network model comprises a message transfer function, an attention mechanism unit, a full connection layer and a normalized exponential function, and the training network model comprises the following steps:
semantic feature message passing mode on scene topological graph: in the scene topological graph, fusing semantic features and edge connecting features of each neighbor node of the scene topological graph nodes through a message transfer function to obtain first neighbor node messages, aggregating each first neighbor node message through an attention mechanism unit, and taking an aggregation result as an updated scene topological graph node semantic feature;
global message passing mode: when the neighbor nodes of the image feature nodes are scene topological graph nodes, a regression network method is adopted to construct a rectangular frame based on each node of the scene topological graph, image feature nodes of objects are arranged in the rectangular frame, each node of the scene topological graph points to the corresponding rectangular frame, the semantic features of the updated scene topological graph nodes and the global edge connecting features connected with the corresponding rectangular frame are fused through a message transfer function, and the aggregate features obtained by the fusion result through an attention mechanism are used as the image features updated in a global message transfer mode;
local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features of the neighbor nodes of the image feature nodes in the rectangular frame and the corresponding connecting edge features through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and taking the aggregation result as the image features updated by adopting a local message transfer mode;
sequentially inputting an image feature set obtained by updating based on a global message transfer mode and a local message transfer mode into a full connection layer and a normalization index function to obtain a generated image code;
(3) Training the training network model based on the training sample set, and training the training network model by adopting a loss function through generating image codes and real image codes to obtain a graph neural network model;
(4) When the method is applied, the scene topological graph is input into the graph neural network model to obtain a generated image code, and the generated image code is input into a decoder of the VQGAN system to generate an image.
Inputting the real image into a VQGAN system to obtain a real image code, wherein the method comprises the following steps:
Firstly, the real image is passed through the encoder of the VQGAN system to obtain an initial latent vector combination; each initial latent vector is compared against the vector dictionary under a nearest-distance criterion to obtain the latent vector combination, whose dictionary subscripts are the real image encoding:

$z_q = q(\hat{z}) = \left( \arg\min_{z_k \in \mathcal{Z}} \lVert \hat{z}_{ij} - z_k \rVert \right) \in \mathbb{R}^{h \times w \times n}$

where $\hat{z}$ is the initial latent vector combination, $q(\cdot)$ is the nearest-distance function, $z_k$ is the $k$-th vector in the vector dictionary, $n$ is the vector dimension, and $h$ and $w$ are respectively the height and width of the latent vector grid.
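As a concrete illustration of the nearest-distance quantization step described above, the following is a minimal numpy sketch (the function name `quantize` and the toy dictionary are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def quantize(z_hat, dictionary):
    """Nearest-distance quantization: map each latent vector in the h x w grid
    z_hat (shape (h, w, n)) to its closest entry in dictionary (shape (K, n)).
    Returns the quantized grid z_q and the subscript grid (the image encoding)."""
    h, w, n = z_hat.shape
    flat = z_hat.reshape(-1, n)                             # (h*w, n)
    # Pairwise squared distances to every dictionary vector z_k
    d2 = ((flat[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                                 # nearest z_k per position
    return dictionary[idx].reshape(h, w, n), idx.reshape(h, w)

# Toy dictionary of K=4 vectors of dimension n=3, and a 2x2 latent grid built
# from slightly perturbed copies of dictionary entries 0, 2, 1, 3.
Z = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
z_hat = Z[[0, 2, 1, 3]].reshape(2, 2, 3) + 0.01
z_q, codes = quantize(z_hat, Z)
print(codes.tolist())   # → [[0, 2], [1, 3]]
```

The subscript grid `codes` plays the role of the real image encoding; the quantized grid `z_q` is what the decoder would consume.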
The scene topology graph is constructed from the objects in the real image: its nodes represent objects and its edges represent the relationships between objects. The graph is defined by the tuple $(O, E)$.

The node set $O$ of the scene topology graph is

$O = \{o_1, \ldots, o_N\}, \quad o_i \in \mathcal{C},$

where $o_i$ is the $i$-th scene topology graph node, $N$ is the number of scene topology graph nodes, and $\mathcal{C}$ is the set of object categories.

The edge set satisfies $E \subseteq O \times \mathcal{R} \times O$, where $\mathcal{R}$ is the set of relationship categories. Each edge is written $e = (o_i, r, o_j)$, $r \in \mathcal{R}$, where $o_j$ is the $j$-th neighbor node of $o_i$ and $e$ is the directed edge from the $i$-th scene topology graph node to the $j$-th scene topology graph node.
The scene topology graph is input into an embedding-layer network to obtain the semantic features of the scene topology graph nodes and the features of the edges.
The semantic features and edge features of each neighbor of a scene topology graph node are fused through a message transfer function to obtain the first neighbor node message $m_u$:

$m_u = W_r \, [\, h_u \,\|\, e_{u,i} \,]$

where $h_u$ is the semantic feature of the $u$-th neighbor node, $e_{u,i}$ is the edge feature, $W_r \in \mathbb{R}^{D_1 \times (D_1 + D_2)}$ is the message transfer parameter matrix within the scene topology graph, $D_1$ is the dimension of the neighbor node semantic features, and $D_2$ is the dimension of the edge features.
The node feature is updated from the fusion result as

$h_i' = \mathrm{GeLU}\!\left( \sum_{u \in \mathcal{N}(v_i)} \alpha_{u,i} \, W_2 \, m_u \right), \qquad \alpha_{u,i} = \operatorname{softmax}_{u \in \mathcal{N}(v_i)}\!\left( (W_1 h_i)^{\top} W_2 m_u \right)$

where $\mathcal{N}(v_i)$ is the neighbor node set of node $v_i$, $\alpha_{u,i}$ is the normalized attention coefficient from node $u$ to node $i$, $W_1$ and $W_2$ are parameter matrices, and GeLU is the activation function.
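The fuse-then-aggregate update can be sketched in numpy as follows; note the dot-product attention form is an assumption, since the exact formula is not reproduced in this text:

```python
import numpy as np

def gelu(x):  # tanh approximation of the GeLU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def update_node(h_i, neighbor_feats, edge_feats, W_r, W_1, W_2):
    """One attention-weighted message-passing update, in the spirit of the
    semantic-feature mode (the attention form here is an assumption).
    neighbor_feats: (M, D1) semantic features; edge_feats: (M, D2)."""
    msgs = np.concatenate([neighbor_feats, edge_feats], axis=1) @ W_r.T  # fuse h_u with e_{u,i}
    proj = msgs @ W_2.T                            # W_2 m_u for each neighbor
    scores = (W_1 @ h_i) @ proj.T                  # unnormalized attention
    alpha = softmax(scores)                        # normalized coefficients
    return gelu((alpha[:, None] * proj).sum(0))    # aggregated, activated update

D1, D2, M = 4, 2, 3
rng = np.random.default_rng(1)
h_i = rng.normal(size=D1)
out = update_node(h_i,
                  rng.normal(size=(M, D1)), rng.normal(size=(M, D2)),
                  W_r=rng.normal(size=(D1, D1 + D2)),
                  W_1=rng.normal(size=(D1, D1)),
                  W_2=rng.normal(size=(D1, D1)))
assert out.shape == (D1,)
```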
In the global message passing mode the image feature of node $v_j$ is updated as

$m_{i \to j} = W_{r_g} \, [\, h_i' \,\|\, e_g \,], \qquad v_j' = \mathrm{GeLU}\!\left( \sum_{i \in \mathcal{N}_s(v_j)} \beta_{i,j} \, W_2 \, m_{i \to j} \right), \qquad \beta_{i,j} = \operatorname{softmax}_{i \in \mathcal{N}_s(v_j)}\!\left( (W_1 v_j)^{\top} W_2 m_{i \to j} \right)$

where $m_{i \to j}$ is the message passed from the $i$-th updated semantic node feature $h_i'$ to the $j$-th image node feature $v_j$, $r_g$ is the $g$-th global edge type, $W_{r_g}$ is the parameter matrix of the global edge type, $e_g$ is the global edge feature, $\beta_{i,j}$ is the attention coefficient from the $i$-th updated semantic node feature to the image node feature, $W_1$ and $W_2$ are parameter matrices, and $\mathcal{N}_s(v_j)$ is the set of semantic-feature neighbor nodes of image node $v_j$.
The image features updated by the global and local message passing modes are passed in turn through a feed-forward neural network and a normalization operation to obtain the final image features;

likewise, the scene topology graph node semantic features updated by the semantic-feature message passing mode are passed through a feed-forward neural network and a normalization operation to obtain the final semantic features.
When the neighbor nodes of an image feature node lie in the current rectangular box, every image feature node in the box points to every other one, connected by a specific local edge type $r_l$, where $l$ indexes the local edge and $e_{r_l}$ is the first local edge feature. With $\mathcal{N}_l(v_j)$ the set of neighbors of image feature node $v_j$ within the same rectangular box, the updated image feature node is obtained through the message transfer function and attention mechanism as

$v_j' = \mathrm{GeLU}\!\left( \sum_{u \in \mathcal{N}_l(v_j)} \gamma_{j,u} \, W_2 \, W_{r_l} [\, v_u \,\|\, e_{r_l} \,] \right), \qquad \gamma_{j,u} = \operatorname{softmax}_{u \in \mathcal{N}_l(v_j)}\!\left( (W_1 v_j)^{\top} W_2 W_{r_l} [\, v_u \,\|\, e_{r_l} \,] \right)$

where $\gamma_{j,u}$ is the attention coefficient from the $j$-th image feature node $v_j$ to the $u$-th neighbor node feature $v_u$, $W_1$ and $W_2$ are parameter matrices, and $W_{r_l}$ is the parameter matrix of the first local edge type.
When the neighbor nodes of an image feature node lie in other rectangular boxes: in the scene topology graph, an edge $e = (o_i, r, o_j)$ connects object node $o_i$ to object node $o_j$, and the two object nodes correspond to position rectangular boxes $B_i$ and $B_j$ respectively. Every image feature node in box $B_i$ is then also connected by an edge of type $r$ to every image feature node in box $B_j$, enabling image-level relational message passing. With $\mathcal{N}_r(v_j)$ the set of image feature nodes in other rectangular boxes connected to $v_j$, the updated image feature node is obtained through the message transfer function and attention mechanism as

$v_j' = \mathrm{GeLU}\!\left( \sum_{u \in \mathcal{N}_r(v_j)} \delta_{j,u} \, W_2 \, W_{r} [\, v_u \,\|\, e_{j,u} \,] \right), \qquad \delta_{j,u} = \operatorname{softmax}_{u \in \mathcal{N}_r(v_j)}\!\left( (W_1 v_j)^{\top} W_2 W_{r} [\, v_u \,\|\, e_{j,u} \,] \right)$

where $\delta_{j,u}$ is the attention coefficient from the $j$-th image feature node $v_j$ to the $u$-th neighbor node feature $v_u$, $W_1$ and $W_2$ are parameter matrices, $W_r$ is the parameter matrix of the second local edge type, and $e_{j,u}$ is the edge feature from the $j$-th image feature node to the $u$-th neighbor node feature.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention constructs a hypergraph from the input scene topology graph and learns on it in a single stage, unlike the two-stage learning required in the prior art, which improves learning efficiency.
(2) The invention provides four message passing modes on the graph neural network to simulate object interactions in a real scene: message passing within the scene topology graph learns semantic features; message passing from semantic features to image features controls the global generation of the image; and two message passing modes between image features control, respectively, the learning of local image details and the learning of relationships between different image regions, the latter corresponding to the relationships defined by the scene topology graph. Together these improve the quality of images generated from scene topology graphs, both the visual quality of the objects and the correctness of the image-level relationships between them.
Drawings
FIG. 1 is a flowchart of a graph neural network-based image generation model method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a graph neural network-based image generation model method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of the four message passing modes according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
The application provides an image generation method based on a graph neural network, as shown in fig. 1 and fig. 2, comprising:
s1: obtaining an image generation pre-training dataset and a controllable image generation dataset: the samples of the pre-training data set are all composed of real images; the controllable image generation data set comprises a real image and a scene topological graph corresponding to the real image.
S2: and (3) constructing a pre-training system VQGAN through a generative confrontation network based on an image generation pre-training data set: VQGAN expresses the composition of an image in the form of a sequence. Any imageCan be represented as a combination of potential vectors,whereinnIs the dimension of the potential vector or vectors,HandWto be the height and width of the image,handwthe height and width of the potential vector. Two convolution models of VQGAN learning are respectively encodersAnd decoderObtaining a learned discrete latent vector dictionaryTo represent the image or images of the scene,Krepresenting the size of the dictionary in terms of,z k is the first in a vector dictionarykA vector.
During VQGAN training, the encoder $E$ first produces the initial latent vector combination $\hat{z} = E(x)$; the nearest-distance principle then selects, at each position, the dictionary vector closest to the latent feature as the latent vector $z$ of that position:

$z_q = q(\hat{z}) = \left( \arg\min_{z_k \in \mathcal{Z}} \lVert \hat{z}_{ij} - z_k \rVert \right) \in \mathbb{R}^{h \times w \times n}$

where $\hat{z}$ is the initial latent vector combination, $q(\cdot)$ is the nearest-distance function, $z_k$ is the $k$-th vector in the dictionary, $n$ is the vector dimension, and $h$ and $w$ are respectively the height and width of the latent grid. When trained, the image reconstructed from the latent vector combination is essentially consistent with the original image:

$\hat{x} = G(z_q) \approx x.$
the real images in the pre-training data set are input into a pre-training system VQGAN, an encoderEncoding an image intoI.e. byOf discrete vectors, decodersAnd restoring the discrete vectors into the original image. Combining discrete vectorszI.e. byRecord ofAnd as initial image feature nodes, the number of potential vectors is used for learning of a scene topological graph image generation training system based on a graph neural network.
A hypergraph is constructed from the image feature node set and the corresponding scene topology graph, and a plurality of hypergraphs form the training sample set. The semantic feature nodes of the scene topology graph represent objects, and the edges represent relationships between objects. Given a set of object categories $\mathcal{C}$ and a set of relationship categories $\mathcal{R}$, the semantic part of the scene topology graph is the tuple $(O, E)$, where $O = \{o_1, \ldots, o_N\}$, $o_i \in \mathcal{C}$, is the set of object nodes, and $E$ is the set of edges, each of which can be written $e = (o_i, r, o_j)$, $r \in \mathcal{R}$: the edge from the $i$-th scene topology graph node to its $j$-th neighbor $o_j$.
The scene topology graph is input into an embedding network to obtain the semantic feature $h_i$ of each node and the edge feature $e_r$ of each edge between nodes, where $r$ indicates the edge type.
S3: building a training network model, and defining four message transfer models on a graph neural network to simulate the interaction of objects in a scene, wherein the four message transfer models comprise:
s31: semantic feature message passing mode on scene topological graph: as shown in fig. 3 (a), in the scene topology graph, the semantic features and the edge connecting features of each neighbor node of the scene topology graph nodes are fused through a message transfer function to obtain first neighbor node messages, each first neighbor node message is aggregated through an attention mechanism unit, and the semantic features of the scene topology graph nodes are updated through an aggregation result. After the message transmission is finished, the semantic features of the nodes of the topological graph of each scene are further updated by utilizing a feedforward neural network and a normalization operation to obtain final semantic features, so that the feature conversion capability is improved, and the phenomenon of over-smoothness is relieved.
The first neighbor node message is

$m_u = W_r \, [\, h_u \,\|\, e_{u,i} \,]$

where $h_u$ is the semantic feature of the $u$-th neighbor node, $e_{u,i}$ is the edge feature, $W_r \in \mathbb{R}^{D_1 \times (D_1 + D_2)}$ is the message transfer parameter matrix within the scene topology graph, $D_1$ is the dimension of the neighbor node semantic features, and $D_2$ is the dimension of the edge features.
The feature of each node is updated from the fusion result as

$h_i' = \mathrm{GeLU}\!\left( \sum_{u \in \mathcal{N}(v_i)} \alpha_{u,i} \, W_2 \, m_u \right), \qquad \alpha_{u,i} = \operatorname{softmax}_{u \in \mathcal{N}(v_i)}\!\left( (W_1 h_i)^{\top} W_2 m_u \right)$

where $\mathcal{N}(v_i)$ is the neighbor node set of node $v_i$, $\alpha_{u,i}$ is the normalized attention coefficient from node $u$ to node $i$, $W_1$ and $W_2$ are parameter matrices, and GeLU is the activation function.
The semantic features of each scene topology graph node are then further updated by a feed-forward neural network and a normalization operation to obtain the final semantic features:

$h_i'' = \mathrm{LayerNorm}\!\left( h_i' + W_4 \, \sigma( W_3 \, h_i' ) \right)$

where LayerNorm is the normalization function, $W_3$ and $W_4$ are the parameter matrices of the feed-forward neural network, and $\sigma$ is the activation function.
S32: global message passing mode: as shown in fig. 3 (b), the global messaging considers information interaction between node semantic feature information and image feature information in the input scene topology map. And when the neighbor nodes of the image feature nodes are scene topological graph nodes, constructing a rectangular frame of the real image based on each node of the scene topological graph by adopting a regression network method. The prior art uses semantic features of nodes to predict the position rectangular box and object shape of an object, and then fills the semantic features into specific position and shape regions. The invention follows the similar object-to-region criterion, firstly defines a regression network of the rectangular frame of the object position to predict each objectAt a rectangular positionWhereinRepresenting the coordinates of the upper left corner of the rectangular box,andrespectively representing the width and height of the rectangular box.
The image feature nodes of an object are arranged in its rectangular box, and each scene topology graph node points to its corresponding box. The updated semantic feature of a scene topology graph node is fused with the global edge feature connecting it to its box through the message transfer function; the image feature of each image feature node is updated from the fusion result via the attention mechanism, and then further updated by a feed-forward neural network and a normalization operation to obtain the final image feature:

$m_{i \to j} = W_{r_g} \, [\, h_i' \,\|\, e_g \,], \qquad v_j' = \mathrm{GeLU}\!\left( \sum_{i \in \mathcal{N}_s(v_j)} \beta_{i,j} \, W_2 \, m_{i \to j} \right), \qquad \beta_{i,j} = \operatorname{softmax}_{i \in \mathcal{N}_s(v_j)}\!\left( (W_1 v_j)^{\top} W_2 m_{i \to j} \right)$

where $m_{i \to j}$ is the message passed from the $i$-th updated semantic node feature $h_i'$ to the $j$-th image node feature $v_j$, $r_g$ is the $g$-th global edge type, $W_{r_g}$ is the parameter matrix of the global edge type, $e_g$ is the global edge feature, $\beta_{i,j}$ is the attention coefficient from the $i$-th updated semantic node feature to the image node feature, $W_1$ and $W_2$ are parameter matrices, and $\mathcal{N}_s(v_j)$ is the set of semantic-feature neighbor nodes of image node $v_j$.
S32: local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features of the neighbor nodes of the image feature nodes in the rectangular frame and the corresponding connecting edge features through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and updating the image features corresponding to the image feature nodes through an aggregation result;
when the neighbor nodes of the image feature nodes are in the current rectangular frame, the message transmission mode is defined as a first local message transmission mode, and when the neighbor nodes of the image feature nodes are in other rectangular frames, the message transmission mode is defined as a second local message transmission mode.
First local message passing: as shown in fig. 3 (c), the purpose of local message passing is to learn the local visual details of the image so that the generated image has finer-grained detail. Each image feature node is sensitive to its surrounding image feature nodes; specifically, all image feature nodes within a rectangular box form a complete graph, i.e. each image feature node in a box points to every other one, connected by a specific local edge type $r_l$, where $l$ indexes the local edge and $e_{r_l}$ is the local edge feature. With $\mathcal{N}_l(v_j)$ the set of neighbors of image feature node $v_j$ within the same rectangular box, the final image feature node updated by the first local message passing mode is obtained through the message transfer function, attention mechanism, feed-forward neural network and normalization operation as

$v_j' = \mathrm{GeLU}\!\left( \sum_{u \in \mathcal{N}_l(v_j)} \gamma_{j,u} \, W_2 \, W_{r_l} [\, v_u \,\|\, e_{r_l} \,] \right), \qquad \gamma_{j,u} = \operatorname{softmax}_{u \in \mathcal{N}_l(v_j)}\!\left( (W_1 v_j)^{\top} W_2 W_{r_l} [\, v_u \,\|\, e_{r_l} \,] \right)$

where $\gamma_{j,u}$ is the attention coefficient from the $j$-th image feature node $v_j$ to the $u$-th neighbor node feature $v_u$, $W_1$ and $W_2$ are parameter matrices, and $W_{r_l}$ is the parameter matrix of the first local edge type.
Second local message passing: as shown in fig. 3 (d), the second local message passing mode models the interrelationships between objects at the image level, passing messages according to the object relations defined in the semantic-feature message passing mode on the scene topology graph. When the neighbor nodes of an image feature node are in other rectangular boxes: in the scene topology graph an edge $e = (o_i, r, o_j)$ connects object node $o_i$ with object node $o_j$, and the two object nodes correspond to position rectangular boxes $B_i$ and $B_j$ respectively; every image feature node in box $B_i$ is then also connected by an edge of type $r$ to every image feature node in box $B_j$, enabling image-level relational message passing. Define $\mathcal{N}_r(v_j)$ as the set of image feature nodes in other rectangular boxes connected to $v_j$. Considering that different rectangular boxes produce an enormous number of edges, a random sampling strategy is adopted to reduce the number of edges. The final image feature node updated by the second local message passing mode is obtained through the message transfer function, attention mechanism, feed-forward neural network and normalization operation as

$v_j' = \mathrm{GeLU}\!\left( \sum_{u \in \mathcal{N}_r(v_j)} \delta_{j,u} \, W_2 \, W_{r} [\, v_u \,\|\, e_{j,u} \,] \right), \qquad \delta_{j,u} = \operatorname{softmax}_{u \in \mathcal{N}_r(v_j)}\!\left( (W_1 v_j)^{\top} W_2 W_{r} [\, v_u \,\|\, e_{j,u} \,] \right)$

where $\delta_{j,u}$ is the attention coefficient from the $j$-th image feature node $v_j$ to the $u$-th neighbor node feature $v_u$, $W_1$ and $W_2$ are parameter matrices, $W_r$ is the parameter matrix of the second local edge type, and $e_{j,u}$ is the edge feature from the $j$-th image feature node to the $u$-th neighbor.
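The random-sampling strategy for cross-box edges mentioned above might look like the following sketch (the edge budget `max_edges` is a made-up parameter for illustration):

```python
import random

def sample_cross_box_edges(box_a_nodes, box_b_nodes, max_edges, seed=0):
    """Random-sampling strategy for the second local mode: the full bipartite
    edge set between two boxes' image-feature nodes can be huge, so keep only
    a random subset of at most max_edges edges."""
    all_edges = [(u, v) for u in box_a_nodes for v in box_b_nodes]
    rng = random.Random(seed)
    if len(all_edges) <= max_edges:
        return all_edges
    return rng.sample(all_edges, max_edges)   # without replacement

# Two boxes with 16 image feature nodes each would yield 256 edges; keep 32.
edges = sample_cross_box_edges(range(16), range(16), max_edges=32)
print(len(edges))   # → 32
```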
Finally, the final image node feature $v_j^{\mathrm{final}}$ is obtained by applying the latter three image-related message passing modes in sequence. The set of final image node features is then input into a one-layer fully connected prediction network followed by a normalized exponential (softmax) function to generate the image encoding:

$p_j = \operatorname{softmax}\!\left( \mathrm{FC}( v_j^{\mathrm{final}} ) \right)$

where $p_j$ is the predicted distribution over the $K$ dictionary subscripts at position $j$.
S4: training the training network model based on the training sample set, and training the training network model by adopting a cross entropy loss function through generating image codes and real image codes to obtain the graph neural network model. And defining a loss function combined with an autoregressive prediction mode, and predicting and generating image codes by using real image codes.
In the training phase, a real image is taken as input. The encoder of the VQGAN encodes the image into an image latent vector z; the subscript of each latent vector is looked up in the vector dictionary, and the real image code is constructed from the resulting subscripts and used as the training label, where B is the number of codes of the real image.
where the conditioning terms are the first b-1 real-image dictionary subscripts, i.e. the first b-1 real image codes. Training maximizes the probability that the b-th generated image code is close to the b-th real image code, with the graph neural network parameters as the quantities being optimized.
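A minimal sketch of the teacher-forced cross-entropy objective described above, assuming the network emits a row of logits over the K dictionary subscripts for each of the B code positions; the logits/targets interface is an assumption for illustration, not the patent's interface.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def autoregressive_ce_loss(logits, targets):
    """Teacher-forced cross-entropy over a sequence of B code positions.

    logits:  (B, K) scores over the K dictionary subscripts, where the b-th
             row is predicted from the first b-1 *real* codes (teacher
             forcing), not from previously generated ones.
    targets: (B,) real-image dictionary subscripts used as training labels.
    """
    probs = softmax(logits)
    B = targets.shape[0]
    # negative log-likelihood of the real subscript at every position
    return -np.mean(np.log(probs[np.arange(B), targets] + 1e-12))
```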
S5: Test the model trained in S4.
In the testing phase, an arbitrary scene topological graph is input. Without requiring the real-image dictionary subscripts, the graph neural network model trained in S4 generates new image dictionary subscripts one by one in an autoregressive mode, yielding the generated image codes. The difference from training is that each new dictionary subscript is predicted from the previously generated subscripts rather than from the real dictionary subscripts (the real image codes). After all image latent vectors are obtained, the decoder of the VQGAN converts the latent vectors corresponding to the subscripts into a generated image. A multinomial resampling method is used to obtain different image latent vectors, increasing the diversity of the generated images.
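The autoregressive test-time generation with multinomial resampling might be sketched as below. The `predict_logits` callable stands in for the trained graph neural network conditioned on the scene topological graph and previously generated subscripts; its name and signature are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate_codes(predict_logits, num_codes, temperature=1.0):
    """Autoregressively sample image dictionary subscripts.

    predict_logits(prefix) -> (K,) logits for the next subscript, conditioned
    only on previously *generated* subscripts (no real codes are needed at
    test time). Multinomial sampling, rather than argmax, yields different
    latent vectors on different runs, increasing the diversity of the
    generated images.
    """
    codes = []
    for _ in range(num_codes):
        p = softmax(predict_logits(codes) / temperature)
        codes.append(int(rng.choice(len(p), p=p)))
    return codes
```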
Claims (10)
1. An image generation method based on a graph neural network is characterized by comprising the following steps:
(1) Acquiring a plurality of real images, constructing a scene topological graph based on the objects in each real image, inputting the real image into a VQGAN system to obtain a real image code and an image feature node set, constructing a hypergraph from the image feature node set and the corresponding scene topological graph, and constructing a training sample set from a plurality of hypergraphs;
(2) Constructing a training network model, wherein the training network model comprises a message transfer function, an attention mechanism unit, a fully connected layer and a normalized exponential function, and operates through the following message passing modes:
semantic feature message delivery mode on scene topological graph: in a scene topological graph, fusing semantic features and edge connecting features of each neighbor node of the scene topological graph nodes through a message transfer function to obtain first neighbor node messages, aggregating each first neighbor node message through an attention mechanism unit, and taking an aggregation result as an updated scene topological graph node semantic feature;
global message delivery mode: when the neighbor nodes of the image feature nodes are scene topological graph nodes, constructing a rectangular frame based on each node of the scene topological graph by adopting a regression network method, wherein image feature nodes of objects are arranged in the rectangular frame, each node of the scene topological graph points to the corresponding rectangular frame, fusing the semantic features of the updated scene topological graph nodes with the global edge connecting features connected with the corresponding rectangular frame through a message transfer function, and taking the aggregation features obtained by the fusion result through an attention mechanism as the image features updated in a global message transfer mode;
local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features and the corresponding connection edge features of the neighbor nodes of the image feature nodes in the rectangular frame through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and taking the aggregation result as the image feature updated by adopting a local message transfer mode;
sequentially inputting the image feature set obtained by updating based on the global message transfer mode and the local message transfer mode into the fully connected layer and the normalized exponential function to obtain a generated image code;
(3) Training the training network model based on the training sample set, and training the training network model by adopting a loss function through generating image codes and real image codes to obtain a graph neural network model;
(4) When the method is applied, the scene topological graph is input into the graph neural network model to obtain a generated image code, and the generated image code is input into a decoder of the VQGAN system to generate an image.
2. The method of claim 1, wherein inputting the real image into the VQGAN system to obtain the real image code comprises:
firstly, obtaining an initial potential vector combination of the real image through an encoder of the VQGAN system, and comparing each initial potential vector in the initial potential vector combination with a vector dictionary based on a nearest-distance principle to obtain a potential vector combination, wherein the subscripts of the potential vector combination constitute the real image code, as follows:
where the first term is the initial potential vector combination, q denotes the distance-based quantization operation, z k is the k-th vector in the vector dictionary, n is the dimension of a vector, and h and w are respectively the height and width of the potential vector map.
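A minimal sketch of the nearest-distance dictionary lookup described in this claim, assuming latent vectors arranged as an (h, w, n) array and a (K, n) vector dictionary; the array shapes and names are assumptions for illustration.

```python
import numpy as np

def quantize(z_hat, dictionary):
    """Nearest-neighbour vector quantization against a VQGAN-style dictionary.

    z_hat:      (h, w, n) initial latent vectors from the encoder.
    dictionary: (K, n) vector dictionary; each latent is replaced by its
                closest dictionary entry, and the subscripts form the
                real image code.
    """
    flat = z_hat.reshape(-1, z_hat.shape[-1])                  # (h*w, n)
    d2 = ((flat[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)                                    # image code
    z_q = dictionary[idx].reshape(z_hat.shape)                 # quantized
    return idx, z_q
```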
3. The method according to claim 1, wherein the scene topological graph constructed based on the objects in the real image has nodes representing the objects in the real image and edges representing the relationships between the objects, and the scene topological graph is composed of a tuple:
The set of scene topological graph nodes O is:
where o i is the i-th scene topological graph node, N is the number of scene topological graph nodes, and the nodes are drawn from a set of object categories;
4. The method for generating an image based on a graph neural network according to claim 1, wherein the scene topological graph is input into an embedding layer network to obtain the semantic features and connecting-edge features of the scene topological graph nodes.
5. The graph neural network-based image generation method of claim 3, wherein the first neighbor node message, obtained by fusing the semantic features and connecting-edge features of each neighbor node of a scene topological graph node through the message transfer function, is:
where the first term is the semantic feature of each neighbor node, the second is the connecting-edge feature, and the third is the message transfer parameter matrix within the scene topological graph; D1 is the dimension of the neighbor-node semantic features and D2 is the dimension of the connecting-edge features.
6. The method of claim 3, wherein the updated node feature obtained from the fusion result is:
where the aggregation runs over the neighbor node set of node feature v i, the attention coefficient is the normalized coefficient from one node to another, W 1 and W 2 are parameter matrices, and GeLU is the activation function.
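The normalized attention aggregation with the GeLU activation might be sketched as follows; the bilinear scoring form and the residual placement are assumptions for illustration, not the patent's exact formula.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def node_update(v_i, neighbor_msgs, W1, W2):
    """Aggregate neighbor messages for node v_i with normalized attention.

    The attention coefficient from v_i to each neighbor message is a
    softmax-normalized score of their projections through the parameter
    matrices W1 and W2.
    """
    scores = np.array([(W1 @ v_i) @ (W2 @ m) for m in neighbor_msgs])
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                    # normalized coefficients
    agg = sum(a * (W2 @ m) for a, m in zip(alpha, neighbor_msgs))
    return gelu(v_i + agg)
```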
7. The method of claim 1, wherein the image features updated based on the global message transfer mode are:
where the first term is the message passed from the i-th updated semantic node feature to the j-th image node feature, r g is the g-th global edge type with its parameter matrix and global connecting-edge feature, the attention coefficient is that of the i-th updated semantic node feature with respect to the image node feature, W 1 and W 2 are parameter matrices, and the final set is the semantic-feature neighbor node set of the image node feature.
8. The method for generating an image based on a graph neural network according to claim 1, wherein the image features obtained by updating based on a global message transfer mode and a local message transfer mode are subjected to a feed-forward neural network and a normalization operation in sequence to obtain final image features;
and the scene topological graph node semantic features, updated by the semantic-feature message passing mode on the scene topological graph, are likewise passed through a feed-forward neural network and a normalization operation in sequence to obtain the final semantic features.
9. The method of claim 1, wherein, when the neighbor nodes of an image feature node are in the current rectangular box, each image feature node in the box points to the other image feature nodes in the box, and the nodes are connected by a specific local edge type r l, where l is the index of the local edge with its first local connecting-edge feature; for the neighbor node set of an image feature node within the same rectangular box, the updated image feature node obtained through the message transfer function and the attention mechanism is:
where the attention coefficient is that of the j-th image feature node with respect to its neighbor node features, W 1 and W 2 are parameter matrices, and the remaining matrix is the parameter matrix of the first local edge type.
10. The method of claim 1, wherein, when the neighbor nodes of an image feature node are within other rectangular boxes: in the scene topological graph, one object node is connected to another object node by an edge, and the two object nodes correspond respectively to two position rectangular boxes; the image feature nodes in the first rectangular box are therefore also connected by edges to the image feature nodes in the second rectangular box, realizing image-level relational message passing. The neighbor set of an image feature node is defined as the set of image feature nodes in other rectangular boxes that share an edge with it, and the image feature node updated through the message transfer function and the attention mechanism is:
where the attention coefficient is that of the j-th image feature node with respect to its neighbor node features, W 1 and W 2 are parameter matrices, the edge-type matrix is the parameter matrix of the second local edge type, and the connecting-edge feature is that between the j-th image feature node and its neighbor node features.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211503117.2A CN115546589B (en) | 2022-11-29 | 2022-11-29 | Image generation method based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115546589A true CN115546589A (en) | 2022-12-30 |
CN115546589B CN115546589B (en) | 2023-04-07 |
Family
ID=84722287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211503117.2A Active CN115546589B (en) | 2022-11-29 | 2022-11-29 | Image generation method based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115546589B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2017101166A4 (en) * | 2017-08-25 | 2017-11-02 | Lai, Haodong MR | A Method For Real-Time Image Style Transfer Based On Conditional Generative Adversarial Networks |
CN110609891A (en) * | 2019-09-18 | 2019-12-24 | 合肥工业大学 | Visual dialog generation method based on context awareness graph neural network |
US20200074707A1 (en) * | 2018-09-04 | 2020-03-05 | Nvidia Corporation | Joint synthesis and placement of objects in scenes |
CN111325323A (en) * | 2020-02-19 | 2020-06-23 | 山东大学 | Power transmission and transformation scene description automatic generation method fusing global information and local information |
US20200242774A1 (en) * | 2019-01-25 | 2020-07-30 | Nvidia Corporation | Semantic image synthesis for generating substantially photorealistic images using neural networks |
CN112862093A (en) * | 2021-01-29 | 2021-05-28 | 北京邮电大学 | Graph neural network training method and device |
CN113065587A (en) * | 2021-03-23 | 2021-07-02 | 杭州电子科技大学 | Scene graph generation method based on hyper-relation learning network |
CN113221613A (en) * | 2020-12-14 | 2021-08-06 | 国网浙江宁海县供电有限公司 | Power scene early warning method for generating scene graph auxiliary modeling context information |
CN113627557A (en) * | 2021-08-19 | 2021-11-09 | 电子科技大学 | Scene graph generation method based on context graph attention mechanism |
CN113642630A (en) * | 2021-08-10 | 2021-11-12 | 福州大学 | Image description method and system based on dual-path characteristic encoder |
WO2022039465A1 (en) * | 2020-08-18 | 2022-02-24 | 삼성전자 주식회사 | Artificial intelligence system and method for modifying image on basis of relationship between objects |
WO2022045531A1 (en) * | 2020-08-24 | 2022-03-03 | 경기대학교 산학협력단 | Scene graph generation system using deep neural network |
CN114677544A (en) * | 2022-03-24 | 2022-06-28 | 西安交通大学 | Scene graph generation method, system and equipment based on global context interaction |
CN115170449A (en) * | 2022-06-30 | 2022-10-11 | 陕西科技大学 | Method, system, device and medium for generating multi-mode fusion scene graph |
Non-Patent Citations (6)
Title |
---|
MAXIMILIAN ZIPFL,ET AL.: "Relation-based Motion Prediction using Traffic Scene Graphs", 《2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC)》 * |
P PRADHYUMNA,ET AL.: "Graph Neural Network (GNN) in Image and Video Understanding Using Deep Learning for Computer Vision Applications", 《2021 SECOND INTERNATIONAL CONFERENCE ON ELECTRONICS AND SUSTAINABLE COMMUNICATION SYSTEMS (ICESC)》 * |
PEI CHEN,ET AL.: "Few-Shot Incremental Learning for Label-to-Image Translation" * |
LAN Hong et al.: "Scene graph to image generation model with graph attention network", 《Journal of Image and Graphics》 *
ZHANG Wei: "Research on visual scene graph generation algorithms based on object relation understanding", 《China Master's Theses Full-text Database (Electronic Journals)》 *
LIN Xin: "Context-based scene graph generation", 《China Master's Theses Full-text Database (Electronic Journals)》 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115941501A (en) * | 2023-03-08 | 2023-04-07 | 华东交通大学 | Host equipment control method based on graph neural network |
CN115941501B (en) * | 2023-03-08 | 2023-07-07 | 华东交通大学 | Main machine equipment control method based on graphic neural network |
CN116919593A (en) * | 2023-08-04 | 2023-10-24 | 溧阳市中医医院 | Gallbladder extractor for cholecystectomy |
CN116919593B (en) * | 2023-08-04 | 2024-02-06 | 溧阳市中医医院 | Gallbladder extractor for cholecystectomy |
Also Published As
Publication number | Publication date |
---|---|
CN115546589B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115546589B (en) | Image generation method based on graph neural network | |
CN110399518B (en) | Visual question-answer enhancement method based on graph convolution | |
Gao et al. | LFT-Net: Local feature transformer network for point clouds analysis | |
CN110766038B (en) | Unsupervised landform classification model training and landform image construction method | |
CN113284100B (en) | Image quality evaluation method based on recovery image to mixed domain attention mechanism | |
CN108563755A (en) | A kind of personalized recommendation system and method based on bidirectional circulating neural network | |
CN113486190B (en) | Multi-mode knowledge representation method integrating entity image information and entity category information | |
CN112884758B (en) | Defect insulator sample generation method and system based on style migration method | |
CN116664719B (en) | Image redrawing model training method, image redrawing method and device | |
CN110706303A (en) | Face image generation method based on GANs | |
CN113065974A (en) | Link prediction method based on dynamic network representation learning | |
CN111275640A (en) | Image enhancement method for fusing two-dimensional discrete wavelet transform and generating countermeasure network | |
CN113240683A (en) | Attention mechanism-based lightweight semantic segmentation model construction method | |
CN115064020A (en) | Intelligent teaching method, system and storage medium based on digital twin technology | |
CN116010813A (en) | Community detection method based on influence degree of fusion label nodes of graph neural network | |
CN109658508B (en) | Multi-scale detail fusion terrain synthesis method | |
CN114723037A (en) | Heterogeneous graph neural network computing method for aggregating high-order neighbor nodes | |
CN114283315A (en) | RGB-D significance target detection method based on interactive guidance attention and trapezoidal pyramid fusion | |
Jiang et al. | Cross-level reinforced attention network for person re-identification | |
CN115861664A (en) | Feature matching method and system based on local feature fusion and self-attention mechanism | |
CN115965968A (en) | Small sample target detection and identification method based on knowledge guidance | |
Hu et al. | Data Customization-based Multiobjective Optimization Pruning Framework for Remote Sensing Scene Classification | |
CN116798052B (en) | Training method and device of text recognition model, storage medium and electronic equipment | |
CN116628358B (en) | Social robot detection system and method based on multi-view Graph Transformer | |
CN116340842A (en) | Common attention-based heterogeneous graph representation learning method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||