CN115546589A - Image generation method based on graph neural network - Google Patents

Image generation method based on graph neural network

Info

Publication number
CN115546589A
Authority
CN
China
Prior art keywords
image
node
nodes
scene
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211503117.2A
Other languages
Chinese (zh)
Other versions
CN115546589B (en
Inventor
陈培
张杨康
李泽健
孙凌云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202211503117.2A priority Critical patent/CN115546589B/en
Publication of CN115546589A publication Critical patent/CN115546589A/en
Application granted granted Critical
Publication of CN115546589B publication Critical patent/CN115546589B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an image generation method based on a graph neural network. A hypergraph is constructed from an image feature node set and the corresponding scene topology graph, and a graph neural network built on the hypergraph simultaneously learns the semantic features and the latent features of the image described by the scene topology graph. Object interactions in a real scene are simulated through four message passing modes on the graph neural network; the image feature set updated by the global and local message passing modes is input sequentially into a fully connected layer and a normalized exponential (softmax) function to obtain the generated image codes. The network model is trained on the training sample set with a loss function between the generated image codes and the real image codes to obtain the graph neural network model. The method can efficiently generate images with higher visual quality and more correct relationships between objects.

Description

Image generation method based on graph neural network
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image generation method based on a graph neural network.
Background
In recent years, generative adversarial networks (GANs) have made great progress in the field of realistic image generation, producing high-quality, content-rich images that are indistinguishable from real images at the pixel level. In addition, conditional image generation methods make the generated results more controllable and better matched to user requirements, for example: generating images from text descriptions, generating human body images from skeletal key points, and so on.
In methods that generate an image from a scene topology graph, each node of the scene topology graph carries a specific semantic meaning, and edges between nodes represent the relationships between those semantics, so that the semantic content and layout of an image can be described in a form similar to a human mind map. The technology of generating images from scene topology graphs therefore has important applications in human and artificial intelligence collaborative drawing creation.
The existing methods for generating images from scene topology graphs involve two stages. In the first stage, the semantic features of each object are learned by a graph neural network and used to determine a semantic segmentation map, which contains the coordinate bounds and the rough shape of each object. In the second stage, the existing methods generate the final image from the semantic segmentation map. A key challenge of the two-stage approach is the need to learn, through the graph neural network, semantic features that contain the interactions between objects.
When the graph neural network model fails to capture the interactions of objects, or does not incorporate the interaction information into the semantic features, the resulting semantic features contain only semantic category information. In this case each object is generated independently, and the final image is not realistic.
On the other hand, existing image generation methods ignore object interactions in the image generation stage: objects are generated independently and in parallel at this stage without further message passing, which can distort the objects in the generated image. Under the two-stage approach, the interaction information between objects is therefore learned only in the semantic feature learning stage, which places a severe burden on semantic feature learning.
In order to more accurately capture the interaction between objects, the relationship between the objects needs to be considered in both the semantic feature learning phase and the image generation phase. Therefore, it is necessary to design an image generation method capable of accurately obtaining the relationship between objects and efficiently generating an image with high visual quality.
Disclosure of Invention
The invention provides an image generation method based on a graph neural network, which can efficiently generate an image with higher visual quality and more correct relationship between objects.
An image generation method based on a graph neural network comprises the following steps:
(1) Acquiring a plurality of real images, constructing a scene topological graph based on objects in the real images, inputting the real images into a VQGAN system to obtain real image codes and an image feature node set, constructing a hypergraph through the image feature node set and the corresponding scene topological graph, and constructing a training sample set by the plurality of hypergraphs;
(2) Constructing a training network model, wherein the training network model comprises a message transfer function, an attention mechanism unit, a full connection layer and a normalized exponential function, and the training network model comprises the following steps:
semantic-feature message passing mode on the scene topology graph: in the scene topology graph, the semantic features and edge features of each neighbor of a scene topology graph node are fused through a message transfer function to obtain first neighbor-node messages; each first neighbor-node message is aggregated through an attention mechanism unit, and the aggregation result is taken as the updated semantic feature of the scene topology graph node;
global message passing mode: when the neighbor nodes of an image feature node are scene topology graph nodes, a regression network is used to construct a rectangular box for each scene topology graph node; the image feature nodes of the object lie inside the rectangular box, and each scene topology graph node points to its corresponding rectangular box. The updated semantic features of the scene topology graph nodes and the global edge features connecting them to the corresponding rectangular boxes are fused through a message transfer function, and the aggregated features obtained by passing the fusion result through an attention mechanism are taken as the image features updated in the global message passing mode;
local message passing mode: when the neighbor nodes of an image feature node lie in the current rectangular box or in other rectangular boxes, the image features of those neighbor nodes and the corresponding edge features are fused through a message transfer function to obtain second neighbor-node messages; each second neighbor-node message is aggregated through an attention mechanism unit, and the aggregation result is taken as the image feature updated in the local message passing mode;
the image feature set updated by the global and local message passing modes is input sequentially into a fully connected layer and a normalized exponential (softmax) function to obtain the generated image codes;
(3) Training the network model on the training sample set with a loss function between the generated image codes and the real image codes to obtain the graph neural network model;
(4) When the method is applied, the scene topological graph is input into the graph neural network model to obtain a generated image code, and the generated image code is input into a decoder of the VQGAN system to generate an image.
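The application step (4) can be sketched as a short pipeline. This is a minimal illustration with stub classes standing in for the trained models; the class and method names (`GraphNeuralNet`, `VQGANDecoder`, `generate_codes`, `decode`) are assumptions, not the patent's implementation.

```python
import numpy as np

class GraphNeuralNet:
    """Stub: scene topology graph -> (h x w) discrete code indices."""
    def __init__(self, codebook_size, h, w, seed=0):
        self.codebook_size, self.h, self.w = codebook_size, h, w
        self.rng = np.random.default_rng(seed)

    def generate_codes(self, scene_graph):
        # A real model would run the four message passing modes, a fully
        # connected layer, and a softmax over the codebook; here we sample.
        return self.rng.integers(0, self.codebook_size, size=(self.h, self.w))

class VQGANDecoder:
    """Stub: code indices -> image, via codebook lookup and upsampling."""
    def __init__(self, codebook, upscale=16):
        self.codebook = codebook   # (K, n) latent vector dictionary
        self.upscale = upscale     # each latent covers one upscale x upscale patch

    def decode(self, codes):
        latents = self.codebook[codes]   # (h, w, n)
        # Nearest-neighbour upsample as a placeholder for the conv decoder.
        img = latents.repeat(self.upscale, axis=0).repeat(self.upscale, axis=1)
        return img[..., :3]              # pretend the first 3 dims are RGB

scene_graph = {"nodes": ["sky", "sea", "boat"],
               "edges": [("sky", "above", "sea"), ("boat", "on", "sea")]}
K, n, h, w = 1024, 256, 16, 16
gnn = GraphNeuralNet(K, h, w)
decoder = VQGANDecoder(np.random.default_rng(1).normal(size=(K, n)))
image = decoder.decode(gnn.generate_codes(scene_graph))
print(image.shape)  # (256, 256, 3)
```

The point of the sketch is the data flow: the graph neural network produces only discrete codebook indices, and all pixel synthesis is delegated to the VQGAN decoder.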
Inputting the real image into the VQGAN system to obtain the real image code comprises the following steps:
Firstly, the real image is processed by the encoder of the VQGAN system to obtain an initial latent vector combination; each initial latent vector in the combination is compared with the vector dictionary under the nearest-distance principle to obtain the latent vector combination, and the subscripts (dictionary indices) of the latent vector combination constitute the real image code, wherein the latent vector $z$ is:

$$z = q(\hat{z}) = \left( \arg\min_{z_k \in \mathcal{Z}} \| \hat{z}_{ij} - z_k \| \right) \in \mathbb{R}^{h \times w \times n}$$

wherein $\hat{z}$ is the initial latent vector combination, $q(\cdot)$ is the nearest-distance function, $z_k$ is the $k$-th vector in the vector dictionary, $n$ is the vector dimension, and $h$ and $w$ are respectively the height and width of the latent vector grid.
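The nearest-distance lookup $q(\cdot)$ can be written in a few lines of numpy. This is an illustrative sketch, not the patent's code; `quantize` and the toy codebook are assumed names and values.

```python
import numpy as np

def quantize(z_hat, codebook):
    """Map each position of z_hat (h, w, n) to its nearest codebook vector.

    Returns the quantized latents and the integer indices; the indices are
    what the method calls the image code.
    """
    h, w, n = z_hat.shape
    flat = z_hat.reshape(-1, n)                        # (h*w, n)
    # Squared Euclidean distance of every latent to every codebook entry.
    d = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                             # (h*w,)
    return codebook[idx].reshape(h, w, n), idx.reshape(h, w)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))                     # K=8 vectors, n=4
z_hat = codebook[[0, 3, 5, 0]].reshape(2, 2, 4) + 1e-3  # near known entries
z_q, codes = quantize(z_hat, codebook)
print(codes)  # [[0 3], [5 0]]
```

Because the perturbation (1e-3) is far smaller than the typical distance between random codebook entries, each position snaps back to the entry it was built from.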
The scene topology graph constructed from the objects in the real image is characterized in that its nodes represent the objects in the real image and its edges represent the relationships between the objects; the scene topology graph is composed of the tuple $(O, E)$:
The set of scene topology graph nodes $O$ is:

$$O = \{o_1, o_2, \ldots, o_N\}, \quad o_i \in \mathcal{C}$$

wherein $o_i$ is the $i$-th scene topology graph node, $N$ is the number of scene topology graph nodes, and $\mathcal{C}$ is the set of object categories;
The set of scene topology graph edges is $E$, and $\mathcal{R}$ is the set of relationship categories. Each edge is represented as $(o_i, r_{i \to j}, o_j)$, wherein $o_j$ is the $j$-th neighbor node of $o_i$, and $r_{i \to j} \in \mathcal{R}$ is the edge pointing from the $i$-th scene topology graph node to the $j$-th scene topology graph node.
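The tuple $(O, E)$ maps directly onto plain data structures. The following is an assumed, minimal representation (category names and helper `neighbors` are illustrative only):

```python
# A minimal, assumed representation of the scene topology graph tuple (O, E):
# nodes are object-category labels and each edge is a triple
# (source index, relationship, target index).

OBJECT_CATEGORIES = {"sky", "sea", "boat", "beach"}        # the set C (assumed)
RELATION_CATEGORIES = {"above", "below", "on", "left of"}  # the set R (assumed)

nodes = ["sky", "sea", "boat"]             # O = {o_1, ..., o_N}, N = 3
edges = [(0, "above", 1), (2, "on", 1)]    # (o_i, r_{i->j}, o_j)

# Basic validity checks mirroring the definition of the tuple (O, E).
assert all(o in OBJECT_CATEGORIES for o in nodes)
assert all(r in RELATION_CATEGORIES and 0 <= i < len(nodes) and 0 <= j < len(nodes)
           for i, r, j in edges)

# Neighbors of node o_i: every o_j reachable by an outgoing edge.
def neighbors(i, edges):
    return [j for (src, _, j) in edges if src == i]

print(neighbors(0, edges))  # [1]
```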
And the scene topology graph is input into the embedding layer network to obtain the semantic features of the scene topology graph nodes and the edge features.
The semantic features and edge features of each neighbor of a scene topology graph node are fused through a message transfer function to obtain the first neighbor-node message $m_{j \to i}$:

$$m_{j \to i} = W_s \,[\, s_j \,\|\, e_{j \to i} \,]$$

wherein $s_j$ is the semantic feature of the $j$-th neighbor node, $e_{j \to i}$ is the edge feature, $\|$ denotes concatenation, $W_s \in \mathbb{R}^{D_1 \times (D_1 + D_2)}$ is the message passing parameter matrix within the scene topology graph, $D_1$ is the dimension of the neighbor node semantic features, and $D_2$ is the dimension of the edge features.
The feature corresponding to a node is updated through the fusion results to obtain $v_i'$:

$$v_i' = \mathrm{GeLU}\Big( W_1 v_i + \sum_{j \in \mathcal{N}(i)} \alpha_{j \to i}\, W_2\, m_{j \to i} \Big)$$

wherein $\mathcal{N}(i)$ is the neighbor node set of the node feature $v_i$, $\alpha_{j \to i}$ is the normalized attention coefficient from node $j$ to node $i$, $W_1$ and $W_2$ are respectively parameter matrices, and $\mathrm{GeLU}$ is the activation function.
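A minimal numpy sketch of this attention-weighted aggregation follows. The GAT-style scoring with a learned vector `a` is an assumption (the patent's exact attention formula is not recoverable from its figures); only the overall shape (messages, normalized coefficients, GeLU update with $W_1$, $W_2$) follows the text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gelu(x):  # tanh approximation of the GeLU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def update_node(v_i, messages, W1, W2, a):
    """Aggregate neighbor messages with attention, then update v_i.

    v_i: (D,) node feature; messages: (J, D) fused neighbor messages;
    W1, W2: (D, D) parameter matrices; a: (2*D,) attention vector (assumed).
    """
    # Attention logits score each (node, message) pair, then normalize.
    logits = np.array([a @ np.concatenate([W1 @ v_i, W2 @ m]) for m in messages])
    alpha = softmax(logits)                        # normalized coefficients
    agg = (alpha[:, None] * (messages @ W2.T)).sum(0)
    return gelu(W1 @ v_i + agg)

rng = np.random.default_rng(0)
D, J = 8, 3
v = update_node(rng.normal(size=D), rng.normal(size=(J, D)),
                rng.normal(size=(D, D)) / np.sqrt(D),
                rng.normal(size=(D, D)) / np.sqrt(D),
                rng.normal(size=2 * D))
print(v.shape)  # (8,)
```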
The image feature updated based on the global message passing mode, $\hat{v}_j$, is:

$$m_{i \to j} = W_{r_g} \,[\, s_i' \,\|\, e_{r_g} \,]$$

$$\hat{v}_j = \mathrm{GeLU}\Big( W_1 v_j + \sum_{i \in \mathcal{N}_g(j)} \beta_{i \to j}\, W_2\, m_{i \to j} \Big)$$

wherein $m_{i \to j}$ is the message passed from the $i$-th updated semantic node feature $s_i'$ to the $j$-th image node feature $v_j$, $r_g$ is the $g$-th global edge type, $W_{r_g}$ is the parameter matrix of the global edge type, $e_{r_g}$ is the global edge feature, $\beta_{i \to j}$ is the attention coefficient from the $i$-th updated semantic node feature $s_i'$ to the image node feature $v_j$, $W_1$ and $W_2$ are respectively parameter matrices, and $\mathcal{N}_g(j)$ is the set of semantic-feature neighbor nodes of the image node feature $v_j$.
The image features updated by the global and local message passing modes are passed sequentially through a feed-forward neural network and a normalization operation to obtain the final image features;
and the scene topology graph node semantic features updated by the semantic-feature message passing mode on the scene topology graph are likewise passed sequentially through a feed-forward neural network and a normalization operation to obtain the final semantic features.
When the neighbor nodes of an image feature node are in the current rectangular box, each image feature node in the box points to every other image feature node, and the nodes are connected by a specific local edge $r_l$, wherein $l$ indexes the local edge, $e_{r_1}$ is the first local edge feature, and $\mathcal{N}_1(j)$ is the set of neighbor nodes of image feature node $v_j$ inside the same rectangular box. The updated image feature node $\tilde{v}_j$ is obtained through the message transfer function and the attention mechanism as:

$$m_{k \to j} = W_{r_1} \,[\, v_k \,\|\, e_{r_1} \,]$$

$$\tilde{v}_j = \mathrm{GeLU}\Big( W_1 v_j + \sum_{k \in \mathcal{N}_1(j)} \gamma_{k \to j}\, W_2\, m_{k \to j} \Big)$$

wherein $\gamma_{k \to j}$ is the attention coefficient from the $j$-th image feature node $v_j$ to the $k$-th neighbor node feature $v_k$, $W_1$ and $W_2$ are respectively parameter matrices, and $W_{r_1}$ is the parameter matrix of the first local edge type.
When the neighbor nodes of an image feature node are in other rectangular boxes: in the scene topology graph, $(o_a, r, o_b)$ denotes that object node $o_a$ is connected to object node $o_b$ through edge $r$, and object nodes $o_a$ and $o_b$ correspond respectively to position rectangular boxes $B_a$ and $B_b$. The image feature nodes in rectangular box $B_a$ are then also connected by edge $r$ to the image feature nodes in rectangular box $B_b$ for image-level relational message passing. Defining $\mathcal{N}_2(j)$ as the set of image feature nodes in other rectangular boxes connected by such edges to image feature node $v_j$, the updated image feature node $\bar{v}_j$ is obtained through the message transfer function and the attention mechanism as:

$$\bar{v}_j = \mathrm{GeLU}\Big( W_1 v_j + \sum_{k \in \mathcal{N}_2(j)} \delta_{k \to j}\, W_2\, W_{r_2} [\, v_k \,\|\, e_{k \to j} \,] \Big)$$

wherein $\delta_{k \to j}$ is the attention coefficient from the $j$-th image feature node $v_j$ to the $k$-th neighbor node feature $v_k$, $W_1$ and $W_2$ are respectively parameter matrices, $W_{r_2}$ is the parameter matrix of the second local edge type, and $e_{k \to j}$ is the edge feature from the $j$-th image feature node $v_j$ to the $k$-th neighbor node feature $v_k$.
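The inheritance of a scene-graph edge by the image feature nodes inside the two objects' rectangular boxes can be sketched as follows. Names (`inherit_edges`, `box_nodes`) and the list-based box membership are assumptions for illustration.

```python
# Sketch: expand one scene-graph edge (o_a, r, o_b) into image-level edges
# between every image feature node in box B_a and every node in box B_b.

def inherit_edges(scene_edges, box_nodes):
    """scene_edges: list of (a, relation, b) over object indices.
    box_nodes: dict mapping object index -> image-node indices in its box.
    Returns image-level edges (src_node, relation, dst_node)."""
    image_edges = []
    for a, rel, b in scene_edges:
        for u in box_nodes[a]:
            for v in box_nodes[b]:
                image_edges.append((u, rel, v))
    return image_edges

scene_edges = [(0, "on", 1)]          # object 0 'on' object 1
box_nodes = {0: [10, 11], 1: [20]}    # image nodes inside each box
print(inherit_edges(scene_edges, box_nodes))
# [(10, 'on', 20), (11, 'on', 20)]
```

This is exactly the set $\mathcal{N}_2$ construction in reverse: every image node in the target box becomes a cross-box neighbor of every image node in the source box.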
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention constructs a hypergraph based on the input scene topology graph, so that semantic features and image features are learned jointly in a single stage. This differs from the two-stage learning required by prior methods and improves learning efficiency.
(2) The invention provides four message passing modes on the graph neural network to simulate the interaction of objects in a real scene. Message passing on the scene topology graph is used to learn semantic features; message passing between the semantic features of the scene topology graph and the image features is used to control the global generation of the image. Two further message passing modes are provided between image features: one controls the learning of local image features, and the other controls the learning of relationships between different image regions, where the relationships between image features correspond to the relationships defined by the scene topology graph. Together these improve the quality of images generated from scene topology graphs, including the visual quality of the objects and the correctness of object relationships at the image level.
Drawings
FIG. 1 is a flowchart of the graph neural network-based image generation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the graph neural network-based image generation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the four message passing modes according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings.
The application provides an image generation method based on a graph neural network, as shown in fig. 1 and fig. 2, comprising:
s1: obtaining an image generation pre-training dataset and a controllable image generation dataset: the samples of the pre-training data set are all composed of real images; the controllable image generation data set comprises a real image and a scene topological graph corresponding to the real image.
S2: and (3) constructing a pre-training system VQGAN through a generative confrontation network based on an image generation pre-training data set: VQGAN expresses the composition of an image in the form of a sequence. Any image
Figure 170149DEST_PATH_IMAGE066
Can be represented as a combination of potential vectors,
Figure 334414DEST_PATH_IMAGE067
whereinnIs the dimension of the potential vector or vectors,HandWto be the height and width of the image,handwthe height and width of the potential vector. Two convolution models of VQGAN learning are respectively encoders
Figure 841619DEST_PATH_IMAGE068
And decoder
Figure 54425DEST_PATH_IMAGE069
Obtaining a learned discrete latent vector dictionary
Figure 432317DEST_PATH_IMAGE070
To represent the image or images of the scene,Krepresenting the size of the dictionary in terms of,z k is the first in a vector dictionarykA vector.
The encoder is firstly utilized during VQGAN training
Figure 564221DEST_PATH_IMAGE068
Obtaining initial in-vector combinations
Figure 27563DEST_PATH_IMAGE071
Calculated by the distance nearest principle
Figure 762170DEST_PATH_IMAGE072
The potential vector closest to the potential feature in the potential vector dictionary at each position is used as the potential vector of the current positionzComprises the following steps:
Figure 56885DEST_PATH_IMAGE002
Figure 297374DEST_PATH_IMAGE003
in order to be the initial combination of potential vectors,q("back") is the function of distance to the nearest,z k is the first in a vector dictionarykThe number of the vectors is such that,nis a dimension of a vector and is a function of,handwrespectively the height and width of the potential vector,
Figure 185695DEST_PATH_IMAGE073
. When training, the image reconstructed by the latent vector combination is basically consistent with the original image:
Figure 802621DEST_PATH_IMAGE074
namely:
Figure 686264DEST_PATH_IMAGE075
the real images in the pre-training data set are input into a pre-training system VQGAN, an encoder
Figure 550183DEST_PATH_IMAGE076
Encoding an image into
Figure 722539DEST_PATH_IMAGE072
I.e. by
Figure 939893DEST_PATH_IMAGE077
Of discrete vectors, decoders
Figure 818988DEST_PATH_IMAGE078
And restoring the discrete vectors into the original image. Combining discrete vectorszI.e. by
Figure 401279DEST_PATH_IMAGE079
Record of
Figure 857668DEST_PATH_IMAGE080
And as initial image feature nodes, the number of potential vectors is used for learning of a scene topological graph image generation training system based on a graph neural network.
The hypergraph $G_h = (V, O, E)$ is constructed through the image feature node set and the corresponding scene topology graph, and a training sample set is constructed from the multiple hypergraphs. Semantic feature nodes of the scene topology graph represent objects, and edges represent the relationships between the objects. Given a set of object categories $\mathcal{C}$ and a set of relationship categories $\mathcal{R}$, the semantic nodes of the scene topology graph are composed of the tuple $(O, E)$, wherein $O = \{o_1, \ldots, o_N\}$ is the set of object nodes and each object $o_i \in \mathcal{C}$; $E$ is the set of edges, each of which can be represented as $(o_i, r_{i \to j}, o_j)$, wherein $r_{i \to j} \in \mathcal{R}$ is the edge pointing from the $i$-th scene topology graph node to the $j$-th scene topology graph node, and $o_j$ is the $j$-th neighbor node of $o_i$.
The scene topology graph is input into an embedding network to obtain the semantic feature $s_i$ of each scene topology graph node and the edge feature $e_r$ of each edge between nodes, wherein $r$ indicates the edge type.
S3: building a training network model, and defining four message transfer models on a graph neural network to simulate the interaction of objects in a scene, wherein the four message transfer models comprise:
s31: semantic feature message passing mode on scene topological graph: as shown in fig. 3 (a), in the scene topology graph, the semantic features and the edge connecting features of each neighbor node of the scene topology graph nodes are fused through a message transfer function to obtain first neighbor node messages, each first neighbor node message is aggregated through an attention mechanism unit, and the semantic features of the scene topology graph nodes are updated through an aggregation result. After the message transmission is finished, the semantic features of the nodes of the topological graph of each scene are further updated by utilizing a feedforward neural network and a normalization operation to obtain final semantic features, so that the feature conversion capability is improved, and the phenomenon of over-smoothness is relieved.
The first neighbor-node message $m_{j \to i}$ provided by the application is:

$$m_{j \to i} = W_s \,[\, s_j \,\|\, e_{j \to i} \,]$$

wherein $s_j$ is the semantic feature of the $j$-th neighbor node, $e_{j \to i}$ is the edge feature, $W_s \in \mathbb{R}^{D_1 \times (D_1 + D_2)}$ is the message passing parameter matrix within the scene topology graph, $D_1$ is the dimension of the neighbor node semantic features, and $D_2$ is the dimension of the edge features.
The feature corresponding to a node is updated through the fusion results provided by the application to obtain $v_i'$:

$$v_i' = \mathrm{GeLU}\Big( W_1 v_i + \sum_{j \in \mathcal{N}(i)} \alpha_{j \to i}\, W_2\, m_{j \to i} \Big)$$

wherein $\mathcal{N}(i)$ is the neighbor node set of the node feature $v_i$, $\alpha_{j \to i}$ is the normalized attention coefficient from node $j$ to node $i$, $W_1$ and $W_2$ are respectively parameter matrices, and $\mathrm{GeLU}$ is the activation function.
The semantic features of each scene topology graph node are further updated with the feed-forward neural network and the normalization operation to obtain the final semantic feature $s_i'$:

$$s_i' = \mathrm{LayerNorm}\big( v_i' + W_4\, \sigma( W_3\, \mathrm{LayerNorm}(v_i') ) \big)$$

wherein $\mathrm{LayerNorm}$ is the normalization function, $W_3$ and $W_4$ are the parameter matrices of the feed-forward neural network, and $\sigma$ is the activation function.
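The feed-forward update with residual connection and normalization can be sketched in numpy. The transformer-style form (LayerNorm, two linear layers, residual) is assumed, since the patent's figures are not recoverable; ReLU stands in for the unspecified activation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def ffn_update(v, W3, W4):
    """Feed-forward network with residual and LayerNorm (assumed form).

    v: (D,) node feature; W3: (D_ff, D); W4: (D, D_ff).
    """
    h = np.maximum(0.0, W3 @ layer_norm(v))  # ReLU stands in for the activation
    return layer_norm(v + W4 @ h)

rng = np.random.default_rng(0)
D, D_ff = 8, 32
out = ffn_update(rng.normal(size=D),
                 rng.normal(size=(D_ff, D)) / np.sqrt(D),
                 rng.normal(size=(D, D_ff)) / np.sqrt(D_ff))
print(out.shape)  # (8,)
```

The final LayerNorm means every updated feature vector has zero mean, which is one of the properties that helps alleviate over-smoothing across message passing rounds.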
S32: global message passing mode: as shown in fig. 3 (b), the global messaging considers information interaction between node semantic feature information and image feature information in the input scene topology map. And when the neighbor nodes of the image feature nodes are scene topological graph nodes, constructing a rectangular frame of the real image based on each node of the scene topological graph by adopting a regression network method. The prior art uses semantic features of nodes to predict the position rectangular box and object shape of an object, and then fills the semantic features into specific position and shape regions. The invention follows the similar object-to-region criterion, firstly defines a regression network of the rectangular frame of the object position to predict each object
Figure 980771DEST_PATH_IMAGE098
At a rectangular position
Figure 172718DEST_PATH_IMAGE099
Wherein
Figure 455932DEST_PATH_IMAGE100
Representing the coordinates of the upper left corner of the rectangular box,
Figure 199766DEST_PATH_IMAGE101
and
Figure 997958DEST_PATH_IMAGE102
respectively representing the width and height of the rectangular box.
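A box regression head of this kind can be sketched as a single linear layer over the semantic feature. This is a hypothetical head for illustration: the sigmoid normalization to $[0, 1]$ is a common convention, and the patent does not specify the head's exact form.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_box(s_i, W_box, b_box):
    """Hypothetical linear regression head: semantic feature -> (x, y, w, h).

    Outputs are squashed to [0, 1] via a sigmoid, i.e. box coordinates are
    expressed as fractions of the image size.
    """
    return sigmoid(W_box @ s_i + b_box)  # (4,)

rng = np.random.default_rng(0)
D = 16
box = predict_box(rng.normal(size=D), rng.normal(size=(4, D)), np.zeros(4))
print(box.shape)  # (4,)
```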
The image feature nodes of an object lie inside its rectangular frame, and each node of the scene topological graph points to its corresponding rectangular frame. The updated scene topological graph node semantic features are fused, through the message transfer function, with the global edge features connecting them to the corresponding rectangular frames; the fusion result updates the image features of the image feature nodes via an attention mechanism; and each updated image feature is then passed through a feed-forward neural network and a normalization operation to obtain the final image feature [formula]:

[formula]

where [formula] is the message passed from the i-th updated semantic node feature [formula] to the j-th image node feature [formula], r_g is the g-th global edge type, [formula] is the parameter matrix of the global edge type, [formula] is the global edge feature, [formula] is the attention coefficient from the i-th updated semantic node feature [formula] to the image node feature [formula], W_1 and W_2 are parameter matrices, and [formula] is the set of semantic-feature neighbor nodes of image node feature [formula].
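The global update above (fuse each semantic neighbor's feature with its global edge feature, then aggregate into the image feature by attention) can be sketched as follows. Since the patent's exact equations are embedded as images, the additive fusion, bilinear attention scoring, and residual update here are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_message_update(img_feat, sem_feats, edge_feats, W_edge, W1, W2):
    """Update one image feature from its semantic-node neighbors.
    sem_feats:  (N, D) neighbor semantic features
    edge_feats: (N, D) global edge features, one per neighbor."""
    # Message: fuse each neighbor's semantic feature with its edge feature.
    msgs = (sem_feats + edge_feats) @ W_edge          # (N, D)
    # Attention coefficients between the image feature and each message.
    scores = (img_feat @ W1) @ (msgs @ W2).T          # (N,)
    alpha = softmax(scores)
    # Attention-weighted aggregation with a residual connection.
    return img_feat + alpha @ msgs

rng = np.random.default_rng(1)
D, N = 8, 3
out = global_message_update(rng.normal(size=D),
                            rng.normal(size=(N, D)),
                            rng.normal(size=(N, D)),
                            rng.normal(size=(D, D)),
                            rng.normal(size=(D, D)),
                            rng.normal(size=(D, D)))
print(out.shape)   # (8,)
```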
S33: Local message passing mode. When the neighbor nodes of an image feature node lie in the current rectangular frame or in other rectangular frames, the image features of those neighbor nodes within a rectangular frame and the corresponding edge features are fused through the message transfer function to obtain second neighbor-node messages; each second neighbor-node message is aggregated by the attention mechanism unit; and the aggregation result updates the image feature of the image feature node.

When the neighbor nodes of an image feature node lie in the current rectangular frame, the message passing is defined as the first local message passing mode; when they lie in other rectangular frames, it is defined as the second local message passing mode.
First local message passing: As shown in Fig. 3(c), the purpose of local message passing is to learn the local visual details of the image so that the generated image has finer-grained detail. Each image feature node is sensitive to the image feature nodes around it. Specifically, all image feature nodes within a rectangular frame form a complete graph, i.e., each image feature node in a frame points to every other image feature node in the frame, connected by a specific local edge r_l, where l is the index of the local edge and [formula] is the local edge feature. Define [formula] as the set of neighbor nodes of image feature node [formula] within the same rectangular frame. The final image feature node updated by the first local message passing mode through the message transfer function, attention mechanism, feed-forward neural network, and normalization operation, [formula], is:

[formula]

where [formula] is the attention coefficient from the j-th image feature node [formula] to the [formula]-th neighbor node feature [formula], W_1 and W_2 are parameter matrices, and [formula] is the parameter matrix of the first local edge type.
Second local message passing: As shown in Fig. 3(d), the second local message passing mode models the interrelationships between objects at the image level: following the semantic-feature message passing on the scene topological graph, messages are passed according to the objects and object relations defined there. When the neighbor nodes of an image feature node lie in other rectangular frames: in the scene topological graph, [formula] denotes that object node [formula] is connected to object node [formula] by edge [formula]; object nodes [formula] and [formula] correspond to position rectangular frames [formula] and [formula], respectively; and each image feature node in frame [formula] is likewise connected by edge [formula] to each image feature node in frame [formula], enabling image-level relational message passing. Define [formula] as the set of image feature nodes in other rectangular frames that share an edge with [formula]. Because distinct rectangular frames would otherwise be joined by a huge number of edges, a random sampling strategy is adopted to reduce the number of mapped edges. The final image feature node updated by the second local message passing mode through the message transfer function, attention mechanism, feed-forward neural network, and normalization operation, [formula], is:

[formula]
[formula]

where [formula] is the attention coefficient from the j-th image feature node [formula] to the [formula]-th neighbor node feature [formula], W_1 and W_2 are parameter matrices, [formula] is the parameter matrix of the second local edge type, and [formula] is the edge feature from the j-th image feature node [formula] to the [formula]-th neighbor node feature [formula].
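Because connecting every image feature node in one rectangular frame to every node in a related frame yields a quadratic number of edges, the passage above reduces them by random sampling. A minimal sketch of such a strategy, where the cap k on sampled edges is an illustrative assumption:

```python
import numpy as np

def sample_cross_box_edges(nodes_a, nodes_b, k, rng):
    """Enumerate all edges between the image feature nodes of two related
    rectangular frames, then keep a random subset of at most k edges."""
    all_edges = [(a, b) for a in nodes_a for b in nodes_b]
    if len(all_edges) <= k:
        return all_edges
    idx = rng.choice(len(all_edges), size=k, replace=False)  # no duplicates
    return [all_edges[i] for i in idx]

rng = np.random.default_rng(2)
# Two frames of 16 image feature nodes each: 256 possible edges, keep 32.
edges = sample_cross_box_edges(list(range(16)), list(range(16)), 32, rng)
print(len(edges))   # 32
```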
Finally, the final image node feature [formula], obtained from the three image-feature message passing modes above, is:

[formula]

The final image node features form the final image node feature set, which is fed in turn into a one-layer fully-connected prediction network and a normalized exponential function (softmax) to generate the image code [formula]:

[formula]
[formula]

where [formula] are the parameters of the prediction network.
S4: training the training network model based on the training sample set, and training the training network model by adopting a cross entropy loss function through generating image codes and real image codes to obtain the graph neural network model. And defining a loss function combined with an autoregressive prediction mode, and predicting and generating image codes by using real image codes.
In the training phase, a real image [formula] is taken as input. The VQGAN encoder [formula] encodes the image into an image latent vector z of shape [formula]; for each latent vector, the index of the nearest vector in the vector dictionary [formula] is looked up, and the resulting indices constitute the real image code, which serves as the training label [formula], where B is the number of real image codes.
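The VQGAN encoding step above quantizes each latent vector to its nearest dictionary entry and keeps the index; the indices form the real image code. A minimal nearest-neighbor lookup, with a random codebook standing in for VQGAN's learned vector dictionary:

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector (B, n) to the index of its nearest
    codebook entry (K, n) under Euclidean distance."""
    # Squared distance between every latent and every codebook vector.
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (B, K)
    return d.argmin(axis=1)          # (B,) integer image code

rng = np.random.default_rng(3)
codebook = rng.normal(size=(32, 4))      # stand-in vector dictionary
latents = codebook[[5, 0, 17]]           # latents equal to entries 5, 0, 17
codes = quantize(latents, codebook)
print(codes.tolist())   # [5, 0, 17]
```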
[formula]

where [formula] are the first b-1 real image dictionary indices, i.e., the first b-1 real image codes. Training maximizes the probability [formula] that the b-th generated image code [formula] matches the b-th real image code [formula], where [formula] are the graph neural network parameters.
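The objective above is teacher-forced autoregressive prediction: conditioned on the first b-1 real codes, the model maximizes the probability of the b-th real code, which amounts to a cross-entropy loss over the code sequence. An illustrative computation of that loss from model logits (the graph neural network itself is abstracted away):

```python
import numpy as np

def autoregressive_ce_loss(logits, real_codes):
    """logits: (B, K) predicted scores over the K dictionary entries at
    each of the B positions, conditioned (via teacher forcing) on the
    first b-1 *real* codes; real_codes: (B,) integer labels."""
    # log-softmax, shifted for numerical stability
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # negative log-likelihood of each true code, averaged over positions
    return -log_probs[np.arange(len(real_codes)), real_codes].mean()

rng = np.random.default_rng(4)
B, K = 6, 32
loss = autoregressive_ce_loss(rng.normal(size=(B, K)),
                              rng.integers(0, K, size=B))
print(float(loss) > 0)   # cross entropy is strictly positive
```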
S5: and testing the model trained in the S4.
In the testing phase, any scene topological graph is input, and the graph neural network model trained in S4 generates new image dictionary indices one by one in an autoregressive manner, without needing the real image dictionary indices, to produce the generated image code. The difference from training is that each new dictionary index is predicted from the indices already generated rather than from the real dictionary indices (the real image code). After all image latent vectors are obtained, the VQGAN decoder [formula] converts the latent vectors corresponding to the indices into the generated image. A multinomial resampling method is used to obtain different image latent vectors and thus increase the diversity of the generated images.
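At test time the dictionary indices are generated one at a time, and drawing each index from the predicted distribution (multinomial resampling) rather than taking the argmax is what diversifies the generated images. A sketch of that decode loop, with a uniform stub in place of the trained graph neural network:

```python
import numpy as np

def generate_codes(model, num_codes, vocab_size, rng):
    """Autoregressively sample dictionary indices: each new index is
    drawn from the distribution the model predicts given the indices
    generated so far (no real image codes are needed)."""
    codes = []
    for _ in range(num_codes):
        probs = model(codes)                   # (vocab_size,) distribution
        nxt = rng.choice(vocab_size, p=probs)  # multinomial resampling
        codes.append(int(nxt))
    return codes

def stub_model(prefix, vocab_size=32):
    # Placeholder for the trained graph neural network: uniform probabilities.
    return np.full(vocab_size, 1.0 / vocab_size)

rng = np.random.default_rng(5)
codes = generate_codes(stub_model, 16, 32, rng)
print(len(codes))   # 16 sampled dictionary indices
```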

Claims (10)

1. An image generation method based on a graph neural network is characterized by comprising the following steps:
(1) Acquiring a plurality of real images, constructing a scene topological graph based on objects in the real images, inputting the real images into a VQGAN system to obtain real image codes and an image characteristic node set, constructing a hypergraph through the image characteristic node set and the corresponding scene topological graph, and constructing a training sample set by a plurality of hypergraphs;
(2) Constructing a training network model, wherein the training network model comprises a message transfer function, an attention mechanism unit, a full connection layer and a normalized exponential function, and the training network model comprises the following steps:
semantic feature message delivery mode on scene topological graph: in a scene topological graph, fusing semantic features and edge connecting features of each neighbor node of the scene topological graph nodes through a message transfer function to obtain first neighbor node messages, aggregating each first neighbor node message through an attention mechanism unit, and taking an aggregation result as an updated scene topological graph node semantic feature;
global message delivery mode: when the neighbor nodes of the image feature nodes are scene topological graph nodes, constructing a rectangular frame based on each node of the scene topological graph by adopting a regression network method, wherein image feature nodes of objects are arranged in the rectangular frame, each node of the scene topological graph points to the corresponding rectangular frame, fusing the semantic features of the updated scene topological graph nodes with the global edge connecting features connected with the corresponding rectangular frame through a message transfer function, and taking the aggregation features obtained by the fusion result through an attention mechanism as the image features updated in a global message transfer mode;
local message transmission mode: when the neighbor nodes of the image feature nodes are in the current rectangular frame or other rectangular frames, fusing the image features and the corresponding connection edge features of the neighbor nodes of the image feature nodes in the rectangular frame through a message transfer function to obtain second neighbor node information, aggregating each second neighbor node information through an attention mechanism unit, and taking the aggregation result as the image feature updated by adopting a local message transfer mode;
sequentially inputting an image feature set obtained by updating based on a global message transfer mode and a local message transfer mode into a full connection layer and a normalization index function to obtain a generated image code;
(3) Training the training network model based on the training sample set, and training the training network model by adopting a loss function through generating image codes and real image codes to obtain a graph neural network model;
(4) When the method is applied, the scene topological graph is input into the graph neural network model to obtain a generated image code, and the generated image code is input into a decoder of the VQGAN system to generate an image.
2. The method of claim 1, wherein inputting the real image into the VQGAN system to obtain the real image code comprises: first obtaining an initial latent vector combination of the real image through the encoder of the VQGAN system, and comparing each initial latent vector in the combination with the vector dictionary under the nearest-distance principle to obtain the latent vector combination, whose dictionary indices are the real image code; the latent vector [formula] is:

[formula]

where [formula] is the initial latent vector combination, q is the quantization function, z_k is the k-th vector in the vector dictionary, n is the vector dimension, and h and w are the height and width of the latent vector, respectively.
3. The method of claim 1, wherein the scene topological graph constructed from the objects in the real image has nodes representing the objects in the real image and edges representing the relationships between the objects, and is composed of the tuple [formula]:

the set of scene topological graph nodes O is

[formula]

where o_i is the i-th scene topological graph node, N is the number of scene topological graph nodes, and [formula] is the set of object categories;

the set of scene topological graph edges is [formula], where [formula] is the set of relationship categories; each edge is denoted [formula], where [formula] is the [formula]-th neighbor node of [formula], and [formula] is the edge pointing from the i-th scene topological graph node to the [formula]-th scene topological graph node.
4. The image generation method based on a graph neural network according to claim 1, wherein the scene topological graph is input into an embedding-layer network to obtain the semantic features and edge features of the scene topological graph nodes.
5. The graph neural network-based image generation method of claim 3, wherein the first neighbor-node message [formula], obtained by fusing the semantic features and edge features of each neighbor node of a scene topological graph node through the message transfer function, is:

[formula]

where [formula] is the semantic feature of the [formula]-th neighbor node, [formula] is the edge feature, [formula] is the message transfer parameter matrix within the scene topological graph, [formula], D1 is the dimension of the neighbor-node semantic features, and D2 is the dimension of the edge features.
6. The method of claim 3, wherein the feature updated from the fusion results, [formula], is:

[formula]

where [formula] is the neighbor node set of node feature v_i, [formula] is the normalized attention coefficient from node [formula] to node [formula], W_1 and W_2 are parameter matrices, and GeLU is an activation function.
7. The method of claim 1, wherein the image feature [formula] updated based on global message passing is:

[formula]
[formula]

where [formula] is the message passed from the i-th updated semantic node feature [formula] to the j-th image node feature [formula], r_g is the g-th global edge type, [formula] is the parameter matrix of the global edge type, [formula] is the global edge feature, [formula] is the attention coefficient from the i-th updated semantic node feature [formula] to the image node feature [formula], W_1 and W_2 are parameter matrices, and [formula] is the set of semantic-feature neighbor nodes of image node feature [formula].
8. The image generation method based on a graph neural network according to claim 1, wherein the image features updated by the global and local message passing modes are passed in turn through a feed-forward neural network and a normalization operation to obtain the final image features; and the scene topological graph node semantic features updated by the semantic-feature message passing mode on the scene topological graph are likewise passed in turn through a feed-forward neural network and a normalization operation to obtain the final semantic features.
9. The method of claim 1, wherein, when the neighbor nodes of an image feature node lie in the current rectangular frame, each image feature node in the frame points to the other image feature nodes in the frame, connected by a specific local edge r_l, where l is the index of the local edge, [formula] is the first local edge feature, and [formula] is the set of neighbor nodes of image feature node [formula] within the same rectangular frame; the image feature node updated through the message transfer function and the attention mechanism, [formula], is:

[formula]
[formula]

where [formula] is the attention coefficient from the j-th image feature node [formula] to the [formula]-th neighbor node feature [formula], W_1 and W_2 are parameter matrices, and [formula] is the parameter matrix of the first local edge type.
10. The method of claim 1, wherein, when the neighbor nodes of an image feature node lie in other rectangular frames: in the scene topological graph, [formula] denotes that object node [formula] is connected to object node [formula] by edge [formula]; object nodes [formula] and [formula] correspond to position rectangular frames [formula] and [formula], respectively; each image feature node in frame [formula] is likewise connected by edge [formula] to each image feature node in frame [formula], enabling image-level relational message passing; [formula] is defined as the set of image feature nodes in other rectangular frames that share an edge with [formula]; and the image feature node updated through the message transfer function and the attention mechanism, [formula], is:

[formula]

where [formula] is the attention coefficient from the j-th image feature node [formula] to the [formula]-th neighbor node feature [formula], W_1 and W_2 are parameter matrices, [formula] is the parameter matrix of the second local edge type, and [formula] is the edge feature from the j-th image feature node [formula] to the [formula]-th neighbor node feature [formula].
CN202211503117.2A 2022-11-29 2022-11-29 Image generation method based on graph neural network Active CN115546589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211503117.2A CN115546589B (en) 2022-11-29 2022-11-29 Image generation method based on graph neural network


Publications (2)

Publication Number Publication Date
CN115546589A true CN115546589A (en) 2022-12-30
CN115546589B CN115546589B (en) 2023-04-07

Family

ID=84722287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211503117.2A Active CN115546589B (en) 2022-11-29 2022-11-29 Image generation method based on graph neural network

Country Status (1)

Country Link
CN (1) CN115546589B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115941501A (en) * 2023-03-08 2023-04-07 华东交通大学 Host equipment control method based on graph neural network
CN116919593A (en) * 2023-08-04 2023-10-24 溧阳市中医医院 Gallbladder extractor for cholecystectomy

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017101166A4 (en) * 2017-08-25 2017-11-02 Lai, Haodong MR A Method For Real-Time Image Style Transfer Based On Conditional Generative Adversarial Networks
CN110609891A (en) * 2019-09-18 2019-12-24 合肥工业大学 Visual dialog generation method based on context awareness graph neural network
US20200074707A1 (en) * 2018-09-04 2020-03-05 Nvidia Corporation Joint synthesis and placement of objects in scenes
CN111325323A (en) * 2020-02-19 2020-06-23 山东大学 Power transmission and transformation scene description automatic generation method fusing global information and local information
US20200242774A1 (en) * 2019-01-25 2020-07-30 Nvidia Corporation Semantic image synthesis for generating substantially photorealistic images using neural networks
CN112862093A (en) * 2021-01-29 2021-05-28 北京邮电大学 Graph neural network training method and device
CN113065587A (en) * 2021-03-23 2021-07-02 杭州电子科技大学 Scene graph generation method based on hyper-relation learning network
CN113221613A (en) * 2020-12-14 2021-08-06 国网浙江宁海县供电有限公司 Power scene early warning method for generating scene graph auxiliary modeling context information
CN113627557A (en) * 2021-08-19 2021-11-09 电子科技大学 Scene graph generation method based on context graph attention mechanism
CN113642630A (en) * 2021-08-10 2021-11-12 福州大学 Image description method and system based on dual-path characteristic encoder
WO2022039465A1 (en) * 2020-08-18 2022-02-24 Samsung Electronics Co., Ltd. Artificial intelligence system and method for modifying image on basis of relationship between objects
WO2022045531A1 (en) * 2020-08-24 2022-03-03 Kyonggi University Industry-Academia Cooperation Foundation Scene graph generation system using deep neural network
CN114677544A (en) * 2022-03-24 2022-06-28 西安交通大学 Scene graph generation method, system and equipment based on global context interaction
CN115170449A (en) * 2022-06-30 2022-10-11 陕西科技大学 Method, system, device and medium for generating multi-mode fusion scene graph


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
MAXIMILIAN ZIPFL, ET AL.: "Relation-based Motion Prediction using Traffic Scene Graphs", 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC) *
P PRADHYUMNA, ET AL.: "Graph Neural Network (GNN) in Image and Video Understanding Using Deep Learning for Computer Vision Applications", 2021 Second International Conference on Electronics and Sustainable Communication Systems (ICESC) *
PEI CHEN, ET AL.: "Few-Shot Incremental Learning for Label-to-Image Translation" *
LAN Hong et al.: "Scene graph to image generation model with a graph attention network", Journal of Image and Graphics *
ZHANG Wei: "Research on visual scene graph generation algorithms based on object relationship understanding", China Masters' Theses Full-text Database *
LIN Xin: "Context-based scene graph generation", China Masters' Theses Full-text Database *


Also Published As

Publication number Publication date
CN115546589B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN115546589B (en) Image generation method based on graph neural network
CN110399518B (en) Visual question-answer enhancement method based on graph convolution
Gao et al. LFT-Net: Local feature transformer network for point clouds analysis
CN110766038B (en) Unsupervised landform classification model training and landform image construction method
CN113284100B (en) Image quality evaluation method based on recovery image to mixed domain attention mechanism
CN108563755A (en) A kind of personalized recommendation system and method based on bidirectional circulating neural network
CN113486190B (en) Multi-mode knowledge representation method integrating entity image information and entity category information
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN116664719B (en) Image redrawing model training method, image redrawing method and device
CN110706303A (en) Face image generation method based on GANs
CN113065974A (en) Link prediction method based on dynamic network representation learning
CN111275640A (en) Image enhancement method for fusing two-dimensional discrete wavelet transform and generating countermeasure network
CN113240683A (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN115064020A (en) Intelligent teaching method, system and storage medium based on digital twin technology
CN116010813A (en) Community detection method based on influence degree of fusion label nodes of graph neural network
CN109658508B (en) Multi-scale detail fusion terrain synthesis method
CN114723037A (en) Heterogeneous graph neural network computing method for aggregating high-order neighbor nodes
CN114283315A (en) RGB-D significance target detection method based on interactive guidance attention and trapezoidal pyramid fusion
Jiang et al. Cross-level reinforced attention network for person re-identification
CN115861664A (en) Feature matching method and system based on local feature fusion and self-attention mechanism
CN115965968A (en) Small sample target detection and identification method based on knowledge guidance
Hu et al. Data Customization-based Multiobjective Optimization Pruning Framework for Remote Sensing Scene Classification
CN116798052B (en) Training method and device of text recognition model, storage medium and electronic equipment
CN116628358B (en) Social robot detection system and method based on multi-view Graph Transformer
CN116340842A (en) Common attention-based heterogeneous graph representation learning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant