CN114677544A - Scene graph generation method, system and equipment based on global context interaction - Google Patents
Scene graph generation method, system and equipment based on global context interaction

- Publication number: CN114677544A (application CN202210297025.7A)
- Authority: CN (China)
- Prior art keywords: target, feature, global, vector, GRU
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2415 — Pattern recognition; classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/253 — Pattern recognition; fusion techniques of extracted features
- G06N3/044 — Neural networks; architecture, e.g. interconnection topology; recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; architecture, e.g. interconnection topology; combinations of networks
- G06N3/08 — Neural networks; learning methods
Abstract
The invention discloses a scene graph generation method, system and device based on global context interaction, comprising: 1) joint vector representation based on the fusion of multiple features such as object visual features, spatial coordinates, and semantic labels; 2) global feature generation based on a bidirectional gated recurrent neural network; 3) an iterative message passing mechanism based on the global feature vectors; and 4) scene graph generation based on the target and relationship state representations. Compared with existing scene graph generation methods, the method makes full use of the global features of the image through context interaction and is more generally applicable; moreover, after the context-interacted global features are obtained, messages are passed between target pairs and their relationships, and the current states are updated using the latent relationships between targets, yielding more accurate scene graph generation and making the method well suited to practical application.
Description
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a scene graph generation method, system, and device based on global context interaction.
Background
A scene graph composed of <subject-relationship-object> triplets can describe the objects in an image and the scene-structure relationships between pairs of objects. The scene graph has two main advantages. First, the <subject-relationship-object> triplets of a scene graph carry structured semantic content and, compared with natural-language text, have clear advantages for acquiring and processing fine-grained information. Second, a scene graph can fully represent the objects and scene-structure relationships in an image and therefore has broad application prospects in many computer vision tasks. For example, in the field of autonomous driving, using a scene graph for environment modeling can provide more comprehensive environment information to the decision-making system; in semantic image retrieval, an image provider models the scene-structure relationships of images with scene graphs, so that a user can retrieve images that meet their needs simply by describing the main targets or relationships. Given the huge number of images and the real-time requirements that downstream tasks place on scene graphs, generating scene graphs with computers has gradually become a research hotspot and is of great significance for the field of image understanding.
Existing message-passing-based scene graph generation methods construct target nodes and relationship edges from the target detection results, update states within a local subgraph with a recurrent neural network driven by a message passing mechanism, and use the features after message passing for relationship prediction. Such methods adopt a message passing mechanism based on a local-context view: they ignore the implicit constraints among targets, take only the visual features of target nodes as the initial state, and detect relationships by relying solely on repeated exchange between the subject/object node features and the joint visual features. As a result, the model cannot take the overall structure of the image into account, global information plays no role in relationship prediction, and the prediction capability of the model is limited. In addition, existing methods fail to exploit object coordinates, so the visual relationships between targets are not analyzed from a spatial perspective. To address these problems, the invention provides a scene graph generation method based on global context interaction. Existing scene graph generation methods include the following:
Prior art 2 proposes a scene graph generation method based on a deep relational self-attention network. The method mainly includes: first, performing target detection on the input image to obtain labels, object box features, and union box features; then, constructing target features and relative relationship features; and finally, generating the final visual scene graph with a deep neural network.
The scene graph generation method of prior art 1 does not consider making full use of the feature vectors through feature fusion; the method of prior art 2 does not use a message passing mechanism, does not consider information interaction between target pairs and their relationships, and cannot update states after context propagation. Neither uses the implicit constraints that exist among all objects in an image to construct the context, so both have certain shortcomings.
Disclosure of Invention
The invention aims to provide a scene graph generation method, system, and device based on global context interaction to solve the above problems.
In order to achieve this purpose, the invention adopts the following technical scheme:
Compared with the prior art, the invention has the following technical effects:
Compared with feature representation methods that use only visual features to represent targets, the method makes full use of the targets' visual features, category features, and spatial coordinate information; this fuller use of information improves the relationship prediction performance of scene graph generation;
Compared with scene graph generation methods that use local context interaction, the method uses a recurrent neural network to extract the global context of the image, realizes information interaction based on the global context, and then performs message passing, fully achieving data interaction and information propagation.
Drawings
FIG. 1 is a block diagram of a scene graph generation method based on global context interaction according to the present invention.
FIG. 2 is a flow diagram of a feature fusion based joint representation of vectors.
Fig. 3 is a structural diagram of a bidirectional gated recurrent neural network BiGRU.
Fig. 4 is a flow diagram of the iterative message passing mechanism based on global feature vectors.
Fig. 5 is a diagram illustrating a target detection result and the corresponding scene graph.
FIG. 6 is a graph of the results of the performance test of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the drawings and examples. It should be noted that the embodiments described herein are only for explaining the present invention, and are not intended to limit the present invention. Furthermore, the technical features related to the embodiments of the present invention may be combined with each other without conflict.
The specific implementation of the invention comprises target detection and feature vector fusion on the image, feature generation based on global context interaction, and message passing. FIG. 1 is a block diagram of the scene graph generation method based on global context interaction according to the present invention.
1. Object detection and feature vector fusion of images
Given an input image, the method uses the Faster R-CNN deep learning model for target detection, obtaining a target set O = (o_1, o_2, …, o_n), the corresponding visual feature set V = (v_1, v_2, …, v_n), the coordinate feature set B = (b_1, b_2, …, b_n), the pre-classification label set L = (l_1, l_2, …, l_n), and the visual features C = (c_{i→j}, i ≠ j) within the union box of each pair of target coordinates.
First, the invention uses a feature fusion method to jointly represent the spatial coordinate feature b_i and the visual feature vector v_i corresponding to each target. For a target o_i, its absolute position coordinate is b = (x_1, y_1, x_2, y_2), where x_1, y_1, x_2, y_2 denote the upper-left and lower-right coordinates of the rectangular regression box, and the coordinates are converted into a relative position code b_i within the image using the following formula:
In the formula, wid denotes the original width of image I, and hei denotes the original height of image I.
Then, a fully connected layer of the neural network expands the relative position code b_i into a 128-dimensional feature s_i:
s_i = σ(W_s b_i + b_s),
where σ denotes the ReLU activation function, and W_s and b_s are linear transformation parameters learned automatically by the neural network. Meanwhile, the method uses a fully connected layer to convert the target visual feature v_i from 4096 dimensions to 512 dimensions.
Then, the invention concatenates the dimension-transformed relative position feature vector s_i and the visual feature v_i and applies a further dimension transformation, obtaining a 512-dimensional fusion vector f_i of target visual and coordinate features. The calculation is as follows:
f_i = σ(W_f [s_i, v_i] + b_f),
where [·] denotes the concatenation operation, σ denotes the ReLU activation function, and W_f and b_f are linear transformation parameters.
The above feature vector fusion process is shown in fig. 2.
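For illustration, a minimal PyTorch-style sketch of this fusion step is given below. The coordinate normalization by image width and height, the module name TargetFeatureFusion, and the exact layer shapes are assumptions made for the example (the patent's relative-position formula itself is not reproduced in this text); only the 128/512 dimensions and the concatenate-then-transform structure follow the description above.

```python
import torch
import torch.nn as nn

class TargetFeatureFusion(nn.Module):
    """Fuse a target's spatial coordinates and visual feature into one vector (sketch)."""
    def __init__(self, visual_dim=4096):
        super().__init__()
        self.coord_fc = nn.Linear(4, 128)            # expand relative position code b_i into s_i
        self.visual_fc = nn.Linear(visual_dim, 512)  # compress v_i from 4096-d to 512-d
        self.fusion_fc = nn.Linear(128 + 512, 512)   # f_i = sigma(W_f [s_i, v_i] + b_f)
        self.relu = nn.ReLU()

    def forward(self, boxes, visual_feats, wid, hei):
        # Assumed normalization: divide corner coordinates by image width/height
        # (the patent's exact relative-position formula is not reproduced here).
        scale = boxes.new_tensor([wid, hei, wid, hei])
        b = boxes / scale                             # relative position code b_i
        s = self.relu(self.coord_fc(b))               # 128-d spatial feature s_i
        v = self.relu(self.visual_fc(visual_feats))   # 512-d visual feature v_i
        f = self.relu(self.fusion_fc(torch.cat([s, v], dim=-1)))  # fused vector f_i
        return f
```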
2. Global feature generation based on bidirectional gated recurrent neural network
In the global feature generation process, the invention constructs a bidirectional gated recurrent neural network BiGRU and uses a zero vector as its initial state; the structure of the BiGRU is shown in FIG. 3. After the feature fusion vectors F = (f_1, f_2, …, f_n) of the target set are obtained, the targets are sorted from left to right by the first term (the x coordinate) of their relative coordinates, and the sequence is fed into the BiGRU to obtain the global context target features γ = (γ_1, γ_2, …, γ_n). The specific generation steps are as follows:
(1) initializing a zero vector as a BiGRU initial state;
(2) input the first and last feature fusion vectors of the target set, f_0 and f_n, at the two ends of the BiGRU, generating hidden states for the corresponding direction and sequence position;
(4) fuse the forward and backward hidden states to obtain the context fusion state γ_i of each target.
Then, the invention uses GloVe word embedding vectors to convert the pre-classification results L = (l_1, l_2, …, l_n) from the target detection stage into 128-dimensional target category feature vectors g_i.
Finally, the invention uses a fully connected layer of the neural network to fuse the global context target feature γ_i of each target with its category feature vector g_i, obtaining the target's global feature c_i. The calculation is given by:
g_i = GloVe(l_i),
c_i = σ(W_c [γ_i, g_i] + b_c),
where GloVe(l_i) denotes encoding the pre-classification label of a target with the GloVe method, [·] denotes the concatenation operation, and W_c and b_c are linear transformation parameters.
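A hedged sketch of this global context step follows. The use of nn.Embedding as a stand-in for pretrained GloVe vectors, the hidden size, and the class count are assumptions; the left-to-right sorting by x coordinate and the concatenation of γ_i with g_i follow the description above.

```python
import torch
import torch.nn as nn

class GlobalContextEncoder(nn.Module):
    """Run a BiGRU over targets sorted left-to-right and fuse with label embeddings (sketch)."""
    def __init__(self, fused_dim=512, hidden_dim=256, num_classes=151, embed_dim=128):
        super().__init__()
        self.bigru = nn.GRU(fused_dim, hidden_dim, bidirectional=True, batch_first=True)
        # Stand-in for GloVe label vectors; pretrained embeddings could be loaded here instead.
        self.label_embed = nn.Embedding(num_classes, embed_dim)
        self.fuse_fc = nn.Linear(2 * hidden_dim + embed_dim, 512)
        self.relu = nn.ReLU()

    def forward(self, fused_feats, boxes, labels):
        # Sort targets by the x coordinate of their boxes (left to right).
        order = torch.argsort(boxes[:, 0])
        seq = fused_feats[order].unsqueeze(0)        # (1, n, fused_dim)
        gamma, _ = self.bigru(seq)                   # forward + backward context features
        gamma = gamma.squeeze(0)
        unsort = torch.argsort(order)                # restore the original target order
        gamma = gamma[unsort]
        g = self.label_embed(labels)                 # category feature vectors g_i
        c = self.relu(self.fuse_fc(torch.cat([gamma, g], dim=-1)))  # global features c_i
        return c
```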
3. Iterative message passing mechanism based on global feature vectors
The iterative message passing mechanism consists of a message aggregation function and a state update function.
First, the invention constructs a message aggregation function. In the scene graph topology, nodes and edges represent the subject/object targets and their relationships in a visual relation. During message passing, a single node or edge receives information from several sources at once, so a pooling function must be designed to compute the weight of each part of the message, and the final incoming message is aggregated as the weighted sum of the messages. Depending on the recipient, an incoming message is either a message received by a target node or a message received by a relationship edge.
Given the hidden state h_i^t of the current node GRU and the hidden state h_{i→j}^t of the relationship edge GRU, the message passed into the i-th node at the t-th iteration is denoted m_i^t; it is computed from the target GRU's own hidden state h_i^t, the hidden states h_{i→j}^t of its out-degree edge GRUs, and the hidden states h_{j→i}^t of its in-degree edge GRUs, where i→j denotes that target i is the subject and target j is the object of the relationship.
Similarly, the message m_{i→j}^t aggregated for the relationship edge from the i-th target node to the j-th target node at the t-th iteration is composed of the relationship edge GRU's hidden state from the previous iteration h_{i→j}^{t-1}, the subject node GRU's hidden state h_i^{t-1}, and the object node GRU's hidden state h_j^{t-1}. m_i^t and m_{i→j}^t are computed with the following adaptive weighting functions:
where [·] denotes the concatenation operation, σ denotes the ReLU activation function, and w_1, w_2 and v_1, v_2 are learnable parameters.
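The adaptive weighting can be sketched as below. Because the patent's weighting formulas are not reproduced in this text, the sigmoid gate over concatenated hidden states is an assumed form that merely illustrates a learned scalar weight per incoming message; the parameter names w1/w2 stand in for the learnable parameters mentioned above.

```python
import torch
import torch.nn as nn

class MessageAggregator(nn.Module):
    """Aggregate incoming messages for a node from its edge hidden states (sketch)."""
    def __init__(self, hidden_dim=512):
        super().__init__()
        # Learnable parameters producing a scalar weight per message source (assumed form).
        self.w1 = nn.Linear(2 * hidden_dim, 1)   # weights for out-degree edge messages
        self.w2 = nn.Linear(2 * hidden_dim, 1)   # weights for in-degree edge messages

    def forward(self, h_node, h_out_edges, h_in_edges):
        # h_node: (d,), h_out_edges: (k_out, d), h_in_edges: (k_in, d)
        node_rep = h_node.unsqueeze(0)
        a_out = torch.sigmoid(self.w1(torch.cat(
            [node_rep.expand_as(h_out_edges), h_out_edges], dim=-1)))   # (k_out, 1)
        a_in = torch.sigmoid(self.w2(torch.cat(
            [node_rep.expand_as(h_in_edges), h_in_edges], dim=-1)))     # (k_in, 1)
        # Weighted sum of all incoming edge hidden states forms the node message m_i^t.
        message = (a_out * h_out_edges).sum(0) + (a_in * h_in_edges).sum(0)
        return message
```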
Second, the invention constructs a state update function: a target node GRU and a relationship edge GRU are constructed to store and update the feature vectors of the targets and their relationships. First, at t = 0 the GRU state of every target node and relationship edge is initialized to a zero vector; the global feature vector c_i of each target is used as the input of its target node GRU, and the visual feature c_{i→j} of the union box of the two target coordinates is used as the input of its relationship edge GRU, generating the initial hidden states h_i^0 and h_{i→j}^0 of the target nodes and relationship edges, respectively.
In subsequent iterations, at each iteration t every GRU, depending on whether it is a target GRU or a relationship GRU, takes its previous hidden state h_i^{t-1} or h_{i→j}^{t-1} and the incoming message of the previous iteration m_i^{t-1} or m_{i→j}^{t-1} as input, and produces a new hidden state h_i^t or h_{i→j}^t as output, from which the message aggregation function generates the messages for the next iteration.
Therefore, the specific steps of the whole message passing mechanism are as follows:
(1) initialize the GRU states of all target nodes and relationship edges to zero vectors;
(2) use the global feature vector c_i of each target as the input of its target node GRU and the visual feature c_{i→j} of the union box of the two target coordinates as the input of the relationship edge GRU, generating the initial hidden states h_i^0 and h_{i→j}^0 of the target nodes and relationship edges;
(3) compute the messages m_i^t and m_{i→j}^t received by every target and relationship using the message aggregation function;
(4) combine the hidden states with the received messages m_i^t and m_{i→j}^t and update the states with the GRUs, obtaining the states of the next step;
(5) if the number of iterations reaches the preset number, store the current target and relationship states; otherwise, return to step (3).
The above message passing mechanism flow is shown in fig. 4.
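Putting the steps together, one plausible form of the iterative update loop is sketched below with GRU cells. The iteration count, the assumption that features and hidden states share one 512-d size, and the simple averaging used to form edge messages from subject and object node states are illustrative choices, not the patent's exact formulas; `aggregator` is a callable such as the MessageAggregator sketched above.

```python
import torch
import torch.nn as nn

def message_passing(c_nodes, c_edges, edge_index, aggregator, num_iters=3, hidden_dim=512):
    """Iteratively update node and edge hidden states with GRU cells (sketch).

    c_nodes: (n, hidden_dim) global target features c_i;
    c_edges: (m, hidden_dim) union-box features c_{i->j};
    edge_index: list of (subject, object) index pairs, one per relationship edge.
    """
    node_gru = nn.GRUCell(hidden_dim, hidden_dim)
    edge_gru = nn.GRUCell(hidden_dim, hidden_dim)

    # Steps (1)-(2): zero initial states, then feed the features once to obtain h^0.
    h_nodes = node_gru(c_nodes, torch.zeros(c_nodes.size(0), hidden_dim))
    h_edges = edge_gru(c_edges, torch.zeros(c_edges.size(0), hidden_dim))

    for _ in range(num_iters):
        # Step (3): aggregate incoming messages for every node.
        node_msgs = []
        for i in range(h_nodes.size(0)):
            out_e = torch.stack([h_edges[k] for k, (s, o) in enumerate(edge_index) if s == i]
                                or [torch.zeros(hidden_dim)])
            in_e = torch.stack([h_edges[k] for k, (s, o) in enumerate(edge_index) if o == i]
                               or [torch.zeros(hidden_dim)])
            node_msgs.append(aggregator(h_nodes[i], out_e, in_e))
        node_msgs = torch.stack(node_msgs)
        # Edge message: assumed simple average of subject and object node states.
        edge_msgs = torch.stack([(h_nodes[s] + h_nodes[o]) / 2 for (s, o) in edge_index])

        # Step (4): GRU state update with the aggregated messages as input.
        h_nodes = node_gru(node_msgs, h_nodes)
        h_edges = edge_gru(edge_msgs, h_edges)

    # Step (5): the final states serve as the target and relationship feature vectors.
    return h_nodes, h_edges
```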
4. Scene graph generation based on target and relationship state representation
The hidden states of the targets and relationships after updating by the message passing mechanism are regarded as their feature vectors; they are fed into a neural network, and the softmax function is used to predict the categories of the targets and relationships, yielding the category of each target and the relationship category between each pair of targets, and thereby a scene graph that reflects the relationships between the targets in the image.
Given an input image, the target detection result and the corresponding scene graph are shown in fig. 5, and the performance test result of the model is shown in fig. 6.
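The final classification step can be sketched as a pair of linear heads with softmax; the class counts and layer shapes below are placeholders, not values taken from the patent.

```python
import torch.nn as nn

class SceneGraphHead(nn.Module):
    """Predict target classes and pairwise relationship classes from final GRU states (sketch)."""
    def __init__(self, hidden_dim=512, num_obj_classes=151, num_rel_classes=51):
        super().__init__()
        self.obj_cls = nn.Linear(hidden_dim, num_obj_classes)
        self.rel_cls = nn.Linear(hidden_dim, num_rel_classes)

    def forward(self, h_nodes, h_edges):
        obj_probs = self.obj_cls(h_nodes).softmax(dim=-1)   # category of each target
        rel_probs = self.rel_cls(h_edges).softmax(dim=-1)   # relationship category of each pair
        return obj_probs, rel_probs
```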
In another embodiment of the present invention, a scene graph generation system based on global context interaction is provided, which can be used to implement the scene graph generation method based on global context interaction described above. Specifically, the system includes:
a target detection module, for performing target detection on the input image I to obtain a target set O = (o_1, o_2, …, o_n), a corresponding visual feature set V = (v_1, v_2, …, v_n), a coordinate feature set B = (b_1, b_2, …, b_n), a pre-classification label set L = (l_1, l_2, …, l_n), and the visual features C = (c_{i→j}, i ≠ j) of the union box of every two target coordinates;
a joint representation vector acquisition module for target visual and coordinate features, for transforming the absolute position coordinates of each target with a neural network to obtain a joint representation vector f_i of the target's visual and coordinate features;
a target global feature acquisition module, for obtaining, from the feature fusion vectors F = (f_1, f_2, …, f_n), the global context target feature γ_i and the category feature vector g_i of each target, and fusing the global context target feature γ_i with its category feature vector g_i using a neural network to obtain the target's global feature c_i;
a scene graph acquisition module, for initializing the hidden states h_i^0 and h_{i→j}^0 based on the global feature vector c_i of each target and the feature vector c_{i→j} of each relationship, initially computing the incoming message m_i of each node and the incoming message m_{i→j} of each edge, performing iterative message passing in which the hidden states are updated with recurrent neural networks and messages are aggregated to obtain the incoming message of each step until the preset number of iterations is reached, and generating, from the final states of the target nodes and relationship edges, a scene graph that reflects the relationships between the targets in the image.
The division into modules in the embodiments of the present invention is schematic and represents only one logical functional division; in actual implementation other divisions are possible. In addition, the functional modules in the embodiments of the present invention may be integrated in one processor, may each exist alone physically, or two or more modules may be integrated in one module. The integrated module may be implemented in hardware or as a software functional module.
In yet another embodiment of the present invention, a computer device is provided that includes a processor and a memory, the memory storing a computer program comprising program instructions, and the processor being configured to execute the program instructions stored in the computer storage medium. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing and control core of the terminal and is specifically adapted to load and execute one or more instructions in a computer storage medium to implement the corresponding method flow or function. The processor of the embodiment of the invention may be used to perform the operations of the scene graph generation method based on global context interaction.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalents may be made to the embodiments of the invention without departing from its spirit and scope, which are to be covered by the claims.
Claims (10)
1. A scene graph generation method based on global context interaction, characterized by comprising:
performing target detection on an input image I to obtain a target set O = (o_1, o_2, …, o_n), a corresponding visual feature set V = (v_1, v_2, …, v_n), a coordinate feature set B = (b_1, b_2, …, b_n), a pre-classification label set L = (l_1, l_2, …, l_n), and the visual features C = (c_{i→j}, i ≠ j) of the union box of every two target coordinates;
transforming the absolute position coordinates of each target with a neural network to obtain a joint representation vector f_i of the target's visual and coordinate features;
obtaining, from the feature fusion vectors F = (f_1, f_2, …, f_n), the global context target feature γ_i and the category feature vector g_i of each target, and fusing the global context target feature γ_i with its category feature vector g_i using a neural network to obtain the target's global feature c_i;
initializing the hidden states h_i^0 and h_{i→j}^0 based on the global feature vector c_i of each target and the feature vector c_{i→j} of each relationship, initially computing the incoming message m_i of each node and the incoming message m_{i→j} of each edge, performing iterative message passing in which the hidden states are updated with recurrent neural networks and messages are aggregated to obtain the incoming message of each step, until a preset number of iterations is reached, and generating, from the final states of the target nodes and relationship edges, a scene graph that reflects the relationships between the targets in the image.
2. The method as claimed in claim 1, wherein a neural network is used to convert the absolute position coordinates of each target into a relative position code within the image and expand it into a relative position feature s_i, the visual feature v_i of the target is converted into 512 dimensions, and a feature fusion method is used to concatenate and transform the relative position feature vector s_i and the visual feature v_i, obtaining the joint representation vector f_i of the target's visual and coordinate features.
3. The method as claimed in claim 2, wherein, in the feature-fusion-based joint vector representation, after target detection is performed on the input image I with the Faster R-CNN model, the absolute position coordinates of each target are converted into a relative position code b_i within the image; for a target o_i with coordinates (x_1, y_1, x_2, y_2), where x_1, y_1, x_2, y_2 denote the upper-left and lower-right coordinates of the rectangular regression box, the relative position code is computed by the formula:
in the formula, wid denotes the original width of image I and hei denotes the original height of image I; then the relative position code b_i is expanded into a 128-dimensional feature s_i using a fully connected layer:
s_i = σ(W_s b_i + b_s),
where σ denotes the ReLU activation function, and W_s and b_s are linear transformation parameters learned automatically by the neural network; meanwhile, the visual feature v_i of the target obtained from the same target detection is dimension-transformed, converting the 4096-dimensional feature into 512 dimensions with a fully connected layer; then the dimension-transformed relative position feature vector s_i and the visual feature v_i are concatenated and transformed, finally obtaining a 512-dimensional fusion vector f_i of target visual and coordinate features, calculated as follows:
f_i = σ(W_f [s_i, v_i] + b_f),
where [·] denotes the concatenation operation, σ denotes the ReLU activation function, and W_f and b_f are linear transformation parameters.
4. The method of claim 1, wherein the global context target features γ = (γ_1, γ_2, …, γ_n) are obtained from the feature fusion vectors F = (f_1, f_2, …, f_n) using a bidirectional gated recurrent neural network BiGRU; the classification result L = (l_1, l_2, …, l_n) from the target detection module is used to obtain the category feature vector g_i of each target, and a neural network is used to fuse the global context target feature γ_i with its category feature vector g_i, obtaining the target's global feature c_i.
5. The method as claimed in claim 4, wherein, in the global feature generation process based on the bidirectional gated recurrent neural network, after the feature fusion vectors F = (f_1, f_2, …, f_n) of the target set are obtained, the targets are sorted from left to right by the x coordinate of their relative coordinates, and the sequence is input into the bidirectional gated recurrent neural network BiGRU to realize global context interaction, obtaining the global context target features γ = (γ_1, γ_2, …, γ_n);
subsequently, the classification results L = (l_1, l_2, …, l_n) of the target detection are used to compute the GloVe word embedding vector of each classification label, obtaining a 128-dimensional target category feature vector g_i; finally, the global context target feature γ_i of each target is fused with its category feature vector g_i to obtain the target's global feature c_i, as given by:
g_i = GloVe(l_i),
c_i = σ(W_c [γ_i, g_i] + b_c),
where GloVe(l_i) denotes encoding the pre-classification label of a target with the GloVe method, [·] denotes the concatenation operation, and W_c and b_c are linear transformation parameters.
6. The method as claimed in claim 5, wherein γ_i is generated by the following specific steps:
(1) initializing a zero vector as the BiGRU initial state;
(2) inputting the first and last feature fusion vectors of the target set, f_0 and f_n, at the two ends of the BiGRU, generating hidden states for the corresponding direction and sequence position;
(4) fusing the forward and backward hidden states to obtain the context fusion state γ_i of each target.
7. The scene graph generation method based on global context interaction according to claim 1, wherein the iterative message passing mechanism based on global feature vectors comprises two calculation functions: a message aggregation function and a state update function;
constructing the message aggregation function: given the hidden state h_i^t of the i-th target node GRU and the hidden state h_{i→j}^t of the relationship edge GRU from the i-th target node to the j-th target node, the message passed into the i-th node at the t-th iteration is denoted m_i^t; m_i^t is computed from the target GRU's own hidden state h_i^t, the hidden states h_{i→j}^t of its out-degree edge GRUs, and the hidden states h_{j→i}^t of its in-degree edge GRUs, where i→j denotes that target i is the subject and target j is the object of the relationship;
similarly, the message m_{i→j}^t aggregated for the relationship edge from the i-th target node to the j-th target node at the t-th iteration is composed of the relationship edge GRU's hidden state from the previous iteration h_{i→j}^{t-1}, the subject node GRU's hidden state h_i^{t-1}, and the object node GRU's hidden state h_j^{t-1}; m_i^t and m_{i→j}^t are computed with the following adaptive weighting functions:
where [·] denotes the concatenation operation, σ denotes the ReLU activation function, and w_1, w_2 and v_1, v_2 are learnable parameters;
constructing the state update function: a target node GRU and a relationship edge GRU are respectively constructed to store and update the feature vectors of the targets and their relationships: first, at t = 0 the GRU state of every target node and relationship edge is initialized to a zero vector; the global feature vector c_i of each target is used as the input of its target node GRU, and the visual feature c_{i→j} of the union box of the two target coordinates is used as the input of its relationship edge GRU, generating the initial hidden states h_i^0 and h_{i→j}^0 of the target nodes and relationship edges, respectively;
in subsequent iterations, at each iteration t every GRU, depending on whether it is a target GRU or a relationship GRU, takes its previous hidden state h_i^{t-1} or h_{i→j}^{t-1} and the incoming message of the previous iteration m_i^{t-1} or m_{i→j}^{t-1} as input, and produces a new hidden state h_i^t or h_{i→j}^t as output, from which the message aggregation function generates the messages for the next iteration.
8. The scene graph generation method based on global context interaction according to claim 1, wherein the iterative message passing mechanism based on global feature vectors specifically comprises the following steps:
(1) initializing the GRU states of all target nodes and relationship edges to zero vectors;
(2) using the global feature vector c_i of each target as the input of its target node GRU and the visual feature c_{i→j} of the union box of the two target coordinates as the input of the relationship edge GRU, generating the initial hidden states h_i^0 and h_{i→j}^0 of the target nodes and relationship edges;
(3) computing the messages m_i^t and m_{i→j}^t received by every target and relationship using the message aggregation function;
(4) combining the hidden states with the received messages m_i^t and m_{i→j}^t and updating the states with the GRUs to obtain the states of the next step;
(5) if the number of iterations reaches the preset number, storing the current target and relationship states; otherwise, returning to step (3);
(6) after message passing is complete, feeding the final state vectors of the targets and relationships into a neural network to obtain a scene graph that reflects the relationships between the targets in the image.
9. A scene graph generation system based on global context interaction, comprising:
a target detection module, configured to perform target detection on the input image I to obtain a target set O = (o_1, o_2, …, o_n), a corresponding visual feature set V = (v_1, v_2, …, v_n), a coordinate feature set B = (b_1, b_2, …, b_n), a pre-classification label set L = (l_1, l_2, …, l_n), and the visual features C = (c_{i→j}, i ≠ j) of the union box of every two target coordinates;
a joint representation vector acquisition module for target visual and coordinate features, configured to transform the absolute position coordinates of each target with a neural network to obtain a joint representation vector f_i of the target's visual and coordinate features;
a target global feature acquisition module, configured to obtain, from the feature fusion vectors F = (f_1, f_2, …, f_n), the global context target feature γ_i and the category feature vector g_i of each target, and to fuse the global context target feature γ_i with its category feature vector g_i using a neural network to obtain the target's global feature c_i;
a scene graph acquisition module, configured to initialize the hidden states h_i^0 and h_{i→j}^0 based on the global feature vector c_i of each target and the feature vector c_{i→j} of each relationship, initially compute the incoming message m_i of each node and the incoming message m_{i→j} of each edge, perform iterative message passing in which the hidden states are updated with recurrent neural networks and messages are aggregated to obtain the incoming message of each step until the preset number of iterations is reached, and generate, from the final states of the target nodes and relationship edges, a scene graph that reflects the relationships between the targets in the image.
10. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the scene graph generation method based on global context interaction according to any one of claims 1 to 8 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210297025.7A CN114677544B (en) | 2022-03-24 | 2022-03-24 | Scene graph generation method, system and equipment based on global context interaction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210297025.7A CN114677544B (en) | 2022-03-24 | 2022-03-24 | Scene graph generation method, system and equipment based on global context interaction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114677544A (en) | 2022-06-28 |
CN114677544B CN114677544B (en) | 2024-08-16 |
Family
ID=82073908
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210297025.7A Active CN114677544B (en) | 2022-03-24 | 2022-03-24 | Scene graph generation method, system and equipment based on global context interaction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114677544B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020244287A1 (en) * | 2019-06-03 | 2020-12-10 | 中国矿业大学 | Method for generating image semantic description |
CN111462282A (en) * | 2020-04-02 | 2020-07-28 | 哈尔滨工程大学 | Scene graph generation method |
KR20220025524A (en) * | 2020-08-24 | 2022-03-03 | 경기대학교 산학협력단 | System for generating scene graph using deep neural network |
CN113221613A (en) * | 2020-12-14 | 2021-08-06 | 国网浙江宁海县供电有限公司 | Power scene early warning method for generating scene graph auxiliary modeling context information |
CN113627557A (en) * | 2021-08-19 | 2021-11-09 | 电子科技大学 | Scene graph generation method based on context graph attention mechanism |
CN113836339A (en) * | 2021-09-01 | 2021-12-24 | 淮阴工学院 | Scene graph generation method based on global information and position embedding |
Non-Patent Citations (1)
Title |
---|
Lan Hong; Liu Qinyi: "Scene graph to image generation model based on graph attention network" (图注意力网络的场景图到图像生成模型), Journal of Image and Graphics (中国图象图形学报), No. 08, 12 August 2020 (2020-08-12) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115546589A (en) * | 2022-11-29 | 2022-12-30 | 浙江大学 | Image generation method based on graph neural network |
CN115546589B (en) * | 2022-11-29 | 2023-04-07 | 浙江大学 | Image generation method based on graph neural network |
CN118015522A (en) * | 2024-03-22 | 2024-05-10 | 广东工业大学 | Time transition regularization method and system for video scene graph generation |
Also Published As
Publication number | Publication date |
---|---|
CN114677544B (en) | 2024-08-16 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |