CN112990202A - Scene graph generation method and system based on sparse representation - Google Patents
- Publication number
- CN112990202A (application number CN202110497553.2A)
- Authority
- CN
- China
- Prior art keywords
- node
- edge
- target
- graph
- edges
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/513—Sparse representations
Abstract
The invention discloses a scene graph generation method and system based on sparse representation. The method comprises the following steps: performing target detection on an original image through a fast region-based convolutional neural network to obtain a target region set; classifying all edges between target pairs as foreground edges or background edges through a preset relationship metric network, and constructing a sparse graph; synchronously learning the nodes and edges of the sparse graph through a feature fusion and update strategy based on a graph attention neural network, and identifying target categories and relationships; and generating a scene graph from the identified target categories and relationships. The method effectively filters spurious relationships and thus generates the sparse graph efficiently, reducing the computational complexity of a dense graph and improving the efficiency of graph message passing; at the same time, it extracts features from the sparse graph accurately, and thus generates the scene graph accurately.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a scene graph generation method and system based on sparse representation.
Background
The generation of the scene graph plays an important role in the deep understanding of the visual scene. The scene graph is a refined semantic extraction of the targets and target relationships in a real image. It is constructed by predicting predefined target instances, target attributes, and target-to-target relationships; the interactions between targets in the scene are represented in the structured language of triples, i.e. in the form of <subject-predicate-object>. In the scene graph, nodes represent target entities, including category labels and bounding boxes; directed edges represent the relationship categories between subject and object; and various attributes of a target (such as color, material, and the like) can also be described and represented in the scene graph.
At present, scene graph inference has drawn much attention because it extracts the rich semantic information contained in target interactions. Rich scene graph semantics can not only provide context clues for basic recognition tasks, but also have broad prospects in various advanced visual applications; for example, they are key to improving image retrieval and various natural-language-based image tasks, and they provide valuable information for applications such as visual question answering, image description, and image generation. Although conventional scene graph generation methods have been empirically successful in many applications, two problems remain: the high computational complexity of dense graphs and the imprecise pruning of sparse graphs.
Based on this, how to infer complex potential relationships among all targets and accurately extract a scene graph from an image is still a problem to be solved urgently at present.
Disclosure of Invention
The invention aims to provide a scene graph generation method and system based on sparse representation so as to realize reasonable reasoning of complex potential relations and accurately generate a scene graph.
In view of the above, in a first aspect, the present invention provides a scene graph generation method based on sparse representation, including:
carrying out target detection on the original image through a fast regional convolutional neural network to obtain a target region set;
classifying all edges between target pairs as foreground edges or background edges through a preset relationship metric network, and constructing a sparse graph;
synchronously learning nodes and edges on the sparse graph through a feature fusion and updating strategy based on a graph attention neural network, and identifying a target type and a target relationship;
and generating a scene graph according to the target type and the relation obtained by identification.
Preferably, the classifying of all edges between target pairs as foreground edges or background edges through the preset relationship metric network, and the constructing of the sparse graph, comprise:
acquiring the category feature, spatial feature, and appearance feature of each target;
classifying the edge of each target pair according to the category, spatial, and appearance features of its two targets, and obtaining a classification result;
selecting all foreground edges and the top-ranked background edges according to the classification result, and constructing a sparse graph comprising the corresponding nodes and edges.
Preferably, the classifying of the edge of a target pair according to the category, spatial, and appearance features of its two targets, and the obtaining of a classification result, comprise:
concatenating the spatial features and the appearance features of the two targets respectively to generate a joint spatial feature and a joint appearance feature;
embedding the prior statistical probability of the target classes to construct a joint category feature of the target pair;
concatenating the joint appearance feature, the joint spatial feature, and the joint category feature to generate a logits feature;
and inputting the logits feature into a sigmoid classifier to obtain the edge probability of the target pair.
Preferably, the joint spatial feature is:

f_sp^{ij} = MLP(f_sp^i ∘ f_sp^j)

wherein f_sp^{ij} is the joint spatial feature, MLP is a multi-layer perceptron, ∘ is the concatenation (series) operation, and f_sp^i and f_sp^j are respectively the spatial features of targets i and j;
the joint appearance feature is:

f_a^{ij} = MLP(f_a^i ∘ f_a^j)

wherein f_a^{ij} is the joint appearance feature, and f_a^i and f_a^j are respectively the appearance features of targets i and j;
the prior statistical probability of the target classes is p(c_j | c_i), defined as the probability that a class-c_j object is present in the original image given that a class-c_i object is present;
the joint category feature is:

f_c^{ij} = Σ_{m=1}^{C} Σ_{n=1}^{C} f_c^i(m) · f_c^j(n) · p(c_n | c_m)

wherein f_c^{ij} is the joint category feature, C is the number of all categories, f_c^i(m) is the probability that target i belongs to category m, f_c^j(n) is the probability that target j belongs to category n, and f_c^i and f_c^j are respectively the category features of targets i and j.
Preferably, the synchronous learning of the nodes and edges of the sparse graph through the feature fusion and update strategy based on the graph attention neural network, and the identifying of the target categories and relationships, comprise:
fusing the appearance, spatial, and category features of each node in the sparse graph, and embedding the prior statistical probability of the class relationships, to generate node features and edge features;
acquiring the attention weights of nodes and edges through the graph attention neural network;
and updating the node features and edge features according to these attention weights, then classifying the targets and relationships according to the new node features and edge features.
Preferably, the fusing of the appearance, spatial, and category features of each node in the sparse graph, and the embedding of the prior statistical probability of the class relationships to generate the node features and edge features, comprise:
aggregating the appearance, spatial, and category features of each node in the sparse graph and compressing them through an encoder-decoder to obtain a fusion feature;
obtaining an initialized node feature and an initialized edge feature from the fusion feature;
embedding the prior statistical probability of the class relationships into the initialized node features and initialized edge features to construct the node features and edge features;
and assigning the node features and edge features to the corresponding nodes and edges of the sparse graph.
Preferably, the prior statistical probability of the class relationship is p(r | c_i, c_j), defined as the probability that relationship r exists given nodes of classes c_i and c_j;
the node feature is constructed by embedding this prior into the initialized node feature, wherein R is the number of all relationships and N is the number of all nodes;
the edge feature is constructed analogously by embedding the prior into the initialized edge feature.
Preferably, the attention weights of the nodes and edges include the attention weight between nodes, the attention weight between a node and an edge, and the attention weight of an edge;
the attention weight between nodes is:

α_ij = exp(w · σ(W[v_i ∘ v_j ∘ e_ij])) / Σ_{k∈N(i)} exp(w · σ(W[v_i ∘ v_k ∘ e_ik]))

wherein α_ij is the attention weight between nodes i and j, N(i) is the set of nodes sharing a connecting edge with node i, W is the weight parameter to be learned, v_i and v_j are respectively the node features of nodes i and j, v_k is the node feature of a node k adjacent to node i, e_ik is the edge feature of the edge connecting nodes i and k, and w is the network weight;
the attention weight between a node and an edge, and the attention weight of an edge, are computed analogously in the same normalized form.
Preferably, the new node feature is:

v_i' = σ( Σ_{k∈N(i)} α_ik · v_k + Σ_{k∈N(i)} β_ik · e_ik )

wherein v_i' is the new node feature, σ is the sigmoid function, N(i) is the set of nodes sharing a connecting edge with node i, α_ik is the attention weight between node i and its adjacent node k, e_ik is the edge feature of the edge connecting node i and its adjacent node k, and β_ik is the attention weight between node i and the connecting edge e_ik;
the new edge feature is obtained analogously by aggregating the subject-node and object-node features weighted by the attention weights of the edge.
In a second aspect, the present invention provides a scene graph generation system based on sparse representation, including:
the target region extraction module is used for performing target detection on the original image through a fast region-based convolutional neural network to obtain a target region set;
the sparse graph construction module is used for classifying all edges between target pairs as foreground edges or background edges through a preset relationship metric network, and constructing a sparse graph;
the graph message passing module is used for synchronously learning the nodes and edges of the sparse graph through a feature fusion and update strategy based on a graph attention neural network, and identifying the target categories and relationships;
and the scene graph generation module is used for generating a scene graph from the identified target categories and relationships.
According to the scene graph generation method and system based on sparse representation, all edges between target pairs in the original image are classified into foreground and background by the relationship metric network (RelMN), and a sparse graph is constructed; this effectively filters spurious relationships, generates the sparse graph efficiently, reduces the computational complexity of a dense graph, and improves the efficiency of graph message passing. Furthermore, through the feature fusion and update strategy based on the graph attention neural network, the nodes and edges of the sparse graph are learned synchronously to obtain target features and relationship features, which are then used for target and relationship classification; features can thus be extracted from the sparse graph accurately, and the scene graph generated accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a sparse representation-based scene graph generation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S20 of the sparse representation-based scene graph generating method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the binary classification of foreground and background in the RelMN in an embodiment of the present invention;
FIG. 4 is a flowchart of step S30 of the sparse representation-based scene graph generating method according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of a scene graph generation system based on sparse representation according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In an embodiment, as shown in fig. 1, a scene graph generation method based on sparse representation is provided, which includes the following steps:
and step S10, carrying out target detection on the original image through the fast regional convolutional neural network to obtain a target region set.
In this embodiment, an original image is obtained and a fast region-based convolutional neural network (Fast R-CNN) is used to perform target detection on it, automatically extracting a number of target regions from the original image to obtain a target region set. Each target region in the set includes the position information, appearance feature, and class probability of a target.
Understandably, the bounding boxes obtained in step S10 cover most of the critical targets, which improves the speed and accuracy of target detection.
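The output of this detection stage can be pictured as one record per target region. The following is a minimal illustrative sketch only; the type and field names are hypothetical stand-ins for whatever the detector actually emits:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TargetRegion:
    """One detected target region, as described in step S10 (hypothetical layout)."""
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2): top-left and bottom-right corners
    appearance: List[float]          # pooled appearance feature from the detector
    class_probs: List[float]         # probability over all target categories

# A toy two-region detection result standing in for the detector output.
regions = [
    TargetRegion((10, 20, 110, 220), [0.1] * 8, [0.7, 0.2, 0.1]),
    TargetRegion((120, 40, 200, 180), [0.3] * 8, [0.1, 0.8, 0.1]),
]
```

Downstream steps (S201 and onward) consume exactly these three ingredients per region: position, appearance, and class probability.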
And step S20, classifying all edges between target pairs as foreground edges or background edges through the preset relationship metric network, and constructing a sparse graph.
In this embodiment, a relationship metric network (RelMN) is constructed to classify all edges as foreground edges or background edges, and to automatically select all foreground edges and part of the background edges to construct a sparse graph. The RelMN consists of three parts: multi-feature extraction, binary classification of foreground and background, and sparse graph generation.
Preferably, as shown in fig. 2, step S20 includes the steps of:
step S201, multi-feature extraction: obtaining the category characteristics of each targetSpatial characteristics ofAnd appearance characteristics。
In step S201, based on the target area setEach target areaLocation information and class probability of included objectsConversion to spatial featuresAnd category characteristicsAnd further according to each target areaAppearance characteristics of the contained objectTransformed spatial featuresAnd category characteristicsTo obtainAnd the multidimensional characteristic is used for detecting whether potential relations exist between the target pairs. Preferably, for spatial featuresBy amplitude and splicing willDimensional position coordinate conversionSpatial features of dimensionsWhereinin order to be able to target the number of,position coordinates of the dimension as a target areaTop left corner of the bounding box of the characterizationCoordinates and lower right cornerThe coordinates, MPL, are the multi-layer perceptrons,is a series operation. Similarly, for class probability transitionClass characteristics of dimension。
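The amplify-and-splice conversion of the 4-D box coordinates can be sketched as follows. This is a pure-Python illustration under stated assumptions: the patent does not give the exact output dimension, so the derived quantities (normalized corners, width, height, center, area) are a plausible stand-in for the learned MLP expansion:

```python
def spatial_feature(box, img_w, img_h):
    """Convert 4-D bounding-box coordinates into a richer spatial feature by
    normalizing and splicing derived quantities (hypothetical expansion; the
    patent's exact dimensions appear only as images)."""
    x1, y1, x2, y2 = box
    w, h = (x2 - x1) / img_w, (y2 - y1) / img_h           # normalized size
    cx, cy = (x1 + x2) / (2 * img_w), (y1 + y2) / (2 * img_h)  # normalized center
    return [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h, w, h, cx, cy, w * h]

f = spatial_feature((0, 0, 50, 50), 100, 100)
```

In the patent this expanded vector would then pass through the MLP; here the splice alone illustrates the dimension amplification.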
Step S202, binary classification of foreground and background: classifying the edge of each target pair according to the category, spatial, and appearance features of its two targets, and obtaining a classification result. Each target pair consists of two different targets.
As shown in fig. 3, which illustrates the binary classification of foreground and background in the RelMN, step S202 includes the following steps:
the method comprises the following steps: respectively couple the targetsSpatial characteristics of two targetsAnd appearance characteristicsConnected in series to generate joint space characteristicsAnd combined appearance features. Wherein spatial features are combinedThe calculation formula of (2) is as follows:
step two, embedding prior statistical probability of target categoryConstructing object pairsJoint class characteristics of. Wherein the prior statistical probability of the object classDefined as being present in the original imagePresence in case of a Category objectProbability of class object, prior statistical probabilityCan be expressed as:
And learning statistical co-occurrence knowledge among target categories based on the prior statistical probability and the category characteristics. The joint class characteristicsCan be expressed as:
in the formula (4), the first and second groups,is a target ofBelong to the categoryThe probability of (a) of (b) being,is a target ofBelong to the categoryThe probability of (c).
Step three, combining the appearance characteristicsJoining spatial featuresAnd joint category featuresPerforming series connection to generate logs characteristics. Wherein, logs characteristicsThe calculation formula of (2) is as follows:
step four, identifying the Logits characteristicsInputting sigmoid classifier to obtain target pairEdge probability of. Where the edge probabilityThe calculation formula of (2) is as follows:
That is, in step S202, the position coordinates and class probability of each target are first converted into its spatial feature and category feature; the spatial features and appearance features of the two different targets are then concatenated to generate the joint spatial feature and the joint appearance feature, and the joint category feature is constructed by introducing the prior statistical probability of the classes; next, the joint spatial, joint appearance, and joint category features are concatenated to generate the logits feature; finally, the sigmoid classifier computes the output probability, dividing all edges into the two classes of foreground and background.
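The scoring pipeline of step S202 can be outlined in a few lines. This is a minimal sketch only, with toy dimensions and a hypothetical weight vector in place of the learned classifier; it shows the concatenate-then-sigmoid structure, not the trained network:

```python
import math

def concat(*feats):
    """Concatenate (series operation) several feature vectors."""
    out = []
    for f in feats:
        out.extend(f)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def edge_probability(f_app_i, f_app_j, f_sp_i, f_sp_j, f_cat_ij, w, b):
    """Sketch of the RelMN scoring: build joint appearance and joint spatial
    features, splice in the joint category feature to form the logits feature,
    then squash with a sigmoid classifier (w, b are hypothetical weights)."""
    joint_app = concat(f_app_i, f_app_j)
    joint_sp = concat(f_sp_i, f_sp_j)
    logits_feat = concat(joint_app, joint_sp, f_cat_ij)
    score = sum(x * wi for x, wi in zip(logits_feat, w)) + b
    return sigmoid(score)

p = edge_probability([0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8],
                     [0.9], w=[0.1] * 9, b=0.0)
```

Edges whose probability p exceeds the classifier's decision threshold are treated as foreground in step S203.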
Step S203, sparse graph generation: selecting all foreground edges and the top-ranked background edges according to the classification result, and constructing a sparse graph comprising the corresponding nodes and edges.
In step S203, according to the output of the sigmoid classifier, all edges whose target pairs are predicted as foreground are selected first; their number is not a hyper-parameter but is determined by the sigmoid classifier. Next, the background edges with the highest foreground probability are automatically selected, so that the constructed sparse graph contains both foreground and background edges; this enhances the robustness of the relationship classification and reduces the risk of eliminating real relationships. The result is a sparse graph containing the selected nodes and edges.
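The edge-selection rule of step S203 can be sketched directly. A minimal illustration, assuming edge probabilities have already been computed and using a hypothetical 0.5 decision threshold:

```python
def build_sparse_edges(edge_probs, k_background, threshold=0.5):
    """Select every edge predicted as foreground (probability >= threshold;
    the count is not a hyper-parameter) plus the top-k background edges
    ranked by foreground probability."""
    foreground = [(pair, p) for pair, p in edge_probs.items() if p >= threshold]
    background = sorted(
        ((pair, p) for pair, p in edge_probs.items() if p < threshold),
        key=lambda x: x[1], reverse=True)[:k_background]
    return foreground, background

# Toy classifier outputs for four candidate edges.
probs = {("man", "horse"): 0.9, ("man", "hat"): 0.7,
         ("horse", "tree"): 0.4, ("hat", "tree"): 0.1}
fg, bg = build_sparse_edges(probs, k_background=1)
```

Keeping a few high-probability background edges is what gives the sparse graph its robustness to classifier mistakes.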
In this embodiment, all edges are divided by the RelMN into the two classes of foreground and background to obtain the potential relationships between target pairs; compared with generating potential relationships from the distance between target pairs, this is more reasonable and the constructed sparse graph is better founded. In addition, message passing on the sparse graph significantly reduces the computational complexity, making the message passing more accurate and effective.
And step S30, synchronously learning the nodes and edges of the sparse graph through the feature fusion and update strategy based on the graph attention neural network, and identifying the target categories and relationships.
In step S30, the feature fusion and update strategy based on the graph attention neural network includes three parts of node feature and edge feature generation, weight learning of node feature and edge feature, and object and relationship classification.
Preferably, as shown in fig. 4, step S30 includes the steps of:
step S301, generation of node features and edge features: appearance characteristics of each node in sparse graphSpatial characteristics ofAnd category characteristicsFusion is carried out, and prior statistical probability of class relation is embeddedFeature of construction nodeAnd edge characteristics。
In step S301, appearance features of each node in the sparse graph are first checkedSpatial characteristics ofAnd category characteristicsPerforming polymerization and compressing by a coder-decoder to obtain a fusion characteristicThen, the initialized node characteristics are obtained according to the fusion characteristicsAnd initializing edge features. Wherein the node characteristicsThe initialization process of (1) is as follows: direct through fusion featureInitializing node characteristics, i.e.(ii) a Edge featureThe initialization process of (1) is as follows: sequentially connecting the fusion features of the subject node and the object node, and performing dimension compression through the full connection layer, i.e.Wherein the full connection layerA Leaky Relu layer.
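The edge-feature initialization just described (concatenate subject and object fusion features, compress through a fully connected layer, apply Leaky ReLU) can be sketched as follows. The weight matrix here is a fixed toy stand-in for the learned layer:

```python
def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x

def init_edge_feature(fused_subj, fused_obj, weight, bias):
    """Initialize an edge feature: concatenate the subject and object fused
    features, compress through a fully connected layer (hypothetical weights),
    then apply Leaky ReLU, as in step S301."""
    x = fused_subj + fused_obj  # list concatenation = the series operation
    return [leaky_relu(sum(xi * wi for xi, wi in zip(x, row)) + b)
            for row, b in zip(weight, bias)]

subj, obj = [0.2, 0.4], [0.6, 0.8]
W = [[0.5, -0.5, 0.25, -0.25],   # 4-D concatenation compressed to 2-D
     [0.1, 0.2, 0.3, 0.4]]
e0 = init_edge_feature(subj, obj, W, [0.0, 0.0])
```

The node feature needs no such layer, since it is initialized directly with the node's own fusion feature.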
Further, the prior statistical probability of the class relationships is embedded into the initialized node features and initialized edge features to construct the node features and edge features, which are finally assigned to the corresponding nodes and edges of the sparse graph. The prior statistical probability p(r | c_i, c_j) of a class relationship is defined as the probability that relationship r exists given nodes of classes c_i and c_j.
Here r is the relationship from the subject node to the object node, and R is the number of all relationships.
Understandably, as one part of the sparse graph representation, the inherent weights of the nodes and edges are constructed from the prior statistical probability of the class relationships and the class probabilities; these inherent weights respectively reflect the importance of a node relative to the other nodes in the node set, and of an edge relative to the other edges in the edge set.
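The prior p(r | c_i, c_j) is a dataset statistic; a plausible way to obtain it, sketched here on a toy annotation list (the patent gives the definition but not the estimation procedure, so the count-based estimate below is an assumption):

```python
from collections import Counter

def relation_prior(triples):
    """Estimate p(r | c_i, c_j): the probability that relationship r holds
    given a subject of class c_i and an object of class c_j, by counting
    annotated <subject, predicate, object> triples."""
    pair_counts = Counter((s, o) for s, _, o in triples)
    rel_counts = Counter(triples)
    return {(s, r, o): c / pair_counts[(s, o)]
            for (s, r, o), c in rel_counts.items()}

# Toy stand-in for the training-set annotations.
triples = [("man", "riding", "horse"), ("man", "riding", "horse"),
           ("man", "feeding", "horse"), ("man", "wearing", "hat")]
prior = relation_prior(triples)
```

Embedding this table into the initialized features is what injects statistical co-occurrence knowledge into the graph.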
Step S302, learning the weights of the node features and edge features: the attention weights of the nodes and edges are obtained through the graph attention neural network. These attention weights include the attention weight between nodes, the attention weight between a node and an edge, and the attention weight of an edge.
For node message aggregation, the attention weight between nodes is computed by scoring the concatenation of the two node features and the connecting edge feature, and normalizing over the neighborhood:

α_ij = exp(w · σ(W[v_i ∘ v_j ∘ e_ij])) / Σ_{k∈N(i)} exp(w · σ(W[v_i ∘ v_k ∘ e_ik]))     (10)

wherein α_ij is the attention weight between nodes i and j, N(i) is the set of nodes sharing a connecting edge with node i, W is the weight parameter to be learned, v_i and v_j are respectively the node features of nodes i and j, v_k is the node feature of a node k adjacent to node i, e_ik is the edge feature of the edge connecting nodes i and k, and w is the network weight. Note that in formula (10), nodes i and k are subject nodes and node j is the object node.
The attention weight between a node and an edge is computed analogously in the same normalized form; in formula (11), β_ij denotes the attention weight between node j and edge e_ij, and node j is the object node.
For edge message aggregation, the attention weights of an edge are likewise computed in the same form for the edge e_ij connecting nodes i and j; in formula (12), node i is the subject node and node j is the object node.
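The neighborhood-normalized attention of formula (10) can be sketched in pure Python. A toy illustration with hypothetical dimensions and a fixed scoring vector standing in for the learned network weights:

```python
import math

def node_attention(v_i, neighbors, edges, w):
    """Score each neighbor k of node i from the concatenation [v_i, v_k, e_ik]
    with a weight vector w (stand-in for the learned network weights), then
    normalize with a softmax over the neighborhood N(i)."""
    scores = []
    for v_k, e_ik in zip(neighbors, edges):
        x = v_i + v_k + e_ik                     # concatenation
        scores.append(sum(xi * wi for xi, wi in zip(x, w)))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]     # numerically stable softmax
    z = sum(exps)
    return [e / z for e in exps]

alphas = node_attention([0.1, 0.2],                  # v_i
                        [[0.3, 0.4], [0.5, 0.6]],    # two neighbor features
                        [[0.7], [0.8]],              # connecting edge features
                        w=[0.1] * 5)
```

The weights sum to one over the neighborhood, so they act as mixing coefficients in the update of step S303.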
Understandably, as the other part of the sparse graph representation, the attention weights of the nodes and edges are obtained through a Graph Attention Network (GAT) and combined with the prior statistical probability of the class relationships and the class probabilities from step S301 to extract new target features and relationship features.
Step S303, target and relationship classification: updating the node features and edge features according to the attention weights of the nodes and edges, and classifying the targets and relationships according to the new node features and edge features.
Specifically, the node feature is updated from the hidden node feature, the adjacent node features, and the connecting edge features, and the target category is classified according to the new node feature; simultaneously, the edge feature is updated from the hidden edge feature, the subject node feature, and the object node feature, and the relationship is classified according to the new edge feature. The new node feature is computed as:

v_i' = σ( Σ_{k∈N(i)} α_ik · v_k + Σ_{k∈N(i)} β_ik · e_ik )     (13)

wherein σ is the sigmoid function, N(i) is the set of nodes sharing a connecting edge with node i, α_ik is the attention weight between node i and its adjacent node k, e_ik is the edge feature of the edge connecting node i and its adjacent node k, and β_ik is the attention weight between node i and the connecting edge e_ik. Note that in formula (13), node k is the object node.
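The update rule can be sketched as follows. This is a plausible reading of the patent's update (the exact formula is given only as an image): aggregate neighbor node features weighted by the node-node attention, add connecting-edge features weighted by the node-edge attention, and squash element-wise with a sigmoid:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_node(alphas, neighbor_feats, betas, edge_feats):
    """Attention-weighted node update sketched per formula-(13)-style rule:
    sum alpha_ik * v_k over neighbors, add beta_ik * e_ik over connecting
    edges, then apply the sigmoid element-wise."""
    dim = len(neighbor_feats[0])
    agg = [0.0] * dim
    for a, v in zip(alphas, neighbor_feats):
        for d in range(dim):
            agg[d] += a * v[d]
    for b, e in zip(betas, edge_feats):
        for d in range(dim):
            agg[d] += b * e[d]
    return [sigmoid(x) for x in agg]

v_new = update_node([0.6, 0.4], [[0.1, 0.2], [0.3, 0.4]],
                    [0.5, 0.5], [[0.2, 0.2], [0.4, 0.4]])
```

The edge update mirrors this, mixing the subject-node and object-node features into the hidden edge feature.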
In this embodiment, the feature fusion and update strategy based on the graph attention neural network jointly learns the statistical co-occurrence knowledge and context clues in the data set to obtain the output features (including the new node features and edge features), according to which the targets and their relationships are classified; the messages on the sparse graph can thus be transmitted and integrated effectively.
Step S40: generate a scene graph according to the identified target categories and relationships.
In step S40, the generated scene graph includes the positions of the targets, the categories of the targets, and the relationships between the targets, and can be structurally represented as a set of triples over the target region set, the target category set, and the relationship set.
The target region set contains one entry per detected target; each target region carries the coordinate position information of the target described by its bounding box.

The target set assigns each target region a category label drawn from the set of all target category labels.

The binary relation set contains the relationships; each relationship is a triple consisting of a subject node, an object node, and the relationship between them, drawn from the complete set of relationships. The subject node and object node are determined by their candidate regions and the corresponding category labels.
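The triple structure just described maps naturally onto a few plain records; a minimal sketch (the names `Target`, `Relation` and `SceneGraph` are illustrative, not from the patent):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Target:
    box: Tuple[float, float, float, float]  # bounding-box coordinates (x1, y1, x2, y2)
    label: str                              # category label from the label set

@dataclass
class Relation:
    subject: int    # index of the subject node in SceneGraph.targets
    obj: int        # index of the object node in SceneGraph.targets
    predicate: str  # relationship label from the relationship set

@dataclass
class SceneGraph:
    targets: List[Target] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)
```

Each relation is itself a triple (subject, object, predicate), matching the structural representation above.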
As can be seen from the above, in the scene graph generation method based on sparse representation of this embodiment, after the target region set is extracted from the original image by Fast R-CNN, all edges of the target pairs are classified into two types, foreground and background, by the relationship metric network (RelMN), and a sparse graph is constructed. This effectively filters out spurious relationships, reduces the computational complexity of a dense graph, and improves the efficiency of graph message passing. The nodes and edges on the sparse graph are then learned synchronously through the feature fusion and update strategy based on the graph attention network to obtain target features and relationship features, which are used to classify the targets and relationships, so that features are accurately extracted from the sparse graph and the scene graph is generated accurately.
In an embodiment, a scene graph generation system based on sparse representation is provided, which corresponds one-to-one to the scene graph generation method based on sparse representation in the above embodiments. As shown in fig. 5, the sparse representation-based scene graph generation system includes a target region extraction module 110, a sparse graph construction module 120, a graph message passing module 130, and a scene graph generation module 140. Each functional module is described in detail as follows:

The target region extraction module 110 is configured to perform target detection on the original image through a fast region-based convolutional neural network to obtain a target region set.

The sparse graph construction module 120 is configured to classify all edges of the target pairs into foreground edges and background edges through a preset relationship metric network, and to construct a sparse graph.

The graph message passing module 130 is configured to synchronously learn the nodes and edges of the sparse graph through a feature fusion and update strategy based on the graph attention network, and to identify the target categories and target relationships.

The scene graph generation module 140 is configured to generate a scene graph according to the identified target categories and relationships.
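The four modules run in sequence; a minimal sketch of their composition (the stage callables are placeholders, not the patent's actual networks):

```python
class SceneGraphPipeline:
    """Chains the four modules (110 -> 120 -> 130 -> 140); each stage is
    any callable that consumes the previous stage's output."""

    def __init__(self, detect_targets, build_sparse_graph,
                 pass_messages, assemble_graph):
        # order mirrors the module numbering in the embodiment
        self.stages = [detect_targets, build_sparse_graph,
                       pass_messages, assemble_graph]

    def run(self, image):
        x = image
        for stage in self.stages:
            x = stage(x)  # each module's output feeds the next
        return x
```

Swapping a stage (say, a different edge classifier in module 120) leaves the rest of the chain untouched, which is the point of the modular decomposition.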
Further, the sparse graph construction module 120 includes a multi-feature extraction unit, a binary classification unit, and a sparse graph generation unit. Each functional unit is described in detail as follows:

The multi-feature extraction unit is configured to acquire the category feature, spatial feature, and appearance feature of each target.

The binary classification unit is configured to classify the edges of a target pair according to the category, spatial, and appearance features of the two targets in the pair, and to obtain a classification result.

The sparse graph generation unit is configured to select the top-ranked foreground edges and the top-ranked background edges according to the classification result, and to construct a sparse graph comprising the nodes and the selected edges.
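The selection performed by the sparse graph generation unit can be sketched as follows, assuming edges are ranked by the foreground probability produced by the sigmoid classifier (the function name and the k/m parameters are illustrative):

```python
def select_sparse_edges(edge_probs, k_foreground, m_background):
    """edge_probs maps a (subject, object) pair to its foreground
    probability. Keep the k most confident foreground edges and the m
    most confident background edges (lowest foreground probability);
    together they form the edge set of the sparse graph."""
    by_prob = sorted(edge_probs.items(), key=lambda kv: kv[1], reverse=True)
    foreground = [pair for pair, _ in by_prob[:k_foreground]]
    background = [pair for pair, _ in by_prob[::-1][:m_background]]
    return foreground, background
```

Keeping only these top-ranked edges is what reduces the dense all-pairs graph to a sparse one before message passing.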
Further, the binary classification unit includes a first joint subunit, a first knowledge embedding subunit, a second joint subunit, and a classification subunit. Each functional subunit is described in detail as follows:

The first joint subunit is configured to concatenate the spatial features of the two targets in a target pair and, separately, their appearance features, to generate a joint spatial feature and a joint appearance feature.

The first knowledge embedding subunit is configured to embed the prior statistical probability of the target classes to construct the joint category feature of the target pair.

The second joint subunit is configured to concatenate the joint appearance feature, the joint spatial feature, and the joint category feature to generate a logits feature.

The classification subunit is configured to input the logits feature into a sigmoid classifier to obtain the edge probability of the target pair.
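A minimal sketch of the second joint subunit and classification subunit together, assuming a single linear layer as the classifier head (the patent does not specify the head's architecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_probability(joint_appearance, joint_spatial, joint_category, w, b):
    """Concatenate the three joint features into a logits vector and
    score it with a sigmoid classifier; w and b are the learned weights
    and bias of the stand-in linear head."""
    logits = np.concatenate([joint_appearance, joint_spatial, joint_category])
    return sigmoid(float(logits @ w) + b)
```

A probability near 1 marks the edge as foreground; near 0, background — which is exactly the binary split the sparse graph generation unit consumes.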
Further, the graph message passing module 130 includes a node-and-edge feature generation unit, a weight learning unit, and a feature update unit. Each functional unit is described in detail as follows:

The node-and-edge feature generation unit is configured to fuse the appearance, spatial, and category features of each node in the sparse graph, embed the prior statistical probability of the class relationships, and generate node features and edge features.

The weight learning unit is configured to acquire the attention weights of the nodes and edges through the graph attention network.

The feature update unit is configured to update the node features and edge features according to the attention weights of the nodes and edges, and to classify the targets and relationships according to the new node features and edge features.

Further, the node-and-edge feature generation unit includes a feature fusion subunit, an initialization subunit, a second knowledge embedding subunit, and a feature allocation subunit. Each functional subunit is described in detail as follows:

The feature fusion subunit is configured to aggregate the appearance, spatial, and category features of each node in the sparse graph and compress the aggregated features through an encoder-decoder to obtain fused features.

The initialization subunit is configured to obtain initialized node features and initialized edge features from the fused features.

The second knowledge embedding subunit is configured to embed the prior statistical probability of the class relationships into the initialized node features and edge features to construct the node features and edge features.

The feature allocation subunit is configured to assign the node features and edge features to the corresponding nodes and edges in the sparse graph.
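The fusion and initialization subunits can be sketched as below, with plain linear maps standing in for the unspecified encoder-decoder:

```python
import numpy as np

def fuse_node_features(appearance, spatial, category, enc_w, dec_w):
    """Aggregate the three per-node feature vectors by concatenation,
    compress with an encoder, and decode back. The compressed code
    serves as the initialized node feature; the reconstruction is only
    needed when training the codec."""
    aggregated = np.concatenate([appearance, spatial, category])
    code = np.tanh(aggregated @ enc_w)   # encoder: compress to a low-dim code
    reconstruction = code @ dec_w        # decoder: reconstruct the aggregate
    return code, reconstruction
```

The code vector is what the second knowledge embedding subunit would then enrich with the prior statistical probabilities before allocation to the graph.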
For specific limitations of the sparse representation-based scene graph generation system, reference may be made to the above limitations of the sparse representation-based scene graph generation method, which are not repeated here.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the concept of the invention, technical features of the above embodiments, or of different embodiments, may be combined, and many other variations of the different aspects of the invention exist that are not described in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (10)
1. A scene graph generation method based on sparse representation is characterized by comprising the following steps:
carrying out target detection on the original image through a fast regional convolutional neural network to obtain a target region set;
classifying all edges of the target pairs as foreground edges or background edges through a preset relationship metric network, and constructing a sparse graph;
synchronously learning nodes and edges on the sparse graph through a feature fusion and updating strategy based on a graph attention neural network, and identifying a target type and a target relationship;
and generating a scene graph according to the target type and the relation obtained by identification.
2. The sparse representation-based scene graph generation method of claim 1, wherein the identifying all edges of the target pair as foreground edges and background edges through a preset relationship metric network and constructing the sparse graph comprises:
acquiring category characteristics, space characteristics and appearance characteristics of each target;
classifying edges of the target pair according to the category features, the spatial features and the appearance features of the two targets in the target pair, and acquiring a classification result; and selecting foreground edges and background edges according to the classification result to construct the sparse graph.
3. The sparse representation-based scene graph generation method of claim 2, wherein the classifying edges of the target pair according to the class features, the spatial features and the appearance features of two targets in the target pair and obtaining the classification result comprises:
respectively connecting the spatial features and the appearance features of the two targets in the target pair in series to generate a combined spatial feature and a combined appearance feature;
embedding prior statistical probability of a target class to construct a joint class characteristic of the target pair;
connecting the joint appearance feature, the joint spatial feature and the joint category feature in series to generate a logits feature;
and inputting the logits feature into a sigmoid classifier to obtain the edge probability of the target pair.
4. The sparse representation-based scene graph generation method of claim 3, wherein the joint spatial feature is obtained by applying a multi-layer perceptron to the concatenation of the spatial features of the two targets in the target pair;

the joint appearance feature is obtained by concatenating the appearance features of the two targets in the target pair;

the prior statistical probability of the target classes is defined as the probability that an object of one class is present in the original image given that an object of another class is present; and

the joint category feature of the target pair is constructed by embedding this prior statistical probability.
5. The sparse representation-based scene graph generation method of claim 1, wherein the synchronously learning nodes and edges on the sparse graph and identifying target types and relationships through a graph attention neural network-based feature fusion and update strategy comprises:
fusing the appearance characteristic, the spatial characteristic and the category characteristic of each node in the sparse graph, and embedding the prior statistical probability of the category relationship to generate a node characteristic and an edge characteristic;
acquiring attention weights of nodes and edges through a graph attention neural network;
and updating the node features and the edge features according to the attention weights of the nodes and the edges, and classifying the targets and the relations according to the new node features and the new edge features.
6. The sparse representation-based scene graph generating method of claim 5, wherein the fusing the appearance features, the spatial features and the class features of each node in the sparse graph and embedding the prior statistical probability of the class relationship to generate the node features and the edge features comprises:
aggregating the appearance features, the spatial features and the category features of each node in the sparse graph, and compressing them through an encoder-decoder to obtain fused features;
obtaining an initialization node characteristic and an initialization edge characteristic according to the fusion characteristic;
embedding the prior statistical probability of the class relationship into the initialized node features and the initialized edge features to construct node features and edge features;
and distributing the node characteristics and the edge characteristics to corresponding nodes and edges in the sparse graph.
7. The sparse representation-based scene graph generation method of claim 6, wherein the prior statistical probability of the class relationship is defined as the probability that a given relationship exists between a given subject node and a given object node;

the node features are constructed by embedding this prior statistical probability, taken over all relationship classes and all node classes, into the initialized node features; and

the edge features are constructed analogously by embedding the prior statistical probability into the initialized edge features.
8. The sparse representation based scene graph generation method of claim 5, wherein the attention weights of the nodes and edges comprise an attention weight between nodes, an attention weight between nodes and edges, and an attention weight of edges;
the attention weight between two nodes is computed, over the set of nodes sharing a connecting edge with the node, from the node features of the node and its neighbor, the edge feature of their connecting edge, the weight parameters to be learned, and the network weights; and

the attention weight between a node and an edge, and the attention weight of an edge, are computed analogously from the corresponding node and edge features.
9. The sparse representation-based scene graph generation method of claim 8, wherein the new node feature of a node is obtained by applying a sigmoid function to an aggregation, over the set of nodes sharing a connecting edge with the node, of each neighbor's node feature weighted by the attention weight between the two nodes and each connecting edge's feature weighted by the attention weight between the node and that edge; and

the new edge features are obtained analogously using the attention weights of the edges.
10. A scene graph generation system based on sparse representation, comprising:

a target region extraction module, configured to perform target detection on an original image through a fast region-based convolutional neural network to obtain a target region set;

a sparse graph construction module, configured to classify all edges of the target pairs as foreground edges or background edges through a preset relationship metric network, and to construct a sparse graph;

a graph message passing module, configured to synchronously learn the nodes and edges on the sparse graph through a feature fusion and update strategy based on a graph attention network, and to identify target categories and relationships; and

a scene graph generation module, configured to generate a scene graph according to the identified target categories and relationships.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110497553.2A CN112990202B (en) | 2021-05-08 | 2021-05-08 | Scene graph generation method and system based on sparse representation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112990202A true CN112990202A (en) | 2021-06-18 |
CN112990202B CN112990202B (en) | 2021-08-06 |
Family
ID=76337256
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110497553.2A Active CN112990202B (en) | 2021-05-08 | 2021-05-08 | Scene graph generation method and system based on sparse representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112990202B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836339A (en) * | 2021-09-01 | 2021-12-24 | 淮阴工学院 | Scene graph generation method based on global information and position embedding |
CN115546626A (en) * | 2022-03-03 | 2022-12-30 | 中国人民解放军国防科技大学 | Data double-unbalance-oriented deviation reduction scene graph generation method and system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920711A (en) * | 2018-07-25 | 2018-11-30 | 中国人民解放军国防科技大学 | Deep learning label data generation method oriented to unmanned aerial vehicle take-off and landing guide |
US20190163982A1 (en) * | 2017-11-28 | 2019-05-30 | Visual Semantics, Inc. | Method and apparatus for integration of detected object identifiers and semantic scene graph networks for captured visual scene behavior estimation |
US20190370587A1 (en) * | 2018-05-29 | 2019-12-05 | Sri International | Attention-based explanations for artificial intelligence behavior |
CN110991532A (en) * | 2019-12-03 | 2020-04-10 | 西安电子科技大学 | Scene graph generation method based on relational visual attention mechanism |
CN112085124A (en) * | 2020-09-27 | 2020-12-15 | 西安交通大学 | Complex network node classification method based on graph attention network |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190163982A1 (en) * | 2017-11-28 | 2019-05-30 | Visual Semantics, Inc. | Method and apparatus for integration of detected object identifiers and semantic scene graph networks for captured visual scene behavior estimation |
US20190370587A1 (en) * | 2018-05-29 | 2019-12-05 | Sri International | Attention-based explanations for artificial intelligence behavior |
CN108920711A (en) * | 2018-07-25 | 2018-11-30 | 中国人民解放军国防科技大学 | Deep learning label data generation method oriented to unmanned aerial vehicle take-off and landing guide |
CN110991532A (en) * | 2019-12-03 | 2020-04-10 | 西安电子科技大学 | Scene graph generation method based on relational visual attention mechanism |
CN112085124A (en) * | 2020-09-27 | 2020-12-15 | 西安交通大学 | Complex network node classification method based on graph attention network |
Non-Patent Citations (2)
Title |
---|
SHUAI SHAO, MINGZE TANG: "Semi-supervised Structured Sparse Graph Data", 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM) *
LI, ZHENDONG: "Research and Implementation of Attention-Based Scene Graph Generation Algorithms", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836339A (en) * | 2021-09-01 | 2021-12-24 | 淮阴工学院 | Scene graph generation method based on global information and position embedding |
CN113836339B (en) * | 2021-09-01 | 2023-09-26 | 淮阴工学院 | Scene graph generation method based on global information and position embedding |
CN115546626A (en) * | 2022-03-03 | 2022-12-30 | 中国人民解放军国防科技大学 | Data double-unbalance-oriented deviation reduction scene graph generation method and system |
CN115546626B (en) * | 2022-03-03 | 2024-02-02 | 中国人民解放军国防科技大学 | Data double imbalance-oriented depolarization scene graph generation method and system |
Also Published As
Publication number | Publication date |
---|---|
CN112990202B (en) | 2021-08-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||