CN112990202B - Scene graph generation method and system based on sparse representation - Google Patents

Scene graph generation method and system based on sparse representation

Info

Publication number
CN112990202B
Authority
CN
China
Prior art keywords
node
edge
target
graph
features
Prior art date
Legal status
Active
Application number
CN202110497553.2A
Other languages
Chinese (zh)
Other versions
CN112990202A (en)
Inventor
雷军
杨亚洲
周浩
张军
李硕豪
王风雷
刘盼
于淼淼
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202110497553.2A
Publication of CN112990202A
Application granted
Publication of CN112990202B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/513 - Sparse representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a scene graph generation method and system based on sparse representation. The method comprises the following steps: performing target detection on an original image through a fast region-based convolutional neural network to obtain a target region set; classifying all edges between target pairs as foreground edges or background edges through a preset relation metric network, and constructing a sparse graph; synchronously learning the nodes and edges of the sparse graph through a feature fusion and update strategy based on a graph attention neural network, and identifying target categories and relationships; and generating a scene graph according to the identified target categories and relationships. The method can effectively filter spurious relationships and thus effectively generate a sparse graph, which reduces the computational complexity of a dense graph and improves the efficiency of message passing on the graph; at the same time, the method can accurately extract features from the sparse graph and thereby generate the scene graph accurately.

Description

Scene graph generation method and system based on sparse representation
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a scene graph generation method and system based on sparse representation.
Background
Scene graph generation plays an important role in the deep understanding of visual scenes. A scene graph is a refined semantic abstraction of the targets and target relationships in a real image. It is constructed by predicting predefined target instances, target attributes and target-to-target relationships, and the interactions between targets in a scene are represented in a structured language of <subject-predicate-object> triples. In a scene graph, nodes represent target entities (including category labels and bounding boxes), directed edges represent the relationship categories between subjects and objects, and various attributes of a target (such as color and material) can also be described and represented.
At present, scene graph inference has attracted much attention because of the rich semantic information it extracts from target interactions. Rich scene graph semantics not only provide context clues for basic recognition tasks, but also have broad prospects in various high-level visual applications: they are key to improving image retrieval and various natural-language-based image tasks, and they also provide valuable information for applications such as visual question answering, image captioning and image generation. Although conventional scene graph generation methods have been empirically successful in many applications, the problems of the high computational complexity of dense graphs and the imprecise pruning of sparse graphs remain.
Based on this, how to infer the complex potential relationships among all targets and accurately extract a scene graph from an image remains an urgent problem.
Disclosure of Invention
The invention aims to provide a scene graph generation method and system based on sparse representation, so as to reason about complex potential relationships and accurately generate a scene graph.
In view of the above, in a first aspect, the present invention provides a scene graph generation method based on sparse representation, including:
performing target detection on an original image through a fast region-based convolutional neural network to obtain a target region set;
classifying all edges between target pairs as foreground edges or background edges through a preset relation metric network, and constructing a sparse graph;
synchronously learning the nodes and edges of the sparse graph through a feature fusion and update strategy based on a graph attention neural network, and identifying the target categories and relationships;
and generating a scene graph according to the identified target categories and relationships.
Preferably, the classifying of all edges between target pairs as foreground edges or background edges through the preset relation metric network and the constructing of a sparse graph include:
acquiring the class feature, spatial feature and appearance feature of each target;
classifying the edge of a target pair according to the class features, spatial features and appearance features of the two targets in the target pair, and obtaining a classification result;
selecting, according to the classification result, the $K$ foreground edges and the top $M$ background edges, and constructing a sparse graph containing $N$ nodes and $K+M$ edges.
Preferably, the classifying of the edge of a target pair according to the class features, spatial features and appearance features of the two targets in the target pair and the obtaining of a classification result include:
concatenating the spatial features and the appearance features of the two targets in the target pair respectively to generate a joint spatial feature and a joint appearance feature;
embedding the prior statistical probability of the target classes to construct a joint class feature of the target pair;
concatenating the joint appearance feature, the joint spatial feature and the joint class feature to generate a logits feature;
and inputting the logits feature into a sigmoid classifier to obtain the edge probability of the target pair.
Preferably, the joint spatial feature is:
$s_{ij} = \mathrm{MLP}(s_i \oplus s_j)$
wherein $s_{ij}$ is the joint spatial feature, $\mathrm{MLP}$ is a multi-layer perceptron, $\oplus$ is the concatenation (series) operation, and $s_i$, $s_j$ are the spatial features of targets $o_i$ and $o_j$ respectively;
the joint appearance feature is:
$a_{ij} = \mathrm{MLP}(a_i \oplus a_j)$
wherein $a_{ij}$ is the joint appearance feature and $a_i$, $a_j$ are the appearance features of targets $o_i$ and $o_j$ respectively;
the prior statistical probability of the target classes is:
$K_{mn} = P(c_j = n \mid c_i = m), \quad m, n \in \{1, \dots, C\}$
wherein $P(c_j \mid c_i)$ is the prior statistical probability of the target classes, defined as the probability that a class-$c_j$ object is present in the original image given that a class-$c_i$ object is present;
the joint class feature $c_{ij}$ is constructed from the class features $c_i$, $c_j$ of the two targets and their class probabilities weighted by this prior, wherein $C$ is the number of all categories, $p_i(m)$ is the probability that target $o_i$ belongs to category $m$, $p_j(n)$ is the probability that target $o_j$ belongs to category $n$, and $c_i$, $c_j$ are the class features of targets $o_i$ and $o_j$ respectively.
Preferably, the synchronously learning of the nodes and edges of the sparse graph and the identifying of the target categories and relationships through the feature fusion and update strategy based on the graph attention neural network include:
fusing the appearance feature, spatial feature and class feature of each node in the sparse graph, and embedding the prior statistical probability of the class relationships to generate node features and edge features;
acquiring the attention weights of the nodes and edges through the graph attention neural network;
and updating the node features and the edge features according to the attention weights of the nodes and edges, and classifying the targets and relationships according to the new node features and the new edge features.
Preferably, the fusing of the appearance feature, spatial feature and class feature of each node in the sparse graph and the embedding of the prior statistical probability of the class relationships to generate the node features and edge features include:
aggregating the appearance feature, spatial feature and class feature of each node in the sparse graph, and compressing them through an encoder-decoder to obtain a fusion feature;
obtaining an initialized node feature and an initialized edge feature according to the fusion features;
embedding the prior statistical probability of the class relationships into the initialized node features and the initialized edge features to construct the node features and edge features;
and assigning the node features and the edge features to the corresponding nodes and edges of the sparse graph.
Preferably, the prior statistical probability of the class relationships is:
$K^{r}_{ij} = \big[P(r_1 \mid c_i, c_j), \dots, P(r_R \mid c_i, c_j)\big]$
wherein the prior statistical probability $P(r \mid c_i, c_j)$ is defined as the probability that the relationship $r$ exists given the subject node class $c_i$ and the object node class $c_j$;
the node feature $v_i$ is constructed by embedding this prior into the initialized node feature $v_i^{(0)}$, wherein $R$ is the number of all relationships and $N$ is the number of all nodes;
the edge feature $e_{ij}$ is constructed by embedding this prior into the initialized edge feature $e_{ij}^{(0)}$.
Preferably, the attention weights of the nodes and edges include an attention weight between nodes, an attention weight between nodes and edges, and an attention weight of edges;
the attention weight between nodes is $\alpha_{ij}$, obtained by normalizing, over the set $\mathcal{N}(j)$, a learned score of the node features, wherein $\alpha_{ij}$ is the attention weight between node $i$ and node $j$, $\mathcal{N}(j)$ is the set of nodes that have connecting edges with node $j$, $W$ is the weight parameter to be learned, $v_i$ and $v_j$ are the node features of node $i$ and node $j$ respectively, $v_k$ is the node feature of a node $k$ adjacent to node $j$, $e_{kj}$ is the edge feature of the connecting edge between node $k$ and node $j$, and $w$ is the weight of the network;
the attention weight between a node and an edge is $\beta_{ij}$, the attention weight between node $j$ and edge $e_{ij}$, obtained by the same normalization;
the attention weight of an edge consists of $\gamma^{s}_{ij}$ and $\gamma^{o}_{ij}$, which are both attention weights of the connecting edge $e_{ij}$ between node $i$ and node $j$.
Preferably, the new node feature $\hat{v}_i$ is obtained by passing through a sigmoid function the aggregation, over the set $\mathcal{N}(i)$, of the adjacent node features weighted by the attention weights between nodes and of the connecting edge features weighted by the attention weights between nodes and edges, wherein $\hat{v}_i$ is the new node feature, $\sigma$ is the sigmoid function, $\mathcal{N}(i)$ is the set of nodes that have connecting edges with node $i$, $\alpha_{ij}$ is the attention weight between node $i$ and its adjacent node $j$, $e_{ij}$ is the edge feature of the connecting edge between node $i$ and its adjacent node $j$, and $\beta_{ij}$ is the attention weight between node $i$ and the connecting edge $e_{ij}$;
the new edge feature $\hat{e}_{ij}$ is obtained in the same way from the edge feature $e_{ij}$ and the features of its subject node and object node, weighted by the attention weights of the edge.
In a second aspect, the present invention provides a scene graph generation system based on sparse representation, including:
the target region extraction module is used for performing target detection on the original image through a fast region-based convolutional neural network to obtain a target region set;
the sparse graph constructing module is used for identifying all edges of the target pair as a foreground edge and a background edge through a preset relation measurement network and constructing a sparse graph;
the graph message transmission module is used for synchronously learning nodes and edges on the sparse graph through a feature fusion and updating strategy based on a graph attention neural network and identifying a target type and a target relationship;
and the scene graph generating module is used for generating a scene graph according to the target type and the relation obtained by identification.
According to the scene graph generation method and system based on sparse representation, all edges between target pairs in the original image are classified into foreground and background through the RelMN and a sparse graph is constructed, so that spurious relationships can be effectively filtered, the sparse graph can be generated effectively, the computational complexity of the dense graph is reduced, and the efficiency of message passing on the graph is improved; furthermore, through the feature fusion and update strategy based on the graph attention neural network, the nodes and edges of the sparse graph are learned synchronously to obtain target features and relationship features, which are used for target and relationship classification, so that features can be accurately extracted from the sparse graph and the scene graph can be generated accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a sparse representation-based scene graph generation method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S20 of the sparse representation-based scene graph generating method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the binary classification of foreground and background in the RelMN in an embodiment of the present invention;
FIG. 4 is a flowchart of step S30 of the sparse representation-based scene graph generating method according to an embodiment of the present invention;
fig. 5 is a schematic block diagram of a scene graph generation system based on sparse representation according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments. It is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In an embodiment, as shown in fig. 1, a scene graph generation method based on sparse representation is provided, which includes the following steps:
and step S10, carrying out target detection on the original image through the fast regional convolutional neural network to obtain a target region set.
In this embodiment, an original image is obtained, a fast region-based convolutional neural network (Fast R-CNN) is used to perform target detection on the image, and $N$ target regions $o_i$ are automatically extracted from the original image to obtain the target region set $O = \{o_1, o_2, \dots, o_N\}$. Each target region $o_i$ in the target region set $O$ includes the position information of the target, its appearance feature $a_i$ and its class probability $p_i$.
Understandably, the bounding rectangles obtained in step S10 cover most of the critical targets, so the speed and accuracy of target detection can be improved.
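For illustration only, the sketch below shows how step S10 could be realized with an off-the-shelf detector; the use of torchvision's Faster R-CNN as a stand-in for the region-based detector, the score threshold and the dictionary layout of a target region are assumptions of this sketch, and the appearance features $a_i$ would additionally be taken from the detector's ROI-pooled features (not shown here).

```python
# Sketch of step S10 with an off-the-shelf detector (an assumed stand-in for the
# fast region-based convolutional neural network of the embodiment).
import torch
import torchvision

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_targets(image, score_thresh=0.5):
    """Return the target region set O: one entry per detected target region o_i."""
    with torch.no_grad():
        out = detector([image])[0]                 # boxes, labels, scores
    regions = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= score_thresh:
            regions.append({
                "box": box,                        # (x1, y1, x2, y2) bounding box
                "label": int(label),               # predicted class
                "score": float(score),             # top-1 class probability
            })
    return regions
```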
Step S20: all edges between target pairs are classified as foreground edges or background edges through a preset relation metric network, and a sparse graph is constructed.
In this embodiment, the RelMN (Relation Metric Network) is configured to classify all edges as foreground edges or background edges and to automatically select all foreground edges and part of the background edges to construct the sparse graph. The RelMN is composed of three parts: multi-feature extraction, binary classification of foreground and background, and sparse graph generation.
Preferably, as shown in fig. 2, step S20 includes the steps of:
step S201, multi-feature extraction: obtaining the category characteristics of each target
Figure 455523DEST_PATH_IMAGE073
Spatial characteristics of
Figure 980045DEST_PATH_IMAGE074
And appearance characteristics
Figure 436172DEST_PATH_IMAGE075
In step S201, based on the target area set
Figure 845288DEST_PATH_IMAGE069
Each target areaDomain
Figure 634252DEST_PATH_IMAGE010
Location information and class probability of included objects
Figure 1780DEST_PATH_IMAGE072
Conversion to spatial features
Figure 649930DEST_PATH_IMAGE074
And category characteristics
Figure 456212DEST_PATH_IMAGE073
And further according to each target area
Figure 688565DEST_PATH_IMAGE010
Appearance characteristics of the contained object
Figure 492573DEST_PATH_IMAGE075
Transformed spatial features
Figure 221495DEST_PATH_IMAGE074
And category characteristics
Figure 972413DEST_PATH_IMAGE073
And obtaining multi-dimensional characteristics for detecting whether potential relations exist between the target pairs. Preferably, for spatial features
Figure 906609DEST_PATH_IMAGE074
By amplitude and splicing will
Figure 943835DEST_PATH_IMAGE076
Dimensional position coordinate conversion
Figure 300998DEST_PATH_IMAGE077
Spatial features of dimensions
Figure 386766DEST_PATH_IMAGE078
Wherein, in the step (A),
Figure 270408DEST_PATH_IMAGE079
in order to be able to target the number of,
Figure 353902DEST_PATH_IMAGE080
position coordinates of the dimension as a target area
Figure 322995DEST_PATH_IMAGE010
Top left corner of the bounding box of the characterization
Figure 710989DEST_PATH_IMAGE081
Coordinates and lower right corner
Figure 324504DEST_PATH_IMAGE081
The coordinates, MPL, are the multi-layer perceptrons,
Figure 703533DEST_PATH_IMAGE008
is a series operation. Similarly, for class probability transition
Figure 832026DEST_PATH_IMAGE077
Class characteristics of dimension
Figure 462858DEST_PATH_IMAGE082
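As a concrete illustration of the conversions in step S201, the sketch below maps the 4-dimensional box coordinates and the class probability vector to $d$-dimensional spatial and class features with small MLPs; the normalization by image size and the layer sizes are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class FeatureEmbeds(nn.Module):
    """Convert box coordinates and class probabilities into d-dimensional features
    (a sketch of step S201; hidden sizes and normalization are assumptions)."""
    def __init__(self, num_classes, d=128):
        super().__init__()
        self.spatial_mlp = nn.Sequential(nn.Linear(4, d), nn.ReLU(), nn.Linear(d, d))
        self.class_mlp = nn.Sequential(nn.Linear(num_classes, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, boxes, class_probs, image_size):
        # Normalize (x1, y1, x2, y2) by the image width/height before the MLP.
        w, h = image_size
        norm = boxes / torch.tensor([w, h, w, h], dtype=boxes.dtype)
        s = self.spatial_mlp(norm)          # spatial features s_i
        c = self.class_mlp(class_probs)     # class features c_i
        return s, c
```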
Step S202, binary classification of foreground and background: the edge of the target pair $(o_i, o_j)$ is classified according to the class features $c_i$, $c_j$, the spatial features $s_i$, $s_j$ and the appearance features $a_i$, $a_j$ of the two targets, and the classification result is obtained. The target pair $(o_i, o_j)$ is formed by two different targets $o_i$ and $o_j$.
As shown in fig. 3, which is a schematic diagram of the binary classification of foreground and background in the RelMN, step S202 includes the following steps:
Step one: the spatial features $s_i$, $s_j$ and the appearance features $a_i$, $a_j$ of the two targets in the target pair $(o_i, o_j)$ are concatenated respectively to generate the joint spatial feature $s_{ij}$ and the joint appearance feature $a_{ij}$. The joint spatial feature $s_{ij}$ is calculated as:
$s_{ij} = \mathrm{MLP}(s_i \oplus s_j)$ (1)
The joint appearance feature $a_{ij}$ is calculated as:
$a_{ij} = \mathrm{MLP}(a_i \oplus a_j)$ (2)
step two, embedding prior statistical probability of target category
Figure 702002DEST_PATH_IMAGE089
Constructing object pairs
Figure 63713DEST_PATH_IMAGE083
Joint class characteristics of
Figure 477115DEST_PATH_IMAGE090
. Wherein the prior statistical probability of the object class
Figure 748827DEST_PATH_IMAGE089
Defined as being present in the original image
Figure 153264DEST_PATH_IMAGE017
Presence in case of a Category object
Figure 143217DEST_PATH_IMAGE018
Probability of class object, prior statistical probability
Figure 596195DEST_PATH_IMAGE089
Can be expressed as:
Figure 17687DEST_PATH_IMAGE091
(3)
equation (3), a priori statistical probability
Figure 593024DEST_PATH_IMAGE092
Is the number of all categories.
And learning statistical co-occurrence knowledge among target categories based on the prior statistical probability and the category characteristics. The joint class characteristics
Figure 867011DEST_PATH_IMAGE090
Can be expressed as:
Figure 326942DEST_PATH_IMAGE093
(4)
in the formula (4), the first and second groups,
Figure 697881DEST_PATH_IMAGE094
is a target of
Figure 319486DEST_PATH_IMAGE010
Belong to the category
Figure 408665DEST_PATH_IMAGE017
The probability of (a) of (b) being,
Figure 713699DEST_PATH_IMAGE095
is a target of
Figure 814511DEST_PATH_IMAGE025
Belong to the category
Figure 997230DEST_PATH_IMAGE018
The probability of (c).
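The patent defines the prior statistical probability $P(c_j \mid c_i)$ but does not spell out how it is estimated; a common choice, shown below purely as an assumption, is to normalize class co-occurrence counts over the training images.

```python
import numpy as np

def class_cooccurrence_prior(annotations, num_classes):
    """Estimate P(c_j | c_i) from training images (an assumed estimator; the
    patent defines the prior but not how it is computed).
    `annotations` is an iterable of per-image class-label lists."""
    counts = np.zeros((num_classes, num_classes))
    present = np.zeros(num_classes)
    for labels in annotations:
        unique = set(labels)
        for ci in unique:
            present[ci] += 1
            for cj in unique:
                if cj != ci:
                    counts[ci, cj] += 1
    return counts / np.maximum(present[:, None], 1)   # K[m, n] ~ P(c_j=n | c_i=m)
```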
Step three: the joint appearance feature $a_{ij}$, the joint spatial feature $s_{ij}$ and the joint class feature $c_{ij}$ are concatenated to generate the logits feature $l_{ij}$, which is calculated as:
$l_{ij} = \mathrm{MLP}(a_{ij} \oplus s_{ij} \oplus c_{ij})$ (5)
Step four: the logits feature $l_{ij}$ is input into a sigmoid classifier to obtain the edge probability $p_{ij}$ of the target pair $(o_i, o_j)$, which is calculated as:
$p_{ij} = \mathrm{sigmoid}(l_{ij})$ (6)
that is, in step S202, the position coordinates and the class probability of each object are determined
Figure 880293DEST_PATH_IMAGE072
Respectively converted into spatial features
Figure 528443DEST_PATH_IMAGE074
And category characteristics
Figure 6829DEST_PATH_IMAGE084
Then, the spatial characteristics of two different targets are first determined
Figure 915879DEST_PATH_IMAGE074
And appearance characteristics
Figure 156106DEST_PATH_IMAGE075
Concatenate to generate joint spatial features
Figure 416186DEST_PATH_IMAGE085
And combined appearance features
Figure 698262DEST_PATH_IMAGE086
And constructing joint category features
Figure 602765DEST_PATH_IMAGE090
Introducing prior statistical probability of classes, and then combining spatial features
Figure 639991DEST_PATH_IMAGE085
Combined appearance features
Figure 793891DEST_PATH_IMAGE086
And joint category features
Figure 581457DEST_PATH_IMAGE084
Concatenate to generate logs features
Figure 871624DEST_PATH_IMAGE096
And finally calculating output probability by using sigmoid classifier
Figure 79751DEST_PATH_IMAGE098
And further, all the edges are divided into a foreground type and a background type.
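The sketch below ties steps one to four together into a single foreground/background edge classifier; the exact way the co-occurrence prior is folded into the joint class feature (here, an expected co-occurrence score that re-weights the concatenated class features) is an assumption, since equation (4) is not reproduced above.

```python
import torch
import torch.nn as nn

class RelMNClassifier(nn.Module):
    """Foreground/background edge classifier (sketch of step S202).
    The handling of the class co-occurrence prior K is an assumption."""
    def __init__(self, d, cooccurrence_prior):
        super().__init__()
        self.K = cooccurrence_prior                       # (C, C) matrix of P(c_j | c_i)
        self.spatial_mlp = nn.Linear(2 * d, d)            # joint spatial feature s_ij
        self.appearance_mlp = nn.Linear(2 * d, d)         # joint appearance feature a_ij
        self.logits_mlp = nn.Linear(4 * d, 1)             # logits feature l_ij

    def forward(self, s_i, s_j, a_i, a_j, c_i, c_j, p_i, p_j):
        s_ij = self.spatial_mlp(torch.cat([s_i, s_j], dim=-1))
        a_ij = self.appearance_mlp(torch.cat([a_i, a_j], dim=-1))
        # Expected class co-occurrence under the predicted class distributions
        # (one way to embed the prior; an assumption of this sketch).
        prior = torch.einsum("bm,mn,bn->b", p_i, self.K, p_j).unsqueeze(-1)
        c_ij = prior * torch.cat([c_i, c_j], dim=-1)
        l_ij = self.logits_mlp(torch.cat([a_ij, s_ij, c_ij], dim=-1))
        return torch.sigmoid(l_ij)                        # edge probability p_ij
```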
Step S203, sparse graph generation: according to the classification result, the $K$ foreground edges and the top $M$ background edges are selected, and a sparse graph containing $N$ nodes and $K+M$ edges is constructed.
In step S203, according to the output of the sigmoid classifier, all $K$ edges whose target pairs are predicted as foreground are selected first ($K$ is not a hyper-parameter and is determined by the sigmoid classifier). Secondly, the top $M$ background edges with the highest foreground probability are automatically selected, and a sparse graph containing the $K$ foreground edges and the $M$ background edges is constructed, which enhances the robustness of relationship classification and reduces the risk of eliminating real relationships as background. Finally, a sparse graph containing $N$ nodes and $K+M$ edges is obtained.
In this embodiment, the RelMN divides all edges into the two classes of foreground and background to obtain the potential relationships between target pairs; compared with generating potential relationships from the distances between target pairs, this is more reasonable and the constructed sparse graph is more reasonable. In addition, message passing on the sparse graph can significantly reduce the computational complexity, so that message passing is more accurate and effective.
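A minimal sketch of the edge selection in step S203 follows; the probability threshold of 0.5 and the number of retained background edges are illustrative assumptions (the patent only states that all predicted foreground edges and the top background edges are kept).

```python
import torch

def build_sparse_graph(edge_probs, pairs, num_background=64, threshold=0.5):
    """Select all foreground edges plus the top-M background edges (sketch of
    step S203). `pairs` is the list of candidate target pairs (i, j) aligned
    with `edge_probs`; threshold and num_background are illustrative values."""
    edge_probs = edge_probs.view(-1)
    fg_mask = edge_probs >= threshold                 # K foreground edges
    bg_idx = torch.nonzero(~fg_mask).view(-1)
    # Background edges ranked by foreground probability, highest first.
    bg_order = bg_idx[torch.argsort(edge_probs[bg_idx], descending=True)]
    keep = torch.cat([torch.nonzero(fg_mask).view(-1), bg_order[:num_background]])
    return [pairs[i] for i in keep.tolist()]          # edges of the sparse graph
```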
Step S30: the nodes and edges of the sparse graph are learned synchronously through a feature fusion and update strategy based on a graph attention neural network, and the target categories and relationships are identified.
In step S30, the feature fusion and update strategy based on the graph attention neural network includes three parts: generation of node features and edge features, weight learning of the node features and edge features, and target and relationship classification.
Preferably, as shown in fig. 4, step S30 includes the steps of:
step S301, generation of node features and edge features: appearance characteristics of each node in sparse graph
Figure 874772DEST_PATH_IMAGE075
Spatial characteristics of
Figure 595604DEST_PATH_IMAGE074
And category characteristics
Figure 636372DEST_PATH_IMAGE073
Fusion is carried out, and prior statistical probability of class relation is embedded
Figure 405745DEST_PATH_IMAGE028
Feature of construction node
Figure 238572DEST_PATH_IMAGE031
And edge characteristics
Figure 238627DEST_PATH_IMAGE036
In step S301, appearance features of each node in the sparse graph are first checked
Figure 829008DEST_PATH_IMAGE075
Spatial characteristics of
Figure 729968DEST_PATH_IMAGE074
And category characteristics
Figure 761509DEST_PATH_IMAGE073
Performing polymerization and compressing by a coder-decoder to obtain a fusion characteristic
Figure 761826DEST_PATH_IMAGE100
Then, the initialized node characteristics are obtained according to the fusion characteristics
Figure 901820DEST_PATH_IMAGE101
And initializing edge features
Figure 992091DEST_PATH_IMAGE102
. Wherein the node characteristics
Figure 268352DEST_PATH_IMAGE101
The initialization process of (1) is as follows: direct through fusion feature
Figure 642832DEST_PATH_IMAGE100
Initializing node characteristics, i.e.
Figure 942227DEST_PATH_IMAGE103
(ii) a Edge feature
Figure 184989DEST_PATH_IMAGE102
The initialization process of (1) is as follows: sequentially connecting the fusion features of the subject node and the object node, and performing dimension compression through the full connection layer, i.e.
Figure 191122DEST_PATH_IMAGE104
Wherein the full connection layer
Figure 31777DEST_PATH_IMAGE105
A Leaky Relu layer.
Further, the prior statistical probability of the class relation
Figure 146364DEST_PATH_IMAGE028
Embedded to initialization node features
Figure 537025DEST_PATH_IMAGE101
And initializing edge features
Figure 53457DEST_PATH_IMAGE102
Feature of construction node
Figure 769740DEST_PATH_IMAGE031
And edge characteristics
Figure 43727DEST_PATH_IMAGE036
Finally, the node characteristics
Figure 628292DEST_PATH_IMAGE031
And edge characteristics
Figure 107552DEST_PATH_IMAGE036
And distributing to corresponding nodes and edges in the sparse graph. Wherein a priori statistical probabilities of class relationships
Figure 791475DEST_PATH_IMAGE028
Is defined as a given node
Figure 615074DEST_PATH_IMAGE017
And
Figure 613117DEST_PATH_IMAGE018
context exists
Figure 510666DEST_PATH_IMAGE029
Probability of (3), a priori statistical probability of class relationship
Figure 693386DEST_PATH_IMAGE028
Can be expressed as:
Figure 378183DEST_PATH_IMAGE106
(7)
in the formula (7), the first and second groups,
Figure 304550DEST_PATH_IMAGE107
is a main body node
Figure 56606DEST_PATH_IMAGE017
To the object node
Figure 551172DEST_PATH_IMAGE018
Corresponding relation of (A) and
Figure 83785DEST_PATH_IMAGE108
Figure 689209DEST_PATH_IMAGE032
is the number of all relationships.
Node characteristics
Figure 794306DEST_PATH_IMAGE031
The calculation formula of (2) is as follows:
Figure 584408DEST_PATH_IMAGE109
(8)
edge feature
Figure 276420DEST_PATH_IMAGE036
The calculation formula of (2) is as follows:
Figure 419957DEST_PATH_IMAGE110
(9)
understandably, as part of the sparse graph, the prior statistical probability according to class relationships
Figure 208921DEST_PATH_IMAGE028
And class probability
Figure 45290DEST_PATH_IMAGE072
The inherent weights of the nodes and edges are constructed. The inherent weights of the node and the edge respectively reflect the node in the node set
Figure 723134DEST_PATH_IMAGE040
And edges between other nodes and in edge sets
Figure 529416DEST_PATH_IMAGE055
And other edge.
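As an illustration of step S301, the sketch below fuses the per-node features with a small encoder and initializes the node and edge features; the encoder/decoder sizes are assumptions, and the re-weighting by the class-relationship prior of equations (8) and (9) is omitted because their exact form is not reproduced above.

```python
import torch
import torch.nn as nn

class NodeEdgeInit(nn.Module):
    """Initialize node and edge features of the sparse graph (sketch of step S301)."""
    def __init__(self, d):
        super().__init__()
        self.encoder = nn.Linear(3 * d, d)                 # compress a_i (+) s_i (+) c_i
        self.decoder = nn.Linear(d, 3 * d)                 # reconstruction branch (training only)
        self.edge_fc = nn.Sequential(nn.Linear(2 * d, d), nn.LeakyReLU())

    def forward(self, a, s, c, edges):
        f = self.encoder(torch.cat([a, s, c], dim=-1))     # fusion features f_i
        v0 = f                                             # initialized node features
        subj = f[[i for i, _ in edges]]
        obj = f[[j for _, j in edges]]
        e0 = self.edge_fc(torch.cat([subj, obj], dim=-1))  # initialized edge features
        return v0, e0
```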
Step S302, weight learning of the node features and edge features: the attention weights of the nodes and edges are obtained through a graph attention neural network. The attention weights of the nodes and edges include the attention weight between nodes, the attention weight between a node and an edge, and the attention weight of an edge.
For node message aggregation, the attention weight $\alpha_{ij}$ between nodes is computed in equation (10) by normalizing, over the set $\mathcal{N}(j)$, a learned score of the node features. In equation (10), $\alpha_{ij}$ is the attention weight between node $i$ and node $j$, $\mathcal{N}(j)$ is the set of nodes that have connecting edges with node $j$, $W$ is the weight parameter to be learned, $v_i$ and $v_j$ are the node features of node $i$ and node $j$ respectively, $v_k$ is the node feature of a node $k$ adjacent to node $j$, $e_{kj}$ is the edge feature of the connecting edge between node $k$ and node $j$, and $w$ is the weight of the network. It should be noted that in equation (10) node $i$ and node $k$ are both subject nodes and node $j$ is the object node.
The attention weight $\beta_{ij}$ between a node and an edge is computed in equation (11) by the same normalization; in equation (11), $\beta_{ij}$ is the attention weight between node $j$ and edge $e_{ij}$, and node $j$ is the object node.
For edge message aggregation, the attention weights of an edge are computed in equation (12); in equation (12), $\gamma^{s}_{ij}$ and $\gamma^{o}_{ij}$ are both attention weights of the connecting edge $e_{ij}$ between node $i$ and node $j$. It should be noted that in equation (12) node $i$ is the subject node and node $j$ is the object node.
Understandably, as another aspect of the sparse graph, the attention weights of the nodes and edges are obtained through a GAT (Graph Attention Network) and, combined with the prior statistical probability $P(r \mid c_i, c_j)$ of the class relationships and the class probability $p_i$ from step S301, are used to extract the new target features and relationship features.
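Because equations (10) to (12) are not reproduced above, the sketch below stands in for step S302 with a standard GAT-style scoring function, softmax-normalized over each object node's incident subject nodes and connecting edges; this particular form is an assumption, and the edge attention weights of equation (12) would be obtained analogously over the two endpoints of each edge (not shown).

```python
import torch
import torch.nn as nn

class SparseGraphAttention(nn.Module):
    """Node-node and node-edge attention weights (sketch of step S302).
    The concatenation-plus-softmax scoring is an assumed stand-in for
    equations (10)-(12)."""
    def __init__(self, d):
        super().__init__()
        self.W = nn.Linear(d, d, bias=False)      # weight parameter to be learned
        self.w = nn.Linear(2 * d, 1, bias=False)  # network weight for the score

    def score(self, x, y):
        return self.w(torch.cat([self.W(x), self.W(y)], dim=-1)).squeeze(-1)

    def forward(self, v, e, edges):
        # For each object node j, normalize scores over its incident subject
        # nodes k and the connecting edges e_kj.
        alpha, beta = {}, {}
        for j in range(v.size(0)):
            incident = [(k, idx) for idx, (k, jj) in enumerate(edges) if jj == j]
            if not incident:
                continue
            node_scores = torch.stack([self.score(v[k], v[j]) for k, _ in incident])
            edge_scores = torch.stack([self.score(e[idx], v[j]) for _, idx in incident])
            weights = torch.softmax(torch.cat([node_scores, edge_scores]), dim=0)
            n = len(incident)
            for pos, (k, idx) in enumerate(incident):
                alpha[(k, j)] = weights[pos]      # node-node attention
                beta[(k, j)] = weights[n + pos]   # node-edge attention
        return alpha, beta
```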
Step S303, target and relationship classification: the node features $v_i$ and the edge features $e_{ij}$ are updated according to the attention weights of the nodes and edges, and the targets and relationships are classified according to the new node features $\hat{v}_i$ and the new edge features $\hat{e}_{ij}$.
Specifically, the node features $v_i$ are updated according to the hidden node features, the adjacent node features and the connecting edge features, and the target categories are classified according to the new node features $\hat{v}_i$; at the same time, the edge features $e_{ij}$ are updated according to the hidden edge features, the subject node features and the object node features, and the relationships are classified according to the new edge features $\hat{e}_{ij}$.
The new node feature $\hat{v}_i$ is computed in equation (13) by passing through a sigmoid function the aggregation, over the set $\mathcal{N}(i)$, of the adjacent node features weighted by the attention weights between nodes and of the connecting edge features weighted by the attention weights between nodes and edges. In equation (13), $\sigma$ is the sigmoid function, $\mathcal{N}(i)$ is the set of nodes that have connecting edges with node $i$, $\alpha_{ij}$ is the attention weight between node $i$ and its adjacent node $j$, $e_{ij}$ is the edge feature of the connecting edge between node $i$ and its adjacent node $j$, and $\beta_{ij}$ is the attention weight between node $i$ and the connecting edge $e_{ij}$. It should be noted that in equation (13) node $i$ is the object node.
The new edge feature $\hat{e}_{ij}$ is computed in equation (14) in the same way from the edge feature and the features of its subject node and object node, weighted by the attention weights of the edge. It should be noted that in equation (14) node $i$ is the subject node and node $j$ is the object node.
In this embodiment, the statistical co-occurrence knowledge and the context clues in the data set are learned jointly by the feature fusion and update strategy based on the graph attention neural network to obtain the output features (including the new node features and the new edge features), and the targets and their relationships are then classified according to the output features, so that the messages on the sparse graph can be transmitted and integrated effectively.
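The following sketch illustrates step S303: one round of attention-weighted updates of the node and edge features followed by linear classifiers for the target categories and relationships; since equations (13) and (14) are only paraphrased above, the exact aggregation shown here is an assumption.

```python
import torch
import torch.nn as nn

class NodeEdgeUpdate(nn.Module):
    """One message-passing update and the final classifiers (sketch of step S303).
    gamma_s / gamma_o are per-edge attention weights aligned with `edges`."""
    def __init__(self, d, num_classes, num_relations):
        super().__init__()
        self.obj_cls = nn.Linear(d, num_classes)
        self.rel_cls = nn.Linear(d, num_relations)

    def forward(self, v, e, edges, alpha, beta, gamma_s, gamma_o):
        new_nodes = []
        for j in range(v.size(0)):
            msg = torch.zeros_like(v[j])
            for idx, (k, jj) in enumerate(edges):
                if jj == j:
                    msg = msg + alpha[(k, j)] * v[k] + beta[(k, j)] * e[idx]
            new_nodes.append(torch.sigmoid(v[j] + msg))    # new node feature
        v_new = torch.stack(new_nodes)
        e_new = torch.stack([
            torch.sigmoid(e[idx] + gamma_s[idx] * v[i] + gamma_o[idx] * v[j])
            for idx, (i, j) in enumerate(edges)            # new edge features
        ])
        return self.obj_cls(v_new), self.rel_cls(e_new)
```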
Step S40: a scene graph is generated according to the identified target categories and relationships.
In step S40, the generated scene graph contains the positions of the targets, the categories of the targets and the relationships between the targets, and can be structurally represented as a set of triples $G = \{B, O, R\}$, wherein $B$ is the target region set, and each target region $b_i$ in the target region set contains the coordinate position information of the target described by its bounding box; $O$ is the target set, and each target region $b_i$ corresponds to a category label $o_i$, with $o_i \in \mathcal{C}$, where $\mathcal{C}$ is the set of labels of all target categories; $R$ is the binary relationship set, and each relationship $r_{ij}$ in the binary relationship set is a triple containing a subject node $i$, an object node $j$ and the relationship $\rho_{ij}$ between the subject node $i$ and the object node $j$, with $\rho_{ij} \in \mathcal{R}$, where $\mathcal{R}$ is the complete set of relationships; the subject node $i$ and the object node $j$ are determined by the candidate regions $b_i$, $b_j$ and their corresponding category labels $o_i$, $o_j$, i.e. $r_{ij} = \big((b_i, o_i), \rho_{ij}, (b_j, o_j)\big)$.
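To make the triple representation of step S40 concrete, the sketch below assembles <subject-predicate-object> triples from the classifier outputs; the relationship-score threshold and the dictionary layout are assumptions of this sketch.

```python
def assemble_scene_graph(boxes, obj_logits, rel_logits, edges, rel_thresh=0.5):
    """Assemble <subject-predicate-object> triples from the classifier outputs
    (sketch of step S40; the thresholding scheme is an assumption)."""
    obj_labels = obj_logits.argmax(dim=-1)
    triples = []
    for idx, (i, j) in enumerate(edges):
        rel_probs = rel_logits[idx].softmax(dim=-1)
        score, rel = rel_probs.max(dim=-1)
        if score >= rel_thresh:
            triples.append({
                "subject": {"box": boxes[i], "label": int(obj_labels[i])},
                "predicate": int(rel),
                "object": {"box": boxes[j], "label": int(obj_labels[j])},
            })
    return triples
```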
As can be seen from the above, in the scene graph generation method based on sparse representation of this embodiment, after the target region set is extracted from the original image through Fast R-CNN, all edges between target pairs in the original image are classified into the two classes of foreground and background through the RelMN and a sparse graph is constructed, so that spurious relationships can be effectively filtered, the sparse graph can be generated effectively, the computational complexity of the dense graph is reduced, and the efficiency of message passing on the graph is improved; then, through the feature fusion and update strategy based on the graph attention neural network, the nodes and edges of the sparse graph are learned synchronously to obtain the target features and relationship features, which are used for target and relationship classification, so that features can be accurately extracted from the sparse graph and the scene graph can be generated accurately.
In an embodiment, a scene graph generation system based on sparse representation is provided, and the scene graph generation system based on sparse representation corresponds to the scene graph generation method based on sparse representation in the above embodiments one to one. As shown in fig. 5, the sparse representation-based scene graph generation system includes a target region extraction module 110, a sparse graph construction module 120, a graph message transmission module 130, and a scene graph generation module 140, and the detailed description of each functional model is as follows:
and the target area extraction module 110 is configured to perform target detection on the original image through a fast area convolutional neural network to obtain a target area set.
And a sparse graph constructing module 120, configured to identify all edges of the target pair as a foreground edge and a background edge through a preset relationship metric network, and construct a sparse graph.
And the graph message transmission module 130 is used for synchronously learning the nodes and edges of the sparse graph and identifying the target type and the target relation through a feature fusion and updating strategy based on the graph attention neural network.
And the scene graph generating module 140 is configured to generate a scene graph according to the identified object type and relationship.
Further, the sparse graph constructing module 120 includes a multi-feature extracting unit, a binary classifying unit, and a sparse graph generating unit, and the detailed description of each functional unit is as follows:
and the multi-feature extraction unit is used for acquiring the category feature, the spatial feature and the appearance feature of each target.
And the two classification units are used for classifying the edges of the target pairs according to the class characteristics, the space characteristics and the appearance characteristics of the two targets in the target pairs and acquiring the classification result.
And the sparse graph generating unit is used for selecting, according to the classification result, the $K$ foreground edges and the top $M$ background edges, and constructing a sparse graph containing $N$ nodes and $K+M$ edges.
Further, the classification unit includes a first joint subunit, a first knowledge embedding subunit, a second joint subunit and a classification subunit, and the detailed description of each functional subunit is as follows:
and the first joint subunit is used for respectively connecting the spatial features and the appearance features of the two targets in the target pair in series to generate joint spatial features and joint appearance features.
And the first knowledge embedding subunit is used for embedding the prior statistical probability of the target class to construct the joint class characteristics of the target pair.
And the second joint subunit is used for concatenating the joint appearance feature, the joint spatial feature and the joint class feature to generate the logits feature.
And the classification subunit is used for inputting the logits feature into the sigmoid classifier to obtain the edge probability of the target pair.
Further, the graph message passing module 130 includes a node and edge feature generating unit, a weight learning unit, and a feature updating unit, and the detailed description of each functional unit is as follows:
and the node and edge feature generation unit is used for fusing the appearance feature, the spatial feature and the class feature of each node in the sparse graph, embedding the prior statistical probability of the class relationship, and generating the node feature and the edge feature.
And the weight learning unit is used for acquiring the attention weights of the nodes and the edges through the graph attention neural network.
And the feature updating unit is used for updating the node features and the edge features according to the attention weights of the nodes and the edges, and classifying the targets and the relations according to the new node features and the new edge features.
Further, the node and edge feature generation unit includes a feature fusion subunit, an initialization subunit, a second knowledge embedding subunit and a feature allocation subunit, and the detailed description of each functional subunit is as follows:
and the characteristic fusion subunit is used for aggregating the appearance characteristics, the spatial characteristics and the category characteristics of each node in the sparse graph and compressing the aggregated appearance characteristics, spatial characteristics and category characteristics through a coder and a decoder to obtain fusion characteristics.
And the initialization subunit is used for obtaining the initialized node characteristic and the initialized edge characteristic according to the fusion characteristic.
And the second knowledge embedding subunit is used for embedding the prior statistical probability of the class relationship into the initialized node feature and the initialized edge feature and constructing the node feature and the edge feature.
And the feature distribution subunit is used for distributing the node features and the edge features to corresponding nodes and edges in the sparse graph.
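Purely as an illustration of how the four modules of fig. 5 fit together, the sketch below composes them into a single pipeline; the module interfaces and names are assumptions.

```python
class SceneGraphPipeline:
    """Composition of the four modules of the system embodiment (a sketch;
    the module interfaces are assumptions, mirroring fig. 5)."""
    def __init__(self, region_extractor, sparse_graph_builder, message_passer, graph_assembler):
        self.region_extractor = region_extractor           # module 110
        self.sparse_graph_builder = sparse_graph_builder    # module 120
        self.message_passer = message_passer               # module 130
        self.graph_assembler = graph_assembler             # module 140

    def __call__(self, image):
        regions = self.region_extractor(image)
        graph = self.sparse_graph_builder(regions)
        obj_logits, rel_logits = self.message_passer(graph)
        return self.graph_assembler(regions, obj_logits, rel_logits, graph)
```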
For specific limitations of the sparse representation-based scene graph generation system, reference may be made to the above limitations of the sparse representation-based scene graph generation method, and details thereof are not repeated here.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also technical features in the above embodiments or in different embodiments may be combined and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (8)

1. A scene graph generation method based on sparse representation is characterized by comprising the following steps:
carrying out target detection on the original image through a fast regional convolutional neural network to obtain a target region set;
identifying all edges of the target pair as a foreground edge and a background edge through a preset relation measurement network, and constructing a sparse graph; the method comprises the following steps:
acquiring category characteristics, space characteristics and appearance characteristics of each target;
classifying edges of the target pair according to the class characteristics, the space characteristics and the appearance characteristics of the two targets in the target pair, and acquiring a classification result;
selecting, according to the classification result, the $K$ foreground edges and the top $M$ background edges, and constructing a sparse graph containing $N$ nodes and $K+M$ edges;
synchronously learning nodes and edges on the sparse graph through a feature fusion and updating strategy based on a graph attention neural network, and identifying a target type and a target relationship; the method comprises the following steps:
fusing the appearance characteristic, the spatial characteristic and the category characteristic of each node in the sparse graph, and embedding the prior statistical probability of the category relationship to generate a node characteristic and an edge characteristic;
acquiring attention weights of nodes and edges through a graph attention neural network;
updating the node features and the edge features according to the attention weights of the nodes and the edges, and classifying the targets and the relations according to the new node features and the new edge features;
and generating a scene graph according to the target type and the relation obtained by identification.
2. The sparse representation-based scene graph generation method of claim 1, wherein the classifying edges of the target pair according to the class features, the spatial features and the appearance features of two targets in the target pair and obtaining the classification result comprises:
respectively connecting the spatial features and the appearance features of the two targets in the target pair in series to generate a combined spatial feature and a combined appearance feature;
embedding prior statistical probability of a target class to construct a joint class characteristic of the target pair;
concatenating the joint appearance feature, the joint spatial feature and the joint class feature to generate a logits feature;
and inputting the logits feature into a sigmoid classifier to obtain the edge probability of the target pair.
3. The sparse representation-based scene graph generation method of claim 2, wherein the joint spatial features are:
$s_{ij} = \mathrm{MLP}(s_i \oplus s_j)$
wherein $s_{ij}$ is the joint spatial feature, $\mathrm{MLP}$ is a multi-layer perceptron, $\oplus$ is the concatenation operation, and $s_i$, $s_j$ are the spatial features of targets $o_i$ and $o_j$ respectively;
the joint appearance feature is:
$a_{ij} = \mathrm{MLP}(a_i \oplus a_j)$
wherein $a_{ij}$ is the joint appearance feature and $a_i$, $a_j$ are the appearance features of targets $o_i$ and $o_j$ respectively;
the prior statistical probability of the target classes is:
$K_{mn} = P(c_j = n \mid c_i = m), \quad m, n \in \{1, \dots, C\}$
wherein $P(c_j \mid c_i)$ is the prior statistical probability of the target classes, defined as the probability that a class-$c_j$ object is present in the original image given that a class-$c_i$ object is present;
and the joint class feature $c_{ij}$ is constructed from the class features $c_i$, $c_j$ of the two targets and their class probabilities weighted by this prior, wherein $C$ is the number of all categories, $p_i(m)$ is the probability that target $o_i$ belongs to category $m$, $p_j(n)$ is the probability that target $o_j$ belongs to category $n$, and $c_i$, $c_j$ are the class features of targets $o_i$ and $o_j$ respectively.
4. The sparse representation-based scene graph generation method of claim 1, wherein fusing the appearance features, spatial features and category features of each node in the sparse graph and embedding the prior statistical probability of the category relationship to generate node features and edge features comprises:
aggregating the appearance features, spatial features and category features of each node in the sparse graph, and compressing them through an encoder-decoder to obtain fused features;
obtaining initialized node features and initialized edge features according to the fused features;
embedding the prior statistical probability of the category relationship into the initialized node features and the initialized edge features to construct the node features and the edge features;
and assigning the node features and the edge features to the corresponding nodes and edges in the sparse graph.
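A possible reading of the feature fusion in claim 4 is sketched below in PyTorch. The encoder-decoder widths, the use of the bottleneck code as the fused feature, and the choice to initialise each edge feature from the codes of its two endpoint nodes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    """Hypothetical encoder-decoder fusion: aggregate appearance, spatial and
    category features and compress them; the bottleneck code is the fused feature."""
    def __init__(self, app_dim=256, spat_dim=16, cls_dim=150, fused_dim=128):
        super().__init__()
        in_dim = app_dim + spat_dim + cls_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, fused_dim), nn.ReLU())
        self.decoder = nn.Linear(fused_dim, in_dim)  # reconstruction head (assumed training signal)

    def forward(self, app, spat, cls):
        x = torch.cat([app, spat, cls], dim=-1)  # aggregate the three per-node features
        fused = self.encoder(x)                  # compressed fused feature
        return fused, self.decoder(fused)

def init_node_and_edge_features(fused, edge_index):
    """Initialise node features from the fused codes and each edge feature from the
    codes of its two endpoints (an assumed choice)."""
    node_init = fused
    edge_init = torch.cat([fused[edge_index[0]], fused[edge_index[1]]], dim=-1)
    return node_init, edge_init

# Toy usage: 4 nodes, 3 edges.
fusion = FeatureFusion()
fused, _ = fusion(torch.randn(4, 256), torch.randn(4, 16), torch.rand(4, 150))
node_init, edge_init = init_node_and_edge_features(fused, torch.tensor([[0, 1, 2], [1, 2, 3]]))
print(node_init.shape, edge_init.shape)  # torch.Size([4, 128]) torch.Size([3, 256])
```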
5. The sparse representation-based scene graph generation method of claim 4, wherein the prior statistical probability of the category relationship is:
$p(r \mid c_i, c_j)$
where $p(r \mid c_i, c_j)$ is the prior statistical probability, defined as the probability that relationship $r$ exists between a node of class $c_i$ and a node of class $c_j$;
the node feature is:
$v_i = \left[\, v_i^0 \,;\, \frac{1}{N}\sum_{j=1}^{N}\big(p(1 \mid c_i, c_j), \dots, p(R \mid c_i, c_j)\big) \,\right]$
where $v_i$ is the node feature, $R$ is the number of all relationships, $N$ is the number of all nodes, and $v_i^0$ is the initialized node feature;
the edge feature is:
$e_{ij} = \left[\, e_{ij}^0 \,;\, \big(p(1 \mid c_i, c_j), \dots, p(R \mid c_i, c_j)\big) \,\right]$
where $e_{ij}$ is the edge feature and $e_{ij}^0$ is the initialized edge feature.
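One way the prior $p(r \mid c_i, c_j)$ could be embedded into the initialised features, consistent with the reconstruction above but not confirmed by the patent text, is sketched here; the (C, C, R) prior tensor and the averaging over partner nodes are assumptions.

```python
import torch

def embed_relation_prior(node_init, edge_init, edge_index, node_classes, rel_prior):
    """Hypothetical embedding of the class-relationship prior p(r | ci, cj).
    rel_prior is an assumed (C, C, R) tensor of statistics from the training set."""
    # Edge side: append the R-dim prior vector of each edge's (subject, object) classes.
    ci, cj = node_classes[edge_index[0]], node_classes[edge_index[1]]
    edge_feat = torch.cat([edge_init, rel_prior[ci, cj]], dim=-1)
    # Node side: append the prior vectors averaged over every possible partner node (assumed).
    all_prior = rel_prior[node_classes.unsqueeze(1), node_classes.unsqueeze(0)]  # (N, N, R)
    node_feat = torch.cat([node_init, all_prior.mean(dim=1)], dim=-1)
    return node_feat, edge_feat

# Toy usage: 4 nodes of 2 possible classes, 3 edges, 5 relationship types.
node_init, edge_init = torch.randn(4, 128), torch.randn(3, 256)
node_classes = torch.tensor([0, 1, 1, 0])
rel_prior = torch.rand(2, 2, 5)
nf, ef = embed_relation_prior(node_init, edge_init, torch.tensor([[0, 1, 2], [1, 2, 3]]),
                              node_classes, rel_prior)
print(nf.shape, ef.shape)  # torch.Size([4, 133]) torch.Size([3, 261])
```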
6. The sparse representation based scene graph generation method of claim 1, wherein the attention weights of the nodes and edges comprise attention weights between nodes, attention weights between nodes and edges, and attention weights of edges;
the attention weight between nodes is:
$\alpha_{ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W v_i \,;\, W v_j \,;\, W_e e_{ij}\right]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[W v_i \,;\, W v_k \,;\, W_e e_{ik}\right]\right)\right)}$
where $\alpha_{ij}$ is the attention weight between node $v_i$ and node $v_j$, $\mathcal{N}(i)$ is the set of nodes having connecting edges with node $v_i$, $\mathbf{a}$, $W$ and $W_e$ are weight parameters to be learned, $v_i$ and $v_j$ are the node features of node $v_i$ and its adjacent node $v_j$, and $e_{ij}$ is the edge feature of their connecting edge;
the attention weight between a node and an edge is:
$\beta_{i,ij} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{b}^{\top}\left[W v_i \,;\, W_e e_{ij}\right]\right)\right)}{\sum_{k \in \mathcal{N}(i)} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{b}^{\top}\left[W v_i \,;\, W_e e_{ik}\right]\right)\right)}$
where $\beta_{i,ij}$ is the attention weight between node $v_i$ and edge $e_{ij}$, and $\mathbf{b}$ is a weight parameter to be learned;
the attention weights of the edge are:
$\gamma_{ij}^{i} = \dfrac{\exp\left(\mathrm{LeakyReLU}\left(\mathbf{c}^{\top}\left[W_e e_{ij} \,;\, W v_i\right]\right)\right)}{\sum_{k \in \{i, j\}} \exp\left(\mathrm{LeakyReLU}\left(\mathbf{c}^{\top}\left[W_e e_{ij} \,;\, W v_k\right]\right)\right)}$
with $\gamma_{ij}^{j}$ defined analogously, where $\gamma_{ij}^{i}$ and $\gamma_{ij}^{j}$ are the attention weights of node $v_i$ and node $v_j$ on their connecting edge $e_{ij}$, and $\mathbf{c}$ is a weight parameter to be learned.
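The three attention weights of claim 6 might be computed as in the following sketch, which normalises a learnable LeakyReLU score over each node's neighbourhood in the style of a graph attention network; the scoring functions, hidden width and normalisation choices are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class SparseGraphAttention(nn.Module):
    """Hypothetical computation of the three attention weights of claim 6:
    node-node (alpha), node-edge (beta) and edge-endpoint (gamma)."""
    def __init__(self, node_dim=128, edge_dim=128, hid=128):
        super().__init__()
        self.Wn = nn.Linear(node_dim, hid, bias=False)   # learnable node projection
        self.We = nn.Linear(edge_dim, hid, bias=False)   # learnable edge projection
        self.score_nn = nn.Linear(3 * hid, 1)            # node-node score (node i, node j, edge ij)
        self.score_ne = nn.Linear(2 * hid, 1)            # node-edge score
        self.score_en = nn.Linear(2 * hid, 1)            # edge-endpoint score
        self.act = nn.LeakyReLU(0.2)

    def _softmax_per_node(self, scores, index, num_nodes):
        # Normalise exp(score) over all edges that share the same source node `index`.
        scores = scores.exp()
        denom = torch.zeros(num_nodes).index_add_(0, index, scores)
        return scores / (denom[index] + 1e-12)

    def forward(self, v, e, edge_index):
        src, dst = edge_index
        hv, he = self.Wn(v), self.We(e)
        # alpha_ij: attention between node i and its neighbour j.
        s_nn = self.act(self.score_nn(torch.cat([hv[src], hv[dst], he], -1))).squeeze(-1)
        alpha = self._softmax_per_node(s_nn, src, v.size(0))
        # beta_{i,ij}: attention between node i and its incident edge ij.
        s_ne = self.act(self.score_ne(torch.cat([hv[src], he], -1))).squeeze(-1)
        beta = self._softmax_per_node(s_ne, src, v.size(0))
        # gamma_ij^i, gamma_ij^j: attention of edge ij on its two endpoint nodes.
        s_i = self.act(self.score_en(torch.cat([he, hv[src]], -1)))
        s_j = self.act(self.score_en(torch.cat([he, hv[dst]], -1)))
        gamma = torch.softmax(torch.cat([s_i, s_j], -1), dim=-1)  # shape (E, 2)
        return alpha, beta, gamma

# Toy usage: 4 nodes, 3 edges.
att = SparseGraphAttention()
a, b, g = att(torch.randn(4, 128), torch.randn(3, 128), torch.tensor([[0, 1, 2], [1, 2, 3]]))
print(a.shape, b.shape, g.shape)  # (3,) (3,) (3, 2)
```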
7. The sparse representation-based scene graph generation method of claim 6, wherein the new node features are:
$\hat{v}_i = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, v_j + \sum_{j \in \mathcal{N}(i)} \beta_{i,ij}\, e_{ij}\right)$
where $\hat{v}_i$ is the new node feature, $\sigma$ is the sigmoid function, $\mathcal{N}(i)$ is the set of nodes having connecting edges with node $v_i$, $\alpha_{ij}$ is the attention weight between node $v_i$ and its adjacent node $v_j$, $e_{ij}$ is the edge feature of the connecting edge between node $v_i$ and its adjacent node $v_j$, and $\beta_{i,ij}$ is the attention weight between node $v_i$ and the connecting edge $e_{ij}$;
the new edge features are:
$\hat{e}_{ij} = \sigma\left(\gamma_{ij}^{i}\, v_i + \gamma_{ij}^{j}\, v_j\right)$
where $\hat{e}_{ij}$ is the new edge feature.
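A sketch of the update step of claim 7 follows, assuming node and edge features share a dimension and the aggregations are those reconstructed above; the exact update rule is an assumption.

```python
import torch

def update_features(v, e, edge_index, alpha, beta, gamma):
    """Hypothetical update step (claim 7): nodes aggregate attention-weighted
    neighbours and incident edges; edges aggregate their two weighted endpoints.
    Node and edge features are assumed to share the same dimension."""
    src, dst = edge_index
    # new v_i = sigmoid( sum_j alpha_ij * v_j + sum_j beta_{i,ij} * e_ij )
    agg_nodes = torch.zeros_like(v).index_add_(0, src, alpha.unsqueeze(-1) * v[dst])
    agg_edges = torch.zeros_like(v).index_add_(0, src, beta.unsqueeze(-1) * e)
    new_v = torch.sigmoid(agg_nodes + agg_edges)
    # new e_ij = sigmoid( gamma_ij^i * v_i + gamma_ij^j * v_j )
    new_e = torch.sigmoid(gamma[:, :1] * v[src] + gamma[:, 1:] * v[dst])
    return new_v, new_e

# Toy usage reusing the shapes from the attention sketch above.
v, e = torch.randn(4, 128), torch.randn(3, 128)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
alpha, beta, gamma = torch.rand(3), torch.rand(3), torch.softmax(torch.rand(3, 2), dim=-1)
print(*[t.shape for t in update_features(v, e, edge_index, alpha, beta, gamma)])
```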
8. A sparse representation-based scene graph generation system, comprising:
a target region extraction module, configured to perform target detection on an original image through a fast region-based convolutional neural network to obtain a target region set;
a sparse graph construction module, configured to classify all edges between target pairs as foreground edges or background edges through a preset relation measurement network and construct a sparse graph; the sparse graph construction module comprises:
a multi-feature extraction unit, configured to acquire the category feature, spatial feature and appearance feature of each target;
a binary classification unit, configured to classify the edge of each target pair according to the category features, spatial features and appearance features of the two targets in the target pair and obtain a classification result;
a sparse graph generation unit, configured to select, according to the classification result, the top $K_1$ foreground edges and the top $K_2$ background edges, and construct a sparse graph comprising $N$ nodes and $K_1 + K_2$ edges;
a graph message passing module, configured to synchronously learn nodes and edges on the sparse graph through a feature fusion and updating strategy based on a graph attention neural network, and to identify target categories and target relationships; the graph message passing module comprises:
a node and edge feature generation unit, configured to fuse the appearance features, spatial features and category features of each node in the sparse graph and embed the prior statistical probability of the category relationship, so as to generate node features and edge features;
a weight learning unit, configured to acquire the attention weights of nodes and edges through the graph attention neural network;
a feature updating unit, configured to update the node features and the edge features according to the attention weights of the nodes and edges, and to classify targets and relationships according to the new node features and the new edge features;
and a scene graph generation module, configured to generate a scene graph according to the identified target categories and relationships.
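Finally, the sparse graph generation unit of claim 8 can be illustrated with a top-K selection sketch; treating the lowest-probability pairs as background edges and the specific values of $K_1$ and $K_2$ are assumptions made for this example.

```python
import torch

def build_sparse_graph(edge_probs, pair_index, k_fg=64, k_bg=16):
    """Hypothetical sparse graph generation: keep the K1 target pairs with the
    highest edge probability as foreground edges and the K2 pairs with the
    lowest probability as background edges."""
    order = torch.argsort(edge_probs, descending=True)
    fg = pair_index[:, order[:k_fg]]                     # top-K1 foreground edges
    bg = pair_index[:, order[-k_bg:]] if k_bg > 0 else pair_index[:, :0]
    edge_index = torch.cat([fg, bg], dim=1)              # K1 + K2 edges in total
    labels = torch.cat([torch.ones(fg.size(1)), torch.zeros(bg.size(1))])
    return edge_index, labels

# Toy usage: 4 targets give 12 ordered pairs; keep 3 foreground and 2 background edges.
pairs = torch.tensor([(i, j) for i in range(4) for j in range(4) if i != j]).t()
probs = torch.rand(pairs.size(1))
print(build_sparse_graph(probs, pairs, k_fg=3, k_bg=2))
```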
CN202110497553.2A 2021-05-08 2021-05-08 Scene graph generation method and system based on sparse representation Active CN112990202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110497553.2A CN112990202B (en) 2021-05-08 2021-05-08 Scene graph generation method and system based on sparse representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110497553.2A CN112990202B (en) 2021-05-08 2021-05-08 Scene graph generation method and system based on sparse representation

Publications (2)

Publication Number Publication Date
CN112990202A CN112990202A (en) 2021-06-18
CN112990202B (en) 2021-08-06

Family

ID=76337256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110497553.2A Active CN112990202B (en) 2021-05-08 2021-05-08 Scene graph generation method and system based on sparse representation

Country Status (1)

Country Link
CN (1) CN112990202B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113836339B (en) * 2021-09-01 2023-09-26 淮阴工学院 Scene graph generation method based on global information and position embedding
CN115546626B (en) * 2022-03-03 2024-02-02 中国人民解放军国防科技大学 Data double imbalance-oriented depolarization scene graph generation method and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10452923B2 (en) * 2017-11-28 2019-10-22 Visual Semantics, Inc. Method and apparatus for integration of detected object identifiers and semantic scene graph networks for captured visual scene behavior estimation
US10909401B2 (en) * 2018-05-29 2021-02-02 Sri International Attention-based explanations for artificial intelligence behavior
CN108920711B (en) * 2018-07-25 2021-09-24 中国人民解放军国防科技大学 Deep learning label data generation method oriented to unmanned aerial vehicle take-off and landing guide
CN110991532B (en) * 2019-12-03 2022-03-04 西安电子科技大学 Scene graph generation method based on relational visual attention mechanism
CN112085124B (en) * 2020-09-27 2022-08-09 西安交通大学 Complex network node classification method based on graph attention network

Also Published As

Publication number Publication date
CN112990202A (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant