CN114677544A - Scene graph generation method, system and equipment based on global context interaction - Google Patents

Scene graph generation method, system and equipment based on global context interaction

Info

Publication number
CN114677544A
Authority
CN
China
Prior art keywords
target
feature
global
vector
gru
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210297025.7A
Other languages
Chinese (zh)
Other versions
CN114677544B (en)
Inventor
罗敏楠
杨名帆
郑庆华
董怡翔
刘欢
秦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202210297025.7A priority Critical patent/CN114677544B/en
Publication of CN114677544A publication Critical patent/CN114677544A/en
Application granted granted Critical
Publication of CN114677544B publication Critical patent/CN114677544B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a scene graph generation method, system and device based on global context interaction, comprising: 1) joint vector representation based on the fusion of multiple features such as object visual features, spatial coordinates and semantic labels; 2) global feature generation based on a bidirectional gated recurrent neural network; 3) an iterative message-passing mechanism based on global feature vectors; and 4) scene graph generation based on target and relation state representations. Compared with existing scene graph generation methods, the scene graph generation method based on global context interaction disclosed in the present invention makes full use of the global features of the image through context interaction and is more widely applicable; meanwhile, after the global features produced by context interaction are obtained, messages are passed between target pairs and their relations, and the existing states are updated using the latent connections between targets, producing a more accurate scene graph, which is advantageous in practical applications.

Description

A method, system and device for generating a scene graph based on global context interaction

Technical Field

The invention belongs to the field of computer vision, and in particular relates to a scene graph generation method, system and device based on global context interaction.

Background Art

A scene graph composed of <subject-relation-object> triples can describe the objects in an image and the scene-structure relationships between object pairs. Scene graphs have two main advantages. First, the <subject-relation-object> triples of a scene graph carry structured semantic content, which gives them a clear advantage over natural-language text for fine-grained information acquisition and processing. Second, a scene graph can fully represent the objects in an image and the structural relationships of the scene, and therefore has broad application prospects in many computer vision tasks: in autonomous driving, modelling the environment with scene graphs can provide the decision-making system with more comprehensive environmental information; in semantic image retrieval, an image provider can model the scene-structure relationships of images with scene graphs, so that a user only needs to describe the main targets or relations in order to retrieve images that match the requirement. Given the massive number of images and the real-time requirements that downstream tasks place on scene graphs, generating scene graphs automatically with computers has gradually become a research hotspot and is of great significance to the field of image understanding.
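As a concrete illustration of this data structure (not taken from the patent), a scene graph of this kind can be held as a plain list of <subject-relation-object> triples; the following minimal Python sketch uses made-up object and relation names:

    # Hypothetical scene graph represented as <subject-relation-object> triples;
    # all object and relation names below are illustrative, not from the patent.
    scene_graph = [
        ("person", "riding", "horse"),
        ("horse", "standing on", "grass"),
        ("person", "wearing", "hat"),
    ]

    # Example query: every relation whose subject is "person".
    person_relations = [(rel, obj) for subj, rel, obj in scene_graph if subj == "person"]
    print(person_relations)  # [('riding', 'horse'), ('wearing', 'hat')]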

Existing message-passing-based scene graph generation methods construct target nodes and relation edges from the results of object detection and, based on a message-passing mechanism, use recurrent neural networks to update states within local subgraphs, using the features obtained after message passing for relation prediction. Such methods adopt a message-passing mechanism built on the idea of local context and ignore the implicit constraints among targets: only the visual features of the target nodes serve as the initial state, and the detection of a relation depends solely on repeated exchanges between its subject/object node features and the joint visual feature. The model therefore cannot take the overall structure of the image into account, global information plays no role in relation prediction, and the predictive ability of the model is limited. In addition, existing methods fail to exploit object coordinates and do not analyse the visual relations between targets from a spatial perspective. To address these problems, the present invention proposes a scene graph generation method based on global context interaction. Regarding existing scene graph generation methods:

Prior art 1 proposes an image scene graph generation method that divides relations into parent classes and subclasses, performs a dual relation prediction, and uses a normalization function to determine the exact relation, thereby generating the scene graph of the image.

Prior art 2 proposes a scene graph generation method based on a deep relational self-attention network. The method mainly comprises: first, performing object detection on the input image to obtain labels, object bounding-box features and joint bounding-box features; then, constructing target features and relative-relation features; finally, using a deep neural network to generate the final visual scene graph.

The scene graph generation method of prior art 1 does not consider making full use of the feature vectors through feature fusion; the method of prior art 2 does not use a message-passing mechanism, does not consider the information interaction between a target pair and its relation, and cannot update states after context passing. Moreover, neither method uses the implicit constraints that exist among all targets in the image to construct the context, so both have certain shortcomings.

Summary of the Invention

The purpose of the present invention is to provide a scene graph generation method, system and device based on global context interaction, so as to solve the above problems.

To achieve the above object, the present invention adopts the following technical solutions:

Compared with the prior art, the present invention has the following technical effects:

Compared with feature representation methods that use only visual features to represent a target, the present invention makes full use of the target's visual features, category features and spatial coordinate information, so that information is exploited more fully and the relation prediction performance of scene graph generation is improved;

Compared with scene graph generation methods that use local context interaction, the present invention uses recurrent neural networks to extract the global context of the image, realises information interaction based on the global context and then performs message passing, fully achieving data interaction and information expansion.

Brief Description of the Drawings

FIG. 1 is a block diagram of the scene graph generation method based on global context interaction according to the present invention.

FIG. 2 is a flowchart of the joint vector representation based on feature fusion.

FIG. 3 is a structural diagram of the bidirectional gated recurrent neural network BiGRU.

FIG. 4 is a flowchart of the iterative message-passing mechanism based on global feature vectors.

FIG. 5 is a schematic diagram of object detection results and the corresponding scene graph.

FIG. 6 shows the performance test results of the present invention.

Detailed Description of the Embodiments

The embodiments of the present invention are described in detail below with reference to the accompanying drawings and examples. It should be noted that the embodiments described here are only used to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the present invention may be combined with each other as long as they do not conflict.

The specific implementation of the present invention comprises object detection and feature-vector fusion on the image, feature generation based on global context interaction, and the message-passing process. FIG. 1 is a block diagram of the scene graph generation method based on global context interaction according to the present invention.

1. Object detection and feature-vector fusion on the image

Given an input image, the present invention uses the Faster-RCNN deep learning model for object detection, obtaining the target set O=(o_1, o_2, …, o_n), the corresponding visual feature set V=(v_1, v_2, …, v_n), the coordinate feature set B=(b_1, b_2, …, b_n), the pre-classification label set L=(l_1, l_2, …, l_n), and the visual features C=(c_{i→j}, i≠j) inside the union box of each pair of target boxes.

First, the present invention uses a feature fusion method to jointly represent the spatial coordinate feature b_i and the visual feature vector v_i of each target. For a target o_i, its absolute position coordinates are b=(x_1, y_1, x_2, y_2), where x_1, y_1, x_2, y_2 are the upper-left and lower-right coordinates of its rectangular regression box; the present invention converts them into a relative position encoding b_i within the image using the following formula:

(formula image in the original: the absolute box coordinates are normalized by the image width wid and height hei to obtain the relative position encoding b_i)

where wid is the original width of image I and hei is its original height.

Then, a fully connected layer of the neural network expands the relative position encoding b_i into a 128-dimensional feature s_i:

s_i = σ(W_s b_i + b_s),

where σ is the ReLU activation function and W_s and b_s are linear transformation parameters learned and adjusted by the neural network itself. At the same time, the method uses a fully connected layer to convert the target visual feature v_i from 4096 dimensions to 512 dimensions.

Subsequently, the present invention concatenates the dimension-transformed relative position feature vector s_i and the visual feature v_i and transforms the result, obtaining the 512-dimensional fusion vector f_i of target vision and coordinate features; the calculation is as follows:

f_i = σ(W_f [s_i, v_i] + b_f),

where [·] denotes the concatenation operation, σ is the ReLU activation function, and W_f and b_f are linear transformation parameters.

The above feature-vector fusion process is shown in FIG. 2.
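For readers who prefer code, the following PyTorch-style sketch mirrors the fusion step above. Only the dimensions stated in the text (4 coordinates to 128, 4096 to 512, and a 512-dimensional fused vector) come from the patent; the module and parameter names, and the exact form of the relative position encoding (whose formula is an image in the original), are assumptions:

    import torch
    import torch.nn as nn

    class TargetFeatureFusion(nn.Module):
        # Sketch of the joint representation f_i of coordinates and visual features.
        def __init__(self):
            super().__init__()
            self.coord_fc = nn.Linear(4, 128)          # s_i = ReLU(W_s b_i + b_s)
            self.visual_fc = nn.Linear(4096, 512)      # 4096-d visual feature -> 512-d
            self.fuse_fc = nn.Linear(128 + 512, 512)   # f_i = ReLU(W_f [s_i, v_i] + b_f)
            self.relu = nn.ReLU()

        def forward(self, boxes, visual, wid, hei):
            # boxes: (n, 4) absolute (x1, y1, x2, y2); visual: (n, 4096).
            # Assumed relative position encoding: normalize by image width/height.
            scale = torch.tensor([wid, hei, wid, hei], dtype=boxes.dtype)
            s = self.relu(self.coord_fc(boxes / scale))
            v = self.visual_fc(visual)
            return self.relu(self.fuse_fc(torch.cat([s, v], dim=-1)))

    # Usage with stand-in detections: 5 targets in an 800x600 image.
    fusion = TargetFeatureFusion()
    f = fusion(torch.rand(5, 4) * 600, torch.rand(5, 4096), wid=800, hei=600)
    print(f.shape)  # torch.Size([5, 512])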

2. Global feature generation based on a bidirectional gated recurrent neural network

In the global feature generation process, the present invention constructs a bidirectional gated recurrent neural network (BiGRU) and uses zero vectors as its initial state; its structure is shown in FIG. 3. After the feature fusion vectors F=(f_1, f_2, …, f_n) of the target set are obtained, they are sorted from left to right by the first item of the relative coordinates (the x coordinate) and fed into the BiGRU in that order, yielding the global-context target features γ=(γ_1, γ_2, …, γ_n). The specific generation steps are:

(1) Initialize zero vectors as the initial state of the BiGRU;

(2) At the two ends of the BiGRU, input the first and last feature fusion vectors of the target set, f_0 and f_n, respectively, generating hidden states in the corresponding direction and order (formula image in the original);

(3) Input the remaining feature vectors into the two ends of the BiGRU in order, generating the forward and backward hidden states (formula image in the original);

(4) Fuse the forward and backward hidden states to obtain the context-fused state γ_i of each target.

Subsequently, the present invention uses GloVe word-embedding vectors to convert the pre-classification results L=(l_1, l_2, …, l_n) obtained during object detection into 128-dimensional target category feature vectors g_i.

Finally, the present invention uses a fully connected layer of the neural network to fuse the global-context target feature γ_i of each target with its category feature vector g_i, obtaining the global feature c_i of the target. The above calculation is shown in the following formulas:

g_i = Glove(l_i),

c_i = σ(W_c [γ_i, g_i] + b_c),

where Glove(l_i) denotes encoding the pre-classification label of the target with GloVe, [·] denotes the concatenation operation, and W_c and b_c are linear transformation parameters.
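The sketch below illustrates this step with PyTorch's bidirectional nn.GRU. The 512- and 128-dimensional sizes and the left-to-right ordering by x coordinate follow the text; the embedding layer is a stand-in for the GloVe lookup, and the class count, layer names and the exact fusion of the two BiGRU directions are assumptions:

    import torch
    import torch.nn as nn

    class GlobalContextEncoder(nn.Module):
        # Sketch: BiGRU over x-sorted fused features, fused with label embeddings.
        def __init__(self, num_classes, feat_dim=512, embed_dim=128):
            super().__init__()
            self.bigru = nn.GRU(feat_dim, feat_dim, bidirectional=True, batch_first=True)
            self.gamma_fc = nn.Linear(2 * feat_dim, feat_dim)        # gamma_i from both directions
            self.label_embed = nn.Embedding(num_classes, embed_dim)  # stand-in for Glove(l_i)
            self.fuse_fc = nn.Linear(feat_dim + embed_dim, feat_dim) # c_i = ReLU(W_c [gamma_i, g_i] + b_c)
            self.relu = nn.ReLU()

        def forward(self, fused_feats, boxes, labels):
            # fused_feats: (n, 512); boxes: (n, 4) relative coordinates; labels: (n,) class ids.
            order = torch.argsort(boxes[:, 0])                  # sort targets left to right by x
            h, _ = self.bigru(fused_feats[order].unsqueeze(0))  # zero initial state by default
            gamma = self.relu(self.gamma_fc(h.squeeze(0)))
            g = self.label_embed(labels[order])
            c = self.relu(self.fuse_fc(torch.cat([gamma, g], dim=-1)))
            return c[torch.argsort(order)]                      # restore the original order

    encoder = GlobalContextEncoder(num_classes=151)
    c = encoder(torch.rand(5, 512), torch.rand(5, 4), torch.randint(0, 151, (5,)))
    print(c.shape)  # torch.Size([5, 512])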

3. Iterative message-passing mechanism based on global feature vectors

The iterative message-passing mechanism consists of two parts: a message aggregation function and a state update function.

First, the present invention constructs the message aggregation function. In the scene graph topology, nodes and edges represent, respectively, the subject/object targets of a visual relation and the relation itself. During message passing, a single node or edge receives information from multiple sources at the same time, so a pooling function must be designed to compute the weight of each part of the message and aggregate the final incoming message as their weighted sum. Depending on the receiver, the incoming messages are divided into messages received by target nodes, denoted m_i^t, and messages received by relation edges, denoted m_{i→j}^t.

Given the hidden states of the current node GRU and of the relation-edge GRUs, denoted h_i^t and h_{i→j}^t, the message passed into the i-th node at the t-th iteration, m_i^t, is computed from the target GRU's own hidden state h_i^t, the hidden states h_{i→j}^t of its out-degree edge GRUs and the hidden states h_{j→i}^t of its in-degree edge GRUs, where i→j means that in this relation target i is the subject and target j is the object (the pooling formula appears as an image in the original).

Similarly, at the t-th iteration, the aggregated message m_{i→j}^t for the relation edge from the i-th target node to the j-th target node is composed of the hidden state of the relation-edge GRU from the previous iteration, the hidden state of the subject-node GRU and the hidden state of the object-node GRU. The node message m_i^t and the edge message m_{i→j}^t are obtained by the following adaptive weighting function (formula image in the original),

where [·] denotes the concatenation operation, σ is the ReLU activation function, and w_1, w_2 and v_1, v_2 are learnable parameters.
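Because the pooling formulas themselves appear only as images in the published text, the following sketch is an assumed reconstruction of the adaptive weighting: scalar weights are computed from concatenated hidden states by small learnable projections standing in for w1, w2, v1, v2, and the incoming message is their weighted sum. Everything beyond that structure, including the use of a sigmoid for the weights, is an assumption:

    import torch
    import torch.nn as nn

    class MessageAggregator(nn.Module):
        # Hypothetical adaptive-weighted pooling of incoming messages.
        def __init__(self, dim=512):
            super().__init__()
            self.v1 = nn.Linear(2 * dim, 1)  # weight of out-degree edge states in node messages
            self.v2 = nn.Linear(2 * dim, 1)  # weight of in-degree edge states in node messages
            self.w1 = nn.Linear(2 * dim, 1)  # weight of the subject state in edge messages
            self.w2 = nn.Linear(2 * dim, 1)  # weight of the object state in edge messages

        def node_message(self, h_i, h_out, h_in):
            # h_i: (dim,) node state; h_out/h_in: (k, dim) states of out-/in-degree edge GRUs.
            a_out = torch.sigmoid(self.v1(torch.cat([h_i.expand_as(h_out), h_out], dim=-1)))
            a_in = torch.sigmoid(self.v2(torch.cat([h_i.expand_as(h_in), h_in], dim=-1)))
            return (a_out * h_out).sum(0) + (a_in * h_in).sum(0)

        def edge_message(self, h_ij, h_subj, h_obj):
            # h_ij, h_subj, h_obj: (dim,) states of the edge, subject and object GRUs.
            a_s = torch.sigmoid(self.w1(torch.cat([h_subj, h_ij], dim=-1)))
            a_o = torch.sigmoid(self.w2(torch.cat([h_obj, h_ij], dim=-1)))
            return a_s * h_subj + a_o * h_obj

    agg = MessageAggregator()
    m_i = agg.node_message(torch.rand(512), torch.rand(3, 512), torch.rand(2, 512))
    print(m_i.shape)  # torch.Size([512])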

Second, the present invention constructs the state update function. A target-node GRU and a relation-edge GRU are constructed to store and update the feature vectors of the targets and of the relations between targets. First, at t=0, the GRU state of every target node and relation edge is initialized to a zero vector; the global feature vector c_i of each target is used as the input of its target-node GRU, and the visual feature c_{i→j} inside the union box of each pair of target boxes is used as the input of the corresponding relation-edge GRU, generating the hidden states h_i^0 and h_{i→j}^0 of the target nodes and relation edges at the initial moment.

In subsequent iterations, at each iteration t, each GRU, depending on whether it is a target GRU or a relation GRU, takes its hidden state from the previous iteration, h_i^t or h_{i→j}^t, together with the incoming message of the previous iteration, m_i^t or m_{i→j}^t, as input, and produces a new hidden state h_i^{t+1} or h_{i→j}^{t+1} as output, which the message aggregation function uses to generate the messages of the next iteration:

h_i^{t+1} = GRU(h_i^t, m_i^t),

h_{i→j}^{t+1} = GRU(h_{i→j}^t, m_{i→j}^t).

Therefore, the specific steps of the entire message-passing mechanism are:

(1) Initialize the GRU state of every target node and relation edge to a zero vector;

(2) Use the global feature vector c_i of each target as the input of its target-node GRU and the visual feature c_{i→j} inside the union box of each pair of target boxes as the input of the corresponding relation-edge GRU, generating the initial hidden states h_i^0 and h_{i→j}^0 of the target nodes and relation edges;

(3) Use the message aggregation function to compute the message m_i^t received by each target and the message m_{i→j}^t received by each relation;

(4) Combine the hidden states h_i^t and h_{i→j}^t with the received messages m_i^t and m_{i→j}^t, and update the states with the GRUs to obtain the states h_i^{t+1} and h_{i→j}^{t+1} of the next step;

(5) If the number of iterations reaches the set number, save the current states of the targets and relations; otherwise, return to step (3).

The flow of the above message-passing mechanism is shown in FIG. 4.
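A compact sketch of steps (1) to (5) is given below, using PyTorch GRUCell modules for the node and edge state updates and the hypothetical MessageAggregator from the previous sketch; the number of iterations, the tensor names and the edge indexing scheme are assumptions:

    import torch
    import torch.nn as nn

    def message_passing(c_nodes, c_edges, pairs, aggregator, n_iters=2, dim=512):
        # c_nodes: (n, dim) global target features c_i; c_edges: (m, dim) union-box
        # features c_{i->j}; pairs: list of (subject, object) index pairs, one per edge.
        node_gru = nn.GRUCell(dim, dim)
        edge_gru = nn.GRUCell(dim, dim)
        n, m = c_nodes.size(0), c_edges.size(0)

        # Steps (1)-(2): zero initial states, then feed the inputs once to get h^0.
        h_node = node_gru(c_nodes, torch.zeros(n, dim))
        h_edge = edge_gru(c_edges, torch.zeros(m, dim))

        for _ in range(n_iters):
            # Step (3): aggregate the incoming message of every node and edge.
            msg_node = torch.stack([
                aggregator.node_message(
                    h_node[i],
                    h_edge[[k for k, (s, _) in enumerate(pairs) if s == i]],
                    h_edge[[k for k, (_, o) in enumerate(pairs) if o == i]])
                for i in range(n)])
            msg_edge = torch.stack([
                aggregator.edge_message(h_edge[k], h_node[s], h_node[o])
                for k, (s, o) in enumerate(pairs)])
            # Step (4): GRU state update with the aggregated messages as inputs.
            h_node = node_gru(msg_node, h_node)
            h_edge = edge_gru(msg_edge, h_edge)
        # Step (5): the final states are used for classification.
        return h_node, h_edge

    # Usage with stand-in tensors: 4 targets, all ordered pairs as candidate edges.
    pairs = [(i, j) for i in range(4) for j in range(4) if i != j]
    h_n, h_e = message_passing(torch.rand(4, 512), torch.rand(len(pairs), 512),
                               pairs, MessageAggregator())
    print(h_n.shape, h_e.shape)  # torch.Size([4, 512]) torch.Size([12, 512])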

4. Scene graph generation based on target and relation state representations

The target and relation hidden states updated by the message-passing mechanism are taken as the feature vectors of the targets and relations and fed into a neural network; the softmax function is used to predict the category of each target and the relation category of each target pair, from which a scene graph reflecting the targets in the image and the relations between them is obtained.
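A hedged sketch of this final classification step is shown below; the class counts and layer names are placeholders, not values from the patent:

    import torch
    import torch.nn as nn

    # Hypothetical classification heads over the final hidden states.
    dim, num_obj_classes, num_rel_classes = 512, 151, 51
    obj_head = nn.Linear(dim, num_obj_classes)
    rel_head = nn.Linear(dim, num_rel_classes)

    def predict_scene_graph(h_node, h_edge, pairs):
        # h_node: (n, dim) final target states; h_edge: (m, dim) final relation states.
        obj_labels = torch.softmax(obj_head(h_node), dim=-1).argmax(-1)
        rel_labels = torch.softmax(rel_head(h_edge), dim=-1).argmax(-1)
        # Each candidate edge (i, j) yields one <subject-relation-object> triple.
        return [(int(obj_labels[i]), int(rel_labels[k]), int(obj_labels[j]))
                for k, (i, j) in enumerate(pairs)]

    # Usage: 3 targets, 2 candidate relations, returned as class-index triples.
    triples = predict_scene_graph(torch.rand(3, dim), torch.rand(2, dim), [(0, 1), (1, 2)])
    print(triples)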

Given an input image, the object detection results and the corresponding scene graph are shown schematically in FIG. 5, and the performance test results of this model are shown in FIG. 6.

In yet another embodiment of the present invention, a scene graph generation system based on global context interaction is provided, which can be used to implement the above scene graph generation method based on global context interaction. Specifically, the system comprises:

an object detection module, configured to perform object detection on the input image I to obtain its target set O=(o_1, o_2, …, o_n) and the corresponding visual feature set V=(v_1, v_2, …, v_n), coordinate feature set B=(b_1, b_2, …, b_n), pre-classification label set L=(l_1, l_2, …, l_n), and visual features C=(c_{i→j}, i≠j) inside the union box of each pair of target boxes;

a joint representation vector acquisition module for target vision and coordinate features, configured to convert the absolute position coordinates of each target using a neural network to obtain the joint representation vector f_i of the target's vision and coordinate features;

a target global feature acquisition module, configured to obtain the global-context target feature γ_i and its category feature vector g_i from the feature fusion vectors F=(f_1, f_2, …, f_n), and to fuse the global-context target feature γ_i of each target with its category feature vector g_i using a neural network to obtain the global feature c_i of the target;

a scene graph acquisition module, configured to initialize the hidden states h_i^0 and h_{i→j}^0 based on the global feature vector c_i of each target and the feature vector c_{i→j} of each relation, to compute the initial incoming message m_i^t of each node and m_{i→j}^t of each edge, and to perform iterative passing, updating the hidden states h_i^{t+1} and h_{i→j}^{t+1} with recurrent neural networks and aggregating messages to obtain the incoming messages of each step, until the set number of iterations is reached; and then to generate, from the final states of the target nodes and relation edges, a scene graph that reflects the targets in the image and the relations between them.

The division into modules in the embodiments of the present invention is schematic and is only a division by logical function; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of the present invention may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in the form of hardware or in the form of software functional modules.

In yet another embodiment of the present invention, a computer device is provided. The computer device comprises a processor and a memory, the memory is used to store a computer program, the computer program comprises program instructions, and the processor is used to execute the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computing core and control core of the terminal and is adapted to implement one or more instructions, in particular to load and execute one or more instructions in a computer storage medium so as to realise the corresponding method flow or function. The processor described in the embodiments of the present invention can be used to carry out the operations of the scene graph generation method based on global context interaction.

The invention discloses a scene graph generation method based on global context interaction, comprising: 1) joint vector representation based on the fusion of multiple features such as object visual features, spatial coordinates and semantic labels; 2) global feature generation based on a bidirectional gated recurrent neural network; 3) an iterative message-passing mechanism based on global feature vectors; and 4) scene graph generation based on target and relation state representations. Compared with existing scene graph generation methods, the scene graph generation method based on global context interaction disclosed in the present invention makes full use of the global features of the image through context interaction and is more widely applicable; meanwhile, after the global features produced by context interaction are obtained, messages are passed between target pairs and their relations, and the existing states are updated using the latent connections between targets, producing a more accurate scene graph, which is advantageous in practical applications.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may still be made to the specific embodiments of the present invention, and any modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A scene graph generation method based on global context interaction, characterized by comprising:
carrying out object detection on an input image I to obtain a target set O=(o_1, o_2, …, o_n), a corresponding visual feature set V=(v_1, v_2, …, v_n), a coordinate feature set B=(b_1, b_2, …, b_n), a pre-classification label set L=(l_1, l_2, …, l_n), and visual features C=(c_{i→j}, i≠j) inside the union box of each pair of target coordinates;
converting the absolute position coordinates of each target by means of a neural network to obtain a joint representation vector f_i of the target's vision and coordinate features;
obtaining, from the feature fusion vectors F=(f_1, f_2, …, f_n), the global-context target feature γ_i and its category feature vector g_i, and fusing the global-context target feature γ_i with its category feature vector g_i using a neural network to obtain the global feature c_i of the target;
based on the global feature vector c_i of each target and the feature vector c_{i→j} of each relation, initializing the hidden states h_i^0 and h_{i→j}^0, initially computing the incoming message m_i^t of each node and the incoming message m_{i→j}^t of each edge, performing iterative passing, updating the hidden states h_i^{t+1} and h_{i→j}^{t+1} with recurrent neural networks, and aggregating messages to obtain the incoming messages of each step, until the set number of iterations is reached; and generating, from the final states of the target nodes and relation edges, a scene graph capable of reflecting the relations between the targets in the image.
2. The method as claimed in claim 1, wherein the neural network is used to convert the absolute position coordinates of each target into a relative position encoding in the image and to expand the relative position encoding into a relative position feature s_i; the visual feature v_i of the target is converted into 512 dimensions; and a feature fusion method is adopted to concatenate and transform the relative position feature vector s_i and the visual feature v_i, obtaining the joint representation vector f_i of the target's vision and coordinate features.
3. The method as claimed in claim 2, wherein, in the feature-fusion-based joint vector representation, after object detection is performed on the input image I using the Faster-RCNN model, the absolute position coordinates of each target are converted into the relative position encoding b_i in the image: for a target o_i with coordinates (x_1, y_1, x_2, y_2), where x_1, y_1, x_2, y_2 represent the upper-left and lower-right coordinates of its rectangular regression box, the relative position encoding is calculated as (formula image in the original),
wherein wid represents the original width of the image I and hei represents its original height; then, the relative position encoding b_i is expanded into a 128-dimensional feature s_i using a fully connected layer:
s_i = σ(W_s b_i + b_s),
wherein σ represents the ReLU activation function and W_s and b_s are linear transformation parameters learned and adjusted automatically by the neural network; meanwhile, the visual feature v_i of the detected target is dimension-transformed in the same way, the 4096-dimensional feature being converted into a 512-dimensional feature by a fully connected layer; then, the dimension-transformed relative position feature vector s_i and the visual feature v_i are concatenated and transformed, finally obtaining the 512-dimensional fusion vector f_i of target vision and coordinate features, calculated as follows:
f_i = σ(W_f [s_i, v_i] + b_f),
wherein [·] represents the concatenation operation, σ represents the ReLU activation function, and W_f and b_f are linear transformation parameters.
4. The method of claim 1, wherein the global-context target features γ=(γ_1, γ_2, …, γ_n) are obtained from the feature fusion vectors F=(f_1, f_2, …, f_n) by using a bidirectional gated recurrent neural network (BiGRU); the classification results L=(l_1, l_2, …, l_n) obtained by the object detection module are used to obtain the category feature vector g_i of each target; and a neural network is used to fuse the global-context target feature γ_i of each target with its category feature vector g_i to obtain the global feature c_i of the target.
5. The method as claimed in claim 4, wherein, in the global feature generation process based on the bidirectional gated recurrent neural network, after the feature fusion vectors F=(f_1, f_2, …, f_n) of the target set are obtained, they are sorted from left to right according to the x coordinate of the relative coordinates and input in that order into the bidirectional gated recurrent neural network BiGRU, realising the global context interaction and obtaining the global-context target features γ=(γ_1, γ_2, …, γ_n);
subsequently, the classification results L=(l_1, l_2, …, l_n) of the object detection are used to compute the GloVe word-embedding vectors of the classification labels, obtaining the 128-dimensional target category feature vector g_i; finally, the global-context target feature γ_i of each target is fused with its category feature vector g_i to obtain the global feature c_i of the target, the above calculation being as shown in the formulas:
g_i = Glove(l_i),
c_i = σ(W_c [γ_i, g_i] + b_c),
wherein Glove(l_i) represents encoding the pre-classification label of the target with GloVe, [·] represents the concatenation operation, and W_c and b_c are linear transformation parameters.
6. The method as claimed in claim 5, wherein the specific steps for generating γ_i are as follows:
(1) initializing zero vectors as the initial state of the BiGRU;
(2) at the two ends of the BiGRU, inputting the first and last feature fusion vectors of the target set, f_0 and f_n, respectively, to generate hidden states in the corresponding direction and order;
(3) inputting the remaining feature vectors into the two ends of the BiGRU in order to generate the forward and backward hidden states;
(4) fusing the forward and backward hidden states to obtain the context-fused state γ_i of each target.
7. The method for generating a scene graph based on global context interaction according to claim 1, wherein the iterative message-passing mechanism based on global feature vectors comprises constructing two calculation functions, a message aggregation function and a state update function;
constructing the message aggregation function: given the hidden state h_i^t of the i-th target-node GRU and the hidden state h_{i→j}^t of the relation-edge GRU from the i-th target node to the j-th target node, the message passed into the i-th node at the t-th iteration, denoted m_i^t, is calculated from the target GRU's own hidden state h_i^t, the hidden states h_{i→j}^t of its out-degree edge GRUs and the hidden states h_{j→i}^t of its in-degree edge GRUs, where i→j means that target i is the subject and target j is the object in the relation (the pooling formula appears as an image in the original);
similarly, at the t-th iteration, the aggregated message m_{i→j}^t of the relation edge from the i-th target node to the j-th target node is composed of the hidden state of the relation-edge GRU corresponding to the previous iteration, the hidden state of the subject-node GRU and the hidden state of the object-node GRU; the node message m_i^t and the edge message m_{i→j}^t are obtained by the following adaptive weighting function (formula image in the original),
wherein [·] represents the concatenation operation, σ represents the ReLU activation function, and w_1, w_2 and v_1, v_2 are learnable parameters;
constructing the state update function: a target-node GRU and a relation-edge GRU are respectively constructed for the storage and update of the feature vectors of the targets and of the relations between targets; first, at t=0, the GRU state of each target node and relation edge is initialized to a zero vector, the global feature vector c_i of each target is used as the input of its target-node GRU, and the visual feature c_{i→j} inside the union box of each pair of target coordinates is used as the input of the corresponding relation-edge GRU, generating the hidden states h_i^0 and h_{i→j}^0 of the target nodes and relation edges at the initial moment;
in subsequent iterations, at each iteration t, each GRU, depending on whether it is a target GRU or a relation GRU, takes its hidden state from the previous iteration, h_i^t or h_{i→j}^t, and the incoming message of the previous iteration, m_i^t or m_{i→j}^t, as input, and generates a new hidden state h_i^{t+1} or h_{i→j}^{t+1} as output, which the message aggregation function uses to generate the messages of the next iteration:
h_i^{t+1} = GRU(h_i^t, m_i^t),
h_{i→j}^{t+1} = GRU(h_{i→j}^t, m_{i→j}^t).
8. The method for generating a scene graph based on global context interaction according to claim 1, wherein the iterative message-passing mechanism based on global feature vectors specifically comprises the following steps:
(1) initializing the GRU state of each target node and relation edge to a zero vector;
(2) using the global feature vector c_i of each target as the input of its target-node GRU and the visual feature c_{i→j} inside the union box of each pair of target coordinates as the input of the relation-edge GRU, respectively generating the hidden states h_i^0 and h_{i→j}^0 of the target nodes and relation edges at the initial moment;
(3) computing the messages m_i^t and m_{i→j}^t received by each target and relation using the message aggregation function;
(4) combining the hidden states h_i^t and h_{i→j}^t with the received messages m_i^t and m_{i→j}^t, and updating the states with the GRUs to obtain the states h_i^{t+1} and h_{i→j}^{t+1} of the next step;
(5) if the number of iterations reaches the set number, saving the current states of the targets and relations; otherwise, returning to step (3);
(6) after message passing is finished, feeding the final state vectors of the targets and relations into a neural network to obtain a scene graph capable of reflecting the relations between the targets in the image.
9. A scene graph generation system based on global context interaction, comprising:
an object detection module, configured to perform object detection on an input image I to obtain a target set O=(o_1, o_2, …, o_n), a corresponding visual feature set V=(v_1, v_2, …, v_n), a coordinate feature set B=(b_1, b_2, …, b_n), a pre-classification label set L=(l_1, l_2, …, l_n), and visual features C=(c_{i→j}, i≠j) inside the union box of each pair of target coordinates;
a joint representation vector acquisition module for target vision and coordinate features, configured to transform the absolute position coordinates of each target by means of a neural network to obtain the joint representation vector f_i of the target's vision and coordinate features;
a target global feature acquisition module, configured to obtain the global-context target feature γ_i and its category feature vector g_i from the feature fusion vectors F=(f_1, f_2, …, f_n), and to fuse the global-context target feature γ_i with its category feature vector g_i using a neural network to obtain the global feature c_i of the target;
a scene graph acquisition module, configured to initialize the hidden states h_i^0 and h_{i→j}^0 based on the global feature vector c_i of each target and the feature vector c_{i→j} of each relation, to initially compute the incoming message m_i^t of each node and the incoming message m_{i→j}^t of each edge, to perform iterative passing, updating the hidden states h_i^{t+1} and h_{i→j}^{t+1} with recurrent neural networks and aggregating messages to obtain the incoming messages of each step, until the set number of iterations is reached, and then to generate, from the final states of the target nodes and relation edges, a scene graph capable of reflecting the relations between the targets in the image.
10. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the scene graph generation method based on global context interaction according to any one of claims 1 to 8 when executing the computer program.
CN202210297025.7A 2022-03-24 2022-03-24 A scene graph generation method, system and device based on global context interaction Active CN114677544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210297025.7A CN114677544B (en) 2022-03-24 2022-03-24 A scene graph generation method, system and device based on global context interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210297025.7A CN114677544B (en) 2022-03-24 2022-03-24 A scene graph generation method, system and device based on global context interaction

Publications (2)

Publication Number Publication Date
CN114677544A true CN114677544A (en) 2022-06-28
CN114677544B CN114677544B (en) 2024-08-16

Family

ID=82073908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210297025.7A Active CN114677544B (en) 2022-03-24 2022-03-24 A scene graph generation method, system and device based on global context interaction

Country Status (1)

Country Link
CN (1) CN114677544B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546589A (en) * 2022-11-29 2022-12-30 浙江大学 An Image Generation Method Based on Graph Neural Network
CN118015522A (en) * 2024-03-22 2024-05-10 广东工业大学 Time transition regularization method and system for video scene graph generation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111462282A (en) * 2020-04-02 2020-07-28 哈尔滨工程大学 Scene graph generation method
WO2020244287A1 (en) * 2019-06-03 2020-12-10 中国矿业大学 Method for generating image semantic description
CN113221613A (en) * 2020-12-14 2021-08-06 国网浙江宁海县供电有限公司 Power scene early warning method for generating scene graph auxiliary modeling context information
CN113627557A (en) * 2021-08-19 2021-11-09 电子科技大学 A Scene Graph Generation Method Based on Context Graph Attention Mechanism
CN113836339A (en) * 2021-09-01 2021-12-24 淮阴工学院 Scene graph generation method based on global information and position embedding
KR20220025524A (en) * 2020-08-24 2022-03-03 경기대학교 산학협력단 System for generating scene graph using deep neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020244287A1 (en) * 2019-06-03 2020-12-10 中国矿业大学 Method for generating image semantic description
CN111462282A (en) * 2020-04-02 2020-07-28 哈尔滨工程大学 Scene graph generation method
KR20220025524A (en) * 2020-08-24 2022-03-03 경기대학교 산학협력단 System for generating scene graph using deep neural network
CN113221613A (en) * 2020-12-14 2021-08-06 国网浙江宁海县供电有限公司 Power scene early warning method for generating scene graph auxiliary modeling context information
CN113627557A (en) * 2021-08-19 2021-11-09 电子科技大学 A Scene Graph Generation Method Based on Context Graph Attention Mechanism
CN113836339A (en) * 2021-09-01 2021-12-24 淮阴工学院 Scene graph generation method based on global information and position embedding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
兰红;刘秦邑;: "图注意力网络的场景图到图像生成模型", 中国图象图形学报, no. 08, 12 August 2020 (2020-08-12) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115546589A (en) * 2022-11-29 2022-12-30 浙江大学 An Image Generation Method Based on Graph Neural Network
CN115546589B (en) * 2022-11-29 2023-04-07 浙江大学 Image generation method based on graph neural network
CN118015522A (en) * 2024-03-22 2024-05-10 广东工业大学 Time transition regularization method and system for video scene graph generation

Also Published As

Publication number Publication date
CN114677544B (en) 2024-08-16

Similar Documents

Publication Publication Date Title
CN110084296B (en) A Graph Representation Learning Framework Based on Specific Semantics and Its Multi-label Classification Method
CN114048331A (en) A Knowledge Graph Recommendation Method and System Based on Improved KGAT Model
CN110188167A (en) An end-to-end dialogue method and system incorporating external knowledge
CN110399518A (en) A Visual Question Answering Enhancement Method Based on Graph Convolution
CN111191526A (en) Pedestrian attribute recognition network training method, system, medium and terminal
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113095346A (en) Data labeling method and data labeling device
CN114677544B (en) A scene graph generation method, system and device based on global context interaction
CN111462324A (en) Online spatiotemporal semantic fusion method and system
Sutanto et al. Learning equality constraints for motion planning on manifolds
US11270425B2 (en) Coordinate estimation on n-spheres with spherical regression
CN118393329B (en) A system for testing the performance of AI chips in model training and reasoning
CN110196928A (en) Fully parallelized end-to-end more wheel conversational systems and method with field scalability
CN116523583A (en) Electronic commerce data analysis system and method thereof
CN116151270A (en) Parking test system and method
CN118196089A (en) Lightweight method and system for glass container defect detection network based on knowledge distillation
CN117036545A (en) Image scene feature-based image description text generation method and system
CN119128790A (en) Multimodal data fusion method, system, readable storage medium and computer device
CN115204171A (en) Document-level event extraction method and system based on hypergraph neural network
CN113923099B (en) Root cause positioning method for communication network fault and related equipment
CN115880552A (en) Cross-scale graph similarity guided aggregation system, method and application
CN113723511B (en) Target detection method based on remote sensing electromagnetic radiation and infrared image
CN115457268A (en) Hybrid structure-based segmentation method and device and storage medium
CN109993188B (en) Data label identification method, behavior identification method and device
CN110222839A (en) A kind of method, apparatus and storage medium of network representation study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant