CN111814658B - Scene semantic structure diagram retrieval method based on semantics - Google Patents
- Publication number
- CN111814658B (application CN202010644017A)
- Authority
- CN
- China
- Prior art keywords
- structure diagram
- semantic structure
- matching
- scene
- scene semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a scene semantic structure diagram retrieval method, which mainly solves the problem of poor retrieval performance in the prior art. The implementation scheme is as follows: 1) Inputting a query scene semantic structure diagram and recalling related results from a scene semantic structure diagram database D to obtain a scene semantic structure diagram candidate set T; 2) Calculating the matching distance of each matching result in the candidate set T, sorting the results by matching distance in ascending order, and keeping the top k results to obtain a reduced candidate set T'; 3) Calculating the similarity S between each scene semantic structure diagram in the reduced candidate set T' and the query scene semantic structure diagram with a graph neural network, and sorting by similarity in descending order to obtain the final retrieval result. The method improves both the efficiency and the precision of scene semantic structure diagram retrieval, and can be used to search for semantically similar scene semantic structure diagrams and to accurately locate a local scene within a global scene.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a scene semantic structure diagram retrieval method which can be used for searching a scene semantic structure diagram with similar semantics and realizing accurate positioning of a local scene in a global scene.
Background
Training computers to interpret and understand the visual world has attracted the attention of many researchers in computer vision over the past decades. For an image or a video, human eyes can easily capture the objects, the background, and the rich semantic information hidden among them; how to represent the rich object and semantic information contained in an image has therefore become a critical research question. The scene semantic structure diagram is a directed graph structure proposed by Johnson for describing a visual scene: its nodes describe the semantic information of objects in the scene, and its edges describe the interaction information between objects. The scene semantic structure diagram not only provides context cues for basic recognition tasks, but also provides powerful support for advanced visual tasks such as semantic-based image retrieval.
In the field of image retrieval, both images and text descriptions can be represented by a scene semantic structure diagram, so the image retrieval problem can be converted into a scene semantic structure diagram retrieval problem, and semantic-based image retrieval can be realized through retrieval of scene semantic structure diagrams.
The retrieval of scene semantic structure diagrams can also be applied to accurately locating a local scene within a global scene: when multiple unmanned aerial vehicles work cooperatively, a high-altitude vehicle obtains a larger viewing angle corresponding to the global scene, while a low-altitude vehicle has a smaller viewing angle corresponding to a local scene.
Because the scene semantic structure diagram is a directed graph structure, its retrieval is closely related to graph retrieval. At home and abroad, scene semantic structure diagram retrieval is mainly carried out by two kinds of method. The first is graph matching, such as the Ullmann algorithm, which uses depth-first search with backtracking and pruning to decide whether a scene semantic structure diagram contains a substructure that exactly matches the query scene semantic structure diagram. The second is the conditional random field method adopted by Johnson, which treats each node in the scene semantic structure diagram as a variable and computes the classification probability of the diagram by propagating probabilities along the edges.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a semantics-based scene semantic structure diagram retrieval method. It first performs recall and coarse-ranking filtering on the scene semantic structure diagrams in the database, which greatly improves algorithm efficiency; it then uses a graph neural network to precisely rank the recalled results while jointly considering the similarity of objects, relations, and structure between scene semantic structure diagrams, which greatly improves retrieval precision.
In order to achieve the above purpose, the technical scheme adopted by the invention comprises the following steps:
(1) Inputting a query scene semantic structure diagram and recalling related results from a scene semantic structure diagram database D to obtain a scene semantic structure diagram candidate set T:
(1.1) extracting 5 fixed substructures from the scene semantic structure diagram database D to obtain a substructure database D'; extracting the same 5 fixed substructures from the input query scene semantic structure diagram to obtain a query substructure set Q, wherein each substructure records the name of its scene semantic structure diagram, its substructure type, and its object labels;
(1.2) for each substructure in the query substructure set Q, searching the substructure database D' for substructures that match it, obtaining substructure matching pairs formed by the query substructure and each matched substructure;
(1.3) selecting pairs of substructure matching pairs whose matched substructures belong to the same scene semantic structure diagram and whose two query substructures either share no object, or whose shared objects are mapped to the same matched objects, and connecting an edge between every two such substructure matching pairs to obtain a number of undirected graphs;
(1.4) solving the maximum cliques of these undirected graphs and merging the substructure matching pairs in each maximum clique to obtain the matching results between the query scene semantic structure diagram and the scene semantic structure diagrams in database D; these matching results form the candidate set T;
(2) Calculating the matching distance of each matching result in the candidate set T, sorting the results by matching distance in ascending order, and keeping the top k results to obtain a reduced candidate set T', where k is set according to actual requirements and takes a value of 50 or 100;
(3) Precisely calculating the similarity S between each scene semantic structure diagram in the reduced candidate set T' and the query scene semantic structure diagram with a graph neural network, and sorting by similarity in descending order to obtain the final retrieval result.
Compared with the prior art, the invention has the following advantages:
1) The invention performs recall and coarse ranking of the input query scene semantic structure diagram against the database, which quickly filters out a large number of irrelevant results and greatly improves retrieval efficiency.
2) The method first converts scene semantic structure diagrams into vectors with a graph neural network and then uses the similarity between these vectors to represent the similarity between the diagrams. The complex and difficult problem of computing scene semantic structure diagram similarity is thus converted into the simple problem of computing vector similarity, which greatly improves both the efficiency and the precision of similarity computation, and hence of scene semantic structure diagram retrieval;
3) The invention can also be applied to other fields of graph retrieval.
Drawings
FIG. 1 is a general flow chart of an implementation of the present invention;
FIG. 2 is a scene semantic structure diagram recall sub-flowchart in the present invention;
FIG. 3 is a diagram of an example of a sub-structure of a scene semantic structure used in the present invention;
FIG. 4 is a sub-flowchart of calculating the similarity between scene semantic structure diagrams using the graph neural network in the present invention;
FIG. 5 is a diagram of the information encoding structure in the graph neural network of the present invention;
FIG. 6 is a diagram of the cross-graph information propagation structure in the graph neural network of the present invention;
FIG. 7 is a diagram of the information aggregation structure in the graph neural network of the present invention;
FIG. 8 is a diagram showing an example of the retrieval results of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly described below in conjunction with the illustrations in the embodiments of the present invention, and it is apparent that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the implementation steps of this example are as follows:
step 1, recalling a scene semantic structure diagram related to the input scene semantic structure diagram from a database D to form a candidate set T.
Referring to fig. 2, the specific implementation of this step is as follows:
(1.1) Firstly, extracting from the scene semantic structure diagram database D the 5 differently shaped substructures shown in FIG. 3, each formed by points and lines, where points represent objects and lines represent the relationships between objects, obtaining a substructure database D'; then extracting the same 5 fixed substructures of FIG. 3 from the input query scene semantic structure diagram to obtain a query substructure set Q, wherein each substructure records the name of its scene semantic structure diagram, its substructure type, and its object labels;
(1.2) for each substructure in the query substructure set Q, searching the substructure database D' for substructures that match it, obtaining substructure matching pairs of the query substructure and each matched substructure:
(1.2.1) determining whether the query substructure has the same type as the substructure currently traversed in database D': if so, execute (1.2.2); otherwise, execute (1.2.4);
(1.2.2) determining whether the objects of the two substructures satisfy the following constraint:

L2(V(c_i), V(φ(c_i))) ≤ th_o

where th_o and th_a are the manually set object matching threshold and attribute matching threshold, c_i denotes the category of object o_i, V(c_i) the word vector corresponding to that category, φ(c_i) the category of the matched object φ(o_i), and V(φ(c_i)) the word vector corresponding to that category;
if the constraint is satisfied, execute (1.2.3);
otherwise, execute (1.2.4);
(1.2.3) determining whether the relations of the two substructures satisfy the following constraint:

min_q L2(V(E_1(o_i, o_j)_p), V(E_2(φ(o_i), φ(o_j))_q)) ≤ th_r

where th_r is the manually set relation matching threshold, o_i and o_j are two different objects in the query substructure, φ(o_i) and φ(o_j) are two different objects in the database substructure to be matched, E_1(o_i, o_j)_p denotes the p-th relation between objects o_i and o_j in the query substructure, V(E_1(o_i, o_j)_p) the word vector corresponding to the category of the p-th relation, E_2(φ(o_i), φ(o_j))_q the q-th relation between objects φ(o_i) and φ(o_j) in the database substructure to be matched, and V(E_2(φ(o_i), φ(o_j))_q) the word vector corresponding to the category of the q-th relation;
if the constraint is satisfied, a substructure matching pair is obtained and added to the set of substructure matching pairs, then execute (1.2.4);
otherwise, directly execute (1.2.4);
(1.2.4) moving to the next substructure in database D'; when the enumeration is complete, the search for the current query substructure ends; otherwise, return to (1.2.1).
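The type, object, and relation checks of steps (1.2.1) to (1.2.3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary layout of a substructure, the positional alignment of objects, and the threshold values are all assumptions made for the example.

```python
import numpy as np

def l2(a, b):
    """Euclidean (L2) distance between two word vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def substructures_match(query_sub, cand_sub, th_o=0.5, th_r=0.5):
    """Check whether a query substructure matches a candidate substructure.

    A substructure is assumed to be a dict with:
      'type'     : substructure type id,
      'objects'  : list of word vectors, one per object (aligned by position),
      'relations': list of (i, j, [word vectors of relations from object i to j]).
    th_o / th_r play the roles of the object and relation matching thresholds.
    """
    # step (1.2.1): substructure types must agree
    if query_sub['type'] != cand_sub['type']:
        return False
    # step (1.2.2): every pair of aligned objects must be close in word-vector space
    for v_q, v_c in zip(query_sub['objects'], cand_sub['objects']):
        if l2(v_q, v_c) > th_o:
            return False
    # step (1.2.3): each query relation needs some close candidate relation
    cand_rels = {(i, j): vecs for i, j, vecs in cand_sub['relations']}
    for i, j, q_vecs in query_sub['relations']:
        c_vecs = cand_rels.get((i, j), [])
        for qv in q_vecs:
            if not any(l2(qv, cv) <= th_r for cv in c_vecs):
                return False
    return True
```

A matching pair would then be recorded whenever `substructures_match` returns `True` for a query substructure and a database substructure.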
(1.3) selecting pairs of substructure matching pairs whose matched substructures belong to the same scene semantic structure diagram and whose two query substructures either share no object, or whose shared objects are mapped to the same matched objects, and connecting an edge between every two such substructure matching pairs to obtain a number of undirected graphs;
(1.4) solving a plurality of undirected graphs for a maximum clique:
existing methods for solving the maximum clique of the undirected graph include the Bron-Kerbosch algorithm, the Hochbaum algorithm, and the like, and the present example uses, but is not limited to, the Bron-Kerbosch algorithm, and the solving process is as follows:
(1.4.1) constructing 4 sets R, P, X, M and a counter n recording the node count of the largest maximum clique found so far, where the R set records the points already added to the current clique; the P set records the points that may still be added, i.e. the points connected by an edge to every node in R; the X set records the points already added to some maximal clique; and the M set is the set of maximum cliques finally returned. Initially R, X and M are empty sets, P is the set containing all nodes of the undirected graph, and n is 0; then execute (1.4.2);
(1.4.2) sequentially taking out each node in the P set, adding the currently taken node into the R set on the assumption that the currently taken node is v, updating the P set to be an intersection of the original P set and a node set connected with the node v, and simultaneously updating the X set to be an intersection of the original X set and the node set connected with the node v, and executing (1.4.3);
(1.4.3) determining whether the P set and the X set are both empty: if yes, execute (1.4.4), otherwise execute (1.4.5);
(1.4.4) comparing n with the number of nodes in R:
if n is less than the number of nodes in R, updating n to the number of nodes in R, emptying the M set, adding the R set into the M set, and executing (1.4.5);
if n is equal to the number of nodes in R, adding the R set to the M set, and then executing (1.4.5);
if n is greater than the number of nodes in R, then execute directly (1.4.5);
(1.4.5) delete v node from P while adding v node to X, return (1.4.2).
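The R/P/X/M procedure of steps (1.4.1) to (1.4.5) is a Bron-Kerbosch search that keeps only the largest maximal cliques. A compact sketch, written recursively but with the same bookkeeping (the adjacency-dictionary representation is an assumption for the example):

```python
def largest_maximal_cliques(adj):
    """Return all maximum cliques of an undirected graph, following the
    R/P/X/M bookkeeping of step (1.4).

    adj: dict mapping each node to the set of its neighbours.
    """
    M = []          # maximum cliques found so far
    n = 0           # size of the largest clique found so far

    def expand(R, P, X):
        nonlocal n, M
        if not P and not X:           # step (1.4.3): R is a maximal clique
            if len(R) > n:            # step (1.4.4): keep only the largest
                n = len(R)
                M = [set(R)]
            elif len(R) == n:
                M.append(set(R))
            return
        for v in list(P):             # step (1.4.2): try each candidate node v
            expand(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)               # step (1.4.5): move v from P to X
            X.add(v)

    expand(set(), set(adj), set())
    return M
```

For a triangle {0, 1, 2} with a pendant node 3 attached to node 2, the only maximum clique returned is {0, 1, 2}; the smaller maximal clique {2, 3} is discarded by the size check.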
(1.5) merging the substructure matching pairs in each maximum clique to obtain the matching results between the query scene semantic structure diagram and the scene semantic structure diagrams in database D, and forming the candidate set T from these matching results.
Step 2, coarsely ranking the candidate set T and keeping the top-ranked results to obtain the reduced candidate set T'.
(2.1) calculating a matching distance of the matching result in the candidate set T:
(2.1.1) constructing the scene semantic structure diagram matching distance metric function D_φ(G_1, G_2):
First, according to the main differences between the two scene semantic structure diagrams G_1 and G_2 (object categories, relation categories, object attributes, structure, and the possibility that some objects of G_1 find no matching object in G_2), the matching result is scored by five difference terms: the object category difference D_o, the relation category difference D_r, the object attribute difference D_a, the structure difference D_s, and the penalty D_g for objects of G_1 that find no matching object in G_2. In these terms:
c_i denotes the category of object o_i in G_1, V(c_i) the word vector corresponding to that category, φ(c_i) the category of the object in G_2 matched to o_i, V(φ(c_i)) the word vector corresponding to that category, and dg(o_i) the degree of object o_i in the graph;
E_1(o_i, o_j)_p denotes the p-th relation between objects o_i and o_j in G_1, V(E_1(o_i, o_j)_p) the word vector corresponding to the category of the p-th relation, E_2(φ(o_i), φ(o_j))_q the q-th relation between the matched objects φ(o_i) and φ(o_j) in G_2, and V(E_2(φ(o_i), φ(o_j))_q) the word vector corresponding to the category of the q-th relation;
A_i,p denotes the p-th attribute of object o_i, V(A_i,p) the word vector corresponding to the category of the p-th attribute, φ(A_i)_q the q-th attribute of the matched object φ(o_i), and V(φ(A_i)_q) the word vector corresponding to the category of the q-th attribute;
d(o_i, o_j) denotes the shortest distance between objects o_i and o_j in G_1, and d(φ(o_i), φ(o_j)) the shortest distance between the matched objects φ(o_i) and φ(o_j) in G_2;
o_i ∈ O_1 − O_1' ranges over the objects of G_1 for which no matching object is found in G_2;
Then, the 5 terms are weighted and summed to obtain the scene semantic structure diagram matching distance metric function:

D_φ(G_1, G_2) = w_o D_o + w_r D_r + w_a D_a + w_s D_s + w_g D_g

where G_1 = (O_1, E_1) is the query scene semantic structure diagram with object set O_1 and relation set E_1, G_2 = (O_2, E_2) is the matched scene semantic structure diagram with object set O_2 and relation set E_2, φ denotes the matching result between G_1 and G_2, and w_o, w_r, w_a, w_s, w_g are the weights of the respective terms;
(2.1.2) for each matching result φ in the candidate set T, calculating its matching distance with the scene semantic structure diagram matching distance metric function D_φ(G_1, G_2);
(2.2) sorting the matching results in the candidate set T by matching distance in ascending order and keeping the top k results to obtain the reduced candidate set T', where k is set according to actual requirements and takes a value of 50 or 100; this example takes 100.
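The coarse-ranking step can be sketched as below, assuming the five difference terms have already been computed for each matching result; the dictionary keys and the weight values are illustrative assumptions, not the patent's.

```python
def matching_distance(diffs, weights):
    """Weighted sum D_phi(G1, G2) of the five difference terms of step (2.1):
    object-category, relation-category, attribute, structure, and
    unmatched-object differences."""
    keys = ('object', 'relation', 'attribute', 'structure', 'unmatched')
    return sum(weights[k] * diffs[k] for k in keys)

def coarse_rank(candidates, weights, k=100):
    """Sort matching results by ascending matching distance and keep the
    top k, giving the reduced candidate set T' of step (2.2).

    candidates: list of (graph_id, diffs) pairs, where diffs holds the five
    precomputed difference terms for that matching result."""
    ranked = sorted(candidates,
                    key=lambda c: matching_distance(c[1], weights))
    return ranked[:k]
```

With all weights equal, the candidate whose five difference terms sum to the smallest value is ranked first.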
Step 3, precisely ranking the reduced candidate set T' with the graph neural network to obtain the final retrieval result.
(3.1) calculating the similarity between the query scene semantic structure diagram and each scene semantic structure diagram in the reduced candidate set T' using the graph neural network:
referring to fig. 4, the specific implementation of this step is as follows:
(3.1.1) inputting all object information o_i and relation information rel_i,j of the query scene semantic structure diagram G_1 and of the candidate scene semantic structure diagram G_2 in the reduced candidate set T' into the information encoding structure shown in FIG. 5: the object information o_i is input into a first multi-layer perceptron MLP_object and the relation information rel_i,j into a second multi-layer perceptron MLP_relationship, which output the encoded object hidden-state information h_i^(0) and the encoded relation information e_i,j:

h_i^(0) = MLP_object(o_i), o_i ∈ O_1 ∪ O_2
e_i,j = MLP_relationship(rel_i,j), e_i,j ∈ E_1 ∪ E_2

where o_i denotes the i-th object information in a scene semantic structure diagram and rel_i,j denotes the relation information between objects o_i and o_j;
(3.1.2) letting the hidden-state information of all encoded objects learn the information within its whole graph and the object matching information between the graphs to obtain the final object hidden-state information h_i^(T):
The encoded query scene semantic structure diagram G_1 and candidate scene semantic structure diagram G_2 are input into the cross-graph information propagation structure shown in FIG. 6 and iterated T = 5 times, so that all of the encoded hidden-state information h_i^(t) of both diagrams learns the information within its whole graph and the object matching information between the graphs, yielding the final object hidden-state information h_i^(T).
The cross-graph information propagation structure is iterative and comprises several time steps, each consisting of three parts:
The first part uses a multi-layer perceptron MLP_union to jointly encode each piece of relation information e_i,j with the hidden-state information of the objects at its two ends, obtaining the jointly encoded message

m_j→i = MLP_union(h_i^(t), h_j^(t), e_i,j)

where MLP_union consists of a fully connected layer and a linear rectification function;
The second part calculates the matching information μ_j→i between the hidden-state information of objects in G_1 and the hidden-state information of objects in G_2 with a vector similarity function S_v, which may be the Euclidean similarity function or the cosine similarity function; this example uses, but is not limited to, the Euclidean similarity function;
The third part takes the object hidden-state information h_i^(t), the jointly encoded messages m_j→i, and the matching information μ_j→i between object hidden states as input, and computes the object hidden-state information h_i^(t+1), which incorporates first-order neighborhood information:

h_i^(t+1) = MLP_p(h_i^(t), Σ_j m_j→i, Σ_j' μ_j'→i)

where MLP_p is a multi-layer perceptron consisting of a fully connected layer and a linear rectification function.
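A rough sketch of one propagation time step, in the spirit of the three parts above. The exact forms of the matching information μ and of the state update are not fully reproduced in the text, so the attention-weighted difference used for μ and the concatenation-based update here are assumptions; the tiny randomly initialized perceptrons stand in for the trained MLP_union and MLP_p.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """A tiny stand-in perceptron: linear layer, ReLU, linear layer."""
    w1 = rng.standard_normal((dims[0], dims[1])) * 0.1
    w2 = rng.standard_normal((dims[1], dims[2])) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

def cross_graph_step(h1, h2, edges1, e1, mlp_union, mlp_p):
    """One time step of cross-graph propagation for the nodes of G1
    (the update for the nodes of G2 is symmetric).

    h1, h2 : node hidden-state matrices of G1 / G2, shapes (n1, d) / (n2, d)
    edges1 : list of (i, j) index pairs for the edges of G1
    e1     : encoded edge vectors of G1, aligned with edges1
    """
    # part 1: within-graph messages m_{j->i}, jointly encoding (h_i, h_j, e_ij)
    msg = np.zeros_like(h1)
    for (i, j), e in zip(edges1, e1):
        msg[i] += mlp_union(np.concatenate([h1[i], h1[j], e]))
    # part 2: cross-graph matching vectors, here an attention-weighted
    # difference based on Euclidean similarity (an assumed concrete form)
    sim = -np.linalg.norm(h1[:, None, :] - h2[None, :, :], axis=-1)
    att = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    mu = h1 - att @ h2
    # part 3: new node states from old state, messages, and matching vectors
    return mlp_p(np.concatenate([h1, msg, mu], axis=1))
```

Iterating such a step T = 5 times, alternating the roles of G_1 and G_2, yields final node states that carry both within-graph and cross-graph information.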
(3.1.3) inputting the hidden-state information h_i^(T) of all objects of the query scene semantic structure diagram G_1 and the hidden-state information of all objects of the candidate scene semantic structure diagram G_2 into the information aggregation structure shown in FIG. 7 to obtain the two vectors V_1 and V_2, where MLP_w, MLP_u and MLP_G are three multi-layer perceptrons, each consisting of a fully connected layer and a linear rectification function, σ is the logistic function, and g may be a summation function or a mean function; this example uses, but is not limited to, the summation function;
(3.1.4) calculating the similarity between the first vector V_1 and the second vector V_2, with a method that depends on the training task:
if the training task is a classification task, the first vector V_1 and the second vector V_2 are spliced together and input into a logistic-function layer of the graph neural network, which calculates the similarity between the two vectors;
if the training task is a regression task, the first vector V_1 and the second vector V_2 are spliced together and input into a fully connected layer of the graph neural network, which calculates the similarity between the two vectors.
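The aggregation and the final similarity computation can be sketched as follows. The gated summation mirrors the description of (3.1.3) (logistic gates σ, g = sum), but the single weight matrices are untrained placeholders for MLP_w, MLP_u and MLP_G, and cosine similarity is substituted for the trained logistic or fully connected head of (3.1.4), which would require learned parameters.

```python
import numpy as np

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def aggregate(h, w_gate, w_emb, w_out):
    """Gated aggregation in the spirit of (3.1.3): a logistic gate
    sigma(h @ w_gate) weights each node embedding h @ w_emb, the gated
    embeddings are summed over nodes (g = sum), and the pooled vector is
    projected by w_out (playing the role of MLP_G)."""
    gates = sigmoid(h @ w_gate)
    pooled = (gates * (h @ w_emb)).sum(axis=0)
    return pooled @ w_out

def similarity(v1, v2):
    """Cosine similarity between the two graph vectors V1 and V2."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```

Applying `aggregate` to the final node states of G_1 and G_2 gives V_1 and V_2, and the candidates in T' are then sorted by `similarity(V_1, V_2)` in descending order.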
And (3.2) sequencing the scene semantic structure images in the simplified candidate set T' according to the similarity value from large to small to obtain a final retrieval result.
The effect of the invention can be further illustrated by the following simulation experiments:
simulation experiment condition
The whole flow of this example is implemented in the Python programming language on a 64-bit Windows platform. The database is Johnson's real scene database containing 5000 scene semantic structure diagrams. Two query scene semantic structure diagrams are selected: the first contains 3 objects, "woman", "man", "bike", and two relations, "woman behind man", "man on bike"; the second contains 3 objects, "woman", "snowboard", "hat", and two relations, "man by snowboard", "man has hat".
Experimental details and results
Experiment 1, input the first query scene semantic structure diagram, use the method of the invention to search in the above-mentioned database, keep the first 4 search results, as shown in figure 8 (a). The right side of the arrow in fig. 8 (a) shows the 4 search results and the corresponding real pictures, and as can be seen from fig. 8 (a), the method of the invention can search the scene semantic structure diagram which is exactly matched with the query scene semantic structure diagram, and can also search the similar scene semantic structure diagram which is different from the query scene semantic structure diagram in object type and relation type.
Experiment 2, input the second query scene semantic structure diagram, use the method of the invention to search in the above-mentioned database, keep the first 4 search results, as shown in figure 8 (b). The right side of the arrow in fig. 8 (b) shows the 4 search results and the corresponding real pictures, and as can be seen from fig. 8 (b), the method of the invention can search the scene semantic structure diagram which is exactly matched with the query scene semantic structure diagram, and can also search the similar scene semantic structure diagram which is structurally different from the query scene semantic structure diagram.
Claims (5)
1. A scene semantic structure diagram retrieval method is characterized by comprising the following steps:
(1) Inputting a query scene semantic structure diagram, recalling a result related to the scene semantic structure diagram in a scene Jing Yuyi structure diagram database D to obtain a scene semantic structure diagram candidate set T:
(1a) Extracting 5 fixed substructures from a scene semantic structure diagram database D to obtain a substructures database D'; extracting the same 5 fixed substructures from the input scene semantic structure drawing to obtain an inquiry substructures set Q, wherein each substructures comprises the name of the scene semantic structure drawing, the substructures type and the object label;
(1b) For each substructure in the query substructure set Q, searching a substructure which can be matched with the substructure in the substructure database D', and obtaining a substructure matching pair formed by the query substructure and the matched substructure;
(1c) Selecting two substructure matching pairs, wherein all the matching substructures belong to the same scene semantic structure diagram, the two inquiry substructures have no identical object or the matching objects corresponding to the identical object are identical, and connecting one edge between the two substructure matching pairs to obtain a plurality of undirected graphs;
(1d) Solving the maximum cliques of the plurality of undirected graphs, and merging the substructure matching pairs in each maximum clique to obtain the matching results between the query scene semantic structure diagram and the scene semantic structure diagrams in the database D; these matching results form the candidate set T;
(2) Calculating the matching distance of each matching result in the candidate set T, sorting the matching results in ascending order of matching distance, and keeping the first k results to obtain a simplified candidate set T', wherein k is set according to actual requirements and takes a value of 50 or 100;
(3) Accurately calculating the similarity S between each scene semantic structure diagram in the simplified candidate set T' and the query scene semantic structure diagram using a graph neural network, and sorting in descending order of similarity to obtain the final retrieval result;
the similarity S between a scene semantic structure diagram in the simplified candidate set T' and the query scene semantic structure diagram is calculated with the graph neural network as follows:
(3a) Using two multi-layer perceptrons, encoding all object information o_i and relationship information rel_{i,j} of the query scene semantic structure diagram G_1 and of each candidate scene semantic structure diagram G_2 in T', obtaining the implicit state information of each object o_i and the encoded information e_{i,j} of each relation rel_{i,j};
(3b) At each time step, letting the implicit state information of each object o_i in G_1 and G_2 learn first-order neighborhood information within its own graph and object matching information between the two graphs, and iterating T = 5 times to obtain new implicit state information for each object o_i; (3c) aggregating the new implicit state information of G_1 and G_2 into two vectors V_1 and V_2;
(3d) Depending on the training task, calculating the similarity between the first vector V_1 and the second vector V_2 with different methods:
if the training task is a classification task, first splicing the first vector V_1 and the second vector V_2 together, then inputting the spliced vector into a logistic regression function layer of the graph neural network to calculate the similarity between the two vectors;
if the training task is a regression task, first splicing the first vector V_1 and the second vector V_2 together, then inputting the spliced vector into a fully-connected layer of the graph neural network to calculate the similarity between the two vectors.
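The two readout heads of step (3d) can be sketched in plain Python as follows. This is a minimal illustration, not the trained network: the weights and bias are hypothetical stand-ins for the learned parameters of the logistic-regression and fully-connected layers.

```python
import math

def splice(v1, v2):
    """Splice the two aggregated graph vectors V_1 and V_2 together."""
    return v1 + v2

def logistic_layer(v, weights, bias):
    """Classification head: a logistic (sigmoid) unit over the spliced
    vector, yielding a similarity score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, v)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fully_connected_layer(v, weights, bias):
    """Regression head: a single fully-connected unit producing an
    unbounded similarity score."""
    return sum(w * x for w, x in zip(weights, v)) + bias

# Toy usage -- V1, V2 would come from the aggregation step (3b)/(3c);
# the weights w and bias b here are arbitrary placeholder values.
V1, V2 = [0.2, -0.5, 0.7], [0.1, 0.4, -0.3]
v = splice(V1, V2)
w, b = [0.3, -0.1, 0.5, 0.2, 0.1, -0.4], 0.05
```

In the claimed method both heads consume the same spliced vector; only the final layer (and its training loss) differs between the classification and regression settings.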
2. The method of claim 1, wherein in (2) the matching distance of each matching result in the candidate set T is calculated by the scene semantic structure diagram matching distance metric function D_φ(G_1, G_2):
wherein G_1 = (O_1, E_1) is the query scene semantic structure diagram, O_1 the object set of G_1 and E_1 the relation set of G_1; G_2 = (O_2, E_2) is the matching scene semantic structure diagram, O_2 the object set of G_2 and E_2 the relation set of G_2; φ is a bi-directional mapping function between the objects of G_1 and G_2, representing the matching result between G_1 and G_2; the function D_φ consists of 5 parts weighted respectively by w_o, w_r, w_a, w_s and w_g, where:
the first part represents the difference of the matching result in object category, wherein c_i denotes the category of object o_i in G_1, V(c_i) the word vector of category c_i, φ(c_i) the category of the object matched to o_i, V(φ(c_i)) the word vector of category φ(c_i), and dg(o_i) the degree of object o_i in the graph;
the second part represents the difference of the matching result in relation category, wherein E_1(o_i, o_j)_p denotes the category of the p-th relation between objects o_i and o_j in G_1, V(E_1(o_i, o_j)_p) the word vector of that relation category, E_2(φ(o_i), φ(o_j))_q the category of the q-th relation between objects φ(o_i) and φ(o_j), and V(E_2(φ(o_i), φ(o_j))_q) the word vector of that relation category;
the third part represents the difference of the matching result in object attributes, wherein A_{i,p} denotes the category of the p-th attribute of object o_i, V(A_{i,p}) the word vector of that attribute category, φ(A_i)_q the category of the q-th attribute of the matched object φ(o_i), and V(φ(A_i)_q) the word vector of attribute category φ(A_i)_q;
the fourth part represents the structural difference of the matching result, wherein d(o_i, o_j) denotes the shortest distance between objects o_i and o_j in G_1, and d(φ(o_i), φ(o_j)) the shortest distance between objects φ(o_i) and φ(o_j) in G_2;
the fifth part represents the penalty for unmatched objects in G_1, where o_i ∈ O_1 − O'_1 ranges over the set of objects in G_1 for which no matching object is found in G_2.
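The five parts of D_φ can be sketched as follows. Claim 2 gives the part formulas only descriptively (the original equation images are not reproduced here), so this sketch makes two labeled assumptions: the parts combine as a plain weighted sum with weights w_o, w_r, w_a, w_s, w_g, and the object-category part accumulates degree-weighted L2 distances between category word vectors.

```python
import math

def l2(u, v):
    """Euclidean (L2) distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def object_term(matches, word_vec, degree):
    """Object-category part: for each matched pair, the distance between
    the word vectors of the two categories, weighted by the degree
    dg(o_i) of the query object (how the degree enters is an assumption).
    `matches` holds (object_name, category, matched_category) triples."""
    return sum(degree[o] * l2(word_vec[c], word_vec[mc])
               for o, c, mc in matches)

def matching_distance(parts, weights):
    """D_phi(G_1, G_2) as a weighted combination of its five parts
    (object, relation, attribute, structure, unmatched-object); a plain
    weighted sum is assumed here."""
    return sum(w * p for w, p in zip(weights, parts))

# Toy usage with made-up 2-d word vectors and weights.
wv = {"cat": [1.0, 0.0], "dog": [0.0, 0.0]}
d_obj = object_term([("o1", "cat", "dog")], wv, {"o1": 2})
total = matching_distance([d_obj, 0.0, 0.0, 0.0, 0.0], [0.2] * 5)
```

The relation, attribute, structure and unmatched-object parts would be computed analogously from the quantities defined in the claim and passed into `matching_distance` as the remaining entries of `parts`.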
3. The method of claim 1, wherein the 5 substructures in (1a) are different shapes consisting of points and lines, wherein the points represent objects and the lines represent relationships between the objects.
4. The method of claim 1, wherein in (1b) retrieving the substructures in the substructure database D' that match a query substructure is accomplished as follows:
(1b1) Determining whether the query substructure has the same type as the substructure currently traversed in database D': if so, executing (1b2), otherwise executing (1b4);
(1b2) Judging whether the objects of the two substructures satisfy the following constraint:
L_2(V(c_i), V(φ(c_i))) ≤ th_o
wherein th_o and th_a are the manually set object matching threshold and attribute matching threshold, c_i denotes the category of object o_i, V(c_i) the word vector of category c_i, φ(c_i) the category of object φ(o_i), and V(φ(c_i)) the word vector of category φ(c_i);
executing (1b3) if the constraint is satisfied, otherwise executing (1b4);
(1b3) Judging whether the relations of the two substructures satisfy the constraint shown in the following formula:
wherein th_r is the manually set relation matching threshold, o_i and o_j are objects in the query substructure, φ(o_i) and φ(o_j) are objects in the database substructure to be matched, E_1(o_i, o_j)_p denotes the category of the p-th relation between objects o_i and o_j, V(E_1(o_i, o_j)_p) the word vector of that relation category, E_2(φ(o_i), φ(o_j))_q the category of the q-th relation between objects φ(o_i) and φ(o_j), and V(E_2(φ(o_i), φ(o_j))_q) the word vector of that relation category;
if the constraint is satisfied, obtaining a substructure matching pair, adding it to the substructure matching pair set, and executing (1b4); otherwise directly executing (1b4);
(1b4) Enumerating the next substructure in database D': if the traversal is complete, ending the search; otherwise returning to (1b1).
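Steps (1b1)–(1b4) amount to a filtered linear scan of D'. The sketch below assumes a hypothetical substructure representation (dicts with `type`, `objects` and `relations` category labels); the claim does not fix one. The relation test mirrors the object test (L2 distance between category word vectors under a threshold), an assumption where the claim's relation formula is not reproduced.

```python
import math

def l2(u, v):
    """Euclidean distance between two word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def match_substructure(query, db_subs, word_vec, th_o=0.5, th_r=0.5):
    """Traverse the substructure database D' and collect matching pairs.
    th_o / th_r are the object and relation matching thresholds."""
    pairs = []
    for cand in db_subs:                             # (1b4) enumerate D'
        if cand["type"] != query["type"]:            # (1b1) same type?
            continue
        if any(l2(word_vec[c], word_vec[m]) > th_o   # (1b2) object constraint
               for c, m in zip(query["objects"], cand["objects"])):
            continue
        if any(l2(word_vec[r], word_vec[m]) > th_r   # (1b3) relation constraint
               for r, m in zip(query["relations"], cand["relations"])):
            continue
        pairs.append((query, cand))                  # matching pair found
    return pairs

# Toy usage with made-up word vectors and two database substructures.
wv = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "on": [0.5, 0.5]}
q = {"type": "chain", "objects": ["cat", "dog"], "relations": ["on"]}
db = [
    {"type": "chain", "objects": ["cat", "dog"], "relations": ["on"]},
    {"type": "star", "objects": ["cat", "dog"], "relations": ["on"]},
]
```

Because word-vector distances (rather than exact label equality) gate the match, semantically close object and relation categories can still form matching pairs, which is what lets the recall stage return structurally similar, non-identical diagrams.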
5. The method of claim 1, wherein the maximum cliques of the undirected graphs in (1d) are obtained by the Bron-Kerbosch algorithm, specifically implemented as follows:
(1d1) Constructing 4 sets R, P, X, M and a counter n recording the number of nodes of the largest clique found so far: the R set records the points currently added to the clique; the P set records the points that may still be added, i.e. the points connected by an edge to every node in R; the X set records the points already added to some maximal clique; the M set is the finally returned set of maximum cliques. Initially R, X and M are empty sets, P is the set of all nodes in the undirected graph, and n is 0; executing (1d2);
(1d2) Taking each node of the P set in turn; assuming the currently taken node is v, adding v to the R set, updating the P set to the intersection of the original P set and the set of nodes connected to v, likewise updating the X set to the intersection of the original X set and the set of nodes connected to v, and executing (1d3);
(1d3) Judging whether both the P set and the X set are empty: if so, executing (1d4); otherwise executing (1d5);
(1d4) Comparing n with the number of nodes in R:
if n is less than the number of nodes in R, updating n to the number of nodes in R, emptying the M set, adding the R set into the M set, and executing (1 d 5);
if n is equal to the number of nodes in R, adding the R set to the M set, and then executing (1 d 5);
if n is greater than the number of nodes in R, then directly executing (1 d 5);
(1d5) Deleting node v from P while adding it to X, and returning to (1d2).
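Steps (1d1)–(1d5) describe Bron-Kerbosch without pivoting, modified to keep only the largest maximal cliques. A minimal sketch, assuming the graph is given as an adjacency dict mapping each node to its neighbor set:

```python
def maximum_cliques(adj):
    """Bron-Kerbosch (no pivoting) returning all maximum cliques, following
    steps (1d1)-(1d5): R = current clique, P = candidate nodes, X = nodes
    already explored, M = result list, n = size of largest clique so far."""
    state = {"n": 0, "M": []}                    # (1d1) counter n and set M

    def expand(R, P, X):
        if not P and not X:                      # (1d3) R is a maximal clique
            if state["n"] < len(R):              # (1d4) strictly larger:
                state["n"] = len(R)              #   reset M to this clique
                state["M"] = [set(R)]
            elif state["n"] == len(R):           # same size: keep it too
                state["M"].append(set(R))
            return                               # (smaller cliques dropped)
        for v in list(P):                        # (1d2) try each candidate v
            expand(R | {v}, P & adj[v], X & adj[v])
            P.remove(v)                          # (1d5) move v from P to X
            X.add(v)

    # (1d1) initially R and X empty, P = all nodes of the undirected graph
    expand(set(), set(adj), set())
    return state["M"]
```

For example, on a triangle {0, 1, 2} with a pendant node 3 attached to node 2, the only maximum clique is {0, 1, 2}; the maximal clique {2, 3} is discarded in step (1d4) because it is smaller than the best found.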
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010644017.6A CN111814658B (en) | 2020-07-07 | 2020-07-07 | Scene semantic structure diagram retrieval method based on semantics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111814658A CN111814658A (en) | 2020-10-23 |
CN111814658B true CN111814658B (en) | 2024-02-09 |
Family
ID=72841795
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022130509A1 (en) * | 2020-12-15 | 2022-06-23 | 日本電信電話株式会社 | Object detection device, object detection method, and object detection program |
CN112788239A (en) * | 2021-01-27 | 2021-05-11 | 维沃移动通信(杭州)有限公司 | Shooting method and device and electronic equipment |
CN113034592B (en) * | 2021-03-08 | 2021-08-31 | 西安电子科技大学 | Three-dimensional scene target detection modeling and detection method based on natural language description |
CN113468770B (en) * | 2021-09-02 | 2021-11-12 | 成都新西旺自动化科技有限公司 | Method and system for generating machine vision formula |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102867192A (en) * | 2012-09-04 | 2013-01-09 | 北京航空航天大学 | Scene semantic shift method based on supervised geodesic propagation |
CN105718555A (en) * | 2016-01-19 | 2016-06-29 | 中国人民解放军国防科学技术大学 | Hierarchical semantic description based image retrieving method |
WO2019007041A1 (en) * | 2017-07-06 | 2019-01-10 | 北京大学深圳研究生院 | Bidirectional image-text retrieval method based on multi-view joint embedding space |
CN110188168A (en) * | 2019-05-24 | 2019-08-30 | 北京邮电大学 | Semantic relation recognition methods and device |
CN111291212A (en) * | 2020-01-24 | 2020-06-16 | 复旦大学 | Zero sample sketch image retrieval method and system based on graph convolution neural network |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8301633B2 (en) * | 2007-10-01 | 2012-10-30 | Palo Alto Research Center Incorporated | System and method for semantic search |
US10452923B2 (en) * | 2017-11-28 | 2019-10-22 | Visual Semantics, Inc. | Method and apparatus for integration of detected object identifiers and semantic scene graph networks for captured visual scene behavior estimation |
Non-Patent Citations (3)
Title |
---|
代具亭; 汤心溢; 刘鹏; 邵保泰. Scene semantic segmentation network based on color-depth images and deep learning. Science Technology and Engineering, 2018, No. 20, full text. *
宋腾义; 汪闽. Multi-factor spatial scene similarity matching model and its applications. Journal of Image and Graphics, 2012, No. 10, full text. *
张玲玉; 尹鸿峰. Research on knowledge graph query based on OAN. Software, 2018, No. 01, full text. *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111814658B (en) | Scene semantic structure diagram retrieval method based on semantics | |
CN115934990B (en) | Remote sensing image recommendation method based on content understanding | |
CN113241128B (en) | Molecular property prediction method based on molecular space position coding attention neural network model | |
CN108108657A (en) | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning | |
CN114398491A (en) | Semantic segmentation image entity relation reasoning method based on knowledge graph | |
CN111931505A (en) | Cross-language entity alignment method based on subgraph embedding | |
Stumm et al. | Probabilistic place recognition with covisibility maps | |
Rad et al. | Image annotation using multi-view non-negative matrix factorization with different number of basis vectors | |
CN108170823B (en) | Hand-drawn interactive three-dimensional model retrieval method based on high-level semantic attribute understanding | |
CN113032613A (en) | Three-dimensional model retrieval method based on interactive attention convolution neural network | |
CN111400572A (en) | Content safety monitoring system and method for realizing image feature recognition based on convolutional neural network | |
CN112035689A (en) | Zero sample image hash retrieval method based on vision-to-semantic network | |
Qiu et al. | A survey of recent advances in CNN-based fine-grained visual categorization | |
CN116662468A (en) | Urban functional area identification method and system based on geographic object space mode characteristics | |
CN116204673A (en) | Large-scale image retrieval hash method focusing on relationship among image blocks | |
CN113240046A (en) | Knowledge-based multi-mode information fusion method under visual question-answering task | |
Prasomphan | Toward Fine-grained Image Retrieval with Adaptive Deep Learning for Cultural Heritage Image. | |
Nath et al. | Deep learning models for content-based retrieval of construction visual data | |
CN116935329B (en) | Weak supervision text pedestrian retrieval method and system for class-level comparison learning | |
CN115797795B (en) | Remote sensing image question-answer type retrieval system and method based on reinforcement learning | |
CN117475228A (en) | Three-dimensional point cloud classification and segmentation method based on double-domain feature learning | |
CN115934966A (en) | Automatic labeling method based on remote sensing image recommendation information | |
CN116797821A (en) | Generalized zero sample image classification method based on fusion visual information | |
Hsieh et al. | Region-based image retrieval | |
Ranjbar et al. | Scene novelty prediction from unsupervised discriminative feature learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||