CN116521895A - Method, system, device and medium for generating scene graph of remote sensing image

Info

Publication number
CN116521895A
Authority
CN (China)
Prior art keywords
remote sensing
target
relationship
targets
graph
Prior art date
Legal status
Pending
Application number
CN202310493492.1A
Other languages
Chinese (zh)
Inventor
劳江微
仲力恒
洪炜翔
张营营
王剑
陈景东
褚崴
胡丁相
邹旭苗
何慧梅
方彦明
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310493492.1A
Publication of CN116521895A


Classifications

    • G06F16/367: Ontology (creation of semantic tools, e.g. ontology or thesauri, for unstructured textual data)
    • G06F16/3344: Query execution using natural language analysis
    • G06F16/35: Clustering; Classification of unstructured textual data
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N5/02: Knowledge representation; Symbolic representation
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction (e.g. edges, contours, corners); Connectivity analysis
    • G06V10/765: Classification using rules for classification or partitioning the feature space
    • G06V10/806: Fusion of extracted features
    • G06V10/82: Image or video recognition or understanding using neural networks
    • G06V20/13: Satellite images (terrestrial scenes)
    • G06V20/188: Vegetation (terrestrial scenes)

Abstract

Disclosed is a method for generating a scene graph of a remote sensing image, comprising: performing target detection on the remote sensing image to generate a target set; selecting a plurality of target pairs having a potential relationship based on a remote sensing domain knowledge graph; and performing relationship prediction on the target pairs to generate a scene graph of the remote sensing image. Systems, devices, and media for generating a scene graph of a remote sensing image are also disclosed.

Description

Method, system, device and medium for generating scene graph of remote sensing image
Technical Field
The present disclosure relates to remote sensing images, and more particularly, to a method, system, apparatus, and medium for generating a scene graph of a remote sensing image.
Background
Currently, remote sensing images (also referred to as "remote sensing imagery," "remote sensing image maps," etc.) are increasingly widely used. Scene classification of a remote sensing image helps analyze the remote sensing image better.
Operations such as scene classification, semantic segmentation, and target detection on remote sensing images can generally understand a remote sensing image only at the perception level, i.e., they extract ground-object information from the image; they lack cognition of the image, i.e., a deeper understanding of the relationships among ground objects.
Accordingly, there is a need for improved schemes for generating scene graphs of remote sensing images.
Disclosure of Invention
One or more embodiments of the present specification achieve the above objects by the following technical means.
In one aspect, a method for generating a scene graph of a remote sensing image is provided, comprising: performing target detection on the remote sensing image to generate a target set; selecting, from the target set, a plurality of target pairs having a potential relationship, based at least in part on a remote sensing domain knowledge graph; and performing relationship prediction on the plurality of target pairs having a potential relationship to generate a scene graph of the remote sensing image.
Preferably, selecting the plurality of target pairs having a potential relationship from the target set based at least in part on the remote sensing domain knowledge graph comprises: determining whether to select two targets as a target pair having a potential relationship, based at least in part on whether the target categories of the two targets have a relationship in the remote sensing domain knowledge graph.
Preferably, two targets satisfying the following condition are selected as a target pair having a potential relationship: the target categories of the two targets have a relationship in the remote sensing domain knowledge graph, and the center distance between the target boxes of the two targets is smaller than a threshold distance or the target boxes of the two targets intersect.
Preferably, performing relationship prediction on the plurality of target pairs having a potential relationship to generate the scene graph of the remote sensing image comprises: extracting the target visual features, target box spatial features, and target category semantic features of the combined target pair.
Preferably, performing relationship prediction on the plurality of target pairs having a potential relationship to generate the scene graph of the remote sensing image comprises: performing feature fusion on the target visual features, the target box spatial features, and the target category semantic features.
Preferably, the method further comprises: correcting the confidence of the relationship prediction using the remote sensing domain knowledge graph.
Preferably, correcting the confidence of the relationship prediction using the remote sensing domain knowledge graph comprises: correcting the confidence of the relationship prediction through a weighted sum, based on the statistical probability, in the remote sensing domain knowledge graph, of the corresponding relationship between the two targets of a target pair having a potential relationship.
In another aspect, a system for generating a scene graph of a remote sensing image is disclosed, comprising: means for performing target detection on the remote sensing image to generate a target set; means for selecting, from the target set, a plurality of target pairs having a potential relationship based at least in part on a remote sensing domain knowledge graph; and means for performing relationship prediction on the plurality of target pairs having a potential relationship to generate a scene graph of the remote sensing image.
Preferably, two targets satisfying the following condition are selected as a target pair having a potential relationship: the target categories of the two targets have a relationship in the remote sensing domain knowledge graph, and the center distance between the target boxes of the two targets is smaller than a threshold distance or the target boxes of the two targets intersect.
Preferably, performing relationship prediction on the plurality of target pairs having a potential relationship to generate the scene graph of the remote sensing image comprises: extracting the target visual features, target box spatial features, and target category semantic features of the combined target pair; and performing feature fusion on the target visual features, the target box spatial features, and the target category semantic features.
Preferably, the system further comprises: means for correcting the confidence of the relationship prediction using the remote sensing domain knowledge graph.
In yet another aspect, a method for generating a scene graph of a remote sensing image is disclosed, comprising: performing target detection on the remote sensing image to generate a target set; performing relationship prediction on target pairs in the target set; correcting the confidence of the relationship prediction using a remote sensing domain knowledge graph; and generating the scene graph using the relationship prediction based on the corrected confidence.
In yet another aspect, an apparatus for generating a scene graph of a remote sensing image is provided, comprising a processor; and a memory coupled to the processor, the memory storing processor-executable instructions that, when executed by the processor, cause the processor to perform the method as described above.
In yet another aspect, a non-transitory processor-readable storage medium is provided that includes processor-executable instructions that, when executed by a processor, cause the processor to perform the method as described above.
In contrast to the prior art, one or more embodiments of the present specification can achieve one or more of the following technical effects:
using a priori knowledge;
understanding the relationships among ground objects more deeply;
improving the efficiency of relationship prediction; and
improving the accuracy of relationship prediction.
Drawings
The foregoing summary, as well as the following detailed description, is better understood when read in conjunction with the appended drawings. It is to be noted that the drawings are merely examples of the claimed invention. In the drawings, like reference numbers indicate identical or similar elements.
Fig. 1 shows a generalized schematic block diagram of an example process for constructing a knowledge-graph of a remote sensing domain, according to an embodiment of the present specification.
Fig. 2 shows a schematic diagram of an example of a portion of an ontology definition according to an embodiment of the present specification.
Fig. 3 shows a schematic diagram of an example of entity class according to an embodiment of the present description.
Fig. 4 shows a schematic diagram of an example of attributes of an entity according to an embodiment of the present description.
Fig. 5 shows a schematic diagram of an example of a relationship according to an embodiment of the present specification.
Fig. 6 shows a screenshot of a portion of China's national agricultural product geographical indication query system.
Fig. 7 is a schematic diagram illustrating conversion of a remote sensing image into a scene graph according to an embodiment of the present disclosure.
Fig. 8 illustrates a flowchart of an example method for generating a scene graph of a remote sensing image according to an embodiment of the present disclosure.
Fig. 9 shows a workflow diagram of an example process for generating a scene graph of a remote sensing image in accordance with a preferred embodiment of the present specification.
FIG. 10 illustrates a schematic diagram of an example system for generating a scene graph of a remote sensing image in accordance with an embodiment of the present disclosure.
Fig. 11 illustrates a flowchart of another example method for generating a scene graph of a remote sensing image according to an embodiment of the present disclosure.
Fig. 12 illustrates a schematic block diagram of an apparatus for implementing a system in accordance with one or more embodiments of the present disclosure.
Detailed Description
The following detailed description is presented to enable any person skilled in the art to make and use the teachings of one or more embodiments of the present specification, and to enable those skilled in the art to readily understand the associated objects and advantages based on the disclosure, claims, and drawings herein.
At present, a knowledge graph for the remote sensing field has not yet emerged. In the present specification, the remote sensing domain knowledge graph may also be referred to as a remote sensing knowledge graph, a knowledge graph related to the remote sensing field, etc.; it refers to a domain knowledge graph related to remote sensing images.
Domain knowledge graphs, such as those for the medical or financial domains, are typically constructed based on textual information only. The remote sensing field, however, is special: it involves images, and special images at that, namely remote sensing images.
Referring to fig. 1, there is shown a generalized schematic block diagram of an example process 100 for constructing a knowledge-graph of a remote sensing field, in accordance with an embodiment of the present specification.
As shown in FIG. 1, a process 100 for constructing a remote sensing domain knowledge graph may include operations such as data preparation 102, ontology modeling 104, entity discovery 106, and relationship construction 108. Preferably, the process may further include one or more of vocabulary mining, knowledge fusion, and quality control.
In the embodiments of the present disclosure, data preparation 102 refers to obtaining data for constructing the remote sensing domain knowledge graph. The data may be structured data, semi-structured data, unstructured data, and/or any combination thereof. In conventional schemes, the data used to construct a knowledge graph is typically of a single kind, for example plain text data. Unlike conventional schemes, in the embodiments of the present disclosure the acquired data may include both remote sensing image data and text data.
Remote sensing image data refers to image data obtained by remote sensing, recorded for example by aerial equipment or satellites, such as data recorded by the Sentinel series or the Gaofen (high-resolution) series of remote sensing satellites. The remote sensing image data may be obtained from any remote sensing image data source; examples include the TerraServer database from Microsoft Corporation and GeoImageDB from Wuhan Giga Information Engineering, Inc. Preprocessing of the remote sensing image data may include radiometric correction, geometric correction, denoising, image enhancement, and the like. Any other preprocessing conceivable to those skilled in the art may also be performed on the remote sensing image data.
The text data may include any form of text data related to the construction of knowledge-maps in the remote sensing field. The text data may include one or more of structured text data, semi-structured text data, and unstructured text data. Such text data may include, for example, data related to any entity that may be included in the remote sensing image. For example, the text data may include any data related to geography.
Structured text data may be obtained, for example, from a structured text database. Examples of structured text databases may include agricultural product geographical indication databases and the like.
Semi-structured text data may be obtained from a semi-structured text database; examples include table-like data, such as tables in yearbooks and various types of statistical tables.
Unstructured text data may be obtained from an unstructured database or in other ways (e.g., collected by a web crawler); example sources include encyclopedia websites and the like.
Preprocessing of the text data may include text cleansing, formatting, denoising, and the like.
As shown in FIG. 1, the acquired data may preferably further include map data, digital geographic product data, and the like. The map data and digital geographic product data may include structured, semi-structured, or unstructured map data or geographic data. The map data may include, for example, various common general-purpose or special-purpose map data. Examples of other digital geographic products may include, but are not limited to, various Internet products such as Gaode Map (AMap), Google Maps, OpenStreetMap, and the like.
The data may be obtained from the various data sources in any suitable manner as would be contemplated by one of skill in the art. For example, the data may be obtained from a local store, a cloud server, a third party storage service, a web retrieval service, and the like.
Data preparation may also optionally include preprocessing the data. For example, data cleansing, data transformation, and other data preprocessing operations may be performed on the data. Preferably, for the remote sensing image and other images, the preprocessing may further include image enhancement of the remote sensing image or other images.
As shown in FIG. 1, the process 100 may also include ontology modeling 104. In the ontology modeling process, the basic framework by which the domain is understood can be imparted to the machine. According to the concepts and attributes defined in ontology modeling, the data sources of the domain knowledge graph can be further clarified and data can be collected.
As shown in FIG. 1, the ontology modeling 104 may include entity definitions, attribute definitions, and relationship definitions. Preferably, the ontology modeling may also include constraint/rule definitions. Axiom definitions, event definitions, and the like may also be performed as necessary.
An ontology refers to the representation, naming, and definition of the categories, attributes, and relationships between the concepts, data, and entities that make up one, several, or all domains. In other words, an ontology is a formal expression of a collection of concepts and their interrelationships within a particular domain, and may appear as a structured set of terms of a particular type.
As described above, the ontology may be domain specific. In order to construct a knowledge graph in the remote sensing field, an ontology suitable for knowledge graph modeling in the remote sensing field needs to be defined. For example, the ontology definition may be based on experience or existing data (e.g., by a person having knowledge in the remote sensing arts).
Referring to FIG. 2, a schematic diagram of an example 200 of a portion of an ontology definition according to an embodiment of the present specification is shown. As shown in fig. 2, the ontology defines basic elements of entities, attributes, relationships, and the like.
The construction of the remote sensing ontology follows a coarse-to-fine course. After the field and scope of the knowledge graph are determined, the key concepts of the graph can be obtained, and concept expansion is then carried out on this basis to gradually complete the ontology concepts. Before building the ontology, whether a relevant ontology or classification standard already exists is analyzed and searched as required; if one exists, ontology mapping, extension, or direct reuse can be performed on its basis.
It can be appreciated that a remote sensing image mainly captures ground objects, i.e., spatial objects. Therefore, in the embodiments of the present disclosure, the spatial object is used as the core entity of the ontology definition, and spatial and non-spatial attributes as well as spatial and non-spatial relationships are integrally included in the knowledge graph, so as to achieve an integrated description of spatial and non-spatial knowledge. Other entities may also be defined, such as the attribute entities of the spatial objects shown in FIG. 2 (e.g., "polygon," "area," "airliner"). For spatial object entities, entity classes (or "entity sets") may also be defined, which may have multiple hierarchies. Examples of entity classes are described below in connection with FIG. 3.
The entity may have various attributes. In embodiments of the present description, an entity may include spatial attributes and non-spatial attributes.
Spatial attributes may include, for example, size (e.g., area) and shape (e.g., polygon). A polygon may in turn be composed of its constituent elements (e.g., the "points" and "polylines" in FIG. 2). It is conceivable that, through the above shape definition, the shape characteristics of a specific entity instance (e.g., a certain stadium) can be represented.
Non-spatial attributes include, for example, name attributes, affiliation attributes, and the like. For example, an attribute of a spatial object may be "airliner" or the like.
It is conceivable that by means of the above-mentioned name properties, the name characteristics of a specific entity instance can be represented.
Examples of attributes of entities may be referred to below in connection with the description of fig. 4.
As shown in FIG. 2, the ontology also defines relationships between an entity and its attributes; for example, the relationship between "area" and "spatial object 1" is "size."
As shown in FIG. 2, the ontology also defines relationships between one spatial object and another, which may be spatial or non-spatial, such as the spatial relationship between "spatial object 1" and "spatial object 2" (i.e., "adjacent to") and the non-spatial relationship between "spatial object 2" and "spatial object 3" (i.e., "subcategory of").
Examples of relationships between entities may be referred to below in connection with the description of FIG. 5.
Referring to FIG. 3, a schematic diagram of an example 300 of a class of entities according to an embodiment of the present description is shown.
As shown in FIG. 3, in the embodiments of the present disclosure, entity classes such as open space, natural area, artificial area, transportation, vegetation, and segmentation may be defined according to the characteristics of the remote sensing image field. Each entity class may include one or more entity subclasses.
For example, open space may include desert, bare land, etc.; vegetation may include trees, grassland, etc.; natural areas may include water bodies, etc., and so on. Each class may in turn have subclasses; for example, water may be further classified into lakes, oceans, rivers, etc. Subclasses may in turn include next-level subclasses, as needed. In some cases, entities and entity classes (or subclasses) may be used interchangeably.
Referring to fig. 4, a schematic diagram of an example 400 of attributes of an entity according to an embodiment of the present description is shown.
As shown in fig. 4, in a preferred embodiment of the present description, the attributes may include spatial attributes and non-spatial attributes.
Spatial attributes may include, for example, geometric attributes, positioning attributes, and the like. Geometric attributes may include length, width, height, footprint, shape, etc.; positioning attributes may include coordinate system, projection, X/Y/Z coordinates, and the like.
Non-spatial attributes may include, for example, category, name attributes, threat level, and so forth. Categories may include, for example, fighter, transport aircraft, airliner, etc. (for aircraft). Name attributes may include, for example, Chinese name, English name, alias, former name, and the like. Threat level may include, for example, high threat, medium threat, and low threat.
The attributes may contain various possible data types such as text, numbers, links, etc.
Preferably, attributes are stored in the form of nodes in the triples constituting the knowledge graph. That is, triples of the form <entity, attribute name, attribute value> may be stored, where both the entity and the attribute value are nodes in the knowledge graph and the attribute name is an edge.
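As an illustration, a minimal sketch in Python of how such triples might be represented; the entity names, relation names, and values below are hypothetical, not taken from the patent's data:

```python
# Relationship triples <entity, relationship, entity> and attribute triples
# <entity, attribute name, attribute value> share the same three-element shape.
relation_triples = [
    ("aircraft_01", "connection", "boarding_bridge_02"),  # entity-relation-entity
]
attribute_triples = [
    ("aircraft_01", "fuselage_length", "38.9 m"),         # entity-attr name-attr value
    ("aircraft_01", "category", "airliner"),
]

# In the graph, heads and tails become nodes; the middle element becomes an edge.
all_triples = relation_triples + attribute_triples
nodes = {h for h, _, _ in all_triples} | {t for _, _, t in all_triples}
edges = [(h, r, t) for h, r, t in all_triples]
```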
It is understood that other suitable spatial and non-spatial properties are contemplated with reference to the embodiments described herein.
Referring to fig. 5, a schematic diagram of an example 500 of a relationship according to an embodiment of the present description is shown.
In the present description embodiments, relationships may include spatial relationships and non-spatial relationships.
The spatial relationships may include, for example, one or more of distance relationships, direction relationships, topological relationships.
The distance relationship describes the distance between two entities. The distance relationship may include a quantitative distance and a qualitative distance. Examples of qualitative distances may include "near", "closer", "very near", "far", "farther", "very far", and so forth.
The directional relationship describes the direction of one entity relative to another entity. Examples of directional relationships may include "east", "west", "south", "north", and the like. Alternatively, the directional relationship may also be described using the absolute directions (e.g., azimuth, etc.) of the two entities.
The topological relation describes the topology between two entities. Examples of topological relationships may include "adjacent," "separated," "connected," "passing," "wrapping," and so forth.
As shown in FIG. 5, preferably, the non-spatial relationships may include semantic relationships or the like.
By utilizing the semantic relationship among the entities, the embodiment of the specification can define the relationship among the entities in more dimensions and describe the relationship among the entities better, thereby improving the efficiency of entity discovery, relationship discovery, knowledge graph construction and entity identification by utilizing the knowledge graph. As shown in FIG. 5, examples of semantic relationships may include attribute relationships, such as membership, equivalence, similarity, mutual exclusion, and so forth.
Membership describes a semantically inclusive relationship between entities. For example, "fighter," "transport aircraft," "airliner," etc. may differ greatly in shape or other spatial attributes, but all semantically belong to "aircraft," so there may be a semantic membership relationship between these entities and "aircraft."
Equivalence relationships describe semantic equivalence between entities. For example, two entities with different names for the same concept, such as "bus" and "motorbus," are semantically equivalent, and there may be an equivalence relationship between them.
Similarity relationships describe semantic similarity between entities. For example, the two entities "river" and "stream" are not semantically equal but are similar; the two entities "bomber" and "fighter" are semantically different but similar; and so on.
Mutually exclusive relationships describe semantic mutual exclusivity between entities. For example, the two entities "land" and "ocean" are semantically mutually exclusive: an entity located on land cannot be located in the ocean.
Optionally, the non-spatial relationship may also include a temporal relationship. Generally, a remote sensing satellite periodically observes the earth surface to obtain earth observation results with different states. Thus, the remote sensing image data and entities therein may correspond to different time stamps. In addition, the recorded or acquired text data and other data (e.g., map data, etc.) may also have a timestamp. Thus, in the embodiments of the present description, a temporal relationship between entities is introduced.
As shown in FIG. 5, the temporal relationships may include simultaneous relationships, continuous relationships, transient relationships, periodic relationships, and the like.
The simultaneous relationship indicates that two entities exist simultaneously. For example, two entities may appear simultaneously in the same remote sensing image, where there is a simultaneous relationship between the two entities.
The persistent relationship indicates the persistence of an entity relationship. For example, in most cases there is a persistent relationship between the two entities "airport" and "airport runway," which are constantly present together.
Similarly, a transient relationship indicates that two entities co-occur only instantaneously (e.g., in only one or a small number of images in a series of remote sensing images), while a periodic relationship indicates that two entities co-occur periodically (e.g., appear periodically in the same remote sensing image).
In the embodiments of the present specification, the ontology definition may preferably further comprise constraint definitions and/or rule definitions. A constraint is a formalized statement describing the conditions that must be met for an assertion to be accepted as input. For example, one constraint may be: an automobile entity and an ocean entity are spatially mutually exclusive, since in reality an automobile cannot be located directly on the sea (an automobile located on a ship, where the ship is located on the sea, does not violate this constraint).
A rule is a statement in premise-consequence form that describes the logical inference that can be drawn from an assertion of a particular form. For example, a lake entity typically has rivers flowing in or out, so the following rule may be defined: if a lake entity exists, then a river entity spatially connected to the lake entity exists.
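A sketch of how such a rule might be checked over the graph's triples; the relation names "instance_of" and "connected" are illustrative assumptions, not the patent's schema:

```python
def lakes_missing_connected_river(triples):
    # triples: iterable of (head, relation, tail)
    lakes  = {h for h, r, t in triples if r == "instance_of" and t == "lake"}
    rivers = {h for h, r, t in triples if r == "instance_of" and t == "river"}
    neighbors = {}
    for h, r, t in triples:
        if r == "connected":
            neighbors.setdefault(h, set()).add(t)
            neighbors.setdefault(t, set()).add(h)
    # a lake violates the rule if no river entity is spatially connected to it
    return [lake for lake in lakes if not (neighbors.get(lake, set()) & rivers)]
```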
Through the above ontology modeling, the embodiments of the present specification provide a schema description of the remote sensing domain knowledge graph. In the subsequent knowledge graph construction process, concrete data of entities and relationships can be filled in according to the ontology modeling.
During knowledge graph construction, the ontology modeling can be modified as needed.
In an alternative embodiment, as shown in FIG. 1, vocabulary mining 105 may be performed first, for example on the text data. Vocabulary mining may include synonym mining, abbreviation mining, phrase mining, etc., to mine synonyms, abbreviations, and phrases in the text data, respectively. Through vocabulary mining, the text data can be processed more reasonably and efficiently. Vocabulary mining may also be implemented during the data preprocessing described above.
As shown in fig. 1, the process 100 may also include entity discovery 106 (or "entity identification", "entity extraction", etc.). Entity discovery is used to identify entities based on the acquired data. The extraction of entities may be based on the foregoing ontology modeling. The extracted entities may be used for subsequent knowledge-graph construction.
For example, for some types of data (e.g., structured data or some semi-structured data), entities may be extracted from the data directly or through simple data processing. For other data (e.g., unstructured data and some semi-structured data), more complex data processing algorithms (e.g., utilizing machine learning models, etc.) may be performed on the data to mine the entity.
Any suitable image recognition model may be used to perform object recognition on the remote sensing image data or other image data to identify entities in the remote sensing image data.
Entity recognition may be performed on the text data using any suitable text processing model (e.g., natural language processing model) or simple text processing (e.g., regular expression based text processing, etc.) to identify entities in the text data.
In addition to entity identification, classification of entities may also be performed. For example, any classification algorithm (e.g., machine learning model) or simple matching rules may be employed to perform entity classification. For example, the entity classification may be performed with reference to a classification scheme as shown in fig. 3.
In addition to performing entity identification, attributes of the entity may also be identified. For example, spatial and/or non-spatial attributes of the entity may be identified, as described above with reference to fig. 4.
For example, non-spatial attributes may be obtained by processing the text data.
For structured text or some semi-structured text, attributes may be obtained by direct reading or by simple data processing operations (e.g., regular expressions or format-specific processing languages). For example, some structured tables directly record the values of entity attributes.
For other text, more complex algorithms, such as machine learning algorithms, may be required to obtain the attributes. Suppose the text data includes the following: "The building has fallen into disrepair over the years and is at risk of collapse"; the threat attribute of the building can then be identified as high by, for example, a natural language processing algorithm. As another example, suppose the text data includes the following: "The COMAC C919 is a 168-190 seat narrow-body trunk airliner manufactured by Commercial Aircraft Corporation of China, Ltd."; it can then be determined that the C919 is an airliner, along with other attributes such as its manufacturer, purpose, etc.
Some non-spatial attributes may also be obtained by processing remote sensing image data, other image data, geographic data, map data, or the like. For example, by the shape of an aircraft in an image, the class attribute of the aircraft may be determined.
The spatial attribute can be obtained by processing text data or remote sensing image data.
Similarly, for structured data or some semi-structured data, the corresponding spatial attributes may be obtained by direct reading or simple data processing operations (e.g., regular expressions, proprietary processing languages of various formats, etc.).
For some other data, more complex algorithms may be required.
For example, after an entity in a remote sensing image or other image is identified, geometric attributes such as its length, width, area, and shape can be obtained by measuring the number of pixels the entity occupies in the image and multiplying by the scale. Similarly, by measuring the number of pixels between different entities and multiplying by the scale, the positioning attributes of the entities can be obtained.
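A minimal sketch of this pixel-count-times-scale computation, assuming a known ground sample distance (the per-pixel scale, a value not specified in the patent):

```python
def length_in_meters(n_pixels: int, gsd_m: float) -> float:
    # length along a measured edge = pixel count x ground sample distance
    return n_pixels * gsd_m

def area_in_sq_meters(n_pixels: int, gsd_m: float) -> float:
    # each pixel covers gsd_m x gsd_m square meters
    return n_pixels * gsd_m ** 2

# e.g. a lake mask of 12_000 pixels at 2 m/pixel covers 48_000 square meters
print(area_in_sq_meters(12_000, 2.0))
```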
As another example, spatial attributes of an entity may be obtained from text data through natural language processing. For example, suppose the text data includes the following: "The C919 has a fuselage length of 38.9 meters and a wingspan of 33.6 meters"; multiple geometric attributes of the aircraft are then readily obtained through machine learning models such as natural language processing algorithms.
Some spatial attributes may also be obtained by processing other image data, geographic data, map data, and the like. For example, the geometric attributes of lakes, rivers, etc. can be obtained from geographic data or map data.
Any other suitable way that is conceivable by a person skilled in the art may be used to identify the properties of the entity from the acquired data, according to the description of the embodiments of the present specification.
After or during the execution of entity discovery, relationship discovery 108 (or "relationship identification," "relationship extraction," etc.) may be executed. Relationship discovery is used to discover relationships between entities. Relationship discovery may be based on the foregoing ontology modeling. The extracted relationships can be used for subsequent knowledge-graph construction.
For example, for some types of data (e.g., structured data or some semi-structured data), relationships may be extracted from the data directly or through simple processing. For other data (e.g., unstructured data and some semi-structured data), more complex data processing algorithms (e.g., utilizing machine learning models, etc.) may be performed on the data to mine relationships.
For example, spatial and/or non-spatial relationships of entities may be identified, as discussed above with respect to ontology modeling. For structured data or some semi-structured data, relationships may be obtained by direct reading or by simple data processing operations (e.g., regular expressions or format-specific processing languages). For example, in some structured tables the header is a relationship and the table content is entities. Referring to FIG. 6, a screenshot of a portion of China's national agricultural product geographical indication query system is shown. As can be seen from the figure, the relationship between the entity "Licheng walnut" and the entity "Changzhi City, Shanxi Province" is "place of origin," and so on. Therefore, by directly reading the header and the corresponding table content, the two entities and the relationship between them can be obtained. For data from which entities and/or relationships cannot be read directly, relationships between entities may be mined by more complex algorithms (e.g., machine learning models), similar to the mining of entity attributes above.
For example, websites such as Wikipedia may be regarded as unstructured data. Relationships between entities can be obtained by text processing (e.g., natural language processing such as sentence segmentation and relationship extraction) of content on such websites. For example, from the sentence "terraced fields are a layered hilly planting area," it can be obtained that "terraced fields" and "hilly planting area" have a semantically similar relationship (a semantic relationship); from the sentence "terraced fields are built along mountains," the qualitative spatial distance relationship between "terraced fields" and "mountains" can be obtained.
Temporal relationships may be mined, for example, from the timestamps of the data. For example, by comparing the timestamps of two entities, it can be determined whether the two entities have a simultaneous, persistent, transient, or periodic relationship. Other ways of determining temporal relationships may also be employed, such as using machine learning models to mine the patterns of occurrence of entities in remote sensing image data or other image data.
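One possible heuristic for classifying the temporal relationship of two entities from the timestamps of the images in which they co-occur; the patent does not fix a specific algorithm, so the rules and thresholds below are illustrative assumptions:

```python
from typing import Optional

def temporal_relation(stamps_a: set, stamps_b: set) -> Optional[str]:
    # stamps_*: acquisition timestamps (e.g. in days) of the images in which
    # each entity appears
    both = sorted(stamps_a & stamps_b)
    if not both:
        return None                    # never observed together
    if len(both) <= 2:
        return "transient"             # co-occur in only one or a few images
    gaps = [b - a for a, b in zip(both, both[1:])]
    mean_gap = sum(gaps) / len(gaps)
    if max(gaps) - min(gaps) < 0.1 * mean_gap:
        return "periodic"              # regularly spaced co-occurrences
    return "simultaneous"              # co-occur without a clear period
```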
Some relationships (including spatial relationships and non-spatial relationships) may also be obtained through processing of other image data, geographic data, map data, or the like. For example, the spatial relationship of lakes, rivers, etc. may be obtained by geographic data or map data.
Any other suitable manner that would be conceivable to a person skilled in the art may be used to identify relationships between entities from the acquired data, in accordance with the description of embodiments of the present specification.
After entity discovery and relationship discovery are performed, the entities, their attributes, and the relationships among them have been mined, yielding the triples of the remote sensing domain knowledge graph. For example, entities may be vertices (or "nodes") in the knowledge graph and relationships between entities may be edges, giving triples of the form <entity, relationship, entity>. For entity attributes, the entity and the attribute value are vertices and the attribute name is an edge, giving triples of the form <entity, attribute name, attribute value>. Combining these knowledge graph triples, a complete remote sensing domain knowledge graph can be constructed, as shown in the domain knowledge base 114 of FIG. 1.
When necessary, the relationships among different remote sensing image categories can be analyzed, and remote sensing images of various categories can be aggregated to form a unified, integral remote sensing domain knowledge graph.
It can be seen that in the embodiments of the present disclosure, the construction of the remote sensing domain knowledge graph combines text knowledge and image knowledge. Text knowledge describes remote sensing knowledge textually; for example, the Ziyuan-3 satellite belongs to the class of remote sensing satellites, which relates the two entities "Ziyuan-3" and "remote sensing satellite." Text knowledge extraction mainly uses vocabulary mining techniques to identify important phrases and vocabulary in the field, and identifies entities and establishes specific relationships among them by means of entity recognition, entity classification, entity linking, and the like. The remote sensing image is the direct result of earth observation in the remote sensing field; it contains rich geometric and attribute information and, compared with a textual entity, can be described intuitively and comprehensively.
Referring to fig. 1, optionally, knowledge fusion 110 may also be performed prior to constructing the knowledge-graph.
Since knowledge is obtained from multiple different data sources, many similar or ambiguous entities and attributes may be obtained. Knowledge from different sources therefore needs to be integrated and disambiguated, so that a complete and standardized domain knowledge graph can be constructed. Preferably, knowledge fusion may include one or more of entity alignment and attribute fusion.
Entity alignment may be achieved, for example, by a multiple-information-embedding entity alignment method (MEAA). Other suitable methods may also be used to achieve entity alignment, such as any one of MuGNN, RDGCN, AliNet, or OAG (LinKG), or a combination thereof.
Attribute fusion can be realized by a Word2Vec-based attribute fusion method. In this method, triples are converted into word vectors by Word2Vec and the similarity between the word vectors is compared; when the similarity is greater than a threshold, the two attributes are considered to have the same meaning. For example, when both "fuselage length" and "length" appear as length attributes of "aircraft," the comparison may find that the similarity of the vectors corresponding to the <aircraft, fuselage length, value> and <aircraft, length, value> triples is greater than the threshold; "fuselage length" and "length" can then be considered to describe the same attribute, and the two are fused.
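A minimal sketch of the similarity test, assuming the two triples have already been embedded as vectors (e.g., by averaging the Word2Vec vectors of their elements); the 0.85 threshold is an assumed value:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def same_attribute(vec_a: np.ndarray, vec_b: np.ndarray,
                   threshold: float = 0.85) -> bool:
    # vec_a, vec_b: embeddings of e.g. <aircraft, fuselage length, value> and
    # <aircraft, length, value>; above the threshold the attributes are fused
    return cosine(vec_a, vec_b) > threshold
```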
Referring to fig. 1, quality control may also optionally be performed. Referring to fig. 6, a schematic block diagram of quality control according to an embodiment of the present description is shown.
As shown in FIG. 6, quality control may include one or more of knowledge completion, knowledge correction, and knowledge updating. These may be performed automatically or semi-automatically based on predefined rules. In addition, in some embodiments of the present description, an interface for manual intervention is provided for manually editing the knowledge graph or the triples therein, and crowdsourced construction by third parties or users is also possible.
In addition to predefined rules, knowledge representation learning methods can be used to represent the triples in the graph as low-dimensional semantic vectors, which are then processed to automatically correct errors and complete missing parts of the graph.
Preferably, the constructed remote sensing domain knowledge graph may be applied in remote sensing applications, as shown in the domain knowledge application 116 of FIG. 1, for example to guide interpretation tasks, logical reasoning, deep learning reasoning, and the like. The constructed remote sensing domain knowledge graph can significantly improve the performance of entity discovery, attribute discovery, and relationship discovery on remote sensing images. In turn, the improved performance of entity, attribute, and relationship discovery further improves the construction (or updating) of the knowledge graph. Therefore, the scheme of the embodiments of the present specification can realize a positive feedback loop in which remote sensing domain knowledge graph construction and remote sensing knowledge graph interpretation and reasoning promote each other.
Furthermore, the remote sensing field knowledge graph constructed according to the embodiments of the present disclosure (or generated according to other methods) may be used to convert a remote sensing image into a scene graph, so as to better understand the remote sensing image.
Fig. 7 is a schematic diagram illustrating conversion of a remote sensing image into a scene graph according to an embodiment of the present disclosure.
The left half of fig. 7 shows a remote sensing image 702. In the example shown in fig. 7, the remote sensing image is of an airport. Although the remote sensing image shows intuitive image information, it is difficult to easily recognize objects, particularly relationships between objects, from the remote sensing image 702.
The right half of fig. 7 shows a scene graph 704 corresponding to the remote sensing image 702.
In the embodiments of the present specification, a scene graph is a graph (Graph) representation of a remote sensing image, rather than an image (Image) representation.
A scene graph is a graph structure that represents the visual elements in a scene (in these embodiments, a remote sensing image), i.e., the objects and the relationships between them. Specifically, the objects and relationships in the image are represented by nodes and edges, respectively, in the graph structure. Preferably, as shown in FIG. 7, each node has a corresponding region label on the image. For example, in an airport scene there may be objects such as aircraft, boarding bridges, aprons, and terminal buildings; there may be a "connection" relationship between an aircraft and a boarding bridge, e.g., aircraft (vertex) - connection (edge) - boarding bridge (vertex), and a "surrounding" relationship between an apron and a terminal building, e.g., apron (vertex) - surrounding (edge) - terminal building (vertex).
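As an illustration, a sketch of the airport scene graph as a graph structure, here built with the networkx library (an assumed tooling choice; the coordinates are hypothetical):

```python
import networkx as nx

sg = nx.DiGraph()  # nodes: detected targets; edges: predicted relationships
sg.add_node("aircraft_1",        category="aircraft",          box=(412, 310, 60, 48))
sg.add_node("boarding_bridge_1", category="boarding bridge",   box=(455, 352, 30, 12))
sg.add_node("apron_1",           category="apron",             box=(300, 250, 400, 300))
sg.add_node("terminal_1",        category="terminal building", box=(250, 500, 500, 120))

sg.add_edge("aircraft_1", "boarding_bridge_1", relation="connection")
sg.add_edge("apron_1", "terminal_1",           relation="surrounding")

for u, v, d in sg.edges(data=True):  # e.g. aircraft_1 -connection-> boarding_bridge_1
    print(f"{u} -{d['relation']}-> {v}")
```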
Preferably, targets of different categories can be displayed in different colors in the graph structure, further helping a user better understand the relationships among the different categories of targets in the remote sensing image.
Referring to FIG. 8, a flowchart of an example method 800 for generating a scene graph of a remote sensing image according to an embodiment of the present disclosure is shown. FIG. 8 is described with reference to FIG. 9, which shows a workflow diagram of an example process 900 for generating a scene graph of a remote sensing image according to a preferred embodiment of the present specification.
As shown in fig. 8, method 800 may include: in operation 802, target detection may be performed on the remote sensing image to generate a target set.
Various target detection models 904 may be used to perform target detection on the remote sensing image 902 to obtain a target detection output 906. Preferably, a rotated-box target detection model is adopted in view of the characteristics of remote sensing images. Examples of target detection models that may be employed include, but are not limited to: R3Det, R-CNN series models (e.g., R-CNN, Fast R-CNN, etc.), YOLO series models (e.g., YOLOv2, YOLOv3, etc.), and SSD models.
Preferably, target detection can be performed using Fast R-CNN to improve detection accuracy and to facilitate the generation of a complete and accurate scene graph. Fine-grained target detection with Fast R-CNN generates a large number of targets (especially for large-format remote sensing images); combined with the operations described below for selecting target pairs with potential relationships, this enables accurate scene graph generation with fewer resources and without loss of accuracy.
The target detection output 906 typically includes a target box and a target category. A "target box" (also referred to as a "bounding box" or, in this context, a "prediction box") marks information such as the position, size, and shape of a detected target in the image; it is typically represented by a rectangle, polygon, or other form. The target category (Object Class) indicates the class to which the target belongs; in a remote sensing image scene it generally refers to a ground object, such as an aircraft, a terminal building, or a boarding bridge. If desired, the target detection output 906 may also include a target confidence (Object Confidence), which represents the model's confidence that the target box contains a target, typically as a probability value between 0 and 1. Some models also output more detailed information, such as a score for each target box, a direction angle, or an estimated occlusion level.
From the target detection output 906, a target set 908 of the detected ground-object targets may be obtained. Each target may have an associated target box that identifies the target's location and extent, typically in the form of a rectangle (other shapes may also be used). As shown in the target set 908 of FIG. 9, different target categories may be represented in different colors.
Suppose target O_i has an associated target box B_i. The set of targets contained in the target detection output 906 may be represented as O = {O_1, …, O_n}, and the associated set of target boxes as B = {B_1, …, B_n}, where n is the number of detected targets in the remote sensing image.
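A minimal sketch of the detection output as data structures; the field names, the rotated-box layout, and the sample values are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectedTarget:
    category: str                                   # e.g. "aircraft", "apron"
    box: Tuple[float, float, float, float, float]   # (cx, cy, w, h, angle)
    confidence: float                               # detector confidence in [0, 1]

# O = {O_1, ..., O_n} with associated boxes B = {B_1, ..., B_n}
targets: List[DetectedTarget] = [
    DetectedTarget("aircraft",        (412.0, 310.0, 60.0, 48.0, 15.0), 0.93),
    DetectedTarget("boarding bridge", (455.0, 352.0, 30.0, 12.0,  0.0), 0.88),
]
```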
The method 800 may further include: at operation 804, a plurality of potentially related target pairs may be selected from the set of targets based at least in part on the remote sensing domain knowledge-graph.
As described above, in large-format remote sensing images, particularly where a fine-grained target detection model (e.g., Fast R-CNN) is employed, a large number of targets may be identified. For n targets there are n(n-1)/2 unordered target pairs, so if relationship prediction were performed for every possible pair, considerable resources and/or time could be consumed.
In the embodiments of the present specification, a remote sensing domain knowledge graph (e.g., the remote sensing domain knowledge graph 910 of FIG. 9) is used to facilitate generation of the remote sensing scene graph. The remote sensing domain knowledge graph is a rich knowledge base containing prior knowledge; it records general factual rules about ground-object targets and their relationships, and the remote sensing scene graph can be regarded as an instantiation of it. Relationship prediction for the remote sensing scene graph can thus be optimized through the prior knowledge of the remote sensing domain knowledge graph.
Preferably, the remote sensing domain knowledge graph 910 is constructed by the method of constructing a remote sensing domain knowledge graph described above with reference to fig. 1-6. Alternatively, the remote sensing domain knowledge-graph may be constructed in any other manner known in the art, or an existing remote sensing domain knowledge-graph may be used.
In the embodiments of the present specification, scene graph generation may be optimized using the remote sensing domain knowledge graph in the following ways: performing target pair selection with the remote sensing domain knowledge graph before relationship prediction (as shown at 912 of FIG. 9), thereby reducing the number of target pairs to be processed; and/or performing confidence correction with the remote sensing domain knowledge graph after relationship prediction (as shown at 916 of FIG. 9), thereby improving the accuracy of the relationship prediction.
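A sketch of the confidence correction by weighted summation; the specification states only that the predicted confidence and the knowledge graph's statistical probability are combined by a weighted sum, so the weight alpha below is an assumed parameter:

```python
def correct_confidence(pred_conf: float, kg_prob: float,
                       alpha: float = 0.7) -> float:
    # pred_conf: the relationship prediction model's confidence;
    # kg_prob: statistical probability of this relation between the two
    # target categories in the remote sensing domain knowledge graph
    return alpha * pred_conf + (1.0 - alpha) * kg_prob
```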
Whether two targets are selected as a target pair having a potential relationship may be determined based at least in part on whether the target categories of the two targets are related in the remote sensing domain knowledge graph. For example, whether a potential relationship exists between two targets may be predicted from the following factors: whether the target categories of the two targets have a relationship in the remote sensing domain knowledge graph; whether the center distance between the target boxes of the two targets is smaller than a threshold distance; and whether the target boxes of the two targets intersect. Target pairs may also be selected in other ways using the remote sensing domain knowledge graph.
Specifically, in one embodiment of the present disclosure, the following formula may be employed to determine the set of target pairs having a potential relationship:

{((O_i, B_i), (O_j, B_j)) | (hasRelation(O_i, O_j, KG) ∧ Dist(B_i, B_j) < τ) ∨ Intersec(B_i, B_j) > 0}

where ((O_i, B_i), (O_j, B_j)) is a target pair with a potential relationship; KG denotes the remote sensing domain knowledge graph; hasRelation(O_i, O_j, KG) determines whether targets O_i and O_j have a relationship in KG (i.e., whether the entities corresponding to O_i and O_j are connected by an edge in the remote sensing domain knowledge graph); Dist(·) computes the center distance between targets; τ is a distance threshold; and Intersec(·) determines whether the targets intersect.

As the formula shows, a potential relationship is considered to exist between two targets when the target categories of the two targets have a relationship in the remote sensing domain knowledge graph and the center distance between their target boxes is smaller than the threshold distance, or when the target boxes of the two targets intersect; the target pair formed by the two targets is then selected for subsequent processing.
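The selection rule above may be sketched in Python as follows. The encoding of the knowledge graph as a set of category pairs (kg_edges) and the helper functions are assumptions for exposition; the sketch reuses the Detection structure shown earlier.

import itertools
import math

def center_dist(box_a, box_b):
    """Dist(B_i, B_j): Euclidean distance between target box centers."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return math.hypot(bx - ax, by - ay)

def intersects(box_a, box_b):
    """Intersec(B_i, B_j) > 0: whether two axis-aligned target boxes overlap."""
    return (box_a[0] < box_b[2] and box_b[0] < box_a[2] and
            box_a[1] < box_b[3] and box_b[1] < box_a[3])

def select_pairs(detections, kg_edges, tau):
    """Select target pairs with a potential relationship.

    kg_edges: set of (category, category) tuples connected by an edge in
    the remote sensing domain knowledge graph (an assumed encoding).
    tau: the distance threshold from the formula above.
    """
    pairs = []
    for a, b in itertools.combinations(detections, 2):
        has_relation = ((a.category, b.category) in kg_edges or
                        (b.category, a.category) in kg_edges)
        # (hasRelation AND Dist < tau) OR Intersec > 0
        if (has_relation and center_dist(a.box, b.box) < tau) or intersects(a.box, b.box):
            pairs.append((a, b))
    return pairs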
The method 800 may include: at operation 806, a relationship prediction may be performed on the plurality of potentially related pairs of targets to generate a scene graph of the remote sensing image.
Any suitable relationship prediction model may be employed to perform the relationship prediction; the relationship prediction model 914 may be, for example, any suitable deep learning model. A preferred model is described below.
Preferably, to perform relationship prediction, the target visual features, target box spatial features, and target category semantic features of the target pair combination may be extracted, as shown at 920 of FIG. 9.
Any algorithm known to those skilled in the art may be used to extract the above features.
For each selected target pair ((O_i, B_i), (O_j, B_j)), the target visual feature F_visual of the target pair combination, the target box spatial feature F_spatial, and the target category semantic feature F_semantic are extracted from the remote sensing image as features characterizing the target relationship. F_visual is the result of pooling the image depth feature map output by the convolutional neural network over the region covered by the combined target box range of the target pair, where the combined target box range is the union of the target boxes of the two targets. F_spatial may include, for example, the normalized values of the intersection-over-union of the two target boxes, the distance between the target centers, and the direction angle of the line connecting the target centers. F_semantic is the vector of the target category noun extracted by a language model (e.g., Word2vec, GloVe, BERT).
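As one possible illustration of F_spatial, the following sketch computes the box intersection-over-union, a normalized center distance, and the center-line direction angle. The exact normalization used here (dividing by the image diagonal and mapping the angle to [0, 1]) is an assumption, since the embodiment does not prescribe one.

import math

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned target boxes."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def spatial_features(box_a, box_b, image_diag):
    """F_spatial: box IoU, center distance normalized by the image diagonal
    (an assumed normalization), and the direction angle of the center line
    mapped to [0, 1]."""
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    dist = math.hypot(bx - ax, by - ay) / image_diag
    angle = (math.atan2(by - ay, bx - ax) + math.pi) / (2 * math.pi)
    return [iou(box_a, box_b), dist, angle]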
Feature fusion may then be performed on the target visual feature, the target box spatial feature, and the target category semantic feature, as shown at 922 of FIG. 9. Any suitable feature fusion technique known to those skilled in the art may be used.
After feature fusion is performed, relationship prediction between targets of the target pair may be performed using the fused features.
For example, the relationship prediction may be performed using the following formula:
P = σ([F_visual, F_spatial, F_semantic], Γ)
where P is the result of the relationship prediction between the targets, Γ denotes the model parameters, and σ is the activation process. The relationship prediction yields, for each target pair, a relationship (or "relationship category", selected from a plurality of predefined relationship categories) together with a predicted confidence for that relationship. In general, the relationship with the highest confidence may be selected as the predicted relationship between the two targets of the target pair.
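A minimal sketch of such a predictor is given below, assuming PyTorch and a small two-layer classifier over the fused features; the feature dimensions, the architecture, and the choice of softmax for σ are illustrative assumptions rather than the embodiment's prescribed design.

import torch
import torch.nn as nn

class RelationPredictor(nn.Module):
    """Concatenate the three feature vectors [F_visual, F_spatial, F_semantic]
    and map them to per-category relationship confidences P."""

    def __init__(self, vis_dim=256, spa_dim=3, sem_dim=300, n_relations=10):
        super().__init__()
        self.mlp = nn.Sequential(                  # Γ: the trainable parameters
            nn.Linear(vis_dim + spa_dim + sem_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_relations),
        )

    def forward(self, f_visual, f_spatial, f_semantic):
        fused = torch.cat([f_visual, f_spatial, f_semantic], dim=-1)  # feature fusion
        return torch.softmax(self.mlp(fused), dim=-1)  # σ here assumed to be softmax

# The relationship category with the highest confidence is taken as the
# predicted relationship between the two targets of the target pair.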
Referring to FIG. 9, the detected targets are, for example, "airplane", "boarding bridge", "taxiway", "runway", "apron", and "terminal building"; through relationship prediction, the predicted relationships "airplane-link-boarding bridge", "taxiway-intersection-runway", and "apron-surrounding-terminal building" can be obtained.
In embodiments of the present disclosure, under the guidance of the remote sensing domain knowledge graph, the relationship prediction between targets of the remote sensing image may be expressed as:
R = Φ(P, F)
where F is the statistical probability of the target relationship in the remote sensing domain knowledge graph, which brings the prior knowledge of the knowledge graph to bear, and Φ(·) is the relationship prediction confidence correction process. In particular, in some embodiments, the confidence of the relationship prediction may be corrected by a weighted sum with the statistical probability, in the remote sensing domain knowledge graph, of the corresponding relationship between the two targets of a potentially related target pair. That is, Φ(·) may be implemented as a weighted sum.
The confidence of the relationship prediction may also be corrected based on the remote sensing domain knowledge graph in other ways. For example, a threshold method may be used: when the statistical probability of the relationship between the two targets is lower than a predefined threshold, the confidence is set to 0; otherwise, the original confidence is retained.
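Both correction strategies may be sketched as follows; the weight alpha and the probability threshold are illustrative hyperparameters, and the threshold branch follows the reading above (zeroing out relations whose knowledge-graph statistical probability falls below the threshold).

def correct_confidence(p, f, alpha=0.7, mode="weighted", threshold=0.05):
    """Phi: correct per-relation confidences p using the statistical
    probabilities f of those relations in the remote sensing domain
    knowledge graph. p and f are lists indexed by relationship category;
    alpha and threshold are illustrative hyperparameters."""
    if mode == "weighted":
        # Weighted sum of model confidence and knowledge-graph statistical probability
        return [alpha * pi + (1 - alpha) * fi for pi, fi in zip(p, f)]
    # Threshold method: zero out relations whose knowledge-graph
    # statistical probability falls below the threshold
    return [pi if fi >= threshold else 0.0 for pi, fi in zip(p, f)]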
After the confidence is corrected, the predicted relationship may also change. For example, suppose that the confidence of relationship A between two targets was originally greater than that of relationship B (so that relationship A was originally predicted), but that after correction the confidence of relationship A is less than that of relationship B; the predicted relationship between the two targets is then corrected to B.
Because the corrected confidence incorporates the prior knowledge of the remote sensing domain knowledge graph, it is more accurate than the original prediction confidence.
Experiments show that the accuracy of the method assisted by the remote sensing domain knowledge graph is significantly higher than that of a baseline frequency-statistics method, and that, under knowledge-graph-guided confidence correction, its accuracy exceeds that of a basic multi-feature method.
In this way, the relationships of the plurality of potentially related target pairs can be predicted. With the targets identified and the relationships between them determined, a scene graph 918 of the remote sensing image is obtained, in which the nodes are targets and an edge between two nodes is the relationship between the two targets.
In a preferred embodiment, after the scene graph is generated, the scene graph may be visually output for viewing by a user.
Any graph-structure visualization method may be employed to output the scene graph. When outputting the scene graph, different categories of targets may be represented in different colors, as shown in scene graph 704 of FIG. 7 or scene graph 918 of FIG. 9.
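As one possible way to render such an output, the sketch below uses the networkx and matplotlib libraries; the layout, the colors, and the example triples (taken from the airport example above) are illustrative choices.

import networkx as nx
import matplotlib.pyplot as plt

def draw_scene_graph(triples, category_colors):
    """Draw a scene graph whose nodes are targets and whose labeled edges
    are predicted relationships, coloring nodes by target category."""
    g = nx.DiGraph()
    for subj, rel, obj in triples:
        g.add_edge(subj, obj, label=rel)
    pos = nx.spring_layout(g, seed=0)
    colors = [category_colors.get(node, "lightgray") for node in g.nodes]
    nx.draw(g, pos, with_labels=True, node_color=colors, node_size=1800, font_size=8)
    nx.draw_networkx_edge_labels(g, pos, edge_labels=nx.get_edge_attributes(g, "label"))
    plt.show()

draw_scene_graph(
    [("airplane", "link", "boarding bridge"),
     ("taxiway", "intersection", "runway"),
     ("apron", "surrounding", "terminal building")],
    {"airplane": "skyblue", "boarding bridge": "orange"},
)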
Referring to fig. 10, a schematic diagram of an example system 1000 for generating a scene graph of a remote sensing image according to an embodiment of the present disclosure is shown.
As shown in fig. 10, the system 1000 may include an object detection device 1002, an object pair selection device 1004, and a relationship prediction device 1006.
The object detection device 1002 may be configured to perform object detection on the remote sensing image to generate an object set.
The target pair selection device 1004 may be configured to select a plurality of target pairs having a potential relationship from the target set based at least in part on the remote sensing domain knowledge graph. For example, two targets meeting the following conditions may be selected as a target pair with a potential relationship: the target categories of the two targets have a relationship in the remote sensing domain knowledge graph and the center distance between their target boxes is smaller than the threshold distance; or the target boxes of the two targets intersect.
The relationship prediction device 1006 may be configured to perform relationship prediction on the plurality of potentially related target pairs to generate the scene graph of the remote sensing image.
Specifically, the relationship prediction device may perform the following operations: extracting the target visual features, target box spatial features, and target category semantic features of the target pair combination, and performing feature fusion on these features.
Preferably, the relationship prediction device may further correct the confidence of the relationship prediction using the remote sensing domain knowledge graph.
Referring to FIG. 11, a flow chart of another example method 1100 for generating a scene graph of a remote sensing image according to an embodiment of the disclosure is shown. The description of the method 1100 of FIG. 11 may also be understood with reference to the process 900 of FIG. 9. The difference from the method 800 of FIG. 8 is one of emphasis: method 800 applies the remote sensing domain knowledge graph to target pair selection (see target pair selection 912 of FIG. 9) without performing confidence correction, while method 1100 applies it to confidence correction (see confidence correction 916 of FIG. 9) without performing target pair selection.
Accordingly, the method 1100 may include the following operations:
in operation 1102, object detection may be performed on the remote sensing image to generate an object set. This operation may refer to operation 802 of fig. 8. Likewise, the remote sensing image 902 may be processed, for example, by the object detection model 904 of FIG. 9 to generate an object detection output 906, and the set of objects 908 may be derived therefrom.
At operation 1104, relationship prediction may be performed on the target pairs in the target set. Unlike method 800, method 1100 performs relationship prediction directly on all target pairs in the target set, without a target pair selection operation. In this case, all possible target pairs in the target set 908 are input to the relationship prediction model 914. As before, visual, spatial, and semantic features may be extracted for each target pair (as shown at 920 of FIG. 9), feature fusion may be performed on these features, and relationship prediction may then be performed using the fused features. Apart from omitting the target pair selection, the other operations are as described above with respect to FIGS. 8 and 9.
In operation 1106, the confidence level of the relationship prediction may be modified using the remote sensing domain knowledge-graph.
Reference is made to the above description for details; for example, the statistical probability of the target relationship in the remote sensing domain knowledge graph may be used to correct the confidence, which is not repeated here.
At operation 1108, the scene graph may be generated using the relationship prediction based on the corrected confidence.
After correcting the confidence, the predicted relationship may also need to be corrected, as described above.
For further details and additional features of method 1100, reference may be made to the description of method 800 and process 900 above.
Fig. 12 shows a schematic block diagram of an apparatus 1200 for implementing a system (e.g., system 1000 above) or performing a method (e.g., method 800 above) in accordance with one or more embodiments of the present description. The apparatus may include a processor 1210 and a memory 1215 configured to perform the operations of any of the methods described above. The memory may store, for example, acquired data, algorithms and/or models used, intermediate data generated during operation, and so forth.
The apparatus 1200 may include a network connection element 1225, which may include, for example, a network connection device that connects to other devices through a wired or wireless connection. The wireless connection may be, for example, a WiFi connection, a Bluetooth connection, or a 3G/4G/5G network connection. User input from other devices may also be received, and data to be displayed may be transmitted to other devices, via the network connection element.
The apparatus may also optionally include other peripheral elements 1220, such as input devices (e.g., keyboard, mouse) and output devices (e.g., display). Corresponding information may also be output to the user via the output devices.
Each of these modules may communicate with each other directly or indirectly, e.g., via one or more buses (e.g., bus 1205).
Moreover, a computer-readable storage medium is also disclosed, comprising computer-executable instructions stored thereon which, when executed by a processor, cause the processor to perform the methods of the embodiments described herein.
Further, an apparatus is disclosed that includes a processor and a memory storing computer-executable instructions that, when executed by the processor, cause the processor to perform the methods of the embodiments described herein.
Furthermore, a system is disclosed, comprising means for implementing the methods of the embodiments described herein.
It is to be understood that methods in accordance with one or more embodiments of the present description may be implemented in software, firmware, or a combination thereof.
It should be understood that the embodiments in this specification are described in a progressive manner; for the same or similar parts between embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, reference may be made to the corresponding parts of the description of the method embodiments.
It should be understood that the foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It should be understood that elements described herein in the singular or shown in the drawings are not intended to limit the number of elements to one. Furthermore, modules or elements described or illustrated herein as separate may be combined into a single module or element, and modules or elements described or illustrated herein as a single may be split into multiple modules or elements.
Throughout this specification, "near", "nearly", "approximately" means a deviation of not more than 10%.
It is also to be understood that the phraseology and terminology employed herein are for the purpose of description and should not be regarded as limiting. The use of these terms and expressions is not meant to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications are possible and are intended to be included within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be looked to in order to cover all such equivalents.
It should also be noted that while the foregoing has been described with reference to specific embodiments, those skilled in the art will recognize that the above embodiments are merely illustrative of one or more embodiments of the present disclosure, and that various equivalent changes or substitutions can be made without departing from the spirit of the invention; accordingly, all changes and modifications to the above embodiments are intended to fall within the scope of the appended claims.

Claims (14)

1. A method for generating a scene graph of a remote sensing image, comprising:
performing target detection on the remote sensing image to generate a target set;
selecting a plurality of target pairs having a potential relationship from the set of targets based at least in part on a remote sensing domain knowledge graph; and
performing relationship prediction on the plurality of target pairs with potential relationships to generate a scene graph of the remote sensing image.
2. The method of claim 1, wherein selecting a plurality of potentially related object pairs from the set of objects based at least in part on a remote sensing domain knowledge-graph comprises:
determining whether to select two targets as a target pair having a potential relationship based at least in part on whether the target categories of the two targets have a relationship in the remote sensing domain knowledge graph.
3. The method of claim 2, wherein two targets meeting the following conditions are selected as target pairs for which there is a potential relationship:
the target categories of the two targets have a relationship in the remote sensing domain knowledge graph, and the center distance between the target boxes of the two targets is smaller than a threshold distance; or
the target boxes of the two targets intersect.
4. The method of claim 1, wherein performing relationship prediction on the plurality of potentially related target pairs to generate the scene graph of the remote sensing image comprises:
extracting the target visual features, target box spatial features, and target category semantic features of the target pair combination.
5. The method of claim 4, wherein performing relationship prediction on the plurality of potentially related target pairs to generate the scene graph of the remote sensing image further comprises:
performing feature fusion on the target visual features, the target box spatial features, and the target category semantic features.
6. The method of claim 1, further comprising:
and correcting the confidence coefficient of the relation prediction by using the knowledge graph in the remote sensing field.
7. The method of claim 6, wherein correcting the confidence of the relationship prediction using the remote sensing domain knowledge graph comprises:
correcting the confidence of the relationship prediction by a weighted sum based on the statistical probability, in the remote sensing domain knowledge graph, of the corresponding relationship between the two targets of a target pair having a potential relationship.
8. A system for generating a scene graph of a remote sensing image, comprising:
means for performing target detection on the remote sensing image to generate a target set;
means for selecting a plurality of potentially related target pairs from the target set based at least in part on a remote sensing domain knowledge graph; and
means for performing a relationship prediction on the plurality of potentially related pairs of targets to generate a scene graph of the remote sensing image.
9. The system of claim 8, wherein two targets meeting the following conditions are selected as target pairs for which there is a potential relationship:
the target categories of the two targets have a relationship in the remote sensing domain knowledge graph, and the center distance between the target boxes of the two targets is smaller than a threshold distance; or
the target boxes of the two targets intersect.
10. The system of claim 8, wherein performing relationship prediction on the plurality of potentially related target pairs to generate the scene graph of the remote sensing image comprises:
extracting the target visual features, target box spatial features, and target category semantic features of the target pair combination; and
performing feature fusion on the target visual features, the target box spatial features, and the target category semantic features.
11. The system of claim 8, further comprising:
means for correcting the confidence of the relationship prediction using the remote sensing domain knowledge graph.
12. A method for generating a scene graph of a remote sensing image, comprising:
performing target detection on the remote sensing image to generate a target set;
performing a relationship prediction on pairs of targets in the set of targets;
correcting the confidence of the relationship prediction using the remote sensing domain knowledge graph; and
generating the scene graph using the relationship prediction based on the corrected confidence.
13. An apparatus for generating a scene graph of a remote sensing image, comprising:
a processor; and
a memory coupled to the processor, the memory storing processor-executable instructions that, when executed by the processor, cause the processor to perform the method of any of claims 1-7.
14. A non-transitory processor-readable storage medium comprising processor-executable instructions that, when executed by a processor, cause the processor to perform the method of any of claims 1-7.
CN202310493492.1A 2023-04-28 2023-04-28 Method, system, device and medium for generating scene graph of remote sensing image Pending CN116521895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310493492.1A CN116521895A (en) 2023-04-28 2023-04-28 Method, system, device and medium for generating scene graph of remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310493492.1A CN116521895A (en) 2023-04-28 2023-04-28 Method, system, device and medium for generating scene graph of remote sensing image

Publications (1)

Publication Number Publication Date
CN116521895A true CN116521895A (en) 2023-08-01

Family

ID=87407799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310493492.1A Pending CN116521895A (en) 2023-04-28 2023-04-28 Method, system, device and medium for generating scene graph of remote sensing image

Country Status (1)

Country Link
CN (1) CN116521895A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117172255A (en) * 2023-11-02 2023-12-05 中国科学院空天信息创新研究院 Geographic entity alignment method and device considering spatial semantic relation and electronic equipment
CN117172255B (en) * 2023-11-02 2024-02-02 中国科学院空天信息创新研究院 Geographic entity alignment method and device considering spatial semantic relation and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination