CN117523410A - Image processing and construction method based on multi-terminal collaborative perception distributed large model - Google Patents

Image processing and construction method based on multi-terminal collaborative perception distributed large model

Info

Publication number
CN117523410A
Authority
CN
China
Prior art keywords
terminal
satellite
characteristic
request
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311498044.7A
Other languages
Chinese (zh)
Inventor
Sun Xian (孙显)
Fu Kun (付琨)
Wang Zhirui (王智睿)
Chen Kaiqiang (陈凯强)
Zhao Liangjin (赵良瑾)
Cheng Peirui (成培瑞)
Lu Xue (卢雪)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202311498044.7A
Publication of CN117523410A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06V20/13: Satellite images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an image processing and construction method based on a multi-terminal collaborative perception distributed large model. It relates to the field of remote sensing image processing and addresses two problems: the limited performance of tasks executed online by a single satellite terminal, and the difficulty of deploying a large model on edge platforms. The method comprises the following steps: acquiring remote sensing images of the same scene observed by different satellites; inputting each remote sensing image into the neural network model of the corresponding satellite and extracting the first feature of each remote sensing image; calculating a confidence score for each first feature; determining, according to the confidence scores, the satellite terminals that need assistance among the plurality of satellite terminals, as requesting ends; calculating the matching degree between the first feature of each requesting end and the first features of the other satellite terminals; determining, according to the matching degree, at least one satellite terminal to assist the requesting end, as an assisting end; calculating the association feature between the first feature of the requesting end and the first feature of the assisting end; weighting and fusing the first feature of the requesting end and the association feature based on the confidence score and the matching degree; and generating an image detection result from the fused feature.

Description

Image processing and construction method based on multi-terminal collaborative perception distributed large model
Technical Field
The invention relates to the technical field of remote sensing satellites, in particular to an image processing and construction method based on a multi-terminal collaborative perception distributed large model.
Background
Currently, in the technical field of remote sensing satellites, the online processing of remote sensing images is generally performed by a single satellite terminal. Because a single satellite terminal has limited observation coverage and computing resources, the performance of tasks executed online is constrained, and the accuracy and efficiency of online remote sensing image processing are low. Moreover, the limited computing power of the edge terminal platforms makes it difficult to deploy a distributed large model directly.
Disclosure of Invention
In view of the above, a first aspect of the present invention provides an image processing method based on a multi-terminal collaborative perception distributed large model, where the distributed large model includes a plurality of satellite terminals and each satellite terminal adopts a neural network model with the same structure. The method comprises the following steps: acquiring a plurality of remote sensing images of the same scene observed by different satellites; inputting the remote sensing image observed by each satellite into the neural network model of that satellite, and performing the following operations: extracting the first feature of each remote sensing image; calculating a confidence score for each first feature, where the confidence score characterizes the amount of information contained in the first feature; determining, according to the confidence scores, the satellite terminals that need assistance among the plurality of satellite terminals, as requesting ends; for each requesting end, calculating the matching degree between the first feature of the requesting end and the first features of the other satellite terminals; determining, according to the matching degree, at least one satellite terminal to assist the requesting end, as an assisting end; calculating the association feature between the first feature of the requesting end and the first feature of the corresponding assisting end; weighting and fusing the first feature of the requesting end and the association feature of the corresponding assisting end based on the confidence score and the matching degree, to obtain the fused feature of the requesting end; and generating an image detection result from the fused feature.
According to an embodiment of the present invention, extracting the first feature from each remote sensing image includes: using a ResNet-18 network as the feature extraction network of the neural network model to extract the first feature from each remote sensing image.
According to an embodiment of the present invention, calculating the confidence score of each first feature includes: encoding the first feature to generate a query and a key; and processing the query and the key with a self-attention mechanism, activated by a Sigmoid function, to generate the confidence score.
According to an embodiment of the present invention, determining, according to the confidence scores, the satellite terminals that need assistance among the plurality of satellite terminals as requesting ends includes: determining a satellite terminal whose confidence score is smaller than a preset request threshold as a requesting end.
According to an embodiment of the present invention, for each requesting end, calculating the matching degree between the first feature of the requesting end and the first features of the other satellite terminals includes: compression-encoding the first feature of the requesting end to generate an assistance request, where the vector length of the assistance request is smaller than that of the key; sending the assistance request to the other satellite terminals so that each of them decodes the assistance request and performs feature projection to obtain a projection feature consistent with its own key dimension; and, for each of the other satellite terminals, taking the dot product of its projection feature and its first feature to obtain the matching degree between the first feature of the requesting end and the first feature of each candidate assisting end.
According to an embodiment of the present invention, determining, according to the matching degree, at least one satellite terminal to assist the requesting end includes: normalizing the matching degree to obtain a matching score; and determining the satellite terminals whose matching score is larger than an assistance threshold as assisting ends, where the assistance threshold is 1/(N-1) and N is the total number of satellite terminals in the distributed large model.
According to an embodiment of the present invention, normalizing the matching degree to obtain a matching score includes: normalizing the matching degree with a Softmax function to obtain the matching score.
According to an embodiment of the present invention, calculating the association feature between the first feature of the requesting end and the first feature of the corresponding assisting end includes: performing a global cross-attention operation on the first feature of the requesting end and the first feature of the corresponding assisting end, and calculating the association matrix between the two; multiplying the association matrix with the first feature of the assisting end to obtain the association feature required by the requesting end; weighting the first feature of the requesting end based on the confidence score and weighting the association feature based on the matching score; and summing the weighted features to obtain the fused feature.
A second aspect of the invention provides a method for constructing a distributed large model based on multi-terminal collaborative perception, comprising the following steps: constructing a distributed large model, where the distributed large model includes a plurality of satellite terminals and each satellite terminal adopts a neural network model with the same structure; acquiring a plurality of remote sensing images of the same scene observed by different satellites; inputting the remote sensing image observed by each satellite into the neural network model of that satellite, and performing the following operations: extracting the first feature of each remote sensing image; calculating a confidence score for each first feature, where the confidence score characterizes the amount of information contained in the first feature; calculating the matching degree between the first feature of each satellite terminal and the first features of the other satellite terminals; calculating the association feature between the first feature of each satellite terminal and the first features of the other satellite terminals; weighting and fusing, based on the confidence score and the matching degree, the first feature of each satellite terminal with the association features of the other satellite terminals, to obtain the fused feature of each satellite terminal; generating an image detection result for each satellite terminal according to its fused feature; and adjusting the parameters of the neural network model of each satellite terminal according to the image detection results.
The image processing and construction method based on a multi-terminal collaborative perception distributed large model provided by the embodiments of the invention achieves at least the following technical effects:
The observation information extracted by each distributed satellite terminal is adaptively fused through collaborative perception; the combined computing power of the distributed terminals alleviates the limited computing power and limited perception information of a single terminal platform, improving the accuracy and efficiency of remote sensing image processing.
During processing, the collaborative perception relationships are established through self-mutual information matching, reducing redundant information transmission while maintaining accuracy. Association feature fusion avoids the interference caused by unaligned feature information and achieves an effective fusion of multi-terminal features; the fused features are used by downstream tasks to improve the overall task performance of the distributed large model.
During large model construction, collaborative perception overcomes the computing limitations of edge terminal platforms and enables direct deployment of the distributed large model on terminals.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
fig. 1 schematically shows a flowchart of an image processing method based on a multi-terminal collaborative perception distributed large model according to an embodiment of the invention.
Fig. 2 schematically shows a schematic diagram of an image processing method based on a multi-terminal collaborative perception distributed large model according to an embodiment of the invention.
Fig. 3 schematically shows a block diagram of a self-mutual information matching module according to an embodiment of the invention.
Fig. 4 schematically shows a block diagram of an associated feature fusion module according to an embodiment of the invention.
Fig. 5 schematically shows a flowchart of a method for constructing a distributed large model based on multi-terminal collaborative perception according to an embodiment of the invention.
Detailed Description
The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly and include, for example, either permanently connected, removably connected, or integrally formed therewith; may be mechanically connected, may be electrically connected or may communicate with each other; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In the description of the present invention, it should be understood that the terms "longitudinal," "length," "circumferential," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate an orientation or a positional relationship based on that shown in the drawings, merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the subsystem or element in question must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Like elements are denoted by like or similar reference numerals throughout the drawings. Conventional structures or constructions will be omitted when they may cause confusion in the understanding of the invention. And the shape, size and position relation of each component in the figure do not reflect the actual size, proportion and actual position relation. In addition, in the present invention, any reference signs placed between parentheses shall not be construed as limiting the claim.
Similarly, in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects.
The description of the terms "one embodiment," "some embodiments," "example," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
The distributed satellite observation platform has cooperative observation and inter-satellite communication capabilities, which provide the foundation for collaborative perception. Collaborative perception means sharing feature information among multiple platforms to enhance each platform's local perception capability. Efficient large model construction and remote sensing image processing can therefore be achieved by combining distributed computing power with multi-platform collaborative perception. The invention provides a remote sensing image processing and construction method based on a multi-terminal collaborative perception distributed large model, which realizes dynamic on-demand interaction among remote sensing terminals and effective fusion of multi-terminal information through self-mutual information matching and feature-aligned fusion, enriching the perception content of each terminal at a small communication overhead and improving the accuracy of processing tasks.
Fig. 1 schematically shows a flowchart of an image processing method based on a multi-terminal collaborative perception distributed large model according to an embodiment of the invention.
Fig. 2 schematically shows a schematic diagram of an image processing method based on a multi-terminal collaborative perception distributed large model according to an embodiment of the invention.
As shown in fig. 1 and fig. 2, the image processing method based on the multi-terminal collaborative perception distributed large model may include, for example, operations S101 to S107.
In operation S101, a plurality of remote sensing images of the same scene observed by different satellites are acquired, and the remote sensing image observed by each satellite is input into the neural network model of that satellite.
In the embodiment of the invention, the distributed large model comprises a plurality of satellite terminals; each satellite terminal adopts a neural network model with the same structure, comprising a feature extraction backbone network, a self-mutual information matching module, an associated feature fusion module, and a downstream task decoder. Combining multiple single models into a distributed large model lays the foundation for the subsequent cooperative information interaction.
For the same scene, the remote sensing images observed by the different satellite terminals are input into the neural network models of the respective terminals for subsequent feature extraction, self-mutual information matching, associated feature fusion, and prediction.
Take a distributed large-model framework consisting of four satellite terminals (platforms) as an example, where each platform receives the distributed remote sensing observations of the same site. Within this framework, the feature extraction module extracts the basic features used for information interaction and feature fusion in the subsequent collaborative perception process. The self-mutual information matching module determines whether assistance is required and which platforms enter an assistance relationship. The associated feature fusion module models the relationship between the local information and the assistance information and, guided by the local information, selectively integrates the assistance information for feature fusion. An illustrative composition of one terminal's model is sketched below.
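Purely as an illustration of this structure, one terminal's model could be assembled as follows; all class and attribute names here are placeholders rather than terms from the patent.

```python
# Illustrative sketch only: one satellite terminal's model assembled from
# the four components named above. Names are placeholders, not from the patent.
import torch.nn as nn

class TerminalModel(nn.Module):
    def __init__(self, backbone, matcher, fusion, decoder):
        super().__init__()
        self.backbone = backbone  # feature extraction backbone network
        self.matcher = matcher    # self-mutual information matching module
        self.fusion = fusion      # associated feature fusion module
        self.decoder = decoder    # downstream task decoder

# The distributed large model is N structurally identical terminals, e.g.:
# terminals = [build_terminal() for _ in range(4)]
```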
In operation S102, a first feature in each of the remote sensing images is extracted.
In the embodiment of the invention, a ResNet-18 network can be used as the feature extraction backbone of the neural network model to extract the first feature from each remote sensing image. This network offers a good trade-off between model size and accuracy, making it suitable for edge deployment on a remote sensing terminal; the features of its fourth stage are passed to the subsequent modules. In this way, higher-level semantic information is retained and a richer feature representation is provided, which benefits the subsequent collaborative perception and feature fusion processes. A minimal sketch of such a backbone follows.
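This sketch assumes a torchvision ResNet-18 truncated after its fourth residual stage (layer4); the 512-channel output and the 32x spatial reduction are standard ResNet-18 properties, not figures from the patent.

```python
# Sketch: ResNet-18 truncated after its fourth stage as a feature backbone.
import torch.nn as nn
from torchvision.models import resnet18

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        m = resnet18(weights=None)
        # Keep everything up to and including layer4; drop avgpool/fc.
        self.stem = nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool)
        self.stages = nn.Sequential(m.layer1, m.layer2, m.layer3, m.layer4)

    def forward(self, x):                 # x: (B, 3, H, W) remote sensing image
        return self.stages(self.stem(x))  # first feature: (B, 512, H/32, W/32)
```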
In operation S103, the confidence score of each first feature is calculated, and the satellite terminals that need assistance are determined among the plurality of satellite terminals, according to the confidence scores, as requesting ends.
In the embodiment of the invention, the confidence score characterizes the amount of information contained in the first feature. Calculating the confidence score of each first feature may include: encoding the first feature to generate a query and a key; and processing the query and the key with a self-attention mechanism, activated by a Sigmoid function, to generate the confidence score. A detailed description follows.
Fig. 3 schematically shows a block diagram of a self-mutual information matching module according to an embodiment of the invention.
As shown in fig. 3, the first feature extracted by the feature extraction backbone is sent to the self-mutual information matching module, which determines whether assistance is required and which terminals should provide it.
Specifically, each satellite terminal takes the output of the feature extraction module as the input of the self-mutual information matching module and, through encoding, generates a query and a key. The query and the key are then processed by a self-attention mechanism and activated by a Sigmoid function to produce the confidence score. If the confidence score is lower than the preset request threshold, the satellite terminal becomes a requesting end and sends an assistance request to the other satellite terminals. A hedged sketch of this confidence branch follows.
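The patent specifies only query/key encoding, a self-attention mechanism, and Sigmoid activation; the 1x1-convolution encoders, the per-location query-key agreement used as the attention logit, and the pooling to a single scalar per terminal are assumptions in this sketch.

```python
# Sketch of the self-confidence branch: encode query/key, score their
# agreement, squash with Sigmoid. The granularity of the score is assumed.
import torch
import torch.nn as nn

class ConfidenceHead(nn.Module):
    def __init__(self, c=512, d=128):
        super().__init__()
        self.q_enc = nn.Conv2d(c, d, kernel_size=1)  # query encoder
        self.k_enc = nn.Conv2d(c, d, kernel_size=1)  # key encoder

    def forward(self, feat):                              # feat: (B, C, H, W)
        q = self.q_enc(feat).flatten(2).transpose(1, 2)   # (B, HW, d)
        k = self.k_enc(feat).flatten(2).transpose(1, 2)   # (B, HW, d)
        logit = (q * k).sum(-1) / q.shape[-1] ** 0.5      # per-location attention logit
        return torch.sigmoid(logit).mean(dim=1)           # confidence score in (0, 1)
```

A terminal whose score falls below the preset request threshold would then issue an assistance request.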
In operation S104, for each requesting end, the matching degree between the first feature of the requesting end and the first features of the other satellite terminals is calculated, and at least one satellite terminal to assist the requesting end is determined according to the matching degree, as an assisting end.
With continued reference to fig. 3, the matching degree may be calculated as follows. The requesting end compression-encodes its first feature (the local feature) to generate an assistance request whose vector length is smaller than that of the key, reducing the communication overhead of transmission. The assistance request is sent to the other satellite terminals, each of which decodes it and performs feature projection to obtain a projection feature consistent with its own key dimension. Each of the other satellite terminals then takes the dot product of its projection feature and its first feature (the candidate assistance feature), yielding the matching degree between the first feature of the requesting end and the first feature of each candidate assisting end. A sketch of this exchange follows.
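This sketch assumes the features are first pooled to vectors, and reads the dot product as taken against a key encoded from the candidate's first feature, since the projection feature is explicitly matched to the key dimension. All dimensions and class names are illustrative.

```python
# Sketch of the assistance request and matching-degree computation.
import torch.nn as nn

D_FEAT, D_KEY, D_REQ = 512, 128, 32   # D_REQ < D_KEY, to cut transmission cost

class AssistanceRequest(nn.Module):
    """Requesting end: compression-encode the pooled local first feature."""
    def __init__(self):
        super().__init__()
        self.compress = nn.Linear(D_FEAT, D_REQ)

    def forward(self, local_feat):           # (B, D_FEAT)
        return self.compress(local_feat)     # assistance request: (B, D_REQ)

class MatchResponder(nn.Module):
    """Candidate assisting end: decode/project the request, score the match."""
    def __init__(self):
        super().__init__()
        self.project = nn.Linear(D_REQ, D_KEY)   # decode + feature projection
        self.key_enc = nn.Linear(D_FEAT, D_KEY)  # key from own first feature

    def forward(self, request, own_feat):
        proj = self.project(request)             # projection feature: (B, D_KEY)
        key = self.key_enc(own_feat)
        return (proj * key).sum(dim=-1)          # matching degree (dot product)
```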
Each candidate terminal feeds its matching degree back to the requesting end, which normalizes the matching degrees to obtain matching scores; the satellite terminals whose matching score exceeds the assistance threshold are determined as assisting ends, where the assistance threshold is 1/(N-1) and N is the total number of satellite terminals in the distributed large model. The requesting end then sends an instruction to the selected assisting ends, asking them to provide assistance features. The matching degrees may be normalized with a Softmax function. A small numeric example follows.
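A small numeric example of the selection rule, with invented values: for N = 4 terminals the assistance threshold equals the uniform baseline 1/(N-1) = 1/3, so only terminals whose normalized score beats a uniform split are asked to assist.

```python
# Numeric example: Softmax-normalize matching degrees, keep terminals above 1/(N-1).
import torch

N = 4                                           # total satellite terminals
match_degree = torch.tensor([2.1, 0.3, 1.7])    # fed back by the other N-1 terminals
match_score = torch.softmax(match_degree, dim=0)
threshold = 1.0 / (N - 1)                       # uniform baseline = 1/3
assisting = (match_score > threshold).nonzero().flatten()
print(match_score)   # ~ tensor([0.545, 0.090, 0.365]): terminals 0 and 2 assist
print(assisting)     # tensor([0, 2])
```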
In operation S105, the association feature between the first feature of the requesting end and the first feature of the corresponding assisting end is calculated.
Fig. 4 schematically shows a block diagram of an associated feature fusion module according to an embodiment of the invention.
As shown in fig. 4, calculating the association feature may include performing a global cross-attention operation on the first feature of the requesting end and the first feature of the corresponding assisting end, and computing the association matrix between the two. The association matrix is multiplied with the first feature of the assisting end to obtain the association feature required by the requesting end. The first feature of the requesting end is weighted by the confidence score, the association feature is weighted by the matching score, and the weighted features are summed to obtain the fused feature. A sketch of this module follows.
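This sketch assumes the features are flattened to (B, HW, C), that the association matrix is row-normalized with a Softmax (the patent does not name a normalization), and that a single assisting end is fused; the confidence and matching scores enter as broadcastable weights.

```python
# Sketch of global cross-attention association and weighted fusion.
import torch
import torch.nn.functional as F

def associate_and_fuse(local, assist, confidence, match_score):
    """local, assist: (B, HW, C) flattened first features;
    confidence, match_score: floats or (B, 1, 1) tensors."""
    scale = local.shape[-1] ** 0.5
    attn = torch.bmm(local, assist.transpose(1, 2)) / scale  # association matrix (B, HW, HW)
    attn = F.softmax(attn, dim=-1)                           # assumed normalization
    assoc = torch.bmm(attn, assist)      # associated feature, aligned to the local grid
    return confidence * local + match_score * assoc          # weighted sum: fused feature
```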
In operation S106, the first feature of the requesting end and the association feature of the corresponding assisting end are weighted and fused based on the confidence score and the matching degree, yielding the fused feature of the requesting end.
In operation S107, an image detection result is generated from the fused feature.
In the embodiment of the invention, the fused feature is passed to the downstream task decoder, which generates the image detection result.
Fig. 5 schematically shows a flowchart of a method for constructing a distributed large model based on multi-terminal collaborative perception according to an embodiment of the invention.
As shown in fig. 5, the method for constructing the multi-terminal collaborative perception distributed large model may include operations S501 to S508.
In operation S501, a distributed large model is constructed; the distributed large model includes a plurality of satellite terminals, and each satellite terminal adopts a neural network model with the same structure.
In operation S502, a plurality of remote sensing images of the same scene observed by different satellites are acquired, and the remote sensing image observed by each satellite is input into the neural network model of that satellite.
In operation S503, the first feature of each remote sensing image is extracted.
In operation S504, the confidence score of each first feature is calculated, and the matching degree between the first feature of each satellite terminal and the first features of the other satellite terminals is calculated.
In operation S505, the association feature between the first feature of each satellite terminal and the first features of the other satellite terminals is calculated.
In operation S506, the first feature of each satellite terminal and the association features of the other satellite terminals are weighted and fused based on the confidence score and the matching degree, yielding the fused feature of each satellite terminal.
In operation S507, the image detection result of each satellite terminal is generated from its fused feature.
In operation S508, the parameters of the neural network model of each satellite terminal are adjusted according to the image detection results.
It should be noted that the construction method largely parallels the remote sensing image processing method. The difference is that, during construction, every terminal acts as a requesting end and sends an assistance request to all other satellite terminals, and during feature fusion, different fusion weights are assigned according to the similarity between the first features of the other satellite terminals and the first feature of the requesting end, so that the features of all satellite terminals are fused. For the remaining details, refer to the remote sensing image processing embodiment; they are not repeated here. A minimal training-step sketch is given below.
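This sketch reuses the placeholder module names from the composition sketch above (backbone, fusion, decoder); detaching the other terminals' features, so that each loss updates only its own terminal, is a simplification, and criterion stands for whatever detection loss the downstream task uses.

```python
# Sketch: one construction/training step in which every terminal requests
# assistance from all others and its parameters are adjusted from its own loss.
def train_step(terminals, optimizers, images, targets, criterion):
    feats = [t.backbone(img) for t, img in zip(terminals, images)]
    for i, t in enumerate(terminals):
        # Detached copies: terminal i's loss only updates terminal i.
        others = [f.detach() for j, f in enumerate(feats) if j != i]
        fused = t.fusion(feats[i], others)   # similarity-weighted feature fusion
        loss = criterion(t.decoder(fused), targets[i])
        optimizers[i].zero_grad()
        loss.backward()
        optimizers[i].step()                 # adjust this terminal's parameters
```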
In summary, the embodiments of the invention provide an image processing and construction method based on a multi-terminal collaborative perception distributed large model. Collaborative perception alleviates the limited computing power and limited perception information of a single terminal platform and enables the deployment of a distributed large model across terminals. The proposed self-mutual information matching module establishes the assistance relationships of the collaborative perception process while reducing redundant information transmission without sacrificing accuracy. The associated feature fusion module avoids the interference caused by unaligned feature information and achieves an effective fusion of multi-terminal features; the fused features are used by downstream tasks to improve the overall task performance of the distributed large model.
Those skilled in the art will appreciate that the features recited in the various embodiments of the invention can be combined in various ways, even if such combinations are not explicitly recited herein. In particular, the features recited in the various embodiments can be combined without departing from the spirit and teachings of the invention, and all such combinations fall within the scope of the invention.
The embodiments of the present invention are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the invention, and such alternatives and modifications are intended to fall within the scope of the invention.

Claims (9)

1. An image processing method based on a multi-terminal collaborative perception distributed large model, wherein the distributed large model comprises a plurality of satellite terminals, and each satellite terminal adopts a neural network model with the same structure; the method comprises the following steps:
acquiring a plurality of remote sensing images of the same scene observed by different satellites;
inputting the remote sensing image observed by each satellite into a neural network model of the satellite respectively, and executing the following operations:
extracting a first feature from each remote sensing image;
calculating a confidence score of each first feature, wherein the confidence score characterizes the amount of information contained in the first feature;
determining, according to the confidence scores, a satellite terminal that needs assistance among the plurality of satellite terminals, as a requesting end;
for each requesting end, calculating a matching degree between the first feature of the requesting end and the first features of the other satellite terminals;
determining, according to the matching degree, at least one satellite terminal to assist the requesting end, as an assisting end;
calculating an association feature between the first feature of the requesting end and the first feature of the corresponding assisting end;
weighting and fusing the first feature of the requesting end and the association feature of the corresponding assisting end based on the confidence score and the matching degree, to obtain a fused feature of the requesting end;
and generating an image detection result according to the fused feature.
2. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 1, wherein extracting the first feature from each remote sensing image comprises:
using a ResNet-18 network as the feature extraction network of the neural network model to extract the first feature from each remote sensing image.
3. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 1, wherein calculating the confidence score of each first feature comprises:
encoding the first feature to generate a query and a key;
and processing the query and the key with a self-attention mechanism, activated by a Sigmoid function, to generate the confidence score.
4. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 1, wherein determining, according to the confidence scores, a satellite terminal that needs assistance among the plurality of satellite terminals comprises:
determining a satellite terminal whose confidence score is smaller than a preset request threshold as the requesting end.
5. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 3, wherein, for each requesting end, calculating the matching degree between the first feature of the requesting end and the first features of the other satellite terminals comprises:
compression-encoding the corresponding first feature at the requesting end to generate an assistance request, wherein the vector length of the assistance request is smaller than that of the key;
sending the assistance request to the other satellite terminals so that each of them decodes the assistance request and performs feature projection to obtain a projection feature consistent with its own key dimension;
and, for each of the other satellite terminals, taking the dot product of its projection feature and its first feature to obtain the matching degree between the first feature of the requesting end and the first feature of each candidate assisting end.
6. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 5, wherein determining, according to the matching degree, at least one satellite terminal to assist the requesting end comprises:
normalizing the matching degree to obtain a matching score;
and determining the satellite terminals whose matching score is larger than an assistance threshold as assisting ends, wherein the assistance threshold is 1/(N-1) and N is the total number of satellite terminals in the distributed large model.
7. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 5, wherein normalizing the matching degree to obtain a matching score comprises:
normalizing the matching degree with a Softmax function to obtain the matching score.
8. The image processing method based on a multi-terminal collaborative perception distributed large model according to claim 5, wherein calculating the association feature between the first feature of the requesting end and the first feature of the corresponding assisting end comprises:
performing a global cross-attention operation on the first feature of the requesting end and the first feature of the corresponding assisting end, and calculating an association matrix between the first feature of the assisting end and the first feature of the requesting end;
multiplying the association matrix with the first feature of the assisting end to obtain the association feature required by the requesting end;
weighting the first feature of the requesting end based on the confidence score, and weighting the association feature based on the matching score;
and summing the weighted features to obtain the fused feature.
9. A method for constructing a distributed large model based on multi-terminal collaborative perception, comprising the following steps:
constructing a distributed large model, wherein the distributed large model comprises a plurality of satellite terminals, and each satellite terminal adopts a neural network model with the same structure;
acquiring a plurality of remote sensing images of the same scene observed by different satellites;
inputting the remote sensing image observed by each satellite into the neural network model of that satellite, and performing the following operations:
extracting a first feature from each remote sensing image;
calculating a confidence score of each first feature, wherein the confidence score characterizes the amount of information contained in the first feature;
calculating the matching degree between the first feature of each satellite terminal and the first features of the other satellite terminals;
calculating the association feature between the first feature of each satellite terminal and the first features of the other satellite terminals;
weighting and fusing, based on the confidence score and the matching degree, the first feature of each satellite terminal with the association features of the other satellite terminals, to obtain the fused feature of each satellite terminal;
generating an image detection result for each satellite terminal according to its fused feature;
and adjusting the parameters of the neural network model of each satellite terminal according to the image detection results.
CN202311498044.7A (filed 2023-11-10, priority date 2023-11-10): Image processing and construction method based on multi-terminal collaborative perception distributed large model. Publication CN117523410A, pending.

Priority Applications (1)

Application Number: CN202311498044.7A (CN). Priority date and filing date: 2023-11-10. Title: Image processing and construction method based on multi-terminal collaborative perception distributed large model.

Applications Claiming Priority (1)

Application Number: CN202311498044.7A (CN). Priority date and filing date: 2023-11-10. Title: Image processing and construction method based on multi-terminal collaborative perception distributed large model.

Publications (1)

Publication Number: CN117523410A. Publication Date: 2024-02-06.

Family

ID=89761978

Family Applications (1)

Application Number: CN202311498044.7A. Priority date and filing date: 2023-11-10. Status: Pending. Title: Image processing and construction method based on multi-terminal collaborative perception distributed large model.

Country Status (1)

CN: CN117523410A

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150071528A1 (en) * 2013-09-11 2015-03-12 Digitalglobe, Inc. Classification of land based on analysis of remotely-sensed earth images
US20230123223A1 (en) * 2020-02-29 2023-04-20 Huawei Technologies Co., Ltd. Distributed Service Scheduling Method and Related Apparatus
CN111582104A (en) * 2020-04-28 2020-08-25 中国科学院空天信息创新研究院 Semantic segmentation method and device for remote sensing image
CN111783556A (en) * 2020-06-11 2020-10-16 中国人民解放军军事科学院国防科技创新研究院 Multi-domain multi-source remote sensing information fusion system and method
CN115200554A (en) * 2022-07-14 2022-10-18 深圳市水务工程检测有限公司 Unmanned aerial vehicle photogrammetry supervision system and method based on picture recognition technology
CN116152053A (en) * 2022-10-27 2023-05-23 乔文豹 Large-scene intensive 3D environment sensing software for unmanned aerial vehicle cluster collaborative airborne calculation
CN115908969A (en) * 2022-11-01 2023-04-04 阿里巴巴(中国)有限公司 Method and apparatus for image processing and model training
CN115984646A (en) * 2022-12-19 2023-04-18 中国科学院空天信息创新研究院 Distributed target detection method and device for remote sensing cross-satellite observation and satellite
CN115984647A (en) * 2022-12-19 2023-04-18 中国科学院空天信息创新研究院 Remote sensing distributed collaborative reasoning method, device, medium and satellite for constellation
CN116824396A (en) * 2023-08-29 2023-09-29 湖北省泛星信息技术有限公司 Multi-satellite data fusion automatic interpretation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, YUELEI; WANG, ZHIRUI; CHENG, PEIRUI; ZENG, XUAN; WANG, HONGQI; SUN, XIAN; FU, KUN: "DCM: A Distributed Collaborative Training Method for the Remote Sensing Image Classification", IEEE Transactions on Geoscience and Remote Sensing, vol. 61, 6 March 2023 (2023-03-06), pages 1-17 *

Similar Documents

Publication Publication Date Title
WO2021190451A1 (en) Method and apparatus for training image processing model
CN108388900A (en) The video presentation method being combined based on multiple features fusion and space-time attention mechanism
US20240119268A1 (en) Data processing method and related device
CN113486190A (en) Multi-mode knowledge representation method integrating entity image information and entity category information
WO2024060558A1 (en) Feasible region prediction method and apparatus, and system and storage medium
CN107832794A (en) A kind of convolutional neural networks generation method, the recognition methods of car system and computing device
CN114780768A (en) Visual question-answering task processing method and system, electronic equipment and storage medium
Khurram et al. Dense-captionnet: a sentence generation architecture for fine-grained description of image semantics
CN113361387A (en) Face image fusion method and device, storage medium and electronic equipment
CN111046738A (en) Precision improvement method of light u-net for finger vein segmentation
CN113420606B (en) Method for realizing autonomous navigation of robot based on natural language and machine vision
CN112668608B (en) Image recognition method and device, electronic equipment and storage medium
CN117523410A (en) Image processing and construction method based on multi-terminal collaborative perception distributed large model
CN113822114A (en) Image processing method, related equipment and computer readable storage medium
CN115866229B (en) Viewing angle conversion method, device, equipment and medium for multi-viewing angle image
CN112052945B (en) Neural network training method, neural network training device and electronic equipment
CN117494812A (en) Model reasoning method, device, electronic equipment and storage medium
CN114979267B (en) Semantic communication method and device for multi-service requirements
CN115115947A (en) Remote sensing image detection method and device, electronic equipment and storage medium
CN115131259A (en) Fusion image determining method, device, equipment and storage medium
CN114048284A (en) Construction method and device of reference expression positioning and segmentation model
CN113569809A (en) Image processing method, device and computer readable storage medium
CN112966670A (en) Face recognition method, electronic device and storage medium
CN117237822B (en) Collaborative reasoning method for basic model terminal deployment
CN113268601B (en) Information extraction method, reading and understanding model training method and related device

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination