CN117312957A - Remote sensing image recognition model generation method, device, equipment, medium and product - Google Patents

Remote sensing image recognition model generation method, device, equipment, medium and product

Info

Publication number: CN117312957A
Authority: CN (China)
Prior art keywords: image, remote sensing, model, identification element, generate
Legal status: Pending
Application number: CN202311315567.3A
Other languages: Chinese (zh)
Inventors: 杨晓诚, 冯如, 许政伟, 李铁岭
Current Assignee: Industrial and Commercial Bank of China Ltd ICBC
Original Assignee: Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311315567.3A
Publication of CN117312957A

Classifications

    • G06F18/241 - Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/214 - Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06T7/10 - Image analysis; segmentation; edge detection
    • G06V10/764 - Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
    • G06V10/774 - Image or video recognition or understanding; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V20/10 - Scenes; scene-specific elements; terrestrial scenes
    • G06T2207/10032 - Indexing scheme for image analysis; image acquisition modality; satellite or aerial image; remote sensing

Abstract

The application provides a remote sensing image recognition model generation method, device, equipment, medium and product, and relates to the technical field of artificial intelligence, the field of financial science and technology or other related fields, wherein the method comprises the following steps: acquiring an image description text and an example image; inputting the image description text into a semantic understanding model to generate corresponding identification element information and background description information; inputting the identification element information and the example image into a remote sensing image segmentation large model, and generating a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image; inputting the background description information into a background generation model to generate a remote sensing background image; fusing the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image to generate a remote sensing training sample; training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model. According to the method, the training generation effect of the model can be improved.

Description

Remote sensing image recognition model generation method, device, equipment, medium and product
Technical Field
The present disclosure relates to the field of artificial intelligence, financial technology, and other related fields, and in particular, to a method, an apparatus, a device, a medium, and a product for generating a remote sensing image recognition model.
Background
In the financial field, satellite remote sensing images are generally adopted for post-loan engineering monitoring, so the accuracy and efficiency of the remote sensing image recognition model are particularly important. In order to perform rapid iterative optimization of the remote sensing image recognition model, a large number of remote sensing training samples are required.
At present, high-precision remote sensing images are difficult to produce: for business purposes they are generally acquired only quarterly or even yearly, so a model struggles to obtain enough target samples for training.
As a result, the current scarcity of remote sensing training samples leads to a poor training effect and a slow iteration speed for remote sensing image recognition models.
Disclosure of Invention
The application provides a remote sensing image recognition model generation method, device, equipment, medium and product, which are used for solving the problems of a poor training effect and a slow iteration speed of remote sensing image recognition models caused by the current scarcity of remote sensing training samples.
The first aspect of the present application provides a remote sensing image recognition model generating method, which includes:
Acquiring an image description text and an example image corresponding to a remote sensing training sample to be generated;
inputting the image description text to a converged semantic understanding model, and generating corresponding identification element information and background description information;
inputting the identification element information and the example image into a converged remote sensing image segmentation large model, and generating a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image; the example image at least comprises identification elements corresponding to the identification element information;
inputting the background description information to a converged background generation model to generate a remote sensing background image;
fusing the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image to generate a remote sensing training sample;
training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
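The six steps above form a generate-then-train pipeline: the description text is decomposed, the pieces drive a segmentation model and a background generator, and the outputs are fused into a labeled sample. A minimal Python sketch follows; every function here is an illustrative stand-in, not the patent's implementation (the real semantic understanding step is a trained language model, not string splitting):

```python
# Illustrative stand-ins for the six-step pipeline; not the patent's code.

def semantic_understanding(image_description_text):
    # Step 2 stand-in: split an "elements; background" description into
    # identification element information and background description info.
    element_part, background_part = image_description_text.split(";")
    return element_part.strip(), background_part.strip()

def generate_training_sample(description, example_image,
                             segment, generate_background, fuse):
    # Steps 2-5: understand the text, segment the example image,
    # generate a background, and fuse everything into one sample.
    element_info, background_info = semantic_understanding(description)
    segmented, pixel_classes = segment(element_info, example_image)
    background = generate_background(background_info)
    return fuse(segmented, pixel_classes, background)
```

Each injected callable (`segment`, `generate_background`, `fuse`) would be one of the converged models described in the following claims.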
Further, as described above, the remote sensing image segmentation large model includes: an encoder and a decoder;
the inputting the identification element information and the example image into a converged remote sensing image segmentation large model to generate a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image includes the following steps:
Performing feature extraction and image blocking processing on the example image by adopting an encoder to generate a position representation vector and an encoding information matrix corresponding to the example image;
and adopting a decoder to perform attention mechanism processing and classification prediction processing on the position representation vector and the encoding information matrix based on the identification element information, generating a segmented image corresponding to the identification elements and the identification element category corresponding to each pixel in the segmented image.
Further, the method as described above, the encoder comprises a swin-transformer network structure;
the method for generating the position representation vector and the coding information matrix corresponding to the example image by adopting an encoder to perform feature extraction and image blocking processing on the example image comprises the following steps:
performing feature extraction and image blocking processing on the example image by adopting an encoder to generate an image feature vector corresponding to an image block and a position feature vector corresponding to the relative position between the image blocks;
generating a position representation vector corresponding to the example image according to the image feature vector corresponding to each image block and the position feature vector for the relative positions between the image blocks;
and inputting the position representation vector into the swin-transformer network structure to generate coding information matrixes corresponding to all the image blocks.
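The patching and position-representation steps above can be sketched as follows. This is a hedged toy version: patch features are plain means and positions are raw (row, col) indices, whereas the actual encoder uses learned embeddings followed by a Swin Transformer network:

```python
# Toy encoder front end: patching plus position representation.
# Feature and position encodings here are illustrative assumptions.

def split_into_patches(image, patch):
    """Split a 2-D image (list of lists) into patch-sized blocks."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            block = [row[c:c + patch] for row in image[r:r + patch]]
            patches.append(((r // patch, c // patch), block))
    return patches

def position_representation(image, patch):
    """Pair each patch's feature (its mean value) with its relative position."""
    reps = []
    for (pr, pc), block in split_into_patches(image, patch):
        flat = [v for row in block for v in row]
        feature = sum(flat) / len(flat)      # stand-in image feature vector
        reps.append((feature, (pr, pc)))     # stand-in position feature
    return reps
```

In the patent's encoder, the resulting position representation vectors are then fed through the swin-transformer structure to produce the encoding information matrix for each image block.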
Further, the method as described above, the decoder comprising an attention layer and a swin-transformer network structure;
the method for generating a divided image corresponding to an image recognition element and a recognition element category corresponding to each pixel in the divided image by using a decoder to perform attention mechanism processing and classification prediction processing on the position representation vector and the encoding information matrix based on the recognition element information comprises the following steps:
generating a corresponding original attention map according to the position representation vector, the identification element information and the attention layer;
generating a corresponding coding attention map according to the coding information matrix, the identification element information and the attention layer;
fitting the coding attention map based on the original attention map, and inputting the fitted attention map into the swin-transformer network structure to generate a segmented image of the same size as the image feature vector;
and carrying out classification prediction processing on the segmented image to generate identification element categories corresponding to the pixels in the segmented image.
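The decoder's attention step pairs the identification element information (as a query) with the encoded patches (as keys). Below is a toy sketch of the two operations; the vector shapes and the nearest-score classifier are illustrative assumptions standing in for the real attention layer and classification head:

```python
import math

# Toy decoder pieces: a softmax attention map plus per-pixel classification.

def attention_map(query, keys):
    """Softmax of scaled dot products between one query and each key."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
              for key in keys]
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_pixels(segmented, class_scores):
    """Assign each pixel the class whose reference score its value is closest to."""
    def nearest(v):
        return min(range(len(class_scores)),
                   key=lambda i: abs(v - class_scores[i]))
    return [[nearest(v) for v in row] for row in segmented]
```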
Further, in the above method, the fusing the segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background image to generate the remote sensing training sample includes:
Performing binarization processing on the identification element category corresponding to each pixel in the segmented image to generate mask labels corresponding to each pixel;
fusing the segmented image and the remote sensing background image to generate a corresponding fused image;
and fusing the mask labels and the fused image to generate a remote sensing training sample.
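A small sketch of these mask-label steps, under the assumption that "binarization" maps the target identification element class to 1 and every other class to 0; the per-pixel pairing is an illustrative stand-in for how the mask label is attached to the fused image:

```python
# Illustrative mask-label construction; details are assumptions.

def binarize(categories, target_class):
    """Turn a per-pixel category grid into a 0/1 mask label."""
    return [[1 if c == target_class else 0 for c in row]
            for row in categories]

def fuse_with_mask(fused_image, mask):
    """Pair each fused pixel with its mask label to form a training sample."""
    return [[(pixel, label) for pixel, label in zip(img_row, mask_row)]
            for img_row, mask_row in zip(fused_image, mask)]
```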
Further, in the method as described above, the fusing the segmented image and the remote sensing background image to generate a corresponding fused image includes:
performing size unification processing on the segmented image and the remote sensing background image so that the processed segmented image and remote sensing background image have the same size;
convolving and resampling the processed segmented image based on a preset spectral response function to generate simulated gray values of a panchromatic band;
performing a Schmidt orthogonalization (Gram-Schmidt) transformation on the processed remote sensing background image to generate a corresponding Schmidt orthogonalization result;
adjusting the simulated gray values based on a preset mean-variance adjustment algorithm to generate adjusted simulated gray values;
replacing the first component of the Schmidt orthogonalization result with the adjusted simulated gray values, and performing an inverse Schmidt orthogonalization transformation on the result to generate an inverse transformation result;
and removing the first band from the inverse transformation result to generate the corresponding fused image of the segmented image and the remote sensing background image.
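These steps describe Gram-Schmidt style pan-sharpening. The numpy sketch below implements the mean-variance adjustment and component-substitution idea with a simplified additive-injection approximation of the inverse transform; the spectral-response weights and the detail-injection form are assumptions, not the patent's exact algorithm:

```python
import numpy as np

# Simplified Gram-Schmidt fusion sketch (additive-injection approximation).

def simulate_pan(bands, weights):
    """Weighted band average as the simulated panchromatic gray values."""
    return sum(w * b for w, b in zip(weights, bands)) / sum(weights)

def match_mean_variance(src, ref):
    """Adjust src so its mean and variance match ref (the adjustment step)."""
    gain = ref.std() / src.std()
    return (src - src.mean()) * gain + ref.mean()

def gs_fuse(pan, bands, weights):
    """Substitute the adjusted pan for the simulated first component."""
    sim = simulate_pan(bands, weights)
    adjusted = match_mean_variance(pan, sim)
    delta = adjusted - sim                   # injected spatial detail
    return [b + delta for b in bands]        # inverse-transform approximation
```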
Further, in the method, the background generation model is an artificial intelligence image generation model;
the inputting the background description information into a converged background generation model to generate a remote sensing background image includes the following steps:
determining image keywords from the background description information by adopting an artificial intelligent image generation model;
and generating a corresponding remote sensing background image based on the image keywords by adopting an artificial intelligent image generation model.
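A minimal sketch of this two-step background generation, assuming keyword extraction is a simple stop-word filter and the image generator is an injected callable; both are illustrative placeholders for the artificial intelligence image generation model:

```python
# Toy keyword extraction and generator dispatch; the stop-word list and
# generator interface are assumptions, not the patent's implementation.

STOP_WORDS = {"a", "an", "the", "of", "with", "and", "in"}

def extract_keywords(background_description):
    """Keep the content-bearing words as generation keywords."""
    return [w for w in background_description.lower().split()
            if w not in STOP_WORDS]

def generate_background(description, generator):
    """Call an image generator (injected) with the extracted keywords."""
    return generator(extract_keywords(description))
```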
Further, in the method as described above, before the inputting the identification element information and the example image into the converged remote sensing image segmentation large model to generate the segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image, the method further includes:
obtaining a training sample, wherein the training sample comprises the following steps: the remote sensing training image and the label information corresponding to the remote sensing training image;
inputting the training sample into a preset remote sensing image segmentation large model, and training the preset remote sensing image segmentation large model based on a preset random masking module, wherein the preset random masking module is used for randomly masking pixels of the image blocks when the preset remote sensing image segmentation large model performs image blocking processing;
Determining whether the preset remote sensing image segmentation large model meets preset convergence conditions according to the segmentation image output by the preset remote sensing image segmentation large model and the identification element category corresponding to each pixel in the segmentation image;
and if the preset remote sensing image segmentation large model meets the convergence condition, determining the model meeting the convergence condition as the remote sensing image segmentation large model trained to convergence.
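The random masking module can be sketched as below: during image blocking, a fixed fraction of pixels in each block is zeroed so the segmentation model must learn from incomplete inputs. The mask ratio and the zero fill value are assumptions:

```python
import random

# Toy random-masking step applied to one image block during patching.

def randomly_mask_patch(patch, mask_ratio, rng):
    """Zero out roughly mask_ratio of the pixels in one patch (copy-safe)."""
    masked = [row[:] for row in patch]
    coords = [(r, c) for r in range(len(patch))
              for c in range(len(patch[0]))]
    k = int(len(coords) * mask_ratio)
    for r, c in rng.sample(coords, k):      # sample without replacement
        masked[r][c] = 0
    return masked
```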
Further, according to the above method, training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model includes:
dividing the remote sensing training sample into a training set and a verification set according to a preset proportion;
performing iterative training on the preset remote sensing image recognition model by adopting a training set to generate preset remote sensing image recognition models corresponding to each iterative version;
performing model accuracy verification on the preset remote sensing image recognition model of each iteration version based on the verification set, and generating the mean average precision (mAP) corresponding to the preset remote sensing image recognition model of each iteration version;
and determining the preset remote sensing image recognition model corresponding to the maximum mean average precision as the target remote sensing image recognition model.
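The selection procedure above (split by a preset ratio, score each iteration version on the verification set, keep the best) can be sketched as follows; `map_score` is a stand-in callable, not a full mean-average-precision implementation:

```python
# Toy version selection by validation score; interfaces are assumptions.

def split_samples(samples, train_ratio):
    """Divide samples into a training set and a verification set."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

def select_best_model(models, validation_set, map_score):
    """Return the iteration-version model whose validation mAP is highest."""
    return max(models, key=lambda m: map_score(m, validation_set))
```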
A second aspect of the present application provides a remote sensing image recognition model generating device, including:
the acquisition module is used for acquiring an image description text and an example image corresponding to the remote sensing training sample to be generated;
the first generation module is used for inputting the image description text into a converged semantic understanding model and generating corresponding identification element information and background description information;
the second generation module is used for inputting the identification element information and the example image into a converged remote sensing image segmentation large model and generating a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image; the example image at least comprises identification elements corresponding to the identification element information;
the third generation module is used for inputting the background description information into a converged background generation model to generate a remote sensing background image;
the fusion module is used for fusing the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image to generate a remote sensing training sample;
and the fourth generation module is used for training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
Further, as described above, the remote sensing image segmentation large model includes: an encoder and a decoder;
the second generating module is specifically configured to:
performing feature extraction and image blocking processing on the example image by adopting an encoder to generate a position representation vector and an encoding information matrix corresponding to the example image; and adopting a decoder to perform attention mechanism processing and classification prediction processing on the position representation vector and the encoding information matrix based on the identification element information, generating a segmented image corresponding to the identification elements and the identification element category corresponding to each pixel in the segmented image.
Further, the apparatus as described above, the encoder comprises a swin-transformer network structure;
the second generation module is specifically configured to, when performing feature extraction and image blocking processing on the example image by using an encoder to generate a position representation vector and an encoding information matrix corresponding to the example image:
performing feature extraction and image blocking processing on the example image by adopting an encoder to generate an image feature vector corresponding to an image block and a position feature vector corresponding to the relative position between the image blocks; generating a position representation vector corresponding to the example image according to the image feature vector corresponding to the image block and the relative position vector between the image blocks; and inputting the position representation vector into the swin-transformer network structure to generate coding information matrixes corresponding to all the image blocks.
Further, the apparatus as described above, the decoder comprising an attention layer and a swin-transformer network structure;
the second generation module is specifically configured to, when adopting a decoder to perform attention mechanism processing and classification prediction processing on the position representation vector and the encoding information matrix based on the identification element information to generate a segmented image and the identification element category corresponding to each pixel in the segmented image:
generate a corresponding original attention map according to the position representation vector, the identification element information and the attention layer; generate a corresponding coding attention map according to the encoding information matrix, the identification element information and the attention layer; fit the coding attention map based on the original attention map, and input the fitted attention map into the swin-transformer network structure to generate a segmented image of the same size as the image feature vector; and carry out classification prediction processing on the segmented image to generate the identification element category corresponding to each pixel in the segmented image.
Further, in the apparatus as described above, the fusion module is specifically configured to:
performing binarization processing on the identification element category corresponding to each pixel in the segmented image to generate mask labels corresponding to each pixel; fusing the segmented image and the remote sensing background image to generate a corresponding fused image; and fusing the mask labels and the fused image to generate a remote sensing training sample.
Further, in the above apparatus, the fusion module is specifically configured to, when fusing the segmented image and the remote sensing background image to generate a corresponding fused image:
performing size unification processing on the segmented image and the remote sensing background image so that the processed segmented image and remote sensing background image have the same size; convolving and resampling the processed segmented image based on a preset spectral response function to generate simulated gray values of a panchromatic band; performing a Schmidt orthogonalization (Gram-Schmidt) transformation on the processed remote sensing background image to generate a corresponding Schmidt orthogonalization result; adjusting the simulated gray values based on a preset mean-variance adjustment algorithm to generate adjusted simulated gray values; replacing the first component of the Schmidt orthogonalization result with the adjusted simulated gray values, and performing an inverse Schmidt orthogonalization transformation on the result to generate an inverse transformation result; and removing the first band from the inverse transformation result to generate the corresponding fused image of the segmented image and the remote sensing background image.
Further, the apparatus as described above, wherein the background generation model is an artificial intelligence image generation model;
The third generating module is specifically configured to:
determining image keywords from the background description information by adopting an artificial intelligent image generation model; and generating a corresponding remote sensing background image based on the image keywords by adopting an artificial intelligent image generation model.
Further, the apparatus as described above, further comprising:
the training module is used for acquiring a training sample, wherein the training sample includes: a remote sensing training image and label information corresponding to the remote sensing training image; inputting the training sample into a preset remote sensing image segmentation large model, and training the preset remote sensing image segmentation large model based on a preset random masking module, wherein the preset random masking module is used for randomly masking pixels of the image blocks when the preset remote sensing image segmentation large model performs image blocking processing; determining whether the preset remote sensing image segmentation large model meets a preset convergence condition according to the segmented image output by the preset remote sensing image segmentation large model and the identification element category corresponding to each pixel in the segmented image; and if the preset remote sensing image segmentation large model meets the convergence condition, determining the model meeting the convergence condition as the remote sensing image segmentation large model trained to convergence.
Further, in the apparatus as described above, the fourth generating module is specifically configured to:
dividing the remote sensing training samples into a training set and a verification set according to a preset proportion; performing iterative training on the preset remote sensing image recognition model by adopting the training set to generate a preset remote sensing image recognition model corresponding to each iteration version; performing model accuracy verification on the preset remote sensing image recognition model of each iteration version based on the verification set, and generating the mean average precision (mAP) corresponding to the preset remote sensing image recognition model of each iteration version; and determining the preset remote sensing image recognition model corresponding to the maximum mean average precision as the target remote sensing image recognition model.
A third aspect of the present application provides an electronic device, comprising: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the remote sensing image recognition model generation method according to any one of the first aspects.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein computer-executable instructions, which when executed by a processor, are configured to implement the remote sensing image recognition model generation method of any one of the first aspects.
A fifth aspect of the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the remote sensing image recognition model generation method of any one of the first aspects.
The application provides a remote sensing image recognition model generation method, a device, equipment, a medium and a product, wherein the method comprises the following steps: acquiring an image description text and an example image corresponding to a remote sensing training sample to be generated; inputting the image description text to a converged semantic understanding model, and generating corresponding identification element information and background description information; inputting the identification element information and the example image into a converged remote sensing image segmentation large model, and generating a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image; the example image at least comprises identification elements corresponding to the identification element information; inputting the background description information to a converged background generation model to generate a remote sensing background image; fusing the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image to generate a remote sensing training sample; training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model. According to the remote sensing image recognition model generation method, the recognition element information and the example image are input into a converged remote sensing image segmentation large model, and a segmentation image corresponding to the recognition element information and recognition element types corresponding to pixels in the segmentation image are generated. And inputting the background description information to a converged background generation model to generate a remote sensing background image. 
Based on the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image, a remote sensing training sample with a remote sensing background and specific identification elements is generated in a fusion mode. Therefore, the generation difficulty of the remote sensing training sample is reduced, the remote sensing training sample can be generated more conveniently and rapidly, and meanwhile, the training generation effect of the remote sensing image recognition model is improved by training the preset remote sensing image recognition model based on the remote sensing training sample.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a diagram of a scenario in which the remote sensing image recognition model generation method of the embodiments of the present application may be implemented;
fig. 2 is a schematic flow chart of a remote sensing image recognition model generation method provided in the present application;
FIG. 3 is a second schematic flow chart of the remote sensing image recognition model generation method provided by the present application;
fig. 4 is a schematic structural diagram of a remote sensing image recognition model generating device provided by the present application;
fig. 5 is a schematic structural diagram of an electronic device provided in the present application.
Specific embodiments thereof have been shown by way of example in the drawings and will herein be described in more detail. These drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but to illustrate the concepts of the present application to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
In the technical solutions of the embodiments of the present application, the collection, storage, use, processing, transmission, provision, disclosure and other handling of user personal information comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data used for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.
It should be noted that the remote sensing image recognition model generation method, device, equipment, medium and product disclosed herein can be used in the artificial intelligence field, the financial technology field or other related fields, and can also be used in any field other than these. The application field of the remote sensing image recognition model generation method, device, equipment, medium and product is not limited.
The technical scheme of the present application is described in detail below with specific examples. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
For a clear understanding of the technical solutions of the present application, the prior art is first described in detail. The task of post-loan engineering monitoring based on remote sensing target detection technology has progressed greatly in recent years, but the requirement of rapid iterative optimization of remote sensing models in complex business scenarios remains a great challenge. The following problems mainly exist:
High-precision remote sensing images are expensive to acquire; in ordinary business, to save cost, remote sensing images are acquired only once per season or even once per year, so the model can hardly learn from enough target samples during training. In addition, the background of a remote sensing image is complex, and in some areas the background features of different seasons differ enormously, which greatly affects the recognition accuracy of the business target detection task in the image. As a result, the model struggles to obtain enough target samples for training, leading to a poor training effect and a slow iteration speed for the remote sensing image recognition model.
Therefore, aiming at the prior-art problems of poor training effect and slow iteration speed of remote sensing image recognition models caused by the small number of remote sensing training samples, the inventors found that these problems can be solved as follows: an example picture containing identification elements is segmented by a remote sensing image segmentation large model to generate a segmented image; a remote sensing background picture is generated by a background generation model; and the segmented image and the remote sensing background picture are fused to obtain a remote sensing image that can be used for training. This solves the problem of the insufficient number of remote sensing training samples.
Specifically, an image description text and an example image corresponding to a remote sensing training sample to be generated are obtained. The image description text is input into a semantic understanding model trained to convergence, and corresponding identification element information and background description information are generated. The identification element information and the example image are input into a remote sensing image segmentation large model trained to convergence, and a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image are generated. The example image includes at least an identification element corresponding to the identification element information. The background description information is input into a background generation model trained to convergence to generate a remote sensing background map. The segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background map are fused to generate a remote sensing training sample. A preset remote sensing image recognition model is trained based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
According to the remote sensing image recognition model generation method, the identification element information and the example image are input into a remote sensing image segmentation large model trained to convergence, and a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image are generated. The background description information is input into a background generation model trained to convergence to generate a remote sensing background map. The segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background map are fused to generate a remote sensing training sample with a remote sensing background and specific identification elements. Therefore, the difficulty of generating remote sensing training samples is reduced, and they can be generated more conveniently and rapidly; meanwhile, training a preset remote sensing image recognition model on the remote sensing training samples improves the training effect of the remote sensing image recognition model.
The inventors put forward the technical solution of the present application based on the above creative discovery.
The application scenario of the remote sensing image recognition model generation method provided by the embodiment of the application is described below. As shown in fig. 1, reference numeral 1 denotes a first electronic device and reference numeral 2 denotes a second electronic device. The network architecture of the application scenario corresponding to the remote sensing image recognition model generation method provided by the embodiment of the application comprises: a first electronic device 1 and a second electronic device 2. The second electronic device 2 may be a database or the like, and stores the image description text and the example image corresponding to the remote sensing training sample to be generated. The image description text and the example image may be provided by a user and entered into the second electronic device 2.
For example, when the remote sensing image recognition model generation is required, the following procedure is performed:
(1) the second electronic device 2 sends the image description text and the example image to the first electronic device 1.
(2) The first electronic device 1 inputs the image description text into a semantic understanding model trained to convergence, and generates corresponding identification element information and background description information.
(3) The first electronic device 1 inputs the identification element information and the example image into a remote sensing image segmentation large model trained to converge, and generates a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image. The example image includes at least an identification element corresponding to the identification element information.
(4) The first electronic device 1 inputs the background description information to the converged background generation model to generate a remote sensing background map.
(5) The first electronic device 1 fuses the segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background map to generate a remote sensing training sample. After the remote sensing training sample is generated, it can be sent to the electronic equipment that trains the remote sensing image recognition model, thereby improving the training effect on the remote sensing image recognition model.
Embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a remote sensing image recognition model generating method provided in the present application, as shown in fig. 2, in this embodiment, an execution subject of the embodiment of the present application is a remote sensing image recognition model generating device, and the remote sensing image recognition model generating device may be integrated in an electronic device. The remote sensing image recognition model generation method provided by the embodiment comprises the following steps:
step S101, acquiring an image description text and an example image corresponding to a remote sensing training sample to be generated.
In the present embodiment, the image description text and the example image may be input by the user, or may be acquired from a device storing the image description text and the example image; the acquisition manner is not limited in this embodiment.
Step S102, inputting the image description text into a semantic understanding model trained to convergence, and generating corresponding identification element information and background description information.
In the present embodiment, the identification element information includes identification element categories.
The semantic understanding model mainly performs a named entity recognition (Named Entity Recognition, NER) task for identifying key entities in the textual description entered by the user. Named entity recognition solves the sequence labeling problem in natural language processing: given an input sentence, each part of the sentence must be labeled with an entity type (e.g., place name, date, etc.). In this embodiment, the semantic understanding module mainly focuses on two types of key entities: the image background description and the specific identification element category.
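As a minimal illustration of this step's input/output contract, the sketch below extracts the two key entity types from a description text. The patent's actual implementation fine-tunes ERNIE; this rule-based stand-in and its term lists are purely hypothetical.

```python
# Hypothetical sketch of the semantic understanding step's contract: extract
# the image background description and the identification element category
# from a user's image description text. The term lists are illustrative
# examples, not the patent's label set.
BACKGROUND_TERMS = ["jungle", "city", "snow-covered mountain"]
ELEMENT_TERMS = ["windmill", "ship", "oil drum"]

def parse_description(text: str) -> dict:
    lowered = text.lower()
    return {
        "background_description": [t for t in BACKGROUND_TERMS if t in lowered],
        "identification_elements": [t for t in ELEMENT_TERMS if t in lowered],
    }

print(parse_description("Windmills standing in a snow-covered mountain area"))
```

A trained NER model would of course label arbitrary spans rather than match a fixed vocabulary; the dictionary output merely mirrors the two entity types described above.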
The semantic understanding model may be trained in advance. In the training stage, this embodiment collects a large number of text descriptions of project monitoring information provided by the business, and labels the image background descriptions (such as jungle, city, snow-covered mountain, etc.) and the specific identification element categories (such as windmill, ship, oil drum, etc.) in different projects as training samples for the pretrained model ERNIE (a semantic understanding pretraining framework based on continual learning). The ERNIE model is fine-tuned with the training samples, a warmup dynamic learning rate is set, and finally the model with the best effect on the validation set is saved. Warmup is a common technique that can effectively relieve the training instability of a deep neural network in the initial stage, accelerate the convergence of the model and improve its generalization ability.
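A warmup dynamic learning rate of the kind mentioned above can be sketched as a simple schedule: ramp the rate up linearly during the first steps, then decay. The base rate, warmup length and total steps below are illustrative values, not taken from the patent.

```python
# Sketch of a linear warmup learning-rate schedule for fine-tuning.
# base_lr, warmup_steps and total_steps are illustrative assumptions.
def warmup_lr(step, base_lr=5e-5, warmup_steps=1000, total_steps=10000):
    if step < warmup_steps:
        return base_lr * step / warmup_steps          # linear warmup phase
    # linear decay from base_lr down to 0 after warmup
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

print(warmup_lr(500), warmup_lr(1000), warmup_lr(10000))
```

Starting with a small rate keeps early gradient updates from destabilizing the randomly initialized layers, which is the instability-relief effect described above.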
Step S103, inputting the identification element information and the example image into a remote sensing image segmentation large model trained to convergence, and generating a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image. The example image includes at least an identification element corresponding to the identification element information.
In this embodiment, the remote sensing image segmentation large model segments the corresponding part of the example image based on the identification element information, thereby generating a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image. For example, if the example image includes identification elements such as a building, a windmill and a car, and the identification element information is 'windmill', the image content corresponding to the windmill can be segmented from the example image by the remote sensing image segmentation large model, together with the identification element category of each pixel of that content (for example, whether the category is background or windmill).
And step S104, inputting the background description information to a converged background generation model to generate a remote sensing background image.
In this embodiment, the background generation model may adopt an artificial intelligence image generation model, so that the corresponding remote sensing background map is directly generated based on the background description information.
Step S105, fusing the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image to generate a remote sensing training sample.
By fusing the segmented image with the identification element, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background map with the background, a remote sensing training sample can be generated that contains both the identification element, the background, and the label corresponding to the identification element.
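A toy sketch of this fusion step follows: element pixels from the segmented image are pasted onto the background map using the per-pixel categories as a mask. The convention that category 0 means background is an assumption for illustration only.

```python
# Sketch: fuse a segmented identification-element image into a remote sensing
# background map using the per-pixel element categories as a paste mask.
import numpy as np

def fuse(segment, categories, background):
    out = background.copy()
    element_mask = categories != 0        # assumed: category 0 = background
    out[element_mask] = segment[element_mask]   # paste element pixels
    return out

seg = np.full((4, 4, 3), 9.0)             # toy segmented image (element pixels)
cats = np.zeros((4, 4), dtype=int)
cats[1:3, 1:3] = 1                        # a 2x2 "windmill" region
bg = np.zeros((4, 4, 3))                  # toy remote sensing background map
fused = fuse(seg, cats, bg)
print(fused[2, 2, 0], fused[0, 0, 0])     # 9.0 0.0
```

The same category array doubles as the label for the pasted element, which is how the fused sample carries its annotation for free.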
Step S106, training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
In general, the remote sensing training samples can be divided into a training set and a validation set, so that the preset remote sensing image recognition model is trained on the training set and its accuracy is verified on the validation set, thereby generating the corresponding target remote sensing image recognition model.
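The split described above can be sketched in a few lines; the 80/20 ratio and the fixed shuffle seed are illustrative choices, not values from the patent.

```python
# Sketch: divide generated remote sensing training samples into a training set
# and a validation set. Ratio and seed are illustrative assumptions.
import random

def split_samples(samples, val_ratio=0.2, seed=42):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]   # (training set, validation set)

train_set, val_set = split_samples(list(range(100)))
print(len(train_set), len(val_set))             # 80 20
```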
According to the remote sensing image recognition model generation method, an image description text and an example image corresponding to a remote sensing training sample to be generated are obtained. The image description text is input into a semantic understanding model trained to convergence, and corresponding identification element information and background description information are generated. The identification element information and the example image are input into a remote sensing image segmentation large model trained to convergence, and a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image are generated. The example image includes at least an identification element corresponding to the identification element information. The background description information is input into a background generation model trained to convergence to generate a remote sensing background map. The segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background map are fused to generate a remote sensing training sample. A preset remote sensing image recognition model is trained based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
According to the remote sensing image recognition model generation method, the identification element information and the example image are input into a remote sensing image segmentation large model trained to convergence, and a segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image are generated. The background description information is input into a background generation model trained to convergence to generate a remote sensing background map. The segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background map are fused to generate a remote sensing training sample with a remote sensing background and specific identification elements. Therefore, the difficulty of generating remote sensing training samples is reduced, and they can be generated more conveniently and rapidly; meanwhile, training the preset remote sensing image recognition model on the remote sensing training samples improves the training effect of the remote sensing image recognition model.
Fig. 3 is a second schematic flow chart of the remote sensing image recognition model generation method provided in the present application. As shown in fig. 3, this embodiment further refines the flow of generating the training sample on the basis of the remote sensing image recognition model generation method provided in the previous embodiment. The remote sensing image recognition model generation method provided by this embodiment comprises the following steps.
Step S201, an image description text and an example image corresponding to a remote sensing training sample to be generated are obtained.
In this embodiment, the implementation of step 201 is similar to that of step 101 in the previous embodiment, and will not be described here again.
Step S202, inputting the image description text into a semantic understanding model trained to convergence, and generating corresponding identification element information and background description information.
In this embodiment, the implementation of step 202 is similar to that of step 102 in the previous embodiment, and will not be described here again.
In this embodiment, the remote sensing image segmentation large model includes: an encoder and a decoder.
Optionally, in this embodiment, before the remote sensing image segmentation, training may be performed on the remote sensing image segmentation large model, which specifically includes the following steps:
Obtaining a training sample, wherein the training sample comprises: a remote sensing training image and annotation information corresponding to the remote sensing training image.
And inputting the training sample into a preset remote sensing image segmentation large model, and training the preset remote sensing image segmentation large model based on a preset random masking module. The preset random masking module is used for randomly masking pixels of image blocks when the preset remote sensing image segmentation large model performs image segmentation processing.
And determining whether the preset remote sensing image segmentation large model meets a preset convergence condition according to the segmented image output by the preset remote sensing image segmentation large model and the identification element category corresponding to each pixel in the segmented image.
If the preset remote sensing image segmentation large model meets the convergence condition, the preset remote sensing image segmentation large model meeting the convergence condition is determined to be the remote sensing image segmentation large model trained to convergence.
In this embodiment, multiple image annotation datasets are introduced for the self-supervised learning task of the remote sensing image segmentation large model, forming a training dataset combination corresponding to the remote sensing training images. The training datasets cover a large number of general segmentation datasets and remote-sensing-specific segmentation datasets: the general datasets increase the diversity of the data and enhance the general segmentation capability of the pretrained model, while the public remote-sensing-specific datasets enhance the professional segmentation capability of the pretrained model in remote sensing scenes. The three main representative datasets are described below; other datasets may also be used, which this embodiment does not limit:
Segment Anything 1-Billion (SA-1B): composed of over 11 million diverse, high-resolution images and corresponding high-quality segmentation mask annotations, collected from image providers across multiple countries and regions of different areas and income levels. The subjects of the images mainly include places, objects, scenes, etc.; the segmentation subjects contained in the images vary, and the mask annotations cover fine-grained details ranging from large objects such as buildings to small ones such as animals. The dataset is mainly used for training the segmentation capability of the model on general objects.
Public remote-sensing-specific datasets: public datasets from various remote sensing land parcel segmentation competitions, consisting of over 1 million remote sensing images with different resolutions, scenes and objects, mainly used for training the generalization capability of the model in remote sensing scenes.
Satellite-acquired datasets: remote sensing image data further captured from satellites and aviation platforms, covering different objects and scenes across six continents, mainly used for the final downstream task fine-tuning of the model to improve its adaptability to different scenes.
Because conventional supervised learning currently requires a large amount of annotation data to train a foundation model, and because remote sensing images have high resolution and uneven quality, acquiring that annotation data would require a large number of remote sensing domain experts and a great deal of annotation time. This embodiment therefore designs a generative self-supervised learning training method based on multi-scale random masks, which aims to cope with remote sensing images of different resolutions, reconstruct hidden pixels and learn general feature representations of the remote sensing image distribution.
The method randomly masks pixel blocks in certain areas of an image and requires the network to reconstruct the original image, with a reconstruction accuracy greater than a preset threshold as the convergence condition. Unlike the general random masking method, which generates image blocks of a fixed size and randomly masks them completely, this embodiment designs a multi-scale random masking method in order to cope with the multi-resolution characteristic of remote sensing images and avoid losing the feature information of small objects in the image.
First, multiple image block sizes are designed in the image block generation stage: larger tile sizes (64×64, 128×128, etc.) are used for images with higher resolution (e.g., 0.5 m or 0.75 m resolution), and smaller tile sizes (14×14, 28×28, etc.) are used for images with lower resolution (e.g., 10 m resolution), so that small objects occupy a sufficiently high proportion of pixels in image blocks of different resolutions. Second, the strategy of randomly and completely masking the image blocks is replaced by masking with small 1×1, 3×3 and 5×5 windows, so that the pixel feature information of small objects is effectively retained in each image block. By randomly masking the pixels of image blocks during the image segmentation processing of the remote sensing image segmentation large model and requiring the network to reconstruct the original image, this embodiment can improve the training effect of the remote sensing image segmentation large model.
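The masking scheme just described can be sketched as follows: a tile size is chosen from the image resolution, and each selected tile is masked through scattered small windows rather than hidden entirely. The mask ratio, number of windows per tile and the resolution threshold are assumptions for illustration.

```python
# Sketch of multi-scale random masking: resolution-dependent tile size, with
# small 1x1 / 3x3 / 5x5 mask windows inside each selected tile.
import numpy as np

def choose_tile_size(gsd_m):
    # higher-resolution imagery (small ground sample distance) gets large tiles
    return 64 if gsd_m <= 1.0 else 14

def multi_scale_mask(img, gsd_m, tile_ratio=0.4, seed=0):
    rng = np.random.default_rng(seed)
    tile = choose_tile_size(gsd_m)
    out = img.copy()
    for y in range(0, img.shape[0] - tile + 1, tile):
        for x in range(0, img.shape[1] - tile + 1, tile):
            if rng.random() >= tile_ratio:
                continue                          # tile left fully visible
            for _ in range(tile):                 # scatter small mask windows
                w = int(rng.choice([1, 3, 5]))    # 1x1, 3x3 or 5x5 window
                wy = int(rng.integers(y, y + tile - w + 1))
                wx = int(rng.integers(x, x + tile - w + 1))
                out[wy:wy + w, wx:wx + w] = 0     # hide pixels to reconstruct
    return out

masked = multi_scale_mask(np.ones((128, 128, 3)), gsd_m=0.5)
print(masked.shape, masked.min(), masked.max())
```

Because only small windows inside a tile are zeroed, each masked tile still retains most of its pixels, which is the small-object-preserving property claimed above.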
In step S203, the encoder is used to perform feature extraction and image blocking processing on the example image to generate a position representation vector and an encoding information matrix corresponding to the example image.
The position representation vector is associated with position information of the example image and the encoding information matrix is associated with image information of the example image.
Optionally, in this embodiment, the encoder includes a Swin Transformer network structure.
S203 may specifically be:
and carrying out feature extraction and image blocking processing on the example image by adopting an encoder to generate an image feature vector corresponding to the image block and a position feature vector corresponding to the relative position between the image blocks.
And generating a position representation vector corresponding to the example image according to the image feature vector corresponding to the image block and the relative position vector between the image blocks.
And inputting the position representation vector into the Swin Transformer network structure to generate the coding information matrix corresponding to all the image blocks.
In this embodiment, the model structure introduces a Swin Transformer as the backbone network of the encoder. After receiving the example picture, the encoder performs feature extraction on it to generate the original image matrix vector X. The encoder partitions the input image into blocks and adds the image feature vector of each image block to the position feature vector of that block's relative position in the original image, obtaining the position representation vector X' corresponding to the original image matrix. The position representation vector X' is passed into the Swin Transformer backbone network, and the coding information matrix C of all image blocks of the original image is obtained after N consecutive Swin Transformer blocks. The coding information matrix has dimensions n × d, where n is the number of image blocks and d is the feature dimension of each image block.
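The encoder front end described above can be sketched numerically: partition the image into blocks, flatten each block into an image feature vector, and add a position feature vector to obtain X'. The random position vectors stand in for learned ones, and the Swin Transformer backbone itself is not reproduced here.

```python
# Sketch of image blocking plus position representation: n image blocks of
# dimension d each, combined into the position representation vector X'.
import numpy as np

def patchify(img, patch):
    h, w, c = img.shape
    blocks = img.reshape(h // patch, patch, w // patch, patch, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4)      # group by block position
    return blocks.reshape(-1, patch * patch * c)  # n blocks, d features each

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))                   # toy example image
x = patchify(image, patch=16)                     # image feature vectors
pos = rng.random(x.shape)                         # position feature vectors
x_prime = x + pos                                 # position representation X'
print(x_prime.shape)                              # n=16 blocks, d=768
```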
In step S204, the decoder performs attention mechanism processing and classification prediction processing on the position representation vector and the coding information matrix based on the identification element information, and generates a segmented image corresponding to the identification element and the identification element category corresponding to each pixel in the segmented image.
Optionally, in this embodiment, the decoder includes an attention layer and a Swin Transformer network structure.
S204 may specifically be:
based on the position representation vector, the identification element information, and the attention layer, a corresponding original attention map is generated.
Based on the coded information matrix, the identification element information, and the attention layer, a corresponding coded attention map is generated.
Fitting the coded attention map based on the original attention map, and inputting the fitted coded attention map into the Swin Transformer network structure, generates a segmented image of the same size as the image feature vector.
And carrying out classification prediction processing on the segmented image to generate identification element categories corresponding to the pixels in the segmented image.
In order to solve the problem that, when segmenting fine-grained visual objects in a remote sensing image, it is difficult to find inter-class shared and cross-sample discriminative key feature regions, this embodiment designs a self-enhancing attention mechanism and adds an attention layer to the decoder. First, the decoder inputs the coding information matrix C into an ESE (Effective Squeeze-and-Excitation) attention layer, which processes the coding information matrix C based on the identification element information to generate the coded attention map Ce. At the same time, the position representation vector X' is input into an ESE attention layer, which processes the position representation vector X' based on the identification element information to generate the original attention map X'e.
The position information in the original attention map X'e is used to fit and optimize the coded attention map: the KL divergence (Kullback-Leibler divergence, also called relative entropy) between Ce and X'e is optimized, thereby optimizing the correspondence between positions and features in the coded attention map. Then, the fitted coded attention map is input into N consecutive Swin Transformer blocks to obtain a segmented image of the same size as the input original image matrix vector X. Finally, an image segmentation task head, mainly composed of a linear layer and a softmax layer, is attached to the pretrained backbone; it predicts the segmentation result category of each pixel and finally generates the identification element category corresponding to each pixel in the segmented image.
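The quantity being optimized in the fitting step above can be written out directly; the sketch below computes the KL divergence between two toy attention maps. Their shapes and the softmax normalization are illustrative assumptions.

```python
# Sketch: KL divergence (relative entropy) between the coded attention map Ce
# and the original attention map X'e, the quantity the embodiment optimizes.
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q), summed over all attention positions
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(0)
x_e = softmax(rng.random((4, 8)))   # original attention map X'e (position side)
c_e = softmax(rng.random((4, 8)))   # coded attention map Ce (feature side)
loss = kl_divergence(x_e, c_e)      # drives Ce toward X'e during training
print(loss)
```

Minimizing this loss pulls the feature-side attention distribution toward the position-side one, which is how the position information "fits" the coded attention map.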
Step S205, inputting the background description information to a converged background generation model to generate a remote sensing background image.
Optionally, in this embodiment, the background generation model is an artificial intelligence image generation model. S205 may specifically be:
An artificial intelligence image generation model is used to determine image keywords from the background description information.
And generating a corresponding remote sensing background map based on the image keywords by adopting the artificial intelligence image generation model.
In this embodiment, the artificial intelligence image generation model may adopt the Stable Diffusion (SD) model, and the corresponding remote sensing background map can be generated from the image keywords.
In order to realize the task of generating complex remote sensing backgrounds, the invention constructs a multi-resolution, multi-scene remote-sensing-specific training dataset and introduces the DreamBooth training method to fine-tune the SD model.
In this embodiment, multiple image annotation datasets are introduced for the background generation model fine-tuning task, forming a training dataset combination. The main datasets are presented below:
Million-AID dataset: contains millions of remote sensing scene instances in 51 scene categories, covering different scenes such as agricultural land, industrial land, residential land and water areas; it has the advantages of high spatial resolution, large scale and global distribution.
SIRI-WHU dataset: contains 12 classes of images obtained from satellite imagery, including scenes such as harbors, parks, residences and rivers.
NWPU-RESISC45 dataset: contains more than 30,000 high-resolution remote sensing images covering 45 scene types such as airports, snow mountains, jungles, deserts and commercial areas.
Other data sets may be employed, and this embodiment is not limited in this regard.
In order to solve the problem that the output domain and the expressive force are limited when the original Stablediffusion (SD) model generates a complex remote sensing background image, language drift and overfitting are avoided, a streambooth algorithm is introduced in the embodiment, and fine adjustment is performed on the generated model, so that the function of truly recovering a keyword entity in the image is realized.
The principle of the streambooth algorithm is that after training pictures and keywords are input, a method of adding a weight of a special identifier to entity description is adopted, and a scene is modified by using special features learned by the special identifier based on scene category features learned by artificial intelligence, so that a background image has complexity and reality.
When the datasets are used to train the original SD model, the DreamBooth algorithm is introduced: training pictures containing a key entity (such as vehicles) are prepared, and text in which a special identifier [V] is attached to the key entity (such as vehicles) is input into the SD model for fine-tuning of the whole model network, thereby enhancing the complexity and authenticity of the generated image details. Taking the generation of a parking lot scene "fully parked with vehicles" as an example, the model attaches the special identifier [V] to the description of the vehicles, and by learning the personalized details of various vehicles it can generate more reasonable and complex parking lot scene pictures.
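As a concrete illustration of the identifier mechanism, the prompts used during such fine-tuning can be sketched as follows (a minimal sketch; the helper names and the rare token `sks` standing in for [V] are illustrative assumptions, not taken from the patent):

```python
# DreamBooth-style prompt construction: a rare identifier token (playing the
# role of [V]) is attached to the key entity so the model binds the entity's
# personalized details to it, while a plain "class" prompt preserves prior
# knowledge and counters language drift.

RARE_TOKEN = "sks"  # stand-in for the special identifier [V]

def build_instance_prompt(entity: str, scene: str) -> str:
    """Prompt for the training pictures containing the key entity."""
    return f"a remote sensing photo of a {scene} with {RARE_TOKEN} {entity}"

def build_class_prompt(entity: str, scene: str) -> str:
    """Prior-preservation prompt without the identifier."""
    return f"a remote sensing photo of a {scene} with {entity}"
```

For the "fully parked vehicles" example, `build_instance_prompt("vehicles", "parking lot")` yields a prompt in which only the identified entity carries the special token, while the class prompt keeps the model's generic notion of vehicles intact.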
In step S206, the identification element category corresponding to each pixel in the segmented image is binarized to generate a mask label corresponding to each pixel.
After binarization of the identification element categories, for example with 1 denoting background and 0 denoting an identification element, a value can be output for each pixel; these values are recorded as the mask for use in subsequent model training.
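The binarization step can be sketched as follows (a minimal sketch assuming a NumPy category map and the 1 = background / 0 = element convention stated above; the background category id 0 is an assumption):

```python
import numpy as np

BACKGROUND_ID = 0  # assumed category id for "background" in the segmented image

def binarize_mask(category_map: np.ndarray) -> np.ndarray:
    """Per-pixel mask label: 1 where the pixel is background,
    0 where it belongs to an identification element."""
    return np.where(category_map == BACKGROUND_ID, 1, 0).astype(np.uint8)
```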
Step S207, fusing the segmented image and the remote sensing background image to generate a corresponding fused image.
Optionally, in this embodiment, the image fusion mainly fuses the multispectral low-resolution segmented image with the high-resolution panchromatic-band generated background image, so as to obtain a result sample image with richer information.
Alternatively, in this embodiment, S207 may specifically be:
Performing size assimilation on the segmented image and the remote sensing background image so that the two have the same size after processing.
Performing convolution resampling on the processed segmented image based on a preset spectral response function to generate an analog gray value of the panchromatic band.
Performing the Schmidt orthogonalization transform on the processed remote sensing background image to generate a corresponding Schmidt orthogonalization transform result.
Adjusting the analog gray value based on a preset mean-variance adjustment algorithm to generate an adjusted analog gray value.
Replacing the first component of the Schmidt orthogonalization transform result with the adjusted analog gray value, and performing the inverse Schmidt orthogonalization transform on the result to generate an inverse-transform result.
Removing the first band of the inverse-transform result to generate the fused image corresponding to the fusion of the segmented image and the remote sensing background image.
The size assimilation of the segmented image and the remote sensing background image may be performed by resizing the multispectral low-resolution segmented image according to the size of the high-resolution panchromatic-band remote sensing background image, so that the segmented image becomes the same size as the remote sensing background image.
Let the multispectral low-resolution segmented image be X. According to the preset spectral response function, X is convolved with a weight matrix W and resampled to obtain the analog gray value P of the panchromatic band.
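Since the weight matrix W itself is not reproduced above, the simulation of P can only be sketched under an assumption; here each multispectral band is weighted by a normalized spectral-response weight and the bands are summed:

```python
import numpy as np

# Simulated panchromatic gray value P from the multispectral image X.
# Assumption: W reduces to one normalized weight per band (uniform by
# default); the patent's actual weight matrix is not given.

def simulate_panchromatic(X: np.ndarray, weights=None) -> np.ndarray:
    """X: (bands, H, W) multispectral stack; returns P with shape (H, W)."""
    n_bands = X.shape[0]
    w = np.full(n_bands, 1.0 / n_bands) if weights is None else np.asarray(weights, float)
    w = w / w.sum()                    # normalize so gray levels keep their range
    return np.tensordot(w, X, axes=1)  # weighted sum over the band axis
```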
A Schmidt orthogonalization (Gram-Schmidt) transform function H is introduced to perform the Schmidt orthogonalization transform on the processed remote sensing background image. In the transform, T denotes the index of the band being converted, Z denotes a band of the remote sensing background image, μ_T denotes the mean value of band T, and φ(H_l, Z_T) denotes the covariance between band Z_T of the remote sensing background image and the l-th component H_l of the Schmidt orthogonalization result; each transform coefficient is the quotient of this covariance and the variance φ(H_l, H_l).
Based on the preset mean-variance adjustment algorithm, the analog gray value is adjusted to generate the adjusted analog gray value. Here P is the gray value of the low-resolution panchromatic-band image, k_1 is the gain and k_2 is the offset, with

k_2 = μ_I − (k_1 × μ_P)

where I denotes the luminance component of the segmented image, v denotes the variance of the gray value P of the panchromatic-band image, and μ_I and μ_P denote the mean values of the luminance component I and of the gray value P of the panchromatic-band image, respectively.
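Numerically, the adjustment matches the statistics of P to those of the luminance component I. The offset follows k_2 = μ_I − (k_1 × μ_P) as given above; the gain k_1 = σ_I/σ_P is an assumption (the usual choice in Gram-Schmidt pan-sharpening), since the gain formula is not reproduced here:

```python
import numpy as np

def mean_variance_adjust(P: np.ndarray, I: np.ndarray) -> np.ndarray:
    """Rescale the simulated pan band P to the statistics of luminance I."""
    k1 = I.std() / P.std()           # gain (assumed form: ratio of std devs)
    k2 = I.mean() - k1 * P.mean()    # offset, as given in the text
    return k1 * P + k2
```

After the adjustment, the mean and standard deviation of the result equal those of I by construction.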
The adjusted low-resolution panchromatic band is used to replace the first component H_1 of the Schmidt orthogonalization result, while the remaining components obtained from the Schmidt orthogonalization of the high-resolution panchromatic band are kept, and the inverse transform is performed. Finally, the first band of the inverse-transform result Z is removed, and the fused image Z′ is output.
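The whole replace-and-invert step can be illustrated on a toy band stack (an illustrative reconstruction with bands flattened to vectors and coefficients cov(Z_T, H_l)/var(H_l) as described above, not the patent's exact implementation):

```python
import numpy as np

def gs_forward(bands):
    """Gram-Schmidt transform of a (n_bands, n_pixels) stack.
    Returns components H plus the means and coefficients for inversion."""
    n, m = bands.shape
    mus = bands.mean(axis=1)
    H = np.zeros((n, m))
    G = np.zeros((n, n))  # G[t, l]: coefficient of component l in band t
    for t in range(n):
        h = bands[t] - mus[t]
        for l in range(t):
            # covariance with component l divided by its variance
            G[t, l] = (h * H[l]).mean() / (H[l] ** 2).mean()
            h = h - G[t, l] * H[l]
        H[t] = h
    return H, mus, G

def gs_inverse(H, mus, G):
    """Exact inverse of gs_forward."""
    n, m = H.shape
    Z = np.zeros((n, m))
    for t in range(n):
        Z[t] = mus[t] + H[t] + sum(G[t, l] * H[l] for l in range(t))
    return Z
```

Fusion then amounts to setting `H[0]` to the adjusted pan band before calling `gs_inverse`, and dropping the first band of the result.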
And step S208, fusing mask labels and fused images to generate a remote sensing training sample.
This embodiment addresses the problems of poor model-training effect and slow iteration encountered when building remote sensing image recognition models for post-loan engineering monitoring scenes based on satellite remote sensing imagery. By training a large remote sensing image segmentation model and connecting a background generation model based on artificial intelligence technology, users are helped to conveniently and quickly generate training image samples containing different complex backgrounds and different identification elements, which greatly improves both the modeling efficiency of the remote sensing image recognition training model and the accuracy of the prediction model.
Optionally, in this embodiment, training the preset remote sensing image recognition model based on the remote sensing training sample may include the following specific steps of:
dividing a remote sensing training sample into a training set and a verification set according to a preset proportion;
performing iterative training on the preset remote sensing image recognition model by adopting a training set to generate preset remote sensing image recognition models corresponding to each iterative version;
performing model accuracy verification on the preset remote sensing image recognition models corresponding to the iteration versions based on the verification set, and generating average accuracy mean values corresponding to the preset remote sensing image recognition models of the iteration versions;
and determining a preset remote sensing image recognition model corresponding to the maximum average precision mean value as a target remote sensing image recognition model.
In this embodiment, the remote sensing training samples may be divided into a training set and a verification set at a preset ratio of 7:3 and used to train the preset remote sensing image recognition model. The model can be described within a statistical-inference framework, in which the probabilities of the model making Type I and Type II errors are of interest; these are usually described by precision and recall. Precision describes how accurate the model is, i.e., how many of the samples predicted as positive are truly positive; recall describes how complete the model is, i.e., how many of the truly positive samples are predicted as positive by the model. Different tasks weigh the two error types differently; often, provided one error type stays below a certain threshold, the effort goes into reducing the other. For detection, mAP (mean Average Precision) is adopted as a unified index that takes both into account.
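The precision and recall described here can be stated compactly (a minimal sketch over binary labels, with 1 denoting a positive sample):

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```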
Specifically, for each picture in the verification set, the preset remote sensing image recognition model outputs a number of prediction boxes (often far exceeding the number of ground-truth boxes); in this embodiment IoU (Intersection over Union) is used to mark whether a prediction box is a correct prediction. After marking, recall always rises as more prediction boxes are kept; precision is averaged over the different recall levels to obtain the AP, and the APs of all categories are then averaged, weighted by category proportion, to obtain the mAP. During iterative training, the mAP is evaluated on the verification set after each iteration version; when training ends, the preset remote sensing image recognition model of the iteration version with the largest mAP is exported as the model finally output by training.
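The IoU check that marks a prediction box as correct can be sketched as follows (boxes as (x1, y1, x2, y2); the 0.5 threshold is a common convention, assumed rather than taken from the text):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def is_correct(pred_box, gt_box, thresh=0.5):
    """Mark a prediction box correct when its IoU with a ground-truth box
    reaches the threshold."""
    return iou(pred_box, gt_box) >= thresh
```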
Fig. 4 is a schematic structural diagram of a remote sensing image recognition model generating device provided in the present application, as shown in fig. 4, in this embodiment, the remote sensing image recognition model generating device 300 may be disposed in an electronic device, such as a terminal device, and the remote sensing image recognition model generating device 300 includes:
the acquiring module 301 is configured to acquire an image description text and an example image corresponding to a remote sensing training sample to be generated.
The first generation module 302 is configured to input the image description text into a semantic understanding model trained to convergence, and generate corresponding recognition element information and background description information.
The second generation module 303 is configured to input the identification element information and the example image into a remote sensing image segmentation large model trained to converge, and generate a segmentation image corresponding to the identification element information and an identification element category corresponding to each pixel in the segmentation image. The example image includes at least an identification element corresponding to the identification element information.
The third generating module 304 is configured to input the background description information to the converged background generating model and generate a remote sensing background map.
The fusion module 305 is configured to fuse the segmented image, the identification element class corresponding to each pixel in the segmented image, and the remote sensing background map, and generate a remote sensing training sample.
And a fourth generating module 306, configured to train a preset remote sensing image recognition model based on the remote sensing training sample, and generate a corresponding target remote sensing image recognition model.
The remote sensing image recognition model generating device provided in this embodiment may execute the technical scheme of the method embodiment shown in fig. 2, and its implementation principle and technical effect are similar to those of the method embodiment shown in fig. 2, and are not described in detail herein.
The remote sensing image recognition model generating device provided in the present application further refines the device provided in the previous embodiment; the remote sensing image recognition model generating device 300 then further includes:
Optionally, in this embodiment, the remote sensing image segmentation large model includes: an encoder and a decoder.
The second generating module 303 is specifically configured to:
and performing feature extraction and image blocking processing on the example image by adopting an encoder to generate a position representation vector and an encoding information matrix corresponding to the example image. The decoder is used for carrying out attention mechanism processing and classification prediction processing on the position representation vector and the coding information matrix based on the identification element information, and generating a divided image corresponding to the image identification element and identification element categories corresponding to pixels in the divided image.
Optionally, in this embodiment, the encoder includes a swin-transformer network structure.
The second generating module 303 is specifically configured to, when performing feature extraction and image blocking processing on the example image by using an encoder to generate a position representation vector and an encoding information matrix corresponding to the example image:
and carrying out feature extraction and image blocking processing on the example image by adopting an encoder to generate an image feature vector corresponding to the image block and a position feature vector corresponding to the relative position between the image blocks. And generating a position representation vector corresponding to the example image according to the image feature vector corresponding to the image block and the relative position vector between the image blocks. And inputting the position representation vector into a swin-transformer network structure to generate coding information matrixes corresponding to all the image blocks.
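The image-blocking step can be illustrated with a toy patch partition (a data-layout sketch only; the real encoder uses learned embeddings inside the swin-transformer network structure, and the patch size here is an assumption):

```python
import numpy as np

def image_to_patches(img: np.ndarray, patch: int):
    """Split a (H, W) image into non-overlapping patch feature vectors,
    each paired with its (row, col) grid position."""
    H, W = img.shape
    vecs, positions = [], []
    for r in range(0, H - patch + 1, patch):
        for c in range(0, W - patch + 1, patch):
            vecs.append(img[r:r + patch, c:c + patch].ravel())
            positions.append((r // patch, c // patch))
    return np.stack(vecs), positions
```

Each flattened patch corresponds to an image feature vector, and the grid coordinates play the role of the relative-position information combined into the position representation vector.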
Optionally, in this embodiment, the decoder includes an attention layer and a swin-transformer network structure.
The second generation module 303 is specifically configured to, when performing, by using a decoder, attention mechanism processing and classification prediction processing on the position expression vector and the encoding information matrix based on the identification element information, to generate a divided image corresponding to the image identification element and an identification element category corresponding to each pixel in the divided image:
A corresponding original attention map is generated based on the position representation vector, the identification element information, and the attention layer. A corresponding coded attention map is generated based on the coding information matrix, the identification element information, and the attention layer. The coded attention map is fitted based on the original attention map, and the fitted coded attention map is input into the swin-transformer network structure to generate a segmented image of the same size as the image feature vector. Classification prediction is then performed on the segmented image to generate the identification element category corresponding to each pixel in the segmented image.
Optionally, in this embodiment, the fusion module 305 is specifically configured to:
and performing binarization processing on the identification element types corresponding to each pixel in the segmented image to generate mask labels corresponding to each pixel. And fusing the segmented image and the remote sensing background image to generate a corresponding fused image. And fusing the mask labels and the fused images to generate a remote sensing training sample.
Optionally, in this embodiment, when the fusion module 305 fuses the split image and the remote sensing background image to generate a corresponding fused image, the fusion module is specifically configured to:
and performing size assimilation treatment on the segmented image and the remote sensing background image so that the sizes of the segmented image and the remote sensing background image after treatment are the same. And carrying out convolution resampling on the processed segmented image based on a preset spectral response function to generate an analog gray value of the panchromatic wave band. And performing Schmidt orthogonalization conversion on the processed remote sensing background image to generate a corresponding Schmidt orthogonalization conversion result. And correspondingly adjusting the analog gray value based on a preset mean variance adjustment algorithm to generate an adjusted analog gray value. And replacing the first component in the Schmidt orthogonalization conversion result with the adjusted analog gray value, and performing the Schmidt orthogonalization inverse transformation on the Schmidt orthogonalization conversion result to generate an inverse transformation result. And removing a first wave band in the inverse transformation result, and generating a corresponding fusion image after fusion of the segmentation image and the remote sensing background image.
Optionally, in this embodiment, the background generating model is an artificial intelligent image generating model.
The third generating module 304 is specifically configured to:
An artificial intelligence image generation model is used to determine image keywords from the background description information. And generating a corresponding remote sensing background image based on the image keywords by adopting an artificial intelligent image generation model.
Optionally, in this embodiment, the remote sensing image identification model generating device 300 further includes:
The training module is used for acquiring a training sample, the training sample comprising: a remote sensing training image and annotation information corresponding to the remote sensing training image. The training sample is input into a preset remote sensing image segmentation large model, and the preset remote sensing image segmentation large model is trained based on a preset random masking module. The preset random masking module is used for randomly masking pixels of image blocks when the preset remote sensing image segmentation large model performs image blocking processing. Whether the preset remote sensing image segmentation large model meets a preset convergence condition is determined according to the segmented image output by the model and the identification element category corresponding to each pixel in the segmented image. If the preset remote sensing image segmentation large model meets the convergence condition, the model meeting the convergence condition is determined to be the remote sensing image segmentation large model trained to convergence.
Optionally, in this embodiment, the fourth generating module 306 is specifically configured to:
And dividing the remote sensing training sample into a training set and a verification set according to a preset proportion. And carrying out iterative training on the preset remote sensing image recognition model by adopting a training set, and generating the preset remote sensing image recognition model corresponding to each iterative version. And carrying out model accuracy verification on the preset remote sensing image recognition models corresponding to the iteration versions based on the verification set, and generating average accuracy mean values corresponding to the preset remote sensing image recognition models of the iteration versions. And determining a preset remote sensing image recognition model corresponding to the maximum average precision mean value as a target remote sensing image recognition model.
The remote sensing image recognition model generating device provided in this embodiment may execute the technical scheme of the method embodiment shown in fig. 2-3, and its implementation principle and technical effect are similar to those of the method embodiment shown in fig. 2-3, and are not described in detail herein.
According to embodiments of the present application, there is also provided an electronic device, a computer-readable storage medium, and a computer program product.
As shown in fig. 5, fig. 5 is a schematic structural diagram of the electronic device provided in the present application. Electronic devices are intended for various forms of digital computers, such as laptops, desktops, personal digital assistants, blade servers, mainframes, and other appropriate computers. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 5, the electronic device includes: a processor 401 and a memory 402. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device.
Memory 402 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the remote sensing image recognition model generation method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to execute the remote sensing image recognition model generation method provided by the present application.
The memory 402 is used as a non-transitory computer readable storage medium, and may be used to store a non-transitory software program, a non-transitory computer executable program, and modules, such as program instructions/modules corresponding to the remote sensing image recognition model generating method in the embodiment of the present application (e.g., the acquiring module 301, the first generating module 302, the second generating module 303, the third generating module 304, the fusion module 305, and the fourth generating module 306 shown in fig. 4). The processor 401 executes various functional applications of the electronic device and data processing by running non-transitory software programs, instructions and modules stored in the memory 402, that is, implements the remote sensing image recognition model generation method in the above-described method embodiment.
Meanwhile, this embodiment also provides a computer program product; when instructions in the computer program product are executed by a processor of the electronic device, the electronic device is enabled to execute the remote sensing image recognition model generation method of the above embodiment.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the embodiments that follow their general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the embodiments pertain.
It is to be understood that the embodiments of the present application are not limited to the precise constructions described above and shown in the drawings, and that various modifications and changes may be made without departing from their scope. The scope of the embodiments of the present application is limited only by the appended claims.

Claims (13)

1. The remote sensing image recognition model generation method is characterized by comprising the following steps of:
acquiring an image description text and an example image corresponding to a remote sensing training sample to be generated;
inputting the image description text to a converged semantic understanding model, and generating corresponding identification element information and background description information;
Inputting the identification element information and the example image into a converged remote sensing image segmentation large model, and generating a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image; the example image at least comprises identification elements corresponding to the identification element information;
inputting the background description information to a converged background generation model to generate a remote sensing background image;
fusing the segmented image, the identification element category corresponding to each pixel in the segmented image and the remote sensing background image to generate a remote sensing training sample;
training a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
2. The method of claim 1, wherein the remote sensing image segmentation large model comprises: an encoder and a decoder;
the step of inputting the identification element information and the example image into a converged remote sensing image segmentation big model to generate a segmentation image corresponding to the identification element information and identification element categories corresponding to pixels in the segmentation image, comprising the following steps:
performing feature extraction and image blocking processing on the example image by adopting an encoder to generate a position representation vector and an encoding information matrix corresponding to the example image;
And adopting a decoder to perform attention mechanism processing and classification prediction processing on the position representation vector and the coding information matrix based on the identification element information, and generating a divided image corresponding to the image identification element and identification element categories corresponding to pixels in the divided image.
3. The method of claim 2, wherein the encoder comprises a swin-transformer network structure;
the method for generating the position representation vector and the coding information matrix corresponding to the example image by adopting an encoder to perform feature extraction and image blocking processing on the example image comprises the following steps:
performing feature extraction and image blocking processing on the example image by adopting an encoder to generate an image feature vector corresponding to an image block and a position feature vector corresponding to the relative position between the image blocks;
generating a position representation vector corresponding to the example image according to the image feature vector corresponding to the image block and the relative position vector between the image blocks;
and inputting the position representation vector into the swin-transformer network structure to generate coding information matrixes corresponding to all the image blocks.
4. A method according to claim 3, wherein the decoder comprises an attention layer and a swin-transformer network structure;
The method for generating a divided image corresponding to an image recognition element and a recognition element category corresponding to each pixel in the divided image by using a decoder to perform attention mechanism processing and classification prediction processing on the position representation vector and the encoding information matrix based on the recognition element information comprises the following steps:
generating a corresponding original attention map according to the position representation vector, the identification element information and the attention layer;
generating a corresponding coding attention map according to the coding information matrix, the identification element information and the attention layer;
fitting the coded attention map based on the original attention map, and inputting the fitted coded attention map into a swin-transformer network structure, generating a segmented image of the same size as the image feature vector;
and carrying out classification prediction processing on the segmented image to generate identification element categories corresponding to the pixels in the segmented image.
5. The method of claim 4, wherein the fusing the segmented image, the identification element class corresponding to each pixel in the segmented image, and the remote sensing background map to generate the remote sensing training sample comprises:
performing binarization processing on the identification element category corresponding to each pixel in the segmented image to generate mask labels corresponding to each pixel;
Fusing the segmented image and the remote sensing background image to generate a corresponding fused image;
and fusing the mask labels and the fused image to generate a remote sensing training sample.
6. The method of claim 5, wherein fusing the segmented image and the remote sensing background map to generate a corresponding fused image comprises:
performing size assimilation treatment on the segmented image and the remote sensing background image so that the sizes of the segmented image and the remote sensing background image after treatment are the same;
convolving and resampling the processed segmented image based on a preset spectral response function to generate an analog gray value of a panchromatic wave band;
performing schmitt orthogonalization conversion on the processed remote sensing background image to generate a corresponding schmitt orthogonalization conversion result;
correspondingly adjusting the analog gray value based on a preset mean variance adjustment algorithm to generate an adjusted analog gray value;
replacing a first component in the Schmidt orthogonalization conversion result with the adjusted analog gray value, and performing the Schmidt orthogonalization inverse transformation on the Schmidt orthogonalization conversion result to generate an inverse transformation result;
and removing a first wave band in the inverse transformation result, and generating a corresponding fusion image after the segmentation image and the remote sensing background image are fused.
7. The method of claim 6, wherein the background generation model is an artificial intelligence image generation model;
wherein the inputting the background description information into the converged background generation model to generate the remote sensing background image comprises the following steps:
determining image keywords from the background description information by adopting an artificial intelligent image generation model;
and generating a corresponding remote sensing background image based on the image keywords by adopting an artificial intelligent image generation model.
8. The method according to any one of claims 1 to 7, wherein before the inputting of the identification element information and the example image into the converged remote sensing image segmentation big model to generate the segmented image corresponding to the identification element information and the identification element category corresponding to each pixel in the segmented image, the method further comprises:
obtaining a training sample, wherein the training sample comprises: a remote sensing training image and label information corresponding to the remote sensing training image;
inputting the training sample into a preset remote sensing image segmentation big model, and training the preset remote sensing image segmentation big model based on a preset random masking module; the preset random masking module is used for randomly masking pixels of image blocks when the preset remote sensing image segmentation big model performs image blocking processing;
Determining whether the preset remote sensing image segmentation large model meets preset convergence conditions according to the segmentation image output by the preset remote sensing image segmentation large model and the identification element category corresponding to each pixel in the segmentation image;
and if the preset remote sensing image segmentation big model meets the convergence condition, determining the preset remote sensing image segmentation big model meeting the convergence condition as the remote sensing image segmentation big model trained to be converged.
9. The method of claim 6, wherein training the preset remote sensing image recognition model based on the remote sensing training sample to generate the corresponding target remote sensing image recognition model comprises:
dividing the remote sensing training sample into a training set and a validation set according to a preset ratio;
iteratively training the preset remote sensing image recognition model on the training set to generate a preset remote sensing image recognition model for each iteration version;
verifying the model accuracy of the preset remote sensing image recognition model of each iteration version on the validation set to generate a mean average precision (mAP) for the preset remote sensing image recognition model of each iteration version; and
determining the preset remote sensing image recognition model corresponding to the largest mean average precision as the target remote sensing image recognition model.
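Claim 9's selection procedure — split by a preset ratio, train one checkpoint per iteration version, score each on the validation set, keep the highest mAP — reduces to the sketch below. The mAP values and version names are hypothetical; computing mAP itself (per-class average precision, averaged across classes) is out of scope here.

```python
def split_samples(samples: list, train_ratio: float = 0.8):
    """Divide samples into a training set and a validation set
    according to a preset ratio (claim 9's first step)."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

def select_best_checkpoint(val_map_per_version: dict[str, float]) -> str:
    """Pick the iteration version whose model achieved the highest
    mean average precision on the validation set."""
    return max(val_map_per_version, key=val_map_per_version.get)

train_set, val_set = split_samples(list(range(10)))
best = select_best_checkpoint({"v1": 0.41, "v2": 0.58, "v3": 0.55})
```

Selecting on validation mAP rather than training loss is the standard guard against keeping an overfit iteration version.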
10. A remote sensing image recognition model generation device, comprising:
an acquisition module configured to acquire an image description text and an example image corresponding to a remote sensing training sample to be generated;
a first generation module configured to input the image description text into a semantic understanding model trained to convergence to generate corresponding identification element information and background description information;
a second generation module configured to input the identification element information and the example image into a remote sensing image segmentation large model trained to convergence to generate a segmented image corresponding to the identification element information and an identification element category corresponding to each pixel in the segmented image, the example image containing at least the identification element corresponding to the identification element information;
a third generation module configured to input the background description information into a background generation model trained to convergence to generate a remote sensing background image;
a fusion module configured to fuse the segmented image, the identification element category corresponding to each pixel in the segmented image, and the remote sensing background image to generate the remote sensing training sample; and
a fourth generation module configured to train a preset remote sensing image recognition model based on the remote sensing training sample to generate a corresponding target remote sensing image recognition model.
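The fusion module's compositing step — combining the segmented image, its per-pixel categories, and the generated background into one labeled sample — can be sketched as a category-guided paste. This is an assumed, simplified reading of "fuse" (pixels are scalars, category 0 is treated as background); the patent does not fix these details.

```python
def fuse_sample(segmented, categories, background, background_label=0):
    """Compose a training sample: wherever the per-pixel category is
    background_label, take the generated background pixel; elsewhere
    keep the segmented foreground pixel. The category map doubles as
    the fused sample's label map."""
    fused = [
        [bg if cat == background_label else fg
         for fg, cat, bg in zip(fg_row, cat_row, bg_row)]
        for fg_row, cat_row, bg_row in zip(segmented, categories, background)
    ]
    return fused, categories

fused, labels = fuse_sample(
    segmented=[[5, 5], [5, 5]],      # foreground pixels from segmentation
    categories=[[1, 0], [0, 1]],     # per-pixel identification element category
    background=[[9, 9], [9, 9]],     # generated remote sensing background
)
```

Because the category map is carried through unchanged, the fused image and its labels stay pixel-aligned, which is what makes the output directly usable as a recognition training sample.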
11. An electronic device, comprising a memory and a processor, wherein:
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory to implement the remote sensing image recognition model generation method of any one of claims 1 to 9.
12. A computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the remote sensing image recognition model generation method of any one of claims 1 to 9.
13. A computer program product comprising a computer program that, when executed by a processor, implements the remote sensing image recognition model generation method of any one of claims 1 to 9.
CN202311315567.3A 2023-10-11 2023-10-11 Remote sensing image recognition model generation method, device, equipment, medium and product Pending CN117312957A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311315567.3A CN117312957A (en) 2023-10-11 2023-10-11 Remote sensing image recognition model generation method, device, equipment, medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311315567.3A CN117312957A (en) 2023-10-11 2023-10-11 Remote sensing image recognition model generation method, device, equipment, medium and product

Publications (1)

Publication Number Publication Date
CN117312957A true CN117312957A (en) 2023-12-29

Family

ID=89286301

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311315567.3A Pending CN117312957A (en) 2023-10-11 2023-10-11 Remote sensing image recognition model generation method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN117312957A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710234A (en) * 2024-02-06 2024-03-15 青岛海尔科技有限公司 Picture generation method, device, equipment and medium based on large model


Similar Documents

Publication Publication Date Title
Guo et al. CDnetV2: CNN-based cloud detection for remote sensing imagery with cloud-snow coexistence
CN111259905B (en) Feature fusion remote sensing image semantic segmentation method based on downsampling
CN110889449A (en) Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
Fu et al. Contextual deconvolution network for semantic segmentation
CN113780149B (en) Remote sensing image building target efficient extraction method based on attention mechanism
CN117312957A (en) Remote sensing image recognition model generation method, device, equipment, medium and product
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN116311254B (en) Image target detection method, system and equipment under severe weather condition
CN115861756A (en) Earth background small target identification method based on cascade combination network
Li Image semantic segmentation method based on GAN network and ENet model
Dias et al. Semantic segmentation and colorization of grayscale aerial imagery with W‐Net models
Park et al. Downscaling earth system models with deep learning
Wang et al. Paccdu: pyramid attention cross-convolutional dual unet for infrared and visible image fusion
Wang et al. Detecting occluded and dense trees in urban terrestrial views with a high-quality tree detection dataset
CN117315241A (en) Scene image semantic segmentation method based on transformer structure
CN116630610A (en) ROI region extraction method based on semantic segmentation model and conditional random field
Vijayalakshmi K et al. Copy-paste forgery detection using deep learning with error level analysis
CN115577768A (en) Semi-supervised model training method and device
CN114913382A (en) Aerial photography scene classification method based on CBAM-AlexNet convolutional neural network
Liang et al. A bidirectional semantic segmentation method for remote sensing image based on super-resolution and domain adaptation
CN114155165A (en) Image defogging method based on semi-supervision
Liu et al. Weather recognition of street scene based on sparse deep neural networks
Zou et al. DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal From Optical Satellite Images
He et al. An efficient urban flood mapping framework towards disaster response driven by weakly supervised semantic segmentation with decoupled training samples
CN115984714B (en) Cloud detection method based on dual-branch network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination