CN111401302B - Remote sensing image ship target integrated detection and fine-grained identification method

Remote sensing image ship target integrated detection and fine-grained identification method

Info

Publication number
CN111401302B
Authority
CN
China
Prior art keywords
model
training
shared
target
remote sensing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010266998.5A
Other languages
Chinese (zh)
Other versions
CN111401302A (en)
Inventor
姚力波
张筱晗
吕亚飞
李孟洋
林迅
杨冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval Aeronautical University
Original Assignee
Naval Aeronautical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval Aeronautical University
Priority to CN202010266998.5A
Publication of CN111401302A
Application granted
Publication of CN111401302B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a remote sensing image ship target integrated detection and fine-grained identification method. The method comprises the following steps: acquiring a training sample set; determining a feature extraction shared model integrated in a remote sensing image processing model, a position detection branch model for outputting position information of a target object, and a type recognition branch model for outputting type information of the target object; and alternately training the position detection branch model and the type recognition branch model using the shared image features extracted from the training samples by the feature extraction shared model, so that training of the remote sensing image processing model is completed when a preset convergence condition is met. With this scheme, the image features shared by the position detection branch model and the type recognition branch model need not be extracted repeatedly when the two branch models are trained or used, the extra workload of repeated extraction is avoided, and the computation required for model training is greatly reduced.

Description

Remote sensing image ship target integrated detection and fine-grained identification method
Technical Field
The embodiment of the invention relates to the technical field of remote sensing image processing, in particular to a remote sensing image ship target integrated detection and fine-grained identification method.
Background
The detection and identification of the ship target based on the remote sensing image is an important means for ocean monitoring, and has great research significance in various fields.
Currently, in the field of computer vision, deep learning methods have become the mainstream technology for target detection and target recognition tasks. However, the target detection task and the target recognition task are usually completed independently, so detection and recognition of a ship target cannot be completed at the same time. More importantly, the two tasks have much in common: executing them separately adds redundant computation in both model training and model use, and reduces the efficiency of target detection and recognition when the models are deployed.
Disclosure of Invention
The embodiment of the invention provides a remote sensing image ship target integrated detection and fine-grained identification method, which realizes integrated training of the detection function and the recognition function and saves training computation.
In a first aspect, an embodiment of the present invention provides a remote sensing image ship target integrated detection and fine-grained identification method, where the method includes:
acquiring a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample;
determining a feature extraction shared model integrated in a remote sensing image processing model, a position detection branch model for outputting position information of a target object and a type identification branch model for outputting type information of the target object;
and alternately training the position detection branch model and the type recognition branch model using the shared image features extracted from the training sample by the feature extraction shared model, so as to complete the training of the remote sensing image processing model when a preset convergence condition is met.
In a second aspect, an embodiment of the present invention further provides a device for training a remote sensing image processing model, where the device includes:
the sample set acquisition module is used for acquiring a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample;
the sub-model determining module is used for determining a feature extraction shared model integrated in the remote sensing image processing model, a position detection branch model used for outputting position information of a target object and a type identification branch model used for outputting type information of the target object;
and the sub-model training module is used for alternately training the position detection branch model and the type recognition branch model using the shared image features extracted from the training sample by the feature extraction shared model, so as to complete the training of the remote sensing image processing model when a preset convergence condition is met.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processing devices;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processing devices, cause the one or more processing devices to implement the method of training the remote sensing image processing model as provided in any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processing device, implements the method for training a remote sensing image processing model as provided in any embodiment of the present invention.
The embodiment of the invention provides a remote sensing image ship target integrated detection and fine-grained identification method. When a remote sensing image processing model is trained with a training sample set of remote sensing image samples, the position detection branch model and the type recognition branch model are integrated in the same remote sensing image processing model for training, so that a single model can subsequently provide both the position detection function and the type recognition function for a target object. When the sub-models are trained with the training sample set, the image features that the position detection branch model and the type recognition branch model can share are obtained once through the feature extraction shared model and then supplied to both branch models for training. This avoids repeatedly extracting the shared image features when the two branch models are trained or used, prevents the extra workload of such repeated extraction, greatly reduces the computation of model training, and improves the processing efficiency when the trained remote sensing image processing model is later used for remote sensing image processing.
The above summary is merely an overview of the technical solutions of the present invention. To make the technical means of the present invention clearer and implementable in accordance with the content of the description, and to make the above and other objects, features and advantages of the present invention more comprehensible, specific embodiments are described below.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart of a remote sensing image ship target integrated detection and fine-grained identification method provided in the embodiment of the present invention;
FIG. 2 is a schematic diagram of a model framework of a remote sensing image processing model provided in an embodiment of the present invention;
FIG. 3 is a flowchart of another remote sensing image ship target integrated detection and fine-grained identification method provided in the embodiment of the present invention;
FIG. 4 is a schematic illustration of ROI pooling and ROI alignment provided in an embodiment of the present invention;
FIG. 5 is a schematic illustration of a masking of a target region of interest provided in an embodiment of the invention;
FIG. 6 is a block diagram of a training apparatus for a remote sensing image processing model provided in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations (or steps) can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
To better understand the training scheme of the remote sensing image processing model of the present application, a brief description is given below through a specific application scenario. Taking ships in remote sensing images as target objects, existing ship detection and fine-grained recognition tasks are generally performed separately. Ship detection judges whether a remote sensing image contains a ship and gives the ship's position information, and can be applied to large-scale scene images; fine-grained recognition of ship categories must exclude interference from the background and other targets, and is currently usually performed on single-ship target slices. In practical applications, however, detection and recognition of a ship target need to be completed at the same time. Separating the detection task from the recognition task lengthens the processing pipeline, wastes the part of the computation the two tasks inherently share, increases the workload, and makes real-time requirements difficult to meet. A model that can perform the detection and recognition tasks simultaneously while reducing computation as much as possible is therefore highly desirable.
The following embodiments and alternatives thereof are described in detail with respect to the training scheme of the remote sensing image processing model provided in the present application.
Fig. 1 is a flowchart of a remote sensing image ship target integrated detection and fine-grained identification method provided in an embodiment of the present invention. The embodiment of the invention can be suitable for training the remote sensing image processing model, in particular to the situation of integrally training the model integrated with the detection task and the recognition task. The method can be executed by a training device of the remote sensing image processing model, and the device can be realized in a software and/or hardware mode and is integrated on any electronic equipment with a network communication function. As shown in fig. 1, the training method for a remote sensing image processing model provided in the embodiment of the present application may include the following steps:
s110, obtaining a training sample set; the training samples comprise remote sensing image samples and position labels and type labels of target objects in the remote sensing image samples.
In this embodiment, an acquisition device for remote sensing images may be used to collect images of multiple scenes containing a target object, yielding remote sensing image samples of many different scenes; the target object may be, for example, a ship. For each acquired remote sensing image sample, the target object is manually annotated to obtain its position label and type label.
In this embodiment, the position label of the target object represents the position information of the target object in the remote sensing image sample. For example, the position may be annotated with the circumscribed rectangular box of the target object, whose position information may include: the coordinates of the center point of the box, the width and height of the box, and the angle between the long edge of the box and the horizontal axis. The type label represents the category information of the target object in the remote sensing image sample. Taking a ship as the target object, the category information may include whether the ship is civilian, military or unknown, and may further include functional categories such as destroyer, frigate, and the like. An illustrative label is given below.
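As a concrete illustration, a single training-sample annotation under this labeling scheme might look as follows in Python; the field names and values are illustrative assumptions, not the patent's storage format.

```python
# One hypothetical training-sample label: an oriented bounding box plus a
# fine-grained ship category (all names and values are illustrative).
sample_label = {
    "position": {
        "cx": 512.0,    # x-coordinate of the box center (pixels)
        "cy": 284.5,    # y-coordinate of the box center (pixels)
        "w": 96.0,      # width of the circumscribed rectangular box
        "h": 28.0,      # height of the circumscribed rectangular box
        "theta": 37.5,  # angle between the box's long edge and the horizontal axis
    },
    "type": "destroyer",  # fine-grained category label
}
```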
S120, determining a feature extraction shared model integrated in the remote sensing image processing model, a position detection branch model for outputting position information of the target object and a type identification branch model for outputting type information of the target object.
In this embodiment, the remote sensing image processing model of the present application integrates the task of detecting the position of a target object with the task of fine-grained type recognition. In other words, position detection and type recognition are integrated in the same remote sensing image processing model, so that an end-to-end model takes as input a remote sensing image containing a target object and outputs both the position information and the type information of the target object.
In this embodiment, fig. 2 is a schematic diagram of the model framework of the remote sensing image processing model provided in an embodiment of the present invention. Referring to fig. 2, the remote sensing image processing model to be trained integrates a feature extraction shared model, a position detection branch model and a type recognition branch model, and the whole network splits into a position detection branch and a type recognition branch. The position detection branch model judges whether a target object is present in the remote sensing image and determines its position information; the type recognition branch model determines only the type information of the target object and no longer determines its position.
In this embodiment, referring to fig. 2, the remote sensing image processing model of the present application may be built upon a ResNet50 network: the feature extraction shared model is constructed from the existing convolution modules of ResNet50, the position detection branch model is constructed on an RPN network model, and the type recognition branch model is constructed as a classification model, as in the sketch below.
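The split described above can be sketched in PyTorch roughly as follows. This is a minimal illustration only: the exact layer split, head shapes and the five-parameter box encoding are assumptions, and for brevity the recognition branch here runs on the full shared feature map rather than on masked target regions as in the later steps.

```python
import torch.nn as nn
import torchvision

class ShipDetectRecognize(nn.Module):
    """Sketch of the integrated model: a ResNet50-based shared feature
    extractor feeding a detection branch and a recognition branch."""
    def __init__(self, num_classes):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # feature extraction shared model: early ResNet50 convolution stages
        self.backbone = nn.Sequential(
            resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool,
            resnet.layer1, resnet.layer2, resnet.layer3)
        # position detection branch: extra convolutions + RPN-style heads
        self.det_conv = nn.Conv2d(1024, 256, 3, padding=1)
        self.det_obj = nn.Conv2d(256, 1, 1)   # target / background score
        self.det_reg = nn.Conv2d(256, 5, 1)   # cx, cy, w, h, angle offsets
        # type recognition branch: extra convolutions + classifier
        self.rec_conv = resnet.layer4
        self.rec_fc = nn.Linear(2048, num_classes)

    def forward(self, x):
        feats = self.backbone(x)                    # shared image features, computed once
        d = self.det_conv(feats).relu()
        objectness, boxes = self.det_obj(d), self.det_reg(d)
        r = self.rec_conv(feats).mean(dim=(2, 3))   # global average pooling
        logits = self.rec_fc(r)
        return objectness, boxes, logits
```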
S130, adopting the shared image features extracted from the training sample through the feature extraction shared model, and alternately training the position detection branch model and the type recognition branch model so as to finish training the remote sensing image processing model when the preset convergence condition is met.
In this embodiment, in the field of visual processing, the position detection task of the position detection branch model and the type recognition task of the type recognition branch model have common ground, and partially shared features exist between the two tasks. Therefore, when the remote sensing image processing model is trained, the shared image features of a remote sensing image sample can be extracted once by the feature extraction shared model, so that the position detection branch and the type recognition branch can both use them. The benefit is twofold: when the two branch models are trained, the shared image features are not extracted repeatedly, avoiding unnecessary feature extraction and the loss of training efficiency it would cause; and when the remote sensing image processing model is later used, the shared image features can likewise be extracted once and reused, avoiding the loss of detection and recognition efficiency caused by repeated extraction.
In this embodiment, although the position detection task and the type recognition task are partly common, the tasks performed by the two branch models also differ. When the remote sensing image processing model is trained, the position detection branch model and the type recognition branch model can share only part of the low-level and mid-level image features of a remote sensing image sample; the higher-level image features must be extracted separately according to the characteristics of each task. Therefore, after the shared image features are extracted from the training samples by the feature extraction shared model, the position detection branch model and the type recognition branch model are trained alternately on the shared image features, so that each sub-model can further extract the higher-level features suited to its own task.
Optionally, the position detection branch model comprises at least a partial convolution module and a position detection submodel, and the type recognition branch model comprises at least a partial convolution module and a type recognition submodel. The partial convolution module in each branch model further extracts, from the shared image features, the higher-level image features its branch task requires, so that the position detection branch model and the type recognition branch model are trained accurately on the branch-specific features obtained by this feature separation. The benefits are as follows: the commonality of the position detection task and the type recognition task is exploited, computation during training is saved through feature sharing, and unnecessary feature extraction is avoided as far as possible; at the same time, the difference between the two tasks is respected, and through feature separation and alternating training of the sub-models the position detection branch can 'concentrate' on the detection task while the type recognition branch 'concentrates' on the recognition task, improving the precision of each branch and striking a good balance between training computation and accuracy.
In this embodiment, when the position detection branch model and the type recognition branch model are trained alternately, convergence may be judged by the loss function of each branch model; when the output value of each loss function satisfies the preset convergence condition, training of the two branch models, and hence of the remote sensing image processing model, is complete. The loss function of the position detection branch model is the same as the standard RPN loss, shown in formula (1): the sum of a binary classification loss, which judges whether a region contains a target object, and a position regression loss. The loss function of the type recognition branch is the cross-entropy loss of formula (2):

$$L_{det}\left(\{p_i\},\{t_i\}\right)=\frac{1}{N_{cls}}\sum_i L_{cls}\left(p_i,p_i^*\right)+\lambda\frac{1}{N_{reg}}\sum_i p_i^* L_{reg}\left(t_i,t_i^*\right) \tag{1}$$

$$L_{rec}=-\sum_i q_i^*\log q_i \tag{2}$$

where $p_i$ is the position detection branch model's prediction of whether a region is a target, $p_i^*$ is the corresponding true value, $t_i$ is the position prediction and $t_i^*$ the corresponding true value, and $q_i$, $q_i^*$ are respectively the prediction of the type recognition branch model and the corresponding true value.
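Under the usual reading of the RPN loss — binary cross-entropy for $L_{cls}$ and smooth-L1 for $L_{reg}$, with regression counted only on positive samples — the two branch losses can be sketched as follows; the function names and the choice of concrete losses are assumptions.

```python
import torch.nn.functional as F

def detection_loss(p, p_star, t, t_star, lam=1.0):
    """Formula (1): binary target/background loss plus position regression,
    the latter only over positive samples (p* = 1). p, p_star: (N,) probabilities
    and 0/1 targets; t, t_star: (N, 5) box parameters (assumed layout)."""
    cls_loss = F.binary_cross_entropy(p, p_star)
    pos = p_star > 0.5
    reg_loss = F.smooth_l1_loss(t[pos], t_star[pos]) if pos.any() else p.new_zeros(())
    return cls_loss + lam * reg_loss

def recognition_loss(q_logits, q_star):
    """Formula (2): cross-entropy over fine-grained ship classes."""
    return F.cross_entropy(q_logits, q_star)
```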
The embodiment of the invention provides a remote sensing image ship target integrated detection and fine-grained identification method. When a remote sensing image processing model is trained with a training sample set of remote sensing image samples, the position detection branch model and the type recognition branch model are integrated in the same remote sensing image processing model for training, so that one model can subsequently provide both the position detection function and the type recognition function for a target object. When the sub-models are trained with the training sample set, the image features that the two branch models can share are obtained once through the feature extraction shared model and then supplied to each branch model for training, so that the shared image features are not extracted repeatedly during training or use; this avoids extra extraction workload, greatly reduces the computation of model training, and improves the processing efficiency when the trained remote sensing image processing model is later used for remote sensing image processing.
Fig. 3 is a flowchart of another remote sensing image ship target integrated detection and fine-grained identification method provided in an embodiment of the present invention. This embodiment further optimizes the foregoing embodiment and may be combined with the alternatives in one or more of the embodiments above. As shown in fig. 3, the training method of the remote sensing image processing model provided in the embodiment of the present application may include the following steps:
s310, obtaining a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample.
S320, determining a feature extraction shared model integrated in the remote sensing image processing model, a position detection branch model for outputting position information of the target object and a type identification branch model for outputting type information of the target object.
S330, controlling the network parameters of the type recognition branch model to be unchanged, performing combined training on the position detection branch model and the feature extraction sharing model by adopting the first shared image features, and updating the network parameters.
The first shared image feature is a shared image feature extracted from a training sample through the feature extraction shared model.
In this embodiment, when training the type recognition branch model, the feature extraction shared model could be used to extract shared image features from the training sample, and the complete extracted shared image features could be used to train the type recognition branch model to recognize the type of the target object. However, the shared image features extracted by the feature extraction shared model represent the whole training sample. To acquire type recognition capability, the training process must implicitly include determining the position of the target object; otherwise the target object may not be located accurately, and other, non-target objects may be mistaken for it and misrecognized. In other words, accurately detecting the position of the target object is the basis for correctly recognizing the target object's type, and directly training the type recognition branch model on the complete shared image features would therefore require considerably more computation.
Based on the above analysis, to ensure that training the type recognition branch model does not cost excessive computation, the type recognition branch model can be trained with the assistance of the position information of the target object determined by the position detection branch model. The position detection branch model therefore needs to be trained first; otherwise its precision cannot be guaranteed.
In this embodiment, referring to fig. 2, the whole network of the remote sensing image processing model to be trained includes: the system comprises a feature extraction sharing model, a position detection branch model and a type identification branch model. Therefore, when the position detection branch model and the type recognition branch model are trained by using the shared image features extracted by the feature extraction shared model, an alternative training strategy of training the position detection branch model first and then performing auxiliary training on the type recognition branch model by using the output result of the trained position detection branch model can be adopted.
In this embodiment, referring to fig. 2, when training the position detection branch model, the shared image features it needs are provided by the feature extraction shared model, so the position detection branch model and the feature extraction shared model must be trained jointly with the network parameters of the type recognition branch model held fixed. The specific process is as follows: the training samples in the training sample set are input into the feature extraction shared model, and a first shared image feature is extracted from them (where 'first' distinguishes it from the shared image features extracted in later training stages); the position detection branch model and the feature extraction shared model are then trained jointly on the first shared image feature, and their network parameters are updated and adjusted according to the training result, yielding an updated position detection branch model and an updated feature extraction shared model.
Optionally, the position detection branch model may include at least a partial convolution module and a position detection submodel. After the first shared image feature extracted by the feature extraction shared model is obtained, the partial convolution module extracts from it the higher-level, non-shared image features matched to the position detection submodel; these features are then input into the position detection submodel to jointly train the position detection branch model and the feature extraction shared model. Optionally, the position detection branch model may be trained with the training method of the RPN network; a sketch of this stage follows.
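The following PyTorch sketch illustrates this first alternating stage (S330), reusing the hypothetical ShipDetectRecognize module and detection_loss from the earlier sketches; the loader, target shapes and optimizer settings are all assumptions.

```python
import torch

def set_trainable(module, flag):
    # Freeze or unfreeze all parameters of a sub-module.
    for param in module.parameters():
        param.requires_grad = flag

model = ShipDetectRecognize(num_classes=20)  # 20 classes is an arbitrary example

# S330: type recognition branch frozen; shared feature extractor and
# position detection branch trained jointly on the first shared image features.
set_trainable(model.rec_conv, False)
set_trainable(model.rec_fc, False)
opt = torch.optim.SGD((p for p in model.parameters() if p.requires_grad),
                      lr=1e-3, momentum=0.9)
for images, obj_target, box_target in loader:   # hypothetical data loader
    objectness, boxes, _ = model(images)        # shared features computed once
    loss = detection_loss(objectness.sigmoid().reshape(-1),
                          obj_target.reshape(-1),          # assumed 0/1 floats
                          boxes.permute(0, 2, 3, 1).reshape(-1, 5),
                          box_target.reshape(-1, 5))
    opt.zero_grad(); loss.backward(); opt.step()
```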
And S340, after the feature extraction shared model and the position detection branch model are subjected to combined training, controlling the network parameters of the position detection branch model to be kept unchanged, adopting the second shared image features to carry out combined training on the type recognition branch model and the new feature extraction shared model, and updating the network parameters.
The new feature extraction shared model is a feature extraction shared model obtained after combined training together with the position detection branch model, and the second shared image feature is a shared image feature re-extracted from the training sample through the new feature extraction shared model.
In this embodiment, after the feature extraction shared model and the position detection branch model have been trained jointly, a trained new feature extraction shared model and a new position detection branch model are obtained, and together they can detect the position of the target object well. When the type recognition branch model is trained, it too needs the feature extraction shared model to provide shared image features, so the feature extraction shared model must be used together with the type recognition branch model; accordingly, the network parameters of the position detection branch model are held fixed while the type recognition branch model and the feature extraction shared model are trained jointly.
The specific process is as follows: the training samples in the training sample set are input into the new feature extraction shared model, and second shared image features are extracted from them; the type recognition branch model and the new feature extraction shared model are then trained jointly on the second shared image features, and updated and adjusted according to the training result. The second shared image features are obtained through the new feature extraction shared model because the initial feature extraction shared model has not been jointly trained with the position detection branch model and carries a certain error, while training the type recognition branch model relies on the position detection model. If the initial feature extraction shared model were used, it would likely be poorly matched to the trained position detection branch model, reducing the precision of the position detection branch model.
Optionally, the type recognition branch model may include at least a partial convolution module and a pre-constructed type recognition submodel. After the second shared image feature extracted by the new feature extraction shared model is obtained, the partial convolution module extracts from it the higher-level, non-shared image features matched to the type recognition submodel; these features are then input into the type recognition submodel to jointly train the type recognition branch model and the new feature extraction shared model.
In an alternative manner of this embodiment, the combined training of the type recognition branch model and the new feature extraction shared model using the second shared image feature may include steps A1-A3:
step A1, inputting the second shared image feature into the new position detection branch model, and obtaining the first position information of the target object in the training sample.
In this embodiment, after the feature extraction shared model and the position detection branch model have been trained jointly, the second shared image feature can be extracted from the training sample by the trained new feature extraction shared model, and this feature can be shared by the new position detection branch model and the not-yet-jointly-trained type recognition branch model. Since the new position detection branch model has already been trained, it detects from the second shared image feature whether the training sample contains the target object and determines the first position information of the target object (where 'first' distinguishes it from the position information extracted in later training stages). The first position information may be the position information of the circumscribed rectangular box of the target object.
Step A2, determining a second shared image feature associated with the first target region from the second shared image feature according to the first position information of the target object in the training sample; the first target region is a region in the training sample, determined by the first position information, that contains the target object.
In this embodiment, after the first position information of the target object has been determined, the first target region containing the target object can be determined. The second shared image feature associated with the first target region is then taken from the second shared image feature, so that, as far as possible, only the part of the second shared image feature relevant to the target object is input into the type recognition submodel, reducing its computation.
In this embodiment, the first position information of the target object output by the new position detection branch model may contain a large number of negative samples. The ratio of positive to negative samples can therefore be controlled according to the confidence of the first position information. For example, the highest-scoring negative samples may be selected, in descending order of the confidence score of the detected first position information, to participate in training the type recognition branch model, keeping the ratio of positive to negative samples input into the type recognition branch model at 1:3; one possible implementation is sketched below.
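A minimal sketch of this confidence-ranked sampling, assuming 1-D tensors of per-proposal scores and 0/1 labels and interpreting the 1:3 ratio as three retained negatives per positive (both assumptions, not spelled out in the text):

```python
import torch

def sample_proposals(scores, labels, neg_per_pos=3):
    """Keep all positives and only the highest-scoring negatives,
    at a positive:negative ratio of 1:neg_per_pos."""
    pos_idx = torch.nonzero(labels == 1).squeeze(1)
    neg_idx = torch.nonzero(labels == 0).squeeze(1)
    # sort negatives by descending confidence and keep the top-scoring ones
    order = torch.argsort(scores[neg_idx], descending=True)
    keep_neg = neg_idx[order[: neg_per_pos * max(len(pos_idx), 1)]]
    return torch.cat([pos_idx, keep_neg])
```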
And step A3, performing combined training on the type recognition branch model and the new feature extraction shared model by adopting the second shared image features associated with the first target region.
In the present embodiment, after the second shared image feature associated with the first target region is input to the type recognition submodel in the type recognition branch model, the combined training of the type recognition branch model and the new feature extraction shared model can be realized.
In an alternative, determining the second shared image feature associated with the first target region from the second shared image features according to the first position information of the target object in the training sample may include steps B1-B2:
step B1, according to the first position information of the target object in the training sample, masking the first target area according to a preset scale to obtain a first target mask window including the first target area.
In this embodiment, the ROI regions indicated by the position information output by the position detection branch model (e.g., a branch model containing an RPN network) differ in shape and size. When the features of the corresponding regions are input into the type recognition branch model, the presence of fully connected layers requires the inputs to be fixed to a uniform size; ROI pooling or ROI alignment is usually used to achieve this.
In this embodiment, fig. 4 is a schematic diagram of ROI pooling and ROI alignment provided in an embodiment of the present invention. Referring to fig. 4, the generation of a 3 × 3 feature is illustrated. For a target region of interest, ROI pooling divides the region into 3 × 3 sub-regions and applies max pooling or average pooling to each, yielding a 3 × 3 feature map. In ROI pooling, however, the sub-regions may not be of equal size, so the amount of feature-point information extracted is uneven. ROI alignment instead divides the target region of interest evenly into 3 × 3 sub-regions and obtains the feature at the center of each sub-region by interpolation. Although these methods produce target-region features of fixed dimension, both operations change the shape of the target region, which is detrimental to the subsequent recognition task. The scheme of the present application therefore masks the target region of interest to fix every region of interest to a uniform size.
In this embodiment, fig. 5 is a schematic diagram of masking a target region of interest according to an embodiment of the present invention. Referring to fig. 5, a preset mask window of dimension k × k may be set, where k is generally chosen larger than the dimension of the first target region and smaller than the long-edge length of the original image. The center of the mask is aligned with the center of the first target region; within the preset mask window, the mask pixel values corresponding to the first target region are set to 1 and the pixel values of the remaining area are set to 0, giving the first target mask window associated with the first target region.
And step B2, performing element-by-element multiplication on the first target mask window and the second shared image feature at the position of the first target mask window to obtain a second shared image feature associated with the first target area.
In this embodiment, referring to fig. 5, after the first target mask window associated with the first target region is obtained, it can be multiplied element by element with the second shared image feature at the corresponding position; the resulting k × k image feature is the second shared image feature associated with the first target region. Optionally, for some first target regions the side length may exceed the set mask window dimension k; in that case the mask window is enlarged to the long-edge length of the first target region, the second shared image feature associated with the region is extracted by the same method, and the feature's spatial dimension is then resized to k × k, as in the sketch below.
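A minimal sketch of this masking step, assuming an axis-aligned box in feature-map coordinates (the patent's boxes are oriented; rotation is omitted here for brevity) and bilinear resizing when the region exceeds k:

```python
import torch
import torch.nn.functional as F

def mask_target_feature(shared_feat, box, k):
    """shared_feat: (C, H, W) second shared image feature; box: (x0, y0, x1, y1)
    target region in feature-map coordinates. Returns a (C, k, k) feature."""
    C, H, W = shared_feat.shape
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    # mask: 1 inside the target region, 0 elsewhere
    mask = torch.zeros(H, W, dtype=shared_feat.dtype)
    mask[max(0, y0):y1, max(0, x0):x1] = 1.0
    masked = shared_feat * mask                 # element-wise multiplication
    # k x k window centred on the target region (enlarged if the region is bigger)
    side = max(k, x1 - x0, y1 - y0)
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    wx0, wy0 = max(0, cx - side // 2), max(0, cy - side // 2)
    window = masked[:, wy0:wy0 + side, wx0:wx0 + side]
    if window.shape[-2:] != (k, k):             # resize back to k x k if needed
        window = F.interpolate(window.unsqueeze(0), size=(k, k),
                               mode='bilinear', align_corners=False).squeeze(0)
    return window
```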
And S350, completing the training of the remote sensing image processing model when the loss functions of the position detection branch model and the type recognition branch model satisfy the preset convergence condition.
The embodiment of the invention provides a remote sensing image ship target integrated detection and fine-grained identification method. When the sub-models are trained with the training sample set, the image features that the position detection branch model and the type recognition branch model can share are obtained once through the feature extraction shared model and then supplied to each branch model for training, so that the shared image features are not extracted repeatedly during training or use; this avoids extra extraction workload, greatly reduces the computation of model training, and improves the processing efficiency when the trained remote sensing image processing model is later used for remote sensing image processing.
On the basis of the above embodiment, optionally, the alternating training of the position detection branch model and the type recognition branch model on the shared image features extracted from the training samples by the feature extraction shared model may further include the following steps C1-C2:
and step C1, controlling the network parameters of the feature extraction shared model and the type recognition branch model after the last training update to be kept unchanged, adopting a third shared image feature to carry out independent training on the position detection branch model after the last training update, and carrying out fine adjustment on the network parameters of the last training update.
And step C2, controlling the network parameters of the feature extraction shared model and the position detection branch model after the last training update to be kept unchanged, adopting a third shared image feature to carry out independent training on the type recognition branch model after the last training update, and carrying out fine adjustment on the network parameters of the last training update.
The third shared image feature is a shared image feature extracted from the training sample by the feature extraction shared model updated by the last training.
In this embodiment, when the data scale of the training sample set is too small to train the whole remote sensing image processing model from scratch, the network parameters of the feature extraction shared model, the position detection branch model and the type recognition branch model integrated in the remote sensing image processing model may be initialized from a model pre-trained on other training samples. On this basis, the remote sensing image processing model to be trained can be regarded as a model that has already been trained and updated, and training proceeds directly with the fine-tuning of steps C1-C2 without executing the operations S330-S340.
In this embodiment, when the data scale of the training sample set is large enough to support training the whole remote sensing image processing model from scratch, the operations S330-S340 may be performed in sequence to train the model from scratch or update it on a large scale; afterwards, steps C1-C2 continue with fine-tuning training of each branch model updated by that training, fine-tuning its network parameters. Moreover, because the network parameters of the feature extraction shared model are fixed during fine-tuning, the position detection branch model and the type recognition branch model are trained individually and more precisely, improving their accuracy.
It should be noted that the operations of steps C1-C2 are similar to those of S330-S340; the differences are as follows. Steps C1-C2 are fine-tuning training: the training objects are the position detection branch model and the type recognition branch model individually, further improving each branch's own accuracy. The operations S330-S340 are training from scratch or large-scale update training: the training objects are the combination of the feature extraction shared model with the position detection branch model, and its combination with the type recognition branch model, increasing the degree of matching among the feature extraction shared model, the position detection branch model and the type recognition branch model and completing a preliminary model training pass as far as possible. The fine-tuning phase is sketched below.
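Continuing the earlier sketches (reusing set_trainable and model, both hypothetical names), the fine-tuning phase of steps C1-C2 might look as follows; the loop bodies are elided since they repeat the branch training loops shown earlier.

```python
# Fine-tuning (steps C1-C2): the shared feature extractor stays frozen,
# and each branch is trained alone on the third shared image features.
set_trainable(model.backbone, False)

# C1: recognition branch frozen, detection branch fine-tuned alone.
for m in (model.rec_conv, model.rec_fc):
    set_trainable(m, False)
for m in (model.det_conv, model.det_obj, model.det_reg):
    set_trainable(m, True)
# ... run the detection training loop from S330, typically at a lower learning rate ...

# C2: detection branch frozen, recognition branch fine-tuned alone.
for m in (model.det_conv, model.det_obj, model.det_reg):
    set_trainable(m, False)
for m in (model.rec_conv, model.rec_fc):
    set_trainable(m, True)
# ... run the recognition training loop with recognition_loss (cross-entropy) ...
```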
Fig. 6 is a block diagram of a structure of a training apparatus for a remote sensing image processing model provided in an embodiment of the present invention. The embodiment of the invention can be suitable for training the remote sensing image processing model, in particular to the situation of integrally training the model integrated with the detection task and the recognition task. The device can be implemented in software and/or hardware and integrated on any electronic equipment with network communication function. As shown in fig. 6, the training apparatus for a remote sensing image processing model provided in the embodiment of the present application may include the following: a sample set acquisition module 610, a sub-model determination module 620, and a sub-model training module 630. Wherein:
a sample set obtaining module 610, configured to obtain a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample;
a sub-model determining module 620, configured to determine a feature extraction shared model integrated in the remote sensing image processing model, a location detection branch model for outputting location information of the target object, and a type identification branch model for outputting type information of the target object;
a sub-model training module 630, configured to alternately train the location detection branch model and the type recognition branch model by using the shared image features extracted from the training sample by the feature extraction shared model, so as to complete training of the remote sensing image processing model when a preset convergence condition is satisfied.
On the basis of the foregoing embodiment, optionally, the location tag is used to represent location information of a target object in the remote sensing image sample, and the type tag is used to represent category information of the target object in the remote sensing image sample.
On the basis of the foregoing embodiment, optionally, the sub-model training module 630 includes:
the first combined training unit is used for controlling the network parameters of the type recognition branch model to keep unchanged, performing combined training on the position detection branch model and the feature extraction shared model by adopting first shared image features, and updating the network parameters;
the second combined training unit is used for controlling the network parameters of the position detection branch model to keep unchanged after the feature extraction shared model and the position detection branch model are subjected to combined training, adopting second shared image features to perform combined training on the type recognition branch model and the new feature extraction shared model, and updating the network parameters;
the first shared image feature is a shared image feature extracted from the training sample through the feature extraction shared model, the new feature extraction shared model is the feature extraction shared model obtained after the combined training together with the position detection branch model, and the second shared image feature is a shared image feature re-extracted from the training sample through the new feature extraction shared model.
On the basis of the foregoing embodiment, optionally, the second combined training unit includes:
the first position output subunit is used for inputting the second shared image feature into the new position detection branch model to obtain first position information of a target object in the training sample;
the first feature screening subunit is used for determining a second shared image feature associated with the first target area from the second shared image feature according to the first position information of the target object in the training sample; wherein the first target region is a region of the training sample determined by the first location information and including a target object;
and the second combined training subunit is used for performing combined training on the type recognition branch model and the new feature extraction shared model by adopting the second shared image feature associated with the first target region.
On the basis of the foregoing embodiment, optionally, the first feature screening subunit is configured to:
according to the first position information of the target object in the training sample, carrying out mask processing on the first target area according to a preset scale to obtain a first target mask window comprising the first target area;
and multiplying the first target mask window and the second shared image feature at the position of the first target mask window element by element to obtain a second shared image feature associated with the first target area.
On the basis of the foregoing embodiment, optionally, the sub-model training module 630 further includes:
the first individual training unit is used for controlling the network parameters of the feature extraction shared model and the type recognition branch model which are updated after the last training to be kept unchanged, carrying out individual training on the position detection branch model which is updated after the last training by adopting a third shared image feature, and carrying out fine tuning on the network parameters which are updated after the last training;
the second individual training unit is used for controlling the network parameters of the feature extraction shared model and the position detection branch model which are updated after the last training to be kept unchanged, carrying out individual training on the type recognition branch model which is updated after the last training by adopting a third shared image feature, and carrying out fine adjustment on the network parameters which are updated after the last training;
the third shared image feature is a shared image feature extracted from the training sample by the feature extraction shared model updated by the last training.
On the basis of the above embodiment, optionally, the second individual training unit includes:
the second position output subunit is configured to input the third shared image feature into the position detection branch model updated in the last training, and obtain second position information of the target object in the training sample;
the second feature screening subunit is configured to determine, according to second position information of the target object in the training sample, a third shared image feature associated with the second target region from the third shared image feature; wherein the second target region is a region in the training sample determined by the second location information and including a target object;
and the second individual training subunit is used for performing individual training on the type recognition branch model updated by the last training by adopting the third shared image feature associated with the second target area.
On the basis of the foregoing embodiment, optionally, the second feature screening subunit is configured to:
according to second position information of the target object in the training sample, carrying out mask processing on the second target area according to a preset scale to obtain a second target mask window comprising the second target area;
and multiplying the second target mask window and the third shared image feature at the position of the second target mask window element by element to obtain a third shared image feature associated with the second target area.
The training device for the remote sensing image processing model provided in the embodiment of the present invention can execute the training method for the remote sensing image processing model provided in any embodiment of the invention, and has the corresponding functions and beneficial effects of that method; for the detailed process, refer to the relevant operations of the training method in the embodiments above.
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention. As shown in Fig. 7, the electronic device provided in the embodiment of the present invention includes: one or more processors 710 and a storage device 720. There may be one or more processors 710 in the electronic device; one processor 710 is taken as an example in Fig. 7. The storage device 720 is used for storing one or more programs, and the one or more programs are executed by the one or more processors 710, so that the one or more processors 710 implement the training method for a remote sensing image processing model according to any embodiment of the present invention.
The electronic device may further include: an input device 730 and an output device 740.
The processor 710, the storage device 720, the input device 730, and the output device 740 in the electronic apparatus may be connected by a bus or other means, and fig. 7 illustrates an example of connection by a bus.
The storage device 720 in the electronic device, as a computer readable storage medium, is used for storing one or more programs, which may be software programs, computer executable programs and modules, such as the program instructions/modules corresponding to the training method for the remote sensing image processing model provided in the embodiments of the present invention. The processor 710 executes the various functional applications and data processing of the electronic device by running the software programs, instructions and modules stored in the storage device 720, that is, implements the training method for the remote sensing image processing model in the above method embodiments.
The storage device 720 may include a storage program area and a storage data area, where the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the electronic device, and the like. In addition, the storage device 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 720 may further include memory located remotely from the processor 710, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus. The output device 740 may include a display device such as a display screen.
When the one or more programs included in the electronic device are executed by the one or more processors 710, the programs perform the following operations:
acquiring a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample;
determining a feature extraction shared model integrated in a remote sensing image processing model, a position detection branch model for outputting position information of a target object and a type identification branch model for outputting type information of the target object;
and alternately training the position detection branch model and the type recognition branch model by adopting the shared image characteristics extracted from the training sample through the characteristic extraction shared model so as to finish the training of the remote sensing image processing model when a preset convergence condition is met.
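The patent discloses no implementation of these operations, but one round of the alternating training they describe could be organized as sketched below; every identifier here (set_requires_grad, make_optimizer, the loss functions) is an assumption, and the region screening from the sketches above is omitted for brevity:

```python
def set_requires_grad(module, flag):
    """Freeze or unfreeze all network parameters of a module."""
    for p in module.parameters():
        p.requires_grad = flag

def alternate_training_round(shared_model, position_branch, type_branch,
                             samples, pos_labels, type_labels,
                             pos_loss_fn, type_loss_fn, make_optimizer):
    """Hypothetical sketch of one alternating round: jointly train the
    shared extractor with the position branch while the type branch is
    held fixed, then let the updated ("new") shared model re-extract
    features and train it jointly with the type branch while the
    position branch is held fixed."""
    # Stage 1: type branch unchanged; combined training of the shared
    # model and the position branch on the first shared image feature.
    set_requires_grad(type_branch, False)
    set_requires_grad(shared_model, True)
    set_requires_grad(position_branch, True)
    opt = make_optimizer(list(shared_model.parameters()) +
                         list(position_branch.parameters()))
    opt.zero_grad()
    first_feat = shared_model(samples)        # first shared image feature
    pos_loss_fn(position_branch(first_feat), pos_labels).backward()
    opt.step()

    # Stage 2: position branch unchanged; the new shared model re-extracts
    # the second shared image feature and is trained jointly with the
    # type recognition branch.
    set_requires_grad(position_branch, False)
    set_requires_grad(type_branch, True)
    opt = make_optimizer(list(shared_model.parameters()) +
                         list(type_branch.parameters()))
    opt.zero_grad()
    second_feat = shared_model(samples)       # second shared image feature
    type_loss_fn(type_branch(second_feat), type_labels).backward()
    opt.step()
```

Repeating this round until the preset convergence condition is met completes the training of the remote sensing image processing model.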
Of course, it will be understood by those skilled in the art that when one or more programs included in the electronic device are executed by the one or more processors 710, the programs may also perform operations related to the method for training a remote sensing image processing model provided in any embodiment of the present invention.
An embodiment of the present invention provides a computer-readable medium having stored thereon a computer program which, when executed by a processor, performs a method for training a remote sensing image processing model, the method comprising:
acquiring a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample;
determining a feature extraction shared model integrated in a remote sensing image processing model, a position detection branch model for outputting position information of a target object and a type identification branch model for outputting type information of the target object;
and alternately training the position detection branch model and the type recognition branch model by adopting the shared image characteristics extracted from the training sample through the characteristic extraction shared model so as to finish the training of the remote sensing image processing model when a preset convergence condition is met.
Optionally, the program, when executed by the processor, may be further configured to perform a method for training a remote sensing image processing model provided in any embodiment of the present invention.
Computer storage media of embodiments of the present invention may take the form of any combination of one or more computer readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (7)

1. A remote sensing image ship target integrated detection and fine-grained identification method, characterized by comprising the following steps:
acquiring a training sample set; the training sample comprises a remote sensing image sample and a position label and a type label of a target object in the remote sensing image sample;
determining a feature extraction shared model integrated in a remote sensing image processing model, a position detection branch model for outputting position information of a target object and a type identification branch model for outputting type information of the target object;
alternately training the position detection branch model and the type recognition branch model by adopting the shared image features extracted from the training sample by the feature extraction shared model so as to finish training the remote sensing image processing model when a preset convergence condition is met;
alternately training the position detection branch model and the type recognition branch model using the shared image features extracted from the training samples by the feature extraction shared model, including:
keeping the network parameters of the type recognition branch model unchanged, performing combined training on the position detection branch model and the feature extraction shared model by adopting first shared image features, and updating the network parameters;
after the combined training of the feature extraction shared model and the position detection branch model, keeping the network parameters of the position detection branch model unchanged, performing combined training on the type recognition branch model and the new feature extraction shared model by adopting second shared image features, and updating the network parameters;
the first shared image feature is a shared image feature extracted from the training sample through the feature extraction shared model, the new feature extraction shared model is the feature extraction shared model obtained after the combined training together with the position detection branch model, and the second shared image feature is a shared image feature re-extracted from the training sample through the new feature extraction shared model.
2. The method of claim 1, wherein the location tag is used to represent location information of a target object in the remote sensing image sample, and the type tag is used to represent category information of the target object in the remote sensing image sample.
3. The method of claim 1, wherein the combined training of the type recognition branch model and the new feature extraction shared model with a second shared image feature comprises:
inputting the second shared image feature into the new position detection branch model to obtain first position information of a target object in the training sample;
determining a second shared image feature associated with the first target area from the second shared image features according to the first position information of the target object in the training sample; wherein the first target region is a region of the training sample determined by the first location information and including a target object;
and performing combined training on the type recognition branch model and the new feature extraction shared model by adopting the second shared image feature associated with the first target region.
4. The method of claim 3, wherein determining a second shared image feature associated with the first target region from the second shared image features according to the first position information of the target object in the training sample comprises:
according to the first position information of the target object in the training sample, performing mask processing on the first target region at a preset scale to obtain a first target mask window comprising the first target region;
and multiplying the first target mask window element by element with the second shared image feature at the position of the first target mask window, so as to obtain the second shared image feature associated with the first target region.
5. The method of claim 1, wherein the position detection branch model and the type recognition branch model are alternately trained using shared image features extracted from the training samples by the feature extraction shared model, further comprising:
keeping unchanged the network parameters, as updated by the last training, of the feature extraction shared model and the type recognition branch model, individually training the position detection branch model updated by the last training by adopting a third shared image feature, and fine-tuning its network parameters updated by the last training;
keeping unchanged the network parameters, as updated by the last training, of the feature extraction shared model and the position detection branch model, individually training the type recognition branch model updated by the last training by adopting the third shared image feature, and fine-tuning its network parameters updated by the last training;
the third shared image feature is a shared image feature extracted from the training sample by the feature extraction shared model updated by the last training.
6. The method of claim 5, wherein the separately training the type recognition branch model updated from the previous training with a third shared image feature comprises:
inputting the third shared image feature into the position detection branch model after last training and updating, and acquiring second position information of a target object in the training sample;
determining a third shared image feature associated with a second target area from the third shared image features according to second position information of a target object in the training sample; wherein the second target region is a region in the training sample determined by the second location information and including a target object;
and independently training the type recognition branch model updated by the last training by adopting the third shared image characteristics associated with the second target area.
7. The method of claim 6, wherein determining a third shared image feature associated with a second target region from the third shared image features according to second position information of a target object in the training sample comprises:
according to the second position information of the target object in the training sample, performing mask processing on the second target region at a preset scale to obtain a second target mask window comprising the second target region;
and multiplying the second target mask window element by element with the third shared image feature at the position of the second target mask window, so as to obtain the third shared image feature associated with the second target region.
CN202010266998.5A 2020-04-07 2020-04-07 Remote sensing image ship target integrated detection and fine-grained identification method Active CN111401302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010266998.5A CN111401302B (en) 2020-04-07 2020-04-07 Remote sensing image ship target integrated detection and fine-grained identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010266998.5A CN111401302B (en) 2020-04-07 2020-04-07 Remote sensing image ship target integrated detection and fine-grained identification method

Publications (2)

Publication Number Publication Date
CN111401302A CN111401302A (en) 2020-07-10
CN111401302B true CN111401302B (en) 2022-08-02

Family

ID=71431461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010266998.5A Active CN111401302B (en) 2020-04-07 2020-04-07 Remote sensing image ship target integrated detection and fine-grained identification method

Country Status (1)

Country Link
CN (1) CN111401302B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070069A (en) * 2020-11-10 2020-12-11 支付宝(杭州)信息技术有限公司 Method and device for identifying remote sensing image

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2510110A1 (en) * 2003-12-18 2005-06-30 1626628 Ontario Limited System, apparatus and method for mapping
CN105893945A (en) * 2016-03-29 2016-08-24 中国科学院自动化研究所 Target identification method for remote sensing image
WO2018214195A1 (en) * 2017-05-25 2018-11-29 中国矿业大学 Remote sensing imaging bridge detection method based on convolutional neural network
CN108319949A (en) * 2018-01-26 2018-07-24 中国电子科技集团公司第十五研究所 Mostly towards Ship Target Detection and recognition methods in a kind of high-resolution remote sensing image
CN108875932A (en) * 2018-02-27 2018-11-23 北京旷视科技有限公司 Image-recognizing method, device and system and storage medium
CN109800629A (en) * 2018-12-05 2019-05-24 天津大学 A kind of Remote Sensing Target detection method based on convolutional neural networks
CN110210463A (en) * 2019-07-03 2019-09-06 中国人民解放军海军航空大学 Radar target image detecting method based on Precise ROI-Faster R-CNN
CN110443143A (en) * 2019-07-09 2019-11-12 武汉科技大学 The remote sensing images scene classification method of multiple-limb convolutional neural networks fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A SAR target detection network based on scene synthesis and anchor constraints; Jin Xiaoyu et al.; Journal of Nanjing University of Information Science and Technology (Natural Science Edition); 2020-03-28 (No. 02); full text *
Ship target detection and recognition method based on visible-light remote sensing images; Chen Liang et al.; Science and Technology Review; 2017-10-28 (No. 20); full text *

Also Published As

Publication number Publication date
CN111401302A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
US11200682B2 (en) Target recognition method and apparatus, storage medium, and electronic device
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
CN110378264B (en) Target tracking method and device
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN113158909B (en) Behavior recognition light-weight method, system and equipment based on multi-target tracking
CN111507226B (en) Road image recognition model modeling method, image recognition method and electronic equipment
CN110910445B (en) Object size detection method, device, detection equipment and storage medium
CN111079638A (en) Target detection model training method, device and medium based on convolutional neural network
CN111160202A (en) AR equipment-based identity verification method, AR equipment-based identity verification device, AR equipment-based identity verification equipment and storage medium
CN114332590B (en) Joint perception model training method, joint perception method, device, equipment and medium
CN111401302B (en) Remote sensing image ship target integrated detection and fine-grained identification method
CN115620081A (en) Training method of target detection model, target detection method and device
CN115546671A (en) Unmanned aerial vehicle change detection method and system based on multitask learning
CN113298042B (en) Remote sensing image data processing method and device, storage medium and computer equipment
CN113743163A (en) Traffic target recognition model training method, traffic target positioning method and device
CN113705380B (en) Target detection method and device for foggy days, electronic equipment and storage medium
CN113496148A (en) Multi-source data fusion method and system
CN115937071A (en) Image detection method, device, equipment and medium
Xing et al. Traffic sign recognition from digital images by using deep learning
CN111932530B (en) Three-dimensional object detection method, device, equipment and readable storage medium
CN115953434B (en) Track matching method, track matching device, electronic equipment and storage medium
CN115273148B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN114332509B (en) Image processing method, model training method, electronic device and automatic driving vehicle
CN113344121B (en) Method for training a sign classification model and sign classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant