CN116740410B - Bimodal target detection model construction method, bimodal target detection model detection method and computer equipment - Google Patents

Bimodal target detection model construction method, bimodal target detection model detection method and computer equipment

Info

Publication number
CN116740410B
CN116740410B
Authority
CN
China
Prior art keywords
bimodal
feature
initial
module
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310434291.4A
Other languages
Chinese (zh)
Other versions
CN116740410A (en)
Inventor
李显巨
李尧
陈伟涛
唐厂
冯如意
王力哲
陈刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences filed Critical China University of Geosciences
Priority to CN202310434291.4A priority Critical patent/CN116740410B/en
Publication of CN116740410A publication Critical patent/CN116740410A/en
Application granted granted Critical
Publication of CN116740410B publication Critical patent/CN116740410B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention provides a bimodal target detection model construction method, a bimodal target detection method and computer equipment, and relates to the technical field of target detection. The construction method comprises the following steps: acquiring a bimodal image from an original image; extracting features of the bimodal image through a neural network module to obtain initial bimodal features, and processing the bimodal image with a feature complementation module to obtain a vegetation normalization index; sending the initial bimodal features to the feature complementation module, so that the feature complementation module uses the vegetation normalization index to perform feature enhancement on the initial bimodal features and obtain intermediate bimodal features; inputting the intermediate bimodal features into a graph convolution module to obtain graph convolution bimodal features, a first prediction result and a second prediction result; inputting the first prediction result and the second prediction result into a superpixel mask module to generate a final loss; and performing parameter optimization on the neural network module, the feature complementation module and the graph convolution module according to the final loss to obtain a bimodal target detection model. The invention improves target detection accuracy.

Description

Bimodal target detection model construction method, bimodal target detection model detection method and computer equipment
Technical Field
The invention relates to the technical field of target detection, in particular to a bimodal target detection model construction method, a bimodal target detection method and computer equipment.
Background
Target detection technology has developed rapidly and gained wide attention in a variety of fields, such as remote sensing. Target detection needs to effectively detect targets in various poses within a picture. In traditional target detection algorithms, feature extraction and classification decisions are carried out separately, the requirements on feature selection are strict, and ideal results are difficult to obtain in complex scenes. As the understanding of deep learning theory has deepened, target detection techniques based on deep learning have emerged. These generally process targets with convolutional neural networks; compared with traditional target detection algorithms, feature extraction and pattern classification are performed jointly, and complex scenes can be handled better as the number of layers increases.
Many target detection models, such as the one-stage YOLO series and SSD, and the two-stage R-CNN series, have achieved great success in target detection. Most existing deep learning target detection models follow the two-stage pattern of extracting features in the first stage and performing position regression and classification on the extracted features in the second stage. However, during target detection, omissions in the feature extraction part often make the extracted features inaccurate, so the target detection precision is low.
Disclosure of Invention
The invention solves the problem of low target detection precision.
In order to solve the above problems, the present invention provides a method for constructing a bimodal target detection model, including:
acquiring a bimodal image according to an original image, wherein the bimodal image comprises a false color image and a true color image;
extracting features of the bimodal image through a neural network module to obtain initial bimodal features, and processing the bimodal image by utilizing a feature complementation module to obtain a vegetation normalization index;
the initial bimodal feature is sent to the feature complementation module, so that the feature complementation module utilizes the vegetation normalization index to perform feature enhancement on the initial bimodal feature to obtain an intermediate bimodal feature;
inputting the intermediate bimodal feature into a graph convolution module to obtain a graph convolution bimodal feature, a first prediction result and a second prediction result;
inputting the first prediction result and the second prediction result into a super-pixel mask module to generate final loss;
and carrying out parameter optimization on the neural network module, the feature complementation module and the graph convolution module according to the final loss to obtain a bimodal target detection model.
Optionally, the processing the bimodal image by using the feature complementation module to obtain a vegetation normalization index includes:
obtaining an infrared band reflection value according to the false color image, and obtaining a red band reflection value according to the true color image;
obtaining the vegetation normalization index according to the infrared band reflection value and the red band reflection value, wherein the vegetation normalization index is obtained by the following formula:

NDVI = (NIR - Red) / (NIR + Red);

wherein NDVI represents the vegetation normalization index, NIR represents the infrared band reflection value, and Red represents the red band reflection value.
Optionally, the initial bimodal feature includes an initial false color feature and an initial true color feature, and the sending the initial bimodal feature to the feature complementation module, so that the feature complementation module performs feature enhancement on the initial bimodal feature by using the vegetation normalization index to obtain an intermediate bimodal feature, including:
vectorizing the vegetation normalization index to obtain a vegetation normalization vector;
obtaining an infrared enhancement feature and a red enhancement feature according to the initial false color feature, the initial true color feature and the vegetation normalization vector, wherein the infrared enhancement feature and the red enhancement feature are obtained by the following formula;
f_nird = α_NDVI (f_nir - f_r);

f_rd = α_NDVI (f_r - f_nir);

wherein f_nird represents the infrared enhancement feature, α_NDVI represents the vegetation normalization vector, f_nir represents the initial false color feature, f_r represents the initial true color feature, and f_rd represents the red light enhancement feature;
the intermediate bimodal feature is derived from the initial false color feature, the initial true color feature, the infrared enhancement feature, and the red enhancement feature.
Optionally, the intermediate bimodal feature includes an intermediate false color feature and an intermediate true color feature, and the obtaining the intermediate bimodal feature by feature complementation of the infrared enhancement feature and the red enhancement feature includes:
obtaining the intermediate false color feature and the intermediate true color feature from the initial false color feature, the initial true color feature, the infrared enhancement feature, and the red enhancement feature, by the following formulas:

f'_nir = F(f_nir ⊕ (σ(GAP(f_nird)) ⊙ f_r)) ⊕ f_nir;

f'_r = F(f_r ⊕ (σ(GAP(f_rd)) ⊙ f_nir)) ⊕ f_r;

wherein f'_nir represents the intermediate false color feature, f'_r represents the intermediate true color feature, F(x) represents the residual, σ represents the Sigmoid activation function, GAP represents global average pooling, ⊕ denotes the element-wise sum, and ⊙ denotes the element-wise product.
Optionally, the graph convolution module includes a double-layer graph convolution, a first convolution block and a second convolution block, and inputting the intermediate bimodal feature into the graph convolution module to obtain a graph convolution bimodal feature, a first prediction result and a second prediction result, including:
obtaining a fusion modal feature according to the intermediate false color feature and the intermediate true color feature;
inputting the fusion modal characteristics into the first convolution block to obtain a first prediction result, wherein the first prediction result comprises predicted image data;
inputting the intermediate bimodal feature into the double-layer graph convolution module to obtain the graph convolution bimodal feature;
and inputting the graph convolution bimodal characteristic into the second convolution block to obtain a second prediction result, wherein the second prediction result comprises prediction coordinate data and prediction category data.
Optionally, the inputting the first prediction result and the second prediction result into a superpixel mask module to generate a final loss includes:
obtaining a superpixel mask true value, and obtaining superpixel mask loss according to the first prediction result and the superpixel mask true value;
acquiring original coordinate data and original category data of a target detection position manually marked in advance in the original image to obtain a target detection range true value;
obtaining target detection loss according to the second prediction result and the target detection range true value;
and multiplying the superpixel mask loss and the target detection loss by preset weights to obtain the final loss.
Optionally, the obtaining the true value of the super-pixel mask includes:
dividing the original image by super pixels to obtain a plurality of super pixel blocks;
determining the overlapping area of each super pixel block and a preset true value detection frame;
when the overlapping area is larger than an area threshold, the super pixel block is a first processing pixel block, wherein the pixel value of the first processing pixel block is 1;
when the overlapping area is smaller than or equal to the area threshold, the super pixel block is a second processing pixel block, wherein the pixel value of the second processing pixel block is 0;
and obtaining the true value of the super-pixel mask according to the first processing pixel block and the second processing pixel block.
Optionally, the neural network module includes a false color neural network and a true color neural network, and the extracting, by the neural network module, the features of the bimodal image to obtain initial bimodal features includes:
inputting the false color image into the false color neural network to obtain the initial false color characteristic, and inputting the true color image into the true color neural network to obtain the initial true color characteristic;
wherein the initial false color feature and the initial true color feature are obtained by the following formulas:

f_nir = F_2(I_nir-gb);

f_r = F_1(I_r-gb);

wherein f_nir represents the initial false color feature, f_r represents the initial true color feature, F_1 represents the true color neural network, F_2 represents the false color neural network, I_r-gb represents the true color image, and I_nir-gb represents the false color image.
According to the method for constructing the bimodal target detection model, the false color image and the true color image are obtained from the original image through band separation, and the false color processing yields richer information, which is beneficial to target identification. Initial bimodal features of the bimodal image are extracted through a neural network, feature complementation is performed on the extracted initial bimodal features through the vegetation normalization index, and the initial bimodal features are enhanced to obtain intermediate bimodal features. The intermediate bimodal features are fused by the graph convolution module to obtain graph convolution bimodal features, and the graph convolution module predicts a first prediction result and a second prediction result. The final loss is obtained from the first prediction result and the second prediction result in the superpixel mask module, and the final loss is fed back for parameter optimization to obtain a bimodal target detection model that meets the target detection precision.
The invention also provides a bimodal target detection method, which comprises the following steps:
acquiring a target bimodal image according to a target original image, wherein the target bimodal image comprises a false color image and a true color image;
inputting the target bimodal image into a bimodal target detection model obtained by the bimodal target detection model construction method to obtain a graph convolution bimodal characteristic;
and obtaining a target detection result according to the graph convolution bimodal characteristic.
The advantages of the bimodal target detection method over the prior art are the same as those of the bimodal target detection model construction method described above, and are not repeated here.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the bimodal object detection model construction method or the steps of the bimodal object detection method when executing the computer program.
The advantages of the computer device over the prior art are the same as those of the bimodal target detection model construction method described above, and are not repeated here.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a diagram showing an application environment of a method for constructing a bimodal target detection model in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for constructing a bimodal target detection model in an embodiment of the invention;
FIG. 3 is a schematic diagram of a feature fusion module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a graph convolution module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of obtaining a true value of a superpixel mask in an embodiment of the present invention;
FIG. 6 is a schematic diagram of a preferred embodiment of the present invention;
FIG. 7 is a flow chart of a bimodal target detection method according to an embodiment of the invention;
Fig. 8 is a diagram showing an internal structure of a computer device in the embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is an application environment diagram of the bimodal target detection model construction method in an embodiment of the present invention. Referring to fig. 1, the bimodal target detection model construction method is applied to a bimodal target detection model construction system. The bimodal target detection model construction system includes a terminal 110 and a server 120. The terminal 110 and the server 120 are connected through a network. The terminal 110 may be a desktop terminal or a mobile terminal, and the mobile terminal may be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
Referring to fig. 2, this embodiment provides a method for constructing a bimodal target detection model, including:
step 210, obtaining a bimodal image according to an original image, wherein the bimodal image comprises a false color image and a true color image;
specifically, data are separated according to wave bands from original image data to obtain bimodal data, one is a true color image of R-GB wave band, the other is a false color image of NIR-GB, and the false color image is usually expressed by a natural color image or a multispectral or hyperspectral image of the same scene. The false color image uses a false color enhancement technique, i.e. it is transformed into new three primary color components by a mapping function, and color synthesis causes each object in the image to appear in a different color than the original image.
Step 220, extracting features of the bimodal image through a neural network module to obtain initial bimodal features, and processing the bimodal image by utilizing a feature complementation module to obtain a vegetation normalization index;
specifically, a pre-trained neural network is obtained through a training set of input feature data, the bimodal image is input into the pre-trained neural network in the neural network module, and the initial bimodal features are extracted. The feature complementation module extracts the relevant band data of the bimodal image and calculates the vegetation normalization index.
Step 230, sending the initial bimodal feature to the feature complementation module, so that the feature complementation module performs feature enhancement on the initial bimodal feature by using the vegetation normalization index to obtain an intermediate bimodal feature;
specifically, referring to fig. 3, after receiving the initial bimodal features, the feature complementation module converts the vegetation normalization index into a vegetation normalization vector, multiplies the difference of the initial bimodal features element-wise by the vegetation normalization vector to enhance them, and then transforms the enhanced features to obtain the intermediate bimodal features.
Step 240, inputting the intermediate bimodal feature into a graph convolution module to obtain a graph convolution bimodal feature, a first prediction result and a second prediction result;
specifically, in graph networks based on superpixel segmentation, existing methods mostly fuse the bimodal features directly by element-wise addition or channel concatenation and then feed the result to a subsequent sub-network for the target detection task. Here we design a graph convolution module that uses a superpixel segmentation technique to segment the original picture when constructing the graph network. Superpixel segmentation gathers pixels with similar characteristics into one superpixel block; the connections among the superpixel blocks are constructed and then input into the graph convolution network. After the bimodal features are preliminarily fused, the two-layer graph convolution network increases the context information of the features and strengthens the interaction between the two modalities.
Step 250, inputting the first prediction result and the second prediction result into a super-pixel mask module to generate a final loss;
specifically, the super-pixel mask module generates a super-pixel segmentation map through a super-pixel segmentation method. The super-pixel segmentation method is an unsupervised clustering method, and aims to divide pixels with similar characteristics into the same super-pixel block. And filtering and calculating by using the obtained super-pixel segmentation graph and an existing truth value detection frame to obtain a super-pixel mask truth value. And obtaining a true value of the target detection range through manual labeling. And respectively comparing the first prediction result and the second prediction result with the superpixel mask true value and the target detection range true value, calculating to obtain loss, and finally obtaining the final loss according to different weights of the prediction results.
Step 260, performing parameter optimization on the neural network module, the feature complementation module and the graph convolution module according to the final loss to obtain a bimodal target detection model.
Specifically, the bimodal target detection model comprises the adjusted neural network module, feature complementation module and graph convolution module. During parameter training, the final loss of each training iteration is calculated by the superpixel mask module. The parameters in the neural network module, the feature complementation module and the graph convolution module are adjusted and optimized according to the final loss, so that each final loss is smaller than the previous one; when the final loss reaches a preset value or approaches zero, the bimodal target detection model is obtained.
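As a minimal training-loop sketch of this optimization step (illustrative only: `model`, `loader` and `detection_loss` are assumed names not given in the patent, and binary cross-entropy stands in for the unspecified mask loss):

```python
import torch
import torch.nn.functional as F

def train(model, loader, detection_loss, epochs=50, alpha=0.5, lr=1e-3):
    """Jointly optimize all modules against L_F = alpha * L_D + (1 - alpha) * L_S."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for false_color, true_color, mask_gt, target_gt in loader:
            pred1, pred2 = model(false_color, true_color)
            loss_s = F.binary_cross_entropy_with_logits(pred1, mask_gt)  # superpixel mask loss L_S
            loss_d = detection_loss(pred2, target_gt)                    # coordinate + category loss L_D
            loss_f = alpha * loss_d + (1 - alpha) * loss_s               # final loss L_F
            optimizer.zero_grad()
            loss_f.backward()  # feed the final loss back for parameter optimization
            optimizer.step()
```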
According to the method for constructing the bimodal target detection model, the false color image and the true color image are obtained from the original image through band separation, and the false color processing yields richer information, which is beneficial to target identification. Initial bimodal features of the bimodal image are extracted through a neural network, feature complementation is performed on the extracted initial bimodal features through the vegetation normalization index, and the initial bimodal features are enhanced to obtain intermediate bimodal features. The intermediate bimodal features are fused by the graph convolution module to obtain graph convolution bimodal features, and the graph convolution module predicts a first prediction result and a second prediction result. The final loss is obtained from the first prediction result and the second prediction result in the superpixel mask module, and the final loss is fed back for parameter optimization to obtain a bimodal target detection model that meets the target detection precision.
In the embodiment of the present invention, the processing the bimodal image by using the feature complementation module to obtain the vegetation normalization index includes:
obtaining an infrared band reflection value according to the false color image, and obtaining a red band reflection value according to the true color image;
obtaining the vegetation normalization index according to the infrared band reflection value and the red band reflection value, wherein the vegetation normalization index is obtained by the following formula:

NDVI = (NIR - Red) / (NIR + Red);

wherein NDVI represents the vegetation normalization index, NIR represents the infrared band reflection value, and Red represents the red band reflection value.
Specifically, the vegetation normalization index, NDVI, quantifies vegetation by measuring the difference between infrared and red reflectance: healthy vegetation reflects more near-infrared and green light than other wavelengths, while absorbing more red light. The vegetation normalization index takes values between -1 and +1, and there is no sharp boundary for each type of land cover; when the reflectance of the red channel is low and the reflectance of the infrared channel is high, a higher vegetation normalization index results. Vegetation-covered areas likewise produce a higher vegetation normalization index.
According to the method for constructing the bimodal target detection model, the vegetation normalization index is calculated according to the infrared band reflection value and the red band reflection value, the vegetation characteristics in the image to be detected are measured through the vegetation normalization index, and the characteristic extraction capacity of different types of ground features is enhanced by distinguishing vegetation from non-vegetation characteristics.
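A minimal sketch of the index computation, assuming per-pixel reflectance arrays; the small `eps` guard against division by zero is an addition for numerical safety, not part of the patent formula.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """NDVI = (NIR - Red) / (NIR + Red), valued between -1 and +1."""
    return (nir - red) / (nir + red + eps)
```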
In the embodiment of the present invention, the initial bimodal feature includes an initial false color feature and an initial true color feature, and the sending the initial bimodal feature to the feature complementation module, so that the feature complementation module performs feature enhancement on the initial bimodal feature by using the vegetation normalization index to obtain an intermediate bimodal feature includes:
vectorizing the vegetation normalization index to obtain a vegetation normalization vector;
obtaining an infrared enhancement feature and a red enhancement feature according to the initial false color feature, the initial true color feature and the vegetation normalization vector, wherein the infrared enhancement feature and the red enhancement feature are obtained by the following formula;
f_nird = α_NDVI (f_nir - f_r);

f_rd = α_NDVI (f_r - f_nir);

wherein f_nird represents the infrared enhancement feature, α_NDVI represents the vegetation normalization vector, f_nir represents the initial false color feature, f_r represents the initial true color feature, and f_rd represents the red light enhancement feature;
the intermediate bimodal feature is derived from the initial false color feature, the initial true color feature, the infrared enhancement feature, and the red enhancement feature.
In some more specific embodiments, in the false color image, vegetation appears uniformly red, and vegetation patches are more easily distinguished from cultivated land, but non-vegetation features are harder to distinguish than in the true color image; in particular, cars on roads lose some color information. In the true color image, target features in non-vegetation areas are clearer and sharper, and cars and buildings are easier to distinguish, but differences among vegetation cannot be distinguished well. Since vegetation features are easier to distinguish in the false color image and non-vegetation features are easier to distinguish in the true color image, the vegetation normalization index is introduced so that the model relies more on the false color image when distinguishing vegetation-type ground features and more on the true color image when distinguishing non-vegetation-type ground features.
According to the bimodal target detection model construction method, vegetation and non-vegetation features are more easily distinguished through the vegetation normalization index: the difference features between the two modalities are obtained by subtraction, and the point multiplication with the vegetation normalization index further enlarges the differences between vegetation and non-vegetation features, which facilitates subsequent data processing.
In an embodiment of the present invention, the intermediate bimodal feature includes an intermediate false color feature and an intermediate true color feature, and the obtaining the intermediate bimodal feature by complementing the infrared enhancement feature and the red enhancement feature includes:
obtaining the intermediate false color feature and the intermediate true color feature from the initial false color feature, the initial true color feature, the infrared enhancement feature, and the red enhancement feature, by the following formulas:

f'_nir = F(f_nir ⊕ (σ(GAP(f_nird)) ⊙ f_r)) ⊕ f_nir;

f'_r = F(f_r ⊕ (σ(GAP(f_rd)) ⊙ f_nir)) ⊕ f_r;

wherein f'_nir represents the intermediate false color feature, f'_r represents the intermediate true color feature, F(x) represents the residual, σ represents the Sigmoid activation function, GAP represents global average pooling, ⊕ denotes the element-wise sum, and ⊙ denotes the element-wise product.
Specifically, as shown in fig. 3, the infrared enhancement feature is obtained by multiplying the difference of the initial false color feature and the initial true color feature element-wise by the vegetation normalization vector. The infrared enhancement feature is then passed through global average pooling and a Sigmoid activation function, and the result is multiplied element-wise with the initial true color feature. Average pooling divides the feature map according to a fixed-size grid, the value of each grid cell being the average of all pixels in that cell; global average pooling averages over the whole feature map. The Sigmoid function is the common S-shaped function in biology, also called the S-shaped growth curve; because of properties such as monotonicity (with a monotonic inverse), it is used as an activation function of neural networks and maps its input to between 0 and 1. The resulting element-wise product is summed with the initial false color feature, and the sum serves as the residual input value. In a residual network, the residual block usually has the form H(x) = F(x) + x, where H(x) is the predicted value and F(x) is the residual. Adding the residual output to the initial false color feature gives the final residual block, namely the intermediate false color feature.
According to the bimodal target detection model construction method of this embodiment, constructing the residual block lets information propagate smoothly forward and backward, alleviates vanishing gradients, and improves feature accuracy. Global average pooling reduces the feature dimension of the convolution layer output, which effectively reduces the network parameters and helps prevent overfitting; it also suppresses noise, reduces information redundancy, and reduces the computation of the model. Using the Sigmoid function as the activation function normalizes the features, and its smooth gradient avoids jumps in the output values, so that the prediction result is clearer and subsequent data processing is easier.
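Putting these steps together, a hedged sketch of the feature complementation module (one reading of fig. 3, not the patent's code; the channel width and the two-convolution residual branch F(x) are assumptions) could look like this:

```python
import torch
import torch.nn as nn

class FeatureComplement(nn.Module):
    """NDVI-gated cross-modal enhancement followed by a residual block."""
    def __init__(self, channels: int):
        super().__init__()
        self.residual = nn.Sequential(  # F(x): an assumed two-convolution residual branch
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling (GAP)

    def forward(self, f_nir, f_r, alpha_ndvi):
        f_nird = alpha_ndvi * (f_nir - f_r)     # infrared enhancement feature
        gate = torch.sigmoid(self.gap(f_nird))  # Sigmoid gate mapped into (0, 1)
        x = f_nir + gate * f_r                  # residual input value
        return self.residual(x) + f_nir         # intermediate false color feature
```

By symmetry, calling the same module with the arguments swapped, `module(f_r, f_nir, alpha_ndvi)`, would yield the intermediate true color feature.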
In the embodiment of the present invention, the graph convolution module includes a double-layer graph convolution, a first convolution block and a second convolution block, and inputting the intermediate bimodal feature into the graph convolution module to obtain a graph convolution bimodal feature, a first prediction result and a second prediction result includes:
obtaining a fusion modal feature according to the intermediate false color feature and the intermediate true color feature;
inputting the fusion modal characteristics into the first convolution block to obtain a first prediction result, wherein the first prediction result comprises predicted image data;
inputting the intermediate bimodal feature into the double-layer graph convolution module to obtain the graph convolution bimodal feature;
and inputting the graph convolution bimodal characteristic into the second convolution block to obtain a second prediction result, wherein the second prediction result comprises prediction coordinate data and prediction category data.
Specifically, as shown in fig. 4, the input of each layer of the graph convolution network is the output of the previous layer. In the first layer, where no previous-layer features exist, the input features are the features obtained by preliminary fusion of the two modalities. The calculation formula of each layer of the graph convolution is as follows:

f_{l+1} = h(Â f_l W_l);

wherein W_l represents the weight matrix obtained by learning and fine-tuning, Â represents the regularized adjacency matrix, f_l represents the features of the l-th layer, and h represents the activation function.
The original image is divided into a plurality of regions by a superpixel segmentation method, and each region corresponds to a node of the graph. The edges between nodes are obtained from the pixel similarity between regions, thereby constructing the adjacency matrix.
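A sketch of this construction under stated assumptions: the mean region color stands in for the unspecified pixel similarity, and the cosine-similarity threshold is illustrative.

```python
import numpy as np

def build_adjacency(segments: np.ndarray, image: np.ndarray, thresh: float = 0.9) -> np.ndarray:
    """Build an adjacency matrix whose nodes are superpixel blocks.

    segments: (H, W) integer superpixel labels; image: (H, W, C) pixel values.
    """
    labels = np.unique(segments)
    feats = np.stack([image[segments == l].mean(axis=0) for l in labels])
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-6)
    sim = norm @ norm.T                      # cosine similarity between regions
    adj = (sim > thresh).astype(np.float32)  # edge wherever two regions are similar
    np.fill_diagonal(adj, 0.0)               # self-loops are added during normalization
    return adj
```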
The bimodal target detection model construction method of this embodiment strengthens the fused features by constructing a two-layer graph convolution and increases the feature interaction between regions, so as to improve the accuracy of the final features.
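A minimal two-layer graph convolution sketch consistent with the layer formula above (the hidden width and the choice of ReLU for h are assumptions):

```python
import torch
import torch.nn as nn

def normalize_adj(adj: torch.Tensor) -> torch.Tensor:
    """Regularized adjacency A_hat = D^-1/2 (A + I) D^-1/2."""
    adj = adj + torch.eye(adj.size(0), device=adj.device)
    d_inv_sqrt = adj.sum(dim=1).clamp(min=1e-6).pow(-0.5)
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

class TwoLayerGCN(nn.Module):
    """Two stacked layers of f_{l+1} = h(A_hat f_l W_l)."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim, bias=False)   # W_1
        self.w2 = nn.Linear(hid_dim, out_dim, bias=False)  # W_2

    def forward(self, feats: torch.Tensor, adj_hat: torch.Tensor) -> torch.Tensor:
        h = torch.relu(adj_hat @ self.w1(feats))  # first graph convolution layer
        return torch.relu(adj_hat @ self.w2(h))   # second graph convolution layer
```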
In the embodiment of the present invention, the inputting the first prediction result and the second prediction result into the superpixel mask module to generate a final loss includes:
obtaining a superpixel mask true value, and obtaining superpixel mask loss according to the first prediction result and the superpixel mask true value;
acquiring original coordinate data and original category data of a target detection position manually marked in advance in the original image to obtain a target detection range true value;
obtaining target detection loss according to the second prediction result and the target detection range true value;
and multiplying the superpixel mask loss and the target detection loss by preset weights to obtain the final loss.
In some more specific embodiments, as shown in fig. 4, the intermediate false color feature and the intermediate true color feature are input into a deformable convolution block, and the obtained features are preliminarily fused to obtain the fusion modal feature. The fusion modal feature is input into the first convolution block to obtain the first prediction result, and the superpixel mask loss L_S is obtained from the first prediction result and the superpixel mask true value. At the same time, the fusion modal feature is input into the two-layer graph convolution module to obtain the graph convolution bimodal feature. The graph in the graph convolution module is generated through superpixel segmentation: pixels with similar characteristics are gathered into a superpixel block, and the connections among the superpixel blocks are input into the graph convolution network.
According to the bimodal target detection model construction method, the superpixel mask loss is obtained according to the first prediction result and the superpixel mask true value, the target detection loss is obtained according to the second prediction result and the target detection range true value, the final loss is obtained through the two losses, and the overall loss is measured more comprehensively.
In an embodiment of the present invention, referring to fig. 5, the obtaining the true value of the superpixel mask includes:
dividing the original image by super pixels to obtain a plurality of super pixel blocks;
determining the overlapping area of each super pixel block and a preset true value detection frame;
when the overlapping area is larger than an area threshold, the super pixel block is a first processing pixel block, wherein the pixel value of the first processing pixel block is 1;
when the overlapping area is smaller than or equal to the area threshold, the super pixel block is a second processing pixel block, wherein the pixel value of the second processing pixel block is 0;
and obtaining the true value of the super-pixel mask according to the first processing pixel block and the second processing pixel block.
Specifically, the preset true value detection frame is the frame marked on the picture with labeling software after manual interpretation, namely a manually labeled detection frame. The superpixel segmentation map is obtained by segmenting the original image, and the superpixel mask true value is calculated by filtering the superpixel segmentation map against the preset true value detection frame.
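A sketch of this truth-value construction under stated assumptions: SLIC stands in for the unspecified superpixel segmentation method, boxes are (x1, y1, x2, y2) pixel coordinates, and the area threshold is read as an overlap ratio rather than an absolute area.

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_mask_gt(image: np.ndarray, boxes, n_segments: int = 500,
                       area_thresh: float = 0.5) -> np.ndarray:
    segments = slic(image, n_segments=n_segments)  # superpixel blocks
    box_mask = np.zeros(image.shape[:2], dtype=bool)
    for x1, y1, x2, y2 in boxes:                   # preset true value detection frames
        box_mask[y1:y2, x1:x2] = True
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for label in np.unique(segments):
        block = segments == label
        overlap = (block & box_mask).sum() / block.sum()
        if overlap > area_thresh:
            mask[block] = 1  # first processing pixel block; the rest stay 0
    return mask
```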
In the embodiment of the present invention, the neural network module includes a false color neural network and a true color neural network, and the extracting the features of the bimodal image by the neural network module to obtain initial bimodal features includes:
inputting the false color image into the false color neural network to obtain the initial false color feature, and inputting the true color image into the true color neural network to obtain the initial true color feature;
wherein the initial false color feature and the initial true color feature are obtained by the following formulas:

f_nir = F_2(I_nir-gb);

f_r = F_1(I_r-gb);

wherein f_nir represents the initial false color feature, f_r represents the initial true color feature, F_1 represents the true color neural network, F_2 represents the false color neural network, I_r-gb represents the true color image, and I_nir-gb represents the false color image.
In the bimodal target detection model construction method of this embodiment, the two modal images pass through two separate deep feature extraction networks, so the amount of computed data is smaller; at the same time, using pre-trained neural networks reduces errors and unnecessary features produced during feature extraction.
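A hedged sketch of the two-branch extraction, with ResNet-18 as an illustrative stand-in for the unspecified pretrained backbones:

```python
import torch.nn as nn
from torchvision.models import resnet18

class DualBackbone(nn.Module):
    """f_nir = F_2(I_nir-gb) and f_r = F_1(I_r-gb): one network per modality."""
    def __init__(self):
        super().__init__()
        def backbone():
            # drop the classification head, keep the feature maps
            return nn.Sequential(*list(resnet18(weights="DEFAULT").children())[:-2])
        self.f1 = backbone()  # true color network F_1
        self.f2 = backbone()  # false color network F_2

    def forward(self, i_nir_gb, i_r_gb):
        return self.f2(i_nir_gb), self.f1(i_r_gb)  # (f_nir, f_r)
```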
In some more specific embodiments, as shown in fig. 6, a YOLOv5 model is used, and the true color picture and the false color picture each pass through the five stages C1, C2, C3, C4, C5 of a deep convolutional network, wherein the feature maps obtained in stages C3, C4 and C5 are each input into the graph convolution module based on superpixel segmentation. At the same time, feature complementation modules are inserted between the layers of the five stages, enhancing each layer according to the vegetation normalization index; four feature complementation modules are inserted in total. In practice, three feature maps are set to be input into the graph convolution module; they enter in pairs corresponding to different scales, and the fusion features corresponding to the true color and false color features are output accordingly. Prediction on the fusion features gives three different second prediction results, from which the target detection loss L_D is calculated. The final loss L_F is then obtained from the superpixel-based mask loss L_S and the target detection loss L_D, and optimization according to the final loss yields the target detection model. The calculation formula of the final loss is as follows:
L_F = α L_D + (1 - α) L_S;

wherein L_F represents the final loss, α represents a manually preset constant, L_D represents the target detection loss, and L_S represents the superpixel mask loss.
The calculation formula of the target detection loss is as follows:
L_D = L_C + L_R;

wherein L_C represents the loss calculated from the category, and L_R represents the loss calculated from the coordinates.
According to the method for constructing the bimodal target detection model, the false color image and the true color image are obtained from the original image through band separation, and the false color processing yields richer information, which is beneficial to target identification. Initial bimodal features of the bimodal image are extracted through a neural network, feature complementation is performed on the extracted initial bimodal features through the vegetation normalization index, and the initial bimodal features are enhanced to obtain intermediate bimodal features. The intermediate bimodal features are fused by the graph convolution module to obtain graph convolution bimodal features, and the graph convolution module predicts a first prediction result and a second prediction result. The final loss is obtained from the first prediction result and the second prediction result in the superpixel mask module, and the final loss is fed back for parameter optimization to obtain a bimodal target detection model that meets the target detection precision.
Corresponding to the above-mentioned method for constructing a bimodal target detection model, as shown in fig. 7, a further embodiment of the present invention further provides a bimodal target detection method, which includes the following steps:
step 710, obtaining a target bimodal image according to a target original image, wherein the target bimodal image comprises a false color image and a true color image;
step 720, inputting the target bimodal image into the bimodal target detection model obtained by the bimodal target detection model construction method to obtain a graph convolution bimodal characteristic;
and step 730, obtaining a target detection result according to the graph convolution bimodal feature.
The advantages of the bimodal target detection method over the prior art are the same as those of the bimodal target detection model construction method described above, and are not repeated here.
FIG. 8 illustrates an internal block diagram of a computer device in one embodiment. The computer device may specifically be the terminal 110 (or the server 120) in fig. 1. As shown in fig. 8, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium of the computer device stores an operating system, and may also store a computer program that, when executed by the processor, causes the processor to implement the bimodal target detection model construction method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the bimodal target detection model construction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device can be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the shell of the computer equipment, or an external keyboard, touchpad or mouse.
It will be appreciated by those skilled in the art that the structure shown in FIG. 8 is merely a block diagram of some of the structures associated with the present invention and is not limiting of the computer device to which the present invention may be applied, and that a particular computer device may include more or fewer components than those shown, or may combine certain components, or have a different arrangement of components.
Another embodiment of the present invention provides a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program: acquiring a bimodal image according to an original image, wherein the bimodal image comprises a false color image and a true color image;
extracting features of the bimodal image through a neural network module to obtain initial bimodal features, and processing the bimodal image by utilizing a feature complementation module to obtain a vegetation normalization index;
the initial bimodal feature is sent to the feature complementation module, so that the feature complementation module utilizes the vegetation normalization index to perform feature enhancement on the initial bimodal feature to obtain an intermediate bimodal feature;
inputting the intermediate bimodal feature into a graph convolution module to obtain a graph convolution bimodal feature, a first prediction result and a second prediction result;
inputting the first prediction result and the second prediction result into a super-pixel mask module to generate final loss;
and carrying out parameter optimization on the neural network module, the feature complementation module and the graph convolution module according to the final loss to obtain a bimodal target detection model.
In one embodiment, the steps of the above-described bimodal object detection model construction method are also implemented when the processor executes the computer program.
It should be noted that in this document, relational terms such as "first" and "second" and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The method for constructing the bimodal target detection model is characterized by comprising the following steps of:
acquiring a bimodal image according to an original image, wherein the bimodal image comprises a false color image and a true color image;
extracting features of the bimodal image through a neural network module to obtain initial bimodal features, and processing the bimodal image by utilizing a feature complementation module to obtain a vegetation normalization index;
the initial bimodal feature is sent to the feature complementation module, so that the feature complementation module utilizes the vegetation normalization index to perform feature enhancement on the initial bimodal feature to obtain an intermediate bimodal feature;
inputting the intermediate bimodal feature into a graph convolution module to obtain a graph convolution bimodal feature, a first prediction result and a second prediction result, wherein the first prediction result comprises predicted image data, and the second prediction result comprises predicted coordinate data and predicted category data;
inputting the first prediction result and the second prediction result into a super-pixel mask module to generate final loss;
and carrying out parameter optimization on the neural network module, the feature complementation module and the graph convolution module according to the final loss to obtain a bimodal target detection model.
2. The method for constructing a bimodal target detection model according to claim 1, wherein the processing the bimodal image by using a feature complementation module to obtain a vegetation normalization index comprises:
obtaining an infrared band reflection value according to the false color image, and obtaining a red band reflection value according to the true color image;
obtaining the vegetation normalization index according to the infrared band reflection value and the red band reflection value, wherein the vegetation normalization index is obtained by the following formula:

NDVI = (NIR - Red) / (NIR + Red);

wherein NDVI represents the vegetation normalization index, NIR represents the infrared band reflection value, and Red represents the red band reflection value.
3. The method according to claim 1, wherein the initial bimodal features include an initial false color feature and an initial true color feature, and the sending the initial bimodal features to the feature complementation module so that the feature complementation module performs feature enhancement on the initial bimodal features by using the vegetation normalization index to obtain an intermediate bimodal feature includes:
vectorizing the vegetation normalization index to obtain a vegetation normalization vector;
obtaining an infrared enhancement feature and a red enhancement feature according to the initial false color feature, the initial true color feature and the vegetation normalization vector, wherein the infrared enhancement feature and the red enhancement feature are obtained by the following formula;
f_nird = α_NDVI (f_nir - f_r);

f_rd = α_NDVI (f_r - f_nir);

wherein f_nird represents the infrared enhancement feature, α_NDVI represents the vegetation normalization vector, f_nir represents the initial false color feature, f_r represents the initial true color feature, and f_rd represents the red light enhancement feature;
the intermediate bimodal feature is derived from the initial false color feature, the initial true color feature, the infrared enhancement feature, and the red enhancement feature.
4. The bimodal target detection model construction method according to claim 3, wherein the intermediate bimodal feature comprises an intermediate false color feature and an intermediate true color feature, and the obtaining the intermediate bimodal feature by feature complementation of the infrared enhancement feature and the red enhancement feature comprises:
obtaining the intermediate false color feature and the intermediate true color feature from the initial false color feature, the initial true color feature, the infrared enhancement feature, and the red enhancement feature, by the following formulas:

f'_nir = F(f_nir ⊕ (σ(GAP(f_nird)) ⊙ f_r)) ⊕ f_nir;

f'_r = F(f_r ⊕ (σ(GAP(f_rd)) ⊙ f_nir)) ⊕ f_r;

wherein f'_nir represents the intermediate false color feature, f'_r represents the intermediate true color feature, F(x) represents the residual, σ represents the Sigmoid activation function, GAP represents global average pooling, ⊕ denotes the element-wise sum, and ⊙ denotes the element-wise product.
5. The method of claim 4, wherein the graph convolution module includes a double-layer graph convolution, a first convolution block, and a second convolution block, and the inputting the intermediate bimodal feature into the graph convolution module obtains a graph convolution bimodal feature, a first prediction result, and a second prediction result, including:
obtaining a fusion modal feature according to the intermediate false color feature and the intermediate true color feature;
inputting the fusion modal characteristics into the first convolution block to obtain a first prediction result;
inputting the intermediate bimodal feature into the double-layer graph convolution module to obtain the graph convolution bimodal feature;
and inputting the graph convolution bimodal characteristic into the second convolution block to obtain a second prediction result.
6. The method of claim 5, wherein inputting the first prediction result and the second prediction result into a superpixel mask module generates a final loss, comprising:
obtaining a superpixel mask true value, and obtaining superpixel mask loss according to the first prediction result and the superpixel mask true value;
acquiring original coordinate data and original category data of a target detection position manually marked in advance in the original image to obtain a target detection range true value;
obtaining target detection loss according to the second prediction result and the target detection range true value;
and multiplying the superpixel mask loss and the target detection loss by preset weights to obtain the final loss.
7. The method for constructing a bimodal object detection model according to claim 6, wherein said obtaining a superpixel mask true value comprises:
dividing the original image by super pixels to obtain a plurality of super pixel blocks;
determining the overlapping area of each super pixel block and a preset true value detection frame;
when the overlapping area is larger than an area threshold, the super pixel block is a first processing pixel block, wherein the pixel value of the first processing pixel block is 1;
when the overlapping area is smaller than or equal to the area threshold, the super pixel block is a second processing pixel block, wherein the pixel value of the second processing pixel block is 0;
and obtaining the true value of the super-pixel mask according to the first processing pixel block and the second processing pixel block.
8. The method for constructing a bimodal target detection model according to claim 3, wherein the neural network module includes a false color neural network and a true color neural network, the extracting features of the bimodal image by the neural network module to obtain initial bimodal features includes:
inputting the false color image into the false color neural network to obtain the initial false color characteristic, and inputting the true color image into the true color neural network to obtain the initial true color characteristic;
wherein the initial false color feature and the initial true color feature are obtained by the following formulas:

f_nir = F_2(I_nir-gb);

f_r = F_1(I_r-gb);

wherein f_nir represents the initial false color feature, f_r represents the initial true color feature, F_1 represents the true color neural network, F_2 represents the false color neural network, I_r-gb represents the true color image, and I_nir-gb represents the false color image.
9. A bimodal target detection method, comprising:
acquiring a target bimodal image according to a target original image, wherein the target bimodal image comprises a false color image and a true color image;
inputting the target bimodal image into a bimodal target detection model obtained by the bimodal target detection model construction method according to any one of claims 1 to 8 to obtain a graph convolution bimodal feature; and
obtaining a target detection result according to the graph convolution bimodal feature.
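A hypothetical end-to-end inference call for the claim-9 detection method; build_bimodal_model, decode_detections, and the two input tensors are placeholder names for illustration, not part of the patent.

```python
import torch

model = build_bimodal_model()                         # trained per claims 1-8 (placeholder helper)
model.eval()
with torch.no_grad():
    g_feat, _, pred = model(false_color, true_color)  # graph convolution bimodal feature + predictions
boxes, classes = decode_detections(pred)              # target detection result (placeholder decoder)
```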
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the bimodal target detection model construction method according to any one of claims 1 to 8 or the bimodal target detection method according to claim 9.
CN202310434291.4A 2023-04-21 2023-04-21 Bimodal target detection model construction method, bimodal target detection model detection method and computer equipment Active CN116740410B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310434291.4A CN116740410B (en) 2023-04-21 2023-04-21 Bimodal target detection model construction method, bimodal target detection model detection method and computer equipment

Publications (2)

Publication Number Publication Date
CN116740410A (en) 2023-09-12
CN116740410B (en) 2024-01-30

Family

ID=87910453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310434291.4A Active CN116740410B (en) 2023-04-21 2023-04-21 Bimodal target detection model construction method, bimodal target detection model detection method and computer equipment

Country Status (1)

Country Link
CN (1) CN116740410B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584248A (en) * 2018-11-20 2019-04-05 西安电子科技大学 Infrared surface object instance dividing method based on Fusion Features and dense connection network
WO2022037642A1 (en) * 2020-08-19 2022-02-24 南京图格医疗科技有限公司 Method for detecting and classifying lesion area in clinical image
CN112669262A (en) * 2020-12-08 2021-04-16 上海交通大学 System and method for detecting and predicting abnormal vibration of motor wheel shaft
CN113962246A (en) * 2021-09-17 2022-01-21 华南理工大学 Target detection method, system, equipment and storage medium fusing bimodal features
CN115131640A (en) * 2022-06-27 2022-09-30 武汉大学 Target detection method and system utilizing illumination guide and attention mechanism
CN115631397A (en) * 2022-11-02 2023-01-20 清华大学 Target detection method and device based on bimodal image

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Self-attention mechanism fusion method for bi-modal images; Junqing Li, Jiongyao Ye; Proceedings of SPIE; full text *
Research on 3D scene shape completion and semantic segmentation methods based on RGBD data; Li Jie; China Master's Theses Full-text Database, Information Science and Technology series; full text *
Research on wheat biomass and yield estimation based on multi-modal data fusion from a UAV platform; Zhang Shaohua; China Master's Theses Full-text Database, Agricultural Science and Technology series; full text *


Similar Documents

Publication Publication Date Title
Shendryk et al. Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery
Sun et al. Nonlocal patch similarity based heterogeneous remote sensing change detection
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108154192B (en) High-resolution SAR terrain classification method based on multi-scale convolution and feature fusion
Pare et al. An optimal color image multilevel thresholding technique using grey-level co-occurrence matrix
CN107424159B (en) Image semantic segmentation method based on super-pixel edge and full convolution network
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN110969088B (en) Remote sensing image change detection method based on significance detection and deep twin neural network
CN109740639B (en) Wind cloud satellite remote sensing image cloud detection method and system and electronic equipment
Rouhani et al. Semantic segmentation of 3D textured meshes for urban scene analysis
CN107067405B (en) Remote sensing image segmentation method based on scale optimization
CN107808138B (en) Communication signal identification method based on FasterR-CNN
WO2020062360A1 (en) Image fusion classification method and apparatus
CN109446894B (en) Multispectral image change detection method based on probability segmentation and Gaussian mixture clustering
Chen et al. Remote sensing image quality evaluation based on deep support value learning networks
Huang et al. Automatic building change image quality assessment in high resolution remote sensing based on deep learning
CN116645592B (en) Crack detection method based on image processing and storage medium
Yang et al. Detecting functional field units from satellite images in smallholder farming systems using a deep learning based computer vision approach: A case study from Bangladesh
CN114972885A (en) Multi-modal remote sensing image classification method based on model compression
CN115690086A (en) Object-based high-resolution remote sensing image change detection method and system
CN115019163A (en) City factor identification method based on multi-source big data
Yu et al. Mean shift based clustering of neutrosophic domain for unsupervised constructions detection
Gao et al. Classification of hyperspectral images with convolutional neural networks and probabilistic relaxation
Wang et al. Classification and extent determination of rock slope using deep learning
CN107423771B (en) Two-time-phase remote sensing image change detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant