CN114913409A - Camouflage target identification method for marine organisms
- Publication number: CN114913409A
- Application number: CN202210392176.0A
- Authority
- CN
- China
- Prior art keywords
- target
- disguised
- representing
- feature
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
- Farming Of Fish And Shellfish (AREA)
- Image Processing (AREA)
Abstract
The invention provides a marine-organism-oriented camouflage target identification method, which solves the technical problem that existing camouflage detection models are not suited to identifying camouflaged objects in marine scenes. The method comprises the following steps: training a salient feature extraction network with positive and negative sample images to generate a feature map; extracting the environmental feature information of the camouflage-target features and of the camouflaged-target features in the feature map with an environment attention mechanism; establishing a graph structure over the environmental feature information and the target salient features; inputting the feature map into a similarity measurement module to obtain latent features and optimize the candidate boxes, yielding a target feature map and an underwater target detection model; judging the prediction accuracy of the underwater target detection model with a discriminator; and training the underwater target detection model to realize identification of the camouflage target. The invention distinguishes the camouflage target from the camouflaged target through adversarial learning, simultaneously alleviates the problem of blurred real bounding boxes, and improves the identification precision of underwater camouflage targets.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a camouflage target identification method for marine organisms.
Background
Camouflage is a basic skill of many creatures in nature and is also a starting point for designing and manufacturing various types of biomimetic robots. A lion conceals itself in the grassland to stalk its prey, a chameleon adjusts its colour according to the surrounding environment, and a marine biomimetic robot disguised in the shape of a marine animal can survey the sea and study marine organisms at close range. Background-similar camouflage object detection refers to segmenting the camouflaged object from a similar-looking environment; the similarity between the camouflaged target and its surroundings poses a significant challenge to detection. Many scholars have realized camouflage target detection based on deep learning and obtained notable results, and camouflage target detection is now widely applied in military, agricultural and medical fields. It has great research significance and broad application prospects in marine economic-crop fishing, marine organism research, marine disaster relief, marine resource exploration and the like.
However, existing camouflage object detection techniques differ greatly from the problem studied by the present invention. With the continuous improvement of robot manufacturing technology, the shape of an underwater biomimetic robot has become very similar to that of the animal it imitates; existing algorithms struggle to distinguish their features and cannot accurately identify the camouflage target. Meanwhile, the marine environment is complex and varied, and the extraction of target features is affected by many factors such as darkness, turbidity, foreign-matter occlusion and background colours similar to the target. These factors leave the camouflage-target features incomplete and make accurate, effective identification very challenging, so existing camouflage detection models are not suited to identifying camouflaged objects in marine scenes.
The marine environment is complex and the light dim, and marine camouflage objects exploit exactly these phenomena to deceive the visual system of an observation device; factors such as foreign-matter occlusion and water fluctuation further increase the difficulty of target feature extraction.
Disclosure of Invention
Aiming at the technical problem that existing camouflage detection models are not suited to identifying camouflaged objects in marine scenes, the invention provides a camouflage target identification method for marine organisms.
To achieve this purpose, the technical scheme of the invention is realized as follows. A camouflage target identification method for marine organisms comprises the following steps:
Step one: training the salient feature extraction network with the positive sample images and negative sample images to generate a feature map F, and obtaining candidate boxes from the feature map F;
Step two: extracting the environmental feature information of the camouflage-target features in the feature map F using the environment attention mechanism of the camouflage encoder, and extracting the environmental feature information of the camouflaged-target features in the feature map F using the environment attention mechanism of the camouflaged encoder;
Step three: establishing graph structures over the environmental feature information and the target salient features in the feature map through the graph structure module, obtaining the network relations of the camouflage target and the camouflaged target respectively;
Step four: inputting the feature map obtained in step one into the similarity measurement module, obtaining latent features with it, and optimizing the candidate boxes generated in step one to obtain the target feature map, yielding the underwater target detection model;
Step five: using a discriminator to judge the accuracy of the predictions of the underwater target detection model against the real labels of the labelled positive sample images;
Step six: repeating steps one to five over a large number of labelled positive sample images and unlabelled negative sample images to train the underwater target detection model and obtain its weights, which are used to extract features from the input images, finally yielding the trained underwater target detection model;
Step seven: inputting a new underwater image into the trained underwater target detection model to realize identification of the camouflage target and the camouflaged target.
The backbone of the salient feature extraction network is a ResNet-50 network; the positive sample images are labelled underwater camouflage objects and real organism images, and the negative sample images are images of other objects irrelevant to the identification. The positive and negative sample images are input into the salient feature extraction network to train its backbone according to the principle:
$S(f(x), f(x^+)) \ge S(f(x), f(x^-))$;

where $x$ denotes a feature learned by the backbone network, $x^+$ denotes a sample similar to the feature $x$, $x^-$ denotes a sample dissimilar to the feature $x$, $f(\cdot)$ denotes the feature-extraction operation of the feature extraction network, and $S(\cdot,\cdot)$ denotes the degree of similarity between samples;

the ResNet-50 network outputs a feature map F(h, w, c) of the target, which is the collective output of a large number of features $x$; h and w denote the height and width of the output feature map, and c denotes the number of convolution kernels.
The candidate boxes are generated as follows: target candidate boxes are acquired on the feature map F(h, w, c) with a 3 × 3 sliding window; each sliding window generates a low-dimensional vector fed into two fully connected layers comprising a regression layer and a classification layer. The maximum number of candidate-box proposals per position is denoted k; the regression layer generates the coordinates of the k candidate boxes, and the classification layer computes the confidence of each candidate box. Candidate boxes with confidence greater than 0.7 are regarded as positive labels, and those with confidence smaller than 0.3 as negative labels.
The environment attention mechanism of the camouflage encoder in step two works as follows:

from the camouflage-target salient feature $X_C$ extracted by the salient feature extraction network, local environment feature information is acquired:

$X_{Cloc} = f_R(X_C; H, W)$;

where $f_R$ denotes the extraction function of local environment features, H and W denote the height and width of the input image, and $X_{Cloc}$ denotes the local environment features of the extracted feature map F;

the global-level environment features are aggregated from the image features as: $X_{Cglo} = f_G(X_C; H, W)$;

where $f_G$ denotes the extraction function of global environment features and $X_{Cglo}$ denotes the global environment features of the extracted feature map F;

the environment features obtained by fusing the local environment feature $X_{Cloc}$ with the global environment attention feature $X_{Cglo}$ are:

$X_{Cinr} = f_{conv}([X_{Cloc} : X_{Cglo}])$;

where $f_{conv}$ denotes a convolution operation, [:] denotes concatenation, and $X_{Cinr}$ denotes the extracted environment features;

more relevant and reliable environment information for object detection is then obtained:

$X_{CT} = f_{inr}(X_C, X_{Cinr}, \Omega[X_C, X_{Cinr}], \Omega[X_{Cinr}, X_{Cinr}])$;

where $\Omega[X_C, X_{Cinr}]$ denotes the correlation between the environment feature $X_{Cinr}$ and the camouflage-target feature $X_C$, $\Omega[X_{Cinr}, X_{Cinr}]$ denotes the correlation among environment features, $f_{inr}$ denotes a dynamic coding function, and $X_{CT}$ denotes the more relevant and reliable environment feature information extracted by the salient feature extraction network.
The graph structure module establishes the graph structure of the camouflage target as follows:

the obtained features, comprising the camouflage-target salient features $X_C$ and the camouflage-target environment feature information $X_{CT}$, are each regarded as a node, giving the node set $V = \{v_1, v_2, \ldots, v_a\}$, where a denotes the number of nodes; the graph structure module comprises three GraphConv layers, through which the set of neighbouring nodes related to node $v_i$ is retrieved and denoted $P(v_i)$; Y denotes the updated features of node $v_i$; the implicit relation between the image target and the background is:

$Y^{(m)}(v_i) = \sigma\big(W_1\, Y^{(m-1)}(v_i) + W_2 \cdot \mathrm{AGG}\{Y^{(m-1)}(v_j) : v_j \in P(v_i)\}\big)$;

where m denotes the layer index and may take the values 0, 1, 2; $W_1$ and $W_2$ denote learning parameters; AGG denotes the aggregation function over the neighbouring-node set $P(v_i)$, merging the neighbourhood; $Y^{(m)}(v_i)$ denotes the m-th update of the features of node $v_i$, $v_j$ is a neighbouring node related to $v_i$, and $Y^{(m)}(v_j)$ denotes the m-th update of the features of node $v_j$;

the output features of the nodes are obtained by a linear transformation: $e_{i,j} = \gamma(W \cdot [v_i : v_j] + d),\ v_i, v_j \in V$;

where $\gamma(\cdot)$ denotes the sigmoid function, $e_{i,j}$ denotes the distance parameter between nodes $v_i$ and $v_j$ with $v_i, v_j \in V$; W and d denote network learning parameters, and [:] denotes the series connection of two nodes.
The network relations are established as follows:

the set of node-to-node distance parameters is recorded as $E = \{e_{1,2}, e_{1,3}, \ldots, e_{i,j}\}$; if two nodes have an implicit relation the value is 1, and if they have no relation the value is 0; the network relation between the constructed features is expressed as $Q = (V, E)$;

the constructed network relations of the camouflage-target features and the camouflaged-target features are expressed as $Q_C = (V_C, E_C)$ and $Q_R = (V_R, E_R)$ respectively, where $V_C$ denotes the set of camouflage-target nodes, $E_C$ the node-to-node distance parameters of the camouflage target, $V_R$ the set of camouflaged-target nodes, and $E_R$ the node-to-node distance parameters of the camouflaged target.
The similarity measurement module measures the distance between the camouflage-target salient feature $X_C$ and the camouflaged-target salient feature $X_R$ through the cosine similarity:

$\cos(X_C, X_R) = \dfrac{X_C \cdot X_R}{\|X_C\|\,\|X_R\|}$;

the similarity measurement module computes the distance $L_s$ between the camouflage-target features and the camouflaged-target features, thereby obtaining the difference between them.
The underwater target detection model comprises a backbone network, a camouflage encoder, a camouflaged encoder, a graph structure module and a similarity measurement module; the camouflage encoder and the camouflaged encoder are connected with the graph structure module, and the camouflage encoder, the camouflaged encoder and the similarity measurement module are connected with the backbone network; the underwater target detection model optimizes the extracted feature maps through the graph structures obtained by the two encoders and the distance calculated by the similarity measurement module.
Using the distance $L_s$ calculated by the similarity measurement module and the network relations established by the graph structure module, the features extracted for node $v_i$ are linked, to different degrees via the distance parameters of the graph structure, to the features related to $v_i$; the feature map is thereby optimized into the target feature map $F_o$, which is fed into two fully connected layers to generate the classification predictions of the candidate boxes and judge whether a target is a camouflage target.
The candidate boxes are adaptively introduced into the prediction discrimination stage. Let $(x_1, y_1, x_2, y_2)$ be the real-bounding-box four-dimensional vector and $(x_{1a}, y_{1a}, x_{2a}, y_{2a})$ a candidate box; then:

$t'_{x1} = (x_1 - x_{1a})/w_a$, $t'_{y1} = (y_1 - y_{1a})/h_a$, $t'_{x2} = (x_2 - x_{2a})/w_a$, $t'_{y2} = (y_2 - y_{2a})/h_a$;

where $t_{x1}$, $t_{x2}$, $t_{y1}$, $t_{y2}$ denote the predicted deviations and $t'_{x1}$, $t'_{x2}$, $t'_{y1}$, $t'_{y2}$ denote the compensations based on the real bounding box; $x_1$, $y_1$, $x_2$, $y_2$, $w_a$, $h_a$ are parameters of the real bounding box and $x_{1a}$, $y_{1a}$, $x_{2a}$, $y_{2a}$ are parameters of the candidate box;

detection uses the candidate boxes ranked in the top N, realizing detection and identification of the camouflage target.
The loss function of the underwater target detection model is:

$l = \alpha_1 l_s + \alpha_2 l_d + \alpha_3 l_{reg} + \alpha_4 l_{cls}$;

where $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ denote the parameters of the candidate-box localization loss, the perceptual loss, the adversarial loss and the classification loss respectively, and $l_{cls}$ denotes a cross-entropy loss function;
the perceptual loss function l d Is composed ofAnd a structural perceptual loss function l k Comprises the following steps:
l k =(1+5*|γ(XY)-GT|)L S ;
wherein XY represents the predicted value, GT represents the label of the truth value machine, L s Representing the distance output by the similarity metric module; network of discriminatorsThe device consists of 5 connected convolution layers, wherein gamma is a network parameter;
the penalty function is:
wherein l s Representing the positioning loss function, h and w represent the length and width of the candidate box, w a And h a Indicating the length and width of the real bounding box.
The invention has the following beneficial effects. Firstly, positive-sample and negative-sample learning is introduced to train the feature extraction network of the model: the feature extraction network obtains sample feature representations through contrastive learning in feature space and acquires the salient features of the marine-organism camouflage target, while a hierarchical environment attention mechanism extracts more related environmental features, solving problems such as the difficulty of extracting target features and insufficient target features. Secondly, latent relation models of the camouflage target and the camouflaged target, fused with the salient environment, are constructed respectively to obtain the dependency between target and background features; occluding objects can thus be converted into useful information, improving the model's comprehension of the scene and thereby the identification accuracy of the camouflage target. Finally, a camouflage target detection network with adaptive box regression is constructed under an adversarial framework, realizing identification of the camouflage target and the camouflaged target through adversarial learning. The method can effectively solve the problem of identifying marine camouflage objects and can be applied to fields such as mariculture, with positive significance for preventing the intrusion of camouflaged foreign objects, maintaining the mariculture environment, realizing autonomous management and improving culture yield.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic structural diagram of an underwater target detection model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a camouflage target identification method for marine organisms comprises the following steps.

Step one: the salient feature extraction network is trained with the positive sample images and negative sample images to generate the feature map F, and candidate boxes are acquired from the feature map F.
The invention adopts a ResNet-50 network as the backbone of the salient feature extraction network, defines the labelled underwater camouflage objects and real organism images as positive sample images, and defines images of other objects irrelevant to the identification as negative sample images. The positive and negative sample images are input into the salient feature extraction network to train its backbone, following formula (1):

$S(f(x), f(x^+)) \ge S(f(x), f(x^-))$ (1)

where $x$ denotes a feature learned by the backbone network, $x^+$ denotes a sample similar to the feature $x$, called a positive sample, $x^-$ denotes a sample dissimilar to the feature $x$, called a negative sample, $f(\cdot)$ denotes the feature-extraction operation of the feature extraction network, and $S(\cdot,\cdot)$ denotes the degree of similarity between samples. Through training on the positive and negative sample images, the features learned by the backbone gradually move closer to the positive samples and away from the negative samples.
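By way of illustration, the following PyTorch sketch implements the contrastive principle of formula (1); the cosine form of $S(\cdot,\cdot)$, the margin value and the truncation of ResNet-50 before its classifier are assumptions, since the patent does not fix these details.

```python
import torch
import torch.nn.functional as F
import torchvision

# Sketch of the contrastive principle S(f(x), f(x+)) >= S(f(x), f(x-)).
# Assumptions: cosine similarity for S, a margin of 0.2, and a ResNet-50
# backbone truncated before its classifier as f.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet50(weights=None).children())[:-1]
)

def contrastive_loss(x, x_pos, x_neg, margin=0.2):
    """Hinge loss pushing S(f(x), f(x+)) above S(f(x), f(x-))."""
    fx = backbone(x).flatten(1)          # f(x)
    fp = backbone(x_pos).flatten(1)      # f(x+), labelled positive sample
    fn = backbone(x_neg).flatten(1)      # f(x-), unlabelled negative sample
    s_pos = F.cosine_similarity(fx, fp)  # S(f(x), f(x+))
    s_neg = F.cosine_similarity(fx, fn)  # S(f(x), f(x-))
    # Loss is zero once the positive similarity exceeds the negative by the margin.
    return F.relu(s_neg - s_pos + margin).mean()
```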
The ResNet-50 network outputs the high-level shape feature map F(h, w, c) of the target, the collective output of a large number of features $x$; the feature map F contains the target salient feature information. Here h and w denote the height and width of the output feature map and c the number of convolution kernels. Target candidate boxes are acquired on F(h, w, c) with a 3 × 3 sliding window. Each sliding window generates a low-dimensional vector that is fed into two fully connected layers (a regression layer and a classification layer); the maximum number of candidate-box proposals per position is denoted k, so the regression layer generates the coordinates of k candidate boxes while the classification layer computes the confidence of each. Candidate boxes with confidence greater than 0.7 are regarded as positive labels, and those with confidence smaller than 0.3 as negative labels. The positive labels promote the training of the salient feature extraction network, while the negative labels exert a suppressing effect on it; the confidences of the generated candidate boxes lie between 0 and 1. A candidate box with confidence below 0.3 is meaningless, while one above 0.7 benefits the subsequent final recognition, a process from coarse recognition to fine recognition.
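The candidate-box generation can be sketched as follows; as is conventional for sliding-window heads, the two fully connected layers applied at every position are realized as 1 × 1 convolutions, and the channel width c = 256 and k = 9 proposals per position are assumed values not fixed by the text.

```python
import torch
import torch.nn as nn

class CandidateHead(nn.Module):
    """Sketch of the 3x3 sliding-window candidate generator.
    k candidate boxes per position; c = 256 channels is an assumption."""
    def __init__(self, c=256, k=9):
        super().__init__()
        self.window = nn.Conv2d(c, c, kernel_size=3, padding=1)  # 3x3 sliding window
        self.reg = nn.Conv2d(c, 4 * k, kernel_size=1)  # regression layer: k box coordinates
        self.cls = nn.Conv2d(c, k, kernel_size=1)      # classification layer: k confidences

    def forward(self, feature_map):
        h = torch.relu(self.window(feature_map))
        boxes = self.reg(h)
        conf = torch.sigmoid(self.cls(h))
        positive = conf > 0.7   # candidate boxes regarded as positive labels
        negative = conf < 0.3   # candidate boxes regarded as negative labels
        return boxes, conf, positive, negative
```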
Step two: the environmental feature information of the camouflage-target features in the feature map F is extracted using the environment attention mechanism of the camouflage encoder, and the environmental feature information of the camouflaged-target features in F is extracted using the environment attention mechanism of the camouflaged encoder.
After the backbone of the salient feature extraction network, the camouflage-target salient features $X_C = \{x_{c1}, x_{c2}, \cdots, x_{cn}\}$ and the camouflaged-target salient features $X_R = \{x_{r1}, x_{r2}, \ldots, x_{rm1}\}$ are obtained, where n denotes the number of camouflage-target features and m1 the number of camouflaged-target features. The regions in an image are not independent: an implicit relation exists between environment information and target information, and this relation can assist in improving target recognition, so the invention acquires environment information to improve the recognition capability for the target. Taking the camouflage-target features as an example and inputting them into the camouflage encoder, local environment feature information is first acquired from the extracted camouflage-target features according to the following formula:

$X_{Cloc} = f_R(X_C; H, W)$ (1)

where $f_R$ denotes the extraction function of local environment features, H and W denote the height and width of the input image, $X_{Cloc}$ denotes the local environment features of the extracted feature map F, and $X_C$ denotes the camouflage-target salient features.

To acquire more environment features, the global-level environment features are aggregated from the image features according to:

$X_{Cglo} = f_G(X_C; H, W)$ (2)

where $f_G$ denotes the extraction function of global environment features and $X_{Cglo}$ denotes the global environment features of the extracted feature map F.

Fusing the local environment feature $X_{Cloc}$ with the global environment attention feature $X_{Cglo}$ yields the environment features:

$X_{Cinr} = f_{conv}([X_{Cloc} : X_{Cglo}])$ (3)

where $f_{conv}$ denotes a convolution operation, [:] denotes concatenation, and $X_{Cinr}$ denotes the extracted environment features.

To capture the dynamic relation between the environment features and the target to a greater extent and improve the contribution of environment feature information to target detection, more relevant and reliable environment information for object detection is obtained through formula (4):

$X_{CT} = f_{inr}(X_C, X_{Cinr}, \Omega[X_C, X_{Cinr}], \Omega[X_{Cinr}, X_{Cinr}])$ (4)

where $\Omega[X_C, X_{Cinr}]$ denotes the correlation between the environment feature $X_{Cinr}$ and the camouflage-target feature $X_C$, $\Omega[X_{Cinr}, X_{Cinr}]$ denotes the correlation among environment features, $f_{inr}$ denotes a dynamic coding function, and $X_{CT}$ denotes the more relevant and reliable environment feature information extracted by the salient feature extraction network. Similarly, the related environment features extracted for the camouflaged-target features are $X_{RT}$.
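A minimal sketch of such an environment attention encoder follows; the concrete forms of $f_R$, $f_G$ and $f_{inr}$ are not specified in the text, so a 3 × 3 convolution, global average pooling and a correlation-gated fusion serve as stand-ins.

```python
import torch
import torch.nn as nn

class EnvironmentAttention(nn.Module):
    """Sketch of the hierarchical environment attention encoder.
    f_R (local), f_G (global) and f_inr (dynamic coding) are stand-ins:
    a small convolution, global average pooling and a gated 1x1 fusion."""
    def __init__(self, c=256):
        super().__init__()
        self.f_r = nn.Conv2d(c, c, 3, padding=1)   # f_R: local environment features
        self.f_g = nn.AdaptiveAvgPool2d(1)         # f_G: global-level aggregation
        self.f_conv = nn.Conv2d(2 * c, c, 1)       # f_conv over [X_Cloc : X_Cglo]
        self.f_inr = nn.Conv2d(2 * c, c, 1)        # f_inr: dynamic coding

    def forward(self, x_c):
        x_loc = self.f_r(x_c)                                  # X_Cloc
        x_glo = self.f_g(x_c).expand_as(x_c)                   # X_Cglo, broadcast to H x W
        x_inr = self.f_conv(torch.cat([x_loc, x_glo], dim=1))  # X_Cinr
        # Omega[X_C, X_Cinr]: correlation between target and environment features.
        omega = torch.sigmoid((x_c * x_inr).sum(dim=1, keepdim=True))
        x_ct = self.f_inr(torch.cat([x_c, omega * x_inr], dim=1))  # X_CT
        return x_ct
```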
Step three: graph structures are established over the environmental feature information and the target salient features in the feature map through the graph structure module, obtaining the network relations of the camouflage target and the camouflaged target.
The invention constructs a graph network from the target salient features and the related environment features. Taking the camouflage target as an example, the extracted camouflage-target salient features are $X_C = \{x_{C1}, x_{C2}, \ldots, x_{Cn}\}$, where n denotes the number of elements of the target salient features, and the obtained camouflage-target environment feature information is $X_{CT} = \{x_{CT1}, x_{CT2}, \ldots, x_{CTm2}\}$, where m2 denotes the number of elements of the related environment features. Each obtained feature, whether environment feature information or target salient feature, is regarded as a node, giving the node set $V = \{v_1, v_2, \ldots, v_a\}$ with a the number of nodes. The graph structure module is built from three GraphConv layers; through them, the set of neighbouring nodes related to node $v_i$ is retrieved and denoted $P(v_i)$, and Y denotes the updated features of a node. The implicit relation between the image target and the background is constructed according to:

$Y^{(m)}(v_i) = \sigma\big(W_1\, Y^{(m-1)}(v_i) + W_2 \cdot \mathrm{AGG}\{Y^{(m-1)}(v_j) : v_j \in P(v_i)\}\big)$ (5)

where m denotes the layer index and may take the values 0, 1, 2; $W_1$ and $W_2$ denote learning parameters; AGG denotes the aggregation function over the neighbouring-node set $P(v_i)$, merging the neighbourhood; $Y^{(m)}(v_i)$ denotes the m-th update of the features of node $v_i$, $v_j$ is a neighbouring node related to $v_i$, and $Y^{(m)}(v_j)$ denotes the m-th update of the features of node $v_j$.

The output feature representation of the nodes is obtained by a linear transformation:

$e_{i,j} = \gamma(W \cdot [v_i : v_j] + d),\quad v_i, v_j \in V$ (6)

where $\gamma(\cdot)$ denotes the sigmoid function and $e_{i,j}$ denotes the distance parameter between nodes $v_i$ and $v_j$, with $v_i, v_j \in V$; W and d denote network learning parameters, and [:] denotes the series connection of two nodes.

The set of node-to-node distance parameters is recorded as $E = \{e_{1,2}, e_{1,3}, \ldots, e_{i,j}\}$: if two nodes have an implicit relation the value is 1, otherwise 0. The network relation between the constructed features can thus be expressed as $Q = (V, E)$, so the graph structure module constructs the implicit relation between image target and background; the constructed network relations of the camouflage-target features and the camouflaged-target features are expressed as $Q_C = (V_C, E_C)$ and $Q_R = (V_R, E_R)$ respectively, where $V_C$ denotes the set of camouflage-target nodes, $E_C$ the node-to-node distance parameters of the camouflage target, $V_R$ the set of camouflaged-target nodes, and $E_R$ those of the camouflaged target. The graph structure module realizes the links between nodes through node distances: after the features of node $v_i$ are extracted from the network, they are linked, to different degrees via the distance parameters of the graph structure, to the features related to $v_i$, further optimizing the feature map.
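By way of illustration, the sketch below implements one GraphConv layer of formula (5) and the edge scores $e_{i,j}$ of formula (6); mean aggregation over $P(v_i)$ and the ReLU nonlinearity are assumptions, since the patent only names an aggregation function. Stacking three such layers gives the m = 0, 1, 2 updates.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One GraphConv layer: a node keeps its own features through W1 and
    aggregates its neighbours P(v_i) through W2 (mean aggregation assumed)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w1 = nn.Linear(d_in, d_out)
        self.w2 = nn.Linear(d_in, d_out)

    def forward(self, nodes, adj):
        # adj: (a, a) 0/1 matrix encoding the implicit relations E
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ nodes / deg                    # aggregate over P(v_i)
        return torch.relu(self.w1(nodes) + self.w2(neigh))

def edge_scores(nodes, w, d):
    """e_ij = sigmoid(W . [v_i : v_j] + d) for all node pairs."""
    a = nodes.size(0)
    vi = nodes.unsqueeze(1).expand(a, a, -1)
    vj = nodes.unsqueeze(0).expand(a, a, -1)
    pairs = torch.cat([vi, vj], dim=-1)  # series connection [v_i : v_j]
    return torch.sigmoid(pairs @ w + d)  # (a, a) distance parameters e_ij
```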
Step four: the feature map obtained in step one is input into the similarity measurement module, latent features are obtained with it, and the candidate boxes generated in step one are optimized to obtain the target feature map, yielding the underwater target detection model.

The feature map is fed to the similarity measurement module, and the distance between the camouflage feature $X_C$ and the camouflaged feature $X_R$ is measured through the cosine similarity:

$\cos(X_C, X_R) = \dfrac{X_C \cdot X_R}{\|X_C\|\,\|X_R\|}$ (7)

The similarity measurement module computes the distance $L_s$ between the camouflage-target features and the camouflaged-target features, thereby obtaining their difference; the feature map F is further optimized through this distance difference.

Through the graph network of target salient features and related environment features constructed by the graph structure module, the invention realizes the links between nodes by node distance: the features of node $v_i$ extracted from the network are linked to the other node features to different degrees through the distance parameters of the graph structure. The similarity measurement module yields the distance between the camouflage features $X_C$ and the latent camouflaged features $X_R$, giving the target feature map $F_o$, which is fed into two fully connected layers (a regression layer and a classification layer) to generate the classification predictions of the candidate boxes and judge whether a target is a camouflage target. The regression and classification layers sit in the prediction-box generation module and serve to distinguish the camouflage target from the camouflaged target.
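A minimal sketch of the similarity measurement, under the assumption that the distance $L_s$ is taken as one minus the cosine similarity of the flattened feature vectors:

```python
import torch
import torch.nn.functional as F

def similarity_distance(x_c, x_r):
    """Cosine-similarity distance L_s between camouflage-target features X_C
    and camouflaged-target features X_R. Defining L_s = 1 - cos(X_C, X_R) is
    an assumption consistent with using L_s as a separation term in the loss."""
    c = F.normalize(x_c.flatten(1), dim=1)
    r = F.normalize(x_r.flatten(1), dim=1)
    return 1.0 - (c * r).sum(dim=1)  # larger when the features differ more
```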
As shown in fig. 2, the underwater target detection model includes a backbone network, a camouflage encoder, a camouflaged encoder, a graph structure module and a similarity measurement module; the camouflage encoder and the camouflaged encoder are connected with the graph structure module, and the camouflage encoder, the camouflaged encoder and the similarity measurement module are connected with the backbone network; the extracted feature maps are optimized through the graph structures obtained by the camouflage and camouflaged encoders and the distance calculated by the similarity measurement module.
Step five: the discriminator is used to judge the accuracy of the predictions of the underwater target detection model against the real labels, i.e. the real bounding boxes, of the labelled positive sample images; the prediction accuracy must satisfy formula (8). The real label is identical to the label obtained by the discriminator, which compares the prediction results against it to achieve the discrimination.

To guarantee prediction accuracy, the invention further optimizes the prediction results through adversarial learning between the camouflage encoder and the camouflaged encoder: the labelled positive sample images are input into the discriminator to judge the correctness of the classification. A discriminator network consisting of 5 connected convolutional layers is introduced, with $\gamma$ its network parameters; the discriminator network acquires the real classification of the image to judge whether the predicted classification is correct and to promote the learning capability of the model. Let XY denote the predicted value and GT the ground-truth label; the structural perceptual loss function $l_k$ can be defined as:

$l_k = (1 + 5 \cdot |\gamma(XY) - GT|)\, L_s$ (8)

where $L_s$ denotes the result, i.e. the distance, output by the similarity measurement module.
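The discriminator and the structural perceptual loss of formula (8) can be sketched as follows; the kernel sizes, strides and channel widths of the 5 convolutional layers, and the sigmoid on the discriminator output, are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

# Discriminator of 5 connected convolutional layers; kernel sizes, strides and
# channel widths are assumed values.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),
)

def structure_perceptual_loss(prediction, gt_label, l_s):
    """l_k = (1 + 5 * |gamma(XY) - GT|) * L_s, where gamma(XY) is the
    discriminator score of the prediction XY and GT the ground-truth label."""
    gamma_xy = torch.sigmoid(discriminator(prediction))
    weight = 1.0 + 5.0 * (gamma_xy - gt_label).abs()
    return weight.mean() * l_s
```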
To achieve adversarial learning between the camouflage target and the camouflaged target, an adversarial loss function is defined.
the underwater object has a real boundary frame fuzzy phenomenon due to the shielding of foreign matters, and the prediction precision of the underwater target detection model is seriously influenced. In order to solve the problem, the invention introduces the candidate frame self-adaptation into a prediction discrimination stage, realizes the self-adaptation of the boundary of the prediction frame and optimizes the prediction result. Is provided withFor a true boundary four-dimensional vector, the candidate box may be represented as (x) 1a ,y 1a ,x 2a ,y 2a ) And then:
wherein, t x1 、t x2 、t y1 、t y2 Denotes the predicted deviation, t' x1 、t′ x2 、t′ y1 、t′ y2 Representing a true bounding box based compensation. x is the number of 1 、y 1 、x 2 、y 2 、w a 、h a As parameters of the real bounding box, x 1a 、y 1a 、x 2a 、y 2a Is a parameter of the candidate box. The predictions obtained through the above processes are not all effective, so that further screening is needed, and the method uses the candidate boxes of N before ranking for detection, and finally realizes the detection and identification of the target. The end-to-end mode candidate box shape prediction loss using the multitask loss proposed by the present invention can be expressed as:
wherein l s Indicating the positioning loss, h and w respectively indicating the length and width of the predicted bounding box, and w a And h a Indicating the length and width of the real bounding box.
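The corner-offset compensations described above can be sketched as follows; the exact parameterisation is an assumption in the style of standard two-stage detectors, normalising each corner deviation by the real-box width $w_a$ or height $h_a$.

```python
def encode_box_offsets(real_box, candidate_box):
    """Corner offsets between the real bounding box (x1, y1, x2, y2) and a
    candidate box (x1a, y1a, x2a, y2a), normalised by the real-box size.
    A hypothetical parameterisation; the patent does not print the formula."""
    x1, y1, x2, y2 = real_box
    x1a, y1a, x2a, y2a = candidate_box
    w_a, h_a = x2 - x1, y2 - y1      # width and height of the real bounding box
    t_x1 = (x1a - x1) / w_a          # left-corner deviation
    t_y1 = (y1a - y1) / h_a          # top-corner deviation
    t_x2 = (x2a - x2) / w_a          # right-corner deviation
    t_y2 = (y2a - y2) / h_a          # bottom-corner deviation
    return t_x1, t_y1, t_x2, t_y2
```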
The loss function of the entire recognition framework can be expressed as:

$l = \alpha_1 l_s + \alpha_2 l_d + \alpha_3 l_{reg} + \alpha_4 l_{cls}$ (12)

where $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ denote the parameters of the candidate-box localization loss, the perceptual loss, the adversarial loss and the classification loss respectively, and $l_{cls}$ denotes the cross-entropy loss function, a loss commonly used for classification.
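Finally, a sketch of how the four loss terms of formula (12) are combined; the equal default weights are placeholders, since the patent leaves $\alpha_1, \ldots, \alpha_4$ as tunable parameters.

```python
def total_loss(l_s, l_d, l_reg, l_cls, alphas=(1.0, 1.0, 1.0, 1.0)):
    """l = a1*l_s + a2*l_d + a3*l_reg + a4*l_cls.
    Equal weights are placeholder values, not fixed by the patent."""
    a1, a2, a3, a4 = alphas
    return a1 * l_s + a2 * l_d + a3 * l_reg + a4 * l_cls
```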
Step six: the underwater target detection model is trained according to the above process; through a large number of labelled positive sample images and unlabelled negative sample images, the weights, i.e. the parameters of the whole network, are obtained, finally yielding an underwater target detection model suitable for underwater camouflage target detection. The weights are used to extract features from the input images.
Step seven: a new underwater image is input into the trained underwater target detection model, which outputs the detection result, realizing identification of the camouflage target and the camouflaged target in the new underwater image.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A camouflage target identification method for marine organisms, characterized by comprising the following steps:
step one: training the salient feature extraction network with the positive sample images and negative sample images to generate a feature map F, and obtaining candidate boxes from the feature map F;
step two: extracting the environmental feature information of the camouflage-target features in the feature map F using the environment attention mechanism of the camouflage encoder, and extracting the environmental feature information of the camouflaged-target features in the feature map F using the environment attention mechanism of the camouflaged encoder;
step three: establishing graph structures over the environmental feature information and the target salient features in the feature map through the graph structure module, obtaining the network relations of the camouflage target and the camouflaged target respectively;
step four: inputting the feature map obtained in step one into the similarity measurement module, obtaining latent features with it, and optimizing the candidate boxes generated in step one to obtain the target feature map, yielding the underwater target detection model;
step five: using a discriminator to judge the accuracy of the predictions of the underwater target detection model against the real labels of the labelled positive sample images;
step six: repeating steps one to five over a large number of labelled positive sample images and unlabelled negative sample images to train the underwater target detection model and obtain its weights, which are used to extract features from the input images, finally yielding the trained underwater target detection model;
step seven: inputting a new underwater image into the trained underwater target detection model to realize identification of the camouflage target and the camouflaged target.
2. The camouflage target identification method for marine organisms according to claim 1, characterized in that the backbone of the salient feature extraction network is a ResNet-50 network, the positive sample images are labelled underwater camouflage objects and real organism images, and the negative sample images are images of other objects irrelevant to the identification; the positive and negative sample images are input into the salient feature extraction network to train its backbone according to:

$S(f(x), f(x^+)) \ge S(f(x), f(x^-))$;

where $x$ denotes a feature learned by the backbone network, $x^+$ denotes a sample similar to the feature $x$, $x^-$ denotes a sample dissimilar to the feature $x$, $f(\cdot)$ denotes the feature-extraction operation of the feature extraction network, and $S(\cdot,\cdot)$ denotes the degree of similarity between samples;

the ResNet-50 network outputs a feature map F(h, w, c) of the target, which is the collective output of a large number of features $x$; h and w denote the height and width of the output feature map, and c denotes the number of convolution kernels.
3. The camouflage target identification method for marine organisms according to claim 2, characterized in that the candidate boxes are generated as follows: target candidate boxes are acquired on the feature map F(h, w, c) with a 3 × 3 sliding window; each sliding window generates a low-dimensional vector fed into two fully connected layers comprising a regression layer and a classification layer; the maximum number of candidate-box proposals per position is denoted k, the regression layer generates the coordinates of k candidate boxes, and the classification layer computes the confidence of each candidate box; candidate boxes with confidence greater than 0.7 are regarded as positive labels and those with confidence smaller than 0.3 as negative labels.
4. The camouflage target identification method for marine organisms according to any one of claims 1 to 3, characterized in that the environment attention mechanism of the camouflage encoder in step two works as follows:

from the camouflage-target salient feature $X_C$ extracted by the salient feature extraction network, local environment feature information is acquired:

$X_{Cloc} = f_R(X_C; H, W)$;

where $f_R$ denotes the extraction function of local environment features, H and W denote the height and width of the input image, and $X_{Cloc}$ denotes the local environment features of the extracted feature map F;

the global-level environment features are aggregated from the image features as: $X_{Cglo} = f_G(X_C; H, W)$;

where $f_G$ denotes the extraction function of global environment features and $X_{Cglo}$ denotes the global environment features of the extracted feature map F;

the environment features obtained by fusing the local environment feature $X_{Cloc}$ with the global environment attention feature $X_{Cglo}$ are:

$X_{Cinr} = f_{conv}([X_{Cloc} : X_{Cglo}])$;

where $f_{conv}$ denotes a convolution operation, [:] denotes concatenation, and $X_{Cinr}$ denotes the extracted environment features;

more relevant and reliable environment information for object detection is obtained as:

$X_{CT} = f_{inr}(X_C, X_{Cinr}, \Omega[X_C, X_{Cinr}], \Omega[X_{Cinr}, X_{Cinr}])$;

where $\Omega[X_C, X_{Cinr}]$ denotes the correlation between the environment feature $X_{Cinr}$ and the camouflage-target feature $X_C$, $\Omega[X_{Cinr}, X_{Cinr}]$ denotes the correlation among environment features, $f_{inr}$ denotes a dynamic coding function, and $X_{CT}$ denotes the more relevant and reliable environment feature information extracted by the salient feature extraction network.
5. The camouflage target identification method for marine organisms according to claim 4, characterized in that the graph structure module establishes the graph structure of the camouflage target as follows:

the obtained features comprising the camouflage-target salient features $X_C$ and the camouflage-target environment feature information $X_{CT}$ are each regarded as a node, giving the node set $V = \{v_1, v_2, \ldots, v_a\}$, where a denotes the number of nodes; the graph structure module comprises three GraphConv layers, through which the set of neighbouring nodes related to node $v_i$ is retrieved and denoted $P(v_i)$; Y denotes the updated features of node $v_i$; the implicit relation between the image target and the background is:

$Y^{(m)}(v_i) = \sigma\big(W_1\, Y^{(m-1)}(v_i) + W_2 \cdot \mathrm{AGG}\{Y^{(m-1)}(v_j) : v_j \in P(v_i)\}\big)$;

where m denotes the layer index and may take the values 0, 1, 2; $W_1$ and $W_2$ denote learning parameters; AGG denotes the aggregation function over the neighbouring-node set $P(v_i)$, merging the neighbourhood; $Y^{(m)}(v_i)$ denotes the m-th update of the features of node $v_i$, $v_j$ is a neighbouring node related to $v_i$, and $Y^{(m)}(v_j)$ denotes the m-th update of the features of node $v_j$;

the output features of the nodes are obtained by a linear transformation: $e_{i,j} = \gamma(W \cdot [v_i : v_j] + d),\ v_i, v_j \in V$;

where $\gamma(\cdot)$ denotes the sigmoid function, $e_{i,j}$ denotes the distance parameter between nodes $v_i$ and $v_j$ with $v_i, v_j \in V$; W and d denote network learning parameters, and [:] denotes the series connection of two nodes.
6. The camouflage target identification method for marine organisms according to claim 5, characterized in that the network relations are established as follows:

the set of node-to-node distance parameters is recorded as $E = \{e_{1,2}, e_{1,3}, \ldots, e_{i,j}\}$; if two nodes have an implicit relation the value is 1, and if they have no relation the value is 0; the network relation between the constructed features is expressed as $Q = (V, E)$;

the constructed network relations of the camouflage-target features and the camouflaged-target features are expressed as $Q_C = (V_C, E_C)$ and $Q_R = (V_R, E_R)$ respectively, where $V_C$ denotes the set of camouflage-target nodes, $E_C$ the node-to-node distance parameters of the camouflage target, $V_R$ the set of camouflaged-target nodes, and $E_R$ the node-to-node distance parameters of the camouflaged target.
7. The camouflage target identification method for marine organisms according to claim 1, 4 or 5, characterized in that the similarity measurement module measures the distance between the camouflage-target salient feature $X_C$ and the camouflaged-target salient feature $X_R$ through the cosine similarity:

$\cos(X_C, X_R) = \dfrac{X_C \cdot X_R}{\|X_C\|\,\|X_R\|}$;

the similarity measurement module computes the distance $L_s$ between the camouflage-target features and the camouflaged-target features, thereby obtaining the difference between them.
8. The camouflage target identification method for marine organisms according to claim 7, characterized in that the underwater target detection model comprises a backbone network, a camouflage encoder, a camouflaged encoder, a graph structure module and a similarity measurement module; the camouflage encoder and the camouflaged encoder are connected with the graph structure module, and the camouflage encoder, the camouflaged encoder and the similarity measurement module are connected with the backbone network; the underwater target detection model optimizes the extracted feature maps through the graph structures obtained by the two encoders and the distance calculated by the similarity measurement module;

using the distance $L_s$ calculated by the similarity measurement module and the network relations established by the graph structure module, the features extracted for node $v_i$ are linked, to different degrees via the distance parameters of the graph structure, to the features related to $v_i$; the feature map is thereby optimized into the target feature map $F_o$, which is fed into two fully connected layers to generate the classification predictions of the candidate boxes and judge whether a target is a camouflage target.
9. The camouflage target identification method for marine organisms according to claim 1 or 8, characterized in that the candidate boxes are adaptively introduced into the prediction discrimination stage: let $(x_1, y_1, x_2, y_2)$ be the real-bounding-box four-dimensional vector and $(x_{1a}, y_{1a}, x_{2a}, y_{2a})$ a candidate box; then:

$t'_{x1} = (x_1 - x_{1a})/w_a$, $t'_{y1} = (y_1 - y_{1a})/h_a$, $t'_{x2} = (x_2 - x_{2a})/w_a$, $t'_{y2} = (y_2 - y_{2a})/h_a$;

where $t_{x1}$, $t_{x2}$, $t_{y1}$, $t_{y2}$ denote the predicted deviations and $t'_{x1}$, $t'_{x2}$, $t'_{y1}$, $t'_{y2}$ denote the compensations based on the real bounding box; $x_1$, $y_1$, $x_2$, $y_2$, $w_a$, $h_a$ are parameters of the real bounding box and $x_{1a}$, $y_{1a}$, $x_{2a}$, $y_{2a}$ are parameters of the candidate box;

detection uses the candidate boxes ranked in the top N, realizing detection and identification of the camouflage target.
10. The camouflage target identification method for marine organisms according to claim 9, characterized in that the loss function of the underwater target detection model is:

$l = \alpha_1 l_s + \alpha_2 l_d + \alpha_3 l_{reg} + \alpha_4 l_{cls}$;

where $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ denote the parameters of the candidate-box localization loss, the perceptual loss, the adversarial loss and the classification loss respectively, and $l_{cls}$ denotes a cross-entropy loss function;

the perceptual loss function $l_d$ is composed of the adversarial loss and the structural perceptual loss function $l_k$, with:

$l_k = (1 + 5 \cdot |\gamma(XY) - GT|)\, L_s$;

where XY denotes the predicted value, GT denotes the ground-truth label, and $L_s$ denotes the distance output by the similarity measurement module; the discriminator network consists of 5 connected convolutional layers, with $\gamma$ its network parameters;

the localization loss function $l_s$ is defined over h and w, the height and width of the candidate box, and $w_a$ and $h_a$, the width and height of the real bounding box.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---
CN202210392176.0A (CN114913409A) | 2022-04-14 | 2022-04-14 | Camouflage target identification method for marine organisms
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---
CN202210392176.0A (CN114913409A) | 2022-04-14 | 2022-04-14 | Camouflage target identification method for marine organisms
Publications (1)
Publication Number | Publication Date |
---|---
CN114913409A | 2022-08-16
Family
ID=82764234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---
CN202210392176.0A (CN114913409A, pending) | Camouflage target identification method for marine organisms | 2022-04-14 | 2022-04-14
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690451A (en) * | 2022-11-14 | 2023-02-03 | 南京航空航天大学 | Combined detection method and system for camouflage object and salient object |
CN117612139A (en) * | 2023-12-19 | 2024-02-27 | 昆明盛嗳谐好科技有限公司 | Scene target detection method and system based on deep learning and electronic equipment |
Similar Documents
Publication | Title
---|---
CN108460356B | Face image automatic processing system based on monitoring system
CN111583263B | Point cloud segmentation method based on joint dynamic graph convolution
CN110633632A | Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN111783831B | Complex image accurate classification method based on multi-source multi-label shared subspace learning
CN111898736B | Efficient pedestrian re-identification method based on attribute perception
CN106951923B | Robot three-dimensional shape recognition method based on multi-view information fusion
CN114913409A | Camouflage target identification method for marine organisms
CN108520215B | Single-sample face recognition method based on multi-scale joint feature encoder
Maire et al. | A convolutional neural network for automatic analysis of aerial imagery
CN115147864B | Infrared human behavior recognition method based on cooperative heterogeneous deep learning network
CN106815323A | Cross-domain visual search method based on saliency detection
CN110991257A | Polarization SAR oil spill detection method based on feature fusion and SVM
CN114548256A | Small sample rare bird identification method based on comparative learning
CN115937552A | Image matching method based on fusion of manual features and depth features
CN117333948A | End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN115311678A | Background suppression and DCNN combined infrared video airport flying bird detection method
CN112529025A | Data processing method and device
CN112801179A | Twin classifier certainty maximization method for cross-domain complex visual task
CN115546668A | Marine organism detection method and device and unmanned aerial vehicle
CN114863103A | Unmanned underwater vehicle identification method, equipment and storage medium
Li | Construction method of swimming pool intelligent assisted drowning detection model based on computer feature pyramid networks
Al Duhayyim et al. | Intelligent deep learning based automated fish detection model for UWSN
CN110163106A | Integrated tattoo detection and recognition method and system
US20240371145A1 | Method and System for Optimization of a Human-Machine Team for Geographic Region Digitization
CN117437234B | Aerial photo ground object classification and change detection method based on graph neural network
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |