CN114913409A - Camouflage target identification method for marine organisms
- Publication number: CN114913409A
- Application number: CN202210392176.0A
- Authority
- CN
- China
- Prior art keywords
- target
- disguised
- representing
- feature
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
- Farming Of Fish And Shellfish (AREA)
- Image Processing (AREA)
Abstract
The invention provides a marine-organism-oriented camouflage target identification method, which solves the technical problem that existing camouflage detection models are not suited to identifying camouflaged objects in marine scenes. The method comprises the following steps: training a salient feature extraction network with positive and negative sample images to generate a feature map; extracting the environmental feature information of the camouflage-target features and of the camouflaged-target features in the feature map with an environment attention mechanism; establishing a graph structure over the environmental feature information and the target salient features; inputting the feature map into a similarity measurement module to obtain latent features and optimize the candidate boxes, yielding a target feature map and an underwater target detection model; judging the prediction accuracy of the underwater target detection model with a discriminator; and training the underwater target detection model to realize identification of the camouflage target. The invention distinguishes the camouflage target from the camouflaged target through adversarial learning, simultaneously alleviates the problem of blurred real bounding boxes, and improves the identification precision of underwater camouflage targets.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a camouflage target identification method for marine organisms.
Background
Camouflage is a basic skill of many creatures in nature and is also a starting point for designing and manufacturing various types of biomimetic robots. A lion conceals itself in the grassland to stalk its prey, a chameleon adjusts its colour according to the surrounding environment, and a marine biomimetic robot disguised in the shape of a marine animal can survey the sea and study marine organisms at close range. Background-similar camouflage object detection refers to segmenting the camouflaged object from a similar-looking environment; the similarity between the camouflaged target and its surroundings poses a significant challenge to detection. Many scholars have realized camouflage target detection based on deep learning and obtained notable results, and camouflage target detection is now widely applied in military, agricultural and medical fields. It has great research significance and broad application prospects in marine economic-crop fishing, marine organism research, marine disaster relief, marine resource exploration and the like.
However, existing camouflage object detection techniques differ greatly from the problem studied by the present invention. With the continuous improvement of robot manufacturing technology, the shape of an underwater biomimetic robot has become very similar to that of the animal it imitates; existing algorithms struggle to distinguish their features and cannot accurately identify the camouflage target. Meanwhile, the marine environment is complex and varied, and the extraction of target features is affected by many factors such as darkness, turbidity, foreign-matter occlusion and background colours similar to the target. These factors leave the camouflage-target features incomplete and make accurate, effective identification very challenging, so existing camouflage detection models are not suited to identifying camouflaged objects in marine scenes.
The marine environment is complex and the light dim, and marine camouflage objects exploit exactly these phenomena to deceive the visual system of an observation device; factors such as foreign-matter occlusion and water fluctuation further increase the difficulty of target feature extraction.
Disclosure of Invention
Aiming at the technical problem that existing camouflage detection models are not suited to identifying camouflaged objects in marine scenes, the invention provides a camouflage target identification method for marine organisms.
To achieve this purpose, the technical scheme of the invention is realized as follows. A camouflage target identification method for marine organisms comprises the following steps:
Step one: training the salient feature extraction network with the positive sample images and negative sample images to generate a feature map F, and obtaining candidate boxes from the feature map F;
Step two: extracting the environmental feature information of the camouflage-target features in the feature map F using the environment attention mechanism of the camouflage encoder, and extracting the environmental feature information of the camouflaged-target features in the feature map F using the environment attention mechanism of the camouflaged encoder;
Step three: establishing graph structures over the environmental feature information and the target salient features in the feature map through the graph structure module, obtaining the network relations of the camouflage target and the camouflaged target respectively;
Step four: inputting the feature map obtained in step one into the similarity measurement module, obtaining latent features with it, and optimizing the candidate boxes generated in step one to obtain the target feature map, yielding the underwater target detection model;
Step five: using a discriminator to judge the accuracy of the predictions of the underwater target detection model against the real labels of the labelled positive sample images;
Step six: repeating steps one to five over a large number of labelled positive sample images and unlabelled negative sample images to train the underwater target detection model and obtain its weights, which are used to extract features from the input images, finally yielding the trained underwater target detection model;
Step seven: inputting a new underwater image into the trained underwater target detection model to realize identification of the camouflage target and the camouflaged target.
The backbone of the salient feature extraction network is a ResNet-50 network; the positive sample images are labelled underwater camouflage objects and real organism images, and the negative sample images are images of other objects irrelevant to the identification. The positive and negative sample images are input into the salient feature extraction network to train its backbone according to the principle:
$S(f(x), f(x^+)) \ge S(f(x), f(x^-))$;

where $x$ denotes a feature learned by the backbone network, $x^+$ denotes a sample similar to the feature $x$, $x^-$ denotes a sample dissimilar to the feature $x$, $f(\cdot)$ denotes the feature-extraction operation of the feature extraction network, and $S(\cdot,\cdot)$ denotes the degree of similarity between samples;

the ResNet-50 network outputs a feature map F(h, w, c) of the target, which is the collective output of a large number of features $x$; h and w denote the height and width of the output feature map, and c denotes the number of convolution kernels.
The candidate boxes are generated as follows: target candidate boxes are acquired on the feature map F(h, w, c) with a 3 × 3 sliding window; each sliding window generates a low-dimensional vector fed into two fully connected layers comprising a regression layer and a classification layer. The maximum number of candidate-box proposals per position is denoted k; the regression layer generates the coordinates of the k candidate boxes, and the classification layer computes the confidence of each candidate box. Candidate boxes with confidence greater than 0.7 are regarded as positive labels, and those with confidence smaller than 0.3 as negative labels.
The environment attention mechanism of the camouflage encoder in step two works as follows:

from the camouflage-target salient feature $X_C$ extracted by the salient feature extraction network, local environment feature information is acquired:

$X_{Cloc} = f_R(X_C; H, W)$;

where $f_R$ denotes the extraction function of local environment features, H and W denote the height and width of the input image, and $X_{Cloc}$ denotes the local environment features of the extracted feature map F;

the global-level environment features are aggregated from the image features as: $X_{Cglo} = f_G(X_C; H, W)$;

where $f_G$ denotes the extraction function of global environment features and $X_{Cglo}$ denotes the global environment features of the extracted feature map F;

the environment features obtained by fusing the local environment feature $X_{Cloc}$ with the global environment attention feature $X_{Cglo}$ are:

$X_{Cinr} = f_{conv}([X_{Cloc} : X_{Cglo}])$;

where $f_{conv}$ denotes a convolution operation, [:] denotes concatenation, and $X_{Cinr}$ denotes the extracted environment features;

more relevant and reliable environment information for object detection is then obtained:

$X_{CT} = f_{inr}(X_C, X_{Cinr}, \Omega[X_C, X_{Cinr}], \Omega[X_{Cinr}, X_{Cinr}])$;

where $\Omega[X_C, X_{Cinr}]$ denotes the correlation between the environment feature $X_{Cinr}$ and the camouflage-target feature $X_C$, $\Omega[X_{Cinr}, X_{Cinr}]$ denotes the correlation among environment features, $f_{inr}$ denotes a dynamic coding function, and $X_{CT}$ denotes the more relevant and reliable environment feature information extracted by the salient feature extraction network.
The graph structure module establishes the graph structure of the camouflage target as follows:

the obtained features, comprising the camouflage-target salient features $X_C$ and the camouflage-target environment feature information $X_{CT}$, are each regarded as a node, giving the node set $V = \{v_1, v_2, \ldots, v_a\}$, where a denotes the number of nodes; the graph structure module comprises three GraphConv layers, through which the set of neighbouring nodes related to node $v_i$ is retrieved and denoted $P(v_i)$; Y denotes the updated features of node $v_i$; the implicit relation between the image target and the background is:

$Y^{(m)}(v_i) = \sigma\big(W_1\, Y^{(m-1)}(v_i) + W_2 \cdot \mathrm{AGG}\{Y^{(m-1)}(v_j) : v_j \in P(v_i)\}\big)$;

where m denotes the layer index and may take the values 0, 1, 2; $W_1$ and $W_2$ denote learning parameters; AGG denotes the aggregation function over the neighbouring-node set $P(v_i)$, merging the neighbourhood; $Y^{(m)}(v_i)$ denotes the m-th update of the features of node $v_i$, $v_j$ is a neighbouring node related to $v_i$, and $Y^{(m)}(v_j)$ denotes the m-th update of the features of node $v_j$;

the output features of the nodes are obtained by a linear transformation: $e_{i,j} = \gamma(W \cdot [v_i : v_j] + d),\ v_i, v_j \in V$;

where $\gamma(\cdot)$ denotes the sigmoid function, $e_{i,j}$ denotes the distance parameter between nodes $v_i$ and $v_j$ with $v_i, v_j \in V$; W and d denote network learning parameters, and [:] denotes the series connection of two nodes.
The network relations are established as follows:

the set of node-to-node distance parameters is recorded as $E = \{e_{1,2}, e_{1,3}, \ldots, e_{i,j}\}$; if two nodes have an implicit relation the value is 1, and if they have no relation the value is 0; the network relation between the constructed features is expressed as $Q = (V, E)$;

the constructed network relations of the camouflage-target features and the camouflaged-target features are expressed as $Q_C = (V_C, E_C)$ and $Q_R = (V_R, E_R)$ respectively, where $V_C$ denotes the set of camouflage-target nodes, $E_C$ the node-to-node distance parameters of the camouflage target, $V_R$ the set of camouflaged-target nodes, and $E_R$ the node-to-node distance parameters of the camouflaged target.
The similarity measurement module measures the distance between the camouflage-target salient feature $X_C$ and the camouflaged-target salient feature $X_R$ through the cosine similarity:

$\cos(X_C, X_R) = \dfrac{X_C \cdot X_R}{\|X_C\|\,\|X_R\|}$;

the similarity measurement module computes the distance $L_s$ between the camouflage-target features and the camouflaged-target features, thereby obtaining the difference between them.
The underwater target detection model comprises a backbone network, a camouflage encoder, a camouflaged encoder, a graph structure module and a similarity measurement module; the camouflage encoder and the camouflaged encoder are connected with the graph structure module, and the camouflage encoder, the camouflaged encoder and the similarity measurement module are connected with the backbone network; the underwater target detection model optimizes the extracted feature maps through the graph structures obtained by the two encoders and the distance calculated by the similarity measurement module.
Using the distance $L_s$ calculated by the similarity measurement module and the network relations established by the graph structure module, the features extracted for node $v_i$ are linked, to different degrees via the distance parameters of the graph structure, to the features related to $v_i$; the feature map is thereby optimized into the target feature map $F_o$, which is fed into two fully connected layers to generate the classification predictions of the candidate boxes and judge whether a target is a camouflage target.
The candidate boxes are adaptively introduced into the prediction discrimination stage. Let $(x_1, y_1, x_2, y_2)$ be the real-bounding-box four-dimensional vector and $(x_{1a}, y_{1a}, x_{2a}, y_{2a})$ a candidate box; then:

$t'_{x1} = (x_1 - x_{1a})/w_a$, $t'_{y1} = (y_1 - y_{1a})/h_a$, $t'_{x2} = (x_2 - x_{2a})/w_a$, $t'_{y2} = (y_2 - y_{2a})/h_a$;

where $t_{x1}$, $t_{x2}$, $t_{y1}$, $t_{y2}$ denote the predicted deviations and $t'_{x1}$, $t'_{x2}$, $t'_{y1}$, $t'_{y2}$ denote the compensations based on the real bounding box; $x_1$, $y_1$, $x_2$, $y_2$, $w_a$, $h_a$ are parameters of the real bounding box and $x_{1a}$, $y_{1a}$, $x_{2a}$, $y_{2a}$ are parameters of the candidate box;

detection uses the candidate boxes ranked in the top N, realizing detection and identification of the camouflage target.
The loss function of the underwater target detection model is:

$l = \alpha_1 l_s + \alpha_2 l_d + \alpha_3 l_{reg} + \alpha_4 l_{cls}$;

where $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ denote the parameters of the candidate-box localization loss, the perceptual loss, the adversarial loss and the classification loss respectively, and $l_{cls}$ denotes a cross-entropy loss function;
the perceptual loss function l d Is composed ofAnd a structural perceptual loss function l k Comprises the following steps:
l k =(1+5*|γ(XY)-GT|)L S ;
wherein XY represents the predicted value, GT represents the label of the truth value machine, L s Representing the distance output by the similarity metric module; network of discriminatorsThe device consists of 5 connected convolution layers, wherein gamma is a network parameter;
the penalty function is:
wherein l s Representing the positioning loss function, h and w represent the length and width of the candidate box, w a And h a Indicating the length and width of the real bounding box.
The invention has the following beneficial effects. Firstly, positive-sample and negative-sample learning is introduced to train the feature extraction network of the model: the feature extraction network obtains sample feature representations through contrastive learning in feature space and acquires the salient features of the marine-organism camouflage target, while a hierarchical environment attention mechanism extracts more related environmental features, solving problems such as the difficulty of extracting target features and insufficient target features. Secondly, latent relation models of the camouflage target and the camouflaged target, fused with the salient environment, are constructed respectively to obtain the dependency between target and background features; occluding objects can thus be converted into useful information, improving the model's comprehension of the scene and thereby the identification accuracy of the camouflage target. Finally, a camouflage target detection network with adaptive box regression is constructed under an adversarial framework, realizing identification of the camouflage target and the camouflaged target through adversarial learning. The method can effectively solve the problem of identifying marine camouflage objects and can be applied to fields such as mariculture, with positive significance for preventing the intrusion of camouflaged foreign objects, maintaining the mariculture environment, realizing autonomous management and improving culture yield.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a schematic structural diagram of an underwater target detection model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, a camouflage target identification method for marine organisms comprises the following steps.

Step one: the salient feature extraction network is trained with the positive sample images and negative sample images to generate the feature map F, and candidate boxes are acquired from the feature map F.
The invention adopts a ResNet-50 network as the backbone of the salient feature extraction network, defines the labelled underwater camouflage objects and real organism images as positive sample images, and defines images of other objects irrelevant to the identification as negative sample images. The positive and negative sample images are input into the salient feature extraction network to train its backbone, following formula (1):

$S(f(x), f(x^+)) \ge S(f(x), f(x^-))$ (1)

where $x$ denotes a feature learned by the backbone network, $x^+$ denotes a sample similar to the feature $x$, called a positive sample, $x^-$ denotes a sample dissimilar to the feature $x$, called a negative sample, $f(\cdot)$ denotes the feature-extraction operation of the feature extraction network, and $S(\cdot,\cdot)$ denotes the degree of similarity between samples. Through training on the positive and negative sample images, the features learned by the backbone gradually move closer to the positive samples and away from the negative samples.
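By way of illustration, the following PyTorch sketch implements the contrastive principle of formula (1); the cosine form of $S(\cdot,\cdot)$, the margin value and the truncation of ResNet-50 before its classifier are assumptions, since the patent does not fix these details.

```python
import torch
import torch.nn.functional as F
import torchvision

# Sketch of the contrastive principle S(f(x), f(x+)) >= S(f(x), f(x-)).
# Assumptions: cosine similarity for S, a margin of 0.2, and a ResNet-50
# backbone truncated before its classifier as f.
backbone = torch.nn.Sequential(
    *list(torchvision.models.resnet50(weights=None).children())[:-1]
)

def contrastive_loss(x, x_pos, x_neg, margin=0.2):
    """Hinge loss pushing S(f(x), f(x+)) above S(f(x), f(x-))."""
    fx = backbone(x).flatten(1)          # f(x)
    fp = backbone(x_pos).flatten(1)      # f(x+), labelled positive sample
    fn = backbone(x_neg).flatten(1)      # f(x-), unlabelled negative sample
    s_pos = F.cosine_similarity(fx, fp)  # S(f(x), f(x+))
    s_neg = F.cosine_similarity(fx, fn)  # S(f(x), f(x-))
    # Loss is zero once the positive similarity exceeds the negative by the margin.
    return F.relu(s_neg - s_pos + margin).mean()
```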
The ResNet-50 network outputs the high-level shape feature map F(h, w, c) of the target, the collective output of a large number of features $x$; the feature map F contains the target salient feature information. Here h and w denote the height and width of the output feature map and c the number of convolution kernels. Target candidate boxes are acquired on F(h, w, c) with a 3 × 3 sliding window. Each sliding window generates a low-dimensional vector that is fed into two fully connected layers (a regression layer and a classification layer); the maximum number of candidate-box proposals per position is denoted k, so the regression layer generates the coordinates of k candidate boxes while the classification layer computes the confidence of each. Candidate boxes with confidence greater than 0.7 are regarded as positive labels, and those with confidence smaller than 0.3 as negative labels. The positive labels promote the training of the salient feature extraction network, while the negative labels exert a suppressing effect on it; the confidences of the generated candidate boxes lie between 0 and 1. A candidate box with confidence below 0.3 is meaningless, while one above 0.7 benefits the subsequent final recognition, a process from coarse recognition to fine recognition.
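The candidate-box generation can be sketched as follows; as is conventional for sliding-window heads, the two fully connected layers applied at every position are realized as 1 × 1 convolutions, and the channel width c = 256 and k = 9 proposals per position are assumed values not fixed by the text.

```python
import torch
import torch.nn as nn

class CandidateHead(nn.Module):
    """Sketch of the 3x3 sliding-window candidate generator.
    k candidate boxes per position; c = 256 channels is an assumption."""
    def __init__(self, c=256, k=9):
        super().__init__()
        self.window = nn.Conv2d(c, c, kernel_size=3, padding=1)  # 3x3 sliding window
        self.reg = nn.Conv2d(c, 4 * k, kernel_size=1)  # regression layer: k box coordinates
        self.cls = nn.Conv2d(c, k, kernel_size=1)      # classification layer: k confidences

    def forward(self, feature_map):
        h = torch.relu(self.window(feature_map))
        boxes = self.reg(h)
        conf = torch.sigmoid(self.cls(h))
        positive = conf > 0.7   # candidate boxes regarded as positive labels
        negative = conf < 0.3   # candidate boxes regarded as negative labels
        return boxes, conf, positive, negative
```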
Step two: the environmental feature information of the camouflage-target features in the feature map F is extracted using the environment attention mechanism of the camouflage encoder, and the environmental feature information of the camouflaged-target features in F is extracted using the environment attention mechanism of the camouflaged encoder.
After the backbone of the salient feature extraction network, the camouflage-target salient features $X_C = \{x_{c1}, x_{c2}, \cdots, x_{cn}\}$ and the camouflaged-target salient features $X_R = \{x_{r1}, x_{r2}, \ldots, x_{rm1}\}$ are obtained, where n denotes the number of camouflage-target features and m1 the number of camouflaged-target features. The regions in an image are not independent: an implicit relation exists between environment information and target information, and this relation can assist in improving target recognition, so the invention acquires environment information to improve the recognition capability for the target. Taking the camouflage-target features as an example and inputting them into the camouflage encoder, local environment feature information is first acquired from the extracted camouflage-target features according to the following formula:

$X_{Cloc} = f_R(X_C; H, W)$ (1)

where $f_R$ denotes the extraction function of local environment features, H and W denote the height and width of the input image, $X_{Cloc}$ denotes the local environment features of the extracted feature map F, and $X_C$ denotes the camouflage-target salient features.

To acquire more environment features, the global-level environment features are aggregated from the image features according to:

$X_{Cglo} = f_G(X_C; H, W)$ (2)

where $f_G$ denotes the extraction function of global environment features and $X_{Cglo}$ denotes the global environment features of the extracted feature map F.

Fusing the local environment feature $X_{Cloc}$ with the global environment attention feature $X_{Cglo}$ yields the environment features:

$X_{Cinr} = f_{conv}([X_{Cloc} : X_{Cglo}])$ (3)

where $f_{conv}$ denotes a convolution operation, [:] denotes concatenation, and $X_{Cinr}$ denotes the extracted environment features.

To capture the dynamic relation between the environment features and the target to a greater extent and improve the contribution of environment feature information to target detection, more relevant and reliable environment information for object detection is obtained through formula (4):

$X_{CT} = f_{inr}(X_C, X_{Cinr}, \Omega[X_C, X_{Cinr}], \Omega[X_{Cinr}, X_{Cinr}])$ (4)

where $\Omega[X_C, X_{Cinr}]$ denotes the correlation between the environment feature $X_{Cinr}$ and the camouflage-target feature $X_C$, $\Omega[X_{Cinr}, X_{Cinr}]$ denotes the correlation among environment features, $f_{inr}$ denotes a dynamic coding function, and $X_{CT}$ denotes the more relevant and reliable environment feature information extracted by the salient feature extraction network. Similarly, the related environment features extracted for the camouflaged-target features are $X_{RT}$.
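A minimal sketch of such an environment attention encoder follows; the concrete forms of $f_R$, $f_G$ and $f_{inr}$ are not specified in the text, so a 3 × 3 convolution, global average pooling and a correlation-gated fusion serve as stand-ins.

```python
import torch
import torch.nn as nn

class EnvironmentAttention(nn.Module):
    """Sketch of the hierarchical environment attention encoder.
    f_R (local), f_G (global) and f_inr (dynamic coding) are stand-ins:
    a small convolution, global average pooling and a gated 1x1 fusion."""
    def __init__(self, c=256):
        super().__init__()
        self.f_r = nn.Conv2d(c, c, 3, padding=1)   # f_R: local environment features
        self.f_g = nn.AdaptiveAvgPool2d(1)         # f_G: global-level aggregation
        self.f_conv = nn.Conv2d(2 * c, c, 1)       # f_conv over [X_Cloc : X_Cglo]
        self.f_inr = nn.Conv2d(2 * c, c, 1)        # f_inr: dynamic coding

    def forward(self, x_c):
        x_loc = self.f_r(x_c)                                  # X_Cloc
        x_glo = self.f_g(x_c).expand_as(x_c)                   # X_Cglo, broadcast to H x W
        x_inr = self.f_conv(torch.cat([x_loc, x_glo], dim=1))  # X_Cinr
        # Omega[X_C, X_Cinr]: correlation between target and environment features.
        omega = torch.sigmoid((x_c * x_inr).sum(dim=1, keepdim=True))
        x_ct = self.f_inr(torch.cat([x_c, omega * x_inr], dim=1))  # X_CT
        return x_ct
```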
Step three: graph structures are established over the environmental feature information and the target salient features in the feature map through the graph structure module, obtaining the network relations of the camouflage target and the camouflaged target.
The invention constructs a graph network from the target salient features and the related environment features. Taking the camouflage target as an example, the extracted camouflage-target salient features are $X_C = \{x_{C1}, x_{C2}, \ldots, x_{Cn}\}$, where n denotes the number of elements of the target salient features, and the obtained camouflage-target environment feature information is $X_{CT} = \{x_{CT1}, x_{CT2}, \ldots, x_{CTm2}\}$, where m2 denotes the number of elements of the related environment features. Each obtained feature, whether environment feature information or target salient feature, is regarded as a node, giving the node set $V = \{v_1, v_2, \ldots, v_a\}$ with a the number of nodes. The graph structure module is built from three GraphConv layers; through them, the set of neighbouring nodes related to node $v_i$ is retrieved and denoted $P(v_i)$, and Y denotes the updated features of a node. The implicit relation between the image target and the background is constructed according to:

$Y^{(m)}(v_i) = \sigma\big(W_1\, Y^{(m-1)}(v_i) + W_2 \cdot \mathrm{AGG}\{Y^{(m-1)}(v_j) : v_j \in P(v_i)\}\big)$ (5)

where m denotes the layer index and may take the values 0, 1, 2; $W_1$ and $W_2$ denote learning parameters; AGG denotes the aggregation function over the neighbouring-node set $P(v_i)$, merging the neighbourhood; $Y^{(m)}(v_i)$ denotes the m-th update of the features of node $v_i$, $v_j$ is a neighbouring node related to $v_i$, and $Y^{(m)}(v_j)$ denotes the m-th update of the features of node $v_j$.

The output feature representation of the nodes is obtained by a linear transformation:

$e_{i,j} = \gamma(W \cdot [v_i : v_j] + d),\quad v_i, v_j \in V$ (6)

where $\gamma(\cdot)$ denotes the sigmoid function and $e_{i,j}$ denotes the distance parameter between nodes $v_i$ and $v_j$, with $v_i, v_j \in V$; W and d denote network learning parameters, and [:] denotes the series connection of two nodes.

The set of node-to-node distance parameters is recorded as $E = \{e_{1,2}, e_{1,3}, \ldots, e_{i,j}\}$: if two nodes have an implicit relation the value is 1, otherwise 0. The network relation between the constructed features can thus be expressed as $Q = (V, E)$, so the graph structure module constructs the implicit relation between image target and background; the constructed network relations of the camouflage-target features and the camouflaged-target features are expressed as $Q_C = (V_C, E_C)$ and $Q_R = (V_R, E_R)$ respectively, where $V_C$ denotes the set of camouflage-target nodes, $E_C$ the node-to-node distance parameters of the camouflage target, $V_R$ the set of camouflaged-target nodes, and $E_R$ those of the camouflaged target. The graph structure module realizes the links between nodes through node distances: after the features of node $v_i$ are extracted from the network, they are linked, to different degrees via the distance parameters of the graph structure, to the features related to $v_i$, further optimizing the feature map.
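By way of illustration, the sketch below implements one GraphConv layer of formula (5) and the edge scores $e_{i,j}$ of formula (6); mean aggregation over $P(v_i)$ and the ReLU nonlinearity are assumptions, since the patent only names an aggregation function. Stacking three such layers gives the m = 0, 1, 2 updates.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One GraphConv layer: a node keeps its own features through W1 and
    aggregates its neighbours P(v_i) through W2 (mean aggregation assumed)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w1 = nn.Linear(d_in, d_out)
        self.w2 = nn.Linear(d_in, d_out)

    def forward(self, nodes, adj):
        # adj: (a, a) 0/1 matrix encoding the implicit relations E
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        neigh = adj @ nodes / deg                    # aggregate over P(v_i)
        return torch.relu(self.w1(nodes) + self.w2(neigh))

def edge_scores(nodes, w, d):
    """e_ij = sigmoid(W . [v_i : v_j] + d) for all node pairs."""
    a = nodes.size(0)
    vi = nodes.unsqueeze(1).expand(a, a, -1)
    vj = nodes.unsqueeze(0).expand(a, a, -1)
    pairs = torch.cat([vi, vj], dim=-1)  # series connection [v_i : v_j]
    return torch.sigmoid(pairs @ w + d)  # (a, a) distance parameters e_ij
```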
Step four: the feature map obtained in step one is input into the similarity measurement module, latent features are obtained with it, and the candidate boxes generated in step one are optimized to obtain the target feature map, yielding the underwater target detection model.

The feature map is fed to the similarity measurement module, and the distance between the camouflage feature $X_C$ and the camouflaged feature $X_R$ is measured through the cosine similarity:

$\cos(X_C, X_R) = \dfrac{X_C \cdot X_R}{\|X_C\|\,\|X_R\|}$ (7)

The similarity measurement module computes the distance $L_s$ between the camouflage-target features and the camouflaged-target features, thereby obtaining their difference; the feature map F is further optimized through this distance difference.

Through the graph network of target salient features and related environment features constructed by the graph structure module, the invention realizes the links between nodes by node distance: the features of node $v_i$ extracted from the network are linked to the other node features to different degrees through the distance parameters of the graph structure. The similarity measurement module yields the distance between the camouflage features $X_C$ and the latent camouflaged features $X_R$, giving the target feature map $F_o$, which is fed into two fully connected layers (a regression layer and a classification layer) to generate the classification predictions of the candidate boxes and judge whether a target is a camouflage target. The regression and classification layers sit in the prediction-box generation module and serve to distinguish the camouflage target from the camouflaged target.
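A minimal sketch of the similarity measurement, under the assumption that the distance $L_s$ is taken as one minus the cosine similarity of the flattened feature vectors:

```python
import torch
import torch.nn.functional as F

def similarity_distance(x_c, x_r):
    """Cosine-similarity distance L_s between camouflage-target features X_C
    and camouflaged-target features X_R. Defining L_s = 1 - cos(X_C, X_R) is
    an assumption consistent with using L_s as a separation term in the loss."""
    c = F.normalize(x_c.flatten(1), dim=1)
    r = F.normalize(x_r.flatten(1), dim=1)
    return 1.0 - (c * r).sum(dim=1)  # larger when the features differ more
```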
As shown in fig. 2, the underwater target detection model includes a backbone network, a camouflage encoder, a camouflaged encoder, a graph structure module and a similarity measurement module; the camouflage encoder and the camouflaged encoder are connected with the graph structure module, and the camouflage encoder, the camouflaged encoder and the similarity measurement module are connected with the backbone network; the extracted feature maps are optimized through the graph structures obtained by the camouflage and camouflaged encoders and the distance calculated by the similarity measurement module.
Step five: the discriminator is used to judge the accuracy of the predictions of the underwater target detection model against the real labels, i.e. the real bounding boxes, of the labelled positive sample images; the prediction accuracy must satisfy formula (8). The real label is identical to the label obtained by the discriminator, which compares the prediction results against it to achieve the discrimination.

To guarantee prediction accuracy, the invention further optimizes the prediction results through adversarial learning between the camouflage encoder and the camouflaged encoder: the labelled positive sample images are input into the discriminator to judge the correctness of the classification. A discriminator network consisting of 5 connected convolutional layers is introduced, with $\gamma$ its network parameters; the discriminator network acquires the real classification of the image to judge whether the predicted classification is correct and to promote the learning capability of the model. Let XY denote the predicted value and GT the ground-truth label; the structural perceptual loss function $l_k$ can be defined as:

$l_k = (1 + 5 \cdot |\gamma(XY) - GT|)\, L_s$ (8)

where $L_s$ denotes the result, i.e. the distance, output by the similarity measurement module.
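The discriminator and the structural perceptual loss of formula (8) can be sketched as follows; the kernel sizes, strides and channel widths of the 5 convolutional layers, and the sigmoid on the discriminator output, are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

# Discriminator of 5 connected convolutional layers; kernel sizes, strides and
# channel widths are assumed values.
discriminator = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),
)

def structure_perceptual_loss(prediction, gt_label, l_s):
    """l_k = (1 + 5 * |gamma(XY) - GT|) * L_s, where gamma(XY) is the
    discriminator score of the prediction XY and GT the ground-truth label."""
    gamma_xy = torch.sigmoid(discriminator(prediction))
    weight = 1.0 + 5.0 * (gamma_xy - gt_label).abs()
    return weight.mean() * l_s
```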
To achieve adversarial learning between the camouflage target and the camouflaged target, an adversarial loss function is defined.
the underwater object has a real boundary frame fuzzy phenomenon due to the shielding of foreign matters, and the prediction precision of the underwater target detection model is seriously influenced. In order to solve the problem, the invention introduces the candidate frame self-adaptation into a prediction discrimination stage, realizes the self-adaptation of the boundary of the prediction frame and optimizes the prediction result. Is provided withFor a true boundary four-dimensional vector, the candidate box may be represented as (x) 1a ,y 1a ,x 2a ,y 2a ) And then:
wherein, t x1 、t x2 、t y1 、t y2 Denotes the predicted deviation, t' x1 、t′ x2 、t′ y1 、t′ y2 Representing a true bounding box based compensation. x is the number of 1 、y 1 、x 2 、y 2 、w a 、h a As parameters of the real bounding box, x 1a 、y 1a 、x 2a 、y 2a Is a parameter of the candidate box. The predictions obtained through the above processes are not all effective, so that further screening is needed, and the method uses the candidate boxes of N before ranking for detection, and finally realizes the detection and identification of the target. The end-to-end mode candidate box shape prediction loss using the multitask loss proposed by the present invention can be expressed as:
wherein l s Indicating the positioning loss, h and w respectively indicating the length and width of the predicted bounding box, and w a And h a Indicating the length and width of the real bounding box.
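The corner-offset compensations described above can be sketched as follows; the exact parameterisation is an assumption in the style of standard two-stage detectors, normalising each corner deviation by the real-box width $w_a$ or height $h_a$.

```python
def encode_box_offsets(real_box, candidate_box):
    """Corner offsets between the real bounding box (x1, y1, x2, y2) and a
    candidate box (x1a, y1a, x2a, y2a), normalised by the real-box size.
    A hypothetical parameterisation; the patent does not print the formula."""
    x1, y1, x2, y2 = real_box
    x1a, y1a, x2a, y2a = candidate_box
    w_a, h_a = x2 - x1, y2 - y1      # width and height of the real bounding box
    t_x1 = (x1a - x1) / w_a          # left-corner deviation
    t_y1 = (y1a - y1) / h_a          # top-corner deviation
    t_x2 = (x2a - x2) / w_a          # right-corner deviation
    t_y2 = (y2a - y2) / h_a          # bottom-corner deviation
    return t_x1, t_y1, t_x2, t_y2
```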
The loss function of the entire recognition framework can be expressed as:

$l = \alpha_1 l_s + \alpha_2 l_d + \alpha_3 l_{reg} + \alpha_4 l_{cls}$ (12)

where $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ denote the parameters of the candidate-box localization loss, the perceptual loss, the adversarial loss and the classification loss respectively, and $l_{cls}$ denotes the cross-entropy loss function, a loss commonly used for classification.
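Finally, a sketch of how the four loss terms of formula (12) are combined; the equal default weights are placeholders, since the patent leaves $\alpha_1, \ldots, \alpha_4$ as tunable parameters.

```python
def total_loss(l_s, l_d, l_reg, l_cls, alphas=(1.0, 1.0, 1.0, 1.0)):
    """l = a1*l_s + a2*l_d + a3*l_reg + a4*l_cls.
    Equal weights are placeholder values, not fixed by the patent."""
    a1, a2, a3, a4 = alphas
    return a1 * l_s + a2 * l_d + a3 * l_reg + a4 * l_cls
```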
Step six: the underwater target detection model is trained according to the above process; through a large number of labelled positive sample images and unlabelled negative sample images, the weights, i.e. the parameters of the whole network, are obtained, finally yielding an underwater target detection model suitable for underwater camouflage target detection. The weights are used to extract features from the input images.
Step seven: a new underwater image is input into the trained underwater target detection model, which outputs the detection result, realizing identification of the camouflage target and the camouflaged target in the new underwater image.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A camouflage target identification method for marine organisms, characterized by comprising the following steps:
step one: training the salient feature extraction network with the positive sample images and negative sample images to generate a feature map F, and obtaining candidate boxes from the feature map F;
step two: extracting the environmental feature information of the camouflage-target features in the feature map F using the environment attention mechanism of the camouflage encoder, and extracting the environmental feature information of the camouflaged-target features in the feature map F using the environment attention mechanism of the camouflaged encoder;
step three: establishing graph structures over the environmental feature information and the target salient features in the feature map through the graph structure module, obtaining the network relations of the camouflage target and the camouflaged target respectively;
step four: inputting the feature map obtained in step one into the similarity measurement module, obtaining latent features with it, and optimizing the candidate boxes generated in step one to obtain the target feature map, yielding the underwater target detection model;
step five: using a discriminator to judge the accuracy of the predictions of the underwater target detection model against the real labels of the labelled positive sample images;
step six: repeating steps one to five over a large number of labelled positive sample images and unlabelled negative sample images to train the underwater target detection model and obtain its weights, which are used to extract features from the input images, finally yielding the trained underwater target detection model;
step seven: inputting a new underwater image into the trained underwater target detection model to realize identification of the camouflage target and the camouflaged target.
2. The camouflage target identification method for marine organisms according to claim 1, characterized in that the backbone of the salient feature extraction network is a ResNet-50 network, the positive sample images are labelled underwater camouflage objects and real organism images, and the negative sample images are images of other objects irrelevant to the identification; the positive and negative sample images are input into the salient feature extraction network to train its backbone according to:

$S(f(x), f(x^+)) \ge S(f(x), f(x^-))$;

where $x$ denotes a feature learned by the backbone network, $x^+$ denotes a sample similar to the feature $x$, $x^-$ denotes a sample dissimilar to the feature $x$, $f(\cdot)$ denotes the feature-extraction operation of the feature extraction network, and $S(\cdot,\cdot)$ denotes the degree of similarity between samples;

the ResNet-50 network outputs a feature map F(h, w, c) of the target, which is the collective output of a large number of features $x$; h and w denote the height and width of the output feature map, and c denotes the number of convolution kernels.
3. The camouflage target identification method for marine organisms according to claim 2, characterized in that the candidate boxes are generated as follows: target candidate boxes are acquired on the feature map F(h, w, c) with a 3 × 3 sliding window; each sliding window generates a low-dimensional vector fed into two fully connected layers comprising a regression layer and a classification layer; the maximum number of candidate-box proposals per position is denoted k, the regression layer generates the coordinates of k candidate boxes, and the classification layer computes the confidence of each candidate box; candidate boxes with confidence greater than 0.7 are regarded as positive labels and those with confidence smaller than 0.3 as negative labels.
4. The camouflage target identification method for marine organisms according to any one of claims 1 to 3, characterized in that the environment attention mechanism of the camouflage encoder in step two works as follows:

from the camouflage-target salient feature $X_C$ extracted by the salient feature extraction network, local environment feature information is acquired:

$X_{Cloc} = f_R(X_C; H, W)$;

where $f_R$ denotes the extraction function of local environment features, H and W denote the height and width of the input image, and $X_{Cloc}$ denotes the local environment features of the extracted feature map F;

the global-level environment features are aggregated from the image features as: $X_{Cglo} = f_G(X_C; H, W)$;

where $f_G$ denotes the extraction function of global environment features and $X_{Cglo}$ denotes the global environment features of the extracted feature map F;

the environment features obtained by fusing the local environment feature $X_{Cloc}$ with the global environment attention feature $X_{Cglo}$ are:

$X_{Cinr} = f_{conv}([X_{Cloc} : X_{Cglo}])$;

where $f_{conv}$ denotes a convolution operation, [:] denotes concatenation, and $X_{Cinr}$ denotes the extracted environment features;

more relevant and reliable environment information for object detection is obtained as:

$X_{CT} = f_{inr}(X_C, X_{Cinr}, \Omega[X_C, X_{Cinr}], \Omega[X_{Cinr}, X_{Cinr}])$;

where $\Omega[X_C, X_{Cinr}]$ denotes the correlation between the environment feature $X_{Cinr}$ and the camouflage-target feature $X_C$, $\Omega[X_{Cinr}, X_{Cinr}]$ denotes the correlation among environment features, $f_{inr}$ denotes a dynamic coding function, and $X_{CT}$ denotes the more relevant and reliable environment feature information extracted by the salient feature extraction network.
5. The camouflage target identification method for marine organisms according to claim 4, characterized in that the graph structure module establishes the graph structure of the camouflage target as follows:

the obtained features comprising the camouflage-target salient features $X_C$ and the camouflage-target environment feature information $X_{CT}$ are each regarded as a node, giving the node set $V = \{v_1, v_2, \ldots, v_a\}$, where a denotes the number of nodes; the graph structure module comprises three GraphConv layers, through which the set of neighbouring nodes related to node $v_i$ is retrieved and denoted $P(v_i)$; Y denotes the updated features of node $v_i$; the implicit relation between the image target and the background is:

$Y^{(m)}(v_i) = \sigma\big(W_1\, Y^{(m-1)}(v_i) + W_2 \cdot \mathrm{AGG}\{Y^{(m-1)}(v_j) : v_j \in P(v_i)\}\big)$;

where m denotes the layer index and may take the values 0, 1, 2; $W_1$ and $W_2$ denote learning parameters; AGG denotes the aggregation function over the neighbouring-node set $P(v_i)$, merging the neighbourhood; $Y^{(m)}(v_i)$ denotes the m-th update of the features of node $v_i$, $v_j$ is a neighbouring node related to $v_i$, and $Y^{(m)}(v_j)$ denotes the m-th update of the features of node $v_j$;

the output features of the nodes are obtained by a linear transformation: $e_{i,j} = \gamma(W \cdot [v_i : v_j] + d),\ v_i, v_j \in V$;

where $\gamma(\cdot)$ denotes the sigmoid function, $e_{i,j}$ denotes the distance parameter between nodes $v_i$ and $v_j$ with $v_i, v_j \in V$; W and d denote network learning parameters, and [:] denotes the series connection of two nodes.
6. The camouflage target identification method for marine organisms according to claim 5, characterized in that the network relations are established as follows:

the set of node-to-node distance parameters is recorded as $E = \{e_{1,2}, e_{1,3}, \ldots, e_{i,j}\}$; if two nodes have an implicit relation the value is 1, and if they have no relation the value is 0; the network relation between the constructed features is expressed as $Q = (V, E)$;

the constructed network relations of the camouflage-target features and the camouflaged-target features are expressed as $Q_C = (V_C, E_C)$ and $Q_R = (V_R, E_R)$ respectively, where $V_C$ denotes the set of camouflage-target nodes, $E_C$ the node-to-node distance parameters of the camouflage target, $V_R$ the set of camouflaged-target nodes, and $E_R$ the node-to-node distance parameters of the camouflaged target.
7. The camouflage target identification method for marine organisms according to claim 1, 4 or 5, characterized in that the similarity measurement module measures the distance between the camouflage-target salient feature $X_C$ and the camouflaged-target salient feature $X_R$ through the cosine similarity:

$\cos(X_C, X_R) = \dfrac{X_C \cdot X_R}{\|X_C\|\,\|X_R\|}$;

the similarity measurement module computes the distance $L_s$ between the camouflage-target features and the camouflaged-target features, thereby obtaining the difference between them.
8. The camouflage target identification method for marine organisms according to claim 7, characterized in that the underwater target detection model comprises a backbone network, a camouflage encoder, a camouflaged encoder, a graph structure module and a similarity measurement module; the camouflage encoder and the camouflaged encoder are connected with the graph structure module, and the camouflage encoder, the camouflaged encoder and the similarity measurement module are connected with the backbone network; the underwater target detection model optimizes the extracted feature maps through the graph structures obtained by the two encoders and the distance calculated by the similarity measurement module;

using the distance $L_s$ calculated by the similarity measurement module and the network relations established by the graph structure module, the features extracted for node $v_i$ are linked, to different degrees via the distance parameters of the graph structure, to the features related to $v_i$; the feature map is thereby optimized into the target feature map $F_o$, which is fed into two fully connected layers to generate the classification predictions of the candidate boxes and judge whether a target is a camouflage target.
9. The camouflage target identification method for marine organisms according to claim 1 or 8, characterized in that the candidate boxes are adaptively introduced into the prediction discrimination stage: let $(x_1, y_1, x_2, y_2)$ be the real-bounding-box four-dimensional vector and $(x_{1a}, y_{1a}, x_{2a}, y_{2a})$ a candidate box; then:

$t'_{x1} = (x_1 - x_{1a})/w_a$, $t'_{y1} = (y_1 - y_{1a})/h_a$, $t'_{x2} = (x_2 - x_{2a})/w_a$, $t'_{y2} = (y_2 - y_{2a})/h_a$;

where $t_{x1}$, $t_{x2}$, $t_{y1}$, $t_{y2}$ denote the predicted deviations and $t'_{x1}$, $t'_{x2}$, $t'_{y1}$, $t'_{y2}$ denote the compensations based on the real bounding box; $x_1$, $y_1$, $x_2$, $y_2$, $w_a$, $h_a$ are parameters of the real bounding box and $x_{1a}$, $y_{1a}$, $x_{2a}$, $y_{2a}$ are parameters of the candidate box;

detection uses the candidate boxes ranked in the top N, realizing detection and identification of the camouflage target.
10. The camouflage target identification method for marine organisms according to claim 9, characterized in that the loss function of the underwater target detection model is:

$l = \alpha_1 l_s + \alpha_2 l_d + \alpha_3 l_{reg} + \alpha_4 l_{cls}$;

where $\alpha_1$, $\alpha_2$, $\alpha_3$, $\alpha_4$ denote the parameters of the candidate-box localization loss, the perceptual loss, the adversarial loss and the classification loss respectively, and $l_{cls}$ denotes a cross-entropy loss function;

the perceptual loss function $l_d$ is composed of the adversarial loss and the structural perceptual loss function $l_k$, with:

$l_k = (1 + 5 \cdot |\gamma(XY) - GT|)\, L_s$;

where XY denotes the predicted value, GT denotes the ground-truth label, and $L_s$ denotes the distance output by the similarity measurement module; the discriminator network consists of 5 connected convolutional layers, with $\gamma$ its network parameters;

the localization loss function $l_s$ is defined over h and w, the height and width of the candidate box, and $w_a$ and $h_a$, the width and height of the real bounding box.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---
CN202210392176.0A (CN114913409A) | 2022-04-14 | 2022-04-14 | Camouflage target identification method for marine organisms
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---
CN202210392176.0A (CN114913409A) | 2022-04-14 | 2022-04-14 | Camouflage target identification method for marine organisms
Publications (1)
Publication Number | Publication Date |
---|---
CN114913409A | 2022-08-16
Family
ID=82764234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---
CN202210392176.0A (CN114913409A, pending) | Camouflage target identification method for marine organisms | 2022-04-14 | 2022-04-14
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115690451A (en) * | 2022-11-14 | 2023-02-03 | 南京航空航天大学 | Combined detection method and system for camouflage object and salient object |
CN117612139A (en) * | 2023-12-19 | 2024-02-27 | 昆明盛嗳谐好科技有限公司 | Scene target detection method and system based on deep learning and electronic equipment |
Similar Documents
Publication | Title
---|---
CN108460356B | Face image automatic processing system based on monitoring system
CN111583263B | Point cloud segmentation method based on joint dynamic graph convolution
CN110633632A | Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN111783831B | Complex image accurate classification method based on multi-source multi-label shared subspace learning
CN111898736B | Efficient pedestrian re-identification method based on attribute perception
CN106951923B | Robot three-dimensional shape recognition method based on multi-view information fusion
CN114913409A | Camouflage target identification method for marine organisms
CN108520215B | Single-sample face recognition method based on multi-scale joint feature encoder
Maire et al. | A convolutional neural network for automatic analysis of aerial imagery
CN115147864B | Infrared human behavior recognition method based on cooperative heterogeneous deep learning network
CN106815323A | Cross-domain visual search method based on saliency detection
CN110991257A | Polarization SAR oil spill detection method based on feature fusion and SVM
CN114548256A | Small sample rare bird identification method based on comparative learning
CN115937552A | Image matching method based on fusion of manual features and depth features
CN117333948A | End-to-end multi-target broiler behavior identification method integrating space-time attention mechanism
CN115311678A | Background suppression and DCNN combined infrared video airport flying bird detection method
CN112529025A | Data processing method and device
CN112801179A | Twin classifier certainty maximization method for cross-domain complex visual task
CN115546668A | Marine organism detection method and device and unmanned aerial vehicle
CN114863103A | Unmanned underwater vehicle identification method, equipment and storage medium
Li | Construction method of swimming pool intelligent assisted drowning detection model based on computer feature pyramid networks
Al Duhayyim et al. | Intelligent deep learning based automated fish detection model for UWSN
CN110163106A | Integrated tattoo detection and recognition method and system
US20240371145A1 | Method and System for Optimization of a Human-Machine Team for Geographic Region Digitization
CN117437234B | Aerial photo ground object classification and change detection method based on graph neural network
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |