CN114863263B - Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion - Google Patents

Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion Download PDF

Info

Publication number
CN114863263B
CN114863263B
Authority
CN
China
Prior art keywords
prediction
feature
cross
image
snakehead
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210796234.6A
Other languages
Chinese (zh)
Other versions
CN114863263A (en)
Inventor
岳峻
张逸飞
王庆
李振忠
贾世祥
姚涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ludong University
Original Assignee
Ludong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ludong University filed Critical Ludong University
Priority to CN202210796234.6A priority Critical patent/CN114863263B/en
Publication of CN114863263A publication Critical patent/CN114863263A/en
Application granted granted Critical
Publication of CN114863263B publication Critical patent/CN114863263B/en
Priority to US18/184,490 priority patent/US11694428B1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/05 Underwater scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81 Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion, and belongs to the technical field of deep learning. The method comprises image acquisition, image processing and a network model. The collected images are labeled and resized to obtain input images, which are fed into a target detection network; after convolutional integration, the features enter a cross-scale hierarchical feature fusion module, in which all input features are divided into n layers composed of s feature-mapping subsets; each feature-mapping subset is fused with the other feature-mapping subsets, and the subsets are finally concatenated to achieve complete information fusion; after a convolution operation, a training result is output. Network parameters are then adjusted with a loss function, and parameters suited to the network model are obtained after multiple training iterations. Finally, the output candidate boxes are fed into a non-maximum suppression module, and the correct prediction boxes are screened out to obtain the prediction result.

Description

Snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion
Technical Field
The invention relates to a snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion, and belongs to the technical field of deep learning.
Background
In the technical field of deep learning, feature fusion is a common image-processing method. For example, the paper "Real-time detection of underwater fish targets based on improved YOLO and transfer learning" (Pattern Recognition and Artificial Intelligence, Vol. 32, No. 3, March 2019) proposes a real-time underwater fish detection algorithm that uses feature fusion for multi-scale target detection, obtains and trains a network model with strong generalization through transfer learning, and removes the scattering blur of underwater images with a contrast-limited adaptive histogram equalization preprocessing algorithm, overcoming uneven illumination and achieving real-time underwater fish detection on an underwater-robot embedded system.
However, the above method still has problems. In the real-time underwater fish detection based on improved YOLO and transfer learning, each image sub-block can detect at most one target; if several targets appear in the same sub-block, overlapping targets are missed. Setting multiple anchor points for one image sub-block, each encoding the bounding-box coordinates, the confidence and the category of a present target, alleviates this, but detections are still missed when the targets are similar in size and their center points nearly coincide.
The snakehead is a fish that inhabits river bottoms; its body is soft, slender and snake-shaped, cylindrical at the front and gradually flattening laterally toward the rear. During culture the elongated body assumes no fixed shape, so intra-class diversified occlusion easily arises: the fish occlude one another in various body postures, and because the body is slender, the anchor boxes of mutually occluding or tightly adjoining snakeheads overlap heavily when real boxes are labeled or prediction boxes are generated, which greatly hinders detection. In aquaculture, and especially in the breeding of the flexible, elongated snakehead, accurately detecting individual fish under intra-class diversified occlusion is essential, so improving detection accuracy has become a research direction.
Disclosure of Invention
Aiming at the problems in the prior art, a snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion is provided.
The invention solves the technical problems through the following technical scheme:
A snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion comprises image acquisition, image processing and a network model;
the image processing means that the collected images are divided into a training set and a prediction set at a ratio of 9:1; all images are annotated with labelImg to obtain images containing the real boxes of all targets; size clustering is performed on all real boxes to obtain the nine real-box sizes best suited to snakehead detection training; and the image sizes are adjusted to form input images fit for the network model;
an input image is fed into the network model for target detection: the target snakehead features of the input image are extracted by 1×1 convolution, the features are integrated, the dimensions are adjusted, and the result is passed into a cross-scale hierarchical feature fusion module,
characterized in that,
firstly, model training is performed: the input images of the training set are fed into the network model, and in the cross-scale hierarchical feature fusion module all features input to the model are divided into n layers composed of s feature-mapping subsets; each feature-mapping subset is fused with the other feature-mapping subsets, and the feature-mapping subsets are finally concatenated to form complete information fusion; after a convolution operation, a training result containing target confidence, coordinate information and category information is output; meanwhile, to improve the accuracy of the network model, several cross-scale hierarchical feature fusion modules can be connected in series;
then, the network parameters are adjusted with the YOLOv4 loss function, and after 50 training iterations the parameters suited to the network model are obtained, forming the network model used for detection;
and then the model is tested: the input images of the prediction set serve as test images and are fed into the parameter-adjusted network model; the network model produces a prediction result containing the target category and the center-point coordinates and width-height information of each candidate box; the prediction result is fed into a non-maximum suppression module, and the non-maximum suppression module screens out the correct prediction boxes based on a ranking of the positioning-accuracy scores of all candidate boxes.
On the basis of the above technical scheme, the application makes the following further improvements:
further, the image acquisition is to acquire a snakehead image with a size of 1920 x 1080 by using a camera; in this image, the close proximity of the snakehead body due to its elongation forms an intra-class occlusion.
Further, after information integration and dimension adjustment of the feature channels by 1×1 convolution, all features contained in the image are obtained and the body postures of the elongated snakehead are extracted; at this stage the features correspond to different dimensions of the feature matrix and are mutually independent and uncorrelated, each equivalent to an independent individual.
Further, the cross-scale hierarchical feature fusion module constructs hierarchical residual connections inside a residual block and divides all features into n layers consisting of s feature-mapping subsets in total; that is, all features are divided evenly into s feature-mapping subsets, denoted x_i, where i = {1, 2, …, s}; each feature-mapping subset x_i has the same spatial size but, compared with the input features, each subset has w channels, i.e. n = s × w.
Further, each feature-mapping subset corresponds to a 3×3 convolution kernel, which extracts features and produces an output feature; the input of a given 3×3 convolution kernel comprises its corresponding feature-mapping subset and the output features formed by the preceding 3×3 convolution kernels.
Further, all the output features are fused to form a fused feature subset.
Further, each output feature is divided into two parts: one part is passed to the 3×3 convolution kernel corresponding to the next unfused feature-mapping subset for feature fusion, and the other part is processed by a 1×1 convolution; after all feature-mapping subset groups are fused, all the 1×1-processed feature information is integrated by a further 1×1 convolution and the features are summarized, giving a final prediction result containing the target category, coordinate information and confidence.
Further, regarding the convolution layers of the cross-scale hierarchical feature fusion module: each x_i has its own 3×3 convolution layer, named K_i, whose output is denoted y_i; except for x_1, each feature-mapping subset x_i is fed into K_i together with the output features of the preceding kernels; when each feature-mapping subset x_i passes through a 3×3 convolution kernel, the output has a larger receptive field than the original input subset and can learn the different body-posture features of the snakehead; s serves as a control parameter of the scale dimension, allowing features with richer receptive fields to be learned, while the computational cost introduced by the connections is negligible; the output of K_i is fed not only to K_{i+1} but also, across scales, to K_{i+2}, K_{i+3} and so on up to K_s; y_i is expressed as follows:

y_i = K_i(x_i),                            i = 1
y_i = K_i([x_i, y_1, y_2, …, y_{i-1}]),    1 < i ≤ s        (Formula I)

where [·] denotes channel concatenation (the form of Formula I is reconstructed here from the channel dimensions given in the detailed description).
Cross-scale hierarchical feature fusion feeds richer information into the different scales and learns the different body-posture features of the snakehead, ensuring that feature information is extracted more effectively and stably in environments where snakeheads adjoin closely; after the features pass through a 3×3 convolution kernel the receptive field is enlarged, the combination effect produces several equivalent feature scales, and the final output contains combinations of different numbers and different receptive fields.
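As a worked illustration of Formula I (the numbers here are an example chosen for this note, not values from the patent): with n = 64 input channels split into s = 4 subsets of w = 16 channels each, the concatenation rule doubles the channel width of each successive output:

c(y_1) = w = 16
c(y_2) = c(x_2) + c(y_1) = 16 + 16 = 32
c(y_3) = c(x_3) + c(y_1) + c(y_2) = 16 + 16 + 32 = 64
c(y_4) = c(x_4) + c(y_1) + c(y_2) + c(y_3) = 16 + 16 + 32 + 64 = 128

This agrees with the 2*64/s and 4*64/s channel counts reported for y_2 and y_3 in the detailed description below.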
Further, the non-maximum suppression ranks prediction boxes by their positioning-accuracy scores, considering both the score and the overlap of the prediction boxes; prediction boxes with excessive overlap have their scores reduced; whether another prediction box is removed is judged by whether the intersection-over-union of the highest-scoring prediction box with it exceeds a threshold, and if the intersection-over-union is greater than the threshold the other prediction box is removed; all categories are cycled through until the prediction-box screening is complete for every category.
Furthermore, all prediction boxes in an image whose scores exceed a threshold are found first, and low-scoring prediction boxes are screened out and deleted; the score of each selected prediction box is then evaluated, denoted t; the screened prediction boxes are sorted by score to obtain the highest-scoring box, and the overlap between this box and every other prediction box is computed; if the overlap exceeds the threshold, a Gaussian index of the obtained intersection-over-union is taken, as shown in Formula II:
f(iou(b_M, b_i)) = e^(-iou(b_M, b_i)^2 / σ)        (Formula II)

in Formula II, e is the base of the natural logarithm, iou is the intersection-over-union, b_M is the current highest-scoring prediction box, b_i is the prediction box currently being processed, and σ is a constant;

after the Gaussian index is taken, the score of the prediction box is attenuated; the attenuated score is shown in Formula III:

t_i = t_i · e^(-iou(b_M, b_i)^2 / σ)        (Formula III)

in Formula III, the symbols are as in Formula II and t is the positioning-accuracy score of the prediction box; the new score replaces the original one, and all retained prediction boxes are then repeatedly sorted and screened until the final prediction boxes are obtained.
The invention has the following advantages:
firstly, the invention researches a cross-scale hierarchical feature fusion module, constructs hierarchical residual connection, represents multi-scale features with finer granularity, has stronger feature extraction capability under the condition of not increasing the calculation load, and effectively increases the receptive field of each network layer. After cross-scale feature processing, the color, texture and other main feature information of the snakeheads can be effectively extracted, the features of the snakeheads under the condition of diversified shielding can be accurately and quickly extracted, the confusion with the background is avoided, the generalization capability of the model is improved, and the detection precision of the snakeheads under the condition of similar diversified shielding is effectively improved.
Secondly, the invention provides a method for screening correct prediction boxes based on a ranking of the positioning-accuracy scores of all prediction boxes. Aimed at the diversified occlusion that arises in snakehead culture because the fish's long, thin bodies easily cling together, the method can accurately screen out wrong prediction boxes and avoids the mistaken rejection of correct prediction boxes whose overlap is too high. It first ranks all candidate boxes by an overlap-aware score rather than by the category probability alone to obtain the highest-scoring prediction box, then computes the overlap between that box and every other prediction box; if the overlap exceeds the threshold, the non-maximum suppression takes a Gaussian index of the obtained intersection-over-union, which effectively prevents true prediction boxes from being screened out because the prior boxes overlap excessively, thereby improving detection accuracy.
Drawings
FIG. 1 is a system diagram of a detection module;
FIG. 2 is a block diagram of a cross-scale hierarchical feature fusion module;
FIG. 3 is a graph showing the effect of fish detection using the method of the present application.
Detailed Description
To better explain the technical solution of the present application, the scheme is described below with specific reference to the snakehead:
the invention takes a snakehead image in a juvenile fish stage as a research object, and researches a method for detecting the diversity in snakehead breeding under the condition of mutual shielding.
With reference to FIGS. 1-2:
image acquisition: the size of a snakehead breeding image collected by a used camera is 1920-1080, the snakehead is slender, and in the swimming process of a fish school, the fish is close to the fish, so that in the shot picture, the snakeheads are closely adjacent to each other, and the condition of similar internal shielding occurs;
image processing: on one hand, labeling the collected image by using label to obtain an image containing a real frame of all targets; on the other hand, a candidate frame for network prediction is adjusted by an initially-set prior frame, the snakehead is slender, so that an anchor frame is also short and long or thin and tall, if the prior frame is not adjusted, the prediction of the prior frame is influenced, so that all real frames are subjected to size clustering to obtain nine anchor frame sizes most suitable for snakehead detection network training, and the image size is adjusted to form an input image, so that the image is suitable for a network model; specifically, the input image is a snakehead image containing an intra-class occlusion phenomenon, and the size of the image is adjusted to 608 × 608.
Network model: the image is fed into the target detection network; features are extracted by 1×1 convolution, information is integrated, and after dimension adjustment a feature matrix A: [2, 64, 608, 608] is obtained, in which the first dimension holds the two categories, snakehead and background, the second dimension is the feature channel containing snakehead color information, texture information and inter-snakehead correlation information, and the third and fourth dimensions are the image width and height; the feature matrix A enters the cross-scale hierarchical feature fusion module, where all features are divided evenly into s feature-mapping subsets, the subsets are fused with one another and the information is finally integrated; after a convolution operation, candidate boxes containing target confidence, coordinate information and category information are output; the network parameters are adjusted repeatedly through the loss function to obtain parameters suited to the network model; finally the result enters a non-maximum suppression module that screens correct prediction boxes based on a ranking of the positioning-accuracy scores of all prediction boxes; all candidate boxes generated in the previous step are screened by the non-maximum suppression, wrong candidate boxes are rejected, the true prediction boxes are retained and the final prediction result is obtained;
specifically, the cross-scale hierarchical feature fusion module: the cross-scale hierarchical feature fusion module constructs hierarchical residual connection in a residual block to represent multi-scale features with finer granularity, has stronger feature extraction capability under the condition of not increasing the calculation load and effectively increases the receptive field of each network layer.
In the network model:
step one, taking a snakehead image with intra-class diversity occlusion as an input image of the module, performing information integration and dimension adjustment on a characteristic channel through 1X1 convolution, obtaining all characteristics contained in the image, and obtaining a characteristic matrix A: [2,64,608 ], wherein the first dimension is two categories, namely snakehead and background, the second dimension is a characteristic channel and comprises snakehead color information, texture information and correlation information between snakeheads, and the third dimension and the fourth dimension are image width and height; the characteristics at this time are mutually independent and not related, such as color, texture, background and other characteristics, which are respectively equivalent to independent individuals;
secondly, a cross-scale hierarchical feature fusion module constructs hierarchical residual connection in a residual block, divides all features into n layers and consists of s feature mapping subsets, namely, all features are averagely divided into s feature mapping subsets, and x is used for dividing all features into s feature mapping subsets i Expressed as: [2,64/s,608,608 ]]The first dimension is the number of classes, the second dimension is the eigen-channel, where i = {1,2,.. multidata }, each eigen-map subset x i There are the same spatial size, but there are w channels per feature mapping subset compared to the input features, i.e. n = s × w;
thirdly, inputting the features of the first group of feature mapping subsets into a 3X3 convolution kernel corresponding to the feature mapping subsets to extract the features, and obtaining an output feature matrix y of the feature mapping subsets 1 :[2,64/s,608,608](ii) a Then, the output y 1 And a second set of feature mapsSubset x 2 :[2,64/s,608,608]Input together into the 3X3 convolution kernel corresponding to the second set of feature map subsets, and output y of the second set of feature map subsets 2 :[2,2*64/s,608,608]Output y of the first set of feature map subsets 1 The output y of the second set of feature mapping subsets 2 And a third set of feature map subsets x 3 :[2,64/s,608,608]Inputting the three subsets of feature maps into 3X3 convolution kernels corresponding to the third subset of feature maps, and outputting a third subset of feature maps y 3 :[2,4*64/s,608,608]Processing all the feature mapping subsets according to the above to obtain corresponding output features; the processing realizes the fusion of the characteristics and the enrichment of information; the output features of each feature mapping subset are also connected across scales into each feature mapping subset thereafter;
namely:
Each feature-mapping subset x_i has a 3×3 convolution layer called K_i, whose output feature is denoted y_i. The feature-mapping subset x_i is fed into K_i together with the output features of the preceding kernels. When each feature-mapping subset x_i passes through K_i, the output y_i has a larger receptive field than the original input features. s serves as a control parameter of the scale dimension: a larger s allows features with richer receptive fields to be learned, while the computational cost introduced by the connections is negligible. The output of K_i is fed not only to K_{i+1} but also, across scales, to K_{i+2}, K_{i+3} and so on up to K_s; y_i is expressed as follows:

y_i = K_i(x_i),                            i = 1
y_i = K_i([x_i, y_1, y_2, …, y_{i-1}]),    1 < i ≤ s        (Formula I)

where [·] denotes channel concatenation, consistent with the channel dimensions given in step three;

cross-scale hierarchical feature fusion feeds richer information into the different scales and learns the different body-posture features of the snakehead, ensuring that feature information is extracted more effectively and stably in the intra-class diversified occlusion environment; after the features pass through a 3×3 convolution kernel the receptive field is enlarged, the combination effect produces several equivalent feature scales, and the final output contains combinations of different numbers and different receptive fields;

the cross-scale hierarchical feature fusion lets the network acquire more of the snakehead's color-texture and diversified body-posture information, so the prior-box positions obtained when detecting snakeheads with this information are more accurate and closer to the fish's actual positions, improving detection accuracy under intra-class diversified occlusion.

Step four: the fused output features are convolved by 1×1 and fed into the later feature-mapping subsets, entering the convolution kernels corresponding to those subsets for feature fusion; the output features are processed by 1×1 convolution and uniformly adjusted into feature matrices [2, 64, 208, 208]. After all feature-mapping subset groups complete feature fusion, all output features are processed by 1×1 convolution to obtain the corresponding feature information, and all feature information passes through an integrating convolution kernel, a 1×1 convolution that integrates all the information and completes the summary of the features; the obtained fused feature subset contains the final prediction result Y: [2, 64, 208, 208] with the target category, coordinate information and confidence, in which the second dimension contains all features of the snakehead image and the position, category and confidence information of its prediction boxes.
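The following is a minimal PyTorch sketch of one cross-scale hierarchical feature fusion block as described in steps one to four: the input channels are split evenly into s feature-mapping subsets, each 3×3 kernel K_i also receives every earlier output y_1, ..., y_{i-1} by concatenation, each output passes through a 1×1 convolution, and a final integrating 1×1 convolution summarizes the features. The layer widths and names are illustrative assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

class CrossScaleFusion(nn.Module):
    """Sketch of one cross-scale hierarchical feature fusion block."""
    def __init__(self, channels=64, s=4):
        super().__init__()
        assert channels % s == 0
        w = channels // s                              # n = s * w
        self.s = s
        outs = [w * 2 ** i for i in range(s)]          # y_i widths: w, 2w, 4w, ...
        # K_i: 3x3 conv over [x_i, y_1, ..., y_{i-1}] (input width == output width)
        self.K = nn.ModuleList(nn.Conv2d(c, c, 3, padding=1) for c in outs)
        # 1x1 information processing of each y_i, then one integrating 1x1 conv
        self.post = nn.ModuleList(nn.Conv2d(c, w, 1) for c in outs)
        self.integrate = nn.Conv2d(s * w, channels, 1)

    def forward(self, x):
        xs = torch.chunk(x, self.s, dim=1)             # s feature-mapping subsets
        ys = []
        for i in range(self.s):
            inp = torch.cat(ys + [xs[i]], dim=1) if ys else xs[i]
            ys.append(self.K[i](inp))                  # y_i: enlarged receptive field
        fused = torch.cat([p(y) for p, y in zip(self.post, ys)], dim=1)
        return self.integrate(fused)                   # summary of all subsets

# e.g. feats = CrossScaleFusion(64, s=4)(torch.randn(2, 64, 152, 152))
```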
Non-maximum suppression module used as the screen: a non-maximum suppression module that ranks prediction boxes by positioning-accuracy score.
Most target detection algorithms use many densely distributed prior boxes; after a prediction result is obtained, it is adjusted against the prior boxes to give the final detection result, so the same target may generate several prediction boxes. The snakehead's slender, soft body produces varied postures and hence diversified intra-class occlusion, and the fish readily cling straight together, so their real boxes are very close and dense; the values of the prediction boxes then easily exceed the threshold and prediction boxes are removed by mistake.
In the non-maximum suppression ranked by prediction-box positioning-accuracy score, both the score and the overlap of the prediction boxes are considered; instead of crudely screening out a prediction box simply because its overlap is too large, the score of a prediction box with excessive overlap is reduced; whether another prediction box is removed is judged by whether the intersection-over-union of the highest-scoring prediction box with it exceeds a manually set threshold, here 0.7; if the intersection-over-union is greater than the threshold the other prediction box is removed, and all categories are cycled through until the prediction-box screening is complete for every category.
The method uses an intersection-over-union score that expresses both the confidence and the positioning precision as the ranking basis; integrating the confidence score with the positioning-accuracy score sorts the prediction boxes more accurately and effectively, so prediction boxes are screened more precisely and redundant ones are eliminated.
Since snakehead target detection involves only one category, there is no need to cycle through categories during screening; the subsequent algorithm is applied to the single snakehead category only. First, all prediction boxes in the image whose scores exceed a threshold are found, and low-scoring prediction boxes are screened out and deleted; the score of each selected prediction box is then evaluated, denoted t. The screened prediction boxes are sorted by score to obtain the highest-scoring box, and the overlap between this box and every other prediction box is computed; if the overlap exceeds the threshold, a Gaussian index of the obtained intersection-over-union is taken, as shown in Formula II:
f(iou(b_M, b_i)) = e^(-iou(b_M, b_i)^2 / σ)        (Formula II)

in Formula II, e is the base of the natural logarithm, iou is the intersection-over-union, b_M is the current highest-scoring prediction box, b_i is the prediction box currently being processed, and σ is a constant;

after the Gaussian index is taken, the score of the prediction box is attenuated; the attenuated score is shown in Formula III:

t_i = t_i · e^(-iou(b_M, b_i)^2 / σ)        (Formula III)

in Formula III, the symbols are as in Formula II and t is the positioning-accuracy score of the prediction box; the new score replaces the original one. All retained prediction boxes are then repeatedly sorted and screened until the final prediction boxes are obtained.
The non-maximum suppression module ranked by prediction-box positioning accuracy is not only suited to snakehead target detection, which has a single category; given the snakehead's slender body shape, it also effectively suppresses the case in which the predictions of two closely adjoining targets are mistaken for different predictions of a single target. It brings a considerable improvement in precision for detecting snakehead targets under intra-class diversified occlusion.
Combining the network model with the non-maximum suppression module realizes snakehead target detection under intra-class diversified occlusion and addresses the loss of accuracy caused by heavily and densely overlapping prediction boxes, which arises from the similar colors and textures, the varied body postures and the elongated body shape under intra-class occlusion. The processed snakehead image is passed to the snakehead detection module; with the cross-scale hierarchical feature fusion module, the network acquires richer features such as color, texture and diversified body postures, strengthening the correlation among features and improving detection accuracy when snakeheads occlude one another in diverse ways and share similar colors and textures; with the non-maximum suppression module ranked by prediction-box accuracy score, correct prediction boxes are not mistakenly rejected even when the generated prediction boxes are dense and coincide heavily with the real boxes, which effectively raises the survival probability of the correct prediction boxes and the accuracy of snakehead detection.
The input image is fed into the cross-scale hierarchical feature fusion module, all features of the image are read, and feature fusion is performed as described above. To ensure full fusion and efficient use of the features, 5 cross-scale hierarchical feature fusion modules are arranged in total, after which the fused features are output and the fused feature information is integrated into a prediction result containing the target category, its coordinate information and the confidence. The generated prediction result is fine-tuned against the pre-labeled real boxes through the YOLOv4 loss function, the position information of the prediction boxes is adjusted, and training is repeated until the training iterations are finished, giving the final prediction result.
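A high-level sketch of this training flow, reusing the CrossScaleFusion block sketched earlier; the stacking of 5 modules and the 50 iterations follow the text, while the head width, optimizer settings, yolov4_loss and train_loader are placeholders assumed for illustration only:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 64, 1),                             # 1x1 feature integration
    *[CrossScaleFusion(64, s=4) for _ in range(5)],  # 5 fusion modules in series
    nn.Conv2d(64, 3 * (4 + 1 + 1), 1),               # per anchor: box, confidence, 1 class
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for iteration in range(50):                          # 50 training iterations
    for images, targets in train_loader:             # assumed dataset loader
        loss = yolov4_loss(model(images), targets)   # assumed YOLOv4-style loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
# trained outputs are decoded into candidate boxes and screened with soft_nms above
```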
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion, comprising image acquisition, image processing and a network model;
the image processing means that the collected images are divided into a training set and a prediction set at a ratio of 9:1; all images are annotated with labelImg to obtain images containing the real boxes of all targets; size clustering is performed on all real boxes to obtain the nine real-box sizes best suited to snakehead detection training; and the image sizes are adjusted to form input images fit for the network model;
an input image is fed into the network model for target detection: the target snakehead features of the input image are extracted by 1×1 convolution, the features are integrated, the dimensions are adjusted, and the result is passed into a cross-scale hierarchical feature fusion module,
characterized in that,
firstly, model training is performed: the input images of the training set are fed into the network model, and in the cross-scale hierarchical feature fusion module all features input to the model are divided into n layers composed of s feature-mapping subsets; each feature-mapping subset is fused with the other feature-mapping subsets, and the feature-mapping subsets are finally concatenated to form complete information fusion; after a convolution operation, a training result containing target confidence, coordinate information and category information is output;
then, the network parameters are adjusted with the YOLOv4 loss function, and after 50 training iterations the parameters suited to the network model are obtained, forming the network model used for detection;
and then the model is tested: the input images of the prediction set serve as test images and are fed into the parameter-adjusted network model; the network model produces a prediction result containing the target category and the center-point coordinates and width-height information of each candidate box; the prediction result is fed into a non-maximum suppression module, and the non-maximum suppression module screens out the correct prediction boxes based on a ranking of the positioning-accuracy scores of all candidate boxes.
2. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 1, characterized in that the image acquisition is to collect snakehead images of size 1920 × 1080 with a camera; in these images the elongated snakehead bodies press closely together, forming intra-class occlusion.
3. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 2, characterized in that the target detection performs information integration and dimension adjustment of the feature channels by 1×1 convolution to obtain all features contained in the image and extract the body postures of the elongated snakehead; at this stage the features correspond to different dimensions of the feature matrix and are mutually independent and uncorrelated, each equivalent to an independent individual.
4. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 3, characterized in that the cross-scale hierarchical feature fusion module constructs hierarchical residual connections inside a residual block and divides all features into n layers consisting of s feature-mapping subsets; that is, all features are divided evenly into s feature-mapping subsets, denoted x_i, where i = {1, 2, …, s}; each feature-mapping subset x_i has the same spatial size but, compared with the input features, each subset has w channels, i.e. n = s × w.
5. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 4, characterized in that each feature-mapping subset corresponds to a 3×3 convolution kernel, which extracts features and produces an output feature; the input of a given 3×3 convolution kernel comprises its corresponding feature-mapping subset and the output features formed by the preceding 3×3 convolution kernels.
6. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 5, characterized in that all the output features are fused to form a fused feature subset.
7. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 6, characterized in that each output feature is divided into two parts: one part is passed to the 3×3 convolution kernel corresponding to the next unfused feature-mapping subset for feature fusion, and the other part is processed by a 1×1 convolution; after all feature-mapping subset groups are fused, all the 1×1-processed feature information is integrated by a further 1×1 convolution and the features are summarized, giving a final prediction result containing the target category, coordinate information and confidence.
8. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 4, characterized in that, in the convolution layers of the cross-scale hierarchical feature fusion module, each x_i has its own 3×3 convolution layer, named K_i, whose output is denoted y_i; except for x_1, each feature-mapping subset x_i is fed into K_i together with the output features of the preceding kernels; the output of K_i is fed not only to K_{i+1} but also, across scales, to K_{i+2}, K_{i+3} and so on up to K_s; y_i is expressed as follows:

y_i = K_i(x_i),                            i = 1
y_i = K_i([x_i, y_1, y_2, …, y_{i-1}]),    1 < i ≤ s        (Formula I)

where [·] denotes channel concatenation.
9. The method of claim 8, characterized in that the non-maximum suppression ranks prediction boxes by their positioning-accuracy scores, considering both the score and the overlap of the prediction boxes; the score of a prediction box with excessive overlap is reduced; whether another prediction box is removed is judged by whether the intersection-over-union of the highest-scoring prediction box with it exceeds a threshold, and if the intersection-over-union is greater than the threshold the other prediction box is removed; all categories are cycled through until the prediction-box screening is complete for every category.
10. The snakehead detection method for intra-class occlusion based on cross-scale hierarchical feature fusion of claim 9, characterized in that all prediction boxes in an image whose scores exceed a threshold are found first; the score of each selected prediction box is then evaluated, denoted t; the screened prediction boxes are sorted by score to obtain the highest-scoring box, and the overlap between this box and every other prediction box is computed; if the overlap exceeds the threshold, a Gaussian index of the obtained intersection-over-union is taken, as shown in Formula II:

f(iou(b_M, b_i)) = e^(-iou(b_M, b_i)^2 / σ)        (Formula II)

in Formula II, e is the base of the natural logarithm, iou is the intersection-over-union, b_M is the current highest-scoring prediction box, b_i is the prediction box currently being processed, and σ is a constant;

after the Gaussian index is taken, the score of the prediction box is attenuated, as shown in Formula III:

t_i = t_i · e^(-iou(b_M, b_i)^2 / σ)        (Formula III)

in Formula III, the symbols are as in Formula II and t is the positioning-accuracy score of the prediction box; the new score replaces the original one, and all retained prediction boxes are then repeatedly sorted and screened until the final prediction boxes are obtained.
CN202210796234.6A 2022-07-07 2022-07-07 Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion Active CN114863263B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210796234.6A CN114863263B (en) 2022-07-07 2022-07-07 Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion
US18/184,490 US11694428B1 (en) 2022-07-07 2023-03-15 Method for detecting Ophiocephalus argus cantor under intra-class occlusion based on cross-scale layered feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210796234.6A CN114863263B (en) 2022-07-07 2022-07-07 Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion

Publications (2)

Publication Number Publication Date
CN114863263A CN114863263A (en) 2022-08-05
CN114863263B true CN114863263B (en) 2022-09-13

Family

ID=82626854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210796234.6A Active CN114863263B (en) 2022-07-07 2022-07-07 Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion

Country Status (2)

Country Link
US (1) US11694428B1 (en)
CN (1) CN114863263B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782759B (en) * 2022-06-22 2022-09-13 鲁东大学 Method for detecting densely-occluded fish based on YOLOv5 network
CN117689664B (en) * 2024-02-04 2024-05-14 杭州灵西机器人智能科技有限公司 Nondestructive testing method, system, device and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325504A (en) * 2018-09-07 2019-02-12 中国农业大学 A kind of underwater sea cucumber recognition methods and system
CN111209952A (en) * 2020-01-03 2020-05-29 西安工业大学 Underwater target detection method based on improved SSD and transfer learning
CN111310622A (en) * 2020-02-05 2020-06-19 西北工业大学 Fish swarm target identification method for intelligent operation of underwater robot
CN113076871A (en) * 2021-04-01 2021-07-06 华南理工大学 Fish shoal automatic detection method based on target shielding compensation
CN114170497A (en) * 2021-11-03 2022-03-11 中国农业大学 Multi-scale underwater fish school detection method based on attention module

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114653610A (en) * 2022-04-12 2022-06-24 闽江学院 Fish identification and sorting implementation method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325504A (en) * 2018-09-07 2019-02-12 中国农业大学 A kind of underwater sea cucumber recognition methods and system
CN111209952A (en) * 2020-01-03 2020-05-29 西安工业大学 Underwater target detection method based on improved SSD and transfer learning
CN111310622A (en) * 2020-02-05 2020-06-19 西北工业大学 Fish swarm target identification method for intelligent operation of underwater robot
CN113076871A (en) * 2021-04-01 2021-07-06 华南理工大学 Fish shoal automatic detection method based on target shielding compensation
CN114170497A (en) * 2021-11-03 2022-03-11 中国农业大学 Multi-scale underwater fish school detection method based on attention module

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An Underwater Fish Individual Recognition Method Based on Improved YoloV4 and FaceNet; Huanjun Zhang et al.; 20th International Conference on Ubiquitous Computing and Communications (IUCC/CIT/DSCI/SmartCNS); 2021-12-31; full text *
Yolov4 High-Speed Train Wheelset Tread Defect Detection System Based on Multiscale Feature Fusion; Changfan Zhang et al.; Journal of Advanced Transportation; 2022-03-27; full text *
Fish school counting method based on multi-scale fusion and anchor-free YOLO v3; Zhang Lu et al.; Transactions of the Chinese Society for Agricultural Machinery; 2021-11-30; full text *

Also Published As

Publication number Publication date
US11694428B1 (en) 2023-07-04
CN114863263A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110598029B (en) Fine-grained image classification method based on attention transfer mechanism
CN114863263B (en) Snakehead fish detection method for intra-class occlusion based on cross-scale hierarchical feature fusion
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
Zahisham et al. Food recognition with resnet-50
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN104484681B (en) Hyperspectral Remote Sensing Imagery Classification method based on spatial information and integrated study
CN110781897B (en) Semantic edge detection method based on deep learning
CN106022232A (en) License plate detection method based on deep learning
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN111178120B (en) Pest image detection method based on crop identification cascading technology
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN110991257B (en) Polarized SAR oil spill detection method based on feature fusion and SVM
CN109087330A (en) A moving-target detection method based on coarse-to-fine image segmentation
CN108596195B (en) Scene recognition method based on sparse coding feature extraction
CN106815323A (en) A cross-domain visual search method based on saliency detection
CN109872331A (en) An automatic recognition and classification method for remote sensing image data based on deep learning
CN109785359B (en) Video target detection method based on depth feature pyramid and tracking loss
Zhao et al. Semi-supervised learning-based live fish identification in aquaculture using modified deep convolutional generative adversarial networks
CN109165658A (en) A strong-negative-sample underwater target detection method based on Faster-RCNN
CN112926652A (en) Fish fine-grained image identification method based on deep learning
Pramunendar et al. A Robust Image Enhancement Techniques for Underwater Fish Classification in Marine Environment.
CN105069459B (en) A ground-object type extraction method for high-resolution SAR images
CN116977960A (en) Rice seedling row detection method based on instance segmentation
CN112613428A (en) Resnet-3D convolution cattle video target detection method based on balance loss

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant