Disclosure of Invention
The invention aims to solve the technical problems of conventional fish surveys, such as insufficient detection precision of single-band sonar, poor underwater image preprocessing and weak adaptability of the identification model, and achieves efficient and accurate surveying of fish species and fish numbers through the cooperation of multi-band adaptive sonar detection, multi-dimensional image preprocessing, intelligent detection and segmentation, and transfer learning identification.
In order to solve the technical problems, the embodiment of the invention provides the following technical scheme:
an apparatus for intelligently investigating fish types and numbers, comprising:
an adaptive sonar detection module, which extracts targets through a sonar image processing algorithm, counts multiple targets using a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, calculates the volume density or areal density of targets over the volume or area of water swept by the sonar, and estimates the number of fish in combination with the area of the water area;
a three-dimensional image acquisition module, which uses a depth camera to acquire water depth and turbidity data in real time while synchronously marking the image time sequence, and captures fish shoal images covering multiple morphological scenes, including but not limited to side views and bending postures of the fish body, to form a fish shoal three-dimensional image model; when the fish shoal is captured, a light supplementing lamp dynamically adjusts the illumination intensity according to the water depth, and the underwater lighting environment is adapted through a light attenuation compensation algorithm;
an image preprocessing module, which associates the acoustic signals output by sonar detection with the fish shoal three-dimensional image model to construct an acoustic-plus-optical high-dimensional fish shoal image data set, and expands the data set through data enhancement including but not limited to rotation, scaling and cropping;
a fish detection module, which inputs the preprocessed high-dimensional image data set into a fish detection model, invokes a detection algorithm to detect fish in the image, and calculates the coordinate position of each fish detection frame in the image and the corresponding prediction score;
a main body segmentation module, which crops the fish target after obtaining the position of its rectangular frame and passes the crop to a fish segmentation model for main body segmentation, so that underwater impurity interference is removed and the segmented image contains only the fish target as the main identification object;
an identification and classification module, which inputs the fish images after main body segmentation into a transfer learning identification model to classify and identify the fish species.
Preferably, when the depth camera acquires water depth and turbidity data in real time, synchronously marks the image time sequence, and captures fish shoal images to form the fish shoal three-dimensional image model, the following steps are adopted:
acquiring depth information of the fish shoal using time-of-flight (TOF) technology, and fusing the depth information with the pixel information of the acquired fish shoal image to generate fish shoal image data with a depth channel;
based on a multi-frame image sequence, constructing a dynamic three-dimensional image model of the fish shoal over a given time period using a structure-from-motion (SFM) algorithm combined with the marked image time sequence, so as to show posture changes of the fish shoal while swimming;
adjusting the field angle and resolution parameters of the depth camera so that depth data of the fish shoal can be collected at a given point cloud density over different underwater distance ranges.
Preferably, the adaptation of the underwater light environment by the light attenuation compensation algorithm is specifically:
performing illumination equalization on the collected fish images, using an adaptive histogram equalization algorithm to adjust the image contrast according to the light intensity distribution after light attenuation compensation;
introducing an underwater light attenuation compensation network, in which a light attenuation compensation model is constructed based on a U-Net architecture, the original image and a light attenuation coefficient are input, and the image after light attenuation compensation is output;
calculating a light attenuation compensation enhancement value by formula, combining the image features before and after light attenuation compensation, and fusing the compensated image features with the original image features through a feature fusion layer;
wherein the formula involves the following quantities: the light attenuation compensation enhancement value at pixel location (i, j) on channel c; the intensity value at pixel location (i, j) on channel c in the original image; $\alpha_{i,j}$, the local light attenuation estimation factor at pixel location (i, j); the color shift correction factor at pixel location (i, j) on channel c; the compensating perturbation factor at pixel location (i, j) on channel c; the estimated attenuation value at pixel location (i, j) on channel c; and the local average of the estimated attenuation values over the neighborhood centered on pixel location (i, j) on channel c.
Preferably, associating the acoustic signals output by sonar detection with the fish shoal three-dimensional image model to construct the acoustic-plus-optical high-dimensional fish shoal image data set specifically comprises:
performing band-pass filtering preprocessing on the three-band sonar echo signals at 30 kHz, 200 kHz and 400 kHz respectively, to remove environmental noise and frequency aliasing interference;
using a GPS timing module to synchronize the time stamps of the sonar signals and the optical images, ensuring that data from the same moment are aligned;
calibrating the sonar detection depth against the optical acquisition depth through an extended Kalman filtering algorithm;
extracting the time-frequency features and spatial features of the sonar signal, and the geometric features and texture features of the three-dimensional image;
weighting and fusing the extracted acoustic features and optical features, and reducing the dimensionality of the fused high-dimensional features;
labeling the fused data set hierarchically by behavior pattern, fish species attribute and environmental parameter, and performing cross-modal consistency checking, sample balancing and incremental learning verification on the data set.
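The time stamp synchronization step above can be illustrated with a minimal sketch. It assumes that both sensors already report times in seconds on a shared clock (for example one disciplined by the GPS timing module); the function name `align_by_timestamp` and the tolerance `max_gap` are hypothetical and are not part of the described scheme.

```python
from bisect import bisect_left


def align_by_timestamp(sonar_times, image_times, max_gap=0.05):
    """Pair each image time stamp with the closest sonar ping time stamp.

    sonar_times must be sorted in ascending order. max_gap (seconds) is a
    hypothetical tolerance: images with no ping close enough are skipped.
    Returns a list of (image_index, sonar_index) pairs.
    """
    pairs = []
    for img_idx, t in enumerate(image_times):
        pos = bisect_left(sonar_times, t)
        # The closest ping is one of the two neighbours of the insertion point.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sonar_times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda i: abs(sonar_times[i] - t))
        if abs(sonar_times[best] - t) <= max_gap:
            pairs.append((img_idx, best))
    return pairs
```

Only the paired samples would then be passed on to feature extraction and fusion; how unmatched frames are handled is left open here.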
Preferably, the data set is expanded through data enhancement including but not limited to rotation, scaling and cropping, specifically:
rotating the fish images after main body segmentation, with a rotation angle range of -15° to 15°, to simulate different swimming postures of fish under water;
scaling the images by a factor of 0.8 to 1.2 to accommodate changes in fish body size at different shooting distances;
cropping a region containing the complete fish body from the original image by random cropping, wherein the crop size is not less than 70% of the original image;
introducing Mosaic data enhancement, in which several different fish images are randomly stitched together to simulate scenes with multiple fish shoals.
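The parameter ranges listed above can be sampled as in the following minimal sketch. It only draws the augmentation parameters and does not transform any pixels. Two points are assumptions rather than statements of the scheme: the 70% lower bound is applied to each side length of the crop, and the check that the crop contains the complete fish body is omitted. The function name `sample_augmentation` is hypothetical.

```python
import random


def sample_augmentation(width, height, rng=None):
    """Draw one set of augmentation parameters for an image of the given size."""
    rng = rng or random.Random()
    angle_deg = rng.uniform(-15.0, 15.0)   # rotation range from the text
    scale = rng.uniform(0.8, 1.2)          # scaling range from the text
    # Assumption: "not less than 70%" is applied to each side of the crop.
    crop_frac = rng.uniform(0.7, 1.0)
    crop_w = max(1, round(width * crop_frac))
    crop_h = max(1, round(height * crop_frac))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return {
        "angle_deg": angle_deg,
        "scale": scale,
        "crop_box": (left, top, left + crop_w, top + crop_h),
    }
```

Passing a seeded `random.Random` instance makes the sampled parameters reproducible, which is convenient when the expanded data set has to be regenerated.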
Preferably, the adaptive sonar detection module calculates the volume density or areal density of targets over the volume or area of water swept by the sonar and estimates the fish quantity in combination with the area of the water area, specifically:
the sonar scans the water area with a fan-shaped beam, where the scanning angle is θ, the maximum detection distance is R, the scanning volume at depth h is V, and the horizontal scanning area is S;
the number of tracked and counted targets N is divided by the scanning volume or area to obtain the volume density $\rho_V = N/V$ or the areal density $\rho_S = N/S$;
given the total water area $S_{total}$, the total number is $N_{total} = \rho_S \cdot S_{total}$, or the total number is calculated from the volume density and the total volume obtained with the average depth.
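The density-based estimate above can be sketched as follows. The expressions for the swept area and volume are assumptions made for illustration: the horizontal scan area is taken as the area of a circular sector, $S = \theta R^2 / 2$ with θ in radians, and the scan volume as that sector extended over the depth, $V = S \cdot h$. All function names are hypothetical.

```python
import math


def sector_area(theta_deg, max_range_m):
    """Assumed horizontal scan area: circular sector, S = theta * R^2 / 2."""
    return math.radians(theta_deg) * max_range_m ** 2 / 2.0


def estimate_by_area(count, theta_deg, max_range_m, total_area_m2):
    """N_total = rho_S * S_total, with rho_S = N / S."""
    rho_s = count / sector_area(theta_deg, max_range_m)
    return rho_s * total_area_m2


def estimate_by_volume(count, theta_deg, max_range_m, depth_m,
                       total_area_m2, mean_depth_m):
    """N_total = rho_V * V_total, with rho_V = N / V.

    Assumptions: V = S * h for the swept volume, and
    V_total = S_total * mean depth for the whole water body.
    """
    swept_volume = sector_area(theta_deg, max_range_m) * depth_m
    rho_v = count / swept_volume
    return rho_v * total_area_m2 * mean_depth_m
```

For example, 30 tracked targets in a 90° sector with R = 20 m give an areal density of 30 / (100π) fish per square metre under these assumptions.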
Preferably, the fish detection module invokes a detection algorithm to detect fish in the image, and calculates a coordinate position of a fish detection frame in the image and a corresponding prediction score, which specifically includes:
extracting multiscale visual characteristics of the preprocessed image by using a Faster R-CNN detection algorithm, and capturing morphology, texture and edge information of fish;
generating an anchor frame based on the feature map, adjusting the position of the anchor frame through regression calculation, and determining the coordinates of a fish detection frame;
and calculating the class probability of the target in each detection frame by using a classifier, and outputting the prediction score and the corresponding fish class label.
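The anchor regression and scoring steps above can be illustrated with a minimal sketch. It assumes the box parameterization commonly used with Faster R-CNN, in which the regression outputs (dx, dy, dw, dh) shift the anchor centre and rescale its width and height, and a softmax over class logits. The function names and the example labels are hypothetical.

```python
import math


def decode_box(anchor, deltas):
    """Turn an anchor and its regression offsets into a detection frame.

    anchor = (cx, cy, w, h); deltas = (dx, dy, dw, dh).
    Returns corner coordinates (x1, y1, x2, y2).
    """
    cx_a, cy_a, w_a, h_a = anchor
    dx, dy, dw, dh = deltas
    cx = cx_a + dx * w_a          # shift the centre by a fraction of the size
    cy = cy_a + dy * h_a
    w = w_a * math.exp(dw)        # rescale width and height
    h = h_a * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return the most probable label and its probability (prediction score)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

A detector would apply `decode_box` to every anchor on the feature map and keep the frames whose prediction score exceeds a chosen threshold.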
Preferably, the main body segmentation module crops the target after obtaining the rectangular frame position of the fish target, passes the crop to the fish segmentation model for main body segmentation, and removes underwater impurity interference from the segmented image, specifically:
based on the rectangular frame coordinates output by the fish detection module, cropping the preprocessed comprehensive image to extract a region of interest (ROI) containing the fish target, removing the background region outside the rectangular frame and reducing the subsequent processing range;
inputting the cropped ROI image into a U-Net-based fish segmentation model, classifying the pixels in the image through an encoder-decoder structure, and generating a mask separating the fish main body from the background, wherein the model pre-training weights come from an underwater biological data set;
binarizing the ROI image based on the segmentation mask, eliminating small-area noise regions with morphological operations, and finally retaining the fish main body pixels while removing interference from underwater suspended matter and aquatic weeds, so that the output image contains only the fish target and its contour details.
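The mask post-processing above can be sketched in plain Python. The sketch binarizes a probability mask and removes small isolated regions by connected-component area filtering, which is used here as a simple stand-in for the morphological operations mentioned in the text. The threshold of 0.5, the minimum area, and the function names are assumptions.

```python
from collections import deque


def clean_mask(prob_mask, threshold=0.5, min_area=3):
    """Binarize a probability mask and drop 4-connected regions below min_area."""
    h, w = len(prob_mask), len(prob_mask[0])
    binary = [[1 if prob_mask[r][c] >= threshold else 0 for c in range(w)]
              for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected foreground region.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < min_area:      # treat small regions as noise
                for y, x in component:
                    binary[y][x] = 0
    return binary


def apply_mask(roi, mask):
    """Keep ROI pixels where the mask is 1 and set all others to 0."""
    return [[pix if m else 0 for pix, m in zip(row, mrow)]
            for row, mrow in zip(roi, mask)]
```

In practice an image library's opening and closing operations would normally replace the area filter; the sketch only shows the intent of suppressing suspended-matter specks around the fish body.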
Preferably, the identification and classification module inputs the fish image after main body segmentation into the transfer learning identification model to classify and identify the fish species, specifically:
transfer learning model construction: a transfer learning identification model is built on the pre-trained deep learning network ResNet, wherein the pre-training weights come from ImageNet or an underwater fish data set, and the general visual features learned in pre-training are used as base parameters;
image feature extraction and adaptation: the fish image after main body segmentation is resized to the model input size, high-level semantic features are extracted through the convolution and pooling layers of the model, and batch normalization and dropout regularization are introduced in the feature extraction stage;
when a new fish species is detected, the model supports incremental learning and achieves adaptive identification of the new species by updating the weights of the classification layer online.
The invention also provides a method for intelligently investigating the types and the amounts of the fishes, which comprises the following steps:
extracting targets through a sonar image processing algorithm, counting multiple targets using a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, calculating the volume density or areal density of targets over the volume or area of water swept by the sonar, and estimating the number of fish in combination with the area of the water area;
acquiring water depth and turbidity data in real time with a depth camera while synchronously marking the image time sequence, and capturing fish shoal images to form a fish shoal three-dimensional image model; at the same time, when the fish shoal is captured, dynamically adjusting the illumination intensity according to the water depth with a light supplementing lamp, and adapting to the underwater lighting environment through a light attenuation compensation algorithm;
associating the acoustic signals output by sonar detection with the fish shoal three-dimensional image model to construct an acoustic-plus-optical high-dimensional fish shoal image data set, and expanding the data set through data enhancement including but not limited to rotation, scaling and cropping;
inputting the preprocessed high-dimensional image data set into a fish detection model, invoking a detection algorithm to detect fish in the image, and calculating the coordinate position of each fish detection frame in the image and the corresponding prediction score;
after the rectangular frame position of the fish target is obtained, cropping the target and passing the crop to a fish segmentation model for main body segmentation, so that underwater impurity interference is removed and the segmented image contains only the fish target as the main identification object;
inputting the fish image after main body segmentation into a YOLOv deep learning model to classify and identify the fish species.
The technical scheme of the invention has the following beneficial effects:
1. The invention integrates a three-band transducer and can dynamically switch frequency according to the depth of the fish shoal; combined with sonar image processing and a multi-target tracking algorithm, it can accurately extract fish shoal target contours and count their number, which solves the problem that conventional single-band sonar cannot achieve both detection resolution and detection range in waters of different depths, and enables efficient estimation of fish quantity.
2. In the invention, the depth camera acquires water depth and turbidity data in real time and marks the image time sequence, and fish shoal images are captured covering multiple morphological scenes such as side views and bending postures of the fish body, forming a comprehensive fish shoal three-dimensional image model so that the morphological information of the fish shoal can be captured from all directions. The light attenuation compensation algorithm adapts to the underwater lighting environment and keeps the quality of fish shoal images clear and stable at different water depths, and the data set is expanded through data enhancement such as rotation, scaling and cropping, which improves data diversity.
3. Through the Faster R-CNN algorithm and the U-Net model, the invention achieves accurate detection and main body segmentation of fish targets and removes the interference of underwater impurities, so that the image retains only the fish main body; this provides a clean target region for species identification and solves the problem that fish targets are easily disturbed in complex underwater backgrounds.
4. The invention builds its model on transfer learning, uses pre-training weights to extract high-level semantic features, and combines batch normalization with an incremental learning mechanism; this not only improves the accuracy of fish species identification but also allows the features of new fish species to be learned adaptively, so that the system can be extended continuously and meets the dynamic monitoring requirements of fishery resources.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantages of the invention more apparent, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the present invention provides a device for intelligently investigating the type and quantity of fish, comprising:
The adaptive sonar detection module 101 extracts targets through a sonar image processing algorithm, counts multiple targets using a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, calculates the volume density or areal density of targets over the volume or area of water swept by the sonar, and estimates the number of fish in combination with the area of the water area;
The three-dimensional image acquisition module 102 uses a depth camera to acquire water depth and turbidity data in real time while synchronously marking the image time sequence, and captures fish shoal images covering multiple morphological scenes, including but not limited to side views and bending postures of the fish body, to form a fish shoal three-dimensional image model;
The image preprocessing module 103 associates the acoustic signals output by sonar detection with the fish shoal three-dimensional image model to construct an acoustic-plus-optical high-dimensional fish shoal image data set, and expands the data set through data enhancement including but not limited to rotation, scaling and cropping;
The fish detection module 104 inputs the preprocessed high-dimensional image data set into a fish detection model, invokes a detection algorithm to detect fish in the image, and calculates the coordinate position of each fish detection frame in the image and the corresponding prediction score;
The main body segmentation module 105 crops the fish target after obtaining the position of its rectangular frame and passes the crop to a fish segmentation model for main body segmentation, so that underwater impurity interference is removed and the segmented image contains only the fish target as the main identification object;
The identification and classification module 106 inputs the fish image after main body segmentation into a transfer learning identification model to classify and identify the fish species.
In this embodiment, the echo intensity monitored by the adaptive sonar detection module 101 is calculated as:
$I = 10\log_{10}(P)$
where P is the sonar received power. The frequency switching thresholds are as follows:
30 kHz (low frequency) is suitable for deep water (h > 20 m): it has strong penetration and a long detection range but lower resolution; 200 kHz (medium frequency) is suitable for medium-depth water (h ≤ 20 m) and balances detection range and resolution; 400 kHz (high frequency) is suitable for shallow water (h ≤ 10 m): it has high resolution and can capture small fish shoals or fine contours.
Here h is the depth of the fish shoal, c is the propagation speed of the sonar signal in water, Δt is the time delay from transmission of the sonar signal to its reception, and f_switch is the acoustic transmission frequency; the sonar detection module switches frequency according to the detected depth.
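The depth-dependent frequency switching described above can be sketched as follows. Two assumptions are made: the depth is obtained from the two-way travel time as h = c·Δt/2, which is the usual echo-ranging relation, with a nominal sound speed of 1500 m/s; and the overlapping ranges in the text are resolved by assigning 10 m < h ≤ 20 m to 200 kHz. The function names are hypothetical.

```python
def depth_from_delay(delay_s, sound_speed_m_s=1500.0):
    """Assumed echo-ranging relation: h = c * delta_t / 2 (two-way travel)."""
    return sound_speed_m_s * delay_s / 2.0


def select_frequency_khz(depth_m):
    """Pick the sonar band from the detected fish shoal depth.

    Thresholds follow the text: deep water uses 30 kHz, medium depth 200 kHz,
    shallow water 400 kHz. Assigning exactly 10 < h <= 20 m to 200 kHz is an
    assumption made to remove the overlap between the stated ranges.
    """
    if depth_m > 20.0:
        return 30
    if depth_m > 10.0:
        return 200
    return 400
```

With these assumptions, a delay of 0.04 s corresponds to a depth of about 30 m and selects the 30 kHz band.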
In this embodiment, the adaptive sonar detection module 101 extracts targets through a sonar image processing algorithm; specifically, the fish shoal target contour is extracted by Otsu threshold segmentation or by the Canny edge detection algorithm, where:
Otsu threshold segmentation automatically determines the optimal segmentation threshold by calculating the inter-class variance of the gray-level histogram of the image, binarizes the sonar image, and separates fish shoal targets from background noise; it is suitable for water environments with uniform gray-level distribution.
Canny edge detection extracts the edge contour of the fish shoal through Gaussian filtering for denoising, calculation of gradient magnitude and direction, non-maximum suppression and double-threshold screening; it is suitable for target boundary identification against complex backgrounds.
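The Otsu step can be illustrated with a minimal pure-Python sketch that searches for the gray level maximizing the inter-class variance. It assumes 8-bit gray values supplied as a flat list; the function names are hypothetical, and a production system would more likely call an image library.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes the inter-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]                 # pixel count of the lower class
        sum0 += t * hist[t]           # gray-level sum of the lower class
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # proportional to the variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Mark pixels brighter than the threshold as target (1), others as 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

On a sonar frame with a dark background and bright echoes, the returned threshold falls between the two gray-level clusters, so `binarize` keeps only the echo pixels.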
Continuous tracking of fish shoal targets across frames is then achieved by combining the nearest neighbor algorithm with extended Kalman filtering, as follows:
For the targets in the first frame of the sonar image, the nearest neighbor algorithm calculates the spatial distance and feature similarity of targets between adjacent frames, and initial tracking tracks are established.
The extended Kalman filtering algorithm then predicts the target position in the next frame based on a target motion model (such as a constant-velocity or constant-acceleration model), corrects the prediction error with the measured data, and updates the target track, which addresses track loss in scenarios such as multi-target occlusion and crossing motion.
Finally, the stably tracked target tracks are counted to generate fish shoal number data per unit time.
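The tracking and counting steps above can be sketched as follows. Several simplifications are assumed: a constant-velocity motion model is used, for which the extended Kalman filter reduces to the ordinary linear Kalman filter; each image axis is filtered independently; association is a greedy nearest-neighbor match on position only, without feature similarity; and the gate, noise values and `min_hits` are hypothetical.

```python
import math


class AxisKalman:
    """1-D constant-velocity Kalman filter; state = (position, velocity)."""

    def __init__(self, pos, q=0.01, r=0.25):
        self.x, self.v = pos, 0.0
        self.p = [[1.0, 0.0], [0.0, 1.0]]   # state covariance
        self.q, self.r = q, r               # process / measurement noise

    def predict(self, dt=1.0):
        self.x += self.v * dt
        p = self.p
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        p00 = p[0][0] + dt * (p[1][0] + p[0][1]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]
        return self.x

    def update(self, z):
        p = self.p
        s = p[0][0] + self.r                # innovation covariance, H = [1, 0]
        k0, k1 = p[0][0] / s, p[1][0] / s   # Kalman gain
        y = z - self.x                      # innovation
        self.x += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]


class Track:
    def __init__(self, track_id, x, y):
        self.id = track_id
        self.kx, self.ky = AxisKalman(x), AxisKalman(y)
        self.hits = 1                       # number of frames with a match


def step(tracks, detections, next_id, gate=5.0):
    """Predict every track, match nearest detections, start new tracks."""
    unmatched = list(detections)
    for t in tracks:
        px, py = t.kx.predict(), t.ky.predict()
        if not unmatched:
            continue
        nearest = min(unmatched, key=lambda d: math.hypot(d[0] - px, d[1] - py))
        if math.hypot(nearest[0] - px, nearest[1] - py) <= gate:
            t.kx.update(nearest[0])
            t.ky.update(nearest[1])
            t.hits += 1
            unmatched.remove(nearest)
    for x, y in unmatched:
        tracks.append(Track(next_id, x, y))
        next_id += 1
    return next_id


def count_stable(tracks, min_hits=3):
    """Count tracks that were matched in at least min_hits frames."""
    return sum(1 for t in tracks if t.hits >= min_hits)
```

Calling `step` once per sonar frame and `count_stable` at the end of a time window yields a count of stably tracked targets; short-lived tracks caused by noise are excluded by `min_hits`.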
In this embodiment, the adaptive sonar detection module 101 calculates the volume density or areal density of targets over the volume or area of water swept by the sonar and estimates the fish quantity in combination with the area of the water area, specifically:
the sonar scans the water area with a fan-shaped beam, where the scanning angle is θ, the maximum detection distance is R, the scanning volume at depth h is V, and the horizontal scanning area is S;
the number of tracked and counted targets N is divided by the scanning volume or area to obtain the volume density $\rho_V = N/V$ or the areal density $\rho_S = N/S$;
given the total water area $S_{total}$, the total number is $N_{total} = \rho_S \cdot S_{total}$, or the total number is calculated from the volume density and the total volume obtained with the average depth.
The three-dimensional image acquisition module 102 uses a depth camera to acquire water depth and turbidity data in real time while synchronously marking the image time sequence, and captures fish shoal images covering multiple morphological scenes, including but not limited to side views and bending postures of the fish body, to form a fish shoal three-dimensional image model; when the fish shoal is captured, a light supplementing lamp dynamically adjusts the illumination intensity according to the water depth, and the underwater lighting environment is adapted through a light attenuation compensation algorithm.
When the depth camera acquires water depth and turbidity data in real time, synchronously marks the image time sequence and captures fish shoal images, the depth information of the fish shoal is first acquired using time-of-flight (TOF) technology and fused with the pixel information of the acquired fish shoal image to generate fish shoal image data with a depth channel. Next, based on a multi-frame image sequence, a dynamic three-dimensional image model of the fish shoal over a given time period is constructed using a structure-from-motion (SFM) algorithm combined with the marked image time sequence, showing posture changes of the fish shoal while swimming. Finally, the field angle and resolution parameters of the depth camera are adjusted so that depth data of the fish shoal can be acquired over different underwater distance ranges.
The underwater lighting environment is adapted through the light attenuation compensation algorithm as follows. First, illumination equalization is performed on the acquired fish image, using an adaptive histogram equalization algorithm to adjust the image contrast according to the light intensity distribution after light attenuation compensation. Second, an underwater light attenuation compensation network is introduced: a light attenuation compensation model is constructed based on a U-Net architecture, the original image and a light attenuation coefficient are input, and the image after light attenuation compensation is output.
A light attenuation compensation enhancement value is calculated by formula, the image features before and after light attenuation compensation are combined, and the compensated image features are fused with the original image features through a feature fusion layer;
wherein the formula involves the following quantities: the light attenuation compensation enhancement value at pixel location (i, j) on channel c; the intensity value at pixel location (i, j) on channel c in the original image; $\alpha_{i,j}$, the local light attenuation estimation factor at pixel location (i, j); the color shift correction factor at pixel location (i, j) on channel c; the compensating perturbation factor at pixel location (i, j) on channel c; the estimated attenuation value at pixel location (i, j) on channel c; and the local average of the estimated attenuation values over the neighborhood centered on pixel location (i, j) on channel c.
In this embodiment, illumination equalization is performed on one frame of original image data containing clownfish, acquired by an ROV in a sea area 30 m deep in the East China Sea. Specifically, the image is divided into a plurality of non-overlapping sub-regions of 8x8 pixels. For the sub-region in row m and column n of the image, for example, its gray histogram is counted and a clipping threshold is set; the threshold is obtained by dividing the total number of pixels in the sub-region by the number of gray levels, 256, that is, clipping threshold = (8×8)/256 = 0.25. The frequency portion exceeding 0.25 in the histogram is clipped, and the clipped pixel count is redistributed to each gray level in the histogram, thereby limiting excessive amplification of local contrast. A cumulative distribution function is then calculated for the clipped and redistributed histogram, and a new gray mapping value is obtained for each pixel according to this function; the final value of each pixel is mapped from its gray value using the four adjacent sub-regions around its position, which completes the illumination equalization. The equalized image is then input to the light attenuation compensation model, which calculates a light attenuation compensation enhancement value for each pixel in each color channel; the calculation process is defined by the formula.
In the formula, the light attenuation compensation enhancement value of the pixel at coordinates (i, j) on color channel c is a unitless scalar; the light intensity value of the original input image on channel c at position (i, j) lies in [0, 255]; the subscripts i and j denote the row and column coordinates of the pixel in the image; and the superscript (c) denotes the color channel, which can be one of the R, G and B channels.
The core logic of the formula is as follows. The numerator preliminarily amplifies the original light intensity by the local light attenuation factor $\alpha_{i,j}$ and applies an additive correction through the color shift correction factor. The denominator is a regularization term: the compensating perturbation factor ensures that the denominator is not zero, and the estimated attenuation value is compared with the neighborhood mean value. When the attenuation characteristics of a pixel differ greatly from those of its neighborhood, the denominator increases, which suppresses the compensation intensity and avoids over-enhancement artifacts in edge and texture regions. The absolute value operation ensures that the final output enhancement value is non-negative. The benefit of the formula is that, by introducing a term that compares the pixel with the attenuation characteristics of its neighborhood, the compensation process becomes locally adaptive: stronger color and brightness recovery is applied in smooth regions and milder enhancement in texture-rich regions, which effectively protects image details.
To illustrate how each parameter in the formula is obtained, take the calculation for the pixel at coordinates (100, 150) in the red channel (R channel) as an example. First, the light intensity value of this pixel, read directly from the image acquisition device, is 50. The encoder-decoder architecture of the U-Net model generates a plurality of feature maps, one of which corresponds to the output local light attenuation estimation factor $\alpha_{i,j}$, which is negatively correlated with the local brightness of the pixel. Based on the low-brightness features of this point and its neighborhood, the model is assumed to output $\alpha_{100,150} = 0.3$; this value setting refers to a large amount of underwater image data, and the value range is experimentally set to [0, 1.5]. Another output feature map corresponds to the color shift correction factor. Because the underwater environment attenuates red light most severely, the model outputs a larger positive offset for the R channel, and the model output is set to 20 here; the setting range of this value is determined from the average attenuation rates of the different color channels, and in the experiments the β value of the R channel usually lies in [10, 50]. The compensating perturbation factor is usually small; it is set to ensure the stability of the denominator gradient during training, its range is limited to [0.01, 0.2] through training on tens of thousands of sample images, and the model output here is 0.05. The estimated attenuation value is also output directly by the network and represents the model's estimate of the degree of light attenuation at that point, normalized to the interval [0, 1]; the model output is assumed to be 0.7. The neighborhood mean value is then calculated as the arithmetic mean of the estimated attenuation values of all 25 pixels in the surrounding 5x5 neighborhood. Table 1 below lists some of the estimated attenuation values of this 5x5 neighborhood.
Table 1. Estimated attenuation values of sample pixels
| Coordinates | R-channel estimated attenuation value | Coordinates | R-channel estimated attenuation value |
| --- | --- | --- | --- |
| (98,148) | 0.72 | (100,150) | 0.70 |
| (98,149) | 0.71 | (100,151) | 0.68 |
| (98,150) | 0.70 | (100,152) | 0.69 |
| (99,148) | 0.73 | (101,148) | 0.75 |
| (99,149) | 0.72 | (101,149) | 0.73 |
| (99,150) | 0.71 | (101,150) | 0.72 |
| (99,151) | 0.69 | (101,151) | 0.70 |
| (99,152) | 0.69 | (101,152) | 0.71 |
| (100,148) | 0.74 | (102,150) | 0.74 |
| (100,149) | 0.72 | (102,151) | 0.72 |
As shown in Table 1, the local average is calculated by summing the attenuation values of all 25 points in the neighborhood (only some of which are listed in the table) and dividing by 25, giving 0.71. Substituting all the parameter values into the formula gives a light attenuation compensation enhancement value of 82.15 for pixel (100, 150) in the R channel; this value serves as one input for the subsequent feature fusion. Finally, the compensation-enhanced image features calculated for all pixels are fused with the original image features after illumination equalization. Specifically, the compensation enhancement value and the original light intensity value are weighted and summed; the weight coefficients are set to balance retaining original image details against highlighting the compensation effect. Through experimental testing, the original image weight $w_{orig}$ is set to 0.4 and the compensation image weight $w_{comp}$ is set to 0.6. The final output pixel value is this weighted sum, and results outside the range [0, 255] are truncated to obtain the final image after light attenuation compensation.
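The weighted fusion and truncation described above can be written as a short sketch. The weights 0.4 and 0.6 and the example values 50 and 82.15 are taken from the text; the function name `fuse_pixel` is hypothetical, and the sketch covers only this final step, not the calculation of the enhancement value itself.

```python
def fuse_pixel(original, compensated, w_orig=0.4, w_comp=0.6):
    """Weighted sum of original and compensated values, truncated to [0, 255]."""
    value = w_orig * original + w_comp * compensated
    return min(255.0, max(0.0, value))


# Worked example from the text: original intensity 50, enhancement value 82.15.
example = fuse_pixel(50, 82.15)   # 0.4 * 50 + 0.6 * 82.15 = 69.29
```

For the sample pixel (100, 150) in the R channel this gives an output of about 69.29, which lies inside [0, 255], so no truncation is applied.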
The image preprocessing module 103 correlates the acoustic signals output by sonar detection with the fish-shoal three-dimensional image model to construct an acoustic plus optical fish-shoal high-dimensional image data set, specifically:
Performing band-pass filtering preprocessing on the three-band sonar echo signals at 30 kHz, 200 kHz and 400 kHz respectively, to remove environmental noise and frequency aliasing interference;
Using a GPS timing module to synchronize the time stamps of the sonar signals and the optical images, ensuring that data from the same moment are aligned;
Calibrating the sonar detection depth against the optical acquisition depth with an extended Kalman filtering algorithm;
Extracting the time-frequency features and spatial features of the sonar signals, and the geometric features and texture features of the three-dimensional images;
Weighting and fusing the extracted acoustic features and optical features, and performing dimension reduction on the fused high-dimensional features;
Performing hierarchical labeling of behavior mode, fish species attribute and environmental parameters on the fused data set, and performing cross-modal consistency testing, sample equalization processing and incremental learning verification on the data set.
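The time-stamp synchronization step in the list above can be sketched as a nearest-time-stamp pairing. This is only an illustration under stated assumptions: both streams are assumed to carry time stamps in seconds on a shared GPS-disciplined clock, and the function name `align_nearest` and the tolerance `max_gap` are hypothetical and do not appear in the original text.

```python
from bisect import bisect_left


def align_nearest(sonar_times, image_times, max_gap=0.05):
    """Pair each sonar time stamp with the nearest image time stamp.

    image_times must be sorted in ascending order. Pairs whose time
    difference exceeds max_gap (seconds, assumed value) are dropped.
    Returns a list of (sonar_index, image_index) tuples.
    """
    pairs = []
    for i, t in enumerate(sonar_times):
        pos = bisect_left(image_times, t)
        # Candidates: the image frame just before and just after the sonar ping.
        candidates = [j for j in (pos - 1, pos) if 0 <= j < len(image_times)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(image_times[j] - t))
        if abs(image_times[best] - t) <= max_gap:
            pairs.append((i, best))
    return pairs


# The last sonar ping has no image frame within the tolerance and is discarded.
pairs = align_nearest([0.00, 0.10, 0.21, 0.90], [0.01, 0.11, 0.20, 0.30])
```

Dropping unmatched pings rather than interpolating is a design choice made here for simplicity; it keeps only acoustic and optical samples that can be attributed to the same moment.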
The data set is expanded by data enhancement as follows: the fish images obtained by main-body segmentation are rotated to simulate the different swimming postures of fish under water; the images are scaled by a ratio of 0.8 to 1.2 to accommodate changes in fish body size at different shooting distances; a region containing the complete fish body is cut from the original image by random cropping, with a crop size of no less than 70% of the original image; and Mosaic data enhancement is introduced, in which several different fish images are randomly stitched together to simulate scenes containing multiple fish shoals.
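A minimal sketch of how the rotation, scaling and cropping parameters could be drawn is given here. The scaling range of 0.8 to 1.2 and the 70% lower bound come from the text; the rotation range, the reading of "70%" as a per-side fraction, and the function name `sample_augmentation` are assumptions. Mosaic stitching is not covered.

```python
import math
import random


def sample_augmentation(width, height, rng):
    """Draw one set of augmentation parameters for a width x height image."""
    angle = rng.uniform(0.0, 360.0)   # rotation range: assumption, not given in the text
    scale = rng.uniform(0.8, 1.2)     # scaling ratio stated in the text
    frac = rng.uniform(0.7, 1.0)      # assumed: each crop side >= 70% of the original
    crop_w = min(width, math.ceil(width * frac))
    crop_h = min(height, math.ceil(height * frac))
    # Random top-left corner such that the crop stays inside the image.
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return {"angle": angle, "scale": scale, "crop": (left, top, crop_w, crop_h)}


params = sample_augmentation(640, 480, random.Random(0))
```

In practice the crop would additionally be checked against the fish mask so that the complete fish body remains inside the cropped region, as the text requires.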
In this embodiment, the fish detection module 104 invokes a detection algorithm to detect fish in the image and calculates the coordinate position of each fish detection frame in the image and the corresponding prediction score, specifically:
Multi-scale visual features of the preprocessed image are extracted by the Faster R-CNN detection algorithm to capture the morphology, texture and edge information of the fish. Feature extraction is performed on the preprocessed comprehensive image by a backbone network comprising convolution layers and pooling layers. The convolution layers slide several convolution kernels of different sizes (such as 3×3 and 5×5) over different positions of the image to generate feature maps containing fish morphology (such as the curve of the fish outline), texture (such as the repeated pattern formed by the scale arrangement) and edge (such as the abrupt gray-level boundary between fish and background) information. The pooling layers (max pooling or average pooling) downsample the feature maps, reducing feature dimension and computation while preserving key features, and enhance the translation invariance of the features so that they can be extracted stably wherever the fish appears in the image. A feature pyramid network (FPN) structure used with Faster R-CNN then fuses the feature maps output by different levels of the backbone network (for example, shallow feature maps that retain more detailed texture and deep feature maps that contain more abstract semantic information). Up-sampling, lateral connections and similar operations generate multi-scale feature maps so that the algorithm can effectively capture the features of fish of different sizes: for small fish, the shallow high-resolution feature maps are relied on to identify fine textures, and for large fish, the deep feature maps are used to understand the overall morphology, which improves detection adaptability to fish of different sizes.
Anchor frames are generated based on the feature maps, their positions are adjusted through regression calculation, and the coordinates of the fish detection frames are determined. Based on the fused multi-scale feature maps, several anchor frames are generated at each pixel position of a feature map according to preset anchor frame sizes (such as 16×16, 32×32 and 64×64, to suit different fish sizes) and aspect ratios (such as 1:1, 1:2 and 2:1, to match common fish shapes), covering the regions of the image where fish may be present. For each anchor frame, a regression model predicts the offsets between the anchor frame and the bounding box of the real fish (horizontal offset Δx, vertical offset Δy, width scaling factor Δω and height scaling factor Δh), and these offsets are used to adjust the position and size of the anchor frame so that the adjusted detection frame lies closer to the boundary of the real fish target. The calculation can be expressed as follows:
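$$x_{\text{new}} = x_{\text{anchor}} + \Delta x \cdot \omega_{\text{anchor}}$$
$$y_{\text{new}} = y_{\text{anchor}} + \Delta y \cdot h_{\text{anchor}}$$
$$\omega_{\text{new}} = \omega_{\text{anchor}} \cdot \exp(\Delta \omega)$$
$$h_{\text{new}} = h_{\text{anchor}} \cdot \exp(\Delta h)$$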
where $(x_{\text{anchor}}, y_{\text{anchor}})$ are the coordinates of the upper-left corner of the anchor frame, $(\omega_{\text{anchor}}, h_{\text{anchor}})$ are the width and height of the anchor frame, $(x_{\text{new}}, y_{\text{new}})$ are the coordinates of the upper-left corner of the adjusted detection frame, and $(\omega_{\text{new}}, h_{\text{new}})$ are the width and height of the adjusted detection frame, so that the coordinates of the detection frame are accurately determined.
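The four regression equations above translate directly into code. The following sketch applies them to a single anchor frame; the function name `decode_anchor` and the sample numbers are hypothetical, and the upper-left-corner convention follows the definitions in the text.

```python
import math


def decode_anchor(anchor, deltas):
    """Apply predicted regression offsets to one anchor frame.

    anchor: (x, y, w, h), with (x, y) the upper-left corner as defined in the text.
    deltas: (dx, dy, dw, dh) predicted by the regression model.
    Returns the adjusted detection frame (x_new, y_new, w_new, h_new).
    """
    x, y, w, h = anchor
    dx, dy, dw, dh = deltas
    x_new = x + dx * w            # horizontal shift, proportional to anchor width
    y_new = y + dy * h            # vertical shift, proportional to anchor height
    w_new = w * math.exp(dw)      # exponential keeps the width positive
    h_new = h * math.exp(dh)      # exponential keeps the height positive
    return (x_new, y_new, w_new, h_new)


# Hypothetical anchor and offsets: shift right, shift up, keep width, double height.
box = decode_anchor((100.0, 50.0, 32.0, 64.0), (0.25, -0.5, 0.0, math.log(2.0)))
```

Expressing the shifts relative to the anchor size and the scale changes through an exponential keeps the regression targets comparable across the 16×16, 32×32 and 64×64 anchors.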
A classifier calculates the class probability of the target in each detection frame and outputs the prediction score and the corresponding fish class label. Specifically, for each detection frame after regression adjustment, the feature vector of the corresponding feature map region is extracted and input into a classifier (usually a fully connected layer followed by a Softmax function). The classifier calculates the probability that the target in the detection frame belongs to each category, based on the feature differences between fish and non-fish and between different fish species learned during pre-training. For example, for a detection frame containing a crucian, the classifier outputs the probability that it belongs to the "crucian" class as well as the probabilities that it belongs to other fish classes (such as carp and grass carp) and to the non-fish class. A probability threshold is set (for example, 0.5, adjustable according to actual detection requirements); detection frames whose prediction probability exceeds the threshold are retained, the corresponding class probability is taken as the prediction score, and the corresponding fish class label (such as crucian or carp) is output. Detection frames whose prediction score is below the threshold are judged to be false detections or background and are filtered out. The result is the coordinate position, prediction score and class label of each fish detection frame in the image, which provides accurate target region information for the subsequent fish body segmentation and species identification.
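The scoring and filtering step can be sketched as follows. The sketch assumes that each detection frame already has a vector of raw class scores (logits) and that index 0 is the non-fish class named "background"; the function names, class list and sample scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_detections(detections, class_names, threshold=0.5):
    """Keep frames whose top class is a fish class scoring above the threshold.

    detections: list of (box, logits) pairs.
    Returns a list of (box, label, score) tuples.
    """
    kept = []
    for box, logits in detections:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        label = class_names[best]
        if label != "background" and probs[best] > threshold:
            kept.append((box, label, probs[best]))
    return kept


class_names = ["background", "crucian", "carp"]
detections = [
    ((10, 10, 50, 30), [0.0, 3.0, 0.5]),   # confident crucian
    ((80, 40, 20, 20), [2.0, 0.0, 0.0]),   # background
    ((5, 90, 40, 25), [0.0, 0.2, 0.1]),    # ambiguous, below threshold
]
kept = filter_detections(detections, class_names)
```

Only the first frame survives: the second is classified as background and the third has no class probability above 0.5.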
In this embodiment, after the rectangular frame position of the fish target is obtained, the main body segmentation module 105 crops the target and passes it to the fish segmentation model for main-body segmentation, so that underwater impurity interference is removed from the segmented picture, specifically:
Based on the rectangular frame coordinates output by the fish detection module, the preprocessed comprehensive image is cropped, a region of interest (ROI) containing the fish target is extracted, and the background region outside the rectangular frame is removed, which reduces the range of subsequent processing. The rectangular frame coordinates (x1, y1, x2, y2) output by the fish detection module must first be mapped from the feature map scale of the detection model back to the original pixel scale of the preprocessed image (because of the downsampling operations in the detection model, the coordinate correspondence has to be restored by deconvolution or interpolation). To avoid losing the edges of the fish body during cropping, the coordinates are then elastically expanded.
In the expansion formula, W and H are the width and height of the preprocessed image, and α is the elastic expansion proportion coefficient, which controls how far the rectangular frame is expanded beyond its boundary and usually takes a value between 0 and 1. (x1, y1) are the coordinates of the upper-left vertex of the detected fish target rectangular frame, which locate the starting position of the rectangular frame in the image, and (x2, y2) are the coordinates of its lower-right vertex. The elastic expansion calculation yields the adjusted upper-left vertex and the adjusted lower-right vertex of the new rectangular frame.
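The expansion formula itself is not reproduced in the text as extracted. The sketch below shows one common form that is consistent with the symbol definitions (W, H, α, and the two vertices): each side is pushed outward by α times the frame's width or height, and the result is clamped to the image. This specific form, the function name `expand_box` and the default α of 0.1 are assumptions, not the original formula.

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, alpha=0.1):
    """Expand a rectangular frame outward and clamp it to the image.

    Assumed form: each side moves outward by alpha times the frame's
    width (horizontal sides) or height (vertical sides).
    Returns the adjusted (x1, y1, x2, y2).
    """
    dw = alpha * (x2 - x1)
    dh = alpha * (y2 - y1)
    new_x1 = max(0.0, x1 - dw)            # do not cross the left image edge
    new_y1 = max(0.0, y1 - dh)            # do not cross the top image edge
    new_x2 = min(float(img_w), x2 + dw)   # do not cross the right image edge
    new_y2 = min(float(img_h), y2 + dh)   # do not cross the bottom image edge
    return (new_x1, new_y1, new_x2, new_y2)


# A frame well inside a 640 x 480 image grows by 10% of its size on every side.
inner = expand_box(100, 50, 200, 150, 640, 480, alpha=0.1)

# A frame near the border is clamped to the image extent.
clamped = expand_box(5, 5, 635, 475, 640, 480, alpha=0.1)
```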
This operation ensures that edge details of the fish body (such as the tail and fins) are fully retained and avoids segmentation omissions caused by boundary errors in the detection frame.
The cropped ROI image is then input into a U-Net-based fish segmentation model, which classifies the pixels in the image through an encoder-decoder structure to generate a mask separating the fish main body from the background. The pre-training weights of the model come from an underwater biological data set. The encoder-decoder structure is as follows:
1. Encoder feature extraction:
A 5-layer downsampling encoder is used, each layer containing a convolution (3×3 kernel, stride 2), batch normalization and ReLU activation. The encoder is initialized with the pre-training weights from the underwater biological data set; layer 1 extracts low-level features such as edges and textures (for example, fish scale texture and fish body outline), and layer 5 extracts high-level semantic features (for example, overall fish shape and category-distinguishing features).
2. Decoder feature recovery:
The decoder gradually restores the image resolution through up-sampling convolution (2×2 kernel, stride 2) while fusing the feature maps of the corresponding encoder levels through skip connections, which mitigates the feature loss that scattering causes in underwater images. For example, fusing layer 1 and layer 5 features enhances semantic classification capability while restoring fish details.
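The effect of the stated kernel sizes and strides on spatial resolution can be checked with a short calculation. The sketch assumes a padding of 1 for the 3×3 convolutions, no padding for the 2×2 up-sampling (transposed) convolutions, and a hypothetical 256-pixel input side; none of these three values is given in the text.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output side length of a strided convolution (padding of 1 is assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_out(size, kernel=2, stride=2):
    """Output side length of an unpadded transposed convolution."""
    return (size - 1) * stride + kernel


def unet_sizes(input_size, depth=5):
    """Side lengths after each encoder layer and each decoder layer."""
    encoder = [input_size]
    for _ in range(depth):
        encoder.append(conv_out(encoder[-1]))
    decoder = [encoder[-1]]
    for _ in range(depth):
        decoder.append(transposed_conv_out(decoder[-1]))
    return encoder, decoder


# Hypothetical 256 x 256 ROI: halved five times, then doubled five times.
encoder_sizes, decoder_sizes = unet_sizes(256)
```

Under these assumptions each decoder level has the same side length as the matching encoder level, which is what allows the skip connections to concatenate feature maps without resizing.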
The ROI image is binarized based on the segmentation mask, and small-area noise regions are eliminated by morphological operations, so that only the fish main-body pixels are retained and interference from underwater suspended matter and aquatic weed debris is removed; the output image then contains only the fish target and its contour details. In the morphological operations, an opening operation (erosion followed by dilation) is performed first, with a 3×3 rectangular kernel as the structuring element, to eliminate small-area noise (such as bubbles and aquatic weed fragments; the removal rate for noise of 10 pixels is more than 90%). A closing operation (dilation followed by erosion) is then performed to fill cavities inside the fish body caused by occlusion (such as small gaps at the mouth and gills), making the fish body contour more complete.
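The opening-then-closing sequence can be demonstrated on a small binary mask without any image library. This is a pure-Python sketch for illustration only; a production system would more likely use an optimized library routine. The helper names and the 11×11 demonstration mask are hypothetical, and pixels outside the image are simply ignored when a 3×3 window overlaps the border.

```python
def _apply(mask, reduce_fn):
    """Apply reduce_fn over each 3x3 window (all -> erosion, any -> dilation)."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                mask[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = 1 if reduce_fn(window) else 0
    return out


def erode(mask):
    return _apply(mask, all)


def dilate(mask):
    return _apply(mask, any)


def clean_mask(mask):
    """Opening (erode, dilate) removes specks; closing (dilate, erode) fills holes."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))


# Demonstration mask: a 7x7 "fish body" with a one-pixel hole and one noise pixel.
demo = [[0] * 11 for _ in range(11)]
for r in range(2, 9):
    for c in range(2, 9):
        demo[r][c] = 1
demo[5][5] = 0   # small cavity inside the body
demo[0][0] = 1   # isolated noise pixel
cleaned = clean_mask(demo)
```

After cleaning, the isolated pixel is gone and the cavity is filled, while the 7×7 body keeps its original extent.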
In this embodiment, the identification classification module 106 inputs the fish image after main-body segmentation into the transfer learning identification model to classify and identify the fish species, specifically:
A transfer learning identification model is constructed based on the pre-trained deep learning network ResNet. The pre-training weights come from ImageNet or an underwater fish data set, and the general visual features they encode are used as the base parameters. ResNet is selected as the infrastructure because its residual connection structure effectively alleviates the vanishing gradient problem in deep network training; having been pre-trained on the large-scale general image data set ImageNet, it has learned rich general visual features such as the edges, outlines and textures of objects. Weights pre-trained on an underwater fish data set can also be adopted; these are optimized for the underwater environment and fit fish identification scenes more closely. Because the general visual features in the pre-training weights serve as the base parameters, the model already has a certain feature extraction capability at initialization and does not need to learn basic visual patterns from scratch, which greatly reduces the amount of data and computing resources required for training, accelerates convergence, and strengthens the basis for recognizing fish features.
For image feature extraction and adaptation, the fish image after main-body segmentation is resized to the input size of the model, high-level semantic features are extracted through the convolution layers and pooling layers of the model, and batch normalization and dropout regularization are introduced in the feature extraction stage. First, since the model has a fixed input size, the fish image obtained by main-body segmentation is resized accordingly so that it can be input into the model for subsequent processing. Next, the convolution layers of the model extract features layer by layer: the convolution kernels slide over the image to capture features at different levels, ranging from simple edge and color information to complex combinations of fish body morphology and texture. The pooling layers downsample the feature maps produced by convolution, reducing feature dimension and computation while retaining key features, and direct the model's attention toward global features. Dropout regularization randomly switches off some neurons, which prevents the model from depending excessively on specific features and from overfitting, so that effective features can be extracted stably from fish images taken in different underwater environments and in different postures.
When a new fish species is detected, the model supports incremental learning and realizes adaptive identification of the new species by updating the classification layer weights online. After the high-level semantic features are extracted, the fully connected layer of the model integrates and maps them, converting the high-dimensional features into a vector whose dimension equals the number of fish categories. The softmax classifier then converts the fully connected layer output into a probability distribution over the fish categories; the larger the probability value, the more likely it is that the fish in the image belongs to that category. The category with the highest probability is selected, the corresponding fish class label (such as "crucian" or "carp") is output, and the confidence score of the label is output at the same time to reflect the reliability of the identification result. When a new fish species is encountered, the whole model does not need to be retrained; only the classification layer weights are updated online so that the model learns the feature pattern of the new species. The model can therefore identify new species adaptively, continuously expand its identification capability, and adapt to dynamic changes in the fish species present.
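One way to picture "updating only the classification layer" is a linear head whose weight matrix gains one row per new species while the feature extractor stays fixed. The sketch below initializes the new row from the mean feature vector of a few samples of the new species; this prototype-style initialization is an assumption chosen for brevity, not the update rule of the original method, and the class `ClassifierHead`, the 3-dimensional features and all numbers are hypothetical.

```python
import math


class ClassifierHead:
    """Linear classification layer followed by softmax (bias omitted for brevity)."""

    def __init__(self, weights, labels):
        self.weights = [list(w) for w in weights]
        self.labels = list(labels)

    def predict(self, feature):
        """Return (label, confidence) for one feature vector."""
        logits = [sum(w_i * f_i for w_i, f_i in zip(w, feature)) for w in self.weights]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.labels[best], probs[best]

    def add_class(self, label, sample_features):
        """Append a new class row; existing rows and the backbone are untouched.

        The new row is the mean of the sample features (assumed initialization).
        """
        n = len(sample_features)
        prototype = [sum(col) / n for col in zip(*sample_features)]
        self.weights.append(prototype)
        self.labels.append(label)


head = ClassifierHead([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], ["crucian", "carp"])
head.add_class("grass carp", [[0.0, 0.1, 0.9], [0.1, 0.0, 1.1]])
new_label, new_conf = head.predict([0.0, 0.0, 1.0])
old_label, old_conf = head.predict([1.0, 0.0, 0.0])
```

Existing species keep their predictions because their weight rows are unchanged; in a fuller implementation the new row would typically be refined by a few gradient steps on the new samples.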
As shown in Fig. 2, the invention further provides a method for intelligently investigating fish species and quantity, which comprises the following steps:
S1, extracting targets with a sonar image processing algorithm, performing multi-target tracking and counting with a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, calculating the volume density or areal density from the volume or area of the water swept by the sonar, and estimating the quantity of fish in combination with the area of the water body;
S2, acquiring water depth and turbidity data in real time with a depth camera, synchronously marking the image time sequence, and capturing fish shoal images that cover multi-form scenes including but not limited to the side and bending postures of the fish body, to form a fish shoal three-dimensional image model;
S3, constructing an acoustic plus optical fish shoal high-dimensional image data set by correlating the acoustic signals output by sonar detection with the fish shoal three-dimensional image model, and expanding the data set by data enhancement including but not limited to rotation, scaling and cropping;
S4, inputting the preprocessed high-dimensional image data set into a fish detection model, invoking a detection algorithm to detect fish in the image, and calculating the coordinate position of each fish detection frame in the image and the corresponding prediction score;
S5, after the rectangular frame position of the fish target is obtained, cropping the target and passing it to a fish segmentation model for main-body segmentation, so that underwater impurity interference is removed and the segmented picture contains only the fish main body to be identified;
S6, inputting the fish images after main-body segmentation into a transfer learning identification model, and classifying and identifying the fish species.
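The nearest-neighbor part of the tracking in step S1 can be sketched as a greedy association between predicted track positions and new sonar detections. The extended Kalman filter is not implemented here: its predicted positions are simply taken as input. The function name `associate`, the gate distance and the sample coordinates are hypothetical.

```python
import math


def associate(predicted, detections, gate=5.0):
    """Greedy nearest-neighbor matching of tracks to detections.

    predicted: dict mapping track id -> predicted (x, y) position
               (in the full method this would come from the EKF prediction step).
    detections: list of (x, y) positions extracted from the current sonar frame.
    gate: maximum allowed distance for a match (assumed value).
    Returns (matches, unmatched): matches maps track id -> detection index,
    unmatched lists detection indices that would start new tracks.
    """
    candidates = sorted(
        (math.dist(p, d), tid, di)
        for tid, p in predicted.items()
        for di, d in enumerate(detections)
    )
    matches, used_dets = {}, set()
    for dist, tid, di in candidates:
        if dist > gate:
            break  # remaining pairs are even farther apart
        if tid in matches or di in used_dets:
            continue
        matches[tid] = di
        used_dets.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used_dets]
    return matches, unmatched


predicted = {1: (0.0, 0.0), 2: (10.0, 10.0)}
detections = [(9.5, 10.2), (0.4, -0.3), (30.0, 30.0)]
matches, unmatched = associate(predicted, detections)

# Each unmatched detection starts a new track, so it adds one fish to the count.
fish_count = len(predicted) + len(unmatched)
```

Matching detections to existing tracks before counting is what prevents the same fish from being counted again in every sonar frame.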
The device and method for intelligently investigating fish provided by the invention achieve accurate investigation of fish species and quantity through multi-module cooperation and technology fusion. The working principle is as follows:
The device integrates low-, medium- and high-frequency transducers (30 kHz, 200 kHz and 400 kHz) and automatically switches frequency according to the depth of the fish shoal: the low band is used in deep water to ensure detection range, the medium band is used in medium-depth water to balance range and resolution, and the high band is used in shallow water to capture fish body details. The sonar scans the water area with fan-shaped beams, extracts fish shoal targets through an image processing algorithm, counts the targets with a tracking algorithm, calculates the density from the volume or area of the scanned region, and then estimates the total number of fish in the water area.
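The frequency switching and the density-based estimate can be sketched in a few lines. The three frequencies come from the text, but the depth limits of 20 m and 100 m are purely hypothetical placeholders (the text gives no switching thresholds), as are the function names and the sample figures; the estimate uses areal density only.

```python
def select_frequency_khz(depth_m, shallow_limit_m=20.0, deep_limit_m=100.0):
    """Choose the transducer band for a given fish shoal depth.

    The depth limits are hypothetical; only the three frequencies
    (400, 200 and 30 kHz) are taken from the text.
    """
    if depth_m < shallow_limit_m:
        return 400   # high band: shallow water, fine detail
    if depth_m < deep_limit_m:
        return 200   # medium band: balance of range and resolution
    return 30        # low band: deep water, long detection range


def estimate_total(tracked_count, swept_area_m2, water_area_m2):
    """Scale the areal density of the swept region up to the whole water body."""
    density = tracked_count / swept_area_m2   # fish per square metre
    return density * water_area_m2


bands = [select_frequency_khz(d) for d in (5.0, 50.0, 150.0)]

# Hypothetical survey: 120 fish tracked over 2,000 m2 of a 50,000 m2 water body.
total = estimate_total(120, 2000.0, 50000.0)
```

The estimate assumes that fish are distributed in the unswept water at the same density as in the swept region; a volume-density variant would replace the two areas with the corresponding volumes.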
The depth camera acquires fish shoal images in real time while synchronously recording data such as water depth and turbidity. Depth information is obtained by time-of-flight technology, and a dynamic three-dimensional model of the fish shoal is constructed from multi-frame image sequences, covering postures such as the side view and bending of the fish body. During shooting, the fill light automatically adjusts its brightness according to water depth and works together with the light attenuation compensation algorithm, which improves the clarity of underwater images through histogram equalization, an image enhancement model and feature fusion, and solves the problem of uneven illumination.
The acoustic signals detected by the sonar are correlated with the optical images acquired by the camera: the sonar signals are denoised and time-synchronized with the images, the depth data of the two are calibrated against each other, and acoustic features (time-frequency and spatial) and optical features (geometric and texture) are extracted and fused to construct a high-dimensional data set. At the same time, the data are expanded by rotating, scaling and cropping images and by stitching several pictures together, which simulates different fish postures and scenes with multiple fish shoals and increases data diversity.
A target detection algorithm extracts multi-scale features from the preprocessed image, generates candidate frames and adjusts their positions, and determines the specific coordinates of the fish in the image and the probability of each category. Based on the detection result, the region of interest is cropped and input into a segmentation model for pixel-level classification, which removes underwater suspended matter, aquatic weeds and other impurities, retains only the fish main body, and provides a clean image for subsequent identification.
An identification model is built on a pre-trained deep learning network, whose learned general visual features are used to extract high-level semantic features of the fish, and regularization is applied to prevent overfitting. A classifier generates the probability distribution over fish categories and outputs the category with the highest probability together with its confidence. When a new fish species is encountered, the model supports online parameter updating and adaptively learns the features of the new category, so that its identification capability expands continuously.
Through a complete flow of quantity estimation by sonar detection, image acquisition and model construction, data fusion and enhancement, target detection and positioning, main-body segmentation and denoising, and transfer learning classification, the method achieves automatic analysis from fish quantity statistics to species identification. The hardware cooperation of multi-band sonar and depth camera, the cross-modal processing of acoustic and optical data, and the intelligent model with incremental learning capability effectively solve the problems of poor depth adaptability, low image quality and difficulty in identifying new species in traditional investigation, and provide an efficient and accurate technical scheme for fishery resource management and ecological monitoring.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.