CN121074619B - Device and method for intelligently investigating fish types and amounts - Google Patents

Device and method for intelligently investigating fish types and amounts

Info

Publication number
CN121074619B
Authority
CN
China
Prior art keywords
fish
image
model
detection
sonar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202511111501.1A
Other languages
Chinese (zh)
Other versions
CN121074619A (en)
Inventor
陈淑峰
董志英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Academy Of Ecological And Environmental Protection
Original Assignee
Beijing Academy Of Ecological And Environmental Protection
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Academy Of Ecological And Environmental Protection filed Critical Beijing Academy Of Ecological And Environmental Protection
Priority to CN202511111501.1A priority Critical patent/CN121074619B/en
Publication of CN121074619A publication Critical patent/CN121074619A/en
Application granted granted Critical
Publication of CN121074619B publication Critical patent/CN121074619B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81: Aquaculture, e.g. of fish

Landscapes

  • Image Analysis (AREA)

Abstract

This invention discloses an intelligent device and method for surveying fish species and quantities. The device includes an adaptive sonar detection module, a 3D image acquisition module, an image preprocessing module, a fish detection module, a subject segmentation module, and a recognition and classification module. The sonar module integrates a three-band transducer, dynamically switches frequency according to the depth of the fish school, and combines algorithms to achieve target tracking, counting, and quantity estimation. The image acquisition module uses a depth camera together with a light attenuation compensation algorithm to construct a 3D image model of the fish school. The preprocessing module fuses acoustic and optical features and expands the dataset. The detection and segmentation modules accurately locate fish and remove impurities. The recognition and classification module performs species recognition based on transfer learning and supports incremental learning for new fish species. Through the cooperation of these modules, the method addresses the insufficient detection accuracy and poor image quality of traditional surveys, achieves efficient and accurate surveys of fish species and quantities, and provides technical support for fisheries resource management and ecological protection.

Description

Device and method for intelligently investigating fish types and amounts
Technical Field
The invention relates to the technical field of fish investigation, in particular to a device and a method for intelligently investigating the types and the amounts of fish.
Background
In fishery resource surveys, aquatic ecological monitoring, aquatic organism research and related fields, accurate information on fish species and quantity is essential for assessing fishery resources and maintaining the ecological balance of water bodies. However, current fish survey methods have a number of shortcomings and struggle to meet practical needs efficiently and accurately.
Traditional fish surveys rely primarily on manual catch sampling, which has significant limitations. On the one hand, because of the range of movement and habitat preferences of fish (for example, some fish are active mainly by day or by night, or prefer a specific water layer or structurally complex waters), manual fishing cannot cover a water body comprehensively, so the samples do not truly reflect the overall distribution and quantity of fish. On the other hand, frequent fishing directly damages the aquatic ecosystem and disturbs the survival and reproduction of fish, which runs contrary to ecological protection principles and is especially unsuitable for surveys of rare or endangered fish.
With advances in technology, sonar detection has gradually been applied to fish surveys, but conventional sonar systems have functional shortcomings. Most sonars use a single-band transducer and cannot adapt to fish schools at different depths. In shallow water, high-frequency sonar offers high resolution but has a short propagation distance and cannot provide full coverage; in deep water, low-frequency sonar propagates far but readily loses fish body detail because of insufficient resolution, leading to large errors in target identification and counting. Moreover, in complex water environments (such as turbid water and undulating underwater terrain), echo signals are easily disturbed, traditional sonar image processing algorithms struggle to extract fish school targets effectively, and track loss and misjudgment are common in multi-target tracking, so the number of fish cannot be counted accurately.
Image recognition techniques also face challenges when assisting fish surveys. The underwater environment is distinctive: uneven illumination, scattering by the water body, and interference from impurities degrade the quality of the captured raw images. Existing image preprocessing mostly applies a single denoising or enhancement operation and does not fully integrate multi-dimensional image features (such as grayscale, color and texture), so clear and usable fish target information is difficult to recover. In fish detection and classification, general-purpose image recognition models adapt poorly to the shapes and textures of underwater fish and lack optimization for underwater scenes; model training depends on large amounts of annotated data; new fish species or morphologically variant individuals are easily misjudged; and no complete, efficient processing pipeline has been formed, so neither the accuracy of species recognition nor that of quantity statistics can be ensured.
In summary, existing fish survey technology falls short in multi-scene adaptation, resistance to interference in complex environments, module cooperation, and accurate identification and counting, and cannot meet the fine-grained requirements of dynamic fishery resource monitoring and ecological research. There is therefore a need for an intelligent system integrating multi-band adaptive sonar, multi-dimensional image preprocessing, intelligent detection and segmentation, and transfer learning recognition, so as to survey fish species and quantity efficiently and accurately and to provide strong technical support for fishery resource management and aquatic ecological protection.
Disclosure of Invention
The technical problems to be solved by the invention are the insufficient detection precision of single-band sonar, the poor effect of underwater image preprocessing, and the weak adaptability of recognition models in traditional fish surveys. Through the cooperation of multi-band adaptive sonar detection, multi-dimensional image preprocessing, intelligent detection and segmentation, and transfer learning recognition, the invention achieves efficient and accurate surveys of fish species and quantity.
In order to solve the technical problems, the embodiment of the invention provides the following technical scheme:
a device for intelligently surveying fish species and quantity, comprising:
an adaptive sonar detection module, which integrates a three-band transducer, dynamically switches frequency according to the depth of the fish school, extracts targets by a sonar image processing algorithm, tracks and counts multiple targets by a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, computes the volume density or areal density over the volume or area of water swept by the sonar, and estimates the number of fish by combining the density with the area of the water body;
a three-dimensional image acquisition module, which uses a depth camera to collect water depth and turbidity data in real time and synchronously mark the image time sequence, and captures fish school images covering multiple postures, including but not limited to side views and bending postures of the fish body, to form a three-dimensional image model of the fish school; during capture, a fill light dynamically adjusts the illumination intensity according to the water depth, and a light attenuation compensation algorithm adapts to the underwater lighting environment;
an image preprocessing module, which associates the fish school three-dimensional image model with the acoustic signals output by sonar detection to construct an acoustic plus optical high-dimensional fish school image dataset, and expands the dataset by data enhancement including but not limited to rotation, scaling and cropping;
a fish detection module, which inputs the preprocessed high-dimensional image dataset into a fish detection model, calls a detection algorithm to detect fish in the image, and calculates the coordinates of each fish detection box in the image and the corresponding prediction score;
a main body segmentation module, which crops the fish target once its rectangular box position is obtained and feeds the crop into a fish segmentation model for main body segmentation, removing underwater impurity interference so that the image contains only the fish target to be identified;
a recognition and classification module, which inputs the fish images after main body segmentation into the transfer learning recognition model to classify and identify the fish species.
Preferably, when the depth camera collects water depth and turbidity data in real time, synchronously marks the image time sequence, and captures fish school images to form the three-dimensional image model of the fish school, the following steps are adopted:
the depth information of the fish school is acquired using time-of-flight (TOF) technology and fused with the pixel information of the captured fish school image to generate fish school image data with a depth channel;
based on a multi-frame image sequence, a structure-from-motion (SfM) algorithm, combined with the marked image time sequence, constructs a dynamic three-dimensional image model of the fish school over a given time period, showing the posture changes of the fish school while swimming;
by adjusting the field of view and resolution parameters of the depth camera, depth data of the fish school can be collected at a given point cloud density across different underwater distance ranges.
Preferably, the underwater lighting environment is adapted by the light attenuation compensation algorithm as follows:
illumination equalization is performed on the collected fish images, using an adaptive histogram equalization algorithm to adjust the image contrast according to the light intensity distribution after light attenuation compensation;
an underwater light attenuation compensation network is introduced: a light attenuation compensation model is built on a U-Net architecture, taking the original image and the light attenuation coefficient as input and outputting the image after light attenuation compensation, with the light attenuation compensation enhancement value calculated by a formula [formula not preserved in this text];
the image features before and after light attenuation compensation are combined, and the compensated image features are fused with the original image features through a feature fusion layer;
wherein the terms of the formula, for pixel location (i, j) on channel c, are: the light attenuation compensation enhancement value; the intensity value in the original image; the local light attenuation estimation factor $\alpha_{i,j}$ at pixel location (i, j); the color shift correction factor; the compensating perturbation factor; the estimated attenuation value; and the local average of the estimated attenuation values over the neighborhood centered on pixel location (i, j).
Preferably, associating the fish school three-dimensional image model with the acoustic signals output by sonar detection to construct the acoustic plus optical high-dimensional fish school image dataset specifically comprises:
applying band-pass filtering separately to the three-band sonar echo signals at 30 kHz, 200 kHz and 400 kHz to remove environmental noise and frequency aliasing interference;
synchronizing the timestamps of the sonar signals and the optical images with a GPS timing module, ensuring that data from the same moment are aligned;
calibrating the sonar detection depth against the optical acquisition depth with an extended Kalman filtering algorithm;
extracting the time-frequency and spatial features of the sonar signal and the geometric and texture features of the three-dimensional image;
fusing the extracted acoustic and optical features by weighting, and reducing the dimensionality of the fused high-dimensional features;
labeling the fused dataset hierarchically by behavior pattern, species attribute and environmental parameter, and applying cross-modal consistency checks, sample balancing and incremental learning verification to the dataset.
Preferably, the dataset is expanded by data enhancement including but not limited to rotation, scaling and cropping, specifically:
rotating the fish images after main body segmentation, with a rotation angle range of -15° to 15°, to simulate different swimming postures of fish under water;
scaling the images by a factor of 0.8 to 1.2 to accommodate changes in fish body size at different shooting distances;
cropping a region containing the complete fish body from the original image by random cropping, with the crop size not less than 70% of the original image;
introducing Mosaic data enhancement, randomly stitching several different fish images together to simulate scenes with multiple fish schools.
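The parameter ranges above can be illustrated with a minimal sketch that only samples augmentation parameters; it does not touch pixel data. The function name `sample_augmentation` and the returned keys are hypothetical, and the 70% lower bound is assumed here to apply to each side of the crop, which the text does not specify.

```python
import math
import random


def sample_augmentation(width, height, rng):
    """Sample one set of augmentation parameters within the stated ranges.

    Assumption: the "not less than 70%" rule is applied per side.
    """
    angle_deg = rng.uniform(-15.0, 15.0)   # rotation range from the text
    scale = rng.uniform(0.8, 1.2)          # scaling range from the text
    frac = rng.uniform(0.7, 1.0)           # crop fraction per side (assumed)
    crop_w = math.ceil(width * frac)
    crop_h = math.ceil(height * frac)
    # Random top-left corner such that the crop stays inside the image.
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return {
        "angle_deg": angle_deg,
        "scale": scale,
        "crop": (left, top, crop_w, crop_h),
    }


if __name__ == "__main__":
    print(sample_augmentation(640, 480, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when an augmented dataset has to be regenerated.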
Preferably, the adaptive sonar detection module computes the volume density or areal density over the volume or area of water swept by the sonar, and estimates the number of fish by combining the density with the area of the water body, specifically:
the sonar scans the water with a fan-shaped beam of scanning angle θ and maximum detection distance R, giving a scanning volume V at depth h and a horizontal scanning area S;
the tracked target count N is divided by the scanning volume or area to obtain the volume density $\rho_V = N/V$ or the areal density $\rho_S = N/S$;
given the total water area $S_{total}$, the total number is $N_{total} = \rho_S \cdot S_{total}$, or the total is calculated from the total volume in combination with the average depth.
Preferably, the fish detection module invokes a detection algorithm to detect fish in the image, and calculates a coordinate position of a fish detection frame in the image and a corresponding prediction score, which specifically includes:
extracting multiscale visual characteristics of the preprocessed image by using a Faster R-CNN detection algorithm, and capturing morphology, texture and edge information of fish;
generating an anchor frame based on the feature map, adjusting the position of the anchor frame through regression calculation, and determining the coordinates of a fish detection frame;
and calculating the class probability of the target in each detection frame by using a classifier, and outputting the prediction score and the corresponding fish class label.
Preferably, the main body segmentation module crops the target once the rectangular box position of the fish target is obtained and feeds the crop into a fish segmentation model for main body segmentation, removing underwater impurity interference from the segmented image, specifically:
based on the rectangular box coordinates output by the fish detection module, the preprocessed image is cropped to extract a region of interest (ROI) containing the fish target, removing the background outside the rectangular box and narrowing the range of subsequent processing;
the cropped ROI image is input into a U-Net-based fish segmentation model, which classifies the pixels of the image through an encoder-decoder structure and generates a mask separating the fish body from the background, the pre-trained weights of the model coming from an underwater biological dataset;
the ROI image is binarized based on the segmentation mask, small-area noise regions are removed with morphological operations, and finally only the fish body pixels are retained, eliminating interference from underwater suspended matter and aquatic weeds so that the output image contains only the fish target and its contour details.
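The binarization and small-region cleanup can be sketched as follows. This is a minimal illustration, not the patented procedure: it swaps the morphological operation for a connected-component area filter, which serves the same purpose of dropping small noise regions. The function name `clean_mask`, the 0.5 threshold and the `min_area` value are all assumptions.

```python
from collections import deque


def clean_mask(prob, threshold=0.5, min_area=4):
    """Binarize a per-pixel fish probability map and drop small regions.

    prob is a list of rows of floats in [0, 1]. Regions are 4-connected.
    """
    h, w = len(prob), len(prob[0])
    mask = [[1 if prob[r][c] >= threshold else 0 for c in range(w)]
            for r in range(h)]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected foreground region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions below the area limit are treated as noise.
            if len(region) < min_area:
                for y, x in region:
                    mask[y][x] = 0
    return mask
```

Multiplying the ROI image by the returned mask would leave only the pixels of the retained regions.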
Preferably, the recognition and classification module inputs the fish images after main body segmentation into the transfer learning recognition model to classify and identify the fish species, specifically:
transfer learning model construction: a transfer learning recognition model is built on the pre-trained deep learning network ResNet, with pre-training weights from ImageNet or an underwater fish dataset, and the general visual features learned in pre-training serve as the base parameters;
image feature extraction and adaptation: the fish image after main body segmentation is resized to the model input size, high-level semantic features are extracted by the convolution and pooling layers of the model, and batch normalization and dropout regularization are introduced in the feature extraction stage;
when a new fish species is detected, the model supports incremental learning, achieving adaptive recognition of the new species by updating the weights of the classification layer online.
The invention also provides a method for intelligently surveying fish species and quantity, comprising the following steps:
sonar detection is performed with a three-band transducer whose frequency is switched dynamically according to the depth of the fish school; targets are extracted by a sonar image processing algorithm, multiple targets are tracked and counted by a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, the volume density or areal density is computed over the volume or area of water swept by the sonar, and the number of fish is estimated by combining the density with the area of the water body;
a depth camera collects water depth and turbidity data in real time, synchronously marks the image time sequence, and captures fish school images covering multiple postures to form a three-dimensional image model of the fish school; at the same time, a fill light dynamically adjusts the illumination intensity according to the water depth during capture, and the underwater lighting environment is adapted through a light attenuation compensation algorithm;
the fish school three-dimensional image model is associated with the acoustic signals output by sonar detection to construct an acoustic plus optical high-dimensional fish school image dataset, and the dataset is expanded by data enhancement including but not limited to rotation, scaling and cropping;
the preprocessed high-dimensional image dataset is input into a fish detection model, a detection algorithm is called to detect fish in the image, and the coordinates of each fish detection box in the image and the corresponding prediction score are calculated;
once the rectangular box position of the fish target is obtained, the target is cropped and fed into a fish segmentation model for main body segmentation, and underwater impurity interference is removed from the segmented image so that it contains only the fish target to be identified;
the fish image after main body segmentation is input into the YOLOv deep learning model to classify and identify the fish species.
The technical scheme of the invention has the following beneficial effects:
1. The invention integrates a three-band transducer and dynamically switches frequency according to the depth of the fish school. Combined with sonar image processing and a multi-target tracking algorithm, it accurately extracts the contours of fish school targets and counts them, overcoming the inability of traditional single-band sonar to balance detection resolution and range across different water depths, and achieving efficient estimation of fish quantity.
2. The depth camera collects water depth and turbidity data in real time and marks the image time sequence, and the captured fish school images cover multiple postures such as side views and bending postures of the fish body, forming a comprehensive three-dimensional image model that captures the morphology of the fish school from all directions. The light attenuation compensation algorithm adapts to the underwater lighting environment and keeps fish school images clear and stable at different water depths, while data enhancement such as rotation, scaling and cropping expands the dataset and improves data diversity.
3. Using the Faster R-CNN algorithm and the U-Net model, the invention achieves accurate detection and main body segmentation of fish targets and removes underwater impurity interference, so that the image retains only the fish body. This provides a clean target region for species recognition and addresses the interference that fish targets suffer against complex underwater backgrounds.
4. The model is built on transfer learning and uses pre-trained weights to extract high-level semantic features. Combined with batch normalization and an incremental learning mechanism, it improves the accuracy of fish species recognition and adaptively learns the features of new fish species, giving the system the capacity for continuous expansion and meeting the needs of dynamic fishery resource monitoring.
Drawings
FIG. 1 is a schematic block diagram of an apparatus for intelligently investigating fish types and numbers in accordance with the present invention;
FIG. 2 is a flow chart of a method for intelligently investigating fish types and amounts in accordance with the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved more apparent, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the present invention provides a device for intelligently investigating the type and quantity of fish, comprising:
The adaptive sonar detection module 101 integrates a three-band transducer, dynamically switches frequency according to the depth of the fish school, extracts targets by a sonar image processing algorithm, tracks and counts multiple targets by a nearest neighbor algorithm combined with an extended Kalman filtering algorithm, computes the volume density or areal density over the volume or area of water swept by the sonar, and estimates the number of fish by combining the density with the area of the water body;
The three-dimensional image acquisition module 102 uses a depth camera to collect water depth and turbidity data in real time and synchronously mark the image time sequence, and captures fish school images covering multiple postures, including but not limited to side views and bending postures of the fish body, to form a three-dimensional image model of the fish school;
The image preprocessing module 103 associates the fish school three-dimensional image model with the acoustic signals output by sonar detection to construct an acoustic plus optical high-dimensional fish school image dataset, and expands the dataset by data enhancement including but not limited to rotation, scaling and cropping;
The fish detection module 104 inputs the preprocessed high-dimensional image dataset into a fish detection model, calls a detection algorithm to detect fish in the image, and calculates the coordinates of each fish detection box in the image and the corresponding prediction score;
The main body segmentation module 105 crops the fish target once its rectangular box position is obtained and feeds the crop into a fish segmentation model for main body segmentation, removing underwater impurity interference so that the image contains only the fish target to be identified;
The recognition and classification module 106 inputs the fish images after main body segmentation into the transfer learning recognition model to classify and identify the fish species.
In this embodiment, the echo intensity monitored by the adaptive sonar detection module 101 is calculated as:
$I = 10\log_{10}(P)$
where P is the sonar received power. The frequency switching threshold is:
$f_{switch} = \begin{cases} 30\ \text{kHz}, & h > 20\ \text{m} \\ 200\ \text{kHz}, & 10\ \text{m} < h \le 20\ \text{m} \\ 400\ \text{kHz}, & h \le 10\ \text{m} \end{cases} \qquad h = \dfrac{c\,\Delta t}{2}$
30 kHz (low frequency) suits deep water (h > 20 m): penetration is strong and the detection distance is long, but resolution is lower. 200 kHz (medium frequency) suits medium-depth water (h ≤ 20 m) and balances detection distance and resolution. 400 kHz (high frequency) suits shallow water (h ≤ 10 m): resolution is high, so small fish schools and fine contours can be captured.
Here h is the depth of the fish school, c is the propagation speed of the sonar signal in water, Δt is the time delay from transmission of the sonar signal to its reception, and $f_{switch}$ is the acoustic transmission frequency. The sonar detection module switches frequency according to this function as the detected depth changes.
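The switching rule can be sketched in a few lines. This is an illustration under stated assumptions rather than the module's actual logic: the depth is assumed to follow from the round-trip delay as h = c·Δt/2, the 200 kHz band is assumed to cover 10 m < h ≤ 20 m, and the nominal sound speed of 1500 m/s and all function names are hypothetical.

```python
import math

SOUND_SPEED_M_S = 1500.0  # assumed nominal speed of sound in water


def echo_intensity_db(received_power):
    """Echo intensity I = 10 * log10(P) for a positive received power P."""
    if received_power <= 0:
        raise ValueError("received power must be positive")
    return 10.0 * math.log10(received_power)


def depth_from_delay(delay_s, sound_speed=SOUND_SPEED_M_S):
    """Fish school depth from the round-trip delay, assuming h = c * dt / 2."""
    return sound_speed * delay_s / 2.0


def select_frequency_khz(depth_m):
    """Pick the transducer band from the depth thresholds in the text."""
    if depth_m <= 10.0:
        return 400  # shallow water: high resolution
    if depth_m <= 20.0:
        return 200  # medium depth: balance of range and resolution
    return 30       # deep water: long range, lower resolution


if __name__ == "__main__":
    h = depth_from_delay(0.02)
    print(h, select_frequency_khz(h))
```

A deployed system would likely add hysteresis around the 10 m and 20 m boundaries to avoid rapid toggling between bands; the text does not describe such a mechanism.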
In this embodiment, the adaptive sonar detection module 101 extracts targets through a sonar image processing algorithm; specifically, fish school target contours are extracted by Otsu threshold segmentation or the Canny edge detection algorithm, where:
Otsu threshold segmentation automatically determines the optimal segmentation threshold by calculating the between-class variance of the image's gray-level histogram, binarizes the sonar image, and separates fish school targets from background noise; it is suitable for water environments with a uniform gray-level distribution.
Canny edge detection extracts the edge contours of fish school bodies through Gaussian filtering for denoising, calculation of gradient magnitude and direction, non-maximum suppression, and double-threshold screening; it is suitable for identifying target boundaries against complex backgrounds.
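The Otsu step can be illustrated with a small pure-Python sketch that works on a flat list of 8-bit gray levels. It is a textbook version of the method, not code from the patent, and the function names are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance.

    pixels is a flat list of integers in [0, levels - 1].
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Mark pixels brighter than the threshold as target (1), else 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

For a sonar frame with a clearly bimodal histogram, the returned threshold falls between the background and echo modes, so `binarize` separates the two.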
Then, the nearest neighbor algorithm is combined with extended Kalman filtering to track fish school targets continuously across frames, as follows:
For the targets in the first sonar frame, the nearest neighbor algorithm calculates the spatial distance and feature similarity of targets between adjacent frames and establishes the initial tracking trajectories.
Next, the extended Kalman filtering algorithm predicts the target position in the next frame based on a target motion model (such as a constant-velocity or constant-acceleration model), corrects the prediction error with the measured data, and updates the target trajectory, addressing track loss in scenes with multi-target occlusion, crossing motion and the like.
Finally, the stably tracked target trajectories are counted to generate fish school count data per unit time.
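The tracking loop can be sketched as below. This is a simplified illustration, not the patented tracker: with a linear constant-velocity motion model the extended Kalman filter reduces to the ordinary Kalman filter used here, association is greedy nearest neighbor on position only (no feature similarity), and the class names, noise values `q` and `r`, the gate distance and the `min_hits` rule are all assumptions.

```python
import math


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (position, velocity)."""

    def __init__(self, pos, q=0.01, r=0.25):
        self.x, self.v = pos, 0.0
        self.p = [[1.0, 0.0], [0.0, 1.0]]  # state covariance
        self.q, self.r = q, r              # process and measurement noise

    def predict(self, dt=1.0):
        self.x += self.v * dt
        p = self.p
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        p00 = p[0][0] + dt * (p[1][0] + p[0][1]) + dt * dt * p[1][1] + self.q
        p01 = p[0][1] + dt * p[1][1]
        p10 = p[1][0] + dt * p[1][1]
        p11 = p[1][1] + self.q
        self.p = [[p00, p01], [p10, p11]]
        return self.x

    def update(self, z):
        p = self.p
        s = p[0][0] + self.r               # innovation covariance
        k0, k1 = p[0][0] / s, p[1][0] / s  # Kalman gain for H = [1, 0]
        y = z - self.x                     # innovation
        self.x += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        self.p = [[(1 - k0) * p[0][0], (1 - k0) * p[0][1]],
                  [p[1][0] - k1 * p[0][0], p[1][1] - k1 * p[0][1]]]


class Track:
    def __init__(self, track_id, x, y):
        self.id = track_id
        self.kx, self.ky = AxisKalman(x), AxisKalman(y)
        self.hits = 1  # number of frames in which the track was matched


def step(tracks, detections, next_id, gate=2.0):
    """Advance all tracks one frame; return the next free track id."""
    unused = list(detections)
    for tr in tracks:
        px, py = tr.kx.predict(), tr.ky.predict()
        if not unused:
            continue
        best = min(unused, key=lambda d: math.hypot(d[0] - px, d[1] - py))
        if math.hypot(best[0] - px, best[1] - py) <= gate:
            tr.kx.update(best[0])
            tr.ky.update(best[1])
            tr.hits += 1
            unused.remove(best)
    for x, y in unused:  # unmatched detections start new tracks
        tracks.append(Track(next_id, x, y))
        next_id += 1
    return next_id


def count_stable(tracks, min_hits=3):
    """Count tracks matched in at least min_hits frames."""
    return sum(1 for tr in tracks if tr.hits >= min_hits)
```

Calling `step` once per sonar frame with the detected target centroids and then `count_stable` gives a count that ignores one-off noise detections. A global assignment method would be more robust than the greedy matching used here when targets cross.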
In this embodiment, the adaptive sonar detection module 101 computes the volume density or areal density over the volume or area of water swept by the sonar, and estimates the number of fish by combining the density with the area of the water body, specifically:
the sonar scans the water with a fan-shaped beam of scanning angle θ and maximum detection distance R, giving a scanning volume V at depth h and a horizontal scanning area S;
the tracked target count N is divided by the scanning volume or area to obtain the volume density $\rho_V = N/V$ or the areal density $\rho_S = N/S$;
given the total water area $S_{total}$, the total number is $N_{total} = \rho_S \cdot S_{total}$, or the total is calculated from the total volume in combination with the average depth.
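The density extrapolation can be worked through numerically. The expressions for the swept area and volume are not preserved in this text, so the sketch assumes plain sector geometry, S = θR²/2 with θ in radians and V = S·h; these formulas and all function names are assumptions, not the patent's definitions.

```python
import math


def sector_area(theta_deg, max_range_m):
    """Horizontal area of a circular sector (assumed geometry)."""
    return math.radians(theta_deg) * max_range_m ** 2 / 2.0


def sector_volume(theta_deg, max_range_m, depth_m):
    """Swept volume, assuming the sector is extruded over the depth h."""
    return sector_area(theta_deg, max_range_m) * depth_m


def estimate_total_by_area(count, theta_deg, max_range_m, total_area_m2):
    """N_total = rho_S * S_total with rho_S = N / S."""
    rho_s = count / sector_area(theta_deg, max_range_m)
    return rho_s * total_area_m2


if __name__ == "__main__":
    # 50 tracked fish in a 90 degree, 20 m sector; 10 000 m^2 water body.
    print(round(estimate_total_by_area(50, 90.0, 20.0, 10_000.0)))
```

The extrapolation presumes the swept sector is representative of the whole water body, so in practice several transects would be averaged before scaling up.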
The three-dimensional image acquisition module 102 uses a depth camera to collect water depth and turbidity data in real time and synchronously mark the image time sequence, and captures fish school images covering multiple postures, including but not limited to side views and bending postures of the fish body, to form a three-dimensional image model of the fish school. During capture, a fill light dynamically adjusts the illumination intensity according to the water depth, and a light attenuation compensation algorithm adapts to the underwater lighting environment.
When the depth camera collects water depth and turbidity data in real time, synchronously marks the image time sequence, and captures the fish school images, the depth information of the fish school is first acquired using time-of-flight (TOF) technology and fused with the pixel information of the captured fish school image to generate fish school image data with a depth channel. Next, based on a multi-frame image sequence, a structure-from-motion (SfM) algorithm, combined with the marked image time sequence, constructs a dynamic three-dimensional image model of the fish school over a given time period, showing the posture changes of the fish school while swimming. Finally, by adjusting the field of view and resolution parameters of the depth camera, depth data of the fish school can be collected at a given point cloud density across different underwater distance ranges.
The underwater light environment is adapted through the light attenuation compensation algorithm as follows. First, illumination equalization is applied to the acquired fish image using an adaptive histogram equalization algorithm, which adjusts image contrast according to the light intensity distribution after light attenuation compensation. Second, an underwater light attenuation compensation network is introduced: a light attenuation compensation model based on the U-Net architecture takes the original image and the light attenuation coefficient as input and outputs the compensated image, using a formula [not reproduced in the source text] to calculate the light attenuation compensation enhancement value. The image features before and after compensation are then combined, with a feature fusion layer fusing the compensated image features with the original image features.
The formula involves the following quantities, each defined at pixel location $(i, j)$ on channel $c$ unless noted otherwise: the light attenuation compensation enhancement value; the intensity value in the original image; the local light attenuation estimation factor $\alpha_{i,j}$ at pixel location $(i, j)$; the color shift correction factor; the compensating perturbation factor; the estimated attenuation value; and the local average of the estimated attenuation values over the neighborhood centered on $(i, j)$.
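The adaptive histogram equalization step works tile by tile: each tile's histogram is clipped, the clipped excess is redistributed, and the cumulative distribution gives the gray-level mapping. A minimal NumPy sketch of that per-tile step is shown here; the function name and the uniform redistribution are assumptions, and the blending between neighboring tiles is left out.

```python
import numpy as np


def clipped_tile_mapping(tile, clip_limit, levels=256):
    """Gray-level lookup table for one tile (uint8) after histogram clipping."""
    hist = np.bincount(tile.ravel(), minlength=levels).astype(np.float64)
    # Cut every bin at the clip limit and spread the excess over all bins
    # (uniform redistribution is an assumption of this sketch).
    excess = np.clip(hist - clip_limit, 0.0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / levels
    # The normalized cumulative distribution becomes the mapping.
    cdf = np.cumsum(hist) / hist.sum()
    return np.round(cdf * (levels - 1)).astype(np.uint8)
```

A pixel with gray value `g` in the tile is mapped to `lut[g]`, where `lut` is the returned table.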
In this embodiment, illumination equalization is applied to one frame of raw image data containing clownfish, acquired by a remotely operated vehicle (ROV) in a 30 m deep sea area of the East China Sea. The image is divided into non-overlapping sub-regions of 8×8 pixels. For the sub-region in row m and column n, the gray-level histogram is computed and a clipping threshold is set equal to the number of pixels in the sub-region divided by the 256 gray levels, that is, clipping threshold = (8×8)/256 = 0.25. Histogram frequencies exceeding 0.25 are clipped, and the clipped pixel counts are redistributed over the gray levels of the histogram, which limits excessive amplification of local contrast. A cumulative distribution function is then computed from the clipped and redistributed histogram and used to obtain a new gray mapping value for each pixel; the final value of each pixel is obtained from the mappings of the four sub-regions adjacent to its position. Once illumination equalization is complete, the image is input to the light attenuation compensation model, which computes a compensation enhancement value for every pixel on every color channel.
The calculation is defined by the light attenuation compensation formula. Its output is the light attenuation compensation enhancement value of the pixel at coordinates $(i, j)$ on color channel $c$, a dimensionless scalar. Its main input is the light intensity value of the original input image on channel $c$ at position $(i, j)$, with range [0, 255]. The subscripts $i$ and $j$ denote the row and column coordinates of the pixel, and the superscript $(c)$ denotes the color channel, which is one of R, G or B.
The core logic of the formula is as follows. In the numerator, the original light intensity is first amplified by the local light attenuation factor $\alpha_{i,j}$ and then additively corrected by the color shift correction factor. The denominator is a regularization term: the compensating perturbation factor keeps it from being zero, and it compares the estimated attenuation value with the neighborhood mean. When the attenuation of a pixel differs strongly from that of its neighborhood, the denominator grows and the compensation is suppressed, which avoids over-enhancement artifacts in edge and texture regions. The absolute value operation ensures that the final enhancement value is non-negative. The benefit of the formula is that the term comparing a pixel with its neighborhood attenuation makes the compensation locally adaptive: color and brightness are recovered more strongly in smooth regions and more gently in texture-rich regions, which protects image detail.
To illustrate how each parameter is obtained, consider the pixel at coordinates (100, 150) on the red (R) channel. Its light intensity value is read directly from the image acquisition device and is 50. The encoder-decoder architecture of the U-Net model generates several feature maps. One of them gives the local light attenuation estimation factor $\alpha_{i,j}$, which is inversely related to the local brightness of the pixel; based on the low brightness of this point and its neighborhood, the model is assumed to output $\alpha_{100,150} = 0.3$. The value range was set experimentally to [0, 1.5] with reference to a large amount of underwater image data. Another output feature map gives the color shift correction factor. Because the underwater environment attenuates red light most severely, the model outputs a larger positive shift for the R channel, assumed here to be 20; the range is determined by the average attenuation rates of the different color channels, and in the experiments $\beta$ for the R channel usually lies in [10, 50]. The compensating perturbation factor is usually small; it is set to keep the denominator gradient stable during training, its range was limited to [0.01, 0.2] by training on tens of thousands of sample images, and the model output here is 0.05. The estimated attenuation value is also output directly by the network and represents the model's estimate of how strongly light is attenuated at that point, normalized to the interval [0, 1]; the model is assumed to output 0.7. The neighborhood mean is then the arithmetic average of the estimated attenuation values of all 25 pixels in the 5×5 neighborhood around the pixel. Table 1 below lists part of the estimated attenuation values in this 5×5 neighborhood.
Table 1. Estimated attenuation values of sample pixels
Coordinates of R-channel estimated attenuation value Coordinates of R-channel estimated attenuation value
(98,148) 0.72 (100,150) 0.70
(98,149) 0.71 (100,151) 0.68
(98,150) 0.70 (100,152) 0.69
(99,148) 0.73 (101,148) 0.75
(99,149) 0.72 (101,149) 0.73
(99,150) 0.71 (101,150) 0.72
(99,151) 0.69 (101,151) 0.70
(99,152) 0.69 (101,152) 0.71
(100,148) 0.74 (102,150) 0.74
(100,149) 0.72 (102,151) 0.72
As shown in Table 1, the local average is obtained by summing the estimated attenuation values of all 25 points in the neighborhood (the table lists only part of them) and dividing by 25, which gives 0.71. Substituting all parameter values into the formula gives a light attenuation compensation enhancement value of 82.15 for pixel (100, 150) on the R channel, and this value serves as one input to the subsequent feature fusion. Finally, the compensation-enhanced image features computed for all pixels are fused with the original image features after illumination equalization. Specifically, the compensation enhancement value and the original light intensity value are combined by weighted summation, with the weights chosen to balance retaining the detail of the original image against emphasizing the compensation effect. Through experimental testing, the original image weight $w_{orig}$ is set to 0.4 and the compensated image weight $w_{comp}$ to 0.6, and the output pixel value is the weighted sum of the two. Results outside the range [0, 255] are truncated to obtain the final image after light attenuation compensation.
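The neighborhood averaging and the final weighted fusion can be sketched with NumPy as follows. The weights 0.4 and 0.6 and the 5×5 window come from the text; the function names are illustrative, and the enhancement value itself is taken as given because its formula is not legible here.

```python
import numpy as np


def local_mean(attenuation, row, col, size=5):
    """Mean of the size x size window centered on (row, col).

    Border handling is omitted: the window must lie inside the array.
    """
    half = size // 2
    window = attenuation[row - half:row + half + 1, col - half:col + half + 1]
    return float(window.mean())


def fuse(original, enhanced, w_orig=0.4, w_comp=0.6):
    """Weighted sum of original and compensated values, truncated to [0, 255]."""
    fused = (w_orig * np.asarray(original, dtype=np.float64)
             + w_comp * np.asarray(enhanced, dtype=np.float64))
    return np.clip(fused, 0.0, 255.0)
```

With the example values from the text, `fuse(50, 82.15)` gives 0.4 × 50 + 0.6 × 82.15 = 69.29.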
The image preprocessing module 103 correlates the acoustic signals output by sonar detection with the fish three-dimensional image model to construct an acoustic plus optical fish high-dimensional image dataset, specifically:
respectively carrying out band-pass filtering pretreatment on three-frequency band sonar echo signals of 30kHz, 200kHz and 400kHz to remove environmental noise and frequency aliasing interference;
Adopting a GPS timing module to perform time stamp synchronization on the sonar signal and the optical image, and ensuring data alignment at the same moment;
Calibrating sonar detection depth and optical acquisition depth by an extended Kalman filtering algorithm;
Extracting the time-frequency and spatial features of the sonar signal, and the geometric and texture features of the three-dimensional image;
Weighting and fusing the extracted acoustic features and the optical features, and performing dimension reduction on the fused high-dimension features;
Performing hierarchical labeling of behavior pattern, fish species attribute and environmental parameters on the fused data set, and performing cross-modal consistency checking, sample balancing and incremental learning verification on the data set.
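Two of the steps above, time stamp alignment and weighted fusion with dimension reduction, can be sketched in NumPy. The text does not say how the fusion or the reduction is done, so the weighted concatenation, the PCA via SVD, the equal weights and the function names are all assumptions of this sketch.

```python
import numpy as np


def align_by_timestamp(sonar_t, image_t):
    """For each image time stamp, index of the nearest sonar time stamp.

    sonar_t must be sorted and contain at least two entries.
    """
    sonar_t = np.asarray(sonar_t, dtype=np.float64)
    image_t = np.asarray(image_t, dtype=np.float64)
    idx = np.clip(np.searchsorted(sonar_t, image_t), 1, len(sonar_t) - 1)
    left, right = sonar_t[idx - 1], sonar_t[idx]
    return np.where(image_t - left <= right - image_t, idx - 1, idx)


def fuse_and_reduce(acoustic, optical, w_acoustic=0.5, w_optical=0.5,
                    n_components=2):
    """Weighted concatenation of feature matrices, then PCA via SVD."""
    fused = np.hstack([w_acoustic * acoustic, w_optical * optical])
    centered = fused - fused.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

Rows of `acoustic` and `optical` are expected to be already aligned, for example by indexing the sonar features with the output of `align_by_timestamp`.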
Data augmentation proceeds as follows. The subject-segmented fish images are rotated to simulate different underwater swimming postures. The images are scaled by a factor of 0.8 to 1.2 to reflect changes in fish body size at different shooting distances. A region containing the complete fish body is cut from the original image by random cropping, with a crop size of no less than 70% of the original image. Mosaic data augmentation is also introduced, randomly stitching several different fish images together to simulate scenes with multiple fish schools.
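A minimal NumPy sketch of these augmentations follows. It makes several simplifying assumptions: rotation is limited to quarter turns, resampling is nearest-neighbor, the 70% bound is applied per side, and the mosaic is a fixed 2×2 grid. Checking that a crop still contains the complete fish body would need the detection box and is not shown.

```python
import numpy as np


def rotate_quarter_turns(img, k):
    """Rotate by k * 90 degrees (arbitrary angles need an image library)."""
    return np.rot90(img, k)


def random_scale(img, rng, low=0.8, high=1.2):
    """Nearest-neighbor rescale by a random factor in [low, high]."""
    s = float(rng.uniform(low, high))
    h, w = img.shape[:2]
    new_h, new_w = max(1, int(round(h * s))), max(1, int(round(w * s)))
    rows = np.minimum((np.arange(new_h) / s).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / s).astype(int), w - 1)
    return img[rows][:, cols]


def random_crop(img, rng, min_frac=0.7):
    """Random crop whose sides are at least min_frac of the original sides."""
    h, w = img.shape[:2]
    ch = int(rng.integers(int(np.ceil(min_frac * h)), h + 1))
    cw = int(rng.integers(int(np.ceil(min_frac * w)), w + 1))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return img[top:top + ch, left:left + cw]


def mosaic(imgs, tile=64):
    """Stitch four images into a 2x2 grid of tile x tile cells."""
    def fit(img):
        h, w = img.shape[:2]
        return img[np.arange(tile) * h // tile][:, np.arange(tile) * w // tile]
    a, b, c, d = (fit(i) for i in imgs)
    return np.vstack([np.hstack([a, b]), np.hstack([c, d])])
```

Passing a seeded `np.random.default_rng` as `rng` makes the augmentation reproducible.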
In this embodiment, the fish detection module 104 invokes a detection algorithm to detect fish in the image, and calculates a coordinate position of a fish detection frame in the image and a corresponding prediction score, which specifically includes:
The Faster R-CNN detection algorithm extracts multi-scale visual features from the preprocessed image, capturing the morphology, texture and edge information of fish. Feature extraction on the preprocessed composite image relies on a backbone network of convolution and pooling layers. The convolution layers slide several convolution kernels of different sizes (such as 3×3 and 5×5) over the image to generate feature maps containing fish morphology (such as the curves of the fish outline), texture (such as the repeating pattern formed by the scale arrangement) and edge information (such as the abrupt gray-level boundary between fish and background). The pooling layers (max pooling or average pooling) downsample the feature maps, reducing feature dimension and computation while keeping key features and improving translation invariance, so that features are extracted stably wherever the fish appears in the image. The feature pyramid network (FPN) used with Faster R-CNN fuses the feature maps output by different levels of the backbone, for example shallow maps that retain more detailed texture and deep maps that carry more abstract semantic information. Upsampling and lateral connections produce multi-scale feature maps, so the algorithm can capture fish of different sizes: small fish are identified from fine texture in the shallow high-resolution maps, and large fish from the overall morphology in the deep maps, which improves detection across fish sizes.
Anchor frames are generated from the feature map, and their positions are adjusted by regression to determine the coordinates of the fish detection frame. On the fused multi-scale feature map, several anchor frames are generated at each pixel position according to preset anchor sizes (such as 16×16, 32×32 and 64×64, to suit different fish sizes) and aspect ratios (such as 1:1, 1:2 and 2:1, to match common fish shapes), covering the regions of the image where fish may be present. For each anchor frame, a regression model predicts the offsets between the anchor frame and the bounding box of the real fish, namely the horizontal offset $\Delta x$, the vertical offset $\Delta y$, the width scaling factor $\Delta\omega$ and the height scaling factor $\Delta h$. These offsets are used to adjust the position and size of the anchor frame so that the adjusted detection frame lies closer to the boundary of the real fish target:
$x_{new} = x_{anchor} + \Delta x \cdot \omega_{anchor}$
$y_{new} = y_{anchor} + \Delta y \cdot h_{anchor}$
$\omega_{new} = \omega_{anchor} \cdot \exp(\Delta\omega)$
$h_{new} = h_{anchor} \cdot \exp(\Delta h)$
where $(x_{anchor}, y_{anchor})$ is the top-left corner of the anchor frame, $(\omega_{anchor}, h_{anchor})$ are its width and height, $(x_{new}, y_{new})$ is the top-left corner of the adjusted detection frame, and $(\omega_{new}, h_{new})$ are its width and height. This determines the coordinates of the detection frame.
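The four regression equations translate directly into code. The sketch below also enumerates anchor shapes for the sizes and aspect ratios mentioned in the text; keeping the anchor area fixed at size² for every ratio is an assumption, as are the function names.

```python
import math


def anchor_shapes(sizes=(16, 32, 64), ratios=(1.0, 0.5, 2.0)):
    """(width, height) pairs; ratio = width / height, area kept at size**2."""
    return [(s * math.sqrt(r), s / math.sqrt(r)) for s in sizes for r in ratios]


def decode_box(anchor, deltas):
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor (x, y, w, h).

    x and y are the top-left corner, following the definitions in the text.
    """
    x_a, y_a, w_a, h_a = anchor
    dx, dy, dw, dh = deltas
    return (x_a + dx * w_a,
            y_a + dy * h_a,
            w_a * math.exp(dw),
            h_a * math.exp(dh))
```

For an anchor `(10, 20, 32, 64)` and offsets `(0.5, -0.25, 0.0, ln 2)`, the corner moves to (26, 4), the width stays 32 and the height doubles to 128.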
A classifier then calculates the class probabilities of the target in each detection frame and outputs the prediction score and the corresponding fish class label. For each detection frame after regression adjustment, the feature vector of the corresponding feature map region is extracted and input to the classifier (usually a fully connected layer followed by a Softmax function). Using the feature differences learned during pre-training between fish and non-fish and between fish species, the classifier calculates the probability that the target in the detection frame belongs to each class. For a detection frame containing crucian carp, for example, the classifier outputs the probability of the "crucian" class together with the probabilities of other fish classes (such as carp and grass carp) and of the non-fish class. A probability threshold is set (for example 0.5, adjustable according to detection requirements); detection frames whose predicted probability exceeds the threshold are kept, the corresponding class probability is taken as the prediction score, and the corresponding fish class label (such as crucian or carp) is output. Detection frames whose prediction score is below the threshold are treated as false detections or background and filtered out. The result is the coordinate position, prediction score and class label of each fish detection frame in the image, which provides the target region information for the subsequent fish body segmentation and species identification.
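The scoring and thresholding step can be sketched as follows. The threshold 0.5 is from the text; the label list, the name of the background class and the function names are illustrative assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max subtraction for stability."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def filter_detections(logits, labels, threshold=0.5, background="background"):
    """Keep boxes whose best non-background class probability exceeds threshold.

    Returns a list of (box_index, label, score) tuples.
    """
    probs = softmax(np.asarray(logits, dtype=np.float64))
    kept = []
    for i, p in enumerate(probs):
        k = int(np.argmax(p))
        if labels[k] != background and p[k] > threshold:
            kept.append((i, labels[k], float(p[k])))
    return kept
```

Each row of `logits` holds the classifier outputs for one detection frame, in the same order as `labels`.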
In this embodiment, after the rectangular frame position of the fish target is obtained, the main body segmentation module 105 crops the target and passes it to the fish segmentation model for subject segmentation; the segmented picture is free of interfering underwater impurities, specifically:
Based on the rectangular frame coordinates output by the fish detection module, the preprocessed composite image is cropped to extract a region of interest (ROI) containing the fish target; the background outside the rectangular frame is removed, which narrows the subsequent processing range. The rectangular frame coordinates $(x_1, y_1, x_2, y_2)$ output by the fish detection module must first be mapped from the feature map scale of the detection model back to the original pixel scale of the preprocessed image (because of the sampling operations in the detection model, the coordinate correspondence has to be restored through deconvolution or interpolation). To avoid losing the edge of the fish body during cropping, the coordinates are elastically expanded by a formula [not reproduced in the source text] with the following quantities.
$W$ and $H$ are the width and height of the preprocessed image, and $\alpha$ is the elastic expansion coefficient, which controls how far the rectangular frame is expanded beyond its boundary and usually takes a value between 0 and 1. $(x_1, y_1)$ is the top-left vertex of the detected fish rectangular frame, which locates the start of the frame in the image, and $(x_2, y_2)$ is its bottom-right vertex. The elastic expansion yields the adjusted top-left and bottom-right vertex coordinates of the new rectangular frame.
This operation ensures that edge details of the fish body (such as the tail and fins) are fully retained and avoids incomplete segmentation caused by boundary errors of the detection frame.
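One plausible form of the elastic expansion is sketched here. Because the formula is not legible in this text, the specific rule (grow each side by $\alpha$ times the box width or height, then clamp to the image bounds $W$ and $H$) is an assumption inferred from the listed quantities, and the function name is illustrative.

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, alpha=0.1):
    """Grow a box by alpha of its size on every side, clamped to the image."""
    dw = alpha * (x2 - x1)
    dh = alpha * (y2 - y1)
    return (max(0.0, x1 - dw),
            max(0.0, y1 - dh),
            min(float(img_w), x2 + dw),
            min(float(img_h), y2 + dh))
```

A box touching the image border is simply cut off at the border instead of extending outside the image.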
The cropped ROI image is then input into a U-Net-based fish segmentation model, which classifies the pixels of the image through an encoder-decoder structure to generate masks of the fish body and the background. The pre-trained weights of the model come from an underwater biological data set. The encoder-decoder structure specifically comprises:
1. Encoder feature extraction:
A 5-layer downsampling encoder is used, each layer consisting of a convolution (3×3 kernel, stride 2), batch normalization and ReLU activation. The encoder is initialized with weights pre-trained on the underwater biological data set. Layer 1 extracts low-level features such as edges and texture (for example scale texture and the fish body outline), and layer 5 extracts high-level semantic features (for example the overall shape of the fish and class-distinguishing features).
2. Decoder feature recovery:
The decoder gradually restores the image resolution through upsampling convolutions (2×2 kernel, stride 2) while fusing the feature maps of the corresponding encoder levels through skip connections, which compensates for the feature loss that scattering causes in underwater images. For example, fusing layer 1 and layer 5 features restores fish detail while strengthening semantic classification.
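The resolution bookkeeping of this encoder-decoder can be checked with plain arithmetic. The sketch assumes a padding of 1 for the 3×3 stride-2 convolutions and treats the 2×2 stride-2 upsampling as a transposed convolution; neither detail is stated in the text, and the function names are illustrative.

```python
def conv_out(n, kernel=3, stride=2, padding=1):
    """Output size of a strided convolution along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1


def upconv_out(n, kernel=2, stride=2):
    """Output size of a transposed convolution without padding."""
    return (n - 1) * stride + kernel


def trace_unet_sizes(size, depth=5):
    """Feature map sizes through `depth` encoder and decoder layers."""
    enc = [size]
    for _ in range(depth):
        enc.append(conv_out(enc[-1]))
    dec = [enc[-1]]
    for _ in range(depth):
        dec.append(upconv_out(dec[-1]))
    return enc, dec
```

For a 256-pixel input the encoder halves the size five times down to 8, and the decoder doubles it back to 256, so each decoder level matches an encoder level for the skip connection.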
The ROI image is binarized using the segmentation mask, and morphological operations remove small noise regions, so that only the fish body pixels remain and interference from underwater suspended matter and aquatic weeds is eliminated; the output image contains only the fish target and its contour details. In the morphological processing, an opening (erosion followed by dilation) with a 3×3 rectangular structuring element is applied first to remove small-area noise (such as bubbles and aquatic weed fragments; the removal rate for 10-pixel noise exceeds 90%). A closing (dilation followed by erosion) is then applied to fill holes inside the fish body caused by occlusion (such as small gaps at the mouth and gills), making the fish body contour more complete.
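The opening and closing with a 3×3 rectangular structuring element can be sketched in pure NumPy on a boolean mask. The function names and the border padding choices are assumptions of this sketch; an image library would normally provide these operations.

```python
import numpy as np


def _neighbourhoods(mask, pad_value):
    """Stack the nine 3x3-shifted views of a boolean mask."""
    padded = np.pad(mask, 1, constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])


def erode(mask):
    # Pad with True so the image border does not erode objects by itself.
    return _neighbourhoods(mask, True).all(axis=0)


def dilate(mask):
    return _neighbourhoods(mask, False).any(axis=0)


def clean_mask(mask):
    """Opening (erode, dilate) removes specks; closing (dilate, erode) fills holes."""
    opened = dilate(erode(mask))
    return erode(dilate(opened))
```

On a mask with a 7×7 fish blob, a one-pixel hole inside it and a one-pixel speck elsewhere, `clean_mask` removes the speck and fills the hole while leaving the blob outline unchanged.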
In this embodiment, the identification classification module 106 inputs the subject-segmented fish image into the transfer learning identification model to classify and identify the fish species, specifically:
Constructing the transfer learning model: the identification model is built on a pre-trained ResNet deep learning network, with pre-training weights from ImageNet or an underwater fish data set, and the general visual features it has learned serve as the base parameters. ResNet is chosen as the base architecture because its residual connections effectively alleviate the vanishing gradient problem in deep network training. Pre-training on the large general-purpose image data set ImageNet has taught it rich general visual features, such as the edges, outlines and textures of objects. Weights pre-trained on an underwater fish data set can also be used; these are tuned to the underwater environment and fit the fish identification scenario more closely. Because the general visual features in the pre-trained weights serve as base parameters, the model already has feature extraction capability at initialization and does not need to learn basic visual patterns from scratch. This greatly reduces the data volume and computing resources needed for training, speeds up convergence, and provides a foundation for recognizing fish features.
Extracting and adapting image features: the subject-segmented fish image is resized to the input size of the model, high-level semantic features are extracted through the convolution and pooling layers of the model, and batch normalization and dropout regularization are introduced in the feature extraction stage. Because the model requires a fixed input size, the segmented fish image is first resized accordingly so that it can be fed into the model. The convolution layers then extract features layer by layer: the kernels slide over the image and capture features at different levels, from simple edge and color information to complex combinations of fish body shape and texture. The pooling layers downsample the resulting feature maps, reducing feature dimension and computation while keeping key features, so that the model attends more to global features. Dropout regularization randomly switches off some neurons, which keeps the model from relying too heavily on specific features and from overfitting, so that it extracts effective features stably across different underwater environments and fish postures.
Classification and incremental learning: when a new fish species is detected, the model supports incremental learning and adapts to the new species by updating the weights of the classification layer online. After the high-level semantic features are extracted, the fully connected layer of the model integrates and maps them, converting the high-dimensional features into a vector whose dimension equals the number of fish classes. The softmax classifier then converts the fully connected output into a probability distribution over the fish classes; the larger the probability, the more likely the fish in the image belongs to that class. The class with the highest probability is selected, and the corresponding fish class label, such as "crucian" or "carp", is output together with a confidence score reflecting the reliability of the result. When a new fish species is encountered, the whole model does not need to be retrained: only the weights of the classification layer are updated online so that the model learns the feature pattern of the new species. The model can therefore keep extending its recognition capability and adapt to changes in the fish species present.
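The idea of updating only the classification layer can be illustrated with a softmax classifier on frozen features. The text does not specify the update rule, so the zero-initialized new class row, the plain gradient descent on the cross-entropy loss, the learning rate and the function names are all assumptions of this sketch.

```python
import numpy as np


def add_class(weights):
    """Append a zero-initialized row for a new class (classes x features)."""
    return np.vstack([weights, np.zeros((1, weights.shape[1]))])


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def update_classifier(weights, feature, label, lr=0.5, steps=50):
    """Gradient steps on the classification layer only.

    `feature` is the frozen backbone output for one labeled sample;
    the backbone itself is not touched.
    """
    w = weights.copy()
    target = np.zeros(w.shape[0])
    target[label] = 1.0
    for _ in range(steps):
        probs = softmax(w @ feature)
        # Gradient of the cross-entropy loss with respect to the weights.
        w -= lr * np.outer(probs - target, feature)
    return w
```

In practice many labeled samples of old and new species would be mixed in each update so that the existing classes are not forgotten.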
As shown in fig. 2, the invention further provides a method for intelligently investigating the types and the amounts of fish, which comprises the following steps:
S1, extracting targets by a sonar image processing algorithm, tracking and counting multiple targets using a nearest neighbor algorithm combined with an extended Kalman filter algorithm, computing the volume density or areal density over the water volume or area swept by the sonar, and estimating the number of fish in combination with the water area;
S2, acquiring water depth and turbidity data in real time with a depth camera while synchronously time-stamping the image sequence, and capturing fish school images covering multiple postures, including but not limited to side views and bent body postures, to form a three-dimensional image model of the fish school;
S3, associating the acoustic signals output by sonar detection with the three-dimensional image model of the fish school to construct an acoustic plus optical high-dimensional image data set of the fish school, and expanding the data set by data augmentation including but not limited to rotation, scaling and cropping;
S4, inputting the preprocessed high-dimensional image data set into a fish detection model, calling a detection algorithm to detect fish in the image, and calculating the coordinate position of each fish detection frame in the image and the corresponding prediction score;
S5, after the rectangular frame position of the fish target is obtained, cropping the target and passing it to a fish segmentation model for subject segmentation, the segmented picture being free of interfering underwater impurities so that the image contains only the fish target to be identified;
S6, inputting the fish images after the main body segmentation into a transfer learning identification model, and classifying and identifying the fish types.
According to the intelligent fish investigation device and method provided by the invention, through multi-module cooperation and technology fusion, accurate investigation of the types and the quantity of fish is realized, and the working principle is as follows:
The device integrates low, medium and high frequency transducers (30 kHz, 200 kHz and 400 kHz) and switches frequency automatically according to the depth of the fish school: the low band is used in deep water to guarantee detection range, the medium band is used at intermediate depths to balance range and resolution, and the high band is used in shallow water to capture fish body detail. The sonar scans the water area with a fan-shaped beam, extracts fish school targets through an image processing algorithm, counts the targets with a tracking algorithm, calculates the density from the volume or area of the scanned region, and from this estimates the total number of fish in the water area.
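The depth-dependent band switching can be sketched as a simple rule. The three frequencies come from the text; the depth limits of 20 m and 100 m are hypothetical placeholders, since the text gives no switching thresholds, and the function name is illustrative.

```python
def select_frequency_khz(depth_m, shallow_limit_m=20.0, deep_limit_m=100.0):
    """Pick the transducer band for a fish school at the given depth.

    The two depth limits are hypothetical and would be set from field tests.
    """
    if depth_m < shallow_limit_m:
        return 400  # high band: shallow water, fine fish body detail
    if depth_m < deep_limit_m:
        return 200  # medium band: balance of range and resolution
    return 30       # low band: deep water, longest detection range
```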
A depth camera acquires fish school images in real time and synchronously records data such as water depth and turbidity. Depth information is obtained by time-of-flight ranging and combined with multi-frame image sequences to construct a dynamic three-dimensional model of the fish school, covering postures such as side views and bent bodies. During shooting, the fill light adjusts its brightness automatically according to the water depth. Together with the light attenuation compensation algorithm, which combines histogram equalization, an image enhancement model and feature fusion, this improves the clarity of underwater images and addresses uneven illumination.
The acoustic signals detected by the sonar are associated with the optical images acquired by the camera: the sonar signals are denoised and time-synchronized with the images, the depth data of the two are calibrated against each other, and acoustic features (time-frequency and spatial) and optical features (geometric and texture) are extracted and fused to construct a high-dimensional data set. The data are also expanded by rotating, scaling and cropping images and by stitching several pictures together, which simulates different fish postures and multi-school scenes and increases data diversity.
A target detection algorithm extracts multi-scale features from the preprocessed image, generates candidate frames and adjusts their positions, and determines the coordinates and class probabilities of the fish in the image. Based on the detection result, the region of interest is cropped and input to a segmentation model for pixel-level classification, which removes underwater suspended matter, aquatic weeds and other impurities, keeps only the fish body, and provides a clean image for subsequent identification.
An identification model is built on a pre-trained deep learning network, whose learned general visual features are used to extract high-level semantic features of fish, and regularization is applied to prevent overfitting. A classifier generates the probability distribution over fish classes and outputs the class with the highest probability together with its confidence. When a new fish species is encountered, the model supports online parameter updates and learns the features of the new class adaptively, so that its recognition capability keeps expanding.
Through the complete flow of sonar-based quantity estimation, image acquisition and model construction, data fusion and augmentation, target detection and localization, subject segmentation and denoising, and transfer learning classification, the method automates the analysis from fish counting to species identification. The cooperation of multi-band sonar and depth camera hardware, the cross-modal processing of acoustic and optical data, and the intelligent model with incremental learning capability address the poor depth adaptability, low image quality and difficulty in identifying new species found in traditional surveys, and provide an efficient and accurate technical solution for fishery resource management and ecological monitoring.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (9)

1.一种智能化调查鱼类种类和数量的装置,其特征在于,包括:1. An intelligent device for surveying fish species and quantities, characterized in that it comprises: 自适应声呐探测模块,集成30kHz、200kHz、400kHz三频段换能器,根据鱼群深度动态切换频率,生成回声强度特征向量;通过声呐图像处理算法提取目标,用最近邻算法结合扩展卡尔曼滤波算法对多目标跟踪计数,统计声呐扫过水域体积的体密度或面积的面密度,结合水域面积估计鱼类数量;The adaptive sonar detection module integrates three-band transducers of 30kHz, 200kHz, and 400kHz, dynamically switches frequencies according to the depth of the fish school, and generates echo intensity feature vectors. It extracts targets through sonar image processing algorithms, uses the nearest neighbor algorithm combined with the extended Kalman filter algorithm to track and count multiple targets, and counts the volume density or surface density of the water area swept by the sonar, and estimates the number of fish by combining the water area. 三维图像采集模块,采用深度相机实时采集水深、浊度数据并同步标记图像时序,并抓拍鱼群图像,覆盖鱼体侧面、弯曲姿态的多形态场景,形成鱼群三维图像模型;同时,抓拍鱼群时采用补光灯依据水深动态调节光照强度,通过光衰减补偿算法适配水下光环境;The 3D image acquisition module uses a depth camera to collect water depth and turbidity data in real time and simultaneously mark the image time sequence, and capture images of fish schools, covering multiple morphological scenes such as the side of the fish body and bending posture, to form a 3D image model of the fish school; at the same time, when capturing fish schools, the supplementary light is used to dynamically adjust the light intensity according to the water depth, and the light attenuation compensation algorithm is used to adapt to the underwater light environment. 图像预处理模块,将所述鱼群三维图像模型关联声呐探测输出的声学信号,构建声学加光学的鱼群高维图像数据集,通过旋转、缩放、裁剪的数据增强‌方式扩充数据集;The image preprocessing module associates the three-dimensional image model of the fish school with the acoustic signals output by sonar detection, constructs a high-dimensional image dataset of the fish school with acoustic and optical capabilities, and expands the dataset through data augmentation methods such as rotation, scaling, and cropping. 
a fish detection module, configured to input the preprocessed high-dimensional image dataset into a fish detection model and invoke a detection algorithm to detect fish in the images, computing the coordinates of each fish detection box in the image and the corresponding prediction score;
a subject segmentation module, which, after the position of the fish target bounding box is obtained, crops the target and passes it to a fish segmentation model for subject segmentation, so that the segmented image is free of interference from underwater impurities and contains only the fish target to be recognized;
a recognition and classification module, which inputs the segmented fish image into a transfer-learning recognition model to classify and identify the fish species;
wherein adapting to the underwater light environment through the light attenuation compensation algorithm specifically comprises:
performing illumination equalization on the captured fish images, using an adaptive histogram equalization algorithm to adjust image contrast according to the light intensity distribution after light attenuation compensation;
introducing an underwater light attenuation compensation network, in which a light attenuation compensation model is built on the U-Net architecture, taking the original image and the light attenuation coefficient as input and outputting the light-attenuation-compensated image.
The formula is as follows: [formula not preserved in the source text];
the light attenuation compensation enhancement value is computed and, combining the image features from before and after compensation, a feature fusion layer fuses the compensated image features with the original image features;
where the terms of the formula (whose symbols are not preserved in the source text) denote, respectively: the light attenuation compensation enhancement value at a pixel position on a channel; the light intensity value at that pixel position on that channel in the original image; the local light attenuation estimation factor at the pixel position; the color shift correction factor at the pixel position on the channel; the compensation perturbation factor at the pixel position on the channel; the estimated attenuation value at the pixel position on the channel; and the local mean of the estimated attenuation values over the neighborhood centered on the pixel position on the channel.
2. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that, when the depth camera collects water depth and turbidity data in real time, synchronously time-stamps the images, and captures fish school images to form the three-dimensional image model of the fish school, the following steps are used:
acquiring depth information of the fish school using time-of-flight (TOF) technology and fusing it with the pixel information of the captured fish school images to generate fish school image data with a depth channel;
based on a multi-frame image sequence, using a structure-from-motion (SfM) algorithm together with the image time stamps to build a dynamic three-dimensional image model of the fish school over a period of time, showing the posture changes of the fish school as it swims;
adjusting the field-of-view and resolution parameters of the depth camera to ensure that depth data of the fish school is collected at no less than a given point-cloud density across different underwater distance ranges.
3. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that associating the three-dimensional image model of the fish school with the acoustic signals output by sonar detection to construct the combined acoustic-optical high-dimensional image dataset of the fish school specifically comprises:
applying band-pass filtering separately to the sonar echo signals of the 30 kHz, 200 kHz, and 400 kHz bands to remove environmental noise and frequency-aliasing interference;
synchronizing the time stamps of the sonar signals and the optical images using a GPS timing module, so that data from the same instant are aligned;
calibrating the sonar detection depth against the optical acquisition depth using an extended Kalman filter;
extracting the time-frequency and spatial features of the sonar signals, and the geometric and texture features of the three-dimensional images;
performing weighted fusion of the extracted acoustic and optical features, and reducing the dimensionality of the fused high-dimensional features;
hierarchically labeling the fused dataset with behavior patterns, fish species attributes, and environmental parameters, and applying cross-modal consistency checks, sample balancing, and incremental-learning validation to the dataset.
4. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that expanding the dataset by data augmentation through rotation, scaling, and cropping specifically comprises:
rotating the segmented fish images by an angle in the range of -15° to 15° to simulate different swimming postures of fish underwater;
scaling the images by a factor between 0.8 and 1.2 to account for changes in apparent fish size at different shooting distances;
randomly cropping, from the original image, a region that contains the complete fish body, the crop being no smaller than 70% of the original image;
introducing Mosaic data augmentation, which randomly stitches several different fish images together to simulate scenes with multiple fish schools.
5. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that the adaptive sonar detection module computing the volume density over the water volume swept by the sonar, or the areal density over the swept area, and estimating the number of fish in combination with the area of the water body specifically comprises:
the sonar scans the water body with a fan-shaped beam; from the scan angle, the maximum detection range, and the depth, the scanned volume and the horizontal scanned area are obtained [symbols and expressions not preserved in the source text];
the number of tracked and counted targets is divided by the scanned volume or the scanned area to obtain the volume density or the areal density;
when the total area of the water body is known, the total number is computed [expression not preserved in the source text], or is estimated after computing the total volume from the average depth.
6. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that the fish detection module invoking the detection algorithm to detect fish in the images and computing the coordinates of each fish detection box and the corresponding prediction score specifically comprises:
extracting multi-scale visual features from the preprocessed images with the Faster R-CNN detection algorithm to capture the shape, texture, and edge information of the fish;
generating anchor boxes from the feature maps and adjusting their positions by regression to determine the coordinates of the fish detection boxes;
computing class probabilities for the target in each detection box with a classifier, and outputting the prediction score and the corresponding fish category label.
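The density-based estimate of claim 5 can be sketched as follows. The claim's own expressions are not preserved in this text, so the geometry here is an assumption: the horizontal swept area is taken to be a circular sector, S = ½·θ·R² with θ in radians, and the swept volume is taken to be V = S·h for a uniform depth h. All function and parameter names are hypothetical.

```python
import math


def sector_area(scan_angle_deg: float, max_range_m: float) -> float:
    """Horizontal swept area, ASSUMED to be a circular sector: 0.5 * theta * R^2."""
    if scan_angle_deg <= 0 or max_range_m <= 0:
        raise ValueError("scan angle and range must be positive")
    theta = math.radians(scan_angle_deg)
    return 0.5 * theta * max_range_m ** 2


def estimate_total_fish(tracked_count: int, scan_angle_deg: float,
                        max_range_m: float, depth_m: float,
                        total_area_m2: float) -> dict:
    """Scale a tracked target count up to the whole water body via areal density."""
    if depth_m <= 0 or total_area_m2 <= 0:
        raise ValueError("depth and total area must be positive")
    area = sector_area(scan_angle_deg, max_range_m)
    volume = area * depth_m                     # assumption: uniform depth
    areal_density = tracked_count / area        # fish per square metre
    volume_density = tracked_count / volume     # fish per cubic metre
    return {
        "swept_area_m2": area,
        "swept_volume_m3": volume,
        "areal_density": areal_density,
        "volume_density": volume_density,
        "total_fish": areal_density * total_area_m2,
    }


if __name__ == "__main__":
    # 10 tracked targets in a 90-degree, 10 m sweep, scaled to a 1000 m^2 pond.
    print(estimate_total_fish(10, 90.0, 10.0, 5.0, 1000.0))
```

Under these assumptions, 10 tracked targets in a 90° sweep of 10 m range scale to roughly 127 fish for a 1000 m² water body. A real survey would also need to correct for fish that are counted more than once or missed entirely, which this sketch ignores.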
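The anchor-box regression step of claim 6 can be illustrated with the box parameterization used in the original Faster R-CNN formulation, where the network predicts center offsets scaled by the anchor size and log-scale width and height ratios. The claim does not state which parameterization the detection model uses, so this is a sketch under that assumption; `decode_box` is a hypothetical helper name.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def decode_box(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Apply regression deltas (tx, ty, tw, th) to an anchor box.

    tx, ty shift the anchor center in units of anchor width and height;
    tw, th rescale the anchor width and height on a log scale.
    """
    ax1, ay1, ax2, ay2 = anchor
    anchor_w, anchor_h = ax2 - ax1, ay2 - ay1
    anchor_cx, anchor_cy = ax1 + 0.5 * anchor_w, ay1 + 0.5 * anchor_h

    tx, ty, tw, th = deltas
    cx = anchor_cx + tx * anchor_w
    cy = anchor_cy + ty * anchor_h
    w = anchor_w * math.exp(tw)
    h = anchor_h * math.exp(th)

    # Convert back from center/size form to corner form.
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


if __name__ == "__main__":
    # Zero deltas leave the anchor unchanged.
    print(decode_box((0.0, 0.0, 10.0, 20.0), (0.0, 0.0, 0.0, 0.0)))
```

The log-scale size terms keep the decoded width and height positive for any predicted value, which is one reason this form is commonly used.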
7. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that the subject segmentation module, after the position of the fish target bounding box is obtained, cropping the target and passing it to the fish segmentation model for subject segmentation, so that the segmented image is free of interference from underwater impurities, specifically comprises:
cropping the preprocessed composite image according to the bounding-box coordinates output by the fish detection module to extract the region of interest (ROI) containing the fish target, discarding the background outside the bounding box and narrowing the scope of subsequent processing;
inputting the cropped ROI image into a U-Net-based fish segmentation model, which classifies the pixels of the image through an encoder-decoder structure and generates masks for the fish subject and the background, the model's pre-trained weights coming from an underwater organism dataset;
binarizing the ROI image according to the segmentation mask and removing small noise regions with morphological operations, so that the fish subject pixels are retained and interference from suspended matter and aquatic plants is removed, and the output image contains only the fish target and its contour details.
8. The intelligent device for surveying fish species and quantities according to claim 1, characterized in that the recognition and classification module inputting the segmented fish image into the transfer-learning recognition model to classify and identify the fish species specifically comprises:
transfer-learning model construction: building the transfer-learning recognition model on the pre-trained deep learning network ResNet, with pre-trained weights from ImageNet or an underwater fish dataset, and using the general visual features it has already learned as base parameters;
image feature extraction and adaptation: resizing the segmented fish image to the model's input size, extracting high-level semantic features through the model's convolutional and pooling layers, and introducing batch normalization and dropout regularization in the feature extraction stage;
classification and result output: mapping the extracted features through the model's fully connected layer and softmax classifier to generate a probability distribution over fish categories, and outputting the label of the most probable category together with its confidence score; when a new fish species is detected, the model supports incremental learning, adapting to the new category by updating the classification-layer weights online.
9. A method for intelligently surveying fish species and quantities, characterized by comprising the following steps:
using integrated three-band transducers at 30 kHz, 200 kHz, and 400 kHz, dynamically switching frequency according to the depth of the fish school and generating an echo-intensity feature vector; extracting targets with a sonar image processing algorithm; tracking and counting multiple targets using a nearest-neighbor algorithm combined with an extended Kalman filter; computing the volume density over the water volume swept by the sonar, or the areal density over the swept area; and estimating the number of fish in combination with the area of the water body;
using a depth camera to collect water depth and turbidity data in real time while synchronously time-stamping the images, and capturing images of fish schools covering multiple morphological scenes, including side views of the fish body and bent postures, to form a three-dimensional image model of the fish school; while capturing, using a fill light to dynamically adjust illumination intensity according to water depth, and adapting to the underwater light environment through a light attenuation compensation algorithm;
associating the three-dimensional image model of the fish school with the acoustic signals output by sonar detection to construct a combined acoustic-optical high-dimensional image dataset of the fish school, and expanding the dataset by data augmentation through rotation, scaling, and cropping;
inputting the preprocessed high-dimensional image dataset into a fish detection model and invoking a detection algorithm to detect fish in the images, computing the coordinates of each fish detection box in the image and the corresponding prediction score;
after the position of the fish target bounding box is obtained, cropping the target and passing it to a fish segmentation model for subject segmentation, so that the segmented image is free of interference from underwater impurities and contains only the fish target to be recognized;
inputting the segmented fish image into a transfer-learning recognition model to classify and identify the fish species;
wherein adapting to the underwater light environment through the light attenuation compensation algorithm specifically comprises:
performing illumination equalization on the captured fish images, using an adaptive histogram equalization algorithm to adjust image contrast according to the light intensity distribution after light attenuation compensation;
introducing an underwater light attenuation compensation network, in which a light attenuation compensation model is built on the U-Net architecture, taking the original image and the light attenuation coefficient as input and outputting the light-attenuation-compensated image, using the formula: [formula not preserved in the source text];
computing the light attenuation compensation enhancement value and, combining the image features from before and after compensation, fusing the compensated image features with the original image features through a feature fusion layer;
where the terms of the formula (whose symbols are not preserved in the source text) denote, respectively: the light attenuation compensation enhancement value at a pixel position on a channel; the light intensity value at that pixel position on that channel in the original image; the local light attenuation estimation factor at the pixel position; the color shift correction factor at the pixel position on the channel; the compensation perturbation factor at the pixel position on the channel; the estimated attenuation value at the pixel position on the channel; and the local mean of the estimated attenuation values over the neighborhood centered on the pixel position on the channel.
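The multi-target tracking and counting step recited in claims 1 and 9 can be sketched as below. This is a deliberately simplified illustration, not the claimed algorithm: a plain constant-velocity prediction stands in for the extended Kalman filter, association is greedy nearest-neighbor with a distance gate, and tracks are never retired. The names `Track`, `count_targets`, and `gate` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Track:
    track_id: int
    position: Point
    velocity: Point = (0.0, 0.0)

    def predict(self) -> Point:
        """Constant-velocity prediction (a stand-in for the EKF predict step)."""
        return (self.position[0] + self.velocity[0],
                self.position[1] + self.velocity[1])

    def update(self, measurement: Point) -> None:
        """Adopt the matched detection and re-estimate velocity from the step."""
        self.velocity = (measurement[0] - self.position[0],
                         measurement[1] - self.position[1])
        self.position = measurement


def count_targets(frames: List[List[Point]], gate: float) -> int:
    """Count distinct targets across frames of sonar detections.

    Each existing track claims the nearest unclaimed detection if it lies
    within `gate` of the track's predicted position; every detection left
    over starts a new track. The count is the number of tracks ever created.
    """
    tracks: List[Track] = []
    next_id = 0
    for detections in frames:
        unmatched = list(detections)
        for track in tracks:
            if not unmatched:
                break
            predicted = track.predict()
            nearest = min(unmatched, key=lambda p: math.dist(p, predicted))
            if math.dist(nearest, predicted) <= gate:
                track.update(nearest)
                unmatched.remove(nearest)
        for point in unmatched:
            tracks.append(Track(next_id, point))
            next_id += 1
    return next_id


if __name__ == "__main__":
    frames = [
        [(0.0, 0.0), (10.0, 10.0)],
        [(1.0, 0.0), (10.0, 11.0)],
        [(2.0, 0.0), (10.0, 12.0), (50.0, 50.0)],
    ]
    print(count_targets(frames, gate=3.0))  # two fish tracked, one new arrival
```

The gate keeps a far-away detection from being attached to an existing track, so a fish entering the beam is counted as new rather than as a jump of an already-tracked one. A full implementation would replace `predict` and `update` with EKF steps that carry a state covariance, and would retire tracks that go unmatched for several frames.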
CN202511111501.1A 2025-08-08 2025-08-08 Device and method for intelligently investigating fish types and amounts Active CN121074619B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202511111501.1A CN121074619B (en) 2025-08-08 2025-08-08 Device and method for intelligently investigating fish types and amounts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202511111501.1A CN121074619B (en) 2025-08-08 2025-08-08 Device and method for intelligently investigating fish types and amounts

Publications (2)

Publication Number Publication Date
CN121074619A CN121074619A (en) 2025-12-05
CN121074619B CN121074619B (en) 2026-02-10

Family

ID=97842750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202511111501.1A Active CN121074619B (en) 2025-08-08 2025-08-08 Device and method for intelligently investigating fish types and amounts

Country Status (1)

Country Link
CN (1) CN121074619B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106707287A (en) * 2016-12-23 2017-05-24 浙江大学 Fish school quantity estimation method based on extended Kalman filtering combined with nearest neighbor clustering algorithm
CN119445348A (en) * 2024-10-15 2025-02-14 绍兴市曹娥江大闸投资开发有限公司 An improved YOLOv8 fish image recognition method based on transfer learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120219263B (en) * 2025-05-28 2025-09-23 上海东欣软件工程有限公司 Underwater image enhancement method for complex environment

Also Published As

Publication number Publication date
CN121074619A (en) 2025-12-05

Similar Documents

Publication Publication Date Title
CN117079117B (en) Underwater image processing and target identification method and device, storage medium and electronic equipment
CN112598713A (en) Offshore submarine fish detection and tracking statistical method based on deep learning
CN114677554A (en) A statistical filtering infrared small target detection and tracking method based on YOLOv5 and Deepsort
CN110246151B (en) A method for underwater robot target tracking based on deep learning and monocular vision
CN117058232A (en) An improved YOLOv8 model for position detection of fish target individuals in farmed fish schools
CN115393734B (en) SAR image ship contour extraction method based on Faster R-CNN and CV model combination method
WO2019071976A1 (en) Panoramic image saliency detection method based on regional growth and eye movement model
CN103902972A (en) Water surface moving platform visual system image analyzing and processing method
CN118397074B (en) Fish target length detection method based on binocular vision
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN112102197A (en) Underwater target detection system and method for assisting diver
CN115294322A (en) Underwater ship bottom suspicious target detection method and device, electronic equipment and readable medium
CN116129320A (en) Target detection method, system and equipment based on video SAR
CN113177960A (en) ROI monitoring video extraction platform with edge supporting background modeling
CN114037737B (en) Neural network-based offshore submarine fish detection and tracking statistical method
CN117372680A (en) Target detection method based on fusion of binocular camera and laser radar
CN116819540B (en) Method for intelligently calculating type and depth of fishing group
CN116343078A (en) Target tracking method, system and equipment based on video SAR
CN119559490B (en) A real-time underwater disease identification method for East Star Grouper based on PLDNET network
CN121074619B (en) Device and method for intelligently investigating fish types and amounts
CN114140698A (en) Water system information extraction algorithm based on FasterR-CNN
CN118941781A (en) A dim target detection method based on deep learning
CN113284135B (en) SAR ship detection method based on global and local context information
CN115205668B (en) A Deep Learning-Based Method and System for Detecting Web Attachments
CN121236797B (en) A method and system for evaluating feeding amount during the growth stages of large yellow croaker based on a large model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant