CN118397074B - Fish target length detection method based on binocular vision - Google Patents

Fish target length detection method based on binocular vision

Info

Publication number
CN118397074B
Authority
CN
China
Prior art keywords
fish
image
key points
curve
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410677436.8A
Other languages
Chinese (zh)
Other versions
CN118397074A (en)
Inventor
付民
翟桂星
孙梦楠
郑冰
俞智斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanya Institute Of Oceanography Ocean University Of China
Original Assignee
Sanya Institute Of Oceanography Ocean University Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanya Institute Of Oceanography Ocean University Of China
Priority to CN202410677436.8A
Publication of CN118397074A
Application granted
Publication of CN118397074B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85 Stereo camera calibration
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a fish target length detection method based on binocular vision, belonging to the technical field of image processing in computer vision. First, the binocular camera is calibrated underwater. Second, a diffusion model is introduced for underwater image enhancement, and a wavelet transform is added to further improve the enhancement of underwater pictures. Third, YOLOv8 is improved to strengthen the recognition and detection of fish targets. Fourth, key points on the fish body are detected and located so that the posture and shape of the fish target, namely its body posture and orientation, are described accurately. Finally, three-dimensional reprojection is performed using the intrinsic and extrinsic parameters of the binocular camera to convert the key point coordinates; a pose curve of the fish is fitted to the reprojected key point coordinates with a curve fitting algorithm; and the fitted pose curve is integrated to obtain its length, which is the length of the fish. The invention provides a new non-invasive method for fish research.

Description

Fish target length detection method based on binocular vision
Technical Field
The invention belongs to the technical field of image processing in computer vision, and particularly relates to a fish target length detection method based on binocular vision.
Background
With growing global interest in fish resources, an important challenge for the fish farming industry is how to manage and cultivate large quantities of fish efficiently and accurately. The growth rate and body size of fish directly affect the profitability and output value of farming, so fish length is a key indicator for assessing and monitoring fish health and must be measured accurately. Individualized intelligent fish farming assisted by binocular stereo vision has been a research focus in aquaculture in recent years. Its core idea is to measure fish length without contact using a binocular camera, enabling more efficient and sustainable production. This gives the farming industry a new approach that helps improve farming efficiency, reduce resource consumption and promote sustainable development. A fish length measurement method based on binocular vision can monitor the growth of fish shoals through an underwater intelligent vision system, yielding accurate fish length information and facilitating fish farm management.
Although there have been many attempts at fish length detection, existing techniques still face problems and challenges. The prior art, and the technical problems that the present solution may address, are analyzed below.
Traditional measurement methods: fish usually have to be caught by hand to be measured. This increases the workload of fishermen, injures the fish and cannot account for differences between individual fish, which leads to wasted resources and uneven growth. It may also introduce large errors into the overall estimate.
Machine vision based methods: measuring fish body length with machine vision relies mainly on an image acquisition device, a fish body contour extraction algorithm and a length measurement method. For example, the fish body contour is extracted using a minimum bounding rectangle, the Hough transform and similar techniques. However, these methods generally process only two-dimensional images and may not capture the three-dimensional information of the fish body accurately. In addition, image quality depends on factors such as lighting conditions, water quality and fish pose, all of which affect measurement accuracy.
Deep learning based methods: by creating a Fish-Keypoints dataset and using deep learning to detect fish and their key points, more accurate size measurements are achieved. However, although deep learning improves detection accuracy, training requires a large amount of annotated data, and the generalization of the model and its ability to adapt to different environmental conditions remain a challenge.
Imaging sonar based methods: when imaging sonar is used to measure fish length, the sonar measurement shows a good linear relationship with fork length and the error is small. However, although imaging sonar provides a non-contact measurement, its resolution and measurement range are limited, and it may not suit all types and sizes of fish.
Disclosure of Invention
To address these problems, the invention provides a binocular vision-based fish target length detection method comprising the following steps:
Step 1, acquiring image information of the underwater target fish body with a binocular camera, and performing stereo calibration to obtain the intrinsic and extrinsic parameters of the camera;
Step 2, enhancing the images obtained in Step 1 with a diffusion model based on the wavelet transform;
Step 3, inputting the enhanced image data into an improved YOLOv8 model to identify fish targets;
Step 4, detecting key points of the fish body with a TransPose model, based on the fish targets identified in Step 3;
Step 5, based on the key points detected in Step 4, performing three-dimensional reprojection using the intrinsic and extrinsic parameters of the binocular camera to convert the key point coordinates; fitting a pose curve of the fish to the reprojected key point coordinates with a curve fitting algorithm; and integrating the fitted pose curve to calculate its length, which is the length of the fish.
Preferably, the stereo calibration in Step 1 is performed using the Zhang Zhengyou calibration method and a GP290 checkerboard calibration plate.
Preferably, Step 1 further includes a stereo rectification process for aligning the images of the left and right cameras: using a rectification method based on the disparity map, the disparity between the left and right camera images is calculated and the intrinsic and extrinsic parameters of the cameras are adjusted accordingly, so that the images are aligned.
Preferably, the specific process of Step 2 is as follows:
S21, first decomposing the fish picture obtained in Step 1 using the one-dimensional discrete Haar wavelet transform (1D-DWT) [formula not reproduced in the source text], which yields the low-frequency information and the high-frequency information of the original picture;
S22, enhancing the low-frequency picture with a diffusion model, which is divided into a forward diffusion part and a reverse denoising part. Forward diffusion uses a fixed variance schedule $\{\beta_1, \dots, \beta_T\}$ to convert the input $x_0$ gradually, over $T$ steps, into corrupted noise data $x_T$. The process is:
$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)$$
where $x_t$ and $\beta_t$ are the corrupted noise data and the predefined variance at time step $t$, $\mathcal{N}$ denotes a Gaussian distribution, $x_0$ is the original data, and $q(x_t \mid x_{t-1})$ is the transition probability from step $t-1$ to step $t$.
The reverse denoising process is:
$$p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t)$$
where $p_\theta(x_{t-1} \mid x_t)$ is the probability of $x_{t-1}$ occurring given that $x_t$ has occurred;
S23, enhancing the high-frequency information with a cross-attention mechanism [formula not reproduced in the source text], where LH (low-high) denotes the horizontal high-frequency component, which contains edge and detail information in the horizontal direction of the image; HL (high-low) denotes the vertical high-frequency component, which contains edge and detail information in the vertical direction of the image; and HH (high-high) denotes the high-frequency component containing detail that changes rapidly in both the horizontal and vertical directions. Conv denotes convolution, and the three intermediate terms of the formula are the results of convolving LH, HL and HH respectively.
Preferably, the improved YOLOv8 model in Step 3 is specifically as follows: DCNv3 modules are introduced into the neck network of the YOLOv8 model to replace the C2f modules in the PAN module, and in the backbone network the original C2f modules are replaced with C2f_ScConv modules;
the DCNv3 module learns its sampling offsets flexibly, autonomously learning offsets suited to long-range or short-range dependencies from the given data; it also adapts the convolution operation to the input data, conventionally using a 3×3 convolution window;
the ScConv sub-module in the C2f_ScConv module consists of two units, a spatial reconstruction unit (SRU) and a channel reconstruction unit (CRU). The SRU applies an independent separate-and-reconstruct operation to suppress spatial redundancy. The CRU replaces the standard convolution and reduces channel redundancy with a split-transform-fuse strategy, implemented through three operators: split, transform and fuse.
Preferably, in Step 4 key point detection is performed with a TransPose model, which is constructed as follows:
S41, acquiring a large data set of underwater fish images with the binocular camera, and dividing it into a training set and a test set;
S42, labeling the fish key points in the images with labelme, the key points comprising the head, the eye, the root of the front end of the dorsal fin, the root of the rear end of the dorsal fin, the pectoral fin root, the anal fin root and the caudal fin root;
S43, constructing the TransPose model, which comprises three parts: a CNN backbone network, a group of encoders and a group of heat map output heads. The CNN backbone performs feature extraction and downsampling. A cropping layer resizes the feature map to fit the input shape of the encoder. The resized feature map is treated as a sequence and input to N cascaded encoders. Each encoder uses a self-attention mechanism to establish dependencies between features, and encoding the sequence repeatedly extracts progressively higher-level feature representations. The output of each encoder is an encoded feature sequence. In the heat map output head group, the output of the last encoder is used as the input from which the key point heat maps are generated;
S44, performing end-to-end supervised learning on the constructed TransPose model using the labeled training set and test set. The loss function is the mean squared error (MSE) of the heat maps; by optimizing it, the model adjusts its key point predictions toward the true positions. Finally, the TransPose model determines the final coordinates of each key point by taking the maximum activation of the corresponding heat map.
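The heat map loss and the maximum-activation decoding in S44 can be illustrated with a minimal sketch in plain Python. It is not the TransPose implementation: the Gaussian target shape, the value of `sigma` and all function names are assumptions made for illustration, since the text states only that the loss is MSE and that the maximum activation gives the coordinates.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Build a target heat map with a Gaussian peak at (cx, cy).

    The Gaussian shape and sigma are illustrative assumptions.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def heatmap_mse(predicted, target):
    """Mean squared error over all pixels of two equally sized heat maps."""
    squared = [
        (p - t) ** 2
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(squared) / len(squared)


def heatmap_to_keypoint(heatmap):
    """Return (x, y, score) of the strongest activation in one heat map."""
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


# One heat map per key point; here a single labeled key point at (5, 7).
target = gaussian_heatmap(width=16, height=12, cx=5, cy=7)
x, y, score = heatmap_to_keypoint(target)
print(x, y, score)  # 5 7 1.0
```

In practice the decoded coordinates are in heat map pixels and still have to be scaled back to the resolution of the cropped fish image before stereo matching.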
Preferably, in Step 5 the three-dimensional reprojection using the intrinsic and extrinsic parameters of the binocular camera, and the coordinate conversion of the key points, are specifically as follows:
The optical center of the left camera is taken as the origin of the world coordinate system, with the X and Y axes aligned with the imaging plane of the left camera and the Z axis pointing in the direction the camera faces. The spatial coordinates $(X, Y, Z)$ of each key point are calculated by similar triangles. The pixel coordinates of a key point in the left and right cameras are denoted $(x_l, y_l)$ and $(x_r, y_r)$ respectively; the underwater focal length $f$ is 1.33 times its value in air. The calculation is:
$$Z = \frac{fB}{d}, \qquad X = \frac{x_l Z}{f}, \qquad Y = \frac{y_l Z}{f}, \qquad d = x_l - x_r$$
where $B$ is the baseline length, i.e. the distance between the left and right cameras, and $d$ is a length in the imaging plane (the disparity between the two views).
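The similar-triangle computation can be sketched as below, assuming rectified images so that a matched key point differs only in its horizontal pixel coordinate. The function names, the `principal_point` parameter and the numbers in the usage lines are hypothetical; only the factor 1.33 and the roles of the focal length and the baseline come from the text.

```python
def underwater_focal_length(focal_air_px, refraction_factor=1.33):
    """Scale the in-air focal length by the factor stated for underwater use."""
    return focal_air_px * refraction_factor


def triangulate(left_px, right_px, focal_px, baseline, principal_point=(0.0, 0.0)):
    """Recover (X, Y, Z) in the left camera frame from one matched key point.

    left_px, right_px: (x, y) pixel coordinates in the rectified images.
    principal_point:   assumed image center; the default treats the pixel
                       coordinates as already measured from that center.
    The result is in the same unit as `baseline`.
    """
    (x_left, y_left), (x_right, _) = left_px, right_px
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    cx, cy = principal_point
    z = focal_px * baseline / disparity
    x = (x_left - cx) * z / focal_px
    y = (y_left - cy) * z / focal_px
    return x, y, z


# Illustrative values only: 1000 px in-air focal length, 0.12 m baseline.
f_water = underwater_focal_length(1000.0)
point = triangulate((700.0, 400.0), (650.0, 400.0), f_water, 0.12,
                    principal_point=(640.0, 360.0))
print(point)
```

Because depth is inversely proportional to disparity, a one-pixel key point error matters more for distant fish than for near ones, which is one reason accurate key point localization is important for this step.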
Preferably, in Step 5 the pose curve of the fish is fitted with a curve fitting algorithm and the fitted pose curve is integrated to calculate its length, as follows:
S1, in world coordinates, four points are recorded: the head key point; the midpoint of the line connecting the dorsal fin front-end root key point and the pectoral fin root key point; the midpoint of the line connecting the dorsal fin rear-end root key point and the anal fin root key point; and the caudal fin root key point;
S2, the four points are then fitted to a curve. To confine the bending of the fish pose curve to a single two-dimensional plane Φ in the three-dimensional coordinate system, an expression in the X-axis, Y-axis and Z-axis coordinates defines the equation of Φ [formula not reproduced in the source text]; its coefficients are obtained from the four points by linear regression [formula not reproduced in the source text];
S3, the four points are projected onto the plane Φ to obtain their corresponding coordinate points in the plane [formula not reproduced in the source text];
S4, a new coordinate system is constructed with Φ as a subspace, with an origin and an X-axis direction defined from the projected points [symbols and formula not reproduced in the source text], and the coordinates of the projected points in this new coordinate system are calculated;
S5, the curve is expressed in the Φ plane by an equation whose parameters a, b and c are calculated from the coordinates of the points [formula not reproduced in the source text];
S6, finally, the curve is integrated along its length to obtain L, the fish length [formula not reproduced in the source text], where L is the finally measured length of the fish and the integration variable is the unknown of the curve expression in the Φ plane.
Compared with the prior art, the invention has the following beneficial effects:
1. Fish length is measured without contact using a binocular camera, avoiding the injury to fish caused by traditional manual measurement. Accurate parameters of the binocular camera are obtained with the Zhang Zhengyou calibration method, which greatly reduces error in subsequent measurement. When measuring fish size underwater, a binocular camera provides more accurate depth information, handles light refraction in the underwater environment better and can perform more complex vision tasks. Binocular camera applications are not limited to size measurement; they also include tracking the three-dimensional swimming trajectories of fish and fine-grained analysis of underwater targets.
2. Fish images are processed with the proposed diffusion model based on the wavelet transform. Combining the diffusion model with the wavelet transform reduces computation without losing information: the wavelet transform halves the spatial dimensions without sacrificing information, whereas other transforms such as the fast Fourier transform (FFT) and the discrete cosine transform (DCT) can cause information loss. The wavelet transform decomposes the underwater image into frequency bands at different scales and so captures detail at each scale. Because light attenuation and scattering in the underwater environment spread image detail across scales, extracting information at different scales helps recover the details of the image. A diffusion model can represent complex image features by learning nonlinear relationships from large amounts of data. Compared with traditional enhancement algorithms based on rules or statistics, it captures high-level semantic and structural information in the image better, so image enhancement is more accurate.
Enhancement with the wavelet-transform-based diffusion model reduces the scattering effect and restores the sharpness and detail of the image, so that the image performs better in later tasks. It also denoises and enhances the underwater pictures; after enhancement by this method, underwater fish target detection improves and false detections are reduced.
3. To adapt to the particular conditions of the underwater environment, the method improves the YOLOv8 algorithm. A DCNv3 network is introduced into YOLOv8 to replace the C2f modules in the PAN module. Using the long-range modeling capability of DCNv3, YOLOv8 captures the relationships between targets better and improves detection of distant targets. The adaptive spatial aggregation of DCNv3 strengthens detection of targets at different scales, while reducing the inductive bias of regular convolution and improving the generalization of the model. Although the YOLOv8 network performs well, it does so at the cost of substantial computational resources, partly because the convolutional layers extract redundant features; the method therefore replaces the C2f modules in the backbone network with C2f_ScConv modules to reduce computation.
To cope with changing underwater light conditions and blurred underwater images, the method optimizes the network architecture and training strategy, improving the algorithm's perception of underwater fish targets and its robustness. This improves the accuracy with which fish targets are located in the underwater environment, and therefore the accuracy of underwater fish length measurement. It provides fast, accurate localization of the target region of interest for the fish length measurement task and a reliable basis for subsequent size measurement.
4. Detecting the fish key points captures the pose of the detected fish target in the image more accurately. The key point positions are important indicators of the posture and morphology of the fish; by detecting and locating them, the posture and shape of the fish target, namely its body posture and orientation, can be described accurately. This is important for understanding the biological characteristics of the fish and for accounting for changes in posture during measurement. It allows the body length to be determined accurately, provides accurate size information for farm management and research, helps in analyzing the physiological state, behavior and possible abnormalities of the fish, and can be used to monitor growth. Comparing key point information across time points reveals the growth trend and growth rate of the fish during farming, providing a scientific basis for adjusting the farming strategy.
Once all fish key points have been successfully matched in pairs, a pose curve can be fitted to their three-dimensional coordinates. The pose curve reflects the position of the fish in real space and accurately depicts its pose. Analyzing the pose curve yields the coordinates of the fish and allows important characteristics such as its length to be calculated. This supports a deeper understanding of the behavioral and ecological characteristics of fish. Three-dimensional reconstruction allows fish to be located and tracked accurately in their natural environment. In addition, information such as fish length enables detailed analysis of population structure and ecosystem dynamics. The invention thus provides a new non-invasive method for fish research and supports research and practice in fields such as ecology and environmental monitoring.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only one embodiment of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a logic block diagram of the underwater fish sizing method of the present invention.
FIG. 2 is a diagram of the improved YOLOv8 model of the present invention.
FIG. 3 is an underwater image enhancement comparison of Example 1.
FIG. 4 is a fish target detection chart of Example 1.
FIG. 5 is a diagram showing the identification of fish key points in Example 1.
FIG. 6 is a graph showing the measurement of fish size in Example 1.
Detailed Description
The invention will be further described with reference to specific examples.
To improve the accuracy of underwater fish target detection, the invention provides an underwater fish target length detection method that combines underwater image enhancement, target detection and key point detection. The binocular camera is first calibrated underwater. A diffusion model is then introduced into underwater image enhancement so that the enhancement results are more accurate and natural, and a wavelet transform is introduced to improve the enhancement further, giving a diffusion model based on the wavelet transform. YOLOv8 is improved by integrating the ScConv network and the DCNv3 network, which strengthens the recognition and detection of fish targets. Finally, the positions of the paired key points in the left and right images are reprojected in three dimensions to obtain their three-dimensional coordinates in the world coordinate system, and a pose curve is fitted to these coordinates. The pose curve reflects the position of the fish in real space and accurately depicts its pose. Analyzing the pose curve yields the coordinates of the fish and allows important characteristics such as its length to be calculated.
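The pose curve fitting and length integration can be sketched end to end as follows. The formulas are not reproduced in this text, so the sketch rests on explicit assumptions: the plane is modeled as z = Ax + By + C, the in-plane frame takes the projected head point as origin with its first axis toward the projected tail point, the curve is a quadratic v = au² + bu + c (suggested only by the three parameters a, b, c), and the arc length is evaluated with Simpson's rule. All function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows, targets):
    """Linear regression through the normal equations (A^T A) p = A^T y."""
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(ata, aty)


def sub(p, q):
    return [a - b for a, b in zip(p, q)]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def unit(p):
    norm = math.sqrt(dot(p, p))
    return [a / norm for a in p]


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule with an even number of intervals n."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def fish_length(points):
    """Length of a planar quadratic fitted to ordered 3D points (head first)."""
    # Assumed plane model z = A x + B y + C, fitted by linear regression.
    a_, b_, c_ = least_squares([[x, y, 1.0] for x, y, _ in points],
                               [z for _, _, z in points])
    normal = unit([a_, b_, -1.0])
    scale = math.sqrt(a_ * a_ + b_ * b_ + 1.0)
    # Orthogonal projection of every point onto the plane.
    projected = []
    for p in points:
        dist = (a_ * p[0] + b_ * p[1] - p[2] + c_) / scale
        projected.append([pi - dist * ni for pi, ni in zip(p, normal)])
    # Assumed in-plane frame: origin at the head, first axis toward the tail.
    origin = projected[0]
    e1 = unit(sub(projected[-1], origin))
    e2 = cross(normal, e1)
    uv = [(dot(sub(p, origin), e1), dot(sub(p, origin), e2)) for p in projected]
    # Assumed curve model v = a u^2 + b u + c, fitted by least squares.
    a, b, _ = least_squares([[u * u, u, 1.0] for u, _ in uv], [v for _, v in uv])
    return simpson(lambda u: math.sqrt(1.0 + (2.0 * a * u + b) ** 2),
                   uv[0][0], uv[-1][0])


# Four illustrative points (meters) lying in the plane z = 2.
print(fish_length([(0.0, 0.0, 2.0), (1.0, 0.5, 2.0), (2.0, 0.5, 2.0), (3.0, 0.0, 2.0)]))
```

The plane model z = Ax + By + C cannot represent a plane parallel to the Z axis, and the regression is singular when the points are collinear in X and Y; a production version would need a more general plane fit for those cases.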
The overall concept of the invention is shown in FIG. 1; the method comprises binocular matching, image enhancement, target detection, key point detection and fish size measurement.
In this embodiment, the method of the invention is further described using a set of underwater images as an example.
1. Binocular calibration and image acquisition
First, a binocular system is set up and binocular videos or images of fish are captured. A waterproofed binocular camera and light source are combined with a water tank to form the measuring device.
The binocular camera is calibrated accurately both under water and on land using the Zhang Zhengyou calibration method and a GP290 checkerboard calibration plate, which optimizes depth perception.
The binocular camera and light source configuration is designed specifically for complex underwater lighting conditions and environmental changes, including but not limited to adjustment of the camera's waterproof depth, selection of the light source wavelength and control of light intensity. By optimizing the interaction between light source and camera, it is intended to ensure that high-quality images can be captured under different water quality conditions (including but not limited to clear, turbid and differently colored water) and at different water depths and lighting conditions. In addition, the placement angle and distance of the binocular camera are designed so that the system obtains the best viewing angle and depth information for fish of different sizes and forms, improving the accuracy and reliability of measurement.
The binocular matching module uses a ZED 2 binocular camera with a video output resolution of 3840×1080 (1080p) at 30 fps, a USB 3.0 interface for transmission, and a pose update rate for motion and object detection of up to 100 Hz. The camera is waterproofed for underwater operation so that error is not increased. The calibration plate is a GP290 checkerboard calibration plate with a float glass substrate and an alumina panel, which is non-reflective, opaque, high-precision and water-resistant. It has 11 × 8 corner points and a square side length of 20 mm. Binocular calibration is carried out in MATLAB R2019b using its built-in stereo calibration toolbox, and the binocular camera acquires 16 groups of calibration plate images from different angles on land and under water respectively.
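Checkerboard calibration in the style of Zhang's method needs the known planar coordinates of the board corners. As a small illustration using the board geometry given above (11 × 8 corners, 20 mm squares), the sketch below generates those reference points; the function name and the row-major ordering are assumptions, and the calibration itself was done with the MATLAB toolbox rather than with this code.

```python
def checkerboard_object_points(cols=11, rows=8, square_mm=20.0):
    """Corner coordinates of a planar checkerboard in millimeters.

    The board lies in the Z = 0 plane; corners are listed row by row.
    """
    return [(c * square_mm, r * square_mm, 0.0)
            for r in range(rows)
            for c in range(cols)]


corners = checkerboard_object_points()
print(len(corners))   # 88
print(corners[-1])    # (200.0, 140.0, 0.0)
```

The same list of reference points is reused for every calibration image; only the detected pixel positions of the corners change from view to view.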
In an underwater environment, refraction and scattering can change the apparent shape and features of the calibration plate. In the experiment, the GP290 alumina calibration panel, which has high contrast and clear calibration points, was selected first. A large amount of image data was collected by photographing the calibration plate at different depths and distances, and Zhang Zhengyou calibration of the underwater binocular camera was then performed on these data. The experimental results show that, with proper processing and correction of the calibration plate images, accuracy and precision similar to those of conventional camera calibration can be obtained. After camera calibration, the next step is stereo rectification, whose purpose is to align the images of the left and right cameras so that subsequent depth estimation and size measurement are accurate. A rectification method based on the disparity map is used: the disparity between the left and right camera images is calculated and the intrinsic and extrinsic parameters of the cameras are adjusted accordingly, aligning the images. Experiments show that this underwater binocular calibration and stereo rectification method improves the accuracy of fish size measurement and provides a reliable basis for subsequent image processing and analysis.
2. Image processing is carried out on the obtained binocular image
The acquired underwater binocular images are processed with a proposed diffusion model based on the wavelet transform, in which the diffusion model enhances the low-frequency information and a cross-attention mechanism enhances and denoises the high-frequency information.
The wavelet-based diffusion model consists of two parts: a denoising diffusion model and a wavelet transform. The Denoising Diffusion Probabilistic Model (DDPM) is an image denoising method based on probabilistic modeling. It uses two Markov chains: a forward chain that perturbs the data into noise, and a reverse chain that converts the noise back into data. Forward diffusion process: in DDPM, the input image first undergoes forward diffusion, which perturbs the distribution of the image by gradually adding noise. The noise is random, for example Gaussian noise. Over multiple iterations the noise intensity increases step by step, gradually blurring the distribution of the image. Reverse diffusion process: after forward diffusion is completed, the DDPM model performs reverse diffusion, whose goal is to recover the original image from the perturbed image. As the noise intensity is reduced step by step, the distribution of the image gradually regains its sharpness.
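The forward chain described above can be sketched in a few lines. This is a minimal illustration that assumes a linear variance schedule and the usual closed-form sampling of a noisy sample from the clean one; the schedule endpoints, the step count and the function names are hypothetical and are not taken from the patent.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fixed variance schedule beta_1..beta_T (linear spacing is an assumption)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) for a flat list of pixel values (t is 1-indexed)."""
    a_bar = alpha_bars[t - 1]
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.2, -0.5, 0.9, 0.0]                           # toy "image" of four pixels
x_early = forward_diffuse(x0, 1, alpha_bars, rng)    # still close to x0
x_late = forward_diffuse(x0, 1000, alpha_bars, rng)  # almost pure noise
```

After many steps the signal weight `sqrt(alpha_bar_t)` approaches zero, which is why the reverse chain has to be learned rather than computed directly.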
The wavelet transform used is the one-dimensional discrete Haar wavelet transform (1D-DWT), which operates as follows: the acquired fish image is decomposed into a low-frequency part (approximation component) and a high-frequency part (detail component). The low-frequency part represents the overall trend and smooth content of the image, and the high-frequency part represents its detail and texture. The decomposition can be performed recursively, with each level further decomposing the low-frequency part of the previous level. At each level, the frequency range of the wavelet function is adjusted by changing its scale parameter, so that different frequency components can be analyzed. Smaller scales provide higher frequency resolution, and larger scales provide lower frequency resolution. Wavelet reconstruction synthesizes the coefficients obtained by decomposition back into the original image. At each level, the coefficients are passed through the inverse scale transformation, inverse convolution and upsampling (the inverse of downsampling) to recover the low-frequency and high-frequency parts of the previous level. Finally, the inverse scale transformation and deconvolution are applied to the highest-level low-frequency part to obtain the complete reconstructed image.
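The recursive decomposition and reconstruction described above can be illustrated on a single image row. This is a minimal sketch of the orthonormal 1D Haar transform; the function names and the sample row are hypothetical, and a full image pipeline would apply the same step along rows and columns.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt_1d(signal):
    """One Haar level: returns (low-frequency approximation, high-frequency detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_idwt_1d(approx, detail):
    """Inverse of haar_dwt_1d: interleaves the recovered sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def decompose(signal, levels):
    """Each level further decomposes the low-frequency part of the previous one."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_dwt_1d(approx)
        details.append(detail)
    return approx, details


def reconstruct(approx, details):
    """Undo decompose(), starting from the coarsest level."""
    for detail in reversed(details):
        approx = haar_idwt_1d(approx, detail)
    return approx


row = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 2.0, 0.0]
low, highs = decompose(row, levels=2)
restored = reconstruct(low, highs)
```

Each level halves the length of the low-frequency part, and reconstruction recovers the original row up to floating-point rounding.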
First, the underwater picture is decomposed using the one-dimensional discrete Haar wavelet transform (1D-DWT), as follows:
$$LL_0,\ \{LH_0, HL_0, HH_0\} = \text{1D-DWT}(IMG_0)$$
where $LL_0$ represents the low-frequency information of the picture, $\{LH_0, HL_0, HH_0\}$ represents the high-frequency information of the picture, and $IMG_0$ represents the original picture. The low-frequency picture is then enhanced by a diffusion model, which is generally divided into a forward diffusion part and a reverse denoising part. Forward diffusion uses a fixed variance schedule $\{\beta_1, \ldots, \beta_T\}$ to gradually convert the input $X_0$ into corrupted noise data $X_T$ over $T$ steps, as shown in the following formula:
$$q(X_{1:T} \mid X_0) = \prod_{t=1}^{T} q(X_t \mid X_{t-1}), \qquad q(X_t \mid X_{t-1}) = \mathcal{N}\left(X_t;\ \sqrt{1-\beta_t}\,X_{t-1},\ \beta_t I\right)$$
where $X_t$ and $\beta_t$ are the corrupted noise data and the predefined variance at time step $t$, respectively, $\mathcal{N}$ represents the Gaussian distribution, and $X_0$ is the raw data; the transition probability from $t-1$ to $t$ is $q(X_t \mid X_{t-1})$. Finally, the reverse denoising process, together with the enhancement of the high-frequency information by a cross-attention mechanism, is shown in the following formula:
where $p_\theta(X_{t-1} \mid X_t)$ is the probability that $X_{t-1}$ occurs given that $X_t$ has occurred;
where LH (low-high) represents the horizontal high-frequency component, which contains edge and detail information in the horizontal direction of the image; HL (high-low) represents the vertical high-frequency component, which contains edge and detail information in the vertical direction of the image; and HH (high-high) represents the diagonal high-frequency component, which contains detail information that changes rapidly in both the horizontal and vertical directions of the image. Conv denotes convolution, and the three outputs are the results of convolving LH, HL and HH, respectively.
Diffusion models based on wavelet transforms are designed specifically to address unique challenges in underwater imaging, including but not limited to attenuation of light, scattering of particulate matter, and image motion blur caused by water flow. The diffusion model utilizes the multi-scale decomposition capability of wavelet transformation to effectively separate signals and noise in images, and realizes noise reduction and signal recovery on different frequency levels. In addition, the model combines the characteristic of underwater imaging, and adjusts the threshold processing method of wavelet coefficients, so that the model is more suitable for image recovery under the conditions of underwater complex illumination and turbidity. By the method, the system can effectively improve the definition and contrast of the underwater image, and provides a clearer and more accurate image foundation for fish target detection and length measurement. Further, the implementation of the model reduces the dependence on computing resources, so that the vision system is more suitable for being deployed in a field environment with limited resources, and the practicability and universality of the system are improved.
Underwater images are affected by scattering and absorption in the water body, which blurs them. Enhancement reduces the scattering effect and restores the sharpness and detail of the image, making it easier to observe and analyze. Traditional underwater image enhancement methods rely mainly on hand-crafted enhancement techniques and have limited generalization ability. A series of deep learning methods built on the strong feature extraction ability of convolutional neural networks and Transformers have achieved remarkable success in underwater image enhancement. Although these methods have made significant progress, their recovery of details covered by noise can still be improved: previous approaches tend to blur details and distort colors. Diffusion models have demonstrated strong image generation ability and, after a series of improvements, can generate more realistic details. By learning nonlinear relationships from large amounts of data, a diffusion model can represent complex image features. Compared with traditional enhancement algorithms based on rules or statistical methods, a diffusion model better captures the high-level semantic and structural information in an image and can therefore enhance the image more accurately. Diffusion models are therefore introduced into underwater image enhancement to make the enhancement result more accurate and natural.
Although diffusion models have many advantages in image enhancement, they also face challenges such as long restoration times, heavy consumption of computing resources, and unstable restoration. The wavelet transform decomposes the underwater image into frequency bands at different scales, which makes it possible to capture detail at each scale. Because light attenuation and scattering in the underwater environment spread image details across different scales, extracting information at different scales with the wavelet transform helps recover the details of the image. Furthermore, the wavelet transform has the property of energy concentration: the energy of an image in the wavelet domain is typically concentrated in a few coefficients. In underwater images, valuable information tends to concentrate in the lower-frequency wavelet coefficients, while noise and interference typically concentrate in the high-frequency part. Processing the low-frequency and high-frequency information separately, each in a targeted way, improves the enhancement effect and reduces the use of computing resources. The invention therefore introduces the wavelet transform to further improve underwater picture enhancement.
The method combines the generative ability of the diffusion model with the advantages of the wavelet transform, achieving satisfactory denoising and restoration of underwater pictures while reducing the demand for computing power. The wavelet transform can halve the spatial dimension without sacrificing information, whereas other transform techniques, such as the Fast Fourier Transform (FFT) and the Discrete Cosine Transform (DCT), can lose information. In the wavelet-transformed image, the low-frequency information contains the slowly varying parts of the image, such as the background, flat areas and general structure. It also contains global features of the image such as overall brightness, contrast and color distribution, which are important for the overall perception and analysis of the image. The low-frequency information further contains some larger-scale details, which change more gradually than the high-frequency components; larger textures and shapes in the image can be observed from it. The high-frequency information contains the details of the image and the parts that change quickly. Specifically, the high-frequency components typically contain edges, contours and textures, as well as small-scale features such as fine textures, fine lines and fine object features. Because the high-frequency components are very sensitive to subtle changes in the image, they may also contain the image noise. Therefore, a diffusion model is used here for the low-frequency information, and a cross-attention mechanism is used to enhance and denoise the high-frequency information.
The image enhancement experiment uses the underwater UIEB dataset, which comprises 950 real-world underwater images divided into two subsets: 890 raw underwater images with corresponding high-quality reference images, and 60 challenging images without references. The dataset is split into training, validation and test sets in a ratio of 8:1:1. To quantitatively compare the restored images with the paired reference images provided in the dataset, the PSNR and SSIM metrics are used. Higher PSNR and SSIM values indicate that the restored image is closer to the reference image in content and structure. The comparison methods are WaterNet, UWCNN-typeI, Ucolor and U-Trans. The comparison results are shown in Table 1 and FIG. 3. As Table 1 and FIG. 3 show, the proposed wavelet-based diffusion model achieves good results, improves the quality of underwater pictures, and provides better input for subsequent tasks.
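For reference, PSNR, one of the two metrics named above, can be computed as in the following sketch. It assumes 8-bit images flattened to lists of pixel values; the function name and the sample values are hypothetical, and SSIM is omitted because it needs windowed statistics.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - p) ** 2 for r, p in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [100.0, 100.0, 100.0, 100.0]
restored = [101.0, 99.0, 101.0, 99.0]  # every pixel off by one grey level
score = psnr(reference, restored)
```

A mean squared error of 1 on an 8-bit scale gives roughly 48 dB, and a larger error gives a lower score.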
Table 1. Comparison of enhancement results for underwater pictures
3. Fish identification of enhanced images
The invention identifies the fish in the acquired images and obtains their positions using an improved YOLOv8 algorithm. The improved YOLOv8 algorithm is shown in FIG. 2: the DCNv3 module is introduced into the neck network of the YOLOv8 model to replace the C2f module in the PAN module, and, in the backbone network, the original C2f module is replaced with the C2f_ScConv module. The long-range modeling ability of DCNv3 allows YOLOv8 to better capture correlations between targets and improves detection of distant targets. The adaptive spatial aggregation ability of DCNv3 strengthens detection of targets at different scales, while reducing the inductive bias of regular convolution and improving the generalization ability of the model. To address changing underwater illumination and blurred underwater images, the method optimizes the network structure and the training strategy so as to improve the algorithm's perception of fish targets and its robustness in the underwater environment. Training and tuning on fish images taken under water ensures that the algorithm can accurately detect and locate fish targets in the underwater environment, which improves the accuracy of underwater fish length measurement. The invention establishes a dataset dedicated to underwater fish target detection; it contains underwater fish images of different species, sizes and backgrounds and covers a variety of typical underwater environmental conditions. By training and testing the algorithm on this dataset, the invention can evaluate its target detection performance in the underwater environment and further improve and optimize it.
The Spatial Reconstruction Unit (SRU) uses a separate-and-reconstruct approach to suppress spatial redundancy, while the Channel Reconstruction Unit (CRU) uses a split-transform-fuse strategy to reduce channel redundancy. In the SRU, the information content of the different feature maps is evaluated with the scaling factors of a Group Normalization (GN) layer. The process is as follows:
$$X_{out} = \mathrm{GN}(X) = \gamma \frac{X - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$$
where $\mu$ and $\sigma$ are the mean and standard deviation of $X$, $\varepsilon$ is a small positive constant added for division stability, and $\gamma$ and $\beta$ are trainable affine transformation parameters. The trainable parameter $\gamma$ of the GN layer is used as a measure of the spatial pixel variance of each batch and channel: richer spatial information shows up as more variation among spatial pixels and hence a larger $\gamma$. The normalized correlation weights $W_\gamma$, obtained by normalizing the $\gamma$ values, indicate the importance of the different feature maps. The feature map values re-weighted by $W_\gamma$ are then mapped to the range (0, 1) by a sigmoid function and gated by a threshold. Weights above the threshold are set to 1 to obtain the informative weight $W_1$, and set to 0 to obtain the non-informative weight $W_2$.
Finally, the input feature X is multiplied by W1 and W2, respectively, resulting in two weighted features: the feature X1w with abundant information and the feature X2w with less information. Thus, the present invention successfully splits the input features into two parts: x1w has informative and expressive spatial content, while X2w has little or no information, which is considered redundant.
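The separation step can be sketched as follows. This is a simplified reading of the description, not the patented implementation: each channel is treated as its own normalization group, the gate threshold of 0.5 is an assumed value, weights at or below the threshold are left at their sigmoid value, and `sru_split` is a hypothetical name.

```python
import math


def sru_split(x, gamma, beta, threshold=0.5, eps=1e-5):
    """Split a feature into an informative part and a less informative part.

    x is a list of channels, each a flat list of pixel values; gamma and beta
    hold one affine parameter per channel.
    """
    gamma_sum = sum(gamma)
    informative, redundant = [], []
    for channel, g, b in zip(x, gamma, beta):
        mu = sum(channel) / len(channel)
        var = sum((v - mu) ** 2 for v in channel) / len(channel)
        w_gamma = g / gamma_sum  # normalized importance of this channel
        info_row, red_row = [], []
        for v in channel:
            normed = g * (v - mu) / math.sqrt(var + eps) + b
            w = 1.0 / (1.0 + math.exp(-w_gamma * normed))  # sigmoid to (0, 1)
            w1 = 1.0 if w > threshold else w  # informative weight
            w2 = 0.0 if w > threshold else w  # non-informative weight
            info_row.append(w1 * v)
            red_row.append(w2 * v)
        informative.append(info_row)
        redundant.append(red_row)
    return informative, redundant


x = [[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0, 5.0]]
x1w, x2w = sru_split(x, gamma=[1.5, 0.5], beta=[0.0, 0.0])
```

In this toy input the second channel is constant, so it never passes the gate, while the brighter half of the first channel is kept unchanged in the informative output and zeroed in the other.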
The Channel Reconstruction Unit (CRU) replaces the standard convolution and is implemented by three operators: split, transform, and fuse.
Split: first, the channels of the spatially refined feature $X^w$ are divided into two parts with $\alpha C$ channels and $(1-\alpha)C$ channels, respectively, where $0 \le \alpha \le 1$ is the split ratio. After the split and compression (squeeze) operations, the spatially refined feature $X^w$ is divided into an upper part $X_{up}$ and a lower part $X_{low}$.
Transform: for $X_{up}$, efficient convolution operations (group-wise convolution, GWC, and point-wise convolution, PWC) are used to extract high-level representative information and reduce computational cost. GWC reduces the number of parameters and computations thanks to its sparse convolution connections, but cuts off the flow of information between channel groups, whereas PWC compensates for this information loss and helps information flow across feature channels. The two outputs are combined to form a merged representative feature map $Y_1$. The process can be represented by the following formula:
$$Y_1 = \mathrm{GWC}(X_{up}) + \mathrm{PWC}(X_{up})$$
where $X_{up}$ is the input feature and $Y_1$ is the merged representative feature map.
For $X_{low}$, a PWC operation is used to generate feature maps with shallow hidden details as a supplement to the rich feature extractor. The result is a merged representative feature map $Y_2$. The process can be represented by the following formula:
$$Y_2 = \mathrm{PWC}(X_{low}) \cup X_{low}$$
where $X_{low}$ is the input feature, $\cup$ denotes concatenation, and $Y_2$ is the merged representative feature map.
Fuse: after the transform is completed, a simplified SKNet method is used to adaptively merge the output features $Y_1$ and $Y_2$ of the upper and lower transform stages. Global average pooling is first applied to collect global spatial information. Next, the upper and lower global channel-wise descriptors $S_1$ and $S_2$ are stacked together, and a channel-wise soft attention operation generates the feature importance vectors $\beta_1$ and $\beta_2$. Finally, under the guidance of $\beta_1$ and $\beta_2$, the features $Y_1$ and $Y_2$ are merged to obtain the channel-refined feature $Y$. The process can be expressed by the equation:
$$Y = \beta_1 Y_1 + \beta_2 Y_2$$
where $Y_1$ and $Y_2$ are the output features of the upper and lower transform stages, $S_1$ and $S_2$ are the upper and lower global channel-wise descriptors, $\beta_1$ and $\beta_2$ are the feature importance vectors generated by the channel-wise soft attention operation, and $Y$ is the channel-refined feature.
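The fuse step can be illustrated with a small sketch. It assumes that the soft attention is a two-way softmax over the pooled descriptors of each channel pair, which is one plausible reading of the description; the function names and toy features are hypothetical.

```python
import math


def global_average_pool(feature):
    """One descriptor per channel: the mean of that channel's pixel values."""
    return [sum(channel) / len(channel) for channel in feature]


def cru_fuse(y1, y2):
    """Merge two equally shaped features with channel-wise soft attention."""
    s1, s2 = global_average_pool(y1), global_average_pool(y2)
    fused = []
    for ch1, ch2, a, b in zip(y1, y2, s1, s2):
        m = max(a, b)  # subtract the maximum for numerical stability
        e1, e2 = math.exp(a - m), math.exp(b - m)
        beta1, beta2 = e1 / (e1 + e2), e2 / (e1 + e2)  # beta1 + beta2 == 1
        fused.append([beta1 * u + beta2 * v for u, v in zip(ch1, ch2)])
    return fused


y1 = [[1.0, 1.0], [0.0, 0.0]]
y2 = [[1.0, 1.0], [2.0, 2.0]]
y = cru_fuse(y1, y2)
```

Where the two branches agree the output equals both of them, and where one branch has the larger pooled response it dominates the weighted sum.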
The self-attention mechanism is a method that can dynamically adjust weights according to different inputs. In ViT, the multi-headed self-attention mechanism makes the calculation of weights dependent on the input query vector, which enables different tokens to dynamically adjust weights according to the input query. This mechanism enables ViT to aggregate information between different tokens in an adaptive manner, enabling global interactions and modeling. In contrast, when the conventional CNN aggregates information, the parameters of the convolution kernel are static and cannot be dynamically adjusted according to different inputs. This makes conventional CNNs relatively limited in terms of adaptive spatial aggregation capability.
Therefore, by introducing a self-attention mechanism, ViT overcomes the limitations of conventional CNNs in global modeling and adaptive spatial aggregation. This allows ViT to perform better on various types of data, particularly in tasks that require global information interaction. As deep learning continues to develop, attention-based models such as ViT are expected to become important tools for processing large-scale data and complex tasks.
The core operation of DCNv3, a new convolution-based backbone design, is a dynamic sparse convolution with a common 3 × 3 window size. Its main characteristics are as follows:
(1) Sampling offset: DCNv3 learns the sampling offsets flexibly so as to determine an appropriate receptive field dynamically. The model can learn, from the given data, sampling offsets suited to long-range or short-range dependencies. This flexibility allows DCNv3 to adapt better to different types of tasks.
(2) Adaptive adjustment: DCNv3 adaptively adjusts the convolution operation according to the input data. This achieves adaptive spatial aggregation similar to that of ViTs (Vision Transformers). Compared with regular convolution, adaptive adjustment reduces excessive inductive bias and thereby improves the performance and generalization ability of the model.
(3) 3 × 3 convolution window: DCNv3 uses a common 3 × 3 convolution window. This window size avoids the optimization problems that can occur with large dense kernels and reduces computational cost. DCNv3 therefore has an advantage in achieving efficient convolution.
The DCNv3 network is introduced into YOLOv8 to replace the C2f module in the PAN module. The long-range modeling ability of DCNv3 allows YOLOv8 to better capture the correlation between targets and improves detection of distant targets. The adaptive spatial aggregation ability of DCNv3 strengthens detection of targets at different scales, while reducing the inductive bias of regular convolution and improving the generalization ability of the model. The improved model is shown in FIG. 2.
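The idea of a 3 × 3 window with learned sampling offsets can be illustrated at a single output position. In the sketch below the offsets and modulation scalars are fixed by hand, whereas in a trained network they would be predicted from the input; the function names are hypothetical and this is not the DCNv3 reference implementation.

```python
import math


def bilinear_sample(img, y, x):
    """Bilinearly interpolate a 2D list at fractional (y, x); outside counts as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value


def deformable_response(img, cy, cx, weights, offsets, modulation):
    """3x3 deformable convolution response at (cy, cx).

    weights and modulation hold 9 scalars; offsets holds 9 (dy, dx) pairs,
    all in row-major order over the 3x3 grid.
    """
    total, k = 0.0, 0
    for gy in (-1, 0, 1):
        for gx in (-1, 0, 1):
            dy, dx = offsets[k]
            sample = bilinear_sample(img, cy + gy + dy, cx + gx + dx)
            total += weights[k] * modulation[k] * sample
            k += 1
    return total


img = [[float(x) for x in range(4)] for _ in range(4)]  # pixel value equals column
weights = [1.0 / 9.0] * 9
modulation = [1.0] * 9
regular = deformable_response(img, 1, 1, weights, [(0.0, 0.0)] * 9, modulation)
shifted = deformable_response(img, 1, 1, weights, [(0.0, 0.5)] * 9, modulation)
```

With zero offsets the response is the ordinary 3 × 3 average, and shifting every sampling point half a pixel to the right moves the average by the same amount.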
By combining SCConv with DCNv3, the improved YOLOv8 algorithm becomes more robust to common interference in the underwater environment, such as suspended particles and swaying aquatic weeds, while improving recognition in dynamic underwater scenes and across varied fish postures. The optimization improves the accuracy and efficiency of target detection and reduces the false detection rate and the missed detection rate while maintaining real-time processing capability, thereby providing stronger and more reliable technical support for accurate measurement and farming management of underwater fish.
The fish dataset of this embodiment is self-collected; in addition, some pictures were obtained with a web crawler and screened from the UIEB dataset. Labeling was done with labelme, and a total of 654 pictures were labeled. The dataset was expanded by cropping, enlarging, shrinking and rotating the pictures. The recognition result is shown in FIG. 4. Table 2 compares the models: the P, R and mAP metrics of the improved network are significantly better than those of the original models, and the improvement in P, R and mAP, to varying degrees, shows that the proposed modifications are effective and lay a good foundation for the subsequent tasks.
Table 2. Model comparison
4. The key points of the fish body are detected using the TransPose model.
After the individual fish are detected, the fish key points are detected, including the head, eyes, dorsal fin, pectoral fin, anal fin and caudal fin. The method uses a TransPose key point detector: a TransPose neural network trained with deep learning accurately detects the fish key points in a single-frame binocular image, and the detected key point image coordinates are then converted into world coordinates through three-dimensional reprojection and the binocular imaging principle.
First, the dataset is prepared: an image dataset with fish pose labels is collected, and each image is annotated with the positions of key points such as the fish head and fish tail. The annotation can be done manually or automatically. Next, the network architecture is designed: a pose estimation model based on the TransPose network (TransPose Net) is designed to detect key points such as the fish head and fish tail. The network architecture typically includes convolution layers, transposed convolution layers, residual connections and similar structures, so as to extract image features efficiently and regress the key points. The images are then preprocessed, including resizing, normalization and mean subtraction; preprocessing helps improve the training effect and generalization ability of the network. Model training follows: the designed pose estimation model is trained on the prepared dataset in a supervised manner, using the annotated key point positions and the image data to optimize the model parameters. The model is then evaluated and tuned: the trained model is evaluated on a validation set and tuned according to the evaluation results. The evaluation metrics may include key point localization accuracy, recall and so on. Finally, the model is tested and applied: the evaluated and tuned model is tested on a test set to assess its performance on unseen data. If the performance is satisfactory, the model can be applied in real scenes to detect key points such as the fish head and fish tail.
And carrying out post-processing operations, such as smoothing processing of key points, calculation of attitude angles and the like, according to requirements so as to improve the stability and accuracy of detection results.
The overall processing flow of the TransPose model is as follows. The input image is passed through a CNN backbone network for feature extraction and downsampling. The feature map is resized to fit the input shape of the encoder. The adjusted feature map is treated as a sequence and fed into N cascaded encoders. The encoders use a self-attention mechanism to establish dependencies between features and progressively extract higher-level feature representations by encoding the sequence multiple times. The output of each encoder is an encoded feature sequence. The heat map output head takes the output of the last encoder as input and generates the key point heat maps. The heat maps are trained so that the fish key points are located accurately.
TransPose generates, through its heat map output head, one heat map for each key point. A heat map is a two-dimensional image in which each pixel value represents the probability that the corresponding key point lies at that location. A loss function is constructed by computing the heat map loss with the Mean Square Error (MSE); by optimizing this loss, the model adjusts its key point predictions toward the true positions. Finally, TransPose determines the final coordinates of each key point from the position of maximum activation: the pixel with the highest probability value in the heat map gives the specific location of the key point in the image.
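The MSE loss and the maximum-activation decoding described above can be sketched as follows. The heat maps are tiny hand-written examples and the function names are hypothetical; a real pipeline would also rescale the decoded position from heat map resolution back to image resolution.

```python
def heatmap_mse(pred, target):
    """Mean squared error between two heat maps of the same shape."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def decode_keypoint(heatmap):
    """Return (x, y, score) of the most strongly activated pixel."""
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


heatmaps = {
    "head": [[0.0, 0.1, 0.0], [0.1, 0.9, 0.2], [0.0, 0.1, 0.0]],
    "tail": [[0.0, 0.0, 0.7], [0.0, 0.1, 0.3], [0.0, 0.0, 0.0]],
}
keypoints = {name: decode_keypoint(hm) for name, hm in heatmaps.items()}
```

One heat map is decoded per key point, so the dictionary ends up with one `(x, y, score)` entry for each named key point.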
In summary, detecting key points such as the fish head and fish tail with the TransPose-based model involves dataset preparation, network architecture design, data preprocessing, model training, model evaluation and tuning, model testing and application, and post-processing. Through these steps, the key points of underwater fish can be detected accurately. Because the key points capture the pose of each fish detected in the image, the pose and shape of the fish, i.e., its body posture and orientation, can be described accurately.
The design and training of the TransPose neural network with deep learning comprise the following key steps. First, the TransPose neural network undergoes end-to-end supervised learning on a large annotated dataset of underwater fish images, so that it learns and improves its ability to detect the key points of underwater fish. Second, the network structure is optimized and adjusted for the particularities of the underwater environment, including but not limited to the depth and width of the convolution layers, the choice of pooling operations, and the use of regularization and batch normalization, so as to improve the robustness of the network to underwater illumination changes, water flow effects, image blurring and the like. Third, data augmentation is used to expand and enrich the training data, which increases the generalization ability and interference resistance of the network and ensures the stability and reliability of the model under different underwater environments and illumination conditions. Finally, through repeated iterative training and validation on a large-scale underwater scene dataset, the TransPose neural network is tuned and optimized so that it achieves higher accuracy and stability in the key point detection task on underwater fish images.
After this training process, the TransPose neural network can accurately detect the key points of fish in a single-frame binocular image, and the image coordinates are converted into world coordinates through three-dimensional reprojection and the binocular imaging principle. The network incorporates knowledge of the underwater environment, adapts well to illumination changes, water flow interference and changes in fish posture, copes stably with the challenges of complex underwater environments, and provides a reliable length measurement solution for individualized fish farming.
The fish dataset is self-collected, and some pictures were obtained with a web crawler and screened. Labeling was done with labelme, and a total of 578 pictures were labeled. The TransPose model weights are initialized randomly, and the model has essentially converged by iteration 120. The AP and AR are 81.6 and 83.1, respectively. The recognition result is shown in FIG. 5.
5. Coordinate conversion is carried out based on the detected key points, a pose curve of the fish is fitted, and the fitted pose curve is integrated to calculate its length, which is the length of the fish.
To measure the size of the fish, a curve reflecting the pose of the fish is fitted with a curve fitting algorithm from the reprojected key point coordinates, and the length of the curve is calculated by integration, which gives the actual length of the fish. In the binocular image, the key points in the matched left and right bounding boxes can be detected, so that the posture and position of the fish can be reconstructed in three dimensions. Specifically, if a key point of a given fish appears in both the left and right images, the left and right key points can be matched. Three-dimensional reprojection of the positions of these paired key points in the left and right images yields their three-dimensional coordinates in the world coordinate system. When all the fish key points have been matched in pairs, a pose curve can be fitted from the three-dimensional coordinates of the key points. The pose curve reflects both the position of the fish in real space and its posture. By analyzing the pose curve, the coordinate information of the fish can be obtained and important characteristics such as its length can be calculated.
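Fitting a curve through a few three-dimensional key points and integrating its length can be sketched as below. The choice of a Lagrange polynomial through evenly spaced parameter values is an assumption made only for illustration, since the curve family is not specified at this point in the text; the function names and sample points are hypothetical.

```python
import math


def curve_point(points, t):
    """Evaluate the Lagrange interpolating curve through `points` at t in [0, 1]."""
    n = len(points)
    nodes = [i / (n - 1) for i in range(n)]
    result = [0.0, 0.0, 0.0]
    for i, p in enumerate(points):
        basis = 1.0
        for j in range(n):
            if j != i:
                basis *= (t - nodes[j]) / (nodes[i] - nodes[j])
        for k in range(3):
            result[k] += basis * p[k]
    return result


def curve_length(points, samples=1000):
    """Approximate the arc-length integral by summing short chord lengths."""
    total = 0.0
    previous = curve_point(points, 0.0)
    for s in range(1, samples + 1):
        current = curve_point(points, s / samples)
        total += math.dist(previous, current)
        previous = current
    return total


straight_fish = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.0), (0.3, 0.0, 1.0)]
bent_fish = [(0.0, 0.0, 1.0), (0.1, 0.03, 1.0), (0.2, 0.03, 1.0), (0.3, 0.0, 1.0)]
straight_length = curve_length(straight_fish)
bent_length = curve_length(bent_fish)
```

For a bent body the integrated length exceeds the straight head-to-tail distance, which is the reason for integrating along the curve instead of measuring the endpoints directly.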
In the binocular detection task, the optical center of the left camera is taken as the origin of the world coordinate system. The X-axis and Y-axis are assumed to coincide with those of the imaging plane of the left camera, while the Z-axis points in the direction in which the camera faces. The spatial coordinates of the key points are calculated according to the similar-triangle theorem. First, the pixel coordinates of a key point in the left and right cameras are expressed as $(x_l, y_l)$ and $(x_r, y_r)$, respectively. The value of the focal length $f$ under water is 1.33 times that in air. The baseline length is $B$ (i.e., the distance between the left and right cameras), and $d$ is the length in the imaging plane. The process is shown in the following formula.
where $X$, $Y$ and $Z$ represent the coordinates on the X-axis, Y-axis and Z-axis, respectively; $f$ is the underwater focal length; $d$ is the length in the imaging plane; $B$ is the baseline length; and $(x_l, y_l)$ and $(x_r, y_r)$ represent the pixel coordinates of the key point in the left and right cameras.
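A standard similar-triangles reconstruction consistent with these definitions can be sketched as follows. It assumes rectified images, pixel coordinates measured from the principal point, and that the length in the imaging plane is the disparity between the left and right x-coordinates; the focal length, baseline and pixel values are hypothetical.

```python
def triangulate(x_l, y_l, x_r, focal_px, baseline):
    """Recover (X, Y, Z) in the left-camera frame from a matched key point."""
    disparity = x_l - x_r
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline / disparity  # depth from similar triangles
    x = x_l * z / focal_px
    y = y_l * z / focal_px
    return x, y, z


F_AIR_PX = 1000.0              # hypothetical in-air focal length in pixels
F_WATER_PX = 1.33 * F_AIR_PX   # underwater focal length, per the 1.33 factor above
BASELINE_M = 0.12              # hypothetical baseline in metres

head = triangulate(350.0, -120.0, 250.0, F_WATER_PX, BASELINE_M)
```

The result is in the same unit as the baseline, so a baseline in metres gives key point coordinates in metres.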
The head key point and the tail key point serve as the start and end points of the pose curve and represent the initial position and orientation of the fish; their three-dimensional coordinates are used to fit a curve that reflects the trajectory of the fish in real space. The key points other than the head and tail cannot be used directly to fit the pose curve, because they represent other parts of the fish body and their positions alone are insufficient to determine the overall trajectory of the fish. Instead, their positions are combined so that the shape and posture of the fish are considered as a whole, which gives a more complete description of the fish pose. In world coordinates, the head key point is recorded as P1. The midpoint of the line connecting the root key point at the front end of the dorsal fin and the pectoral fin root key point is P2. The midpoint of the line connecting the root point at the rear end of the dorsal fin and the anal fin root point is P3. The root key point of the caudal fin is P4. After the key points (P1, P2, P3, P4) are determined, a curve is fitted through the four key points.
To confine the twisting of the fish pose curve to a two-dimensional plane Φ in the three-dimensional coordinate system, an expression is used to define the equation of this plane. The equation is as follows:
Wherein: the variables are the coordinates on the X-axis, the Y-axis and the Z-axis; A, B and C are coefficients, which can be obtained from P1, P2, P3 and P4 by linear regression. The process is as follows:
Wherein: P1, P2, P3 and P4 are the defined key points, each given by its coordinates on the X-axis, the Y-axis and the Z-axis.
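The plane equation itself is not reproduced in this text. As a hedged sketch, assume the plane has the form Z = A·X + B·Y + C, which is one common three-coefficient form that linear regression can fit. The helper name `fit_plane` is hypothetical.

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of Z = A*X + B*Y + C to an (n, 3) array of points.

    The plane form is an assumption; it cannot represent a plane that is
    parallel to the Z-axis.
    """
    pts = np.asarray(points, dtype=float)
    # One row [X, Y, 1] per key point; the regression target is Z.
    design = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coeffs, *_ = np.linalg.lstsq(design, pts[:, 2], rcond=None)
    return coeffs  # array [A, B, C]
```

With four key points and three unknowns the system is overdetermined, so the result is the plane that minimizes the squared Z residuals rather than one passing exactly through every point.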
The plane equation can then be used to obtain the coordinate points of (P1, P2, P3, P4) on the plane Φ. The process is as follows:
Wherein: the results are the coordinate points of the key points P1, P2, P3 and P4 on the plane Φ.
A new coordinate system is constructed with Φ as a subspace, with its origin and X-axis direction fixed within the plane. The coordinates in the new coordinate system on the Φ plane are calculated according to the following formula:
Wherein: the results are the coordinates of the key points in the new coordinate system on the Φ plane.
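A sketch of this projection and change of coordinates is given below. It rests on several assumptions not confirmed by the text: the plane is written as Z = A·X + B·Y + C, the projection is orthogonal, the origin is the projected first key point, and the X-axis points from the projected first key point to the projected last one. The name `to_plane_frame` is hypothetical.

```python
import numpy as np

def to_plane_frame(points, A, B, C):
    """Project (n, 3) points onto the plane and return (n, 2) in-plane coordinates."""
    pts = np.asarray(points, dtype=float)
    n = np.array([A, B, -1.0])                 # normal of A*X + B*Y - Z + C = 0
    signed = (pts @ n + C) / (n @ n)           # scaled signed distance to the plane
    proj = pts - signed[:, None] * n           # orthogonal projection onto the plane
    origin = proj[0]                           # assumed origin: projected first point
    ex = proj[-1] - origin                     # assumed X-axis: first -> last point
    ex = ex / np.linalg.norm(ex)
    ey = np.cross(n / np.linalg.norm(n), ex)   # in-plane axis orthogonal to ex
    rel = proj - origin
    return np.column_stack([rel @ ex, rel @ ey])
```

The sign of the second axis depends on the orientation chosen for the normal; it does not affect the curve length computed later.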
The expression of the fitted curve in the Φ plane is recorded, wherein the parameters a, b and c can be calculated from the following formula.
Wherein a, b and c are the equation parameters of the expression of the curve in the Φ plane.
Finally, a curve integral is evaluated along this expression to obtain the length L, i.e. the fish length. The process is shown in the following formula.
Where L is the final measured length of the fish, and Xbase is the expression of the unknown on the Φ plane, from which XΦ can be calculated.
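The curve expression and the integral are not reproduced in this text. The sketch below assumes a quadratic v = a·u² + b·u + c in the in-plane coordinates (u, v), fitted by least squares, and the usual arc-length integral L = ∫ sqrt(1 + (2a·u + b)²) du taken between the smallest and largest u of the key points. The function name `fish_length` and the trapezoidal rule are illustrative choices.

```python
import numpy as np

def fish_length(uv, samples=2001):
    """Arc length of a least-squares quadratic through (n, 2) in-plane points."""
    uv = np.asarray(uv, dtype=float)
    a, b, c = np.polyfit(uv[:, 0], uv[:, 1], 2)      # v = a*u^2 + b*u + c
    u = np.linspace(uv[:, 0].min(), uv[:, 0].max(), samples)
    integrand = np.sqrt(1.0 + (2.0 * a * u + b) ** 2)
    # Trapezoidal rule for the arc-length integral.
    return float(np.sum((integrand[1:] + integrand[:-1]) * np.diff(u)) / 2.0)
```

For four collinear points spaced one unit apart the fitted quadratic degenerates to a line and the function returns 3, the straight-line distance from the first point to the last.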
6. Experimental verification
To verify the accuracy of the algorithm, an experimental platform was built and video of fish swimming in a pool was recorded. Frames were then extracted from the video, and the length of the fish in each image was calculated. The experimental results are shown in Fig. 6. Five samples were selected, and the length measured by the algorithm was compared with the actual length; the comparison is given in Table 3. As the table shows, the error between the measured length and the actual length is within 10%, which demonstrates the usability of the algorithm.
Table 3. Measured length and actual length
The above description is only of the preferred embodiments of the present application and is not intended to limit the present application, but various modifications and variations can be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
While the foregoing describes the embodiments of the present invention, it should be understood that the present invention is not limited to the embodiments, and that various modifications and changes can be made by those skilled in the art without any inventive effort.

Claims (7)

1. A fish target length detection method based on binocular vision, characterized by comprising the following steps:
Step 1, acquiring image information of an underwater target fish body with a binocular camera, and performing binocular calibration to obtain the internal parameters and external parameters of the cameras;
Step 2, performing image enhancement on the image obtained in step 1 using a diffusion model based on wavelet transformation; the high-frequency information is enhanced by a cross-attention mechanism, and the process is as follows:
wherein LH (low-high) denotes the horizontal high-frequency component, which contains the edge and detail information in the horizontal direction of the image; HL (high-low) denotes the vertical high-frequency component, which contains the edge and detail information in the vertical direction of the image; HH (high-high) denotes the high-frequency component that contains the detail information changing rapidly in both the horizontal and vertical directions of the image; Conv denotes convolution, and the convolved terms are the results of convolving LH, HL and HH respectively;
Step 3, inputting the enhanced image data into an improved YOLOv8 model to identify fish targets; the improved YOLOv8 model is specifically: DCN V3 modules are introduced into the neck network of the YOLOv8 model to replace the C2f modules in the PAN module, and, in the backbone network, the original C2f modules are replaced with C2f_ScConv modules;
the DCN V3 module allows flexible learning of the sampling offset, and autonomously learns from the given data a sampling offset suited to long-range or short-range dependencies; meanwhile, the DCN V3 module adaptively adjusts the convolution operation according to the input data, and conventionally adopts a 3×3 convolution window;
wherein the ScConv sub-module in the C2f_ScConv module is composed of two units, namely a spatial reconstruction unit SRU and a channel reconstruction unit CRU; the SRU suppresses spatial redundancy by a separate-and-reconstruct method, and the CRU reduces channel redundancy by a split-transform-and-fuse strategy; the channel reconstruction unit CRU replaces standard convolution and is realized through three operators, namely split, transform and fuse; the spatial reconstruction unit SRU adopts an independent reconstruction operation and suppresses spatial redundancy by the separate-and-reconstruct method;
Step 4, detecting key points of the fish body with a TransPose model, based on the fish targets identified in step 3;
Step 5, based on the key points detected in step 4, performing three-dimensional re-projection using the internal and external parameters of the binocular camera, converting the coordinates of the key points, and fitting a pose curve of the fish to the re-projected key point coordinates with a corresponding curve fitting algorithm; integrating the fitted pose curve to calculate the length of the curve, which is the length of the fish.
2. The binocular vision-based fish target length detection method of claim 1, wherein: in step 1, the binocular calibration is performed using Zhang Zhengyou's calibration method and a GP290 checkerboard calibration board.
3. The binocular vision-based fish target length detection method of claim 1, wherein: step 1 further comprises a stereo rectification process for aligning the images of the left and right cameras; using a stereo rectification method based on a disparity map, the internal and external parameters of the cameras are adjusted by calculating the disparity between the left and right camera images, thereby aligning the images.
4. The binocular vision-based fish target length detection method of claim 1, wherein the specific process of step 2 is as follows:
S21, first decomposing the fish picture obtained in step 1 with a one-dimensional discrete Haar wavelet transform (1D-DWT), calculated by the following formula:
wherein the outputs are the low-frequency information and the high-frequency information of the picture, and the input is the original picture;
S22, enhancing the low-frequency picture with a diffusion model, the diffusion model being divided into a forward diffusion part and a reverse denoising part; forward diffusion uses a fixed variance schedule to gradually convert the input x0 into corrupted noise data over T steps, as follows:
wherein the symbols denote the corrupted noise data and the predefined variance at time step t, respectively; N denotes a Gaussian distribution; x0 is the original data; and the formula gives the transition probability from step t-1 to step t;
the reverse denoising process is as follows:
wherein the formula gives the probability of the state at step t-1 given the state at step t.
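Steps S21 and S22 of claim 4 can be illustrated with the sketch below. It is not the claimed implementation. It assumes a single-level two-dimensional Haar decomposition (claim 1 names the two-dimensional sub-bands LH, HL and HH) and the standard closed form of Gaussian forward diffusion, x_t = sqrt(ᾱ_t)·x0 + sqrt(1 − ᾱ_t)·ε with ᾱ_t the running product of (1 − β). The names `haar_dwt2` and `forward_diffuse`, and the neutral band names, are hypothetical; which difference band is called LH or HL depends on convention.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an even-sized image."""
    x = np.asarray(img, dtype=float)
    a, b = x[0::2, 0::2], x[0::2, 1::2]      # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]      # bottom-left, bottom-right
    low = (a + b + c + d) / 2.0              # low-frequency band
    col_diff = (a - b + c - d) / 2.0         # differences across columns
    row_diff = (a + b - c - d) / 2.0         # differences across rows
    diag = (a - b - c + d) / 2.0             # differences in both directions
    return low, (col_diff, row_diff, diag)

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t from x0 in one step; t is a 0-based index into betas."""
    alpha_bar = np.cumprod(1.0 - np.asarray(betas, dtype=float))[t]
    noise = rng.standard_normal(np.shape(x0))
    return np.sqrt(alpha_bar) * np.asarray(x0, dtype=float) + np.sqrt(1.0 - alpha_bar) * noise
```

In this sketch only `low` would be passed to `forward_diffuse`, matching the claim that the diffusion model enhances the low-frequency picture; the learned reverse denoising network is outside its scope.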
5. The binocular vision-based fish target length detection method of claim 1, wherein the key point detection in step 4 is performed with a TransPose model, and the TransPose model is constructed by the following steps:
S41, acquiring a large underwater fish image data set with the binocular camera, and dividing it into a training set and a test set;
S42, labeling the fish key points in the images with labelme, the fish key points comprising the head, the eyes, the root of the front end of the dorsal fin, the root of the rear end of the dorsal fin, the pectoral fin root, the anal fin root and the caudal fin root;
S43, constructing the TransPose model, which comprises three parts: a CNN backbone network, an encoder group and a heat map output head group; feature extraction and downsampling are performed by the CNN backbone network; the size of the feature map is adjusted by a clipping layer to suit the input shape of the encoder; the adjusted feature map is treated as a sequence and input into N cascaded encoders; the encoders use a self-attention mechanism to establish the dependencies between features, and progressively extract higher-level feature representations by encoding the sequence multiple times; the output of each encoder is an encoded feature sequence; in the heat map output head group, the output of the last encoder is used as the input for generating the key point heat maps;
S44, performing end-to-end supervised learning on the constructed TransPose model with the labeled training set and test set, wherein a loss function is constructed in which the heat map loss is calculated as the mean square error (MSE); the model adjusts its key point predictions by optimizing the loss function so that they approach the true positions; finally, the TransPose model determines the final coordinates of each key point using a maximum-value activation function.
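The heat map loss and the final coordinate read-out in S44 can be sketched as follows. This is a minimal illustration that assumes one heat map per key point and interprets the maximum-value activation as taking the location of each heat map's maximum; the names `heatmap_mse` and `decode_keypoints` are hypothetical.

```python
import numpy as np

def heatmap_mse(pred, target):
    """Mean squared error between predicted and ground-truth heat maps."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(diff ** 2))

def decode_keypoints(heatmaps):
    """Map (K, H, W) heat maps to a (K, 2) array of (x, y) peak positions."""
    hm = np.asarray(heatmaps)
    k, h, w = hm.shape
    flat_idx = hm.reshape(k, -1).argmax(axis=1)   # index of each map's maximum
    ys, xs = np.unravel_index(flat_idx, (h, w))   # back to row and column
    return np.column_stack([xs, ys])
```

The returned coordinates are in heat map resolution; scaling them back to the input image size is a separate step that depends on the downsampling factor of the backbone.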
6. The binocular vision-based fish target length detection method of claim 1, wherein: in step 5, three-dimensional re-projection is performed using the internal and external parameters of the binocular camera, and the coordinates of the key points are converted, specifically:
taking the optical center of the left camera as the origin of the world coordinate system, assuming that the X-axis and Y-axis coincide with those of the imaging plane of the left camera and that the Z-axis points in the direction the camera faces, the spatial coordinates of the key points are calculated from similar triangles, using the pixel coordinates of each key point in the left camera and in the right camera; the underwater focal length is 1.33 times its value in air; the calculation process is as follows:
wherein B is the baseline length, i.e. the distance between the left and right cameras, and the remaining quantity is the length in the imaging plane.
7. The binocular vision-based fish target length detection method of claim 1, wherein in step 5 the pose curve of the fish is fitted with a corresponding curve fitting algorithm, the fitted pose curve is integrated, and the length of the curve is calculated, the specific process being:
S1, in world coordinates, the head key point is recorded as P1, the midpoint of the line connecting the root key point at the front end of the dorsal fin and the pectoral fin root key point is P2, the midpoint of the line connecting the root point at the rear end of the dorsal fin and the anal fin root point is P3, and the root key point of the caudal fin is P4;
S2, after the key points (P1, P2, P3, P4) are set, a curve is fitted through the four key points;
to confine the twisting of the fish pose curve to a two-dimensional plane Φ in the three-dimensional coordinate system, an expression is used to define the equation of this plane Φ, as follows:
wherein the variables are the coordinates on the X-axis, the Y-axis and the Z-axis; A, B and C are coefficients obtained from P1, P2, P3 and P4 by linear regression;
the process is as follows:
S3, using the equation to obtain the coordinate points of (P1, P2, P3, P4) on the plane Φ; the process is as follows:
S4, constructing a new coordinate system with Φ as a subspace, with its origin and X-axis direction fixed within the plane, and calculating the coordinates in the new coordinate system on the Φ plane according to the following formula:
S5, recording the expression of the curve in the Φ plane, wherein a, b and c are calculated from the following formula:
wherein a, b and c are the equation parameters of the expression of the curve in the Φ plane;
S6, finally, evaluating a curve integral along this expression to obtain the length L, i.e. the fish length; the process is as follows:
where L is the final measured length of the fish and Xbase is the expression of the unknown on the Φ plane.
CN202410677436.8A 2024-05-29 2024-05-29 Fish target length detection method based on binocular vision Active CN118397074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410677436.8A CN118397074B (en) 2024-05-29 2024-05-29 Fish target length detection method based on binocular vision

Publications (2)

Publication Number Publication Date
CN118397074A CN118397074A (en) 2024-07-26
CN118397074B true CN118397074B (en) 2024-10-11

Family

ID=92005542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410677436.8A Active CN118397074B (en) 2024-05-29 2024-05-29 Fish target length detection method based on binocular vision

Country Status (1)

Country Link
CN (1) CN118397074B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119314030A (en) * 2024-10-29 2025-01-14 中国海洋大学 A fish length monitoring method based on computer vision
CN119399176B (en) * 2024-11-04 2025-10-03 天津大学 Stereo image quality evaluation method based on dual-frequency interactive enhancement and binocular matching
CN119339223B (en) * 2024-12-20 2025-03-14 中科视语(北京)科技有限公司 Shrimp information detection method and system based on computer binocular vision
CN120298651A (en) * 2025-02-18 2025-07-11 杭州师范大学 A method and system for detecting and tracking marine vessels
CN120564226B (en) * 2025-05-21 2026-04-28 国家海洋环境监测中心 A method and system for fish body length recognition based on image processing and binocular camera

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926652A (en) * 2021-02-25 2021-06-08 青岛科技大学 Fish fine-grained image identification method based on deep learning
CN116452885A (en) * 2023-04-24 2023-07-18 浙江大学 Novel yellow croaker key point detection and phenotype data measurement method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7515763B1 (en) * 2004-04-29 2009-04-07 University Of Rochester Image denoising based on wavelets and multifractals for singularity detection and multiscale anisotropic diffusion
CN103593639A (en) * 2012-08-15 2014-02-19 北京三星通信技术研究有限公司 Lip detection and tracking method and device
CN116453114B (en) * 2022-12-14 2024-03-05 西南医科大学附属口腔医院 Pathological image analysis method, equipment and system based on deep learning
CN117058232A (en) * 2023-07-27 2023-11-14 大连海洋大学 An improved YOLOv8 model for position detection of fish target individuals in farmed fish schools
CN117689704A (en) * 2023-12-11 2024-03-12 大连理工大学 A monocular image depth estimation method based on diffusion model
CN117726870A (en) * 2023-12-19 2024-03-19 之江实验室 Small-sample target detection model reinforcement learning method and device based on diffusion model

Also Published As

Publication number Publication date
CN118397074A (en) 2024-07-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant