CN115527106A - Imaging identification method and device based on quantitative fish identification neural network model

Imaging identification method and device based on quantitative fish identification neural network model

Info

Publication number
CN115527106A
CN115527106A (application CN202211296268.5A)
Authority
CN
China
Prior art keywords
fish
neural network
network model
model
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211296268.5A
Other languages
Chinese (zh)
Inventor
刘�英
周纪军
陈庭槿
赵志扬
李昌贸
阮润康
陈智鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN202211296268.5A
Publication of CN115527106A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/05Underwater scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81Aquaculture, e.g. of fish

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an imaging identification method and device based on a quantized fish identification neural network model. The method comprises the following steps: building a fish recognition neural network model, inputting a training set into the built model for training, and performing channel pruning on the trained model; performing model conversion on the channel-pruned model into a model format suitable for a mobile-terminal computing framework; quantizing the converted model by quantizing its parameters; and, through the quantized fish identification neural network model, extracting frames from a video stream shot in real time, performing fish identification on the extracted frames, and issuing a fish-arrival alert when fish are identified. Compared with an unquantized model, the memory occupied is greatly reduced, the inference speed is correspondingly improved, and the lightweight model can meet the computing requirements of low-cost hardware resources.

Description

Imaging identification method and device based on quantitative fish identification neural network model
Technical Field
The invention relates to the technical field of fish identification, in particular to an imaging identification method, device and medium based on a quantitative fish identification neural network model.
Background
In fishing, underwater aquaculture, deep-well inspection, underwater engineering acceptance and other scenes, it is inconvenient for people to enter the water to observe the underwater environment, so the underwater scene cannot be known. At present, identification of underwater fish in the related fields relies on observation by the human eye; because of the uncertainty of underwater scenes, people cannot know whether fish are currently present underwater, and long-time observation consumes manpower. With the increasing popularity of recreational fishing, an underwater imaging device with a fish-alert function has a promising application space.
In the prior art, on the one hand, the display screen of a common underwater imaging device must be connected to the camera through a data line, so the movement of the display screen is constrained by the wiring harness; the display screen is bulky and inconvenient to carry, and one camera can only be connected to one display screen and cannot be shared by multiple screens. On the other hand, common underwater imaging equipment has no intelligent fish identification function: whether fish are present in a water area can only be judged by the human eye, which consumes manpower; when no one is on duty it is impossible to know whether fish are present, and the underwater scene cannot be perceived.
Thus, there is a need for improvements and enhancements in the art.
Disclosure of Invention
The invention aims to solve the technical problems of the prior art by providing an imaging identification method, device and medium based on a quantized fish identification neural network model. The invention realizes, at low cost, intelligent identification of underwater fish and fish-arrival alerts; it makes it convenient for users to observe the underwater environment in fishing, underwater aquaculture, deep-well inspection, underwater engineering acceptance and other scenes, lets users know the underwater scene, and provides convenience for users.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
an imaging identification method based on a quantitative fish identification neural network model, wherein the method comprises the following steps:
acquiring underwater image data in advance;
screening the acquired underwater image data, and retaining a preset number of pictures containing fish images from the screened image data;
labeling the screened pictures which are reserved with the preset pictures containing the fish images, and labeling the fish in the pictures through a rectangular frame to form the labeled pictures;
carrying out format conversion on the marked pictures to form a label file, taking a set of the label file and the marked pictures as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion;
building a fish recognition neural network model, and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
performing channel pruning on the trained fish recognition neural network model;
performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework;
quantizing the fish recognition neural network model after model conversion by quantizing the model parameters;
and extracting frames from the video stream obtained by real-time shooting through the quantized fish identification neural network model, carrying out fish identification on the extracted frames, identifying the fish, drawing a prediction box around the fish, and issuing a fish-arrival alert.
The imaging identification method based on the quantified fish identification neural network model is characterized in that the steps of carrying out format conversion on the labeled pictures to form label files, taking a set of the label files and the labeled pictures as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion comprise:
automatically generating an xml file for each image marked;
converting all xml files into txt format files, wherein each line of a txt file comprises five values: the fish category (represented by 1) and the coordinates x, y of the upper-left corner and x, y of the lower-right corner of the box;
taking the set of the label file and the marked picture as a data set for neural network training;
and dividing the data set into a training set, a verification set and a test set in the ratio of 8:1:1.
The imaging identification method based on the quantitative fish identification neural network model is characterized in that the step of building the fish identification neural network model and inputting the training set into the built fish identification neural network model to obtain the trained fish identification neural network model comprises the following steps:
using CSPDarknet as a feature extraction network to extract a series of feature maps with different scales of the image;
4 cross-stage local network modules and a feature pyramid pooling module are connected in series in the feature extraction network; dividing the original input into two branches through a cross-stage local network, respectively carrying out convolution operation to reduce the number of channels by half, wherein one branch does not carry out any treatment, the other branch carries out a multi-time residual error structure, and finally fusing the channels of the two branches;
a convolution kernel with the size of 3 x 3 and stride =2 is placed in front of each cross-stage local network module to play a role of down-sampling; adding a characteristic pyramid pooling module behind the 3 rd cross-stage local network module, converting a characteristic graph with any size into a characteristic vector with a fixed size through the characteristic pyramid pooling module, performing three pooling operations on the input characteristic graph through the characteristic pyramid pooling module, and splicing with an input which is not subjected to pooling to perform multi-scale characteristic fusion;
finally outputting three characteristic layers called effective characteristic layers through the characteristic extraction network, wherein the three characteristic layers are positioned at three different positions of the characteristic extraction network and are respectively positioned at a middle layer, a middle lower layer and a bottom layer;
and constructing a feature pyramid network layer through the extracted three feature layers to obtain three reinforced feature layers.
The imaging identification method based on the quantitative fish identification neural network model is characterized in that the step of building the fish identification neural network model and inputting the training set into the built fish identification neural network model to obtain the trained fish identification neural network model comprises the following steps:
transmitting the obtained three reinforced characteristic layers into a detection layer, generating 3 prediction frames for each characteristic image pixel point of the three reinforced characteristic layers by the detection layer during training, and associating the prediction frames with the characteristic layer data through an anchor frame mechanism to generate an output matrix with a target category, a category probability and a prediction frame position;
when the detection layer detects that a fish target exists, prediction boxes are generated around the fish target and non-maximum suppression is performed on them, so that only one prediction box is finally retained; one prediction box is drawn on the corresponding fish on the original image, completing the training of the fish recognition neural network model.
The imaging identification method based on the quantified fish identification neural network model is characterized in that the step of channel pruning on the trained fish identification neural network model comprises the following steps:
inserting a batch normalization layer after the convolutional layer of the trained fish recognition neural network model, and sending the characteristic diagram into the batch normalization layer to obtain a normalized characteristic diagram;
setting a pruning rate for determining the ratio of the number of pruning channels;
according to the set pruning rate, correspondingly pruning the number of channels of the trained fish recognition neural network model to obtain a compact network model;
retraining the compact network model obtained after pruning and finely adjusting the pruning rate to enable the compact network model to achieve the recognition effect before pruning as far as possible, thereby obtaining the optimal fish recognition neural network model after pruning.
The imaging identification method based on the quantified fish identification neural network model comprises the following steps of carrying out model conversion on the fish identification neural network model subjected to channel pruning and converting the fish identification neural network model into a model format suitable for a mobile terminal computing framework:
and performing model conversion on the fish recognition neural network model after channel pruning, adopting ONNX as an intermediate layer, using a torch2ONNX tool to convert the trained network model weight into an ONNX format, then using an ONNX2NCNN tool to convert an ONNX format file into an NCNN model, and converting the NCNN model into a model format suitable for a mobile terminal computing framework.
The imaging identification method based on the quantified fish identification neural network model is characterized in that the step of quantifying the fish identification neural network model after model conversion through quantified model parameters comprises the following steps:
and (4) quantizing the model parameters of the fish recognition neural network model after the model conversion to lighten the model again to obtain the quantized fish recognition neural network model.
An imaging recognition device based on a quantitative fish recognition neural network model, comprising:
the pre-acquisition module is used for acquiring underwater image data in advance;
the image screening module is used for screening the acquired underwater image data and retaining a preset number of pictures containing fish images from the screened image data;
the image labeling module is used for labeling the screened pictures which are reserved with the preset images containing the fishes, labeling the fishes in the pictures through a rectangular frame and forming the labeled pictures;
the data set assembly module is used for carrying out format conversion on the marked pictures to form a label file, taking the label file and the marked picture assembly as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion;
the neural network model building module is used for building a fish recognition neural network model and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
the channel pruning module is used for carrying out channel pruning on the trained fish recognition neural network model;
the model conversion module is used for carrying out model conversion on the fish recognition neural network model subjected to channel pruning and converting the model into a model format suitable for a mobile terminal computing framework;
the quantification module is used for quantifying the fish recognition neural network model after model conversion through quantifying model parameters;
and the fish identification neural network model application module is used for extracting frames from the video stream obtained by real-time shooting through the quantized fish identification neural network model, identifying the fishes by carrying out fish identification on the images after the frames are extracted, drawing a prediction frame around the fishes, and reminding the fishes.
A portable underwater imaging and fish identification device comprises a memory, a processor and an imaging identification program which is stored in the memory and can run on the processor and is based on a quantified fish identification neural network model, wherein when the processor executes the imaging identification program based on the quantified fish identification neural network model, the imaging identification method based on the quantified fish identification neural network model is realized.
A computer-readable storage medium, wherein an imaging identification program based on a quantified fish identification neural network model is stored on the computer-readable storage medium, and when the imaging identification program based on the quantified fish identification neural network model is executed by a processor, the steps of any one of the imaging identification methods based on the quantified fish identification neural network model are realized.
Has the beneficial effects that: compared with the prior art, the invention provides an imaging identification method based on a quantitative fish identification neural network model, which is characterized in that a trained fish identification neural network model is obtained by building a fish identification neural network model and inputting the training set into the built fish identification neural network model; performing channel pruning on the trained fish recognition neural network model; performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework; quantizing the fish recognition neural network model after model conversion by quantizing the model parameters; and (3) extracting frames from the video stream obtained by real-time shooting through the quantized fish recognition neural network model, and performing fish recognition on the images after the frames are extracted. Compared with an unquantized model, the method has the advantages that the occupied memory is greatly reduced, the reasoning speed is correspondingly improved, and the lightweight model can meet the calculation requirement under low-cost hardware resources.
In addition, the pictures shot by the portable underwater imaging and fish recognition device are transmitted back to the mobile phone through a wireless network and displayed, so that several people can share the video picture; transmitting data over wireless signals enlarges the user's range of movement, and the brightness and saturation of the video picture can be adjusted for a good display effect. The invention adopts an optimized target detection algorithm, ensuring the real-time performance and precision of detection under the limited computing resources of the portable underwater imaging and fish identification device. The invention can intelligently identify fish in the video picture in real time, draw box marks on the fish in the picture and emit a fish prompt tone. The invention brings a new experience to the user: one device shares its picture to multiple screens, issues fish alerts when unattended, improves the user's perception, and provides convenience for the user. The invention also has the following advantages:
1) The method occupies few computing resources and can run on equipment with low computing power; for example, it runs well on a thousand-yuan-class smartphone with a low-performance chip. A neural-network dynamic pruning method is used that resets unimportant weights to 0 during network training, which reduces the weight parameters, the model size and the computation cost, enabling a thousand-yuan-class smartphone with limited computing power and storage space to identify underwater fish in real time;
2) The invention adopts a neural network to identify fish, with high identification accuracy and real-time video identification; even when many objects in the real environment have an aspect ratio similar to a fish, the method can easily identify whether an object is the target, effectively reducing the misjudgment of other objects as fish;
3) Fish images covering different sizes, side views, rotation angles, inclinations, illuminations and occlusions are shot in real underwater environments with different backgrounds and taken as the original data set, and data augmentation is performed on this basis: the augmented data set and the original data set are sent to the neural network together for training. By this method, the detection data set is greatly enriched and the diversity of samples is increased; random scaling also adds many small targets, making the network more robust. Four pictures can be computed at one time, greatly reducing the difficulty for the model to learn this diversity.
Drawings
Fig. 1 is a flowchart of an imaging identification method based on a quantitative fish identification neural network model according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of an imaging identification method based on a quantitative fish identification neural network model according to embodiment 2 of the present invention.
Fig. 3 is a schematic structural diagram of a picture and a tag file stored in a folder according to a path in the imaging identification method based on the quantitative fish identification neural network model according to the embodiment of the present invention.
Fig. 4 is a flow chart of a neural network model building method based on an imaging identification method of a quantitative fish identification neural network model according to an embodiment of the present invention.
Fig. 5 is a model pruning flow chart of an imaging identification method based on a quantitative fish identification neural network model according to an embodiment of the present invention.
Fig. 6 is a model quantization flow chart of an imaging identification method based on a quantized fish identification neural network model according to an embodiment of the present invention.
Fig. 7 is a schematic block diagram of an imaging recognition apparatus based on a quantitative fish recognition neural network model according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of an internal structure of the portable underwater imaging and fish identification device according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as upper, lower, left, right, front, rear, etc.) are involved in the embodiments of the present invention, the directional indications are only used to explain the relative positional relationship, motion situation, etc. between the components in a specific posture (as shown in the figures); if the specific posture changes, the directional indications change correspondingly.
In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Method embodiment
As shown in fig. 1, an embodiment of the present invention provides an imaging identification method based on a quantitative fish identification neural network model, including the following steps:
s100, acquiring underwater image data in advance;
s200, screening the acquired underwater image data, and retaining a preset number of pictures containing fish images from the screened image data;
step S300, labeling the retained pictures containing fish images, marking the fish in the pictures with a rectangular box to form labeled pictures;
step S400, format conversion is carried out on the marked pictures to form label files, a set of the label files and the marked pictures is used as a data set, and the data set is divided into a training set, a verification set and a test set according to a preset proportion;
s500, building a fish recognition neural network model, and inputting the training set into the built fish recognition neural network model to obtain the trained fish recognition neural network model;
s600, channel pruning is carried out on the trained fish recognition neural network model;
s700, performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework;
s800, quantizing the model-converted fish recognition neural network model by quantizing the model parameters;
and S900, extracting frames from the video stream obtained by real-time shooting through the quantized fish recognition neural network model, carrying out fish recognition on the extracted frames, recognizing fish, drawing a prediction box around the fish, and issuing a fish-arrival alert.
The invention is further illustrated in detail by the following application examples:
the imaging identification method based on the quantitative fish identification neural network model provided by the specific application embodiment can be used for portable underwater imaging and fish identification devices. As shown in fig. 2, an imaging identification method based on a quantitative fish identification neural network model according to the embodiment of the present application includes the following steps:
s10, collecting underwater image data;
in the embodiment of the invention, training data for training the fish recognition neural network model need to be collected first. Specifically, an ADH camera (a camera that transmits digital signals over a coaxial cable) can be used to shoot underwater environment images in a water area; after the camera enters the water, its position, inclination angle, illumination and the like are adjusted to obtain diversified image backgrounds, so that underwater image data are collected;
in the embodiment of the invention, each shot image contains at least one fish object, and the fish images cover different sizes, side views, rotation angles, inclinations, illuminations and occlusions, so that the fish object images are richer and more diverse. This facilitates the identification of the fish.
When the underwater image acquisition method is specifically implemented, the underwater image data acquired in advance can be sent to the mobile terminal through wireless signals to be stored, and the acquired image data can be pictures or videos.
And S20, screening the collected underwater image data, and retaining a preset number of pictures containing fish images from the screened image data.
In the embodiment of the invention, the collected underwater image data are screened; if the underwater image data are video data, a picture is captured at an interval of 1 second. All picture data are screened, and a preset number of images containing fish are retained; for example, at least 3000 pictures with clear fish images are saved.
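By way of illustration, a minimal Python sketch of this 1-second frame grab is given below, using OpenCV; the file names and output directory are assumptions, not part of the original disclosure.

```python
import cv2
from pathlib import Path

def extract_frames(video_path: str, out_dir: str, interval_s: float = 1.0) -> int:
    """Grab one frame every `interval_s` seconds from a video and save it as a JPEG."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0      # fall back if FPS metadata is missing
    step = max(1, int(round(fps * interval_s)))
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:                      # keep one frame per interval
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```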
And step S30, labeling the retained pictures containing fish images, marking the fish in the pictures with a rectangular bounding box to form labeled pictures.
In the embodiment of the invention, the fish in the picture can be specifically marked by a rectangular frame, the label is stored as an xml format file, and the file comprises the x and y coordinates of the central point of the boundary frame, the length and the width of the boundary frame and the fish category name 'fish'.
Step S40, converting the format of the marked pictures to form a label file, taking the label file and a set of the marked pictures as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion;
in the embodiment of the invention, a data set is established. For example, the data set can be divided into a training set, a validation set and a test set in the ratio of 8:1:1. The data set is stored in the VOC directory format.
In the embodiment of the invention, as to how to establish the data set: firstly, labeling each picture, and automatically generating an xml file when each picture is labeled; the labeling is to draw a boundary frame for the fish in the picture, and each xml file contains the name of the corresponding picture, the picture size (the width and the height of the picture), the labeled fish target quantity, a category label (fish), and a labeling target frame (coordinates x and y at the upper left corner and coordinates x and y at the lower right corner of the frame) until all the pictures are labeled, and each picture has a corresponding xml-format file;
then all xml files are converted into txt format files; each line of a txt file comprises five values: the fish category (denoted by 1) and the coordinates x, y of the upper-left corner and x, y of the lower-right corner of the box. These coordinates are normalized, i.e., each coordinate value is divided by the picture size so that it takes a value in 0-1. After the xml and txt files are obtained, the tag files (xml, txt) and pictures are stored in a folder according to a certain path, and the collection of these pictures and tag files is called the data set. The data set is used for training the neural network model (in the embodiment of the invention, the neural network model used for fish recognition in the following steps), and labeling the fish on the pictures specifies the fish features that the neural network is to learn.
As shown in fig. 3, the picture and tag files are stored in a folder according to such a path, and their collection is a data set.
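A minimal sketch of the xml-to-txt conversion described above is given below; following the text, each txt line stores the class id and the normalized box corners, and the VOC tag names used here are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def voc_xml_to_txt(xml_path: str, txt_path: str, class_id: int = 1) -> None:
    """Convert one VOC-style xml label into the five-value txt format."""
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        x1 = float(box.find("xmin").text) / w    # normalized upper-left corner
        y1 = float(box.find("ymin").text) / h
        x2 = float(box.find("xmax").text) / w    # normalized lower-right corner
        y2 = float(box.find("ymax").text) / h
        lines.append(f"{class_id} {x1:.6f} {y1:.6f} {x2:.6f} {y2:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```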
In the embodiment of the invention, the data set division is as follows: a neural network model (the fish recognition neural network model) is built, and the built neural network model calls data according to the path indexes under which the pictures and tag files of the data set are stored, in the ratio of 8:1:1. For example, if the data set has 100 pictures, the paths of 80 randomly selected pictures are stored in one txt file, the paths of 10 pictures randomly selected from the remainder are stored in a second txt file, and the paths of the last 10 pictures are stored directly in a third txt file; the picture sets designated by these three txt files are called the training set, the verification set and the test set, respectively. Their functions are as follows: the training set is used for fitting (learning) the picture samples with the neural network model, performing gradient descent on the training error during training, and obtaining the weight parameters through training; the verification set is used for adjusting the hyper-parameters of the model and performing a preliminary evaluation of the capability of the model; the test set is used to evaluate the generalization ability of the final model.
S50, building a fish recognition neural network model, and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
according to the invention, a fish recognition neural network model is built, as shown in FIG. 4, FIG. 4 is a flow chart of building the neural network model according to the embodiment of the invention, a series of feature maps with different scales of an image are extracted, three feature layers are finally output, fish target detection is carried out on the three feature layers, and the trained fish recognition neural network model is obtained;
in the embodiment of the invention, as shown in fig. 4, a fish recognition neural network model is built, and the neural network model is trained: the method comprises the following concrete steps:
s51, using CSPDarknet as the feature extraction network to extract a series of feature maps of different scales from the image;
the CSPDarknet is used as a feature extraction network, and the feature extraction network has the main function of extracting a series of feature maps with different scales of the marked pictures. The CSPDarknet is a backbone feature extraction network of a YOLOv 4-deep learning model, an input picture is subjected to feature extraction in the CSPDarknet at first, and the extracted features can be called a feature layer and are a feature set of the input picture. In the main part, three characteristic layers are obtained to carry out the next step of network construction, and the three characteristic layers are called as effective characteristic layers.
S52, connecting 4 cross-stage local network modules and a feature pyramid pooling module in series in the feature extraction network; dividing the original input into two branches through a cross-stage local network, respectively carrying out convolution operation to reduce the number of channels by half, wherein one branch does not carry out any treatment, the other branch carries out a multi-time residual error structure, and finally fusing the channels of the two branches;
in the embodiment of the invention, 4 CSP modules and one SPP module are connected in series in the feature extraction network; CSP stands for cross-stage local network and SPP for feature pyramid pooling. The CSP structure divides the original input into two branches and performs a convolution operation on each to halve the number of channels; one branch is not processed further, the other branch passes through a residual structure several times, and finally the channels of the two branches are fused, as sketched below. In this way the model learns more features; the network learning capability is effectively enhanced while the operation accuracy is ensured, the neural network becomes smaller, the computational bottleneck is effectively reduced, and the memory occupation is reduced.
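A minimal PyTorch sketch of such a CSP block follows: two 1 × 1 convolutions halve the channels, one branch passes through a stack of residual bottlenecks, and the two branches are concatenated and fused. The layer names and the exact bottleneck layout are assumptions, not the patent's own code.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """One residual unit used inside the CSP branch."""
    def __init__(self, c: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, c, 3, padding=1, bias=False), nn.BatchNorm2d(c), nn.SiLU(),
        )

    def forward(self, x):
        return x + self.block(x)                     # residual connection

class CSPBlock(nn.Module):
    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.branch1 = nn.Conv2d(c_in, c_half, 1, bias=False)   # untouched branch
        self.branch2 = nn.Sequential(                            # residual branch
            nn.Conv2d(c_in, c_half, 1, bias=False),
            *[Bottleneck(c_half) for _ in range(n)],
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(c_out, c_out, 1, bias=False), nn.BatchNorm2d(c_out), nn.SiLU(),
        )

    def forward(self, x):
        # fuse the channels of the two branches
        return self.fuse(torch.cat([self.branch1(x), self.branch2(x)], dim=1))
```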
Step S53, placing a convolution kernel with the size of 3 x 3 and stride =2 in front of each cross-stage local network module to play a role of down-sampling; adding a characteristic pyramid pooling module behind the 3 rd cross-stage local network module, converting a characteristic diagram with any size into a characteristic vector with a fixed size through the characteristic pyramid pooling module, performing three pooling operations on the input characteristic diagram through the characteristic pyramid pooling module, and splicing with an input which is not pooled to perform multi-scale characteristic fusion;
A convolution kernel of size 3 × 3 with stride = 2 is placed in front of each CSP module (cross-stage local network module) to perform downsampling. An SPP module is added behind the 3rd CSP module; the SPP module (feature pyramid pooling module) converts a feature map of any size into a feature vector of fixed size. The SPP module performs three pooling operations (max pooling) on the input feature map and concatenates them with the un-pooled input; the integration of these four parts performs multi-scale feature fusion, as sketched below. The SPP structure can enlarge the receptive field while separating out the most important contextual features and removing some redundant information.
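A sketch of such an SPP module is given below: three max-pooling operations on the same input are concatenated with the un-pooled input. The kernel sizes 5/9/13 are a common choice and an assumption here.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        # stride-1 pooling with padding keeps the spatial size unchanged
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x):
        # four parts: the raw input plus three pooled views, fused on channels
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```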
Step S54, finally outputting three characteristic layers called effective characteristic layers through the characteristic extraction network, wherein the three characteristic layers are positioned at three different positions of the characteristic extraction network and are respectively positioned at a middle layer, a middle lower layer and a bottom layer;
according to the invention, three characteristic layers are finally output through the characteristic extraction network, and fish target detection is carried out on the three characteristic layers. The three characteristic layers are positioned at three different positions of the characteristic extraction network and are respectively positioned at the middle layer, the middle lower layer and the bottom layer.
In step S55, for example, when the input is (640, 640, 3), that is, the input size is 640 × 640 with three RGB channels, the outputs of the three feature layers, as shown in fig. 4, are feat1 = (80, 80, 256), feat2 = (40, 40, 512) and feat3 = (20, 20, 1024), respectively.
S56, constructing a characteristic pyramid network layer through the three extracted characteristic layers;
in the embodiment of the present invention, after the three feature layers are obtained, they are used to construct an FPN layer (feature pyramid network) as follows: a 1 × 1 convolution is performed on the feature layer feat3 to adjust the channels and obtain the feature layer P5; P5 is upsampled to a feature layer of the same size as feat2 and fused with feat2 to obtain a new feature layer; CSPLayer feature extraction is performed on the new feature layer to obtain the P5_upsample feature layer with the size of (40, 40, 512).
And S57, performing a 1 × 1 convolution on the obtained P5_upsample feature layer to adjust the channels and obtain the feature layer P4; P4 is upsampled to a feature layer of the same size as feat1 and fused with feat1 to obtain a new feature layer, and CSPLayer feature extraction is performed on the new feature layer to obtain the P3_out feature layer with the size of (80, 80, 256).
And S58, performing a 3 × 3 convolution on the extracted P3_out feature layer for downsampling, stacking the downsampled result with the feature layer P4, and performing CSPLayer feature extraction to obtain the P4_out feature layer with the size of (40, 40, 512).
And step S59, performing a 3 × 3 convolution on the obtained P4_out feature layer for downsampling.
Step S510, stacking the downsampled result with the feature layer P5.
Step S511, performing CSPLayer feature extraction on the stacked result to obtain the P5_out feature layer with the size of (20, 20, 1024).
In summary, the FPN layer constructed from the three feature layers yields three reinforced feature layers, with sizes (40, 40, 512), (80, 80, 256) and (20, 20, 1024), respectively.
And S512, transmitting the obtained three reinforced feature layers into a detection layer; during training, the detection layer generates 3 prediction boxes for each feature-map pixel point of the three reinforced feature layers and associates the prediction boxes with the feature-layer data through an anchor box mechanism to generate an output matrix with the target category, category probability and prediction box position. The three feature matrices are the preliminary prediction results of the three feature layers, with matrix shapes (N, 20, 20, 255), (N, 40, 40, 255) and (N, 80, 80, 255); each matrix is first reshaped into (N, 20, 20, 3, 85), (N, 40, 40, 3, 85) and (N, 80, 80, 3, 85), where N is batch_size. The last dimension of the feature matrix represents the 85 (4 + 1 + 80) parameters of the prediction box: the first 4 parameters are the regression parameters of each feature point, used to adjust the prediction box; they are continuously updated through back-propagation during training so that the prediction box gradually approaches the real box; the 5th parameter represents the confidence that the prediction box contains fish; and the last 80 parameters judge the object category contained at each feature point. Taking the feature layer (N, 20, 20, 3, 85) as an example, it is equivalent to dividing the original input image into 20 × 20 feature points; if a feature point falls within the box corresponding to an object, that object is predicted.
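For the 20 × 20 head, the reshape described above can be sketched as follows (the other two heads are analogous); the variable names are illustrative only.

```python
import torch

n = 2                                    # batch_size, illustrative
raw = torch.randn(n, 20, 20, 255)        # preliminary prediction of one head
pred = raw.view(n, 20, 20, 3, 85)        # (batch, grid_y, grid_x, anchor, params)
box_reg     = pred[..., 0:4]             # 4 regression parameters of the box
objectness  = pred[..., 4:5]             # confidence that the box contains fish
class_score = pred[..., 5:]              # 80 class scores
```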
And step S513, when the detection layer detects that a fish target exists, prediction boxes are generated around the fish target, and non-maximum suppression is performed on the prediction boxes.
In the embodiment of the invention, when the detection layer detects that a fish target exists, a plurality of prediction boxes are generated around the fish target; these prediction boxes overlap each other to a high degree, and each overlaps the real fish target to a different degree. So that one fish target corresponds to only one prediction box, non-maximum suppression must be performed on the prediction boxes.
Step S514, non-maximum suppression first sorts the prediction boxes by confidence score from large to small, adds the prediction box with the highest confidence to the output list, and keeps the other prediction boxes in a temporary list.
And step S515, in the temporary list, the prediction box with the highest confidence is taken and compared with the other boxes: the intersection-over-union (IoU) of this box with each other prediction box is computed, the prediction boxes whose IoU exceeds a set threshold are deleted, and the remaining prediction boxes are kept in the temporary list.
And S516, steps S514 and S515 are repeated on the prediction boxes in the temporary list until, for each fish, only one prediction box is finally retained and drawn on the corresponding fish on the original image, as sketched below.
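A minimal sketch of the non-maximum suppression loop of steps S514-S516 is given below; the (x1, y1, x2, y2) box format and the IoU helper are assumptions for illustration.

```python
import torch

def iou(box: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """IoU between one box and a set of boxes, all as (x1, y1, x2, y2)."""
    x1 = torch.maximum(box[0], boxes[:, 0]); y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2]); y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter + 1e-7)

def nms(boxes: torch.Tensor, scores: torch.Tensor, iou_thresh: float = 0.5) -> list:
    order = scores.argsort(descending=True)          # S514: sort by confidence
    keep = []
    while order.numel() > 0:
        best = order[0]
        keep.append(int(best))                       # highest confidence -> output list
        rest = order[1:]
        if rest.numel() == 0:
            break
        # S515: delete boxes whose IoU with the best box exceeds the threshold
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep
```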
And constructing a fish recognition neural network model, and training to obtain the trained fish recognition neural network model.
In the training process of the embodiment of the invention, a batch of pictures are input into the neural network, the number of the pictures in each batch can be defined, for example, the embodiment of the invention defines a batch as 16 pictures,
step 1, randomly reading 4 pictures from each batch of pictures;
step 2, the 4 pictures are randomly zoomed, randomly cut and randomly arranged to be spliced into a new picture;
step 3, repeating the step 1 and the step 2 for 16 times to obtain an augmented data set;
step 4, the augmented data set and the original data set are sent to a neural network together for training;
by this method, the detection data set is greatly enriched and the diversity of samples is increased; random scaling also adds many small targets, making the network more robust. The equivalent of 4 images is computed at one time, which greatly reduces the difficulty for the model to learn this diversity. A sketch of the augmentation follows.
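A simplified sketch of this mosaic augmentation: four randomly chosen pictures are randomly scaled and cropped, then pasted into the four quadrants of a new canvas. Box remapping is omitted for brevity, and the sizes are assumptions.

```python
import random
import numpy as np
import cv2

def mosaic(images: list, out_size: int = 640) -> np.ndarray:
    picks = random.sample(images, 4)                 # step 1: read 4 random pictures
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for img, (y, x) in zip(picks, [(0, 0), (0, half), (half, 0), (half, half)]):
        s = random.uniform(0.6, 1.4)                 # step 2: random scale ...
        img = cv2.resize(img, None, fx=s, fy=s)
        h, w = img.shape[:2]
        top = random.randint(0, max(0, h - half))    # ... and random crop
        left = random.randint(0, max(0, w - half))
        crop = img[top:top + half, left:left + half]
        canvas[y:y + crop.shape[0], x:x + crop.shape[1]] = crop  # splice into quadrant
    return canvas
```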
S60, channel pruning is carried out on the trained fish recognition neural network model;
the invention performs channel pruning on the trained fish recognition neural network model, and can reduce the calculation resources occupied by the neural network model. Because the method is realized on the basis of being capable of being realized on a thousand-yuan smart phone, the computing resources of the smart phone are relatively limited, fish identification is a forward reasoning process of an image in a neural network model, the process needs certain memory access and a large amount of CPU (central processing unit) operation, the use of other mobile phone software is influenced due to the fact that too many computing resources of the mobile phone are occupied, too much energy consumption is generated, and in order to reduce the consumption of the computing resources of the mobile phone in the fish identification process, the model needs to be lightened, the method adopts channel pruning on the model, as shown in FIG. 5, and step S60, the channel pruning is carried out on the trained fish identification neural network model, and the method is specifically as follows:
s61, initializing the fish identification model and performing channel-sparsity regularization training;
inserting a batch normalization layer after the convolutional layer of the trained fish recognition neural network model, and sending the characteristic diagram into the batch normalization layer to obtain a normalized characteristic diagram;
the unimportant channels are first identified by inserting a Batch Normalization (BN) layer after the convolutional layer, and the feature map after convolution is a multidimensional matrix, where one dimension is the number of channels c, and feeding the feature map into the BN layer will obtain the feature map after normalization, where each channel in the c feature maps corresponds to a set of scaling factors γ.
Step S62, controlling the scaling factor gamma to tend to 0;
Then L1-norm regularization is applied to the scaling factors γ of the BN layers, and channel-sparsity regularization training is performed on the model. The L1 regularization drives some scaling factors of the BN layers toward 0; since the activation value Z is positively correlated with the scaling factor γ, the corresponding activation value Z also becomes very small, and a channel with a small activation value Z is correspondingly less important. This makes it possible to identify the unimportant channels and facilitates the subsequent channel pruning.
In the embodiment of the present invention, the L1 regularization is the penalty used in LASSO regression; it can be regarded as a penalty term of the loss function that constrains some parameters so that they take values within a certain range. Its role here is to perform channel selection, resulting in a sparse model.
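A sketch of this channel-sparsity step follows: after the ordinary loss gradient is computed, the (sub)gradient of an L1 penalty is added to every BN scaling factor γ so that unimportant channels are driven toward zero. The penalty strength s is an assumed hyper-parameter.

```python
import torch.nn as nn

def add_bn_l1_grad(model: nn.Module, s: float = 1e-4) -> None:
    """Call between loss.backward() and optimizer.step()."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            # subgradient of s * |gamma| is s * sign(gamma)
            m.weight.grad.add_(s * m.weight.data.sign())
```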
S63, setting a channel pruning rate;
namely, a pruning rate that determines the proportion of channels to be pruned is set.
S64, sorting scaling factors, and deleting channels lower than a threshold layer by layer;
according to the set pruning rate, correspondingly pruning the number of channels of the trained fish recognition neural network model;
for example, if the pruning rate is set to 0.2, 20% of the channels are pruned accordingly: the absolute values of the scaling factors γ are sorted from small to large, the γ value at the 20th percentile is taken as the threshold, and all γ values below the threshold are set to zero, so that the activation value Z becomes zero and the corresponding channel connections can be deleted.
And S65, cutting off, layer by layer, the channels below the threshold and pruning the 20% of the model's channels with the smallest weights to obtain a compact network model; this reduces the number of convolution operations in the image inference stage and reduces the memory occupied by model operation, making the model lightweight. A sketch of steps S63-S65 follows.
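The sketch below illustrates steps S63-S65: all BN scaling factors are collected, the value at the pruning-rate percentile is taken as the threshold, and every γ below it is zeroed (the surgery that actually removes channels to build the compact model is a separate step).

```python
import torch
import torch.nn as nn

def prune_bn_channels(model: nn.Module, prune_rate: float = 0.2) -> float:
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    thresh = gammas.sort().values[int(len(gammas) * prune_rate)]   # e.g. 20th percentile
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            mask = (m.weight.data.abs() >= thresh).float()
            m.weight.data.mul_(mask)   # zeroed gamma -> activation value Z becomes zero
            m.bias.data.mul_(mask)
    return float(thresh)
```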
S66, retraining the compact network model obtained after pruning and finely adjusting the pruning rate to enable the compact network model to achieve the recognition effect before pruning as much as possible, so that the optimal fish recognition neural network model after pruning is obtained;
Generally, the pruning rate set the first time does not bring the optimal pruning effect: if the pruning rate is set too low, the model is not substantially lightened, and if it is set too high, the accuracy of fish identification tends to drop greatly. Therefore, after a pruning rate is set, the pruned model is retrained and the pruning rate is fine-tuned so that the identification effect before pruning is reached as far as possible, obtaining the optimal pruned model.
Thus obtaining the optimal trimmed fish recognition neural network model.
And S70, performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework.
The specific implementation steps for carrying out model conversion on the fish recognition neural network model for completing channel pruning in the embodiment of the invention are as follows:
Step S71, the pruned model is built and trained under the PyTorch framework, and a PyTorch model cannot run directly on a mobile terminal. To deploy the network model on a mobile phone, the trained network model is converted into a model format suitable for the mobile-terminal computing framework: ONNX is used as an intermediate layer, a torch2onnx tool is used to convert the trained network model weights into the ONNX format, and then an onnx2ncnn tool is used to convert the ONNX-format file into an NCNN model.
The PyTorch is an open-source Python machine learning library and is used for applications such as natural language processing and the like based on the Torch.
ONNX is an open neural network exchange format used to represent standards for deep learning models, enabling models to be transferred between different frameworks. NCNN is a deep learning forward processing framework specifically targeted at mobile devices.
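A sketch of this conversion chain follows, with torch.onnx.export playing the role of the torch2onnx step; the placeholder model and the file names are assumptions. The onnx2ncnn converter ships with the NCNN project.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3, padding=1)   # placeholder for the pruned fish model
model.eval()
dummy = torch.randn(1, 3, 640, 640)      # example input used to trace the graph
torch.onnx.export(model, dummy, "fish.onnx",
                  input_names=["images"], output_names=["preds"],
                  opset_version=11)

# Then, with the NCNN tools, on the command line:
#   onnx2ncnn fish.onnx fish.param fish.bin
```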
S80, quantizing the fish recognition neural network model after model conversion through quantizing model parameters:
specifically, the model quantization flow is as shown in fig. 6.
S81, after the NCNN-format model file is obtained in the previous step, the model parameters are quantized to lighten the model further. The weight parameters of the NCNN-format model are stored as 32-bit floating-point binary; quantizing these 32-bit floating-point weights into 8-bit integer parameters reduces the model size by 75%. Low-bit quantization reduces the computational complexity of the model, accelerates computation, lowers latency, and reduces the memory occupied by stored data. Furthermore, replacing floating-point dot products with fixed-point dot products lowers the cost of running the neural network on the arithmetic unit; compared with the unquantized model, memory occupation drops greatly and inference speed rises correspondingly, so the lightweight model can meet the computational requirements of thousand-yuan-class (budget) mobile phone hardware. A sketch of the weight quantization follows.
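An illustrative symmetric per-tensor quantization of FP32 weights to INT8 on dummy data, showing why the model shrinks by 75% (32 bits become 8 bits per weight). NCNN's own int8 path (the ncnn2table / ncnn2int8 tools) additionally calibrates activation scales; this sketch covers weights only:

import numpy as np

w = np.random.randn(64, 32, 3, 3).astype(np.float32)   # a dummy conv kernel
scale = np.abs(w).max() / 127.0                        # map max |w| to 127
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_back = w_int8.astype(np.float32) * scale             # dequantize to check
print("size ratio:", w_int8.nbytes / w.nbytes)         # -> 0.25
print("max abs error:", float(np.abs(w - w_back).max()))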
S90, extracting frames from the video stream captured in real time, performing fish identification on the extracted frames through the quantized fish identification neural network model, identifying the fish, drawing a prediction box around each fish, and issuing an incoming-fish alert;
the quantized fish recognition neural network model is embedded into software as the software's API (application programming interface); in a specific implementation it can be installed on a mobile terminal, and the detection function is started when the software runs.
During fish identification, the underwater camera feed is transmitted back over a wireless network to a mobile terminal such as a mobile phone and displayed there. The mobile terminal, with the quantized fish recognition neural network model embedded, receives the underwater video captured by the camera, extracts frames from the video stream, and sends the extracted frames to the fish recognition network for forward inference; when a fish is identified, a prediction box is drawn around it and the app issues an incoming-fish alert (a sketch of this loop follows).
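A sketch of this capture-and-detect loop. The stream URL and the detect_fish wrapper around the quantized model's forward pass are both assumptions; only the OpenCV capture and drawing calls are standard:

import cv2

def run_detection(stream_url, detect_fish, frame_stride=5):
    cap = cv2.VideoCapture(stream_url)
    idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:            # extract every Nth frame
            boxes = detect_fish(frame)         # [(x1, y1, x2, y2, score), ...]
            for (x1, y1, x2, y2, score) in boxes:
                cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)),
                              (0, 255, 0), 2)
            if boxes:
                pass  # the app would issue its incoming-fish alert here
        idx += 1
    cap.release()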
The invention enables the transmitted video picture to be shared across multiple users' mobile terminals; wireless data transmission extends the user's range of movement, and the brightness and saturation of the video picture can be adjusted for a good display effect. The invention optimizes the target detection algorithm for fish detection and identification, solving the problem of maintaining real-time performance and accuracy under the limited computing resources of portable equipment. Fish in the video picture are identified intelligently in real time, boxed and labeled in the picture, and a fish alert tone is sounded. The invention brings a new experience to the user: one device can share its picture to multiple screens and issue fish alerts while unattended, improving the user's awareness.
Device embodiment
As shown in fig. 7, an embodiment of the present invention provides an imaging recognition apparatus based on a quantitative fish recognition neural network model, including:
the pre-acquisition module 10 is used for acquiring underwater image data in advance;
the image screening module 20 is used for screening the acquired underwater image data and retaining the pictures determined to contain fish images;
the image labeling module 30 is used for labeling the retained pictures containing fish images, marking the fish in each picture with a rectangular box to form labeled pictures;
the data set assembly module 40 is used for performing format conversion on the marked pictures to form label files, taking the label files and the marked pictures as data sets, and dividing the data sets into a training set, a verification set and a test set according to a preset proportion;
the neural network model building module 50 is used for building a fish recognition neural network model and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
a channel pruning module 60, configured to perform channel pruning on the trained fish recognition neural network model;
the model conversion module 70 is used for performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework;
a quantization module 80, configured to quantize the model-converted fish recognition neural network model by quantizing the model parameters;
the fish recognition neural network model application module 90 is used for extracting frames from the video stream captured in real time through the quantized fish recognition neural network model, performing fish identification on the extracted frames, identifying the fish, drawing a prediction box around each fish, and issuing an incoming-fish alert, as described above.
Based on the above embodiment, the invention also provides a portable underwater imaging and fish identification device; a functional block diagram of the device of the embodiment of the invention may be as shown in fig. 8. The portable underwater imaging and fish identification device comprises a processor, a memory and a network interface connected through a system bus. The processor of the device provides computing and control capability. The memory of the device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the device is used to connect and communicate with an external terminal through a network. The computer program, when executed by the processor, implements the imaging identification method based on the quantized fish identification neural network model. The portable underwater imaging and fish identification device sends the captured picture to the mobile terminal for display through a wireless network.
It will be understood by those skilled in the art that the block diagram of fig. 8 is only a block diagram of the part of the structure related to the solution of the present invention and does not constitute a limitation on the application of the solution or on the portable underwater imaging and fish identification device to which it is applied; a specific device may include more or fewer components than shown in the drawing, combine certain components, or arrange the components differently.
In one embodiment, a portable underwater imaging and fish identification device is provided, comprising a memory, a processor, and an imaging identification program based on a quantized fish identification neural network model stored in the memory and executable on the processor; when the processor executes the program, the following steps are implemented:
acquiring underwater image data in advance;
screening the acquired underwater image data and retaining the pictures determined to contain fish images;
labeling the retained pictures containing fish images, marking the fish in each picture with a rectangular box to form labeled pictures;
carrying out format conversion on the marked pictures to form a label file, taking a set of the label file and the marked pictures as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion;
building a fish recognition neural network model, and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
performing channel pruning on the trained fish recognition neural network model;
performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework;
quantizing the fish recognition neural network model after the model conversion through quantizing model parameters;
extracting frames from the video stream captured in real time through the quantized fish identification neural network model, performing fish identification on the extracted frames, identifying the fish, drawing a prediction box around each fish, and issuing an incoming-fish alert, specifically as described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the invention discloses an imaging identification method, device and medium based on a quantitative fish identification neural network model. A fish identification neural network model is built, and the training set is input into the built model to obtain a trained fish identification neural network model; channel pruning is performed on the trained model; the channel-pruned model is converted into a model format suitable for a mobile-terminal computing framework; the converted model is quantized by quantizing its parameters; and the quantized model extracts frames from the video stream captured in real time and performs fish identification on the extracted frames. Compared with the unquantized model, memory occupation is greatly reduced and inference speed correspondingly improved, so the lightweight model can meet the computational requirements of low-cost hardware.
In addition, the underwater camera picture is transmitted back to the mobile phone through a wireless network and displayed, and the video picture can be shared by multiple people; wireless data transmission extends the user's range of movement, and the brightness and saturation of the video picture can be adjusted for a good display effect. The invention adopts an optimized target detection algorithm, solving the problem of maintaining real-time detection performance and accuracy under the limited computing resources of portable equipment. The invention can intelligently identify fish in the video picture in real time, draw box labels on the fish in the picture, and sound a fish alert tone. The invention brings a new experience to the user: one device can share its picture to multiple screens and issue fish alerts while unattended, improving the user's awareness and providing convenience. The invention also has the following advantages:
1) The method occupies few computing resources and can run on equipment with low computing power; for example, it performs well on a thousand-yuan-class (budget) smartphone with a low-compute chip. A neural network dynamic pruning method resets unimportant weights to 0 during network training, reducing the number of weight parameters, shrinking the model, and cutting computation cost, so that underwater fish can be identified in real time on a budget smartphone with limited storage and computing power;
2) The invention adopts a neural network to identify fish, with high identification accuracy and real-time video identification. Even when many objects in the real environment have aspect ratios similar to a fish, the method can still readily determine whether a target is a fish, effectively reducing the misjudgment of other objects as fish;
3) In the data augmentation part, fish images of different sizes, orientations, rotation angles, inclinations, illumination and occlusion, shot in real underwater environments with different backgrounds, serve as the original data set, and data augmentation is performed on this basis; the augmented data set and the original data set are fed into the neural network together for training. This greatly enriches the detection data set and increases sample diversity, and random scaling adds many small targets, giving the network better robustness. Four images can be processed at one time, greatly reducing the difficulty of the model's learning over diverse samples (a mosaic-style sketch follows this list).
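A minimal sketch of the "four images at a time" mosaic-style augmentation implied above: four training images are resized and tiled onto one canvas. Bounding-box remapping is omitted and the output size is an assumption:

import cv2
import numpy as np

def mosaic4(imgs, out_size=640):
    # Tile four images into one out_size x out_size training canvas.
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for k, img in enumerate(imgs[:4]):
        tile = cv2.resize(img, (half, half))
        r, c = divmod(k, 2)
        canvas[r * half:(r + 1) * half, c * half:(c + 1) * half] = tile
    return canvas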
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An imaging identification method based on a quantitative fish identification neural network model, which is characterized by comprising the following steps:
acquiring underwater image data in advance;
screening the acquired underwater image data and retaining the pictures determined to contain fish images;
labeling the retained pictures containing fish images, marking the fish in each picture with a rectangular box to form labeled pictures;
carrying out format conversion on the marked pictures to form a label file, taking a set of the label file and the marked pictures as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion;
building a fish recognition neural network model, and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
performing channel pruning on the trained fish recognition neural network model;
performing model conversion on the fish recognition neural network model subjected to channel pruning, and converting the model into a model format suitable for a mobile terminal computing framework;
quantizing the fish recognition neural network model after model conversion by quantizing the model parameters;
extracting frames from the video stream obtained by real-time shooting through the quantized fish identification neural network model, performing fish identification on the extracted frames, identifying the fish, drawing a prediction box around the fish, and issuing an incoming-fish alert.
2. The imaging identification method based on the quantitative fish identification neural network model according to claim 1, wherein the step of converting the format of the labeled pictures to form label files, using the set of the label files and the labeled pictures as a data set, and dividing the data set into a training set, a verification set and a test set according to a predetermined ratio comprises:
automatically generating an xml file for each marked picture;
converting each xml file into a txt-format file, wherein the txt file comprises five values: the fish category, represented by 1, and the coordinates x, y of the upper-left corner and x, y of the lower-right corner of the box;
taking the set of the label file and the marked picture as a data set for neural network training;
and dividing the data set into a training set, a verification set and a test set in a ratio of 8:1:1.
3. The imaging identification method based on the quantitative fish identification neural network model according to claim 1, wherein the step of building the fish identification neural network model and inputting the training set into the built fish identification neural network model to obtain the trained fish identification neural network model comprises the following steps:
using CSPDarknet as a feature extraction network to extract a series of feature maps with different scales of the image;
connecting 4 cross-stage local network modules and a feature pyramid pooling module in series in the feature extraction network; the cross-stage local network divides the original input into two branches, each undergoing a convolution operation that halves the number of channels; one branch receives no further processing, the other passes through a repeated residual structure, and finally the channels of the two branches are fused;
placing a convolution kernel of size 3 x 3 with stride = 2 before each cross-stage local network module to perform down-sampling; adding a feature pyramid pooling module after the 3rd cross-stage local network module, which converts a feature map of arbitrary size into a feature vector of fixed size: the module applies three pooling operations to the input feature map and concatenates the results with the un-pooled input for multi-scale feature fusion;
finally outputting, through the feature extraction network, three feature layers called effective feature layers, located at three different positions of the network: the middle layer, the middle-lower layer and the bottom layer;
and constructing a feature pyramid network layer through the extracted three feature layers to obtain three reinforced feature layers.
4. The imaging identification method based on the quantitative fish identification neural network model according to claim 3, wherein the step of building the fish identification neural network model and inputting the training set into the built fish identification neural network model to obtain the trained fish identification neural network model comprises the steps of:
transmitting the obtained three enhanced feature layers into a detection layer; during training, the detection layer generates 3 prediction boxes for each feature-map pixel of the three enhanced feature layers and associates the prediction boxes with the feature-layer data through an anchor-box mechanism, generating an output matrix containing the target class, the class probability and the prediction-box position;
when the detection layer detects a fish target, prediction boxes are generated around the fish target and non-maximum suppression is applied to them, so that only one prediction box is finally retained and one prediction box is drawn on each fish in the original image, completing the training of the fish recognition neural network model.
5. The imaging identification method based on the quantitative fish identification neural network model of claim 1, wherein the step of performing channel pruning on the trained fish identification neural network model comprises the following steps:
inserting a batch normalization layer after the convolutional layer of the trained fish recognition neural network model, and sending the characteristic diagram into the batch normalization layer to obtain a normalized characteristic diagram;
setting a pruning rate for determining the ratio of the number of pruning channels;
according to the set pruning rate, correspondingly pruning the number of channels of the trained fish recognition neural network model to obtain a compact network model;
retraining the compact network model obtained after pruning and finely adjusting the pruning rate to enable the compact network model to achieve the recognition effect before pruning as far as possible, thereby obtaining the optimal fish recognition neural network model after pruning.
6. The imaging recognition method based on the quantitative fish recognition neural network model of claim 1, wherein the step of performing model conversion on the fish recognition neural network model subjected to channel pruning into a model format suitable for a mobile terminal computing framework comprises:
performing model conversion on the channel-pruned fish recognition neural network model: adopting ONNX as an intermediate layer, using a torch2onnx tool to convert the trained network weights into the ONNX format, then using an onnx2ncnn tool to convert the ONNX-format file into an NCNN model, thereby converting the model into a model format suitable for a mobile-terminal computing framework.
7. The imaging identification method based on the quantified fish identification neural network model according to claim 6, wherein the quantifying the model-converted fish identification neural network model by quantifying the model parameters comprises:
quantizing the model parameters of the model-converted fish recognition neural network model to lighten the model further, obtaining the quantized fish recognition neural network model.
8. An imaging recognition device based on a quantified fish recognition neural network model, comprising:
the pre-acquisition module is used for acquiring underwater image data in advance;
the image screening module is used for screening the acquired underwater image data and retaining the pictures determined to contain fish images;
the image labeling module is used for labeling the retained pictures containing fish images, marking the fish in each picture with a rectangular box to form labeled pictures;
the data set assembly module is used for carrying out format conversion on the marked pictures to form a label file, taking the label file and the marked picture assembly as a data set, and dividing the data set into a training set, a verification set and a test set according to a preset proportion;
the neural network model building module is used for building a fish recognition neural network model and inputting the training set into the built fish recognition neural network model to obtain a trained fish recognition neural network model;
the channel pruning module is used for performing channel pruning on the trained fish recognition neural network model;
the model conversion module is used for carrying out model conversion on the fish recognition neural network model subjected to channel pruning and converting the model into a model format suitable for a mobile terminal computing framework;
the quantification module is used for quantifying the fish recognition neural network model after model conversion through quantifying model parameters;
and the fish identification neural network model application module is used for extracting frames from the video stream obtained by real-time shooting through the quantized fish identification neural network model, performing fish identification on the extracted frames, identifying the fish, drawing a prediction box around each fish, and issuing an incoming-fish alert.
9. A portable underwater imaging and fish identification device, characterized in that the portable underwater imaging and fish identification device comprises a memory, a processor and an imaging identification program based on a quantified fish identification neural network model, which is stored in the memory and can be run on the processor, and when the processor executes the imaging identification program based on the quantified fish identification neural network model, the steps of the imaging identification method based on the quantified fish identification neural network model according to any one of claims 1 to 7 are realized.
10. A computer-readable storage medium, wherein the computer-readable storage medium has stored thereon an imaging identification program based on a neural network model for quantitative fish identification, and when the imaging identification program based on the neural network model for quantitative fish identification is executed by a processor, the steps of the imaging identification method based on the neural network model for quantitative fish identification according to any one of claims 1-7 are implemented.
CN202211296268.5A 2022-10-21 2022-10-21 Imaging identification method and device based on quantitative fish identification neural network model Pending CN115527106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211296268.5A CN115527106A (en) 2022-10-21 2022-10-21 Imaging identification method and device based on quantitative fish identification neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211296268.5A CN115527106A (en) 2022-10-21 2022-10-21 Imaging identification method and device based on quantitative fish identification neural network model

Publications (1)

Publication Number Publication Date
CN115527106A true CN115527106A (en) 2022-12-27

Family

ID=84703813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211296268.5A Pending CN115527106A (en) 2022-10-21 2022-10-21 Imaging identification method and device based on quantitative fish identification neural network model

Country Status (1)

Country Link
CN (1) CN115527106A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129197A (en) * 2023-04-04 2023-05-16 中国科学院水生生物研究所 Fish classification method, system, equipment and medium based on reinforcement learning
CN117541623A (en) * 2023-11-23 2024-02-09 中国水产科学研究院黑龙江水产研究所 Fish shoal activity track monitoring system
CN117541623B (en) * 2023-11-23 2024-06-07 中国水产科学研究院黑龙江水产研究所 Fish shoal activity track monitoring system

Similar Documents

Publication Publication Date Title
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN115527106A (en) Imaging identification method and device based on quantitative fish identification neural network model
KR102385463B1 (en) Facial feature extraction model training method, facial feature extraction method, apparatus, device and storage medium
CN111160375B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN110033023B (en) Image data processing method and system based on picture book recognition
AU2021201933B2 (en) Hierarchical multiclass exposure defects classification in images
US11861769B2 (en) Electronic device and operating method thereof
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN112508975A (en) Image identification method, device, equipment and storage medium
CN112200057A (en) Face living body detection method and device, electronic equipment and storage medium
CN113824884B (en) Shooting method and device, shooting equipment and computer readable storage medium
CN111553182A (en) Ship retrieval method and device and electronic equipment
CN112597920A (en) Real-time object detection system based on YOLOv3 pruning network
CN110705564B (en) Image recognition method and device
CN113674321B (en) Cloud-based method for multi-target tracking under monitoring video
CN113221695B (en) Method for training skin color recognition model, method for recognizing skin color and related device
CN113128522B (en) Target identification method, device, computer equipment and storage medium
CN116519106B (en) Method, device, storage medium and equipment for determining weight of live pigs
CN116704554A (en) Method, equipment and medium for estimating and identifying hand gesture based on deep learning
CN112699809B (en) Vaccinia category identification method, device, computer equipment and storage medium
CN113469049B (en) Disease information identification method, system, device and storage medium
CN115115552A (en) Image correction model training method, image correction device and computer equipment
CN114511877A (en) Behavior recognition method and device, storage medium and terminal
CN114640785A (en) Site model updating method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination