CN116740758A - Bird image recognition method and system for preventing misjudgment - Google Patents

Bird image recognition method and system for preventing misjudgment

Info

Publication number
CN116740758A
CN116740758A (application CN202310603161.9A)
Authority
CN
China
Prior art keywords
bird
image
birds
representing
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310603161.9A
Other languages
Chinese (zh)
Inventor
李红光
蒋晨曦
刘颖
尹莹
黄志斌
马永桃
高源良
黄洪达
黄尚杰
张蕾
黄日平
石凌霄
廖芷羚
黄小玲
刘宝华
黄司司
李豪鹏
李春容
韦文标
覃炜梅
杨嘉敏
刘绍偑
朱天成
梁裕鑫
黄琦
刘春成
伊然
郑毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guigang Power Supply Bureau of Guangxi Power Grid Co Ltd
Original Assignee
Guigang Power Supply Bureau of Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guigang Power Supply Bureau of Guangxi Power Grid Co Ltd filed Critical Guigang Power Supply Bureau of Guangxi Power Grid Co Ltd
Priority to CN202310603161.9A priority Critical patent/CN116740758A/en
Publication of CN116740758A publication Critical patent/CN116740758A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The application discloses a bird image recognition method for preventing misjudgment, comprising the following steps: establishing data sets for whole-bird and partial-bird image recognition in a natural light environment and labeling the birds; training a YOLO neural network for target detection on the whole-image recognition data set to preliminarily recognize and coarsely locate birds; and training a U-net neural network on the partial-image recognition data set to semantically segment the birds' limb details and confirm the bird information in the image. The method combines two deep learning models: with the image to be detected as input, the target detection algorithm narrows the search range, and the semantic segmentation algorithm segments the partial view and determines which bird features it contains. This shortens recognition time, makes it possible to judge whether occluded birds, bird-like non-bird objects and small-size birds are present, and reduces the probability of misjudgment.

Description

Bird image recognition method and system for preventing misjudgment
Technical Field
The application relates to the technical field of artificial intelligence recognition, and in particular to a bird image recognition method and system for preventing misjudgment.
Background
In recent years, owing to its excellent performance in detecting and identifying targets, deep learning (DL) has been widely studied for detection and localization tasks. Deep learning techniques greatly assist researchers in extracting the features that best represent the object to be described. These models have been successfully applied in fields such as image classification, autonomous driving, speech recognition, pedestrian detection, bird recognition and cancer detection.
With the rapid development of computer vision algorithms, bird detection methods continue to improve, but they still face limitations that need to be addressed. Because of the habits of birds, cameras are usually fixed at height to shoot in a top view, or mounted on mobile bird-repelling devices to shoot in a bottom view. Birds in the captured pictures are therefore often occluded: instead of a clear, complete bird, only part of the bird is visible. In this situation bird-like non-bird objects may also be judged as birds, the misjudgment rate is high, small-size birds are hard to detect, and inference time is long. In practice, owing to external factors such as camera accuracy, shooting angle and weather, and to the training accuracy of deep learning models, the bird information in an image is often incomplete, so in certain settings the bird-repelling device repeatedly misjudges bird-like objects, wasting its resources and reducing its efficiency.
Therefore, there is a need for a bird image recognition method that prevents misjudgment, reduces the misjudgment rate and improves accuracy.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The present application has been made in view of the above-described problems.
Therefore, the technical problem solved by the application is as follows: existing bird identification methods cannot identify small-size birds, suffer a high misjudgment rate, are easily affected by external factors, and cannot identify occluded birds.
In order to solve this technical problem, the application provides the following technical scheme: a bird image recognition method for preventing misjudgment, comprising:
establishing data sets for whole-bird and partial-bird image recognition in a natural light environment, and labeling the birds;
training a YOLO neural network for target detection on the whole-image recognition data set, to preliminarily recognize and coarsely locate birds;
training a U-net neural network on the partial-image recognition data set, to perform semantic segmentation of the birds' limb details and confirm the bird information in the image.
As a preferable mode of the bird image recognition method for preventing misjudgment according to the present application: establishing the data sets for whole-bird and partial-bird image recognition in a natural light environment comprises collecting images containing bird information in natural light scenes by extracting frames from surveillance video, searching image databases and photographing in the field; building the whole-image recognition data set by labeling the whole bird, and building the partial-image recognition data set by labeling bird limb information including the head, wings, feathers and claws; augmenting the image data by translation, scaling and rotation; and dividing the augmented image data into a training set and a test set at a ratio of 9:1.
As a preferable mode of the bird image recognition method for preventing misjudgment according to the present application: training the YOLO neural network for target detection comprises training the YOLO neural network with a learning rate of 0.01 and 300 training epochs, and outputting the loss value of the loss function during training, expressed as:

$$\begin{aligned} Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ &+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ &+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ &+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2 \end{aligned}$$

where $S^2$ is the number of grid cells (each image is divided into $S\times S$ cells); $B$ is the number of target boxes generated in each cell; $obj$ denotes the presence of a target object and $noobj$ its absence; $\lambda_{coord}$ is the position-loss weight coefficient; $\lambda_{noobj}$ is the weight of the no-object bounding boxes in the loss function; $\mathbb{1}_{ij}^{obj}$ marks the $j$-th prediction box of the $i$-th cell when it contains a target object and $\mathbb{1}_{ij}^{noobj}$ when it does not; $\mathbb{1}_{i}^{obj}$ marks the $i$-th cell containing an object; $x_i$ and $y_i$ are the predicted center-point coordinates of an object; $w_i$ and $h_i$ are the predicted width and height of the bounding box; $\hat{x}_i$ and $\hat{y}_i$ are the center-point coordinates of the actual object; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the actual bounding box; $C_i$ and $\hat{C}_i$ are the predicted and actual box confidence; $p_i(c)$ and $\hat{p}_i(c)$ are the predicted and actual likelihood that an object of class $c$ is present; $c$ is a specific category; and $classes$ is the set of all object categories.
As a preferable mode of the bird image recognition method for preventing misjudgment according to the present application: preliminarily recognizing and coarsely locating birds comprises inputting the whole-image recognition data set for training, finishing training when the mIOU value and the loss value converge, outputting the model, and testing it with the test set;
the image to be detected is then recognized and detected by the trained YOLO neural network to obtain a first detection result diagram, and the region of each rectangular bounding box in the first detection result diagram is cropped to obtain an image restricted to the bounding-box area.
As a preferable mode of the bird image recognition method for preventing misjudgment according to the present application: training the U-net neural network through the partial-image recognition data set comprises training the U-net neural network with a learning rate of 0.0001 and Adam as the optimizer, and outputting the loss value of the loss function during training, expressed as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i\log y_i + \left(1-\hat{y}_i\right)\log\left(1-y_i\right)\right]$$

where $y_i$ and $\hat{y}_i$ are the predicted value and the label value of pixel $i$, respectively, and $N$ is the total number of pixels;
and when the mIOU value and the loss value are fitted, training is finished, outputting a model, and testing by using a testing set.
As a preferable mode of the bird image recognition method for preventing misjudgment according to the present application: confirming the bird information in the image comprises sending the box regions that the YOLO target detection model considers to contain bird information into the detection path of the U-net semantic segmentation model, classifying every pixel in the image, assigning irrelevant pixels to the background class and pixels belonging to bird information to the corresponding limb-detail label classes, and finally generating a second detection result diagram, in which background pixels are set to black and the four bird-information classes are red, yellow, green and blue, respectively.
As a preferable mode of the bird image recognition method for preventing misjudgment according to the present application: confirming the bird information in the image further comprises, when the YOLO neural network outputs an image with result confidence higher than 0.8 and pixels of any one of the bird's head, wings, feathers or claws exist in the second recognition result, confirming that the current recognition result contains bird information and sending a bird-repelling instruction;
when the YOLO neural network outputs an image with result confidence higher than 0.8 but no bird limb details exist in the second recognition result, regarding the first detection result as a bird-like non-bird object and sending no bird-repelling instruction;
when the YOLO neural network outputs an image with result confidence lower than 0.8, regarding the image as containing neither birds nor bird-like non-bird objects, skipping the second detection, and judging that no birds are present.
Another object of the present application is to provide a bird image recognition system for preventing misjudgment, which, by secondarily judging the body-part features of birds in the recognition image, can distinguish occluded birds from bird-like non-bird objects, solving the inability of existing bird recognition methods to identify them.
A bird image recognition system for preventing misjudgment, characterized in that: the system comprises a data integration module, a preliminary identification module and a secondary detection module;
the data integration module is used for collecting bird image data, constructing data sets, classifying them into a whole-bird data set and a partial-bird image data set, performing image data augmentation, and dividing them into training and test sets;
the preliminary identification module is used for performing preliminary recognition on the data set and judging whether birds or bird-like objects exist in the image;
the secondary detection module is used for performing secondary recognition on the preliminarily recognized image, recognizing the bird partial view within it, and judging whether the bird-like object is a bird.
A computer device comprising a memory and a processor, said memory storing a computer program, characterized in that said processor, when executing said computer program, implements the steps of a method for bird image identification that prevents erroneous judgment.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of a bird image identification method for preventing erroneous judgment.
The beneficial effects of the application are as follows: the bird image recognition method for preventing misjudgment combines two deep learning models. With the image to be detected as input, the target detection algorithm narrows the search range, and the semantic segmentation algorithm segments the partial view and determines which bird features it contains, shortening recognition time, making it possible to judge whether occluded birds, bird-like non-bird objects and small-size birds are present, and reducing the probability of misjudgment. Setting a confidence threshold for the target detection algorithm further reduces image recognition time and resource occupation. The application performs well in recognition time, judgment accuracy and the range of recognizable cases.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
fig. 1 is a flowchart illustrating an overall bird image recognition method for preventing erroneous judgment according to a first embodiment of the present application.
Fig. 2 is a flowchart illustrating an overall bird image recognition system for preventing erroneous judgment according to a third embodiment of the present application.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present application can be understood in detail, a more particular description of the application, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present application have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the application. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present application, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present application and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present application. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1, for one embodiment of the present application, there is provided a bird image recognition method for preventing erroneous judgment, including:
s1: and establishing a data set for bird whole and partial image recognition in natural light environment, and labeling the birds.
Furthermore, establishing the data sets for whole-bird and partial-bird image recognition in a natural light environment comprises collecting images containing bird information in natural light scenes by extracting frames from surveillance video, searching camera image databases and photographing in the field; establishing the whole-image recognition data set by labeling the whole bird, and establishing the partial-image recognition data set by labeling bird limb information including the head, wings, feathers and claws; augmenting the image data by translation, scaling and rotation; and dividing the augmented image data into a training set and a test set at a ratio of 9:1.
It should be noted that, since the amount of training data is small, data augmentation is required. To make the training result more robust, translation ranges and step sizes are specified randomly, and images are translated horizontally or vertically to change the position of the image content; images are enlarged or reduced by a specified scale factor; and the size or blur of the image content is changed. These operations effectively increase the size of the data sets; a minimal sketch of the augmentation and split follows.
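By way of illustration, here is a minimal Python sketch of the augmentation and 9:1 split described above, assuming Pillow is available; the folder name, translation range, scale factors and rotation angles are illustrative assumptions, not values specified by the application:

```python
import random
from pathlib import Path
from PIL import Image

def augment(img: Image.Image) -> list[Image.Image]:
    """Produce translated, scaled and rotated variants of one image."""
    w, h = img.size
    variants = []
    # translate the image content within an assumed +/-10% range
    dx = random.randint(-w // 10, w // 10)
    dy = random.randint(-h // 10, h // 10)
    variants.append(img.transform((w, h), Image.AFFINE, (1, 0, dx, 0, 1, dy)))
    # scale by an assumed random factor, then resize back to the original size
    s = random.uniform(0.8, 1.2)
    variants.append(img.resize((int(w * s), int(h * s))).resize((w, h)))
    # rotate by an assumed random angle
    variants.append(img.rotate(random.uniform(-30, 30)))
    return variants

paths = sorted(Path("bird_dataset").glob("*.jpg"))  # hypothetical folder
random.shuffle(paths)
split = int(0.9 * len(paths))  # 9:1 train/test split
train_paths, test_paths = paths[:split], paths[split:]
```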
S2: and training a YOLO neural network for target detection through a data set of bird integral image recognition to primarily recognize and coarsely position birds.
Further, training the YOLO neural network for target detection includes training the YOLO neural network with a learning rate of 0.01, a batch size of 8, 300 training epochs and an input image size of 1024×1024, and outputting the loss value of the loss function during training, expressed as:

$$\begin{aligned} Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ &+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ &+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ &+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2 \end{aligned}$$

where the symbols are as defined in the summary above.
It should be noted that the position error and the classification error contribute different proportions of the loss value, and that many grids in each image contain no object at all (no object center falls within them). This biases the confidence values of the bounding boxes in most grids toward 0, which in turn amplifies the effect of the confidence error of the grids that do contain objects when computing the gradient. Therefore $\lambda_{coord}=5$ is introduced when computing the loss to correct the coordinate loss, and $\lambda_{noobj}=0.5$ to correct the bias from the no-object confidence error. Meanwhile, since the same positional deviation affects the IoU error of a large object far less than that of a small object, the YOLO target detection algorithm takes the square root of the size terms (w, h) to balance the influence of both. A sketch of this loss follows.
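A compact NumPy sketch of this YOLOv1-style loss, under stated assumptions (the tensor layout and responsibility masks are illustrative, not the application's actual implementation):

```python
import numpy as np

def yolo_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """pred/target: (S*S, B, 5 + C) arrays of [x, y, w, h, conf, p(c)...];
    obj_mask: (S*S, B) boolean, True where a box is responsible for an object.
    Assumes w, h are non-negative (e.g. normalized to [0, 1])."""
    noobj_mask = ~obj_mask
    px, py, pw, ph, pc = (pred[..., k] for k in range(5))
    tx, ty, tw, th, tc = (target[..., k] for k in range(5))

    # coordinate loss: squared error on centers, square-rooted widths/heights
    coord = lambda_coord * np.sum(
        obj_mask * ((px - tx) ** 2 + (py - ty) ** 2
                    + (np.sqrt(pw) - np.sqrt(tw)) ** 2
                    + (np.sqrt(ph) - np.sqrt(th)) ** 2))
    # confidence loss, down-weighted in grid cells without objects
    conf = (np.sum(obj_mask * (pc - tc) ** 2)
            + lambda_noobj * np.sum(noobj_mask * (pc - tc) ** 2))
    # classification loss over cells that contain an object (class vector
    # is per-cell in YOLOv1, taken here from box 0)
    cell_has_obj = obj_mask.any(axis=1)
    cls = np.sum(cell_has_obj[:, None]
                 * (pred[:, 0, 5:] - target[:, 0, 5:]) ** 2)
    return coord + conf + cls
```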
It should be noted that the preliminary recognition and coarse positioning of birds includes inputting the whole-image recognition data set for training, finishing training when the mIOU value and the loss value converge, outputting the model, and testing with the test set.
The image to be detected is then recognized and detected by the trained YOLO neural network to obtain a first detection result diagram, and the region of each rectangular bounding box in the first detection result diagram is cropped to obtain an image restricted to the bounding-box area, as in the sketch below.
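A minimal sketch of this cropping step, assuming detections arrive as (x1, y1, x2, y2, confidence) tuples; the box format and threshold wiring are assumptions:

```python
import numpy as np

def crop_detections(image: np.ndarray, boxes, conf_thresh: float = 0.8):
    """image: (H, W, 3) array; boxes: iterable of (x1, y1, x2, y2, conf).
    Returns the sub-images restricted to each bounding-box area."""
    crops = []
    for x1, y1, x2, y2, conf in boxes:
        if conf >= conf_thresh:  # the application's stated 0.8 threshold
            crops.append(image[int(y1):int(y2), int(x1):int(x2)])
    return crops
```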
It should be further noted that if the learning rate is too small, the loss decreases too slowly and the model's detection accuracy improves too slowly; if it is too high, the loss may increase and the detection accuracy becomes unreliable. The learning rate is therefore set to 0.01 to ensure model accuracy. The batch size is limited by the memory of the available graphics card: the larger the video memory, the larger the batch size can be. The setting must match the hardware configuration; exceeding it causes errors during training.
Furthermore, TensorBoard is used to visualize the current loss value and accuracy in real time. After training stops, the epoch checkpoint with the highest IOU value is loaded, tested with the test set, and the model performing best in the test is used as the final training model.
It should be noted that the IOU is the ratio of the intersection to the union of the model's prediction for a certain class and the ground truth; the learning rate is the speed at which the model learns and adjusts its weights; the image batch is the number of pictures input at once; the epochs are the total number of training rounds; and the loss value is the total loss composed of classification loss, target confidence loss and positioning loss. A minimal IoU computation is sketched below.
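For illustration only, a minimal IoU computation for axis-aligned boxes in (x1, y1, x2, y2) form:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```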
It should be further noted that the integrated network combines the target detection neural network and the semantic segmentation neural network, so that one image can be input and the result output after primary and secondary detection. YOLO unifies detection into a single regression problem, whereas R-CNN and Faster R-CNN split the detection result into object category (a classification problem) and object position (a regression problem) and solve them separately, which is why YOLO's detection speed is faster.
S3: training a U-net neural network on the partial-image recognition data set, performing semantic segmentation of the birds' limb details, and confirming the bird information in the image.
Further, training the U-net neural network through the partial-image recognition data set includes training the U-net neural network with a learning rate of 0.0001, Adam as the optimizer, a batch size of 4 and an image size of 512×512, and outputting the loss value of the loss function during training, expressed as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i\log y_i + \left(1-\hat{y}_i\right)\log\left(1-y_i\right)\right]$$

where $y_i$ and $\hat{y}_i$ are the predicted value and the label value of pixel $i$, respectively, and $N$ is the total number of pixels. When the mIOU value and the loss value converge, training is finished, the model is output, and it is tested with the test set.
It should be noted that the common loss functions used in U-Net semantic segmentation algorithms are cross entropy loss functions and Dice loss functions.
The cross entropy loss function is commonly used in classification problems and measures the difference between the predicted output and the real label. In the U-Net semantic segmentation algorithm, the image segmentation task is treated as a binary classification problem: pixels are classified into foreground and background, and the cross entropy loss function measures the difference between the model's prediction and the real label.
Applied as a binary classification loss, the cross entropy loss function takes the form:

$$L = -\left[y\log \hat{y} + \left(1-y\right)\log\left(1-\hat{y}\right)\right]$$

where the label of the classification problem takes values only in the set $\{0,1\}$; assuming the true label of a sample point is $y$ and the predicted probability that the sample takes $y=1$ is $\hat{y}$, the cross entropy loss $L$ can be calculated as above.
The Dice loss function measures the degree of similarity between the prediction result and the real label: the smaller the Dice loss, the higher the similarity between the model's prediction and the real label. Combining the cross entropy loss function with the Dice loss function lets the model match the real labels better and perform better in the image segmentation task; a sketch of such a combined loss follows.
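A short PyTorch sketch of such a combined loss for the binary case; the equal weighting of the two terms is an assumption, as the application does not state the weights:

```python
import torch
import torch.nn.functional as F

def bce_dice_loss(logits, labels, eps=1e-6):
    """logits: (N, 1, H, W) raw outputs; labels: (N, 1, H, W) in {0, 1}."""
    # cross entropy term on raw logits for numerical stability
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    # Dice term on sigmoid probabilities
    probs = torch.sigmoid(logits)
    inter = (probs * labels).sum()
    dice = 1 - (2 * inter + eps) / (probs.sum() + labels.sum() + eps)
    return bce + dice  # assumed equal weighting
```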
It should be noted that, to ensure the accuracy of the secondary detection, the learning rate is set to 0.0001, which improves accuracy without making the loss decrease too slowly.
It should be noted that confirming the bird information in the image includes sending the box regions output by the YOLO target detection model into the detection path of the U-net semantic segmentation model, classifying every pixel in the image, assigning irrelevant pixels to the background class and pixels belonging to bird information to the corresponding limb-detail label classes, and finally generating a second detection result diagram, in which background pixels are set to black and the four bird-information classes are red, yellow, green and blue, respectively.
It should also be noted that the second detection result image contains no colors other than these five; any other color indicates that the model was trained incorrectly and must be retrained. A sketch of the color mapping follows.
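An illustrative mapping from class indices to the five colors; the class order (background, head, wing, feather, claw) is an assumption:

```python
import numpy as np

PALETTE = np.array([
    [0, 0, 0],      # background -> black
    [255, 0, 0],    # head       -> red
    [255, 255, 0],  # wing       -> yellow
    [0, 255, 0],    # feather    -> green
    [0, 0, 255],    # claw       -> blue
], dtype=np.uint8)

def colorize(class_map: np.ndarray) -> np.ndarray:
    """class_map: (H, W) integer labels in [0, 4] -> (H, W, 3) RGB image."""
    return PALETTE[class_map]
```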
Further, confirming the bird information in the image also comprises: when the YOLO neural network outputs an image with result confidence higher than 0.8 and pixels of any one of the bird's head, wings, feathers or claws exist in the second recognition result, confirming that the current recognition result contains bird information and sending a bird-repelling instruction;
when the YOLO neural network outputs an image with result confidence higher than 0.8 but no bird limb details exist in the second recognition result, regarding the first detection result as a bird-like non-bird object and sending no bird-repelling instruction;
when the YOLO neural network outputs an image with result confidence lower than 0.8, regarding the image as containing neither birds nor bird-like non-bird objects, skipping the second detection, and judging that no birds are present. This decision rule is sketched below.
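A minimal sketch of this two-stage decision rule; the function name and part-class indices are hypothetical:

```python
import numpy as np

BIRD_PART_CLASSES = {1, 2, 3, 4}  # head, wing, feather, claw (assumed indices)

def decide(yolo_conf: float, seg_class_map: np.ndarray) -> str:
    """Two-stage rule: YOLO confidence gate, then limb-pixel check."""
    if yolo_conf < 0.8:
        return "no bird"  # second detection is skipped entirely
    found = set(np.unique(seg_class_map).tolist()) & BIRD_PART_CLASSES
    if found:
        return "bird present: send bird-repelling instruction"
    return "bird-like non-bird: no instruction sent"
```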
It should be noted that the primary detection model uses the YOLO network and the secondary detection model uses the U-net network. Target detection treats the detection problem as a regression problem; it is essentially probabilistic, places high demands on model accuracy, and requires high-resolution input images. Semantic segmentation classifies every pixel of the image, so its detection accuracy is high, but for a high-resolution image the number of pixels grows and both accuracy and speed drop. Moreover, the accuracy of target detection depends on the size of the training data set, the quality of the input images and the number of training epochs. In a real scene, where camera configuration and shooting angle are poor, input images are blurred and data sets are small, the detection effect degrades markedly and bird-like non-bird objects are easily judged to be birds. The image after primary positioning, however, is much smaller than the original, removes irrelevant background interference and lowers the resolution of the image to be detected, so semantic segmentation can run quickly and complete the secondary detection of the primarily positioned image. The integrated-network approach thus compensates for the insufficient accuracy and misjudgment-proneness of the YOLO network and the slow training of the U-net network.
Example 2
In order to verify the beneficial effects of the bird image identification method, scientific demonstration is carried out through economic benefit calculation and simulation experiments.
The experiment used 20 pictures of partially occluded birds; each picture may also contain non-bird objects liable to false recognition.
As shown in the bird picture detection accuracy analysis in Table 1, the comprehensive recognition accuracy is 0.17 lower than that of target detection alone; manual verification confirmed that the rejected pictures really were bird-like non-bird objects, so overall detection reliability is markedly improved. Standalone target recognition or standalone semantic segmentation shows a higher paper accuracy, but that accuracy includes judging bird-like non-bird objects as birds, so the "higher accuracy" actually wastes bird-repelling resources. Combining the two algorithms, with the image to be detected as input, the target detection algorithm narrows the range and the semantic segmentation algorithm segments the partial view and determines which bird features it contains; this shortens recognition time, makes it possible to judge whether occluded birds, bird-like non-bird objects and small-size birds are present, and reduces the misjudgment probability. Since only 20 trials were run, the misjudgment rate was reduced by 100% in this experiment; compared with existing target recognition and semantic segmentation, the application improves markedly in avoiding misjudgment.
Table 1 Bird picture detection accuracy analysis

Picture No.    Target detection result    Semantic segmentation result    Comprehensive result
1              0.95                       1                               0.95
2              0.93                       1                               0.93
3              0.86                       0                               0.00
4              0.94                       1                               0.94
5              0.96                       1                               0.96
6              0.81                       0                               0.00
7              0.92                       1                               0.92
8              0.88                       1                               0.88
9              0.95                       1                               0.95
10             0.96                       1                               0.96
11             0.88                       1                               0.88
12             0.94                       1                               0.94
13             0.85                       0                               0.00
14             0.93                       1                               0.93
15             0.96                       1                               0.96
16             0.94                       1                               0.94
17             0.82                       0                               0.00
18             0.93                       1                               0.93
19             0.95                       1                               0.95
20             0.96                       1                               0.96
Average        0.92                       0.80                            0.75
Example 3
Referring to fig. 2, one embodiment of the present application provides a bird image recognition system for preventing misjudgment, comprising a data integration module, a preliminary identification module and a secondary detection module.
The data integration module is used for collecting bird image data, constructing data sets, classifying them into a whole-bird data set and a partial-bird image data set, performing image data augmentation, and dividing them into training and test sets.
The preliminary identification module is used for performing preliminary recognition on the data set and judging whether birds or bird-like objects exist in the image.
The secondary detection module is used for performing secondary recognition on the preliminarily recognized image, recognizing the bird partial view within it, and judging whether the bird-like object is a bird.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only memory (ROM), a random access memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques, well known in the art, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims (10)

1. A bird image recognition method for preventing erroneous judgment, comprising:
establishing data sets for whole-bird and partial-bird image recognition in a natural light environment, and labeling the birds;
training a YOLO neural network for target detection on the whole-image recognition data set, to preliminarily recognize and coarsely locate birds;
training a U-net neural network on the partial-image recognition data set, to perform semantic segmentation of the birds' limb details and confirm the bird information in the image.
2. The bird image recognition method for preventing misjudgment according to claim 1, wherein: establishing the data sets for whole-bird and partial-bird image recognition in a natural light environment comprises collecting images containing bird information in natural light scenes by extracting frames from surveillance video, searching image databases and photographing in the field; building the whole-image recognition data set by labeling the whole bird, and building the partial-image recognition data set by labeling bird limb information including the head, wings, feathers and claws; augmenting the image data by translation, scaling and rotation; and dividing the augmented image data into a training set and a test set at a ratio of 9:1.
3. The bird image recognition method for preventing misjudgment according to claim 1 or 2, wherein: training the YOLO neural network for target detection comprises training the YOLO neural network with a learning rate of 0.01 and 300 training epochs, and outputting the loss value of the loss function during training, expressed as:

$$\begin{aligned} Loss ={}& \lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\ &+\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2+\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\ &+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C_i-\hat{C}_i\right)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C_i-\hat{C}_i\right)^2 \\ &+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2 \end{aligned}$$

where $S^2$ is the number of grid cells (each image is divided into $S\times S$ cells); $B$ is the number of target boxes generated in each cell; $obj$ denotes the presence of a target object and $noobj$ its absence; $\lambda_{coord}$ is the position-loss weight coefficient; $\lambda_{noobj}$ is the weight of the no-object bounding boxes in the loss function; $\mathbb{1}_{ij}^{obj}$ marks the $j$-th prediction box of the $i$-th cell when it contains a target object and $\mathbb{1}_{ij}^{noobj}$ when it does not; $\mathbb{1}_{i}^{obj}$ marks the $i$-th cell containing an object; $x_i$ and $y_i$ are the predicted center-point coordinates of an object; $w_i$ and $h_i$ are the predicted width and height of the bounding box; $\hat{x}_i$ and $\hat{y}_i$ are the center-point coordinates of the actual object; $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the actual bounding box; $C_i$ and $\hat{C}_i$ are the predicted and actual box confidence; $p_i(c)$ and $\hat{p}_i(c)$ are the predicted and actual likelihood that an object of class $c$ is present; $c$ is a specific category; and $classes$ is the set of all object categories.
4. The bird image recognition method for preventing misjudgment according to claim 3, wherein: preliminarily recognizing and coarsely locating birds comprises inputting the whole-image recognition data set for training, finishing training when the mIOU value and the loss value converge, outputting the model, and testing it with the test set;
the image to be detected is then recognized and detected by the trained YOLO neural network to obtain a first detection result diagram, and the region of each rectangular bounding box in the first detection result diagram is cropped to obtain an image restricted to the bounding-box area.
5. The bird image recognition method for preventing misjudgment according to claim 4, wherein: training the U-net neural network through the partial-image recognition data set comprises training the U-net neural network with a learning rate of 0.0001 and Adam as the optimizer, and outputting the loss value of the loss function during training, expressed as:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i\log y_i + \left(1-\hat{y}_i\right)\log\left(1-y_i\right)\right]$$

where $y_i$ and $\hat{y}_i$ are the predicted value and the label value of pixel $i$, respectively, and $N$ is the total number of pixels;
when the mIOU value and the loss value converge, training is finished, the model is output, and it is tested with the test set.
6. The bird image recognition method for preventing misjudgment according to claim 5, wherein: confirming the bird information in the image comprises sending the box regions that the YOLO target detection model considers to contain bird information into the detection path of the U-net semantic segmentation model, classifying every pixel in the image, assigning irrelevant pixels to the background class and pixels belonging to bird information to the corresponding limb-detail label classes, and finally generating a second detection result diagram, in which background pixels are set to black and the four bird-information classes are red, yellow, green and blue, respectively.
7. The bird image recognition method for preventing misjudgment according to claim 6, wherein: confirming the bird information in the image further comprises, when the YOLO neural network outputs an image with result confidence higher than 0.8 and pixels of any one of the bird's head, wings, feathers or claws exist in the second recognition result, confirming that the current recognition result contains bird information and sending a bird-repelling instruction;
when the YOLO neural network outputs an image with result confidence higher than 0.8 but no bird limb details exist in the second recognition result, regarding the first detection result as a bird-like non-bird object and sending no bird-repelling instruction;
when the YOLO neural network outputs an image with result confidence lower than 0.8, regarding the image as containing neither birds nor bird-like non-bird objects, skipping the second detection, and judging that no birds are present.
8. A system employing the bird image recognition method for preventing misjudgment as claimed in any one of claims 1 to 7, characterized in that: the system comprises a data integration module, a preliminary identification module and a secondary detection module;
the data integration module is used for collecting bird image data, constructing data sets, classifying them into a whole-bird data set and a partial-bird image data set, performing image data augmentation, and dividing them into training and test sets;
the preliminary identification module is used for performing preliminary recognition on the data set and judging whether birds or bird-like objects exist in the image;
the secondary detection module is used for performing secondary recognition on the preliminarily recognized image, recognizing the bird partial view within it, and judging whether the bird-like object is a bird.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202310603161.9A 2023-05-25 2023-05-25 Bird image recognition method and system for preventing misjudgment Pending CN116740758A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310603161.9A CN116740758A (en) 2023-05-25 2023-05-25 Bird image recognition method and system for preventing misjudgment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310603161.9A CN116740758A (en) 2023-05-25 2023-05-25 Bird image recognition method and system for preventing misjudgment

Publications (1)

Publication Number Publication Date
CN116740758A true CN116740758A (en) 2023-09-12

Family

ID=87903610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310603161.9A Pending CN116740758A (en) 2023-05-25 2023-05-25 Bird image recognition method and system for preventing misjudgment

Country Status (1)

Country Link
CN (1) CN116740758A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117351521A (en) * 2023-12-04 2024-01-05 国网山东省电力公司电力科学研究院 Digital twinning-based power transmission line bird detection method, system, medium and equipment
CN117351521B (en) * 2023-12-04 2024-04-09 国网山东省电力公司电力科学研究院 Digital twinning-based power transmission line bird detection method, system, medium and equipment
CN117690164A (en) * 2024-01-30 2024-03-12 成都欣纳科技有限公司 Airport bird identification and driving method and system based on edge calculation
CN117690164B (en) * 2024-01-30 2024-04-30 成都欣纳科技有限公司 Airport bird identification and driving method and system based on edge calculation

Similar Documents

Publication Publication Date Title
CN113160192B (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN109657716B (en) Vehicle appearance damage identification method based on deep learning
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
CN108229509B (en) Method and device for identifying object class and electronic equipment
US10346720B2 (en) Rotation variant object detection in Deep Learning
CN109871902B (en) SAR small sample identification method based on super-resolution countermeasure generation cascade network
CN105574550A (en) Vehicle identification method and device
CN114663346A (en) Strip steel surface defect detection method based on improved YOLOv5 network
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
CN113313107A (en) Intelligent detection and identification method for multiple types of diseases on cable surface of cable-stayed bridge
CN113033558A (en) Text detection method and device for natural scene and storage medium
CN116740758A (en) Bird image recognition method and system for preventing misjudgment
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN115082776A (en) Electric energy meter automatic detection system and method based on image recognition
CN114882204A (en) Automatic ship name recognition method
CN111695373A (en) Zebra crossing positioning method, system, medium and device
CN116935369A (en) Ship water gauge reading method and system based on computer vision
CN117058069A (en) Automatic detection method for apparent diseases of pavement in panoramic image
CN114927236A (en) Detection method and system for multiple target images
CN110889418A (en) Gas contour identification method
CN111402185A (en) Image detection method and device
CN110334703B (en) Ship detection and identification method in day and night image
Ren et al. Building recognition from aerial images combining segmentation and shadow
CN113313678A (en) Automatic sperm morphology analysis method based on multi-scale feature fusion
CN110598697A (en) Container number positioning method based on thickness character positioning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication