CN114821102A - Dense citrus quantity detection method, equipment, storage medium and device


Info

Publication number
CN114821102A
CN114821102A
Authority
CN
China
Prior art keywords: citrus, target, dense, preset, image
Prior art date
Legal status: Pending
Application number
CN202210405888.1A
Other languages
Chinese (zh)
Inventor
尹帆
李嘉晖
李子茂
帖军
郑禄
田莎莎
杜小坤
吴钱宝
Current Assignee
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date
Filing date
Publication date
Application filed by South Central University for Nationalities
Priority to CN202210405888.1A
Publication of CN114821102A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/443 — Local feature extraction by matching or filtering
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 — Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 — Fusion of extracted features
    • G06V 10/82 — Arrangements using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, equipment, a storage medium and a device for detecting the quantity of dense citrus. The method detects the quantity of citrus in a citrus image to be identified based on a deformable convolution network in a preset DS-YOLO network model and a preset SimAM attention mechanism. Compared with the prior art, in which the detection of dense citrus is unsatisfactory and the detection precision is therefore low, the method improves the detection precision of the model and realizes highly reliable detection of dense citrus quantity, remedying the deficiencies of the prior art.

Description

Dense citrus quantity detection method, equipment, storage medium and device
Technical Field
The invention relates to the field of citrus detection, and in particular to a method, equipment, a storage medium and a device for detecting the quantity of dense citrus.
Background
With the rise and development of deep learning, smart agriculture and agricultural automation have received more and more attention, and target detection using deep learning has become a current research focus. The difficulty of dense citrus detection lies in using computer vision technology to identify small citrus fruits, overlapped and occluded fruits, fruits whose skin color resembles the surrounding environment in natural scenes, and fruits in multi-angle images that can cause repeated counting. Accurate positioning of dense fruits is an important prerequisite for early yield estimation and provides effective technical support for picking robots.
At present, mainstream deep-learning-based target detection algorithms focus on images in which the targets are dispersed and regular, and perform poorly when applied to dense scenes: the similarity between targets in dense scenes is high, the spacing between targets is small, and missed detections occur easily. In natural scenes, citrus fruits are numerous, and the fruits may be too small, adhered to one another, mutually occluded, or occluded by branches and leaves. Few detection methods currently address dense targets, and if a method designed for sparse targets is applied to dense targets, missed and false detections become particularly severe and the detection accuracy is low.
Target detection is a branch of computer vision that describes the content of an entire image and determines the category and position of target objects by combining the feature information of the objects. The dense target detection task is similar to general target detection in that all targets in an input image must be located. It differs in that, owing to the particularity of dense scenes, the class of each target does not need to be labeled, and the objects in the image are usually densely arranged and prone to overlap, as with crowds, supermarket goods, or vehicles in a parking lot.
Current deep-learning-based general target detection methods can be roughly divided into two types. One is the two-stage approach based on region proposals (Region Proposal); the other is the one-stage end-to-end (End-to-End) approach, such as the YOLO series, SSD and RetinaNet.
Among the one-stage end-to-end detection methods, when the existing YOLO algorithm is used, spatial constraints limit the number of adjacent targets YOLO can detect, so the detection precision for small targets clustered into groups is low and the positioning is inaccurate.
Among the two-stage detection methods based on region proposals, cropping loses information and scaling deforms an object's original proportions, which affects recognition accuracy. The SPPNet method introduces a Spatial Pyramid Pooling (SPP) layer to support inputs of arbitrary size, but SPPNet computes convolution features over the whole image, so its detection speed is slow.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, equipment, a storage medium and a device for detecting the quantity of dense citrus, so as to solve the technical problem of low detection precision caused by the unsatisfactory detection of dense citrus in the prior art.
In order to achieve the above object, the present invention provides a dense citrus fruit number detection method, including the steps of:
carrying out image preprocessing on the citrus image to be identified to obtain a processed target citrus image;
performing feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map containing the shape features and the position features of the dense citrus;
performing feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps with different scales;
determining the coordinates and the size of a target prediction frame according to the candidate frame adjusting parameters corresponding to the target feature map;
and marking the position of the citrus in the citrus image to be identified according to the coordinates and the size of the target prediction frame, and determining the quantity of the citrus in the citrus image to be identified according to a marking result.
Optionally, the step of performing feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map including dense citrus shape features and position features includes:
sampling the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a deformable sampling point;
carrying out self-adaptive adjustment on the weight coefficient of the deformable sampling point according to the target citrus image to obtain the adjusted deformable sampling point;
and performing feature extraction on the target citrus image according to the adjusted deformable sampling point to obtain a feature map containing the shape features and the position features of the dense citrus.
Optionally, the step of performing feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps of different scales includes:
calculating the neuron weights corresponding to the feature map according to a preset SimAM attention mechanism and a preset energy function, and determining a target neuron according to the calculation result;
and performing feature fusion on the feature map according to the target neuron to obtain target feature maps of different scales.
Optionally, the step of performing feature fusion on the feature map according to the target neuron to obtain target feature maps of different scales includes:
performing feature fusion on the feature map according to the target neuron to obtain a first feature map after feature fusion;
performing Anchor coordinate matching on the first feature map according to a preset K-means clustering algorithm to obtain a target detection scale;
and performing multi-scale detection on the first feature map according to the target detection scale to obtain target feature maps of different scales.
Optionally, before the step of performing feature extraction on the target citrus image based on the preset DS-YOLO network model to obtain the feature map including the dense citrus shape features and the location features, the method further includes:
inputting the collected dense citrus image into an original YOLOv4 model to obtain a feature map with a preset scale;
stacking and convolving the feature maps with preset scales according to a preset SPP network to obtain processed feature maps;
performing feature fusion processing on the processed feature map based on a path aggregation network to obtain a second feature map after feature fusion;
and constructing a DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameter corresponding to the detection scale.
Optionally, after the step of building a DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameter corresponding to the detection scale, the method further includes:
replacing the convolution layer of the residual error unit of the residual error module in the DS-YOLO network model with an improved deformable convolution layer, and adding the preset SimAM attention mechanism in a path aggregation network of the DS-YOLO network model to generate a new DS-YOLO network model;
and taking the new DS-YOLO network model as the preset DS-YOLO network model.
Optionally, after the step of using the new DS-YOLO network model as the preset DS-YOLO network model, the method further includes:
testing the preset DS-YOLO network model through a dense citrus test set without image preprocessing to obtain a test result;
and performing quality evaluation on the test result according to preset average detection precision, and determining the precision rate of the test result according to the quality evaluation result.
In addition, in order to achieve the above object, the present invention further provides a dense citrus quantity detection apparatus, which includes a memory, a processor, and a dense citrus quantity detection program stored in the memory and operable on the processor, wherein the dense citrus quantity detection program is configured to implement the steps of the dense citrus quantity detection method as described above.
In addition, in order to achieve the above object, the present invention further provides a storage medium, on which a dense citrus quantity detection program is stored, which, when being executed by a processor, implements the steps of the dense citrus quantity detection method as described above.
In addition, in order to achieve the above object, the present invention further provides a dense citrus fruit number detection device, including:
the image preprocessing module is used for preprocessing the image of the citrus to be identified to obtain a processed target citrus image;
the characteristic extraction module is used for extracting the characteristics of the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a characteristic diagram containing the shape characteristics and the position characteristics of the dense citrus;
the feature fusion module is used for performing feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps of different scales;
the citrus detection module is used for determining the coordinates and the size of a target prediction frame according to the candidate frame adjustment parameters corresponding to the target feature map;
the citrus detection module is further configured to perform position marking on citrus in the citrus image to be identified according to the coordinates and the size of the target prediction frame, and determine the number of citrus in the citrus image to be identified according to a marking result.
In the invention, image preprocessing is performed on a citrus image to be identified to obtain a processed target citrus image; feature extraction is performed on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus; feature fusion is performed on the feature map according to a preset SimAM attention mechanism to obtain target feature maps of different scales; the coordinates and size of a target prediction frame are determined according to the candidate frame adjustment parameters corresponding to the target feature maps; the positions of the citrus in the citrus image to be identified are marked according to the coordinates and size of the target prediction frame, and the quantity of citrus in the image is determined from the marking result. Because feature fusion is performed on the citrus image based on the deformable convolution network in the preset DS-YOLO network model and the preset SimAM attention mechanism, and the quantity of citrus is detected according to the target prediction frames corresponding to target feature maps of different scales, the detection precision of the model is improved over the prior art, whose unsatisfactory detection of dense citrus leads to low precision, and highly reliable dense citrus quantity detection is realized, remedying the deficiencies of the prior art.
Drawings
FIG. 1 is a schematic structural diagram of a dense citrus quantity detection device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of a first embodiment of a method for detecting the quantity of dense citrus according to the present invention;
FIG. 3 is a schematic structural diagram of an original YOLOv4 model according to a first embodiment of the method for detecting the quantity of dense citrus of the present invention;
FIG. 4 is a schematic diagram of a SimAM attention mechanism of a first embodiment of the dense citrus fruit count detection method of the present invention;
FIG. 5 is a schematic structural diagram of a DS-YOLO model according to a first embodiment of the method for detecting the quantity of dense citrus fruit according to the present invention;
FIG. 6 is a schematic flow chart diagram of a second embodiment of a method for dense citrus fruit count detection in accordance with the present invention;
FIG. 7 is a flowchart of a deformable convolution implementation of a second embodiment of a dense citrus fruit quantity detection method of the present invention;
FIG. 8 is a block diagram of a first embodiment of the dense citrus quantity detection device according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a dense citrus fruit quantity detection device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the dense citrus quantity detection device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display), and the optional user interface 1003 may further include standard wired and wireless interfaces; in the present invention, the wired interface of the user interface 1003 may be a USB interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the dense citrus quantity detection device, and that the device may include more or fewer components than shown, combine some components, or have a different arrangement of components.
As shown in FIG. 1, the memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a dense citrus quantity detection program.
In the dense citrus quantity detection device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with it; the user interface 1003 is mainly used for connecting to user equipment; and the dense citrus quantity detection device calls the dense citrus quantity detection program stored in the memory 1005 through the processor 1001 and executes the dense citrus quantity detection method provided by the embodiments of the present invention.
Based on the above hardware structure, embodiments of the dense citrus quantity detection method of the present invention are provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the dense citrus quantity detection method according to the present invention.
In this embodiment, the method for detecting the number of dense citrus fruits comprises the following steps:
step S10: and carrying out image preprocessing on the citrus image to be identified to obtain a processed target citrus image.
It should be noted that the execution subject of this embodiment may be a device having a dense citrus quantity detection function. The device may be a computing device that includes a citrus quantity detection system, such as a computer or a notebook, or may be a device (e.g., a mobile phone or an iPad) running an app for dense citrus quantity detection.
It can be understood that the citrus image to be identified refers to a citrus image acquired in the natural environment for which the quantity of citrus fruit needs to be measured. Such images include smaller-sized citrus, overlapped and occluded citrus, citrus whose skin color is similar to the environment, and the like. For example, images of naturally growing citrus are collected to simulate a natural scene: the citrus plants are photographed on sunny days and cloudy days and from different angles, at a collection distance of 1.0-2.0 m. A total of 2365 citrus sample images in different environments are collected, each image containing 50 or more citrus fruits, with an image resolution of 4032 x 3024 pixels.
It should be understood that the preprocessing of the citrus image includes image cleaning, image cropping, image resizing, and the like, and the preprocessed image is input into the preset DS-YOLO network model for quantity detection.
In specific implementation, the collected dense citrus pictures are first sorted and classified, and invalid and abnormal pictures (for example, images shot at too large an angle in which fruits are difficult to detect) are removed. After image cleaning, the citrus in the images are annotated with the image labeling tool LabelImg, marking the position and category of each fruit. After annotation, some pictures are cropped using the Crop method in the OpenCV library to avoid background interference and redundant target information and to eliminate irrelevant or redundant features, reducing the processing burden of model detection. The image size is then adjusted to 416 x 416 using the Resize method in the OpenCV library, and the processed images are stored in a preset folder.
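As an illustrative aid only, the preprocessing described above can be sketched with OpenCV as follows; the file paths and crop coordinates are assumptions of this example and are not specified by the invention:

```python
import cv2

def preprocess_citrus_image(src_path, dst_path, crop_box=None, size=(416, 416)):
    """Crop away background/redundant regions and resize to the model input size."""
    img = cv2.imread(src_path)           # load the collected citrus image
    if img is None:
        raise FileNotFoundError(src_path)
    if crop_box is not None:             # crop_box = (x, y, w, h), chosen per image
        x, y, w, h = crop_box
        img = img[y:y + h, x:x + w]      # the "Crop" step from the OpenCV library
    img = cv2.resize(img, size)          # the "Resize" step: 416 x 416 model input
    cv2.imwrite(dst_path, img)           # store in the preset folder
    return img

# usage sketch (coordinates assumed): crop a 3000 x 3000 region before resizing
# preprocess_citrus_image("raw/citrus_0001.jpg", "processed/citrus_0001.jpg",
#                         crop_box=(500, 0, 3000, 3000))
```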
Step S20: and performing feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map containing the shape features and the position features of the dense citrus.
It should be noted that the preset DS-YOLO network model is a pre-trained model for detecting the quantity of dense citrus. The preset DS-YOLO network model is constructed based on the D-CSPDarknet53 deformable convolution feature extraction network, the PANet feature fusion network, and the Head detection algorithm.
It can be understood that images acquired in an outdoor orchard normally exhibit obvious intra-class variation: due to different illumination conditions, camera viewpoints and other factors, the apparent size of the citrus varies and the fruits are numerous. Therefore, feature extraction is performed on the target citrus image through the deformable convolution network in the preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus.
Further, to improve the model detection accuracy, before the step S20, the method includes: inputting the collected dense citrus image into an original YOLOv4 model to obtain a feature map with a preset scale; stacking and convolving the feature maps with preset scales according to a preset SPP network to obtain processed feature maps; performing feature fusion processing on the processed feature map based on a path aggregation network to obtain a second feature map after feature fusion; and constructing a DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameter corresponding to the detection scale.
It should be noted that an image data set is constructed from the collected dense citrus images. The collected images can be named according to the format of the Pascal VOC data set, and three folders named Annotations, ImageSets and JPEGImages are created; the data set is self-created. The image data set consists of pictures shot in an orchard and images collected on the Internet; the four shooting factors are shooting distance (0.5 m/1.0 m), fruit size (large/small), fruit number (large/medium/small) and weather condition (cloudy/sunny), for a total of 2365 images, and the image resolution can be set to 4032 x 3024 pixels. The images in the dense citrus data set are then preprocessed. For example, the collected images are first sorted and classified, and invalid and abnormal images are removed; after cleaning, the citrus in the images are annotated with the labeling tool LabelImg, marking positions and categories. After annotation, the data set is divided into a training set and a test set in a certain proportion, the training set accounting for 90% of the total samples and the test set for 10%; they are named train.txt and test.txt respectively and stored in the Main subfolder under the ImageSets folder. Partial pictures are then cropped with the Crop method in the OpenCV library to avoid background interference or redundant target information, eliminate irrelevant or redundant features and simplify the model; the image size is adjusted to 416 x 416 with the Resize method in the OpenCV library, and the adjusted images are input into the original YOLOv4 model for training. The above numerical values are not limiting and are merely illustrative.
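The 90/10 split into train.txt and test.txt can be sketched as follows; the directory layout follows the Pascal VOC convention named above, while the shuffle seed and the .jpg extension are assumptions of this example:

```python
import random
from pathlib import Path

def split_dataset(jpeg_dir="JPEGImages", out_dir="ImageSets/Main",
                  train_ratio=0.9, seed=0):
    """Split image IDs into train.txt (90%) and test.txt (10%), VOC layout."""
    ids = sorted(p.stem for p in Path(jpeg_dir).glob("*.jpg"))
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_ratio)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    Path(out_dir, "train.txt").write_text("\n".join(ids[:n_train]))
    Path(out_dir, "test.txt").write_text("\n".join(ids[n_train:]))
```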
It can be understood that the preset scales are the scales at which features are extracted by the original YOLOv4 model. For example, when the adjusted image is input into the original YOLOv4 model, after passing through the backbone network YOLOv4 outputs feature maps of 13 x 13, 26 x 26 and 52 x 52, and feature maps of different scales contain semantic information of different dimensions. The preset SPP network refers to the Spatial Pyramid Pooling network structure used to process convolution features in the original YOLOv4 model, and the path aggregation network refers to the network that fuses the features. The above numerical values are not limiting and are merely illustrative.
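For illustration, a minimal PyTorch sketch of such an SPP block is given below; the pooling kernel sizes 5, 9 and 13 are the usual YOLOv4 choice and are assumed here rather than taken from the patent text:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial Pyramid Pooling block: parallel max-pools at several scales,
    stacked (concatenated) with the input feature map along the channel axis."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes])

    def forward(self, x):
        # spatial size is preserved; channels grow 4x before the following
        # convolutions compress them again
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)

# x = torch.randn(1, 512, 13, 13); SPP()(x).shape -> (1, 2048, 13, 13)
```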
It should be understood that the detection scale refers to the scale of the YOLO detection head corresponding to the second feature map. Each detection head scale includes candidate frame adjustment parameters, which comprise a confidence parameter, parameters for adjusting length, width and coordinate offsets, and a category parameter.
In specific implementation, referring to the structural schematic diagram of the original YOLOv4 model shown in fig. 3, the adjusted image is input into the original YOLOv4 model. After the backbone network, YOLOv4 outputs feature maps at three scales: 13 x 13, 26 x 26 and 52 x 52. In the feature fusion part, the 13 x 13 feature map enters the SPP (Spatial Pyramid Pooling) structure; the SPP stacks and convolves the newly obtained feature maps with the feature map from before the structure and outputs the result to the feature fusion network (path aggregation network) PANet. PANet upsamples the 13 x 13 feature map twice, stacks the results with the 26 x 26 and 52 x 52 feature maps respectively and convolves them, then performs a series of similar bottom-up downsampling and stacking convolutions, fully fusing the features of the three feature maps of different scales, and finally outputs three YOLO detection heads at 13 x 13, 26 x 26 and 52 x 52 respectively. Each detection head in the YOLOv4 algorithm contains 3 sets of candidate frame adjustment parameters, and each set contains 1 confidence parameter, 4 parameters for adjusting length, width and coordinate offsets, and 20 category parameters (the VOC2007 data set contains 20 classes). A DS-YOLO network model is constructed based on the detection scales corresponding to the second feature map and the candidate frame adjustment parameters corresponding to those scales; through these adjustment parameters, the YOLOv4 algorithm adjusts the coordinates, width and height of the candidate frames to generate the final prediction frames, which calibrate the positions of the citrus in the citrus image and thereby finally determine the quantity of citrus.
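The size of each detection head tensor follows directly from these parameter counts; the short sketch below only restates that arithmetic:

```python
# Channels of one YOLOv4 detection head in the VOC-style 20-class setting:
# 3 anchor boxes x (1 confidence + 4 box adjustments + 20 class scores)
num_anchors, num_classes = 3, 20
channels = num_anchors * (1 + 4 + num_classes)   # = 75
# so the three heads output tensors of shape
# (N, 75, 13, 13), (N, 75, 26, 26) and (N, 75, 52, 52)
print(channels)
```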
Further, after the step of building the DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameter corresponding to the detection scale, the method further includes: replacing the convolution layer of the residual error unit of the residual error module in the DS-YOLO network model with an improved deformable convolution layer, and adding the preset SimAM attention mechanism in a path aggregation network of the DS-YOLO network model to generate a new DS-YOLO network model; and taking the new DS-YOLO network model as the preset DS-YOLO network model.
It should be noted that the new DS-YOLO network model is obtained by remedying, for the dense citrus target, the missed detections caused by the insufficient feature extraction capability of the original YOLOv4 model. This not only further improves the model's ability to detect dense citrus with large size variations, but also improves its ability to extract citrus features, with fast training and high model accuracy. The DS-YOLO network model is constructed on the basis of the original YOLOv4 model; to solve the problem that the original YOLOv4 algorithm predicts targets using only three detection scales, which easily causes missed detections for dense citrus, the new DS-YOLO network model is generated by replacing the convolution layers of the residual units in the residual modules of the DS-YOLO network model with improved deformable convolution layers and adding the preset SimAM attention mechanism to the path aggregation network of the DS-YOLO network model.
Understandably, due to different illumination conditions, camera viewpoints and other factors, the citrus in images collected in an outdoor orchard vary in size and are severely overlapped and occluded by other fruits and leaves, so the visible shape of the citrus is deformed, which makes recognition very difficult. Therefore, to address overlap and occlusion, the improved Deformable Convolution is used to replace the convolution layers of some residual units of the CSPDarknet53 network in the DS-YOLO network model; that is, in the DCSP modules of the D-CSPDarknet53 network, more deformable convolution layers with offset learning capability are used, so that the model can adapt its sampling shape and size to changes in the citrus image. In the feature fusion network of the original YOLOv4, a SimAM attention mechanism is added. The SimAM attention mechanism combines the channel dimension and the spatial dimension and infers three-dimensional attention weights that unify the two, so that the network pays more attention to the deep feature space information of the dense citrus.
Further, in order to verify the detection accuracy of the trained preset DS-YOLO network model, after the step of using the new DS-YOLO network model as the preset DS-YOLO network model, the method further includes: testing the preset DS-YOLO network model on a dense citrus test set without image preprocessing to obtain a test result; and performing quality evaluation on the test result according to a preset average detection precision, and determining the precision rate of the test result according to the quality evaluation result.
It should be noted that the dense citrus test set without image preprocessing refers to an image data set constructed from citrus images taken in an orchard and original citrus images collected on the Internet. The original citrus images in this data set are input to the trained preset DS-YOLO network model for dense citrus quantity detection, and the quantity results are output. The average detection precision is taken as the main evaluation index for the quality of the dense citrus detection model's results, and the loss function is taken as the objective function evaluation index to evaluate the degree of difference between the model's predicted values and the actual values.
It should be understood that the average detection precision is measured by the mAP (mean Average Precision) value, the mean of the average precision over all detected classes. Precision refers to the proportion of samples predicted as positive that are actually positive; recall refers to the proportion of actual positives that are correctly predicted as positive; and the F1 value is the harmonic mean of precision and recall.
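A minimal sketch of these evaluation quantities is given below, assuming detections have already been matched to the ground truth (the matching rule, e.g. an IoU threshold, is not restated here):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # correct among predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # found among actual positive
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# AP is the area under the precision-recall curve for one class;
# mAP averages AP over all classes (a single class, citrus, here).
```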
In specific implementation, after model training is completed, a dense citrus test set without data preprocessing is used for effect verification. Because the targets are small, part of each detection-result image is cropped out so that the model's detections can be observed effectively. The results show that, under the same training parameters, the proposed DS-YOLO model can accurately locate citrus that are numerous, small, dense or occluded in natural scenes, while the original YOLOv4 model exhibits a certain degree of missed and false detection; the DS-YOLO model thus further improves the average detection precision for dense citrus.
Step S30: and performing feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps with different scales.
It should be noted that the preset SimAM attention mechanism is preset to increase the attention paid to citrus features when identifying them in the feature map. The preset SimAM attention mechanism combines the channel dimension and the spatial dimension and infers attention weights that unify the two for the feature map, so the attention further strengthens the network's extraction of important information without adding parameters to the original network or increasing the parameter count of the model.
In specific implementation, referring to the schematic diagram of the SimAM attention mechanism shown in fig. 4, a conventional convolution network employs square convolutions, and the regular sampling points limit its capability for geometric transformation modeling, since it can scan only within a window of fixed shape and size. Each element of a deformable convolution kernel has a learnable parameter offset, so its sampling points can be adaptively adjusted according to the feature map and the receptive field can change with the shape and size of the object. However, although deformable convolution lets the sampling points adapt to the feature map and extract features better, it may introduce useless context information such as background. To address this, the scheme expands the range of use of deformable convolution layers in the YOLOv4 network, using more deformable convolution layers with offset learning capability in the CSPDarknet53 network so that the deformable convolutions control sampling over a larger range of the feature layers. On the basis of the original deformable convolution, a weight coefficient is added to each sampling point to distinguish whether the sampling point lies in a target region; if not, its weight is set to 0 to ensure accurate extraction of the target features.
Step S40: and determining the coordinates and the size of the target prediction frame according to the candidate frame adjusting parameters corresponding to the target feature map.
It should be noted that the candidate frame adjustment parameters corresponding to the target feature maps are the adjustment parameters for the different scales. Referring to the DS-YOLO model structure diagram shown in fig. 5, the scheme obtains features from 4 feature maps of different scales and outputs four YOLO detection heads at 13 x 13, 26 x 26, 52 x 52 and 104 x 104 respectively. Each detection head in the DS-YOLO algorithm contains 3 sets of candidate frame adjustment parameters, and each set contains 1 confidence parameter, 4 parameters for adjusting length, width and coordinate offsets, and 1 category parameter. With these adjustment parameters, the DS-YOLO algorithm adjusts the coordinates and size (width and height) of the candidate frame to generate the target prediction frame.
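For illustration, the sketch below shows how such adjustment parameters are conventionally decoded into a prediction frame in YOLO-style detectors (sigmoid offsets for the center, exponential scaling of the anchor); it is assumed, not stated by the patent, that DS-YOLO follows this YOLOv4 convention:

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, anchor_w, anchor_h, stride):
    """Turn one set of adjustment parameters into prediction-frame center/size.
    (cx, cy) is the grid cell index; stride maps grid units back to pixels."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    bx = (sigmoid(tx) + cx) * stride     # center x, offset kept inside the cell
    by = (sigmoid(ty) + cy) * stride     # center y
    bw = anchor_w * math.exp(tw)         # width  = anchor width, scaled
    bh = anchor_h * math.exp(th)         # height = anchor height, scaled
    return bx, by, bw, bh
```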
Step S50: and marking the position of the citrus in the citrus image to be identified according to the coordinates and the size of the target prediction frame, and determining the quantity of the citrus in the citrus image to be identified according to a marking result.
It should be noted that marking the citrus in the citrus image to be identified according to the coordinates and size of the target prediction frames means framing each citrus with a prediction frame; the quantity of citrus in the image is then determined from the number of prediction frames.
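A minimal counting sketch under that description follows; the confidence threshold is an assumption of this example, and non-maximum suppression is taken as already applied:

```python
def count_citrus(boxes, scores, conf_thresh=0.5):
    """Count citrus as the number of prediction frames kept after thresholding.
    boxes: list of (x, y, w, h) frames; scores: matching confidence values."""
    kept = [b for b, s in zip(boxes, scores) if s >= conf_thresh]
    return len(kept), kept   # quantity = number of marked prediction frames
```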
In this embodiment, image preprocessing is performed on the citrus image to be identified to obtain a processed target citrus image; feature extraction is performed on the target citrus image based on the deformable convolution network in the preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus; feature fusion is performed on the feature map according to the preset SimAM attention mechanism to obtain target feature maps of different scales; the coordinates and size of the target prediction frame are determined according to the candidate frame adjustment parameters corresponding to the target feature maps; the positions of the citrus in the image are marked according to the coordinates and size of the target prediction frame, and the quantity of citrus is determined from the marking result. Because this embodiment performs feature fusion on the citrus image based on the preset DS-YOLO network model and the preset SimAM attention mechanism, and detects the quantity of citrus according to the target prediction frames corresponding to the target feature maps of different scales, it improves the detection precision of the model over the prior art, whose unsatisfactory detection of dense citrus leads to low precision, and realizes highly reliable dense citrus quantity detection, remedying the deficiencies of the prior art.
Referring to fig. 6, fig. 6 is a schematic flow chart of a second embodiment of the dense citrus quantity detection method according to the present invention; the second embodiment is proposed on the basis of the first embodiment shown in fig. 2.
In this embodiment, the step S20 includes:
step S201: and sampling the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a deformable sampling point.
It should be noted that the deformable convolution network refers to the Deformable Convolution feature extraction network obtained by replacing the convolution layers of some residual units of the CSPDarknet53 network in the original YOLOv4 network model; this feature extraction network is constructed based on the D-CSPDarknet53 algorithm. The deformable convolution network can control sampling over a larger range of the feature layers; on the basis of the original deformable convolution, a weight coefficient is added to each sampling point to distinguish whether the sampling point lies in a target region, and if not, its weight is set to 0 to ensure accurate extraction of the target features.
Each sampling point corresponds to a weight coefficient, and whether the sampled area is a target region is judged according to this coefficient, which improves the accuracy of feature extraction.
Step S202: and carrying out self-adaptive adjustment on the weight coefficient of the deformable sampling point according to the target orange image to obtain the adjusted deformable sampling point.
It should be noted that, in order to better learn the shape and size of the citrus, each element of the deformable convolution kernel has a learnable parameter offset, so that the sampling points of the deformable convolution kernel can be adaptively adjusted according to the feature map and the sampled region can change with the shape and size of the object. The citrus shape and size are thus learned better, and recognition failures caused by occlusion are avoided.
It can be understood that in order to better extract the features and avoid the influence of useless information (such as background) on the identification result, the accurate extraction of the target features is ensured by adding the weight coefficient to the deformable sampling point, adaptively adjusting the weight coefficient and determining the target area according to the weight coefficient.
Step S203: and performing feature extraction on the target citrus image according to the adjusted deformable sampling point to obtain a feature map containing the shape features and the position features of the dense citrus.
In specific implementation, a conventional convolution involves two steps: 1) sampling on the input feature map with a fixed-size convolution kernel; 2) weighting the samples by the convolution kernel weights and summing the results. For example, for a 3 x 3 convolution kernel with dilation 1, the sampling grid R of the sliding window contains 9 points, the center point plus offsets in 8 directions:

R = {(-1,-1), (-1,0), ..., (0,1), (1,1)}
Then, for each position p_0 on the output feature map y, the conventional convolution can be expressed as:

y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n)

where each point p_n in the grid has a weight w(p_n), p_0 is the position output by the convolution window, and x is the input feature map.

The deformable convolution adds a 2-dimensional offset \Delta p_n to each sampling position of the conventional convolution kernel, {\Delta p_n | n = 1, ..., N}, with N = |R|.

According to the flowchart of the deformable convolution implementation shown in fig. 7, for each position p_0 on the output feature map y, the deformable convolution is calculated as:

y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)

To solve the problem that deformable convolution introduces useless context information, a weight coefficient \Delta w_n \in [0, 1] is added to each sampling point on the basis of the original deformable convolution to distinguish whether the sampling point lies in a target region; if not, its weight is set to 0:

y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) \cdot \Delta w_n

Because the offset \Delta p_n is usually fractional, the convolved feature map x must be computed by bilinear interpolation:

x(p) = \sum_q G(q, p) \cdot x(q)

where p denotes an arbitrary position after the shift, i.e. p = p_0 + p_n + \Delta p_n, q enumerates all integral spatial positions in the feature map, and G(\cdot, \cdot) is the interpolation weight of the corresponding point, which can be separated into two one-dimensional kernels:

G(q, p) = g(q_x, p_x) \cdot g(q_y, p_y)

where g(a, b) = max(0, 1 - |a - b|).
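The modulated sampling described above (offsets \Delta p_n plus per-point weights \Delta w_n) corresponds to what torchvision exposes as DeformConv2d with an optional mask argument; the block below is a minimal sketch under that assumption (torchvision >= 0.9), not the patent's exact layer:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d  # mask support requires torchvision >= 0.9

class ModulatedDeformBlock(nn.Module):
    """Deformable convolution with a learned per-sampling-point weight (the
    delta-w_n above): learned offsets move the 3x3 sampling grid, and a sigmoid
    mask in [0, 1] down-weights samples that fall outside the target region."""
    def __init__(self, channels):
        super().__init__()
        # 3x3 kernel -> 9 sampling points: 18 offset channels (x, y per point)
        # plus 9 mask channels, predicted from the input feature map itself
        self.offset_mask = nn.Conv2d(channels, 27, kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        om = self.offset_mask(x)
        offset = om[:, :18]               # the learned 2D offsets delta-p_n
        mask = torch.sigmoid(om[:, 18:])  # the weights delta-w_n in [0, 1]
        return self.deform(x, offset, mask=mask)

# x = torch.randn(1, 64, 52, 52); ModulatedDeformBlock(64)(x).shape -> (1, 64, 52, 52)
```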
in this embodiment, the step S30 includes:
step S301: and calculating the neuron weight corresponding to the characteristic diagram according to a preset SimAM attention mechanism and a preset energy function, and determining a target neuron according to the calculation result.
It should be noted that the preset energy function is a function designed in advance for calculating the attention weights.
In specific implementation, the SimAM attention mechanism is based mainly on neuroscience theory: an energy function is designed in advance to calculate the attention weights. The energy function is defined as:

e_t(w_t, b_t, y, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1} \sum_{i=1}^{M-1} (y_o - \hat{x}_i)^2

where \hat{t} = w_t t + b_t and \hat{x}_i = w_t x_i + b_t are linear transforms of the target neuron t and the other neurons x_i of a single channel of the input feature map X \in R^{C \times H \times W}, i indexes the spatial dimension, M = H \times W is the number of all neurons in the channel, and w_t and b_t are the weight and bias of the transform of the neuron. Introducing binary labels for y_t and y_o, with values 1 and -1 respectively, and adding a regularizer, minimizing e_t judges the linear separability between the current target neuron and the other neurons; the final energy function is:

e_t(w_t, b_t, y, x_i) = \frac{1}{M-1} \sum_{i=1}^{M-1} \left(-1 - (w_t x_i + b_t)\right)^2 + \left(1 - (w_t t + b_t)\right)^2 + \lambda w_t^2

In theory each channel has M = H \times W such energy functions, and the following analytical solution is obtained:

w_t = -\frac{2 (t - \mu_t)}{(t - \mu_t)^2 + 2 \sigma_t^2 + 2\lambda}, \quad b_t = -\frac{1}{2} (t + \mu_t) w_t

where \mu_t and \sigma_t^2 are the mean and variance computed over all neurons of the channel except t. Substituting w_t and b_t back into the original formula, the minimum energy can be calculated:

e_t^* = \frac{4 (\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2 \hat{\sigma}^2 + 2\lambda}

From a statistical point of view, to reduce the amount of computation, \mu_t and \sigma_t^2 can be replaced by the mean \hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} x_i and the variance \hat{\sigma}^2 = \frac{1}{M} \sum_{i=1}^{M} (x_i - \hat{\mu})^2 computed over the whole channel. The smaller e_t^* is, the more separable the current target neuron is from the other neurons, and the more important the target neuron.
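A compact PyTorch sketch of this attention, following the published parameter-free SimAM formulation, is shown below; the regularization value e_lambda is an assumed default:

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: each neuron is weighted by the inverse
    of its minimum energy e_t* (passed through a sigmoid), per the closed form
    derived above."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda   # the regularization coefficient lambda

    def forward(self, x):                        # x: (N, C, H, W)
        n = x.shape[2] * x.shape[3] - 1          # M - 1
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)   # (x_i - mu)^2
        v = d.sum(dim=[2, 3], keepdim=True) / n             # channel variance
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5         # proportional to 1/e_t*
        return x * torch.sigmoid(e_inv)          # scale features, no new parameters

# y = SimAM()(torch.randn(1, 128, 26, 26))  # output keeps the input shape
```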
Step S302: and performing feature fusion on the feature map according to the target neuron to obtain target feature maps with different scales.
It should be noted that performing feature fusion on the feature map through the target neurons makes the network pay attention to the deep feature space information of the dense citrus and improves the speed and accuracy of the model's feature fusion.
Further, the step S302 includes: performing feature fusion on the feature map according to the target neuron to obtain a first feature map after feature fusion; performing Anchor coordinate matching on the first characteristic diagram according to a preset K-Mean clustering algorithm to obtain a target detection scale; and carrying out multi-scale detection on the first characteristic diagram according to the target detection scale to obtain target characteristic diagrams with different scales.
It should be noted that, because the Anchor sizes of the original network are not suitable for dense targets, the feature map is feature-fused through the target neurons to obtain the first feature map after fusion; a K-means clustering algorithm is then used for Anchor coordinate matching to calculate the Anchor sizes and target detection scales suitable for the first feature map data set, and multi-scale detection is performed on the first feature map according to these Anchor sizes and target detection scales to obtain target feature maps of different scales.
In specific implementation, in the feature fusion module, the K-means method is used to match new Anchor coordinates and additional detection scales are added, further improving the detection precision of the model: the 3 detection scales of the original YOLOv4 network are expanded to 4. The original multi-scale detection structure of YOLOv4 has only three layers and easily misses dense targets, so a feature map with a scale of 104 x 104 is added to the original network structure to reduce the missed detection rate of small targets. Because the Anchor sizes of the original network are not suitable for dense targets, the K-means clustering method is used to calculate 12 Anchor sizes suitable for the first feature map data set, corresponding to the four feature layers (detection scales), with three Anchor boxes per feature layer; these Anchor sizes are applied to the training network. The 12 Anchor boxes have the following sizes: 3 x 4, 5 x 6, 6 x 8, 7 x 11, 8 x 10, 9 x 12, 10 x 14, 11 x 11, 11 x 16, 13 x 15, 14 x 18, 18 x 23. After the 11th CSP layer, the feature map is further upsampled and otherwise processed so that it continues to expand; at the 19th CSP layer, the obtained 52 x 52 feature map is tensor-spliced and fused with the feature map of the 3rd layer of the backbone network to obtain a 104 x 104 feature map, which facilitates the detection of small target objects. After the improvement, the whole model detects with four detection layers (detection scales), further deepening the network so that feature information can be better extracted from deeper layers. This strengthens the model's multi-scale learning capability for dense targets, lets it better learn the multi-level feature information of dense targets, and improves its detection performance in dense scenes.
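For illustration, the sketch below shows IoU-distance K-means anchor clustering of the kind commonly used for YOLO anchors; the median update and random initialization are assumptions of this example, not details given by the patent:

```python
import numpy as np

def kmeans_anchors(wh, k=12, iters=100, seed=0):
    """Cluster labeled-box (width, height) pairs into k anchor sizes using
    1 - IoU as the distance, the usual YOLO anchor-clustering variant."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(wh, dtype=float)
    centers = wh[rng.choice(len(wh), k, replace=False)].copy()
    for _ in range(iters):
        inter = (np.minimum(wh[:, None, 0], centers[None, :, 0]) *
                 np.minimum(wh[:, None, 1], centers[None, :, 1]))
        union = wh[:, None].prod(2) + centers[None, :].prod(2) - inter
        assign = (1 - inter / union).argmin(axis=1)      # nearest center by IoU
        for j in range(k):
            if (assign == j).any():
                centers[j] = np.median(wh[assign == j], axis=0)
    return centers[np.argsort(centers.prod(axis=1))]     # sorted small -> large

# wh = [[w1, h1], [w2, h2], ...] from the labeled training boxes; the 12 sorted
# anchors are then assigned three per detection scale (104, 52, 26, 13).
```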
In this embodiment, image preprocessing is performed on the citrus image to be identified to obtain a processed target citrus image; the target citrus image is sampled based on the deformable convolution network in the preset DS-YOLO network model to obtain deformable sampling points; the weight coefficients of the deformable sampling points are adaptively adjusted according to the target citrus image to obtain the adjusted deformable sampling points; feature extraction is performed on the target citrus image according to the adjusted deformable sampling points to obtain a feature map containing the shape features and position features of the dense citrus; the neuron weights corresponding to the feature map are calculated according to the preset SimAM attention mechanism and the preset energy function, and the target neurons are determined according to the calculation results; feature fusion is performed on the feature map according to the target neurons to obtain target feature maps of different scales, and the coordinates and size of the target prediction frame are determined according to the candidate frame adjustment parameters corresponding to the target feature maps; the positions of the citrus in the image are marked according to the coordinates and size of the target prediction frame, and the quantity of citrus is determined from the marking result. Because this embodiment performs feature fusion on the citrus image based on the preset DS-YOLO network model and the preset SimAM attention mechanism, and detects the quantity of citrus according to the target prediction frames corresponding to the target feature maps of different scales, it improves the detection precision of the model over the prior art, whose unsatisfactory detection of dense citrus leads to low precision, and realizes highly reliable dense citrus quantity detection, remedying the deficiencies of the prior art.
In addition, in order to achieve the above object, the present invention further provides a storage medium, on which a dense citrus quantity detection program is stored, which, when being executed by a processor, implements the steps of the dense citrus quantity detection method as described above.
Referring to fig. 8, fig. 8 is a block diagram of a first embodiment of the dense citrus quantity detection device according to the present invention.
As shown in fig. 8, the dense citrus quantity detection device according to the embodiment of the present invention includes:
the image preprocessing module 10 is configured to perform image preprocessing on a citrus image to be identified to obtain a processed target citrus image;
the feature extraction module 20 is configured to perform feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map including shape features and position features of the dense citrus;
the feature fusion module 30 is configured to perform feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps of different scales;
the citrus detection module 40 is configured to determine the coordinates and size of the target prediction frame according to the candidate frame adjustment parameters corresponding to the target feature map;
the citrus detection module 40 is further configured to position-mark the citrus in the citrus image to be identified according to the coordinates and size of the target prediction frame, and to determine the quantity of citrus in the citrus image to be identified according to the marking result.
In this embodiment, image preprocessing is performed on the citrus image to be identified to obtain a processed target citrus image; feature extraction is performed on the target citrus image based on the deformable convolution network in the preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus; feature fusion is performed on the feature map according to the preset SimAM attention mechanism to obtain target feature maps of different scales; the coordinates and size of the target prediction frame are determined according to the candidate frame adjustment parameters corresponding to the target feature maps; and the citrus in the citrus image to be identified is position-marked according to the coordinates and size of the target prediction frame, with the quantity of citrus determined from the marking result. Because this embodiment performs feature fusion on the citrus image to be identified based on the preset DS-YOLO network model and the preset SimAM attention mechanism, and detects the quantity of citrus in the citrus image to be identified according to the target prediction frames corresponding to the target feature maps of different scales, it improves the detection precision of the model over the prior art, whose detection of dense citrus is not ideal and whose precision is therefore low, and achieves highly reliable dense citrus quantity detection, remedying the deficiencies of the prior art.
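The candidate frame adjustment parameters are the per-anchor offsets the network predicts; the passage names them but not the decoding formula. A minimal sketch, assuming DS-YOLO keeps the standard YOLOv4 decoding, is:

```python
import torch

def decode_boxes(t, anchors, stride):
    """Decode YOLO-style adjustment parameters (tx, ty, tw, th) into boxes.
    t: (B, A, H, W, 4) raw predictions; anchors: (A, 2) tensor in pixels;
    stride: pixels per grid cell. Assumes the standard YOLOv4 scheme."""
    B, A, H, W, _ = t.shape
    gy, gx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    cx = gx.reshape(1, 1, H, W).float()
    cy = gy.reshape(1, 1, H, W).float()
    bx = (torch.sigmoid(t[..., 0]) + cx) * stride                # center x in pixels
    by = (torch.sigmoid(t[..., 1]) + cy) * stride                # center y in pixels
    bw = anchors[:, 0].reshape(1, A, 1, 1) * torch.exp(t[..., 2])  # width from anchor prior
    bh = anchors[:, 1].reshape(1, A, 1, 1) * torch.exp(t[..., 3])  # height from anchor prior
    return torch.stack((bx, by, bw, bh), dim=-1)
```

The decoded boxes of all four scales would then be filtered (e.g., by confidence and non-maximum suppression) before position marking and counting.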
Further, the feature extraction module 20 is further configured to sample the target citrus image based on the deformable convolution network in the preset DS-YOLO network model to obtain deformable sampling points; to adaptively adjust the weight coefficients of the deformable sampling points according to the target citrus image to obtain the adjusted deformable sampling points; and to perform feature extraction on the target citrus image according to the adjusted deformable sampling points to obtain a feature map containing the shape features and position features of the dense citrus.
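A minimal sketch of a deformable convolution with adaptively weighted sampling points (DCNv2-style modulation, which matches the weight-coefficient adjustment described here) might look as follows; the block structure and names are assumptions, not the patent's exact layer:

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class ModulatedDeformBlock(nn.Module):
    """Deformable conv with learned per-sample offsets and modulation weights."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        # 2 offsets (dx, dy) and 1 modulation scalar per kernel sampling point
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=k // 2)
        self.mask = nn.Conv2d(c_in, k * k, k, padding=k // 2)
        self.deform = DeformConv2d(c_in, c_out, k, padding=k // 2)

    def forward(self, x):
        offset = self.offset(x)             # where each sampling point moves
        mask = torch.sigmoid(self.mask(x))  # adaptive weight of each sampling point
        return self.deform(x, offset, mask)
```

Because both the offsets and the modulation mask are predicted from the input, the sampling grid bends toward each fruit's actual contour instead of staying on a rigid square, which is what lets the feature map capture shape and position of densely packed citrus.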
Further, the feature fusion module 30 is further configured to calculate the neuron weights corresponding to the feature map according to the preset SimAM attention mechanism and a preset energy function, and to determine target neurons according to the calculation result; and to perform feature fusion on the feature map according to the target neurons to obtain target feature maps of different scales.
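The SimAM weighting can be illustrated with the published SimAM formulation, in which each neuron's weight is the sigmoid of the inverse of its closed-form minimal energy; low-energy (more distinctive) neurons are amplified. The regularization constant `lam` is a hyperparameter whose exact value this passage does not state:

```python
import torch

def simam(x, lam=1e-4):
    """SimAM attention: weight neurons by inverse closed-form minimal energy.
    x: (B, C, H, W) feature map; follows the published SimAM formulation."""
    n = x.shape[2] * x.shape[3] - 1                # neurons per channel minus target
    mu = x.mean(dim=(2, 3), keepdim=True)
    d = (x - mu).pow(2)                            # squared distance to channel mean
    v = d.sum(dim=(2, 3), keepdim=True) / n        # channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5              # 1/E: low energy = distinct neuron
    return x * torch.sigmoid(e_inv)                # amplify distinctive neurons
```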
Further, the feature fusion module 30 is further configured to perform feature fusion on the feature map according to the target neurons to obtain a first feature map after feature fusion; to perform Anchor coordinate matching on the first feature map according to a preset K-means clustering algorithm to obtain target detection scales; and to perform multi-scale detection on the first feature map according to the target detection scales to obtain target feature maps of different scales.
Further, the dense citrus quantity detection device further comprises a model construction module, configured to input the collected dense citrus images into the original YOLOv4 model to obtain feature maps of preset scales; to stack and convolve the feature maps of preset scales according to a preset SPP network to obtain processed feature maps; to perform feature fusion processing on the processed feature maps based on a path aggregation network to obtain a second feature map after feature fusion; and to construct the DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameters corresponding to that detection scale.
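As a hedged illustration of the SPP stacking-and-convolution step: a YOLOv4-style SPP block max-pools the feature map at several kernel sizes in parallel, stacks the results along the channel dimension, and fuses them with a 1 × 1 convolution. The 5/9/13 kernels follow YOLOv4's usual choice and are an assumption here:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """YOLOv4-style spatial pyramid pooling: parallel max-pools at several
    scales, channel-stacked with the input, then fused by a 1x1 conv."""
    def __init__(self, c_in, c_out, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)
        self.fuse = nn.Conv2d(c_in * (len(kernels) + 1), c_out, 1)

    def forward(self, x):
        return self.fuse(torch.cat([x] + [p(x) for p in self.pools], dim=1))
```

Stride-1 pooling with matching padding keeps the spatial size unchanged, so receptive fields of several sizes are mixed without losing resolution.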
Further, the model construction module is further configured to replace the convolutional layer of the residual unit in the residual module of the DS-YOLO network model with an improved deformable convolutional layer, and to add the preset SimAM attention mechanism to the path aggregation network of the DS-YOLO network model to generate a new DS-YOLO network model; the new DS-YOLO network model is taken as the preset DS-YOLO network model.
Further, the model construction module is further configured to test the preset DS-YOLO network model on a dense citrus test set that has not undergone image preprocessing to obtain a test result; and to evaluate the quality of the test result against a preset average detection precision, determining the precision rate of the test result from the quality evaluation result.
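The preset average detection precision refers to the mAP family of metrics. A minimal sketch of per-class average precision computed from matched detections is shown below; the IoU matching that produces the true-positive flags is assumed to be done elsewhere, and the interpolation is the "all points" scheme of PASCAL VOC 2010+:

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP as the area under the precision-recall curve.
    scores: (N,) detection confidences; is_tp: (N,) bool flags after IoU
    matching (not shown); n_gt: number of ground-truth boxes."""
    order = np.argsort(-scores)                    # rank detections by confidence
    tp = np.cumsum(is_tp[order])
    fp = np.cumsum(~is_tp[order])
    precision = tp / (tp + fp)
    recall = tp / n_gt
    # monotone precision envelope, then sum of delta-recall * precision
    p = np.maximum.accumulate(precision[::-1])[::-1]
    return np.sum(np.diff(np.concatenate(([0.0], recall))) * p)
```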
It should be understood that the above is only an example and does not limit the technical solution of the present invention in any way; in a specific application, a person skilled in the art may configure it as needed, and the present invention is not limited thereto.
It should be noted that the above-described workflows are only exemplary and do not limit the scope of the present invention; in practical applications, a person skilled in the art may select some or all of them according to actual needs to achieve the purpose of this embodiment's solution, and the present invention is not limited herein.
In addition, for technical details not described in detail in this embodiment, reference may be made to the dense citrus quantity detection method provided in any embodiment of the present invention, and details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other like elements in the process, method, article, or system that comprises that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments. In unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The words first, second, third, and so on do not denote any order and may be interpreted as names.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly also by hardware, although in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present invention that contributes to the prior art may be embodied in the form of a software product; the computer software product is stored in a storage medium (e.g., Read-Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A dense citrus quantity detection method is characterized by comprising the following steps:
carrying out image preprocessing on the citrus image to be identified to obtain a processed target citrus image;
performing feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map containing the shape features and the position features of the dense citrus;
performing feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps with different scales;
determining the coordinates and the size of a target prediction frame according to the candidate frame adjusting parameters corresponding to the target feature map;
and marking the position of the citrus in the citrus image to be identified according to the coordinates and the size of the target prediction frame, and determining the quantity of the citrus in the citrus image to be identified according to a marking result.
2. The dense citrus quantity detection method according to claim 1, wherein the step of performing feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus comprises:
sampling the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a deformable sampling point;
adaptively adjusting the weight coefficient of the deformable sampling point according to the target citrus image to obtain an adjusted deformable sampling point;
and performing feature extraction on the target citrus image according to the adjusted deformable sampling point to obtain a feature map containing the shape features and the position features of the dense citrus.
3. The dense citrus quantity detection method according to claim 2, wherein the step of performing feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps of different scales comprises:
calculating the neuron weights corresponding to the feature map according to the preset SimAM attention mechanism and a preset energy function, and determining target neurons according to the calculation result;
and performing feature fusion on the feature map according to the target neurons to obtain target feature maps of different scales.
4. The dense citrus quantity detection method according to claim 3, wherein the step of performing feature fusion on the feature map according to the target neurons to obtain target feature maps of different scales comprises:
performing feature fusion on the feature map according to the target neurons to obtain a first feature map after feature fusion;
performing Anchor coordinate matching on the first feature map according to a preset K-means clustering algorithm to obtain target detection scales;
and performing multi-scale detection on the first feature map according to the target detection scales to obtain target feature maps of different scales.
5. The dense citrus quantity detection method according to claim 4, wherein before the step of performing feature extraction on the target citrus image based on a preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus, the method further comprises:
inputting the collected dense citrus image into an original YOLOv4 model to obtain a feature map with a preset scale;
stacking and convolving the feature maps with preset scales according to a preset SPP network to obtain processed feature maps;
performing feature fusion processing on the processed feature map based on a path aggregation network to obtain a second feature map after feature fusion;
and constructing a DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameter corresponding to the detection scale.
6. The dense citrus quantity detection method according to claim 5, wherein after the step of constructing the DS-YOLO network model based on the detection scale corresponding to the second feature map and the candidate frame adjustment parameter corresponding to the detection scale, the method further comprises:
replacing the convolution layer of the residual error unit of the residual error module in the DS-YOLO network model with an improved deformable convolution layer, and adding the preset SimAM attention mechanism in a path aggregation network of the DS-YOLO network model to generate a new DS-YOLO network model;
and taking the new DS-YOLO network model as the preset DS-YOLO network model.
7. The dense citrus quantity detection method according to claim 6, wherein after the step of taking the new DS-YOLO network model as the preset DS-YOLO network model, the method further comprises:
testing the preset DS-YOLO network model through a dense citrus test set without image preprocessing to obtain a test result;
and performing quality evaluation on the test result according to preset average detection precision, and determining the precision rate of the test result according to the quality evaluation result.
8. Dense citrus quantity detection equipment, characterized in that the equipment comprises: a memory, a processor, and a dense citrus quantity detection program stored on the memory and executable on the processor, wherein the dense citrus quantity detection program, when executed by the processor, implements the steps of the dense citrus quantity detection method according to any one of claims 1 to 7.
9. A storage medium having stored thereon a dense citrus quantity detection program which, when executed by a processor, implements the steps of the dense citrus quantity detection method according to any one of claims 1 to 7.
10. A dense citrus quantity detection device, characterized in that the dense citrus quantity detection device comprises:
the image preprocessing module, configured to perform image preprocessing on a citrus image to be identified to obtain a processed target citrus image;
the feature extraction module, configured to perform feature extraction on the target citrus image based on a deformable convolution network in a preset DS-YOLO network model to obtain a feature map containing the shape features and position features of the dense citrus;
the feature fusion module, configured to perform feature fusion on the feature map according to a preset SimAM attention mechanism to obtain target feature maps of different scales;
the citrus detection module, configured to determine the coordinates and size of a target prediction frame according to the candidate frame adjustment parameters corresponding to the target feature maps;
wherein the citrus detection module is further configured to position-mark the citrus in the citrus image to be identified according to the coordinates and size of the target prediction frame, and to determine the quantity of citrus in the citrus image to be identified according to the marking result.
CN202210405888.1A 2022-04-18 2022-04-18 Intensive citrus quantity detection method, equipment, storage medium and device Pending CN114821102A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210405888.1A CN114821102A (en) 2022-04-18 2022-04-18 Intensive citrus quantity detection method, equipment, storage medium and device


Publications (1)

Publication Number Publication Date
CN114821102A 2022-07-29

Family
ID=82537207


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN115205853A (en) * 2022-09-19 2022-10-18 华中农业大学 Image-based citrus fruit detection and identification method and system
CN115205853B (en) * 2022-09-19 2022-12-27 华中农业大学 Image-based citrus fruit detection and identification method and system
CN115588117A (en) * 2022-10-17 2023-01-10 华南农业大学 Citrus psylla detection method and system based on YOLOv5s-BC
CN116021526A (en) * 2023-02-07 2023-04-28 台州勃美科技有限公司 Agricultural robot control method and device and agricultural robot
CN116958053A (en) * 2023-06-21 2023-10-27 三峡大学 Bamboo stick counting method based on yolov4-tiny
CN116958053B (en) * 2023-06-21 2024-05-14 YOLOv4-tiny-based bamboo stick counting method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination