CN115496979A - Orchard young fruit growth posture visual identification method based on multiple feature fusion - Google Patents

Orchard young fruit growth posture visual identification method based on multiple feature fusion

Info

Publication number
CN115496979A
Authority
CN
China
Prior art keywords
feature
fusion
posture
growth
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211120206.9A
Other languages
Chinese (zh)
Inventor
吕继东
牛亮亮
徐黎明
邹凌
韩颖
戎海龙
许浩
卢文斌
孙晓琴
王凌云
杨洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202211120206.9A priority Critical patent/CN115496979A/en
Publication of CN115496979A publication Critical patent/CN115496979A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection


Abstract

The invention relates to the technical field of image detection, and in particular to a visual identification method for the growth posture of young fruits in an orchard based on multi-feature fusion. The method comprises: collecting data images of orchard young fruits and adjusting the target detection frames in the images; converting the format of the labeled data set and cropping the converted data set as preprocessing; constructing a young fruit growth posture feature extraction model, and deeply fusing the shallow and high-level feature maps of the model with a Bi-FPN network; performing posture frame regression on the fused feature maps with a posture prediction layer and extracting the target region; and training the model on a training data set, storing the posture frame coordinates, and calculating the growth posture angle of the young fruit. The invention provides an effective solution for mechanized, automated and intelligent bagging, ensures timely and efficient bagging of young fruits, and reduces the cost of the bagging operation.

Description

Orchard young fruit growth posture visual identification method based on multiple feature fusion
Technical Field
The invention relates to the technical field of image processing, and in particular to a visual identification method for the growth posture of young fruits in an orchard based on multi-feature fusion.
Background
Bagging is an important technique for producing green, high-quality, premium fruits and vegetables: it effectively reduces bird and insect damage, prevents pesticide pollution, sunburn, wind and rain damage and scratching deformation, and improves the color and luster of the produce. In the planting and production of high-quality fruit, bagging is likewise an indispensable step. However, like picking ripe fruit, bagging young fruit is highly time-sensitive and involves an enormous workload; at present it is performed mainly by hand, or by hand assisted by simple machinery, which is time-consuming, physically demanding and yields uneven bagging quality. Moreover, the agricultural labor force is aging and shrinking, and the cost of manual bagging rises year by year, which raises production costs and weakens market competitiveness.
To bag a young fruit, the bag must be slipped on from the bottom of the fruit upwards, so information on the growth posture is essential. Furthermore, young fruits are small compared with ripe fruits, and their color is close to that of the surrounding branches and leaves, which makes identifying their growth posture particularly difficult.
Disclosure of Invention
To address the shortcomings of existing algorithms, the invention provides an effective solution for mechanized, automated and intelligent bagging that ensures timely and efficient bagging of young fruits and reduces the cost of the bagging operation.
The technical scheme adopted by the invention is as follows: a visual identification method for orchard young fruit growth postures based on multi-feature fusion comprises the following steps:
acquiring a data image of orchard young fruits, adjusting a target detection frame in the image, and enhancing and labeling the image;
further, the target detection frame is adjusted by replacing the horizontal frame with a posture frame carrying an angle parameter using roLabelImg, the posture frame comprising the target center point coordinates, the length and width, and the inclination angle.
Secondly, the labeled data set is converted in format, and the converted data set is cropped as preprocessing;
further, the format conversion expresses the inclination angle of the fruit as the clockwise angle between the long edge of the posture frame and the x-axis.
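By way of illustration only, the following Python sketch converts a roLabelImg-style annotation (center, width, height, angle) into the long-edge convention just described; the function name and the assumption that roLabelImg reports the clockwise angle of the w-edge are hypothetical and not part of the patent:

```python
def to_long_edge(cx, cy, w, h, theta_deg):
    """Convert a roLabelImg-style rotated box (cx, cy, w, h, angle) to the
    long-edge convention: the angle becomes the clockwise angle between the
    LONG edge and the x-axis, normalized into [0, 180)."""
    if w >= h:
        long_side, short_side, angle = w, h, theta_deg
    else:
        # the long edge is the h-edge, rotated 90 degrees from the w-edge
        long_side, short_side, angle = h, w, theta_deg + 90.0
    return cx, cy, long_side, short_side, angle % 180.0

print(to_long_edge(120.0, 80.0, 30.0, 60.0, 20.0))  # (120.0, 80.0, 60.0, 30.0, 110.0)
```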
Constructing a young fruit growth posture feature extraction model, and performing deep fusion on the shallow and high-level feature maps of the feature extraction model with a Bi-FPN network;
further, the feature extraction model comprises a Focus module, a feature extraction module P1, a CBL module, a feature extraction module P2, a CBL module, a feature extraction module P3, a CBL module, a feature extraction module P4, a CBL module and a feature extraction module P5 which are connected in sequence; the feature extraction module P2 consists of 2 BottleneckCSP modules and a CA attention mechanism module, the feature extraction modules P3 and P4 each consist of 8 BottleneckCSP modules and a CA attention mechanism module, and the feature extraction module P5 consists of an SPP module, 2 BottleneckCSP modules and a CA attention mechanism module.
The Focus module slides a 2×2 window with a stride of 2 over the image, sampling pixel values at every other pixel; the values at the same fixed position of each window are gathered into one layer, yielding four sub-images. The four sub-images are concatenated into a new image, and a convolution is applied to the new image to obtain a down-sampled feature map.
The CBL module consists of a convolution, batch normalization and a LeakyReLU activation function.
The SPP module feeds the sampled feature map into pooling layers with different kernel sizes of 1×1, 2×2 and 4×4;
the BottleneckCSP module is used for normalizing two branches after Concat through BatchNormalization and an LeakReLU activation function; wherein, one branch is composed of two CBL layers and a convolution layer; the other is composed of a convolution layer;
further, the Coordinate Information Embedding step of the CA attention mechanism module pools the input feature map globally along the horizontal and the vertical direction, yielding a horizontal and a vertical feature map;
in the Coordinate Attention Generation step, the two feature maps are spliced together and transformed by a convolution into a feature map F_1, which is normalized to obtain a feature map f;

the feature map f is decomposed along the horizontal and vertical directions into f_w ∈ R^{C/r×W} and f_h ∈ R^{C/r×H}, where r is the reduction ratio; 1×1 convolutions applied to f_w and f_h yield the feature maps F_w and F_h, and a sigmoid activation then produces the attention weights g_w and g_h of the feature map in the two spatial directions.
The original feature map is multiplied by attention weights in the horizontal and vertical directions, and the attention in both the horizontal and vertical directions is simultaneously applied to the input features.
Further, the deep fusion of the shallow feature map and the high feature map of the feature extraction model by adopting the Bi-FPN network comprises the following steps:
firstly, fusing a P5 characteristic layer with a P4 characteristic layer through up-sampling; secondly, performing up-sampling and P3 feature layer secondary fusion on the obtained fusion feature information again; finally, performing up-sampling and P2 feature layer three times of fusion on the feature information obtained by the second fusion to complete feature information fusion from top to bottom; in a similar way, firstly, fusing feature information obtained by down-sampling and secondary fusion of the information obtained by the third fusion; secondly, fusing the feature information obtained by down-sampling and first fusing the fusion information; and finally, fusing the fusion information with the P5 characteristic diagram through downsampling.
Performing posture frame regression on the feature map subjected to the fusion processing by adopting a posture prediction layer, and extracting a target region;
furthermore, an angle prediction channel is added to the head structure of the prediction layer; the channel dimension of the head detection layer is 3 × (C + 5 + 180), where 3 means that 3 anchor frames with preset aspect ratios are placed in each grid cell, and each anchor frame predicts the C class channels (C_0, C_1, …, C_n) and the frame parameter information (x, y, w, h, p_r), where p_r denotes the foreground confidence of the prediction box; each anchor box additionally predicts an angle over 180 channels.
Training the model through a training data set, storing coordinates of a posture frame, and calculating a young fruit growth posture angle;
further, the coordinates of the posture frame comprise the four vertex coordinates (x_1, y_1), (x_2, y_2), (x_3, y_3) and (x_4, y_4); the clockwise angle between the long edge and the x-axis is the growth posture angle of the young fruit, calculated as:

θ = arctan( (y_2 − y_1) / (x_2 − x_1) ),  if ‖P_1P_2‖ ≥ ‖P_2P_3‖;
θ = arctan( (y_3 − y_2) / (x_3 − x_2) ),  otherwise,

where P_i = (x_i, y_i) denotes the i-th vertex of the posture frame.
The invention has the beneficial effects that:
1. For the problem of visually identifying the growth posture of young fruits in the orchard growth environment, the posture angle information is obtained together with the position information of the young fruits.
2. An attention mechanism is introduced for the near-color background problem; compared with the same network without the attention mechanism, recognition performance is improved.
3. To further improve the network's ability to recognize smaller targets, a small-target detection layer is added; the growth posture of young peach fruits is identified better, and the missed-detection rate is greatly reduced compared with omitting this layer.
4. The invention provides an effective solution for mechanized, automated and intelligent bagging, ensures timely and efficient bagging of young fruits, reduces the cost of the bagging operation, and also provides a reference for solving the bagging problems of other vegetables and fruits.
Drawings
FIG. 1 is a diagram of a young fruit growth posture feature extraction model according to the present invention;
FIG. 2 is a schematic diagram of a CA attention mechanism model according to the present invention;
FIG. 3 is a diagram of the Bi-FPN structure with the added small-target detection layer according to the present invention;
FIG. 4 is a visualization of the growth posture angle of young peach fruits.
Detailed Description
The invention will be further described with reference to the accompanying drawings and embodiments. The drawings are simplified schematics that illustrate only the basic structure of the invention, and therefore show only the structures relevant to it.
A visual identification method for orchard young fruit growth postures based on multi-feature fusion comprises the following steps:
acquiring a young fruit data image of an orchard, adjusting a target detection frame in the image, and enhancing and labeling the image;
Young fruit data sets are collected at different times of day and under different weather conditions, so that they better approximate the various scenes of the natural growth state and strengthen the adaptability of the network. The image data are then enriched by image enhancement: the contrast enhancement factor is set to 1.5, the brightness is enhanced by a factor of 1.5, and rotation, Gaussian noise and similar operations are applied. The target detection frame is adjusted by replacing the horizontal frame with a posture frame carrying an angle parameter using roLabelImg; the posture frame comprises the target center point coordinates, the length and width, and the inclination angle, yielding a posture information data set.
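A minimal sketch of this enhancement step, assuming the Pillow and NumPy libraries (the patent names no library; the file names, rotation range and noise strength are placeholders):

```python
import random
from PIL import Image, ImageEnhance
import numpy as np

def enhance(path):
    """Apply the augmentations described above: contrast x1.5, brightness x1.5,
    a random rotation, and additive Gaussian noise."""
    img = Image.open(path).convert("RGB")
    img = ImageEnhance.Contrast(img).enhance(1.5)    # contrast enhancement factor 1.5
    img = ImageEnhance.Brightness(img).enhance(1.5)  # brightness enhanced by 1.5
    img = img.rotate(random.uniform(-15, 15))        # rotation range is an assumption
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, 8.0, arr.shape)     # Gaussian noise; sigma is an assumption
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

aug = enhance("young_fruit_0001.jpg")  # placeholder file name
aug.save("young_fruit_0001_aug.jpg")
```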
Secondly, the labeled data set is converted in format, and the converted data set is cropped and segmented as preprocessing;
the format conversion represents the four vertex coordinates of the posture frame by the clockwise angle between the long edge of the posture frame and the x-axis, which expresses the inclination angle of the fruit.
A training set and a test set are constructed from the preprocessed young fruit data set. The preprocessing comprises: segmenting the young fruit images by cropping, enhancing the images with Mosaic data enhancement (applied in accordance with the data format), and then dividing the data into training and validation sets at a ratio of 8:2, as sketched below.
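A minimal sketch of the 8:2 division, assuming the samples are shuffled file paths (the mechanism is not specified in the patent):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle and divide samples into a training and a validation set at 8:2."""
    rng = random.Random(seed)
    samples = samples[:]
    rng.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

train_set, val_set = split_dataset([f"img_{i:04d}.jpg" for i in range(1000)])
print(len(train_set), len(val_set))  # 800 200
```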
Constructing a young fruit growth posture feature extraction model, and performing deep fusion on the shallow and high-level feature maps of the feature extraction model with a Bi-FPN network;
fig. 1 shows the feature extraction model, which is composed of the Focus, CBL (convolution + batch normalization + LeakyReLU), spatial pyramid pooling (SPP), BottleneckCSP and attention modules, and is used to extract features from an image.
The Focus module applies a series of slicing operations to the picture before it enters the backbone network. Specifically, a 2×2 window with a stride of 2 samples a value at every other pixel of the image, and the value at the same fixed position of each window is gathered into one layer, yielding four pictures; the W and H information is thus concentrated into the channel dimension, expanding the input channels by a factor of 4. The concatenated picture therefore has 12 channels instead of the 3 RGB channels of the original image. A convolution is then applied to the new picture, finally producing a 2× down-sampled feature map without information loss, which reduces the computation and speeds up the network.
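A minimal PyTorch sketch of this slicing operation, assuming a 3-channel input; the module and parameter names are illustrative:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice HxW into 4 sub-images (every other pixel), stack them on the
    channel axis (3 -> 12 channels), then convolve to the desired width."""
    def __init__(self, c_in=3, c_out=64):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (B, C, H, W) -> (B, 4C, H/2, W/2); no information is lost
        sliced = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
            dim=1,
        )
        return self.conv(sliced)

out = Focus()(torch.randn(1, 3, 640, 640))
print(out.shape)  # torch.Size([1, 64, 320, 320])
```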
The CBL consists of a convolution, BatchNormalization and a LeakyReLU activation function, and serves to extract effective information from the picture features.
The SPP module feeds the feature map of the previous layer into pooling layers with different kernel sizes of 1×1, 2×2 and 4×4; the fusion of multiple receptive fields improves the model's detection capability in complex scenes.
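A PyTorch sketch of such an SPP block; stride-1 pooling with asymmetric padding is assumed so that the 1×1, 2×2 and 4×4 branches keep the spatial size and can be concatenated (the patent names only the three kernel sizes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    """Size-preserving max pooling with 1x1, 2x2 and 4x4 kernels, concatenated
    and fused by a 1x1 convolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.fuse = nn.Conv2d(3 * c_in, c_out, kernel_size=1)

    @staticmethod
    def _pool(x, k):
        if k == 1:
            return x  # 1x1 max pooling is the identity
        # pad (left, right, top, bottom) so stride-1 pooling preserves H and W
        p = (k - 1) // 2
        x = F.pad(x, (p, k - 1 - p, p, k - 1 - p), value=float("-inf"))
        return F.max_pool2d(x, kernel_size=k, stride=1)

    def forward(self, x):
        return self.fuse(torch.cat([self._pool(x, k) for k in (1, 2, 4)], dim=1))

print(SPP(256, 256)(torch.randn(1, 256, 20, 20)).shape)  # torch.Size([1, 256, 20, 20])
```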
The BottleneckCSP is built from two branches followed by a Concat, BatchNormalization and a LeakyReLU activation function. One branch consists of two CBL layers and a convolution layer, the other of a single convolution layer. The structure incorporates the idea of the residual structure: connecting the two different branches fuses features of different levels and greatly improves the feature extraction capability of the network, as sketched below.
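A PyTorch sketch of the CBL unit and of a BottleneckCSP built from it, following the branch structure described above; the channel widths and the final fusion layer are assumptions:

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Convolution + BatchNorm + LeakyReLU, the basic unit described above."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class BottleneckCSP(nn.Module):
    """Two branches (CBL-CBL-conv and a single conv), concatenated and then
    normalized with BatchNorm + LeakyReLU."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_out // 2
        self.branch1 = nn.Sequential(CBL(c_in, c_mid), CBL(c_mid, c_mid, k=3),
                                     nn.Conv2d(c_mid, c_mid, 1, bias=False))
        self.branch2 = nn.Conv2d(c_in, c_mid, 1, bias=False)
        self.bn = nn.BatchNorm2d(2 * c_mid)
        self.act = nn.LeakyReLU(0.1, inplace=True)
        self.out = CBL(2 * c_mid, c_out)  # final fusion layer: an assumption

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x)], dim=1)
        return self.out(self.act(self.bn(y)))

print(BottleneckCSP(64, 64)(torch.randn(1, 64, 80, 80)).shape)  # [1, 64, 80, 80]
```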
For small-target detection, a prediction head is added on the P2-layer (160×160 pixel) feature map of the backbone network to predict targets as small as 4×4 pixels.
As shown in FIG. 2, the attention mechanism module focuses attention on important areas of the image and ignores irrelevant target areas, resulting in better separation of the target from the background.
The Coordinate Information Embedding step of the CA attention mechanism module pools the input feature map globally along the horizontal and the vertical direction. Specifically, given an input x, each channel is encoded along the horizontal and the vertical coordinate with pooling kernels of sizes (1, W) and (H, 1) respectively, yielding a horizontal and a vertical feature map. Then, in the Coordinate Attention Generation step, the two feature maps are spliced together and transformed by a convolution into a feature map F_1, which is normalized to obtain a feature map f. The feature map f is decomposed along the horizontal and vertical directions into f_w ∈ R^{C/r×W} and f_h ∈ R^{C/r×H}, where r is the reduction ratio; 1×1 convolutions applied to f_w and f_h restore the original number of channels and yield the feature maps F_w and F_h, and a sigmoid activation then produces the attention weights g_w and g_h of the feature map in the two spatial directions. Finally, the original feature map is multiplied by the attention weights in the horizontal and vertical directions, applying attention in both directions to the input features simultaneously. CA attends to the relationships among channels while using precise positional information to capture long-range dependencies; it focuses on the target features and weakens background noise, which matters here because young fruits sit against a near-color background, and it improves the detection result.
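A PyTorch sketch of such a coordinate attention module; the reduction ratio r and the BatchNorm + ReLU used for the normalization step are assumptions:

```python
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    """Coordinate attention as described above: directional global pooling,
    a shared transform, then per-direction sigmoid weights."""
    def __init__(self, channels, r=32):
        super().__init__()
        c_mid = max(8, channels // r)
        self.transform = nn.Sequential(          # shared conv -> F_1 -> normalize -> f
            nn.Conv2d(channels, c_mid, 1, bias=False),
            nn.BatchNorm2d(c_mid),
            nn.ReLU(inplace=True),
        )
        self.conv_w = nn.Conv2d(c_mid, channels, 1)  # f_w -> F_w
        self.conv_h = nn.Conv2d(c_mid, channels, 1)  # f_h -> F_h

    def forward(self, x):
        b, c, h, w = x.shape
        # coordinate information embedding: pool with (H, 1) and (1, W) kernels
        x_h = x.mean(dim=3, keepdim=True)                      # (b, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (b, c, w, 1)
        f = self.transform(torch.cat([x_h, x_w], dim=2))       # splice, convolve, normalize
        f_h, f_w = torch.split(f, [h, w], dim=2)               # decompose back
        g_h = torch.sigmoid(self.conv_h(f_h))                      # weights along H
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # weights along W
        return x * g_h * g_w  # apply both directional attentions at once

print(CoordAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # [1, 64, 32, 32]
```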
As shown in fig. 3, bidirectional (top-down and bottom-up) feature fusion is applied repeatedly to the shallow and high-level feature maps of the feature extraction model through the Bi-FPN network structure, which deeply fuses the P2-P5 feature layers. First, the P5 feature layer is up-sampled and fused with the P4 feature layer; second, the resulting fused feature information is up-sampled again and fused with the P3 feature layer; finally, the feature information obtained from the second fusion is up-sampled and fused a third time with the P2 feature layer, completing the top-down fusion. Symmetrically, the information obtained from the third fusion is down-sampled and fused with the result of the second fusion; that result is down-sampled and fused with the result of the first fusion; and finally that fusion is down-sampled and fused with the P5 feature layer, completing the bottom-up fusion. This makes full use of the shallow information and reduces the negative influence of object scale variation.
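A PyTorch sketch of this bidirectional fusion over the P2-P5 layers; the learnable per-input weights follow Bi-FPN's fast normalized fusion, and the unified channel width is an assumption (the patent specifies only the fusion order):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Bi-FPN-style fast normalized fusion of same-shape feature maps."""
    def __init__(self, n_inputs, channels):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, *feats):
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)
        return self.conv(sum(wi * f for wi, f in zip(w, feats)))

c = 64  # unified channel width (illustrative)
p2, p3, p4, p5 = (torch.randn(1, c, s, s) for s in (160, 80, 40, 20))
# top-down pass
t4 = WeightedFusion(2, c)(p4, F.interpolate(p5, scale_factor=2))  # 1st fusion
t3 = WeightedFusion(2, c)(p3, F.interpolate(t4, scale_factor=2))  # 2nd fusion
t2 = WeightedFusion(2, c)(p2, F.interpolate(t3, scale_factor=2))  # 3rd fusion
# bottom-up pass
b3 = WeightedFusion(2, c)(t3, F.max_pool2d(t2, 2))
b4 = WeightedFusion(2, c)(t4, F.max_pool2d(b3, 2))
b5 = WeightedFusion(2, c)(p5, F.max_pool2d(b4, 2))
print(b5.shape)  # torch.Size([1, 64, 20, 20])
```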
The target region is extracted by performing posture frame regression on the fused feature map with the posture prediction layer;
furthermore, an angle parameter θ_1 is added to the head structure of the posture prediction layer; its dimension comprises 180 angle prediction channels (1, 2, 3, …, 180), which turns the angle regression task into a classification task and thereby enables the prediction of the growth posture angle. The channel dimension of the head detection layer of the orchard young fruit growth posture visual identification network is 3 × (C + 5 + 180), where 3 means that anchor frames with 3 preset aspect ratios are placed in each grid cell, and each anchor frame predicts the C class channels (C_0, C_1, …, C_n) and the frame parameter information (x, y, w, h, p_r), where p_r denotes the foreground confidence of the prediction box; each anchor box additionally predicts an angle over 180 channels.
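A minimal sketch of such a head layer, showing how the 3 × (C + 5 + 180) channel layout arises; the 1×1 convolution and the single "young fruit" class are assumptions:

```python
import torch
import torch.nn as nn

def make_posture_head(c_in, num_classes, num_anchors=3, num_angles=180):
    """1x1 prediction head whose output channels follow the layout described
    above: per anchor, (x, y, w, h, p_r) + C class scores + 180 angle bins."""
    per_anchor = 5 + num_classes + num_angles
    return nn.Conv2d(c_in, num_anchors * per_anchor, kernel_size=1)

head = make_posture_head(c_in=256, num_classes=1)  # one class: young fruit (assumption)
out = head(torch.randn(1, 256, 160, 160))          # P2-sized grid for small targets
print(out.shape)  # torch.Size([1, 558, 160, 160]); 558 = 3 * (1 + 5 + 180)
```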
According to the growth characteristics of young fruits in the natural state of an orchard, the fruits hang downwards in many growth directions and at various angles, which poses a great challenge to intelligent bagging research. To guarantee the bagging of young fruits growing at multiple angles, the target position is detected with a posture frame, and the inclination angle of the fruit is represented by the angle between the long edge of the posture frame and the x-axis; this addresses the problems that the target is hard to find and the bagging angle cannot be determined during young fruit bagging. The young fruit images collected in the unstructured field growth environment have complex backgrounds whose color is similar to that of the branches and leaves, so this is target detection against a near-color background, which makes identification considerably harder. Therefore, the last BottleneckCSP layer of the backbone is replaced by the attention mechanism, which focuses attention on the important regions of the image and ignores irrelevant regions; position information and channel relationships are captured simultaneously, long-range dependencies are obtained, the feature representation of the network is enhanced, and the recognition effect in a complex environment is improved to a certain extent. Moreover, young fruits in an orchard are small, so their identification is a weak and small target detection problem under a complex background, which the small-target prediction head added on the P2 layer addresses.
Fifthly, the orchard young fruit growth posture visual identification network model is trained on the training data set; the resulting pre-trained weights are used to predict on the test set, returning the class name and the confidence; finally, the posture frame coordinates are stored and the growth posture angle is calculated.
As shown in fig. 3, the multi-feature-fusion orchard young fruit growth posture visual identification network detects the posture angle and yields the four vertex coordinates of the target, (x_1, y_1), (x_2, y_2), (x_3, y_3) and (x_4, y_4), counted clockwise from the upper-left corner. To unify the detection direction of posture targets, the clockwise angle between the long edge and the x-axis is taken as the growth posture angle of the young fruit. The angle of the target is computed from the predicted coordinates by formula (1):

θ = arctan( (y_2 − y_1) / (x_2 − x_1) ),  if ‖P_1P_2‖ ≥ ‖P_2P_3‖;
θ = arctan( (y_3 − y_2) / (x_3 − x_2) ),  otherwise,   (1)

where P_i = (x_i, y_i) denotes the i-th vertex of the posture frame. That is, by comparing the side lengths of the posture frame, the inverse trigonometric function arctan is applied to the slope of the longer side, and the resulting angle θ represents the growth posture of the young fruit.
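A minimal Python sketch of formula (1); the normalization of the result into [0, 180) degrees is an assumption, since the patent does not state how negative slopes are handled:

```python
import math

def posture_angle(pts):
    """Growth posture angle from four clockwise vertices [(x1,y1),...,(x4,y4)]:
    arctan of the slope of the longer of the two adjacent edges, mapped into
    [0, 180) degrees (the normalization is an assumption)."""
    (x1, y1), (x2, y2), (x3, y3), _ = pts
    e12 = math.hypot(x2 - x1, y2 - y1)
    e23 = math.hypot(x3 - x2, y3 - y2)
    dx, dy = (x2 - x1, y2 - y1) if e12 >= e23 else (x3 - x2, y3 - y2)
    return math.degrees(math.atan2(dy, dx)) % 180.0

print(posture_angle([(0, 0), (80, 40), (70, 60), (-10, 20)]))  # ~26.57
```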
The experimental results are as follows:
Young peach fruits in an orchard were taken as the research object. To explore the influence of different network models on the identification of the young fruit growth posture angle, the network designed in this experiment was compared with R3Det and R-CenterNet. According to the identification results, the multi-feature-fusion orchard young fruit growth posture visual identification network performs best among the three models. The trained weights were verified on 300 test images; the identification effect is shown in figure 4. The experiment shows that the model achieves the best average precision and the best average accuracy of angle estimation, so the network can effectively identify the growth posture of young peaches.
In light of the foregoing description of the preferred embodiment of the invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the invention is therefore not limited to the content of the specification and must be determined according to the scope of the claims.

Claims (8)

1. A visual identification method for orchard young fruit growth postures based on multi-feature fusion is characterized by comprising the following steps:
step one, acquiring data images of orchard young fruits, adjusting the target detection frames in the images, and enhancing and labeling the images;

step two, converting the format of the labeled data set and cropping the converted data set as preprocessing;

step three, constructing a young fruit growth posture feature extraction model, and performing deep fusion on the shallow and high-level feature maps of the feature extraction model with a Bi-FPN network;

step four, performing posture frame regression on the fused feature map with a posture prediction layer, and extracting the target region;

and step five, training the model on a training data set, storing the posture frame coordinates, and calculating the growth posture angle of the young fruit.
2. The visual identification method for the growth posture of the young fruit of the orchard based on the multi-feature fusion as claimed in claim 1, wherein the target detection frame is adjusted by replacing the horizontal frame with a posture frame carrying an angle parameter using roLabelImg, the posture frame comprising the target center point coordinates, the length and width, and the inclination angle.
3. The visual identification method for the growth posture of the young fruits of the orchard based on the multi-feature fusion as claimed in claim 1, wherein the format conversion expresses the inclination angle of the fruit as the clockwise angle between the long edge of the posture frame and the x-axis.
4. The visual identification method for the growth posture of the young fruit of the orchard based on the multi-feature fusion as claimed in claim 1, wherein the feature extraction model comprises a Focus module, a feature extraction module P1, a CBL module, a feature extraction module P2, a CBL module, a feature extraction module P3, a CBL module, a feature extraction module P4, a CBL module and a feature extraction module P5 which are sequentially connected; the feature extraction module P2 consists of 2 BottleneckCSP modules and a CA attention mechanism module, the feature extraction modules P3 and P4 each consist of 8 BottleneckCSP modules and a CA attention mechanism module, and the feature extraction module P5 consists of an SPP module, 2 BottleneckCSP modules and a CA attention mechanism module.
5. The visual identification method for the growth posture of the young fruits in the orchard based on the multi-feature fusion as claimed in claim 4, wherein the Coordinate Information Embedding step of the CA attention mechanism module pools the input feature map globally along the horizontal and the vertical direction, yielding a horizontal and a vertical feature map;

in the Coordinate Attention Generation step, the two feature maps are spliced together and transformed by a convolution into a feature map F_1, which is normalized to obtain a feature map f;

the feature map f is decomposed along the horizontal and vertical directions into f_w ∈ R^{C/r×W} and f_h ∈ R^{C/r×H}, where r is the reduction ratio; 1×1 convolutions applied to f_w and f_h yield the feature maps F_w and F_h, and a sigmoid activation then produces the attention weights g_w and g_h of the feature map in the two spatial directions;

the original feature map is multiplied by the attention weights in the horizontal and vertical directions, so that attention in both directions is applied to the input features simultaneously.
6. The visual identification method for the growth posture of the young fruit of the orchard based on the multi-feature fusion as claimed in claim 1, wherein the deep fusion of the shallow and high-level feature maps of the feature extraction model with the Bi-FPN network specifically comprises:

firstly, the P5 feature layer is up-sampled and fused with the P4 feature layer; secondly, the resulting fused feature information is up-sampled again and fused with the P3 feature layer; finally, the feature information obtained from the second fusion is up-sampled and fused a third time with the P2 feature layer, completing the top-down feature fusion; symmetrically, the information obtained from the third fusion is down-sampled and fused with the result of the second fusion, that result is down-sampled and fused with the result of the first fusion, and finally that fusion is down-sampled and fused with the P5 feature map, completing the bottom-up feature fusion.
7. The visual identification method for orchard young fruit growth postures based on multi-feature fusion as claimed in claim 1, wherein an angle prediction channel is added to the head structure of the posture prediction layer; the channel dimension of the head detection layer is 3 × (C + 5 + 180), where 3 means that 3 anchor frames with preset aspect ratios are placed in each grid cell, and each anchor frame predicts the C class channels (C_0, C_1, …, C_n) and the frame parameter information (x, y, w, h, p_r), where p_r denotes the foreground confidence of the prediction box; each anchor box additionally predicts an angle over 180 channels.
8. The visual identification method for the growth posture of the young fruit of the orchard based on the multi-feature fusion as claimed in claim 1, wherein the coordinates of the posture frame comprise the four vertex coordinates (x_1, y_1), (x_2, y_2), (x_3, y_3) and (x_4, y_4); the clockwise angle between the long edge and the x-axis is the growth posture angle of the young fruit, calculated as:

θ = arctan( (y_2 − y_1) / (x_2 − x_1) ),  if ‖P_1P_2‖ ≥ ‖P_2P_3‖;
θ = arctan( (y_3 − y_2) / (x_3 − x_2) ),  otherwise,

where P_i = (x_i, y_i) denotes the i-th vertex of the posture frame.
CN202211120206.9A 2022-09-15 2022-09-15 Orchard young fruit growth posture visual identification method based on multiple feature fusion Pending CN115496979A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211120206.9A CN115496979A (en) 2022-09-15 2022-09-15 Orchard young fruit growth posture visual identification method based on multiple feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211120206.9A CN115496979A (en) 2022-09-15 2022-09-15 Orchard young fruit growth posture visual identification method based on multiple feature fusion

Publications (1)

Publication Number Publication Date
CN115496979A 2022-12-20

Family

ID=84468675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211120206.9A Pending CN115496979A (en) 2022-09-15 2022-09-15 Orchard young fruit growth posture visual identification method based on multiple feature fusion

Country Status (1)

Country Link
CN (1) CN115496979A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937314A (en) * 2022-12-23 2023-04-07 南京林业大学 Camellia oleifera fruit growth posture detection method
CN115937314B (en) * 2022-12-23 2023-09-08 南京林业大学 Method for detecting growth posture of oil tea fruits
CN116740704A (en) * 2023-06-16 2023-09-12 安徽农业大学 Wheat leaf phenotype parameter change rate monitoring method and device based on deep learning
CN116740704B (en) * 2023-06-16 2024-02-27 安徽农业大学 Wheat leaf phenotype parameter change rate monitoring method and device based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination