CN112541508A

CN112541508A - Fruit segmentation and recognition method and system and fruit picking robot

Info

Publication number: CN112541508A
Application number: CN202011519247.6A
Authority: CN
Inventors: 贾伟宽; 张中华; 邵文静; 侯素娟; 郑元杰
Original assignee: Shandong Normal University
Current assignee: Shandong Normal University
Priority date: 2020-12-21
Filing date: 2020-12-21
Publication date: 2021-03-23

Abstract

The invention provides a fruit segmentation and recognition method and system and a fruit picking robot, belonging to the technical field of fruit picking robots. The contours of the target fruits in fruit images are labeled; features of target fruits of different scales and of feature-missing target fruits are extracted from the labeled fruit images; the obtained feature maps are passed to a region proposal network, and regions of interest of the same scale are obtained through non-maximum suppression; the fruit confidence, bounding-box coordinates and segmentation mask of each region of interest are predicted through two fully connected layers and a fully convolutional network; the losses between the predicted fruit confidence, bounding-box coordinates and segmentation mask and the labeled values are calculated, the network parameters are updated through gradient backpropagation, and iteration continues until the parameters are stable, yielding a recognition model for segmentation and recognition. The invention realizes an end-to-end detection process with high precision and strong robustness, can effectively segment fruits in an orchard environment with various interferences, and lays a foundation for deploying apple picking robots in practical applications.

Description

Fruit segmentation and recognition method and system and fruit picking robot

Technical Field

The invention relates to the technical field of fruit picking robots, in particular to a fruit segmentation and identification method and system and a fruit picking robot.

Background

Putting fruit picking robots into real use is of great significance for promoting production automation and intelligent management in the fruit and vegetable industry. The vision system is the most basic and important link: whether it can accurately segment target fruits in a complex orchard environment directly affects the operation quality and efficiency of the picking robot. Since intelligent picking emerged in the middle of the last century, recognition algorithms for target fruits have attracted the attention of numerous scholars at home and abroad, and a certain research basis and body of results have accumulated in technical fields such as machine learning and deep learning. However, current segmentation methods have difficulty coping with the various interferences present in the natural environment, such as overlapping fruits, occlusion by branches and leaves, changes in illumination and weather, mixed noise and backgrounds of similar color, so the segmentation effect of each model is greatly limited. Therefore, the detection precision and anti-interference capability of the model need to be further improved, so as to improve the efficiency and stability of the vision system.

The main reasons for the decline in detection performance are that the feature extraction capability of the model itself is insufficient and that the various interferences in the orchard environment cause features of the target fruit such as shape, color and texture to be lost. The remaining features can hardly support the subsequent stages of the model in making correct judgments, so target fruits are falsely detected or missed.

Disclosure of Invention

The invention aims to provide a fruit segmentation and recognition method and system and a fruit picking robot that can effectively improve the segmentation effect of a model on clustered or overlapping target fruits, on target fruits of different scales, and on feature-missing target fruits, and that can adapt to the recognition of different types of target fruits in a complex orchard environment, so as to solve at least one technical problem described in the Background.

In order to achieve the purpose, the invention adopts the following technical scheme:

in one aspect, the present invention provides a fruit segmentation and identification method, including:

step S110: collecting fruit images containing different interferences in an orchard environment, and labeling the contours of the target fruits in the fruit images;

step S120: extracting, from the labeled fruit images, features of target fruits of different scales and of feature-missing target fruits;

step S130: passing the feature map obtained in step S120 to a region proposal network (RPN), and obtaining regions of interest of the same scale through non-maximum suppression;

step S140: predicting the fruit confidence, the bounding-box coordinates and the segmentation mask of the same-scale regions of interest through two fully connected layers and a fully convolutional network;

step S150: calculating the losses between the predicted fruit confidence, bounding-box coordinates and segmentation mask and the labeled values, updating the network parameters through gradient backpropagation, iterating until the parameters are stable to obtain a recognition model, and using the model to segment and recognize fruits in orchard environment images.

Preferably, the step S120 specifically includes:

step S121: inputting the labeled fruit images into a residual network in batches, and performing successive downsampling through convolution operations;

step S122: introducing a feature pyramid network, and integrating the feature representations of all levels of the residual network through a top-down pathway and lateral connections;

step S123: resampling the feature maps of all levels of the feature pyramid network to the same scale and integrating them, aggregating whole-map information through a Gaussian non-local attention mechanism, rescaling the aggregated feature representation back to each level and fusing it again to obtain a balanced feature pyramid, and outputting the balanced feature maps.
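The resampling and re-fusion in step S123 can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the choice of max pooling and nearest-neighbour interpolation, the use of simple averaging for integration, and the use of addition for re-fusion are all assumptions made for illustration. The argument `refined` stands for the output of the attention step.

```python
import numpy as np

def resize_to(x, out_h, out_w):
    """Resize a (C, H, W) map, assuming integer scale factors between levels."""
    c, h, w = x.shape
    if h > out_h:
        # Finer level: max pooling over k x k windows (assumed pooling type).
        k = h // out_h
        return x.reshape(c, out_h, k, out_w, k).max(axis=(2, 4))
    # Same or coarser level: nearest-neighbour interpolation.
    k = out_h // h
    return x.repeat(k, axis=1).repeat(k, axis=2)

def integrate(levels, ref_index=2):
    """Bring every level to the size of the reference level and average them."""
    ref_h, ref_w = levels[ref_index].shape[1:]
    resized = [resize_to(a, ref_h, ref_w) for a in levels]
    return np.mean(resized, axis=0)

def redistribute(refined, levels):
    """Rescale the refined map to each level and add it back (assumed fusion)."""
    return [a + resize_to(refined, a.shape[1], a.shape[2]) for a in levels]
```

With four levels whose side lengths halve from one level to the next, `integrate(levels)` returns a map of the third level's size, and `redistribute` returns four maps with the original level shapes.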

Preferably, the step S130 specifically includes:

step S131: inputting the balanced feature map into the region proposal network, and generating predefined anchor boxes of different sizes and aspect ratios, centered on the original-image coordinates corresponding to each spatial position of the feature map;

step S132: determining positive and negative samples based on the intersection over union (IoU) between each anchor box and all ground-truth boxes, generating the training targets of the region proposal network, and preliminarily predicting the target fruits through a classification branch and a regression branch;

step S133: screening the generated candidate boxes through boundary elimination and non-maximum suppression, selecting the top N candidate boxes by confidence ranking, selecting positive and negative samples in a certain proportion, and inputting them into a RoI Align layer to be sampled to the same size.
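The anchor generation in step S131 can be sketched as follows. The stride, the anchor size and the three aspect ratios are assumed values chosen for illustration; the text only states that different sizes and aspect ratios are used.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, sizes=(32.0,), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2), one set per feature-map position."""
    # Centre of each feature-map cell, mapped back to original-image coordinates.
    xs = (np.arange(feat_w) + 0.5) * stride
    ys = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(xs, ys)
    centres = np.stack([cx.ravel(), cy.ravel()], axis=1)      # (H*W, 2)

    # One (w, h) pair per size/ratio combination; ratio = h / w, area = size**2.
    shapes = np.asarray([(s / np.sqrt(r), s * np.sqrt(r))
                         for s in sizes for r in ratios])     # (A, 2)

    half = shapes[None, :, :] / 2.0                           # (1, A, 2)
    c = centres[:, None, :]                                   # (H*W, 1, 2)
    boxes = np.concatenate([c - half, c + half], axis=2)      # (H*W, A, 4)
    return boxes.reshape(-1, 4)
```

For a 2 × 3 feature map with stride 16, this produces 2 × 3 × 3 = 18 anchors, each with an area of 32 × 32 pixels.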

Preferably, the step S140 specifically includes:

step S141: inputting the regions of interest of the same size into two fully connected branches, which respectively output the probability vector that each candidate box belongs to the target fruit and the corresponding bounding-box offsets;

step S142: predicting the target fruit mask with a fully convolutional network in parallel with the two fully connected branches, outputting a multi-dimensional feature representation for each candidate box, and binarizing it to generate a segmentation map of background and foreground.

Preferably, the step S150 specifically includes:

adding the losses between the predicted fruit confidence, bounding-box coordinates and segmentation mask and the labeled values to obtain the final loss function, performing backpropagation with stochastic gradient descent, continuously optimizing the model parameters until they are stable, and fitting the training data to obtain the recognition model.
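The idea of iterating until the parameters are stable can be illustrated with a toy gradient-descent loop. This is only a sketch: `grad_fn`, the learning rate and the stopping tolerance are hypothetical, and a real training run would compute the gradient of the summed loss on mini-batches of images rather than on a closed-form function.

```python
import numpy as np

def sgd_until_stable(params, grad_fn, lr=0.1, tol=1e-6, max_iters=10_000):
    """Repeat params <- params - lr * gradient until the update is below tol."""
    for step in range(1, max_iters + 1):
        new_params = params - lr * grad_fn(params)
        if np.max(np.abs(new_params - params)) < tol:
            return new_params, step
        params = new_params
    return params, max_iters

# Toy usage: minimise ||p - target||^2, whose gradient is 2 * (p - target).
target = np.array([1.0, -2.0])
solution, steps = sgd_until_stable(np.zeros(2), lambda p: 2.0 * (p - target))
```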

In a second aspect, the present invention provides a fruit segmentation recognition system, comprising:

the fruit image acquisition module is used for collecting fruit images containing different interferences in an orchard environment and labeling the contours of the target fruits in the fruit images;

the first extraction module is used for extracting, from the labeled fruit images, features of target fruits of different scales and of feature-missing target fruits to obtain a feature map;

the second extraction module is used for obtaining, in combination with a region proposal network, regions of interest of the same scale in the feature map through non-maximum suppression;

the prediction module is used for predicting the fruit confidence, the bounding-box coordinates and the segmentation mask of the same-scale regions of interest through two fully connected layers and a fully convolutional network;

and the recognition module is used for calculating the losses between the predicted fruit confidence, bounding-box coordinates and segmentation mask and the labeled values, updating the network parameters through gradient backpropagation, iterating until the parameters are stable to obtain a recognition model, and segmenting and recognizing fruits in orchard environment images.

Preferably, the first extraction module includes:

the sampling unit is used for inputting the labeled fruit images into a residual network and performing successive downsampling through convolution operations;

the feature representation unit is used for introducing a feature pyramid network and integrating the feature representations of all levels of the residual network through a top-down pathway and lateral connections;

and the balancing unit is used for resampling the feature maps of all levels of the feature pyramid network to the same scale and integrating them, aggregating whole-map information through a Gaussian non-local attention mechanism, rescaling the aggregated feature representation back to each level and fusing it again to obtain a balanced feature pyramid, and outputting the balanced feature maps.

Preferably, the second extraction module includes:

the predefined unit is used for inputting the balanced feature map into the region proposal network and generating predefined anchor boxes of different sizes and aspect ratios, centered on the original-image coordinates corresponding to each spatial position of the feature map;

the preliminary prediction unit is used for determining positive and negative samples based on the intersection over union (IoU) between each anchor box and all ground-truth boxes, generating the training targets of the region proposal network, and preliminarily predicting the target fruits through a classification branch and a regression branch;

and the screening unit is used for screening the generated candidate boxes through boundary elimination and non-maximum suppression, selecting the top N candidate boxes by confidence ranking, selecting positive and negative samples in a certain proportion, and sampling them to the same size with the RoI Align layer.

Preferably, the prediction module comprises:

the calculation unit is used for inputting the regions of interest of the same size into the two fully connected branches, which respectively output the probability vector that each candidate box belongs to the target fruit and the corresponding bounding-box offsets;

and the segmentation unit is used for predicting the target fruit mask with a fully convolutional network in parallel with the two fully connected branches, outputting a multi-dimensional feature representation for each candidate box, and binarizing it to generate a segmentation map of background and foreground.

In a third aspect, the present invention provides a fruit picking robot comprising a fruit segmentation identification system as described above.

The invention has the following beneficial effects: an end-to-end detection process can be realized with high precision and strong robustness, a good segmentation effect is achieved under the various interferences present in the orchard environment, and a foundation is laid for further promoting the deployment of apple picking robots in practical applications.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is an apple image collected in different time periods and different interference scenes according to an embodiment of the present invention.

Fig. 2 is a diagram illustrating an effect of different image enhancements applied to the same image according to an embodiment of the present invention.

Fig. 3 is a flowchart of the overall segmentation architecture according to an embodiment of the present invention.

Fig. 4 is a flowchart illustrating a specific implementation of the feature obtaining stage according to an embodiment of the present invention.

Fig. 5 is a schematic diagram of a Gaussian non-local attention mechanism according to an embodiment of the present invention.

Fig. 6 is a diagram of a regional candidate network architecture according to an embodiment of the present invention.

Fig. 7 is an overall flowchart of the trained network in the inference phase according to the embodiment of the present invention.

Fig. 8 is a diagram illustrating the effect of segmenting target fruits in a complex environment according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by way of the drawings are illustrative only and are not to be construed as limiting the invention.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

For the purpose of facilitating an understanding of the present invention, the present invention will be further explained by way of specific embodiments with reference to the accompanying drawings, which are not intended to limit the present invention.

It should be understood by those skilled in the art that the drawings are merely schematic representations of embodiments and that the elements shown in the drawings are not necessarily required to practice the invention.

Example 1

Embodiment 1 of the present invention provides a method for accurately locating and segmenting target fruits, which is suitable for the vision system of an apple picking robot and comprises the following steps:

Step 1: Image acquisition and labeling: in an orchard environment, images containing target fruits under different interferences are collected and labeled using the labelme image annotation software.

Step 2: Image feature acquisition: with respect to the training target, and in order to improve the detection effect of the model on different types of target fruits, a backbone network architecture for feature extraction is designed. For an input image X:

Step 2.1: first, through downsampling operations such as the convolution and pooling of a residual network (ResNet), the semantic capacity at each spatial position of the feature map is gradually enriched, and the feature representations output by each stage of residual blocks are denoted {F₂, F₃, F₄, F₅};

Step 2.2: {F₂, F₃, F₄, F₅} are fused through a top-down pathway and lateral connections to obtain a feature pyramid, denoted {A₂, A₃, A₄, A₅}; the feature map of each level is used separately in subsequent operations, which improves the recognition effect of the model on target fruits of different scales;

Step 2.3: {A₂, A₃, A₄, A₅} are uniformly resampled to the size of A₄ and fused to obtain A_concat, which is input into a Gaussian non-local attention mechanism to obtain A'_concat, aggregating the features of the whole map while suppressing interference factors; A'_concat is then resampled and fused to obtain the final balanced feature representations {P₂, P₃, P₄, P₅}.

Step 3: Region of interest generation: the feature representation of each level in {P₂, P₃, P₄, P₅} is input into a region proposal network (RPN); predefined anchor boxes are generated on the input picture and preliminarily adjusted, candidate boxes are obtained through screening, the corresponding regions of interest (RoIs) on the feature map are cropped, and they are sampled to the same size by a RoI Align layer.

Step 4: Detection result prediction: two fully connected branches predict the fruit confidence and the bounding-box offsets respectively, and a parallel mask branch generates the segmentation mask through a fully convolutional network (FCN).

Step 5: Model training and optimization: training targets are generated from the labeling information, the loss with respect to the predicted values generated by the model is calculated, the model parameters are updated through the gradient backpropagation algorithm, and iterative optimization continues.
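The RoI Align sampling mentioned in step 3 can be sketched with bilinear interpolation. This is a simplified illustration under stated assumptions: one sample is taken at the centre of each output bin, feature-map indices are treated as integer pixel coordinates, and the output size and stride are hypothetical defaults. Library implementations usually average several samples per bin.

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly interpolate a (C, H, W) map at a continuous (y, x) location."""
    _, h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feat[:, y0, x0] + dx * feat[:, y0, x1]
    bottom = (1 - dx) * feat[:, y1, x0] + dx * feat[:, y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feat, box, out_size=7, stride=16.0):
    """Sample a box (x1, y1, x2, y2 in image coordinates) to out_size x out_size."""
    x1, y1, x2, y2 = [v / stride for v in box]   # image -> feature-map coordinates
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((feat.shape[0], out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[:, i, j] = bilinear(feat,
                                    y1 + (i + 0.5) * bin_h,
                                    x1 + (j + 0.5) * bin_w)
    return out
```

Because no coordinate is rounded, boxes of any size map to a fixed-size output without the misalignment introduced by quantized pooling.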

In the embodiment 1 of the invention, the method is suitable for the accurate positioning and segmentation of the target fruits of the apple picking robot, and solves the problem that the visual system of the fruit picking robot is difficult to deal with various interferences in the natural environment. The method is high in precision and strong in robustness, can realize an end-to-end identification process, and is suitable for actual operation of the apple picking robot.

Example 2

Embodiment 2 of the present invention provides a fruit segmentation and recognition system, including:

The first extraction module comprises: the sampling unit, used for inputting the labeled fruit images into a residual network and performing successive downsampling through convolution operations; the feature representation unit, used for introducing a feature pyramid network and integrating the feature representations of all levels of the residual network through a top-down pathway and lateral connections; and the balancing unit, used for resampling the feature maps of all levels of the feature pyramid network to the same scale and integrating them, aggregating whole-map information through a Gaussian non-local attention mechanism, rescaling the aggregated feature representation back to each level and fusing it again to obtain a balanced feature pyramid, and outputting the balanced feature maps.

The second extraction module comprises: the predefined unit, used for inputting the balanced feature map into the region proposal network and generating predefined anchor boxes of different sizes and aspect ratios, centered on the original-image coordinates corresponding to each spatial position of the feature map; the preliminary prediction unit, used for determining positive and negative samples based on the intersection over union (IoU) between each anchor box and all ground-truth boxes, generating the training targets of the region proposal network, and preliminarily predicting the target fruits through a classification branch and a regression branch; and the screening unit, used for screening the generated candidate boxes through boundary elimination and non-maximum suppression, selecting the top N candidate boxes by confidence ranking, selecting positive and negative samples in a certain proportion, and sampling them to the same size with the RoI Align layer.

The prediction module comprises: the calculation unit, used for inputting the regions of interest of the same size into the two fully connected branches, which respectively output the probability vector that each candidate box belongs to the target fruit and the corresponding bounding-box offsets; and the segmentation unit, used for predicting the target fruit mask with a fully convolutional network in parallel with the two fully connected branches, outputting a multi-dimensional feature representation for each candidate box, and binarizing it to generate a segmentation map of background and foreground.

In this embodiment 2, a fruit segmentation and recognition method is implemented based on the fruit segmentation and recognition system. The method comprises steps S110 to S150, and steps S120, S130, S140 and S150 specifically include the sub-steps set out in the Disclosure of Invention.

Example 3

Embodiment 3 of the present invention provides a fruit picking robot including a fruit segmentation and recognition system, where the system is capable of implementing a fruit segmentation and recognition method that includes the following steps:

Step 1: Making the data set. Images containing different interferences are collected in a complex orchard environment and the contours of the target fruits in the images are labeled to produce a data set for subsequent training, validation and testing of the model;

Step 2: Feature acquisition. The picture is input into a combined backbone architecture consisting of a residual network, a feature pyramid network and a balanced feature pyramid, and the features of small-scale and feature-missing target fruits in the picture are fully extracted;

Step 3: Region of interest generation. The feature maps of each level obtained in the above step are passed to a region proposal network, and regions of interest of the same scale are obtained through operations such as non-maximum suppression;

Step 4: Result prediction. Three parallel branches are embedded, and the fruit confidence, bounding-box coordinates and segmentation mask are predicted through two fully connected layers and a fully convolutional network respectively;

Step 5: Model optimization. The loss between the predicted information and the labeled values is calculated, the network parameters are updated through gradient backpropagation, and iterative training and evaluation continue until the model finally becomes stable.

The feature acquisition method in step 2 comprises the following steps:

Step 2.1: the images are input into a residual network in batches and successively downsampled through operations such as convolution, gradually enriching the semantic capacity of the feature maps;

Step 2.2: a feature pyramid network is introduced, and the feature representations of all levels of the residual network are efficiently integrated through a top-down pathway and lateral connections, improving the detection effect of the model on target fruits of different scales, especially small ones;

Step 2.3: the feature maps of all levels of the pyramid are resampled to the same scale and integrated, whole-map information is aggregated through a Gaussian non-local attention mechanism to suppress interferences such as backgrounds of similar color, and the refined feature representation is rescaled back to each level and fused again to obtain a balanced feature pyramid.

The region of interest generation method in step 3 comprises the following steps:

Step 3.1: the balanced feature map of each level is input into the region proposal network, and predefined anchor boxes of different sizes and aspect ratios are generated, centered on the original-image coordinates corresponding to each spatial position of the feature map;

Step 3.2: positive and negative samples are determined based on the intersection over union (IoU) between each anchor box and all ground-truth boxes, the training targets of the region proposal network are generated, and the target fruits are preliminarily predicted through the classification and regression branches;

Step 3.3: the generated candidate boxes are screened through boundary elimination and non-maximum suppression, the top N candidate boxes are selected by confidence ranking, positive and negative samples are selected in a certain proportion, and they are input into the RoI Align layer and sampled to the same size in preparation for subsequent fruit recognition.
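Steps 3.2 and 3.3 rely on the IoU between boxes and on non-maximum suppression. The sketch below shows one common way to compute both. The positive and negative IoU thresholds, the NMS threshold and the value of N are assumed values that the text does not specify, and the boundary-elimination and positive/negative sampling steps are omitted.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in (x1, y1, x2, y2) form."""
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 for positive, 0 for negative and -1 for ignored anchors."""
    best = iou_matrix(anchors, gt_boxes).max(axis=1)
    labels = np.full(len(anchors), -1, dtype=int)
    labels[best < neg_thr] = 0
    labels[best >= pos_thr] = 1
    return labels

def nms_top_n(boxes, scores, iou_thr=0.7, top_n=1000):
    """Greedy NMS in descending score order, keeping at most top_n boxes."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < top_n:
        i = order[0]
        keep.append(int(i))
        ious = iou_matrix(boxes[i:i + 1], boxes[order[1:]])[0]
        order = order[1:][ious <= iou_thr]   # drop boxes overlapping the kept one
    return keep
```

The indices returned by `nms_top_n` would then select the candidate boxes passed on to the RoI Align layer.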

The target fruit prediction method in step 4 comprises the following steps:

Step 4.1: the regions of interest of the same size are input into two fully connected branches, which respectively output the probability vector that each candidate box belongs to the target fruit and the corresponding bounding-box offsets, so that the detection box is regressed more accurately;

Step 4.2: in parallel with the two branches, a fully convolutional network performs the task of target fruit mask prediction; a Km²-dimensional feature representation is output for each candidate box, and during testing it is binarized with a threshold of 0.5 to generate a segmentation map of background and foreground.
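The test-time binarization in step 4.2 can be sketched as below. The sketch assumes that the Km²-dimensional output is arranged as K masks of m × m logits and that `class_index` (a hypothetical parameter) selects the mask of the predicted class; only the 0.5 threshold comes from the text.

```python
import numpy as np

def binarize_mask(mask_logits, class_index, threshold=0.5):
    """mask_logits: (K, m, m) raw mask-branch output for one candidate box."""
    # Per-pixel sigmoid turns logits into foreground probabilities.
    prob = 1.0 / (1.0 + np.exp(-mask_logits[class_index]))
    # 1 marks foreground (fruit), 0 marks background.
    return (prob >= threshold).astype(np.uint8)
```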

The model optimization method in step 5 is as follows: the five losses generated in the above process are added to obtain the final loss function, where the classification loss is calculated with the cross-entropy loss function, the regression loss is calculated with the smooth L1 loss, and for the loss of the mask branch a sigmoid is applied at each spatial position and the average of the cross-entropies of all pixels in the RoI is then taken. Finally, backpropagation is performed with stochastic gradient descent, the model parameters are continuously optimized, and the training data are fitted to obtain the recognition model.
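The three kinds of loss named above can be written out in a few lines of NumPy. This is an illustrative sketch: the text does not list the five terms, so the breakdown into RPN classification, RPN regression, head classification, head regression and mask losses is an assumption, as are the equal weighting and the `beta` parameter of the smooth L1 loss.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy; probs (N, K) are softmax outputs, labels (N,) class ids."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())

def mask_loss(logits, target):
    """Sigmoid at every pixel, then mean binary cross-entropy over the RoI."""
    p = 1.0 / (1.0 + np.exp(-logits))
    bce = -(target * np.log(p + 1e-12) + (1 - target) * np.log(1 - p + 1e-12))
    return float(bce.mean())

def total_loss(rpn_cls, rpn_reg, head_cls, head_reg, mask):
    """Unweighted sum of the five (assumed) loss terms."""
    return rpn_cls + rpn_reg + head_cls + head_reg + mask
```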

In this embodiment 3, the fruit picking robot is used to pick fruits, which requires segmenting and recognizing the fruit targets in a complex orchard environment. In the orchard environment, images of target fruits are collected under different interferences, including different time periods such as early morning, noon and night, as well as overlapping, occlusion, direct light, backlight and rain, as shown in Fig. 1(a) and Fig. 1(b). The collected images are then subjected to image enhancement processing such as fogging, brightness enhancement, contrast reduction, Gaussian noise, impulse noise and Poisson noise, as shown in Fig. 2. The target fruits in the images are labeled to produce a data set for subsequent operations.
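The image enhancement operations listed above can be approximated with simple NumPy transforms. The sketch assumes float images of shape (H, W, 3) with values in [0, 1]; all function names and parameter values are hypothetical, and fogging is approximated by blending towards white.

```python
import numpy as np

def add_fog(img, strength=0.4):
    """Blend the image towards white to imitate fog."""
    return np.clip((1 - strength) * img + strength, 0.0, 1.0)

def brighten(img, factor=1.3):
    return np.clip(img * factor, 0.0, 1.0)

def reduce_contrast(img, factor=0.6):
    """Pull every pixel towards the mean intensity."""
    mean = img.mean()
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)

def add_gaussian_noise(img, rng, sigma=0.05):
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def add_impulse_noise(img, rng, amount=0.02):
    """Salt-and-pepper noise: a fraction of pixels is set to black or white."""
    out = img.copy()
    u = rng.random(img.shape[:2])
    out[u < amount / 2] = 0.0        # pepper
    out[u > 1 - amount / 2] = 1.0    # salt
    return out

def add_poisson_noise(img, rng, peak=255.0):
    """Shot noise: sample photon counts whose mean is the scaled intensity."""
    return np.clip(rng.poisson(img * peak) / peak, 0.0, 1.0)
```

Each function returns a new array of the same shape, so several of them can be chained on one source image to produce multiple augmented copies.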

The overall architecture of the recognition model is shown in Fig. 3 and can be divided into three stages: feature acquisition, RoI generation and result prediction.

1. Feature acquisition stage:

As shown in Fig. 4, an input picture X passes in sequence through three modules at this stage, namely ResNet, FPN and BFP, and the image features are deeply extracted by means of the combined backbone architecture formed by the three modules, wherein:

ResNet feature extraction:

The depth of a convolutional neural network (CNN) has an important influence on the performance of a model, but training a deep network is usually accompanied by vanishing or exploding gradients. ResNet resolves this contradiction well through the design of the residual block. Through techniques such as convolution, pooling and residual learning, the semantic capacity of the deep feature maps is gradually enriched, and the feature representations output by each stage of residual blocks are denoted {F₂, F₃, F₄, F₅};

FPN feature fusion:

Because ResNet applies successive downsampling operations, F₅ contains rich semantic features but is not suitable for detecting small-scale target fruits, and its detection effect on distant-view images is poor. Therefore, {F₂, F₃, F₄, F₅} are fused through a top-down pathway and lateral connections to obtain a feature pyramid whose feature maps are denoted {A₂, A₃, A₄, A₅}. The feature map of each level is used separately in subsequent operations, which improves the recognition effect of the model on target fruits of different scales.
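The top-down pathway with lateral connections can be sketched as follows. The sketch assumes that each level has half the resolution of the previous one, that the lateral connections are 1 × 1 convolutions with hypothetical weight matrices, and that upsampling is nearest-neighbour; the 3 × 3 smoothing convolution often applied after merging is omitted.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, weight):
    """1x1 convolution: weight (C_out, C_in) applied at every spatial position."""
    return np.einsum("oc,chw->ohw", weight, x)

def fpn_top_down(feats, lateral_weights):
    """feats: [F2, F3, F4, F5]; returns [A2, A3, A4, A5] with a common channel count."""
    laterals = [conv1x1(f, w) for f, w in zip(feats, lateral_weights)]
    outs = [laterals[-1]]                      # A5 comes directly from F5
    for lat in reversed(laterals[:-1]):        # then F4, F3, F2
        # Lateral feature plus the upsampled, semantically richer level above it.
        outs.insert(0, lat + upsample2x(outs[0]))
    return outs
```

Each output level therefore combines the fine spatial detail of its own ResNet stage with the semantics propagated down from the coarser stages.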

BFP refining characteristics:

will be { A₂,A₃,A₄,A₅It is sampled uniformly to the same value as A by interpolation or pooling₄The same dimensionality is fused to obtain a fused characteristic diagram A_concat∈R^C×H×WAs shown in formula (1):

wherein A is_concatIs a fused characteristic diagram, R is a real number, C \ H \ W is A_concatThe dimensions of (a) represent the number of channels, height and width in space, respectively. L is the number of feature maps, i.e. { A₂,A₃,A₄,A₅4 layers of feature maps, l is formed by {2,3,4,5}, l_max＝5,l_min＝2。

In formula (1), L is the number of feature maps in the feature pyramid, and l_min and l_max are the indices of the lowest-level and highest-level feature maps in the pyramid. After A_concat is obtained, it is input into a Gaussian non-local attention mechanism, shown schematically in Fig. 5. For the final output feature map E, each spatial position E_i can be expressed as:

Here i is the index of the spatial position whose correlation with every other spatial position is currently being computed, and f is the function that measures the similarity between two spatial positions. It is computed with the embedded Gaussian method, as shown in formula (3):

The three embedding spaces are each implemented with a 1 × 1 convolution, and the result is finally normalized by C(x).

Here θ, φ and g are the three embedding spaces.
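Formulas (2) and (3) are not reproduced in this text. Assuming they follow the usual embedded-Gaussian non-local form, with x standing for A_concat and j running over all spatial positions (both assumptions), they can be written as:

```latex
E_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j)\, g(x_j) \tag{2}

f(x_i, x_j) = e^{\theta(x_i)^{\top} \phi(x_j)} ,
\qquad C(x) = \sum_{\forall j} f(x_i, x_j) \tag{3}
```

With this choice of C(x), the weights f(x_i, x_j) / C(x) form a softmax over j for each position i.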

As shown in Fig. 5, three new feature representations B, C, D ∈ R^(C×H×W) are obtained by three 1 × 1 convolutions, and each is reshaped to R^(C×N), where N = H × W. Multiplying the transpose of B by C gives the correlation matrix S ∈ R^(N×N).

Each element s_ij of the matrix S represents the degree of association between the i-th spatial position and the j-th spatial position, as shown in formula (4):

After the similarity matrix S is obtained, S and D are multiplied as matrices and the product is reshaped to R^(C×H×W). This result is then added element-wise to the input feature representation A to give the feature representation E ∈ R^(C×H×W), as shown in formula (5):
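Formulas (4) and (5) are not reproduced in this text. A reconstruction consistent with the description, assuming that s_ij is normalized by a softmax over j, that no learnable scale factor is applied, and that A denotes A_concat (all three are assumptions), would be:

```latex
s_{ij} = \frac{\exp\left(B_i^{\top} C_j\right)}{\sum_{k=1}^{N} \exp\left(B_i^{\top} C_k\right)} \tag{4}

E_i = \sum_{j=1}^{N} s_{ij}\, D_j + A_i \tag{5}
```

Here B_i, C_j, D_j and A_i denote the columns of the corresponding C × N matrices, that is, the feature vectors at single spatial positions.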

It follows from the above formula that E fuses context information from the whole image. Through the similarity matrix S it aggregates similar feature information and suppresses interference factors, so that fruits with missing features can also be detected well. E is then resampled and fused with {A₂, A₃, A₄, A₅} to obtain the final feature pyramid {P₂, P₃, P₄, P₅}.

2. RoI generation stage:
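The aggregation step described above can be sketched in plain Python on a toy feature map. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, identity embeddings stand in for the three learned 1 × 1 convolutions, and the resampling and fusion with the pyramid levels are not shown.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def nonlocal_attention(A, B, C, D):
    """A, B, C, D are nested lists of shape channels x N (N = H * W).

    Returns (S, E) where S is N x N and
    E[c][i] = sum_j S[i][j] * D[c][j] + A[c][i].
    """
    channels, n = len(A), len(A[0])
    S = []
    for i in range(n):
        # Dot product between position i of B and every position j of C.
        scores = [sum(B[c][i] * C[c][j] for c in range(channels))
                  for j in range(n)]
        S.append(softmax(scores))
    # Weighted sum of D over all positions, plus the input A (residual).
    E = [[sum(S[i][j] * D[c][j] for j in range(n)) + A[c][i]
          for i in range(n)]
         for c in range(channels)]
    return S, E

# Toy input: 2 channels, 3 positions; positions 0 and 1 are alike, 2 differs.
A = [[1.0, 1.0, -1.0],
     [0.5, 0.5, -0.5]]
# Assumption: identity embeddings replace the learned 1x1 convolutions.
S, E = nonlocal_attention(A, A, A, A)
```

In this toy case position 0 attends more strongly to the similar position 1 than to the dissimilar position 2, which is the behaviour the text relies on to aggregate similar features and suppress interference.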

Each P_l ∈ {P₂, P₃, P₄, P₅} is input into the RPN, which is implemented as one 3 × 3 convolution followed by two 1 × 1 convolutions. Centered on each spatial position of P_l, predefined anchor boxes of different sizes and aspect ratios are generated on the corresponding receptive field of the original image, and the two 1 × 1 convolutions preliminarily predict the foreground/background score and the box offsets, as shown in Fig. 6. The candidate boxes produced by the RPN undergo boundary elimination and non-maximum suppression, and positive and negative candidate boxes are retained at a certain sampling ratio. The features of each region of interest are then extracted from the feature level k specified by formula (6) and input into the RoI Align layer, which samples them to a fixed size.
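Two steps of this stage can be sketched in plain Python: greedy non-maximum suppression and the choice of the pyramid level k for a region of interest. Formula (6) is not reproduced in this text, so `roi_level` assumes the level-assignment rule commonly used with feature pyramids, k = floor(k0 + log2(sqrt(w·h) / 224)) with k0 = 4, clamped to levels 2 to 5. The NMS threshold of 0.7 and all function names are likewise assumptions for illustration.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

def roi_level(box, k0=4, canonical=224, k_min=2, k_max=5):
    """Assumed level rule; requires a box with positive width and height."""
    w, h = box[2] - box[0], box[3] - box[1]
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))

boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (300, 300, 748, 748)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)                     # second box duplicates the first
levels = [roi_level(boxes[i]) for i in kept]  # small box -> P2, large -> P5
```

Under this rule small fruits are read from the high-resolution level P₂ and large ones from P₅, which matches the stated aim of handling target fruits of different scales.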

3. Result prediction phase

The fixed-size RoIs generated in the previous stage are input into two fully connected branches, which generate the fruit class confidence and the box offsets respectively, while a parallel FCN generates the segmentation mask of the fruit within each predicted box.

4. Model training and optimization

In embodiment 3 of the present invention, the model is optimized by computing the loss between the predicted values generated by the model and the training targets generated from the annotation information, updating the parameters with the gradient back-propagation algorithm, training iteratively, and evaluating on a validation set to obtain the optimal model, which the network then uses to infer and segment target fruits in an image. A flow chart of the network in the inference stage is shown in Fig. 7.

The loss of the whole model consists mainly of the multi-task losses produced by two multi-task branches, arising in the RPN and in the result prediction stage respectively, as shown in formula (7):

L_final is the total loss used for gradient back-propagation, where L_cls1 and L_reg1 are computed from the predicted values generated by the two 1 × 1 convolutional layers of the RPN and their training targets, and L_cls2, L_reg2 and L_mask arise from the result prediction stage. Of the 5 loss functions in formula (7), L_cls1, L_cls2 and L_mask are computed with the cross-entropy loss function, and L_reg1 and L_reg2 are computed with the smooth L1 loss function.

When training the RPN, the Intersection over Union (IoU) between an anchor box and a ground-truth box is the criterion for judging positive and negative samples: an anchor with IoU greater than 0.7 is a positive sample, an anchor with IoU less than 0.3 is a negative sample, and an anchor with IoU between 0.3 and 0.7 does not participate in training. After the positive and negative samples are determined, random sampling is performed at a positive-to-negative ratio of 1:1, and the training targets of the RPN are generated from the corresponding ground-truth box information. When training the three task branches of the result prediction stage, an IoU threshold is likewise first used as the criterion for judging positive and negative samples, sampling is then performed at a certain ratio (1:3), and finally the corresponding training targets are generated and the loss between the training targets and the predicted values is computed to optimize the model. The segmentation effect of the model is shown in Fig. 8.
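The two loss types named above can be sketched in plain Python. Formula (7) is not reproduced in this text, so `total_loss` assumes an unweighted sum of the five terms; the binary form of the cross-entropy, the `beta` parameter and all function names are also assumptions made for illustration.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for a predicted probability p and a binary label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def mean(values):
    return sum(values) / len(values)

def total_loss(rpn_cls, rpn_reg, head_cls, head_reg, mask):
    """Each argument is a list of (prediction, target) pairs.

    Assumption: the five terms are summed without weights.
    """
    l_cls1 = mean([binary_cross_entropy(p, y) for p, y in rpn_cls])
    l_reg1 = mean([smooth_l1(p, t) for p, t in rpn_reg])
    l_cls2 = mean([binary_cross_entropy(p, y) for p, y in head_cls])
    l_reg2 = mean([smooth_l1(p, t) for p, t in head_reg])
    l_mask = mean([binary_cross_entropy(p, y) for p, y in mask])
    return l_cls1 + l_reg1 + l_cls2 + l_reg2 + l_mask

# Toy values: one sample per term.
loss = total_loss(
    rpn_cls=[(0.9, 1)], rpn_reg=[(0.5, 0.0)],
    head_cls=[(0.8, 1)], head_reg=[(3.0, 1.0)],
    mask=[(0.7, 1)],
)
```

The smooth L1 term behaves quadratically for errors below `beta` and linearly above it, so a few badly placed boxes do not dominate the regression gradient.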
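The RPN sample assignment described above can be sketched in plain Python using the stated thresholds of 0.7 and 0.3 and the 1:1 sampling ratio. The function names and the label encoding (1, 0, -1) are assumptions for illustration, and truncating both groups to the size of the smaller one is a simplification.

```python
import random

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best > pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # between the thresholds: excluded from training
    return labels

def sample_one_to_one(labels, rng):
    """Randomly draw equally many positive and negative anchor indices."""
    pos = [i for i, lab in enumerate(labels) if lab == 1]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n = min(len(pos), len(neg))
    return rng.sample(pos, n), rng.sample(neg, n)

gt = [(0, 0, 100, 100)]
anchors = [(0, 0, 100, 100), (0, 0, 100, 60),
           (200, 200, 300, 300), (400, 400, 500, 500)]
labels = label_anchors(anchors, gt)  # IoUs are 1.0, 0.6, 0.0, 0.0
pos_idx, neg_idx = sample_one_to_one(labels, random.Random(0))
```

The second anchor has an IoU of 0.6 with the ground-truth box, so it falls between the two thresholds and is left out of training.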

In summary, in the fruit segmentation and identification method for the fruit picking robot according to the embodiment of the present invention, RGB images containing different types of interference are collected in an orchard environment and the target fruits in them are labeled. The image is input into the residual network ResNet; a feature pyramid fuses the feature representations of the different levels, which are sampled to the same size and combined, passed to a Gaussian non-local attention mechanism for interference suppression, and then resampled and fused to obtain a balanced feature pyramid. Each feature map in the pyramid is input into the region candidate module, which generates predefined anchor boxes on the input picture and adjusts them preliminarily; candidate boxes are screened by non-maximum suppression, features are extracted from the corresponding feature map, and the RoI Align layer samples them to the same size to obtain the regions of interest. Finally, the class confidence and the box coordinates are generated by two fully connected layers, and the segmentation mask of the fruit is generated by a fully convolutional network.

Mask R-CNN, currently the most mainstream instance segmentation algorithm, extracts deep image features with a neural network and fits nonlinear data, which effectively improves the model's segmentation of clustered or overlapping target fruits. The Feature Pyramid Network (FPN) uses a top-down pathway and lateral connections to construct feature representations of different sizes that carry high-level semantic information, which effectively improves the model's segmentation of target fruits at different scales, especially small ones. The Gaussian non-local attention mechanism can be embedded into a network with very few parameters; it aggregates similar feature information across the whole image and suppresses interference factors, which effectively improves the model's segmentation of target fruits with missing features. The embodiment of the invention integrates these technical designs and is suitable for identifying different types of target fruits in a complex orchard environment.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Although the present disclosure has been described with reference to the specific embodiments shown in the drawings, it is not intended to limit the scope of the present disclosure, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive faculty based on the technical solutions disclosed in the present disclosure.

Claims

1. A fruit segmentation and recognition method, characterized by comprising the following steps:

2. The fruit segmentation and recognition method according to claim 1, wherein the step S120 specifically comprises:

3. The fruit segmentation and recognition method according to claim 2, wherein the step S130 specifically comprises:

4. The fruit segmentation and recognition method according to claim 3, wherein the step S140 specifically comprises:

5. The fruit segmentation and recognition method according to claim 4, wherein the step S150 specifically comprises:

6. A fruit segmentation identification system, comprising:

7. The fruit segmentation identification system according to claim 6, characterized in that the first extraction module comprises:

8. The fruit segmentation recognition system of claim 7, wherein the second extraction module comprises:

and a screening unit, configured to screen the generated candidate boxes through boundary elimination and non-maximum suppression, then select the top N candidate boxes ranked by confidence, select positive and negative samples at a certain ratio, and sample them to the same size with a RoIAlign layer.

9. The fruit segmentation identification system according to claim 8, wherein the prediction module comprises:

10. A fruit picking robot comprising a fruit segmentation identification system according to any one of claims 6 to 8.