CN115294472A - Fruit yield estimation method, model training method, device, and storage medium

Fruit yield estimation method, model training method, device, and storage medium

Info

Publication number: CN115294472A
Application number: CN202210778457.XA
Authority: CN (China)
Prior art keywords: fruit, image, sample image, target detection, images
Legal status: Pending (assumed; not a legal conclusion)
Inventors: 叶全洲, 林涌海, 王宏乐, 谢辉, 邓烈, 王兴林
Current assignees (listing may be inaccurate): Shenzhen Wugu Network Technology Co., Ltd.; Shenzhen Zhinong Intelligent Technology Co., Ltd.
Original assignees: Shenzhen Wugu Network Technology Co., Ltd.; Shenzhen Zhinong Intelligent Technology Co., Ltd.
Application filed by Shenzhen Wugu Network Technology Co., Ltd. and Shenzhen Zhinong Intelligent Technology Co., Ltd.
Priority: CN202210778457.XA
Publication of CN115294472A


Classifications

    • G06V 20/17: Scenes; terrestrial scenes taken from planes or by drones
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06Q 50/02: ICT specially adapted for agriculture, fishing, forestry, mining
    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/13: Segmentation; edge detection
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/20081: Indexing scheme for image analysis; training or learning
    • G06V 2201/07: Indexing scheme for image or video recognition; target detection


Abstract

The application discloses a fruit yield estimation method, a model training method, a device, and a storage medium, belonging to the technical field of agricultural automation. The method comprises the following steps: acquiring a plurality of images of a fruit planting area, the images being captured by an image acquisition device at a plurality of sampling points in the area; inputting each of the plurality of images into a target detection model and identifying the fruits in each image through the model to obtain a fruit recognition result for each image, wherein the backbone network of the target detection model is a modified Darknet generated by replacing several residual blocks in Darknet with CSP blocks; and determining the fruit yield of the fruit planting area according to the fruit recognition results of the plurality of images. This yield estimation approach is efficient and convenient: it saves labor cost and improves both working efficiency and recognition accuracy.

Description

Fruit yield estimation method, model training method, equipment and storage medium
Technical Field
The application relates to the technical field of agricultural automation, and in particular to a fruit yield estimation method, a model training method, a device, and a storage medium.
Background
China is a major fruit-producing country, and the fruit tree industry has become a pillar industry in many areas. To improve the level of orchard management, the fruit yield of a fruit planting area needs to be counted. For example, estimating fruit yield during fruit tree growth allows production management and sales strategy to be adjusted in time. However, most existing orchards estimate fruit yield manually, which suffers from low efficiency, safety risks, and poor accuracy.
Disclosure of Invention
The application provides a fruit yield estimation method, a model training method, equipment and a storage medium, which can save labor cost and improve fruit yield estimation efficiency and accuracy. The technical scheme is as follows:
in a first aspect, there is provided a fruit yield estimation method, the method comprising:
acquiring a plurality of images of a fruit planting area, wherein the images are obtained by capturing images at a plurality of sampling points in the fruit planting area through an image acquisition device;
inputting each of the plurality of images into a target detection model and identifying the fruits in each image through the target detection model to obtain a fruit recognition result for each image, wherein the backbone network of the target detection model is a modified Darknet, and the modified Darknet is generated by replacing several residual blocks in Darknet with cross-stage partial (CSP) blocks;
and determining the fruit yield of the fruit planting area according to the fruit identification result of each image in the plurality of images.
Optionally, the fruit recognition result of each image includes the fruit positions and the fruit count in that image, and the determining the fruit yield of the fruit planting area according to the fruit recognition results of the plurality of images includes (summarized by the formula below):
determining the total fruit count of the fruit planting area according to the fruit count in each of the plurality of images;
determining the average single-fruit mass of the fruit planting area according to the fruit positions and the fruit count in each of the plurality of images;
determining the fruit yield of the fruit planting area according to the total fruit count of the fruit planting area and the average single-fruit mass.
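As a hedged reading of the above (the exact extrapolation from the sampled trees to the whole planting area is not spelled out in this section), the yield computation is a simple product:

```latex
% Yield Y of the planting area: total fruit count times average single-fruit mass.
Y = N_{\mathrm{total}} \times \bar{m}
```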
Optionally, the determining the average single-fruit mass of the fruit planting area according to the fruit positions and the fruit count in each of the plurality of images comprises:
determining depth information of the fruits in each image according to the fruit positions in each of the plurality of images;
determining the mass of the fruits in each image according to the depth information of the fruits in each image;
determining the average single-fruit mass of the fruit planting area according to the fruit masses in each of the plurality of images and the fruit count in each image.
Optionally, the determining the mass of the fruits in each image according to the depth information of the fruits in each image includes:
determining the volume of the fruits in each image according to the depth information of the fruits in each image;
and taking the volume of each fruit in each image as an input of a regression model and determining the fruit's mass through the regression model, wherein the regression model is used for determining the mass of any fruit according to the fruit's volume (a sketch follows).
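A minimal sketch of such a volume-to-mass regression, assuming scikit-learn and a simple linear fit; the patent does not specify the regression form, and the calibration numbers below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical calibration pairs measured offline: fruit volume (cm^3) -> mass (g).
volumes = np.array([[180.0], [210.0], [250.0], [300.0], [340.0]])
masses = np.array([150.0, 176.0, 210.0, 252.0, 285.0])

reg = LinearRegression().fit(volumes, masses)

# Estimate the mass of a detected fruit whose volume was derived from depth data.
print(float(reg.predict(np.array([[275.0]]))[0]))
```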
Optionally, before the obtaining the plurality of images of the fruit growing area, the method further comprises:
and determining the plurality of sampling points from the fruit planting area according to the planting area and the distribution condition of the fruit planting area.
Optionally, the acquiring a plurality of images of the fruit planting area includes:
if the fruits planted in the fruit planting area are of a first type, capturing images at the plurality of sampling points through an image acquisition device carried by an orchard operation vehicle to obtain the plurality of images;
if the fruits planted in the fruit planting area are of a second type, capturing images at the plurality of sampling points through an image acquisition device carried by an unmanned aerial vehicle to obtain the plurality of images.
Optionally, the capturing images at the plurality of sampling points through the image acquisition device carried by the unmanned aerial vehicle includes:
planning a route, a waypoint and a flying height of the unmanned aerial vehicle according to the plurality of sampling points;
adjusting an image acquisition angle of image acquisition equipment carried by the unmanned aerial vehicle;
and controlling the unmanned aerial vehicle to fly along the route at the flying height, and capturing images through the carried image acquisition device when the unmanned aerial vehicle reaches a planned waypoint.
Optionally, the target detection model is a YOLO network model; the YOLO network model includes the modified Darknet, an attention mechanism module, and a multi-scale fusion detection module, wherein the modified Darknet includes n CSP blocks and n is a positive integer.
Optionally, the attention mechanism module is a convolution block attention module CBAM, and the multi-scale fusion detection module is a feature pyramid detection network FPN.
Optionally, the multi-scale fusion detection module includes a plurality of target detection modules with different scales, and at least one target detection module of the plurality of target detection modules has a receptive field block RFB introduced therein.
Optionally, the multi-scale fusion detection module introduces a spatial attention mechanism.
In a second aspect, a model training method is provided, which includes:
obtaining a sample image set, wherein a sample image in the sample image set is an image of a sample fruit planting area;
marking the fruit position in each sample image included in the sample image set to obtain a marked image set;
training a target detection model to be trained according to the labeled image set to obtain a target detection model, wherein the target detection model is used for identifying fruits in any image, the backbone network of the target detection model to be trained is a modified Darknet53 network, and the modified Darknet53 network is generated by replacing several residual blocks in Darknet53 with cross-stage partial (CSP) blocks.
Optionally, the acquiring a sample image set comprises:
determining a plurality of sampling points of the sample fruit planting area to obtain a plurality of sample sampling points;
carrying out image acquisition on the plurality of sample sampling points through image acquisition equipment to obtain an initial sample image set;
and performing data enhancement processing on the initial sample image set to obtain the sample image set.
Optionally, the performing data enhancement processing on the initial sample image set includes:
generating a sample image simulating a first occlusion scene according to the sample image in the initial sample image set, wherein the first occlusion scene is a scene in which a fruit is occluded by other fruits;
and generating a sample image simulating a second occlusion scene according to the sample image in the initial sample image set, wherein the second occlusion scene is a scene in which a fruit is occluded by a background object.
Optionally, the generating a sample image simulating a first occlusion scenario according to the sample image in the initial sample image set includes:
for a first sample image in the initial sample image set, performing contour edge extraction on a target fruit in the first sample image to obtain a first fruit contour, wherein the first sample image is any sample image in the initial sample image set;
overlapping a first fruit contour with a second fruit contour, wherein the second fruit contour is obtained by copying the first fruit contour or is the fruit contour of other fruits except the target fruit;
removing an intersection pixel point set of the first fruit contour and the second fruit contour from the pixel point set of the first fruit contour to obtain a first pixel point set;
merging the first pixel point set and the pixel point set of the second fruit outline to obtain a second pixel point set;
and splicing the second pixel point set with any sample image in the initial sample image set to obtain a new sample image.
Optionally, the generating a sample image simulating a second occlusion scenario according to the sample image in the initial sample image set includes:
for a second sample image in the initial sample image set, replacing the fruit region in the second sample image with a background region from any sample image in the initial sample image set to obtain a new sample image, wherein the second sample image is any image in the initial sample image set.
Optionally, the training a target detection model to be trained according to the labeled image set to obtain a target detection model includes:
taking each marked image in the marked image set as the input of the target detection model to be trained, and identifying the fruit position in each marked image through the target detection model to be trained;
and adjusting the model parameters of the target detection model to be trained according to the position error between the fruit position identified in each labeled image and the labeled fruit position, and taking the target detection model to be trained after the parameter adjustment as the target detection model.
Optionally, the target detection model to be trained is obtained by modifying a YOLO V3 network model; the modified model includes a modified Darknet53 network with n CSP blocks, an attention mechanism module, and a multi-scale fusion detection module, where n is a positive integer.
Optionally, the attention mechanism module is a convolution block attention module CBAM, and the multi-scale fusion detection module is a feature pyramid detection network FPN module.
Optionally, the multi-scale fusion detection module includes a plurality of target detection modules with different scales, and an RFB module is incorporated in at least one of the plurality of target detection modules.
Optionally, the multi-scale fusion detection module introduces a spatial attention mechanism.
In a third aspect, there is provided a fruit yield estimation apparatus, the apparatus comprising:
the device comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a plurality of images of a fruit planting area, and the images are obtained by respectively acquiring a plurality of sampling points of the fruit planting area through image acquisition equipment;
the identification module is used for inputting each of the plurality of images into a target detection model and identifying the fruits in each image through the target detection model to obtain a fruit recognition result for each image, wherein the backbone network of the target detection model is a modified Darknet, and the modified Darknet is generated by replacing several residual blocks in Darknet with cross-stage partial (CSP) blocks;
the first determining module is used for determining the fruit yield of the fruit planting area according to the fruit identification result of each image in the images.
Optionally, the fruit recognition result of each image includes the fruit positions and the fruit count in that image, and the first determining module includes:
a first determining unit, configured to determine the total fruit count of the fruit planting area according to the fruit count in each of the plurality of images;
a second determining unit, configured to determine the average single-fruit mass of the fruit planting area according to the fruit positions and the fruit count in each of the plurality of images;
a third determining unit, configured to determine the fruit yield of the fruit planting area according to the total fruit count and the average single-fruit mass.
Optionally, the second determining unit is configured to:
determine depth information of the fruits in each image according to the fruit positions in each of the plurality of images;
determine the mass of the fruits in each image according to the depth information of the fruits in each image;
determine the average single-fruit mass of the fruit planting area according to the fruit masses in each of the plurality of images and the fruit count in each image.
Optionally, the second determining unit is configured to:
determine the volume of the fruits in each image according to the depth information of the fruits in each image;
and take the volume of each fruit in each image as an input of a regression model and determine the fruit's mass through the regression model, wherein the regression model is used for determining the mass of any fruit according to the fruit's volume.
Optionally, the apparatus further comprises:
and the second determining module is used for determining the plurality of sampling points from the fruit planting area according to the planting area and the distribution condition of the fruit planting area.
Optionally, the acquisition module includes:
a first acquisition unit, configured to capture images at the plurality of sampling points through an image acquisition device carried by an orchard operation vehicle if the fruits planted in the fruit planting area are of a first type, obtaining the plurality of images;
and a second acquisition unit, configured to capture images at the plurality of sampling points through an image acquisition device carried by an unmanned aerial vehicle if the fruits planted in the fruit planting area are of a second type, obtaining the plurality of images.
Optionally, the second obtaining unit is configured to:
planning a route, a waypoint and a flying height of the unmanned aerial vehicle according to the plurality of sampling points;
adjusting an image acquisition angle of image acquisition equipment carried by the unmanned aerial vehicle;
and controlling the unmanned aerial vehicle to fly along the route at the flying height, and capturing images through the carried image acquisition device when the unmanned aerial vehicle reaches a planned waypoint.
Optionally, the target detection model is a YOLO network model; the YOLO network model includes the modified Darknet, an attention mechanism module, and a multi-scale fusion detection module, wherein the modified Darknet includes n CSP blocks and n is a positive integer.
Optionally, the attention mechanism module is a convolution block attention module CBAM, and the multi-scale fusion detection module is a feature pyramid detection network FPN.
Optionally, the multi-scale fusion detection module includes a plurality of target detection modules with different scales, and at least one target detection module of the plurality of target detection modules has a receptive field block RFB introduced therein.
Optionally, the multi-scale fusion detection module introduces a spatial attention mechanism.
In a fourth aspect, there is provided a model training apparatus, comprising:
the device comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a sample image set, and a sample image in the sample image set is an image of a sample fruit planting area;
the marking module is used for marking the fruit position in each sample image included in the sample image set to obtain a marked image set;
the training module is used for training a target detection model to be trained according to the labeled image set to obtain a target detection model, wherein the target detection model is used for identifying fruits in any image, the backbone network of the target detection model to be trained is a modified Darknet53 network, and the modified Darknet53 network is generated by replacing several residual blocks in Darknet53 with cross-stage partial (CSP) blocks.
Optionally, the obtaining module includes:
the determining unit is used for determining a plurality of sampling points of the sample fruit planting area to obtain a plurality of sample sampling points;
the acquisition unit is used for acquiring images of the plurality of sample sampling points through image acquisition equipment to obtain an initial sample image set;
and the data enhancement unit is used for carrying out data enhancement processing on the initial sample image set to obtain the sample image set.
Optionally, the data enhancement unit is configured to:
generating a sample image simulating a first occlusion scene according to the sample image in the initial sample image set, wherein the first occlusion scene is a scene in which a fruit is occluded by other fruits;
and generating a sample image simulating a second occlusion scene according to the sample image in the initial sample image set, wherein the second occlusion scene is a scene in which a fruit is occluded by a background object.
Optionally, the data enhancement unit is configured to:
for a first sample image in the initial sample image set, performing contour edge extraction on a target fruit in the first sample image to obtain a first fruit contour, wherein the first sample image is any sample image in the initial sample image set;
overlapping a first fruit contour with a second fruit contour, wherein the second fruit contour is obtained by copying the first fruit contour or is the fruit contour of other fruits except the target fruit;
removing an intersection pixel point set of the first fruit contour and the second fruit contour from the pixel point set of the first fruit contour to obtain a first pixel point set;
merging the first pixel point set and the pixel point set of the second fruit outline to obtain a second pixel point set;
and splicing the second pixel point set with any sample image in the initial sample image set to obtain a new sample image.
Optionally, the data enhancement unit is configured to:
for a second sample image in the initial sample image set, replace the fruit region in the second sample image with a background region from any sample image in the initial sample image set to obtain a new sample image, wherein the second sample image is any image in the initial sample image set.
Optionally, the training module is to:
taking each marked image in the marked image set as the input of the target detection model to be trained, and identifying the fruit position in each marked image through the target detection model to be trained;
and adjusting the model parameters of the target detection model to be trained according to the position error between the fruit position identified in each labeled image and the labeled fruit position, and taking the target detection model to be trained after the parameter adjustment as the target detection model.
Optionally, the target detection model to be trained is obtained by modifying a YOLO V3 network model; the modified model includes a modified Darknet53 network with n CSP blocks, an attention mechanism module, and a multi-scale fusion detection module, where n is a positive integer.
Optionally, the attention mechanism module is a convolution block attention module CBAM, and the multi-scale fusion detection module is a feature pyramid detection network FPN module.
Optionally, the multi-scale fusion detection module includes a plurality of target detection modules with different scales, and an RFB module is incorporated in at least one of the plurality of target detection modules.
Optionally, the multi-scale fusion detection module introduces a spatial attention mechanism.
In a fifth aspect, there is provided a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the fruit yield estimation method or the model training method described above.
In a sixth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the fruit yield estimation method or the model training method described above.
In a seventh aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of the fruit yield estimation method or model training method described above.
Compared with the prior art, the embodiments of the application have the following advantages:
In the embodiment of the application, an image acquisition device captures images at a plurality of sampling points in the fruit planting area, a target detection model performs fruit recognition on the captured images, and the fruit yield is estimated automatically from the recognition results. This estimation approach is efficient and convenient, saves labor cost, and improves working efficiency. Moreover, the backbone network of the target detection model is a modified Darknet generated by replacing several residual blocks in Darknet with cross-stage partial (CSP) blocks, which reduces the computation of the target detection model while preserving recognition accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The following drawings show only some embodiments of the present application; those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a schematic diagram of a CSP Block provided by an embodiment of the present application;
fig. 2 is a schematic diagram of a YOLO network model according to an embodiment of the present application;
FIG. 3 is a flow chart of a model training process provided by an embodiment of the present application;
FIG. 4 is a flow chart of a fruit yield estimation method provided by the embodiment of the present application;
FIG. 5 is a schematic diagram of a fruit yield estimation system according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a fruit yield estimation device provided in an embodiment of the present application;
FIG. 7 is a block diagram of a model training apparatus provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It should be understood that "a plurality" in this application means two or more. In the description of the present application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" merely describes an association between objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, for clarity in describing the technical solutions of the present application, the terms "first", "second", and the like are used to distinguish identical or similar items with substantially the same functions and effects. Those skilled in the art will appreciate that these terms do not denote any order, quantity, or importance.
Before explaining the embodiments of the present application in detail, an application scenario of the embodiments of the present application will be explained.
Generally, fruit crops are planted on areas ranging from a few acres to thousands of acres. The traditional yield estimation method relies on sampling statistics based on manual experience; it consumes substantial manpower and material resources and suffers from small survey areas, low efficiency, low accuracy, high labor intensity, and a strong dependence on expertise. These limitations restrict the operation, prevent the actual yield of an orchard from being reflected accurately and quickly, seriously hinder the formulation and guidance of scientific decisions by government departments, enterprises, and growers, and adversely affect later sales guidance, crop layout adjustment, and other links.
In the embodiment of the application, the method for carrying out image acquisition on a plurality of sampling points of a fruit planting area through the image acquisition equipment is provided, fruit identification is carried out on the acquired image through the target detection model, the fruit yield is automatically estimated according to the fruit identification result, the estimation mode is efficient and convenient, the accuracy rate is high, the labor cost is saved, and the working efficiency is improved.
Furthermore, target detection methods have not previously been applied to the statistical estimation of fruit crop yield. With the wide application of deep convolutional neural network technology in agriculture, great advances have been made in agricultural cultivation, plant protection, production measurement, and other aspects, greatly improving the efficiency of agricultural production and markedly reducing production costs. In the embodiment of the application, a target detection model whose backbone network is a modified Darknet is used for fruit recognition, the modified Darknet being generated by replacing several residual blocks in Darknet with CSP (Cross Stage Partial) blocks, which improves recognition accuracy while reducing the model's computation.
Next, a target detection model according to an embodiment of the present application will be described in detail.
The target detection model used by the fruit yield estimation method provided by the embodiment of the application is used for identifying the fruits existing in any image to obtain a fruit identification result. That is, the input of the target detection model is an image, and the output is a fruit recognition result. The fruit identification result may include information such as the fruit position and the fruit number in the image, or may include other information such as the fruit type.
The target detection model is a convolutional neural network model. To improve the accuracy of fruit recognition, the target detection model used in the embodiment of the present application is obtained by modifying a YOLO (You Only Look Once) network model. The backbone network of the modified YOLO network model is a modified Darknet, obtained by modifying Darknet (a deep learning framework).
The core idea of the YOLO network model is to take the whole image as the network input and directly regress, at the output layer, the positions of bounding boxes and the categories they belong to. The YOLO network model may be YOLO V1, YOLO V2, YOLO V3, YOLO V4, or the like. The modified Darknet may be, for example, a modified Darknet53 obtained by modifying Darknet53. For example, the target detection model is obtained by modifying a YOLO V3 network model, and the backbone network of the modified YOLO V3 network model is a modified Darknet53.
Modification point 1 of the YOLO network model: a modified Darknet is generated by replacing several ResNet blocks (residual blocks) in the Darknet backbone with CSP blocks.
The Darknet network borrows the residual-connection design idea of ResNet. This structure allows deeper networks to be built and alleviates problems such as vanishing and exploding gradients, but duplicated gradient information greatly increases the amount of computation.

To reduce the computation caused by duplicated gradient information, in modifying Darknet the embodiment of the application first replaces several ResNet blocks in Darknet with CSP blocks. By integrating the gradient changes into the feature map from end to end, a CSP block reduces computation while preserving accuracy.

A CSP block is obtained by applying the CSP idea to a ResNet block. The CSP idea is to split the feature map into two parts: one part undergoes the convolution operations, and the other part is concatenated with the result of those convolutions. Applying the CSP idea inside ResNet blocks reduces the computation caused by duplicated gradient information.
Referring to fig. 1, fig. 1 is a schematic diagram of a CSP block according to an embodiment of the present disclosure. As shown in fig. 1, the CSP block is generated by applying the CSP concept on the basis of the ResNet block: part of the output of the base layer is convolved, and the other part is processed by n ResNet blocks and then concatenated with the result of the convolution of the first part.
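A minimal sketch of such a CSP block, assuming PyTorch; the channel sizes and activation choices are assumptions, since the patent does not publish exact layer configurations:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Darknet-style residual block: 1x1 reduce, 3x3 expand, skip connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, 1, bias=False)
        self.conv2 = nn.Conv2d(channels // 2, channels, 3, padding=1, bias=False)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return x + self.conv2(self.act(self.conv1(x)))

class CSPBlock(nn.Module):
    """Split the base-layer output into two paths: one goes through n residual
    blocks, the other through a plain 1x1 convolution; the two results are
    concatenated and fused, so gradient information is not duplicated."""
    def __init__(self, channels: int, n: int):
        super().__init__()
        half = channels // 2  # assumes an even channel count
        self.split_main = nn.Conv2d(channels, half, 1, bias=False)
        self.split_shortcut = nn.Conv2d(channels, half, 1, bias=False)
        self.blocks = nn.Sequential(*[ResBlock(half) for _ in range(n)])
        self.fuse = nn.Conv2d(2 * half, channels, 1, bias=False)

    def forward(self, x):
        main = self.blocks(self.split_main(x))
        shortcut = self.split_shortcut(x)
        return self.fuse(torch.cat([main, shortcut], dim=1))  # concat, then fuse
```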
Modification point 2 of the YOLO network model: an attention mechanism module is added after the modified Darknet.
The attention mechanism module may be a CBAM (Convolutional Block Attention Module), an attention module for convolutional blocks that combines spatial and channel attention.
As an example, the modified Darknet comprises n CSP blocks, and an attention mechanism module may be added after each of the last two CSP blocks.

By adding the attention mechanism module after the modified Darknet, larger feature maps can be preserved as far as possible to enhance small-target detection, mitigating the loss of small-target precision caused by feature maps shrinking due to convolution strides in the network. A sketch of CBAM follows.
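A minimal CBAM sketch, assuming PyTorch: channel attention computed from pooled descriptors, followed by spatial attention computed from channel-wise statistics. The reduction ratio 16 is a common choice from the CBAM literature, not a figure from the patent:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: a shared MLP over global average- and max-pooled maps.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: a 7x7 conv over the channel-wise mean and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca  # re-weight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa  # re-weight spatial positions
```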
Modification point 3 of the YOLO network model: the YOLO network model further includes a multi-scale fusion detection module, and an RFB (Receptive Field Block) is introduced into the multi-scale fusion detection module.
The multi-scale fusion detection module performs feature fusion across different scales. For example, bilinear interpolation can be used for upsampling, so that the small, high-level feature maps can be fused with the large, low-level ones. The fused feature maps then carry both spatial information and reliable high-level semantic information, improving detection performance.

The multi-scale fusion detection module may be an FPN (Feature Pyramid Network) or the like.
In fruit detection scenes, most fruits are small targets that occupy few pixels in the image, so small-target detection capability needs to be improved. In the embodiment of the application, introducing the RFB into the multi-scale fusion detection module improves small-target detection, and the gain in small-target detection performance far outweighs the added computational cost.

For example, the multi-scale fusion detection module includes several target detection modules of different scales, and the RFB may be introduced into at least one of them, e.g., into the first and second target detection modules respectively, as sketched below.
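A simplified RFB sketch, assuming PyTorch: parallel branches whose 3x3 convolutions use increasing dilation rates enlarge the receptive field cheaply. The branch count and dilation rates are assumptions for illustration:

```python
import torch
import torch.nn as nn

class RFB(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        branch_ch = channels // 4  # assumes channels divisible by 4

        def branch(dilation: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(channels, branch_ch, 1, bias=False),
                nn.Conv2d(branch_ch, branch_ch, 3, padding=dilation,
                          dilation=dilation, bias=False),
            )

        self.branches = nn.ModuleList([branch(d) for d in (1, 2, 3, 5)])
        self.fuse = nn.Conv2d(4 * branch_ch, channels, 1, bias=False)

    def forward(self, x):
        out = torch.cat([b(x) for b in self.branches], dim=1)
        return x + self.fuse(out)  # residual connection keeps training stable
```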
Modification point 4 of the YOLO network model: a CAM (Channel Attention Module) is introduced into the multi-scale fusion detection module.

In the multi-scale fusion detection module, directly fusing features of different scales brings a large amount of redundant and conflicting information and weakens multi-scale expression. In the embodiment of the application, introducing the CAM into the multi-scale fusion detection module adds an attention mechanism that adaptively weights the spatial dimensions, strengthens background suppression, and enhances the expression of boundary features, preventing tiny target features from being drowned in conflicting information and effectively improving detection performance.
Modification point 5 of the YOLO network model: the bounding-box loss function is replaced with the CIoU loss.

The bounding-box loss function used by the YOLO network model before modification is typically the IoU loss. Compared with other IoU variants, the CIoU computation better matches the regression mechanism of target boxes, since it considers three aspects: the distance between the two bounding boxes, their overlap rate, and their relative scale. Replacing the bounding-box loss function with the CIoU loss therefore helps the network converge better. A standard formulation from the literature is reproduced below for reference.
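The CIoU loss as defined in the DIoU/CIoU literature (the patent text does not spell out the formula); here b and b^gt are the predicted and ground-truth box centers, rho the Euclidean distance, and c the diagonal length of the smallest box enclosing both:

```latex
\mathcal{L}_{\mathrm{CIoU}} = 1 - \mathrm{IoU}
  + \frac{\rho^{2}\left(b,\, b^{gt}\right)}{c^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IoU}) + v}
```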
Modification point 6 of the YOLO network model: DIoU-NMS is used instead of the conventional IoU-based NMS procedure, additionally considering the Euclidean distance between the two box centers.

Here, IoU (Intersection over Union) is a concept used in target detection: the overlap rate between a generated candidate box and the original ground-truth box, i.e., the ratio of their intersection to their union. NMS (Non-Maximum Suppression) is used very widely in target detection, with the aim of eliminating redundant boxes and finding the best detection position.

DIoU introduces the Euclidean distance between the two box centers as a penalty term on top of the IoU computation, directly minimizing the distance between the two boxes and providing a direction for box movement. Using DIoU-NMS instead of the conventional IoU-based NMS means the IoU value is no longer the sole criterion; this alleviates the problem of the NMS process deleting genuinely overlapping targets and effectively retains overlapped target boxes, as the sketch below illustrates.
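A minimal DIoU-NMS sketch, assuming numpy and boxes given as (x1, y1, x2, y2); the center-distance penalty lets heavily overlapped but distinct fruits survive suppression:

```python
import numpy as np

def diou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """DIoU between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    areas = ((box[2] - box[0]) * (box[3] - box[1])
             + (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1]))
    iou = inter / (areas - inter + 1e-9)
    # Penalty: squared center distance over squared enclosing-box diagonal.
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cxs, cys = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - ((cx - cxs) ** 2 + (cy - cys) ** 2) / diag2

def diou_nms(boxes: np.ndarray, scores: np.ndarray, thresh: float = 0.5) -> list:
    order = scores.argsort()[::-1]  # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Suppress only boxes whose DIoU with the kept box exceeds the threshold.
        order = rest[diou(boxes[i], boxes[rest]) <= thresh]
    return keep
```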
Referring to fig. 2, fig. 2 is a schematic diagram of a YOLO network model according to an embodiment of the present application, obtained by modifying a conventional YOLO network model. As shown in fig. 2, the YOLO network model includes the modified Darknet (backbone network), CBAM, and FPN. The modified Darknet comprises n CSP blocks, with CBAM added after the last two. The FPN incorporates RFB and CAM, with an RFB introduced in the first and second target detection modules.
It should be understood that fig. 2 is only an example of the YOLO network model used in the embodiment of the application, and does not constitute a limitation on the YOLO network model used, and the network structure of the YOLO network model may be set as other network structures according to requirements.
Next, a model training process of the target detection model will be described in detail.
Fig. 3 is a flowchart of a model training process provided in an embodiment of the present application, where the method is applied to a computer device, where the computer device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, a computer, or the like. As shown in fig. 3, the method comprises the steps of:
step 301: and acquiring a sample image set, wherein the sample image in the sample image set is an image of a sample fruit planting area.
As an example, an initial sample image set may be obtained, and then the data enhancement may be performed on the initial sample image set to obtain the sample image set.
The sample image in the initial sample image set can be obtained by image acquisition of a sample fruit planting area. For example, a plurality of sampling points in the sample fruit growing area may be determined first to obtain a plurality of sample sampling points. Then, image acquisition is carried out on the plurality of sample sampling points through image acquisition equipment, and an initial sample image set is obtained.
A plurality of sampling points in the sample fruit planting area can be determined according to the distribution and/or planting area of the fruit. For example, according to the planting area and/or distribution of the sample fruit planting area, a specified sampling method is used to determine the sampling points. The specified sampling method may be a 5-point sampling method, a diagonal sampling method, or the like. As an example, with a 5-point sampling method, one sampling point is selected at each of the 4 corners and at the center of the sample fruit planting area, giving 5 sampling points (a sketch follows). In addition, to improve sampling accuracy, edge-row trees are not used as sampling points.
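A hedged sketch of such 5-point sampling, assuming the planting area can be approximated by an axis-aligned rectangle given by two corner coordinates; the inset fraction keeps edge-row trees out of the sample, as required above:

```python
def five_point_sample(x_min: float, y_min: float, x_max: float, y_max: float,
                      inset: float = 0.1) -> list:
    """Return 5 sampling points: 4 inset corner points plus the center."""
    dx, dy = (x_max - x_min) * inset, (y_max - y_min) * inset
    corners = [
        (x_min + dx, y_min + dy), (x_max - dx, y_min + dy),
        (x_min + dx, y_max - dy), (x_max - dx, y_max - dy),
    ]
    center = ((x_min + x_max) / 2, (y_min + y_max) / 2)
    return corners + [center]

# Example: a 100 m x 60 m plot.
print(five_point_sample(0, 0, 100, 60))
```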
In addition, depending on the fruit planted in the planting area, different image acquisition devices can be flexibly selected to capture the sample sampling points. For the first type of fruit, images can be captured through an image acquisition device carried by an orchard operation vehicle; for the second type of fruit, through an image acquisition device carried by an unmanned aerial vehicle. The first type of fruit may be the fruit of vines, shrubs, and similar plants, such as grapes and dragon fruit. The second type may be the fruit of trees, such as apples, pears, peaches, and apricots.
After the initial sample image set is obtained, its data volume may not yet meet the requirements of model training, so data enhancement can be performed on the initial sample image set to amplify the number and diversity of sample images and increase the scale of the data set.
Because orchard scenes are complex and fruit crops have large leaf areas, fruits are easily occluded by one another, and light interference together with the similar colors of fruits and leaves makes target detection difficult. In the embodiment of the application, to improve the accuracy of the target detection model, occlusion data enhancement can be performed on the initial sample image set, adding sample images that simulate occlusion scenes such as fruit-occlusion scenes and background-occlusion scenes. The target detection model to be trained can thus learn more occlusion scenarios, improving its recognition accuracy.
The processing method for enhancing the occlusion data of the initial sample image set may include:
the first processing mode is as follows: and generating a sample image simulating a first occlusion scene according to the sample image in the initial sample image set, wherein the first occlusion scene is a scene in which the fruit is occluded by other fruits.
For example, generating a sample image simulating a first occlusion scene from sample images in the initial sample image set may include the following steps 1) -3):
1) For a first sample image in the initial sample image set, contour edge extraction is performed on a target fruit in the first sample image to obtain a first fruit contour, where the first sample image is any sample image in the initial sample image set.
The target fruit may be any fruit in the first sample image, or a fruit satisfying a certain requirement. When contour edge extraction is performed on the target fruit in the first sample image, an edge detection algorithm such as the Canny operator can be used.
2) The first fruit contour is overlapped with a second fruit contour, wherein the second fruit contour is obtained by copying the first fruit contour or is the fruit contour of other fruits except the target fruit.
For example, when the second fruit contour is a fruit contour of another fruit, the second fruit contour may be obtained by performing contour edge extraction on a fruit in another sample image in the initial sample image set, or performing contour edge extraction on another fruit except for the target fruit in the first sample image.
The first fruit contour and the second fruit contour may be partially overlapped, or the first fruit contour and the second fruit contour may be globally overlapped. After the two fruit contours are overlapped, an intersection is generated between the two fruit contours. By overlapping the two fruit contours, the scene occluded between the two fruits can be simulated.
3) Removing the intersection pixel point set of the first fruit contour and the second fruit contour from the pixel point set of the first fruit contour to obtain a first pixel point set; merging the first pixel point set with the pixel point set of the second fruit contour to obtain a second pixel point set; and splicing the second pixel point set with any sample image in the initial sample image set to obtain a new sample image.

After the first fruit contour and the second fruit contour are overlapped, the pixel point set of each contour in the image and the intersection pixel point set of the two contours can be determined. The intersection set is removed from the pixel point set of the first fruit contour to obtain the first pixel point set, which is merged with the pixel point set of the second fruit contour to obtain a second pixel point set simulating two mutually occluding fruits. The second pixel point set is then spliced with any sample image in the initial sample image set, for example by overlaying it on a fruit tree in that image, generating a new sample image that simulates a fruit-occlusion scene, as sketched below.
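A rough sketch of this fruit-on-fruit occlusion augmentation, assuming numpy, with a boolean mask standing in for the contour pixel set (the mask itself could come from Canny-based contour extraction); the second contour here is a shifted copy of the first, which the text also allows:

```python
import numpy as np

def simulate_fruit_occlusion(image: np.ndarray, fruit_mask: np.ndarray,
                             dx: int = 20, dy: int = 0) -> np.ndarray:
    """image: HxWx3 array; fruit_mask: HxW boolean mask of the target fruit."""
    # Second contour: a shifted copy of the first fruit's pixel set.
    second_mask = np.roll(fruit_mask, shift=(dy, dx), axis=(0, 1))
    second_pixels = np.roll(image, shift=(dy, dx), axis=(0, 1))
    out = image.copy()
    # Pasting the shifted fruit over the original realises the set operations:
    # the intersection is removed from the visible first fruit, and the union
    # of (first minus intersection) with the second contour is what remains.
    out[second_mask] = second_pixels[second_mask]
    return out
```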
The second processing mode is as follows: and generating a sample image simulating a second occlusion scene according to the sample image in the initial sample image set, wherein the second occlusion scene is a scene in which the fruit is occluded by the background object.
For example, for a second sample image in the initial sample image set, the fruit region in the second sample image is replaced with a background region from any sample image in the initial sample image set to obtain a new sample image, where the second sample image is any image in the initial sample image set.

For example, a randomly sized part of the fruit region in the second sample image is cut out and the cut region is covered with the background region, simulating a fruit occluded by the background, as in the sketch below.
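A sketch of this background-occlusion augmentation, assuming numpy and OpenCV; the box coordinates, patch source, and size limits are illustrative assumptions:

```python
import numpy as np
import cv2

rng = np.random.default_rng(0)

def simulate_background_occlusion(image: np.ndarray, fruit_box: tuple,
                                  background_patch: np.ndarray) -> np.ndarray:
    """fruit_box: (x1, y1, x2, y2) of a labelled fruit; background_patch is a
    small HxWx3 crop taken from a fruit-free region of any sample image."""
    x1, y1, x2, y2 = fruit_box
    # Cut a random-size sub-region of the fruit (here up to half its extent).
    w = int(rng.integers(1, max(2, (x2 - x1) // 2)))
    h = int(rng.integers(1, max(2, (y2 - y1) // 2)))
    out = image.copy()
    # Cover the cut region with background pixels to simulate occlusion.
    out[y1:y1 + h, x1:x1 + w] = cv2.resize(background_patch, (w, h))
    return out
```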
In addition, after the occlusion data enhancement is performed on the initial sample image set, a geometric transformation enhancement mode can be further adopted for data enhancement. The geometric transformation enhancement mode comprises turning, rotating, shifting, scaling, clipping, gaussian noise and the like, and the geometric transformation enhancement mode is not limited in the embodiment of the application.
As an example, by performing data enhancement on the initial sample image set, the initial sample image set can be amplified by more than 50 times, which greatly increases the size of the data set.
Step 302: labeling the fruit position in each sample image included in the sample image set to obtain a labeled image set.
When labeling the sample data set, other information, such as the fruit count and fruit type, may be labeled in addition to the fruit position in each sample image.
And after the sample image is labeled, obtaining an labeled image, wherein the labeled image comprises the sample image and the labeling information of the sample image. The annotated image set comprises a plurality of annotated images.
In addition, in order to improve the sample quality, before labeling the sample image set, the images in the sample image set may be preprocessed, and then the preprocessed sample image set may be labeled. The preprocessing may include identification, classification, data cleaning, etc., and may also include other preprocessing methods. Data cleansing may include the elimination of duplicate or blurred images, etc. In addition, an agricultural expert may perform preprocessing on the images in the sample image set, or may automatically perform preprocessing on the sample image set by using the apparatus, which is not limited in this embodiment of the present application.
Step 303: training the target detection model to be trained according to the labeled image set to obtain a target detection model, where the target detection model is used for identifying fruits in any image.
The backbone network of the target detection model to be trained is a modified Darknet53 network, generated by replacing several residual blocks in Darknet53 with cross-stage partial (CSP) blocks. For example, the target detection model to be trained is a YOLO network model that includes the modified Darknet, an attention mechanism module, a multi-scale fusion detection module, and the like. For the specific model structure, refer to the earlier description of the target detection model, which is not repeated here.
For example, the labeled image set may be used as an input of the target detection model to be trained, and the fruit position in each labeled image is identified by the target detection model to be trained. That is, after the labeled image set is input into the target detection model to be trained, the target detection model to be trained can automatically perform unsupervised learning, extraction and operation on the labeled image set, and then output the identification detection result of the effective fruit part of the labeled image in the labeled image set. Then, according to the position error between the identified fruit position and the labeled fruit position in each labeled image, the model parameters of the target detection model to be trained are adjusted, and the target detection model to be trained after the model parameters are adjusted is used as the target detection model.
In the process of training the target detection model to be trained on the labeled image set, training can be stopped when an end condition is met, yielding the target detection model. The end condition may be that the number of training iterations reaches a preset number, or that the detection accuracy of the model on a test set meets a preset requirement, and so on. For example, training may be stopped when the detection accuracy of the target detection model to be trained on the test set reaches 93.65%.
As one example, the set of annotated images can also be divided into a training set and a test set. After the training set is used to train the target detection model to be trained, the test set can be used to validate the target detection model. And if the detection accuracy of the target detection model to the test set meets the preset requirement, the verification is passed, and the training is completed. And if the detection accuracy of the target detection model to the test set does not meet the preset requirement, performing data adjustment on the training set, and continuing training the target detection model by using the adjusted training set until the verification is passed.
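A minimal sketch of the parameter-update step described above, assuming PyTorch, a hypothetical `model` that maps images to predicted boxes, and a hypothetical `ciou_loss` helper measuring the position error; this illustrates the idea rather than the patent's exact procedure:

```python
import torch

def train_epoch(model: torch.nn.Module, loader, optimizer, ciou_loss) -> None:
    model.train()
    for images, target_boxes in loader:             # labelled fruit positions
        pred_boxes = model(images)                  # identify fruit positions
        loss = ciou_loss(pred_boxes, target_boxes)  # position error
        optimizer.zero_grad()
        loss.backward()                             # adjust model parameters
        optimizer.step()
```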
In the embodiment of the application, a sample image set of a sample fruit planting area is obtained, the fruit positions in each sample image of the set are labeled to obtain a labeled image set, and the target detection model to be trained is then trained on the labeled image set, so that a target detection model capable of accurately identifying fruit positions in images can be trained. The backbone network of the target detection model to be trained is a modified Darknet53 network generated by replacing several residual blocks in Darknet53 with CSP blocks, which reduces the model's computation while improving recognition accuracy. In addition, occlusion data enhancement is applied to the original sample image set (for example, generating sample images that simulate fruits occluded by other fruits and sample images that simulate fruits occluded by background objects), so that the target detection model to be trained learns more occlusion situations, improving its recognition accuracy.
Next, the process of estimating fruit yield with the trained target detection model is described in detail. Referring to fig. 4, fig. 4 is a flowchart of a fruit yield estimation method according to an embodiment of the present application. The method is applied to a computer device, which may be a terminal or a server; the terminal may be a mobile phone, a tablet computer, or a personal computer. As shown in fig. 4, the method may include the following steps:
step 401: and acquiring a plurality of images of the fruit planting area.
The plurality of images of the fruit planting area can be obtained by capturing images at a plurality of sampling points of the area with an image acquisition device.
As an example, the operation of acquiring a plurality of images of a fruit growing area comprises the steps of:
1) Determine a plurality of sampling points in the fruit planting area.
For example, a plurality of sampling points can be selected from the fruit planting area according to its planting area and/or the distribution of the plantings. The selection method may be the 5-point sampling method or a diagonal sampling method; the embodiment of the application does not limit how sampling points are selected.
For example, with the 5-point sampling method, one sampling point is selected at each of the four corners and at the center of the fruit planting area, giving 5 sampling points. In addition, to improve sampling accuracy, edge-row trees are not used as sampling points. A sketch of this selection follows.
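A minimal sketch of 5-point selection on a rectangular growing area, with an inset margin so edge rows are excluded; the rectangular-field assumption and the margin value are illustrative.

```python
def five_point_sampling(x_min, y_min, x_max, y_max, margin=5.0):
    """Return 5 sampling points: the four corners (inset by `margin`
    so edge-row trees are skipped) plus the center of the area."""
    corners = [
        (x_min + margin, y_min + margin),
        (x_max - margin, y_min + margin),
        (x_min + margin, y_max - margin),
        (x_max - margin, y_max - margin),
    ]
    center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    return corners + [center]

# Example: a 100 m x 60 m block with a 5 m edge margin.
print(five_point_sampling(0.0, 0.0, 100.0, 60.0))
```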
As an example, fruit trees may be selected from the fruit planting area as sampling points. For fruits such as apples, pears, peaches, and apricots, one fruit tree is counted as one plant; for fruits such as grapes, 3-5 vines are counted as one plant.
2) Capture images at the plurality of sampling points of the fruit planting area with an image acquisition device to obtain a plurality of images.
The image acquisition device may be a range-capable device, that is, a device that can obtain depth information for an image. It may be a mobile phone, a camera, or a lidar, among others; the camera may be a range-capable camera such as a binocular (stereo) camera. In addition, the image acquisition device can be mounted on different platforms, for example on an orchard work vehicle or an unmanned aerial vehicle (UAV).
Depending on the fruit planted in the growing area, different image acquisition devices can be chosen flexibly for capturing the sampling points. For example, if the fruits planted in the fruit planting area are of a first type, images of the plurality of sampling points are captured by an image acquisition device mounted on an orchard work vehicle. If the fruits are of a second type, images of the sampling points are captured by an image acquisition device mounted on a UAV.
The first type of fruit may grow on vines or shrubs, such as grapes or dragon fruit. The second type of fruit may grow on arbor (tall) trees, such as apples, pears, peaches, and apricots.
As an example, capturing images of the sampling points with the UAV-mounted device may include: planning the UAV's route, waypoints, and flight altitude according to the plurality of sampling points of the fruit planting area; adjusting the image acquisition angle of the mounted device, such as the camera angle; and controlling the UAV to fly along the route at the planned altitude, capturing an image with the mounted device at each planned waypoint.
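This route planning can be sketched as turning the sampling points into an ordered waypoint list with a fixed altitude and camera pitch; the values and the dictionary format below are assumptions, not a specific flight-controller API.

```python
def plan_uav_flight(sampling_points, altitude_m=5.0, camera_pitch_deg=-45.0):
    """Build a simple flight plan: visit each sampling point in order at a
    fixed altitude and trigger an image capture at each waypoint."""
    return {
        "camera_pitch_deg": camera_pitch_deg,
        "waypoints": [
            {"lat": lat, "lon": lon, "alt_m": altitude_m, "action": "capture_image"}
            for lat, lon in sampling_points
        ],
    }

# Example with two hypothetical sampling points (latitude, longitude).
plan = plan_uav_flight([(22.5431, 114.0579), (22.5435, 114.0583)])
print(len(plan["waypoints"]))  # 2
```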
Step 402: and inputting each image of the plurality of images into the target detection model, and identifying the fruit in each image through the target detection model to obtain a fruit identification result of each image.
Wherein, the fruit identification result includes the fruit position of each fruit in the image. The fruit position may be the fruit's coordinate information in the image, or the coordinates of the fruit's detection box in the image, and so on.
In addition, the fruit identification result may also include the fruit number or the fruit category.
Step 403: and determining the fruit yield of the fruit planting area according to the fruit identification result of each image in the plurality of images.
As one example, the fruit identification result of each image includes the fruit position and the fruit number in that image. Accordingly, determining the fruit yield of the fruit planting area according to the fruit identification result of each of the plurality of images may include the following steps:
1) Determine the total fruit number of the fruit planting area according to the fruit number in each image of the plurality of images.
For example, the fruit number at each of the plurality of sampling points in the fruit planting area may be determined according to the fruit number in each of the plurality of images. The total fruit number of the fruit planting area is then obtained by summing the fruit numbers over the sampling points.
2) Determine the average single-fruit mass of the fruit planting area according to the fruit position and the fruit number in each image of the plurality of images.
As an example, this includes: determining the depth information of the fruit in each image according to the fruit position in that image; determining the fruit mass of the fruit in each image according to its depth information; and determining the average single-fruit mass of the fruit planting area according to the fruit mass of the fruit in each image of the plurality of images and the fruit number in each image.
Wherein, the depth information of a fruit indicates the distance of each point on the fruit relative to the image acquisition device.
For example, with a range-capable image acquisition device, the fruit contour can be delineated according to the fruit position in each image; a disparity map of the fruit is then computed by the device according to the contour, and the disparity map is converted into the depth information of the fruit.
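For a binocular camera, the disparity-to-depth step might look like the following OpenCV sketch over a rectified image pair; the matcher settings are assumptions that would be tuned per device.

```python
import cv2
import numpy as np

def depth_from_stereo(left_gray, right_gray, focal_px, baseline_m):
    """Compute a disparity map for a rectified stereo pair and convert it
    to metric depth with depth = focal_length * baseline / disparity."""
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                    blockSize=7)
    # StereoSGBM returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth  # metres; 0 where disparity is invalid
```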
In addition, to improve accuracy, the range-capable image acquisition device can be calibrated in advance.
As an example, the operation of determining the fruit mass of the fruit in each image from the depth information includes: determining the fruit volume in each image according to the depth information of the fruit in that image; and taking the fruit volume in each image as the input of a regression model, through which the fruit mass in each image is determined. The regression model is used to determine the mass of any fruit based on its volume.
For example, the fruit volume can be obtained by computing over the depth information of the fruit.
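A minimal sketch of the volume-to-mass step: the fruit is approximated as a sphere of a measured diameter (the geometric model is an assumption), and a linear regression from volume to mass is fitted on a few weighed samples (the figures below are hypothetical).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def sphere_volume_cm3(diameter_cm):
    """Treat the fruit as a sphere of the measured diameter; a reasonable
    first approximation for round fruits such as apples."""
    return (np.pi / 6.0) * diameter_cm ** 3

# Fit the volume -> mass regression on weighed samples (hypothetical data).
sample_volumes = np.array([[180.0], [210.0], [250.0], [300.0]])  # cm^3
sample_masses = np.array([150.0, 176.0, 208.0, 249.0])           # g
volume_to_mass = LinearRegression().fit(sample_volumes, sample_masses)

def fruit_mass_g(diameter_cm):
    """Diameter -> volume -> mass, via the fitted regression model."""
    volume = sphere_volume_cm3(diameter_cm)
    return float(volume_to_mass.predict([[volume]])[0])

print(round(fruit_mass_g(7.5), 1))  # mass estimate for a 7.5 cm fruit
```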
3) Determine the fruit yield of the fruit planting area according to the total fruit number and the average single-fruit mass of the fruit planting area.
For example, the fruit yield of the fruit planting area can be determined from the total fruit number and the average single-fruit mass by the following formula (1):
fruit yield = total fruit number × average single-fruit mass (1)
As an example, sampling points may also be determined on a per-mu basis (1 mu ≈ 666.7 m²), with the image acquisition device capturing images at those points to obtain a plurality of images. Then, according to the fruit identification result of each image, the total fruit number per mu and the average single-fruit mass of the fruit planting area are computed, and the per-mu yield of the fruit planting area is determined from these two values.
For example, the per-mu yield of the fruit planting area can be determined from the total fruit number per mu and the average single-fruit mass by the following formula (2):
per-mu yield = total fruit number per mu × average single-fruit mass (2)
Wherein, total fruit number per mu = number of plants per mu × average number of fruits per plant. The number of plants per mu can be obtained from the number of fruit trees at the plurality of sampling points, which in turn can be determined from the number of fruit trees in each image. The average number of fruits per plant, that is, the fruit count per tree, can be determined from the fruit number of each fruit tree across the plurality of images.
Wherein, average single-fruit mass = total mass of the preset fruits at the designated sampling points / number of those preset fruits. The designated sampling points may be any n of the plurality of sampling points, where n is a positive integer no greater than the total number of sampling points. The preset fruits may be all or part of the fruits at the designated sampling points, for example 20 fruits per designated sampling point.
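Formulas (1) and (2) reduce to a few multiplications; a worked sketch with hypothetical numbers:

```python
def per_mu_yield_kg(plants_per_mu, avg_fruits_per_plant, avg_single_fruit_mass_g):
    """Formula (2): per-mu yield = total fruit number per mu x average
    single-fruit mass. 1 mu is about 666.7 m^2; grams are converted to kg."""
    total_fruit_per_mu = plants_per_mu * avg_fruits_per_plant
    return total_fruit_per_mu * avg_single_fruit_mass_g / 1000.0

# Hypothetical example: 55 trees per mu, 320 fruits per tree, 200 g per fruit
# -> 55 * 320 * 0.2 kg = 3520 kg per mu.
print(per_mu_yield_kg(55, 320, 200.0))
```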
As an example, referring to fig. 5, the left half of fig. 5 shows the training process of the target detection model, and the right half shows the fruit yield estimation process based on the trained model. As shown in the left half of fig. 5, the initial sample image set of the sample fruit planting area is preprocessed, and data enhancement is then applied to obtain the sample image set. Next, the fruit position in each sample image is labeled to obtain the labeled image set, and the target detection model to be trained is trained on the labeled image set. After training, the trained model is tested on the test set. If the model test fails, training continues; if the model test passes, the current model is taken as the trained target detection model.
After training is complete, the trained target detection model can be uploaded to a cloud server. During fruit yield estimation, the image acquisition device is selected according to the type of fruit planted in the fruit planting area, and the sampling points are determined according to the planting area and distribution of the fruit planting area. Images are then captured at the determined sampling points with the selected device. The captured images are fed to the target detection model on the cloud server, which identifies the fruits in the images; the fruit yield of the fruit planting area is estimated from the identification results, and the estimate is output.
In addition, during model training, sample images from the sampling points of the sample fruit planting area can be stitched together, and the target detection model to be trained can be trained on the resulting stitched images.
In the embodiment of the application, images are captured at a plurality of sampling points of the fruit planting area by the image acquisition device, fruits in the captured images are identified by the target detection model, and the fruit yield is estimated automatically from the identification results. This estimation is efficient and convenient, saves labor cost, and improves work efficiency. Moreover, the backbone network of the target detection model is the modified Darknet, generated by replacing a plurality of residual blocks in Darknet with cross-stage partial CSP blocks, which reduces the computation of the model while preserving identification accuracy. In addition, determining the fruit's depth information from its position in the image and determining the fruit mass from that depth information allows the fruit yield to be estimated quickly and efficiently. Further, by establishing a regression model between fruit volume and fruit mass, the volume is determined from the depth information and the mass is then obtained rapidly through the regression model, further saving labor cost and improving estimation efficiency and accuracy. Finally, the fruit yield estimation method provided by the embodiment of the application is applicable to many fruit types and is highly adaptable.
Fig. 6 is a block diagram of a fruit yield estimation apparatus provided in an embodiment of the present application. The apparatus may be integrated into a computer device, which may be a terminal or a server; the terminal may be a mobile phone, a tablet computer, or a personal computer. Referring to fig. 6, the apparatus includes: an acquiring module 601, an identification module 602, and a first determining module 603.
The acquiring module 601 is configured to acquire multiple images of a fruit planting area, where the multiple images are obtained by respectively performing image acquisition on multiple sampling points of the fruit planting area through an image acquisition device;
the identification module 602 is configured to input each of the plurality of images into a target detection model and identify the fruits in each image through the model to obtain a fruit identification result of each image, where the backbone network of the target detection model is a modified Darknet generated by replacing a plurality of residual blocks in Darknet with cross-stage partial (CSP) blocks;
a first determining module 603, configured to determine a fruit yield of the fruit growing area according to a fruit identification result of each of the plurality of images.
Optionally, the fruit identification result of each image includes the fruit position and the fruit number in that image, and the first determining module 603 includes:
a first determining unit, configured to determine the total fruit number of the fruit planting area according to the fruit number in each image of the plurality of images;
a second determining unit, configured to determine the average single-fruit mass of the fruit planting area according to the fruit position and the fruit number in each image of the plurality of images;
a third determining unit, configured to determine the fruit yield of the fruit planting area according to the total fruit number and the average single-fruit mass of the fruit planting area.
Optionally, the second determining unit is configured to:
determining the depth information of the fruit in each image according to the fruit position in each image in the plurality of images;
determining the fruit mass of the fruit in each image according to the depth information of the fruit in that image;
and determining the average single-fruit mass of the fruit planting area according to the fruit mass of the fruit in each image of the plurality of images and the fruit number in each image.
Optionally, the second determining unit is configured to:
determining the volume of the fruit in each image according to the depth information of the fruit in each image;
the volume of the fruit in each image is used as the input of a regression model, through which the fruit mass in each image is determined; the regression model is used to determine the mass of any fruit based on its volume.
Optionally, the apparatus further comprises:
and the second determining module is used for determining the plurality of sampling points from the fruit planting area according to the planting area and the distribution condition of the fruit planting area.
Optionally, the acquiring module 601 includes:
the fruit planting system comprises a first acquisition unit, a second acquisition unit and a control unit, wherein the first acquisition unit is used for respectively acquiring images of a plurality of sampling points through image acquisition equipment carried by a fruit park operation vehicle to obtain a plurality of images if the fruits planted in the fruit planting area are fruits of a first type;
the second obtains the unit for if the fruit of this fruit growing region planting is second type fruit, then carry out image acquisition respectively to these a plurality of sampling points through the image acquisition equipment that unmanned aerial vehicle carried on, obtain these a plurality of images.
Optionally, the second acquiring unit is configured to:
plan the route, waypoints, and flight altitude of the unmanned aerial vehicle according to the plurality of sampling points;
adjust the image acquisition angle of the image acquisition device mounted on the unmanned aerial vehicle;
and control the unmanned aerial vehicle to fly along the route at the planned altitude, capturing images with the mounted image acquisition device at each planned waypoint.
Optionally, the target detection model is a YOLO network model that includes the modified Darknet, an attention mechanism module, and a multi-scale fusion detection module, where the modified Darknet includes n CSP blocks and n is a positive integer.
Optionally, the attention mechanism module is a convolution block attention module CBAM, and the multi-scale fusion detection module is a feature pyramid detection network FPN.
Optionally, the multi-scale fusion detection module includes a plurality of target detection modules of different scales, and at least one target detection module of the plurality of target detection modules has a receptive field block RFB introduced therein.
Optionally, the multi-scale fusion detection module introduces a spatial attention mechanism.
In the embodiment of the application, images are captured at a plurality of sampling points of the fruit planting area by the image acquisition device, fruits in the captured images are identified by the target detection model, and the fruit yield is estimated automatically from the identification results. This estimation is efficient and convenient, saves labor cost, and improves work efficiency. Moreover, the backbone network of the target detection model is the modified Darknet, generated by replacing a plurality of residual blocks in Darknet with cross-stage partial CSP blocks, which reduces the computation of the model while preserving identification accuracy. In addition, determining the fruit's depth information from its position in the image and determining the fruit mass from that depth information allows the fruit yield to be estimated quickly and efficiently. Further, by establishing a regression model between fruit volume and fruit mass, the volume is determined from the depth information and the mass is then obtained rapidly through the regression model, further saving labor cost and improving estimation efficiency and accuracy. Finally, the fruit yield estimation method provided by the embodiment of the application is applicable to many fruit types and is highly adaptable.
Fig. 7 is a block diagram of a model training apparatus provided in an embodiment of the present application, where the apparatus may be integrated in a computer device, and the computer device may be a terminal or a server, and the terminal may be a mobile phone, a tablet computer, a computer, or the like. Referring to fig. 7, the apparatus includes: an acquisition module 701, an annotation module 702, and a training module 703.
An obtaining module 701, configured to obtain a sample image set, where a sample image in the sample image set is an image of a sample fruit planting area;
a labeling module 702, configured to label a fruit position in each sample image included in the sample image set to obtain a labeled image set;
the training module 703 is configured to train a target detection model to be trained according to the labeled image set to obtain a target detection model, where the target detection model is configured to identify fruits in any image, the backbone network of the target detection model to be trained is a modified Darknet53 network, and the modified Darknet53 network is generated by replacing a plurality of residual blocks in Darknet53 with cross-stage partial (CSP) blocks.
Optionally, the obtaining module 701 includes:
the determining unit is used for determining a plurality of sampling points of the sample fruit planting area to obtain a plurality of sample sampling points;
the acquisition unit is used for acquiring images of the plurality of sample sampling points through image acquisition equipment to obtain an initial sample image set;
and the data enhancement unit is used for carrying out data enhancement processing on the initial sample image set to obtain the sample image set.
Optionally, the data enhancement unit is configured to:
generating a sample image simulating a first occlusion scene according to the sample image in the initial sample image set, wherein the first occlusion scene is a scene in which a fruit is occluded by other fruits;
and generating a sample image simulating a second occlusion scene according to the sample image in the initial sample image set, wherein the second occlusion scene is a scene in which the fruit is occluded by a background object.
Optionally, the data enhancement unit is configured to:
for a first sample image in the initial sample image set, performing contour edge extraction on a target fruit in the first sample image to obtain a first fruit contour, wherein the first sample image is any sample image in the initial sample image set;
overlapping the first fruit contour with a second fruit contour, wherein the second fruit contour is obtained by copying the first fruit contour or is the contour of a fruit other than the target fruit;
removing an intersection pixel point set of the first fruit outline and the second fruit outline from the pixel point set of the first fruit outline to obtain a first pixel point set;
merging the first pixel point set and the pixel point set of the second fruit outline to obtain a second pixel point set;
and splicing the second pixel point set onto any sample image in the initial sample image set to obtain a new sample image (a sketch of these pixel-set operations follows).
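A minimal sketch of the mask operations described above, with NumPy boolean masks standing in for the pixel point sets; the shift offset and the pasting strategy are assumptions.

```python
import numpy as np

def simulate_fruit_occlusion(image, fruit_mask, dx=15, dy=0):
    """Simulate a fruit occluded by a copy of itself: shift the fruit's
    mask, drop the intersection from the original mask (the first pixel
    set), union it with the shifted mask (the second pixel set), and
    paste the shifted fruit pixels on top so they occlude the original."""
    shifted_mask = np.roll(fruit_mask, shift=(dy, dx), axis=(0, 1))
    first_set = fruit_mask & ~shifted_mask       # original minus intersection
    second_set = first_set | shifted_mask        # union with the copy's mask
    augmented = image.copy()
    shifted_pixels = np.roll(image, shift=(dy, dx), axis=(0, 1))
    augmented[shifted_mask] = shifted_pixels[shifted_mask]
    return augmented, second_set
```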
Optionally, the data enhancement unit is configured to:
for a second sample image in the initial sample image set, replace the fruit region in the second sample image with a background region from any sample image in the initial sample image set to obtain a new sample image, where the second sample image is any image in the initial sample image set (a corresponding sketch follows).
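Similarly, the background-occlusion case can be sketched by replacing part of the fruit region with background pixels from another (donor) sample image of the same size; the occluded fraction and the band-shaped occluder are assumptions.

```python
import numpy as np

def simulate_background_occlusion(image, fruit_mask, donor_image, occlude_ratio=0.4):
    """Replace the lower part of the fruit region with pixels from a donor
    sample image of the same size, mimicking a fruit hidden behind leaves
    or branches."""
    augmented = image.copy()
    rows = np.where(fruit_mask.any(axis=1))[0]   # rows containing the fruit
    if rows.size == 0:
        return augmented
    cut_row = rows[int(rows.size * (1.0 - occlude_ratio))]
    band = fruit_mask.copy()
    band[:cut_row, :] = False                    # keep only the lower band
    augmented[band] = donor_image[band]          # paste background pixels
    return augmented
```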
Optionally, the training module 703 is configured to:
taking each labeled image in the labeled image set as the input of the target detection model to be trained, and identifying the fruit position in each labeled image through the model;
and adjusting the model parameters of the target detection model to be trained according to the position error between the fruit position identified in each labeled image and the labeled fruit position, and taking the model with the adjusted parameters as the target detection model.
Optionally, the target detection model to be trained is obtained by modifying a YOLOv3 network model; the modified Darknet53 network includes n CSP blocks, and the model further includes an attention mechanism module and a multi-scale fusion detection module, where n is a positive integer.
Optionally, the attention mechanism module is a convolution block attention module CBAM, and the multi-scale fusion detection module is a feature pyramid detection network FPN module.
Optionally, the multi-scale fusion detection module includes a plurality of object detection modules with different scales, and an RFB module is introduced into at least one of the plurality of object detection modules.
Optionally, the multi-scale fusion detection module introduces a spatial attention mechanism.
In the embodiment of the application, a sample image set of the sample fruit planting area is obtained, the fruit position in each sample image is labeled to obtain a labeled image set, and the target detection model to be trained is then trained on the labeled image set, so that a target detection model that accurately identifies fruit positions in images can be obtained. The backbone network of the target detection model to be trained is a modified Darknet53 network, generated by replacing a plurality of residual blocks in Darknet53 with CSP blocks, which reduces the computation of the target detection model and improves identification accuracy. In addition, occlusion-oriented data enhancement is applied to the original sample image set, for example generating sample images that simulate fruits occluded by other fruits and sample images that simulate fruits occluded by background objects, so that the model learns more occlusion cases and its identification accuracy improves.
It should be noted that the fruit yield estimation apparatus and the model training apparatus provided in the above embodiments are illustrated with the division of the functional modules described above; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
Each functional unit and module in the above embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present application.
The fruit yield estimation apparatus and the fruit yield estimation method provided in the above embodiments belong to the same concept, as do the model training apparatus and the model training method; for the specific working processes and technical effects of the units and modules above, refer to the method embodiments, which are not repeated here.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer apparatus 800 includes: a processor 80, a memory 81 and a computer program 82 stored in the memory 81 and executable on the processor 80, the steps in the fruit yield estimation method or the model training method in the above embodiments being implemented when the computer program 82 is executed by the processor 80.
The computer device 800 may be a general purpose computer device or a special purpose computer device. In a specific implementation, the computer device 800 may be a desktop computer, a portable computer, a network server, a handheld computer, a mobile phone, a tablet computer, a wireless terminal device, a communication device, or an embedded device, and the embodiment of the present application does not limit the type of the computer device 800. Those skilled in the art will appreciate that fig. 8 is merely an example of a computer device 800 and is not intended to limit the computer device 800 and may include more or fewer components than those shown, or some components may be combined, or different components may be included, such as input output devices, network access devices, etc.
The Processor 80 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 81 may in some embodiments be an internal storage unit of the computer device 800, such as a hard disk or memory of the computer device 800. The memory 81 may also in other embodiments be an external storage device of the computer device 800, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash Card provided on the computer device 800. Further, the memory 81 may include both an internal storage unit and an external storage device of the computer device 800. The memory 81 is used for storing an operating system, application programs, a Boot Loader, data, and other programs. The memory 81 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a computer device, where the computer device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the foregoing method embodiments may be implemented.
The embodiments of the present application provide a computer program product, which when run on a computer causes the computer to execute the steps of the above-mentioned method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the above method embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by a processor to implement the steps of the above method embodiments. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or apparatus capable of carrying computer program code to a photographing apparatus/terminal device, a recording medium, computer Memory, ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, etc. The computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, or an indirect coupling or communication connection of devices or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present application, and they should be construed as being included in the present application.

Claims (10)

1. A method of fruit yield estimation, the method comprising:
acquiring a plurality of images of a fruit planting area, wherein the plurality of images are obtained by capturing images at a plurality of sampling points of the fruit planting area with an image acquisition device;
inputting each of the plurality of images into a target detection model and identifying the fruits in each image through the target detection model to obtain a fruit identification result of each image, wherein a backbone network of the target detection model is a modified Darknet, and the modified Darknet is generated by replacing a plurality of residual blocks in the Darknet with cross-stage partial CSP blocks;
and determining the fruit yield of the fruit planting area according to the fruit identification result of each image in the plurality of images.
2. The method of claim 1, wherein the fruit identification results for each image include a fruit position and a fruit number in each image, and wherein determining the fruit yield for the fruit growing area based on the fruit identification results for each image of the plurality of images comprises:
determining the total fruit number of the fruit planting area according to the fruit number in each image of the plurality of images;
determining the depth information of the fruit in each image according to the fruit position in each image in the plurality of images;
determining the fruit mass of the fruit in each image according to the depth information of the fruit in each image;
determining the average single-fruit mass of the fruit planting area according to the fruit mass of the fruit in each image of the plurality of images and the fruit number in each image;
determining the fruit yield of the fruit planting area according to the total fruit number of the fruit planting area and the average single-fruit mass.
3. The method of claim 2, wherein determining the fruit mass of the fruit in each image based on the depth information of the fruit in each image comprises:
determining the volume of the fruit in each image according to the depth information of the fruit in each image;
and taking the volume of the fruit in each image as an input of a regression model and determining the fruit mass of the fruit in each image through the regression model, the regression model being used to determine the mass of any fruit based on the volume of that fruit.
4. The method of any one of claims 1-3, wherein the target detection model is a YOLO network model comprising a modified Darknet, an attention mechanism module, and a multi-scale fusion detection module, the modified Darknet comprising n CSP blocks, n being a positive integer.
5. The method of claim 4,
the attention mechanism module is a convolution block attention module CBAM;
the multi-scale fusion detection module is a feature pyramid detection network FPN, the multi-scale fusion detection module comprises a plurality of target detection modules with different scales, and a receptive field block RFB is introduced into at least one target detection module in the plurality of target detection modules;
the multi-scale fusion detection module introduces a spatial attention mechanism.
6. A method of model training, the method comprising:
acquiring an initial sample image set, wherein the initial sample image set is obtained by capturing images at a plurality of sample sampling points in a sample fruit planting area with an image acquisition device;
performing data enhancement processing on the initial sample image set to obtain a sample image set;
labeling the fruit position in each sample image included in the sample image set to obtain a labeled image set;
training a target detection model to be trained according to the labeled image set to obtain a target detection model, wherein the target detection model is used for identifying fruits in any image, a backbone network of the target detection model to be trained is a modified Darknet53 network, and the modified Darknet53 network is generated by replacing a plurality of residual blocks in the Darknet53 with cross-stage partial CSP blocks.
7. The method of claim 6, wherein said data enhancement processing of said initial sample image set comprises:
generating a sample image simulating a first occlusion scene according to the sample image in the initial sample image set, wherein the first occlusion scene is a scene in which a fruit is occluded by other fruits;
and generating a sample image simulating a second occlusion scene according to the sample image in the initial sample image set, wherein the second occlusion scene is a scene in which the fruit is occluded by a background object.
8. The method of claim 7, wherein generating a sample image from the sample images in the initial sample set of images that simulates a first occlusion scenario comprises:
for a first sample image in the initial sample image set, performing contour edge extraction on a target fruit in the first sample image to obtain a first fruit contour, wherein the first sample image is any sample image in the initial sample image set;
overlapping the first fruit contour with a second fruit contour, wherein the second fruit contour is obtained by copying the first fruit contour or is the contour of a fruit other than the target fruit;
removing an intersection pixel point set of the first fruit contour and the second fruit contour from the pixel point set of the first fruit contour to obtain a first pixel point set;
merging the first pixel point set and the pixel point set of the second fruit outline to obtain a second pixel point set;
and splicing the second pixel point set onto any sample image in the initial sample image set to obtain a new sample image.
9. A computer device, characterized in that the computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, which computer program, when executed by the processor, implements the method according to any of claims 1-5 or 6-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1-5 or claims 6-8.
CN202210778457.XA 2022-07-04 2022-07-04 Fruit yield estimation method, model training method, equipment and storage medium Pending CN115294472A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210778457.XA CN115294472A (en) 2022-07-04 2022-07-04 Fruit yield estimation method, model training method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210778457.XA CN115294472A (en) 2022-07-04 2022-07-04 Fruit yield estimation method, model training method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115294472A true CN115294472A (en) 2022-11-04

Family

ID=83821946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210778457.XA Pending CN115294472A (en) 2022-07-04 2022-07-04 Fruit yield estimation method, model training method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115294472A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994244A (en) * 2023-08-16 2023-11-03 临海市特产技术推广总站(临海市柑桔产业技术协同创新中心) Method for evaluating fruit yield of citrus tree based on Yolov8



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination