CN115631366A - Tomato fruit maturity prediction method and device - Google Patents

Tomato fruit maturity prediction method and device

Info

Publication number
CN115631366A
CN115631366A (application CN202211201176.4A)
Authority
CN
China
Prior art keywords
maturity
model
training
tomato
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211201176.4A
Other languages
Chinese (zh)
Inventor
徐海涛
郑城锟
龙拥兵
袁宇翔
冷鑫涛
殷江栋
邓海东
兰玉彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University
Priority to CN202211201176.4A
Publication of CN115631366A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for predicting the maturity of tomato fruits. The method comprises: collecting original pictures of tomato fruits; constructing an improved CenterNet model, CenterNet-Improved, and using it to predict the maturity of the fruits in the original pictures; and deploying the CenterNet-Improved model on an embedded platform equipped with a camera, so that the tomato fruit maturity prediction result is obtained in real time from the collected original tomato fruit images and displayed visually. A corresponding apparatus is also disclosed. The method and device effectively solve the technical problems of the prior art: low fruit maturity recognition accuracy and high requirements on computing power and communication networks.

Description

Tomato fruit maturity prediction method and device
Technical Field
The invention relates to the technical field of deep learning and computer vision, in particular to a method and a device for predicting the maturity of tomato fruits.
Background
Tomato is a very common food in daily life: sweet, juicy, and rich in nutrients. In addition, the flavonoids it contains can help prevent vascular sclerosis, so it also has medical value.
Automating and intensifying tomato cultivation, and making it intelligent, not only safeguards tomato yield effectively but also further improves tomato quality. Within tomato production, judging fruit maturity plays an important role in harvesting and storage decisions. Moreover, once tomatoes enter the consumer market as food commodities, their maturity is tightly coupled to commodity quality and directly affects economic returns.
In recent years, supported by artificial intelligence, traditional agriculture has been transforming into intelligent agriculture at an accelerating pace. Artificial intelligence offers new ideas and effective tools for predicting tomato maturity, and researchers have sought to detect it with image processing methods: for example, obtaining the edge of a single tomato fruit by binarization and classifying maturity from RGB color features. Many problems remain to be solved, however. Maturity judgment standards differ greatly: physicochemical characteristics, biochemical parameters, and phenotypic characteristics of the fruit can all serve as references, whereas the criteria used with deep learning methods are determined from computer image features alone and lack a comprehensive criterion reflecting the tomato's cultivation cycle and physiological cycle. The design of maturity prediction models lacks support from elements of real production scenes, and differences between the detection target and the foreground and background strongly affect detection accuracy; the accuracy and execution speed of prediction models also need further improvement. Applying such models requires substantial computing power, which makes algorithm deployment difficult: an expensive prediction model cannot run in real time on compute-limited edge devices, and if it is deployed on a remote server, the edge device cannot communicate with the server under weak-network or offline conditions, so real-time feedback is hard to obtain. In actual agricultural production, the limited computing power of edge devices and the limited coverage of communication networks cannot support costly prediction models.
Disclosure of Invention
The invention provides a tomato fruit maturity prediction method based on multi-target detection with a deep learning model, aiming to overcome the technical defects of the prior art: low fruit maturity recognition accuracy and high requirements on computing power and communication networks.
The invention discloses a method for predicting the maturity of tomato fruits, which is characterized by comprising the following steps:
collecting original pictures of tomato fruits;
constructing an improved CenterNet model, CenterNet-Improved, and predicting the maturity of the original pictures by using the CenterNet-Improved model;
and deploying the CenterNet-Improved model on an embedded platform equipped with a camera, so as to obtain and visually display the tomato fruit maturity prediction result in real time.
Preferably, after the collecting of the original picture of the tomato fruit, the method further comprises:
constructing a reference picture data set of physiological period classification and marking:
collecting original pictures of tomato fruits at each of the four physiological periods of the tomato fruit (the green-mature period, the color-transition period, the mature period, and the full-mature period), and marking each picture with its corresponding physiological period;
obtaining the LAB three-channel mean values $L_i$, $A_i$, $B_i$ of the fruit region of each reference picture, and the LAB three-channel mean values of the $n$ reference pictures of the same physiological period

$$\bar L_s = \frac{1}{n}\sum_{i=1}^{n} L_i, \qquad \bar A_s = \frac{1}{n}\sum_{i=1}^{n} A_i, \qquad \bar B_s = \frac{1}{n}\sum_{i=1}^{n} B_i$$

where $i$ is the picture index and $s$ is the physiological-period reference color category index;
selecting the tomato fruits in each picture of the training picture data set and taking each such region as a fruit segmentation region; calculating the LAB three-channel mean values $L_k$, $A_k$, $B_k$ of the fruit region segmented from each training picture, where $j$ is the picture index and $k$ is the fruit index;
calculating, with the CIEDE2000 color difference formula, the color difference between the current fruit's $(L_k, A_k, B_k)$ and each physiological period's reference color $(\bar L_s, \bar A_s, \bar B_s)$:

$$\Delta E_{00}^{(s)} = \Delta E_{00}\big((L_k, A_k, B_k),\, (\bar L_s, \bar A_s, \bar B_s)\big)$$

taking the reference color category with the minimum color difference as the physiological-period category of tomato fruit $k$ in picture $j$ of the training picture data set, and determining the maturity from the physiological period;
the maturity being assigned according to the physiological-period classification result: the green-mature period is marked as immature, the color-transition period as semi-mature, and the mature and full-mature periods as fully mature (a sketch of this classification step follows).
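As referenced above, the following is a minimal sketch of this classification step, assuming scikit-image and NumPy are available and that each fruit region is supplied as a boolean mask; the function names, mask format, and period-to-maturity mapping are illustrative assumptions, not the patent's reference implementation.

```python
# Hypothetical sketch of the LAB / CIEDE2000 maturity classification described above.
import numpy as np
from skimage.color import rgb2lab, deltaE_ciede2000

def fruit_lab_mean(rgb_image, fruit_mask):
    """Mean (L, A, B) over the masked fruit region of an RGB image."""
    lab = rgb2lab(rgb_image)                # H x W x 3, true CIELAB values
    return lab[fruit_mask].mean(axis=0)     # -> array([L, A, B])

def reference_colors(reference_sets):
    """Per-period mean of the per-picture means (bar L_s, bar A_s, bar B_s).

    reference_sets: {period_name: [(image, mask), ...]}
    """
    return {s: np.mean([fruit_lab_mean(img, m) for img, m in pics], axis=0)
            for s, pics in reference_sets.items()}

PERIOD_TO_MATURITY = {"green_mature": "immature",
                      "color_transition": "semi_mature",
                      "mature": "fully_mature",
                      "full_mature": "fully_mature"}

def classify_fruit(rgb_image, fruit_mask, refs):
    """Assign the period whose reference color has minimum CIEDE2000 difference."""
    lab_k = fruit_lab_mean(rgb_image, fruit_mask)
    period = min(refs, key=lambda s: deltaE_ciede2000(lab_k, refs[s]))
    return period, PERIOD_TO_MATURITY[period]
```

Note that scikit-image's rgb2lab yields true CIELAB values, which is what the CIEDE2000 formula expects.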
Preferably, the constructing of the improved CenterNet-Improved model comprises:
based on the CircleNet target detection model, which builds on the CenterNet algorithm, modifying the model backbone network for the deployment characteristics of accelerated edge-side computation, replacing the original backbone network with the lightweight ShuffleNet V2;
replacing a rectangular target detection frame with a circular target detection frame, and determining fruit areas in a reference picture, a training picture and a prediction picture;
and constructing, according to the geometric characteristics and statistical deviation behavior of the circular detection region, a weighted loss function $L_{det}$ for the improved model.
Preferably, the weighted loss function $L_{det}$ used to train the model is

$$L_{det} = L_k + \lambda_{off} L_{off} + \lambda_r L_r + \lambda_{DIoU} L_{DIoU}$$

where $L_k$ is the keypoint heat map loss function, calculated as

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} \left(1-\hat Y_{xyc}\right)^{\alpha} \log\left(\hat Y_{xyc}\right) & \text{if } Y_{xyc} = 1 \\ \left(1-Y_{xyc}\right)^{\beta} \left(\hat Y_{xyc}\right)^{\alpha} \log\left(1-\hat Y_{xyc}\right) & \text{otherwise} \end{cases}$$

with $\alpha = 2$, $\beta = 4$, and $N$ the number of targets in the image;

$L_{off}$ is the center point offset loss function, calculated as

$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat O_{\tilde p} - \left( \frac{p}{R} - \tilde p \right) \right|$$

where $\hat O_{\tilde p}$ is the predicted center point offset, $R$ is the downsampling rate, $\tilde p$ is the predicted center point, and $p$ is the true center point;

$L_r$ is the radius loss function, calculated as

$$L_r = \frac{1}{N} \sum_{k=1}^{N} \left| \hat r_k - r_k \right|$$

where $\hat r_k$ is the predicted radius and $r_k$ is the radius value in the annotation box, i.e. the real box information;

and, evaluating simultaneously the overlap between the predicted and real boxes and the distance between their centers, the Distance-IoU loss function is constructed as

$$L_{DIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2}$$

where $\rho(b, b^{gt})$ is the distance between the center points of the two boxes and $c$ is the diagonal length of the smallest rectangle covering the predicted box and the real box. $c$ is obtained by determining the minimum circumscribed square of each circular detection box from its center and radius: writing the detection box center as $(x, y)$ and radius as $r$, the top-left and bottom-right vertices of the minimum circumscribed square are $(x-r, y-r)$ and $(x+r, y+r)$ respectively.
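A compact PyTorch-style sketch of this weighted loss is given below. It is an illustration under assumptions: the focal form of $L_k$ follows the CenterNet convention quoted above, and the shapes and gathering of predictions at ground-truth centers are assumed rather than specified by the patent.

```python
import torch
import torch.nn.functional as F

def weighted_loss(pred_hm, gt_hm, pred_off, gt_off, pred_r, gt_r, iou, rho2, c2,
                  lam_off=0.6, lam_r=0.1, lam_diou=0.3, alpha=2, beta=4):
    """L_det = L_k + lam_off*L_off + lam_r*L_r + lam_diou*L_DIoU (see formulas above).

    pred_hm / gt_hm: keypoint heat maps after sigmoid; pred_off / gt_off and
    pred_r / gt_r: offsets and radii already gathered at the N ground-truth
    centers; iou, rho2, c2: per-target Circle IoU, squared center distance,
    and squared enclosing-rectangle diagonal. All layouts are assumptions.
    """
    pos = gt_hm.eq(1).float()
    neg = 1.0 - pos
    n = pos.sum().clamp(min=1)
    # Focal-style keypoint heat map loss L_k
    l_k = -(pos * (1 - pred_hm) ** alpha * torch.log(pred_hm.clamp(min=1e-6))
            + neg * (1 - gt_hm) ** beta * pred_hm ** alpha
            * torch.log((1 - pred_hm).clamp(min=1e-6))).sum() / n
    l_off = F.l1_loss(pred_off, gt_off)          # center point offset loss
    l_r = F.l1_loss(pred_r, gt_r)                # radius loss
    l_diou = (1 - iou + rho2 / c2).mean()        # Distance-IoU loss
    return l_k + lam_off * l_off + lam_r * l_r + lam_diou * l_diou
```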
Preferably, after the constructing of the improved CenterNet-Improved model, the method further comprises:
training the improved CenterNet-Improved model:
collecting an original picture containing a plurality of tomato fruits, constructing a deep learning training picture data set, and classifying and marking the maturity of each tomato fruit in the picture;
a circular target detection frame is adopted, the circle center and the radius of the circular detection frame are manually selected, and the tomato fruit area in the training data set picture is calibrated;
dividing a training picture data set into a training set, a testing set and a verification set, setting parameters of initial learning rate, class number, training round and batch size, training a CenterNet-Improved model, obtaining an inference weight file of the CenterNet-Improved model after training is finished, and identifying performance indexes of the model.
Preferably, the dividing the training picture data set into a training set, a test set and a verification set, setting parameters of an initial learning rate, a class number, a training round and a batch size, training the cenet-Improved model, obtaining an inference weight file of the cenet-Improved model after the training is finished, and identifying the performance index of the model specifically includes:
according to the training picture data set, annotating each real box with circle center coordinates, radius, tomato fruit number, picture number, and maturity category, in the specific form [x, y, radius, id, image_id, category_id], and generating an annotation file in COCO format (one such record is sketched after this list);
expanding and augmenting the data set: offline, randomly mixing three augmentation methods (adding simulated rain, random erasing, and Gaussian noise); online, augmenting by translation, scale transformation, rotation, flipping, and color disturbance;
setting a training hyper-parameter to obtain a trained inference weight model;
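As referenced above, one annotation record in the [x, y, radius, id, image_id, category_id] form, serialized into a COCO-style file, might look as follows (a hedged sketch; the field grouping, category ids, and file names are illustrative):

```python
import json

# One circular ground-truth box: center (x, y), radius, fruit id, picture id,
# maturity category (e.g. 0 = immature, 1 = semi-mature, 2 = fully mature).
annotation = {"x": 412.0, "y": 288.5, "radius": 57.3,
              "id": 17, "image_id": 3, "category_id": 2}

coco_like = {
    "images": [{"id": 3, "file_name": "tomato_0003.jpg"}],
    "categories": [{"id": 0, "name": "immature"},
                   {"id": 1, "name": "semi_mature"},
                   {"id": 2, "name": "fully_mature"}],
    "annotations": [annotation],
}
with open("train_annotations.json", "w") as f:
    json.dump(coco_like, f, indent=2)
```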
Preferably, the calculation of the recognition model performance indexes includes:
calculating the intersection area, union area and intersection-over-union of the prediction box and the real box with the Circle IoU algorithm of the CircleNet model;
calculating Precision, Recall, F1 Score and mAP from the Circle IoU values to evaluate the performance of the CenterNet-Improved model.
Preferably, the real-time obtaining and visually displaying of the tomato fruit maturity prediction result comprises:
converting the trained inference weight file into an inference file format required by an Ncnn framework;
setting a model quantization format so as to reduce the size of the weight file, accelerate inference, and reduce hardware overhead during inference; evaluating the weight file size, inference efficiency, and model accuracy loss, and selecting the quantization format according to the computing capability of the embedded platform;
constructing an inference algorithm for the CenterNet-Improved model based on the Ncnn framework, with the following steps: scale the input inference image by bilinear interpolation and then standardize it; after loading the input layer and reading the output layer, normalize the keypoint heat map part through a Sigmoid activation function layer, obtain the maximum points of each keypoint heat map neighborhood by max pooling, and set the remaining non-maximum positions to 0; finally, traverse and output the values of the keypoint heat map, the center point offset, and the radius, and calculate each target's circle center position, circle radius, maturity classification label, and confidence;
and writing application software comprising: an inference module, which calculates and obtains each target's circle center position, circle radius, maturity classification label, confidence, and processing frame rate per second; a visualization module, which displays the pictures or videos acquired by the camera together with the target detection box, maturity classification label, and per-second frame rate information, realizing maturity visualization; and an information sending module with a serial-port sending function, which can send each target's circle center position, circle radius, maturity classification label, and per-second frame rate (the sending module is sketched below).
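As referenced above, the information-sending module might be sketched as follows, assuming the pyserial package is available on the embedded platform; the port name, baud rate, and JSON message format are illustrative assumptions, not the patent's protocol.

```python
import json
import serial  # pyserial; assumed available on the embedded platform

def send_detections(port_name, detections, fps):
    """Send each target's circle center, radius, maturity label and the
    per-second frame rate over a serial port, one JSON line per frame."""
    with serial.Serial(port_name, baudrate=115200, timeout=1) as port:
        payload = {"fps": fps,
                   "targets": [{"x": d["x"], "y": d["y"], "r": d["r"],
                                "label": d["label"], "conf": d["conf"]}
                               for d in detections]}
        port.write((json.dumps(payload) + "\n").encode("utf-8"))

# Example: two detected fruits at 11.5 frames per second (values illustrative).
send_detections("/dev/ttyUSB0",
                [{"x": 412, "y": 288, "r": 57, "label": "fully_mature", "conf": 0.91},
                 {"x": 150, "y": 200, "r": 44, "label": "semi_mature", "conf": 0.83}],
                fps=11.5)
```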
The invention also discloses a device for predicting the maturity of tomato fruits, which comprises: an acquisition module for collecting original pictures of tomato fruits;
the image processing module is used for constructing an Improved CenterNet-Improved model and predicting the maturity of the original picture by using the CenterNet-Improved model;
and the visualization module is used for deploying a CenterNet-Improved model on the embedded platform configured with the camera to obtain the tomato fruit maturity prediction result in real time and visually display the tomato fruit maturity prediction result.
According to the invention, original pictures of tomato fruits are collected and their maturity is predicted with the CenterNet-Improved model; because the model backbone is implemented as a lightweight network, the method adapts well to compute-limited edge devices. In practical application it realizes on-site real-time prediction, visual display, and upstream communication even in offline and weak-signal environments, and outputs the results to the accompanying embedded platform for visual display, so that workers can quickly and accurately learn the maturity of the tomato fruits.
Existing tomato fruit maturity recognition relies mainly on manual sorting, image processing methods, or computationally expensive prediction models. Manual sorting takes considerable effort, and different workers apply different standards, which easily leads to large sorting deviations. Image processing methods classify tomato maturity by color features alone, yet maturity judgment standards vary greatly (physicochemical characteristics, biochemical parameters, and phenotypic characteristics can all serve as references), so judging from a single factor is inaccurate. Finally, expensive prediction models need strong computing power, are hard to deploy, demand good network connectivity, and cost too much.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
FIG. 1 is an original picture of tomato fruits at the green-mature stage;
FIG. 2 is an original picture of tomato fruits at the color-transition stage;
FIG. 3 is an original picture of tomato fruits at the mature stage;
FIG. 4 is an original picture of tomato fruits at the full-mature stage;
FIG. 5 is a block diagram of the implementation process of the tomato fruit maturity prediction method of the present invention;
FIG. 6 is a schematic diagram illustrating a comparison between a circular detection box and a rectangular detection box in a conventional target detection algorithm in the method for predicting the ripeness of tomato fruits according to the present invention;
FIG. 7 is a schematic diagram of the calculation of Circle IoU in the tomato fruit maturity prediction method of the present invention;
fig. 8 is an operation interface diagram of the embedded end visualization software in the tomato fruit maturity prediction method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It should be noted that all directional indicators in the embodiments of the present invention (such as up, down, left, right, front, back, etc.) are only used to explain the relative positional relationship, motion situation, etc. between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indicators change accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
A tomato maturity prediction method based on deep learning, as shown in fig. 1-8, comprises
Original pictures of tomato fruits are collected with a digital camera at the green-mature, color-transition, mature, and full-mature stages of the tomato fruit's physiological cycle, and each picture is marked with its corresponding physiological stage.
In the green-mature stage the fruit has already reached full size and shape, the first necessary step toward ripeness; the peel and pulp are light green or green, and the tomato at this point is very hard, tastes poor, and is generally not picked for eating.
In the color-transition stage the peel and pulp change from light green or green to red or pink, generally spreading gradually downward from the fruit shoulder. The flesh begins to soften and contains some sugar, but still tastes relatively sour.
In the mature stage the tomato fruit is essentially ripe, about 80% mature, and the peel turns red, pink, yellow, etc., depending on the variety. Tomatoes picked in the mature stage are picked at a good time: they taste good and are high in sugar and nutritional value, and for long-distance transport the mature stage is the best choice.
In the full-mature stage the whole tomato is fully ripe, the peel has turned red, the sugar content is at its highest, and the flesh is soft, suitable for eating directly after picking or for direct sale on the market.
Fruits in the green-mature and color-transition stages taste poor and are generally not harvested; fruits in the mature and full-mature stages taste good and are high in sugar, so harvesting generally takes place in these two stages.
The original reference pictures of the fruits are segmented, cutting out pictures of single complete tomato fruits without leaf or fruit occlusion, to construct the physiological-period reference picture data set. Using the LAB color mode, the LAB three-channel mean values $L_i$, $A_i$, $B_i$ of the fruit region of each reference picture are calculated ($i$ is the picture index), and the LAB three-channel mean values of the $n$ reference pictures of the same physiological period are

$$\bar L_s = \frac{1}{n}\sum_{i=1}^{n} L_i, \qquad \bar A_s = \frac{1}{n}\sum_{i=1}^{n} A_i, \qquad \bar B_s = \frac{1}{n}\sum_{i=1}^{n} B_i$$

($s$ denotes the physiological-period reference color category index).
In the three LAB channels, the L channel represents lightness and controls the brightness and contrast of the picture, so it mainly contains the black, white, and gray areas of the picture. The A channel contains colors from dark green (low values) through gray (medium values) to bright red (high values): the color range from green information to red information, with green as the dark tone and red as the highlight. The B channel contains colors from bright blue (low values) through gray to deep yellow (high values): the color range from blue information to yellow information, with blue as the dark tone and yellow as the highlight.
The maturity is assigned according to the physiological-period classification result: the green-mature period is marked as immature, the color-transition period as semi-mature, and the mature and full-mature periods as fully mature.
The method comprises the steps of collecting an original picture containing a plurality of tomato fruits, constructing a deep learning training picture data set, and classifying and marking maturity categories of each tomato fruit in the picture.
For example, all tomato fruit pictures are classified according to the green-mature, color-transition, mature, and full-mature standards, and the fruits in the green-mature stage are labeled 1, 2, 3, 4, 5, and so on; pictures of fruits in the other stages are handled in the same way.
A circular target detection box is adopted: the circle center and radius of the circular detection box are manually selected, calibrating the tomato fruit region in each training data set picture. The difference between the circular detection box and the rectangular detection box of traditional target detection algorithms is shown in FIG. 6.
Compared with the original rectangular detection box, the circular detection box fits the actual shape of the tomato fruit better and eliminates the redundant data at the four corners of the rectangular box. Those data belong to the foreground and background rather than the fruit surface, and their color differs greatly from the actual fruit. Removing the redundant data reduces the data processing workload, improves the recognition accuracy of the reference pictures, and thus improves the accuracy of model prediction.
Because most tomato fruits are round, a circular detection box follows the fruit contour more closely; compared with a rectangular box, it reduces the regions that do not belong to the fruit surface and hence the data redundancy, improves the accuracy of the reference pictures, lowers the subsequent picture-processing workload, and makes processing faster and more accurate.
Dividing a training picture data set into a training set, a testing set and a verification set, setting parameters of initial learning rate, class number, training round and batch size, training a CenterNet-Improved model, and obtaining an inference weight file of the CenterNet-Improved model and a performance index of the recognition model after training is finished;
the training method is derived from the concept of transfer learning in machine learning: in order to complete a learning task, the model is trained on other related tasks, and then the model is further optimized on a target task, so that the transfer of the knowledge learned by the model is realized. The pre-training model has the greatest advantages that the problem of insufficient data of the target task can be solved, the effective model is established by utilizing a large amount of data of other tasks and then is migrated to the target task, and the accuracy of the model is greatly improved. The maturity of the tomatoes is judged quickly, and the output result of the model after training is quicker and more accurate than that before training.
The method comprises the steps of dividing a training picture data set into a training set, a testing set and a verification set, setting parameters of initial learning rate, class number, training round and batch size, training a CenterNet-Improved model, obtaining an inference weight file of the CenterNet-Improved model after training is finished, and identifying performance indexes of the model specifically comprise the following steps:
according to the training picture data set, annotating each real box with circle center coordinates, radius, tomato fruit number, picture number, and maturity category, in the specific form [x, y, radius, id, image_id, category_id], and generating an annotation file in COCO format;
Labeling tomato fruits helps improve the accuracy of production management. Data such as fruit grade, radius, and maturity category can be stored as production management information on a data platform, with each planted product matched one-to-one to its production information, which makes it easy to digitize information in the production management process. Workers can also use the production information conveniently: the growth stage of each tomato can be identified accurately at harvest, preventing premature tomatoes from being picked by mistake. Meanwhile, since the system outputs the sizes of the tomato fruits, it can classify and grade them automatically, greatly reducing the workload of subsequent manual sorting and grading.
expanding and augmenting the data set: offline, randomly mixing three augmentation methods (adding simulated rain, random erasing, and Gaussian noise); online, augmenting by translation, scale transformation, rotation, flipping, and color disturbance;
setting a training hyper-parameter to obtain a trained inference weight model;
the data expansion enhancement means that under the condition that data is not increased substantially, limited data generates value equivalent to more data, rain is added to a picture by people, water drops similar to raindrops are added to the picture, gaussian noise is based on noise data enhancement, namely, on the basis of the original picture, some noise is superposed randomly, and random erasing means that an area is selected randomly on the picture and image information is erased.
Based on the CircleNet target detection model, which builds on the CenterNet algorithm, the model backbone network is modified for the deployment characteristics of accelerated edge-side computation, and the lightweight ShuffleNet V2 replaces the original backbone network.
CenterNet is an object detection network with advantages in both speed and precision. In the development of lightweight networks, the widely used measure of computational complexity is the number of floating point operations (FLOPs). FLOPs, however, is an indirect index: it cannot be equated with direct indexes such as speed or latency, and past work has shown that networks with equal FLOPs can run at different speeds. Using FLOPs alone as the evaluation index can therefore lead to a suboptimal design.
The contradiction between direct and indirect indexes can be attributed to two causes. First, FLOPs does not account for some factors that strongly affect speed. For example, memory access cost takes a large share of runtime in group convolution and is a potential performance bottleneck in GPU computation; the degree of parallelism also matters, since with the same FLOPs a highly parallel network executes much faster.
Second, operations with the same FLOPs may run at different speeds on different platforms. For example, early work used tensor decomposition extensively to accelerate matrix multiplication, but more recent work found that although decomposition can reduce FLOPs by 75%, it runs more slowly on the GPU.
Two main principles of efficient network architecture design have therefore been proposed: first, use direct indexes (e.g., speed) rather than indirect ones (e.g., FLOPs); second, validate the indexes on the target platform. Four cross-platform design guidelines were also provided, and under their guidance the new network architecture ShuffleNet V2 was designed.
The prior art requires substantial computing power, which makes algorithms hard to deploy: an expensive prediction model cannot run in real time on compute-limited edge devices, and if the model is deployed on a remote server, the edge device cannot communicate with the server under weak-network or offline conditions and real-time feedback is hard to obtain. The invention therefore modifies the model backbone network for accelerated edge-side computation, replacing the original backbone with the lightweight ShuffleNet V2, which adapts better to compute-limited edge environments. In practical application, this realizes on-site real-time prediction, visual display, and upstream communication in offline and weak-signal environments.
A weighted loss function $L_{det}$ is constructed for the circular shape characteristics of tomato fruits:

$$L_{det} = L_k + \lambda_{off} L_{off} + \lambda_r L_r + \lambda_{DIoU} L_{DIoU}$$

For the first weighted part, $L_k$ is the keypoint heat map loss function:

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} \left(1-\hat Y_{xyc}\right)^{\alpha} \log\left(\hat Y_{xyc}\right) & \text{if } Y_{xyc} = 1 \\ \left(1-Y_{xyc}\right)^{\beta} \left(\hat Y_{xyc}\right)^{\alpha} \log\left(1-\hat Y_{xyc}\right) & \text{otherwise} \end{cases}$$

where $\alpha = 2$, $\beta = 4$, and $N$ is the number of targets in the image.
The keypoint heat map loss balances hard and easy samples and distinguishes positive from negative samples. Suppose the input picture is $I \in R^{W \times H \times 3}$, where $W$ is the width of the input picture and $H$ its height. The keypoint heat map records information for each category: each category generates a keypoint heat map in which a Gaussian circle centered at the target's center point is recorded.

Write the predicted heat map as

$$\hat Y \in [0, 1]^{\frac{W}{R} \times \frac{H}{R} \times C}$$

where $C$ is the number of categories in the sample data set and $R$ is the downsampling rate. The keypoint heat map outputs $\hat Y_{xyc} = 1$ when an object of category $c$ is detected, and 0 otherwise. The true center point of each target is spread by a two-dimensional Gaussian kernel function.
$$Y_{xyc} = \exp\left(-\frac{\left(x - \tilde p_x\right)^2 + \left(y - \tilde p_y\right)^2}{2\sigma_p^2}\right)$$

where $(x, y)$ are the heat map coordinates and $\tilde p = \left\lfloor \frac{p}{R} \right\rfloor$ is the target center point after downsampling, computed from the true center point $p$, the downsampling rate $R$, and the floor (round-down) operation $\lfloor \cdot \rfloor$. $\sigma_p$ is a standard deviation whose value is related to the size of the detected target. The error between the predicted center point coordinates and the true center point coordinates is recorded by $L_k$.
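A small NumPy sketch of drawing one target's Gaussian peak onto a keypoint heat map, as described above; choosing $\sigma_p$ from the object size and composing peaks by element-wise maximum are assumptions in the style of CenterNet implementations, not specified by the patent.

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Write exp(-((x-cx)^2 + (y-cy)^2) / (2*sigma^2)) onto the heat map,
    keeping the element-wise maximum so overlapping targets do not erase
    each other."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center  # downsampled center, i.e. floor(p / R)
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

# Heat map for one category at downsampling rate R = 4 on a 512x512 input.
hm = np.zeros((128, 128), dtype=np.float32)
draw_gaussian(hm, center=(103, 72), sigma=2.5)  # sigma relates to target size
```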
For the second weighted part: in calculating the center point position, the downsampling involves a floor operation, which introduces an error into the center point coordinates of the keypoint heat map output by the network; this error is evaluated through the center point Offset.
$L_{off}$ is the center point offset loss function:

$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat O_{\tilde p} - \left( \frac{p}{R} - \tilde p \right) \right|$$

where $\hat O_{\tilde p}$ is the predicted center point offset, $R$ is the downsampling rate, $\tilde p$ is the predicted center point, and $p$ is the true center point.
For the third weighted part: the tomato fruit maturity detection uses a circular target detection box, so the size of a tomato is expressed through the circle radius.
Write

$$\hat R \in R^{\frac{W}{R} \times \frac{H}{R} \times 1}$$

for the radius prediction at each point; then $L_r$ is the radius loss function:

$$L_r = \frac{1}{N} \sum_{k=1}^{N} \left| \hat r_k - r_k \right|$$

where $\hat r_k$ is the predicted radius and $r_k$ is the radius value in the annotation box (real box) information.
For the fourth weighted part, the intersection-over-union (IoU) of the predicted box and the real box is considered together with the influence of the distance between the two boxes' centers, and the Distance-IoU loss function is constructed as:

$$L_{DIoU} = 1 - IoU + \frac{\rho^2\left(b, b^{gt}\right)}{c^2}$$

where $\rho(b, b^{gt})$ is the distance between the center points of the two boxes and $c$ is the diagonal length of the smallest rectangle covering the predicted box and the real box. $c$ is calculated by determining the minimum circumscribed square of each circular detection box from its center and radius: writing the detection box center as $(x, y)$ and radius as $r$, the coordinates of the top-left and bottom-right vertices of the minimum circumscribed square are $(x-r, y-r)$ and $(x+r, y+r)$ respectively.

Preferably, $\lambda_{off} = 0.6$, $\lambda_r = 0.1$, $\lambda_{DIoU} = 0.3$.
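For illustration, the Distance-IoU term for two circular boxes, with $c$ obtained from the minimum circumscribed squares as described above, might be computed as follows (a sketch; the Circle IoU value is assumed to come from the computation detailed later in this description):

```python
def circle_diou_loss(x1, y1, r1, x2, y2, r2, circle_iou):
    """L_DIoU = 1 - IoU + rho^2 / c^2 for circular boxes given as (x, y, r)."""
    rho2 = (x1 - x2) ** 2 + (y1 - y2) ** 2          # squared center distance
    # Smallest rectangle covering both minimum circumscribed squares:
    left, top = min(x1 - r1, x2 - r2), min(y1 - r1, y2 - r2)
    right, bottom = max(x1 + r1, x2 + r2), max(y1 + r1, y2 + r2)
    c2 = (right - left) ** 2 + (bottom - top) ** 2  # squared diagonal length
    return 1.0 - circle_iou + rho2 / c2
```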
Training the model with this weighted loss function reduces the center point offset, brings the model's predicted values closer to the true values, makes model prediction faster, and makes the output more accurate.
The loss function expresses the difference between predicted and real data, and each weighted part represents a distinct situation that may produce bias; loss functions are therefore not all configured the same, as the bias factors considered when constructing a model differ. The loss function of the CircleNet target detection model does not consider an IoU loss, while the loss function constructed in the invention additionally considers the intersection-over-union IoU of the predicted and real boxes. The bias factors it covers are thus more comprehensive, which better accelerates training convergence and improves model precision.
An Improved target detection model centret-Improved was obtained.
Dividing a training picture data set into a training set, a test set and a verification set, setting parameters of initial learning rate, class number, training round and batch size, training a CenterNet-Improved model, obtaining an inference weight file of the CenterNet-Improved model after training is finished, and identifying performance indexes of the model.
Preferably, the training picture data set is divided into a training set, a test set, and a verification set in the proportion 8:1:1. The initial learning rate is set to $5 \times 10^{-4}$, the number of categories to 3, the epochs (training rounds) to 300, and the batch size to 32; training is carried out, and the model weight file is obtained when training finishes.
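The embodiment's hyperparameters could be collected into a simple training configuration, sketched below (the key names are hypothetical, not from the patent):

```python
# Training configuration matching the embodiment above (names are illustrative).
train_config = {
    "split_ratio": (8, 1, 1),        # training : test : verification
    "initial_learning_rate": 5e-4,
    "num_classes": 3,                # immature / semi-mature / fully mature
    "epochs": 300,
    "batch_size": 32,
}
```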
Further, the model performance indexes are obtained with the following steps.
Preferably, the intersection-over-union between the detection box and the real box is calculated with Circle IoU; that is, the intersection area and union area of two circles are calculated using the law of cosines, as diagrammed in FIG. 7.
With circle centers $O_1$, $O_2$, radii $r_1$, $r_2$, center distance $d = |O_1 O_2|$, and $A$ an intersection point of the two circles, the angles $\angle AO_1O_2$ and $\angle AO_2O_1$ are calculated using the law of cosines:

$$\angle AO_1O_2 = \arccos\left(\frac{r_1^2 + d^2 - r_2^2}{2 r_1 d}\right), \qquad \angle AO_2O_1 = \arccos\left(\frac{r_2^2 + d^2 - r_1^2}{2 r_2 d}\right)$$

The intersection area of the two circles is then the sum of the two circular segments cut off by the common chord; writing $\theta_1 = \angle AO_1O_2$ and $\theta_2 = \angle AO_2O_1$,

$$S_{\cap} = r_1^2\left(\theta_1 - \sin\theta_1\cos\theta_1\right) + r_2^2\left(\theta_2 - \sin\theta_2\cos\theta_2\right)$$

from which the Circle IoU is calculated as

$$\mathrm{Circle\ IoU} = \frac{S_{\cap}}{\pi r_1^2 + \pi r_2^2 - S_{\cap}}$$
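Translated directly into code, the Circle IoU computation reads as follows (a sketch following the law-of-cosines derivation above, with the disjoint and fully contained cases handled explicitly):

```python
import math

def circle_iou(x1, y1, r1, x2, y2, r2):
    """IoU of two circles via the law of cosines (cf. FIG. 7)."""
    d = math.hypot(x1 - x2, y1 - y2)
    if d >= r1 + r2:                      # disjoint circles
        return 0.0
    if d <= abs(r1 - r2):                 # one circle inside the other
        inter = math.pi * min(r1, r2) ** 2
    else:
        t1 = math.acos((r1 * r1 + d * d - r2 * r2) / (2 * r1 * d))  # angle AO1O2
        t2 = math.acos((r2 * r2 + d * d - r1 * r1) / (2 * r2 * d))  # angle AO2O1
        # Sum of the two circular segments cut off by the common chord.
        inter = (r1 * r1 * (t1 - math.sin(t1) * math.cos(t1))
                 + r2 * r2 * (t2 - math.sin(t2) * math.cos(t2)))
    union = math.pi * r1 * r1 + math.pi * r2 * r2 - inter
    return inter / union
```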
Preferably, indexes including Precision, Recall, F1 Score, and mAP (mean average precision) are selected for evaluation and analysis;
for TP (True Positive, correct detection frame) and FP (False Positive, false Negative), FN (False Negative), defining that if the Circle IoU between prediction frame and real frame is greater than or equal to 0.5, it is marked as TP, otherwise it is marked as FP, and if there is no matched prediction frame in the real frame, it is marked as FN.
For example, if the intersection area Circle IoU of the prediction frame Circle and the correct detection frame Circle is 0.6, the prediction frame is regarded as a correct detection frame and marked as TP, and if the intersection area Circle IoU of the prediction frame Circle and the correct detection frame Circle is 0.49, the prediction frame is regarded as an erroneous detection frame and marked as FP.
Precision and Recall are calculated as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$
the accuracy rate is relative to the prediction result and indicates how many samples predicted to be positive are correct; then there are two possible sources of samples that are predicted to be positive, one is to predict positive as positive, these are TP, and the other is to predict negative as positive, these are FP, so the accuracy is: p = TP/(TP + FP)
The recall ratio is relative to the samples, i.e. how many positive samples are predicted to be correct, there are TP positive samples, all the positive samples have two directions, one is judged to be positive, the other is misjudged to be negative, so there are TP + FN total, therefore, the recall ratio R = TP/(TP + FN).
Precision and Recall are mutually influenced, in an ideal case, the larger the value of both are, the better the performance is, but in practice, the two generally have an inverse relationship, so a Precision Recall PR (Precision Recall) curve needs to be made, the area enclosed under the curve is calculated to obtain an Average Precision AP (Average Precision), the larger the Average Precision, the better the model performance, and the calculation formula is as follows:
Figure BDA0003872090840000113
Averaging the AP values of the three categories (fully mature, semi-mature, and immature) gives the mean average precision mAP;
for F1Score, which is the harmonic mean of the precision rate and recall rate, the following formula is calculated:
Figure BDA0003872090840000114
from the formula, the larger the Precision, the smaller the 1/Precision, and thus the larger the F1Score, when Recall is unchanged.
The same principle is that: when Precision is constant, the larger Recall, the smaller 1/Recall and thus the larger F1 Score.
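These definitions translate directly into code. In the sketch below the AP integration method is an assumption (a simple trapezoidal area under the PR curve), since the patent does not specify an interpolation scheme:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1 from TP/FP/FN counts (Circle IoU >= 0.5 => TP)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(pr_points):
    """Area under a PR curve given as (recall, precision) points, by trapezoid."""
    pts = sorted(pr_points)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area

def mean_average_precision(ap_per_class):
    """mAP over the three maturity categories."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```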
The invention also discloses a tomato fruit maturity prediction device, which comprises: an acquisition module for collecting original pictures of tomato fruits;
the image processing module is used for constructing an Improved CenterNet-Improved model and predicting the maturity of the original picture by using the CenterNet-Improved model;
and the visualization module is used for deploying a CenterNet-Improved model on the embedded platform configured with the camera to obtain the tomato fruit maturity prediction result in real time and visually display the tomato fruit maturity prediction result.
The indexes of the model trained in this example are shown in Table 1.
TABLE 1 Summary of model performance indexes
(Table 1 is reproduced as an image in the source document; its values are not recoverable here beyond the mAP figures quoted below.)
The same training picture data set and the same hyperparameters are used for training. Hyperparameters are parameters set before the learning process starts: configuration variables external to the model, whose values are not obtained by training and cannot be estimated from the data; choosing a good set of hyperparameters usually improves learning performance and effect. Comparing the CenterNet model against the CenterNet-Improved model, the results show that the CenterNet model reaches an mAP of 79.02%, while the CenterNet-Improved model's mAP is 2.06% higher. The CenterNet-Improved model thus performs more accurately.
The CenterNet-Improved model is deployed on an embedded platform equipped with a camera; the embedded platform further comprises system software, drivers, a hardware platform, and a development environment. Once the emulator and the program compile normally, the platform performs application inference and visual display automatically, and workers can obtain the tomato fruit maturity prediction results on the embedded platform in real time. The specific steps are as follows:
The trained inference weight file is converted with an Ncnn conversion tool into the inference file format required by the Ncnn framework. Ncnn is a high-performance neural network forward inference framework extremely optimized for mobile devices. Designed from the start around mobile deployment, Ncnn has no third-party dependencies, is cross-platform, and is currently faster on mobile CPUs than all known open-source frameworks; based on Ncnn, developers can easily port deep learning algorithms to mobile devices for efficient execution.
Further, deep learning inference requires a large amount of parallel floating-point computation, for which GPUs are better suited than CPUs. However, general-purpose GPU chips consume a large amount of power, and edge devices, mainly powered by small-capacity batteries, cannot bear that power load; dedicated GPU inference chips cost more and carry larger technical barriers. Model inference therefore needs to be optimized around the computational limits of edge devices.
Preferably, a Raspberry Pi 4B is used as the test embedded system, with the CPU running four threads to test inference time. The .pth weight parameter file of the improved CenterNet model, trained under the PyTorch framework, is converted into an ONNX (Open Neural Network Exchange) format weight file as a bridge, and then into the .bin network weight file and .param network structure file required by Tencent's Ncnn inference framework.
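A hedged sketch of this conversion path: the trained PyTorch weights are exported to ONNX, after which Ncnn's onnx2ncnn command-line tool produces the .param/.bin pair. The placeholder module below stands in for the real network (any nn.Module with the same input/output contract exports the same way); file names, input size, and the opset version are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for the trained CenterNet-Improved network (architecture omitted);
# in practice the real network would be built and loaded with
# model.load_state_dict(torch.load("centernet_improved.pth", map_location="cpu")).
model = nn.Conv2d(3, 5, 3, padding=1)   # placeholder module only
model.eval()

dummy = torch.randn(1, 3, 512, 512)     # one RGB image at the assumed input size
torch.onnx.export(model, dummy, "centernet_improved.onnx",
                  input_names=["input"], output_names=["output"],
                  opset_version=11)

# Then convert with the Ncnn tools (shell command):
#   onnx2ncnn centernet_improved.onnx centernet_improved.param centernet_improved.bin
```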
The Raspberry Pi 4B is a miniature embedded computer; it can be used to edit documents, browse the web, play games, and play video and audio, and it can also be built into smart cars, oscilloscopes, electronic photo frames, home theaters, cameras, and the like.
A model quantization format is set so as to reduce the size of the weight file, accelerate inference, and reduce hardware overhead during inference; the weight file size, inference efficiency, and model accuracy loss are evaluated, and the quantization format is selected according to the computing capability of the embedded platform.
model quantization refers to converting a floating point algorithm of a neural network into a fixed point. Quantization has some similar terms, and Low precision (Low precision) may be common.
The low-precision model represents that the model weight numerical format is FP16 (half-precision floating point) or INT8 (8-bit fixed point integer), but currently, the low precision model is usually referred to as INT8.
The conventional precision model generally represents the model weight value format as FP32 (32-bit floating point, single precision).
The Mixed precision (Mixed precision) then uses both FP32 and FP16 weight value formats in the model. FP16 reduces the memory size by half, but some parameters or operators must be in FP32 format to maintain accuracy.
Model quantization enables the reduction of model size: if int8 quantization can reduce the model size by 75%, int8 quantization model size is typically 1/4 of the size of 32-bit floating point model; the storage space is reduced: the method is more significant when the end side storage space is insufficient; reducing the memory occupation: smaller models of course mean that more memory space is not required; and reducing the power consumption of the equipment: the memory consumption is low, the reasoning speed is high, and the equipment power consumption is naturally reduced; the method has the advantages that the reasoning speed is high, the int8 integer can be accessed four times by accessing the 32-bit floating point type once, and the integer operation is faster than the floating point type operation; the CPU calculates faster with int8. Therefore, the design concept of low cost and small equipment can be met.
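As a back-of-the-envelope illustration of the size reduction (not taken from the patent), symmetric per-tensor INT8 quantization of an FP32 weight tensor works as sketched below; the 4-to-1 byte ratio is the source of the 75% figure.

```python
import numpy as np

def quantize_int8(weights_fp32):
    """Symmetric per-tensor INT8 quantization: 4 bytes/value -> 1 byte/value."""
    scale = np.abs(weights_fp32).max() / 127.0
    q = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale  # dequantize with q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_int8(w)
print(w.nbytes, "->", q.nbytes)  # 262144 -> 65536 bytes: a 75% reduction
```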
Preferably, the statistics of the converted model parameters are shown in Table 2.
TABLE 2 Statistics of model parameters after conversion and quantization
(Table 2 is reproduced as an image in the source document; only the comparisons quoted below are recoverable.)
As can be seen from Table 2, the accuracy loss remains small down to bf16 quantization, but grows sharply after quantization to int8. Comparing inference times, inference with the bf16-format model is roughly 24% faster than fp32 inference, while int8 quantization is only about 4% faster than bf16. The bf16 weight file is therefore the preferable balance of precision and efficiency, and this embodiment selects the bf16 format for the quantization-accelerated inference application;
an inference algorithm of the CenterNet-Improved model based on the Ncnn framework is constructed, and the algorithm steps are as follows: and zooming the input inference image by using a bilinear interpolation method, and then carrying out standardization processing. After the input layer is loaded and the output layer is read, normalization processing is carried out on the key point heat map part through the Sigmoid activation function layer, maximum value points of the neighborhood of the key point heat map are obtained in a maximum pooling mode, and the positions of the rest non-maximum value points are set to be 0. Finally, traversing and outputting values of the key point heat map, the central point offset and the radius, and calculating the circle center position, the circle radius, the maturity classification label and the confidence coefficient of each target;
bilinear interpolation, also known as bilinear interpolation. Mathematically, bilinear interpolation is linear interpolation expansion of an interpolation function with two variables, and the core idea is to perform linear interpolation in two directions respectively, and the bilinear interpolation is used as an interpolation algorithm in numerical analysis and widely applied to the aspects of signal processing, digital image and video processing and the like.
The pooling mode comprises average pooling and maximum pooling, wherein the average pooling can reduce the increase of the contrast of the estimated value caused by the limitation of the field size, more background information of the image can be kept, the maximum pooling can reduce the estimated mean shift caused by parameter errors of the convolutional layer, and more texture information is kept. Obviously, the background information of the tomato fruit image does not need to be preserved in the invention, but the texture of the tomato fruit image is needed to further judge the maturity of the tomato fruit, so the mode of maximum pooling is selected.
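As referenced above, the max-pooling step doubles as non-maximum suppression on the keypoint heat map: a position survives only if it equals the maximum of its neighborhood. A minimal PyTorch sketch, with the 3x3 kernel size assumed:

```python
import torch
import torch.nn.functional as F

def heatmap_nms(heatmap, kernel=3):
    """Keep local maxima of the keypoint heat map; zero all other positions."""
    pad = (kernel - 1) // 2
    local_max = F.max_pool2d(heatmap, kernel, stride=1, padding=pad)
    return heatmap * (local_max == heatmap).float()

hm = torch.sigmoid(torch.randn(1, 3, 128, 128))  # 3 maturity classes (example)
peaks = heatmap_nms(hm)  # only neighborhood maxima remain nonzero
```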
Application software is written, comprising: an inference module, which calculates and obtains each target's circle center position, circle radius, maturity classification label, confidence, and processing frame rate per second; a visualization module, which displays the pictures or videos acquired by the camera together with the target detection box, maturity classification label, and per-second frame rate information, realizing maturity visualization; and an information sending module with a serial-port sending function, which can send each target's circle center position, circle radius, maturity classification label, and per-second frame rate.
Preferably, in a specific test environment, a worker runs the software on the Raspberry Pi 4B in pure-CPU mode to run inference tests on input pictures and video streams: click "select file" in the visualization software and choose the picture or video to be recognized, tick the serial-port transmission option box, and click the "start recognition" button. The maturity of the tomato fruits in the picture or video stream is then recognized, and the information is printed in the text box and sent over serial communication, as shown in FIG. 8.
Reasonably selecting the model quantization format helps reduce the size of the weight file. A high-precision quantization format favors model accuracy but, with its larger computation, hurts inference speed; a low-precision format helps achieve higher inference speed but lowers model accuracy. Balancing inference speed against model accuracy loss yields an optimized weight file configuration: small, relatively fast to run, and with little precision loss, it offers a sound and reasonable solution for running detection on compute-limited edge devices. Visually displaying the model's outputs makes operation and checking convenient for workers, lowers the difficulty of getting started with the software, and saves worker training time and effort; the tomato fruit maturity prediction results can be obtained on site in real time, so picking decisions are not affected by stale predictions and the unnecessary waste caused by unreasonable worker scheduling is avoided. Implementing maturity prediction on compute-limited equipment reduces cost and avoids the problem of equipment too expensive to popularize.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention should be construed as an equivalent replacement and is included within the protection scope of the present invention.

Claims (9)

1. A tomato fruit maturity prediction method is characterized by comprising the following steps:
collecting original pictures of tomato fruits;
constructing an improved CenterNet model, CenterNet-Improved, and predicting the maturity of the original picture by using the CenterNet-Improved model;
and deploying a CenterNet-Improved model on an embedded platform configured with a camera to obtain the tomato fruit maturity prediction result in real time and visually display the tomato fruit maturity prediction result.
2. The tomato fruit maturity prediction method according to claim 1, wherein the collecting of original pictures of tomato fruits further comprises:
constructing and marking a reference picture data set for physiological-period classification:
collecting original pictures of tomato fruits in each physiological period, namely the green-mature, color-turning, mature and fully mature periods, and marking each picture with its corresponding physiological period;
obtaining the LAB three-channel average values L_i, A_i, B_i of the fruit region of each reference picture, and the LAB three-channel average values of the n reference pictures belonging to the same physiological period:

$$\bar{L}_s = \frac{1}{n}\sum_{i=1}^{n} L_i,\qquad \bar{A}_s = \frac{1}{n}\sum_{i=1}^{n} A_i,\qquad \bar{B}_s = \frac{1}{n}\sum_{i=1}^{n} B_i$$

wherein i denotes the picture serial number and s denotes the serial number of the physiological-period reference color category;
selecting, for each picture of the training picture data set, the tomato fruits in the picture and taking each such region as a fruit segmentation region; calculating the LAB three-channel average values L_k, A_k, B_k of each fruit region segmented from each training picture, wherein j denotes the picture serial number and k denotes the fruit serial number;
calculating, by using the CIEDE2000 color-difference formula, the color difference between the current fruit values L_k, A_k, B_k and the reference averages $(\bar{L}_s, \bar{A}_s, \bar{B}_s)$ of each physiological period:

$$\Delta E_s = \Delta E_{00}\big((L_k, A_k, B_k),\ (\bar{L}_s, \bar{A}_s, \bar{B}_s)\big)$$

and taking the reference color category with the minimum color difference as the physiological-period category of tomato fruit k in training picture j, the maturity being determined according to the physiological period (an illustrative sketch of this matching follows the claim);
and the maturity is divided according to the physiological-period classification result: the green-mature period is marked as immature, the color-turning period as semi-mature, and the mature and fully mature periods as full-mature.
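By way of illustration and not limitation, a minimal sketch of the color-difference matching step, assuming scikit-image's deltaE_ciede2000; the reference LAB means are hypothetical numbers, not values from the patent.

```python
import numpy as np
from skimage.color import deltaE_ciede2000

# Hypothetical per-period reference LAB means (illustrative values only).
REFERENCE_MEANS = {
    "green-mature":  np.array([55.0, -18.0, 30.0]),
    "color-turning": np.array([58.0,  10.0, 35.0]),
    "mature":        np.array([48.0,  35.0, 33.0]),
    "fully-mature":  np.array([42.0,  45.0, 30.0]),
}

def classify_period(fruit_lab_mean):
    """Assign the physiological period with the minimum CIEDE2000 distance."""
    return min(REFERENCE_MEANS,
               key=lambda s: float(deltaE_ciede2000(fruit_lab_mean,
                                                    REFERENCE_MEANS[s])))

print(classify_period(np.array([50.0, 30.0, 32.0])))  # -> "mature" (toy data)
```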
3. The tomato fruit maturity prediction method according to claim 1, wherein the construction of the improved CenterNet-Improved model comprises:
based on the CircleNet target detection model of the CenterNet algorithm, and aiming at the deployment characteristics of edge-side computation acceleration, modifying the model backbone network and replacing the original backbone network with the lightweight ShuffleNetV2 (an illustrative backbone sketch follows this claim);
replacing a rectangular target detection frame with a circular target detection frame, and determining fruit areas in a reference picture, a training picture and a prediction picture;
constructing, according to the geometric characteristics and the statistical deviation law of the circular detection area, a weighted loss function L_det to improve the model.
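By way of illustration, a minimal sketch of the backbone replacement, assuming torchvision's ShuffleNetV2 implementation; the detection heads that CenterNet-Improved attaches on top are not reproduced here.

```python
import torch
import torchvision

class ShuffleNetV2Backbone(torch.nn.Module):
    """Lightweight feature extractor built from torchvision's ShuffleNetV2."""
    def __init__(self):
        super().__init__()
        net = torchvision.models.shufflenet_v2_x1_0(weights="DEFAULT")
        # Keep everything up to the final global pooling / classifier.
        self.features = torch.nn.Sequential(
            net.conv1, net.maxpool, net.stage2, net.stage3, net.stage4, net.conv5
        )

    def forward(self, x):
        return self.features(x)  # stride-32 feature map

backbone = ShuffleNetV2Backbone()
feat = backbone(torch.randn(1, 3, 512, 512))
print(feat.shape)  # torch.Size([1, 1024, 16, 16])
```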
4. The tomato fruit maturity prediction method according to claim 3, wherein training the model with the weighted loss function L_det specifically comprises:
$$L_{det} = L_k + \lambda_{off} L_{off} + \lambda_r L_r + \lambda_{DIoU} L_{DIoU}$$
wherein L_k is the keypoint heatmap loss function, calculated as

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} \left(1-\hat{Y}_{xyc}\right)^{\alpha} \log \hat{Y}_{xyc}, & \text{if } Y_{xyc}=1 \\ \left(1-Y_{xyc}\right)^{\beta} \left(\hat{Y}_{xyc}\right)^{\alpha} \log\left(1-\hat{Y}_{xyc}\right), & \text{otherwise} \end{cases}$$

where α and β take the values 2 and 4 respectively, and N is the number of targets in the image;
L_off is the center-point offset loss function, calculated as

$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right) \right|$$

where $\hat{O}_{\tilde{p}}$ is the predicted center-point offset, R is the downsampling rate, $\tilde{p}$ is the predicted center point, and p is the true center point;
L_r is the radius loss function, calculated as

$$L_r = \frac{1}{N} \sum_{k=1}^{N} \left| \hat{r}_k - r_k \right|$$

where $\hat{r}_k$ is the predicted radius and r_k denotes the radius value of the marking frame, i.e. the ground-truth frame information;
the influence of the overlap between the predicted frame and the ground-truth frame and of the distance between their centers is evaluated simultaneously by constructing the Distance-IoU loss function, calculated as

$$L_{DIoU} = 1 - IoU + \frac{\rho^{2}(b, b^{gt})}{c^{2}}$$

where ρ(b, b^{gt}) is the distance between the center points of the two frames and c is the diagonal length of the smallest rectangle covering both the predicted frame and the ground-truth frame; c is calculated by determining the minimum circumscribed square of each circular detection frame from its circle center and radius: with the detection frame's circle-center coordinates denoted (x, y) and its radius r, the upper-left and lower-right vertices of the minimum circumscribed square are (x - r, y - r) and (x + r, y + r) respectively.
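By way of illustration, a hedged PyTorch sketch of a Distance-IoU loss over circular frames follows; the IoU term is approximated here via the circles' minimum circumscribed squares, whereas the model itself would use the Circle IoU of claim 7.

```python
import torch

def circle_diou_loss(pred, target):
    """Sketch of a DIoU loss for circular boxes (cx, cy, r), shape (N, 3)."""
    cx1, cy1, r1 = pred.unbind(-1)
    cx2, cy2, r2 = target.unbind(-1)
    # IoU of the circumscribed squares [cx - r, cy - r, cx + r, cy + r].
    lt = torch.maximum(torch.stack([cx1 - r1, cy1 - r1], -1),
                       torch.stack([cx2 - r2, cy2 - r2], -1))
    rb = torch.minimum(torch.stack([cx1 + r1, cy1 + r1], -1),
                       torch.stack([cx2 + r2, cy2 + r2], -1))
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    union = (2 * r1) ** 2 + (2 * r2) ** 2 - inter
    iou = inter / union.clamp(min=1e-9)
    # Squared center distance over the squared diagonal of the covering box.
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    cover_lt = torch.minimum(torch.stack([cx1 - r1, cy1 - r1], -1),
                             torch.stack([cx2 - r2, cy2 - r2], -1))
    cover_rb = torch.maximum(torch.stack([cx1 + r1, cy1 + r1], -1),
                             torch.stack([cx2 + r2, cy2 + r2], -1))
    c2 = ((cover_rb - cover_lt) ** 2).sum(-1).clamp(min=1e-9)
    return (1 - iou + rho2 / c2).mean()
```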
5. The tomato fruit maturity prediction method according to claim 3, wherein the construction of the improved CenterNet-Improved model further comprises:
training the improved CenterNet-Improved model:
collecting an original picture containing a plurality of tomato fruits, constructing a deep learning training picture data set, and classifying and marking the maturity of each tomato fruit in the picture;
a circular target detection frame is adopted, the circle center and the radius of the circular detection frame are manually selected, and the tomato fruit area in the training data set picture is calibrated;
dividing the training picture data set into a training set, a test set and a validation set; setting the initial learning rate, number of classes, training rounds and batch size; training the CenterNet-Improved model; obtaining the inference weight file of the CenterNet-Improved model after training is finished; and identifying the performance indexes of the model.
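By way of example, a minimal sketch of the dataset split and training hyper-parameters; the 8:1:1 ratio and the numeric values are illustrative assumptions, not values fixed by the claim.

```python
import random

def split_dataset(image_ids, seed=0, ratios=(0.8, 0.1, 0.1)):
    """Shuffle image ids and split them into training / test / validation."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(ratios[0] * n)
    n_test = int(ratios[1] * n)
    return (ids[:n_train],                   # training set
            ids[n_train:n_train + n_test],   # test set
            ids[n_train + n_test:])          # validation set

HYPERPARAMS = {
    "initial_lr": 1.25e-4,   # initial learning rate
    "num_classes": 3,        # immature / semi-mature / full-mature
    "epochs": 140,           # training rounds
    "batch_size": 16,
}
```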
6. The tomato fruit maturity prediction method according to claim 5, wherein dividing the training picture data set into a training set, a test set and a validation set, setting the initial learning rate, number of classes, training rounds and batch size, training the CenterNet-Improved model, obtaining the inference weight file of the CenterNet-Improved model after training is finished, and identifying the model performance indexes specifically comprises:
marking, according to the training picture data set, the circle-center coordinates, the radius, the tomato fruit number, the picture number and the maturity category of each real frame, in the specific form [x, y, radius, id, image_id, category_id], and generating an annotation file in COCO format;
performing expansion enhancement on the data set: enhancing the data set offline by randomly mixing three methods, namely adding rain, random erasing and Gaussian noise, and enhancing the data set online by translation, scale transformation, rotation, flipping and color disturbance (an illustrative sketch follows this claim);
setting the training hyper-parameters to obtain the trained inference weight model.
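By way of illustration, a hedged sketch of the offline and online augmentation mix, assuming the albumentations library; RandomRain, CoarseDropout and GaussNoise stand in for adding rain, random erasing and Gaussian noise respectively.

```python
import albumentations as A

# Offline enhancement: randomly mix the three methods named in the claim.
offline_aug = A.Compose([
    A.OneOf([
        A.RandomRain(p=1.0),
        A.CoarseDropout(p=1.0),   # random erasing
        A.GaussNoise(p=1.0),
    ], p=1.0),
])

# Online enhancement during training: translation, scaling, rotation,
# flipping and color disturbance.
online_aug = A.Compose([
    A.ShiftScaleRotate(p=0.5),
    A.HorizontalFlip(p=0.5),
    A.ColorJitter(p=0.5),
])

# augmented = offline_aug(image=img)["image"]
```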
7. The tomato fruit maturity prediction method according to claim 5, wherein the calculation of the recognition-model performance indexes comprises:
calculating the intersection area, union area and intersection-over-union of the predicted frame and the ground-truth frame by using the Circle IoU algorithm of the CircleNet model;
calculating Precision, Recall, F1-Score and mAP from the Circle IoU values to evaluate the performance of the CenterNet-Improved model.
8. The tomato fruit maturity prediction method according to claim 1, wherein obtaining the tomato fruit maturity prediction result in real time and visually displaying it comprises:
converting the trained inference weight file into the inference file format required by the Ncnn framework;
setting the model quantization format so as to reduce the weight-file size, accelerate inference and reduce the hardware overhead of the inference process; calculating the weight-file size, inference efficiency and model accuracy loss, and selecting the model quantization format according to the computing capability of the embedded platform;
constructing the inference algorithm of the CenterNet-Improved model on the basis of the Ncnn framework, the algorithm comprising: scaling the input inference image by bilinear interpolation and then standardizing it; after loading the input layer and reading the output layers, normalizing the keypoint heatmap through a Sigmoid activation function layer, obtaining the neighborhood maximum points of the keypoint heatmap by max pooling, and setting the positions of the remaining non-maximum points to 0; and finally traversing the output values of the keypoint heatmap, the center-point offset and the radius, and calculating the circle-center position, circle radius, maturity classification label and confidence of each target (a decoding sketch follows the claim);
writing application software comprising: an inference module, which computes the circle-center position, circle radius, maturity classification label, confidence and per-second processing frame rate of each target; a visualization module, which displays the pictures or videos acquired by the camera together with the target detection frame, the maturity classification label and the per-second processing frame rate, realizing maturity visualization; and an information sending module, which provides a serial-port sending function and can transmit the circle-center position, circle radius, maturity classification label and per-second processing frame rate of each target.
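By way of illustration, a hedged PyTorch sketch of the decoding step described in this claim; the tensor layouts and the 3x3 pooling window are assumptions.

```python
import torch
import torch.nn.functional as F

def decode_outputs(heatmap, offset, radius, down_ratio=4, threshold=0.3):
    """Sigmoid-normalize the keypoint heatmap, suppress non-maxima with
    max pooling, then read center offsets and radii at the surviving peaks.
    Assumed layouts: heatmap (C,H,W), offset (2,H,W), radius (1,H,W)."""
    heat = torch.sigmoid(heatmap)
    peaks = F.max_pool2d(heat[None], 3, stride=1, padding=1)[0]
    heat = heat * (heat == peaks)          # zero out non-maximum points
    detections = []
    for cls_id, y, x in (heat > threshold).nonzero(as_tuple=False).tolist():
        score = heat[cls_id, y, x].item()
        dx, dy = offset[0, y, x].item(), offset[1, y, x].item()
        cx = (x + dx) * down_ratio         # circle-center position in the image
        cy = (y + dy) * down_ratio
        r = radius[0, y, x].item() * down_ratio
        detections.append((cx, cy, r, cls_id, score))
    return detections
```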
9. A tomato fruit maturity prediction device, characterized by comprising: an acquisition module for acquiring original pictures of tomato fruits;
an image processing module for constructing the improved CenterNet-Improved model and predicting the maturity of the original picture by using the CenterNet-Improved model;
and a visualization module for deploying the CenterNet-Improved model on an embedded platform configured with a camera, obtaining the tomato fruit maturity prediction result in real time, and visually displaying it.
CN202211201176.4A 2022-09-29 2022-09-29 Tomato fruit maturity prediction method and device Pending CN115631366A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211201176.4A CN115631366A (en) 2022-09-29 2022-09-29 Tomato fruit maturity prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211201176.4A CN115631366A (en) 2022-09-29 2022-09-29 Tomato fruit maturity prediction method and device

Publications (1)

Publication Number Publication Date
CN115631366A true CN115631366A (en) 2023-01-20

Family

ID=84904121

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211201176.4A Pending CN115631366A (en) 2022-09-29 2022-09-29 Tomato fruit maturity prediction method and device

Country Status (1)

Country Link
CN (1) CN115631366A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778474A (en) * 2023-06-02 2023-09-19 河南农业大学 Intelligent phenotype analyzer for tomato fruits



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination