CN113808055A - Plant identification method and device based on hybrid expansion convolution and storage medium - Google Patents

Plant identification method and device based on hybrid expansion convolution and storage medium

Info

Publication number
CN113808055A
CN113808055A (application CN202110947152.2A)
Authority
CN
China
Prior art keywords
image
plant
samples
expansion
plant image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110947152.2A
Other languages
Chinese (zh)
Other versions
CN113808055B (en)
Inventor
郑禄
帖军
刘越
宋中山
王江晴
吴立锋
徐胜舟
肖博文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Bitaoju Agricultural Development Co ltd
South Central Minzu University
Original Assignee
South Central University for Nationalities
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South Central University for Nationalities
Priority to CN202110947152.2A
Publication of CN113808055A
Application granted
Publication of CN113808055B
Status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration by the use of local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation
    • G06T2207/30188Vegetation; Agriculture

Abstract

The invention relates to the technical field of image processing, and discloses a plant identification method, device, and storage medium based on hybrid dilation convolution. The method comprises the following steps: acquiring a plant image to be processed in a preset scene, and determining scene distance information of the plant image to be processed; performing expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image; inputting the expanded plant image into a preset mixed expanded image segmentation model to obtain a plant segmentation image; and performing plant identification according to the plant segmentation image. Compared with the prior art, in which plant images must be segmented manually and plant image processing is therefore inefficient, the invention processes plant images with the preset mixed expanded image segmentation model, obtains plant segmentation images quickly and accurately, and thereby improves the accuracy of plant identification.

Description

Plant identification method and device based on hybrid expansion convolution and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a plant identification method and device based on hybrid expansion convolution and a storage medium.
Background
Plants in the natural environment are affected by many interference factors, chiefly changes in illumination intensity, uneven brightness, similarity between plant and background colors, occlusion by branches and leaves, and shadow cover, so the appearance of a plant changes greatly with its environment. Because of these interference factors, applying a target detection algorithm directly to plant identification produces many misjudgments or omissions, so plants cannot be identified accurately.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a plant identification method, device, and storage medium based on hybrid dilation convolution, so as to solve the technical problem of low plant identification accuracy.
In order to achieve the above object, the present invention provides a mixed dilation convolution based plant identification method, which includes the following steps:
acquiring a plant image to be processed in a preset scene, and determining scene distance information of the plant image to be processed;
performing expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image;
inputting the expanded plant image into a preset mixed expanded image segmentation model to obtain a plant segmentation image;
and carrying out plant identification according to the plant segmentation image.
Preferably, the step of performing dilation processing on the plant image to be processed according to the scene distance information to obtain a dilated plant image includes:
determining an image expansion rate according to the scene distance information;
acquiring image pixel information corresponding to the plant image to be processed;
performing expansion processing on the image pixel information according to the image expansion rate to obtain expanded pixel information;
and generating a swelling plant image according to the swelling pixel information.
Preferably, before the step of obtaining a plant image to be processed in a preset scene and determining scene distance information of the plant image to be processed, the method further includes:
acquiring a plurality of plant image samples under different scenes;
respectively determining plant image category information corresponding to the plurality of plant image samples;
respectively preprocessing the multiple plant image samples according to the plant image category information to obtain multiple plant image training samples and multiple plant image verification samples;
training an initial network model according to the multiple plant image training samples to obtain an initial mixed expansion image segmentation model;
and determining a preset mixed expansion image segmentation model according to the multiple plant image verification samples and the initial mixed expansion image segmentation model.
Preferably, the step of preprocessing the multiple plant image samples according to the plant image category information to obtain multiple plant image training samples and multiple plant image verification samples includes:
respectively determining image illumination information corresponding to the multiple plant image samples according to the plant image category information;
determining a preset gray world elimination rule according to the image illumination information;
respectively processing the multiple plant image samples according to the preset gray world elimination rule to obtain multiple gray image samples;
and determining a plurality of plant image training samples and a plurality of plant image verification samples according to the plurality of gray scale image samples.
Preferably, the step of determining a plurality of plant image training samples and a plurality of plant image verification samples according to the plurality of gray-scale image samples includes:
respectively determining image brightness information corresponding to the gray images;
determining a brightness processing function according to the image brightness information;
respectively adjusting the brightness of the gray images through the brightness processing function to obtain a plurality of brightness image samples;
and determining a plurality of plant image training samples and a plurality of plant image verification samples according to the plurality of brightness image samples.
Preferably, the step of determining a plurality of plant image training samples and a plurality of plant image verification samples according to the plurality of luminance image samples includes:
respectively carrying out rotation mirror image processing on the multiple brightness image samples to obtain multiple rotation image samples;
respectively carrying out noise processing on the multiple rotating image samples to obtain multiple noise point image samples;
respectively carrying out fuzzy processing on the multiple noise point image samples to obtain multiple fuzzy image samples;
and dividing the plurality of blurred image samples according to a preset image proportion rule to obtain a plurality of plant image training samples and a plurality of plant image verification samples.
Preferably, the step of training the initial network model according to the multiple plant image training samples to obtain an initial hybrid dilated image segmentation model includes:
respectively determining image distance information corresponding to the plurality of plant image training samples;
respectively determining expansion rates of image samples corresponding to the plant image training samples according to the image distance information;
judging whether the expansion rate of the image sample meets a preset expansion factor relation or not;
and when the expansion rate of the image samples meets the preset expansion factor relationship, training an initial network model according to the expansion rates of the plant image training samples and the image samples corresponding to the plant image training samples to obtain an initial mixed expansion image segmentation model.
Preferably, the step of determining a preset mixed dilated image segmentation model according to the multiple plant image verification samples and the initial mixed dilated image segmentation model includes:
verifying the initial mixed expansion image segmentation model according to the multiple plant image verification samples to obtain a model evaluation index value;
and when the model evaluation index score meets a preset index condition, taking the initial mixed expansion image segmentation model as a preset mixed expansion image segmentation model.
In addition, in order to achieve the above object, the present invention further provides a mixed dilation convolution based plant identification apparatus, including:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plant image to be processed in a preset scene and determining scene distance information of the plant image to be processed;
the processing module is used for performing expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image;
the segmentation module is used for inputting the expanded plant image into a preset mixed expanded image segmentation model to obtain a plant segmentation image;
and the identification module is used for carrying out plant identification according to the plant segmentation image.
In addition, in order to achieve the above object, the present invention further provides a storage medium on which a hybrid dilation convolution-based plant identification program is stored; when executed by a processor, the program implements the steps of the hybrid dilation convolution-based plant identification method described above.
According to the method, a plant image to be processed in a preset scene is first acquired and its scene distance information is determined; expansion processing is then performed on the plant image according to the scene distance information to obtain an expanded plant image; the expanded plant image is input into a preset mixed expanded image segmentation model to obtain a plant segmentation image; and finally plant identification is performed according to the plant segmentation image. Compared with the prior art, in which plant images must be segmented manually and plant image processing is therefore inefficient, the invention processes plant images with the preset mixed expanded image segmentation model, obtains plant segmentation images quickly and accurately, and improves the accuracy of plant identification.
Drawings
Fig. 1 is a schematic structural diagram of a hybrid dilation convolution-based plant identification apparatus for a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a mixed dilation convolution-based plant identification method according to the present invention;
FIG. 3 is a schematic flow chart of a plant identification method based on hybrid dilation convolution according to a second embodiment of the present invention;
FIG. 4 is a graph of HDC-Seg Net model loss according to a second embodiment of the plant identification method based on hybrid dilation convolution according to the present invention;
FIG. 5 is a SegNet model loss diagram of a second embodiment of the hybrid dilation convolution based plant identification method of the present invention;
fig. 6 is a block diagram illustrating a first embodiment of a mixed dilation convolution-based plant recognition apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a hybrid dilation convolution-based plant identification device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the hybrid dilation convolution-based plant recognition apparatus may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Wherein a communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display), and the optional user interface 1003 may further include a standard wired interface and a wireless interface, and the wired interface for the user interface 1003 may be a USB interface in the present invention. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The Memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk Memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the hybrid dilation convolution-based plant recognition device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
As shown in fig. 1, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a hybrid dilation convolution-based plant identification program.
In the plant identification device based on hybrid dilation convolution shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with it; the user interface 1003 is mainly used for connecting user equipment; the device calls the hybrid dilation convolution-based plant identification program stored in the memory 1005 through the processor 1001 and executes the hybrid dilation convolution-based plant identification method provided by the embodiment of the present invention.
Based on the hardware structure, the embodiment of the plant identification method based on the hybrid expansion convolution is provided.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the plant identification method based on hybrid dilation convolution of the present invention, and proposes the first embodiment of the plant identification method based on hybrid dilation convolution of the present invention.
In a first embodiment, the hybrid dilation convolution-based plant identification method includes the steps of:
step S10: the method comprises the steps of obtaining a plant image to be processed in a preset scene, and determining scene distance information of the plant image to be processed.
It should be noted that the execution subject of this embodiment is a hybrid dilation convolution-based plant recognition device, that is, a device with functions such as image processing, data communication, and program execution; other devices may also be used, which this embodiment does not limit.
The plant image to be processed in the preset scene may be an image of fruit growing on a tree in the natural environment, for example an image of a single green citrus fruit or of several green citrus fruits. In this embodiment the image may be acquired through a camera built into a mobile terminal, through an external camera, or through a stand-alone camera.
The scene distance information is the distance between the camera and the plant; it can be obtained from the pixel information of the plant image to be processed and may be, for example, 10 cm or 8 cm.
Step S20: and performing expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image.
The expanded plant image can be understood as a version of the plant image to be processed in which the plant area is enlarged and the image size is kept uniform.
An image expansion rate is determined according to the scene distance information; image pixel information corresponding to the plant image to be processed is acquired; expansion processing is performed on the pixel information at that rate to obtain expanded pixel information; and the expanded plant image is generated from the expanded pixel information. The expansion rate used for the expanded pixel information corresponds to the plant image to be processed.
When fruit is photographed in the natural environment, its apparent size varies with its growth stage and with the shooting distance; if small objects cannot be reconstructed, the model's segmentation of small fruit targets suffers. In the prior art, image processing with a convolutional neural network generally applies convolution and pooling operations, shrinking the feature map before it is passed to the final fully connected layer for classification. A semantic segmentation task, however, must classify every pixel in the image, so the shrunken feature map has to be upsampled back to the original image size before per-pixel prediction. This process has two major disadvantages: because pooling is irreversible, information is lost when the feature map is restored, and small objects cannot be reconstructed after pooling. Instead of the traditional max-pooling and convolution operations, this embodiment adopts dilated convolution, which enlarges the receptive field while keeping the feature map the same size as the original image. Dilated convolution reduces the number of convolutions applied to each pixel, so the semantic information of a local pixel block is attended to effectively, and each pixel is not blended with its surroundings in a way that degrades segmentation precision.
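As a rough illustration of how a dilated convolution enlarges the receptive field while keeping the feature map the same size as the input, the following NumPy sketch (not code from the patent; the function name and "same" padding scheme are our assumptions) applies a k×k kernel with dilation rate r, whose effective kernel size is k + (k−1)(r−1):

```python
import numpy as np

def dilated_conv2d(image, kernel, rate):
    """2-D dilated (atrous) convolution with 'same' padding.

    Kernel taps are spaced `rate` pixels apart, so the effective kernel
    size is k + (k-1)(rate-1). Cross-correlation form, as in CNN
    frameworks; assumes the effective kernel size is odd (e.g. odd k).
    """
    k = kernel.shape[0]
    eff = k + (k - 1) * (rate - 1)        # effective kernel size
    pad = (eff - 1) // 2                  # padding that preserves H x W
    padded = np.pad(image, pad, mode="constant")
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(h):
        for j in range(w):
            # sample the padded image at the dilated tap positions
            patch = padded[i : i + eff : rate, j : j + eff : rate]
            out[i, j] = np.sum(patch * kernel)
    return out
```

With a 3×3 kernel and rate 2, the effective kernel covers a 5×5 neighborhood, yet the output keeps the input's height and width, which is the property the paragraph above relies on.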
In a specific implementation, stacking several identical dilated convolutions produces a large number of holes in the receptive field, which breaks the continuity and integrity of the data and loses local information. Because the sampled input signals are sparse, the features extracted from distant positions lack correlation, which in turn harms the classification result. This embodiment therefore adopts hybrid dilated convolution, which has three characteristics. First, the dilation rates of the stacked convolutions cannot be chosen arbitrarily: by convention they must not share a common divisor greater than 1. Second, the dilation rates are arranged in a sawtooth structure; convolutions with small dilation rates attend to short-range information and those with large rates to long-range information, so the needs of segmenting both large and small objects are met at once, and setting different rates lets the same configuration cover more continuous information within the pixel range without losing large amounts of local detail. Third, the dilation rates must satisfy a preset expansion formula so that no holes remain in the receptive field.
The preset expansion formula is:
M_i = max[ M_(i+1) - 2r_i, M_(i+1) - 2(M_(i+1) - r_i), r_i ]
where M_i is the maximum dilation rate of the i-th layer and r_i is the dilation rate of the i-th layer.
In this embodiment, assume n layers of K × K dilated convolution kernels; then M_n = r_n, and the rates are designed so that M_2 ≤ K. It should be noted that the dilation rates in a group must not stand in a common-factor relationship such as 2, 4, 8; otherwise the gridding problem would still exist in the network.
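The dilation-rate constraints above can be checked mechanically. The sketch below is our own illustration (function and variable names are not from the patent); it evaluates the recurrence bottom-up from M_n = r_n and applies the no-common-divisor rule. For example, rates (1, 2, 5) with a 3×3 kernel satisfy the constraints, while (2, 4, 8) and (1, 2, 9) do not:

```python
from functools import reduce
import math

def hdc_ok(rates, kernel_size=3):
    """Check a dilation-rate schedule against the HDC rules sketched above:
    no common divisor greater than 1, and M_2 not exceeding the kernel size."""
    if len(rates) < 2 or reduce(math.gcd, rates) > 1:
        return False                       # common-factor relationship (e.g. 2,4,8)
    M = rates[-1]                          # M_n = r_n
    for r in reversed(rates[1:-1]):        # compute M_{n-1}, ..., M_2
        # 2r - M equals M_{i+1} - 2(M_{i+1} - r_i) in the recurrence
        M = max(M - 2 * r, 2 * r - M, r)
    return M <= kernel_size                # leaves no holes in the receptive field
```

For (1, 2, 5): M_3 = 5 and M_2 = max(5−4, 4−5, 2) = 2 ≤ 3, so the schedule is valid.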
Consider the fruit identification task in close-range scenes: during image acquisition, the size of the fruit in the image varies greatly with the shooting angle and distance, and when the shooting distance is large the fruits appear in the image at mixed scales, which makes accurate segmentation very difficult. Therefore, to segment fruit semantically across multiple scenes, hybrid dilated convolutions with different dilation rates are stacked to adapt to targets of different sizes, achieving multi-scale detection of the image. At the same time, stacking hybrid dilated convolutions increases the depth of the model, so the model learns deeper semantic features and the segmentation effect is further optimized.
Step S30: and inputting the expanded plant image into a preset mixed expanded image segmentation model to obtain a plant segmentation image.
The preset mixed expanded image segmentation model is constructed as follows: acquire multiple plant image samples in different scenes; determine the plant image category information corresponding to each sample; preprocess the samples according to the category information to obtain multiple plant image training samples and multiple plant image verification samples; train an initial network model on the training samples to obtain an initial mixed expanded image segmentation model; and determine the preset mixed expanded image segmentation model from the verification samples and the initial model. The plant image category information includes weather information, fruit quantity information, fruit size information, and the like.
The plant image samples may be preprocessed according to the category information as follows: determine the image illumination information corresponding to each plant image sample; determine a preset gray world elimination rule according to the illumination information; process each sample according to the preset gray world elimination rule to obtain multiple gray image samples; determine the image brightness information corresponding to each gray image; determine a brightness processing function according to the brightness information; adjust the brightness of each gray image through the brightness processing function to obtain multiple brightness image samples; and determine the plant image training samples and plant image verification samples from the brightness image samples.
It should be noted that, to enhance the richness of the green citrus data set, the collected plant images are preprocessed in color, brightness, rotation, noise, blur, and the like through the application program interfaces provided by a computer vision and machine learning software library, so as to expand the data set.
In this embodiment, different illumination conditions in the natural environment may cause a certain deviation between the color presented by the captured image and the actual color of the object, and the effect of illumination on color rendering may be eliminated by using the gray-scale world algorithm. The gray world algorithm is applied to a plurality of plant image samples, so that the influence of illumination on the image color in the natural environment can be eliminated, and the image is closer to the real color of a real object.
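A minimal gray-world correction can be sketched as follows (our own NumPy sketch, not the patent's code): each RGB channel is scaled so that its mean matches the overall gray mean, which removes a global color cast from the illuminant.

```python
import numpy as np

def gray_world(img):
    """Gray-world color constancy: assume the average scene color is gray
    and scale each RGB channel so its mean equals the global mean."""
    img = img.astype(float)
    channel_means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gains = channel_means.mean() / channel_means      # gain per channel
    return np.clip(img * gains, 0, 255).astype(np.uint8)
```

For an image with channel means (100, 150, 200), the gains are (1.5, 1.0, 0.75) and all corrected channel means become 150, i.e., the color cast is neutralized.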
Because illumination intensity varies in reality, the brightness of the original data set images can be adjusted in this embodiment by changing the parameter of the brightness function. If image brightness is too strong or too weak, target edges become unclear and bounding boxes are hard to draw during manual annotation, which affects model performance. To avoid generating such images while keeping target edges identifiable, suitable parameter values (0.3, 0.5, and 0.7) are selected for the brightness change, yielding multiple brightness image samples. This simulates the constantly changing illumination intensity of the natural environment and compensates for the lack of robustness to varied illumination intensities that results from collecting images within a concentrated time window.
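Brightness adjustment of this kind can be as simple as multiplying pixel intensities by the chosen factor and clipping to the valid range (a sketch under that assumption; the patent does not specify the exact brightness function):

```python
import numpy as np

def adjust_brightness(img, factor):
    """Darken (factor < 1) or brighten (factor > 1) an 8-bit image;
    factors such as 0.3, 0.5, 0.7 simulate weaker illumination."""
    return np.clip(img.astype(float) * factor, 0, 255).astype(np.uint8)
```

Applying the three factors to each gray image sample triples the number of brightness variants available for training.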
The training and verification samples may be determined from the brightness image samples as follows: apply rotation and mirror processing to each brightness image sample to obtain multiple rotated image samples; apply noise processing to each rotated sample to obtain multiple noise image samples; apply blur processing to each noise sample to obtain multiple blurred image samples; and divide the blurred samples according to a preset image proportion rule to obtain the plant image training samples and plant image verification samples. The preset image proportion rule can be set by the user and may be, for example, 6:4 or 8:2.
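Dividing the blurred samples at the preset proportion can be sketched as a shuffled split (an illustrative helper of our own; the patent only fixes the ratio, e.g. 8:2):

```python
import numpy as np

def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle a sample list and split it into training and verification
    sets at the given ratio (0.8 corresponds to the 8:2 rule)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(samples))
    cut = int(len(samples) * train_ratio)
    train = [samples[i] for i in order[:cut]]
    val = [samples[i] for i in order[cut:]]
    return train, val
```

Shuffling before splitting keeps the illumination, noise, and blur variants distributed across both subsets instead of clustering in one of them.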
It should be noted that randomly rotating the brightness image samples helps improve the model's generalization ability. To further expand the image data set, the images may be rotated by 90°, 180°, or 270° and mirrored. If the model can learn new features from noisy images, its noise resistance increases, so Gaussian noise and salt-and-pepper noise are used to generate noisy images, i.e., the noise image samples, for training the model's noise resistance. It should also be understood that during actual shooting, a large shooting distance, incorrect focus, or movement of the phone makes the acquired image unclear. To enrich the data set and simulate such blurring, the images are processed with Gaussian blur and median blur respectively to obtain the blurred image samples.
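The rotation/mirror, noise, and blur steps can be chained roughly as follows (a NumPy sketch of our own: the function name, the 2% salt-and-pepper level, and the 3×3 box blur standing in for the Gaussian/median blur are all assumed values, not the patent's):

```python
import numpy as np

def augment(img, rng):
    """One pass of an augmentation chain like the one described above:
    rotate/mirror -> salt-and-pepper noise -> blur. `rng` is a
    numpy.random.Generator; input is a square 8-bit grayscale image."""
    # rotation by a random multiple of 90 degrees plus an optional mirror
    out = np.rot90(img, k=rng.integers(0, 4))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    out = out.copy()
    # salt-and-pepper noise on roughly 2% of pixels
    mask = rng.random(out.shape) < 0.02
    out[mask] = rng.choice([0, 255], size=mask.sum())
    # simple 3x3 box blur as a stand-in for Gaussian/median blur
    padded = np.pad(out.astype(float), 1, mode="edge")
    h, w = out.shape
    blurred = sum(padded[i:i + h, j:j + w]
                  for i in range(3) for j in range(3)) / 9.0
    return blurred.astype(np.uint8)
```

Each call yields one new variant of the input sample; running it several times per image with different generator states expands the data set as the paragraph above describes.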
In a specific implementation, the plant image training samples and plant image verification samples are determined from the blurred image samples; the initial network model is trained on the training samples to obtain the initial mixed expanded image segmentation model; the preset mixed expanded image segmentation model is determined from the verification samples and the initial model; and the expanded plant image is then input into the preset mixed expanded image segmentation model to obtain the plant segmentation image, which may be, for example, a background-free segmented image containing fruits of different sizes.
Step S40: and carrying out plant identification according to the plant segmentation image.
It should be understood that the number of fruits, the size of the fruits and the like can be identified based on the background-free segmented image containing fruits of different sizes.
In this embodiment, a plant image to be processed in a preset scene is first acquired and its scene distance information is determined; expansion processing is then performed on the plant image according to the scene distance information to obtain an expanded plant image; the expanded plant image is input into a preset mixed expanded image segmentation model to obtain a plant segmentation image; and finally plant identification is performed according to the plant segmentation image. In the prior art, plant images must be segmented manually, so plant image processing is inefficient; in this embodiment, by contrast, the plant images are processed with the preset mixed expanded image segmentation model, so the plant segmentation images are obtained quickly and accurately, further improving the accuracy of plant identification.
In addition, referring to fig. 3, a second embodiment of the plant identification method based on hybrid dilation convolution of the present invention is proposed on the basis of the first embodiment described above; fig. 3 is a diagram of this second embodiment.
In the second embodiment, before the step S10 in the method for identifying plants based on hybrid dilation convolution, the method further includes:
step S001: and acquiring a plurality of plant image samples under different scenes.
The plant image samples in different scenes can be plant image samples in cloudy days, plant image samples in sunny days and the like.
In order to adapt to complicated and changeable practical situations, the constructed data set is made close to the real environment by selecting different shooting weather and times, different shooting distances and different fruit quantities during acquisition, so that the data set has diversity in illumination intensity, fruit size and fruit quantity. In this embodiment, the shooting scenes may be cloudy days and sunny days, the shooting times may be 2 p.m., 5 p.m. and the like, and the multiple plant image samples are obtained according to the shooting scenes and shooting times.
Step S002: and respectively determining the plant image category information corresponding to the plurality of plant image samples.
The plant image category information includes shooting distance information, fruit number information, fruit size information, weather condition information and the like.
The shooting distance may be 1.0 m or 0.5 m from the trunk; the number of fruits appearing in an image may be classified as small (1-5), medium (5-10) or large (more than 10); and the fruit size in an image may be classified as small or large. Referring to table 1, table 1 is a category table of the multiple plant image samples.
TABLE 1
(Table 1 is provided as an image in the original publication and is not reproduced here.)
Step S003: and respectively preprocessing the multiple plant image samples according to the plant image category information to obtain multiple plant image training samples and multiple plant image verification samples.
Specifically, image illumination information corresponding to the multiple plant image samples is determined according to the plant image category information; a preset gray world elimination rule is determined according to the image illumination information; the plant image samples are processed according to the preset gray world elimination rule to obtain multiple gray image samples; image brightness information corresponding to the gray images is determined; a brightness processing function is determined according to the image brightness information; the brightness of the gray images is adjusted through the brightness processing function to obtain multiple brightness image samples; and multiple plant image training samples and multiple plant image verification samples are determined from the brightness image samples.
It should be noted that, in order to enhance the richness of the green citrus data set, the collected plant images are preprocessed in terms of color, brightness, rotation, noise, blur and the like through the application program interfaces provided by a computer vision and machine learning software library, so as to expand the data set.
In this embodiment, different illumination conditions in the natural environment can cause a certain deviation between the color presented by the captured image and the actual color of the object; the gray world algorithm can be used to eliminate the effect of illumination on color rendering. Applying the gray world algorithm to the multiple plant image samples removes the influence of natural illumination on image color, making the images closer to the true colors of the real objects.
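As an illustration of this step, a minimal gray world white balance can be sketched as follows; this is a simplified demonstration, and the function and variable names are not from the original disclosure:

```python
import numpy as np

def gray_world_balance(image: np.ndarray) -> np.ndarray:
    """Gray world white balance: scale each channel so that its mean
    equals the mean gray level of the whole image."""
    img = image.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)  # per-channel mean
    gray_mean = channel_means.mean()                 # overall gray level
    gains = gray_mean / channel_means                # per-channel gain
    balanced = img * gains                           # apply the gains
    return np.clip(balanced, 0, 255).astype(np.uint8)

# A greenish-tinted patch: after balancing, the channel means converge.
tinted = np.zeros((4, 4, 3), np.uint8)
tinted[..., 0] = 80   # R
tinted[..., 1] = 160  # G
tinted[..., 2] = 120  # B
balanced = gray_world_balance(tinted)
```

For this synthetic patch the three channel means (80, 160, 120) are all pulled to the common gray level 120, which is the color-cast removal the step describes.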
Due to the variability of illumination intensity in reality, the brightness of the original data set images can be adjusted by changing the parameter of the brightness function in this embodiment. If the image brightness is too strong or too weak, the target edges may be unclear and it may be difficult to draw bounding boxes during manual annotation, which affects model performance. To avoid generating such images while still accurately identifying target edges, appropriate parameter values (0.3, 0.5 and 0.7) may be selected for the brightness change to obtain multiple brightness image samples. This simulates the constantly changing illumination intensity in the natural environment and compensates for the neural network's lack of robustness to various illumination intensities caused by concentrated image acquisition times.
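A brightness processing function of the kind described can be sketched as a simple per-pixel gain; this assumes the parameter acts as a multiplicative factor, which is an illustrative choice rather than the disclosed implementation:

```python
import numpy as np

def adjust_brightness(image: np.ndarray, factor: float) -> np.ndarray:
    """Scale pixel intensities by `factor` (>1 brightens, <1 darkens),
    clipping to the valid 8-bit range."""
    out = image.astype(np.float64) * factor
    return np.clip(out, 0, 255).astype(np.uint8)

sample = np.full((2, 2), 100, np.uint8)
darker = adjust_brightness(sample, 0.5)    # one of the weaker settings
brighter = adjust_brightness(sample, 1.7)  # a strengthening setting
```

Sweeping the factor over several values per image is what yields the multiple brightness image samples mentioned above.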
The multiple plant image training samples and multiple plant image verification samples may be determined from the multiple brightness image samples as follows: the brightness image samples are rotated and mirrored to obtain multiple rotated image samples; noise is added to the rotated image samples to obtain multiple noisy image samples; the noisy image samples are then blurred to obtain multiple blurred image samples; and the blurred image samples are divided according to a preset image proportion rule to obtain the multiple plant image training samples and multiple plant image verification samples. The preset image proportion rule can be set by the user and may be, for example, 6:4 or 8:2.
It should be noted that randomly rotating the brightness image samples helps improve the generalization capability of the model. To further expand the image data set, 90°, 180° and 270° rotations, mirroring and the like may be applied to the images. If the model can learn new features from noisy images, its noise resistance is increased; Gaussian noise and salt-and-pepper noise may therefore be applied to the images to generate the noisy image samples used to train the model's noise resistance. It should be understood that in actual shooting, a long shooting distance, incorrect focus or movement of the mobile phone can make the acquired image unclear. To improve the richness of the data set, the blurring that occurs in actual shooting is simulated by processing the images with Gaussian blur and median blur respectively to obtain the multiple blurred image samples.
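The rotation, mirroring and salt-and-pepper steps above can be sketched with array operations alone; this is a minimal illustration, not the disclosed pipeline, and the helper names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate_and_mirror(image: np.ndarray) -> list:
    """Return the 90/180/270-degree rotations and a mirrored variant."""
    return [np.rot90(image, 1), np.rot90(image, 2),
            np.rot90(image, 3), np.fliplr(image)]

def salt_and_pepper(image: np.ndarray, amount: float = 0.05) -> np.ndarray:
    """Set a random fraction of pixels to 0 (pepper) or 255 (salt)."""
    noisy = image.copy()
    mask = rng.random(image.shape) < amount
    noisy[mask] = rng.choice(np.array([0, 255], dtype=image.dtype),
                             size=int(mask.sum()))
    return noisy

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
variants = rotate_and_mirror(img)
noisy = salt_and_pepper(img, amount=0.2)
```

Gaussian noise and the Gaussian/median blurs would be applied analogously, each producing further copies of every sample.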
Assume that the multiple plant image samples are expanded from 2200 images to 5600, with categories A and B containing 1200 images each and categories C, D, E, F, G, H, I and J containing 400 images each. 840 images are selected from each of categories A and B and 280 images from each of the remaining categories, giving 3920 images in total as the multiple plant image training samples for training the network model; the remaining 1680 images serve as the multiple plant image verification samples for verifying the performance of the model.
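The split in this example can be checked arithmetically; the category labels follow the example above and the per-category selection is a uniform 7:3 split:

```python
# Expanded data set: A and B have 1200 images each,
# the other eight categories (C..J) have 400 images each.
counts = {"A": 1200, "B": 1200}
counts.update({c: 400 for c in "CDEFGHIJ"})

# Training split: 840 images from A and B, 280 from each remaining
# category (a 7:3 ratio within every category).
train = {c: (840 if c in ("A", "B") else 280) for c in counts}
val = {c: counts[c] - train[c] for c in counts}

total = sum(counts.values())    # expanded data set size
n_train = sum(train.values())   # training sample count
n_val = sum(val.values())       # verification sample count
```

This reproduces the totals stated in the text: 5600 images overall, 3920 for training and 1680 for verification.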
Step S004: and training an initial network model according to the plurality of plant image training samples to obtain an initial mixed expansion image segmentation model.
Specifically, image distance information corresponding to the multiple plant image training samples is determined; image sample expansion rates corresponding to the training samples are determined according to the image distance information; whether the image sample expansion rates satisfy a preset expansion factor relationship is judged; and when they do, the initial network model is trained according to the training samples and their image sample expansion rates to obtain the initial mixed expanded image segmentation model.
When fruit is photographed in the natural environment, its apparent size varies with the growth period and with the shooting distance; if small objects cannot be reconstructed, the model's segmentation of small fruit targets suffers. In the prior art, when a convolutional neural network is used for image processing, convolution and pooling operations are generally performed, so that the resulting feature map is small and can be passed to the final fully connected layer for classification. An image semantic segmentation task, however, must classify every pixel in the image, so the smaller feature map must first be restored to the original image size by an upsampling operation before pixel-level prediction. This process has two major disadvantages: since the pooling operation is irreversible, information is lost when the feature map is restored, and small-object images cannot be reconstructed after pooling. Instead of the traditional max-pooling and convolution operations, this embodiment uses expansion (dilated) convolution to enlarge the receptive field while keeping the feature map the same size as the original image. The expansion convolution reduces the number of convolutions applied to each pixel, so the semantic information of a local pixel block can be effectively attended to, and each pixel is not mixed with surrounding pixels in a way that degrades segmentation precision.
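The size-preserving property of expansion convolution can be shown with a naive single-channel implementation; this is an illustrative sketch (not the model's actual layers), assuming a square kernel and zero "same" padding:

```python
import numpy as np

def dilated_conv2d_same(image: np.ndarray, kernel: np.ndarray,
                        rate: int) -> np.ndarray:
    """Naive 2-D dilated convolution with 'same' zero padding:
    the output keeps the spatial size of the input."""
    k = kernel.shape[0]            # square k x k kernel assumed
    span = (k - 1) * rate          # dilated footprint minus one
    pad = span // 2
    padded = np.pad(image, pad, mode="constant")
    h, w = image.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            # sample k x k taps spaced `rate` pixels apart
            patch = padded[i:i + span + 1:rate, j:j + span + 1:rate]
            out[i, j] = (patch * kernel).sum()
    return out

img = np.ones((8, 8))
kernel = np.ones((3, 3)) / 9.0     # 3x3 averaging kernel
out = dilated_conv2d_same(img, kernel, rate=2)
```

Unlike a pooled feature map, `out` has the same 8 × 8 shape as the input, so no upsampling step (and none of its information loss) is needed.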
In a specific implementation, stacking multiple identical expansion convolutions leaves a large number of holes, which damages the continuity and integrity of the data and loses local information. Because the sampled input signals are sparse, the features extracted at long range are not correlated, which in turn degrades the classification result. Therefore, this embodiment adopts mixed expansion convolution, which has three characteristics. First, the expansion rates of the stacked convolutions cannot be chosen arbitrarily: they must not share a common divisor greater than 1. Second, the expansion rates are arranged in a sawtooth structure: convolutions with small expansion rates attend to short-range information and convolutions with large expansion rates attend to long-range information, so the requirements of segmenting both large and small objects are met simultaneously; by setting different expansion rates, more continuous information within the pixel range is processed under the same configuration, and a large amount of local information is not lost. Third, the expansion rates must satisfy a preset expansion formula so that no holes exist in the receptive field.
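The "holes" argument can be checked numerically: the input offsets that influence one output position of a stack of 3-tap dilated convolutions are the sums of per-layer offsets {-r, 0, r}. With identical rates the coverage has gaps; with the sawtooth rates [1, 2, 5] used here it is contiguous. An illustrative 1-D sketch:

```python
def coverage(rates):
    """1-D input offsets that reach one output position after stacking
    3-tap dilated convolutions with the given expansion rates."""
    offsets = {0}
    for r in rates:
        offsets = {o + t for o in offsets for t in (-r, 0, r)}
    return offsets

same = coverage([2, 2, 2])    # identical rates: only even offsets reachable
mixed = coverage([1, 2, 5])   # sawtooth rates: every offset is covered
```

With identical rates every reachable offset is a multiple of 2, so odd-offset pixels are never sampled (the gridding holes); the mixed rates cover every integer offset in [-8, 8].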
It should be noted that the semantic segmentation SegNet model is a symmetric network composed of an encoder and a decoder: the encoder first analyzes and learns the features of the image pixels to obtain high-order semantic information, and the decoder then classifies the image pixels using this high-order semantic information. The growth state of fruit in a natural environment cannot be predicted, the fruit faces interference from many environmental factors, and the choice of shooting angle and changes in shooting distance cause abrupt changes in the fruit size in the image, so the semantic segmentation model must have good robustness and the ability to segment fruits of different sizes. In this embodiment, the SegNet model is improved with mixed expansion convolution, which preserves the receptive field of SegNet while convolutional layers with different expansion rates perform multi-scale detection, overcoming the influence of rapid changes in citrus size on segmentation precision. A hybrid dilated convolution (HDC) Block consists of three convolutional layers with a 3 × 3 kernel, 1024 filters and expansion rates of 1, 2 and 5 respectively. To further study the influence of the HDC Block on the SegNet model, refer to table 2, a model design training result table.
TABLE 2
Added module                                             Accuracy/%   Recall/%
No added module (SegNet model)                           78.46        77.57
One set of HDC Blocks                                    79.23        80.68
Two sets of HDC Blocks                                   82.43        82.89
Two sets of HDC Blocks (each convolution followed
by a BN layer and a ReLU layer)                          82.62        83.59
As can be seen from the table above, adding one group of mixed expansion convolutions already gives the model an obvious gain. Exploiting the symmetric structure of SegNet, adding two groups of mixed expansion convolutions yields better segmentation accuracy; with two groups, the expansion rates set to [1, 2, 5], and a BN (batch normalization) layer and a ReLU (rectified linear unit) layer after each convolution layer, the segmentation effect of the model is best. The added mixed expansion convolutions give the model multi-scale detection capability, while the suitable network depth lets the model extract more high-level semantic features, improving the overall segmentation effect. Using a BN layer and a ReLU layer after each convolution layer standardizes the mean and variance of the input, keeping the data distribution between layers stable; this accelerates network convergence, controls overfitting, can eliminate or reduce the need for other normalization operations, and preserves the representation capability of the whole neural network.
Two groups of HDC Block structures are added between the encoder and the decoder of the SegNet network, with each convolution of the HDC Block followed by a BN layer and a ReLU layer; the model improved in this way, i.e., the initial mixed expanded image segmentation model, achieves the best semantic segmentation effect. An HDC Block is a group of mixed expansion convolutions: the convolutions with large expansion rates extract and generate more abstract features for large objects, while the convolutions with small expansion rates attend to short-range information, so the model's semantic segmentation of small objects improves. Because the HDC Block combines expansion convolutions with different expansion rates, the model can extract features of objects of different sizes, i.e., it performs multi-scale detection and adapts to the continuously changing fruit sizes in the natural environment; at the same time, adding the HDC Block structure lets the network extract high-level semantic features and achieve more accurate semantic segmentation.
Step S005: and determining a preset mixed expansion image segmentation model according to the multiple plant image verification samples and the initial mixed expansion image segmentation model.
The initial mixed expanded image segmentation model is verified with the multiple plant image verification samples to obtain model evaluation index values; when the values meet a preset index condition (for example, falling within a preset threshold range), the initial mixed expanded image segmentation model is taken as the preset mixed expanded image segmentation model.
The model evaluation indexes may include accuracy (precision), recall, F1 score, structural similarity (SSIM) and the like. Precision can be understood as the proportion of predicted boxes that are correct, i.e., the overall prediction accuracy; recall can be understood as the proportion of all labeled boxes that are detected, i.e., the probability that an actually positive sample is predicted as positive; the F1 score is the harmonic mean of precision and recall; and SSIM is an index that considers luminance, contrast and structure, used for evaluating image quality, for example after compression.
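These three counting-based metrics can be illustrated with a pixel-level computation on binary masks; this is a generic sketch of the standard definitions, not the evaluation code of the disclosure:

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, truth: np.ndarray):
    """Precision, recall and F1 for binary segmentation masks."""
    tp = np.logical_and(pred == 1, truth == 1).sum()  # true positives
    fp = np.logical_and(pred == 1, truth == 0).sum()  # false positives
    fn = np.logical_and(pred == 0, truth == 1).sum()  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

truth = np.array([1, 1, 1, 0, 0, 0, 1, 1])
pred  = np.array([1, 1, 0, 0, 1, 0, 1, 1])
p, r, f1 = pixel_metrics(pred, truth)
```

Here one fruit pixel is missed and one background pixel is wrongly predicted, giving precision = recall = F1 = 0.8; SSIM is a separate windowed statistic and is not reproduced here.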
In this embodiment, an HDC-SegNet model and a SegNet model may be trained on the multiple plant image training samples. Referring to fig. 4 and fig. 5, fig. 4 is a loss graph of the HDC-SegNet model according to the second embodiment of the plant identification method based on hybrid dilation convolution of the present invention, and fig. 5 is a loss graph of the SegNet model according to the same embodiment. As can be seen from the graphs, the final loss of the SegNet model is about 4.0 and the final loss of the HDC-SegNet model is about 2.5. After 2000 training steps, the loss curves of the two models begin to converge, and they no longer decrease after 4000 training steps. The accuracy, recall, F1 score, SSIM value and processing time of the SegNet model, the PSPNet model and the improved HDC-SegNet model are then compared according to the plant image samples; refer to table 3, a comparison table of model evaluation index scores.
TABLE 3
(Table 3 is provided as an image in the original publication and is not reproduced here.)
As can be seen from the table above, compared with the model before improvement, the accuracy of the improved model is increased by 4.16%, the recall by 6.02%, the F1 score by 5.09% and the SSIM value by 2.88%. Because the HDC-SegNet model has multi-scale detection capability, it can segment small fruit targets missed by the SegNet model, which greatly improves the recall. The improved accuracy also brings the segmentation result images closer to the Ground Truth, so the SSIM value improves. The improved model performs similarly to the PSPNet model, but because the HDC-SegNet model only stores the maximum indices of the feature maps, it saves memory and is more efficient, so the HDC-SegNet model takes less time to process an image.
It should be noted that five preprocessing methods are used in the data preprocessing part to enhance the images and make the algorithm more robust. To verify the effect of these five data enhancement methods on the trained model, one data enhancement method was removed at a time and the change in the model's F1 score was observed.
The color balance processing has the largest influence on model performance: after it is removed, the F1 score of the model drops by 4.41%. The rotation transformation has the smallest influence. This is because color balance corrects the color shift of the image, making it appear closer to the real object, while rotation has little effect on the appearance of fruit features in the image, so the model can still classify the pixels well. The experimental results show that applying multiple kinds of preprocessing improves the richness of the data set, letting the model learn more near-real scenes and adapt to the complex and changeable conditions of actual scenes.
In order to verify the model's ability to segment images with different fruit counts, 200 images with large fruit size and 1-5, 5-10 or more than 10 fruits were selected respectively, and the segmentation effects of the HDC-SegNet model, the SegNet model and the PSPNet model were compared. For images with more than 10 fruits, the accuracy of the improved model is increased by 1.09%; for 5-10 fruits, by 0.39%; and for 1-5 fruits, by 0.82%. Because the PSPNet model is based on structure prediction, its fruit segmentation results are more detailed and its target edges more accurate, so it segments better than the SegNet model. Referring to table 4, a comparison table of model accuracy under different fruit counts, the overall segmentation effect of the improved model, i.e., the HDC-SegNet model, is better than that of the model before improvement.
TABLE 4
Number of fruits   HDC-SegNet model/%   SegNet model/%   PSPNet model/%
1-5                91.45                90.63            91.63
5-10               90.34                89.95            90.29
More than 10       84.57                83.48            84.26
In order to further explore model performance for different fruit sizes, 200 images with a medium number of fruits and large or small fruit size respectively were selected, and the segmentation effects of the HDC-SegNet model, the SegNet model and the PSPNet model were compared. Referring to table 5, a comparison table of model accuracy under different fruit sizes, the improved model has a significant advantage in segmenting small fruits: its accuracy is 8.25% higher than the model before improvement and 5.05% higher than the PSPNet model. The PSPNet model improves segmentation performance by combining multi-scale features, so it segments fruits of different sizes better than the SegNet model. The experimental results show that the HDC-SegNet model segments both small and large fruits well, with a particularly obvious gain for small fruits; the multi-scale detection formed by mixed expansion convolution thus helps the semantic segmentation model segment small fruits better, makes the model generalize more strongly, and lets it adapt to segmentation tasks with different fruit sizes.
TABLE 5
Fruit size   HDC-SegNet model/%   SegNet model/%   PSPNet model/%
Large        91.75                89.49            91.28
Small        83.62                75.37            78.57
In this embodiment, multiple plant image samples in different scenes are obtained first, and plant image category information corresponding to the plant image samples is determined; the plant image samples are then preprocessed according to the category information to obtain multiple plant image training samples and multiple plant image verification samples; finally, an initial network model is trained with the training samples to obtain an initial mixed expanded image segmentation model, and a preset mixed expanded image segmentation model is determined from the verification samples and the initial model. In the prior art, an encoder analyzes and learns features in the image pixels to obtain high-order semantic information, and a decoder classifies the pixels using that high-order information; in this embodiment, by contrast, mixed expansion convolution is added to the preset mixed expanded image segmentation model, which enables segmentation of plants of different sizes and thus more accurate segmentation of close-range plants in the natural environment.
Furthermore, an embodiment of the present invention further provides a storage medium, where the hybrid dilation convolution based plant identification program is stored, and the hybrid dilation convolution based plant identification program, when executed by a processor, implements the steps of the hybrid dilation convolution based plant identification method as described above.
In addition, referring to fig. 6, an embodiment of the present invention further provides a mixed dilation convolution based plant identification apparatus, where the mixed dilation convolution based plant identification apparatus includes:
the acquiring module 6001 is configured to acquire a plant image to be processed in a preset scene, and determine scene distance information of the plant image to be processed.
The plant image to be processed in the preset scene may be an image of fruit growing on a tree in the natural environment, for example an image of a single green citrus fruit or of multiple green citrus fruits. In this embodiment, the plant image to be processed may be acquired through a built-in camera of a mobile terminal, through an external camera, or through another camera device.
The scene distance information is distance information between the camera and the plant, the scene distance information can be obtained according to pixel information of the plant image to be processed, and the scene distance information can be 10cm, 8cm and the like.
And the processing module 6002 is configured to perform expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image.
The expanded plant image can be understood as a plant image in which the plant area of the plant image to be processed is enlarged and the image size is kept uniform.
Specifically, an image expansion rate is determined according to the scene distance information; image pixel information corresponding to the plant image to be processed is acquired; expansion processing is performed on the image pixel information according to the image expansion rate to obtain expanded pixel information; and the expanded plant image is generated from the expanded pixel information. The expanded pixel information may be the image pixel information expanded at the expansion rate corresponding to the plant image to be processed.
When fruit is photographed in the natural environment, its apparent size varies with the growth period and with the shooting distance; if small objects cannot be reconstructed, the model's segmentation of small fruit targets suffers. In the prior art, when a convolutional neural network is used for image processing, convolution and pooling operations are generally performed, so that the resulting feature map is small and can be passed to the final fully connected layer for classification. An image semantic segmentation task, however, must classify every pixel in the image, so the smaller feature map must first be restored to the original image size by an upsampling operation before pixel-level prediction. This process has two major disadvantages: since the pooling operation is irreversible, information is lost when the feature map is restored, and small-object images cannot be reconstructed after pooling. Instead of the traditional max-pooling and convolution operations, this embodiment uses expansion (dilated) convolution to enlarge the receptive field while keeping the feature map the same size as the original image. The expansion convolution reduces the number of convolutions applied to each pixel, so the semantic information of a local pixel block can be effectively attended to, and each pixel is not mixed with surrounding pixels in a way that degrades segmentation precision.
In a specific implementation, stacking multiple identical expansion convolutions leaves a large number of holes, which damages the continuity and integrity of the data and loses local information. Because the sampled input signals are sparse, the features extracted at long range are not correlated, which in turn degrades the classification result. Therefore, this embodiment adopts mixed expansion convolution, which has three characteristics. First, the expansion rates of the stacked convolutions cannot be chosen arbitrarily: they must not share a common divisor greater than 1. Second, the expansion rates are arranged in a sawtooth structure: convolutions with small expansion rates attend to short-range information and convolutions with large expansion rates attend to long-range information, so the requirements of segmenting both large and small objects are met simultaneously; by setting different expansion rates, more continuous information within the pixel range is processed under the same configuration, and a large amount of local information is not lost. Third, the expansion rates must satisfy a preset expansion formula so that no holes exist in the receptive field.
The preset expansion formula is:

M_i = max[ M_{i+1} - 2r_i, M_{i+1} - 2(M_{i+1} - r_i), r_i ]

where M_i is the maximum expansion rate of the i-th layer and r_i is the expansion rate of the i-th layer.
In this embodiment, assuming n layers of dilated convolution with K × K kernels, M_n = r_n, and the design goal is M_2 ≤ K. It should be noted that the expansion rates within a group must not share a common divisor greater than 1 (for example 2, 4, 8); otherwise the gridding problem still exists in the network.
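The recurrence above can be evaluated directly to check a candidate rate group; a small sketch (the function name is illustrative) comparing the sawtooth rates [1, 2, 5] with the common-divisor group [2, 4, 8] for a 3 × 3 kernel:

```python
def max_dilation_m2(rates):
    """Compute M_2 from the recurrence
    M_i = max(M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i),
    starting from M_n = r_n; the design requires M_2 <= K."""
    m = rates[-1]                           # M_n = r_n
    for i in range(len(rates) - 2, 0, -1):  # compute M_{n-1} ... M_2
        r = rates[i]
        m = max(m - 2 * r, m - 2 * (m - r), r)
    return m

m2_good = max_dilation_m2([1, 2, 5])  # sawtooth rates used in this design
m2_bad = max_dilation_m2([2, 4, 8])   # rates sharing a common divisor > 1
```

With K = 3, the group [1, 2, 5] gives M_2 = 2 ≤ K (no holes in the receptive field), while [2, 4, 8] gives M_2 = 4 > K, confirming that such a group still suffers from gridding.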
This is aimed at the fruit identification task in close-range scenes. During image acquisition, the size of fruit in the image changes greatly with the choice of shooting angle and the shooting distance; when the shooting distance is large, fruits of various sizes are scattered through the image, which makes accurate segmentation very difficult. Therefore, in order to perform semantic segmentation of fruit in multiple scenes, mixed expansion convolutions with different expansion rates are stacked to adapt to targets of different sizes and achieve multi-scale detection of the image. At the same time, stacking the mixed expansion convolutions increases the depth of the model, so the model learns deeper semantic features, further optimizing the segmentation effect.
A segmentation module 6003, configured to input the dilated plant image into a preset mixed dilated image segmentation model to obtain a plant segmentation image.
The preset mixed expansion image segmentation model may be constructed as follows: obtain multiple plant image samples in different scenes and determine the plant image category information corresponding to each sample; preprocess the plant image samples according to the plant image category information to obtain multiple plant image training samples and multiple plant image verification samples; train an initial network model on the training samples to obtain an initial mixed expansion image segmentation model; and determine the preset mixed expansion image segmentation model from the verification samples and the initial mixed expansion image segmentation model. The plant image category information includes weather information, fruit number information, fruit size information, and the like of the plant image.
The multiple plant image samples may be preprocessed according to the plant image category information as follows: determine the image illumination information corresponding to each plant image sample according to the plant image category information; determine a preset gray world elimination rule according to the image illumination information; process each plant image sample according to the preset gray world elimination rule to obtain multiple gray image samples; determine the image brightness information corresponding to each gray image and determine a brightness processing function from it; adjust the brightness of each gray image through the brightness processing function to obtain multiple brightness image samples; and determine the multiple plant image training samples and multiple plant image verification samples from the brightness image samples.
It should be noted that, in order to enhance the richness of the green citrus data set, the collected plant images are preprocessed in color, brightness, rotation, noise, blur, and the like through the application program interface provided in the computer vision and machine learning software library, so as to expand the data set.
In this embodiment, different illumination conditions in the natural environment may cause a certain deviation between the color presented by the captured image and the actual color of the object; the gray world algorithm can be used to eliminate the effect of illumination on color rendering. Applying the gray world algorithm to the plant image samples removes the influence of illumination on image color in the natural environment and brings the image closer to the real color of the object.
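A minimal gray world implementation, assuming an H×W×3 RGB array (the function name is illustrative, not from the patent):

```python
import numpy as np

def gray_world(img):
    """Gray-world white balance: scale each channel so its mean equals
    the mean gray level of the whole image, which cancels a global
    color cast introduced by the illuminant."""
    img = img.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)   # per-channel mean
    gray_mean = channel_means.mean()                  # target gray level
    gains = gray_mean / channel_means                 # per-channel gain
    balanced = img * gains
    return np.clip(balanced, 0, 255).astype(np.uint8)
```

On an image with a uniform cast, for example constant (100, 150, 200) pixels, the channels are rescaled to a common mean of 150.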
Due to the variability of illumination intensity in reality, the brightness of the original data set images can be adjusted in this embodiment by changing the parameter of the brightness function. If the image brightness is too strong or too weak, the target edges may be unclear and it may be difficult to draw a bounding box during manual annotation, which affects the performance of the model. To avoid generating such images while keeping the target edges accurately identifiable, suitable parameter values (0.3, 0.5 and 0.7) are selected for the brightness change, yielding multiple brightness image samples. This simulates the constantly changing illumination intensity in the natural environment and compensates for the lack of robustness to various illumination intensities that results from acquiring images within a concentrated time window.
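One plausible reading of the brightness processing function is a multiplicative intensity scale; under that assumption (the helpers below are illustrative, only the parameter values come from the text), the adjustment can be sketched as:

```python
import numpy as np

def adjust_brightness(img, factor):
    """Scale pixel intensities by `factor` (<1 darkens, >1 brightens)
    and clip back to the valid 8-bit range."""
    out = img.astype(np.float64) * factor
    return np.clip(out, 0, 255).astype(np.uint8)

def brightness_variants(img, factors=(0.3, 0.5, 0.7)):
    """Generate one darkened copy per parameter value to simulate
    varying natural illumination intensity."""
    return [adjust_brightness(img, f) for f in factors]
```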
The multiple plant image training samples and multiple plant image verification samples may be determined from the multiple brightness image samples as follows: perform rotation and mirror processing on each brightness image sample to obtain multiple rotated image samples; perform noise processing on each rotated image sample to obtain multiple noise image samples; perform blur processing on each noise image sample to obtain multiple blurred image samples; and divide the blurred image samples according to a preset image proportion rule to obtain the multiple plant image training samples and multiple plant image verification samples. The preset image proportion rule can be set by the user, for example 6:4 or 8:2.
It should be noted that randomly rotating the brightness image samples helps improve the generalization ability of the model. To further expand the image data set, the images may be rotated by 90°, 180° and 270° and mirrored. If the model can learn new features from noisy images, its noise resistance increases; therefore Gaussian noise and salt-and-pepper noise are used to generate noisy images, i.e., the multiple noise image samples, for training the noise resistance of the model. It should be understood that in actual shooting, a long shooting distance or an incorrectly focused or moving phone makes the acquired image unclear. To improve the richness of the data set and simulate image blur in actual shooting, Gaussian blur and median blur are applied to the images to obtain multiple blurred image samples.
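Generic NumPy stand-ins for the noise and blur routines described above (Gaussian blur is analogous and omitted for brevity); these are illustrative sketches, not the patent's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma=10.0):
    """Additive Gaussian noise, clipped back to the 8-bit range."""
    noisy = img.astype(np.float64) + rng.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_pepper(img, amount=0.02):
    """Flip a random fraction of pixel positions to pure black or white."""
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < amount / 2] = 0          # pepper
    out[mask > 1 - amount / 2] = 255    # salt
    return out

def median_blur3(img):
    """3x3 median filter on a single-channel image (border kept as-is);
    removes isolated salt-and-pepper outliers."""
    out = img.copy()
    h, w = img.shape
    stacked = np.stack([img[i:h - 2 + i, j:w - 2 + j]
                        for i in range(3) for j in range(3)])
    out[1:-1, 1:-1] = np.median(stacked, axis=0).astype(img.dtype)
    return out
```

A single outlier pixel in an otherwise uniform patch is restored by the median filter, which is why median blur pairs well with salt-and-pepper augmentation.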
In a specific implementation, multiple plant image training samples and multiple plant image verification samples are determined from the multiple blurred image samples; an initial network model is trained on the training samples to obtain an initial mixed expansion image segmentation model; the preset mixed expansion image segmentation model is determined from the verification samples and the initial model; the expanded plant image is then input into the preset mixed expansion image segmentation model to obtain a plant segmentation image, which can be a background-free segmentation image containing fruits of different sizes.
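The division by the preset image proportion rule (for example 8:2) can be sketched as follows; the helper name and seeding scheme are assumptions, not from the patent:

```python
import random

def split_samples(samples, train_ratio=0.8, seed=42):
    """Shuffle the sample list and divide it into training and
    verification sets according to a preset proportion rule
    such as 8:2 (train_ratio=0.8) or 6:4 (train_ratio=0.6)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```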
An identifying module 6004, configured to perform plant identification according to the plant segmentation image.
It should be understood that the number of fruits and the size of the fruits, etc. can be identified based on the background-free segmented image in which there are fruits of different sizes.
In this embodiment, a plant image to be processed in a preset scene is first acquired and its scene distance information is determined; expansion processing is then performed on the plant image according to the scene distance information to obtain an expanded plant image; the expanded plant image is input into a preset mixed expansion image segmentation model to obtain a plant segmentation image; and finally plant identification is performed according to the plant segmentation image. Compared with the prior art, in which plant images need to be segmented manually and the efficiency of plant image processing is therefore low, this embodiment processes the plant image with the preset mixed expansion image segmentation model, quickly and accurately obtaining the plant segmentation image and further improving the accuracy of plant identification.
Other embodiments or specific implementation manners of the plant identification device based on the hybrid expansion convolution of the present invention may refer to the above method embodiments, and are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order, but rather the words first, second, third, etc. are to be interpreted as names.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g., a Read Only Memory (ROM)/Random Access Memory (RAM), a magnetic disk, an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A plant identification method based on mixed dilation convolution is characterized in that the plant identification method based on mixed dilation convolution comprises the following steps:
acquiring a plant image to be processed in a preset scene, and determining scene distance information of the plant image to be processed;
performing expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image;
inputting the expanded plant image into a preset mixed expanded image segmentation model to obtain a plant segmentation image;
and carrying out plant identification according to the plant segmentation image.
2. The method as claimed in claim 1, wherein the step of performing dilation processing on the plant image to be processed according to the scene distance information to obtain a dilated plant image comprises:
determining an image expansion rate according to the scene distance information;
acquiring image pixel information corresponding to the plant image to be processed;
performing expansion processing on the image pixel information according to the image expansion rate to obtain expanded pixel information;
and generating an expanded plant image according to the expanded pixel information.
3. The method as claimed in claim 1, wherein the step of obtaining the plant image to be processed in the preset scene and determining the scene distance information of the plant image to be processed is preceded by the steps of:
acquiring a plurality of plant image samples under different scenes;
respectively determining plant image category information corresponding to the plurality of plant image samples;
respectively preprocessing the multiple plant image samples according to the plant image category information to obtain multiple plant image training samples and multiple plant image verification samples;
training an initial network model according to the multiple plant image training samples to obtain an initial mixed expansion image segmentation model;
and determining a preset mixed expansion image segmentation model according to the multiple plant image verification samples and the initial mixed expansion image segmentation model.
4. The method according to claim 3, wherein the step of preprocessing the plurality of plant image samples according to the plant image category information to obtain a plurality of plant image training samples and a plurality of plant image verification samples comprises:
respectively determining image illumination information corresponding to the multiple plant image samples according to the plant image category information;
determining a preset gray world elimination rule according to the image illumination information;
respectively processing the multiple plant image samples according to the preset gray world elimination rule to obtain multiple gray image samples;
and determining a plurality of plant image training samples and a plurality of plant image verification samples according to the plurality of gray scale image samples.
5. The method of claim 4, wherein the step of determining a plurality of plant image training samples and a plurality of plant image verification samples from the plurality of grayscale image samples comprises:
respectively determining image brightness information corresponding to the gray images;
determining a brightness processing function according to the image brightness information;
respectively adjusting the brightness of the gray images through the brightness processing function to obtain a plurality of brightness image samples;
and determining a plurality of plant image training samples and a plurality of plant image verification samples according to the plurality of brightness image samples.
6. The method of claim 5, wherein the step of determining a plurality of plant image training samples and a plurality of plant image verification samples from the plurality of luminance image samples comprises:
respectively carrying out rotation mirror image processing on the multiple brightness image samples to obtain multiple rotation image samples;
respectively carrying out noise processing on the multiple rotating image samples to obtain multiple noise point image samples;
respectively carrying out fuzzy processing on the multiple noise point image samples to obtain multiple fuzzy image samples;
and dividing the plurality of blurred image samples according to a preset image proportion rule to obtain a plurality of plant image training samples and a plurality of plant image verification samples.
7. The method according to any one of claims 2 to 6, wherein the step of training an initial network model based on the plurality of plant image training samples to obtain an initial hybrid dilated image segmentation model comprises:
respectively determining image distance information corresponding to the plurality of plant image training samples;
respectively determining expansion rates of image samples corresponding to the plant image training samples according to the image distance information;
judging whether the expansion rate of the image sample meets a preset expansion factor relation or not;
and when the expansion rate of the image samples meets the preset expansion factor relationship, training an initial network model according to the expansion rates of the plant image training samples and the image samples corresponding to the plant image training samples to obtain an initial mixed expansion image segmentation model.
8. The method of claim 7, wherein the step of determining a preset blended dilated image segmentation model from the plurality of plant image verification samples and the initial blended dilated image segmentation model comprises:
verifying the initial mixed expansion image segmentation model according to the multiple plant image verification samples to obtain a model evaluation index value;
and when the model evaluation index score meets a preset index condition, taking the initial mixed expansion image segmentation model as a preset mixed expansion image segmentation model.
9. A hybrid dilation convolution based plant identification apparatus, wherein the hybrid dilation convolution based plant identification apparatus comprises:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a plant image to be processed in a preset scene and determining scene distance information of the plant image to be processed;
the processing module is used for performing expansion processing on the plant image to be processed according to the scene distance information to obtain an expanded plant image;
the segmentation module is used for inputting the expanded plant image into a preset mixed expanded image segmentation model to obtain a plant segmentation image;
and the identification module is used for carrying out plant identification according to the plant segmentation image.
10. A storage medium, characterized in that the storage medium stores thereon a hybrid dilation convolution based plant identification program, which when executed by a processor implements the steps of the hybrid dilation convolution based plant identification method according to any one of claims 1 to 8.
CN202110947152.2A 2021-08-17 2021-08-17 Plant identification method, device and storage medium based on mixed expansion convolution Active CN113808055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110947152.2A CN113808055B (en) 2021-08-17 2021-08-17 Plant identification method, device and storage medium based on mixed expansion convolution

Publications (2)

Publication Number Publication Date
CN113808055A true CN113808055A (en) 2021-12-17
CN113808055B CN113808055B (en) 2023-11-24

Family

ID=78893737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110947152.2A Active CN113808055B (en) 2021-08-17 2021-08-17 Plant identification method, device and storage medium based on mixed expansion convolution

Country Status (1)

Country Link
CN (1) CN113808055B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559315A (en) * 2018-09-28 2019-04-02 天津大学 A kind of water surface dividing method based on multipath deep neural network
CN110287991A (en) * 2019-05-22 2019-09-27 平安科技(深圳)有限公司 Plant crude drug authenticity verification method, apparatus, computer equipment and storage medium
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system
CN111860537A (en) * 2020-07-17 2020-10-30 中南民族大学 Deep learning-based green citrus identification method, equipment and device
US20210035304A1 (en) * 2018-04-10 2021-02-04 Tencent Technology (Shenzhen) Company Limited Training method for image semantic segmentation model and server

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAO Wenfan; SHEN Li; DAI Yanshuai; CAO Yungang: "Automatic building recognition in high-resolution imagery combining a dilated convolution residual network and pyramid pooling representation", Geography and Geo-Information Science, no. 05, pages 62 - 68 *

Also Published As

Publication number Publication date
CN113808055B (en) 2023-11-24

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20231012

Address after: 430000, No. 708, 823, Minzu Avenue, Hongshan District, Wuhan City, Hubei Province

Applicant after: SOUTH CENTRAL University FOR NATIONALITIES

Applicant after: Wuhan bitaoju Agricultural Development Co.,Ltd.

Address before: Central South University for nationalities, No.182 Minzu Avenue, Hongshan District, Wuhan City, Hubei Province

Applicant before: SOUTH CENTRAL University FOR NATIONALITIES

GR01 Patent grant
GR01 Patent grant