CN112037294B - Method and system for counting interpretable plants - Google Patents


Info

Publication number
CN112037294B
CN112037294B
Authority
CN
China
Prior art keywords
resolution
counting
map
feature
plant
Prior art date
Legal status
Active
Application number
CN202010800336.1A
Other languages
Chinese (zh)
Other versions
CN112037294A (en)
Inventor
曹治国
陆昊
李亚楠
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202010800336.1A
Publication of CN112037294A
Application granted
Publication of CN112037294B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an interpretable plant counting method and system belonging to the field of automated agricultural observation. The method comprises the following steps: sequentially encoding and decoding the original plant image, and fusing the decoded feature map with a soft mask to achieve feature segmentation; performing dot (element-wise) multiplication between the feature map of a set resolution from the encoding stage and the feature map obtained by segmentation; convolving the feature map obtained by the dot multiplication to obtain local counts, and dynamically allocating weights to the local counts to obtain a redundant count map; and normalizing and visualizing the redundant count map to obtain an interpretable plant count map. Rather than directly and coarsely spreading each counting result evenly over its local region, the method reassigns the feature weights of local regions, which raises the output resolution of the local count map and enhances the interpretability of the output count map.

Description

Method and system for counting interpretable plants
Technical Field
The invention belongs to the field of agricultural automatic observation, and particularly relates to an interpretable plant counting method and system.
Background
Plant counting has wide application in agricultural breeding, plant phenotyping, crop monitoring and yield forecasting. By phenotyping geometric features such as leaves or organs, breeders can identify desirable crop varieties. The emergence rate of plant ears is a key indicator for monitoring critical crop growth stages, and ear density and fruit number are closely related to yield; obtaining all of these indicators requires plant counting. In conventional agriculture, plant counting is usually performed manually by random sampling and visual observation. However, manual counting is cumbersome, time-consuming, laborious and error-prone, and cannot meet the requirements of modern high-throughput phenotyping.
Interest in developing automated plant counting tools based on digital images is growing, driven by the popularity of flexible plant phenotyping platforms such as unmanned aerial vehicles (UAVs), year-on-year increases in computing power, and the third wave of artificial intelligence (deep learning). Many automated plant counting tools build on the success of deep convolutional neural networks (CNNs). However, CNN inference is generally considered a black box. Thus, to understand which instances are counted, agricultural practitioners want the output generated by an algorithm to be at least interpretable. Since the interpretability of an output is closely related to the way it is visualized, this need drives how a practitioner selects an algorithmic framework to generate the output visualization.
Since 2008, when regression-based counting was introduced, object counting in computer vision has been regarded as an independent research area. Regression counting has three sub-branches. First, Chan et al., in the 2008 paper "Privacy preserving crowd monitoring: Counting people without people models or tracking", regressed the global image count, which is equivalent to optimizing image-level mean absolute error (MAE). Xiong et al., in the 2019 paper "From open set to closed set: Counting objects by spatial divide-and-conquer", noted that regressing global image counts is difficult to optimize due to large image variation and the open-ended nature of count values. A good alternative (the second branch) is to regress density maps, introduced by Lempitsky and Zisserman in 2010. The third branch is block-wise regression of local counts, which first appeared in the 2012 paper "Feature mining for localized crowd counting" by Chen et al. The literature has shown that this local counting framework is particularly suitable for plant counting: it is very sensitive to local size variations, and since plants themselves grow and change, enhancing the robustness of the model is important. The paper "Adaptive mixture regression network with local counting map for crowd counting", published by Liu et al. in 2020, shows that local counts produce smaller mean absolute errors than density maps.
Although the local counting framework is effective, one drawback is that the generated visual result is very coarse and poorly interpretable: the individual distribution of the objects to be detected cannot be seen from the output count map. In practical applications, the requirement on the counting problem is not only to "count accurately" but also to "count clearly", i.e. the counted objects should be clearly visible in the count result map. FIG. 10 shows the result of processing the rice image of FIG. 4 with a conventional local counting algorithm; although an estimate of the target number can be obtained, the visual result is very coarse and no fine-grained information about the counted targets can be observed. Therefore, a new plant counting method is needed that can both accurately detect the number of plants and produce a well-interpretable visual output.
Disclosure of Invention
In view of the above deficiencies of the prior art and the need for improvement, the present invention provides an interpretable plant counting method and system aimed at both accurately detecting plant numbers and producing an interpretable visual output.
To achieve the above object, according to one aspect of the present invention, there is provided an interpretable plant counting method including:
s1, coding and decoding plant original pictures in sequence, and fusing the feature mapping obtained by decoding with a soft mask to realize feature segmentation; the feature mapping segmentation process is supervised by a pseudo-true value mask obtained by dot expansion and transformation of a manually marked dot mark graph;
s2, performing dot multiplication on the feature map with the resolution set in the encoding stage and the feature map obtained by segmentation; the feature mapping resolution obtained by decoding is the same as the set resolution;
s3, convolving the feature mapping obtained by the point multiplication to obtain local counts, and dynamically distributing weights of the local counts to obtain a redundant count diagram; the dynamic allocation result and the local counting result are supervised by a truth redundancy counting diagram obtained by manually marked point marking diagrams through Gaussian convolution and stride convolution;
and S4, carrying out normalization and visualization treatment on the redundant counting graph to obtain an interpretable plant counting graph.
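The data flow of steps S1-S4 can be sketched with toy arrays. The sketch below replaces MixNet-L, CARAFE and all learned convolutions with fixed placeholder operations (random features, a sigmoid soft mask, 2x2 block sums), so every array name, size and operation is an illustrative assumption rather than the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# S1: stand-ins for the encoded/decoded features -- C channels at 1/8 resolution
C, H, W = 4, 8, 8
feat_count = rng.random((C, H, W))   # counting-branch features (encoder, 1/8)
feat_seg = rng.random((C, H, W))     # segmentation-branch features (decoder, 1/8)
soft_mask = 1.0 / (1.0 + np.exp(-feat_seg.mean(axis=0)))  # gray values in (0, 1)

# S2: dot (element-wise) multiplication with the segmentation result
modulated = feat_count * soft_mask   # the mask broadcasts over the channel axis

# S3: "convolution" to local counts (placeholder: 2x2 block sums over all channels)
local = modulated.reshape(C, H // 2, 2, W // 2, 2).sum(axis=(0, 2, 4))

# Dynamic allocation: channel-average weight map, spatial softmax per 2x2 block,
# multiply weights with local counts -> redundant count map back at 1/8 resolution
w = modulated.mean(axis=0).reshape(H // 2, 2, W // 2, 2)
w = np.exp(w - w.max(axis=(1, 3), keepdims=True))
w /= w.sum(axis=(1, 3), keepdims=True)
redundant = (w * local[:, None, :, None]).reshape(H, W)

# S4: the reassignment conserves the total count before normalization
print(redundant.shape, np.isclose(redundant.sum(), local.sum()))  # (8, 8) True
```

Because each block's softmax weights sum to one, redistributing a local count over its block leaves the total unchanged; only the spatial placement of the count becomes finer.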
Further, the coding process is specifically that MixNet-L is adopted to code the plant original graph, and feature mapping with resolution of 1/2, 1/4, 1/8, 1/16 and 1/32 is obtained in sequence.
Further, the decoding process specifically includes:
processing the 1/32-resolution feature map obtained by encoding with the context encoding module ASPP;
performing an up-sampling operation on the output of the context encoding module and convolving it with the 1/16-resolution feature map from the encoding stage to obtain a new 1/16-resolution feature map;
and performing an up-sampling operation on the new 1/16-resolution feature map and convolving it with the 1/8-resolution feature map from the encoding stage to obtain a new 1/8-resolution feature map.
Further, in step S1, the feature segmentation achieved by fusing the decoded feature map with the soft mask is specifically achieved by fusing the new 1/8-resolution feature map with the soft mask.
Further, in step S3, dynamically allocating the local count weights specifically includes:
generating a weight map by channel-wise averaging of the feature map obtained from the dot multiplication;
spatially normalizing the weight map using a Softmax function;
the redistributed count map is generated by multiplying the weight map with the local counts.
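The three sub-steps above admit a compact sketch. The function below, including its name, the channel-first feature layout and the s-fold resolution gap between the local count map and the target map, is a hypothetical illustration, not the patent's code:

```python
import numpy as np

def dynamic_allocate(features, local_counts, s=2):
    """Redistribute each low-resolution local count over an s x s high-resolution
    neighbourhood, with weights derived from the high-resolution features."""
    c, h, w = features.shape                 # high-res features, e.g. 1/8 resolution
    lh, lw = local_counts.shape              # low-res counts, e.g. 1/16 resolution
    assert (h, w) == (lh * s, lw * s)

    weights = features.mean(axis=0)          # channel-wise average -> (h, w) weight map
    blocks = weights.reshape(lh, s, lw, s)
    blocks = np.exp(blocks - blocks.max(axis=(1, 3), keepdims=True))
    blocks /= blocks.sum(axis=(1, 3), keepdims=True)   # spatial Softmax per block
    out = blocks * local_counts[:, None, :, None]      # multiply weights with counts
    return out.reshape(h, w)                           # redistributed count map

feats = np.random.default_rng(1).random((3, 4, 4))
counts = np.ones((2, 2))
redistributed = dynamic_allocate(feats, counts)
print(redistributed.shape)  # (4, 4)
```

Since the softmax weights of each block sum to one, the redistributed map preserves the total of the local counts while placing them at finer spatial positions.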
According to another aspect of the present invention, there is provided an interpretable plant counting system, comprising: an encoder, a decoder, a segmentation module, a counter, a generator, a normalizer and a visualizer;
the encoder is used for encoding the original plant image;
the decoder is used for decoding the feature map output by the encoder;
the segmentation module is used for fusing the feature map output by the decoder with a soft mask to achieve feature segmentation, and for performing dot multiplication between the segmentation result and the feature map of the set resolution in the encoder; wherein the resolution of the feature map output by the decoder is the same as the set resolution;
the counter is used for receiving the dot multiplication result, convolving it to obtain local counts, and dynamically allocating weights to the local counts to obtain a redundant count map;
the generator is used for performing point expansion and transformation on the manually annotated point map to obtain a pseudo ground-truth mask, which supervises the segmentation module, and for applying Gaussian convolution and stride convolution to the manually annotated point map to obtain a ground-truth redundant count map, which supervises the counter;
the normalizer is used for normalizing the redundant count map;
and the visualizer is used for upsampling the output of the normalizer to obtain an interpretable plant count map.
Further, the specific implementation process of the encoder is that MixNet-L is adopted to encode the plant original graph, and feature maps with resolution of 1/2, 1/4, 1/8, 1/16 and 1/32 are obtained in sequence.
Further, the specific implementation process of the decoder is as follows:
processing the 1/32-resolution feature map obtained by encoding with the context encoding module ASPP;
performing an up-sampling operation on the output of the context encoding module and convolving it with the 1/16-resolution feature map from the encoding stage to obtain a new 1/16-resolution feature map;
and performing an up-sampling operation on the new 1/16-resolution feature map and convolving it with the 1/8-resolution feature map from the encoding stage to obtain a new 1/8-resolution feature map.
Further, the dynamic allocation of the local count weight in the counter specifically includes:
generating a weight map by channel-wise averaging of the feature map obtained from the dot multiplication;
spatially normalizing the weight map using a Softmax function;
the redistributed count map is generated by multiplying the weight map with the local counts.
In general, the above technical solution conceived by the present invention can achieve the following advantageous effects compared to the prior art.
(1) The method of the invention does not directly and coarsely distribute the counting result evenly over a local area; instead, by reassigning the feature weights of local areas, it improves the output resolution of the local count map, achieves a fine-grained distribution of the targets to be counted, outputs a more intuitive and accurate count image, and enhances the interpretability of the output count map.
(2) The invention uses a soft mask to modulate the feature maps so that each feature map carries a different weight; this not only provides plant position information but also effectively suppresses background interference, so that the counter focuses on foreground instances and produces a more accurate count estimate.
Drawings
FIG. 1 is a block diagram of an illustrative plant counting system provided by the present invention;
FIG. 2 is a flow chart of an encoder and a single decoder provided by the present invention;
FIG. 3 is a diagram of a dynamic allocation process provided by the present invention;
FIG. 4 is an original view of a piece of field rice provided by the invention;
FIG. 5 is an original view of a piece of field corn provided by the present invention;
FIG. 6 is an original view of a field wheat provided by the present invention;
FIG. 7 is a truth chart of the rice labeled in FIG. 4 provided by the present invention;
FIG. 8 is a truth chart of the corn labeled in FIG. 5 provided by the present invention;
FIG. 9 is a truth chart of the wheat labeled in FIG. 6 provided by the present invention;
FIG. 10 is a graph of rice counts obtained by performing the conventional local counting algorithm of FIG. 4;
FIG. 11 is a graph showing the result of counting rice obtained by the method of the present invention in FIG. 4;
FIG. 12 is a graph of the corn count results obtained for FIG. 5 using the method of the present invention;
FIG. 13 is a graph showing the results of counting wheat obtained by the method of the present invention with respect to FIG. 6.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.
The embodiments of the invention provide an interpretable plant counting system; the function and specific implementation of each module of the system are described in detail below:
(1) As shown in fig. 1, the original crop image of figs. 4-6 (of size h x w x m, where h is the image height, w the image width and m the number of feature channels) is first input to the encoder, which encodes the image through the five encoding stages of MixNet-L (1/2, 1/4, 1/8, 1/16 and 1/32) in order to obtain a sufficiently large receptive field. The decoder has two branches, counting and segmentation; the specific encoding-decoding procedure is shown in fig. 2. The counting branch directly takes the encoded feature map at 1/8 input resolution as its output. To overcome the problem that similar backgrounds cannot be distinguished due to limited receptive fields, the invention introduces a segmentation branch to suppress background interference. Since context information is clearly effective at removing background interference, the decoding part follows an encoder-decoder architecture similar to a feature pyramid network and adopts the context encoding module ASPP in the decoder. Furthermore, because the encoder and decoder have different receptive fields, the invention uses the data-dependent upsampling operator CARAFE to obtain feature maps in the segmentation branch at the same resolution (1/8 input resolution) as the counting branch. Specifically, the 1/32-resolution feature map in the decoder first passes through the context encoding module ASPP; the upsampling operator CARAFE then performs an upsampling operation, and the result is convolved with the original 1/16-resolution feature map in the encoder to obtain a new 1/16-resolution feature map. This feature map is upsampled again and convolved with the original 1/8-resolution feature map in the encoder to obtain the final 1/8-resolution feature map, which is passed to the segmentation module as the output of the segmentation branch.
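The two decoder fusion steps can be sketched as below. Nearest-neighbour upsampling stands in for the CARAFE operator and a random 1x1 channel mixing stands in for the learned convolution, so all shapes and weights are toy assumptions rather than the network's actual parameters:

```python
import numpy as np

def upsample2(x):
    """Nearest-neighbour 2x upsampling (a stand-in for the CARAFE operator)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse(low, skip, w):
    """Upsample the coarser map, concatenate it with the encoder skip feature,
    and mix channels with a 1x1 'convolution' (tensordot over the channel axis)."""
    x = np.concatenate([upsample2(low), skip], axis=0)   # (c_low + c_skip, h, w)
    return np.tensordot(w, x, axes=([1], [0]))           # (c_out, h, w)

rng = np.random.default_rng(0)
f32 = rng.random((8, 2, 2))    # 1/32-resolution map after the ASPP module
f16 = rng.random((8, 4, 4))    # encoder feature at 1/16 resolution
f8 = rng.random((8, 8, 8))     # encoder feature at 1/8 resolution
w1 = rng.random((8, 16))       # 1x1 conv weights, (c_out, c_in)
w2 = rng.random((8, 16))

new16 = fuse(f32, f16, w1)     # new 1/16-resolution feature map
new8 = fuse(new16, f8, w2)     # final 1/8-resolution output of the segmentation branch
print(new8.shape)  # (8, 8, 8)
```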
(2) The segmentation module fuses the soft mask with the 1/8-resolution feature map generated by the segmentation branch of the decoder in step (1) and outputs a segmentation map, thereby providing prior knowledge of foreground and background. The invention uses a soft mask (an image mask whose gray values lie between 0 and 1) to modulate the feature maps so that each feature map carries a different weight, i.e. the features are weighted; this provides plant position information while suppressing background features. The segmentation map generated by the segmentation module can effectively suppress background interference so that the counter focuses on foreground instances. Since the probability of the segmentation module marking background as foreground is much smaller than that of marking foreground as background, the segmentation module only needs to correctly distinguish background areas with zero count. Finally, the segmentation map generated by the segmentation module is dot-multiplied with the feature map of the counting branch to obtain a new 1/8-resolution feature map whose weights suppress background interference, and this feature map is passed as input to the counter module.
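Soft-mask modulation reduces to one broadcasted multiplication. In the sketch below the features and logits are random placeholders and the sigmoid producing the mask is an assumption; the point is only that each spatial position of every channel is scaled by a foreground probability in (0, 1):

```python
import numpy as np

rng = np.random.default_rng(0)
count_feat = rng.random((16, 8, 8))           # counting-branch features, 1/8 resolution
seg_logits = rng.standard_normal((8, 8))      # segmentation-branch output (toy values)
soft_mask = 1.0 / (1.0 + np.exp(-seg_logits)) # gray values strictly between 0 and 1

# Dot (element-wise) multiplication: the (8, 8) mask broadcasts over all 16 channels.
modulated = count_feat * soft_mask
# Background positions (mask near 0) are suppressed; foreground (near 1) passes through.
print(modulated.shape)  # (16, 8, 8)
```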
(3) The counter first receives the feature map obtained in step (2), i.e. the result of dot-multiplying the counting-branch feature map from the encoder with the segmentation map generated by the segmentation module. Next, it convolves this feature map to obtain local counts; the local counting result is then reassigned by a dynamic allocation module, finally forming a redundant count map. The counter converts an h x w input into an (h/o) x (w/o) count map, where o is called the output stride and has the value S_e x S_c; S_c is the downsampling rate of the counter (typically set to 1) and S_e is the downsampling rate of the encoder. During the conversion, the local counts are reassigned using the idea of dynamic allocation. Dynamic allocation is in fact a novel upsampling operator that improves the interpretability of the output by redistributing the count weight distribution. The purpose of dynamic allocation is to allocate the local count of a region in the low-resolution map (1/16 or 1/32) to several regions of the high-resolution map (1/8) according to allocation weights. "Dynamic" means that the allocation weights change with the features; since the features of each region in the image differ, the low-resolution local count map is mapped to high resolution by this reassignment. FIG. 3 illustrates the dynamic allocation process: a weight map is generated by channel-wise averaging of the feature map obtained after the dot multiplication, the weight map is then spatially normalized with a Softmax function, and the redistributed count map is finally generated by multiplying the weight map with the local counts. The counter module passes this reassigned count map, as a redundant count map, to the normalizer for processing.
(4) The generator performs point expansion and a Euclidean-distance-based transformation on the manually annotated point map shown in figs. 7-9 (the manually annotated point map is obtained by labeling each plant in the original plant image; the general rule in the counting field is to place the label at the center of the counting target) to generate a pseudo ground-truth mask. If P_{x,y} denotes the pixel indexed by coordinates x and y, and F_{x,y} its nearest annotated point, then the point expansion can be seen as a circle centered at F_{x,y} whose radius is the distance ||P_{x,y} - F_{x,y}||. The pseudo ground-truth mask is obtained by applying a single threshold τ, with the threshold expression: S_{x,y} = 1 if ||P_{x,y} - F_{x,y}|| ≤ τ, and S_{x,y} = 0 otherwise.
When τ is set to p/2, where p denotes the size of the plant in the image, the counter result is not affected by the segmentation module. As shown in fig. 3, the pseudo ground-truth mask generated in this step serves as a supervision signal to optimize the segmentation effect of the segmentation module. The generator module provides not only the pseudo ground-truth mask that supervises the segmentation module, but also a supervision signal for the counter. Specifically, the manually annotated point map is turned into a density image by Gaussian convolution, and a ground-truth redundant count map is generated from the density image by stride convolution. This count map serves as supervision information to optimize the counter.
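Both supervision signals of the generator can be sketched with NumPy. The pseudo mask thresholds the distance to the nearest annotated point; for the redundant count map, a strided box filter over the point annotations is used here in place of the Gaussian-plus-stride convolution, so the smoothing step, point coordinates and window sizes are all simplifying assumptions:

```python
import numpy as np

def pseudo_mask(points, h, w, tau):
    """Pseudo ground-truth mask: a pixel is foreground iff its Euclidean distance
    to the nearest annotated point is within tau (the 'point expansion' circle)."""
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=np.uint8)
    for (py, px) in points:
        mask |= ((yy - py) ** 2 + (xx - px) ** 2) <= tau ** 2
    return mask

def redundant_count_map(points, h, w, block, stride):
    """Ground-truth redundant count map: sum point annotations inside overlapping
    block x block windows sampled every `stride` pixels."""
    dots = np.zeros((h, w))
    for (py, px) in points:
        dots[py, px] += 1.0
    oh = (h - block) // stride + 1
    ow = (w - block) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = dots[i * stride:i * stride + block,
                             j * stride:j * stride + block].sum()
    return out

pts = [(4, 4), (10, 12)]                    # two annotated plant centers (toy values)
m = pseudo_mask(pts, 16, 16, tau=3)
r = redundant_count_map(pts, 16, 16, block=8, stride=4)
print(m.shape, r.shape)  # (16, 16) (3, 3)
```

Because the windows overlap, each point contributes to several entries of `r`; the map is "redundant" by construction, which is exactly what the normalizer of step (5) corrects.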
(5) The normalizer processes the redundant count map output by the counter: it first computes the counts of the map, judges whether adjacent local blocks overlap, and normalizes the overlapping parts. The normalization condition compares the sizes of o and p, where p is the size of a crop block. When o = p, normalization is unnecessary. When o < p, adjacent local image blocks overlap and normalization is required. The processing is as follows: without changing the spatial resolution, each element of the redundant count map is divided by the degree of overlap, i.e. the number (p/o)^2 of overlapping blocks covering a position; after this conversion a normalized count image is generated whose sum is the final count of the global image, i.e. the total number of plants.
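Under the assumption that the overlap factor is the constant (p/o)^2 away from the image borders, the normalization can be sketched as follows; the window size, stride and point positions are toy values chosen so both annotated plants lie away from the borders and the identity is exact:

```python
import numpy as np

p_block, o_stride = 8, 4           # block size p and output stride o (illustrative)
dots = np.zeros((16, 16))
dots[4, 4] = 1.0                   # two annotated plants, both away from the borders
dots[10, 10] = 1.0

# Redundant count map: overlapping p x p window sums every o pixels
oh = (16 - p_block) // o_stride + 1
redundant = np.array([[dots[i*o_stride:i*o_stride+p_block,
                            j*o_stride:j*o_stride+p_block].sum()
                       for j in range(oh)] for i in range(oh)])

# Each interior plant falls into (p/o)**2 windows, so dividing by that factor
# makes the map sum to the true global count.
normalized = redundant / (p_block // o_stride) ** 2
total = normalized.sum()
print(total)  # 2.0
```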
(6) The visual output step outputs the count map after the normalization of step (5), as in the count result maps shown in figs. 11-13. This is a typical upsampling process: the low-resolution image is transformed and mapped into a high-resolution image. Figs. 11-13 are the count result maps obtained by applying the algorithm of the invention to the rice image, the corn ear image and the wheat ear image respectively; in all three maps, pixel blocks distinct from the background area are clearly visible, and the positions of these pixel blocks are the positions of the counting targets in the image. From the positions of the pixel blocks it is easy to judge which targets in the image have been correctly counted, and miscounted targets are easier to diagnose. FIG. 10 is the count result map obtained by processing the same rice image with a common local counting algorithm; the pixel blocks in fig. 10 are very large, even covering the whole image, so the counted objects cannot be distinguished from the pixel blocks at all. By contrast, the pixel blocks in fig. 11 are much finer, and the counting targets can be clearly distinguished from the background area.
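The final upsampling can be sketched with `np.kron`: each low-resolution cell is spread over an o x o pixel block so that counted targets appear as distinct blocks at input resolution. The 2x2 map and stride value below are illustrative assumptions:

```python
import numpy as np

o = 4                                   # output stride (illustrative)
norm_map = np.array([[0.0, 0.5],
                     [0.0, 1.5]])       # toy 2x2 normalized count map, sum = 2.0

# Spread each cell over an o x o block; dividing by o**2 keeps the total count
# unchanged in the high-resolution visualization.
visual = np.kron(norm_map, np.ones((o, o)) / o**2)
print(visual.shape, round(float(visual.sum()), 2))  # (8, 8) 2.0
```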
As for the interpretable plant counting method provided by the invention, since each step of the method and its specific implementation are consistent with the corresponding functions of the system modules described above, the details are not repeated here.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. An interpretable plant counting method, comprising:
s1, coding and decoding plant original pictures in sequence, and fusing the feature mapping obtained by decoding with a soft mask to realize feature segmentation; the feature mapping segmentation process is supervised by a pseudo-true value mask obtained by dot expansion and transformation of a manually marked dot mark graph;
s2, performing dot multiplication on the feature map with the resolution set in the encoding stage and the feature map obtained by segmentation; the feature mapping resolution obtained by decoding is the same as the set resolution;
s3, convolving the feature mapping obtained by the point multiplication to obtain local counts, and dynamically distributing weights of the local counts to obtain a redundant count diagram; the dynamic allocation result and the local counting result are supervised by a truth redundancy counting diagram obtained by manually marked point marking diagrams through Gaussian convolution and stride convolution;
and S4, carrying out normalization and visualization treatment on the redundant counting graph to obtain an interpretable plant counting graph.
2. The method of claim 1, wherein the coding process is specifically that a MixNet-L is used to code the plant raw map, and feature maps with resolutions of 1/2, 1/4, 1/8, 1/16 and 1/32 are obtained in sequence.
3. An interpretable plant counting method according to claim 1 or 2, wherein the decoding process comprises:
processing the 1/32-resolution feature map obtained by encoding with the context encoding module ASPP;
performing an up-sampling operation on the output of the context encoding module and convolving it with the 1/16-resolution feature map from the encoding stage to obtain a new 1/16-resolution feature map;
and performing an up-sampling operation on the new 1/16-resolution feature map and convolving it with the 1/8-resolution feature map from the encoding stage to obtain a new 1/8-resolution feature map.
4. An interpretable plant counting method according to claim 3, wherein in step S1, feature segmentation is implemented by combining the decoded feature map with a soft mask, specifically by combining a new feature map with a resolution of 1/8 with a soft mask.
5. An interpretable plant counting method according to any one of claims 1 to 4, wherein the step S3 of dynamically assigning local counting weights includes:
generating a weight map by channel-wise averaging of the feature map obtained from the dot multiplication;
spatially normalizing the weight map using a Softmax function;
the redistributed count map is generated by multiplying the weight map with the local counts.
6. An interpretable plant counting system, comprising: encoder, decoder, segmentation module, counter, generator, normalizer and visualizer;
an encoder for encoding the plant original map;
a decoder for decoding the feature map output by the encoder;
the segmentation module is used for fusing the feature mapping output by the decoder with the soft mask to realize feature segmentation; performing dot multiplication on the segmentation result and the feature mapping with the set resolution in the encoder; wherein, the feature mapping resolution outputted by the decoder is the same as the set resolution;
the counter is used for receiving the point multiplication result, convoluting the point multiplication result to obtain local count, and dynamically distributing the weight of the local count to obtain a redundant count diagram;
the generator is used for performing point expansion and transformation on the manually marked point mark graph to obtain a pseudo-true mask, and monitoring the segmentation module by adopting the pseudo-true mask; carrying out Gaussian convolution and stride convolution on the manually marked point mark graph to obtain a true redundancy count graph, and monitoring a counter by adopting the true redundancy count graph;
the normalizer is used for normalizing the redundant count map;
and the visualizer is used for upsampling the output of the normalizer to obtain an interpretable plant count map.
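The generator's supervision targets and the normalizer/visualizer output can be sketched in numpy. This is a simplified illustration with made-up function names: the point dilation uses square neighbourhoods, the strided convolution is reduced to non-overlapping block sums (the patent's Gaussian convolution and window overlap are omitted), and the normalizer is reduced to dividing each upsampled count by its block area so the total count is preserved:

```python
import numpy as np

def pseudo_mask(points, radius=1):
    """Generator, part 1: dilate a binary point-annotation map into a
    pseudo ground-truth mask (each dot grows into a small square)."""
    mask = np.zeros_like(points)
    for y, x in zip(*np.nonzero(points)):
        mask[max(0, y - radius):y + radius + 1,
             max(0, x - radius):x + radius + 1] = 1
    return mask

def gt_count_map(points, block=4):
    """Generator, part 2: block-wise sum so each output cell holds the
    number of annotated plants in its block; the total count is kept."""
    h, w = points.shape
    return points.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

def visualize_counts(count_map, block=4):
    """Normalizer + visualizer: upsample a low-resolution count map back
    to image resolution while preserving the total count -- each local
    count is spread uniformly over its block and divided by the area."""
    return np.kron(count_map, np.ones((block, block))) / (block * block)

# Toy example: three annotated plants on an 8x8 image.
pts = np.zeros((8, 8))
pts[1, 1] = pts[2, 6] = pts[7, 7] = 1
cm = gt_count_map(pts)        # 2x2 ground-truth count map, sums to 3
vis = visualize_counts(cm)    # 8x8 interpretable count map, still sums to 3
```

The count-preserving upsample is what makes the output "interpretable": summing any region of the visualized map gives the estimated number of plants in that region.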
7. The system of claim 6, wherein the encoder encodes the original plant image using MixNet-L to obtain feature maps with resolutions of 1/2, 1/4, 1/8, 1/16 and 1/32 in sequence.
8. An interpretable plant counting system according to claim 6 or 7, wherein the decoder is implemented as:
processing the feature map with the resolution of 1/32 obtained by encoding with an ASPP context encoding module;
up-sampling the output of the context encoding module and convolving it together with the feature map with the resolution of 1/16 from the encoding stage to obtain a new feature map with the resolution of 1/16;
and up-sampling the new feature map with the resolution of 1/16 and convolving it together with the feature map with the resolution of 1/8 from the encoding stage to obtain a new feature map with the resolution of 1/8.
9. An interpretable plant counting system according to any one of claims 6 to 8, wherein the counter dynamically assigns the local counting weights, specifically comprising:
generating a weight map by channel-wise averaging of the feature map obtained by the dot multiplication;
spatially normalizing the weight map using a Softmax function;
generating the redistributed count map by multiplying the weight map with the local counts.
CN202010800336.1A 2020-08-11 2020-08-11 Method and system for counting interpretable plants Active CN112037294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010800336.1A CN112037294B (en) 2020-08-11 2020-08-11 Method and system for counting interpretable plants

Publications (2)

Publication Number Publication Date
CN112037294A CN112037294A (en) 2020-12-04
CN112037294B true CN112037294B (en) 2024-03-19

Family

ID=73578085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010800336.1A Active CN112037294B (en) 2020-08-11 2020-08-11 Method and system for counting interpretable plants

Country Status (1)

Country Link
CN (1) CN112037294B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188768A (en) * 2019-05-09 2019-08-30 南京邮电大学 Realtime graphic semantic segmentation method and system
CN111401379A * 2020-03-24 2020-07-10 北方民族大学 DeepLabv3plus-IRCNet image semantic segmentation algorithm based on coding and decoding structure

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11756160B2 (en) * 2018-07-27 2023-09-12 Washington University ML-based methods for pseudo-CT and HR MR image estimation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Optoelectronic image segmentation algorithm based on encoding-decoding and local enhancement; Li Chengshan; Jiang Ping; Cui Xiongwen; Ma Zhenhuan; Lei Tao; Semiconductor Optoelectronics; 2018-12-15 (06); full text *

Also Published As

Publication number Publication date
CN112037294A (en) 2020-12-04

Similar Documents

Publication Publication Date Title
Yadav et al. Identification of disease using deep learning and evaluation of bacteriosis in peach leaf
Gayathri Devi et al. Image processing based rice plant leaves diseases in Thanjavur, Tamilnadu
Yu et al. An integrated rice panicle phenotyping method based on X-ray and RGB scanning and deep learning
CN111539910B (en) Rust area detection method and terminal equipment
CN113326925A (en) Density graph regression-based flower amount statistical method, system, equipment and medium for convolutional neural network
Kundur et al. Deep convolutional neural network architecture for plant seedling classification
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
Kolhar et al. Phenomics for Komatsuna plant growth tracking using deep learning approach
Vignesh et al. EnC-SVMWEL: ensemble approach using cnn and svm weighted average ensemble learning for sugarcane leaf disease detection
CN116052141B (en) Crop growth period identification method, device, equipment and medium
CN116597318B (en) Irrigation area cultivated land precise extraction method, equipment and storage medium based on remote sensing image
CN112037294B (en) Method and system for counting interpretable plants
CN117456367A (en) Crop planting area identification method, device, equipment and storage medium
CN117079197A (en) Intelligent building site management method and system
CN116385446A (en) Crystal impurity detection method for boehmite production
Islam et al. QuanCro: a novel framework for quantification of corn crops’ consistency under natural field conditions
CN114519402A (en) Citrus disease and insect pest detection method based on neural network model
Jing et al. Sunflower-YOLO: Detection of sunflower capitula in UAV remote sensing images
CN115205853B (en) Image-based citrus fruit detection and identification method and system
CN118097313B (en) Hyperspectral image classification method based on frequency domain active learning
Islam Quantification of crops' consistency on corn fields using robust deep learning models
Madireddy et al. Detection of Healthy and Diseased Plant Leaf Based On Alexnet-Convolutional Neural Network Using Deep Learning
CN118314194A (en) Sample self-adaption-based crop seedling stage sowing area extraction method
CN118521886A (en) Rice growth condition detection method, rice growth condition detection equipment, rice growth condition detection medium and rice growth condition detection product
Jia et al. Multimodal rapid identification of growth stages and discrimination of growth status for Morchella

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant