CN111860623A - Method and system for counting tree number based on improved SSD neural network - Google Patents

Method and system for counting tree number based on improved SSD neural network

Info

Publication number
CN111860623A
Authority
CN
China
Prior art keywords
neural network
image
network model
trees
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010635369.5A
Other languages
Chinese (zh)
Inventor
韩巧玲
刘雷
赵燕东
赵玥
席本野
宋美慧
李晨曦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Forestry University
Original Assignee
Beijing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Forestry University filed Critical Beijing Forestry University
Priority to CN202010635369.5A priority Critical patent/CN111860623A/en
Publication of CN111860623A publication Critical patent/CN111860623A/en
Pending legal-status Critical Current

Classifications

    • G06F18/24 - Pattern recognition; Analysing; Classification techniques
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/253 - Fusion techniques of extracted features
    • G06N3/045 - Neural networks; Architecture; Combinations of networks
    • G06N3/08 - Neural networks; Learning methods
    • G06T7/0002 - Image analysis; Inspection of images, e.g. flaw detection
    • G06T2207/20081 - Training; Learning
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G06T2207/20221 - Image fusion; Image merging
    • G06T2207/30242 - Counting objects in image

Abstract

The embodiment of the invention discloses a method and a system for counting the number of trees based on an improved SSD neural network. The method comprises: acquiring an image, the image being an aerial environment image; and inputting the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model. The neural network model comprises a plurality of convolution layers that extract feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers that perform feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, new fused feature maps being obtained through the plurality of deconvolution layers. Because the neural network model is trained before the images are analyzed, the method generalizes to images of complex environments, identifies the number of trees accurately, and improves the precision of forestry resource statistics.

Description

Method and system for counting tree number based on improved SSD neural network
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a system for counting the number of trees based on an improved SSD neural network.
Background
Convolutional neural networks are widely used in image classification, pedestrian detection, and landmark identification, for example, to classify and identify buildings, woodland, water bodies and the like in images captured by unmanned aerial vehicles. However, the environment information in forest images acquired by an unmanned aerial vehicle is complex, and the current neural network identification process has poor precision; that is, trees are not identified accurately enough.
Disclosure of Invention
Based on the problems in the prior art, the embodiment of the invention provides a method and a system for counting the number of trees based on an improved SSD neural network.
In a first aspect, an embodiment of the present invention provides a method for counting the number of trees based on an improved SSD neural network, including:
acquiring an image, wherein the image is an aerial environment image;
inputting the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model,
wherein the neural network model comprises a plurality of convolution layers that extract feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers that perform feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, new fused feature maps being obtained through the plurality of deconvolution layers.
In some examples, the method further includes the step of training the neural network model, specifically:
acquiring a plurality of image training samples;
carrying out multi-scale feature map mapping on a plurality of image training samples;
and carrying out prior frame matching and data enhancement training on the multi-scale characteristic graph to obtain the trained neural network model for identifying the trees.
In some examples, the multi-scale feature map mapping of the plurality of image training samples includes:
extracting feature maps at multiple scales using multiple convolution layers, and fusing each feature map with the front and rear ends of the corresponding deconvolution layer to obtain a new fused feature map.
In some examples, the performing prior frame matching and data enhancement training on the multi-scale feature map to obtain the trained neural network model for identifying the tree includes:
setting prior frames with different scales or aspect ratios based on the feature maps of a plurality of scales;
and performing data enhancement on the multi-scale feature map by one or more of horizontal turning, cutting, zooming in and zooming out.
In some examples, training the neural network model includes:
for each real target in the plurality of image training samples, finding the prior frame that has the largest intersection-over-union (IOU) with that target;
matching that prior frame with the corresponding real target;
for each remaining unmatched prior frame, if its IOU with a real target is greater than a preset value, matching it with that real target;
and sampling negative samples: during sampling, the prior frames are sorted in descending order of confidence error, and the top-k with the largest errors are selected as training negative samples.
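The negative-sampling step above (hard negative mining) can be sketched as follows. This is an illustrative NumPy sketch, not the patent's code: the function name `hard_negative_mining` and the default `neg_pos_ratio=3` (reflecting the roughly 1:3 positive-to-negative ratio mentioned later in the description) are assumptions.

```python
import numpy as np

def hard_negative_mining(conf_loss, positive_mask, neg_pos_ratio=3):
    """Select negative samples with the largest confidence errors.

    conf_loss: per-prior confidence error (larger = harder negative).
    positive_mask: boolean array, True where the prior matched a real target.
    Returns a boolean mask of negatives kept for training, so that
    negatives : positives stays at most neg_pos_ratio : 1.
    """
    num_pos = int(positive_mask.sum())
    num_neg = min(neg_pos_ratio * num_pos, int((~positive_mask).sum()))
    # Consider only unmatched priors; sort their errors in descending order.
    neg_loss = np.where(positive_mask, -np.inf, conf_loss)
    top_k = np.argsort(-neg_loss)[:num_neg]  # indices of the k largest errors
    neg_mask = np.zeros_like(positive_mask)
    neg_mask[top_k] = True
    return neg_mask
```

Keeping only the hardest negatives prevents the huge number of background priors from dominating the loss.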
In some examples, when the trees in the image are identified through the neural network model, the larger the scale of a feature map among the multi-scale feature maps, the smaller the targets it is used to detect.
In some examples, in the neural network model, classification of a plurality of image training samples is performed by a set loss function.
In a second aspect, an embodiment of the present invention further provides a system for counting the number of trees based on an improved SSD neural network, including:
the device comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring images, and the images are aerial environment images;
and the classification module is used for inputting the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model, wherein the neural network model comprises a plurality of convolution layers that extract feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers that perform feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, new fused feature maps being obtained by the plurality of deconvolution layers.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the method for counting the number of trees based on the improved SSD neural network according to the first aspect is implemented.
In a fourth aspect, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for counting the number of trees based on the improved SSD neural network according to the first aspect.
According to the technical scheme, the method and the system for counting the number of trees based on the improved SSD neural network provided by the embodiment of the invention train the neural network model before image analysis, so that the method and the system have universality on images in a complex environment, can accurately identify the number of trees, and further improve the accuracy of forestry resource counting.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a method for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a neural network model in a method for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a system for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The following describes a method and system for counting the number of trees according to an embodiment of the present invention with reference to the drawings.
Fig. 1 shows a flowchart of a method for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention, and as shown in fig. 1, the method for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention specifically includes the following steps:
s101: and acquiring an image, wherein the image is an aerial environment image.
S102: inputting the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model. The neural network model comprises a plurality of convolution layers that extract feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers that perform feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, new fused feature maps being obtained through the plurality of deconvolution layers. That is, when the trees in the image are identified through the neural network model, the larger the scale of a feature map, the smaller the targets it is used to detect.
In a specific example, the neural network model is trained first, specifically: acquiring a plurality of image training samples; carrying out multi-scale feature map mapping on a plurality of image training samples; and carrying out prior frame matching and data enhancement training on the multi-scale characteristic graph to obtain the trained neural network model for identifying the trees.
The training process is as follows: for each real target in the plurality of image training samples, the prior frame with the largest IOU with that target is found and matched with the corresponding real target; for each remaining unmatched prior frame, if its IOU with a real target is greater than a preset value, it is matched with that real target; negative samples are then sampled, the prior frames are sorted in descending order of confidence error, and the top-k with the largest errors are selected as training negative samples.
The multi-scale feature map mapping is performed on a plurality of image training samples, and comprises the following steps: and extracting the feature maps of multiple scales by using multiple convolution layers capable of extracting the feature maps of multiple scales, and fusing the feature maps with the front ends and the rear ends of the corresponding deconvolution layers to obtain a new fused feature map.
Carrying out prior frame matching and data enhancement training on the multi-scale feature map to obtain the trained neural network model for identifying the trees, wherein the method comprises the following steps: setting prior frames with different scales or aspect ratios based on the feature maps of a plurality of scales; and performing data enhancement on the multi-scale feature map by one or more of horizontal turning, cutting, zooming in and zooming out.
In the neural network model, a plurality of image training samples are classified through a set loss function.
The process of identifying the number of trees from the aerial image of the environment is described in detail below.
A complex environment image, namely an aerial environment image, is acquired. In this example, images are read in batches; so that the real target in each tree image can be matched with prior frames, calibration software is used to calibrate the acquired images. The calibration information is converted into pixel values of the image and input into the network (i.e., the neural network model). This is the data acquisition process, namely acquiring an aerial environment image.
Multi-scale feature map mapping is then performed on the calibrated image. In this example, as shown in fig. 2, five different convolutional layers, conv7, conv8_2, conv9_2, conv10_2 and conv11_2, are used to extract feature maps, and new feature maps are obtained by fusion with the front and rear ends of the deconvolution layers; the plurality of feature maps are then trained with prior frame matching and data enhancement to obtain the neural network model.
As described above, feature maps of different sizes are obtained from the multi-scale mapping. Each unit is provided with prior frames of different scales or aspect ratios, and the predicted bounding boxes use these prior frames as references, which reduces the training difficulty to a certain extent. For data enhancement, various methods are used, including horizontal flipping, cropping, and zooming in and out. Data enhancement increases the number of training samples and at the same time constructs more targets of different shapes and sizes to input into the network, so that the network learns more robust features. Finally, classification is performed through a loss function.
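Of the augmentations listed above, horizontal flipping is the simplest to make concrete. The following is a minimal, dependency-free NumPy sketch; the function name `hflip` and the corner-format box layout are illustrative assumptions, and cropping and scaling would adjust the boxes following the same pattern.

```python
import numpy as np

def hflip(image, boxes):
    """Horizontally flip an H x W image together with its bounding boxes.

    boxes: (N, 4) array of [xmin, ymin, xmax, ymax] in pixel coordinates.
    Flipping mirrors the columns, so the new xmin comes from the old xmax.
    """
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = np.stack(
        [w - boxes[:, 2], boxes[:, 1], w - boxes[:, 0], boxes[:, 3]], axis=1
    )
    return flipped, new_boxes
```

During training, such a flip would be applied with some probability alongside random cropping and scaling, so that each epoch presents differently shaped targets to the network.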
The image to be recognized is then put into the neural network model for detection to obtain the recognition result.
Specifically, trees in the images are identified by using the neural network model obtained through training, so that statistics on forestry resources is realized.
According to the embodiment of the invention, the neural network model is trained before the image analysis, so that the method has universality on the images in the complex environment, the number of trees can be accurately identified, and the forestry resource statistics accuracy is further improved.
In one or more examples, multi-scale feature map mapping is performed on the image. Specifically: for each feature map, k default boxes are generated with different sizes and aspect ratios. Feature maps of different sizes are adopted: the front feature maps are larger, and the feature map size is gradually reduced by convolution or pooling with stride 2. Multiple scales are used for detection, with the larger feature maps detecting relatively small targets and the smaller feature maps responsible for detecting large targets, so as to achieve comprehensive detection of the image.
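The generation of k default boxes per feature-map cell can be sketched as follows. The function name, the aspect-ratio set and the scale values are illustrative assumptions, not the patent's exact settings; boxes are in center form, relative to the image.

```python
import numpy as np

def generate_priors(feature_sizes, scales, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate prior (default) boxes for several feature-map scales.

    feature_sizes: side lengths of the square feature maps, e.g. [38, 19, 10].
    scales: one box scale per feature map, relative to the image (0..1).
    Returns an (N, 4) array of center-form boxes [cx, cy, w, h];
    k = len(aspect_ratios) boxes are generated per cell.
    """
    priors = []
    for fsize, s in zip(feature_sizes, scales):
        for i in range(fsize):
            for j in range(fsize):
                cx, cy = (j + 0.5) / fsize, (i + 0.5) / fsize  # cell center
                for ar in aspect_ratios:
                    # Width grows and height shrinks with the aspect ratio,
                    # keeping the box area at roughly s * s.
                    priors.append([cx, cy, s * np.sqrt(ar), s / np.sqrt(ar)])
    return np.array(priors)
```

A coarse feature map (small `fsize`) yields few, large-stride boxes for big targets, while a fine one yields a dense grid for small targets, matching the multi-scale detection described above.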
As shown in fig. 2, it can be seen from the setup of the convolutional layers that a single convolutional layer cannot fully detect the targets in one picture, so the number of convolutional layers and the scales of the feature maps are increased, providing a basis for subsequent identification.
Namely: the SSD neural network of the embodiment of the invention is an improved SSD neural network, and the main improvement points are as follows:
A plurality of deconvolution layers are added at the rear end of the neural network model, and each deconvolution layer is multiplied element by element with the feature map of the same size at the front end. That is, the improved SSD neural network includes multiple deconvolution layers, and each deconvolution layer is multiplied element by element with the corresponding front-end feature map of the same size.
In this example, each convolutional layer is followed by a corresponding deconvolution layer, so that the plurality of convolutional layers correspond to the plurality of deconvolution layers, which speeds up network learning. The high-level feature map of a deconvolution layer is fused with the bottom-level feature map of the same size at the front end to obtain the next feature map to be deconvolved. The fusion adopts element-by-element multiplication. Finally, classification detection is performed on the resulting feature maps.
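The element-by-element fusion step can be sketched as follows. To keep the sketch dependency-free, the learned deconvolution is approximated by 2x nearest-neighbour upsampling, which is an assumption; the fusion itself is the element-wise product described above.

```python
import numpy as np

def fuse(deep_feat, shallow_feat):
    """Fuse a high-level (deconvolved) map with a same-size shallow map.

    deep_feat: H x W high-level feature map; shallow_feat: 2H x 2W
    bottom-level map from the front end. Upsampling stands in for the
    deconvolution layer in this sketch.
    """
    up = deep_feat.repeat(2, axis=0).repeat(2, axis=1)  # H x W -> 2H x 2W
    assert up.shape == shallow_feat.shape, "maps must match in size to fuse"
    return up * shallow_feat  # element-by-element product
```

The product keeps shallow detail only where the high-level map also responds, which is one way such a fusion can suppress background while sharpening small-target responses.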
With the above arrangement of the deconvolution layers, because the representation capability of shallow feature maps is not strong enough, feature maps obtained from the convolution layers alone cannot comprehensively detect small targets. The deconvolution layers make fuller use of context information and of shallow features, so the detection rate of small targets and dense targets can be greatly improved.
Training with prior frame matching and data enhancement on the plurality of feature maps to obtain the neural network model specifically includes performing prior frame matching on the plurality of feature maps as follows: a plurality of prior frames are generated on the feature maps of different scales; when mapped back to the original image, a densely packed set of prior frames is obtained. According to the IOU setting, the final detection result is obtained by prediction.
Prior frame matching is performed on the plurality of feature maps, specifically: for each real target in the picture, the prior frame with the largest IOU (intersection over union) with that target is found and matched with it; for each remaining unmatched prior frame, if its IOU with some real target is greater than 0.5, it is matched with that target. Negative samples are then sampled: they are sorted in descending order of confidence error (the smaller the predicted background confidence, the larger the error), and the top-K with the largest errors (the K samples with the largest errors) are selected as training negative samples, so that the ratio of positive to negative samples is close to 1:3.
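The two-pass matching just described can be sketched as follows, with boxes in corner form. This is a simplified illustration, not the patent's code; the function names and the 0.5 threshold default follow the description above.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of boxes [xmin, ymin, xmax, ymax]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def match_priors(targets, priors, threshold=0.5):
    """Pass 1: each real target claims the prior with the largest IOU.
    Pass 2: remaining priors with IOU > threshold to some target are also
    matched. Returns one target index per prior (-1 for unmatched)."""
    m = np.full(len(priors), -1)
    ious = np.array([[iou(t, p) for p in priors] for t in targets])
    for ti in range(len(targets)):            # pass 1: best prior per target
        m[int(np.argmax(ious[ti]))] = ti
    for pi in range(len(priors)):             # pass 2: threshold matching
        if m[pi] == -1 and ious[:, pi].max() > threshold:
            m[pi] = int(np.argmax(ious[:, pi]))
    return m
```

Pass 1 guarantees every real target gets at least one positive prior even when all IOUs are below the threshold; pass 2 adds further positives so training is not starved of them.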
Classification is performed through a loss function, specifically: the loss function is defined as a weighted sum of the position error and the confidence error. Smooth L1 loss is adopted for the position error, and softmax loss for the confidence error.
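The weighted-sum loss can be written out as follows in NumPy. The weighting factor `alpha` and the plain summation (rather than, say, normalization by the number of matched priors) are assumptions for illustration; the Smooth L1 and softmax cross-entropy terms follow the description above.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1, elementwise: quadratic near zero, linear beyond |x| = 1."""
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def ssd_loss(loc_err, logits, labels, alpha=1.0):
    """Weighted sum of position error and confidence error.

    loc_err: predicted-minus-target location offsets for matched priors.
    logits: (N, C) class scores; labels: (N,) ground-truth class ids.
    Softmax cross-entropy serves as the confidence error.
    """
    loc_loss = smooth_l1(loc_err).sum()
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    conf_loss = -log_probs[np.arange(len(labels)), labels].sum()
    return conf_loss + alpha * loc_loss
```

Smooth L1 keeps gradients bounded for badly localized boxes while remaining quadratic near zero, which is why it is preferred over plain L2 for the position term.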
The image is horizontally flipped, randomly cropped, and color-distorted, and data enhancement is also performed by randomly sampling block regions. Data enhancement through random cropping, color distortion, random block sampling and the like increases the number of training samples and at the same time constructs more targets of different shapes and sizes to input into the network, so that the network trains better.
When the trained neural network model is put into image recognition, multi-scale feature maps are used for detection: the larger feature maps detect relatively small targets, while the smaller feature maps are responsible for detecting large targets; prior frames are set, and the predicted bounding boxes use them as references; detection is performed by convolution, and detection results are extracted from the different feature maps.
As a specific example, images of a complex environment are acquired and the trees are calibrated; multi-scale feature mapping is obtained for the image and deconvolution layers are added. Data acquisition: images are read in batches; so that the real target in each tree image can be matched with prior frames, calibration software is used to calibrate the acquired images. The calibration information is converted into pixel values of the image and input into the network. This is the data acquisition process. Multi-scale feature mapping: five different feature maps are used, from conv7, conv8_2, conv9_2, conv10_2 and conv11_2. For each feature map, k default boxes are generated with different sizes and aspect ratios. Feature maps of different sizes are adopted: the front feature maps are larger, and the feature map size is gradually reduced by convolution or pooling with stride 2. Multiple scales are used for detection, with the larger feature maps detecting relatively small targets and the smaller feature maps responsible for detecting large targets, so as to achieve comprehensive detection of the image.
Adding deconvolution layers: a BN layer is added after each convolutional layer. The high-level feature map of a deconvolution layer is fused with the bottom-level feature map of the same size at the front end to obtain the next feature map to be deconvolved. The fusion adopts element-by-element multiplication. Finally, classification detection is performed on the resulting feature maps.
Training the plurality of feature maps with prior frame matching and data enhancement. Prior frame matching: feature maps of different sizes are obtained by multi-scale mapping. Each unit is provided with prior frames of different scales or aspect ratios, and the predicted bounding boxes use these prior frames as references, which reduces the training difficulty to a certain extent. A plurality of prior frames are generated on the feature maps of different scales; when mapped back to the original image, a densely packed set of prior frames is obtained. According to the IOU setting, the boxes are sent into an NMS (non-maximum suppression) module to obtain the final detection result. Loss function: the loss function is defined as a weighted sum of the position error and the confidence error; Smooth L1 loss is used for the position error and softmax loss for the confidence error. The targets are classified according to the loss function. Data enhancement: data enhancement through random cropping, color distortion, random block sampling and the like increases the number of training samples and at the same time constructs more targets of different shapes and sizes to input into the network, so that the network trains better. The image to be recognized is then put into the model for detection to obtain the recognition result.
According to the method for counting the number of trees based on the improved SSD neural network provided by the embodiment of the invention, deconvolution layers are added to the neural network model and the convolution layers are adjusted, so that context information is introduced through the deconvolution layers and information from the preceding and following content is fused, thereby improving the identification accuracy.
Fig. 3 is a schematic structural diagram of a system for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention, and as shown in fig. 3, the system for counting the number of trees based on an improved SSD neural network according to an embodiment of the present invention includes: an acquisition module 310 and a classification module 320.
The acquisition module 310 is configured to acquire an image, where the image is an aerial environment image. The classification module 320 is configured to input the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model, where the neural network model includes multiple convolution layers that extract feature maps at multiple scales in one-to-one correspondence, and multiple deconvolution layers that perform feature fusion on the feature maps of the multiple scales in one-to-one correspondence, new fused feature maps being obtained by the multiple deconvolution layers.
According to the system for counting the number of trees based on the improved SSD neural network, the neural network model is trained before image analysis, so that the system has universality on images in a complex environment, the number of trees can be accurately identified, and the accuracy of forestry resource counting is improved.
It should be noted that, a specific implementation manner of the system for counting the number of trees based on the improved SSD neural network in the embodiment of the present invention is similar to a specific implementation manner of the method for counting the number of trees based on the improved SSD neural network in the embodiment of the present invention, and please refer to the description of the method part specifically, and details are not repeated here in order to reduce redundancy.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device, which specifically includes the following components, with reference to fig. 4: a processor 401, a memory 402, a communication interface 403, and a communication bus 404;
the processor 401, the memory 402 and the communication interface 403 complete mutual communication through the communication bus 404; the communication interface 403 is used for implementing information transmission between the devices;
The processor 401 is configured to call a computer program in the memory 402, and when executing the computer program the processor implements all the steps of the above method for counting the number of trees based on the improved SSD neural network, for example: acquiring an image, wherein the image is an aerial environment image; and inputting the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model, wherein the neural network model comprises a plurality of convolution layers that extract feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers that perform feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, new fused feature maps being obtained through the plurality of deconvolution layers.
In addition, other structures and functions of the electronic device according to the embodiment of the present invention are known to those skilled in the art, and are not described herein.
Based on the same inventive concept, a further embodiment of the present invention provides a non-transitory computer-readable storage medium, having a computer program stored thereon, which when executed by a processor implements all the steps of the above method for counting the number of trees based on the improved SSD neural network, for example: acquiring an image, wherein the image is an aerial environment image; and inputting the image into a pre-trained neural network model for identifying trees, so that the trees in the image are identified by the neural network model, wherein the neural network model comprises a plurality of convolution layers that extract feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers that perform feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, new fused feature maps being obtained through the plurality of deconvolution layers.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on such understanding, the above technical solutions may be essentially or partially embodied in the form of software products, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or parts of the embodiments.
In addition, in the present invention, terms such as "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Moreover, in the present invention, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for counting the number of trees based on an improved SSD neural network is characterized by comprising the following steps:
acquiring an image, wherein the image is an aerial environment image;
inputting the image into a pre-trained neural network model for identifying trees so as to identify the trees in the image through the neural network model,
the neural network model comprises a plurality of convolution layers for extracting feature maps at a plurality of scales in one-to-one correspondence, and a plurality of deconvolution layers for performing feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, a fused new feature map being obtained through the plurality of deconvolution layers.
2. The method for counting the number of trees based on the improved SSD neural network as claimed in claim 1, further comprising a step of training the neural network model, specifically:
acquiring a plurality of image training samples;
carrying out multi-scale feature map mapping on a plurality of image training samples;
and carrying out prior frame matching and data enhancement training on the multi-scale characteristic graph to obtain the trained neural network model for identifying the trees.
3. The method for counting the number of trees based on the improved SSD neural network according to claim 2, wherein carrying out multi-scale feature map mapping on the plurality of image training samples comprises:
extracting the feature maps at a plurality of scales by using the plurality of convolution layers capable of extracting feature maps at a plurality of scales, and fusing the extracted feature maps at the front ends and rear ends of the corresponding deconvolution layers to obtain a fused new feature map.
4. The method for counting the number of trees based on the improved SSD neural network as claimed in claim 2, wherein the prior frame matching and data enhancement training of the multi-scale feature map to obtain the trained neural network model for identifying trees comprises:
setting prior frames with different scales or aspect ratios based on the feature maps at the plurality of scales;
and performing data enhancement on the multi-scale feature maps by one or more of horizontal flipping, cropping, zooming in and zooming out.
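For illustration only, prior frames at different scales and aspect ratios are commonly generated with the standard SSD scale formula; the `s_min`, `s_max` and aspect-ratio values below are assumptions for the sketch, not the patent's settings:

```python
import math

def prior_box_sizes(m, s_min=0.2, s_max=0.9, aspect_ratios=(1.0, 2.0, 0.5)):
    """For each of m feature-map scales, return (w, h) pairs relative to the
    image size, following the standard SSD scale formula
    s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)."""
    boxes = []
    for k in range(1, m + 1):
        s_k = s_min + (s_max - s_min) * (k - 1) / (m - 1)
        per_scale = []
        for ar in aspect_ratios:
            # width grows and height shrinks with the aspect ratio
            per_scale.append((s_k * math.sqrt(ar), s_k / math.sqrt(ar)))
        boxes.append(per_scale)
    return boxes

sizes = prior_box_sizes(m=6)
print(sizes[0][0])  # smallest scale, square prior: (0.2, 0.2)
```

Priors from the largest (finest) feature maps get the smallest scales, which is what lets them cover small, densely packed tree crowns.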
5. The method for counting the number of trees based on the improved SSD neural network according to claim 2, wherein training the neural network model comprises:
for each real target in the plurality of image training samples, finding the prior frame having the largest intersection-over-union (IOU) with the real target;
matching that prior frame with the corresponding real target;
for the remaining unmatched prior frames, if the IOU with a real target is greater than a preset value, matching the prior frame with that real target;
and sampling negative samples by sorting them in descending order of confidence error and selecting the top-k prior frames with the largest errors as training negative samples.
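As a simplified sketch of the matching strategy in claim 5 (the toy boxes, threshold and error values are hypothetical; real SSD training vectorizes this over thousands of priors):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def match_priors(priors, targets, threshold=0.5):
    """Step 1: each real target grabs the prior with the largest IOU.
    Step 2: remaining priors whose IOU with some target exceeds the
    threshold are also matched. Returns {prior index: target index}."""
    matches = {}
    for t, tgt in enumerate(targets):
        best = max(range(len(priors)), key=lambda p: iou(priors[p], tgt))
        matches[best] = t
    for p, prior in enumerate(priors):
        if p in matches:
            continue
        for t, tgt in enumerate(targets):
            if iou(prior, tgt) > threshold:
                matches[p] = t
                break
    return matches

def hard_negatives(conf_errors, unmatched, k):
    """Sort unmatched priors by confidence error (descending) and keep
    the top-k as training negatives."""
    return sorted(unmatched, key=lambda p: conf_errors[p], reverse=True)[:k]

priors = [(0.0, 0.0, 1.0, 1.0), (0.0, 0.0, 0.5, 0.5), (2.0, 2.0, 3.0, 3.0)]
targets = [(0.0, 0.0, 1.0, 1.0)]
matches = match_priors(priors, targets)
print(matches)  # {0: 0}
negs = hard_negatives({1: 0.9, 2: 0.1}, unmatched=[1, 2], k=1)
print(negs)     # [1]
```

Keeping only the hardest negatives (largest confidence errors) maintains a bounded negative-to-positive ratio, which is the usual motivation for this sampling step.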
6. The method for counting the number of trees based on the improved SSD neural network as claimed in claim 2, wherein, when the trees in the image are identified by the neural network model, the larger the scale of a feature map among the multi-scale feature maps is, the smaller the targets detected on it are.
7. The method for counting the number of trees based on the improved SSD neural network according to claim 2, wherein the plurality of image training samples are classified through a loss function set in the neural network model.
8. A system for counting the number of trees based on an improved SSD neural network is characterized by comprising:
the device comprises an acquisition module, a display module and a control module, wherein the acquisition module is used for acquiring images, and the images are aerial environment images;
and the classification module is used for inputting the image into a neural network model pre-trained to identify trees, so that the trees in the image are identified by the neural network model, wherein the neural network model comprises a plurality of convolution layers for extracting feature maps at a plurality of scales in one-to-one correspondence and a plurality of deconvolution layers for performing feature fusion on the feature maps of the plurality of scales in one-to-one correspondence, a fused new feature map being obtained through the plurality of deconvolution layers.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for counting the number of trees based on the improved SSD neural network according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for counting the number of trees based on the improved SSD neural network according to any one of claims 1 to 7.
CN202010635369.5A 2020-07-03 2020-07-03 Method and system for counting tree number based on improved SSD neural network Pending CN111860623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010635369.5A CN111860623A (en) 2020-07-03 2020-07-03 Method and system for counting tree number based on improved SSD neural network


Publications (1)

Publication Number Publication Date
CN111860623A true CN111860623A (en) 2020-10-30

Family

ID=73152460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010635369.5A Pending CN111860623A (en) 2020-07-03 2020-07-03 Method and system for counting tree number based on improved SSD neural network

Country Status (1)

Country Link
CN (1) CN111860623A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898078A (en) * 2018-06-15 2018-11-27 上海理工大学 A kind of traffic sign real-time detection recognition methods of multiple dimensioned deconvolution neural network
CN109685145A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of small articles detection method based on deep learning and image procossing
CN110472503A (en) * 2019-07-11 2019-11-19 桂林电子科技大学 A kind of road target detection method, device and storage medium


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHENG-YANG FU, et al.: "DSSD: Deconvolutional Single Shot Detector", arXiv, page 3 *
WEI LIU, et al.: "SSD: Single Shot MultiBox Detector", arXiv *
WANG Song; FEI Shumin: "Research on and Improvement of the SSD (Single Shot MultiBox Detector) Object Detection Algorithm", Industrial Control Computer, no. 04 *
ZHAO Yanan; WU Liming; CHEN Qi: "Small Object Detection Algorithm Based on Multi-scale Fusion SSD", Computer Engineering, vol. 46, no. 1 *
HAO Yelin; LUO Bing; YANG Rui; CHANG Jinjin: "Improvement of a Person Detection Algorithm for Complex Scene Images", Journal of Wuyi University (Natural Science Edition), no. 01 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766260A (en) * 2021-01-15 2021-05-07 哈尔滨市科佳通用机电股份有限公司 Image identification method and system for positioning air reservoir for accelerating and relieving railway train
CN112766260B (en) * 2021-01-15 2021-09-14 哈尔滨市科佳通用机电股份有限公司 Image identification method and system for positioning air reservoir for accelerating and relieving railway train
CN113361322A (en) * 2021-04-23 2021-09-07 山东大学 Power line target detection method, device and storage medium based on weighted deconvolution layer number improved DSSD algorithm
CN113361322B (en) * 2021-04-23 2022-09-27 山东大学 Power line target detection method and device based on weighted deconvolution layer number improved DSSD algorithm and storage medium

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN109376667B (en) Target detection method and device and electronic equipment
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN111222395A (en) Target detection method and device and electronic equipment
CN109871829B (en) Detection model training method and device based on deep learning
CN111985458B (en) Method for detecting multiple targets, electronic equipment and storage medium
CN113807350A (en) Target detection method, device, equipment and storage medium
CN110688883A (en) Vehicle and pedestrian detection method and device
CN110956615A (en) Image quality evaluation model training method and device, electronic equipment and storage medium
CN112001403A (en) Image contour detection method and system
CN115861400B (en) Target object detection method, training device and electronic equipment
CN111860623A (en) Method and system for counting tree number based on improved SSD neural network
CN111179270A (en) Image co-segmentation method and device based on attention mechanism
CN113157956B (en) Picture searching method, system, mobile terminal and storage medium
CN111160198A (en) Object identification method and system based on width learning
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN113537192B (en) Image detection method, device, electronic equipment and storage medium
CN112819953B (en) Three-dimensional reconstruction method, network model training method, device and electronic equipment
CN113298122A (en) Target detection method and device and electronic equipment
CN110942179A (en) Automatic driving route planning method and device and vehicle
CN117475291B (en) Picture information identification method, apparatus, electronic device and computer readable medium
CN115953485B (en) Camera calibration method and device
CN113505653B (en) Object detection method, device, apparatus, medium and program product
CN117541883B (en) Image generation model training, image generation method, system and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination