CN116843971A - Method and system for detecting hemerocallis disease target based on self-attention mechanism - Google Patents

Method and system for detecting hemerocallis disease target based on self-attention mechanism

Info

Publication number
CN116843971A
CN116843971A (application CN202310808238.6A / CN202310808238A)
Authority
CN
China
Prior art keywords
feature
module
layer
self
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310808238.6A
Other languages
Chinese (zh)
Inventor
王栋
宋子申
朱勇建
许子鑫
梁晓静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Institute of Technology
Original Assignee
Shanghai Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Institute of Technology filed Critical Shanghai Institute of Technology
Priority to CN202310808238.6A priority Critical patent/CN116843971A/en
Publication of CN116843971A publication Critical patent/CN116843971A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/86 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using syntactic or structural representations of the image or video pattern, e.g. symbolic string recognition; using graph matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection


Abstract

The invention belongs to the field of computer vision and specifically relates to a method and system for detecting hemerocallis (daylily) disease targets based on a self-attention mechanism. The method comprises: collecting hemerocallis leaf images and inputting them as RGB color images; constructing an FTC feature extraction backbone network based on a fast self-attention mechanism; forming the first two layers of the neck network from two structurally identical groups of top-down pyramid fusion modules; forming the last two layers of the neck network from two structurally identical groups of bottom-up pyramid fusion modules built on adaptive deformable convolution modules; and using a prediction head to perform feature compression and aggregation on the feature maps of the second, third, and fourth layers of the neck network, generating target localization and classification prediction vectors as the final prediction result. The method can effectively distinguish diseases, leaves, and background in real scenes, reduces the number of parameters and the amount of computation, and uses adaptively learned deformable convolution kernels to effectively align the feature maps of the feature pyramid, so that targets are localized and classified more accurately.

Description

Method and system for detecting hemerocallis disease target based on self-attention mechanism
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method and a system for detecting a hemerocallis disease target based on a self-attention mechanism.
Background
Object detection aims to let a computer locate and classify the objects to be detected in an RGB image. In recent years, the self-attention mechanism was applied first in natural language processing and has since been adopted in various subdirections of computer vision, because of its excellent ability to correlate global spatial features. Meanwhile, methods based on convolutional neural networks, with their strong feature extraction capability, have shown excellent performance in salient object detection and are widely applied to medical image segmentation, object tracking, image editing, and the like. In detecting daylily leaf diseases, two important problems are false detections caused by complex backgrounds and the large number of parameters and amount of computation of existing algorithms; to address both, a daylily leaf disease target detection method based on a fast self-attention mechanism is proposed.
Existing target detection based on deep neural networks falls mainly into two categories: 1) single-stage target detection methods; 2) two-stage target detection methods. Methods of category 1) generate candidate regions without passing through an RPN, so their localization accuracy is weaker; methods of category 2) use an RPN network, but their parameter counts and computation are high.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the application and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section as well as in the description of the application and in the title of the application, which may not be used to limit the scope of the application.
The present application has been made in view of the above-described problems.
Therefore, the technical problem solved by the application is as follows: a method for detecting daylily leaf disease targets based on a fast self-attention mechanism, in which global spatial correlation information is used to treat disease targets, leaves, and background in a targeted manner, adaptively learned deformable convolution kernels are used to align the feature information to be fused, and the features output by the second, third, and fourth layers of the neck network are compressed to localize and classify targets.
In order to solve the technical problems, the application provides the following technical scheme: a method for detecting a hemerocallis disease target based on a self-attention mechanism, comprising the following steps: collecting hemerocallis leaf images and inputting RGB color images; constructing an FTC feature extraction backbone network based on a fast self-attention mechanism, the backbone being divided into five scale feature extraction layers; forming the first two layers of the neck network from two structurally identical groups of top-down pyramid fusion modules, which perform top-down feature fusion on the third and fourth scale-layer features of the backbone network respectively; forming the last two layers of the neck network from two structurally identical groups of bottom-up pyramid fusion modules based on adaptive deformable convolution modules, which respectively align and fuse, bottom-up, the intermediate output features of the first two fusion layers of the neck network; and performing feature compression and aggregation on the feature maps of the second, third, and fourth layers of the neck network with a prediction head, generating target localization and classification prediction vectors as the final prediction result.
The method for detecting the hemerocallis disease target based on the self-attention mechanism comprises the following steps: the feature extraction backbone network extracts global and local spatial feature information simultaneously and fuses them; a group of four convolution layers with 64, 128, 256, and 512 channels performs spatial downsampling and spatial feature fusion on the input color image; the output feature map is fed into an FTC block for global spatial feature correlation; a 512-channel convolution module performs a further spatial downsampling; and a fast spatial pyramid pooling layer expands the receptive field.
The method for detecting the hemerocallis disease target based on the self-attention mechanism comprises the following steps: the feature extraction backbone network further comprises an FTC block, which unfolds the length and width dimensions of the input three-dimensional feature map into a two-dimensional matrix whose first dimension keeps the channel number c and whose second dimension is the number of spatial pixels h*w, where h is the height of the feature map and w its width; the matrix is fed into 4 structurally identical FTC layers built from a fast self-attention mechanism, with one residual connection adding the input features to the output of the second layer and another adding the output of the second layer to the output of the fourth layer, yielding a feature map correlated with global spatial feature information.
The method for detecting the hemerocallis disease target based on the self-attention mechanism comprises the following steps: the feature extraction backbone network further comprises an FTC layer constructed with a fast self-attention mechanism; the input matrix is processed by a convolution branch, an efficient self-attention module branch, and a residual branch respectively; the outputs of the three branches are added element-wise and layer-normalized to obtain an output A; A is fed into a fully connected layer and batch-normalized to obtain an output B; and A is residually connected to B;
the convolution branch reshapes the matrix from shape (c, h*w) to shape (c, h, w) and convolves it with a convolution layer consisting of two 3x3 convolution modules, a batch normalization module, and a ReLU module;
an efficient self-attention module is constructed: the second dimension of the input matrix is divided evenly, in order, into 3 matrices m1, m2, and m3 of shape (c, h*w); m2 is right-multiplied by m3, and the result is left-multiplied by m1.
The method for detecting the hemerocallis disease target based on the self-attention mechanism comprises the following steps: the top-down feature fusion comprises feeding the fifth-layer result of the backbone network into a first top-down pyramid fusion module group and feeding its result into a second top-down pyramid fusion module group; each group consists of a convolution layer with b output channels, a 2x upsampling module, a channel concatenation module, and a C3 module with b channels; in the first group b is 512 and the concatenation module takes as input the third-layer output feature map of the backbone network and the output feature map of the previous module; in the second group b is 256 and the concatenation module takes as input the fourth-layer output feature map of the backbone network and the output feature map of the previous module.
The method for detecting the hemerocallis disease target based on the self-attention mechanism comprises the following steps: the alignment and bottom-up fusion comprises feeding the second-layer result of the neck network into a first bottom-up pyramid fusion module group and feeding its result into a second bottom-up pyramid fusion module group; each group consists of a local feature extraction module, a feature selection alignment module, and a C3 module;
the input matrix is fed in parallel into a 7x7 convolution kernel and a 5x5 convolution kernel, the results are added element-wise, and a 1x1 convolution then yields the local feature map;
the output feature map of the k-th of the first two neck-network layers is fed into an SE module for feature screening, and this feature map together with the input matrix is fed into the feature alignment fusion module to obtain an aligned, fused feature map, where k is 2 for the first bottom-up pyramid fusion module group and 1 for the second group.
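The SE-based feature screening mentioned above can be sketched as the standard squeeze-and-excitation block: global average pooling, a two-layer channel bottleneck, and sigmoid gating. This is an illustrative sketch, not the patent's exact module; the reduction ratio of 16 is an assumption.

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-excitation channel gating ("feature screening")."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                 # squeeze: global average pool
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                                            # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.gate(x)  # reweight each channel of the feature map
```

The gated feature map keeps the input's shape, so it can be passed on to the feature alignment fusion module unchanged.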
The method for detecting the hemerocallis disease target based on the self-attention mechanism comprises the following steps: the processing comprises convolving the outputs O_j, j ∈ [2, 3, 4], of the second, third, and fourth layers of the neck network with 1x1 convolution kernels respectively, expressed as:
F_j(c_out, h_out, w_out) = Σ_{c_in} W_j(c_out, c_in) · O_j(c_in, h_out, w_out)
where c_out is the index of the output channel, c_in the index of the input feature map channel, and h_out and w_out the height and width indices of the output feature map;
the feature map obtained from each of the three 1x1 convolutions is flattened into one dimension, and the result vector is
P_j ∈ R^(h · w · 3 · (n + 5))
where h and w are the numbers of super-pixels along the height and width of the feature map, n is the number of disease target classes, 3 and 5 fix the length of the position regression and classification vector of each prediction, and 3 means that each super-pixel position is matched with 3 anchor boxes of different aspect ratios;
final target position regression and classification vectors P_f of size m x (n + 5) are obtained through IOU threshold screening and Conf threshold screening, where m is the number of targets remaining after screening; mathematically:
P_f = {(p_1, p_2, p_3, x_i, y_i, a_i, b_i, Conf_i) | (p_1, p_2, p_3, x_i, y_i, a_i, b_i, Conf_i) ∈ P, IOU(x_i, y_i, a_i, b_i) ≥ IOU_T, Conf_i ≥ Conf_T}
where P is composed of the P_j, j ∈ [2, 3, 4]; IOU(x_i, y_i, a_i, b_i) is the IOU of the target position (x_i, y_i, a_i, b_i); IOU_T is the IOU threshold used to filter excessively overlapping prediction boxes; Conf_T is the Conf threshold used to filter prediction vectors with low confidence; x_i and y_i are the two-dimensional coordinates of the predicted target box of the i-th vector; a_i and b_i are the length and width of the predicted target box; Conf_i is the confidence that a target exists in the i-th predicted target box; and p_1, p_2, p_3 are the confidences of classes 1, 2, and 3;
the initially output P is filtered with the IOU and Conf thresholds to obtain the final target position prediction information P_f; finally, within each vector, the class with the highest of p_1, p_2, p_3 is taken as the classification result.
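The screening step described above can be sketched as confidence-threshold filtering followed by greedy IoU-based suppression of overlapping boxes. This is an illustrative sketch: the function and parameter names (screen_predictions, conf_t, iou_t) are not from the patent, and the box format (center coordinates plus width and height) is an assumption.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two boxes given as (x_center, y_center, w, h)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def screen_predictions(p, n_classes=3, conf_t=0.25, iou_t=0.5):
    """p: (N, n_classes + 5) rows of (p1..pn, x, y, a, b, Conf).
    Returns the m x (n + 5) matrix P_f of surviving predictions."""
    p = p[p[:, -1] >= conf_t]          # Conf threshold: drop low-confidence rows
    order = np.argsort(-p[:, -1])      # visit highest-confidence boxes first
    keep = []
    for i in order:                    # IOU threshold: suppress heavy overlaps
        box_i = p[i, n_classes:n_classes + 4]
        if all(iou(box_i, p[j, n_classes:n_classes + 4]) < iou_t for j in keep):
            keep.append(i)
    return p[keep]
```

Each surviving row can then be classified by taking the argmax over its first n_classes entries, matching the p_1, p_2, p_3 rule above.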
A further aim of the invention is to provide a hemerocallis disease target detection system based on a self-attention mechanism, which establishes global correlations through a fast self-attention mechanism, treats the various targets in a targeted manner, realizes feature fusion with deformable convolution kernels, and finally localizes and classifies disease targets through feature compression, solving the difficulty of detecting hemerocallis disease targets in real scenes.
In order to solve the technical problems, the invention provides the following technical scheme: a hemerocallis disease target detection system based on a self-attention mechanism, comprising an FTC module, top-down pyramid fusion module groups, bottom-up pyramid fusion module groups, a prediction head, and a screening output module;
the FTC module is used for extracting space global feature information of an input RGB color chart and comprises 4 FTC layers, each FTC layer is composed of a convolution branch, a high-efficiency self-attention module branch and a residual branch, and space association of features can be achieved;
The top-down pyramid fusion module groups are used for carrying out top-down fusion on feature graphs with different scales in a backbone network, each module group comprises a convolution layer, a double up-sampling module, a channel splicing module and a C3 module, and feature expression capacity and receptive field are enhanced;
the bottom-up pyramid fusion module groups are used for carrying out bottom-up fusion on feature graphs with different scales in the neck network, and each module group comprises a local feature extraction module, a feature selection alignment module and a C3 module, so that feature enhancement and alignment are realized;
the prediction head is used for carrying out feature compression and aggregation on feature graphs of different scales of the neck network to generate a prediction vector of target positioning classification, and comprises a plurality of convolution layers to realize regression prediction and probability prediction;
the screening output module is used for screening the position vector output by the prediction head to obtain the final target position regression prediction and classification vector.
A computer device comprising a memory and a processor, said memory storing a computer program, characterized in that said processor, when executing said computer program, implements the steps of a method for detecting a target of a daylily disease based on a self-attention mechanism.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of a method for detecting a disease target of hemerocallis based on a self-attention mechanism.
The beneficial effects of the invention are: a shallow backbone network with global spatial correlation is constructed with a fully convolutional network to realize an end-to-end single-stage detection model that can effectively distinguish diseases, leaves, and background in real scenes while reducing the number of parameters and the amount of computation; and the adaptively learned deformable convolution kernels effectively align the feature maps of the feature pyramid, so that targets are localized and classified more accurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flow chart of a network model framework of a method for detecting a daylily disease target based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 2 is a flow chart of FTC modules of a backbone network of a method for detecting a target of a daylily disease based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 3 is a top-down pyramid fusion module flow chart of a method for detecting daylily disease targets based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 4 is a flow chart of a bottom-up pyramid fusion module of a method for detecting daylily disease targets based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 5 is a flowchart of a prediction module of a method for detecting a target of a daylily disease based on a self-attention mechanism according to an embodiment of the present invention;
FIG. 6 is a block diagram of a system for detecting a target of a daylily disease based on a self-attention mechanism according to an embodiment of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Referring to fig. 1-5, for one embodiment of the present invention, a method for detecting a disease target of hemerocallis based on a self-attention mechanism is provided, including:
collecting a hemerocallis fulva leaf image, inputting an RGB color image, and constructing an FTC feature extraction backbone network based on a fast self-attention mechanism, wherein the backbone network can be divided into five scale feature extraction layers;
forming the first two layers of the neck network from two structurally identical groups of top-down pyramid fusion modules, which perform top-down feature fusion on the third and fourth scale-layer features of the backbone network respectively;
forming the last two layers of the neck network from two structurally identical groups of bottom-up pyramid fusion modules based on adaptive deformable convolution modules, which respectively align and fuse, bottom-up, the intermediate output features of the first two fusion layers of the neck network;
and performing feature compression and aggregation on the feature maps of the second, third, and fourth layers of the neck network with the prediction head, generating target localization and classification prediction vectors as the final prediction result.
S1: collecting a hemerocallis fulva leaf image, inputting an RGB color image, and constructing an FTC feature extraction backbone network based on a fast self-attention mechanism, wherein the backbone network can be divided into five scale feature extraction layers;
furthermore, the feature extraction backbone network extracts global and local spatial feature information simultaneously and fuses them: a group of four convolution layers with 64, 128, 256, and 512 channels performs spatial downsampling and spatial feature fusion on the input color image; the output feature map is fed into an FTC block for global spatial feature correlation; a 512-channel convolution module performs a further spatial downsampling; and a fast spatial pyramid pooling layer expands the receptive field.
It should be noted that the four convolution layers each perform 2x spatial downsampling in sequence; at each downsampling the channel count is expanded to accommodate deeper representation information, with the channel values set empirically.
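The four-stage downsampling stem described above can be sketched in PyTorch as follows. The channel widths 64/128/256/512 and 2x stride come from the description; kernel size, padding, and the BatchNorm+ReLU pairing are assumptions made to keep the sketch runnable.

```python
import torch
import torch.nn as nn

def conv_stage(c_in, c_out):
    """One stride-2 downsampling stage (kernel/padding are assumptions)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# Four stages, doubling channels while halving spatial resolution each time.
stem = nn.Sequential(
    conv_stage(3, 64),
    conv_stage(64, 128),
    conv_stage(128, 256),
    conv_stage(256, 512),
)

x = torch.randn(1, 3, 256, 256)  # input RGB image
y = stem(x)                      # each stage halves H and W: 256 -> 16
```

A 256x256 input thus reaches the FTC block as a 512-channel, 16x16 feature map.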
Further, the feature extraction backbone network further comprises an FTC block, which unfolds the length and width dimensions of the input three-dimensional feature map into a two-dimensional matrix whose first dimension keeps the channel number c and whose second dimension is the number of spatial pixels h*w, where h is the height of the feature map and w its width; the matrix is fed into 4 structurally identical FTC layers built from a fast self-attention mechanism, with one residual connection adding the input features to the output of the second layer and another adding the output of the second layer to the output of the fourth layer, yielding a feature map correlated with global spatial feature information.
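The FTC block's residual wiring can be sketched as below: flatten (c, h, w) to (c, h*w), run four FTC layers, add the input to the output of layer 2, and add that sum to the output of layer 4. `layer_factory` is a stand-in for the real FTC layer, which the patent builds from the fast self-attention mechanism.

```python
import torch
import torch.nn as nn

class FTCBlock(nn.Module):
    """Residual wiring of 4 identical FTC layers over a flattened feature map."""
    def __init__(self, layer_factory, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([layer_factory() for _ in range(num_layers)])

    def forward(self, x):                               # x: (batch, c, h, w)
        b, c, h, w = x.shape
        t = x.reshape(b, c, h * w)                      # unfold spatial dims
        inp = t
        t = self.layers[1](self.layers[0](t)) + inp     # residual into layer-2 output
        mid = t
        t = self.layers[3](self.layers[2](t)) + mid     # residual into layer-4 output
        return t.reshape(b, c, h, w)                    # restore (c, h, w)
```

With identity stand-in layers the block reduces to pure residual accumulation, which makes the wiring easy to verify.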
Furthermore, the feature extraction backbone network further comprises an FTC layer constructed with a fast self-attention mechanism: the input matrix is processed by a convolution branch, an efficient self-attention module branch, and a residual branch respectively; the outputs of the three branches are added element-wise and layer-normalized to obtain an output A; A is fed into a fully connected layer and batch-normalized to obtain an output B; and A is residually connected to B;
the convolution branch reshapes the matrix from shape (c, h*w) to shape (c, h, w) and convolves it with a convolution layer consisting of two 3x3 convolution modules, a batch normalization module, and a ReLU module;
It should be noted that the use of 3x3 convolution can better expand the receptive field of the network to the spatial feature map, enhancing the local spatial feature extraction.
An efficient self-attention module is constructed: the second dimension of the input matrix is divided evenly, in order, into 3 matrices m1, m2, and m3 of shape (c, h*w); m2 is right-multiplied by m3, and the result is left-multiplied by m1.
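One plausible reading of the efficient self-attention module is sketched below: a qkv-style projection produces the three (c, h*w) matrices, m2 is multiplied with m3 first, and the result then multiplies m1. The transpose on m3 is an assumption needed for the shapes to compose; under it the (c, c) intermediate makes the cost linear in the number of pixels, which would explain the "fast" in the mechanism's name.

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    """Hedged sketch: qkv-style split plus (m2 m3^T) m1 multiplication order."""
    def __init__(self, seq_len):
        super().__init__()
        # Project (c, h*w) to (c, 3*h*w), then split into m1, m2, m3.
        self.proj = nn.Linear(seq_len, 3 * seq_len, bias=False)

    def forward(self, x):                        # x: (batch, c, h*w)
        m1, m2, m3 = self.proj(x).chunk(3, dim=-1)
        ctx = m2 @ m3.transpose(-1, -2)          # (batch, c, c): channel-context matrix
        return ctx @ m1                          # (batch, c, h*w): cost O(c^2 * h*w)
```

Because the (c, c) context never materializes an (h*w, h*w) attention map, memory and compute grow linearly with spatial resolution instead of quadratically.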
S2: two groups of top-down pyramid fusion modules with the same structure are used for forming the front two layers of the neck network, and top-down feature fusion is respectively carried out on the third and fourth scale layer features of the backbone network;
still further, the top-down feature fusion includes,
the fifth-layer result of the backbone network is fed into a first top-down pyramid fusion module group, and its result into a second top-down pyramid fusion module group; each group consists of a convolution layer with b output channels, a 2x upsampling module, a channel concatenation module, and a C3 module with b channels; in the first group b is 512 and the concatenation module takes as input the third-layer output feature map of the backbone network and the output feature map of the previous module; in the second group b is 256 and the concatenation module takes as input the fourth-layer output feature map of the backbone network and the output feature map of the previous module.
It should be noted that the C3 module, proposed in the YOLOv5 network, fuses semantic information of different scales and helps improve the model's feature extraction for large, medium and small targets;
it should be noted that the values of b in the two groups are empirical: b is 512 in the first group, and the second group is set to half of the first because double spatial up-sampling with the nearest-neighbour method yields coarse spatial semantics, so channel-side information is compressed and fused to supplement the spatial semantic information.
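The double nearest-neighbour up-sampling mentioned above simply replicates each super-pixel into a 2x2 block — cheap, but spatially coarse, which is why the channel side is compressed to compensate. A sketch:

```python
def upsample_nearest_2x(fmap):
    """fmap: 2-D list (h x w) -> 2-D list (2h x 2w) by pixel replication."""
    out = []
    for row in fmap:
        doubled = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(doubled)
        out.append(list(doubled))                     # duplicate the whole row
    return out

up = upsample_nearest_2x([[1, 2],
                          [3, 4]])
# up == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```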
S3: constructing bottom-up pyramid fusion module groups from two self-adaptive deformation convolution modules with the same structure to form the last two layers of the neck network, which respectively align and fuse, bottom-up, the intermediate output features of the first two fusion layers of the neck network;
further, the aligning and bottom-up fusing comprises inputting the second-layer result of the neck network to the first bottom-up pyramid fusion module group and inputting its result to the second bottom-up pyramid fusion module group, each bottom-up pyramid fusion module group consisting of a local feature extraction module, a feature selection alignment module and a C3 module;
The input matrix is fed in parallel to a 7x7 convolution kernel and a 5x5 convolution kernel, the results are added element by element, and a 1x1 convolution is then applied to obtain the local feature map;
it should be noted that, in the usage scenario of the present invention, disease targets are detected on daylily leaves in real scenes with complex backgrounds, so boundary features between leaves, and between leaves and the complex background, must be distinguished. A 3x3 convolution kernel has too small a receptive field, and its default channel width cannot capture enough feature information. Following the Inception series, the input features are therefore extracted with convolution kernels of multiple sizes in parallel: the local feature extraction module processes the input with 7x7 and 5x5 convolutions in parallel, fuses the resulting feature information, and merges the channel feature maps with a 1x1 convolution, thereby reducing redundant feature maps.
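A sketch of the merge logic only (plain Python; the 7x7 and 5x5 branch outputs are pretend values, since the convolutions themselves are standard): the branch results are added element-wise, then a 1x1 convolution — a per-pixel weighted sum across channels — fuses the channel feature maps:

```python
def add_elementwise(a, b):
    """a, b: feature maps as channels x h x w nested lists."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def conv1x1(fmaps, weights):
    """1x1 convolution: per-pixel weighted sum over input channels.
    weights: out_channels x in_channels."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    return [[[sum(wk * fmaps[c][i][j] for c, wk in enumerate(wrow))
              for j in range(w)] for i in range(h)] for wrow in weights]

branch7 = [[[1, 2], [3, 4]]]      # pretend output of the 7x7 branch (1 channel)
branch5 = [[[10, 20], [30, 40]]]  # pretend output of the 5x5 branch
fused = conv1x1(add_elementwise(branch7, branch5), weights=[[0.5]])
# fused == [[[5.5, 11.0], [16.5, 22.0]]]
```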
The output feature map of the k-th of the first two layers of the neck network is input to the SE module for feature screening, and the screened feature map, together with the input matrix, is input to the feature alignment fusion module to obtain an aligned fused feature map, where k is 2 for the first bottom-up pyramid fusion module group and 1 for the second group.
It should be noted that, in the neck network, layer 2 of the top-down feature maps and layer 1 of the bottom-up feature maps have the same spatial scale and channel number, as do layer 1 of the top-down feature maps and layer 2 of the bottom-up feature maps. Following the feature pyramid structure, feature pairs of the same scale are fused, and fusing the first two layers of the neck network supplements the semantic information of the last two feature-map layers.
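A sketch of SE-style channel screening (squeeze-and-excitation): globally average-pool each channel, turn the pooled value into a gate, and rescale the channel. The learned two-layer excitation of a real SE module is replaced here by a bare sigmoid, so this only illustrates the data flow, not the patent's trained module:

```python
import math

def se_screen(fmaps):
    """fmaps: channels x h x w nested lists -> same shape, channel-gated."""
    gated = []
    for ch in fmaps:
        pooled = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))  # squeeze
        gate = 1.0 / (1.0 + math.exp(-pooled))                         # excite
        gated.append([[v * gate for v in row] for row in ch])          # rescale
    return gated

screened = se_screen([[[2.0, 2.0], [2.0, 2.0]],
                      [[0.0, 0.0], [0.0, 0.0]]])
```

Channels whose pooled response is weak receive a small gate and are attenuated, which is the "feature screening" role referred to above.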
S4: using the prediction head to perform feature compression and aggregation on the feature maps of the second, third and fourth layers of the neck network, and generating prediction vectors of target localization and classification as the final prediction result.
Still further, the processing comprises convolving the outputs O_j, j∈[2,3,4], of the second, third and fourth layers of the neck network with 1x1 convolution kernels respectively, expressed as:
wherein c_out denotes the index of the output channel, c_in the index of the input feature map channel, and h_out and w_out the indexes of the height and width of the output feature map respectively;
the feature maps obtained by the three 1x1 convolutions are unfolded into one dimension, and the result vector is output, expressed as:
wherein h and w are the numbers of super-pixels along the length and width of the feature map, n is the number of disease target classes, 3 and (n+5) give the number of anchor boxes and the length of the position-regression and classification vector per anchor, and 3 indicates that each super-pixel position is matched with 3 anchor boxes of different aspect ratios;
The final target-position regression and classification vectors P_f are obtained through IOU-threshold screening and Conf-threshold screening. P_f has size m x (n+5), where m represents the positions and associated information of the m targets remaining after screening. The mathematical formula is:
P_f = {(p_1, p_2, p_3, x_i, y_i, a_i, b_i, Conf_i) | (p_1, p_2, p_3, x_i, y_i, a_i, b_i, Conf_i) ∈ P, IOU(x_i, y_i, a_i, b_i) ≥ IOU_T, Conf_i ≥ Conf_T}
wherein P is composed of the outputs O_j, j∈[2,3,4]; IOU(x_i, y_i, a_i, b_i) denotes the IOU of the predicted position (x_i, y_i, a_i, b_i); IOU_T denotes the IOU threshold for filtering excessively overlapping prediction boxes; Conf_T denotes the Conf threshold for filtering low-confidence prediction vectors; x_i and y_i are the two-dimensional coordinates of the i-th vector's predicted target box; a_i and b_i are the length and width of the predicted target box; Conf_i is the confidence that a target exists in the i-th vector's predicted target box; and p_1, p_2, p_3 are the class confidences of classes 1, 2 and 3;
the initially output P is filtered through the IOU and Conf thresholds to obtain the final target-position prediction information P_f; finally, within each vector, the class with the highest of p_1, p_2, p_3 is taken as the classification result.
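A plain-Python sketch of this screening step. The patent does not state the exact pairing rule for the IOU test, so this sketch uses a greedy, NMS-style interpretation (keep the highest-confidence box, suppress later boxes that overlap a kept one beyond IOU_T) — an assumption, not the patent's definitive rule:

```python
def iou(box_a, box_b):
    """Boxes given as (x, y, a, b): top-left corner (x, y) and size a x b."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def screen(preds, iou_t=0.5, conf_t=0.25):
    """preds: list of (p1, p2, p3, x, y, a, b, Conf) vectors, as in P above.
    Returns surviving vectors paired with the 1-based class of max(p1, p2, p3)."""
    keep = []
    for vec in sorted(preds, key=lambda v: v[7], reverse=True):
        if vec[7] < conf_t:
            continue                       # Conf-threshold screening
        if all(iou(vec[3:7], k[3:7]) < iou_t for k in keep):
            keep.append(vec)               # IOU-threshold screening
    return [(vec, max(range(3), key=lambda i: vec[i]) + 1) for vec in keep]

kept = screen([
    (0.9, 0.05, 0.05, 0, 0, 10, 10, 0.9),  # strong class-1 detection, kept
    (0.1, 0.8, 0.1, 0, 0, 10, 10, 0.6),    # duplicate box -> suppressed by IOU
    (0.2, 0.2, 0.6, 50, 50, 10, 10, 0.7),  # distinct box, kept as class 3
    (0.3, 0.3, 0.4, 0, 0, 5, 5, 0.1),      # below Conf_T -> dropped
])
# kept has 2 entries, classified as classes 1 and 3
```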
Example 2
A second embodiment of the invention, which differs from the previous embodiment, is:
the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Example 3
Referring to fig. 6, in a third embodiment of the present invention, a system for detecting a disease target of hemerocallis based on a self-attention mechanism is provided, which comprises an FTC module, a top-down pyramid fusion module group, a bottom-up pyramid fusion module group, a prediction head, and a screening output module;
the FTC module is used for extracting space global feature information of an input RGB color chart and comprises 4 FTC layers, each FTC layer is composed of a convolution branch, a high-efficiency self-attention module branch and a residual branch, and space association of features can be achieved;
It should be noted that, through the cooperative work of the branches, the FTC module can realize the spatial association of the features and extract global context information, so as to enhance the expression capability of the features, and meanwhile, the network layer is shallow, so that the calculation amount can be reduced.
The top-down pyramid fusion module groups are used for carrying out top-down fusion on feature graphs with different scales in a backbone network, each module group comprises a convolution layer, a double up-sampling module, a channel splicing module and a C3 module, and feature expression capacity and receptive field are enhanced;
it should be noted that, through the top-down fusion mode, the module group can enhance the expression capability and receptive field of the features, effectively integrate the multi-scale information and improve the accuracy of small target disease target detection.
The bottom-up pyramid fusion module groups are used for carrying out bottom-up fusion on feature graphs with different scales in the neck network, and each module group comprises a local feature extraction module, a feature selection alignment module and a C3 module, so that feature enhancement and alignment are realized;
it should be noted that, by means of bottom-up fusion, the module group can enhance and align features, so that features of different scales can mutually affect and supplement each other, and detection accuracy is improved.
The prediction head is used for carrying out feature compression and aggregation on feature graphs of different scales of the neck network to generate a prediction vector of target positioning classification, and the prediction vector comprises a plurality of convolution layers;
it should be noted that, through the operations of these layers, the feature dimension reduction and abstraction are realized, and regression prediction and probability prediction results are generated, so as to achieve the purposes of predicting disease target positions and target classification.
The screening output module is used for screening the position vector output by the prediction head to obtain the final target position regression prediction and classification vector.
It should be noted that, the module filters and selects the prediction result according to a certain screening rule, and eliminates the false detection result to obtain more accurate target position and classification information.
Example 4
In order to verify the beneficial effects of the invention, scientific demonstration is carried out through economic benefit calculation and experiments.
In this example, the experimental environment configuration is shown in Table 1. The experimental program is written in Python 3.7, with PyTorch 1.7.1 as the deep learning framework. Training is run for 250 rounds with a data batch size of 16, an initial learning rate of 0.01, a weight decay coefficient of 0.0005 and a momentum coefficient of 0.9. The experimental data set is a self-collected and annotated daylily diseased-leaf data set in which leaf diseases are divided into three classes: rust, other diseases, and mid-to-late-stage disease. The data set contains 600 pictures, and the ratio of training set to validation set is 8:2.
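The hyper-parameters above, gathered into an illustrative configuration dict (the key names are invented for this sketch, not taken from the patent; "training batch ... 250" in the text is read as the number of training rounds):

```python
train_config = {
    "framework": "PyTorch 1.7.1",
    "python": "3.7",
    "rounds": 250,           # "training batch" in the text, read as training rounds
    "batch_size": 16,
    "initial_lr": 0.01,
    "weight_decay": 0.0005,
    "momentum": 0.9,
    "num_classes": 3,        # rust, other diseases, mid-to-late-stage disease
    "dataset_images": 600,
    "train_val_ratio": (8, 2),
}
```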
Table 1
Experimental results and analysis:
the accuracy of the detection algorithm is measured by using an AP (S), an AP (M), an AP (L) and an average precision average value mAP@0.5, wherein the AP is average precision, namely the average precision, and S, M, L respectively represents a small target, a medium target and a large target. The running speed of the algorithm was then evaluated using the parameters (parameters) and the evaluation algorithm model size, using the calculated amounts GFLOPS, which represents Giga Floating-point Operations Per Second, i.e. 10 billion Floating-point operands Per Second, and FPS, which represents Frame Per Second, i.e. the number of pictures detected Per Second by the method.
Table 2 compares the method of the present invention with other mainstream two-stage and single-stage target detection methods on the above evaluation indexes. As can be seen from Table 2, the Faster-RCNN (resnet 50) method has an AP (S) of 16.7%, an AP (M) of 27.4%, an AP (L) of 18.9%, an average precision mAP@0.5 of 36.3%, a model size of 41.1M, a calculated amount of 211.3 GFLOPS, and an operating speed of 46.5 FPS.
The Cascade-RCNN (resnet 50) method has an AP (S) of 19.8%, an AP (M) of 29.4%, an AP (L) of 25.5%, an average precision mAP@0.5 of 42.6%, a model size of 68.9M, a calculated amount of 239.1GFLOPS, and an operating speed of 37.4FPS.
The SSD512 method does not converge when trained on the daylily leaf dataset, because the many small targets combined with the limited input resolution leave small-object features indistinct.
The RetinaNet (resnet 50) method had an AP (S) of 18.7%, an AP (M) of 19.5%, an AP (L) of 13.8%, an average precision mAP@0.5 of 34.4%, a model size of 36.1M, a calculated amount of 210.5 GFLOPS, and an operating speed of 53.5 FPS.
The YOLOv3 (darknet53) method had an AP (S) of 17.3%, an AP (M) of 26.9%, an AP (L) of 22.9%, an average accuracy mAP@0.5 of 38.5%, a model size of 61.5M, a calculated amount of 70 GFLOPS, and an operating speed of 79.1 FPS.
The YOLOv7 method had an AP (S) of 21.4%, an AP (M) of 29.5%, an AP (L) of 28%, an average precision mAP@0.5 of 44.5%, a model size of 37.2M, a calculated amount of 104.8GFLOPS, and an operating speed of 142.9FPS.
The YOLOv5-L method had an AP (S) of 17.6%, an AP (M) of 25.2%, an AP (L) of 27.3%, an average precision mAP@0.5 of 44.6%, a model size of 46.1M, a calculated amount of 107.8GFLOPS, and an operating speed of 128.2FPS.
The method of the invention achieves an AP (S) of 24.2%, an AP (M) of 32.4%, an AP (L) of 29.3%, an average precision mAP@0.5 of 50%, a model size of 26.3M, a calculated amount of 67.7 GFLOPS, and an operating speed of 156.1 FPS.
Table 2 comparative test with other algorithms
Two-stage methods such as Faster-RCNN (resnet 50) and Cascade-RCNN (resnet 50) perform well on average precision mAP@0.5, reaching 36.3% and 42.6% respectively; however, their model size and calculated amount are large and their operating speed is slow. The SSD512 method, despite a relatively small parameter count and calculated amount and a faster operating speed, does not converge when trained on this complex-background data set.
The RetinaNet (resnet 50) method reaches 34.4% on average precision mAP@0.5, with a small model size and calculated amount and moderate operating speed. The YOLO-series algorithms YOLOv3 (darknet53), YOLOv7 and YOLOv5-L all achieve good average precision mAP@0.5 of 38.5%, 44.5% and 44.6% respectively; these algorithms have small model sizes and calculated amounts and also perform well in operating speed. The method of the present invention performs best among all the algorithms, with average precision mAP@0.5 reaching 50%; compared with the other algorithms it has a smaller parameter size, lower calculated amount and faster operating speed. It therefore has clear advantages in practical application and can provide efficient and accurate target detection capability.
In conclusion, the method of the present invention performs excellently on the daylily disease detection task in real scenes and, by optimizing model size, calculated amount and running speed, provides a high-precision, high-efficiency solution.
The method is robust on daylily leaf disease images with complex backgrounds, reduces the parameter count and calculated amount while maintaining high accuracy, places low demands on hardware, and can readily be deployed in practical application scenarios. However, detection accuracy on small targets still needs improvement; further improving the detection accuracy of small disease targets is planned for future work.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered by the scope of the claims of the present invention.

Claims (10)

1. A method for detecting a hemerocallis disease target based on a self-attention mechanism is characterized by comprising the following steps of: comprising the steps of (a) a step of,
Collecting a hemerocallis fulva leaf image, inputting an RGB color image, and constructing an FTC feature extraction backbone network based on a rapid self-attention mechanism, wherein the backbone network is divided into five-scale feature extraction layers;
two groups of top-down pyramid fusion modules with the same structure are used for forming the front two layers of the neck network, and top-down feature fusion is respectively carried out on the third and fourth scale layer features of the backbone network;
constructing a bottom-up pyramid fusion module group by using two self-adaptive deformation convolution modules with the same structure to form two back layers of the neck network, and respectively aligning and bottom-up fusion the middle output characteristics of the first two fusion layers of the neck network;
and carrying out feature compression and aggregation processing on the feature maps of the second, third and fourth layers of the neck network by using the prediction head, and generating a prediction vector of the target positioning classification as a final prediction result.
2. The method for detecting the hemerocallis disease target based on the self-attention mechanism as set forth in claim 1, wherein the method comprises the following steps of: the feature extraction backbone network comprises a feature extraction network that simultaneously extracts and fuses global and local spatial feature information; a group of four convolution layers with 64, 128, 256 and 512 channels performs spatial down-sampling and spatial feature fusion on the input color image; the output feature map is input to the FTC block for spatial global feature information association; a convolution module with 512 channels performs spatial down-sampling; and a rapid pyramid pooling layer expands the receptive field.
3. The method for detecting the hemerocallis disease target based on the self-attention mechanism as set forth in claim 2, wherein the method comprises the following steps of: the feature extraction backbone network further comprises an FTC block that unfolds the length and width dimensions of the input three-dimensional matrix feature map into a two-dimensional matrix, the first dimension keeping the channel number c and the second dimension being the number h x w of spatial pixels, h being the length and w the width of the feature map; the feature map is input to 4 structurally identical FTC layers composed of the rapid self-attention mechanism; the input features are residual-connected to the output of the second layer, and the output of the second layer is residual-connected to the output of the fourth layer, obtaining a feature map associated by global spatial feature information.
4. A method for detecting a disease target of hemerocallis based on a self-attention mechanism as set forth in claim 3, wherein: the feature extraction backbone network further comprises an FTC layer constructed with the rapid self-attention mechanism; the input matrix is processed by a convolution branch, a high-efficiency self-attention module branch and a residual branch respectively; the outputs of the three branches are added element by element and layer-normalized to obtain an output A; A is input into a fully-connected layer and batch-normalized to obtain an output B; and A is added to B via a residual connection;
The convolution branches transform a matrix with the shape (c, h x w) to a matrix with the shape (c, h, w), and convolve the matrix with a convolution layer, wherein the convolution layer comprises two convolution modules of 3x3, a batch normalization module and a ReLU module;
constructing a high-efficiency self-attention module: the second dimension of the input matrix is split evenly, in sequence order, into three matrices m1, m2 and m3 of shape (c, h x w); m2 is right-multiplied by m3, and the result is left-multiplied by m1.
5. The method for detecting the hemerocallis disease target based on the self-attention mechanism as set forth in claim 4, which is characterized in that: the top-down feature fusion comprises the steps of inputting a fifth layer result of a backbone network into a first top-down pyramid fusion module group, inputting the result into a second top-down pyramid fusion module group, wherein each top-down pyramid fusion module group comprises a convolution kernel with a channel number of b, a double up-sampling module, a channel splicing module and a C3 module with a channel number of b, the first group b is 512, the input of the splicing module is a third layer output feature map of the backbone network and an output feature map of a last module, the second group b is 256, and the input of the splicing module is a fourth layer output feature map of the backbone network and an output feature map of the last module.
6. The method for detecting the hemerocallis disease target based on the self-attention mechanism as set forth in claim 5, which is characterized in that: the alignment and bottom-up fusion comprises the steps of inputting a second layer result of the neck network into a first bottom-up pyramid fusion module group, inputting the result into a second bottom-up pyramid fusion module group, wherein each bottom-up pyramid fusion module group consists of a local feature extraction module, a feature selection alignment module and a C3 module;
the input matrix is simultaneously and respectively input into a 7x7 convolution kernel and a 5x5 convolution kernel, the results are added element by element, and then 1x1 convolution is carried out to obtain a local feature map;
and inputting the output feature image of the k-th layer of the two layers of the neck network into the SE module for feature screening, and inputting the feature image and the input matrix into the feature alignment fusion module to obtain an aligned fusion feature image, wherein k of a first bottom-up pyramid fusion module group is 2, and k of a second group is 1.
7. The method for detecting the hemerocallis disease target based on the self-attention mechanism as set forth in claim 6, wherein the method comprises the following steps of: the processing comprises convolving the outputs O_j, j∈[2,3,4], of the second, third and fourth layers of the neck network with 1x1 convolution kernels respectively, expressed as:
wherein c_out denotes the index of the output channel, c_in the index of the input feature map channel, and h_out and w_out the indexes of the height and width of the output feature map respectively;
the feature maps obtained by the three 1x1 convolutions are unfolded into one dimension, and the result vector is output, expressed as:
wherein h and w are the numbers of super-pixels along the length and width of the feature map, n is the number of disease target classes, 3 and (n+5) give the number of anchor boxes and the length of the position-regression and classification vector per anchor, and 3 indicates that each super-pixel position is matched with 3 anchor boxes of different aspect ratios;
the final target-position regression and classification vectors P_f are obtained through IOU-threshold screening and Conf-threshold screening; P_f has size m x (n+5), where m represents the positions and associated information of the m targets remaining after screening; the mathematical formula is:
P_f = {(p_1, p_2, p_3, x_i, y_i, a_i, b_i, Conf_i) | (p_1, p_2, p_3, x_i, y_i, a_i, b_i, Conf_i) ∈ P, IOU(x_i, y_i, a_i, b_i) ≥ IOU_T, Conf_i ≥ Conf_T}
wherein P is composed of the outputs O_j, j∈[2,3,4]; IOU(x_i, y_i, a_i, b_i) denotes the IOU of the predicted position (x_i, y_i, a_i, b_i); IOU_T denotes the IOU threshold for filtering excessively overlapping prediction boxes; Conf_T denotes the Conf threshold for filtering low-confidence prediction vectors; x_i and y_i are the two-dimensional coordinates of the i-th vector's predicted target box; a_i and b_i are the length and width of the predicted target box; Conf_i is the confidence that a target exists in the i-th vector's predicted target box; and p_1, p_2, p_3 are the class confidences of classes 1, 2 and 3;
the initially output P is filtered through the IOU and Conf thresholds to obtain the final target-position prediction information P_f; finally, within each vector, the class with the highest of p_1, p_2, p_3 is taken as the classification result.
8. A system employing the method for detecting a disease target of hemerocallis based on a self-attention mechanism as set forth in any one of claims 1 to 7, characterized in that: the system comprises an FTC module, a top-down pyramid fusion module group, a bottom-up pyramid fusion module group, a prediction head and a screening output module;
the FTC module is used for extracting space global feature information of an input RGB color chart and comprises 4 FTC layers, each FTC layer is composed of a convolution branch, a high-efficiency self-attention module branch and a residual branch, and space association of features can be achieved;
the top-down pyramid fusion module groups are used for carrying out top-down fusion on feature graphs with different scales in a backbone network, each module group comprises a convolution layer, a double up-sampling module, a channel splicing module and a C3 module, and feature expression capacity and receptive field are enhanced;
the bottom-up pyramid fusion module groups are used for carrying out bottom-up fusion on feature graphs with different scales in the neck network, and each module group comprises a local feature extraction module, a feature selection alignment module and a C3 module, so that feature enhancement and alignment are realized;
The prediction head is used for carrying out feature compression and aggregation on feature graphs of different scales of the neck network to generate a prediction vector of target positioning classification, and the prediction vector comprises a plurality of convolution layers;
the screening output module is used for screening the position vector output by the prediction head to obtain the final target position regression prediction and classification vector.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program implementing the steps of the method of any of claims 1 to 7 when executed by a processor.
CN202310808238.6A 2023-07-04 2023-07-04 Method and system for detecting hemerocallis disease target based on self-attention mechanism Pending CN116843971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310808238.6A CN116843971A (en) 2023-07-04 2023-07-04 Method and system for detecting hemerocallis disease target based on self-attention mechanism

Publications (1)

Publication Number Publication Date
CN116843971A true CN116843971A (en) 2023-10-03

Family

ID=88161202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310808238.6A Pending CN116843971A (en) 2023-07-04 2023-07-04 Method and system for detecting hemerocallis disease target based on self-attention mechanism

Country Status (1)

Country Link
CN (1) CN116843971A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118015525A (en) * 2024-04-07 2024-05-10 深圳市锐明像素科技有限公司 Method, device, terminal and storage medium for identifying road ponding in image



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination