CN114677553A - Image recognition method for solving unbalanced problem of crop disease and insect pest samples - Google Patents
- Publication number
- CN114677553A (application CN202111676323.9A)
- Authority
- CN
- China
- Prior art keywords
- tail
- sample
- head
- data set
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/2155—Generating training patterns; Bootstrap methods characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
- G06F18/24—Classification techniques
- G06N3/08—Neural networks; Learning methods
- G06N5/04—Inference or reasoning models
Abstract
The invention relates to the field of pest and disease identification, and in particular to an image identification method for solving the problem of imbalance in crop pest and disease samples. The method trains a model on the current labeled data set and selects the current optimal model through model verification. Each picture in the unlabeled data set is image-enhanced several times to obtain enhanced images, which are run through inference and screening to obtain a recognition result for the unlabeled image. The recognition result is fed into a sample selection strategy, which judges whether the result is retained; if retained, a pseudo label is generated and moved into the current labeled data set, training continues on the new labeled data set, and this process is iterated until accuracy no longer improves. The method reduces the influence of the long-tail distribution and improves the recall and accuracy of tail categories through iterative learning without affecting the recognition of head categories; only a single model is used for inference, no additional network layer is introduced, and inference speed is unaffected.
Description
Technical Field
The invention relates to the field of pest and disease identification, and in particular to an image identification method for solving the problem of unbalanced crop pest and disease samples.
Background
Crop diseases and insect pests are among the main agricultural disasters in the world; if they are not discovered and prevented in time, they cause great losses to agricultural production and threaten national food security and the quality and safety of agricultural products. Crop diseases and pests are characterized by many varieties, wide impact, and frequent outbreaks, which pose great challenges to their monitoring.
With the rapid development of computer vision and artificial intelligence, image-based pest and disease identification technology, with its low cost and high efficiency, has been applied to pest and disease monitoring for various crops. Current image-based identification methods generally use deep learning for model training and inference, and deep learning relies on massive data to reach its maximum recognition performance. Crop pest and disease image data, however, are unbalanced: common categories have very large data volumes while uncommon categories have little data, so the data follow a long-tail distribution in which the head is very large, the middle gradually shrinks, and the tail has very few or even no samples. Because there are many crop pest and disease categories, the tail is stretched very long.
The unbalanced-sample problem greatly affects the performance of crop pest and disease models: a model easily overfits head categories with more data and underfits tail categories with less data. Several general methods exist for addressing sample imbalance. Resampling algorithms undersample the head classes and oversample the tail classes to balance the training samples, but this can cause the model to underfit the head classes and overfit the tail classes. Re-weighting algorithms give low loss weights to head categories and high weights to tail categories, but the resulting improvement is limited. Multi-stage training methods for long-tail crop disease image recognition adjust the sample distribution through staged enhancement training on labeled data, but they do not make full use of massive unlabeled data, and the richness of tail-category data remains insufficient.
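The re-weighting idea described above can be illustrated with inverse-frequency class weights. This is a generic scheme, not the patent's method; the function name and normalization are illustrative.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class loss weights: rare (tail) classes get high weight, common
    (head) classes low weight, as in the re-weighting approach described.
    Weights are normalized so they average to 1 across classes."""
    counts = Counter(labels)
    raw = {c: 1.0 / n for c, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}

# 90 head-class labels vs. 10 tail-class labels
weights = inverse_frequency_weights(["head"] * 90 + ["tail"] * 10)
```

With a 9:1 imbalance the tail class ends up weighted 9 times more heavily than the head class, which is exactly why, as the text notes, the head class risks being under-emphasized during training.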
Disclosure of Invention
Aiming at the defects in the background technology, the invention provides an image recognition method for solving the problem of imbalance of crop pest samples, and the specific technical scheme is as follows:
an image recognition method for solving the problem of imbalance of crop pest samples comprises the following steps:
step S1, creating a labeled data set: collecting crop pest and disease picture data, and marking the positions of the pests and diseases by using a rectangular frame to form a marked data set; dividing the labeled data set into a training set, a verification set and a test set according to a certain proportion;
Step S2, model training: constructing a target detection model, training the training set in the data set of the step S1 by adopting the constructed target detection model, and outputting an intermediate target detection model after each training;
step S3, model verification: inputting the verification set images in the step S1 into the intermediate model trained in the step S2 for model verification, and selecting the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
step S4, creating a label-free data set: collecting mass crop disease and insect pest picture data as a label-free data set;
step S5, image enhancement: performing data enhancement on each original picture without the labeled data set in the step S4 to obtain enhanced N pictures, and merging the enhanced N pictures with the corresponding original pictures to obtain N +1 combined pictures as a group of data to be processed;
step S6, reasoning without a label data model: inputting each group of data to be processed in the step S5 into the current optimal target detection model in the step S3 respectively for reasoning to obtain N +1 recognition results, performing post-processing on each recognition result respectively, overlapping the post-processed recognition results, screening the overlapped results through a non-maximum suppression algorithm, and finally obtaining the recognition result without labeled data;
Step S7, sample selection: judging the identification result of the non-labeled data in the step S6 according to a sample selection strategy, determining whether to retain the identification result, and if so, selecting the original picture corresponding to the identification result from the non-labeled data set in the step S4 as a new sample;
step S8, new data generation: generating a pseudo label of the non-artificial annotation for the new sample in the step S7 in a rectangular frame annotation manner with an annotated data set in the step S1, taking the pseudo label and the original picture corresponding to the unmarked data set in the step S4 as new data, putting all the new data into the training set, the verification set and the test set in the annotated data set in the step S1 according to a certain proportion, and removing the original picture corresponding to the unmarked data set in the step S4;
step S9, after the newly generated data of step S8 is added into the labeled data set in step S1, the iterative learning is continued according to the flow of steps S1-S8, if the accuracy of the optimal target detection model is not improved any more in step S3, the iterative learning is ended, and the final target detection model is obtained;
step S10, labeled data model reasoning: and (4) inputting the test set with the labeled data set in the step (S1) into the final target detection model obtained in the step (S9) for model reasoning to obtain an identification result of the test set after iterative learning optimization.
Preferably, in step S1, the labeled data set is divided into a training set, a validation set, and a test set at a ratio of 0.8 : 0.1 : 0.1.
Preferably, the target detection model in step S2 is a YOLOv5l6 network structure model using a YOLOv5 target detection algorithm.
Preferably, the data enhancement in step S5 includes 4 ways: random horizontal flipping, random vertical flipping, random rotation, and random brightness increase, so that N = 4.
Preferably, the sample selection strategy in step S7 includes the following steps:
step S71, head-tail division: performing sample quantity statistics on the training set of the labeled data set of step S1, wherein the labeled data set contains C pest and disease categories in total; for each category c ∈ {1, 2, …, C}, the number of labels N_c is computed, and the total number of labels N_total and average number of labels N_m are then:
N_total = N_1 + N_2 + … + N_C, N_m = N_total / C;
a category whose number of labels is greater than N_m is classified as a head category, and a category whose number of labels is less than or equal to N_m is classified as a tail category; counting the total number of head-category labels N_h and the total number of tail-category labels N_t gives:
N_h + N_t = N_total;
step S72, head-tail determination: classifying the category of each rectangular box in the recognition result of the unlabeled data of step S6 as head or tail to obtain the respective head and tail counts; if the head count is greater than the tail count, the sample is a head sample, otherwise it is a tail sample;
step S73, new sample candidates: for a sample judged to be a head sample, the confidence mean of the head categories in its recognition result is calculated; if this mean is greater than the head confidence threshold T_h, the sample is added to the head new-sample candidate queue Q_h. For a sample judged to be a tail sample, the confidence mean of the tail categories is calculated; if this mean is greater than the tail confidence threshold T_t, the sample is added to the tail new-sample candidate queue Q_t;
step S74, new sample selection: the head new-sample candidate queue Q_h is sorted in descending order of confidence to obtain the sorted queue Q_h', and the top proportion P_h of samples is selected from Q_h' as head new samples; the tail new-sample candidate queue Q_t is sorted in descending order of confidence to obtain the sorted queue Q_t', and the top proportion P_t of samples is selected from Q_t' as tail new samples; the head new samples and the tail new samples are combined into the current new samples.
Preferably, the head confidence threshold T_h has a value range of 0.9 ≤ T_h < 1.
Preferably, the tail confidence threshold T_t has a value range of 0.9 ≤ T_t < 1.
The beneficial effects of the invention are as follows. The invention provides an image recognition method for solving the problem of imbalance in crop pest and disease samples: model training is performed with the current labeled data set, and the current optimal model is selected through model verification; pictures in the unlabeled data set are image-enhanced several times, the enhanced images are run through inference, and the overlaid results are screened by a non-maximum suppression algorithm to obtain recognition results for the unlabeled images. Each recognition result is fed into a sample selection strategy, which judges whether it is retained; if retained, a pseudo label is generated and moved into the current labeled data set, training continues on the new labeled data set, and this flow is iterated until accuracy no longer improves. The invention makes full use of massive unlabeled crop pest and disease data for semi-supervised learning, designs a sample selection strategy targeted at the sample imbalance problem, continuously adjusts the data distribution, and reduces the influence of the long-tail distribution. Iterative learning improves the recall and accuracy of tail categories without affecting the recognition of head categories; only a single model is used for inference, no additional network layer is introduced, and inference speed is unaffected.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of protection of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, the specific embodiment of the present invention provides an image recognition method for solving the unbalanced problem of crop pest samples, comprising the following steps:
step S1, creating a labeled data set: collecting crop pest and disease picture data, and marking the positions of the pests and diseases with rectangular boxes to form a labeled data set; the labeled data set is divided into a training set, a verification set, and a test set at a ratio of 0.8 : 0.1 : 0.1;
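The split in step S1 can be sketched by applying the fixed 0.8 : 0.1 : 0.1 proportions to a shuffled sample list. The function name and seed below are illustrative, not from the patent.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle and split labeled samples into train/val/test subsets,
    following the 0.8 : 0.1 : 0.1 proportion of step S1 (sketch only)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder becomes the test set
    return train, val, test

train, val, test = split_dataset(range(100))
```

The remainder-based test split keeps every sample assigned to exactly one subset even when the total count does not divide evenly.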
step S2, model training: constructing a target detection model, training the training set in the data set of the step S1 by adopting the constructed target detection model, and outputting an intermediate target detection model after each training; the target detection model is a YOLOv5l6 network structure model adopting a YOLOv5 target detection algorithm.
Step S3, model verification: inputting the verification set images in the step S1 into the intermediate model trained in the step S2 for model verification, and selecting the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
step S4, creating a label-free data set: collecting mass crop disease and insect pest picture data as a label-free data set;
step S5, image enhancement: performing data enhancement on each original picture in the unlabeled data set of step S4 to obtain N enhanced pictures, and merging the N enhanced pictures with the corresponding original picture to obtain N + 1 combined pictures as a group of data to be processed; data enhancement includes 4 ways: random horizontal flipping, random vertical flipping, random rotation, and random brightness increase, so that N = 4.
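A minimal pure-Python sketch of the four enhancement ways in step S5, with a grayscale image as a nested list; the flips are deterministic, a fixed 90° rotation stands in for random rotation, and the brightness delta is fixed. A real pipeline would use an image library such as OpenCV or PIL.

```python
def hflip(img):
    """Horizontal flip (the 'random' choice is omitted for determinism)."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip."""
    return img[::-1]

def rot90(img):
    """90-degree rotation, standing in for random rotation."""
    return [list(row) for row in zip(*img[::-1])]

def brighten(img, delta=30):
    """Brightness increase, clipped to the 8-bit range [0, 255]."""
    return [[min(255, p + delta) for p in row] for row in img]

def augment_group(img):
    """Step S5: N = 4 enhanced copies merged with the original -> N + 1 images."""
    enhanced = [hflip(img), vflip(img), rot90(img), brighten(img)]
    return [img] + enhanced

group = augment_group([[0, 50], [100, 200]])
```

The returned group of N + 1 = 5 images is exactly what step S6 feeds to the detector as one unit of data to be processed.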
Step S6, unlabeled data model inference: inputting each group of data to be processed from step S5 into the current optimal target detection model of step S3 for inference, obtaining N + 1 recognition results; post-processing each recognition result, which includes restoring the result of the randomly horizontally flipped picture according to the horizontal flip parameter, restoring the result of the randomly vertically flipped picture according to the vertical flip parameter, and restoring the result of the randomly rotated picture according to the rotation parameter; overlaying the post-processed recognition results, and screening the overlaid results with a non-maximum suppression algorithm to obtain the final recognition result for the unlabeled data;
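The post-processing and screening of step S6 can be sketched as follows, assuming detections are (x1, y1, x2, y2, score) tuples. `unflip_h` illustrates restoring boxes from a horizontally flipped copy back to original coordinates, and `nms` is plain greedy non-maximum suppression over the overlaid result sets; both helpers are illustrative, not the patent's code.

```python
def unflip_h(box, img_w):
    """Map a box detected on a horizontally flipped image back to the
    original image's coordinate frame (one of the S6 restorations)."""
    x1, y1, x2, y2, s = box
    return (img_w - x2, y1, img_w - x1, y2, s)

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, thr=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box and
    drop any later box that overlaps a kept one by more than `thr`."""
    kept = []
    for b in sorted(boxes, key=lambda r: r[4], reverse=True):
        if all(iou(b, k) <= thr for k in kept):
            kept.append(b)
    return kept

# Overlaid detections from the N + 1 inference passes (toy values):
merged = [(10, 10, 50, 50, 0.9), (12, 11, 51, 49, 0.8), (200, 200, 240, 240, 0.7)]
final = nms(merged)
```

Here the two near-duplicate detections of the same lesion collapse into the single higher-confidence box, while the distant detection survives.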
Step S7, sample selection: and judging the identification result of the non-labeled data in the step S6 according to a sample selection strategy, determining whether to retain the identification result, and if so, selecting the original picture corresponding to the identification result from the non-labeled data set in the step S4 as a new sample. The sample selection strategy comprises the following steps:
step S71, head and tail division: carrying out sample quantity statistics on the training set with the labeled data set in the step S1, wherein the labeled data set has C pest categories in total, and calculating the labeled quantity N of each pest category CcC is equal to {1,2, …, C }, and the total number of labels is NtotalAverage number of labels NmAnd then:
the number of labels is larger than NmIs divided into a head category, otherwise the number of labels is less than or equal to NmClassifying into a tail category; counting the total number N of the head category labelshTotal number of tail class labels NtAnd then:
Nh+Nt=Ntotal。
Assume the training set of the labeled data set has 100 pest and disease categories, so C = 100. The 1st category is ulcer disease with 20000 labels, so N_1 = 20000; the 2nd category is Huanglongbing with 20 labels, so N_2 = 20. Counting the total number of labels over all categories gives N_total = 100000, and therefore N_m = N_total / C = 100000 / 100 = 1000.
Step S72, head-tail determination: classify the category of each rectangular box in the recognition result of the unlabeled data of step S6 as head or tail to obtain the respective head and tail counts; if the head count is greater than the tail count, the sample is a head sample, otherwise it is a tail sample.
Head-tail division is performed over the 100 pest and disease categories: the number of ulcer disease labels, 20000, is greater than the average number of labels, 1000, so ulcer disease belongs to the head categories; the number of Huanglongbing labels, 20, is less than 1000, so Huanglongbing belongs to the tail categories. Assuming 20 categories are head categories and 80 are tail categories, counting the labels of the 20 head categories gives N_h = 95000, and counting the labels of the 80 tail categories gives N_t = 5000, so N_h + N_t = 95000 + 5000 = 100000 = N_total, where 100000 is the total number of labels over all categories.
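Step S71's head/tail split can be sketched with toy label counts; the category names echo the example above, but the third category and all counts here are illustrative, not the patent's data.

```python
def split_head_tail(label_counts):
    """Step S71: categories with more labels than the mean N_m are head
    categories, the rest are tail categories.
    `label_counts` maps category name -> number of labels."""
    n_total = sum(label_counts.values())
    n_m = n_total / len(label_counts)  # average number of labels
    head = {c for c, n in label_counts.items() if n > n_m}
    tail = set(label_counts) - head
    return head, tail, n_m

# Illustrative counts at the example's scale of imbalance:
counts = {"ulcer": 20000, "huanglongbing": 20, "other": 1000}
head, tail, n_m = split_head_tail(counts)
```

With these counts only the heavily labeled category clears the mean, so everything else, including the mid-sized category, falls into the tail, mirroring how a long-tail distribution concentrates labels in a few head classes.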
Assuming that there are 200000 picture samples in the unlabeled data set, sequentially performing head and tail determination on each sample, wherein the identification result of the 1 st sample contains 2 detection frames, 2 of which are ulcer diseases, dividing according to the head and tail categories in step S71, and determining that the 1 st sample is a head sample if the number of heads is 2, the number of tails is 0, and the number of heads is greater than the number of tails; the identification result of the 2 nd sample contains 3 detection frames, wherein 1 is ulcer disease and 2 are huanglongbing disease, the 2 nd sample is judged to be a tail sample according to the head and tail classification in the step S71, the number of heads is 1, the number of tails is 2, and the number of heads is less than the number of tails.
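The per-sample head/tail determination of step S72 is a majority vote over the detection boxes. The sketch below assumes detections are (category, confidence) tuples and reuses the two example samples from the text.

```python
def judge_sample(detections, head_categories):
    """Step S72: a sample is a head sample when its recognition result
    contains more head-category boxes than tail-category boxes,
    otherwise it is a tail sample."""
    n_head = sum(1 for cat, _conf in detections if cat in head_categories)
    n_tail = len(detections) - n_head
    return "head" if n_head > n_tail else "tail"

head_cats = {"ulcer"}
# 1st sample: 2 ulcer disease boxes -> 2 head, 0 tail
sample1 = [("ulcer", 0.95), ("ulcer", 0.91)]
# 2nd sample: 1 ulcer disease box, 2 Huanglongbing boxes -> 1 head, 2 tail
sample2 = [("ulcer", 0.92), ("huanglongbing", 0.91), ("huanglongbing", 0.98)]
result1 = judge_sample(sample1, head_cats)
result2 = judge_sample(sample2, head_cats)
```

Note that ties (equal head and tail counts) fall to the tail side here, which matches the text's "otherwise it is a tail sample" wording.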
Step S73, new sample candidates: for a sample judged to be a head sample, the confidences of the head-category labels in its recognition result are summed and divided by the number of head-category labels to obtain the head confidence mean; if this mean is greater than the head confidence threshold T_h, the sample is added to the head new-sample candidate queue Q_h. For a sample judged to be a tail sample, the confidences of the tail-category labels are summed and divided by the number of tail-category labels to obtain the tail confidence mean; if this mean is greater than the tail confidence threshold T_t, the sample is added to the tail new-sample candidate queue Q_t. The head confidence threshold satisfies 0.9 ≤ T_h < 1, and the tail confidence threshold satisfies 0.9 ≤ T_t < 1.
For the sample determined to be a head sample in step S72 (the 1st sample, whose 2 ulcer disease detections have confidences 0.95 and 0.91), the confidence mean is (0.95 + 0.91) / 2 = 0.93. With the head confidence threshold set to T_h = 0.90, since 0.93 > 0.90 the 1st sample is added to the head new-sample candidate queue, giving Q_h = {1}; the other head samples are judged in the same way. For the sample determined to be a tail sample in step S72 (the 2nd sample, with an ulcer disease confidence of 0.92 and 2 Huanglongbing confidences of 0.91 and 0.98), the confidence mean is (0.92 + 0.91 + 0.98) / 3 ≈ 0.937. With the tail confidence threshold set to T_t = 0.92, since 0.937 > 0.92 the 2nd sample is added to the tail new-sample candidate queue, giving Q_t = {2}; the other tail samples are judged in the same way.
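A sketch of step S73, assuming detections are (category, confidence) tuples. The numeric example averages the confidences of all detections in a sample (the tail sample's mean 0.937 includes the ulcer disease box), so this sketch does the same; the threshold values are the ones from the example.

```python
def enqueue_candidates(samples, head_cats, t_h=0.90, t_t=0.92):
    """Step S73: route each sample into the head or tail candidate queue
    when its mean detection confidence clears the relevant threshold.
    Following the numeric example, the mean is over all detections."""
    q_h, q_t = [], []
    for sample_id, detections in samples:
        mean_conf = sum(conf for _cat, conf in detections) / len(detections)
        n_head = sum(1 for cat, _ in detections if cat in head_cats)
        if n_head > len(detections) - n_head:   # head sample (step S72)
            if mean_conf > t_h:
                q_h.append((sample_id, mean_conf))
        elif mean_conf > t_t:                   # tail sample
            q_t.append((sample_id, mean_conf))
    return q_h, q_t

samples = [
    (1, [("ulcer", 0.95), ("ulcer", 0.91)]),
    (2, [("ulcer", 0.92), ("huanglongbing", 0.91), ("huanglongbing", 0.98)]),
]
q_h, q_t = enqueue_candidates(samples, head_cats={"ulcer"})
```

Both example samples clear their thresholds, reproducing Q_h = {1} and Q_t = {2} along with the 0.93 and roughly 0.937 confidence means.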
Step S74, new sample selection: sort the head new-sample candidate queue Q_h in descending order of confidence to obtain the sorted queue Q_h', and select from Q_h' the top proportion P_h of samples as head new samples; sort the tail new-sample candidate queue Q_t in descending order of confidence to obtain the sorted queue Q_t', and select from Q_t' the top proportion P_t of samples as tail new samples; the head new samples and the tail new samples are combined into the current new samples. The head ratio P_h is calculated as …, and the tail ratio P_t is calculated as ….
For the head new-sample candidate queue Q_h = {1, 3, 4, …} with confidence means {0.93, 0.90, 0.92, …}, sorting Q_h in descending order of confidence gives Q_h' = {1, 4, 3, …}, and the top proportion P_h of samples is selected from Q_h' as head new samples. For the tail new-sample candidate queue Q_t = {2, 5, 6, …} with confidence means {0.937, 0.92, 0.93, …}, sorting in descending order gives Q_t' = {2, 6, 5, …}, and the top proportion P_t of samples is selected from Q_t' as tail new samples. The head new samples and the tail new samples are combined into the current new samples; the proportion of new tail data is far larger than that of the head, which improves the richness of the tail-category data while ensuring that the head-category count grows only slowly.
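Step S74's selection can be sketched as below. The proportions `p_h` and `p_t` are placeholders here, since the patent computes them from the label statistics with the tail proportion much larger than the head proportion; the candidate queues reuse the example's sample ids and confidence means.

```python
import math

def select_new_samples(q_h, q_t, p_h=0.2, p_t=0.8):
    """Step S74: sort each candidate queue by confidence (descending)
    and keep the top proportion of each.
    p_h and p_t are illustrative placeholder values."""
    def top(queue, p):
        ranked = sorted(queue, key=lambda x: x[1], reverse=True)
        return [sid for sid, _conf in ranked[:math.ceil(len(ranked) * p)]]
    return top(q_h, p_h) + top(q_t, p_t)

q_h = [(1, 0.93), (3, 0.90), (4, 0.92)]   # head candidates with means
q_t = [(2, 0.937), (5, 0.92), (6, 0.93)]  # tail candidates with means
new_samples = select_new_samples(q_h, q_t)
```

Sorting reproduces the example's Q_h' = {1, 4, 3} and Q_t' = {2, 6, 5}; with the placeholder proportions, far more tail candidates than head candidates survive, matching the stated goal of enriching tail-category data.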
Step S8, new data generation: generating a pseudo label of the non-artificial label for the new sample in the step S7 in a manner of labeling the rectangular frame with the labeled data set in the step S1, taking the pseudo label and the original picture corresponding to the label-free data set in the step S4 as new data, putting all the new data into the training set, the verification set and the test set in the labeled data set in the step S1 according to a certain proportion, and removing the original picture corresponding to the label-free data set in the step S4;
step S9, after the newly generated data of step S8 is added into the labeled data set in step S1, the iterative learning is continued according to the flow of steps S1-S8, if the accuracy of the optimal target detection model is not improved any more in step S3, the iterative learning is ended, and the final target detection model is obtained;
Step S10, the annotated data model inference: and (4) inputting the test set with the labeled data set in the step (S1) into the final target detection model obtained in the step (S9) for model reasoning to obtain an identification result of the test set after iterative learning optimization.
Those of ordinary skill in the art will appreciate that the elements of the various embodiments described in connection with the embodiments disclosed herein can be embodied in electronic hardware, computer software, or combinations of both, and that the compositions of the various embodiments have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; the modifications and the substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and the corresponding technical solutions are all covered in the claims and the specification of the present invention.
Claims (9)
1. An image recognition method for solving the problem of unbalanced crop disease and insect pest samples, characterized in that the method comprises the following steps:
step S1, creating a labeled data set: collecting crop disease and insect pest picture data, and marking the positions of the disease and insect pests by using a rectangular frame to form a marked data set; dividing the labeled data set into a training set, a verification set and a test set according to a certain proportion;
step S2, model training: constructing a target detection model, training it on the training set of the data set in step S1, and outputting an intermediate target detection model after each round of training;
Step S3, model verification: inputting the verification set images in the step S1 into the intermediate model trained in the step S2 for model verification, and selecting the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
step S4, creating an unlabeled data set: collecting a large quantity of crop disease and insect pest picture data as an unlabeled data set;
step S5, image enhancement: performing data enhancement on each original picture in the unlabeled data set of step S4 to obtain N enhanced pictures, and combining the N enhanced pictures with the corresponding original picture into a group of N + 1 pictures as a group of data to be processed;
step S6, unlabeled-data model inference: inputting each group of data to be processed from step S5 into the current optimal target detection model of step S3 for inference to obtain N + 1 recognition results, post-processing each recognition result, superimposing the post-processed recognition results, and screening the superimposed results with a non-maximum suppression algorithm to obtain the final recognition result of the unlabeled data;
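The superimpose-and-screen stage of step S6 can be sketched as follows, assuming the per-view detections have already been post-processed (mapped back to the original picture's coordinates). The dictionary layout ("box", "score") and function names are illustrative assumptions; the claim only requires that overlapping results be screened by non-maximum suppression.

```python
def box_iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse_tta_results(per_view_results, iou_thresh=0.5):
    """Pool the N + 1 per-view detections and suppress duplicates greedily."""
    dets = sorted((d for view in per_view_results for d in view),
                  key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        # Keep a detection only if it does not heavily overlap a kept one.
        if all(box_iou(d["box"], k["box"]) < iou_thresh for k in kept):
            kept.append(d)
    return kept
```

In practice a library NMS routine would be used instead of this greedy loop; the sketch only shows why overlapping boxes from different augmented views collapse into one detection.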
step S7, sample selection: judging the identification result of the non-labeled data in the step S6 according to a sample selection strategy, determining whether to retain the identification result, and if so, selecting the original picture corresponding to the identification result from the non-labeled data set in the step S4 as a new sample;
step S8, new data generation: for the new samples selected in step S7, generating pseudo labels without manual annotation, in the same rectangular-frame annotation format as the labeled data set of step S1; taking each pseudo label together with its corresponding original picture from the unlabeled data set of step S4 as new data; distributing all the new data among the training set, the verification set and the test set of the labeled data set of step S1 in a certain proportion; and removing the corresponding original pictures from the unlabeled data set of step S4;
step S9, iterative learning: after the data newly generated in step S8 has been added to the labeled data set of step S1, continuing iterative learning according to the flow of steps S1 to S8; when the accuracy of the optimal target detection model in step S3 no longer improves, ending the iterative learning to obtain the final target detection model;
step S10, labeled-data model inference: inputting the test set of the labeled data set of step S1 into the final target detection model obtained in step S9 for model inference to obtain the recognition result of the test set after iterative learning optimization.
2. The image recognition method for solving the problem of unbalanced crop disease and insect pest samples according to claim 1, characterized in that: in step S1, the labeled data set is divided into a training set, a validation set and a test set at a ratio of 0.8 : 0.1 : 0.1.
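The 0.8 : 0.1 : 0.1 split of claim 2 amounts to the following routine, shown here as an illustrative sketch (the shuffle, seed and function name are assumptions; the claim only fixes the proportions):

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle and split a labeled data set into train/validation/test."""
    s = samples[:]
    random.Random(seed).shuffle(s)   # fixed seed for a reproducible split
    n = len(s)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return s[:n_train], s[n_train:n_train + n_val], s[n_train + n_val:]
```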
3. The image recognition method for solving the unbalance problem of the crop pest and disease damage samples according to claim 1, characterized in that: the target detection model in step S2 is a YOLOv5l6 network structure model using a YOLOv5 target detection algorithm.
4. The image recognition method for solving the problem of unbalanced crop disease and insect pest samples according to claim 1, characterized in that: the data enhancement in step S5 comprises 4 ways: random horizontal flipping, random vertical flipping, random rotation and random brightness increase, wherein N = 4.
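The N + 1 = 5 image group of claims 1 and 4 can be sketched as below. NumPy arrays stand in for real pictures, and the rotation is restricted to multiples of 90 degrees for simplicity; a production pipeline would use a proper image-augmentation library, so this is an assumption-laden illustration, not the patented implementation.

```python
import numpy as np

def enhance_group(img, rng=None):
    """Return the original picture plus the 4 enhanced copies of claim 4."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return [
        img,                                            # original picture
        img[:, ::-1],                                   # random horizontal flip
        img[::-1, :],                                   # random vertical flip
        np.rot90(img, k=int(rng.integers(1, 4))),       # random rotation
        np.clip(img * rng.uniform(1.0, 1.5), 0, 255),   # brightness increase
    ]
```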
5. The image recognition method for solving the unbalance problem of the crop pest and disease damage samples according to claim 1, characterized in that: the sample selection strategy in step S7 includes the following steps:
step S71, head and tail division: carrying out sample quantity statistics on the training set of the labeled data set in step S1, wherein the labeled data set has C pest and disease categories in total; counting the number of annotations Nc of each category c, c ∈ {1, 2, …, C}, and the total number of annotations Ntotal, the average number of annotations Nm being:
Nm = Ntotal / C;
dividing the categories whose number of annotations is greater than Nm into the head categories, and the categories whose number of annotations is less than or equal to Nm into the tail categories; counting the total number of head category annotations Nh and the total number of tail category annotations Nt, such that:
Nh + Nt = Ntotal;
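Step S71 reduces to a simple frequency threshold at the mean. A minimal sketch, assuming labels are given as one class name per annotated rectangular frame:

```python
from collections import Counter

def split_head_tail(labels):
    """Divide classes into head and tail around the mean Nm = Ntotal / C."""
    counts = Counter(labels)            # Nc for each class c
    n_total = sum(counts.values())      # Ntotal
    n_m = n_total / len(counts)         # Nm, the mean annotation count
    head = {c for c, n in counts.items() if n > n_m}
    tail = set(counts) - head           # classes with at most Nm annotations
    # Sanity check: Nh + Nt == Ntotal by construction.
    assert sum(counts[c] for c in head) + sum(counts[c] for c in tail) == n_total
    return head, tail
```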
step S72, head and tail determination: for each rectangular frame in the recognition result of the unlabeled data in step S6, determining whether its category belongs to the head or the tail, and counting the numbers of head frames and tail frames respectively; if the number of head frames is greater than the number of tail frames, the sample belongs to the head samples, otherwise the sample belongs to the tail samples;
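The per-sample vote of step S72 is a box count, sketched below (the "cls" key is an assumed detection layout). Note that a tie counts as a tail sample, since the claim only assigns head status when head frames strictly outnumber tail frames:

```python
def classify_sample(detections, head_classes):
    """Vote one image's detections into head or tail (step S72)."""
    n_head = sum(1 for d in detections if d["cls"] in head_classes)
    n_tail = len(detections) - n_head
    return "head" if n_head > n_tail else "tail"
```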
step S73, new sample candidacy: for a sample judged as a head sample, calculating the mean confidence of the head categories in its recognition result, and if this mean confidence is greater than the head confidence threshold Th, adding the sample to the head new-sample candidate queue Qh; for a sample judged as a tail sample, calculating the mean confidence of the tail categories, and if this mean confidence is greater than the tail confidence threshold Tt, adding the sample to the tail new-sample candidate queue Qt;
step S74, new sample selection: sorting the head new-sample candidate queue Qh in descending order of confidence to obtain the sorted head new-sample candidate queue Qh', and selecting the top proportion Ph of samples from Qh' as the new head samples; sorting the tail new-sample candidate queue Qt in descending order of confidence to obtain the sorted tail new-sample candidate queue Qt', and selecting the top proportion Pt of samples from Qt' as the new tail samples; the new head samples and the new tail samples together constituting the current new samples.
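Steps S73 and S74 together can be sketched as one selection routine. The tuple layout (kind, mean confidence, image id) and the default values of Ph and Pt are illustrative assumptions; the claims fix only the thresholds' role and leave the proportions to the implementer. Allowing Pt > Ph is one natural way to feed more tail samples back into training and so counter the class imbalance.

```python
def select_new_samples(samples, t_h=0.9, t_t=0.9, p_h=0.5, p_t=0.5):
    """Queue confident samples (S73), then keep top fractions per queue (S74)."""
    q_h, q_t = [], []
    for kind, conf, img_id in samples:
        if kind == "head" and conf > t_h:          # head threshold Th
            q_h.append((kind, conf, img_id))
        elif kind == "tail" and conf > t_t:        # tail threshold Tt
            q_t.append((kind, conf, img_id))
    q_h.sort(key=lambda s: s[1], reverse=True)     # Qh' sorted descending
    q_t.sort(key=lambda s: s[1], reverse=True)     # Qt' sorted descending
    new_h = q_h[:int(p_h * len(q_h))]              # top Ph of the head queue
    new_t = q_t[:int(p_t * len(q_t))]              # top Pt of the tail queue
    return new_h + new_t
```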
6. The image recognition method for solving the problem of unbalanced crop disease and insect pest samples according to claim 5, characterized in that: the head confidence threshold Th has a value range of 0.9 ≤ Th < 1.
7. The image recognition method for solving the problem of unbalanced crop disease and insect pest samples according to claim 5, characterized in that: the tail confidence threshold Tt has a value range of 0.9 ≤ Tt < 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111676323.9A CN114677553B (en) | 2021-12-31 | 2021-12-31 | Image recognition method for solving imbalance problem of crop disease and pest samples |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114677553A true CN114677553A (en) | 2022-06-28 |
CN114677553B CN114677553B (en) | 2024-05-14 |
Family
ID=82070802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111676323.9A Active CN114677553B (en) | 2021-12-31 | 2021-12-31 | Image recognition method for solving imbalance problem of crop disease and pest samples |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114677553B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188824A (en) * | 2019-05-31 | 2019-08-30 | 重庆大学 | A kind of small sample plant disease recognition methods and system |
CN112668490A (en) * | 2020-12-30 | 2021-04-16 | 浙江托普云农科技股份有限公司 | Yolov 4-based pest detection method, system, device and readable storage medium |
CN112686152A (en) * | 2020-12-30 | 2021-04-20 | 广西慧云信息技术有限公司 | Crop pest and disease identification method with multi-size input and multi-size targets |
CN113298150A (en) * | 2021-05-25 | 2021-08-24 | 东北林业大学 | Small sample plant disease identification method based on transfer learning and self-learning |
WO2021203505A1 (en) * | 2020-04-09 | 2021-10-14 | 丰疆智能软件科技(南京)有限公司 | Method for constructing pest detection model |
CN113657294A (en) * | 2021-08-19 | 2021-11-16 | 中化现代农业有限公司 | Crop disease and insect pest detection method and system based on computer vision |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117523565A (en) * | 2023-11-13 | 2024-02-06 | 拓元(广州)智慧科技有限公司 | Tail class sample labeling method, device, electronic equipment and storage medium |
CN117523565B (en) * | 2023-11-13 | 2024-05-17 | 拓元(广州)智慧科技有限公司 | Tail class sample labeling method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114677553B (en) | 2024-05-14 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |