CN114677553A - Image recognition method for solving unbalanced problem of crop disease and insect pest samples - Google Patents



Publication number: CN114677553A
Authority: CN (China)
Prior art keywords: tail, sample, head, data set, data
Legal status: Granted (status as listed by Google; not a legal conclusion)
Application number: CN202111676323.9A
Other languages: Chinese (zh)
Other versions: CN114677553B (en)
Inventors: 苏家仪, 韦光亮, 王筱东, 朱燕红, 莫振东, 顾小宁
Current assignee: Guangxi Talentcloud Information Technology Co ltd (the listed assignees may be inaccurate)
Original assignee: Guangxi Talentcloud Information Technology Co ltd
Application filed by Guangxi Talentcloud Information Technology Co ltd
Priority to CN202111676323.9A
Publication of CN114677553A
Application granted
Publication of CN114677553B
Legal status: Active (current)



Classifications

    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/24 Classification techniques
    • G06N3/08 Learning methods (neural networks)
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of pest and disease identification, and in particular to an image recognition method for solving the problem of imbalanced crop pest and disease samples. The method trains a model on the current labeled data set and selects the current optimal model through validation; applies several image augmentations to each image in the unlabeled data set and runs inference and screening on the augmented images to obtain a recognition result for the unlabeled image; feeds the recognition result into a sample selection strategy that decides whether to retain it; if it is retained, generates a pseudo-label and moves the image into the current labeled data set; and continues training on the new labeled data set, iterating this process until accuracy no longer improves. Through iterative learning, the method reduces the influence of the long-tail distribution and improves the recall and accuracy of tail categories without degrading recognition of head categories. Inference uses a single model and introduces no additional network layers, so inference speed is unaffected.

Description

Image recognition method for solving unbalanced problem of crop disease and insect pest samples
Technical Field
The invention relates to the field of pest and disease identification, and in particular to an image recognition method for solving the problem of imbalanced crop pest and disease samples.
Background
Crop diseases and insect pests are among the main agricultural disasters worldwide. If they are not discovered and controlled in time, they can cause great losses to agricultural production and threaten national food security and the quality and safety of agricultural products. Crop diseases and pests are characterized by many varieties, large impact and frequent outbreaks, which pose great challenges to their monitoring.
With the rapid development of computer vision and artificial intelligence, image-based pest and disease identification has been applied to monitoring many crops because of its low cost and high efficiency. Current image-based methods generally use deep learning for model training and inference, and deep learning relies on large amounts of data to reach its best recognition performance. Crop pest and disease image data, however, is imbalanced: common categories have very large data volumes while uncommon categories have little data, so the data follows a long-tail distribution. The head of the distribution contains a great deal of data, the middle gradually decreases, and the tail has very little data or even no samples at all; because there are many crop pest and disease categories, the tail is very long.
Sample imbalance strongly affects crop pest and disease models: the model tends to overfit the head categories, which have more data, and underfit the tail categories, which have less. There are many general methods for addressing sample imbalance. Resampling undersamples the head categories and oversamples the tail categories to balance the training samples, but this can cause the model to underfit the head and overfit the tail. Loss reweighting assigns low weights to head categories and high weights to tail categories, but the improvement is limited. Long-tail crop disease image recognition based on multi-stage training adjusts the sample distribution through multi-stage augmented training on labeled data, but it does not make full use of massive unlabeled data, so tail-category data remains insufficiently rich.
Disclosure of Invention
To address the shortcomings of the background art, the invention provides an image recognition method for solving the problem of imbalanced crop pest and disease samples. The specific technical scheme is as follows:
An image recognition method for solving the problem of imbalanced crop pest and disease samples comprises the following steps:
Step S1, creating a labeled data set: collect crop pest and disease images and mark the positions of the pests and diseases with rectangular boxes to form a labeled data set; divide the labeled data set into a training set, a validation set and a test set in a certain proportion;
Step S2, model training: construct a target detection model, train it on the training set from step S1, and output an intermediate target detection model after each training round;
Step S3, model validation: input the validation set images from step S1 into the intermediate models trained in step S2, and select the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
Step S4, creating an unlabeled data set: collect a large volume of crop pest and disease images as an unlabeled data set;
Step S5, image augmentation: apply data augmentation to each original image in the unlabeled data set from step S4 to obtain N augmented images, and combine them with the corresponding original image to obtain N+1 images as one group of data to be processed;
Step S6, model inference on unlabeled data: input each group of data to be processed from step S5 into the current optimal target detection model from step S3 to obtain N+1 recognition results, post-process each recognition result, superimpose the post-processed results, and screen the superimposed results with a non-maximum suppression algorithm to obtain the final recognition result for the unlabeled data;
Step S7, sample selection: judge the recognition result of the unlabeled data from step S6 according to a sample selection strategy to decide whether to retain it; if it is retained, take the original image corresponding to the recognition result from the unlabeled data set of step S4 as a new sample;
Step S8, new data generation: generate a pseudo-label (not manually annotated) for each new sample from step S7, in the same rectangular-box annotation format as the labeled data set of step S1; take the pseudo-label together with the corresponding original image from the unlabeled data set of step S4 as new data; distribute all new data into the training, validation and test sets of the labeled data set of step S1 in a certain proportion; and remove the corresponding original images from the unlabeled data set of step S4;
Step S9, iteration: after the new data generated in step S8 is added to the labeled data set of step S1, continue iterative learning following steps S1 to S8; when the accuracy of the optimal target detection model selected in step S3 no longer improves, end the iterative learning and obtain the final target detection model;
Step S10, model inference on labeled data: input the test set of the labeled data set from step S1 into the final target detection model obtained in step S9 to obtain the recognition results for the test set after iterative-learning optimization.
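The outer loop of steps S2 to S9 can be sketched in Python as a minimal self-training skeleton. This is an illustration under stated assumptions, not the patented implementation: `train_and_select` (steps S2 and S3, returning the best model and its validation accuracy) and `pseudo_label` (steps S5 to S8, returning pseudo-labeled images) are hypothetical callables, images are identified by hashable keys such as file paths, and new data is appended to a single labeled list rather than split across training, validation and test sets.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical record type: (image_key, pseudo_label_boxes)
Record = Tuple[Hashable, object]


def self_training_loop(
    labeled: List[Record],
    unlabeled: Sequence[Hashable],
    train_and_select: Callable[[List[Record]], Tuple[object, float]],
    pseudo_label: Callable[[object, Sequence[Hashable]], List[Record]],
    max_rounds: int = 10,
) -> Tuple[object, float]:
    """Repeat steps S2-S9 until validation accuracy stops improving."""
    # Steps S2-S3: train on the initial labeled set and keep the best model.
    best_model, best_acc = train_and_select(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Steps S5-S8: pseudo-label a subset of the unlabeled pool.
        new_data = pseudo_label(best_model, pool)
        if not new_data:
            break
        used = {key for key, _ in new_data}
        labeled = labeled + new_data
        # Step S8: remove the selected images from the unlabeled pool.
        pool = [key for key in pool if key not in used]
        # Steps S2-S3 again on the enlarged labeled set.
        model, acc = train_and_select(labeled)
        # Step S9: stop once validation accuracy no longer improves.
        if acc <= best_acc:
            break
        best_model, best_acc = model, acc
    return best_model, best_acc
```

The `max_rounds` cap is an added safeguard, not part of the described method, which stops only when accuracy no longer improves.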
Preferably, in step S1, the labeled data set is divided into a training set, a validation set and a test set at a ratio of 0.8:0.1:0.1.
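A minimal sketch of the 0.8:0.1:0.1 split, assuming the split is made at image level after a random shuffle with a fixed seed (both assumptions; the text does not say how images are assigned). Rounding remainders fall into the test set.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split items into training, validation and test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```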
Preferably, the target detection model in step S2 is a YOLOv5l6 network model based on the YOLOv5 target detection algorithm.
Preferably, the data augmentation in step S5 comprises 4 operations: random horizontal flip, random vertical flip, random rotation and random brightness increase, so N = 4.
Preferably, the sample selection strategy in step S7 comprises the following steps:
Step S71, head and tail division: count the samples in the training set of the labeled data set from step S1. The labeled data set has $C$ pest and disease categories in total; let $N_c$ be the number of labels of category $c$, $c \in \{1, 2, \dots, C\}$, let $N_{total}$ be the total number of labels, and let $N_m$ be the average number of labels per category. Then:
$$N_{total} = \sum_{c=1}^{C} N_c$$
$$N_m = \frac{N_{total}}{C}$$
Categories whose number of labels is greater than $N_m$ are classified as head categories; categories whose number of labels is less than or equal to $N_m$ are classified as tail categories. Let $N_h$ be the total number of head-category labels and $N_t$ the total number of tail-category labels. Then:
$$N_h + N_t = N_{total}$$
Step S72, head and tail determination: classify the category of each rectangular box in the recognition result of the unlabeled data from step S6 as head or tail and count the head boxes and tail boxes. If the number of head boxes is greater than the number of tail boxes, the sample is a head sample; otherwise it is a tail sample;
Step S73, new sample candidates: for a sample judged to be a head sample, compute the mean confidence of the head-category boxes in its recognition result; if this mean is greater than the head confidence threshold $T_h$, add the sample to the head new-sample candidate queue $Q_h$. For a sample judged to be a tail sample, compute the mean confidence of the tail-category boxes; if this mean is greater than the tail confidence threshold $T_t$, add the sample to the tail new-sample candidate queue $Q_t$;
Step S74, new sample selection: sort the head candidate queue $Q_h$ in descending order of confidence to obtain the sorted queue $Q_h'$, and select the top proportion $P_h$ of $Q_h'$ as head new samples; sort the tail candidate queue $Q_t$ in descending order of confidence to obtain the sorted queue $Q_t'$, and select the top proportion $P_t$ of $Q_t'$ as tail new samples. The head new samples and tail new samples together form the current new samples.
Preferably, the head confidence threshold $T_h$ satisfies $0.9 \le T_h < 1$.
Preferably, the tail confidence threshold $T_t$ satisfies $0.9 \le T_t < 1$.
Preferably, the head proportion $P_h$ is calculated as:
[Formula image BDA0003452064950000051 not reproduced in this text]
Preferably, the tail proportion $P_t$ is calculated as:
[Formula image BDA0003452064950000052 not reproduced in this text]
The beneficial effects of the invention are as follows. The invention provides an image recognition method for solving the problem of imbalanced crop pest and disease samples. A model is trained on the current labeled data set and the current optimal model is selected through validation; each image in the unlabeled data set is augmented several times and inference is run on the augmented images; the superimposed results are screened by non-maximum suppression to obtain the recognition result for the unlabeled image; the recognition result is fed into a sample selection strategy that decides whether to retain it; retained results are turned into pseudo-labels and moved into the current labeled data set; and training continues on the new labeled data set, iterating until accuracy no longer improves. The invention makes full use of massive unlabeled crop pest and disease data for semi-supervised learning and designs a sample selection strategy targeted at sample imbalance, continuously adjusting the data distribution to reduce the influence of the long-tail distribution. Iterative learning improves the recall and accuracy of tail categories without affecting recognition of head categories. Only a single model is used for inference and no additional network layer is introduced, so inference speed is unaffected.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of protection of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in FIG. 1, a specific embodiment of the invention provides an image recognition method for solving the problem of imbalanced crop pest and disease samples, comprising the following steps:
Step S1, creating a labeled data set: collect crop pest and disease images and mark the positions of the pests and diseases with rectangular boxes to form a labeled data set; divide the labeled data set into a training set, a validation set and a test set at a ratio of 0.8:0.1:0.1;
Step S2, model training: construct a target detection model, train it on the training set from step S1, and output an intermediate target detection model after each training round; the target detection model is a YOLOv5l6 network model based on the YOLOv5 target detection algorithm.
Step S3, model validation: input the validation set images from step S1 into the intermediate models trained in step S2, and select the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
Step S4, creating an unlabeled data set: collect a large volume of crop pest and disease images as an unlabeled data set;
Step S5, image augmentation: apply data augmentation to each original image in the unlabeled data set from step S4 to obtain N augmented images, and combine them with the corresponding original image to obtain N+1 images as one group of data to be processed; the data augmentation comprises 4 operations: random horizontal flip, random vertical flip, random rotation and random brightness increase, so N = 4.
Step S6, model inference on unlabeled data: input each group of data to be processed from step S5 into the current optimal target detection model from step S3 to obtain N+1 recognition results, and post-process each result to map it back to the original image: the result of the randomly horizontally flipped image is restored using the horizontal flip parameter, the result of the randomly vertically flipped image using the vertical flip parameter, and the result of the randomly rotated image using the rotation parameter. The post-processed results are then superimposed and screened with a non-maximum suppression algorithm to obtain the final recognition result for the unlabeled data;
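The post-processing and screening of step S6 can be sketched in pure Python under several assumptions: boxes are `(x1, y1, x2, y2, score, class_id)` tuples in pixel coordinates; flips mirror the image about its centre line; arbitrary-angle rotation is not handled (only a hypothetical 180° case, which equals a horizontal plus a vertical flip), because restoring an axis-aligned box after an arbitrary rotation requires choices the text does not specify; and NMS is applied per class with an assumed IoU threshold of 0.5. YOLOv5 has its own NMS routine; this version is only illustrative.

```python
from typing import Dict, List, Tuple

# (x1, y1, x2, y2, score, class_id)
Box = Tuple[float, float, float, float, float, int]
SUPPORTED = ("identity", "brightness", "hflip", "vflip", "rot180")


def undo_transform(boxes: List[Box], transform: str, width: float, height: float) -> List[Box]:
    """Map boxes predicted on an augmented image back to the original image."""
    if transform not in SUPPORTED:
        raise ValueError(f"unsupported transform: {transform}")
    restored = []
    for x1, y1, x2, y2, score, cls in boxes:
        if transform in ("hflip", "rot180"):
            x1, x2 = width - x2, width - x1
        if transform in ("vflip", "rot180"):
            y1, y2 = height - y2, height - y1
        # "identity" and "brightness" leave geometry unchanged.
        restored.append((x1, y1, x2, y2, score, cls))
    return restored


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], iou_thr: float = 0.5) -> List[Box]:
    """Greedy per-class NMS: keep highest-scoring boxes, drop same-class overlaps."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(k[5] != box[5] or iou(k, box) < iou_thr for k in kept):
            kept.append(box)
    return kept


def merge_tta(results: Dict[str, List[Box]], width: float, height: float,
              iou_thr: float = 0.5) -> List[Box]:
    """Restore each augmented result, superimpose them, then screen with NMS."""
    pooled: List[Box] = []
    for transform, boxes in results.items():
        pooled.extend(undo_transform(boxes, transform, width, height))
    return nms(pooled, iou_thr)
```

For example, a box predicted at x = 70 to 90 on a horizontally flipped 100-pixel-wide image is restored to x = 10 to 30 and then merged with the matching box from the original image.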
Step S7, sample selection: judge the recognition result of the unlabeled data from step S6 according to a sample selection strategy to decide whether to retain it; if it is retained, take the original image corresponding to the recognition result from the unlabeled data set of step S4 as a new sample. The sample selection strategy comprises the following steps:
Step S71, head and tail division: count the samples in the training set of the labeled data set from step S1. The labeled data set has $C$ pest and disease categories in total; let $N_c$ be the number of labels of category $c$, $c \in \{1, 2, \dots, C\}$, let $N_{total}$ be the total number of labels, and let $N_m$ be the average number of labels per category. Then:
$$N_{total} = \sum_{c=1}^{C} N_c$$
$$N_m = \frac{N_{total}}{C}$$
Categories whose number of labels is greater than $N_m$ are classified as head categories; categories whose number of labels is less than or equal to $N_m$ are classified as tail categories. Let $N_h$ be the total number of head-category labels and $N_t$ the total number of tail-category labels. Then:
$$N_h + N_t = N_{total}$$
For example, suppose the training set of the labeled data set has 100 pest and disease categories, so $C = 100$. Category 1 is ulcer disease, with 20000 labels, so $N_1 = 20000$; category 2 is Huanglongbing, with 20 labels, so $N_2 = 20$. Summing the labels of all categories gives:
$$N_{total} = \sum_{c=1}^{100} N_c = 100000$$
and the average number of labels is
$$N_m = \frac{N_{total}}{C} = \frac{100000}{100} = 1000$$
Step S72, head and tail determination: classify the category of each rectangular box in the recognition result of the unlabeled data from step S6 as head or tail and count the head boxes and tail boxes. If the number of head boxes is greater than the number of tail boxes, the sample is a head sample; otherwise it is a tail sample.
Head and tail division is then performed on the 100 categories: ulcer disease has 20000 labels, more than the average of 1000, so it is a head category; Huanglongbing has 20 labels, fewer than the average of 1000, so it is a tail category. Suppose 20 categories are head categories and 80 are tail categories. Summing the labels of the 20 head categories gives $N_h = 95000$, and summing the labels of the 80 tail categories gives $N_t = 5000$, so $N_h + N_t = 95000 + 5000 = 100000 = N_{total}$, the total number of labels over all categories.
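The division rule of step S71 can be checked against the worked figures above ($N_m = 1000$, $N_h = 95000$, $N_t = 5000$) with a minimal sketch. The input format, a dictionary mapping category name to label count, is an assumption for illustration.

```python
from typing import Dict, Set, Tuple


def split_head_tail(label_counts: Dict[str, int]) -> Tuple[Set[str], Set[str], float, int, int]:
    """Step S71: categories with more labels than the mean are head, the rest tail."""
    n_total = sum(label_counts.values())          # N_total
    n_mean = n_total / len(label_counts)          # N_m = N_total / C
    head = {c for c, n in label_counts.items() if n > n_mean}
    tail = set(label_counts) - head               # categories with N_c <= N_m
    n_head = sum(label_counts[c] for c in head)   # N_h
    n_tail = n_total - n_head                     # N_t, so N_h + N_t = N_total
    return head, tail, n_mean, n_head, n_tail
```

For a small case, `split_head_tail({"a": 30, "b": 5, "c": 1})` gives a mean of 12.0, head `{"a"}`, tail `{"b", "c"}`, $N_h = 30$ and $N_t = 6$.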
Suppose the unlabeled data set contains 200000 images, and head and tail determination is performed on each in turn. The recognition result of sample 1 contains 2 detection boxes, both ulcer disease; by the head and tail division of step S71 there are 2 head boxes and 0 tail boxes, and since 2 > 0, sample 1 is a head sample. The recognition result of sample 2 contains 3 detection boxes, 1 ulcer disease and 2 Huanglongbing; by the division of step S71 there is 1 head box and there are 2 tail boxes, and since 1 < 2, sample 2 is a tail sample.
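The per-sample determination of step S72 can be sketched as a box-count comparison, with ties going to the tail as the "otherwise" clause implies. Returning `None` for an image with no detections is an assumption, since the text does not cover that case.

```python
from typing import Iterable, Optional, Set


def classify_sample(pred_classes: Iterable[str], head_classes: Set[str]) -> Optional[str]:
    """Step S72: 'head' if head boxes outnumber tail boxes, otherwise 'tail'."""
    classes = list(pred_classes)
    if not classes:
        return None  # assumption: images with no detections are skipped
    n_head = sum(1 for c in classes if c in head_classes)
    n_tail = len(classes) - n_head
    return "head" if n_head > n_tail else "tail"
```

With `head_classes = {"ulcer disease"}`, sample 1 (two ulcer disease boxes) is classified as `"head"` and sample 2 (one ulcer disease box, two Huanglongbing boxes) as `"tail"`.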
Step S73, new sample candidates: for a sample judged to be a head sample, sum the confidences of the head-category boxes in its recognition result and divide by the number of head-category boxes to obtain the mean head-category confidence; if this mean is greater than the head confidence threshold $T_h$, add the sample to the head new-sample candidate queue $Q_h$. For a sample judged to be a tail sample, sum the confidences of the tail-category boxes and divide by the number of tail-category boxes to obtain the mean tail-category confidence; if this mean is greater than the tail confidence threshold $T_t$, add the sample to the tail new-sample candidate queue $Q_t$. The head confidence threshold satisfies $0.9 \le T_h < 1$, and the tail confidence threshold satisfies $0.9 \le T_t < 1$.
For sample 1, judged a head sample in step S72, suppose the confidences of the 2 ulcer disease boxes are 0.95 and 0.91. The mean confidence is
$$\frac{0.95 + 0.91}{2} = 0.93$$
With the head confidence threshold set to $T_h = 0.90$, since 0.93 > 0.90, sample 1 is added to the head candidate queue, giving $Q_h = \{1\}$, and the remaining head samples are judged in the same way. For sample 2, judged a tail sample in step S72, suppose the ulcer disease box has confidence 0.92 and the 2 Huanglongbing boxes have confidences 0.91 and 0.98. The mean confidence is
$$\frac{0.92 + 0.91 + 0.98}{3} \approx 0.937$$
This value averages all three boxes; averaging only the two tail-category boxes, as step S73 specifies, gives $(0.91 + 0.98)/2 = 0.945$. With the tail confidence threshold set to $T_t = 0.92$, either value exceeds 0.92, so sample 2 is added to the tail candidate queue, giving $Q_t = \{2\}$, and the remaining tail samples are judged in the same way.
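A minimal sketch of step S73, assuming detections are `(class_name, confidence)` pairs and that the queue stores `(sample_id, mean_confidence)` tuples (both hypothetical formats). It follows the wording of step S73 and averages only the boxes of the sample's own group, so sample 2 scores 0.945 here rather than the 0.937 of the worked example. A sample classified by step S72 always has at least one box in its own group, so the mean is never taken over an empty list.

```python
from statistics import mean
from typing import List, Set, Tuple

Detection = Tuple[str, float]  # (class_name, confidence)


def group_mean_confidence(dets: List[Detection], head_classes: Set[str], group: str) -> float:
    """Step S73: mean confidence of the boxes belonging to the sample's group."""
    want_head = group == "head"
    scores = [s for c, s in dets if (c in head_classes) == want_head]
    return mean(scores)


def maybe_enqueue(queue: List[Tuple[int, float]], sample_id: int, score: float,
                  threshold: float) -> bool:
    """Append (sample_id, score) if the mean confidence exceeds the threshold."""
    if not 0.9 <= threshold < 1.0:
        raise ValueError("threshold must satisfy 0.9 <= T < 1")
    if score > threshold:
        queue.append((sample_id, score))
        return True
    return False
```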
Step S74, new sample selection: sort the head candidate queue $Q_h$ in descending order of confidence to obtain the sorted queue $Q_h'$, and select the top proportion $P_h$ of $Q_h'$ as head new samples; sort the tail candidate queue $Q_t$ in descending order of confidence to obtain the sorted queue $Q_t'$, and select the top proportion $P_t$ of $Q_t'$ as tail new samples. The head new samples and tail new samples together form the current new samples. The head proportion $P_h$ is calculated as:
[Formula image BDA0003452064950000093 not reproduced in this text]
The tail proportion $P_t$ is calculated as:
[Formula image BDA0003452064950000094 not reproduced in this text]
The head candidate queue $Q_h = \{1, 3, 4, \dots\}$ has mean confidences $\{0.93, 0.90, 0.92, \dots\}$; sorting $Q_h$ in descending order gives $Q_h' = \{1, 4, 3, \dots\}$, and the top proportion $P_h$ of $Q_h'$ (value given as formula image BDA0003452064950000101, not reproduced in this text) is taken as head new samples. The tail candidate queue $Q_t = \{2, 5, 6, \dots\}$ has mean confidences $\{0.937, 0.92, 0.93, \dots\}$; sorting $Q_t$ in descending order gives $Q_t' = \{2, 6, 5, \dots\}$, and the top proportion $P_t$ of $Q_t'$ (value given as formula image BDA0003452064950000102, not reproduced in this text) is taken as tail new samples. The head and tail new samples together form the current new samples. The proportion of new tail data is far larger than that of new head data, which enriches the tail-category data while ensuring that the amount of head-category data grows only slowly.
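Step S74 can be sketched as a sort followed by a top-share cut. Because the formulas for $P_h$ and $P_t$ are not reproduced in this text, the proportion is passed in as a parameter; rounding the selected count down is an assumption. The sketch reproduces the orderings $Q_h' = \{1, 4, 3\}$ and $Q_t' = \{2, 6, 5\}$ from the example.

```python
import math
from typing import List, Tuple


def select_top(queue: List[Tuple[int, float]], proportion: float) -> List[int]:
    """Step S74: sort candidates by confidence (descending) and keep the top share."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must be in [0, 1]")
    ranked = sorted(queue, key=lambda item: item[1], reverse=True)
    k = math.floor(len(ranked) * proportion)  # assumption: round down
    return [sample_id for sample_id, _ in ranked[:k]]
```

The current new samples would then be `select_top(q_h, p_h) + select_top(q_t, p_t)`, where `p_h` and `p_t` stand for the proportions computed by the unreproduced formulas.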
Step S8, new data generation: generate a pseudo-label (not manually annotated) for each new sample from step S7, in the same rectangular-box annotation format as the labeled data set of step S1; take the pseudo-label together with the corresponding original image from the unlabeled data set of step S4 as new data; distribute all new data into the training, validation and test sets of the labeled data set of step S1 in a certain proportion; and remove the corresponding original images from the unlabeled data set of step S4;
Step S9, iteration: after the new data generated in step S8 is added to the labeled data set of step S1, continue iterative learning following steps S1 to S8; when the accuracy of the optimal target detection model selected in step S3 no longer improves, end the iterative learning and obtain the final target detection model;
Step S10, model inference on labeled data: input the test set of the labeled data set from step S1 into the final target detection model obtained in step S9 to obtain the recognition results for the test set after iterative-learning optimization.
Those of ordinary skill in the art will appreciate that the elements of the various embodiments described in connection with the embodiments disclosed herein can be embodied in electronic hardware, computer software, or combinations of both, and that the compositions of the various embodiments have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the unit is only one division of logical functions, and other division manners may be used in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; the modifications and the substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and the corresponding technical solutions are all covered in the claims and the specification of the present invention.

Claims (9)

1. An image recognition method for solving the sample imbalance problem of crop diseases and insect pests, characterized in that the method comprises the following steps:
Step S1, creating a labeled data set: collecting crop disease and insect pest pictures, and marking the positions of the diseases and insect pests with rectangular boxes to form a labeled data set; dividing the labeled data set into a training set, a validation set and a test set in a certain proportion;
Step S2, model training: constructing a target detection model, training the constructed model on the training set of the data set from step S1, and outputting an intermediate target detection model after each training round;
Step S3, model validation: inputting the validation set images from step S1 into the intermediate target detection models trained in step S2 for validation, and selecting the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
Step S4, creating an unlabeled data set: collecting a large volume of crop disease and insect pest pictures as an unlabeled data set;
Step S5, image enhancement: performing data augmentation on each original picture in the unlabeled data set from step S4 to obtain N augmented pictures, and combining the N augmented pictures with the corresponding original picture, so that the resulting N+1 pictures form one group of data to be processed;
Step S6, unlabeled-data model inference: inputting each group of data to be processed from step S5 into the current optimal target detection model from step S3 for inference to obtain N+1 recognition results; post-processing each recognition result, superimposing the post-processed recognition results, and filtering the superimposed results with a non-maximum suppression (NMS) algorithm, finally obtaining the recognition result of the unlabeled data;
Step S7, sample selection: judging the recognition result of the unlabeled data from step S6 according to a sample selection strategy to determine whether to retain it; if it is retained, selecting the original picture corresponding to the recognition result from the unlabeled data set of step S4 as a new sample;
Step S8, new data generation: generating a pseudo label (not manually annotated) for each new sample from step S7 in the same rectangular-box annotation format as the labeled data set of step S1; taking each pseudo label together with the corresponding original picture from the unlabeled data set of step S4 as new data; distributing all the new data into the training set, validation set and test set of the labeled data set of step S1 in a certain proportion; and removing the corresponding original pictures from the unlabeled data set of step S4;
Step S9: after the new data generated in step S8 are added to the labeled data set of step S1, iterative learning continues through steps S1 to S8; when the accuracy of the optimal target detection model of step S3 no longer improves, iterative learning ends and the final target detection model is obtained;
Step S10, labeled-data model inference: inputting the test set of the labeled data set from step S1 into the final target detection model obtained in step S9 for model inference, to obtain the recognition results of the test set after iterative learning optimization.
2. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 1, characterized in that: in step S1, the labeled data set is divided into a training set, a validation set and a test set in a ratio of 0.8 : 0.1 : 0.1.
3. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 1, characterized in that: the target detection model in step S2 is a YOLOv5l6 network model based on the YOLOv5 target detection algorithm.
4. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 1, characterized in that: the data augmentation in step S5 comprises four methods: random horizontal flipping, random vertical flipping, random rotation and random brightness increase, so that N = 4.
5. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 1, characterized in that: the sample selection strategy in step S7 comprises the following steps:
Step S71, head/tail division: counting the samples in the training set of the labeled data set from step S1, where the labeled data set contains $C$ pest categories in total; calculating the number of labels $N_c$ of each pest category $c$, $c \in \{1, 2, \ldots, C\}$, the total number of labels $N_{total}$ and the average number of labels $N_m$; then:
$$N_{total} = \sum_{c=1}^{C} N_c$$
$$N_m = \frac{N_{total}}{C}$$
Categories whose number of labels is greater than $N_m$ are classified as head categories; otherwise, categories whose number of labels is less than or equal to $N_m$ are classified as tail categories. Counting the total number of head-category labels $N_h$ and the total number of tail-category labels $N_t$; then:
$$N_h + N_t = N_{total}$$
Step S72, head/tail determination: classifying the category of each rectangular box in the recognition result of the unlabeled data from step S6 as head or tail, and counting the head boxes and tail boxes; if the number of head boxes is greater than the number of tail boxes, the sample is a head sample; otherwise, it is a tail sample;
Step S73, new sample candidates: for a sample judged to be a head sample, calculating the mean confidence of the head-category boxes in its recognition result, and if this mean is greater than the head confidence threshold $T_h$, adding the sample to the head new-sample candidate queue $Q_h$; for a sample judged to be a tail sample, calculating the mean confidence of the tail-category boxes, and if this mean is greater than the tail confidence threshold $T_t$, adding the sample to the tail new-sample candidate queue $Q_t$;
Step S74, new sample selection: sorting the head new-sample candidate queue $Q_h$ in descending order of confidence to obtain the sorted queue $Q_h'$, and selecting the leading proportion $P_h$ of samples from $Q_h'$ as head new samples; sorting the tail new-sample candidate queue $Q_t$ in descending order of confidence to obtain the sorted queue $Q_t'$, and selecting the leading proportion $P_t$ of samples from $Q_t'$ as tail new samples; the head new samples and the tail new samples together form the current new samples.
6. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 5, characterized in that: the head confidence threshold $T_h$ satisfies $0.9 \le T_h < 1$.
7. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 5, characterized in that: the tail confidence threshold $T_t$ satisfies $0.9 \le T_t < 1$.
8. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 5, characterized in that: the head ratio $P_h$ is calculated as follows:
[Equation image: formula for $P_h$]
9. The image recognition method for solving the sample imbalance problem of crop diseases and insect pests according to claim 5, characterized in that: the tail ratio $P_t$ is calculated as follows:
[Equation image: formula for $P_t$]
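The sample selection strategy of claim 5 (steps S71 to S74), with the threshold ranges of claims 6 and 7, can be sketched as below. This is an illustrative reading, not the applicant's code. Assumptions: label counts arrive as a mapping from each of the $C$ categories to $N_c$; each recognition result from step S6 is a list of `(confidence, category)` pairs keyed by image name; "selecting the proportion $P_h$ (or $P_t$)" is read as keeping the top-ranked fraction of the sorted queue; and, because the formulas for $P_h$ and $P_t$ (claims 8 and 9) survive only as images, both ratios are passed in as plain parameters. The function names are hypothetical.

```python
def head_tail_split(counts):
    """S71: counts maps each of the C categories to its label count N_c."""
    n_total = sum(counts.values())                 # N_total = sum of N_c
    n_m = n_total / len(counts)                    # N_m = N_total / C
    head = {c for c, n_c in counts.items() if n_c > n_m}
    n_h = sum(counts[c] for c in head)             # N_h; N_t = N_total - N_h
    return head, n_m, n_h, n_total - n_h

def select_new_samples(results, head, p_h, p_t, t_h=0.9, t_t=0.9):
    """S72-S74. results maps image name -> list of (confidence, category)
    boxes from step S6; returns the names chosen as current new samples."""
    if not (0.9 <= t_h < 1 and 0.9 <= t_t < 1):   # ranges from claims 6-7
        raise ValueError("thresholds must satisfy 0.9 <= T < 1")
    q_h, q_t = [], []
    for name, dets in results.items():
        if not dets:                               # nothing detected: skip
            continue
        head_conf = [s for s, c in dets if c in head]
        tail_conf = [s for s, c in dets if c not in head]
        if len(head_conf) > len(tail_conf):        # S72: head sample
            mean = sum(head_conf) / len(head_conf)
            if mean > t_h:                         # S73: head candidate
                q_h.append((mean, name))
        else:                                      # S72: tail sample
            mean = sum(tail_conf) / len(tail_conf)
            if mean > t_t:                         # S73: tail candidate
                q_t.append((mean, name))
    q_h.sort(reverse=True)                         # S74: sorted queue Q_h'
    q_t.sort(reverse=True)                         # S74: sorted queue Q_t'
    chosen = q_h[:int(len(q_h) * p_h)] + q_t[:int(len(q_t) * p_t)]
    return [name for _, name in chosen]
```

For example, with label counts `{0: 50, 1: 30, 2: 10, 3: 10}` the average is $N_m = 25$, so categories 0 and 1 are head categories ($N_h = 80$, $N_t = 20$). An image whose boxes are split evenly between head and tail categories falls into the tail branch, matching the "otherwise" wording of step S72.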
CN202111676323.9A 2021-12-31 2021-12-31 Image recognition method for solving imbalance problem of crop disease and pest samples Active CN114677553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111676323.9A CN114677553B (en) 2021-12-31 2021-12-31 Image recognition method for solving imbalance problem of crop disease and pest samples

Publications (2)

Publication Number Publication Date
CN114677553A (en) 2022-06-28
CN114677553B (en) 2024-05-14

Family

ID=82070802

Country Status (1)

Country Link
CN (1) CN114677553B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188824A (en) * 2019-05-31 2019-08-30 重庆大学 A kind of small sample plant disease recognition methods and system
CN112668490A (en) * 2020-12-30 2021-04-16 浙江托普云农科技股份有限公司 Yolov 4-based pest detection method, system, device and readable storage medium
CN112686152A (en) * 2020-12-30 2021-04-20 广西慧云信息技术有限公司 Crop pest and disease identification method with multi-size input and multi-size targets
CN113298150A (en) * 2021-05-25 2021-08-24 东北林业大学 Small sample plant disease identification method based on transfer learning and self-learning
WO2021203505A1 (en) * 2020-04-09 2021-10-14 丰疆智能软件科技(南京)有限公司 Method for constructing pest detection model
CN113657294A (en) * 2021-08-19 2021-11-16 中化现代农业有限公司 Crop disease and insect pest detection method and system based on computer vision

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523565A (en) * 2023-11-13 2024-02-06 拓元(广州)智慧科技有限公司 Tail class sample labeling method, device, electronic equipment and storage medium
CN117523565B (en) * 2023-11-13 2024-05-17 拓元(广州)智慧科技有限公司 Tail class sample labeling method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114677553B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN110188635B (en) Plant disease and insect pest identification method based on attention mechanism and multi-level convolution characteristics
CN108171266A (en) A kind of learning method of multiple target depth convolution production confrontation network model
CN101763502B (en) High-efficiency method and system for sensitive image detection
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN110084165A (en) The intelligent recognition and method for early warning of anomalous event under the open scene of power domain based on edge calculations
CN106709453A (en) Sports video key posture extraction method based on deep learning
CN113298023B (en) Insect dynamic behavior identification method based on deep learning and image technology
CN109472193A (en) Method for detecting human face and device
CN111753805A (en) Method and device for detecting wearing of safety helmet
CN111709477A (en) Method and tool for garbage classification based on improved MobileNet network
CN114841961A (en) Wheat scab detection method based on image enhancement and improvement of YOLOv5
CN111652297B (en) Fault picture generation method for image detection model training
CN116612386A (en) Pepper disease and pest identification method and system based on hierarchical detection double-task model
CN114693616A (en) Rice disease detection method, equipment and medium based on improved target detection model and convolutional neural network
CN114677553A (en) Image recognition method for solving unbalanced problem of crop disease and insect pest samples
Liu et al. “Is this blueberry ripe?”: a blueberry ripeness detection algorithm for use on picking robots
CN110837818A (en) Chinese white sea rag dorsal fin identification method based on convolutional neural network
CN109886303A (en) A kind of TrAdaboost sample migration aviation image classification method based on particle group optimizing
CN113591610A (en) Crop leaf aphid detection method based on computer vision
CN113344009A (en) Light and small network self-adaptive tomato disease feature extraction method
CN117132802A (en) Method, device and storage medium for identifying field wheat diseases and insect pests
Banerjee et al. Enhancing Snake Plant Disease Classification through CNN-Random Forest Integration
CN116246158A (en) Self-supervision pre-training method suitable for remote sensing target detection task
CN116416423A (en) Bird nest detection method based on deep learning
CN109949323A (en) A kind of crop seed cleanliness judgment method based on deep learning convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant