CN114677553B - Image recognition method for solving imbalance problem of crop disease and pest samples - Google Patents


Info

Publication number
CN114677553B
CN114677553B (application CN202111676323.9A / CN202111676323A; published as CN114677553A, granted as CN114677553B)
Authority
CN
China
Prior art keywords
tail
head
sample
data set
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111676323.9A
Other languages
Chinese (zh)
Other versions
CN114677553A (en)
Inventor
苏家仪
韦光亮
王筱东
朱燕红
莫振东
顾小宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Talentcloud Information Technology Co ltd
Original Assignee
Guangxi Talentcloud Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Talentcloud Information Technology Co ltd filed Critical Guangxi Talentcloud Information Technology Co ltd
Priority to CN202111676323.9A priority Critical patent/CN114677553B/en
Publication of CN114677553A publication Critical patent/CN114677553A/en
Application granted granted Critical
Publication of CN114677553B publication Critical patent/CN114677553B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2155 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G06F 18/2113 - Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/04 - Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of pest and disease identification, and in particular to an image recognition method for solving the problem of imbalanced crop pest and disease samples. The method trains a model on the current annotated data set and selects the current optimal model through model verification; applies image enhancement several times to each picture of the unannotated data set; infers on the enhanced images and screens the results to obtain recognition results for the unannotated images; feeds the recognition results into a sample selection strategy, which decides whether each result is retained; generates a pseudo-label for each retained result and moves it into the current annotated data set; and continues training on the new annotated data set, iterating this process until the accuracy no longer improves. The invention reduces the influence of the long-tail distribution and, through iterative learning, improves the recall and precision of the tail categories without affecting the recognition of the head categories; because only a single model is used for inference and no additional network layers are introduced, the inference speed is unaffected.

Description

Image recognition method for solving imbalance problem of crop disease and pest samples
Technical Field
The invention relates to the field of pest and disease damage identification, in particular to an image identification method for solving the problem of unbalanced crop pest and disease damage samples.
Background
Crop pests and diseases are among the main agricultural disasters worldwide. If they are not discovered, prevented and controlled promptly, they can cause great losses to agricultural production and threaten national grain security and the quality and safety of agricultural products. Crop pests and diseases are numerous in type, wide in impact and prone to frequent outbreaks, and these characteristics pose great challenges to crop pest and disease monitoring.
With the rapid development of computer vision and artificial intelligence, image-based pest and disease identification has been applied to the monitoring of various crops at low cost and high efficiency. At present, deep learning algorithms are generally used for model training and inference. Deep learning relies on massive data to reach its best recognition performance, but crop pest and disease image data are imbalanced: common categories have very large amounts of data while rare categories have very little, so the data follow a long-tail distribution in which the head has abundant data, the middle gradually shrinks, and the tail has very few or even no samples. Because there are many crop pest and disease categories, the tail is long.
Sample imbalance has a great influence on the performance of a crop pest and disease model: the model easily overfits the head categories, which have more data, and underfits the tail categories, which have less. There are many general methods for addressing sample imbalance. Resampling algorithms undersample the head classes and oversample the tail classes to balance the training samples, but this causes the model to underfit the head classes and overfit the tail classes. Weighting algorithms give low weights to head categories and high weights to tail categories, but their effect is limited. A crop disease long-tail image recognition method based on multi-stage training adjusts the sample distribution of the annotated data through multi-stage enhanced training, but it does not make full use of massive unannotated data, so the tail-category data remain insufficiently rich.
Disclosure of Invention
Aiming at the defects existing in the background technology, the invention provides an image recognition method for solving the problem of unbalanced crop pest samples, which comprises the following specific technical scheme:
an image recognition method for solving the problem of unbalance of crop pest samples comprises the following steps:
Step S1, making the annotated data set: collect crop pest and disease image data, mark the positions of pests and diseases with rectangular boxes, and form the annotated data set; divide the annotated data set into a training set, a validation set and a test set in a certain proportion;
Step S2, model training: construct a target detection model, train it on the training set of the data set of step S1, and output an intermediate target detection model after each round of training;
Step S3, model verification: input the validation-set images of step S1 into the intermediate models trained in step S2 for model verification, and select the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
Step S4, making the unannotated data set: collect massive crop pest and disease image data as the unannotated data set;
Step S5, image enhancement: apply data enhancement to each original picture of the unannotated data set of step S4 to obtain N enhanced pictures, and combine the N enhanced pictures with the corresponding original picture into N + 1 pictures as one group of data to be processed;
Step S6, unannotated-data model inference: input each group of data to be processed from step S5 into the current optimal target detection model of step S3 for inference to obtain N + 1 recognition results, post-process each recognition result, superimpose the post-processed results, and screen the superimposed results with a non-maximum suppression algorithm to obtain the final recognition result of the unannotated data;
Step S7, sample selection: judge the recognition result of the unannotated data from step S6 according to a sample selection strategy and decide whether to retain it; if it is retained, select the original picture corresponding to the recognition result from the unannotated data set of step S4 as a new sample;
Step S8, new data generation: generate a pseudo-label (not manually annotated) for the new sample of step S7 in the rectangular-box annotation format of the annotated data set of step S1, take the pseudo-label and the corresponding original picture of the unannotated data set of step S4 as new data, distribute all new data into the training, validation and test sets of the annotated data set of step S1 in a certain proportion, and remove the corresponding original picture from the unannotated data set of step S4;
Step S9, after the data newly generated in step S8 are added to the annotated data set of step S1, continue iterative learning according to the flow of steps S1-S8; when the accuracy of the optimal target detection model of step S3 no longer improves, end the iterative learning to obtain the final target detection model;
Step S10, annotated-data model inference: input the test set of the annotated data set of step S1 into the final target detection model obtained in step S9 for model inference to obtain the recognition result of the test set after iterative-learning optimization.
Preferably, in step S1, the annotated data set is divided into a training set, a validation set and a test set in the proportion 0.8:0.1:0.1.
Preferably, the target detection model in step S2 is a YOLOv5l6 network structure model adopting the YOLOv5 object detection algorithm.
Preferably, the data enhancement in step S5 includes 4 ways: random horizontal flip, random vertical flip, random rotation and random brightness increase, so N = 4.
Preferably, the sample selection policy in step S7 includes the steps of:
Step S71, head and tail division: count the annotations of the training set of the annotated data set of step S1. The annotated data set has C pest and disease categories; the annotation count N_c of each category c, c ∈ {1, 2, …, C}, is computed; the total annotation count is N_total; and the average annotation count N_m is:
N_m = N_total / C
Categories whose annotation count is greater than N_m are classified as head categories, and categories whose annotation count is less than or equal to N_m are classified as tail categories. The total head-category annotation count N_h and the total tail-category annotation count N_t are counted, where:
N_h + N_t = N_total
Step S72, head and tail judgment: classify the category of each rectangular box in the recognition result of the unannotated data from step S6 as head or tail according to the division of step S71, obtaining a head count and a tail count; if the head count is greater than the tail count, the sample is a head sample; otherwise it is a tail sample;
Step S73, new sample candidates: for a sample judged to be a head sample, compute the mean confidence of the head categories in its recognition result; if this mean is greater than the head confidence threshold T_h, add the sample to the head new-sample candidate queue Q_h. For a sample judged to be a tail sample, compute the mean confidence of the tail categories; if it is greater than the tail confidence threshold T_t, add the sample to the tail new-sample candidate queue Q_t;
Step S74, new sample selection: sort the head new-sample candidate queue Q_h in descending order of confidence to obtain the sorted queue Q_h', and select the top fraction P_h of Q_h' as the new head samples; sort the tail new-sample candidate queue Q_t in descending order of confidence to obtain the sorted queue Q_t', and select the top fraction P_t of Q_t' as the new tail samples; combine the new head samples and the new tail samples into the current new samples.
Preferably, the head confidence threshold T_h satisfies 0.9 ≤ T_h < 1.
Preferably, the tail confidence threshold T_t satisfies 0.9 ≤ T_t < 1.
Preferably, the head fraction P_h is calculated as P_h = N_t / N_total.
Preferably, the tail fraction P_t is calculated as P_t = N_h / N_total.
The beneficial effects of the invention are as follows: the invention provides an image recognition method for solving the problem of imbalanced crop pest and disease samples. The method trains a model on the current annotated data set and selects the current optimal model through model verification; applies image enhancement several times to each picture of the unannotated data set; infers on the enhanced images and screens the superimposed results with a non-maximum suppression algorithm to obtain recognition results for the unannotated images; feeds the recognition results into a sample selection strategy, which decides whether each result is retained; generates a pseudo-label for each retained result and moves it into the current annotated data set; and continues training on the new annotated data set, iterating this process until the accuracy no longer improves. The invention makes full use of massive unannotated crop pest and disease data for semi-supervised learning, designs a sample selection strategy for the sample imbalance problem, continuously adjusts the data distribution, and reduces the influence of the long-tail distribution. Iterative learning improves the recall and precision of the tail categories without affecting the recognition of the head categories, and because only a single model is used for inference and no additional network layers are introduced, the inference speed is unaffected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As shown in fig. 1, the embodiment of the invention provides an image recognition method for solving the problem of imbalance of crop pest samples, which comprises the following steps:
Step S1, making the annotated data set: collect crop pest and disease image data, mark the positions of pests and diseases with rectangular boxes, and form the annotated data set; divide the annotated data set into a training set, a validation set and a test set in the proportion 0.8:0.1:0.1;
Step S2, model training: construct a target detection model, train it on the training set of the data set of step S1, and output an intermediate target detection model after each round of training. The target detection model is a YOLOv5l6 network structure model adopting the YOLOv5 object detection algorithm.
Step S3, model verification: input the validation-set images of step S1 into the intermediate models trained in step S2 for model verification, and select the intermediate target detection model with the highest recognition accuracy as the current optimal target detection model;
Step S4, making the unannotated data set: collect massive crop pest and disease image data as the unannotated data set;
Step S5, image enhancement: apply data enhancement to each original picture of the unannotated data set of step S4 to obtain N enhanced pictures, and combine the N enhanced pictures with the corresponding original picture into N + 1 pictures as one group of data to be processed. Data enhancement includes 4 ways: random horizontal flip, random vertical flip, random rotation and random brightness increase, so N = 4.
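The enhancement step can be sketched as follows. This is a minimal illustration that treats pictures as NumPy arrays; the fixed 90-degree rotation and brightness factor merely stand in for the random parameters, which the method leaves unspecified.

```python
import numpy as np

def make_tta_views(img, brightness=1.2):
    """Build the N = 4 enhanced views of step S5 plus the original.

    `img` is an H x W (or H x W x C) array. A fixed 90-degree rotation
    and a fixed brightness factor stand in for the random parameters.
    """
    views = {
        "original": img,
        "hflip": img[:, ::-1],                       # random horizontal flip
        "vflip": img[::-1, :],                       # random vertical flip
        "rot90": np.rot90(img),                      # random rotation
        "bright": np.clip(img * brightness, 0, 255), # random brightness increase
    }
    return views  # one group of N + 1 = 5 pictures to be processed
```

Each group of N + 1 = 5 views is then fed to the detector in step S6.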
Step S6, unannotated-data model inference: input each group of data to be processed from step S5 into the current optimal target detection model of step S3 for inference to obtain N + 1 recognition results, then post-process each recognition result. Post-processing restores the result of a randomly horizontally flipped picture according to the horizontal-flip parameters, restores the result of a randomly vertically flipped picture according to the vertical-flip parameters, and restores the result of a randomly rotated picture according to the rotation parameters. The post-processed recognition results are superimposed, the superimposed results are screened with a non-maximum suppression algorithm, and the final recognition result of the unannotated data is obtained.
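A sketch of the post-processing and merging just described: a box predicted on a horizontally flipped view is mapped back to the original coordinate frame, the pooled boxes are screened by greedy non-maximum suppression, and the survivors form the recognition result. The (x1, y1, x2, y2, score) box format and the IoU threshold are illustrative choices, not fixed by the patent.

```python
def unflip_h(box, img_w):
    """Map an (x1, y1, x2, y2, score) box predicted on a horizontally
    flipped picture back to the original coordinate frame."""
    x1, y1, x2, y2, score = box
    return (img_w - x2, y1, img_w - x1, y2, score)

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, thr=0.5):
    """Greedy non-maximum suppression over the pooled TTA detections:
    keep the highest-scoring box among any set of overlapping boxes."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= thr for k in kept):
            kept.append(box)
    return kept
```

Vertical flips and rotations are inverted analogously before the pooled boxes are passed to `nms`.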
Step S7, sample selection: judge the recognition result of the unannotated data from step S6 according to a sample selection strategy and decide whether to retain it; if it is retained, select the original picture corresponding to the recognition result from the unannotated data set of step S4 as a new sample. The sample selection strategy comprises the following steps:
Step S71, head and tail division: count the annotations of the training set of the annotated data set of step S1. The annotated data set has C pest and disease categories; the annotation count N_c of each category c, c ∈ {1, 2, …, C}, is computed; the total annotation count is N_total; and the average annotation count N_m is:
N_m = N_total / C
Categories whose annotation count is greater than N_m are classified as head categories, and categories whose annotation count is less than or equal to N_m are classified as tail categories. The total head-category annotation count N_h and the total tail-category annotation count N_t are counted, where:
N_h + N_t = N_total
Suppose the training set of the annotated data set has 100 pest and disease categories, i.e. C = 100. The 1st category is ulcer disease with 20000 annotations, so N_1 = 20000; the 2nd category is yellow dragon disease with 20 annotations, so N_2 = 20. Counting the total annotation count over all categories gives N_total = 100000, so the average annotation count is:
N_m = N_total / C = 100000 / 100 = 1000
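Step S71 reduces to one pass over the per-category annotation counts. The sketch below reproduces the head/tail split rule with hypothetical category names:

```python
def split_head_tail(label_counts):
    """Step S71: categories with more annotations than the mean
    N_m = N_total / C are head categories; the rest are tail categories.

    `label_counts` maps category name -> annotation count.
    Returns (head set, tail set, N_h, N_t)."""
    n_total = sum(label_counts.values())
    n_m = n_total / len(label_counts)                 # average annotation count
    head = {c for c, n in label_counts.items() if n > n_m}
    tail = set(label_counts) - head
    n_h = sum(label_counts[c] for c in head)          # head annotation total
    n_t = n_total - n_h                               # tail annotation total
    return head, tail, n_h, n_t
```

With the patent's numbers (C = 100, N_total = 100000), N_m evaluates to 1000, exactly as in the worked example.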
Step S72, head and tail judgment: classify the category of each rectangular box in the recognition result of the unannotated data from step S6 as head or tail, obtaining a head count and a tail count; if the head count is greater than the tail count, the sample is a head sample; otherwise it is a tail sample.
Head and tail division is carried out for the 100 pest and disease categories: the 20000 annotations of ulcer disease exceed the average annotation count of 1000, so it is a head category; the 20 annotations of yellow dragon disease are below the average of 1000, so it is a tail category. Suppose 20 categories are head categories and 80 are tail categories. Counting the annotations of the 20 head categories gives N_h = 95000, and counting the annotations of the 80 tail categories gives N_t = 5000; then N_h + N_t = 95000 + 5000 = 100000 = N_total, where 100000 is the total annotation count of all categories.
Suppose the unannotated data set has 200000 image samples, and head/tail judgment is performed on each sample in turn. The recognition result of the 1st sample contains 2 detection boxes, both ulcer disease; the head count is 2 and the tail count is 0, and since the head count is greater than the tail count, the 1st sample is judged to be a head sample. The recognition result of the 2nd sample contains 3 detection boxes, 1 ulcer disease and 2 yellow dragon disease; classified by the head/tail division of step S71, the head count is 1 and the tail count is 2, and since the head count is smaller than the tail count, the 2nd sample is judged to be a tail sample.
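The per-sample judgment of step S72 is a simple majority vote over the detected classes, as in this sketch (category names are the hypothetical ones from the example):

```python
def judge_sample(pred_classes, head_classes):
    """Step S72: a sample is a head sample when its detection boxes
    contain more head-category labels than tail-category labels."""
    n_head = sum(1 for c in pred_classes if c in head_classes)
    n_tail = len(pred_classes) - n_head
    return "head" if n_head > n_tail else "tail"
```

The tie-breaking choice (ties go to the tail side) follows the text, which sends a sample to the tail unless the head count is strictly greater.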
Step S73, new sample candidates: for a sample judged to be a head sample, sum the confidences of the head-category labels in its recognition result and divide by their number to obtain the mean head-category confidence; if this mean is greater than the head confidence threshold T_h, add the sample to the head new-sample candidate queue Q_h. For a sample judged to be a tail sample, sum the confidences of the tail-category labels in its recognition result and divide by their number to obtain the mean tail-category confidence; if this mean is greater than the tail confidence threshold T_t, add the sample to the tail new-sample candidate queue Q_t. The head confidence threshold satisfies 0.9 ≤ T_h < 1, and the tail confidence threshold satisfies 0.9 ≤ T_t < 1.
For the samples judged to be head samples in step S72, take the 1st sample: its 2 ulcer disease detections have confidences 0.95 and 0.91, so the mean confidence is (0.95 + 0.91) / 2 = 0.93. With a head confidence threshold T_h = 0.90, since 0.93 > 0.90 the 1st sample is added to the head new-sample candidate queue, giving Q_h = {1}, and the other head samples are judged in the same way. For the samples judged to be tail samples in step S72, take the 2nd sample: its ulcer disease detection has confidence 0.92 and its 2 yellow dragon disease detections have confidences 0.91 and 0.98, so the mean confidence is (0.92 + 0.91 + 0.98) / 3 ≈ 0.937. With a tail confidence threshold T_t = 0.92, since 0.937 > 0.92 the 2nd sample is added to the tail new-sample candidate queue, giving Q_t = {2}, and the other tail samples are judged in the same way.
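Step S73 can be sketched as below. Following the worked example, the mean is taken over all detection confidences of a sample; the default thresholds T_h = 0.90 and T_t = 0.92 are the example's values, not mandated ones.

```python
def build_candidate_queues(samples, head_classes, t_h=0.90, t_t=0.92):
    """Step S73: average the detection confidences of each sample and
    enqueue it as a head or tail new-sample candidate when the mean
    clears the corresponding threshold (0.9 <= T < 1 per the patent)."""
    q_h, q_t = [], []
    for sid, dets in samples.items():          # dets: list of (class, confidence)
        n_head = sum(1 for c, _ in dets if c in head_classes)
        mean_conf = sum(p for _, p in dets) / len(dets)
        if n_head > len(dets) - n_head:        # head sample (step S72)
            if mean_conf > t_h:
                q_h.append((sid, mean_conf))
        elif mean_conf > t_t:                  # tail sample
            q_t.append((sid, mean_conf))
    return q_h, q_t
```

On the example data (sample 1 with means 0.93, sample 2 with mean about 0.937) this reproduces Q_h = {1} and Q_t = {2}.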
Step S74, new sample selection: sort the head new-sample candidate queue Q_h in descending order of confidence to obtain the sorted queue Q_h', and select the top fraction P_h of Q_h' as the new head samples; sort the tail new-sample candidate queue Q_t in descending order of confidence to obtain the sorted queue Q_t', and select the top fraction P_t of Q_t' as the new tail samples; combine the new head samples and the new tail samples into the current new samples. The head fraction is calculated as P_h = N_t / N_total, and the tail fraction as P_t = N_h / N_total.
For the head new-sample candidate queue Q_h = {1, 3, 4, …} with mean confidences {0.93, 0.90, 0.92, …}, sorting Q_h in descending order of confidence gives Q_h' = {1, 4, 3, …}, and the top fraction P_h = N_t / N_total = 5000 / 100000 = 0.05 of Q_h' is selected as the new head samples. For the tail new-sample candidate queue Q_t = {2, 5, 6, …} with mean confidences {0.937, 0.92, 0.93, …}, sorting Q_t in descending order of confidence gives Q_t' = {2, 6, 5, …}, and the top fraction P_t = N_h / N_total = 95000 / 100000 = 0.95 of Q_t' is selected as the new tail samples. The new head samples and the new tail samples are combined into the current new samples. The proportion of new tail data is far greater than that of head data, which enriches the tail-category data while ensuring that the head-category count grows only slowly.
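A sketch of step S74 under one consistent reading of the elided ratio formulas: P_h = N_t / N_total and P_t = N_h / N_total are an assumption that reproduces the described behavior (a small head fraction, 0.05 in the example, versus a large tail fraction, 0.95), and the use of `math.ceil` for the cut-off is likewise an illustrative choice.

```python
import math

def select_new_samples(q_h, q_t, n_h, n_t):
    """Step S74: sort each candidate queue by mean confidence
    (descending) and keep the top P_h fraction of head candidates and
    the top P_t fraction of tail candidates, so tail categories are
    enriched much faster than head categories.

    Queue entries are (sample_id, mean_confidence) pairs."""
    n_total = n_h + n_t
    p_h, p_t = n_t / n_total, n_h / n_total          # assumed ratio formulas
    q_h_sorted = sorted(q_h, key=lambda s: s[1], reverse=True)
    q_t_sorted = sorted(q_t, key=lambda s: s[1], reverse=True)
    new_head = q_h_sorted[:math.ceil(len(q_h_sorted) * p_h)]
    new_tail = q_t_sorted[:math.ceil(len(q_t_sorted) * p_t)]
    return new_head + new_tail                       # current new samples
```

With N_h = 95000 and N_t = 5000, only 5% of head candidates but 95% of tail candidates survive the cut.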
Step S8, new data generation: generating a non-manually marked pseudo tag for the new sample in the step S7 according to a rectangular frame marking mode of the marked data set in the step S1, taking the pseudo tag and an original picture corresponding to the unmarked data set in the step S4 as new data, putting all the new data into a training set, a verification set and a test set of the marked data set in the step S1 according to a certain proportion, and removing the original picture corresponding to the unmarked data set in the step S4;
step S9, after the data newly generated in the step S8 are added in the marked data set in the step S1, iteration learning is continuously carried out according to the flow of the steps S1-S8, and if the accuracy of the optimal target detection model in the step S3 is not improved any more, iteration learning is ended, and a final target detection model is obtained;
Step S10, labeling data model reasoning: and (3) inputting the test set with the marked data set in the step (S1) into the final target detection model obtained in the step (S9) for model reasoning to obtain an identification result of the test set after iterative learning optimization.
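Putting the pieces together, the iterative loop of steps S1-S9 can be summarized as follows. All function names here are placeholders for the components described above: `train`, `validate`, `infer_tta` and `select_samples` stand in for steps S2-S3 and S5-S7.

```python
def iterative_learning(labeled, unlabeled, train, validate, infer_tta, select_samples):
    """Steps S1-S9: keep pseudo-labeling and retraining until the
    validation accuracy of the best model stops improving."""
    best_model, best_acc = None, 0.0
    while True:
        model = train(labeled)                      # step S2: train on current annotated set
        acc = validate(model, labeled)              # step S3: pick best by validation accuracy
        if acc <= best_acc:                         # step S9: accuracy no longer improves
            break
        best_model, best_acc = model, acc
        results = infer_tta(model, unlabeled)       # steps S5-S6: TTA inference + NMS
        new_samples = select_samples(results)       # step S7: sample selection strategy
        for sample in new_samples:                  # step S8: pseudo-label and move
            labeled.append(sample)
            unlabeled.remove(sample)
    return best_model
```

The final model returned here is then evaluated on the held-out test set, as in step S10.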
Those of ordinary skill in the art will appreciate that the elements of the various embodiments described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or combinations of both, and in order to clearly illustrate the interchangeability of hardware and software, the components of the various embodiments have been described above generally in terms of functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, it should be understood that the division of the units is merely a logic function division, and there may be other division manners in actual implementation, for example, multiple units may be combined into one unit, one unit may be split into multiple units, or some features may be omitted.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims (8)

1. An image recognition method for solving the imbalance problem of crop disease and pest samples, characterized by comprising the following steps:
Step S1, building the labeled data set: collecting crop disease and pest image data and marking the locations of diseases and pests with rectangular boxes to form a labeled data set; dividing the labeled data set into a training set, a validation set, and a test set according to a certain proportion;
Step S2, model training: constructing an object detection model, training it on the training set of the labeled data set from step S1, and outputting an intermediate object detection model after each training round;
Step S3, model validation: inputting the validation-set images from step S1 into the intermediate models trained in step S2 for validation, and selecting the intermediate object detection model with the highest recognition accuracy as the current optimal object detection model;
Step S4, building the unlabeled data set: collecting a large volume of crop disease and pest image data as an unlabeled data set;
Step S5, image enhancement: applying data augmentation to each original picture in the unlabeled data set from step S4 to obtain N augmented pictures, and combining the N augmented pictures with the corresponding original picture to obtain N+1 pictures as one group of data to be processed;
Step S6, inference on unlabeled data: inputting each group of data to be processed from step S5 into the current optimal object detection model from step S3 for inference, obtaining N+1 recognition results; post-processing each recognition result, overlaying the post-processed results, and screening the overlaid results with a non-maximum suppression algorithm to obtain the final recognition result for the unlabeled data;
Step S7, sample selection: judging the recognition results of the unlabeled data from step S6 according to a sample selection strategy to decide whether each result is retained; if a result is retained, the corresponding original picture is selected from the unlabeled data set of step S4 as a new sample; the sample selection strategy in step S7 comprises the following steps:
Step S71, head/tail division: counting annotations over the training set of the labeled data set from step S1, where the labeled data set contains C disease and pest classes; for each class c, c ∈ {1, 2, …, C}, the annotation count is N_c, the total annotation count is N_total, and the average annotation count is N_m = N_total / C; then:
classes whose annotation count is greater than N_m are classified as head classes, and classes whose annotation count is less than or equal to N_m are classified as tail classes; counting the total number of head-class annotations N_h and the total number of tail-class annotations N_t, where:
N_h + N_t = N_total;
Step S72, head/tail judgment: classifying the class of each rectangular box in the unlabeled-data recognition result from step S6 as head or tail, and counting the numbers of head and tail boxes respectively; if the number of head boxes is greater than the number of tail boxes, the sample is a head sample, otherwise it is a tail sample;
Step S73, new sample candidates: for a sample judged to be a head sample, calculating the average confidence of the head classes in its recognition result; if this average is greater than the head confidence threshold T_h, the sample is added to the head new-sample candidate queue Q_h; for a sample judged to be a tail sample, calculating the average confidence of the tail classes; if this average is greater than the tail confidence threshold T_t, the sample is added to the tail new-sample candidate queue Q_t;
Step S74, new sample selection: sorting the head new-sample candidate queue Q_h in descending order of confidence to obtain the sorted queue Q_h', and selecting the top fraction P_h of Q_h' as head new samples; sorting the tail new-sample candidate queue Q_t in descending order of confidence to obtain the sorted queue Q_t', and selecting the top fraction P_t of Q_t' as tail new samples; combining the head new samples and the tail new samples into the current batch of new samples;
Step S8, new data generation: generating non-manually-annotated pseudo-labels for the new samples from step S7 in the rectangular-box annotation format of the labeled data set from step S1; taking each pseudo-label together with the corresponding original picture from the unlabeled data set of step S4 as new data; distributing all new data into the training, validation, and test sets of the labeled data set from step S1 according to a certain proportion, and removing the corresponding original pictures from the unlabeled data set of step S4;
Step S9, after the newly generated data from step S8 has been added to the labeled data set of step S1, iterative learning continues according to the flow of steps S1-S8; when the accuracy of the optimal object detection model in step S3 no longer improves, iterative learning ends and the final object detection model is obtained;
Step S10, inference on labeled data: inputting the test set of the labeled data set from step S1 into the final object detection model obtained in step S9 for inference, obtaining the recognition results of the test set after iterative-learning optimization.
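The sample selection strategy of steps S71-S74 can be sketched as follows. Detection results are assumed to be lists of (class_id, confidence) pairs per sample; the threshold defaults sit inside the ranges of claims 5-6, while the head/tail fractions `p_h` and `p_t` are placeholders, since claims 7-8 define their actual formulas:

```python
def split_head_tail(counts_per_class):
    """S71: classes with more annotations than the mean N_m are head, the rest tail."""
    n_m = sum(counts_per_class.values()) / len(counts_per_class)
    head = {c for c, n in counts_per_class.items() if n > n_m}
    tail = set(counts_per_class) - head
    return head, tail

def select_new_samples(results, head, t_h=0.95, t_t=0.9, p_h=0.2, p_t=0.8):
    """S72-S74: route each sample to a head/tail queue, then keep the top fraction."""
    q_h, q_t = [], []
    for sample_id, dets in results.items():
        head_confs = [conf for cls, conf in dets if cls in head]
        tail_confs = [conf for cls, conf in dets if cls not in head]
        if len(head_confs) > len(tail_confs):        # S72: head sample
            avg = sum(head_confs) / len(head_confs)
            if avg > t_h:                            # S73: confident head candidate
                q_h.append((avg, sample_id))
        elif tail_confs:                             # S72: otherwise tail sample
            avg = sum(tail_confs) / len(tail_confs)
            if avg > t_t:                            # S73: confident tail candidate
                q_t.append((avg, sample_id))
    q_h.sort(reverse=True)                           # S74: descending confidence
    q_t.sort(reverse=True)
    picked = q_h[:int(len(q_h) * p_h)] + q_t[:int(len(q_t) * p_t)]
    return [sid for _, sid in picked]
```

Keeping a larger fraction of tail candidates than head candidates is what counteracts the class imbalance: confident tail detections are rarer, so a higher `p_t` pulls proportionally more tail pseudo-labels into the next training round.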
2. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: in step S1 the labeled data set is divided into a training set, a validation set, and a test set at a ratio of 0.8:0.1:0.1.
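The 0.8:0.1:0.1 division of claim 2 can be sketched as a shuffled split; the shuffling, the fixed seed, and the rounding of partition sizes are assumptions, as the claim only fixes the ratio:

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle the samples and divide them into train/val/test by the given ratios."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to the test set
    return train, val, test
```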
3. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: the object detection model in step S2 is a YOLOv5l6 network structure model based on the YOLOv5 object detection algorithm.
4. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: the data augmentation in step S5 includes 4 methods: random horizontal flip, random vertical flip, random rotation, and random brightness increase, so that N = 4.
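The N = 4 augmentation group of claim 4 can be sketched on a grayscale image stored as a list of pixel rows. The rotation is simplified here to a random multiple of 90° and the brightness offset range is an assumption, since the claim fixes neither:

```python
import random

def augment_group(img, seed=0):
    """Return the original picture plus its 4 augmented variants (N + 1 = 5)."""
    rng = random.Random(seed)

    def hflip(im):                       # horizontal flip: reverse each row
        return [row[::-1] for row in im]

    def vflip(im):                       # vertical flip: reverse the row order
        return im[::-1]

    def rot90(im, k):                    # rotation, simplified to k * 90 degrees
        for _ in range(k % 4):
            im = [list(row) for row in zip(*im[::-1])]
        return im

    def brighten(im, delta):             # brightness increase, clipped to 255
        return [[min(255, p + delta) for p in row] for row in im]

    return [
        img,
        hflip(img),
        vflip(img),
        rot90(img, rng.randint(1, 3)),
        brighten(img, rng.randint(10, 60)),
    ]
```

Each group of five pictures is what step S6 pushes through the current optimal model, so a real spot on the plant must survive the NMS merge across all five views to become a pseudo-label.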
5. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: the head confidence threshold T_h has a value range of 0.9 ≤ T_h &lt; 1.
6. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: the tail confidence threshold T_t has a value range of 0.9 ≤ T_t &lt; 1.
7. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: the head proportion P_h is calculated by
8. The image recognition method for solving the imbalance problem of crop disease and pest samples according to claim 1, wherein: the tail proportion P_t is calculated by
CN202111676323.9A 2021-12-31 2021-12-31 Image recognition method for solving imbalance problem of crop disease and pest samples Active CN114677553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111676323.9A CN114677553B (en) 2021-12-31 2021-12-31 Image recognition method for solving imbalance problem of crop disease and pest samples

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111676323.9A CN114677553B (en) 2021-12-31 2021-12-31 Image recognition method for solving imbalance problem of crop disease and pest samples

Publications (2)

Publication Number Publication Date
CN114677553A CN114677553A (en) 2022-06-28
CN114677553B true CN114677553B (en) 2024-05-14

Family

ID=82070802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111676323.9A Active CN114677553B (en) 2021-12-31 2021-12-31 Image recognition method for solving imbalance problem of crop disease and pest samples

Country Status (1)

Country Link
CN (1) CN114677553B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117523565B (en) * 2023-11-13 2024-05-17 拓元(广州)智慧科技有限公司 Tail class sample labeling method, device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188824A (en) * 2019-05-31 2019-08-30 重庆大学 A kind of small sample plant disease recognition methods and system
CN112668490A (en) * 2020-12-30 2021-04-16 浙江托普云农科技股份有限公司 Yolov 4-based pest detection method, system, device and readable storage medium
CN112686152A (en) * 2020-12-30 2021-04-20 广西慧云信息技术有限公司 Crop pest and disease identification method with multi-size input and multi-size targets
CN113298150A (en) * 2021-05-25 2021-08-24 东北林业大学 Small sample plant disease identification method based on transfer learning and self-learning
WO2021203505A1 (en) * 2020-04-09 2021-10-14 丰疆智能软件科技(南京)有限公司 Method for constructing pest detection model
CN113657294A (en) * 2021-08-19 2021-11-16 中化现代农业有限公司 Crop disease and insect pest detection method and system based on computer vision


Also Published As

Publication number Publication date
CN114677553A (en) 2022-06-28

Similar Documents

Publication Publication Date Title
CN110148120B (en) Intelligent disease identification method and system based on CNN and transfer learning
CN110046631B (en) System and method for automatically inferring changes in spatiotemporal images
CN109977943A (en) A kind of images steganalysis method, system and storage medium based on YOLO
CN109086799A (en) A kind of crop leaf disease recognition method based on improvement convolutional neural networks model AlexNet
CN110598598A (en) Double-current convolution neural network human behavior identification method based on finite sample set
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN114615093B (en) Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN110991362A (en) Pedestrian detection model based on attention mechanism
Rahman et al. Recognition of local birds of Bangladesh using MobileNet and Inception-v3
CN110097090A (en) A kind of image fine granularity recognition methods based on multi-scale feature fusion
CN108734717B (en) Single-frame star map background dark and weak target extraction method based on deep learning
CN110765865A (en) Underwater target detection method based on improved YOLO algorithm
CN114677553B (en) Image recognition method for solving imbalance problem of crop disease and pest samples
CN114627467B (en) Rice growth period identification method and system based on improved neural network
CN112115849A (en) Video scene identification method based on multi-granularity video information and attention mechanism
CN111160389A (en) Lithology identification method based on fusion of VGG
CN114627411A (en) Crop growth period identification method based on parallel detection under computer vision
CN112633257A (en) Potato disease identification method based on improved convolutional neural network
CN111340019A (en) Grain bin pest detection method based on Faster R-CNN
CN113221913A (en) Agriculture and forestry disease and pest fine-grained identification method and device based on Gaussian probability decision-level fusion
CN110766082A (en) Plant leaf disease and insect pest degree classification method based on transfer learning
CN113344009A (en) Light and small network self-adaptive tomato disease feature extraction method
Liu et al. “Is this blueberry ripe?”: a blueberry ripeness detection algorithm for use on picking robots
CN113591610A (en) Crop leaf aphid detection method based on computer vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant