WO2023077821A1 - Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image - Google Patents


Info

Publication number
WO2023077821A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
data
target detection
model
quality image
Application number
PCT/CN2022/099827
Other languages
French (fr)
Chinese (zh)
Inventor
王鹏
邓玉岩
林蔚东
Original Assignee
西北工业大学
Filing date
Publication date
Application filed by 西北工业大学 (Northwestern Polytechnical University)
Publication of WO2023077821A1 publication Critical patent/WO2023077821A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

A multi-resolution ensemble self-training-based target detection method for small-sample low-quality images, in the technical field of image processing. First, a target detection model is preliminarily trained on labeled data; the trained model then predicts labels for unlabeled data; the newly pseudo-labeled data are added to the original data and the detection model is retrained, and this process is iterated to obtain the final detection model. In addition, a multi-resolution ensemble self-training mode is used each time the latest model predicts labels for the unlabeled data. The method effectively combines labeled and unlabeled low-quality image data and improves the precision of target detection on small-sample low-quality images.

Description

Small-sample low-quality image target detection method based on multi-resolution ensemble self-training

Technical field
The invention belongs to the technical field of image processing, and in particular relates to a small-sample low-quality image target detection method based on multi-resolution ensemble self-training.
Background art
With the advancement of science and technology and the rapid development of digital information technology, digital devices are not only widely used across industries but have also become an indispensable part of daily life. Since the middle of the last century, computer image target detection technology has developed vigorously and is widely applied in aerospace, terrain exploration, traffic monitoring, and other fields. In the era of rapidly developing information technology, and especially with the popularization of electronic products such as digital cameras and hand-held cameras, target detection technology is used in many aspects of everyday life.
In practical applications, imaging conditions vary widely, and the acquired images are often of poor quality. For example, images captured in extreme weather such as rain, snow, or fog suffer from reduced contrast, blurred details, and severe quality degradation, which greatly limits subsequent computer-vision applications, especially outdoor navigation, traffic monitoring, and target recognition. Video images on the Internet often lose clarity and information through frequent copying, transmission, and format conversion; slight movement of the capture device causes image shake and blur; and images collected at night, affected by insufficient and single-source lighting and by the capture device itself, generally exhibit low contrast, high noise, and color distortion, greatly reducing their usability. In summary, low-quality images are unclear images caused by poor imaging conditions in the captured scene or by unstable capture equipment, and studying target detection under degraded image quality is of great significance both for computer-vision research and for practical applications.
On the other hand, compared with target detection in conventional scenes, acquiring low-quality scene images in extreme weather is often more difficult, and manually annotating data is costly. It is therefore necessary to study a target detection method that works with very little annotated low-quality image data.
Summary of the invention
Technical problem to be solved
To avoid the shortcomings of the prior art, the present invention proposes a small-sample low-quality image target detection method based on multi-resolution ensemble self-training. Using the proposed detection framework, the data can be fully exploited for low-quality image target detection tasks under limited annotated data and computing resources; the framework can be adapted to different target detection models and balances efficiency against detection accuracy.
Technical solution
A small-sample low-quality image target detection method based on multi-resolution ensemble self-training, characterized by the following steps:
Step 1: Assume the input image data consist of labeled data pairs (X₁, Y₁) and unlabeled data X₂. First use the labeled data to perform preliminary training of the target detection model Faster-R-CNN, with the optimization objective:
MIN Loss(Y₁, F₁(X₁))
After obtaining the first trained model F₁, use it to predict labels for the unlabeled data, i.e.:
Y₂ = F₁(X₂)
Step 2: Add (X₂, Y₂) to the original data as labeled data to obtain the augmented data set D₁ = (X₁, Y₁, X₂, Y₂). Retrain the target detection model Faster-R-CNN on D₁ to obtain F₂, then use F₂ to predict better labels Y₃ for X₂.
Step 3: Continuing in this way, the final detection model Fₙ is obtained through repeated iterative updating.
Step 4: Use the final detection model Fₙ to detect targets in the image data to be detected, obtaining the final image targets.
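The iterative procedure of steps 1 to 4 can be sketched as follows; `train` and `predict` are hypothetical stand-ins for Faster-R-CNN training and inference, since the patent does not fix a programming interface:

```python
def self_train(labeled, unlabeled, train, predict, iterations=6):
    """Steps 1-4: train on labeled data, pseudo-label the unlabeled data,
    retrain on the union, and iterate to obtain the final detector F_n."""
    X1, Y1 = labeled
    model = train(X1, Y1)                      # step 1: preliminary training
    for _ in range(iterations):                # steps 2-3: iterative updating
        Y_pseudo = predict(model, unlabeled)   # predict labels for X2
        model = train(X1 + unlabeled, Y1 + Y_pseudo)  # retrain on D1
    return model                               # step 4: final detector
```

The method later fixes the number of iterations at 6; the two callables are placeholders for whichever detector is plugged into the framework.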
A further aspect of the invention: after each round of training yields the latest model F, it is used to predict labels for the unlabeled data as follows. First, a dark-channel dehazing model is applied to the original low-quality hazy images, and its window parameter is varied to generate dehazed images I₁, I₂, …, I_k at k different levels of clarity, where k is the total number of clarity levels. These k versions are then fed to F for prediction, producing k groups of (x, y, w, h, c) quintuple predictions, in which the first four numbers give the predicted position and the last number c is the confidence of the predicted category. When integrating the k groups of quintuple predictions, the confidence c is used as follows: if c exceeds the threshold 0.8, the current prediction is kept in the final result set; if c is below the threshold 0.3, the current prediction is added to the set awaiting manual correction; the remaining predictions are fused within each class according to their intersection-over-union, that is, for any two same-class prediction boxes whose intersection-over-union exceeds the threshold 0.7, the box with the larger c is kept and added to the final result set. Finally, the "wrong" predictions accumulated in the manual-correction set are corrected and then added to the final result set.
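A minimal sketch of the integration rule just described, assuming each quintuple is extended with a class index so intra-class fusion can group boxes, and assuming (x, y) is the box's top-left corner (the patent does not fix either convention):

```python
HIGH, LOW, IOU_THR = 0.8, 0.3, 0.7  # thresholds stated in the method

def iou_xywh(a, b):
    """IoU for (x, y, w, h, ...) boxes, (x, y) taken as the top-left corner."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def integrate_predictions(preds):
    """Triage (x, y, w, h, c, cls) predictions from the k clarity levels:
    keep high-confidence boxes, flag low-confidence ones for manual
    correction, and fuse the remainder within each class by IoU."""
    final, manual, middle = [], [], []
    for p in preds:
        if p[4] > HIGH:
            final.append(p)
        elif p[4] < LOW:
            manual.append(p)
        else:
            middle.append(p)
    # Of two same-class boxes with IoU > 0.7, keep the higher-confidence one.
    middle.sort(key=lambda p: -p[4])
    kept = []
    for p in middle:
        if all(q[5] != p[5] or iou_xywh(p, q) <= IOU_THR for q in kept):
            kept.append(p)
    final.extend(kept)
    return final, manual
```

The manual-correction set returned here corresponds to the samples later handed to experts in the active-learning step.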
A further aspect of the invention: the number of iterations in step 3 is 6.
Beneficial effects
The small-sample low-quality image target detection method based on multi-resolution ensemble self-training proposed by the present invention improves on the self-training method from semi-supervised learning by adding active learning to make targeted use of the data mislabeled during self-training. For low-quality image scenes it further proposes a multi-resolution ensemble self-training scheme that effectively combines labeled and unlabeled low-quality image data, improving the accuracy of small-sample low-quality image target detection.
Brief description of the drawings
The drawings are provided only to illustrate specific embodiments and are not to be regarded as limiting the invention; throughout the drawings, the same reference symbols denote the same parts.
Figure 1: Schematic diagram of the overall model structure;

Figure 2: Main flow of the semi-supervised low-quality image target detection algorithm;

Figure 3: Multi-resolution ensemble self-training method.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the invention, not to limit it. In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
The technical solution of the present invention is described in two parts: the overall model structure and multi-resolution ensemble self-training. The overall model is built on the existing two-stage target detection framework Faster-R-CNN. First, the model is trained on the provided labeled data; the trained model then predicts labels for the unlabeled samples; finally, these newly "labeled" data are added to the original training data and the model is retrained to obtain the final model for testing.
1. Overall model structure
As shown in Figure 1, the model of the present invention is built on the Faster-R-CNN framework. Assume the input data consist of labeled data pairs (X₁, Y₁) and unlabeled data X₂. The first step is to use the labeled data for preliminary training of the model, with the optimization objective:
MIN Loss(Y₁, F₁(X₁))
After obtaining the first trained model F₁, use it to predict labels for the unlabeled data, i.e.:
Y₂ = F₁(X₂)
Then (X₂, Y₂) is added to the original data as labeled data to obtain the augmented data set D₁ = (X₁, Y₁, X₂, Y₂). The labels Y₂ contain some noise; their reliability can be increased by raising the final confidence threshold. Meanwhile, to improve the prediction quality of the labels, the above process can be repeated: retrain the model on D₁ to obtain F₂, then use F₂ to predict better labels Y₃ for X₂, and so on, iterating until the data are well labeled. The increase in the amount of data undoubtedly brings a considerable improvement in model performance.
After the multiple rounds of iterative training described above, the final detection model Fₙ is obtained. This model is used for detection on the test data set, which carries no annotations, yielding the final low-quality image detection results. The main steps of the whole algorithm are shown in Figure 2.
2. Multi-resolution ensemble self-training
Deep learning technology owes its wide applicability to two factors: large amounts of data and advances in hardware. Data in deep learning are generally divided into three types: training data, validation data, and test data. The training data must be annotated according to the task type, which is very time-consuming and costly; moreover, in low-quality scenes it is difficult to acquire sufficient image data because of weather and other constraints, so self-training can be used in this scenario to increase the amount of labeled data. However, generic self-training does not consider low-quality image scenarios and fails to exploit the "image quality" factor effectively. The present method improves on generic self-training for low-quality image scenes in two respects. First, for low-quality images, multiple levels of clarity are used jointly to predict annotations, increasing the model's ability to fit images of different clarity. Second, an ensemble method combining active learning and confidence voting fuses the prediction results obtained from the low-quality images at the different levels of clarity. Based on these two points, the present invention proposes the following multi-resolution ensemble self-training strategy: apply a dehazing algorithm to the low-quality images to generate images at different levels of clarity for label prediction, then use active learning to fuse the resulting label information into the final annotations.
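As an illustration of generating the images I₁ … I_k, the following is a minimal single-image dark-channel-prior dehazing sketch in which the min-filter window size plays the role of the clarity-controlling window parameter. The patent does not specify the dehazing implementation, so the details here (omega, the transmission floor t0, the atmospheric-light estimate, and the naive NumPy-only min filter) are all assumptions:

```python
import numpy as np

def min_filter(ch, size):
    """Naive square minimum filter with edge padding."""
    pad = size // 2
    padded = np.pad(ch, pad, mode="edge")
    out = np.empty_like(ch)
    for i in range(ch.shape[0]):
        for j in range(ch.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].min()
    return out

def dark_channel_dehaze(img, window, omega=0.95, t0=0.1):
    """Dark-channel-prior dehazing of an HxWx3 float image in [0, 1];
    `window` controls how aggressively haze is removed."""
    dark = min_filter(img.min(axis=2), window)
    # Atmospheric light: mean colour of the brightest dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0).clip(min=1e-6)
    # Transmission map, floored at t0, then scene-radiance recovery.
    t = 1.0 - omega * min_filter((img / A).min(axis=2), window)
    t = np.maximum(t, t0)[..., None]
    return np.clip((img - A) / t + A, 0.0, 1.0)

def multi_clarity_versions(img, windows=(5, 15, 45)):
    """One dehazed copy per window size: the images I1 ... Ik."""
    return [dark_channel_dehaze(img, w) for w in windows]
```

Each of the k dehazed copies would then be passed through the current detector F, as described above.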
In ensemble self-training by voting, samples near class boundaries may be mislabeled when detector performance is low. Selecting samples close to class centers by confidence reduces the mislabeling rate, but even for samples that every ensemble detector judges with high confidence, heterogeneous detectors may still predict inconsistent labels. Moreover, if only high-confidence samples are selected and added to the training set during iteration, those samples are highly similar to the existing training samples, and adding them may not improve the algorithm's performance. Conversely, low-confidence samples have low similarity to the training samples; labeling them correctly and then adding them to the training set can greatly improve classifier performance. Active learning asks experts to annotate a small number of unlabeled samples, thereby obtaining correct annotations for them. Combining the two in an ensemble self-training algorithm that uses both active learning and confidence voting effectively addresses the above problems of ensemble self-training.
Finally, after relatively accurate data labels are obtained, they are added to the original labeled data set and fed to the target detection model Faster-R-CNN for joint training, which amounts to optimizing the final model with two different portions of data (one accurately labeled, one approximately labeled). The final optimization objective can thus be written as:
MIN Σ_{i=1..n} [Loss1(Yᵢ, f(Xᵢ)) + Loss2(Yᵢ, f(Xᵢ))] + Σ_{i>n} [Loss1(Y′ᵢ, f(Xᵢ)) + Loss2(Y′ᵢ, f(Xᵢ))]
where Yᵢ is the true label, f is the model output, Loss1 and Loss2 are the respective loss functions of the target detection model, and n is the number of labeled training samples; the second sum is the loss over the unlabeled data, where Y′ᵢ is the relatively accurate pseudo-label obtained by the optimized ensemble self-training method described above.
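The combined objective can be read as the following sketch, where `model`, `loss1`, and `loss2` are hypothetical callables standing for the detector's forward pass and its two loss terms:

```python
def joint_objective(model, loss1, loss2, labeled, pseudo_labeled):
    """Sum of both loss terms over the n labeled pairs (Y_i) and over the
    pseudo-labeled pairs (Y'_i) produced by ensemble self-training."""
    total = 0.0
    for X, Y in list(labeled) + list(pseudo_labeled):
        out = model(X)
        total += loss1(Y, out) + loss2(Y, out)
    return total
```

In training, the pseudo-labeled portion simply joins the labeled portion; only the origin of its targets differs.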
To enable those skilled in the art to better understand the present invention, the invention is described in detail below with reference to specific embodiments.
The overall method flow is divided into three parts: the initial network training phase, the ensemble self-training phase, and the testing phase. The overall framework is an improvement of the two-stage target detection framework Faster-R-CNN.
1) Initial network training phase:
The present invention uses the Faster-R-CNN model in the initial network training phase. To increase the generalization performance of the model, it is first pre-trained on the Pascal-VOC and COCO data sets. Since the former has 20 categories and the latter 80, they cannot be merged directly; the data set labels were therefore cleaned, similar labels merged, and all of them finally mapped into COCO's 80 categories, for a total of 135,412 pictures. Pre-training ran for 12 epochs with an SGD optimizer, a learning rate of 0.001, and a batch size of 16; the learning rate was halved at the ninth and eleventh epochs, reaching a good convergence condition. The trained weights were then used as initialization parameters of the network, the model's output categories were adjusted to the 5 categories of the RTTS data set, and training was repeated. Since this run is fine-tuning, only 6 epochs were trained in total, with an initial learning rate of 0.0005; all other settings were kept consistent with the previous ones.
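The pre-training schedule (learning rate 0.001, halved at the ninth and eleventh epoch) can be expressed as a small helper; reading "halved" as halving the current rate at each milestone is an interpretation of the text:

```python
def lr_schedule(epoch, base_lr=0.001, milestones=(9, 11), gamma=0.5):
    """Learning rate for a 1-indexed epoch under a step decay that halves
    the rate at each milestone epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

Over the 12 pre-training epochs this yields 0.001 for epochs 1 to 8, 0.0005 for epochs 9 and 10, and 0.00025 for epochs 11 and 12.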
The data used in the initial training phase include two general-purpose target detection data sets with a large amount of data and the small RTTS training data set, which contains a total of 500 pictures; this training data carries manual annotation information.
2) Multi-resolution ensemble self-training phase:
In the multi-resolution ensemble self-training phase, the low-quality images are first dehazed to obtain images of different clarity, and the model obtained in the first training step is then used to annotate the unlabeled data at each level of clarity; a total of 4,000 unlabeled images are used. This step uses three different confidence-threshold scores, 0.3, 0.5, and 0.7, giving three models in total. At the final integration stage, the NMS algorithm is first applied for non-maximum-suppression deduplication, and the four coordinates produced by the model under the three parameter settings are weighted and fused to obtain the final coordinates (x₁, y₁, x₂, y₂). The label y_c with the highest probability is selected as the final category label, so that y′ = (x₁, y₁, x₂, y₂) + y_c is added to the training set as the final pseudo-label.
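The final integration step, deduplication across the three threshold models followed by score-weighted coordinate fusion, might be sketched as follows. The greedy clustering and the 0.5 IoU threshold are assumptions, since the text only names NMS and weighted fusion:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_detections(dets, iou_thr=0.5):
    """Greedily cluster detections (x1, y1, x2, y2, score, label) across
    the three threshold models, average coordinates weighted by score,
    and keep the highest-scoring member's label as the pseudo-label y_c."""
    dets = sorted(dets, key=lambda d: -d[4])
    fused, used = [], [False] * len(dets)
    for i, d in enumerate(dets):
        if used[i]:
            continue
        cluster, used[i] = [d], True
        for j in range(i + 1, len(dets)):
            if not used[j] and iou(d[:4], dets[j][:4]) >= iou_thr:
                cluster.append(dets[j])
                used[j] = True
        w = sum(c[4] for c in cluster)
        coords = tuple(sum(c[k] * c[4] for c in cluster) / w for k in range(4))
        fused.append(coords + (cluster[0][5],))  # highest-score label wins
    return fused
```

Each fused tuple corresponds to one pseudo-label y′ = (x₁, y₁, x₂, y₂) + y_c added to the training set.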
The above self-training process is iterated for 6 rounds in total. Its main purpose is to obtain more accurate annotation information for the unlabeled data and increase the amount of effective information in the dataset; finally, all data are trained together to obtain the final test model parameters.
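The iterative procedure — train, pseudo-label, merge, retrain — can be outlined as below. `train_detector` and `pseudo_label` are caller-supplied stand-ins for the Faster R-CNN training step and the multi-resolution ensemble labelling step, not the patent's actual implementation:

```python
def self_train(labeled, unlabeled, rounds=6, train_detector=None, pseudo_label=None):
    """Iterative self-training: retrain, re-label, merge, repeat.

    labeled: list of (image, annotation) pairs; unlabeled: list of images.
    Each round re-annotates the unlabeled pool with the latest model and
    retrains on the union, mirroring the 6-round loop described above.
    """
    model = train_detector(labeled)  # initial model F1
    for _ in range(rounds):
        pseudo = [(img, pseudo_label(model, img)) for img in unlabeled]
        model = train_detector(labeled + pseudo)  # Fn on the augmented set Dn
    return model
```

Plugging in trivial stubs (e.g. a "model" that is just the training-set size) is enough to check the control flow without any detection framework.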
3) Testing stage:
The testing stage serves to finally verify the effectiveness of the multi-resolution ensemble self-training scheme described above; the parameter settings in this part remain consistent with frameworks such as Faster R-CNN.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed by the present invention, and such modifications or replacements shall all fall within the protection scope of the present invention.

Claims (6)

  1. A multi-resolution ensemble self-training-based target detection method for small-sample low-quality images, characterized by the following steps:
    Step 1: Assume the input image data consists of labeled data pairs (X1, Y1) and unlabeled data X2. First use the labeled data to perform preliminary training of the Faster R-CNN target detection model, with the optimization objective:
    min Loss(Y1, F1(X1))
    After obtaining the first trained model F1, use it to predict on the unlabeled data, i.e.:
    Y2 = F1(X2)
    Step 2: Then add (X2, Y2) to the original data as labeled data to obtain the augmented dataset D1 = (X1, Y1, X2, Y2); retrain the Faster R-CNN target detection model on D1 to obtain F2, then use F2 to predict better annotation information for X2, obtaining Y3;
    Step 3: Continue in this manner, iteratively updating until the final detection model Fn is obtained;
    Step 4: Use the final detection model Fn to detect the image data to be tested, obtaining the final image targets.
  2. 根据权利要求1所述的一种基于多清晰度集成自训练的小样本低质量图像目标检测方法,其特征在于在每次训练得到最新的模型F后,用它对无标注的数据进行预测:首先针对原始的低质量带雾图片使用暗通道去雾模型进行清晰化处理,通过控制其窗口参数来产生不同清晰程度的去雾图片I 1I 2…I k,其中k表示一共有k种清晰度,然后将上述k种清晰程度的图片分别输入给F进行预测,会产生k组(x,y,w,h,c)五元组预测结果,其中前四个数字预测位置,最后一个数字c预测属于当前类别的置信度;在对k组五元组预测结果进行集成时,根据置信度c的大小:当c大于第一给定阈值时,则保留当前预测结果到最终结果集合中;当c小于第二给定阈值时,则将当前预测结果添加到待人工纠正集合中;对于剩下的预测结果,则按照交并比大小进行类内的融合,即对于同类中的任意两个预测框交并比大于第三给定阈值的坐标框,保留c值较大的那个加入到最终结果集中;最后将上述过 程中产生的待人工纠正集合中的“错误”预测结果进行纠正后加入最终结果集。 A kind of small-sample low-quality image target detection method based on multi-definition integrated self-training according to claim 1, characterized in that after each training obtains the latest model F, use it to predict the data without labels: Firstly, the dark channel dehazing model is used to clear the original low-quality foggy pictures, and the dehazing pictures I 1 I 2 ...I k with different degrees of clarity can be generated by controlling its window parameters, where k means that there are k types of clearness in total degrees, and then input the pictures of the above k levels of clarity to F for prediction, which will generate k sets of (x, y, w, h, c) quintuple prediction results, in which the first four numbers predict the position, and the last number c predicts the confidence degree belonging to the current category; when integrating the prediction results of k groups of quintuples, according to the size of the confidence degree c: when c is greater than the first given threshold, the current prediction result is kept in the final result set; When c is less than the second given threshold, the current prediction result is added to the set to be manually corrected; for the remaining prediction results, the intra-class fusion is performed according to the intersection ratio, that is, for any two of the same class For the coordinate frame whose intersection ratio of prediction frame is greater than the third given threshold, keep the one with larger c value and add it to the final result set; finally, correct the 
"wrong" prediction result in the set to be manually corrected generated in the above process and add it final result set.
  3. The multi-resolution ensemble self-training-based target detection method for small-sample low-quality images according to claim 2, characterized in that the first given threshold is 0.8.
  4. The multi-resolution ensemble self-training-based target detection method for small-sample low-quality images according to claim 2, characterized in that the second given threshold is 0.3.
  5. The multi-resolution ensemble self-training-based target detection method for small-sample low-quality images according to claim 2, characterized in that the third given threshold is 0.7.
  6. The multi-resolution ensemble self-training-based target detection method for small-sample low-quality images according to claim 1, characterized in that the number of iterations in step 3 is 6.
PCT/CN2022/099827 2021-11-07 2022-06-20 Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image WO2023077821A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111309737.8A CN114067173A (en) 2021-11-07 2021-11-07 Small sample low-quality image target detection method based on multi-definition integrated self-training
CN202111309737.8 2021-11-07

Publications (1)

Publication Number Publication Date
WO2023077821A1 true WO2023077821A1 (en) 2023-05-11

Family

ID=80274201

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/099827 WO2023077821A1 (en) 2021-11-07 2022-06-20 Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image

Country Status (2)

Country Link
CN (1) CN114067173A (en)
WO (1) WO2023077821A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116596161A (en) * 2023-07-04 2023-08-15 江南大学 Target prediction model construction method and prediction method under multi-center small sample scene
CN116776154A (en) * 2023-07-06 2023-09-19 华中师范大学 AI man-machine cooperation data labeling method and system
CN116912798A (en) * 2023-09-14 2023-10-20 南京航空航天大学 Cross-modal noise perception-based automatic driving event camera target detection method
CN117041625A (en) * 2023-08-02 2023-11-10 成都梵辰科技有限公司 Method and system for constructing ultra-high definition video image quality detection network
CN117496118A (en) * 2023-10-23 2024-02-02 浙江大学 Method and system for analyzing steal vulnerability of target detection model
CN117496191A (en) * 2024-01-03 2024-02-02 南京航空航天大学 Data weighted learning method based on model collaboration
CN117041625B (en) * 2023-08-02 2024-04-19 成都梵辰科技有限公司 Method and system for constructing ultra-high definition video image quality detection network

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067173A (en) * 2021-11-07 2022-02-18 西北工业大学 Small sample low-quality image target detection method based on multi-definition integrated self-training
CN114882344A (en) * 2022-05-23 2022-08-09 海南大学 Small-sample underwater fish body tracking method based on semi-supervision and attention mechanism

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107527029A (en) * 2017-08-18 2017-12-29 卫晨 A kind of improved Faster R CNN method for detecting human face
CN108764281A (en) * 2018-04-18 2018-11-06 华南理工大学 A kind of image classification method learning across task depth network based on semi-supervised step certainly
CN111401418A (en) * 2020-03-05 2020-07-10 浙江理工大学桐乡研究院有限公司 Employee dressing specification detection method based on improved Faster r-cnn
CN111553414A (en) * 2020-04-27 2020-08-18 东华大学 In-vehicle lost object detection method based on improved Faster R-CNN
CN112232416A (en) * 2020-10-16 2021-01-15 浙江大学 Semi-supervised learning method based on pseudo label weighting
US20210064934A1 (en) * 2019-08-30 2021-03-04 Adobe Inc. Selecting logo images using machine-learning-logo classifiers
CN113052789A (en) * 2020-11-03 2021-06-29 哈尔滨市科佳通用机电股份有限公司 Vehicle bottom plate foreign body hitting fault detection method based on deep learning
CN114067173A (en) * 2021-11-07 2022-02-18 西北工业大学 Small sample low-quality image target detection method based on multi-definition integrated self-training


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116596161A (en) * 2023-07-04 2023-08-15 江南大学 Target prediction model construction method and prediction method under multi-center small sample scene
CN116596161B (en) * 2023-07-04 2023-10-13 江南大学 Target prediction model construction method and prediction method under multi-center small sample scene
CN116776154A (en) * 2023-07-06 2023-09-19 华中师范大学 AI man-machine cooperation data labeling method and system
CN116776154B (en) * 2023-07-06 2024-04-09 华中师范大学 AI man-machine cooperation data labeling method and system
CN117041625A (en) * 2023-08-02 2023-11-10 成都梵辰科技有限公司 Method and system for constructing ultra-high definition video image quality detection network
CN117041625B (en) * 2023-08-02 2024-04-19 成都梵辰科技有限公司 Method and system for constructing ultra-high definition video image quality detection network
CN116912798A (en) * 2023-09-14 2023-10-20 南京航空航天大学 Cross-modal noise perception-based automatic driving event camera target detection method
CN116912798B (en) * 2023-09-14 2023-12-19 南京航空航天大学 Cross-modal noise perception-based automatic driving event camera target detection method
CN117496118A (en) * 2023-10-23 2024-02-02 浙江大学 Method and system for analyzing steal vulnerability of target detection model
CN117496191A (en) * 2024-01-03 2024-02-02 南京航空航天大学 Data weighted learning method based on model collaboration
CN117496191B (en) * 2024-01-03 2024-03-29 南京航空航天大学 Data weighted learning method based on model collaboration

Also Published As

Publication number Publication date
CN114067173A (en) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2023077821A1 (en) Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
Tan et al. Night-time scene parsing with a large real dataset
CN112597883B (en) Human skeleton action recognition method based on generalized graph convolution and reinforcement learning
CN110781262B (en) Semantic map construction method based on visual SLAM
US11568637B2 (en) UAV video aesthetic quality evaluation method based on multi-modal deep learning
US9230159B1 (en) Action recognition and detection on videos
Yang et al. St3d++: Denoised self-training for unsupervised domain adaptation on 3d object detection
US10929676B2 (en) Video recognition using multiple modalities
CN112150493A (en) Semantic guidance-based screen area detection method in natural scene
CN115393687A (en) RGB image semi-supervised target detection method based on double pseudo-label optimization learning
WO2023040510A1 (en) Image anomaly detection model training method and apparatus, and image anomaly detection method and apparatus
WO2023109361A1 (en) Video processing method and system, device, medium and product
Yun et al. Panoramic vision transformer for saliency detection in 360∘ videos
WO2022148248A1 (en) Image processing model training method, image processing method and apparatus, electronic device, and computer program product
CN113793359B (en) Target tracking method integrating twin network and related filtering
Zhang et al. EventMD: High-speed moving object detection based on event-based video frames
CN111723934B (en) Image processing method and system, electronic device and storage medium
JP6600288B2 (en) Integrated apparatus and program
WO2023092582A1 (en) A scene adaptive target detection method based on motion foreground
He et al. CPSPNet: Crowd counting via semantic segmentation framework
CN113627240B (en) Unmanned aerial vehicle tree species identification method based on improved SSD learning model
CN114882346A (en) Underwater robot target autonomous identification method based on vision
CN112634331A (en) Optical flow prediction method and device
Sun et al. Study of UAV tracking based on CNN in noisy environment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22888857

Country of ref document: EP

Kind code of ref document: A1