CN116681983A - Long and narrow target detection method based on deep learning - Google Patents

Long and narrow target detection method based on deep learning

Info

Publication number
CN116681983A
Authority
CN
China
Prior art keywords
detection
loss
target
deep learning
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310648368.8A
Other languages
Chinese (zh)
Inventor
Jiao Wenhua (焦文华)
Luo Yuan (骆园)
Tian Yuyu (田玉宇)
Li Ruilin (李瑞林)
Xie Xiaohao (谢小浩)
Cai Xiaoyi (蔡晓异)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Original Assignee
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT filed Critical China University of Mining and Technology CUMT
Priority to CN202310648368.8A priority Critical patent/CN116681983A/en
Publication of CN116681983A publication Critical patent/CN116681983A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for detecting long and narrow targets, relating to the technical field of elongated target detection. A test image is input into a detection model to detect the target objects in the image, and the detection model comprises a data acquisition and preprocessing module, a long and narrow target detection network training module, and a test image detection frame generation module. With this structure, the invention obtains images of suitable size and enlarges the training set through data preprocessing, improving the generalization ability of the network model; a global attention mechanism (GAM) is added between the Backbone network and the Neck, enhancing the network's ability to extract target-object features and thereby improving detection precision; an oriented-bounding-box representation is introduced for accurate regression of the detection boxes, duplicate detection boxes are removed with a control threshold, and a CIoU loss function is adopted to obtain more accurate detection-box results.

Description

Long and narrow target detection method based on deep learning
Technical Field
The invention relates to the technical field of long and narrow target detection, in particular to a long and narrow target detection method based on deep learning.
Background
Object detection in computer vision aims to identify and localize the target objects present in an image. It is a classical task in the field of computer vision, has important application value in informatized intelligent agriculture, industrial intelligence, autonomous driving, and other fields, and is an important precondition for subsequent vision tasks. With the rapid development of deep learning, object detection has advanced step by step into new fields, successively overcoming the low efficiency, poor accuracy, and labor-intensiveness of traditional manual inspection.
In recent years, narrow, long, and densely packed targets have appeared in detection tasks across many fields: adhered wheat grains and dense wheat ears in agricultural scenes, remote-sensing images of aircraft and vessels acquired from satellites, and dense cracks on industrial products in industrial scenes. Because such targets occlude one another and are arranged in varying directions, the effective resolution of each target object is reduced, and the conventional single-stage (YOLO, SSD, RetinaNet) and two-stage (Fast R-CNN, Faster R-CNN) detection methods suffer from low precision and missed detections.
Among existing methods for elongated target detection, publication No. CN113326763A discloses a remote-sensing target detection method based on bounding-box consistency. It mainly uses a ResNet101 Conv1-5 network as the base network, generates predicted bounding boxes from a heat map together with offset, box, and direction information, displays the localization according to the predicted boxes, and improves regression quality and detection speed. However, the method depends strongly on its data set and generalizes poorly; when the scene is switched to a data set of long, narrow, densely packed targets with varying orientations, its efficiency is low and its miss rate is high.
Accordingly, there is a need for a deep-learning-based elongated target detection method that solves the above problems.
Disclosure of Invention
The invention aims to provide a deep-learning-based method for detecting long and narrow targets, mainly to solve the low efficiency and missed detections that arise when detecting long and narrow targets that are unevenly arranged and differently oriented.
To achieve the above purpose, the invention provides a deep-learning-based long and narrow target detection method in which a test image is input into a detection model to detect the target objects in the image, the detection model comprising a data acquisition and preprocessing module, a long and narrow target detection network training module, and a test image detection frame generation module.
Preferably, the data acquisition and preprocessing module comprises a data acquisition module and a data preprocessing module; the data acquisition module takes a number of target images shot by a camera as the data set for model training, validation, and testing; the data preprocessing module annotates the target images with the rotated-box annotation tool roLabelImg, crops and rotates the data set, and randomly divides it into a training set, a validation set, and a test set.
Preferably, the detection model extracts feature maps using convolution, normalization, and activation operations and, combined with channel-information fusion, feeds feature maps at different downsampling rates into the Neck structure.
Preferably, the detection model is an improvement trained from the initial YOLOX: the detection and regression mode of the detection model during training and inference is changed to oriented-bounding-box detection, a global attention mechanism GAM is adopted, and the loss function is optimized.
Preferably, the oriented-bounding-box detection adds a rotation angle θ to the conventional rectangular box, expressed algebraically as (x_c, y_c, w, h, θ), where (x_c, y_c) are the coordinates of the box center point and (w, h) are the width and height of the box.
Preferably, the global attention mechanism GAM is added between the Backbone network and the Neck network.
Preferably, the global attention mechanism GAM comprises the following steps:
S1: compress the feature map of the target image with a global average pooling (GAP) module;
S2: reduce the feature dimension with an S_D downsampling module;
S3: activate with a ReLU function;
S4: restore the original dimension through a fully connected layer with an S_U upsampling module;
S5: obtain normalized weights through a sigmoid function;
S6: weight each channel with the normalized weights using Scale, outputting as many weights as there are input features.
Preferably, the loss function takes a multi-task form, consisting mainly of a positioning loss L_obj, a classification loss L_cls, and a confidence loss L_reg; the total loss L_total is expressed as:
L_total = L_obj + L_cls + L_reg
where the positioning loss L_obj computes the localization error of the prediction box of the image target object, including the coordinate error and the width-height error of the bounding box; the confidence loss L_reg computes the position error of the target-object prediction box; and the classification loss L_cls computes the class error of the detected target's prediction box;
the classification loss L_cls consists of a target-class loss and an angle loss and is expressed with binary cross entropy, where S² denotes the number of grid cells, B denotes the number of anchors, θ is the angle category, I_ij = 1 when the j-th anchor detects the target object and I_ij = 0 when it does not, P_i(c) denotes the probability that the detection is the target object, and P_i(θ) denotes the probability that the rotation angle of the target object is θ.
Preferably, the confidence loss L_obj of the detection layer is improved on the basis of the intersection over union (IoU), and the positioning loss uses CIoU to compute the true spatial relation between the boxes. The intersection over union is computed as
IoU = area(pred ∩ targ) / area(pred ∪ targ)
where pred denotes the predicted box of the target object and targ denotes the ground-truth bounding box of the target object;
the aspect-ratio similarity is measured by
v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w_b / h_b))²
the weight function is
α = v / ((1 - IoU) + v)
and the CIoU loss function is
L_CIoU = 1 - IoU + l²(O_b, O_gt) / c² + α·v
where l(O_b, O_gt) denotes the Euclidean distance between the anchor-box center point and the ground-truth-box center point, c denotes the diagonal length of the smallest rectangle enclosing both boxes, w_gt and h_gt are the width and height of the ground-truth box, and w_b and h_b are the width and height of the anchor box.
Preferably, the test image detection frame generation module covers generation of the detection boxes and display of the detection results; during generation, duplicate detection boxes are removed using a control threshold.
Therefore, the deep-learning-based long and narrow target detection method has the following beneficial effects:
(1) The detection and regression mode of the detection model during training and inference is changed to oriented-bounding-box detection, meeting the detection requirements of long and narrow target objects.
(2) The invention adopts a global attention mechanism to improve the representation of the image and thereby obtain richer target features.
(3) The invention adopts oriented bounding boxes, from which the exact position of the rotated rectangle in the image can be obtained; the oriented-bounding-box detection method improves the detection performance and precision for rotated targets while reducing the size of the corresponding model.
(4) The invention uses the CIoU loss function, which takes the positional relation between the detection box and the ground-truth box into account, improving detection performance.
(5) The invention removes duplicates with a control threshold, solving the problem of multiple detection boxes appearing on one target object in the visualized results.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is an overall implementation flowchart of the deep-learning-based long and narrow target detection method;
FIG. 2 is a data annotation diagram of the deep-learning-based long and narrow target detection method of the present invention;
FIG. 3 is a CBS module architecture diagram of the deep-learning-based long and narrow target detection method of the present invention;
FIG. 4 is a schematic diagram of an oriented bounding box in the deep-learning-based long and narrow target detection method of the present invention;
FIG. 5 is a GAM schematic diagram of the deep-learning-based long and narrow target detection method of the present invention;
FIG. 6 is a diagram of the decoupled detection head of the deep-learning-based long and narrow target detection method of the present invention;
FIG. 7 is a before-and-after comparison of the de-duplication process of the deep-learning-based long and narrow target detection method of the present invention;
FIG. 8 is a model block diagram of the deep-learning-based long and narrow target detection method of the present invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which this invention belongs. The terms "first", "second", and the like do not denote any order, quantity, or importance, but merely distinguish one element from another. The word "comprising", "comprises", or the like means that the elements or items preceding the word include those listed after the word and their equivalents, without excluding other elements or items. The terms "connected" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", etc. merely indicate relative positional relationships, which may change when the absolute position of the described object changes.
Examples
As shown in FIGS. 1 to 8, the invention provides a deep-learning-based method for detecting long and narrow targets: a test image is input into a detection model to detect the target objects in the image, the detection model comprising a data acquisition and preprocessing module, a long and narrow target detection network training module, and a test image detection frame generation module.
The data acquisition and preprocessing module comprises a data acquisition module and a data preprocessing module. The data acquisition module takes a number of target images shot by a camera as the data set for model training, validation, and testing. The data preprocessing module annotates the target images with the rotated-box annotation tool roLabelImg, then crops and rotates the data set: the images are rotated by 30, 60, 90, 120, and 180 degrees respectively, and each 2688 x 2688 image is cropped into 1024 x 1024 tiles with an overlap of 200 pixels between adjacent crops. The data set is then randomly divided into a training set, a validation set, and a test set in the proportion 7:2:1, as sketched below.
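A minimal Python sketch of this preprocessing scheme, assuming OpenCV-readable images; the helper names and the use of cv2/NumPy are illustrative and not taken from the patent, and label transformation is omitted for brevity.

import cv2
import numpy as np

ANGLES = [30, 60, 90, 120, 180]   # rotation angles stated in the text
TILE, OVERLAP = 1024, 200          # crop size and overlap stated in the text

def rotate(img, angle):
    """Rotate an image about its center, keeping the original canvas size for simplicity."""
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h))

def crop_tiles(img, tile=TILE, overlap=OVERLAP):
    """Slide a tile-sized window with the given overlap over a 2688 x 2688 image."""
    step = tile - overlap
    h, w = img.shape[:2]
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, step)
            for x in range(0, w - tile + 1, step)]

def augment(img):
    """Original image plus one copy per rotation angle, each cut into tiles."""
    out = crop_tiles(img)
    for a in ANGLES:
        out.extend(crop_tiles(rotate(img, a)))
    return out

def split(samples, ratios=(0.7, 0.2, 0.1), seed=0):
    """Randomly split sample indices into train/val/test in the 7:2:1 ratio."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_tr, n_va = int(ratios[0] * len(samples)), int(ratios[1] * len(samples))
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]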
The detection model builds its initial network from convolution, batch normalization, and SiLU activation (CBS) modules, cross-stage partial (CSP) structures, a feature pyramid network (FPN), a path aggregation network (PAN) module, and a spatial pyramid pooling (SPP) module; the architecture of the CBS module is shown in FIG. 3. The detection model extracts feature maps using convolution, normalization, and activation operations and, combined with channel-information fusion, feeds feature maps at different downsampling rates into the Neck structure. Through the oriented-bounding-box detection method, the exact position of the rotated rectangle in the image can be obtained, improving the detection performance and precision for rotated targets while reducing the size of the corresponding model.
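For reference, a minimal PyTorch sketch of such a CBS block, in the style of common YOLOX implementations; the parameter names are illustrative.

import torch.nn as nn

class CBS(nn.Module):
    """Convolution -> batch normalization -> SiLU activation."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))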
The detection model is an improvement trained from the initial YOLOX: the detection and regression mode of the detection model during training and inference is changed to oriented-bounding-box detection, the YOLOX-Darknet53 backbone is retained, a global attention mechanism GAM is adopted, and the loss function is optimized.
Oriented-bounding-box detection adds a rotation angle θ to the conventional rectangular box, expressed algebraically as (x_c, y_c, w, h, θ), where (x_c, y_c) are the coordinates of the box center point and (w, h) are the width and height of the box.
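A short sketch of how the (x_c, y_c, w, h, θ) representation can be converted to corner points, e.g. for visualization or rotated-IoU computation; the counter-clockwise degree convention for θ is an assumption, since the text does not state one.

import numpy as np

def obb_to_corners(x_c, y_c, w, h, theta_deg):
    """Return the 4 corners of a rotated rectangle as a (4, 2) array."""
    t = np.deg2rad(theta_deg)                      # assumed CCW, in degrees
    rot = np.array([[np.cos(t), -np.sin(t)],
                    [np.sin(t),  np.cos(t)]])
    # Axis-aligned corners relative to the center, then rotate and translate.
    half = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                     [w / 2,  h / 2], [-w / 2,  h / 2]])
    return half @ rot.T + np.array([x_c, y_c])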
A global attention mechanism GAM is added between the Backbone network and the Neck network.
The global attention mechanism GAM comprises the following steps, sketched in code below:
S1: compress the feature map of the target image with a global average pooling (GAP) module;
S2: reduce the feature dimension with an S_D downsampling module;
S3: activate with a ReLU function;
S4: restore the original dimension through a fully connected layer with an S_U upsampling module;
S5: obtain normalized weights through a sigmoid function;
S6: weight each channel with the normalized weights using Scale, outputting as many weights as there are input features.
The loss function takes a multi-task form, consisting mainly of a positioning loss L_obj, a classification loss L_cls, and a confidence loss L_reg; the total loss L_total is expressed as:
L_total = L_obj + L_cls + L_reg
where the positioning loss L_obj computes the localization error of the prediction box of the image target object, including the coordinate error and the width-height error of the bounding box; the confidence loss L_reg computes the position error of the target-object prediction box; and the classification loss L_cls computes the class error of the detected target's prediction box;
the classification loss L_cls consists of a target-class loss and an angle loss and is expressed with binary cross entropy, where S² denotes the number of grid cells, B denotes the number of anchors, θ is the angle category, I_ij = 1 when the j-th anchor detects the target object and I_ij = 0 when it does not, P_i(c) denotes the probability that the detection is the target object, and P_i(θ) denotes the probability that the rotation angle of the target object is θ.
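A hedged sketch of this classification term: binary cross entropy applied to both the class probabilities P_i(c) and the angle-category probabilities P_i(θ), summed over anchors matched to a target (I_ij = 1). The tensor shapes, the logit formulation, and the angle-binning scheme are assumptions, not taken from the patent.

import torch.nn.functional as F

def classification_loss(cls_pred, cls_tgt, ang_pred, ang_tgt, obj_mask):
    """cls_pred/ang_pred: (N, n_cls)/(N, n_angle_bins) logits;
    cls_tgt/ang_tgt: one-hot float targets of matching shape;
    obj_mask: (N,) bool, True where an anchor detects a target (I_ij = 1)."""
    cls = F.binary_cross_entropy_with_logits(
        cls_pred[obj_mask], cls_tgt[obj_mask], reduction="sum")
    ang = F.binary_cross_entropy_with_logits(
        ang_pred[obj_mask], ang_tgt[obj_mask], reduction="sum")
    return cls + ang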
The confidence loss L_obj of the detection layer is improved on the basis of the intersection over union (IoU), and the positioning loss uses CIoU to compute the true spatial relation between the boxes. The intersection over union is computed as
IoU = area(pred ∩ targ) / area(pred ∪ targ)
where pred denotes the predicted box of the target object and targ denotes the ground-truth bounding box of the target object;
the aspect-ratio similarity is measured by
v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w_b / h_b))²
the weight function is
α = v / ((1 - IoU) + v)
and the CIoU loss function is
L_CIoU = 1 - IoU + l²(O_b, O_gt) / c² + α·v
where l(O_b, O_gt) denotes the Euclidean distance between the anchor-box center point and the ground-truth-box center point, c denotes the diagonal length of the smallest rectangle enclosing both boxes, w_gt and h_gt are the width and height of the ground-truth box, and w_b and h_b are the width and height of the anchor box.
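A sketch of the CIoU terms defined above, written for axis-aligned boxes in (x1, y1, x2, y2) form; applying it to oriented boxes would additionally require a rotated-IoU routine, which is omitted here.

import math
import torch

def ciou_loss(pred, targ, eps=1e-7):
    """pred, targ: (N, 4) tensors of (x1, y1, x2, y2) boxes."""
    # Intersection over union.
    xi1, yi1 = torch.max(pred[:, 0], targ[:, 0]), torch.max(pred[:, 1], targ[:, 1])
    xi2, yi2 = torch.min(pred[:, 2], targ[:, 2]), torch.min(pred[:, 3], targ[:, 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (targ[:, 2] - targ[:, 0]) * (targ[:, 3] - targ[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Squared center distance l² over squared enclosing-box diagonal c².
    cxp, cyp = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cxt, cyt = (targ[:, 0] + targ[:, 2]) / 2, (targ[:, 1] + targ[:, 3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
    cw = torch.max(pred[:, 2], targ[:, 2]) - torch.min(pred[:, 0], targ[:, 0])
    ch = torch.max(pred[:, 3], targ[:, 3]) - torch.min(pred[:, 1], targ[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term v and its weight alpha.
    wp, hp = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    wt, ht = targ[:, 2] - targ[:, 0], targ[:, 3] - targ[:, 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps))
                              - torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v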
The test image detection frame generation module covers generation of the detection boxes and display of the detection results; during generation, duplicate detection boxes are removed using a control threshold.
Example 1
Taking densely packed wheat grains and impurities as an example: for every detection box of a target object, the minimum enclosing rectangle is taken; the center of that rectangle, i.e., the center-point coordinates of the rotated rectangular detection box, is computed; and the boxes are screened according to the distance between the center points and the confidence scores.
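The pseudo code referenced in the source is not reproduced in this text; the following is a hedged Python reconstruction of the described screening: boxes are sorted by confidence, and a box is kept only if its center (which coincides with the center of its minimum enclosing rectangle) is farther than a control threshold from every already-kept center. The threshold value is an assumption.

import numpy as np

def dedup_by_center(boxes, scores, dist_thresh=20.0):
    """boxes: (N, 5) array of (x_c, y_c, w, h, theta); scores: (N,).
    Returns the indices of the boxes kept after de-duplication."""
    order = np.argsort(scores)[::-1]   # highest confidence first
    centers = boxes[:, :2]             # rotated-box center = enclosing-rect center
    keep = []
    for i in order:
        if all(np.linalg.norm(centers[i] - centers[j]) > dist_thresh
               for j in keep):
            keep.append(i)
    return keep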
Therefore, with this deep-learning-based long and narrow target detection method, data preprocessing yields images of suitable size and adds training samples, improving the generalization ability of the network model; a GAM global attention mechanism added between the Backbone network and the Neck enhances the network's extraction of target-object features and further improves detection precision; an oriented-bounding-box representation is introduced for accurate regression of the detection boxes, duplicate boxes are removed with a control threshold, and a CIoU loss function yields more accurate detection-box results, solving the low efficiency and missed detections caused by detecting long and narrow targets that are unevenly arranged and differently oriented.
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the present invention. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art will understand that the technical solution of the invention may be modified or equivalently replaced without departing from its spirit and scope.

Claims (10)

1. A long and narrow target detection method based on deep learning, characterized in that a test image is input into a detection model to detect the target objects in the image, the detection model comprising a data acquisition and preprocessing module, a long and narrow target detection network training module, and a test image detection frame generation module.
2. The method for detecting an elongated object based on deep learning according to claim 1, wherein: the data acquisition and preprocessing module comprises a data acquisition module and a data preprocessing module; the data acquisition module takes a number of target images shot by a camera as the data set for model training, validation, and testing; the data preprocessing module annotates the target images with the rotated-box annotation tool roLabelImg, crops and rotates the data set, and randomly divides it into a training set, a validation set, and a test set.
3. The method for detecting an elongated object based on deep learning according to claim 1, wherein: the detection model extracts feature maps using convolution, normalization, and activation operations and, combined with channel-information fusion, feeds feature maps at different downsampling rates into the Neck structure.
4. The method for detecting an elongated object based on deep learning according to claim 1, wherein: the detection model is an improvement trained from the initial YOLOX, the detection and regression mode of the detection model during training and inference is changed to oriented-bounding-box detection, a global attention mechanism GAM is adopted, and the loss function is optimized.
5. The method for detecting an elongated object based on deep learning according to claim 4, wherein: the oriented-bounding-box detection adds a rotation angle θ to the conventional rectangular box, expressed algebraically as (x_c, y_c, w, h, θ), where (x_c, y_c) are the coordinates of the box center point and (w, h) are the width and height of the box.
6. The method for detecting an elongated object based on deep learning according to claim 4, wherein: the global attention mechanism GAM is added between the Backbone network and the Neck network.
7. The method for detecting an elongated object based on deep learning according to claim 6, wherein the global attention mechanism GAM comprises the following steps:
S1: compress the feature map of the target image with a global average pooling (GAP) module;
S2: reduce the feature dimension with an S_D downsampling module;
S3: activate with a ReLU function;
S4: restore the original dimension through a fully connected layer with an S_U upsampling module;
S5: obtain normalized weights through a sigmoid function;
S6: weight each channel with the normalized weights using Scale, outputting as many weights as there are input features.
8. The method for detecting an elongated object based on deep learning according to claim 4, wherein: the loss function takes a multi-task form, consisting mainly of a positioning loss L_obj, a classification loss L_cls, and a confidence loss L_reg; the total loss L_total is expressed as:
L_total = L_obj + L_cls + L_reg
where the positioning loss L_obj computes the localization error of the prediction box of the image target object, including the coordinate error and the width-height error of the bounding box; the confidence loss L_reg computes the position error of the target-object prediction box; and the classification loss L_cls computes the class error of the detected target's prediction box;
the classification loss L_cls consists of a target-class loss and an angle loss and is expressed with binary cross entropy, where S² denotes the number of grid cells, B denotes the number of anchors, θ is the angle category, I_ij = 1 when the j-th anchor detects the target object and I_ij = 0 when it does not, P_i(c) denotes the probability that the detection is the target object, and P_i(θ) denotes the probability that the rotation angle of the target object is θ.
9. The method for detecting an elongated object based on deep learning according to claim 8, wherein: the confidence loss L_obj of the detection layer is improved on the basis of the intersection over union (IoU), and the positioning loss uses CIoU to compute the true spatial relation between the boxes; the intersection over union is computed as
IoU = area(pred ∩ targ) / area(pred ∪ targ)
where pred denotes the predicted box of the target object and targ denotes the ground-truth bounding box of the target object;
the aspect-ratio similarity is measured by
v = (4 / π²) · (arctan(w_gt / h_gt) - arctan(w_b / h_b))²
the weight function is
α = v / ((1 - IoU) + v)
and the CIoU loss function is
L_CIoU = 1 - IoU + l²(O_b, O_gt) / c² + α·v
where l(O_b, O_gt) denotes the Euclidean distance between the anchor-box center point and the ground-truth-box center point, c denotes the diagonal length of the smallest rectangle enclosing both boxes, w_gt and h_gt are the width and height of the ground-truth box, and w_b and h_b are the width and height of the anchor box.
10. The method for detecting an elongated object based on deep learning according to claim 1, wherein: the test image detection frame generation module covers detection-box generation and detection-result display, and duplicate detection boxes are removed using a control threshold during the generation process.
CN202310648368.8A 2023-06-02 2023-06-02 Long and narrow target detection method based on deep learning Pending CN116681983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310648368.8A CN116681983A (en) 2023-06-02 2023-06-02 Long and narrow target detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310648368.8A CN116681983A (en) 2023-06-02 2023-06-02 Long and narrow target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN116681983A true CN116681983A (en) 2023-09-01

Family

ID=87786637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310648368.8A Pending CN116681983A (en) 2023-06-02 2023-06-02 Long and narrow target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN116681983A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112668560A (en) * 2021-03-16 2021-04-16 中国矿业大学(北京) Pedestrian detection method and system for pedestrian flow dense area
CN113298169A (en) * 2021-06-02 2021-08-24 浙江工业大学 Convolutional neural network-based rotating target detection method and device
CN114581847A (en) * 2022-03-04 2022-06-03 山东科技大学 Method and device for detecting abnormal behaviors of pedestrians in community based on GAM tracker
CN115272828A (en) * 2022-08-11 2022-11-01 河南省农业科学院农业经济与信息研究所 Intensive target detection model training method based on attention mechanism
CN115588126A (en) * 2022-09-29 2023-01-10 长三角信息智能创新研究院 GAM, CARAFE and SnIoU fused vehicle target detection method
CN115546499A (en) * 2022-10-12 2022-12-30 中国人民解放军陆军炮兵防空兵学院 Progressive auxiliary target detection method and system based on CNN and ViT fusion
CN115841608A (en) * 2022-11-02 2023-03-24 国网青海省电力公司海北供电公司 Multi-chamber lightning arrester identification method based on improved YOLOX
CN115690627A (en) * 2022-11-03 2023-02-03 安徽大学 Method and system for detecting aerial image rotating target
CN115861853A (en) * 2022-11-22 2023-03-28 西安工程大学 Transmission line bird nest detection method in complex environment based on improved yolox algorithm
CN116052218A (en) * 2023-02-13 2023-05-02 中国矿业大学 Pedestrian re-identification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU Zhijing et al., "Small target detection of remote sensing ships based on dual feature enhancement", Acta Optica Sinica, vol. 42, no. 18, 30 September 2022 (2022-09-30), page 2 *

Similar Documents

Publication Publication Date Title
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
CN113269073B (en) Ship multi-target tracking method based on YOLO V5 algorithm
Lee et al. Simultaneous traffic sign detection and boundary estimation using convolutional neural network
CN109829398B (en) Target detection method in video based on three-dimensional convolution network
CN108052942B (en) Visual image recognition method for aircraft flight attitude
CN110660052A (en) Hot-rolled strip steel surface defect detection method based on deep learning
CN111368769B (en) Ship multi-target detection method based on improved anchor point frame generation model
CN110610210B (en) Multi-target detection method
CN110647802A (en) Remote sensing image ship target detection method based on deep learning
CN108711172B (en) Unmanned aerial vehicle identification and positioning method based on fine-grained classification
CN112949380B (en) Intelligent underwater target identification system based on laser radar point cloud data
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN115829991A (en) Steel surface defect detection method based on improved YOLOv5s
CN113033315A (en) Rare earth mining high-resolution image identification and positioning method
CN110110618A (en) A kind of SAR target detection method based on PCA and global contrast
CN113516053A (en) Ship target refined detection method with rotation invariance
CN110866931B (en) Image segmentation model training method and classification-based enhanced image segmentation method
CN116128883A (en) Photovoltaic panel quantity counting method and device, electronic equipment and storage medium
CN113284185B (en) Rotating target detection method for remote sensing target detection
CN113496260B (en) Grain depot personnel non-standard operation detection method based on improved YOLOv3 algorithm
CN110826575A (en) Underwater target identification method based on machine learning
CN116681983A (en) Long and narrow target detection method based on deep learning
CN116051808A (en) YOLOv 5-based lightweight part identification and positioning method
WO2023273337A1 (en) Representative feature-based method for detecting dense targets in remote sensing image
CN116246096A (en) Point cloud 3D target detection method based on foreground reinforcement knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination