CN111680689A - Target detection method, system and storage medium based on deep learning

Info

Publication number: CN111680689A
Application number: CN202010798762.6A
Authority: CN
Inventors: 马卫飞; 袁飞杨; 张胜森
Original assignee: Wuhan Jingce Electronic Group Co Ltd; Wuhan Jingli Electronic Technology Co Ltd
Current assignee: Wuhan Jingce Electronic Group Co Ltd; Wuhan Jingli Electronic Technology Co Ltd; Wuhan Jingce Electronic Technology Co Ltd
Priority date: 2020-08-11
Filing date: 2020-08-11
Publication date: 2020-09-18
Anticipated expiration: 2040-08-11
Also published as: CN111680689B

Abstract

The invention discloses a target detection method, system and storage medium based on deep learning. The training in the method comprises the following steps: extracting a feature map of a training sample; extracting ROI (region of interest) regions from the feature map to obtain a feature image of a positive (axis-aligned) rectangular ROI region; performing coordinate transformation on the positive rectangular ROI region to obtain a feature image of the corresponding oblique rectangular ROI region; calculating the IOU value between the feature image of the oblique rectangular ROI region and the real mark frame, and comparing the IOU value with a threshold to determine positive and negative samples, wherein the threshold is a dynamically adjusted value; converting the feature image of the oblique rectangular ROI region of each positive sample into a feature image of the corresponding positive rectangular ROI region; and outputting a detection result according to the converted feature image of the positive rectangular ROI region. The method can improve target detection accuracy, is suitable for detecting inclined targets, and is particularly suitable for the field of panel defect detection.

Description

Target detection method, system and storage medium based on deep learning

Technical Field

The invention belongs to the technical field of target detection, and particularly relates to a target detection method and system based on deep learning and a storage medium.

Background

In AOI (automated optical inspection) defect detection, whether the detected bounding box fits the target defect is important. The more closely the detection result fits the target defect, the easier manual re-judgment of the defect becomes and the higher the detection precision. In addition, for two-stage deep learning detection frameworks, the more closely the bounding box fits the target, the more targeted the feature extraction of the defect region is.

As shown in FIG. 1, a prior-art deep learning detection method is described by taking Faster R-CNN, a representative two-stage detection network currently used for defect detection, as an example. The Faster R-CNN detection network is mainly divided into three parts: a backbone network for feature extraction, a region proposal network (RPN), and a regression classification network. The RPN crops proposal regions of the form (x, y, w, h) from the feature map produced by the feature extraction layers; each region is then sent to the final regression classification layer, which refines the proposal into the final target coordinates and category.

In this detection method, the output bounding box coordinates are also in the form of (x, y, w, h), and the bounding box is a positive rectangle. When the method is used for detecting general defects, the output bounding box can be attached to the target defects. However, when detecting some long or oblique defects, such as scratches, stains, etc., the output bounding box does not fit the target defect well. In addition, the coordinate frame obtained in the RPN is also a positive rectangle, so the proposed area fed into the final regression and classification network carries a large number of background features in addition to defects, which is very disadvantageous for regression and classification.

Disclosure of Invention

In view of at least one defect or improvement requirement in the prior art, the invention provides a target detection method, system and storage medium based on deep learning, which can improve target detection accuracy, are suitable for detecting inclined targets, and are particularly suitable for the field of panel defect detection.

In order to achieve the above object, according to a first aspect of the present invention, there is provided a target detection method based on deep learning, including a step of inputting training samples into a convolutional neural network for training, and a step of performing inclined target detection using the trained convolutional neural network, where the training includes:

performing feature extraction on a training sample to obtain a feature map of the training sample, wherein the training sample is marked with a target marking frame in advance;

extracting ROI (region of interest) regions from the feature map to obtain a feature image of a positive rectangular ROI region;

performing coordinate transformation on the positive rectangular ROI region to obtain a feature image of the corresponding oblique rectangular ROI region;

calculating the IOU value between the feature image of the oblique rectangular ROI region and the target marking frame, and comparing the IOU value with a threshold T to determine positive and negative samples, wherein a preset threshold is predefined for the threshold T, and the threshold T is dynamically adjusted so that it decreases as the length-width difference of the target marking frame increases and increases as the length-width difference decreases;

converting the feature image of the oblique rectangular ROI region of each positive sample into a feature image of the corresponding positive rectangular ROI region;

and outputting a detection result according to the converted feature image of the positive rectangular ROI region.

Preferably, the threshold T satisfies

(formula not reproduced in this text)

where T₀ is the preset threshold, w is the length of the target marking frame, and h is the width of the target marking frame.

Preferably, the preset threshold T₀ is 0.5.

Preferably, a uniform marking mode is predefined, and the training samples are marked according to the marking mode.

Preferably, the marking mode is as follows: a point in a preset direction is selected as the initial point, and the remaining three vertices of the target marking frame are marked in sequence according to a preset direction.

Preferably, the target detection method based on deep learning is applied to the field of panel defect detection, and the detection target is a tilt defect.

According to a second aspect of the present invention, there is provided a deep learning-based target detection system, including a training module for inputting training samples into a convolutional neural network for training and a detection module for performing inclined target detection by using the trained convolutional neural network, the training module including:

the characteristic extraction module is used for extracting characteristics of a training sample to obtain a characteristic mapping chart of the training sample, and the training sample is marked with a target marking frame in advance;

the ROI area extraction module is used for extracting an ROI area of the feature mapping graph to obtain a feature image of a positive rectangular ROI area;

the region transformation module is used for performing coordinate transformation on the positive rectangular ROI region to obtain a feature image of the corresponding oblique rectangular ROI region; it is also used for calculating the IOU value between the feature image of the oblique rectangular ROI region and the target marking frame, and comparing the IOU value with a threshold T to determine positive and negative samples, wherein a preset threshold is predefined for the threshold T, and the threshold T is dynamically adjusted so that it decreases as the length-width difference of the target marking frame increases and increases as the length-width difference decreases; and it is further used for converting the feature image of the oblique rectangular ROI region of each positive sample into a feature image of the corresponding positive rectangular ROI region;

and the classification module is used for receiving the converted characteristic image of the positive rectangular ROI and outputting a detection result.

According to a third aspect of the present invention, there is provided a convolutional neural network data processing system for target detection, comprising:

the backbone network is used for extracting features of a training sample to obtain a feature mapping chart of the training sample, and the training sample is marked with a target marking frame in advance;

the region selection network is used for extracting the ROI region of the feature mapping graph to obtain a feature image of the regular rectangular ROI region;

the region transformation network is used for performing coordinate transformation on the positive rectangular ROI region to obtain a feature image of the corresponding oblique rectangular ROI region; it is also used for calculating the IOU value between the feature image of the oblique rectangular ROI region and the oblique rectangular target marking frame, and comparing the IOU value with a threshold T to determine positive and negative samples, wherein a preset threshold is predefined for the threshold T, and the threshold T is dynamically adjusted so that it decreases as the length-width difference of the target marking frame increases and increases as the length-width difference decreases; and it is further used for converting the feature image of the oblique rectangular ROI region of each positive sample into a feature image of the corresponding positive rectangular ROI region;

and the regression classification network is used for receiving the converted characteristic image of the positive rectangular ROI and outputting a detection result.

According to a fourth aspect of the invention, there is provided a computer-readable storage medium having a computer program stored thereon, characterized in that the computer program realizes any of the above methods when executed by a processor.

In general, compared with the prior art, the invention has the following beneficial effects:

(1) the method for dynamically and adaptively adjusting the threshold is applied to target detection, can improve the target detection accuracy, is suitable for detection of an inclined target, is particularly suitable for the field of panel defect detection, and enables a detection result to be more fit with the defect shape.

(2) Aiming at the detection algorithm of the tilt defect, a marking method for fitting any quadrangle is provided, and the marking method is more beneficial to the training process of the tilt defect detection algorithm.

Drawings

FIG. 1 is a schematic diagram of a prior art convolutional neural network;

FIGS. 2 and 3 are schematic diagrams illustrating the alignment of the mark frame with the oblique rectangular ROI area according to the embodiment of the present invention;

FIG. 4 is a schematic diagram of a marking method according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

The target detection method based on deep learning comprises the steps of inputting training samples into a convolutional neural network for training and detecting inclined targets by using the trained convolutional neural network, wherein the training comprises the steps S1 to S6.

S1: Perform feature extraction on the training sample to obtain a feature map of the training sample, wherein the training sample is marked with a target marking frame in advance.

The convolutional neural network adopted by the embodiment of the invention comprises a backbone network, a region selection network (RPN), a region transformation network (Roi Transformer) and a regression classification network. The Roi Transformer includes an RRoI Learner module and a RoI Align module; its detailed implementation and principles are described later.

The training sample is sent to the backbone network to obtain the required feature map.

S2: Extract ROI regions from the feature map obtained in the previous step to obtain a feature image of the positive rectangular ROI region.

The feature map obtained in step S1 is sent to the RPN to obtain the coordinates of a positive rectangular ROI region without rotation, and the feature image corresponding to the positive rectangular ROI region is cropped from the feature map output in step S1 according to those coordinates.

S3: Perform coordinate transformation on the positive rectangular ROI region from the previous step to obtain a feature image of the corresponding oblique rectangular ROI region.

The coordinates of the positive rectangular ROI obtained in step S2 are sent to the RRoI Learner module of the Roi Transformer to obtain the coordinates of an oblique rectangular ROI region with an inclination angle; the corresponding feature image is then cropped from the feature map output in step S1 according to the coordinates of the oblique rectangular ROI region.

S4: Calculate the IOU value between the feature image of the oblique rectangular ROI region from the previous step and the target marking frame of step S1, and compare the calculated IOU value with a threshold to determine positive and negative samples, wherein the threshold can be dynamically adjusted.

The IOU (Intersection over Union) value measures the degree of overlap between the two regions being compared.

The feature image of the oblique rectangular ROI region is matched against the pre-marked target marking frame by calculating the IOU value between the oblique rectangular ROI and the manually marked target marking frame. The ROI is a positive sample if the calculated IOU value is greater than the threshold, and a negative sample otherwise.

The threshold against which the IOU value is compared can be dynamically adjusted, unlike the fixed threshold commonly used in the prior art. The detailed implementation and principles of the dynamic threshold are described below.

S5: Convert the feature image of the oblique rectangular ROI region of each positive sample into a feature image of the corresponding positive rectangular ROI region.

Because the regression classification network is suited to processing feature images of positive rectangles, the feature image of the oblique rectangular ROI region of each retained positive sample is input to the RoI Align module of the Roi Transformer and converted into a feature image of the corresponding positive rectangular ROI region.

S6: Input the feature image of the positive rectangular ROI region converted in the previous step into the regression classification network to output a detection result.
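The flow of steps S1 to S6 can be summarized in code. The sketch below is only illustrative: every callable (backbone, rpn, rroi_learner, iou_fn, threshold_fn, roi_align, head) is a hypothetical placeholder supplied by the caller, not an interface defined by this document.

```python
def training_forward(sample, gt_box, backbone, rpn, rroi_learner,
                     iou_fn, threshold_fn, roi_align, head):
    """Run one sample through steps S1 to S6 and return the head outputs.

    All callables are hypothetical stand-ins for the networks in the text.
    """
    feature_map = backbone(sample)                 # S1: feature extraction
    horizontal_rois = rpn(feature_map)             # S2: positive rectangular ROIs
    outputs = []
    for hroi in horizontal_rois:
        rroi = rroi_learner(feature_map, hroi)     # S3: oblique rectangular ROI
        iou = iou_fn(rroi, gt_box)                 # S4: IOU with the mark frame
        if iou <= threshold_fn(gt_box):            # S4: dynamic threshold test
            continue                               # negative sample, not converted
        upright = roi_align(feature_map, rroi)     # S5: back to a positive rectangle
        outputs.append(head(upright))              # S6: regression + classification
    return outputs
```

Passing the threshold as a function of the mark frame keeps the dynamic adjustment separate from the rest of the flow.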

The convolutional neural network may be embodied in the following manner.

The convolutional neural network of the embodiment of the invention is mainly different from the traditional two-stage detection network in that a Roi Transformer module is added between the RPN and the regression classification layer, thereby solving the problem of the conversion process from the regular rectangular coordinate to the oblique rectangular coordinate.

The Roi Transformer is mainly composed of two parts, the RRoI Learner and RoI Align. After the RPN and RoI Align, a horizontal ROI region (HRoI, horizontal region of interest) of the form (x, y, w, h) is obtained. In step S3, the HRoI is fed into a fully connected layer with an output dimension of 5, and the regression target is the offset of the rotated ground truth (RGT), i.e. the rotated image region covered by the mark frame, relative to the HRoI. The formula is as follows:

(formula not reproduced in this text)

In the formula, one vector stacks the coordinates, length and width of the transformed oblique rectangular ROI, and the other represents the true coordinates of the mark frame. This completes the work of the RRoI Learner. The obtained 5-dimensional coordinates are mapped onto the feature map of step S1, and the feature image of the oblique rectangular ROI is extracted.
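A plausible form of the missing regression targets is given below. It assumes that this passage follows the published RoI Transformer design from which the module takes its name; this is an assumption, since the formula itself is not reproduced here. The symbols are illustrative: (x_r, y_r, w_r, h_r, \theta_r) is the stacked ROI vector and (x^*, y^*, w^*, h^*, \theta^*) is the ground truth of the mark frame.

```latex
\begin{aligned}
t_x^* &= \frac{1}{w_r}\big((x^* - x_r)\cos\theta_r + (y^* - y_r)\sin\theta_r\big),\\
t_y^* &= \frac{1}{h_r}\big((y^* - y_r)\cos\theta_r - (x^* - x_r)\sin\theta_r\big),\\
t_w^* &= \log\frac{w^*}{w_r},\qquad t_h^* = \log\frac{h^*}{h_r},\\
t_\theta^* &= \frac{1}{2\pi}\big((\theta^* - \theta_r) \bmod 2\pi\big).
\end{aligned}
```

For a horizontal ROI, \theta_r = 0, so the first two targets reduce to the usual normalized center offsets (x^* - x_r)/w_r and (y^* - y_r)/h_r.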

In step S5, to extract rotation-invariant features in the network, a rotated position-sensitive RoI Align module is used, which converts the rotated rectangle into a positive rectangle for use by the final regression classification layer. Each pixel in the oblique rectangular ROI can be calculated by the following transformation formula:

(formula not reproduced in this text)

where x′ and y′ are the coordinates of each vertex of the transformed positive rectangle, and x and y are the coordinates of each vertex of the mark box.

The transformed positive rectangular ROI is sent to the final regression classification layer to obtain the final rotated-frame coordinates.
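The conversion from an oblique ROI to a positive rectangle can be sketched as follows. This is a simplified illustration, not the position-sensitive module itself: it assumes the oblique ROI is given as (cx, cy, w, h, theta) with theta in radians, takes one bilinear sample per output bin, and uses a plain list of lists as a single-channel feature map. All function names are hypothetical.

```python
import math

def sample_points(rroi, out_w, out_h):
    """Feature-map coordinates of each output bin centre for an oblique ROI."""
    cx, cy, w, h, theta = rroi
    c, s = math.cos(theta), math.sin(theta)
    grid = []
    for j in range(out_h):
        row = []
        for i in range(out_w):
            # Bin centre in the ROI's own unrotated frame, origin at the ROI centre.
            u = (i + 0.5) / out_w * w - w / 2
            v = (j + 0.5) / out_h * h - h / 2
            # Rotate by theta and translate to the ROI centre.
            row.append((cx + u * c - v * s, cy + u * s + v * c))
        grid.append(row)
    return grid

def bilinear(feature, x, y):
    """Bilinear interpolation on feature[row][col], clamped to the map borders."""
    rows, cols = len(feature), len(feature[0])
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = feature[y0][x0] * (1 - fx) + feature[y0][x1] * fx
    bottom = feature[y1][x0] * (1 - fx) + feature[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def rotated_roi_align(feature, rroi, out_w, out_h):
    """Resample an oblique ROI into an upright out_h x out_w grid."""
    return [[bilinear(feature, x, y) for x, y in row]
            for row in sample_points(rroi, out_w, out_h)]
```

Because the output grid is always upright and of fixed size, the regression classification layer can consume it in the same way as an ordinary ROI feature.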

However, applying the ROI Transformer module directly in the target defect has two problems:

(1) in the prior art, a fixed IOU threshold is generally used, and a uniform threshold, for example, 0.5, is set, which may cause a problem of too few or too many positive samples matching, thereby affecting the accuracy of detection. In some application scenarios, such as a panel defect detection scenario, a large number of samples with large length-width ratios exist, and the fixed IOU threshold is used to screen positive samples, which may cause the problems of insufficient number of positive samples and low model recall rate. Therefore, the embodiment of the invention adopts a dynamic threshold method to screen positive samples.

(2) At present, a unified rule is not formed in the labeling method of the Roi Transformer, so that a labeled sample cannot train a network well, and a corresponding labeling method needs to be designed according to the characteristics of panel defect detection. Therefore, in the embodiment of the present invention, a uniform labeling manner may be predefined, and the training samples are labeled according to the uniform labeling rule.

The adaptive dynamic threshold may be implemented in the following manner.

In target detection, matching the real mark box with the ROI area is a necessary process, and determines the allocation of positive and negative samples. The matching criterion is the IOU value, when the IOU is larger than a certain threshold value, the matching is successful, and the matching is considered as a positive sample, otherwise, the matching is considered as a negative sample. When matching is performed, the aspect ratio of the mark frame has a great influence on the matching.

As shown in FIG. 2, GT denotes the real target mark box; when the difference between the length and width of the mark box is small, the IOU between the mark box and the oblique rectangular ROI region is large. As shown in FIG. 3, when the angle is the same as in FIG. 2 but the difference between length and width is larger, the IOU is smaller. If the fixed-threshold method of the prior art is adopted, the feature image of the oblique rectangular ROI in FIG. 3 does not satisfy the matching condition.
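The effect illustrated by FIGS. 2 and 3 can be checked numerically. The following is a minimal sketch, not the patented implementation: it assumes each box is given as (cx, cy, w, h, theta) with theta in radians, and computes the IOU of two rotated rectangles by clipping one polygon against the other (Sutherland-Hodgman clipping). The function names and the example box sizes are illustrative assumptions.

```python
import math

def rect_corners(cx, cy, w, h, theta):
    """Corners of a rotated rectangle, counter-clockwise in a y-up frame."""
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]

def polygon_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]))

def clip(subject, clipper):
    """Clip polygon `subject` against convex counter-clockwise polygon `clipper`."""
    output = subject
    for (ax, ay), (bx, by) in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break
        inputs, output = output, []

        def side(p):
            # >= 0 when p lies on or to the left of the directed edge a -> b.
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        for p, q in zip(inputs, inputs[1:] + inputs[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if sp * sq < 0:  # the edge p -> q crosses the clipping line
                t = sp / (sp - sq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return output

def rotated_iou(a, b):
    """IOU of two rotated rectangles given as (cx, cy, w, h, theta)."""
    inter_poly = clip(rect_corners(*a), rect_corners(*b))
    inter = abs(polygon_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

# Same 10 degree angular offset, different aspect ratios.
angle = math.radians(10)
near_square = rotated_iou((0, 0, 40, 30, 0.0), (0, 0, 40, 30, angle))
elongated = rotated_iou((0, 0, 200, 10, 0.0), (0, 0, 200, 10, angle))
print(near_square > 0.5, elongated > 0.5)  # True False
```

With a fixed threshold of 0.5, the near-square pair is matched and the elongated pair is rejected even though both ROIs are misaligned by the same angle.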

Therefore, the embodiment of the invention adopts a method for self-adaptively adjusting the IOU threshold value.

Preferably, a preset threshold is predefined, and the IOU threshold is dynamically adjusted based on the preset threshold, so that the threshold decreases as the length-width difference of the mark frames increases and increases as the length-width difference of the mark frames decreases.

Preferably, denoting the threshold as T, the threshold T satisfies

(formula not reproduced in this text)

where T₀ is the preset threshold, w is the length of the mark frame and h is the width of the mark frame. When w = h, the threshold equals the preset threshold T₀. As the ratio of w to h becomes larger, the threshold T becomes smaller, because the overlapping portion of the real detection frame GT and the ROI shrinks as the length-to-width ratio grows; as the ratio of w to h becomes smaller, the threshold tends to 0.5.

Preferably, T₀ is set to 0.5, in which case the threshold T satisfies the formula:

(formula not reproduced in this text)
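The behavior required of the dynamic threshold can be illustrated with a small sketch. The exact formula is not reproduced in this text, so the function below is a hypothetical stand-in that merely has the stated properties: it equals T₀ when w = h and decreases as the frame becomes more elongated. The power-law form and the exponent alpha are assumptions, not the formula of the invention.

```python
def dynamic_iou_threshold(w, h, t0=0.5, alpha=0.5):
    """Hypothetical adaptive threshold: t0 for a square frame, lower when elongated.

    The form t0 * (short side / long side) ** alpha is an assumed example only.
    """
    if w <= 0 or h <= 0:
        raise ValueError("w and h must be positive")
    ratio = min(w, h) / max(w, h)  # 1.0 for a square, approaches 0 when elongated
    return t0 * ratio ** alpha

def is_positive_sample(iou, w, h, t0=0.5):
    """Positive sample if the IOU exceeds the threshold for this mark frame."""
    return iou > dynamic_iou_threshold(w, h, t0)
```

Under this assumed form, an ROI with an IOU of 0.17 against a 200 x 10 mark frame would be kept as a positive sample, while the same IOU against a 40 x 30 frame would not.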

The marking method of the target marking frame can be implemented in the following manner.

To simplify the labeling process, the real coordinate frame is labeled in the form of four coordinate points, which is automatically converted during training into a coordinate form with an angle. To make the learning of the coordinate-frame angle uniform, the four points are marked in a uniform marking mode. Preferably, a point in a preset direction is selected as the initial point, and the remaining three vertices of the target marking frame are marked in sequence according to a preset direction. As shown in FIG. 4, the top-left corner of the target defect is selected as the initial point, and the vertices are marked 1, 2, 3 and 4 in the clockwise direction. The coordinate frame should be kept as close as possible to a rotated rectangle rather than an arbitrary quadrilateral, so that it can be converted more conveniently into the coordinate form with an angle.
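The conversion from the four marked points to a coordinate form with an angle can be sketched as below. This is one possible conversion under stated assumptions, not necessarily the one used in training: the points are assumed to be ordered 1, 2, 3, 4 clockwise from the top-left corner in image coordinates (y pointing down), the output is assumed to be (cx, cy, w, h, theta), and opposite edges are averaged to tolerate slightly non-rectangular marks. The function name is hypothetical.

```python
import math

def quad_to_rotated_rect(points):
    """Convert four ordered corner points to (cx, cy, w, h, theta).

    theta is the angle of edge 1 -> 2 in radians, measured in image
    coordinates, so a positive value appears clockwise on screen.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = points
    cx = (x1 + x2 + x3 + x4) / 4.0
    cy = (y1 + y2 + y3 + y4) / 4.0
    # Average opposite edges: 1->2 with 4->3 for w, 2->3 with 1->4 for h.
    w = (math.hypot(x2 - x1, y2 - y1) + math.hypot(x3 - x4, y3 - y4)) / 2.0
    h = (math.hypot(x3 - x2, y3 - y2) + math.hypot(x4 - x1, y4 - y1)) / 2.0
    theta = math.atan2(y2 - y1, x2 - x1)
    return cx, cy, w, h, theta
```

A fixed starting corner and marking direction matter here: if the same frame were marked starting from a different vertex, w and h would swap and theta would shift by a quarter turn, giving inconsistent angle targets.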

The target detection system based on deep learning comprises a training module for inputting training samples into a convolutional neural network for training and a detection module for performing inclined target detection using the trained convolutional neural network, wherein the training module comprises a feature extraction module, an ROI (region of interest) extraction module, a region transformation module and a classification module:

the feature extraction module is used for performing feature extraction on the training samples to obtain a feature map of the training samples, and the training samples are marked with target marking frames in advance;

the ROI extraction module is used for extracting ROI regions from the feature map to obtain a feature image of the positive rectangular ROI region;

the region transformation module is used for performing coordinate transformation on the positive rectangular ROI region to obtain a feature image of the corresponding oblique rectangular ROI region; it is also used for calculating the IOU value between the feature image of the oblique rectangular ROI region and the target marking frame, and comparing the IOU value with a threshold to determine positive and negative samples, wherein a preset threshold is predefined for the threshold, and the threshold is dynamically adjusted so that it decreases as the length-width difference of the target marking frame increases and increases as the length-width difference decreases; and it is further used for converting the feature image of the oblique rectangular ROI region of each positive sample into a feature image of the corresponding positive rectangular ROI region;

and the classification module is used for receiving the converted feature image of the positive rectangular ROI region and outputting a detection result.

The implementation principle and technical effect of the target detection system device based on deep learning are similar to those of the target detection method, and are not described herein again.

The target detection method and system based on deep learning of the embodiment of the invention can be applied to the field of panel defect detection, and the detected target is an inclined defect. The method can also be applied to the detection field of other target objects, such as pedestrian detection, defect detection of other scenes and the like.

The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the technical solution of any one of the above-mentioned embodiments of the target detection method. The implementation principle and technical effect are similar to those of the above method, and are not described herein again.

It must be noted that in any of the above embodiments, the steps of the method are not necessarily executed in the order of their sequence numbers; unless the execution logic requires a particular order, they may be executed in any other feasible order.

It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. A target detection method based on deep learning is characterized by comprising a step of inputting training samples into a convolutional neural network for training and a step of utilizing the trained convolutional neural network for inclined target detection, wherein the training comprises the following steps:

2. The deep learning-based target detection method of claim 1, wherein the threshold T satisfies (formula not reproduced in this text).

3. The deep learning-based target detection method of claim 2, wherein the preset threshold T₀ is 0.5.

4. The method for detecting the target based on the deep learning as claimed in any one of claims 1, 2 or 3, wherein a uniform marking mode is predefined, and the training samples are marked according to the marking mode.

5. The target detection method based on deep learning of claim 4, wherein the marking mode is as follows: a point in a preset direction is selected as the initial point, and the remaining three vertices of the target marking frame are marked in sequence according to a preset direction.

6. The method as claimed in claim 1, 2 or 3, wherein the method is applied to the field of panel defect detection, and the detection target is a tilt defect.

7. A deep learning-based target detection system is characterized by comprising a training module for inputting training samples into a convolutional neural network for training and a detection module for detecting inclined targets by using the trained convolutional neural network, wherein the training module comprises:

8. The deep learning-based target detection system of claim 7, wherein the threshold T satisfies (formula not reproduced in this text).

9. A convolutional neural network data processing system for use in target detection, comprising:

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.