CN112365497A - High-speed target detection method and system based on Trident Net and Cascade-RCNN structures - Google Patents
- Publication number
- CN112365497A (application number CN202011405295.2A)
- Authority
- CN
- China
- Prior art keywords
- network
- target detection
- training
- data
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06F18/214 — Pattern recognition; generating training patterns, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T5/00 — Image enhancement or restoration
- G06V2201/07 — Target detection
Abstract
The invention provides a high-speed target detection method based on the TridentNet and Cascade-RCNN structures. The method comprises: obtaining a data set of target detection images and applying enhancement processing to the images in the data set; constructing a neural network comprising a feature extraction network and a prediction network, where the feature extraction network fuses several backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network has a dual-branch structure; training the neural network on the enhanced data set and judging targets against an IOU threshold during training to obtain a target detection model; and performing target detection on the image to be detected with the target detection model. The system comprises modules corresponding to the steps of the method.
Description
Technical Field
The invention relates to the fields of artificial intelligence and machine learning, in particular to the field of deep learning, and discloses a high-speed target detection method and system based on the TridentNet and Cascade-RCNN structures.
Background
In many application scenarios of target detection, the distance between the target and the detector (sensor) varies, so targets of the same class appear at different detection scales; detecting targets of different sizes at a single scale inevitably yields inherent shortcomings and low accuracy, which forces multi-scale detection. Although conventional detection models can detect in a multi-scale manner, their preset templates give targets of different sizes different feature-expression capacities; for example, targets that are too large or too small are difficult to detect accurately. To bring the model's expression capability close for targets of different sizes, TridentNet introduces a scale-aware parallel structure: it first proposed that, in the target detection task, the receptive field affects objects of different sizes and scales; it adopts dilated convolutions to obtain feature maps of different receptive fields, and uses parameter sharing to keep the number of parameters and the computation small.
In the target detection task, the choice of the intersection-over-union (IOU) threshold strongly influences detection: a higher threshold more easily yields high-quality samples. However, blindly pursuing a high threshold raises problems of its own: (1) the reduced number of samples induces overfitting, and (2) using different thresholds during training and inference easily leads to a mismatch. Cascade-RCNN is a cascaded detection structure whose core idea is to train positive and negative samples stage by stage on different networks with a steadily rising threshold, so that the detector at each stage focuses on detecting proposals whose IOU lies in a certain range; since the output IOU is generally larger than the input IOU, the detection effect improves continuously.
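For reference, the intersection-over-union that these thresholds act on can be computed as follows (a minimal sketch; the (x1, y1, x2, y2) corner box format is an illustrative choice, not taken from the patent):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corners. Returns a value in [0, 1]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A proposal is counted as a positive sample when this value exceeds the stage's threshold.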
In order to obtain a more accurate target detection result by combining the advantages of TridentNet and Cascade-RCNN, the patent provides a method for combining two model structures, and the advantages of TridentNet and Cascade-RCNN are fully utilized to solve the multi-scale problem and the IOU threshold selection problem in target detection.
Disclosure of Invention
In order to at least partially solve the above problems, the present invention provides a high-speed target detection method and system based on the TridentNet and Cascade-RCNN structures. The method comprises the following steps:
acquiring a data set of a target detection image, and enhancing the image in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and carrying out target detection on the image to be detected through the target detection model.
the acquiring a data set of a target detection image and performing enhancement processing on the image in the data set includes:
step a1, performing global brightness enhancement processing on the brightness component of the image in the data set by using the following formula:
where L_γ(c, d) denotes the luminance after global brightness adjustment, γ denotes the gamma adjustment coefficient, I(c, d) denotes the original luminance component of an image in the data set, q denotes the standard deviation of the Gaussian function, π denotes the circular constant, exp denotes the exponential function, and (c, d) denote the coordinates of an image pixel in the data set;
step a2, after performing global brightness enhancement processing on the brightness component of the image in the data set, performing saturation enhancement processing according to the following formula:
where H' represents the enhanced saturation component, H represents the original saturation component of the image in the data set, ψ represents the parameter value, and T represents the average luminance of the illumination information.
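The enhancement formulas themselves appear as images in the source and are not reproduced here. As a hedged illustration only, a generic gamma-based brightness adjustment and a simple saturation boost on channels scaled to [0, 1] might look like this (the function names and the simplified saturation form, which omits the patent's ψ and T parameters, are assumptions):

```python
import numpy as np

def gamma_brightness(luma, gamma=0.7):
    """Global gamma adjustment of a luminance channel in [0, 1].
    gamma < 1 brightens dark regions; gamma > 1 darkens them."""
    return np.clip(luma, 0.0, 1.0) ** gamma

def enhance_saturation(sat, factor=1.2):
    """Simple multiplicative saturation boost, clipped to the valid
    range. (Illustrative stand-in: the patent's exact formula uses a
    parameter psi and the mean illumination T, not reproduced here.)"""
    return np.clip(sat * factor, 0.0, 1.0)
```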
Furthermore, the feature extraction network adopts the scale-aware parallel structure of TridentNet and, combined with the characteristics of a feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively.
Furthermore, the backbone network is ResNet-18; the two branches of the dual-branch structure are FC-head and Conv-head, with FC-head used as the classification network and Conv-head as the regression network.
Further, the training the neural network through the enhanced data set, and determining the target according to the IOU threshold in the training process to obtain a target detection model, including:
acquiring an enhanced data set, and removing redundant repeated data;
the data are accurately marked in position and category by using the existing marking software, and a marking file is generated;
the method comprises the steps of enabling a label file to correspond to picture data in a data set one by one, then analyzing the label file to generate tag data in a txt format, and randomly segmenting the data set into a training set, a verification set and a test set;
training a neural network in the target detection model by using the training set data, carrying out target detection model verification observation by using the verification set data in a continuous iteration process, then testing the target detection model on the test set data, and analyzing the detection result of the target detection model.
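The random split described above can be sketched as follows (the 8:1:1 train/verification/test ratio and the function name are illustrative assumptions; the patent does not fix the proportions):

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly split labelled samples into training, verification
    and test subsets, as described in the training procedure."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```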
Further, before training the neural network in the target detection model with the training-set data, the method further includes normalizing the training-set data by the following steps:
step A1, calculating the mean and variance of the samples in the training set according to the following formulas:

μ_a = (1/N) · Σ_{i=1}^{N} x_i,    σ_a² = (1/N) · Σ_{i=1}^{N} (x_i − μ_a)²

where μ_a denotes the mean of the sample data in the training set, σ_a² denotes the variance of the sample data in the training set, and x_i denotes the i-th sample in the training set, i = 1, 2, …, N;
step A2, the training set is normalized according to the following formula:

x̂_i = (x_i − μ_a) / √(σ_a² + ε)

where x̂_i denotes the i-th normalized sample in the training set, and ε denotes a very small constant that prevents division by zero.
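Steps A1 and A2 can be sketched as follows (a minimal NumPy version; the learned reconstruction parameters used during back-propagation are omitted):

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance normalization with a small epsilon
    in the denominator to avoid division by zero (steps A1-A2)."""
    mu = x.mean()          # step A1: sample mean
    var = x.var()          # step A1: sample variance
    return (x - mu) / np.sqrt(var + eps)  # step A2
```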
Further, when training the neural network, the degree of training is measured by the loss function Loss, as follows:
the Loss function Loss is obtained according to the following formula:

Loss = R_cls[h] + R_loc[f]

where R_cls[h] denotes the classification loss function and R_loc[f] denotes the localization loss function;
the classification loss function is expressed as:

R_cls[h] = Σ_i L_cls(h(x_i), y_i)

where h(x_i) denotes the probability estimate of the class posterior distribution, y_i denotes the class label, and L_cls denotes the cross-entropy loss;
the position loss function is expressed as:

R_loc[f] = Σ_i L_loc(f(x_i, b_i), g_i)

where (f(x_i, b_i), g_i) denotes the regression of the predicted box b_i in image region x_i onto the annotated ground-truth box g_i, and L_loc denotes the bounding-box regression loss.
Further, the bounding-box regression loss L_loc uses the smooth L1 loss.
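As a hedged illustration, the smooth L1 loss is commonly defined piecewise as below (the beta parameter and the NumPy formulation are conventional choices, not specified by the patent):

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Smooth L1 (Huber-style) loss on box-regression residuals:
    quadratic for |diff| < beta, linear beyond, which damps the
    exploding gradients that plain L2 produces on outlier boxes."""
    d = np.abs(diff)
    return np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta)
```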
Further, the performing target detection on the image to be detected through the target detection model includes:
acquiring an image to be detected, dividing it in a scale-aware manner into three CNN branches for feature extraction, each CNN using a ResNet-18 backbone for high-speed feature extraction;
concatenating the extracted features at equal size to obtain the concatenated features;
performing high-speed feature extraction and processing on the concatenated features with the ResNet-18 backbone, dividing the result into three sub-features, setting a different IOU threshold for each sub-feature, cascading the feature maps of the three-way bounding-box regression, and taking the classification and bounding-box regression results after the third cascade stage as the final target detection result.
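The three-branch, shared-weight, dilated-convolution idea behind the Trident part can be illustrated in one dimension (a simplified sketch; real TridentNet uses 2-D dilated convolutions inside ResNet blocks, and the function names here are illustrative):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """1-D dilated convolution with valid padding. The same kernel
    (shared weights, as across TridentNet's branches) covers a
    receptive field of (len(kernel) - 1) * rate + 1 samples."""
    k = len(kernel)
    span = (k - 1) * rate + 1
    out = [sum(kernel[j] * x[i + j * rate] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out)

def receptive_field(kernel_size, rate):
    """Receptive field grows with the dilation rate while the
    parameter count stays fixed."""
    return (kernel_size - 1) * rate + 1
```

Three branches with rates 1, 2 and 3 thus see receptive fields of 3, 5 and 7 with one shared 3-tap kernel, which is the mechanism the scale-aware structure exploits.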
Further, the system comprises:
the data enhancement module is used for acquiring a data set of a target detection image and enhancing the image in the data set;
the network construction module is used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network is fused with a plurality of backbone networks and comprises a feature pyramid network, each backbone network is fused with a deformable convolution network, and the prediction network comprises a double-branch structure;
the model acquisition module is used for training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and the target detection module is used for carrying out target detection on the image to be detected through the target detection model.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a high-speed target detection method based on the TridentNet and Cascade-RCNN structures, which comprises: obtaining a data set of target detection images and applying enhancement processing to the images in the data set; constructing a neural network comprising a feature extraction network and a prediction network, where the feature extraction network fuses several backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network has a dual-branch structure; training the neural network on the enhanced data set and judging targets against an IOU threshold during training to obtain a target detection model; and performing target detection on the image to be detected with the target detection model. By designing a reasonable, high-speed multi-scale detection model, the accuracy and efficiency of target detection can be greatly improved.
The following description of the preferred embodiments for carrying out the present invention will be made in detail with reference to the accompanying drawings so that the features and advantages of the present invention can be easily understood.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments of the present invention will be briefly described below. Wherein the drawings are only for purposes of illustrating some embodiments of the invention and are not to be construed as limiting the invention to all embodiments thereof.
FIG. 1 is a flow chart of a high-speed target detection method based on TridentNet and Cascade-RCNN structures according to the present invention;
FIG. 2 is a block diagram of a high-speed target detection system based on TridentNet and Cascade-RCNN structures according to the present invention;
FIG. 3 is a diagram of the Trident-Cascade-RCNN neural network structure of the present invention;
FIG. 4 is an original test data diagram;
FIG. 5 is a diagram showing the detection result of the Trident-Cascade-RCNN dual-structure network of the present invention;
FIG. 6 is a diagram showing the results of detection by the Cascade-RCNN network;
fig. 7 is a graph of the detection results of the TridentNet network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 7, the technical problem to be solved by the present invention is to provide a method and a system for detecting a high-speed target based on a TridentNet and Cascade-RCNN structure, wherein the method comprises:
acquiring a data set of a target detection image, and enhancing the image in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and carrying out target detection on the image to be detected through the target detection model.
The working principle of the technical scheme is as follows: firstly, a data set of target detection images is acquired and the images in the data set are enhanced. Then, a neural network is constructed, comprising a feature extraction network and a prediction network. The feature extraction network draws on the scale-aware parallel structure of TridentNet and, combined with the characteristics of a feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively; the feature extraction network fuses several backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network has a dual-branch structure. The backbone network is ResNet-18; the two branches are FC-head and Conv-head, with FC-head used as the classification network and Conv-head as the regression network. Next, the neural network is trained on the enhanced data set, and targets are judged against an IOU threshold during training to obtain the target detection model. Finally, target detection is performed on the image to be detected with the target detection model.
The beneficial effects of the above technical scheme are: by designing a reasonable, high-speed multi-scale target detection model according to this scheme, the accuracy and efficiency of target detection can be greatly improved. The target detection model comprises a neural network which, drawing on Faster-RCNN, adopts FPN (feature pyramid network) and RPN (region proposal network) structures in sequence to obtain the feature mappings of candidate boxes, so that the feature extraction network extracts features more effectively, the target detection model is trained better, overfitting is reduced, and the accuracy and robustness of the target detection model are greatly improved.
In an embodiment provided by the present invention, the acquiring a data set of a target detection image and performing enhancement processing on an image in the data set includes:
step a1, performing global brightness enhancement processing on the brightness component of the image in the data set by using the following formula:
where L_γ(c, d) denotes the luminance after global brightness adjustment, γ denotes the gamma adjustment coefficient, I(c, d) denotes the original luminance component of an image in the data set, q denotes the standard deviation of the Gaussian function, π denotes the circular constant, exp denotes the exponential function, and (c, d) denote the coordinates of an image pixel in the data set;
step a2, after performing global brightness enhancement processing on the brightness component of the image in the data set, performing saturation enhancement processing according to the following formula:
where H' represents the enhanced saturation component, H represents the original saturation component of the image in the data set, ψ represents the parameter value, and T represents the average luminance of the illumination information.
The beneficial effects of the above technical scheme are: this technique enhances the feature information of the images. Raising the brightness and saturation of an image makes different regions easier to distinguish, so the feature information becomes more sensitive and more salient. The algorithm adjusts brightness with a Gaussian function; as the standard deviation of the Gaussian increases, the image contrast increases but the luminance value decreases. To address this, the saturation component of the image is processed as well, so that the contrast is enhanced, the feature information of the images in the database becomes more salient, and feature extraction is facilitated.
In an embodiment of the present invention, the training of the neural network by the enhanced data set, and the determining of the target according to the IOU threshold in the training process to obtain the target detection model include:
acquiring an enhanced data set, and removing redundant repeated data;
the data are accurately marked in position and category by using the existing marking software, and a marking file is generated;
the method comprises the steps of enabling a label file to correspond to picture data in a data set one by one, then analyzing the label file to generate tag data in a txt format, and randomly segmenting the data set into a training set, a verification set and a test set;
training a neural network in the target detection model by using the training set data, carrying out target detection model verification observation by using the verification set data in a continuous iteration process, then testing the target detection model on the test set data, and analyzing the detection result of the target detection model.
The working principle of the technical scheme is as follows: firstly, the enhanced data set is acquired and redundant data are removed, where PCA (principal component analysis) is used to remove the redundancy; secondly, the positions and categories of the data are accurately annotated with the existing annotation software labelImg, and annotation files are generated; next, the annotation files are matched one-to-one with the picture data in the data set, the annotation files are parsed to generate label data in txt format, and the data set is randomly split into a training set, a verification set and a test set; finally, the neural network in the target detection model is trained with the training-set data, where the data must first be normalized by the following steps:
step A1, calculating the mean and variance of the samples in the training set according to the following formulas:

μ_a = (1/N) · Σ_{i=1}^{N} x_i,    σ_a² = (1/N) · Σ_{i=1}^{N} (x_i − μ_a)²

where μ_a denotes the mean of the sample data in the training set, σ_a² denotes the variance of the sample data in the training set, and x_i denotes the i-th sample in the training set, i = 1, 2, …, N;
step A2, the training set is normalized according to the following formula:

x̂_i = (x_i − μ_a) / √(σ_a² + ε)

where x̂_i denotes the i-th normalized sample and ε denotes a small constant that prevents a division error when the denominator would otherwise be zero. When the gradient is back-propagated, the normalized data need to be reconstructed as y_i = ξ · x̂_i + β, where ξ and β are parameters to be learned; in order not to change the distribution characteristics of the data, they should satisfy ξ = √(σ_a² + ε) and β = μ_a. When training the neural network, the degree of training is measured by the loss function Loss, as follows:
the Loss function Loss is obtained according to the following formula:

Loss = R_cls[h] + R_loc[f]

where R_cls[h] denotes the classification loss function and R_loc[f] denotes the localization loss function;
the classification loss function is expressed as:

R_cls[h] = Σ_i L_cls(h(x_i), y_i)

where h(x_i) denotes the probability estimate of the class posterior distribution, y_i denotes the class label, and L_cls denotes the cross-entropy loss;
the position loss function is expressed as:
wherein, (f (x)i,bi),gi) Representing an image area xiPredicted frame biRegression to labeled label borders, LlocRepresenting the frame regression loss, frame regression loss LlocUsing smoothed L1Loss; the method comprises the steps of using verification set data to verify and observe a target detection model in a continuous iteration process, testing the target detection model on the test set data, and analyzing a detection result of the target detection model, wherein the target detection model is realized based on Trident Net and Cascade-RCNN double-structure networks, the Trident module and the Cascade module are respectively marked as a Trident module and a Cascade module, the Trident module adopts a scale-aware parallel structure mode and is divided into three paths to extract CNN features, and each CNN has a backbone structure of ResNet-18 so as to comprehensively consider the capability and efficiency problems of feature extraction. The CNN structure uses hole convolution (scaled convolution) with different convolution rates to obtain feature maps under different receptive fields. The three branches share weight parameters, and the features of different scales are extracted according to the receptive fields of different scales, so that the Trident part of the model has a better feature extraction effect on the targets of different scales. 
Then, performing localization operation, and performing feature merging to send the feature merged into the next network module; the Cascaded module directly divides the characteristics output by the Trident module into three sub-networks according to the difference of IOU thresholds, the detection frame after the regression of the first sub-network is input into the second sub-network for frame classification and regression, the detection frame after the regression of the second sub-network is input into the third sub-network for frame classification and regression, and the final result after the classification and regression of the third sub-network is the final network output result. In the Cascade module, the IOU thresholds of the three sub-networks are respectively set to be 0.4, 0.6 and 0.8 according to the actual trial situation, so that the optimal average classification and regression precision can be obtained; the data of the test set can be respectively detected in TridentNet and Cascade-RCNN, so thatAnd analyzing the detection results of the Cascade-RCNN network, the TridentNet network and the TridentNet and Cascade-RCNN double structures.
The beneficial effects of the above technical scheme are: by adopting the scale-aware parallel structure and extracting CNN features in three paths, useful features can be extracted better and at higher speed. Normalizing the training-set data before training makes the data follow a normal distribution, so the loss function decreases more readily, overfitting is prevented, and the network parameters fit the training data better, greatly improving the precision and accuracy of the target detection model. The bounding-box regression term of the loss function adopts the smooth L1 loss, which takes the absolute value of the residual and better prevents gradient explosion during iteration. When the loss has dropped to a certain value, the verification set and the test set are used for verification and testing, and the quality of the model is measured by printing the mAP of the target detection model; detection and recognition are performed automatically by the computer without extra manual maintenance, greatly raising the level of intelligence.
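The stage-by-stage IOU selection in the Cascade module can be sketched as follows (a minimal Python illustration; the helper and the 0.4/0.6/0.8 thresholds follow the text above, but the per-stage box regression of a real Cascade-RCNN, which raises each surviving box's IOU before the next stage, is omitted):

```python
def cascade_filter(proposals, gt_box, thresholds=(0.4, 0.6, 0.8)):
    """Pass proposals through successive stages; each stage keeps only
    boxes whose IOU with the ground truth exceeds its threshold."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0
    survivors = proposals
    for t in thresholds:
        # In a full Cascade-RCNN, each survivor would be re-regressed here.
        survivors = [p for p in survivors if iou(p, gt_box) > t]
    return survivors
```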
A high-speed object detection system based on the TridentNet and Cascade-RCNN architecture, the system comprising:
the data enhancement module is used for acquiring a data set of a target detection image and enhancing the image in the data set;
the network construction module is used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network is fused with a plurality of backbone networks and comprises a feature pyramid network, each backbone network is fused with a deformable convolution network, and the prediction network comprises a double-branch structure;
the model acquisition module is used for training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and the target detection module is used for carrying out target detection on the image to be detected through the target detection model.
The working principle of the technical scheme is as follows: first, the data enhancement module acquires a data set of target detection images and performs enhancement processing on the images in the data set. Then the network building module constructs a neural network comprising a feature extraction network and a prediction network. The feature extraction network draws on the scale-aware parallel structure of TridentNet and, combined with the characteristics of the feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively. The feature extraction network fuses several backbone networks and includes the feature pyramid network; a deformable convolution network is fused into each backbone network. The prediction network comprises a double-branch structure: the backbone network is ResNet-18, and the two branches are FC-head and Conv-head, with FC-head serving as the classification network and Conv-head as the regression network. Next, the model acquisition module trains the neural network on the enhanced data set, judging targets against an IOU threshold during training to obtain the target detection model. Finally, the target detection module performs target detection on the image to be detected through the target detection model.
The beneficial effects of the above technical scheme are as follows. By designing a reasonable, high-speed multi-scale target detection model according to the technical scheme, the accuracy and efficiency of target detection are greatly improved. The target detection model comprises a neural network which, drawing on Faster-RCNN, applies an FPN (Feature Pyramid Network) and then an RPN (Region Proposal Network) in turn to obtain the feature maps of the candidate frames, so that the feature extraction network extracts features better, the target detection model trains better, overfitting is reduced, and the accuracy and robustness of the target detection model are greatly improved.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (10)
1. A high-speed target detection method based on the TridentNet and Cascade-RCNN structures, characterized in that the method comprises the following steps:
acquiring a data set of a target detection image, and enhancing the image in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and carrying out target detection on the image to be detected through the target detection model.
2. The method as claimed in claim 1, wherein the step of obtaining a data set of target detection images and performing enhancement processing on the images in the data set comprises:
step a1, performing global brightness enhancement processing on the brightness component of the image in the data set by using the following formula:
where L_γ(c, d) represents the brightness function after global brightness adjustment, γ represents the gamma change coefficient, I(c, d) represents the original brightness component of the image in the data set, q represents the standard deviation of the Gaussian function, π represents the circle constant, exp represents the exponential function, and (c, d) represents the coordinates of an image pixel in the data set;
step a2, after performing global brightness enhancement processing on the brightness component of the image in the data set, performing saturation enhancement processing according to the following formula:
where H' represents the enhanced saturation component, H represents the original saturation component of the image in the data set, ψ represents the parameter value, and T represents the average luminance of the illumination information.
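The exact formulas in claim 2 appear as images in the original filing and are not reproduced here. As a rough, hypothetical sketch of the two enhancement steps (a standard gamma brightening of the luminance channel and a simple multiplicative saturation boost, not the patent's exact expressions; `gamma` and `psi` are illustrative stand-ins for the γ and ψ parameters):

```python
def enhance_brightness(luma, gamma=0.7):
    # Gamma adjustment of a luminance value in [0, 1];
    # gamma < 1 brightens dark regions (illustrative parameter choice).
    return luma ** gamma

def enhance_saturation(sat, psi=1.2):
    # Simple multiplicative saturation boost, clipped to [0, 1];
    # psi is a hypothetical stand-in for the claim's parameter value.
    return min(1.0, sat * psi)
```

Both operations are applied per pixel before training, which is why they appear ahead of the data-set split in the method.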
3. The method as claimed in claim 1, wherein the feature extraction network draws on the scale-aware parallel structure of TridentNet and, combined with the characteristics of the feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively.
4. The method as claimed in claim 1, wherein the backbone network is ResNet-18, the dual-branch structures are FC-head and Conv-head, the FC-head is a classification network, and the Conv-head is a regression network.
5. The method as claimed in claim 1, wherein the training of the neural network is performed by the enhanced data set, and the target is judged according to the IOU threshold during the training process to obtain the target detection model, and the method comprises:
acquiring an enhanced data set, and removing redundant repeated data;
the data are accurately marked in position and category by using the existing marking software, and a marking file is generated;
the method comprises the steps of enabling a label file to correspond to picture data in a data set one by one, then analyzing the label file to generate tag data in a txt format, and randomly segmenting the data set into a training set, a verification set and a test set;
training a neural network in the target detection model by using the training set data, carrying out target detection model verification observation by using the verification set data in a continuous iteration process, then testing the target detection model on the test set data, and analyzing the detection result of the target detection model.
6. The method of claim 5, wherein the training of the neural network in the target detection model using the training set data further comprises: the method comprises the following steps of carrying out normalization processing on training set data:
step A1, calculating the mean and variance of the samples in the training set according to the following formulas: μ_a = (1/N)·Σᵢ xᵢ and σ_a² = (1/N)·Σᵢ (xᵢ − μ_a)², with the sums taken over i = 1, …, N;
where μ_a represents the mean of the sample data in the training set, σ_a² represents the variance of the sample data in the training set, xᵢ represents the i-th sample in the training set, and i = 1, 2, …, N;
step A2, normalizing the training set according to the following formula: x̂ᵢ = (xᵢ − μ_a) / σ_a, so that the normalized samples have zero mean and unit variance.
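The two normalization steps of claim 6 amount to standard z-score normalization, which can be sketched as:

```python
import math

def normalize(samples):
    # Step A1: mean and (population) variance of the training samples.
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    sigma = math.sqrt(var)
    # Step A2: subtract the mean and divide by the standard deviation,
    # giving zero-mean, unit-variance data.
    return [(x - mu) / sigma for x in samples]
```

Centering and scaling the inputs this way is what the description credits with letting the loss decrease more readily during training.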
7. The method as claimed in claim 4, wherein the measuring of the training degree of the neural network according to the Loss function Loss during the training of the neural network comprises:
the Loss function Loss is obtained according to the following formula:
Loss = R_cls[h] + R_loc[f]
where R_cls[h] represents the classification loss function and R_loc[f] represents the position loss function;
the classification loss function is expressed as:
where h(xᵢ) represents the probability estimate of the posterior class distribution, yᵢ represents the class label, and L_cls represents the cross-entropy loss;
the position loss function is expressed as:
where (f(xᵢ, bᵢ), gᵢ) represents the regression of the predicted frame bᵢ of image region xᵢ onto the labeled ground-truth frame gᵢ, and L_loc represents the frame regression loss.
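The additive loss of claim 7 can be sketched for a single binary example; the classification term is the cross-entropy named above, and the position term is left as an input here (claim 8 specifies it as smooth L1):

```python
import math

def cross_entropy(p, y):
    # p: predicted probability of the positive class, y: label in {0, 1};
    # eps guards against log(0).
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(p, y, loc_loss):
    # Loss = R_cls[h] + R_loc[f], as in the claim.
    return cross_entropy(p, y) + loc_loss
```

Summing the two terms trains classification and frame regression jointly, which is the standard two-head detection objective the claim describes.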
8. The method as claimed in claim 7, wherein the frame regression loss L_loc uses the smooth L1 loss.
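The smooth L1 loss of claim 8 is quadratic near zero and linear beyond a breakpoint, so its gradient is bounded by 1, which is what resists the gradient explosion mentioned in the description. A minimal sketch (beta = 1 is the conventional default, not a value stated in the claim):

```python
def smooth_l1(x, beta=1.0):
    # Quadratic for |x| < beta, linear beyond; continuous in value and
    # slope at the breakpoint, with gradient magnitude capped at 1.
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta
```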
9. The method as claimed in claim 1, wherein the performing of the object detection on the image to be detected by the object detection model comprises:
acquiring an image to be detected, dividing it in a scale-aware manner into three CNN paths for feature extraction, each path performing high-speed feature extraction with a ResNet-18 backbone;
concatenating the extracted features at equal size to obtain the concatenated features;
and performing high-speed feature extraction and processing on the concatenated features with the ResNet-18 backbone, dividing the extracted features into three sub-features, setting a different IOU value for each sub-feature, performing a cascade operation on the feature maps of the three-way frame regression, and taking the classification result and frame regression result after the third cascade stage as the final target detection result.
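The three scale-aware paths of claim 9 follow TridentNet, where parallel branches share the same convolution weights but use different dilation rates, giving each branch a different receptive field. A 1-D toy sketch of that idea (real branches are 2-D convolutions inside a ResNet-18 backbone):

```python
def dilated_conv1d(signal, kernel, dilation):
    # 'Valid' 1-D convolution with a dilated kernel: taps are spaced
    # `dilation` samples apart, widening the receptive field.
    k = len(kernel)
    span = (k - 1) * dilation
    return [sum(kernel[j] * signal[i + j * dilation] for j in range(k))
            for i in range(len(signal) - span)]

def trident_branches(signal, kernel, dilations=(1, 2, 3)):
    # One branch per dilation rate; identical (shared) weights, so only
    # the receptive field differs between branches, as in TridentNet.
    return [dilated_conv1d(signal, kernel, d) for d in dilations]
```

Sharing the kernel across branches keeps the parameter count of the trident block the same as a single branch, while the dilation ladder covers small, medium and large targets.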
10. A high-speed object detection system based on the TridentNet and Cascade-RCNN architecture, the system comprising:
the data enhancement module is used for acquiring a data set of a target detection image and enhancing the image in the data set;
the network construction module is used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network is fused with a plurality of backbone networks and comprises a feature pyramid network, each backbone network is fused with a deformable convolution network, and the prediction network comprises a double-branch structure;
the model acquisition module is used for training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and the target detection module is used for carrying out target detection on the image to be detected through the target detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011405295.2A CN112365497A (en) | 2020-12-02 | 2020-12-02 | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112365497A (en) | 2021-02-12 |
Family
ID=74535914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011405295.2A Pending CN112365497A (en) | 2020-12-02 | 2020-12-02 | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112365497A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326924A (en) * | 2021-06-07 | 2021-08-31 | 太原理工大学 | Depth neural network-based key target photometric positioning method in sparse image |
CN113591617A (en) * | 2021-07-14 | 2021-11-02 | 武汉理工大学 | Water surface small target detection and classification method based on deep learning |
CN113780193A (en) * | 2021-09-15 | 2021-12-10 | 易采天成(郑州)信息技术有限公司 | RCNN-based cattle group target detection method and equipment |
CN113869361A (en) * | 2021-08-20 | 2021-12-31 | 深延科技(北京)有限公司 | Model training method, target detection method and related device |
CN115526874A (en) * | 2022-10-08 | 2022-12-27 | 哈尔滨市科佳通用机电股份有限公司 | Round pin of brake adjuster control rod and round pin split pin loss detection method |
CN115527059A (en) * | 2022-08-16 | 2022-12-27 | 贵州博睿科讯科技发展有限公司 | Road-related construction element detection system and method based on AI (Artificial Intelligence) identification technology |
CN115931359A (en) * | 2023-03-03 | 2023-04-07 | 西安航天动力研究所 | Turbine pump bearing fault diagnosis method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102779330A (en) * | 2012-06-13 | 2012-11-14 | 京东方科技集团股份有限公司 | Image reinforcement method, image reinforcement device and display device |
US20190095795A1 (en) * | 2017-03-15 | 2019-03-28 | Samsung Electronics Co., Ltd. | System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions |
CN110852349A (en) * | 2019-10-21 | 2020-02-28 | 上海联影智能医疗科技有限公司 | Image processing method, detection method, related equipment and storage medium |
CN111814755A (en) * | 2020-08-18 | 2020-10-23 | 深延科技(北京)有限公司 | Multi-frame image pedestrian detection method and device for night motion scene |
2020-12-02: CN202011405295.2A filed in China; published as CN112365497A; legal status: Pending
Non-Patent Citations (3)
Title |
---|
Sima Haifeng et al., "Intelligent Computing Methods in Remote Sensing Image Classification", 31 January 2018 *
Yang Dongfang et al., "Application and Research of Mathematical Models in Ecology", 31 March 2019 *
Lei Bangjun et al., "A Step-by-Step Guide to Video Target Tracking Systems", 31 December 2015 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326924B (en) * | 2021-06-07 | 2022-06-14 | 太原理工大学 | Depth neural network-based key target photometric positioning method in sparse image |
CN113326924A (en) * | 2021-06-07 | 2021-08-31 | 太原理工大学 | Depth neural network-based key target photometric positioning method in sparse image |
CN113591617B (en) * | 2021-07-14 | 2023-11-28 | 武汉理工大学 | Deep learning-based water surface small target detection and classification method |
CN113591617A (en) * | 2021-07-14 | 2021-11-02 | 武汉理工大学 | Water surface small target detection and classification method based on deep learning |
CN113869361A (en) * | 2021-08-20 | 2021-12-31 | 深延科技(北京)有限公司 | Model training method, target detection method and related device |
CN113780193A (en) * | 2021-09-15 | 2021-12-10 | 易采天成(郑州)信息技术有限公司 | RCNN-based cattle group target detection method and equipment |
CN113780193B (en) * | 2021-09-15 | 2024-09-24 | 易采天成(郑州)信息技术有限公司 | RCNN-based cattle group target detection method and RCNN-based cattle group target detection equipment |
CN115527059A (en) * | 2022-08-16 | 2022-12-27 | 贵州博睿科讯科技发展有限公司 | Road-related construction element detection system and method based on AI (Artificial Intelligence) identification technology |
CN115527059B (en) * | 2022-08-16 | 2024-04-09 | 贵州博睿科讯科技发展有限公司 | System and method for detecting road construction elements based on AI (advanced technology) recognition technology |
CN115526874B (en) * | 2022-10-08 | 2023-05-12 | 哈尔滨市科佳通用机电股份有限公司 | Method for detecting loss of round pin and round pin cotter pin of brake adjuster control rod |
CN115526874A (en) * | 2022-10-08 | 2022-12-27 | 哈尔滨市科佳通用机电股份有限公司 | Round pin of brake adjuster control rod and round pin split pin loss detection method |
CN115931359B (en) * | 2023-03-03 | 2023-07-14 | 西安航天动力研究所 | Turbine pump bearing fault diagnosis method and device |
CN115931359A (en) * | 2023-03-03 | 2023-04-07 | 西安航天动力研究所 | Turbine pump bearing fault diagnosis method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
CN108830188B (en) | Vehicle detection method based on deep learning | |
CN111179251B (en) | Defect detection system and method based on twin neural network and by utilizing template comparison | |
EP3478728B1 (en) | Method and system for cell annotation with adaptive incremental learning | |
CN113160192B (en) | Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background | |
CN106803247B (en) | Microangioma image identification method based on multistage screening convolutional neural network | |
WO2022012110A1 (en) | Method and system for recognizing cells in embryo light microscope image, and device and storage medium | |
CN109285139A (en) | A kind of x-ray imaging weld inspection method based on deep learning | |
CN108564085B (en) | Method for automatically reading of pointer type instrument | |
CN111815564B (en) | Method and device for detecting silk ingots and silk ingot sorting system | |
CN109919934A (en) | A kind of liquid crystal display panel defect inspection method based on the study of multi-source domain depth migration | |
CN106340016A (en) | DNA quantitative analysis method based on cell microscope image | |
CN109284779A (en) | Object detection method based on deep full convolution network | |
CN110599463B (en) | Tongue image detection and positioning algorithm based on lightweight cascade neural network | |
CN113435407B (en) | Small target identification method and device for power transmission system | |
CN116012291A (en) | Industrial part image defect detection method and system, electronic equipment and storage medium | |
CN113205511B (en) | Electronic component batch information detection method and system based on deep neural network | |
CN112613428B (en) | Resnet-3D convolution cattle video target detection method based on balance loss | |
CN109840483A (en) | A kind of method and device of landslide fissure detection and identification | |
CN115147363A (en) | Image defect detection and classification method and system based on deep learning algorithm | |
CN112381806A (en) | Double centromere aberration chromosome analysis and prediction method based on multi-scale fusion method | |
CN115294377A (en) | System and method for identifying road cracks | |
CN110287970B (en) | Weak supervision object positioning method based on CAM and covering | |
CN115439654A (en) | Method and system for finely dividing weakly supervised farmland plots under dynamic constraint | |
CN109993728B (en) | Automatic detection method and system for deviation of thermal transfer glue |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | Application publication date: 20210212 |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |