CN112365497A - High-speed target detection method and system based on Trident Net and Cascade-RCNN structures - Google Patents
- Publication number
- CN112365497A (application number CN202011405295.2A)
- Authority
- CN
- China
- Prior art keywords
- network
- target detection
- training
- data
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06F18/214 — Pattern recognition; generating training patterns, e.g. bagging or boosting
- G06N3/045 — Neural networks; combinations of networks
- G06N3/08 — Neural networks; learning methods
- G06T5/00 — Image enhancement or restoration
- G06V2201/07 — Target detection
Abstract
The invention provides a high-speed target detection method based on the TridentNet and Cascade-RCNN structures. The method comprises: obtaining a data set of target detection images and applying enhancement processing to the images in the data set; constructing a neural network comprising a feature extraction network and a prediction network, where the feature extraction network fuses several backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network has a dual-branch structure; training the neural network on the enhanced data set and judging targets against an IOU threshold during training to obtain a target detection model; and performing target detection on the image to be detected with the target detection model. The system comprises modules corresponding to the steps of the method.
Description
Technical Field
The invention relates to the fields of artificial intelligence and machine learning, in particular to the field of deep learning, and discloses a high-speed target detection method and system based on the TridentNet and Cascade-RCNN structures.
Background
In many application scenarios of target detection, the distance between the target and the detector (sensor) varies, so targets of the same class appear at different detection scales; detecting targets of different sizes at a single scale inevitably yields inherent shortcomings and low accuracy, which forces multi-scale detection. Although conventional detection models can detect in a multi-scale manner, their preset templates give targets of different sizes different feature-expression capacities; for example, targets that are too large or too small are difficult to detect accurately. To bring the model's expression capability close for targets of different sizes, TridentNet introduces a scale-aware parallel structure: it first proposed that, in the target detection task, the receptive field affects objects of different sizes and scales; it adopts dilated convolutions to obtain feature maps of different receptive fields, and uses parameter sharing to keep the number of parameters and the computation small.
In the target detection task, the choice of the intersection-over-union (IOU) threshold strongly influences detection: a higher threshold more easily yields high-quality samples. However, blindly pursuing a high threshold raises problems of its own: (1) the reduced number of samples induces overfitting, and (2) using different thresholds during training and inference easily leads to a mismatch. Cascade-RCNN is a cascaded detection structure whose core idea is to train positive and negative samples stage by stage on different networks with a steadily rising threshold, so that the detector at each stage focuses on detecting proposals whose IOU lies in a certain range; since the output IOU is generally larger than the input IOU, the detection effect improves continuously.
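For reference, the intersection-over-union that these thresholds act on can be computed as follows (a minimal sketch; the (x1, y1, x2, y2) corner box format is an illustrative choice, not taken from the patent):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as
    (x1, y1, x2, y2) corners. Returns a value in [0, 1]."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A proposal is counted as a positive sample when this value exceeds the stage's threshold.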
In order to obtain a more accurate target detection result by combining the advantages of TridentNet and Cascade-RCNN, the patent provides a method for combining two model structures, and the advantages of TridentNet and Cascade-RCNN are fully utilized to solve the multi-scale problem and the IOU threshold selection problem in target detection.
Disclosure of Invention
In order to at least partially solve the above problems, the present invention provides a high-speed target detection method and system based on the TridentNet and Cascade-RCNN structures. The method comprises the following steps:
acquiring a data set of a target detection image, and enhancing the image in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and carrying out target detection on the image to be detected through the target detection model.
the acquiring a data set of a target detection image and performing enhancement processing on the image in the data set includes:
step a1, performing global brightness enhancement processing on the brightness component of the image in the data set by using the following formula:
where L_γ(c, d) denotes the luminance after global brightness adjustment, γ denotes the gamma adjustment coefficient, I(c, d) denotes the original luminance component of an image in the data set, q denotes the standard deviation of the Gaussian function, π denotes the circular constant, exp denotes the exponential function, and (c, d) denote the coordinates of an image pixel in the data set;
step a2, after performing global brightness enhancement processing on the brightness component of the image in the data set, performing saturation enhancement processing according to the following formula:
where H' represents the enhanced saturation component, H represents the original saturation component of the image in the data set, ψ represents the parameter value, and T represents the average luminance of the illumination information.
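The enhancement formulas themselves appear as images in the source and are not reproduced here. As a hedged illustration only, a generic gamma-based brightness adjustment and a simple saturation boost on channels scaled to [0, 1] might look like this (the function names and the simplified saturation form, which omits the patent's ψ and T parameters, are assumptions):

```python
import numpy as np

def gamma_brightness(luma, gamma=0.7):
    """Global gamma adjustment of a luminance channel in [0, 1].
    gamma < 1 brightens dark regions; gamma > 1 darkens them."""
    return np.clip(luma, 0.0, 1.0) ** gamma

def enhance_saturation(sat, factor=1.2):
    """Simple multiplicative saturation boost, clipped to the valid
    range. (Illustrative stand-in: the patent's exact formula uses a
    parameter psi and the mean illumination T, not reproduced here.)"""
    return np.clip(sat * factor, 0.0, 1.0)
```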
Furthermore, the feature extraction network adopts the scale-aware parallel structure of TridentNet and, combined with the characteristics of a feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively.
Furthermore, the backbone network is ResNet-18; the two branches of the dual-branch structure are FC-head and Conv-head, with FC-head used as the classification network and Conv-head as the regression network.
Further, the training the neural network through the enhanced data set, and determining the target according to the IOU threshold in the training process to obtain a target detection model, including:
acquiring an enhanced data set, and removing redundant repeated data;
the data are accurately marked in position and category by using the existing marking software, and a marking file is generated;
the method comprises the steps of enabling a label file to correspond to picture data in a data set one by one, then analyzing the label file to generate tag data in a txt format, and randomly segmenting the data set into a training set, a verification set and a test set;
training a neural network in the target detection model by using the training set data, carrying out target detection model verification observation by using the verification set data in a continuous iteration process, then testing the target detection model on the test set data, and analyzing the detection result of the target detection model.
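The random split described above can be sketched as follows (the 8:1:1 train/verification/test ratio and the function name are illustrative assumptions; the patent does not fix the proportions):

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly split labelled samples into training, verification
    and test subsets, as described in the training procedure."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```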
Further, before training the neural network in the target detection model with the training-set data, the method further includes normalizing the training-set data by the following steps:
step A1, calculating the mean and variance of the samples in the training set according to the following formulas:

μ_a = (1/N) · Σ_{i=1}^{N} x_i,    σ_a² = (1/N) · Σ_{i=1}^{N} (x_i − μ_a)²

where μ_a denotes the mean of the sample data in the training set, σ_a² denotes the variance of the sample data in the training set, and x_i denotes the i-th sample in the training set, i = 1, 2, …, N;
step A2, the training set is normalized according to the following formula:

x̂_i = (x_i − μ_a) / √(σ_a² + ε)

where x̂_i denotes the i-th normalized sample in the training set, and ε denotes a very small constant that prevents division by zero.
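Steps A1 and A2 can be sketched as follows (a minimal NumPy version; the learned reconstruction parameters used during back-propagation are omitted):

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance normalization with a small epsilon
    in the denominator to avoid division by zero (steps A1-A2)."""
    mu = x.mean()          # step A1: sample mean
    var = x.var()          # step A1: sample variance
    return (x - mu) / np.sqrt(var + eps)  # step A2
```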
Further, when training the neural network, the degree of training is measured by the loss function Loss, as follows:
the Loss function Loss is obtained according to the following formula:

Loss = R_cls[h] + R_loc[f]

where R_cls[h] denotes the classification loss function and R_loc[f] denotes the localization loss function;
the classification loss function is expressed as:

R_cls[h] = Σ_i L_cls(h(x_i), y_i)

where h(x_i) denotes the probability estimate of the class posterior distribution, y_i denotes the class label, and L_cls denotes the cross-entropy loss;
the position loss function is expressed as:

R_loc[f] = Σ_i L_loc(f(x_i, b_i), g_i)

where (f(x_i, b_i), g_i) denotes the regression of the predicted box b_i in image region x_i onto the annotated ground-truth box g_i, and L_loc denotes the bounding-box regression loss.
Further, the bounding-box regression loss L_loc uses the smooth L1 loss.
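As a hedged illustration, the smooth L1 loss is commonly defined piecewise as below (the beta parameter and the NumPy formulation are conventional choices, not specified by the patent):

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Smooth L1 (Huber-style) loss on box-regression residuals:
    quadratic for |diff| < beta, linear beyond, which damps the
    exploding gradients that plain L2 produces on outlier boxes."""
    d = np.abs(diff)
    return np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta)
```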
Further, the performing target detection on the image to be detected through the target detection model includes:
acquiring an image to be detected, dividing it in a scale-aware manner into three CNN branches for feature extraction, each CNN using a ResNet-18 backbone for high-speed feature extraction;
concatenating the extracted features at equal size to obtain the concatenated features;
performing high-speed feature extraction and processing on the concatenated features with the ResNet-18 backbone, dividing the result into three sub-features, setting a different IOU threshold for each sub-feature, cascading the feature maps of the three-way bounding-box regression, and taking the classification and bounding-box regression results after the third cascade stage as the final target detection result.
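The three-branch, shared-weight, dilated-convolution idea behind the Trident part can be illustrated in one dimension (a simplified sketch; real TridentNet uses 2-D dilated convolutions inside ResNet blocks, and the function names here are illustrative):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """1-D dilated convolution with valid padding. The same kernel
    (shared weights, as across TridentNet's branches) covers a
    receptive field of (len(kernel) - 1) * rate + 1 samples."""
    k = len(kernel)
    span = (k - 1) * rate + 1
    out = [sum(kernel[j] * x[i + j * rate] for j in range(k))
           for i in range(len(x) - span + 1)]
    return np.array(out)

def receptive_field(kernel_size, rate):
    """Receptive field grows with the dilation rate while the
    parameter count stays fixed."""
    return (kernel_size - 1) * rate + 1
```

Three branches with rates 1, 2 and 3 thus see receptive fields of 3, 5 and 7 with one shared 3-tap kernel, which is the mechanism the scale-aware structure exploits.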
Further, the system comprises:
the data enhancement module is used for acquiring a data set of a target detection image and enhancing the image in the data set;
the network construction module is used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network is fused with a plurality of backbone networks and comprises a feature pyramid network, each backbone network is fused with a deformable convolution network, and the prediction network comprises a double-branch structure;
the model acquisition module is used for training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and the target detection module is used for carrying out target detection on the image to be detected through the target detection model.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a high-speed target detection method based on the TridentNet and Cascade-RCNN structures, which comprises: obtaining a data set of target detection images and applying enhancement processing to the images in the data set; constructing a neural network comprising a feature extraction network and a prediction network, where the feature extraction network fuses several backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network has a dual-branch structure; training the neural network on the enhanced data set and judging targets against an IOU threshold during training to obtain a target detection model; and performing target detection on the image to be detected with the target detection model. By designing a reasonable, high-speed multi-scale detection model, the accuracy and efficiency of target detection can be greatly improved.
The following description of the preferred embodiments for carrying out the present invention will be made in detail with reference to the accompanying drawings so that the features and advantages of the present invention can be easily understood.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments of the present invention will be briefly described below. Wherein the drawings are only for purposes of illustrating some embodiments of the invention and are not to be construed as limiting the invention to all embodiments thereof.
FIG. 1 is a flow chart of a high-speed target detection method based on TridentNet and Cascade-RCNN structures according to the present invention;
FIG. 2 is a block diagram of a high-speed target detection system based on TridentNet and Cascade-RCNN structures according to the present invention;
FIG. 3 is a diagram of the Trident-Cascade-RCNN neural network structure of the present invention;
FIG. 4 is an original test data diagram;
FIG. 5 is a diagram showing the detection result of the Trident-Cascade-RCNN dual-structure network of the present invention;
FIG. 6 is a diagram showing the results of detection by the Cascade-RCNN network;
fig. 7 is a graph of the detection results of the TridentNet network.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1 to 7, the technical problem to be solved by the present invention is to provide a method and a system for detecting a high-speed target based on a TridentNet and Cascade-RCNN structure, wherein the method comprises:
acquiring a data set of a target detection image, and enhancing the image in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and carrying out target detection on the image to be detected through the target detection model.
The working principle of the technical scheme is as follows: firstly, a data set of target detection images is acquired and the images in the data set are enhanced. Then, a neural network is constructed, comprising a feature extraction network and a prediction network. The feature extraction network draws on the scale-aware parallel structure of TridentNet and, combined with the characteristics of a feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively; the feature extraction network fuses several backbone networks and includes a feature pyramid network, a deformable convolution network is fused into each backbone network, and the prediction network has a dual-branch structure. The backbone network is ResNet-18; the two branches are FC-head and Conv-head, with FC-head used as the classification network and Conv-head as the regression network. Next, the neural network is trained on the enhanced data set, and targets are judged against an IOU threshold during training to obtain the target detection model. Finally, target detection is performed on the image to be detected with the target detection model.
The beneficial effects of the above technical scheme are: by designing a reasonable, high-speed multi-scale target detection model according to this scheme, the accuracy and efficiency of target detection can be greatly improved. The target detection model comprises a neural network which, drawing on Faster-RCNN, adopts FPN (feature pyramid network) and RPN (region proposal network) structures in sequence to obtain the feature mappings of candidate boxes, so that the feature extraction network extracts features more effectively, the target detection model is trained better, overfitting is reduced, and the accuracy and robustness of the target detection model are greatly improved.
In an embodiment provided by the present invention, the acquiring a data set of a target detection image and performing enhancement processing on an image in the data set includes:
step a1, performing global brightness enhancement processing on the brightness component of the image in the data set by using the following formula:
where L_γ(c, d) denotes the luminance after global brightness adjustment, γ denotes the gamma adjustment coefficient, I(c, d) denotes the original luminance component of an image in the data set, q denotes the standard deviation of the Gaussian function, π denotes the circular constant, exp denotes the exponential function, and (c, d) denote the coordinates of an image pixel in the data set;
step a2, after performing global brightness enhancement processing on the brightness component of the image in the data set, performing saturation enhancement processing according to the following formula:
where H' represents the enhanced saturation component, H represents the original saturation component of the image in the data set, ψ represents the parameter value, and T represents the average luminance of the illumination information.
The beneficial effects of the above technical scheme are: this technique enhances the feature information of the images. Raising the brightness and saturation of an image makes different regions easier to distinguish, so the feature information becomes more sensitive and more salient. The algorithm adjusts brightness with a Gaussian function; as the standard deviation of the Gaussian increases, the image contrast increases but the luminance value decreases. To address this, the saturation component of the image is processed as well, so that the contrast is enhanced, the feature information of the images in the database becomes more salient, and feature extraction is facilitated.
In an embodiment of the present invention, the training of the neural network by the enhanced data set, and the determining of the target according to the IOU threshold in the training process to obtain the target detection model include:
acquiring an enhanced data set, and removing redundant repeated data;
the data are accurately marked in position and category by using the existing marking software, and a marking file is generated;
the method comprises the steps of enabling a label file to correspond to picture data in a data set one by one, then analyzing the label file to generate tag data in a txt format, and randomly segmenting the data set into a training set, a verification set and a test set;
training a neural network in the target detection model by using the training set data, carrying out target detection model verification observation by using the verification set data in a continuous iteration process, then testing the target detection model on the test set data, and analyzing the detection result of the target detection model.
The working principle of the technical scheme is as follows: firstly, the enhanced data set is acquired and redundant data are removed, where PCA (principal component analysis) is used to remove the redundancy; secondly, the positions and categories of the data are accurately annotated with the existing annotation software labelImg, and annotation files are generated; next, the annotation files are matched one-to-one with the picture data in the data set, the annotation files are parsed to generate label data in txt format, and the data set is randomly split into a training set, a verification set and a test set; finally, the neural network in the target detection model is trained with the training-set data, where the data must first be normalized by the following steps:
step A1, calculating the mean and variance of the samples in the training set according to the following formulas:

μ_a = (1/N) · Σ_{i=1}^{N} x_i,    σ_a² = (1/N) · Σ_{i=1}^{N} (x_i − μ_a)²

where μ_a denotes the mean of the sample data in the training set, σ_a² denotes the variance of the sample data in the training set, and x_i denotes the i-th sample in the training set, i = 1, 2, …, N;
step A2, the training set is normalized according to the following formula:

x̂_i = (x_i − μ_a) / √(σ_a² + ε)

where x̂_i denotes the i-th normalized sample and ε denotes a small constant that prevents a division error when the denominator would otherwise be zero. When the gradient is back-propagated, the normalized data need to be reconstructed as y_i = ξ · x̂_i + β, where ξ and β are parameters to be learned; in order not to change the distribution characteristics of the data, they should satisfy ξ = √(σ_a² + ε) and β = μ_a. When training the neural network, the degree of training is measured by the loss function Loss, as follows:
the Loss function Loss is obtained according to the following formula:

Loss = R_cls[h] + R_loc[f]

where R_cls[h] denotes the classification loss function and R_loc[f] denotes the localization loss function;
the classification loss function is expressed as:

R_cls[h] = Σ_i L_cls(h(x_i), y_i)

where h(x_i) denotes the probability estimate of the class posterior distribution, y_i denotes the class label, and L_cls denotes the cross-entropy loss;
the position loss function is expressed as:
wherein, (f (x)i,bi),gi) Representing an image area xiPredicted frame biRegression to labeled label borders, LlocRepresenting the frame regression loss, frame regression loss LlocUsing smoothed L1Loss; the method comprises the steps of using verification set data to verify and observe a target detection model in a continuous iteration process, testing the target detection model on the test set data, and analyzing a detection result of the target detection model, wherein the target detection model is realized based on Trident Net and Cascade-RCNN double-structure networks, the Trident module and the Cascade module are respectively marked as a Trident module and a Cascade module, the Trident module adopts a scale-aware parallel structure mode and is divided into three paths to extract CNN features, and each CNN has a backbone structure of ResNet-18 so as to comprehensively consider the capability and efficiency problems of feature extraction. The CNN structure uses hole convolution (scaled convolution) with different convolution rates to obtain feature maps under different receptive fields. The three branches share weight parameters, and the features of different scales are extracted according to the receptive fields of different scales, so that the Trident part of the model has a better feature extraction effect on the targets of different scales. 
Then, performing localization operation, and performing feature merging to send the feature merged into the next network module; the Cascaded module directly divides the characteristics output by the Trident module into three sub-networks according to the difference of IOU thresholds, the detection frame after the regression of the first sub-network is input into the second sub-network for frame classification and regression, the detection frame after the regression of the second sub-network is input into the third sub-network for frame classification and regression, and the final result after the classification and regression of the third sub-network is the final network output result. In the Cascade module, the IOU thresholds of the three sub-networks are respectively set to be 0.4, 0.6 and 0.8 according to the actual trial situation, so that the optimal average classification and regression precision can be obtained; the data of the test set can be respectively detected in TridentNet and Cascade-RCNN, so thatAnd analyzing the detection results of the Cascade-RCNN network, the TridentNet network and the TridentNet and Cascade-RCNN double structures.
The beneficial effects of the above technical scheme are: by adopting the scale-aware parallel structure and extracting CNN features in three paths, useful features can be extracted better and at higher speed. Normalizing the training-set data before training makes the data follow a normal distribution, so the loss function decreases more readily, overfitting is prevented, and the network parameters fit the training data better, greatly improving the precision and accuracy of the target detection model. The bounding-box regression term of the loss function adopts the smooth L1 loss, which takes the absolute value of the residual and better prevents gradient explosion during iteration. When the loss has dropped to a certain value, the verification set and the test set are used for verification and testing, and the quality of the model is measured by printing the mAP of the target detection model; detection and recognition are performed automatically by the computer without extra manual maintenance, greatly raising the level of intelligence.
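The stage-by-stage IOU selection in the Cascade module can be sketched as follows (a minimal Python illustration; the helper and the 0.4/0.6/0.8 thresholds follow the text above, but the per-stage box regression of a real Cascade-RCNN, which raises each surviving box's IOU before the next stage, is omitted):

```python
def cascade_filter(proposals, gt_box, thresholds=(0.4, 0.6, 0.8)):
    """Pass proposals through successive stages; each stage keeps only
    boxes whose IOU with the ground truth exceeds its threshold."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0
    survivors = proposals
    for t in thresholds:
        # In a full Cascade-RCNN, each survivor would be re-regressed here.
        survivors = [p for p in survivors if iou(p, gt_box) > t]
    return survivors
```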
A high-speed object detection system based on the TridentNet and Cascade-RCNN architecture, the system comprising:
the data enhancement module is used for acquiring a data set of a target detection image and enhancing the image in the data set;
the network construction module is used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network is fused with a plurality of backbone networks and comprises a feature pyramid network, each backbone network is fused with a deformable convolution network, and the prediction network comprises a double-branch structure;
the model acquisition module is used for training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and the target detection module is used for carrying out target detection on the image to be detected through the target detection model.
The working principle of the technical scheme is as follows: first, the data enhancement module acquires a data set of target detection images and performs enhancement processing on the images in the data set. Then the network building module constructs a neural network comprising a feature extraction network and a prediction network. The feature extraction network draws on the scale-aware parallel structure of TridentNet and, combined with the characteristics of the feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively. The feature extraction network fuses several backbone networks and includes the feature pyramid network; a deformable convolution network is fused into each backbone network. The prediction network comprises a double-branch structure: the backbone network is ResNet-18, and the two branches are FC-head and Conv-head, with FC-head serving as the classification network and Conv-head as the regression network. Next, the model acquisition module trains the neural network on the enhanced data set, judging targets against an IOU threshold during training to obtain the target detection model. Finally, the target detection module performs target detection on the image to be detected through the target detection model.
The beneficial effects of the above technical scheme are as follows. By designing a reasonable, high-speed multi-scale target detection model according to the technical scheme, the accuracy and efficiency of target detection are greatly improved. The target detection model comprises a neural network which, drawing on Faster-RCNN, applies an FPN (Feature Pyramid Network) and then an RPN (Region Proposal Network) in turn to obtain the feature maps of the candidate frames, so that the feature extraction network extracts features better, the target detection model trains better, overfitting is reduced, and the accuracy and robustness of the target detection model are greatly improved.
The above description covers only preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (10)
1. A high-speed target detection method based on the TridentNet and Cascade-RCNN structures, characterized in that the method comprises the following steps:
acquiring a data set of a target detection image, and enhancing the image in the data set;
constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network fuses a plurality of backbone networks and comprises a feature pyramid network, a deformable convolution network is fused in each backbone network, and the prediction network comprises a double-branch structure;
training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and carrying out target detection on the image to be detected through the target detection model.
2. The method as claimed in claim 1, wherein the step of obtaining a data set of target detection images and performing enhancement processing on the images in the data set comprises:
step a1, performing global brightness enhancement processing on the brightness component of the image in the data set by using the following formula:
where L_γ(c, d) represents the brightness function after global brightness adjustment, γ represents the gamma change coefficient, I(c, d) represents the original brightness component of the image in the data set, q represents the standard deviation of the Gaussian function, π represents the circle constant, exp represents the exponential function, and (c, d) represents the coordinates of an image pixel in the data set;
step a2, after performing global brightness enhancement processing on the brightness component of the image in the data set, performing saturation enhancement processing according to the following formula:
where H' represents the enhanced saturation component, H represents the original saturation component of the image in the data set, ψ represents the parameter value, and T represents the average luminance of the illumination information.
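The exact formulas in claim 2 appear as images in the original filing and are not reproduced here. As a rough, hypothetical sketch of the two enhancement steps (a standard gamma brightening of the luminance channel and a simple multiplicative saturation boost, not the patent's exact expressions; `gamma` and `psi` are illustrative stand-ins for the γ and ψ parameters):

```python
def enhance_brightness(luma, gamma=0.7):
    # Gamma adjustment of a luminance value in [0, 1];
    # gamma < 1 brightens dark regions (illustrative parameter choice).
    return luma ** gamma

def enhance_saturation(sat, psi=1.2):
    # Simple multiplicative saturation boost, clipped to [0, 1];
    # psi is a hypothetical stand-in for the claim's parameter value.
    return min(1.0, sat * psi)
```

Both operations are applied per pixel before training, which is why they appear ahead of the data-set split in the method.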
3. The method as claimed in claim 1, wherein the feature extraction network draws on the scale-aware parallel structure of TridentNet and, combined with the characteristics of the feature pyramid network (FPN), is divided into three paths that learn target features at different scales, so that the Trident part of the target detection model detects targets of different scales more effectively.
4. The method as claimed in claim 1, wherein the backbone network is ResNet-18, the dual-branch structures are FC-head and Conv-head, the FC-head is a classification network, and the Conv-head is a regression network.
5. The method as claimed in claim 1, wherein the training of the neural network is performed by the enhanced data set, and the target is judged according to the IOU threshold during the training process to obtain the target detection model, and the method comprises:
acquiring an enhanced data set, and removing redundant repeated data;
the data are accurately marked in position and category by using the existing marking software, and a marking file is generated;
the method comprises the steps of enabling a label file to correspond to picture data in a data set one by one, then analyzing the label file to generate tag data in a txt format, and randomly segmenting the data set into a training set, a verification set and a test set;
training a neural network in the target detection model by using the training set data, carrying out target detection model verification observation by using the verification set data in a continuous iteration process, then testing the target detection model on the test set data, and analyzing the detection result of the target detection model.
6. The method of claim 5, wherein the training of the neural network in the target detection model using the training set data further comprises: the method comprises the following steps of carrying out normalization processing on training set data:
step A1, calculating the mean and variance of the samples in the training set according to the following formulas: μ_a = (1/N)·Σᵢ xᵢ and σ_a² = (1/N)·Σᵢ (xᵢ − μ_a)², with the sums taken over i = 1, …, N;
where μ_a represents the mean of the sample data in the training set, σ_a² represents the variance of the sample data in the training set, xᵢ represents the i-th sample in the training set, and i = 1, 2, …, N;
step A2, normalizing the training set according to the following formula: x̂ᵢ = (xᵢ − μ_a) / σ_a, so that the normalized samples have zero mean and unit variance.
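The two normalization steps of claim 6 amount to standard z-score normalization, which can be sketched as:

```python
import math

def normalize(samples):
    # Step A1: mean and (population) variance of the training samples.
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    sigma = math.sqrt(var)
    # Step A2: subtract the mean and divide by the standard deviation,
    # giving zero-mean, unit-variance data.
    return [(x - mu) / sigma for x in samples]
```

Centering and scaling the inputs this way is what the description credits with letting the loss decrease more readily during training.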
7. The method as claimed in claim 4, wherein the measuring of the training degree of the neural network according to the Loss function Loss during the training of the neural network comprises:
the Loss function Loss is obtained according to the following formula:
Loss = R_cls[h] + R_loc[f]
where R_cls[h] represents the classification loss function and R_loc[f] represents the position loss function;
the classification loss function is expressed as:
where h(xᵢ) represents the probability estimate of the posterior class distribution, yᵢ represents the class label, and L_cls represents the cross-entropy loss;
the position loss function is expressed as:
where (f(xᵢ, bᵢ), gᵢ) represents the regression of the predicted frame bᵢ of image region xᵢ onto the labeled ground-truth frame gᵢ, and L_loc represents the frame regression loss.
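The additive loss of claim 7 can be sketched for a single binary example; the classification term is the cross-entropy named above, and the position term is left as an input here (claim 8 specifies it as smooth L1):

```python
import math

def cross_entropy(p, y):
    # p: predicted probability of the positive class, y: label in {0, 1};
    # eps guards against log(0).
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def total_loss(p, y, loc_loss):
    # Loss = R_cls[h] + R_loc[f], as in the claim.
    return cross_entropy(p, y) + loc_loss
```

Summing the two terms trains classification and frame regression jointly, which is the standard two-head detection objective the claim describes.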
8. The method as claimed in claim 7, wherein the frame regression loss L_loc uses the smooth L1 loss.
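The smooth L1 loss of claim 8 is quadratic near zero and linear beyond a breakpoint, so its gradient is bounded by 1, which is what resists the gradient explosion mentioned in the description. A minimal sketch (beta = 1 is the conventional default, not a value stated in the claim):

```python
def smooth_l1(x, beta=1.0):
    # Quadratic for |x| < beta, linear beyond; continuous in value and
    # slope at the breakpoint, with gradient magnitude capped at 1.
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta
```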
9. The method as claimed in claim 1, wherein the performing of the object detection on the image to be detected by the object detection model comprises:
acquiring an image to be detected, dividing it in a scale-aware manner into three CNN paths for feature extraction, each path performing high-speed feature extraction with a ResNet-18 backbone;
concatenating the extracted features at equal size to obtain the concatenated features;
and performing high-speed feature extraction and processing on the concatenated features with the ResNet-18 backbone, dividing the extracted features into three sub-features, setting a different IOU value for each sub-feature, performing a cascade operation on the feature maps of the three-way frame regression, and taking the classification result and frame regression result after the third cascade stage as the final target detection result.
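The three scale-aware paths of claim 9 follow TridentNet, where parallel branches share the same convolution weights but use different dilation rates, giving each branch a different receptive field. A 1-D toy sketch of that idea (real branches are 2-D convolutions inside a ResNet-18 backbone):

```python
def dilated_conv1d(signal, kernel, dilation):
    # 'Valid' 1-D convolution with a dilated kernel: taps are spaced
    # `dilation` samples apart, widening the receptive field.
    k = len(kernel)
    span = (k - 1) * dilation
    return [sum(kernel[j] * signal[i + j * dilation] for j in range(k))
            for i in range(len(signal) - span)]

def trident_branches(signal, kernel, dilations=(1, 2, 3)):
    # One branch per dilation rate; identical (shared) weights, so only
    # the receptive field differs between branches, as in TridentNet.
    return [dilated_conv1d(signal, kernel, d) for d in dilations]
```

Sharing the kernel across branches keeps the parameter count of the trident block the same as a single branch, while the dilation ladder covers small, medium and large targets.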
10. A high-speed object detection system based on the TridentNet and Cascade-RCNN architecture, the system comprising:
the data enhancement module is used for acquiring a data set of a target detection image and enhancing the image in the data set;
the network construction module is used for constructing a neural network, wherein the neural network comprises a feature extraction network and a prediction network, the feature extraction network is fused with a plurality of backbone networks and comprises a feature pyramid network, each backbone network is fused with a deformable convolution network, and the prediction network comprises a double-branch structure;
the model acquisition module is used for training the neural network through the enhanced data set, and judging a target according to an IOU threshold value in the training process to obtain a target detection model;
and the target detection module is used for carrying out target detection on the image to be detected through the target detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011405295.2A CN112365497A (en) | 2020-12-02 | 2020-12-02 | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112365497A (en) | 2021-02-12 |
Family
ID=74535914
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011405295.2A Pending CN112365497A (en) | 2020-12-02 | 2020-12-02 | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112365497A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326924A (en) * | 2021-06-07 | 2021-08-31 | 太原理工大学 | Depth neural network-based key target photometric positioning method in sparse image |
CN113591617A (en) * | 2021-07-14 | 2021-11-02 | 武汉理工大学 | Water surface small target detection and classification method based on deep learning |
CN113780193A (en) * | 2021-09-15 | 2021-12-10 | 易采天成(郑州)信息技术有限公司 | RCNN-based cattle group target detection method and equipment |
CN113869361A (en) * | 2021-08-20 | 2021-12-31 | 深延科技(北京)有限公司 | Model training method, target detection method and related device |
CN115526874A (en) * | 2022-10-08 | 2022-12-27 | 哈尔滨市科佳通用机电股份有限公司 | Round pin of brake adjuster control rod and round pin split pin loss detection method |
CN115527059A (en) * | 2022-08-16 | 2022-12-27 | 贵州博睿科讯科技发展有限公司 | Road-related construction element detection system and method based on AI (Artificial Intelligence) identification technology |
CN115931359A (en) * | 2023-03-03 | 2023-04-07 | 西安航天动力研究所 | Turbine pump bearing fault diagnosis method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102779330A (en) * | 2012-06-13 | 2012-11-14 | 京东方科技集团股份有限公司 | Image reinforcement method, image reinforcement device and display device |
US20190095795A1 (en) * | 2017-03-15 | 2019-03-28 | Samsung Electronics Co., Ltd. | System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions |
CN110852349A (en) * | 2019-10-21 | 2020-02-28 | 上海联影智能医疗科技有限公司 | Image processing method, detection method, related equipment and storage medium |
CN111814755A (en) * | 2020-08-18 | 2020-10-23 | 深延科技(北京)有限公司 | Multi-frame image pedestrian detection method and device for night motion scene |
2020-12-02: CN202011405295.2A filed in China; published as CN112365497A; legal status: Pending
Non-Patent Citations (3)
Title |
---|
Sima Haifeng et al., "Intelligent Computing Methods in Remote Sensing Image Classification", 31 January 2018 *
Yang Dongfang et al., "Application and Research of Mathematical Models in Ecology", 31 March 2019 *
Lei Bangjun et al., "A Step-by-Step Guide to Video Target Tracking Systems", 31 December 2015 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113326924B (en) * | 2021-06-07 | 2022-06-14 | 太原理工大学 | Depth neural network-based key target photometric positioning method in sparse image |
CN113326924A (en) * | 2021-06-07 | 2021-08-31 | 太原理工大学 | Depth neural network-based key target photometric positioning method in sparse image |
CN113591617B (en) * | 2021-07-14 | 2023-11-28 | 武汉理工大学 | Deep learning-based water surface small target detection and classification method |
CN113591617A (en) * | 2021-07-14 | 2021-11-02 | 武汉理工大学 | Water surface small target detection and classification method based on deep learning |
CN113869361A (en) * | 2021-08-20 | 2021-12-31 | 深延科技(北京)有限公司 | Model training method, target detection method and related device |
CN113780193A (en) * | 2021-09-15 | 2021-12-10 | 易采天成(郑州)信息技术有限公司 | RCNN-based cattle group target detection method and equipment |
CN113780193B (en) * | 2021-09-15 | 2024-09-24 | 易采天成(郑州)信息技术有限公司 | RCNN-based cattle group target detection method and RCNN-based cattle group target detection equipment |
CN115527059A (en) * | 2022-08-16 | 2022-12-27 | 贵州博睿科讯科技发展有限公司 | Road-related construction element detection system and method based on AI (Artificial Intelligence) identification technology |
CN115527059B (en) * | 2022-08-16 | 2024-04-09 | 贵州博睿科讯科技发展有限公司 | System and method for detecting road construction elements based on AI (advanced technology) recognition technology |
CN115526874B (en) * | 2022-10-08 | 2023-05-12 | 哈尔滨市科佳通用机电股份有限公司 | Method for detecting loss of round pin and round pin cotter pin of brake adjuster control rod |
CN115526874A (en) * | 2022-10-08 | 2022-12-27 | 哈尔滨市科佳通用机电股份有限公司 | Round pin of brake adjuster control rod and round pin split pin loss detection method |
CN115931359B (en) * | 2023-03-03 | 2023-07-14 | 西安航天动力研究所 | Turbine pump bearing fault diagnosis method and device |
CN115931359A (en) * | 2023-03-03 | 2023-04-07 | 西安航天动力研究所 | Turbine pump bearing fault diagnosis method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
CN108830188B (en) | Vehicle detection method based on deep learning | |
CN111179251B (en) | Defect detection system and method based on twin neural network and by utilizing template comparison | |
EP3478728B1 (en) | Method and system for cell annotation with adaptive incremental learning | |
CN113160192B (en) | Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background | |
CN106803247B (en) | Microangioma image identification method based on multistage screening convolutional neural network | |
WO2022012110A1 (en) | Method and system for recognizing cells in embryo light microscope image, and device and storage medium | |
CN109285139A (en) | A kind of x-ray imaging weld inspection method based on deep learning | |
CN108564085B (en) | Method for automatically reading of pointer type instrument | |
CN111815564B (en) | Method and device for detecting silk ingots and silk ingot sorting system | |
CN109919934A (en) | A kind of liquid crystal display panel defect inspection method based on the study of multi-source domain depth migration | |
CN106340016A (en) | DNA quantitative analysis method based on cell microscope image | |
CN109284779A (en) | Object detection method based on deep full convolution network | |
CN110599463B (en) | Tongue image detection and positioning algorithm based on lightweight cascade neural network | |
CN113435407B (en) | Small target identification method and device for power transmission system | |
CN116012291A (en) | Industrial part image defect detection method and system, electronic equipment and storage medium | |
CN113205511B (en) | Electronic component batch information detection method and system based on deep neural network | |
CN112613428B (en) | Resnet-3D convolution cattle video target detection method based on balance loss | |
CN109840483A (en) | A kind of method and device of landslide fissure detection and identification | |
CN115147363A (en) | Image defect detection and classification method and system based on deep learning algorithm | |
CN112381806A (en) | Double centromere aberration chromosome analysis and prediction method based on multi-scale fusion method | |
CN115294377A (en) | System and method for identifying road cracks | |
CN110287970B (en) | Weak supervision object positioning method based on CAM and covering | |
CN115439654A (en) | Method and system for finely dividing weakly supervised farmland plots under dynamic constraint | |
CN109993728B (en) | Automatic detection method and system for deviation of thermal transfer glue |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | Application publication date: 20210212 |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | |