CN110569782A - Target detection method based on deep learning - Google Patents

Target detection method based on deep learning

Info

Publication number
CN110569782A
CN110569782A
Authority
CN
China
Prior art keywords
layer
target
network
convolution
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910836094.9A
Other languages
Chinese (zh)
Inventor
赵骥
于海龙
吴晓翎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Liaoning USTL
Original Assignee
University of Science and Technology Liaoning USTL
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Liaoning USTL
Priority to CN201910836094.9A
Publication of CN110569782A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/20 - Scenes; Scene-specific elements in augmented reality scenes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Abstract

A target detection method based on deep learning replaces the VGG16 network used for extracting image features in the Faster RCNN method with a 101-layer residual network that is deeper and has stronger expressive capability; the structure of the residual unit is changed to a pre-activation form, so that information flows more smoothly through the network during forward and backward propagation; and, taking the basic convolution operation as the entry point for improvement, automatic deformable convolution is introduced, dynamically adjusting the size and position of the convolution kernel according to the image content currently being recognized. Addressing characteristics such as diverse target forms, small inter-class differences, unclear targets, very small targets, occlusion between targets, and complex backgrounds, the candidate-region-based deep learning algorithm Faster RCNN is improved and a new target detection method is established. The target detection algorithm is highly robust: occlusion, varying illumination, similar backgrounds, and unclear targets do not compromise the detection result, and missed and false detections are greatly reduced.

Description

Target detection method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, and in particular to a target detection method based on deep learning.
Background
Vision is the main way in which human beings perceive external information and provides a vital basis for distinguishing the things in sight. Target detection is one of the most classical research topics in computer vision, with important application value in scenarios such as new retail, intelligent traffic control, intelligent highway-intersection management, community security, and even the national military field. Target detection refers to detecting, extracting, and segmenting a target from background information, and quickly and accurately representing and localizing the target in an input image, laying the foundation for reading and understanding target behavior; the accuracy and efficiency of target detection therefore directly affect the quality of post-processing such as target recognition.
Currently, the following methods are mainly used for target detection:
1. The inter-frame difference method computes the difference between the pixels of two or more adjacent frames in an image sequence and converts the difference into a binary image by thresholding to determine the moving target. This method performs well only when the background is static, and the target is difficult to detect when the texture and color distributions of the background and the detected object are too uniform.
2. The background subtraction method obtains a difference region by subtracting the pixels of a background image from the pixels of the current input image. Its robustness is sensitive to environmental changes, and it is suitable only for target detection against a relatively stable background.
3. Sliding-window methods traverse the whole image with windows of different scales, extract HOG, Haar, or SIFT features of the target, and then classify the content of each window with SVM and AdaBoost classifiers; this exhaustive search consumes a large amount of time.
4. The multi-scale deformable part model (DPM) detection algorithm builds on improved HOG features, an SVM classifier, and the sliding-window idea, adopting a multi-component strategy for the multi-view problem of the target; it is mainly suited to face and pedestrian detection tasks, is relatively complex, and has low detection efficiency.
5. The deep-learning-based SSD algorithm introduces a multi-scale concept, but its effect in detecting small target objects is unsatisfactory.
6. The candidate-region-based deep learning method Faster RCNN still shows weak detection performance when detecting small targets and targets with a large degree of mutual overlap.
In recent years, target detection based on deep learning has received much attention from researchers. Compared with traditional detection methods, the performance of deep-learning-based target detection is greatly improved, but several disadvantages remain: 1. the learned target features are incomplete, and excessively small targets cannot be detected; 2. because the Faster RCNN method removes candidate boxes with non-maximum suppression, missed detections occur when detecting targets that cross, overlap, or occlude one another; 3. the Faster RCNN algorithm extracts target features with a VGG16 network, and since the geometric shape of the convolution kernel used in the convolution operation is fixed, the geometric structure of the network built by stacking such layers is also fixed, which limits feature extraction to a certain degree; the network therefore cannot cope well with geometric deformation, and the detection algorithm performs poorly on targets appearing in different states under different viewing angles.
Disclosure of Invention
In order to solve the technical problems noted in the background art, the invention provides a target detection method based on deep learning, which improves the candidate-region-based deep learning algorithm Faster RCNN and establishes a new target detection method, addressing characteristics such as diverse target forms, small inter-class differences, unclear targets, very small targets, occlusion between targets, and complex backgrounds, so as to detect targets more accurately.
In order to achieve the purpose, the invention adopts the following technical scheme:
A target detection method based on deep learning comprises the following steps:
1) Replacing the VGG16 network used for extracting image features in the Faster RCNN method with a 101-layer residual network having stronger expressive capability and deeper layers;
The forward propagation of the residual network is linear: the input of a later layer is the sum of the input of an earlier layer and the outputs of the intervening residual units. After multiple iterations, the expression for a deep unit L is obtained:

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, w_i)$$

where $x_L$ represents the output vector of the L-th (deep) layer, $x_l$ the input vector of the l-th (shallow) layer, and $F(x_i, w_i)$ the residual of the i-th layer.
2) The structure of the residual unit is changed to a pre-activation form, so that information flows more smoothly through the network during forward and backward propagation; the pre-activation form is specifically as follows:
To make network training easier, the conventional "post-activation" structure of the residual unit is changed to "pre-activation". In the modified residual unit structure, $x_L$ can be regarded as $x_l$ plus an accumulation of residuals, so the gradient can be propagated back in full when the network performs back-propagation and information is transmitted smoothly. The improved residual unit structure is represented by the following formula:

$$\frac{\partial \varepsilon}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$$

where $\varepsilon$ represents the loss error, expressed as $\varepsilon = \frac{1}{2}(x_{label} - x_L)^2$; $x_L$ denotes the layer-L prediction, $x_{label}$ the label vector corresponding to layer L, and $F(x_i, W_i)$ represents the residual.
3) Taking the basic convolution operation as the entry point for improvement, automatic deformable convolution is introduced, and the size and position of the convolution kernel are dynamically adjusted according to the image content currently being recognized;
A regular grid $\mathcal{R}$ over an n × n convolution kernel defines the size of the receptive field. For each pixel $p_0$ on the output feature map y of a conventional convolutional neural network:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n)$$

where $p_n$ enumerates the positions in $\mathcal{R}$, $x(p_0 + p_n)$ is a sampled point, and $w(p_n)$ represents the corresponding weight.
In order to adapt better to target deformation, automatic deformable convolution is introduced: an offset variable $\Delta p_n$ is added to each sampling point of the convolution kernel, giving the kernel the ability to deform. For automatic deformable convolution the formula becomes:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $\Delta p_n$ is the offset corresponding to each sampling position $p_n$, and the elements of $\mathcal{R}$ are augmented by the offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$ with $N = |\mathcal{R}|$. Sampling therefore occurs at the irregular, offset positions $p_n + \Delta p_n$.
4) For the common situation in which targets cross, overlap, and occlude one another, a new soft non-maximum suppression strategy is adopted;
The softNMS algorithm is introduced to post-process the candidate boxes, solving the missed detections caused by largely overlapping targets, by resetting the scores of the candidate boxes. In the score resetting function, $\mathrm{iou}(M, b_i)$ is the intersection-over-union of the currently highest-scoring candidate box M and a remaining candidate box $b_i$: the intersection is computed first, the union is obtained by subtracting the intersection from the sum of the areas of the two bounding boxes, and the overlap ratio is then transformed to produce the new score $s_i$. When the overlap of neighboring detection boxes exceeds a threshold, the scores of the neighboring candidate boxes are reduced by this function rather than eliminated outright; the function strongly attenuates the scores of detection boxes very close to M, while boxes far from M are unaffected and remain in the object detection sequence.
Compared with the prior art, the invention has the beneficial effects that:
1) The target detection algorithm of the invention is highly robust: occlusion, varying illumination, similar backgrounds, and unclear targets do not compromise the detection result, and missed and false detections are greatly reduced.
2) In the Faster RCNN method, the invention replaces the VGG16 network used for extracting image features with ResNeXt, a 101-layer aggregated residual transformation network that is deeper and has stronger expressive capability, so that the target features are learned completely, solving the detection problems caused by external factors such as unclear targets, very small targets, and high similarity between target and background.
3) As the number of network layers increases, detection accuracy improves, but more training time is consumed. To ease training, the structure of the residual unit is changed from the conventional "post-activation" form to a "pre-activation" form, so that information flows more smoothly through the network during forward and backward propagation.
4) Taking the basic convolution operation as the entry point for improvement, automatic deformable convolution is introduced. The size and position of the convolution kernel can be dynamically adjusted according to the image content currently being recognized, so the method adapts better to detection tasks involving geometric deformation of an object's shape and size, and can detect deformed targets appearing in different states under different viewing angles.
5) For the common situation in which targets cross, overlap, and occlude one another, the traditional Faster RCNN algorithm removes candidate boxes with non-maximum suppression, which causes missed detections. To address this problem, a new soft non-maximum suppression strategy is adopted to eliminate the missed detections.
Drawings
FIG. 1 is the modified residual unit structure of the present invention;
FIG. 2 is the overall network framework diagram of the present invention;
FIG. 3 is a flow chart of the softNMS algorithm.
Detailed Description
The following detailed description of the present invention will be made with reference to the accompanying drawings.
A target detection method based on deep learning comprises the following steps:
1) Replacing the VGG16 network used for extracting image features in the Faster RCNN method with a 101-layer residual network having stronger expressive capability and deeper layers;
A ResNeXt network with a 101-layer structure is used to learn the target features. ResNeXt is an upgraded version of the ResNet network: it retains ResNet's basic stacking strategy and is built from parallel blocks sharing the same topology, but splits each ResNet path into 32 independent paths (the number of paths is called the cardinality). The 32 paths perform convolution operations on the input simultaneously, and the outputs of the different paths are finally summed to obtain the result. This makes the division of labor within the network more explicit and strengthens its local adaptability. The front-end network is replaced with the 101-layer ResNeXt network, whose basic building unit also provides identity mapping and shortcut connection mechanisms, so the learning capability of the model is far superior to that of other deep learning models.
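For illustration, the 32 parallel paths can be expressed equivalently as a single grouped convolution. The following is a minimal PyTorch-style sketch of one such block; the channel widths are plausible assumptions for illustration, not values taken from the patent:

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """ResNeXt block: 32 parallel paths expressed as one grouped 3x3 conv."""
    def __init__(self, in_ch=256, bottleneck_ch=128, out_ch=256, cardinality=32):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(in_ch, bottleneck_ch, 1, bias=False),   # reduce dimension
            nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, bottleneck_ch, 3, padding=1,
                      groups=cardinality, bias=False),        # 32 parallel paths
            nn.BatchNorm2d(bottleneck_ch), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck_ch, out_ch, 1, bias=False),  # restore dimension
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # identity shortcut plus aggregated residual transformations
        return self.relu(x + self.transform(x))

y = ResNeXtBottleneck()(torch.randn(1, 256, 14, 14))  # -> (1, 256, 14, 14)
```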
The forward propagation of the residual network is linear: the input of a later layer is the sum of the input of an earlier layer and the outputs of the intervening residual units. After multiple iterations, the expression for a deep unit L is obtained:

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, w_i)$$

where $x_L$ represents the output vector of the L-th (deep) layer, $x_l$ the input vector of the l-th (shallow) layer, and $F(x_i, w_i)$ the residual of the i-th layer.
2) To make network training easier, the conventional "post-activation" structure of the residual unit is changed to "pre-activation". The conventional residual unit structure has two characteristics: 1. the BN layer and the ReLU layer come after the Conv layer, i.e., a Conv-BN-ReLU structure; 2. a second ReLU layer follows the addition. The output of a conventional residual unit is:

$$X_{l+1} = f(X_l + F(X_l, w_l))$$

where $F(X_l, w_l)$ is the residual, f is the ReLU activation function, and $X_{l+1}$ is the output of the current layer, i.e., the input of the next layer.
In the conventional residual unit structure, a ReLU activation function follows the weighted summation; when its input signal is negative, propagation is truncated, both branches of the residual unit are affected, and information can only propagate directly between two adjacent residual units. Therefore, both BN and ReLU are moved in front of the weight layer, the ReLU activation function is moved onto the residual-function branch, and an identity mapping is constructed, forming the "pre-activation" arrangement; the shortcut-connection branch is then no longer affected. The improved residual unit structure is shown in fig. 1.
In the modified residual unit structure, $x_L$ can be regarded as $x_l$ plus an accumulation of residuals. In this way, the gradient can be propagated back in full when the network performs back-propagation, and information is transmitted smoothly. The improved residual unit structure can be represented by the following formula:

$$\frac{\partial \varepsilon}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$$

where $\varepsilon$ represents the loss error, expressed as $\varepsilon = \frac{1}{2}(x_{label} - x_L)^2$; $x_L$ denotes the layer-L prediction, $x_{label}$ the label vector corresponding to layer L, and $F(x_i, W_i)$ represents the residual.
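The ordering change can be illustrated with a short, hedged PyTorch sketch of a pre-activation residual unit (BN and ReLU in front of each weight layer, no activation after the addition); the channel count is an illustrative assumption:

```python
import torch
import torch.nn as nn

class PreActResidualUnit(nn.Module):
    """Pre-activation residual unit: BN-ReLU-Conv instead of Conv-BN-ReLU,
    with no ReLU after the addition, so the shortcut stays an identity map."""
    def __init__(self, channels=256):
        super().__init__()
        self.residual = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        # x_{l+1} = x_l + F(x_l, w_l): the identity term lets the gradient
        # flow back unattenuated during back-propagation.
        return x + self.residual(x)

y = PreActResidualUnit()(torch.randn(1, 256, 14, 14))  # -> (1, 256, 14, 14)
```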
3) In order to adapt better to target deformation, automatic deformable convolution is introduced. An offset variable $\Delta p_n$ is added to each sampling point of the convolution kernel, giving the kernel the ability to deform. With the offset variables added, the network automatically learns the offsets from the back-propagated error and adjusts the shape of the convolution kernel accordingly, so the size of the deformable convolution kernel and the positions of its sampling points are adjusted dynamically according to the image content, further strengthening the network's ability to adapt to spatial geometric deformation.
For example, a 3 × 3 convolution kernel first samples 9 positions from the input image or feature map x; the grid

$$\mathcal{R} = \{(-1,-1), (-1,0), \ldots, (0,1), (1,1)\}$$

defines the size of the receptive field, where (-1,-1) denotes the sampling point to the upper left of $x(p_0)$, (1,1) the sampling point to the lower right, and so on.
For each pixel $p_0$ on the output feature map y of a conventional convolutional neural network:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n)$$

where $p_n$ enumerates the positions in $\mathcal{R}$.
In order to adapt better to target deformation, automatic deformable convolution is introduced: an offset variable $\Delta p_n$ is added to each sampling point of the convolution kernel, giving the kernel the ability to deform. For automatic deformable convolution the formula becomes:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $\Delta p_n$ is the offset corresponding to each sampling position $p_n$, and the elements of $\mathcal{R}$ are augmented by the offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$ with $N = |\mathcal{R}|$. Sampling therefore occurs at the irregular, offset positions $p_n + \Delta p_n$.
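As a hedged sketch of this sampling-offset idea, the following uses torchvision's deform_conv2d operator as a stand-in for the automatic deformable convolution described above; the offset field (one Δy, Δx pair per kernel position and output pixel) is predicted by an ordinary convolution and learned from the back-propagated error:

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # one (dy, dx) offset per kernel position p_n, per output pixel
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.k = k

    def forward(self, x):
        offsets = self.offset_pred(x)  # learned Δp_n for every pixel p_0
        return deform_conv2d(x, offsets, self.weight, padding=self.k // 2)

y = DeformableConv2d(64, 64)(torch.randn(1, 64, 32, 32))  # -> (1, 64, 32, 32)
```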
4) For the common situation in which targets cross, overlap, and occlude one another, a new soft non-maximum suppression strategy is adopted;
The softNMS algorithm is introduced to post-process the candidate boxes, solving the missed detections caused by largely overlapping targets. In the softNMS score resetting function, $\mathrm{iou}(M, b_i)$ is the intersection-over-union of the currently highest-scoring candidate box M and a remaining candidate box $b_i$: the intersection is computed first, the union is obtained by subtracting the intersection from the sum of the areas of the two bounding boxes, and the overlap ratio is then transformed to produce the new score $s_i$.
When the overlap of neighboring detection boxes exceeds a threshold, their scores are reduced by this function rather than eliminated outright. The function strongly attenuates the scores of detection boxes very close to M, while boxes far from M are unaffected and remain in the object detection sequence.
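For illustration, the following NumPy sketch implements soft-NMS with the linear score decay published by Bodla et al.; since the patent's exact reset function is not reproduced in this text, this particular decay is an assumption:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and many; boxes are (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)  # union = areas - inter

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    boxes, scores = boxes.astype(float), scores.astype(float)
    keep = []
    while scores.max() > score_thresh:
        m = scores.argmax()                 # highest-scoring box M
        keep.append(boxes[m])
        overlap = iou(boxes[m], boxes)
        decay = np.where(overlap >= iou_thresh, 1.0 - overlap, 1.0)
        scores = scores * decay             # down-weight neighbours of M ...
        scores[m] = 0.0                     # ... instead of deleting them outright
    return np.array(keep)
```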
Referring to fig. 2, the target detection method based on deep learning comprises: the 101-layer ResNeXt network extracts the target features, the RPN network generates 300 region proposal boxes, the candidate boxes are determined and classified, the boxes are refined by regression, and the detection result is obtained.
The method comprises the following concrete steps:
Step 1: download the ImageNet pre-training model, place it under the designated folder, and use it as the initialization parameters.
Step 2: prepare the Pascal_VOC data set and convert it into the lmdb file format accepted by the Caffe framework.
Step 3: fine-tune the parameters of the pre-trained model with the Pascal_VOC data set.
Step 4: place the finally generated network model under the specified folder for target detection.
Step 5: input an image of arbitrary size to be detected into the network.
Step 6: feature extraction. A 101-layer ResNeXt network is used for extraction; the convolution kernels use deformable convolution, the basic residual unit structure is "pre-activation", and the convolution layers of the residual blocks all adopt the "bottleneck" design: the dimension is first reduced by a 1 × 1 convolution layer to cut the computation; a 3 × 3 convolution layer then performs the convolution; finally another 1 × 1 convolution restores the dimension, reducing the computation without affecting accuracy.
Step 7: a region proposal network (RPN) directly generates 300 high-quality proposal boxes; the region proposal network and the detection network share the image convolution features, which greatly reduces the cost of computing region proposals. To generate the proposal windows, a sliding window is applied to the feature map; considering that the targets to be detected vary in size, sliding windows (anchors) with three aspect ratios of 1:1, 2:1, and 1:2 and three scales of 8, 16, and 32 are used, so that each pixel position yields 9 types of sliding windows.
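A small sketch of the 9 anchor shapes (3 aspect ratios × 3 scales) follows, assuming the base anchor size of 16 pixels used in the original Faster RCNN code (an assumption, not stated in the patent):

```python
import numpy as np

def make_anchors(base=16, ratios=(1.0, 2.0, 0.5), scales=(8, 16, 32)):
    """Return 9 (w, h) anchor shapes: 3 aspect ratios x 3 scales."""
    anchors = []
    for r in ratios:
        for s in scales:
            side = base * s            # scaled base size
            w = side / np.sqrt(r)      # aspect ratio r = h / w
            h = side * np.sqrt(r)
            anchors.append((round(w), round(h)))
    return anchors

print(make_anchors())  # 9 shapes, from (128, 128) up to (362, 724)
```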
The concrete implementation of the RPN: first, a small network is slid over the feature map output by the last convolution layer, fully connected to an n × n spatial window of the input convolution map. The features of each sliding window are mapped to a low-dimensional vector, which is then fed to two sibling fully connected layers: a box classification layer (cls layer) and a box regression layer (reg layer). The box classification layer generates the sliding windows, crops and filters them, and judges via softmax whether each window contains an object or background, outputting the object and non-object probabilities without identifying what the object specifically is. The box regression layer computes the bounding-box regression offsets of the sliding window to obtain an accurate proposal box; its output is the four parameters of the regressed box, namely the center coordinates x and y, the width w, and the height h.
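A hedged PyTorch sketch of such an RPN head (a 3 × 3 sliding convolution feeding sibling cls and reg branches for k = 9 anchors); the channel widths here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_ch=1024, mid_ch=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, mid_ch, 3, padding=1)  # sliding window
        self.cls = nn.Conv2d(mid_ch, 2 * num_anchors, 1)    # object / background
        self.reg = nn.Conv2d(mid_ch, 4 * num_anchors, 1)    # (x, y, w, h) offsets

    def forward(self, feat):
        h = torch.relu(self.conv(feat))
        return self.cls(h), self.reg(h)

scores, deltas = RPNHead()(torch.randn(1, 1024, 38, 50))
# scores: (1, 18, 38, 50), deltas: (1, 36, 38, 50)
```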
Step 8: the softNMS algorithm is used to cull overlapping predicted candidate boxes. First, all detection boxes are sorted by score and the box M with the highest score is selected; all other detection boxes that overlap M beyond a predetermined threshold are then suppressed. The function strongly attenuates the probability values of detection boxes very close to M, while boxes far from M are unaffected and remain in the object detection sequence, keeping the most valuable candidate boxes. This process is applied recursively to the remaining boxes; the concrete implementation flow is shown in fig. 3.
Step 9: RoI pooling. After the RPN network generates 300 candidate boxes, the target detection network maps the proposal windows onto the last layer's feature map, and the RoI pooling layer converts each proposal window to a fixed size. This layer uses the candidate boxes generated by the RPN together with the feature map obtained by the last layer of ResNeXt to perform the mapping, obtaining fixed-size feature maps and reducing the amount of data to be processed while retaining the useful information. Since the RPN generates more than one rectangular box, each candidate box is traversed and its coordinate values are divided by 16, so that a candidate box generated on the original image is mapped onto the feature map, determining a region there; according to the parameters pooled_w = 7 and pooled_h = 7, this region is divided into 49 (7 × 7) sections of equal size, max pooling is applied to each section, and a 7 × 7 feature map is finally produced.
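This mapping-and-pooling step can be sketched with torchvision's roi_pool, where spatial_scale = 1/16 performs the divide-by-16 coordinate mapping and output_size = (7, 7) produces the 49 equal max-pooled bins; this is a sketch under those assumptions, not the patent's implementation:

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 1024, 38, 50)                # stride-16 feature map
rois = torch.tensor([[0, 64., 64., 320., 320.]])   # (batch_idx, x1, y1, x2, y2) in image coords
pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)  # torch.Size([1, 1024, 7, 7])
```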
Step 10: fully connected operation. Each neuron is connected to every pixel of the input feature map, integrating the shallow features into different functional representations of the data.
Step 11: after the fully connected operation is performed in the fully connected layer, the concrete classification of the target is completed using Softmax Loss.
Step 12: bounding-box regression. A candidate window is generally defined by a four-dimensional vector (x, y, w, h), where x and y are the center coordinates of the candidate window, w the width, and h the height.
To localize the target more accurately, the original candidate box is dynamically adjusted so that, after adjustment, a regression window closer to the real window G is obtained. That is, given the original box coordinates $(P_x, P_y, P_w, P_h)$, translation and scaling operations yield the regressed window $\hat{G}$:

$$\hat{G}_x = P_w d_x(P) + P_x, \quad \hat{G}_y = P_h d_y(P) + P_y, \quad \hat{G}_w = P_w \exp(d_w(P)), \quad \hat{G}_h = P_h \exp(d_h(P))$$

where $d_x(P), d_y(P), d_w(P), d_h(P)$ are the learned regression offsets.
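A minimal NumPy sketch of this translate-and-scale adjustment, assuming the standard R-CNN parameterization written above; the delta values in the demonstration call are made up:

```python
import numpy as np

def apply_deltas(P, d):
    """P = (Px, Py, Pw, Ph) center-size box; d = (dx, dy, dw, dh)."""
    Px, Py, Pw, Ph = P
    dx, dy, dw, dh = d
    Gx = Pw * dx + Px        # translate the center
    Gy = Ph * dy + Py
    Gw = Pw * np.exp(dw)     # scale the width and height
    Gh = Ph * np.exp(dh)
    return Gx, Gy, Gw, Gh

print(apply_deltas((100, 100, 50, 80), (0.1, -0.05, 0.2, 0.0)))
# -> (105.0, 96.0, 61.07..., 80.0)
```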
Step 13: iterate until detection is complete.
In summary, the technical scheme improves the candidate-region-based Faster RCNN according to the characteristics of targets in real life, such as diverse forms, unclear targets, very small targets, mutual occlusion, partial occlusion, and complex backgrounds. A 101-layer ResNeXt network extracts the target features; the internal structure of the residual unit is changed to the pre-activation form, making network training easier; automatic deformable convolution is introduced so that the network can rely entirely on its internal mechanism to cope with various morphological changes of the target; and the softNMS algorithm is introduced to screen candidate boxes, avoiding missed detections caused by large-area overlap between targets. Extensive experiments show that the method detects targets with high accuracy and strong robustness.
The above embodiments are implemented on the premise of the technical solution of the present invention, and detailed embodiments and concrete operation procedures are given, but the protection scope of the present invention is not limited to the above embodiments. Unless otherwise specified, the methods used in the above examples are conventional methods.

Claims (2)

1. A target detection method based on deep learning is characterized by comprising the following steps:
1) replacing the VGG16 network used for extracting image features in the Faster RCNN method with a 101-layer residual network having stronger expressive capability and deeper layers;
The forward propagation of the residual network is linear: the input of a later layer is the sum of the input of an earlier layer and the outputs of the intervening residual units. After multiple iterations, the expression for a deep unit L is obtained:

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, w_i)$$

where $x_L$ represents the output vector of the L-th layer, $x_l$ the input vector of the l-th layer, and $F(x_i, w_i)$ the residual in the i-th layer; the L-th layer is a deep layer and the l-th layer is a shallow layer;
2) Changing the structure of the residual unit to a pre-activation form, so that information flows more smoothly through the network during forward and backward propagation; the pre-activation form is specifically as follows:

to make network training easier, the conventional "post-activation" structure of the residual unit is changed to "pre-activation"; in the modified residual unit structure, $x_L$ can be regarded as $x_l$ plus an accumulation of residuals, so the gradient can be propagated back in full during back-propagation and information is transmitted smoothly; the improved residual unit structure is represented by the following formula:

$$\frac{\partial \varepsilon}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L}\left(1 + \frac{\partial}{\partial x_l}\sum_{i=l}^{L-1} F(x_i, W_i)\right)$$

where $\varepsilon$ represents the loss error, expressed as $\varepsilon = \frac{1}{2}(x_{label} - x_L)^2$; $x_L$ denotes the layer-L prediction, $x_{label}$ the label vector corresponding to layer L, and $F(x_i, W_i)$ represents the residual;
3) Taking the basic convolution operation as the entry point for improvement, introducing automatic deformable convolution, and dynamically adjusting the size and position of the convolution kernel according to the image content currently being recognized;

a regular grid $\mathcal{R}$ over an n × n convolution kernel defines the size of the receptive field; for each pixel $p_0$ on the output feature map y of a conventional convolutional neural network:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n)$$

where $p_n$ enumerates the positions in $\mathcal{R}$, $x(p_0 + p_n)$ is a sampled point, and $w(p_n)$ represents the corresponding weight;

in order to adapt better to target deformation, automatic deformable convolution is introduced: an offset variable $\Delta p_n$ is added to each sampling point of the convolution kernel, giving the kernel the ability to deform; for automatic deformable convolution the formula becomes:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n + \Delta p_n)$$

where $\Delta p_n$ is the offset corresponding to each sampling position $p_n$, the elements of $\mathcal{R}$ are augmented by the offsets $\{\Delta p_n \mid n = 1, \ldots, N\}$, and $N = |\mathcal{R}|$; sampling therefore occurs at the irregular, offset positions $p_n + \Delta p_n$.
2. The target detection method based on deep learning according to claim 1, further comprising:
for the common situation in which targets cross, overlap, and occlude one another, adopting a new soft non-maximum suppression strategy;
introducing the softNMS algorithm to post-process the candidate boxes, solving the missed detections caused by largely overlapping targets, by resetting candidate-box scores; in the score resetting function, $\mathrm{iou}(M, b_i)$ is the intersection-over-union of the currently highest-scoring candidate box M and a remaining candidate box $b_i$: the intersection is computed first, the union is obtained by subtracting the intersection from the sum of the areas of the two bounding boxes, and the overlap ratio is then transformed to produce the new score $s_i$;
when the overlap of neighboring detection boxes exceeds a threshold, their scores are reduced by this function rather than eliminated outright; the function strongly attenuates the scores of detection boxes very close to M, while boxes far from M are unaffected and remain in the object detection sequence.
CN201910836094.9A 2019-09-05 2019-09-05 Target detection method based on deep learning Pending CN110569782A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910836094.9A CN110569782A (en) 2019-09-05 2019-09-05 Target detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910836094.9A CN110569782A (en) 2019-09-05 2019-09-05 Target detection method based on deep learning

Publications (1)

Publication Number Publication Date
CN110569782A true CN110569782A (en) 2019-12-13

Family

ID=68777979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910836094.9A Pending CN110569782A (en) 2019-09-05 2019-09-05 Target detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110569782A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427920A (en) * 2018-02-26 2018-08-21 杭州电子科技大学 A kind of land and sea border defense object detection method based on deep learning
CN109325504A (en) * 2018-09-07 2019-02-12 中国农业大学 A kind of underwater sea cucumber recognition methods and system
CN109299688A (en) * 2018-09-19 2019-02-01 厦门大学 Ship Detection based on deformable fast convolution neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KAIMING HE et al.: "Identity Mappings in Deep Residual Networks", arXiv:1603.05027v3 *
SAINING XIE et al.: "Aggregated Residual Transformations for Deep Neural Networks", arXiv:1611.05431v2 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052756A (en) * 2019-12-27 2021-06-29 武汉Tcl集团工业研究院有限公司 Image processing method, intelligent terminal and storage medium
CN111738069A (en) * 2020-05-13 2020-10-02 北京三快在线科技有限公司 Face detection method and device, electronic equipment and storage medium
CN111967399A (en) * 2020-08-19 2020-11-20 辽宁科技大学 Improved fast RCNN behavior identification method
CN112348187A (en) * 2020-11-11 2021-02-09 东软睿驰汽车技术(沈阳)有限公司 Training method and device of neural network model and electronic equipment
CN112651994A (en) * 2020-12-18 2021-04-13 零八一电子集团有限公司 Ground multi-target tracking method
CN112529095B (en) * 2020-12-22 2023-04-07 合肥市正茂科技有限公司 Single-stage target detection method based on convolution region re-registration
CN112529095A (en) * 2020-12-22 2021-03-19 合肥市正茂科技有限公司 Single-stage target detection method based on convolution region re-registration
CN112749644A (en) * 2020-12-30 2021-05-04 大连海事大学 Improved deformable convolution-based Faster RCNN fire smoke detection method
CN112749644B (en) * 2020-12-30 2024-02-27 大连海事大学 Faster RCNN fire smoke detection method based on improved deformable convolution
CN112893159A (en) * 2021-01-14 2021-06-04 陕西陕煤曹家滩矿业有限公司 Coal gangue sorting method based on image recognition
CN112893159B (en) * 2021-01-14 2023-01-06 陕西陕煤曹家滩矿业有限公司 Coal gangue sorting method based on image recognition
CN113569752B (en) * 2021-07-29 2023-07-25 清华大学苏州汽车研究院(吴江) Lane line structure identification method, device, equipment and medium
CN113569752A (en) * 2021-07-29 2021-10-29 清华大学苏州汽车研究院(吴江) Lane line structure identification method, device, equipment and medium
CN113794915A (en) * 2021-09-13 2021-12-14 海信电子科技(武汉)有限公司 Server, display equipment, poetry and song endowing generation method and media asset playing method
CN113794915B (en) * 2021-09-13 2023-05-05 海信电子科技(武汉)有限公司 Server, display device, poetry and singing generation method and medium play method

Similar Documents

Publication Publication Date Title
CN110569782A (en) Target detection method based on deep learning
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108304873B (en) Target detection method and system based on high-resolution optical satellite remote sensing image
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN109522908A (en) Image significance detection method based on area label fusion
CN103049763B (en) Context-constraint-based target identification method
CN110032925B (en) Gesture image segmentation and recognition method based on improved capsule network and algorithm
CN110633708A (en) Deep network significance detection method based on global model and local optimization
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
Asokan et al. Machine learning based image processing techniques for satellite image analysis-a survey
CN110689021A (en) Real-time target detection method in low-visibility environment based on deep learning
CN109035300B (en) Target tracking method based on depth feature and average peak correlation energy
CN112364865B (en) Method for detecting small moving target in complex scene
CN108230330B (en) Method for quickly segmenting highway pavement and positioning camera
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN113592911B (en) Apparent enhanced depth target tracking method
CN104657980A (en) Improved multi-channel image partitioning algorithm based on Meanshift
CN109977834B (en) Method and device for segmenting human hand and interactive object from depth image
CN111898432A (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN105354547A (en) Pedestrian detection method in combination of texture and color features
Ju et al. A novel fully convolutional network based on marker-controlled watershed segmentation algorithm for industrial soot robot target segmentation
Yang et al. SiamMMF: multi-modal multi-level fusion object tracking based on Siamese networks
CN111797795A (en) Pedestrian detection algorithm based on YOLOv3 and SSR

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination