CN112686207A - Urban street scene target detection method based on regional information enhancement - Google Patents
- Publication number
- CN112686207A CN112686207A CN202110085069.9A CN202110085069A CN112686207A CN 112686207 A CN112686207 A CN 112686207A CN 202110085069 A CN202110085069 A CN 202110085069A CN 112686207 A CN112686207 A CN 112686207A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
A city street scene target detection method based on regional information enhancement relates to the fields of artificial intelligence and computer vision. The method comprises the following steps: 1) daytime data are added to the training data; 2) a feature selection network Seg Block outputs a target position code segmask, and a Detection output network Detection Block outputs a category prediction module cls and a size regression module size; 3) the detection algorithm is optimized: a) the network model parameters are initialized; b) the target category and the detection frame are output in the forward pass, and the final detection result is output after filtering. The invention designs a target detection deep learning network, trains an urban street scene detection model and, in cooperation with an intelligent system combining the static video-frame target detection technology and the dynamic-video target behavior analysis technology of video intelligent analysis, designs a detection system adapted to various events in whole scenes both in the daytime and at night, thereby accurately and rapidly completing the automatic detection of illegal events and effectively avoiding false detections and missed detections of targets.
Description
Technical Field
The invention relates to the fields of artificial intelligence and computer vision, and in particular to an intelligent target detection method based on image processing and video analysis technology, applied to camera monitoring scenes in urban streets.
Background
With the development of modern science and technology, cameras are used to realize efficient city supervision and are applied to city management work, playing a vital role in helping city managers deal with complex urban street emergencies. Currently, more and more researchers focus on the automated administration of urban street scenarios. The purpose of urban scene visual supervision is to improve the resolution capability of scene monitoring images and endow an intelligent urban management system with the capability to correctly understand scene information, so as to improve the safety of urban streets, parking lots and communities. Meanwhile, cameras in night scenes are affected by uncontrollable factors such as bad weather and low illuminance, and common target detection methods cannot meet the all-weather, whole-scene supervision requirements.
In the field of computer vision, two different detection ideas, anchor-based and anchor-free, can be used for target detection in complex scenes. The paper "CenterNet: Objects as Points", published in 2019 by Xingyi Zhou et al., was the first to truly convert the category regression of the target detection problem into finding the target central point: the detector adopts Gaussian heat points as a keypoint estimation to fit the target central point, turning target detection into a standard keypoint estimation problem from which other target attributes such as size, 3D position, orientation and even posture can be derived. Compared with BBox-based detectors, the model is end-to-end differentiable, and the detection process is simpler, faster and more accurate, achieving the best balance between detection speed and accuracy. Therefore, the idea based on Gaussian heat points shows obvious advantages for problems such as target segmentation and target tracking. The paper "See Clearer at Night: Towards Robust Nighttime Semantic Segmentation through Day-Night Image Conversion", published in 2019 by Kailun Yang et al., proposes using a generative adversarial network (GAN) to alleviate the low accuracy of semantic segmentation models applied to nighttime environments. The GAN-based night semantic segmentation framework includes two approaches. The first uses a GAN to convert night images to day images and trains a robust model to perform semantic segmentation on the daytime dataset already in use; the second converts the daytime images in the dataset into nighttime images using a GAN to produce a model that is very robust under nighttime conditions and predicts the nighttime images directly. In the paper's experiments, the second method significantly improves the segmentation performance of the model on night images.
The method not only benefits the optimization of intelligent-vehicle visual perception but can also be applied to various navigation assistance systems. A paper published in 2019 by Yishi et al., "A night target identification method based on infrared thermal imaging and YOLOv3", notes that infrared thermal imaging reflects object temperature information, is less influenced by environmental conditions, and has strong application value in night security monitoring, driving assistance, shipping, military reconnaissance and other specific conditions. In recent years, the technology of detecting and identifying targets in images with artificial intelligence has advanced greatly and is widely applied in many fields. The paper proposes a night target identification method combining infrared thermal imaging image processing with artificial-intelligence target identification. The method collects a thermal imaging video in real time, preprocesses it to enhance its contrast and details, detects specific targets in the processed thermal images using YOLOv3, a recent deep-learning-based target detection framework, and outputs the detection results. Test results show that the method has a high night target recognition rate and strong real-time performance; it combines the advantages of infrared thermal-imaging night monitoring and artificial-intelligence target detection, and has great application value for night target recognition and tracking technology.
In summary, for target detection in an urban management scene, reasonably preprocessing the images and designing a more reasonable target detection algorithm is an effective approach. However, while target detection in daytime scenes is largely solved, night target detection in complex scenes still has certain shortcomings, and a complex background easily causes target false detections, missed detections and the like. How to improve the detection capability and reduce false detections is therefore a hot spot of complex-scene target detection research.
Meanwhile, the existing detection algorithm also has some defects:
Based on traditional algorithms, the requirements of identifying and understanding urban scene monitoring are difficult to meet. The main reasons are that the color appearance of day and night scenes differs greatly, the complexity of the scenes is too high, target edges are blurred, targets are occluded, and so on. These factors require the algorithm to have very strong generalization capability and accuracy; traditional algorithms cannot meet the target detection requirements on a theoretical basis, and false detections and missed detections of targets are difficult to eliminate in field applications.
Based on deep learning algorithms, a method with strong generalization capability can be designed to handle the varied forms of targets at night, but this places high demands on the design of the network model. In addition, most common deep learning algorithms are applied to target detection in daytime scenes with calibrated training samples, whereas the static features of targets in night monitoring scenes are sparse, so false and missed detections of targets easily occur.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, an object of the present invention is to provide an object detection network based on regional information enhancement and a detection method thereof. The invention designs a target detection deep learning network, trains an urban street scene detection model, and designs a detection system which is suitable for various events under the whole scene in the daytime and at night by matching with an intelligent system of a video intelligent analysis static video frame target detection technology and a target behavior analysis technology in a dynamic video, thereby accurately and rapidly completing the automatic detection of illegal events and effectively avoiding the false detection and missed detection of targets.
In order to achieve the above object, the technical solution of the present invention is implemented as follows:
A city street scene target Detection method based on regional information enhancement uses a target Detection network which comprises a feature fusion module I WiFPN1, a feature selection network Seg Block, a feature fusion module II WiFPN2, an up-sampling network UATB and a Detection output network Detection Block which are connected in sequence. The downsampling network Backbone comprises the feature fusion module I WiFPN1 and the feature fusion module II WiFPN2. The method comprises the following steps:
1) scene pre-processing of image data
40% daytime data is added to the training data; data enhancement during model training comprises flipping, scaling, cropping, and color brightness and chroma enhancement; the pixel size of the input image is normalized to 448 × 256.
2) Network model design
An anchor-free target Detection algorithm is adopted and a down-sampling and up-sampling network structure is designed; the feature selection network Seg Block outputs a target position code segmask, and the Detection output network Detection Block outputs a category prediction module cls and a size regression module size.
3) Detection algorithm optimization
a) Training process
Initializing network model parameters, and setting a learning target, a learning rate and an attenuation coefficient; and performing iterative learning and parameter updating on the loss function through an optimization algorithm.
b) Reasoning process
The target category and the detection frame are output in the forward pass using cls and size; the final detection result is output after filtering by a set threshold and a non-maximum suppression algorithm.
In the method for detecting the urban street scene target, the downsampling network Backbone extracts intermediate features, the feature selection network Seg Block optimizes the intermediate features, and the upsampling network UATB extracts predicted features.
In the above method for detecting urban street scene targets, the feature selection network Seg Block contains area information for supervised learning. Firstly, a learnable variable soft with a value between 0 and 1 is designed, and the three output features of the feature fusion module WiFPN1 are calibrated, selected and fused with soft; the features are compressed to a one-dimensional channel with a 1 × 1 convolution, and the target position code segmask is output through the activation function sigmoid; a target segmask_gt with a scalar value of 1 in the target area is designed, the target position code segmask is optimized through a loss function, and finally the optimized segmask is multiplied element-wise with the three input features of the feature selection network Seg Block, completing the selection of the bottom features of the downsampling network Backbone.
In the method for detecting the urban street scene target, the UATB is an up-sampling network with composite multi-stage semantic features; the features of each stage of the down-sampling network Backbone are used as signal input, and a dual interaction attention module C_ATB performs the operation layer by layer until the prediction features are obtained.
In the above method for detecting an object in a city street scene, the dual interaction attention module C_ATB learns two stages of region information. The two input features are compressed into a shared single-channel attention feature AT through convolution, transposed convolution, Concat merging and a Sigmoid activation function, and AT is multiplied element-wise with the upper-layer and lower-layer input features to complete the first-stage feature interaction, realizing the adjustment of the target spatial position; the intermediately adjusted features are then merged and sent into an SEnet network, which selects a second-stage fusion of the two stages' features at the channel level to complete the upsampled output information.
In the method for detecting the urban street scene target, the Detection output network Detection Block takes the 2-times-downsampled output as a shared feature. A convolution with an n-dimensional convolution kernel is adopted, and the feature values of the n channels are normalized by a Sigmoid function and mapped to the category prediction module cls; a convolution with a 2-dimensional convolution kernel yields two channels that map respectively to the size regression module size, corresponding to the size of the detection target.
As the detection method is adopted, compared with the prior art, the invention has the following advantages:
1. A network structure is designed based on the anchor-free target detection method, and the 2-times-downsampled output is used for prediction, improving the detection of small targets.
2. A weak supervision method is used to attend to the spatial information of the features, enhancing attention to the position information of targets in the image.
3. Two WiFPN modules effectively and independently integrate shallow and deep features.
4. A simpler UATB module completes the upsampling work, effectively utilizing the feature information of each input layer of the pyramid for prediction.
The invention is further described with reference to the following figures and detailed description.
Drawings
FIG. 1 is a schematic diagram of a detection network used in the method of the present invention;
FIG. 2 is a schematic structural diagram of a Seg Block module in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a C_ATB module according to an embodiment of the present invention.
Detailed Description
Referring to fig. 1, in the city street scene target Detection method based on regional information enhancement of the present invention, the target Detection network used comprises a feature fusion module I WiFPN1, a feature selection network Seg Block, a feature fusion module II WiFPN2, an up-sampling network UATB and a Detection output network Detection Block connected in sequence. The method extracts intermediate features through a downsampling network Backbone, where the Backbone comprises the feature fusion module I WiFPN1 and the feature fusion module II WiFPN2; WiFPN1 extracts shallow network features and WiFPN2 extracts deep network features. A feature selection network Seg Block is designed to optimize the intermediate features, and an up-sampling network UATB is designed to extract the prediction features. The input video signal passes through the Backbone, Seg Block, UATB and Detection Block in sequence to output segmask, cls and size.
The invention relates to a city street scene target detection method based on regional information enhancement, which comprises the following steps:
1) scene pre-processing of image data
The pixel size of an image in the urban scene is 1920 × 1080; to reduce large deformations of small targets caused by excessive changes in the network input scale, the image is normalized to 448 × 256 pixels. In order to supplement night target color features and target texture features, make the network model more robust to color interference, and reduce to a certain extent the influence of missing night target features, 40% daytime data is added to the training data. Besides conventional flipping, scaling and cropping, data enhancement during model training adds color brightness and chroma enhancement.
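As a rough illustration of this preprocessing step (not the patent's actual pipeline; the helper names are hypothetical, and a real pipeline would use bilinear rather than nearest-neighbour resizing), the normalization and augmentation can be sketched in NumPy:

```python
import numpy as np

def nearest_resize(img, out_h, out_w):
    # Nearest-neighbour resize: index rows/columns of the source image.
    h, w = img.shape[:2]
    ys = np.arange(out_h) * h // out_h
    xs = np.arange(out_w) * w // out_w
    return img[ys][:, xs]

def preprocess(img, rng):
    # Normalize a 1920x1080 frame to 448x256 and apply flip/brightness augmentation.
    img = nearest_resize(img, 256, 448)      # H x W = 256 x 448
    if rng.random() < 0.5:                   # random horizontal flip
        img = img[:, ::-1]
    gain = 0.8 + 0.4 * rng.random()          # brightness jitter in [0.8, 1.2]
    return np.clip(img.astype(np.float32) * gain, 0.0, 255.0)

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(1080, 1920, 3))
out = preprocess(frame, rng)
```

Scaling, cropping and chroma jitter would be added analogously as further augmentation stages.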
2) Network model design
In the invention, to solve the problems of false and missed detections in street scene target detection, a new target detection network structure, ADetNet, is designed. ADetNet is an anchor-free target detection algorithm; it accelerates network convergence and increases the detection rate and accuracy of targets through three modules (the target position code segmask, the category prediction module cls and the size regression module size), and the network structure comprises one input and three outputs: segmask, cls and size. A branch Seg Block is added on the basis of the WiFPN1 module of the ADetNet network and outputs 8-times-downsampled segmask information; after the Detection Block module of the ADetNet network, the 2-times-downsampled target class information cls and detection frame information size are output.
Referring to fig. 1, the signal input represents the input of the ADetNet network; segmask, cls and size represent its three outputs; Conv represents its forward operation structure; Feature represents an intermediate operation result; the dashed box WiFPN1 represents its shallow feature fusion module, the dashed box WiFPN2 its deep feature fusion module, and the dashed box UATB its attention upsampling module. In the network structure, each downward operation performs one feature downsampling and each upward operation one feature upsampling; the ADetNet network uses 5 Conv stages to reduce the feature map size by 32 times, one Conv completing each 2-times downsampling.
Wherein, Conv represents convolution operation, Add represents element addition operation, and Max pooling represents maximum pooling operation.
The WiFPN module is a feature fusion module proposed in the paper EfficientDet, aiming to deliver better upper- and lower-layer semantic features of the network to the detection part. The WiFPN module realizes weight-enhanced feature fusion of information features of different sizes, so that each layer of the downsampling network achieves a weighted operation on semantic features at small computational cost. Accordingly, a WiFPN module with three inputs and three outputs is built to organically integrate shallow and deep features: two groups of input features with different resolutions are fused, and an extra weight is added for each input in the fusion process, letting the network learn the importance of each input feature; the output O of each layer is as shown in formula (1).
Here I_i represents the inputs of the three-layer WiFPN module, and w_i are learnable weights, which may be scalars (per feature), vectors (per channel) or multidimensional tensors (per pixel):

O = sum_i ( w_i / (epsilon + sum_j w_j) ) * I_i        (1)

The ReLU function is applied to each w_i before normalization, and setting epsilon = 0.0001 avoids numerical instability. Each normalized weight likewise takes a value between 0 and 1, and since no softmax operation is used here, this is more efficient.
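This fast normalized fusion can be sketched as follows (a minimal NumPy illustration of formula (1); the function name `wifpn_fuse` is hypothetical, and a real WiFPN additionally applies depthwise separable convolutions around the fusion):

```python
import numpy as np

def wifpn_fuse(inputs, raw_w, eps=1e-4):
    # ReLU-weighted fast normalized fusion of same-shape feature maps:
    # O = sum_i w_i * I_i / (eps + sum_j w_j), weights end up in ~[0, 1].
    w = np.maximum(raw_w, 0.0)       # ReLU keeps weights non-negative
    w = w / (eps + w.sum())          # normalize without softmax
    return sum(wi * x for wi, x in zip(w, inputs))

# Three toy same-resolution features with constant values 1, 2, 3.
feats = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]
fused = wifpn_fuse(feats, raw_w=np.array([1.0, 1.0, 2.0]))
```

With raw weights (1, 1, 2) the normalized weights are roughly (0.25, 0.25, 0.5), so the fused map sits close to 2.25 everywhere.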
To further improve the fusion effect, we use depthwise separable convolutions for feature fusion and add batch normalization and activation after each convolution. The WiFPN1 module performs attention enhancement on the 2- to 8-times-downsampled network output features, optimizing several different-resolution input features of the shallow network and delivering better spatial semantic information to the deep network. The WiFPN2 module performs attention enhancement on the 8- to 32-times-downsampled network output features, optimizing several different-resolution input features of the deep network and accumulating better receptive-field information for the upsampling operation. Various methods can be used in the upsampling process: (1) deconvolution, (2) linear upsampling, (3) linear upsampling combined with a 1 × 1 convolution, implemented according to different requirements.
The shallow feature map has rich semantic information. To fully mine texture features beneficial to target detection, weakly supervised segmentation is used as a branch to complete local feature enhancement. The method takes an output feature map of the WiFPN1 module and a bounding-box-level segmentation truth value as input, generates a semantic feature mapping mask of the same dimension, multiplies the WiFPN1 output feature map element-wise by this mask to obtain the local features to attend to, and finally sums the local features and the underlying feature map with learnable weights and passes the result downwards. Based on this idea, we designed the Seg Block module, as shown in fig. 2.
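A minimal sketch of the Seg Block idea, assuming the three inputs share one shape and modelling the 1 × 1 convolution as a per-channel weighted sum (all names are illustrative, not the patent's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def seg_block(feats, soft, conv1x1_w):
    # feats: list of three (C, H, W) arrays from WiFPN1.
    # soft: three learnable scalars in [0, 1] for calibrated selection-fusion.
    # conv1x1_w: (C,) weights standing in for the 1x1 conv to one channel.
    fused = sum(s * f for s, f in zip(soft, feats))        # selection-fusion
    segmask = sigmoid(np.tensordot(conv1x1_w, fused, 1))   # (H, W) mask in (0, 1)
    selected = [f * segmask for f in feats]                # re-weight each input
    return segmask, selected
```

During training the mask would be pulled toward the rectangular segmask_gt by a loss before being multiplied back onto the features.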
Unlike the CenterNet algorithm, which uses the 4-times-downsampled feature output as the prediction layer, and considering that the 2-times-downsampled feature output is more favorable for acquiring the feature information of small targets (less than 50 × 50 pixels), a novel upsampling network UATB with composite multi-stage semantic features is designed, whose structure is formed by combining C_ATB sub-modules (dual interaction attention modules). For the downsampling output features of the ADetNet network at each stage, starting from the 32-times-downsampled layer, the C_ATB module performs the upsampling operation layer by layer until the 2-times-downsampled features are generated and used as the input of the Detection Block (prediction) module. The other function of the UATB module is to let the outputs of the two WiFPN modules flow and exchange feature information through the upsampling process.
Fig. 3 abstractly shows the size change of each feature in the C_ATB network. As shown in the figure, the length and width of the upper-layer input are 2 times those of the lower-layer input. The two input features are compressed into a common single-channel attention feature AT through convolution, transposed convolution, Concat merging and a Sigmoid activation function, and AT is then multiplied element-wise with the upper- and lower-layer input features to complete the first-stage feature interaction, realizing the adjustment of the target spatial position; the intermediately adjusted features are merged and sent into an SEnet network, which selectively fuses the two stages' features at the channel level to complete the upsampled output information. The C_ATB network learns different weights on spatial positions and channels at different stages, adaptively adjusting the spatial position information and contextual semantic information of input features in different stages.
The Detection Block module serves as the functional differentiation mechanism of the ADetNet network; it takes the features before the last upsampling of the UATB module (2-times downsampling) as shared features to generate the final class prediction cls and size regression size. For an n-class target detection task, the class prediction cls is mapped by the convolution of an n-dimensional convolution kernel, the feature values of the n channels are normalized with a Sigmoid function, and each channel corresponds to the value of one class of detection target. The size regression size is mapped by the convolution of a 2-dimensional convolution kernel, and the two resulting channels correspond respectively to the width W and height H of the detection target.
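The two prediction heads can be illustrated with 1 × 1 convolutions modelled as channel-wise weighted sums (an assumption-laden sketch rather than the patent's implementation; `detection_block` and its parameter names are made up for the example):

```python
import numpy as np

def detection_block(shared, cls_w, size_w):
    # shared: (C, H, W) 2x-downsampled shared feature from the UATB.
    # cls_w:  (n, C) weights of the 1x1 conv producing the n-class heatmap.
    # size_w: (2, C) weights of the 1x1 conv producing per-pixel (W, H) sizes.
    cls = 1.0 / (1.0 + np.exp(-np.tensordot(cls_w, shared, 1)))  # (n, H, W) in (0, 1)
    size = np.tensordot(size_w, shared, 1)                       # (2, H, W) raw sizes
    return cls, size
```

The Sigmoid keeps each class channel in (0, 1), so each position reads directly as a per-class confidence, while the size channels stay unbounded regression outputs.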
3) Detection algorithm optimization
a) Training process
In the design of the whole network structure, the segmask module and the detection module composed of the cls and size modules share one network backbone structure, the Backbone. For an n-class object detection network, let I ∈ R^(W×H×3) be an input image of width W and height H. The position coding branch generates the vector segmask ∈ [0,1]^((W/8)×(H/8)); the cls branch of the detection module outputs the vector Ŷ ∈ [0,1]^((W/2)×(H/2)×n); and the size branch of the detection module outputs the vector Ŝ ∈ R^((W/2)×(H/2)×2). Learning targets are set for these three network outputs, the corresponding target vectors are obtained by encoding, and iterative learning is performed through the loss functions.
Aiming at the segmask module output, the learning target is a heatmap of key regions M ∈ [0,1]^((W/R)×(H/R)) encoded at 1/R of the original resolution, where R is the output size scaling, with R = 8; M_xy = 1 indicates a key area where a target is present, and M_xy = 0 represents the background area. We use the Seg Block encoding-decoding network to predict the image I. When training the segmask key-area prediction network, let b_k be the bbox of target k (class c_k); the mask_GT key points are dispersed onto the heatmap M through a rectangular thermal frame covering b_k. To reduce the computational burden, a common segmask prediction M̂ is shared by all target classes. The training objective function is a pixel-level logistic-regression focal loss:

L_seg = -(1/N) * sum_xy of:
    (1 - M̂_xy)^alpha * log(M̂_xy)                      if M_xy = 1
    (1 - M_xy)^beta * (M̂_xy)^alpha * log(1 - M̂_xy)    otherwise

where alpha and beta are the hyper-parameters of the focal loss, set to 2 and 4 respectively in the experiments, and N is the number of key regions in input I; dividing by N normalizes the loss over all key regions.
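A NumPy sketch of this pixel-level focal loss, following the CornerNet/CenterNet form the text describes, with alpha = 2 and beta = 4 (the function name is illustrative):

```python
import numpy as np

def heatmap_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    # pred, gt: (H, W) heatmaps; gt == 1 marks key regions, values in [0, 1].
    pred = np.clip(pred, eps, 1.0 - eps)                     # numerical safety
    pos = gt == 1.0
    pos_loss = ((1 - pred) ** alpha) * np.log(pred) * pos
    neg_loss = ((1 - gt) ** beta) * (pred ** alpha) * np.log(1 - pred) * (~pos)
    n = max(pos.sum(), 1)                                    # number of key regions
    return -(pos_loss.sum() + neg_loss.sum()) / n
```

A prediction that matches the target yields a loss near zero, while confident wrong predictions are penalized with the alpha/beta modulation down-weighting easy pixels.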
For the cls branch output of the detection module, the learning target is a keypoint Gaussian heatmap Ŷ ∈ [0,1]^((W/R)×(H/R)×n), where R is the detection output size scaling; we use a larger output size for better prediction of small targets, with the number of downsamplings giving R = 2. Ŷ_xyc = 1 indicates a detected keypoint and Ŷ_xyc = 0 the background, i.e. one positive sample is set per target. We predict the image I using the ADetNet full-convolutional encoder-decoder network. For a classification target of class c with ground-truth (GT) keypoint position p ∈ R^2 in the original image, the corresponding keypoint at low resolution (after downsampling) is p̃ = floor(p/R). The GT keypoint is dispersed onto the heatmap Y through the Gaussian kernel

Y_xyc = exp( -((x - p̃_x)^2 + (y - p̃_y)^2) / (2 * sigma_p^2) )

where sigma_p is a standard deviation adapted to the target scale. If two Gaussians of the same class c overlap (the same keypoint or object class), we take the element-wise maximum. For the classification loss, the focal loss of pixel-level logistic regression is used as the training objective function.
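The Gaussian dispersion with element-wise maximum for same-class overlaps can be sketched as follows (an illustration of the kernel above, not the patent's code):

```python
import numpy as np

def splat_gaussian(heatmap, cx, cy, sigma):
    # Splat one GT keypoint (cx, cy) onto a (H, W) class heatmap:
    # Y_xy = exp(-((x-cx)^2 + (y-cy)^2) / (2*sigma^2)).
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)   # element-wise max handles overlaps
    return heatmap
```

Repeated calls for every target of a class build that class's target heatmap, with the max rule keeping the stronger response wherever two Gaussians overlap.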
For the size branch output of the detection module, the learning target directly adopts the width and height of the target. Let (x1_k, y1_k, x2_k, y2_k) be the bbox of target k (class c_k), with central position p_k = ((x1_k + x2_k)/2, (y1_k + y2_k)/2). We use the keypoint estimate Ŷ to obtain all the center points and, in addition, regress the target size s_k = (x2_k - x1_k, y2_k - y1_k) for each target k. To reduce the computational burden, a single size prediction Ŝ is shared by all target classes. We add an L1 loss at the center-point positions:

L_size = (1/N) * sum_k | Ŝ_p̃k - s_k |
To balance the three losses, each is multiplied by a coefficient, and the overall training objective is:

$$L = L_{cls} + \lambda_{size}L_{size} + \lambda_{seg}L_{seg}$$
During training, the whole network predicts $2n + 2$ values at each position (namely $n$ keypoint-class channels, $n$ key-region channels, and the size channels $w$ and $h$), and all outputs share a fully convolutional Backbone.
b) Inference process
Only the detection module of the model is needed to predict the object class and the detection box size. Let $I \in \mathbb{R}^{W\times H\times 3}$ be an input image of width $W$ and height $H$. First, through a forward pass of the network, the class branch cls outputs a vector of width $0.5W$ and height $0.5H$ with $N$ channels, where $N$ is the number of classes; the value at each point represents the probability that a target appears there. The size branch outputs a vector whose two feature channels map respectively to the width and height of the detection box at the point locations given by the class output. Then, for the class output, predictions with correspondingly low scores are filtered out by setting a threshold. Finally, redundant detection boxes are removed with the non-maximum suppression algorithm NMS, and the final detection result is output.
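The inference steps above (forward pass, score thresholding, NMS) can be sketched as follows; the function names and the threshold values 0.3 / 0.5 are illustrative assumptions:

```python
import numpy as np

def iou(a, b):
    """IoU of two (x1, y1, x2, y2, score, cls) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, iou_thr):
    """Greedy per-class non-maximum suppression, highest score first."""
    keep = []
    for b in sorted(boxes, key=lambda b: -b[4]):
        if all(b[5] != k[5] or iou(b, k) < iou_thr for k in keep):
            keep.append(b)
    return keep

def decode_detections(cls_map, size_map, score_thr=0.3, iou_thr=0.5):
    """Turn a class heatmap (N, H, W) and size map (2, H, W) into boxes.
    Scores above `score_thr` become candidate boxes centred at that point,
    with width/height read from the size map; NMS removes redundancy."""
    boxes = []
    n_cls = cls_map.shape[0]
    for c in range(n_cls):
        ys, xs = np.where(cls_map[c] > score_thr)    # thresholding step
        for y, x in zip(ys, xs):
            bw, bh = size_map[0, y, x], size_map[1, y, x]
            boxes.append((x - bw / 2, y - bh / 2, x + bw / 2, y + bh / 2,
                          float(cls_map[c, y, x]), c))
    return nms(boxes, iou_thr)
```

The coordinates here are still on the half-resolution grid; in practice they would be scaled by the output stride back to the input image.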
The embodiments of the invention are only used to explain the technical solution of the application; similar substitutions made by those skilled in the art on this basis, for example replacing the deep-learning anchor-free detection method with other anchor-free target detection methods, or replacing the Conv-module downsampling operation on the image with other fully convolutional encoder-decoder networks, mathematical models and the like, shall fall within the protection scope of the application.
Claims (6)
1. An urban street scene target detection method based on regional information enhancement, wherein the target detection network it uses comprises a first feature fusion module WiFPN1, a feature selection network Seg Block, a second feature fusion module WiFPN2, an up-sampling network UATB and a detection output network Detection Block connected in sequence, and the down-sampling network Backbone comprises the first feature fusion module WiFPN1 and the second feature fusion module WiFPN2; the method comprises the steps of:
1) scene pre-processing of image data
Adding 40% of daytime data to the training data; data enhancement during model training includes flipping, scaling, cropping, and color brightness and chroma enhancement; the pixel size of the input image is normalized to 448 × 256;
2) network model design
Designing the down-sampling and up-sampling network structures using an anchor-free target detection algorithm, the feature selection network Seg Block outputting the position code segmask, and the detection output network Detection Block outputting the class cls and the detection box size;
3) detection algorithm optimization
a) Training process
Initializing the network model parameters and setting the learning target, learning rate and decay coefficient; performing iterative learning on the loss function and updating the parameters through an optimization algorithm;
b) Inference process
Outputting the target class and the detection box in a forward pass using cls and size; filtering by a set threshold and the non-maximum suppression algorithm, and outputting the final detection result.
2. The urban street scene target detection method based on regional information enhancement as claimed in claim 1, wherein the down-sampling network Backbone extracts intermediate features, the feature selection network Seg Block optimizes the intermediate features, and the up-sampling network UATB extracts the prediction features.
3. The urban street scene target detection method based on regional information enhancement as claimed in claim 1 or 2, wherein the feature selection network Seg Block extracts regional information by means of supervised learning: first, a learnable variable soft with values between 0 and 1 is designed, and soft is used to calibrate, select and fuse the three output features of the first feature fusion module WiFPN1; the features are compressed to a single channel by a 1 × 1 convolution operation, and the position code segmask is output through the activation function Sigmoid; a target segmask_gt whose target regions take the scalar value 1 is designed, and the position code segmask is optimized through the loss function; finally, the optimized position code segmask is multiplied element-wise with the pixel values of the three inputs of the feature selection network Seg Block to complete the selection of the bottom-level features of the down-sampling network Backbone.
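A minimal NumPy sketch of the feature selection this claim describes; the array shapes, the use of plain scalars for `soft`, and the function names are simplifying assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def seg_block(feats, soft, w1x1, b1x1):
    """Feature selection via a supervised position code.

    feats -- three (C, H, W) outputs of WiFPN1 (same shape here for simplicity)
    soft  -- three learnable calibration scalars in [0, 1]
    w1x1  -- (C,) weights of the 1x1 convolution compressing C channels to 1
    b1x1  -- scalar bias of that convolution
    """
    fused = sum(s * f for s, f in zip(soft, feats))               # calibrated fusion
    segmask = sigmoid(np.tensordot(w1x1, fused, axes=1) + b1x1)   # (H, W) position code
    selected = [segmask[None, :, :] * f for f in feats]           # re-weight each input
    return segmask, selected
```

In training, segmask would be pushed toward a segmask_gt that is 1 inside target regions, so the multiplication suppresses background responses in the bottom-level Backbone features.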
4. The urban street scene target detection method based on regional information enhancement as claimed in claim 1 or 2, wherein UATB is an up-sampling network with composite multi-stage semantic features; each stage feature of the down-sampling network Backbone serves as signal input, and a dual-interaction attention module C_ATB operates layer by layer until the prediction features are obtained.
5. The urban street scene target detection method based on regional information enhancement as claimed in claim 4, wherein the dual-interaction attention module C_ATB learns two stages of regional information: the two input features are compressed into a common single-channel attention feature AT through convolution, transposed convolution, Concat merging and the Sigmoid activation function, and the single-channel attention feature AT is then multiplied element-wise with the upper and lower input features to complete the first-stage feature interaction, realizing adjustment of the target spatial position; the intermediately adjusted features are then merged and fed into an SENet network, which performs the second-stage channel-level selection and fusion of the two stages of features to complete the up-sampled output information.
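The two stages of C_ATB can be sketched as follows; this assumes the two inputs have already been brought to a common (C, H, W) size (the claim uses convolution and transposed convolution for that), models the 1 × 1 compression as a weight vector, and reduces SENet to its squeeze-excitation matrices — all illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def c_atb(f_a, f_b, w_at, w_se1, w_se2):
    """Dual-interaction attention sketch.

    f_a, f_b -- two (C, H, W) input features at a common size
    w_at     -- (2C,) weights compressing the Concat to one attention channel
    w_se1    -- (r, 2C) squeeze weights; w_se2 -- (2C, r) excitation weights
    """
    # stage 1: shared single-channel spatial attention AT
    at = sigmoid(np.tensordot(w_at, np.concatenate([f_a, f_b]), axes=1))  # (H, W)
    f_a, f_b = at[None] * f_a, at[None] * f_b        # adjust target spatial position

    # stage 2: SENet-style channel-level selection of the merged features
    x = np.concatenate([f_a, f_b])                   # (2C, H, W)
    z = x.mean(axis=(1, 2))                          # squeeze: global average pool
    s = sigmoid(w_se2 @ np.maximum(w_se1 @ z, 0.0))  # excitation with ReLU bottleneck
    return s[:, None, None] * x                      # channel-re-weighted output
```

Stage 1 aligns the two features spatially with one shared mask; stage 2 then lets the network choose, per channel, how much of each stage's information survives.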
6. The urban street scene target detection method based on regional information enhancement as claimed in claim 1, wherein the detection output network Detection Block uses the 2× down-sampled output as a shared feature; a convolution with an n-channel kernel is applied and the Sigmoid function normalizes the n-channel feature values to map them to the class cls, and a convolution with a 2-channel kernel yields two channels that map respectively to the width and height of the detected target.
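The two 1 × 1 heads of this claim can be sketched as follows (a minimal NumPy sketch; names and shapes are illustrative, and biases are omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detection_block(shared, w_cls, w_size):
    """Detection Block heads over a shared feature.

    shared -- (C, H, W) 2x-downsampled shared feature
    w_cls  -- (n, C) weights of the n-channel 1x1 convolution (class head)
    w_size -- (2, C) weights of the 2-channel 1x1 convolution (size head)
    """
    cls = sigmoid(np.tensordot(w_cls, shared, axes=([1], [0])))   # (n, H, W), in (0, 1)
    size = np.tensordot(w_size, shared, axes=([1], [0]))          # (2, H, W): width, height
    return cls, size
```

The Sigmoid makes each class channel a per-position probability map, while the size channels stay unbounded regression targets.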
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110085069.9A CN112686207B (en) | 2021-01-22 | 2021-01-22 | Urban street scene target detection method based on regional information enhancement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112686207A true CN112686207A (en) | 2021-04-20 |
CN112686207B CN112686207B (en) | 2024-02-27 |
Family
ID=75458885
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110085069.9A Active CN112686207B (en) | 2021-01-22 | 2021-01-22 | Urban street scene target detection method based on regional information enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112686207B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108304798A (en) * | 2018-01-30 | 2018-07-20 | 北京同方软件股份有限公司 | The event video detecting method of order in the street based on deep learning and Movement consistency |
WO2019232894A1 (en) * | 2018-06-05 | 2019-12-12 | 中国石油大学(华东) | Complex scene-based human body key point detection system and method |
WO2020224406A1 (en) * | 2019-05-08 | 2020-11-12 | 腾讯科技(深圳)有限公司 | Image classification method, computer readable storage medium, and computer device |
Non-Patent Citations (1)
Title |
---|
范红超;李万志;章超权;: "基于Anchor-free的交通标志检测", 地球信息科学学报, no. 01 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113487600A (en) * | 2021-07-27 | 2021-10-08 | 大连海事大学 | Characteristic enhancement scale self-adaptive sensing ship detection method |
CN113487600B (en) * | 2021-07-27 | 2024-05-03 | 大连海事大学 | Feature enhancement scale self-adaptive perception ship detection method |
CN113837305A (en) * | 2021-09-29 | 2021-12-24 | 北京百度网讯科技有限公司 | Target detection and model training method, device, equipment and storage medium |
US11823437B2 (en) | 2021-09-29 | 2023-11-21 | Beijing Baidu Netcom Science Technology Co., Ltd. | Target detection and model training method and apparatus, device and storage medium |
CN114581798A (en) * | 2022-02-18 | 2022-06-03 | 广州中科云图智能科技有限公司 | Target detection method and device, flight equipment and computer readable storage medium |
CN114565860A (en) * | 2022-03-01 | 2022-05-31 | 安徽大学 | Multi-dimensional reinforcement learning synthetic aperture radar image target detection method |
CN115690704A (en) * | 2022-09-27 | 2023-02-03 | 淮阴工学院 | LG-CenterNet model-based complex road scene target detection method and device |
CN115690704B (en) * | 2022-09-27 | 2023-08-22 | 淮阴工学院 | LG-CenterNet model-based complex road scene target detection method and device |
CN115578615A (en) * | 2022-10-31 | 2023-01-06 | 成都信息工程大学 | Night traffic sign image detection model establishing method based on deep learning |
CN115985102A (en) * | 2023-02-15 | 2023-04-18 | 湖南大学深圳研究院 | Urban traffic flow prediction method and equipment based on migration contrast learning |
CN116630909A (en) * | 2023-06-16 | 2023-08-22 | 广东特视能智能科技有限公司 | Unmanned intelligent monitoring system and method based on unmanned aerial vehicle |
CN116630909B (en) * | 2023-06-16 | 2024-02-02 | 广东特视能智能科技有限公司 | Unmanned intelligent monitoring system and method based on unmanned aerial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||