CN113901897A - Parking lot vehicle detection method based on DARFNet model - Google Patents


Info

Publication number
CN113901897A
Authority
CN
China
Prior art keywords
network
layer
prediction
parking lot
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111118334.5A
Other languages
Chinese (zh)
Inventor
陈志华
嵇恒铭
周小兵
公海涛
张景轩
王喆
吴宇迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN202111118334.5A priority Critical patent/CN113901897A/en
Publication of CN113901897A publication Critical patent/CN113901897A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to the technical field of video image processing and provides a parking lot vehicle detection method based on a DARFNet model, comprising the following steps: a. providing a parking lot image to be detected and performing feature extraction on it to obtain corresponding preliminary image features; b. building a lightweight network and inputting the parking lot image to be detected into it to obtain further preliminary features; c. building a multi-channel hybrid fusion self-attention network and inputting the preliminary features into it to obtain multi-channel fusion features; d. building a prediction network comprising a classification network, an IOU network, and a regression network, and inputting the multi-channel fusion features into the prediction network to obtain the prediction result. The embodiment of the invention effectively improves vehicle detection accuracy in dense small-target parking lot scenes.

Description

Parking lot vehicle detection method based on DARFNet model
Technical Field
The invention relates to the technical field of image processing, in particular to a vehicle detection method of a parking lot image.
Background
With the rapid development of computer vision technology, many characteristics of image targets can now be obtained easily through deep learning techniques. Densely packed target environments have become an important research topic; the dense scenes of most current interest include crowded pedestrian detection, crowded parking lot vehicle detection, and dense shopping mall item detection. Such datasets differ from conventional detection datasets, such as COCO and VOC, mainly in that target objects overlap each other to a greater degree and small target objects are far more numerous. Under these circumstances, it is increasingly necessary to identify targets quickly and accurately, so a new detection algorithm must be designed for the dense-scene target detection problem. The core difficulty is that small-target and multi-scale detection remain among the hardest problems in target detection. In addition, while conventional NMS algorithms provide satisfactory accuracy for most isolated-target detection tasks, they can over-suppress prediction boxes in scenes with many overlapping objects, lowering both precision and recall.
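The over-suppression behavior described above can be reproduced with a short sketch. Below is a minimal pure-Python implementation of greedy NMS; the box coordinates, scores, and the 0.5 threshold are illustrative values, not taken from the patent:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh):
    # Greedy NMS: repeatedly keep the highest-scoring box and drop
    # every remaining box whose IoU with it exceeds `thresh`.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [j for j in order if iou(boxes[m], boxes[j]) <= thresh]
    return keep

# Two tightly parked cars: the boxes genuinely overlap (IoU ~ 0.54),
# yet NMS with the common 0.5 threshold keeps only one of them.
boxes = [(0, 0, 10, 10), (3, 0, 13, 10)]
scores = [0.9, 0.85]
print(nms(boxes, scores, 0.5))
```

Raising the threshold keeps both boxes but reintroduces duplicate detections elsewhere, which is exactly the trade-off that motivates the IOU prediction branch proposed below.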
Target detection is a fundamental problem in computer vision, and vehicle detection is an important instance of it, widely applied to traffic information acquisition and urban road network planning in intelligent traffic systems. Compared with general image target detection tasks, detection in dense-target images is a real challenge for two main reasons. One is that objects typically have a variety of shapes, colors, and poses. The other is that the detection model is easily affected by varying weather conditions and lighting. Texture also affects the detection task: patterns on rocks, buildings, and road surfaces, as well as traffic signals, may interfere with vehicle detection. Moreover, vehicles in unmanned aerial vehicle images are usually small, and detection of moving vehicles is often degraded by motion blur, target occlusion, and the like. The drone vehicle detection datasets studied here contain many images with ambient background disturbances and differing lighting environments. These problems increase the difficulty of designing the algorithm.
Academia has proposed many methods for the general target detection task. These methods can be broadly divided into two categories: two-stage models and one-stage models. In two-stage models, a feature pyramid network with lateral connections and a region proposal network (RPN) are used to filter negative examples, after which classification and regression branches generate the final prediction boxes. One-stage models instead handle the problem with a simplified feature pyramid without top-down connections; these networks typically use special loss functions or sampling methods to keep the large number of background boxes from disrupting training. While both types of model achieve high-precision detection on generic datasets, they do not perform well in crowded environments, particularly on datasets containing many small targets.
Disclosure of Invention
Aiming at the shortcomings of existing target detection techniques for small-target detection in parking lots, a DARF network module is proposed to expand the receptive field of the general SSD model, together with an IOU prediction branch that supplies a threshold preventing the traditional NMS algorithm from removing too many proposal boxes. The DARF network module we use consists of a densely connected RFB module. We use dense connections similar to those in DenseNet and DenseASPP to concatenate the feature maps produced by different dilated convolution kernels, further expanding the receptive field. Building on the predicted IOU value, we further propose an improved NMS algorithm. In summary, our method makes the following contributions: channel-wise dense connections further expand the receptive field, while a channel attention module filters out part of the interference and reduces feature interference from similar background objects; a new multi-dilation convolution branch module, DARF, is proposed; and, to address the problem that target boxes with high overlap rates are easily deleted by the NMS post-processing step in overly dense scenes, a branch predicting the IOU is proposed, with the product of the classification branch's predicted confidence and the predicted IOU used as the confidence in the NMS algorithm. The main modules are described as follows: the DARF module consists of a dense receptive field block and a channel attention block. A typical RFB module has three branches, each containing convolutions with kernels of a different size. Before the dilated convolution, the feature map is processed by a convolution layer. In addition, dense connections yield denser scale and pixel sampling, which helps feature acquisition for small targets.
After merging the feature maps obtained from the different dilated convolution kernels, we propose a simplified channel attention module to highlight the channels of interest and suppress noise information. To avoid introducing excessive parameters, the two fully-connected layers common in general channel attention modules are not used here; only one fully-connected layer learns the feature transformation. Attention weights are then obtained with a Sigmoid activation layer and multiplied with the original feature maps. Finally, the feature map before the DARF module and the feature map after the dilated convolution are combined by accumulation.
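The simplified channel attention just described (global average pooling, a single fully-connected layer instead of the usual two, a Sigmoid, and a multiply with the input) can be sketched as follows; this is a minimal pure-Python illustration, and the weight matrix and bias are hypothetical stand-ins for learned parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(features, weights, bias):
    # features: list of C channel maps, each a list of rows of floats.
    # weights (C x C) and bias (length C) play the role of the single
    # fully-connected layer's learned parameters.
    C = len(features)
    # Global average pooling: one scalar per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    # One fully-connected layer followed by a Sigmoid activation.
    attn = [sigmoid(sum(weights[c][k] * pooled[k] for k in range(C)) + bias[c])
            for c in range(C)]
    # Multiply each channel of the original feature map by its weight.
    out = [[[a * v for v in row] for row in ch]
           for ch, a in zip(features, attn)]
    return out, attn

# Two 2x2 channels; identity weights and zero bias for illustration.
feats = [[[1.0, 1.0], [1.0, 1.0]],
         [[4.0, 4.0], [4.0, 4.0]]]
out, attn = channel_attention(feats, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With a single learned transform the parameter count is C x C + C rather than the 2 x (C x C/r) of the usual squeeze-and-excitation design, which is the parameter saving motivating this simplification.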
In summary, there is provided a real-time detector for parking lot detection, comprising the steps of:
step S1: providing a parking lot image to be detected, and extracting features of the parking lot image by using a backbone network to obtain corresponding image features extracted by the backbone network.
Step S2: building a lightweight network, wherein the network comprises four convolutional layers; and inputting the parking lot image to be detected into the lightweight network to obtain the initial features extracted by the corresponding lightweight network.
Step S3: building a multi-channel fusion self-attention network, wherein the network comprises a connecting layer, a fusion layer and a convolution layer; and inputting the image preliminary features respectively obtained in the S1 and the S2 into a multi-channel fusion self-attention network to obtain multi-channel fusion features.
Step S4: building a DARF network module, wherein the network comprises a connecting layer, a fusion layer, a convolution layer and an average pooling layer; and inputting the mapping features obtained in the S3 into the DARF network to obtain multi-channel fusion features.
Step S5: building a network module of a fusion layer, wherein the network comprises a connecting layer, the fusion layer and a convolution layer; and inputting the mapping feature obtained in the step S4 into a feature fusion network to obtain a multi-channel fusion feature.
Step S6: building a prediction network, wherein the prediction network comprises a classification network, an IOU network and a regression network; and inputting the multi-channel fusion characteristics obtained in the step S5 into the prediction network to obtain a prediction result.
Optionally, in an embodiment of the present invention, the input of step S1 is a parking lot image of H × W (H, W respectively indicate the length and width of the parking lot image), and the multi-layer residual mapping feature of the image is extracted through an SSD network and indicated by R-Conv-1-4.
Optionally, in an embodiment of the present invention, the lightweight network of step S2 does not undergo a pre-training process. The network is divided into four layers, each outputting features of a different level, denoted Q-Conv-1~4.
Optionally, in an embodiment of the present invention, the multi-channel fusion self-attention network of step S3 has two inputs, namely the output features of steps S1 and S2. Product and convolution operations are performed on the two input features. A total of four pairs of inputs yields four outputs, denoted C-Conv-1~4.
Optionally, in an embodiment of the present invention, the input of step S4 is the mapping features C-Conv-1~4 obtained in step S3. This network module is called DARF; the DARF network includes convolution layers with three kernels of different sizes. It can be expressed as:
C_i = F_{3,i}(A), i = 1
C_i = F_{3,i}(Concat(A, C_{i-1})), i = 2, 3
where C_i denotes the output features at different levels, and F_{3,i} denotes dilated (hole) convolutions of different sizes applied to the features C-Conv-1~4 obtained in step S3. Finally, the outputs are obtained and denoted B-Conv-1~4.
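The recurrence above can be illustrated with a 1-D toy implementation. This is a sketch only: the three-tap kernels, dilation rates, and single-channel input are hypothetical stand-ins for the actual F_{3,i} layers, and channel concatenation is modeled as list concatenation:

```python
def dilated_conv1d(channels, kernel, dilation):
    # Sum the input channels, then apply a short kernel with the given
    # dilation and zero padding ("same" output length). A minimal
    # stand-in for the dilated-convolution layers F_{3,i}.
    x = [sum(vals) for vals in zip(*channels)]
    n, k = len(x), len(kernel)
    out = []
    for i in range(n):
        s = 0.0
        for t in range(k):
            j = i + (t - k // 2) * dilation
            if 0 <= j < n:
                s += kernel[t] * x[j]
        out.append(s)
    return out

def darf_dense(A, kernels_dilations):
    # C_1 = F(A); C_i = F(Concat(A, C_{i-1})) for i >= 2,
    # mirroring the dense connection in the recurrence above.
    outputs = []
    prev = None
    for kernel, d in kernels_dilations:
        inp = A if prev is None else A + [prev]   # channel concat
        prev = dilated_conv1d(inp, kernel, d)
        outputs.append(prev)
    return outputs

A = [[0, 0, 1, 0, 0, 0, 0]]          # one channel with a single spike
C = darf_dense(A, [([1, 1, 1], 1), ([1, 1, 1], 2), ([1, 1, 1], 3)])
```

Tracing the spike through the chain shows the effective receptive field widening at each level: the nonzero support of C_2 is strictly larger than that of C_1, which is the point of concatenating earlier outputs before each successive dilated convolution.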
Optionally, in an embodiment of the invention, after B-Conv-1~4 are obtained, an attention mechanism module follows, in which only one fully-connected layer is used to learn the feature transformation. Attention weights are then obtained with a Sigmoid activation layer and multiplied with the original feature map. Finally, the feature map before the DARF module and the feature map after the dilated convolution are combined by accumulation.
Optionally, in an embodiment of the present invention, the IOU prediction branch in step S6 consists of a convolution layer with a 3x3 kernel and a Sigmoid activation layer. The activation layer ensures that the predicted IOU value lies in the range [0, 1].
Let f_i be the confidence predicted by the classification branch, IOU_i the IOU value predicted by the IOU branch, a the parameter used to adjust the product ratio, and score_i the final predicted score. The confidence of each prediction box can then be expressed as:
score_i = f_i^a x IOU_i^(1-a)
the confidence level of the original prediction is replaced by the one with the largest score in the iteration of the NMS algorithm. May particularly be expressed as
Figure BDA0003273709220000061
In the formula bmIs a detection box with higher confidence selected in the iterative process of NMS algorithm, scorejIs the corresponding confidence, scorej' is the adjusted confidence and Ω is the threshold.
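A minimal sketch of how the fused confidence and the adjusted suppression might work together (pure Python; the exponent a = 0.5, the threshold Ω = 0.5, and the linear decay used for the adjustment are illustrative assumptions, since the patent's equation images are not reproduced):

```python
def box_iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fused_score(f, iou_pred, a=0.5):
    # score_i = f_i**a * IOU_i**(1 - a): a weighted product of the
    # classification confidence and the predicted IOU.
    return (f ** a) * (iou_pred ** (1.0 - a))

def nms_adjusted(boxes, cls_conf, iou_pred, a=0.5, omega=0.5):
    # NMS that decays overlapping scores (an assumed linear decay)
    # instead of deleting boxes outright, so dense true positives
    # survive with reduced confidence.
    scores = [fused_score(f, p, a) for f, p in zip(cls_conf, iou_pred)]
    keep, alive = [], list(range(len(boxes)))
    while alive:
        m = max(alive, key=lambda i: scores[i])
        keep.append(m)
        alive.remove(m)
        for j in alive:
            ov = box_iou(boxes[m], boxes[j])
            if ov >= omega:
                scores[j] *= (1.0 - ov)   # decay, do not delete
    return keep, scores

# A well-classified but poorly localized box (cls 0.81, predicted IOU
# 0.25) ranks below a balanced one (0.64, 0.64).
print(fused_score(0.81, 0.25), fused_score(0.64, 0.64))
```

The fused score thus demotes boxes whose localization quality is predicted to be poor even when their class confidence is high, which is the stated purpose of the IOU branch.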
The invention provides a novel vehicle detector for parking lot images. The proposed DARFNet model improves the accuracy of the detection results and markedly reduces the number of missed targets by using a specific feature fusion module and an IOU prediction branch module. The invention also achieves better performance than current state-of-the-art real-time target detection models. Visualization results show that the algorithm adapts to different scenes, both dense and sparse, and to different illumination conditions. The model can be widely applied to remote sensing detection tasks such as traffic monitoring and military target identification. In summary, our method makes the following contributions:
1) The invention further studies the feature fusion problem in target detection models and proposes the DARF module to fuse feature layers obtained by different convolutions. The module consists of a receptive field block with several densely connected layers and a lightweight channel attention module, so a larger receptive field can be obtained in the final feature map.
2) The invention provides a new IOU-value prediction branch: the predicted overlap ratio is obtained in the prediction stage when detection results are generated, and the product of the predicted overlap ratio and the classification branch's predicted value serves as the confidence of the prediction box. This branch avoids removing too many prediction boxes.
3) Comparison experiments between similar models and the model provided by the invention show that adopting the specific feature fusion module and the IOU branch module markedly improves detection performance. Test results on the CARPK and PUCPR datasets show that the proposed model performs best while retaining real-time processing speed.
Drawings
FIG. 1 shows a schematic flow diagram of the parking lot image real-time detector model of the present invention;
FIG. 2 shows a simplified flow diagram of the DARF module of the present invention, which includes a feature fusion block and a channel attention block;
FIG. 3 shows a visual comparison of the present invention with different real-time object detection methods on a parking lot dataset.
Detailed description of the invention
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to elements that are the same or similar in function throughout. The embodiments described below with reference to the drawings are exemplary, intended to explain the invention, and are not to be construed as limiting it. Referring to fig. 1, the method of an embodiment of the present invention operates as follows: S1, providing a parking lot image to be detected and performing feature extraction on it to obtain corresponding preliminary image features; S2, building a lightweight network and inputting the parking lot image to be detected into it to obtain further preliminary features; S3, building a multi-channel hybrid fusion self-attention network and inputting the preliminary features into it to obtain multi-channel fusion features; S4, building a prediction network comprising a classification network, an IOU network, and a regression network, and inputting the multi-channel fusion features into the prediction network to obtain the prediction result.
In step S1, a parking lot image of H x W (H and W respectively denote the length and width of the image) is input, and the multi-layer residual mapping features of the image are extracted through an SSD network and denoted R-Conv-1~4.
For step S2, the lightweight network is divided into four layers, each outputting features of a different level, denoted Q-Conv-1~4. For step S3, the fusion network has two inputs, namely the output features of steps S1 and S2.
For step S4, the IOU prediction branch consists of a convolution layer with a 3x3 kernel and a Sigmoid activation layer. The activation layer ensures that the predicted IOU value lies in the range [0, 1].
Let f_i be the confidence predicted by the classification branch, IOU_i the IOU value predicted by the IOU branch, a the parameter used to adjust the product ratio, and score_i the final predicted score. The confidence of each prediction box can then be expressed as:
score_i = f_i^a x IOU_i^(1-a)
the confidence level of the original prediction is replaced by the one with the largest score in the iteration of the NMS algorithm. May particularly be expressed as
Figure BDA0003273709220000082
In the formula bmIs a detection box with higher confidence selected in the iterative process of NMS algorithm, scorejIs the corresponding confidence, scorej' is the adjusted confidence and Ω is the threshold.

Claims (7)

1. A method for detecting vehicles in a parking lot based on a DARFNet model is characterized by comprising the following steps:
step S1: providing a parking lot image to be detected, and extracting features of the parking lot image by using a backbone network to obtain corresponding image features extracted by the backbone network.
Step S2: building a lightweight network, wherein the network comprises four convolutional layers; and inputting the parking lot image to be detected into the lightweight network to obtain the initial features extracted by the corresponding lightweight network.
Step S3: building a multi-channel fusion self-attention network, wherein the network comprises a connecting layer, a fusion layer and a convolution layer; and inputting the image preliminary features respectively obtained in the S1 and the S2 into a multi-channel fusion self-attention network to obtain multi-channel fusion features.
Step S4: building a DARF network module, wherein the network comprises a connecting layer, a fusion layer, a convolution layer and an average pooling layer; and inputting the mapping features obtained in the S3 into the DARF network to obtain multi-channel fusion features.
Step S5: building a network module of a fusion layer, wherein the network comprises a connecting layer, the fusion layer and a convolution layer; and inputting the mapping feature obtained in the step S4 into a feature fusion network to obtain a multi-channel fusion feature.
Step S6: building a prediction network, wherein the prediction network comprises a classification network, an IOU network and a regression network; and inputting the multi-channel fusion characteristics obtained in the step S5 into the prediction network to obtain a prediction result.
2. The method according to claim 1, wherein the input of step S1 is H × W (H, W respectively represents the length and width of the parking lot image) parking lot image, and the multi-layer residual mapping feature of the image is extracted through SSD network and represented by R-Conv-1 ~ 4.
3. The method according to claim 1, wherein the lightweight network of step S2 is not pre-trained. The network is divided into four layers, each outputting features of a different level, denoted Q-Conv-1~4.
4. The method according to claim 1, wherein the multi-channel fusion self-attention network of step S3 has two inputs, namely the output features of steps S1 and S2. Product and convolution operations are performed on the two input features. A total of four pairs of inputs yields four outputs, denoted C-Conv-1~4.
5. The method according to claim 1, wherein the input of step S4 is the mapping features C-Conv-1~4 obtained in step S3. This network module is called DARF; the DARF network includes convolution layers with three kernels of different sizes. It can be expressed as:
C_i = F_{3,i}(A), i = 1
C_i = F_{3,i}(Concat(A, C_{i-1})), i = 2, 3
wherein C_i denotes the output features at different levels, and F_{3,i} denotes dilated (hole) convolutions of different sizes applied to the features C-Conv-1~4 obtained in step S3. Finally, the outputs are obtained and denoted B-Conv-1~4.
6. The method of claim 5, wherein after B-Conv-1~4 are obtained, an attention mechanism module follows, in which only one fully-connected layer is used to learn the feature transformation. Attention weights are then obtained with a Sigmoid activation layer and multiplied with the original feature map. Finally, the feature map before the DARF module and the feature map after the dilated convolution are combined by accumulation.
7. The method of claim 1, wherein the IOU prediction branch of step S6 consists of a convolution layer with a 3x3 kernel and a Sigmoid activation layer. The activation layer ensures that the predicted IOU value lies in the range [0, 1].
Let f_i be the confidence predicted by the classification branch, IOU_i the IOU value predicted by the IOU branch, a the parameter used to adjust the product ratio, and score_i the final predicted score. The confidence of each prediction box can then be expressed as:
score_i = f_i^a x IOU_i^(1-a)
the confidence level of the original prediction is replaced by the one with the largest score in the iteration of the NMS algorithm. May particularly be expressed as
Figure FDA0003273709210000032
In the formula bmIs a detection box with higher confidence selected in the iterative process of NMS algorithm, scorejIs the corresponding confidence, score'jIs the adjusted confidence level and Ω is the threshold.
CN202111118334.5A 2021-09-22 2021-09-22 Parking lot vehicle detection method based on DARFNet model Pending CN113901897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111118334.5A CN113901897A (en) 2021-09-22 2021-09-22 Parking lot vehicle detection method based on DARFNet model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111118334.5A CN113901897A (en) 2021-09-22 2021-09-22 Parking lot vehicle detection method based on DARFNet model

Publications (1)

Publication Number Publication Date
CN113901897A true CN113901897A (en) 2022-01-07

Family

ID=79029216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111118334.5A Pending CN113901897A (en) 2021-09-22 2021-09-22 Parking lot vehicle detection method based on DARFNet model

Country Status (1)

Country Link
CN (1) CN113901897A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495060A (en) * 2022-01-25 2022-05-13 青岛海信网络科技股份有限公司 Road traffic marking identification method and device
CN114495060B (en) * 2022-01-25 2024-03-26 青岛海信网络科技股份有限公司 Road traffic marking recognition method and device
CN115410189A (en) * 2022-10-31 2022-11-29 松立控股集团股份有限公司 Complex scene license plate detection method
CN115908298A (en) * 2022-11-10 2023-04-04 苏州慧维智能医疗科技有限公司 Method for predicting polyp target in endoscopic image, model and storage medium
CN115908298B (en) * 2022-11-10 2023-10-10 苏州慧维智能医疗科技有限公司 Target prediction method, model and storage medium for polyp in endoscopic image

Similar Documents

Publication Publication Date Title
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN113901897A (en) Parking lot vehicle detection method based on DARFNet model
CN108875608B (en) Motor vehicle traffic signal identification method based on deep learning
JP2022515895A (en) Object recognition method and equipment
CN111222396A (en) All-weather multispectral pedestrian detection method
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN110348383B (en) Road center line and double line extraction method based on convolutional neural network regression
CN107944354B (en) Vehicle detection method based on deep learning
KR101908481B1 (en) Device and method for pedestraian detection
CN112966747A (en) Improved vehicle detection method based on anchor-frame-free detection network
CN111461221A (en) Multi-source sensor fusion target detection method and system for automatic driving
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN112115871B (en) High-low frequency interweaving edge characteristic enhancement method suitable for pedestrian target detection
CN113962281A (en) Unmanned aerial vehicle target tracking method based on Siamese-RFB
CN116597411A (en) Method and system for identifying traffic sign by unmanned vehicle in extreme weather
CN116630932A (en) Road shielding target detection method based on improved YOLOV5
CN115115973A (en) Weak and small target detection method based on multiple receptive fields and depth characteristics
CN114220087A (en) License plate detection method, license plate detector and related equipment
CN114049532A (en) Risk road scene identification method based on multi-stage attention deep learning
CN111160282B (en) Traffic light detection method based on binary Yolov3 network
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN113077496A (en) Real-time vehicle detection and tracking method and system based on lightweight YOLOv3 and medium
CN111062384A (en) Vehicle window accurate positioning method based on deep learning
CN111476075A (en) Object detection method and device based on CNN (convolutional neural network) by utilizing 1x1 convolution
CN112446292B (en) 2D image salient object detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination