CN114648736B - Robust engineering vehicle identification method and system based on target detection - Google Patents

Robust engineering vehicle identification method and system based on target detection Download PDF

Info

Publication number
CN114648736B
CN114648736B (application CN202210538060.3A)
Authority
CN
China
Prior art keywords
attention
feature map
correction
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210538060.3A
Other languages
Chinese (zh)
Other versions
CN114648736A (en)
Inventor
Wang Zhongyuan (王中元)
Li Yunhao (李云浩)
Chen Shijie (陈世杰)
Shao Zhenfeng (邵振峰)
He Zheng (何政)
Deng Lianbing (邓练兵)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202210538060.3A priority Critical patent/CN114648736B/en
Publication of CN114648736A publication Critical patent/CN114648736A/en
Application granted granted Critical
Publication of CN114648736B publication Critical patent/CN114648736B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/25 Fusion techniques
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a robust engineering vehicle identification method and system based on target detection. A multi-scale feature extraction network extracts a feature map from the video frame image to be identified, and a feature enhancement network based on an attention mechanism performs feature enhancement to obtain an enhanced feature map. The enhanced feature map is then input into a position detection network to predict target positions, low-quality position predictions are filtered out through post-processing, regions of interest are extracted from the feature map according to the predicted target positions, and a cascaded position correction and category prediction network followed by post-processing yields the final target positions and types. The method and system can accurately detect the position and type of engineering vehicles in power grid monitoring videos with complex scene changes, thereby providing an automated means of monitoring external-damage behavior, reducing labor cost and ensuring the safety of the power grid system.

Description

Robust engineering vehicle identification method and system based on target detection
Technical Field
The invention belongs to the technical field of computer vision, relates to a computer aided engineering vehicle detection method and system, and particularly relates to a robust engineering vehicle identification method and system based on target detection.
Technical background
The transmission of electric power depends on a safe and robust power grid, yet outdoor transmission lines face many potential safety hazards. External damage caused by construction vehicles is the most common human factor, for example an excavator damaging a high-voltage pole during illegal construction, or a crane touching a high-voltage wire through improper operation. To avoid such accidents, the relevant departments place many monitoring cameras along power grid transmission lines, but relying on manual monitoring alone is costly and prone to oversight. If target detection technology can be applied to video monitoring so that engineering vehicles in the monitored scene are automatically analyzed and identified by an algorithm, labor cost is reduced, false alarms and missed alarms caused by human negligence are avoided, and a means is provided to give warnings before an accident occurs, assist in handling it when it occurs, and conveniently obtain evidence afterwards, thereby ensuring stable operation of the power grid.
Object detection is a popular direction in computer vision and digital image processing. In recent years, anchor-free ideas that detect targets from key points or center points have emerged in this field. CornerNet first treated target detection as a key-point detection and grouping task: it locates a pair of key points, the upper-left and lower-right corners of the rectangular region where a target lies, via heat maps, and then pairs the key points to obtain the target bounding box. ExtremeNet detects five key points per target, the topmost, bottommost, leftmost and rightmost extreme points plus the center point, and groups the four extreme points through the center point to obtain the position prediction. CenterNet simplifies the target detection task further: it predicts the geometric center point of the target bounding box with key-point detection and then regresses the box size, thereby localizing the target. CenterNet2 integrates CenterNet into a two-stage framework trained with a probabilistic objective, combining the advantages of anchor-free single-stage algorithms and two-stage algorithms, accelerating the inference of the two-stage algorithm and greatly improving detection precision.
In power grid transmission line monitoring videos captured in real scenes, the scene or background of an image is very complex and involves varied weather, illumination and other conditions; limited by the diverse software and hardware parameters of different cameras, the captured images differ in viewing angle, sharpness, resolution and other aspects, and the scales of different target instances in an image also vary greatly. In addition, extremely small objects, occluded objects and multiple overlapping or densely distributed objects are very troublesome, and engineering vehicles are highly variable in form. How to overcome these problems, detect and identify engineering vehicles with high precision, and obtain a robust detection and identification system is a difficult problem faced by the prior art.
Disclosure of Invention
In order to solve the technical problems, the invention provides a robust engineering vehicle identification method and system based on target detection by combining a target detection algorithm based on a central domain.
The method adopts the technical scheme that: a robust engineering vehicle identification method based on target detection comprises the following steps:
step 1: extracting a feature map from a video frame image to be identified by adopting a multi-scale feature extraction network, and performing feature enhancement by using a feature enhancement network based on an attention mechanism to obtain an enhanced feature map;
step 2: inputting the enhanced feature map obtained in the step 1 into a position detection network so as to predict the target position of the engineering vehicle, filtering low-quality position prediction through post-processing, then extracting an interested area in the feature map according to the predicted target position, and then inputting a cascaded position correction and category prediction network and performing post-processing to obtain the final target position and the type of the engineering vehicle.
The technical scheme adopted by the system of the invention is as follows: a robust engineering vehicle identification system based on target detection comprises the following modules:
the module 1 is used for extracting a feature map from a video frame image to be identified by adopting a multi-scale feature extraction network, and performing feature enhancement by using a feature enhancement network based on an attention mechanism to obtain an enhanced feature map;
and the module 2 is used for inputting the enhanced feature map acquired by the module 1 into a position detection network so as to predict the target position of the engineering vehicle, filtering low-quality position prediction through post-processing, then extracting an interested region in the feature map according to the predicted target position, and then inputting the cascaded position correction and category prediction network and performing post-processing to obtain the final target position and type of the engineering vehicle.
Compared with the existing detection method, the method has the following advantages and positive effects:
(1) The feature extraction operation in the invention is divided into two parts, extraction and enhancement; the enhancement operation makes the extracted features more accurate and effective and provides effective reference information for subsequent modules. In addition, using the cascaded position correction and category prediction module to adjust the predicted categories and positions multiple times yields higher detection precision.
(2) The invention applies an anchor-free detection algorithm, so no anchor-related parameters need to be tuned, and the extra computation and inference time caused by densely tiling anchor boxes over the image are avoided.
(3) The method performs better on extremely small targets and on deformed, complex targets.
Drawings
FIG. 1: a method flowchart of an embodiment of the invention.
FIG. 2: the multi-scale feature extraction network structure chart provided by the embodiment of the invention.
FIG. 3: the feature enhancement network structure diagram of the embodiment of the invention.
FIG. 4: the position detection network structure diagram of the embodiment of the invention.
FIG. 5: the position correction and category prediction network structure diagram of the embodiment of the invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention by those of ordinary skill in the art, the present invention is further described in detail below with reference to the accompanying drawings and implementation examples. It is to be understood that the implementation examples described herein are only for the purpose of illustration and explanation and are not to be construed as limiting the present invention.
Referring to fig. 1, the robust engineering vehicle identification method based on target detection provided by the invention comprises the following steps:
step 1: extracting a feature map from a video frame image to be identified by adopting a multi-scale feature extraction network, and performing feature enhancement by using a feature enhancement network based on an attention mechanism to obtain an enhanced feature map;
in this embodiment, the specific implementation of step 1 includes the following substeps:
step 1.1: input the video frame image to be identified into the multi-scale feature extraction network, which operates over a set of scales, so as to obtain a group of original feature maps of different scales (the image, scale and feature-map symbols are given as formula images in the original); Ch is the number of channels, and in this example Ch has a value of 256;
step 1.2: input the original feature maps scale by scale into the attention-mechanism-based feature enhancement network, enhancing the response of the regions of interest in the feature maps, so as to obtain a group of enhanced feature maps.
And 2, step: inputting the enhanced feature map obtained in the step 1 into a position detection network so as to predict the target position of the engineering vehicle, filtering low-quality position prediction through post-processing, then extracting an interested region in the feature map according to the predicted target position, and then inputting a cascaded position correction and category prediction network and performing post-processing to obtain the final target position and type of the engineering vehicle.
In this embodiment, the specific implementation of step 2 includes the following substeps:
step 2.1: input the enhanced feature maps one by one into the position detection network, which is improved from CenterNet and has a high recall rate and strong, reliable foreground/background classification capability, thereby obtaining the position prediction maps and key-point thermodynamic diagrams corresponding to the input image. Each position of the position prediction map carries a position prediction value representing the distances from the target center point to the four sides of the circumscribed rectangle, and each position of the key-point thermodynamic diagram indicates the confidence that an engineering vehicle target key point exists at the corresponding position;
step 2.2: extract the K positions with peak responses from the key-point thermodynamic diagram as the predicted central-domain key points of engineering vehicle targets (K is a preset larger integer), and from these key points and the corresponding position prediction values compute K rectangular prediction regions that may contain an engineering vehicle, each described by the coordinates of its upper-left corner point together with its width and height (the symbols are given as formula images in the original). D-IoU NMS is then used to filter out part of the low-quality prediction results, leaving no more than M rectangular prediction regions, where M is a preset value; in this embodiment, the value of M during training and inference is 4000 and 1000, respectively;
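The decoding and filtering just described in step 2.2 can be sketched as follows in PyTorch-style code. This is only a sketch: the function names, the tensor layouts, and the left/top/right/bottom ordering of the four side distances are assumptions made for illustration and are not taken from the patent.

```python
import torch

def dists_to_boxes(centers, dists):
    """Convert predicted central-domain key points and side distances into boxes.
    centers: (K, 2) key-point coordinates (cx, cy) taken from heat-map peaks.
    dists:   (K, 4) predicted distances (left, top, right, bottom) to the box sides.
    Returns boxes of shape (K, 4) in (x1, y1, x2, y2) form."""
    cx, cy = centers.unbind(dim=1)
    l, t, r, b = dists.unbind(dim=1)
    return torch.stack([cx - l, cy - t, cx + r, cy + b], dim=1)

def pairwise_diou(a, b, eps=1e-9):
    """Distance-IoU between every box in a (N, 4) and every box in b (M, 4)."""
    lt = torch.max(a[:, None, :2], b[None, :, :2])
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    iou = inter / (area_a[:, None] + area_b[None, :] - inter + eps)
    ctr_a = (a[:, :2] + a[:, 2:]) / 2
    ctr_b = (b[:, :2] + b[:, 2:]) / 2
    ctr_dist = ((ctr_a[:, None] - ctr_b[None, :]) ** 2).sum(-1)
    enc_lt = torch.min(a[:, None, :2], b[None, :, :2])
    enc_rb = torch.max(a[:, None, 2:], b[None, :, 2:])
    diag = ((enc_rb - enc_lt) ** 2).sum(-1) + eps
    return iou - ctr_dist / diag   # IoU minus the normalized center-distance penalty

def diou_nms(boxes, scores, diou_thr=0.6, max_keep=1000):
    """Greedy NMS that suppresses boxes by Distance-IoU instead of plain IoU."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0 and len(keep) < max_keep:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        overlap = pairwise_diou(boxes[i].unsqueeze(0), boxes[rest]).squeeze(0)
        order = rest[overlap <= diou_thr]
    return torch.tensor(keep, dtype=torch.long)
```

The `max_keep` argument plays the role of the preset value M (4000 during training and 1000 during inference in this embodiment).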
step 2.3: cutting and pooling the enhanced feature map obtained in the step 1 according to the rectangular prediction area calculated in the step 2.2 to obtain a feature map only containing a rectangular region of interest with uniform size; in this embodiment, the size of the cut enhanced feature map is 14 × 14;
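The crop-and-pool operation of step 2.3 can be approximated with torchvision's roi_align, assuming the enhanced feature map is laid out as a standard (N, C, H, W) tensor. The 14 x 14 output size follows this embodiment, while the feature-map stride used for spatial_scale is an illustrative assumption, not a value from the patent.

```python
import torch
from torchvision.ops import roi_align

def crop_and_pool(enhanced_feat, boxes, output_size=14, stride=8):
    """enhanced_feat: (1, C, H, W) enhanced feature map of one image.
    boxes: (M, 4) rectangular prediction regions in image coordinates (x1, y1, x2, y2).
    Returns (M, C, 14, 14) region-of-interest features of uniform size."""
    batch_idx = torch.zeros(len(boxes), 1, device=boxes.device, dtype=boxes.dtype)
    rois = torch.cat([batch_idx, boxes], dim=1)   # (M, 5): [batch_index, x1, y1, x2, y2]
    return roi_align(enhanced_feat, rois, output_size=output_size,
                     spatial_scale=1.0 / stride, aligned=True)
```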
step 2.4: input the feature maps obtained in step 2.3 into the cascaded position correction and class prediction network to obtain a localization-accuracy-aware classification confidence and a position correction value (the symbols are given as formula images in the original). Here N_cls represents the number of engineering vehicle categories, and the correction value corrects the coordinates of the upper-left corner point and the width and height of a rectangular region. In this embodiment, the localization-accuracy-aware classification confidence is the average value obtained over three corrections.
In step 2.4, the position correction value predicted during the first correction of the cascaded position correction and class prediction network is applied to the rectangular prediction regions obtained in step 2.2; the second correction applies the second predicted position correction value to the rectangular prediction regions after the first correction, and the third correction applies the third predicted position correction value to the rectangular prediction regions after the second correction. D-IoU NMS is then used on the result of the last correction to filter out redundant rectangular prediction regions, and regions below a confidence threshold are suppressed, giving the final detection results of all engineering vehicles in the input image, namely the types of the engineering vehicles and the positions of the rectangular regions where they are located. In this embodiment, the maximum number of rectangular prediction regions remaining after D-IoU NMS filtering is set to 2000 and 256 in the training and inference stages, respectively; the confidence threshold is 0.6.
Referring to fig. 2, the multi-scale feature extraction network adopted in the present embodiment is composed of a deep convolutional neural network and a multi-scale feature fusion layer; the last 6 layers of output of the deep convolutional neural network are convolved to generate feature maps C2, C3, C4, C5, C6 and C7 with the same channel number; c7 and C6 output P7 and P6 after passing through the multi-scale feature fusion layer; the P5 is obtained by splicing and convolution-fusing the result of the up-sampling of the P6 and the output of the attention gate module, wherein the attention gate module uses C5 and P6 as input to generate a feature map after the attention is exerted; the P4 is obtained by splicing and convolution-fusing the result of the up-sampling of the P5 and the output of the attention gate module, wherein the attention gate module uses C4 and P5 as input to generate a feature map after the attention is exerted; the P3 is obtained by splicing and convolution-fusing the result of the up-sampling of the P4 and the output of the attention gate module, wherein the attention gate module uses C3 and P4 as input to generate a feature map after the attention is exerted; the P2 is obtained by splicing the result of the upsampling of the P3 and the output of the attention gate module and carrying out convolution fusion, wherein the attention gate module uses C2 and P3 as inputs to generate a feature map after the attention is exerted.
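One top-down fusion step of the kind described above (upsample the higher-level map, gate the lateral map with an attention gate, concatenate and fuse by convolution) might look roughly like the following sketch. The internals of AttentionGate are assumed from the common attention-gate formulation in the literature, since the patent does not spell them out, and channel counts and kernel sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionGate(nn.Module):
    """Gates a lateral feature map C_i with an attention mask derived from C_i and
    the higher-level map P_{i+1} (a common attention-gate formulation; assumed here)."""
    def __init__(self, channels=256):
        super().__init__()
        self.theta = nn.Conv2d(channels, channels, 1)
        self.phi = nn.Conv2d(channels, channels, 1)
        self.psi = nn.Conv2d(channels, 1, 1)

    def forward(self, c_i, p_up):
        # p_up is P_{i+1} resized to the spatial size of C_i
        attn = torch.sigmoid(self.psi(F.relu(self.theta(c_i) + self.phi(p_up))))
        return c_i * attn            # feature map after attention is applied

class TopDownFusion(nn.Module):
    """One P_i = conv(concat(upsample(P_{i+1}), AttentionGate(C_i, P_{i+1}))) step."""
    def __init__(self, channels=256):
        super().__init__()
        self.gate = AttentionGate(channels)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, c_i, p_next):
        p_up = F.interpolate(p_next, size=c_i.shape[-2:], mode="nearest")
        gated = self.gate(c_i, p_up)
        return self.fuse(torch.cat([p_up, gated], dim=1))
```

Under this sketch, P5 would be produced as TopDownFusion()(C5, P6), P4 as TopDownFusion()(C4, P5), and so on down to P2; the bottom-up path of the feature enhancement network mirrors this with down-sampling instead of up-sampling.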
Referring to fig. 3, the feature enhancement network adopted in the present embodiment includes an attention gate module, a cross-layer fusion scale perception attention module, a spatial perception self-attention module, and a task perception channel attention module;
in the feature enhancement network of the embodiment, P2, P3, P4, P5, P6 and P7 output by the multi-scale feature extraction network are used as input to obtain intermediate results a2, A3, a4, a5, A6 and a 7; wherein, A2 and A7 are directly obtained from P2 and P7; a3 is obtained by splicing the down-sampling result of A2 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P3 and a lower layer feature map A2 as input to generate a feature map after applying attention; a4 is obtained by splicing the down-sampling result of A3 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P4 and a lower layer feature map A3 as input to generate a feature map after attention is exerted; a5 is obtained by splicing the down-sampling result of A4 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P5 and a lower layer feature map A4 as input to generate a feature map after attention is exerted; a6 is obtained by splicing the down-sampling result of A5 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P6 and a lower layer feature map A5 as input to generate a feature map after attention is exerted; intermediate results A2, A3, A4, A5, A6 and A7 pass through a cross-layer fusion scale perception attention module, a spatial perception self-attention module and a task perception channel attention module which are connected in series to obtain final output feature maps F2, F3, F4, F5, F6 and F7.
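The serial cross-layer-fusion scale-aware, spatial-aware and task-aware attention chain applied to A2 through A7 is reminiscent of dynamic-head style attention. The sketch below is a heavily reduced stand-in intended only to show how the three attentions are applied in series per level: a real spatial-aware module would typically use deformable convolution, and a real task-aware module a dynamic activation, neither of which is reproduced here.

```python
import torch
import torch.nn as nn

class SimpleEnhanceChain(nn.Module):
    """Much-simplified stand-in for the serial scale-aware / spatial-aware /
    task-aware attention chain described in the patent."""
    def __init__(self, channels=256):
        super().__init__()
        self.scale_fc = nn.Linear(channels, 1)                        # scale-aware weight per level
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)    # spatial-aware (simplified)
        self.task = nn.Sequential(                                    # task-aware channel attention
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, feats):               # feats: list of intermediate maps A2 .. A7
        out = []
        for f in feats:
            w = torch.sigmoid(self.scale_fc(f.mean(dim=(2, 3)))).view(-1, 1, 1, 1)
            f = f * w                       # scale-aware re-weighting from a global descriptor
            s = self.spatial(f)             # spatial-aware transformation
            out.append(s * self.task(s) + f)  # task-aware channel attention plus a skip connection
        return out                          # final output feature maps F2 .. F7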
Referring to fig. 4, in the position detection network adopted in this embodiment, the input data passes sequentially through a deformable convolution block, a dynamic rectified linear unit, a convolution block, a deformable convolution block and a dynamic rectified linear unit; the output then takes two paths, one through a convolution block and a Scale layer to obtain the position prediction map, and the other through a convolution block to obtain the key-point thermodynamic diagram.
The position detection network of the present embodiment generates a position prediction map and a key point thermodynamic map by using an enhanced feature map output by a feature enhancement network as an input. The position prediction graph and the key point thermodynamic diagrams have the same size as the input enhanced feature graph, the former gives a predicted value of the position of the engineering vehicle target corresponding to each key point position in the input feature graph, and the latter predicts the confidence coefficient of the key point of the engineering vehicle target at each position in the input feature graph. The positions of the key points of the engineering vehicle can be obtained through the peak value response points on the key point thermodynamic diagram, and the corresponding position prediction values given by the position prediction diagram are combined, so that the positions of the rectangular prediction areas suspected of containing the engineering vehicle targets are obtained.
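A rough sketch of such a head is given below. Plain convolutions and ReLU stand in for the deformable convolution blocks and dynamic ReLU units of the patent (which would require deformable convolution with learned offsets), and the exponential decoding of side distances through a learnable Scale parameter follows common FCOS-style practice as an assumption rather than the patent text.

```python
import torch
import torch.nn as nn

class PositionDetectionHead(nn.Module):
    """Sketch of the head: a shared trunk, then two branches producing a 4-channel
    position prediction map (distances to the four box sides) and a 1-channel
    key-point heat map."""
    def __init__(self, channels=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.loc_conv = nn.Conv2d(channels, 4, 3, padding=1)   # position prediction map
        self.scale = nn.Parameter(torch.ones(1))                # per-level Scale layer
        self.hm_conv = nn.Conv2d(channels, 1, 3, padding=1)     # key-point heat map

    def forward(self, x):
        x = self.trunk(x)
        loc = torch.exp(self.scale * self.loc_conv(x))           # positive side distances
        hm = torch.sigmoid(self.hm_conv(x))                      # key-point confidence
        return loc, hm
```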
Referring to fig. 5, the location correction and category prediction network adopted in this embodiment includes a first correction module, a second correction module, and a third correction module, which are connected in sequence, where the first correction module, the second correction module, and the third correction module are all composed of a clipping and pooling layer, a location sensing category prediction branch, and a location correction branch;
the position correction and category prediction network of the embodiment inputs a rectangular prediction area output by the position prediction network and an enhanced feature map output by a feature enhancement network; when first correction is carried out in a first correction module, cutting out sub-regions of an enhanced feature map through a rectangular prediction region, pooling the sub-regions into feature maps with uniform sizes, inputting a positioning perception type prediction branch and a position correction branch to obtain a positioning perception classification confidence coefficient and a position correction value, and applying the position correction value to the rectangular prediction region to obtain a new corrected rectangular prediction region; the second correction is carried out in the second correction module, the third correction is carried out in the third correction module, and the first correction is carried out in the first correction module; the final confidence of the location-aware classification is an average of the confidences obtained by the three corrections, and the final detection result of the location correction and class prediction network of the embodiment is the result after the last correction.
The multi-scale feature extraction network of the embodiment is a well-trained multi-scale feature extraction network; the training process comprises the following steps:
(1) constructing a self-supervision training positive sample pair;
aiming at unlabeled self-supervised training data, in each round of iterative training N pictures are randomly drawn without replacement as a group, and each image in the group is subjected twice to random data augmentation operations, including cropping, color change, occlusion and rotation, so as to form a pair of positive sample images, thereby obtaining a set of positive sample image pairs (the symbols are given as formula images in the original); that is, each image is randomly augmented twice to obtain an image pair;
(2) the obtained positive sample image pairs are input into the multi-scale feature extraction network to obtain a group of positive sample feature map pairs, and the feature map pairs are then input one by one into a characterization mapping layer composed of a fully connected layer and a rectified linear unit to obtain a group of positive sample characterization pairs; here S represents the number of feature maps of different scales in the multi-scale feature extraction network;
(3) self-supervision training multi-scale feature extraction network;
for each scale, the pairwise similarities of all positive sample characterization pairs are calculated and then averaged; the N characterization pairs yield 2N characterization vectors.
The loss over pairwise similarities of the positive sample characterization pairs is calculated for each scale separately (the formula is given as an image in the original): an indicator function takes 0 when k = i and 1 otherwise, the similarity term is the cosine similarity of a characterization pair, and a temperature-like hyper-parameter scales the similarities.
The final contrastive self-supervised pretext-task loss function combines these per-scale losses (also given as a formula image in the original).
Optimizing this contrastive loss function yields the well-trained multi-scale feature extraction network.
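The ingredients described above (an indicator excluding k = i, cosine similarity over 2N characterization vectors, a temperature hyper-parameter) match the standard NT-Xent contrastive loss. A sketch under that assumption is given below, since the exact formula is only reproduced as an image in the original; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_a, z_b, temperature=0.1):
    """z_a, z_b: (N, D) characterizations of the two augmented views of N images.
    Standard NT-Xent form, assumed to correspond to the per-scale loss described above."""
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)       # 2N characterization vectors
    sim = z @ z.t() / temperature                              # pairwise cosine similarities
    n = z_a.shape[0]
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                 # indicator: exclude k == i
    # positives: view i in one branch pairs with view i in the other branch
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

The per-scale losses computed this way would then be combined over the S feature-map scales to form the pretext-task objective.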
In this embodiment, the multi-scale feature extraction network, the feature enhancement network, the position detection network and the cascaded position correction and category prediction network together form an overall network that is trained as a whole; the training process is as follows:
(1) constructing a training data set:
a series of random data augmentation operations is performed on the labeled training data to obtain an augmented training data set;
(2) training an integral network:
the training target of the overall network of this embodiment is given as a formula image in the original; it combines a foreground-class training target and a background training target, where C_k represents an engineering vehicle category and a dedicated symbol represents the background class; one term corresponds to the position detection network producing a foreground rectangular region, and the other to it producing a background rectangular region;
the loss function of the position detection network of this embodiment is given as a formula image in the original, where l_loc is the Distance-IoU loss and l_hm is the binary focal loss; a subscript (also rendered as an image) denotes the ground-truth label value corresponding to a predicted value; box represents the rectangular-region prediction obtained by the position detection network, and hm represents the key-point thermodynamic diagram predicted by the position detection network;
the loss function of the cascaded position correction and class prediction network of this embodiment is given as a formula image in the original, where K represents the number of corrections; l_cls is the softmax cross-entropy loss and l_reg is the smooth-L1 loss; a subscript (also rendered as an image) denotes the ground-truth label value corresponding to a predicted value; cls denotes the joint prediction of localization accuracy and class produced by the localization-aware class prediction branch, and delta denotes the position correction value predicted by the position correction branch.
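The loss terms named here are all standard building blocks. The sketch below shows how they could be assembled, assuming matched prediction/ground-truth box pairs and using torchvision's distance_box_iou_loss (available in recent torchvision versions) for the Distance-IoU term; the relative weighting of the terms is an assumption, not reproduced from the patent.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import distance_box_iou_loss

def position_detection_loss(box_pred, box_gt, hm_logits, hm_gt, alpha=0.25, gamma=2.0):
    """l_loc (Distance-IoU) on matched box pairs plus l_hm (binary focal) on the heat map.
    box_pred, box_gt: (P, 4) matched boxes; hm_logits, hm_gt: (B, 1, H, W)."""
    l_loc = distance_box_iou_loss(box_pred, box_gt, reduction="mean")
    p = torch.sigmoid(hm_logits)
    l_hm = -(alpha * (1 - p) ** gamma * hm_gt * torch.log(p + 1e-9)
             + (1 - alpha) * p ** gamma * (1 - hm_gt) * torch.log(1 - p + 1e-9)).mean()
    return l_loc + l_hm

def correction_stage_loss(cls_logits, cls_targets, delta_pred, delta_gt):
    """Per-stage term of the cascaded loss: softmax cross-entropy on the localization-aware
    class prediction plus smooth-L1 on the position correction, summed over the K stages."""
    return F.cross_entropy(cls_logits, cls_targets) + F.smooth_l1_loss(delta_pred, delta_gt)
```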
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A robust engineering vehicle identification method based on target detection is characterized by comprising the following steps:
step 1: extracting a feature map from a video frame image to be identified by adopting a multi-scale feature extraction network, and performing feature enhancement by using a feature enhancement network based on an attention mechanism to obtain an enhanced feature map;
the multi-scale feature extraction network consists of a deep convolutional neural network and a multi-scale feature fusion layer; the last 6 layers of output of the deep convolutional neural network are convolved to generate feature maps C2, C3, C4, C5, C6 and C7 with the same channel number; c7 and C6 output P7 and P6 after passing through the multi-scale feature fusion layer; the P5 is obtained by splicing and convolution-fusing the result of the up-sampling of the P6 and the output of the attention gate module, wherein the attention gate module uses C5 and P6 as input to generate a feature map after the attention is exerted; the P4 is obtained by splicing and convolution-fusing the result of the up-sampling of the P5 and the output of the attention gate module, wherein the attention gate module uses C4 and P5 as input to generate a feature map after the attention is exerted; the P3 is obtained by splicing and convolution-fusing the result of the up-sampling of the P4 and the output of the attention gate module, wherein the attention gate module uses C3 and P4 as input to generate a feature map after the attention is exerted; the P2 is obtained by splicing the result of the up-sampling of P3 and the output of an attention gate module and then carrying out convolution fusion, wherein the attention gate module uses C2 and P3 as input to generate a characteristic diagram after applying attention;
the feature enhancement network comprises an attention gate module, a cross-layer fusion scale perception attention module, a space perception self-attention module and a task perception channel attention module;
the feature enhancement network takes P2, P3, P4, P5, P6 and P7 output by the multi-scale feature extraction network as input to obtain intermediate results A2, A3, A4, A5, A6 and A7; wherein A2 and A7 are directly derived from P2 and P7; a3 is obtained by splicing the down-sampling result of A2 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P3 and a lower layer feature map A2 as input to generate a feature map after applying attention; a4 is obtained by splicing the down-sampling result of A3 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P4 and a lower layer feature map A3 as input to generate a feature map after attention is exerted; a5 is obtained by splicing the down-sampling result of A4 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P5 and a lower layer feature map A4 as input to generate a feature map after attention is exerted; a6 is obtained by splicing the down-sampling result of A5 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P6 and a lower layer feature map A5 as input to generate a feature map after attention is exerted; intermediate results A2, A3, A4, A5, A6 and A7 pass through a cross-layer fusion scale perception attention module, a spatial perception self-attention module and a task perception channel attention module which are connected in series to obtain final output characteristic diagrams F2, F3, F4, F5, F6 and F7;
step 2: inputting the enhanced feature map obtained in the step 1 into a position detection network so as to predict the target position of the engineering vehicle, filtering low-quality position prediction through post-processing, then extracting an interested area in the feature map according to the predicted target position, and then inputting a cascaded position correction and category prediction network and performing post-processing to obtain the final target position and the type of the engineering vehicle.
2. The robust engineering vehicle identification method based on object detection as claimed in claim 1, wherein: in the position detection network in step 2, the input data passes sequentially through a deformable convolution block, a dynamic rectified linear unit, a convolution block, a deformable convolution block and a dynamic rectified linear unit; the output then takes two paths, one through a convolution block and a Scale layer to obtain the position prediction map, and the other through a convolution block to obtain the key-point thermodynamic diagram.
3. The robust engineering vehicle identification method based on object detection as claimed in claim 1, wherein: the position correction and category prediction network in the step 2 comprises a first correction module, a second correction module and a third correction module which are sequentially connected, wherein the first correction module, the second correction module and the third correction module are respectively composed of a cutting and pooling layer, a positioning perception category prediction branch and a position correction branch;
the position correction and category prediction network takes as input the rectangular prediction regions output by the position detection network and the enhanced feature map output by the feature enhancement network; in the first correction, performed in the first correction module, sub-regions of the enhanced feature map are cut out according to the rectangular prediction regions and pooled into feature maps of uniform size, which are input into the localization-aware class prediction branch and the position correction branch to obtain a localization-aware classification confidence and a position correction value, and the position correction value is applied to the rectangular prediction regions to obtain new rectangular prediction regions after the first correction; the second correction, performed in the second correction module, and the third correction, performed in the third correction module, follow the same principle as the first correction, except that their input is the rectangular prediction regions after the previous correction together with the enhanced feature map output by the feature enhancement network; the final localization-aware classification confidence is the average value of the confidences obtained by the three corrections, and the final detection result of the position correction and class prediction network is the result after the last correction.
4. The robust engineering vehicle identification method based on object detection according to any one of claims 1-3, characterized in that the detailed implementation of step 1 comprises the following sub-steps:
step 1.1: inputting the video frame image to be identified into the multi-scale feature extraction network, which operates over a set of scales, so as to obtain a group of original feature maps of different scales (the symbols are given as formula images in the original), Ch being the number of channels;
step 1.2: inputting the original feature maps scale by scale into the attention-mechanism-based feature enhancement network, enhancing the response of the regions of interest in the feature maps, and obtaining a group of enhanced feature maps.
5. The robust engineering vehicle identification method based on object detection as claimed in claim 4, wherein the step 2 is implemented by the following sub-steps:
step 2.1: inputting the enhanced feature maps one by one into the position detection network to obtain the key-point thermodynamic diagrams and position prediction maps (the symbols are given as formula images in the original);
step 2.2: extracting the K positions with peak responses from the key-point thermodynamic diagram as the predicted central-domain key points of engineering vehicle targets, computing from these key points and the corresponding position prediction values K rectangular prediction regions that may contain an engineering vehicle, each described by the coordinates of its upper-left corner point together with its width and height, and using D-IoU NMS to filter out part of the low-quality prediction results to obtain no more than M rectangular prediction regions, where M is a preset value;
Step 2.3: cutting and pooling the enhanced feature map obtained in the step 1 according to the rectangular prediction area calculated in the step 2.2 to obtain a feature map with uniform size and containing only the features in the rectangular region of interest;
step 2.4: inputting the feature maps obtained in step 2.3 into the cascaded position correction and class prediction network to obtain a localization-accuracy-aware classification confidence and a position correction value (the symbols are given as formula images in the original), N_cls representing the number of engineering vehicle categories and the correction value correcting the coordinates of the upper-left corner point and the width and height of a rectangular region;
the position correction value predicted during the first correction of the cascaded position correction and class prediction network in step 2.4 is applied to the rectangular prediction regions obtained in step 2.2; the second correction applies the second predicted position correction value to the rectangular prediction regions after the first correction, and the third correction applies the third predicted position correction value to the rectangular prediction regions after the second correction; D-IoU NMS is then used on the result of the last correction to filter out redundant rectangular prediction regions, and regions below a confidence threshold are suppressed, giving the final detection results of all engineering vehicles in the input image, namely the types of the engineering vehicles and the positions of the rectangular regions where they are located.
6. A robust engineering vehicle identification system based on target detection is characterized by comprising the following modules:
the module 1 is used for extracting a feature map from a video frame image to be identified by adopting a multi-scale feature extraction network, and performing feature enhancement by using a feature enhancement network based on an attention mechanism to obtain an enhanced feature map;
the multi-scale feature extraction network consists of a deep convolutional neural network and a multi-scale feature fusion layer; the last 6 layers of output of the deep convolutional neural network are convolved to generate feature maps C2, C3, C4, C5, C6 and C7 with the same channel number; c7 and C6 output P7 and P6 after passing through the multi-scale feature fusion layer; the P5 is obtained by splicing and convolution-fusing the result of the up-sampling of the P6 and the output of the attention gate module, wherein the attention gate module uses C5 and P6 as input to generate a feature map after the attention is exerted; the P4 is obtained by splicing and convolution-fusing the result of the up-sampling of the P5 and the output of the attention gate module, wherein the attention gate module uses C4 and P5 as input to generate a feature map after the attention is exerted; the P3 is obtained by splicing and convolution-fusing the result of the up-sampling of the P4 and the output of the attention gate module, wherein the attention gate module uses C3 and P4 as input to generate a feature map after the attention is exerted; the P2 is obtained by splicing and convolution-fusing the result of the up-sampling of the P3 and the output of the attention gate module, wherein the attention gate module uses C2 and P3 as input to generate a feature map after the attention is exerted;
the feature enhancement network comprises an attention gate module, a cross-layer fusion scale perception attention module, a space perception self-attention module and a task perception channel attention module;
the feature enhancement network takes P2, P3, P4, P5, P6 and P7 output by the multi-scale feature extraction network as input to obtain intermediate results A2, A3, A4, A5, A6 and A7; wherein A2 and A7 are directly derived from P2 and P7; a3 is obtained by splicing the down-sampling result of A2 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P3 and a lower layer feature map A2 as input to generate a feature map after attention is exerted; a4 is obtained by splicing the down-sampling result of A3 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P4 and a lower layer feature map A3 as input to generate a feature map after applying attention; a5 is obtained by splicing the down-sampling result of A4 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P5 and a lower layer feature map A4 as input to generate a feature map after attention is exerted; a6 is obtained by splicing the down-sampling result of A5 and the output of an attention gate module and carrying out convolution fusion, wherein the attention gate module uses P6 and a lower layer feature map A5 as input to generate a feature map after attention is exerted; intermediate results A2, A3, A4, A5, A6 and A7 pass through a cross-layer fusion scale perception attention module, a spatial perception self-attention module and a task perception channel attention module which are connected in series to obtain final output feature maps F2, F3, F4, F5, F6 and F7;
and the module 2 is used for inputting the enhanced feature map acquired by the module 1 into a position detection network so as to predict the target position of the engineering vehicle, filtering low-quality position prediction through post-processing, then extracting an interested region in the feature map according to the predicted target position, and then inputting the cascaded position correction and category prediction network and performing post-processing to obtain the final target position and type of the engineering vehicle.
CN202210538060.3A 2022-05-18 2022-05-18 Robust engineering vehicle identification method and system based on target detection Active CN114648736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210538060.3A CN114648736B (en) 2022-05-18 2022-05-18 Robust engineering vehicle identification method and system based on target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210538060.3A CN114648736B (en) 2022-05-18 2022-05-18 Robust engineering vehicle identification method and system based on target detection

Publications (2)

Publication Number Publication Date
CN114648736A (en) 2022-06-21
CN114648736B (en) 2022-08-16 (granted)

Family

ID=81997214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210538060.3A Active CN114648736B (en) 2022-05-18 2022-05-18 Robust engineering vehicle identification method and system based on target detection

Country Status (1)

Country Link
CN (1) CN114648736B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173423A * 2023-08-09 2023-12-05 Shandong University of Finance and Economics Method, system, equipment and medium for detecting small image target

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2780595A1 (en) * 2011-06-22 2012-12-22 Roman Palenychka Method and multi-scale attention system for spatiotemporal change determination and object detection
US20220351535A1 (en) * 2019-12-20 2022-11-03 Intel Corporation Light Weight Multi-Branch and Multi-Scale Person Re-Identification
CN111401201B * 2020-03-10 2023-06-20 Nanjing University of Information Science and Technology Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN114332780A * 2021-11-30 2022-04-12 Wuxi Data Lake Information Technology Co., Ltd. Traffic man-vehicle non-target detection method for small target
CN114332620A * 2021-12-30 2022-04-12 Hangzhou Dianzi University Airborne image vehicle target identification method based on feature fusion and attention mechanism

Also Published As

Publication number Publication date
CN114648736A (en) 2022-06-21

Similar Documents

Publication Publication Date Title
Kalfarisi et al. Crack detection and segmentation using deep learning with 3D reality mesh model for quantitative assessment and integrated visualization
CN110232380B (en) Fire night scene restoration method based on Mask R-CNN neural network
US10037604B2 (en) Multi-cue object detection and analysis
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN110346699B (en) Insulator discharge information extraction method and device based on ultraviolet image processing technology
CN113344475B (en) Transformer bushing defect identification method and system based on sequence modal decomposition
CN112633231A (en) Fire disaster identification method and device
CN112766137B (en) Dynamic scene foreign matter intrusion detection method based on deep learning
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN113111727A (en) Method for detecting rotating target in remote sensing scene based on feature alignment
CN113920097A (en) Power equipment state detection method and system based on multi-source image
Tao et al. Automatic smoky vehicle detection from traffic surveillance video based on vehicle rear detection and multi‐feature fusion
CN115830004A (en) Surface defect detection method, device, computer equipment and storage medium
CN114648736B (en) Robust engineering vehicle identification method and system based on target detection
CN108509826B (en) Road identification method and system for remote sensing image
CN111881984A (en) Target detection method and device based on deep learning
Shit et al. An encoder‐decoder based CNN architecture using end to end dehaze and detection network for proper image visualization and detection
Leach et al. Data augmentation for improving deep learning models in building inspections or postdisaster evaluation
CN106778822B (en) Image straight line detection method based on funnel transformation
CN113192057A (en) Target detection method, system, device and storage medium
CN115083008A (en) Moving object detection method, device, equipment and storage medium
CN115346206B (en) License plate detection method based on improved super-resolution deep convolution feature recognition
CN115984378A (en) Track foreign matter detection method, device, equipment and medium
CN116385477A (en) Tower image registration method based on image segmentation
CN113505860B (en) Screening method and device for blind area detection training set, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant