CN116542912A - Flexible body bridge vibration detection model with multi-target visual tracking function and application - Google Patents

Flexible body bridge vibration detection model with multi-target visual tracking function and application

Info

Publication number
CN116542912A
CN116542912A (application number CN202310393300.XA)
Authority
CN
China
Prior art keywords
module
frame
flexible body
feature
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310393300.XA
Other languages
Chinese (zh)
Inventor
王森
孙瑞阳
付涛
李茂�
林森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202310393300.XA priority Critical patent/CN116542912A/en
Publication of CN116542912A publication Critical patent/CN116542912A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01H MEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H 9/00 Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves by using radiation-sensitive means, e.g. optical means
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01B MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B 11/00 Measuring arrangements characterised by the use of optical techniques
    • G01B 11/02 Measuring arrangements characterised by the use of optical techniques for measuring length, width or thickness
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a flexible body bridge vibration detection model with multi-target visual tracking and an application thereof. The model combines a target tracking module with the original YOLOv5-s framework and fuses inter-frame temporal and spatial information to detect the vibration displacement of the target more effectively. To address the inaccurate regression displacement that arises when the detected object or the camera is inclined, the invention adds an angle parameter to the regression head and adds a CIoU term on the basis of KFIoU. Experimental verification shows that the resulting RIoU loss achieves better rotated-box regression and faster convergence than KFIoU.

Description

Flexible body bridge vibration detection model with multi-target visual tracking function and application
Technical Field
The invention relates to a flexible body bridge vibration detection model with multi-target visual tracking and application thereof, and belongs to the fields of visual vibration displacement measurement and computer vision.
Background
Traditional vibration measurement of large-span flexible bridges typically monitors the structure with contact sensors such as accelerometers and strain gauges. The installation and maintenance of these sensors and their data acquisition systems are cumbersome and expensive, which limits the wide application of contact sensors in practical engineering. The research community has therefore actively explored more broadly applicable technologies such as wireless sensing systems based on the Internet of Things, laser Doppler vibrometers based on laser ranging, global positioning systems, and interferometric radar systems; however, these techniques impose specific installation-distance requirements, cover a limited range of vibration amplitudes, and most of them are very expensive, which hinders practical displacement monitoring of industrial structures.
Vision-based measurement methods built on deep learning offer a new approach to structural vibration measurement, but most current deep learning algorithms focus on the accuracy of target recognition and neglect positioning accuracy and target rotation. In addition, the detected object or the camera may be inclined, and a large-span bridge may rotate slightly after being disturbed, which further lowers the precision of the predicted box. Moreover, because a target detection algorithm considers only the targets in the current frame and ignores the spatio-temporal correlation between adjacent frames, the extracted vibration displacement signal carries a large jitter error.
Disclosure of Invention
The invention provides a flexible body bridge vibration detection model with multi-target visual tracking and application thereof.
The technical scheme of the invention is as follows:
according to one aspect of the invention, a multi-target visual tracking flexible body bridge vibration detection model is provided, and the multi-target visual tracking flexible body bridge vibration detection model is constructed by utilizing a feature extraction module, a PANet module, a head positioning module and a target tracking module.
The feature extraction module is based on the backbone of the YOLOv5-s network model: the last C3 module of the backbone is replaced by a Transformer self-attention module and moved to the layer after the SPPF module, yielding a feature extraction module composed of CiBS modules, C3 modules, an SPPF module and a Transformer self-attention module.
The PANet module takes the three feature maps X1, X2, X3 output by the feature extraction module as input, starting from X3. X3 passes through a C1BS module to give the feature map S3; S3 is upsampled, Concat-stacked with X2, and the stacked feature layers pass through a C3 module to give S2; S2 is upsampled, Concat-stacked with X1, and the stacked feature layers pass through a C3 module to give S1. S1 is taken without further processing as Q1; Q1 is downsampled once by a C3BS module, Concat-stacked with S2, and the stacked feature layers pass through a Transformer self-attention module to give Q2; Q2 is downsampled once by a C3BS module, Concat-stacked with S3, and the stacked feature layers pass through a Transformer self-attention module to give Q3.
In the head positioning module, each feature map Q1, Q2, Q3 output by the PANet module first passes through a C1BS module and is split into a classification part and a regression part; each part then passes through a C3BS module, after which the regression part yields a regression branch and a background (objectness) branch, and the classification part yields a classification branch.
The multi-target visual tracking flexible body bridge vibration detection model adopts a rotated-box overlap loss function L_RIoU, expressed as:

L_RIoU = L_KFIoU + L_CIoU

where L_KFIoU and L_CIoU denote the KFIoU loss function and the CIoU loss function, respectively.
The target tracking module takes the result output by the head positioning module as its input, predicts the position of each current-frame track in the next frame with Kalman filtering, uses the IoU between the predicted box and the actual detection box as the similarity in the two matching rounds, and completes the matching with the Hungarian algorithm.
The detection boxes output by the head positioning module are divided into high-score and low-score boxes by comparing their confidence against a threshold, and the two groups are processed separately: for a high-score box, Kalman filtering predicts the position and size of the next-frame bounding box, the IoU (intersection over union) between the predicted box and the current-frame high-score box is computed, and matching is completed with the Hungarian algorithm; for a low-score box, Kalman filtering likewise predicts the position and size of the next-frame bounding box, the IoU with the current-frame low-score box is computed, and matching is completed with the Hungarian algorithm.
According to another aspect of the invention, the above multi-target visual tracking flexible body bridge vibration detection model is used for multi-target detection in flexible body bridge vibration.
The beneficial effects of the invention are as follows: the invention constructs a flexible body bridge vibration detection model with multi-target visual tracking that combines a target tracking module with the original YOLOv5-s framework and fuses inter-frame temporal and spatial information to detect the vibration displacement of the target more effectively. To address the inaccurate regression displacement that arises when the detected object or the camera is inclined, the invention adds an angle parameter to the regression head and adds a CIoU term on the basis of KFIoU; experimental verification shows that the RIoU used in the invention achieves better rotated-box regression and faster convergence than KFIoU.
Drawings
FIG. 1 is a block diagram of the structure of the present invention;
FIG. 2 is a diagram of a bridge structure acquisition;
FIG. 3 is a label drawing of a labelImg2 tool on a bridge detection target;
FIG. 4 is a block diagram of a feature extraction module;
FIG. 5 is a block diagram of a C3 module;
FIG. 6 is a schematic diagram of an SPPF module;
FIG. 7 is a schematic diagram of a Transformer self-attention mechanism module;
FIG. 8 is a detailed workflow diagram of a PANet module;
FIG. 9 is a schematic diagram of a head positioning module;
FIG. 10 is a graph comparing RIoU loss function and KFIOU loss function loss value curves;
FIG. 11 is a block diagram of a target tracking module;
FIG. 12 is a detailed step diagram of training a multi-objective visual tracking flexible body bridge vibration detection model;
FIG. 13 is a graph showing a time-domain global comparison of acceleration sensor and image displacement signals;
FIG. 14 is a partial time domain comparison of acceleration sensor and image displacement signals;
FIG. 15 is a time domain detail comparison of an acceleration sensor with an image displacement signal;
FIG. 16 is a diagram of a sensor FFT;
FIG. 17 is a second Fourier transform plot of sensor 1.
Detailed Description
The invention will be further described with reference to the drawings and examples, but the invention is not limited to the scope.
Example 1: as shown in figs. 1-17, according to an aspect of the embodiments of the present invention, a flexible body bridge vibration detection model with multi-target visual tracking is provided, constructed jointly from a feature extraction module, a PANet module, a head positioning module and a target tracking module.
Further, as shown in fig. 4, the feature extraction module is based on the backbone of the YOLOv5-s network model: the last C3 module of the backbone is replaced by a Transformer self-attention module and moved to the layer after the SPPF module, yielding a feature extraction module composed of CiBS modules, C3 modules, an SPPF module and a Transformer self-attention module. A CiBS module consists of an i×i convolutional layer, a BN layer and a SiLU activation function. The C3 module, shown in fig. 5, is divided into a trunk part and a residual part: from input to output, the trunk part consists of a C1BS module and n Bottleneck modules, while the residual part is a single C1BS module. The image passes through the trunk part, is Concat-stacked with the image from the residual part, and the stacked result passes through one more C1BS module. Using a residual network to increase network depth improves the accuracy of bridge structure vibration detection; meanwhile, the skip connections used in feature extraction alleviate the vanishing-gradient problem caused by the increased network depth.
Specifically, in an embodiment of the present invention, the C3 module contains three C1BS modules and n Bottleneck modules. The backbone network has three C3 modules, whose Bottleneck counts are 1, 2 and 3 in order (i.e. from input to output, the 1st C3 module has 1 Bottleneck module, the 2nd has 2, and the 3rd has 3). Each Bottleneck module consists of one C1BS module and one C3BS module.
The SPPF module, shown in fig. 6, consists of two C1BS modules and three max-pooling layers. Placed before the last C3 module, the SPPF structure extracts features of the bridge detection target through max pooling, enlarging the receptive field of the feature extraction backbone so that the features of the bridge model are fully extracted and convenient for subsequent operations.
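The receptive-field effect of the chained max-pooling layers can be sketched with SciPy's `maximum_filter` (a stand-in for the module's pooling layers; the 5×5 kernel size is the common YOLOv5 default and is an assumption, not stated in the text):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def sppf(x):
    """SPPF-style pooling sketch: three chained 5x5 max pools, stacked with the input.

    Chaining 5x5 max pools reproduces the 5/9/13 receptive fields of the older
    parallel SPP block at lower cost (a property of max filters: two chained
    5x5 pools equal one 9x9 pool, three equal one 13x13 pool).
    """
    p1 = maximum_filter(x, size=5)
    p2 = maximum_filter(p1, size=5)
    p3 = maximum_filter(p2, size=5)
    return np.stack([x, p1, p2, p3])  # the real module concatenates along channels

x = np.random.default_rng(0).random((16, 16))
out = sppf(x)
# chained small pools match one large pool with the combined window
assert np.allclose(maximum_filter(maximum_filter(x, 5), 5), maximum_filter(x, 9))
```

The equivalence in the final assertion is why SPPF is cheaper than SPP while covering the same receptive fields.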
The schematic diagram of the Transformer self-attention module is shown in fig. 7. First, the query (Q), key (K) and value (V) matrices are computed with fully connected layers:
Q=Linear(x),K=Linear(x),V=Linear(x) (1)
The multi-head attention is then calculated as:

Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V (2)

where d is the query/key dimension, with a default value of 8. The feature map from the multi-head self-attention mechanism is added to the input x, then passed through a feed-forward layer and added to its residual edge to obtain the final feature map.
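The scaled dot-product attention above can be sketched in NumPy (a minimal single-head version; the random projection matrices stand in for the Linear layers of formula (1), and the sizes are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax over the given axis
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    w = softmax(Q @ K.T / np.sqrt(d))
    return w @ V

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 8))           # 10 tokens, d = 8
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = attention(x @ Wq, x @ Wk, x @ Wv)    # the Linear(x) projections of formula (1)

# attention weights form a proper distribution over keys
w = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(8))
assert np.allclose(w.sum(axis=1), 1.0)
```

The real module applies several such heads in parallel and concatenates them; the residual addition and feed-forward layer described in the text follow this step.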
Further, as shown in fig. 8, the PANet module takes the three feature maps X1, X2, X3 output by the feature extraction module as input, starting from X3. X3 passes through a C1BS module to give the feature map S3; S3 is upsampled, Concat-stacked with X2, and the stacked feature layers pass through a C3 module to give S2; S2 is upsampled, Concat-stacked with X1, and the stacked feature layers pass through a C3 module to give S1. S1 is taken without further processing as Q1; Q1 is downsampled once by a C3BS module, Concat-stacked with S2, and the stacked feature layers pass through a Transformer self-attention module to give Q2; Q2 is downsampled once by a C3BS module, Concat-stacked with S3, and the stacked feature layers pass through a Transformer self-attention module to give Q3. This module effectively strengthens the backbone's representation of the shallow positional information and deep semantic information of the rotating target, improving the robustness of target detection and the tightness of the positioning anchor box.
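The stacking arithmetic of the top-down/bottom-up paths can be sketched shape-wise in NumPy (channel counts follow typical YOLOv5-s values and are illustrative assumptions; the C3/C1BS convolutions that would adjust channel counts are omitted):

```python
import numpy as np

def upsample2x(x):   # nearest-neighbour upsampling, as in the top-down path
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x): # stride-2 stand-in for the C3BS downsampling convolution
    return x[:, ::2, ::2]

# channel-first maps (C, H, W) standing in for X1, X2, X3
X1 = np.zeros((128, 80, 80))
X2 = np.zeros((256, 40, 40))
X3 = np.zeros((512, 20, 20))

S3 = X3                                     # C1BS keeps the spatial size
S2 = np.concatenate([upsample2x(S3), X2])   # Concat stacks along channels
S1 = np.concatenate([upsample2x(S2), X1])
Q1 = S1
Q2 = np.concatenate([downsample2x(Q1), S2])

assert upsample2x(S3).shape[1:] == X2.shape[1:]   # spatial sizes line up for Concat
assert Q2.shape[1:] == S2.shape[1:]
```

The point of the sketch is only that upsampling aligns deep maps with shallow ones for Concat on the way up, and stride-2 downsampling aligns them again on the way down.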
Further, as shown in fig. 9, in the head positioning module each feature map Q1, Q2, Q3 output by the PANet module first passes through a C1BS module and is split into a classification part and a regression part; each part then passes through a C3BS module, after which the regression part yields a regression branch Reg and a background (objectness) branch Obj, and the classification part yields a classification branch Cls.
Further, the multi-target visual tracking flexible body bridge vibration detection model adopts the rotated-box overlap loss function L_RIoU:

L_RIoU = L_KFIoU + L_CIoU (3)

where L_KFIoU and L_CIoU denote the KFIoU loss function and the CIoU loss function, respectively.
It should be noted that the angle parameter, which the traditional IoU loss function cannot regress for a rotated box, is comprehensively considered here. However, KFIoU, currently the more effective angular regression loss function, converges slowly when the predicted box does not overlap the ground-truth box, so CIoU is added on the basis of KFIoU to construct the RIoU loss function (the rotated-box overlap loss); its calculation is given in formula (3). To quantitatively evaluate the training effect of the loss function and its convergence rate, the loss-value curves of the proposed RIoU and of KFIoU after 500 epochs of training are compared; fig. 10 shows that the RIoU used in the invention converges better than KFIoU.
As formula (3) shows, the RIoU used here consists of two parts, KFIoU and CIoU, described as follows:
the main principle of kfio u is to convert the rotation frame [ x, y, w, h, θ ] into two-dimensional gaussian distribution (μ, Σ), multiply two gauss to obtain gaussian distribution of the intersection region, reversely convert three gaussian distributions into rotation rectangle, calculate approximate SkewIoU (inclined rectangle frame IoU), and calculate the loss of two rotation frames. The conversion formula of the gaussian distribution is shown in the following formulas (4), (5):
μ=(x,y) T (4)
The specific expression of the KFIoU loss function is given by formulas (6) and (7):

KFIoU = V_B3 / (V_B1 + V_B2 - V_B3) (6)

where V_B1 and V_B2 denote the areas of the two rotated boxes recovered from their Gaussians, and V_B3 the area recovered from the Gaussian of the intersection region;
L_KFIoU = 1 - KFIoU (7)
where x, y, w, h and θ denote the abscissa and ordinate of the rectangular box's center point, and its width, height and angle, respectively.
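The box-to-Gaussian conversion of formulas (4) and (5) can be sketched in NumPy (the covariance form follows the standard KFIoU construction; function and variable names are illustrative):

```python
import numpy as np

def box_to_gaussian(x, y, w, h, theta):
    """Convert a rotated box (x, y, w, h, theta) to a 2-D Gaussian (mu, Sigma)."""
    mu = np.array([x, y])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])      # rotation matrix of theta
    Lam = np.diag([w**2 / 4.0, h**2 / 4.0])              # axis-aligned covariance
    return mu, R @ Lam @ R.T

mu, sigma = box_to_gaussian(0.0, 0.0, 4.0, 2.0, np.pi / 2)
# a 90-degree rotation swaps the two variances of the width/height axes
assert np.allclose(np.diag(sigma), [1.0, 4.0])
```

Multiplying two such Gaussians and converting the product back to a rectangle is what yields the intersection area used in formula (6).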
The CIoU loss function consists mainly of three parts: the distance loss, the aspect-ratio loss and the IoU loss. The specific calculation is given by formulas (8), (9) and (10):

L_CIoU = 1 - IoU + ρ^2(b, b^gt)/c^2 + αv (8)

v = (4/π^2) (arctan(w^gt/h^gt) - arctan(w/h))^2 (9)

α = v / ((1 - IoU) + v) (10)
where b and b^gt denote the center points of the predicted and ground-truth boxes respectively, ρ denotes the Euclidean distance between the two center points, c denotes the diagonal length of the smallest enclosing region containing both boxes, v measures the aspect-ratio similarity, α is a weight parameter, w^gt and h^gt denote the width and height of the ground-truth box, and w and h the width and height of the predicted box.
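For the axis-aligned case, formulas (8), (9) and (10) can be sketched directly in Python (a minimal illustration; the model applies this term together with the rotated-box KFIoU part):

```python
import numpy as np

def ciou_loss(box_p, box_g):
    """CIoU loss for axis-aligned (cx, cy, w, h) boxes:
    L = 1 - IoU + rho^2(b, b_gt)/c^2 + alpha * v."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = box_p, box_g
    # intersection and union
    ix = max(0.0, min(x1 + w1/2, x2 + w2/2) - max(x1 - w1/2, x2 - w2/2))
    iy = max(0.0, min(y1 + h1/2, y2 + h2/2) - max(y1 - h1/2, y2 - h2/2))
    inter = ix * iy
    iou = inter / (w1*h1 + w2*h2 - inter)
    # squared centre distance over squared enclosing-box diagonal
    cw = max(x1 + w1/2, x2 + w2/2) - min(x1 - w1/2, x2 - w2/2)
    ch = max(y1 + h1/2, y2 + h2/2) - min(y1 - h1/2, y2 - h2/2)
    rho2, c2 = (x1 - x2)**2 + (y1 - y2)**2, cw**2 + ch**2
    # aspect-ratio consistency term
    v = (4 / np.pi**2) * (np.arctan(w2 / h2) - np.arctan(w1 / h1))**2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

assert ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)) == 0.0   # identical boxes: zero loss
```

Unlike plain IoU loss, the distance term keeps a usable gradient even when the two boxes do not overlap, which is the property exploited to speed up KFIoU's convergence.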
Further, as shown in fig. 11, the target tracking module takes the result output by the head positioning module as its input, predicts the position of each current-frame track in the next frame with Kalman filtering, uses the IoU between the predicted box and the actual detection box as the similarity in the two matching rounds, and completes the matching with the Hungarian algorithm.
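The Kalman prediction step can be sketched with a constant-velocity state model (a minimal assumption-laden sketch: the state here is just [cx, cy, vx, vy], whereas a full tracker also carries box scale/aspect and runs a measurement-update step):

```python
import numpy as np

# Constant-velocity Kalman prediction for one track, state [cx, cy, vx, vy].
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # state transition, dt = 1 frame
Q = np.eye(4) * 0.01                        # process noise (illustrative value)

def predict(x, P):
    """Project state mean and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

x = np.array([10.0, 20.0, 1.0, -2.0])       # centre (10, 20), velocity (1, -2)
P = np.eye(4)
x_next, P_next = predict(x, P)
assert np.allclose(x_next[:2], [11.0, 18.0])  # centre moved by one velocity step
```

The predicted box derived from `x_next` is what gets compared, via IoU, against the detections of the next frame.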
Further, the target tracking module uses the ByteTrack multi-target tracking algorithm to avoid the mutual independence of adjacent frames inherent in the detection algorithm, optimizes the association of detected targets between frames, and measures the displacement offset of the vibrating targets more accurately. Specifically, the detection boxes output by the head positioning module are divided into high-score and low-score boxes by comparing their confidence against a threshold, and the two groups are processed separately: for a high-score box, Kalman filtering predicts the position and size of the next-frame bounding box, the IoU (intersection over union) with the current-frame high-score box is computed, and matching is completed with the Hungarian algorithm; for a low-score box, Kalman filtering likewise predicts the position and size of the next-frame bounding box, the IoU with the current-frame low-score box is computed, and matching is completed with the Hungarian algorithm. In the embodiment of the invention, the threshold is set to 0.5; a detection box above the threshold is judged to be a high-score box, otherwise a low-score box.
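The two-stage association can be sketched with SciPy's Hungarian solver (a simplified stand-in: real ByteTrack also handles unmatched tracks, track birth/death, and a gating threshold, and the boxes/scores below are invented toy data):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match(tracks, dets):
    """Hungarian matching on a (1 - IoU) cost matrix; returns (track_i, det_j) pairs."""
    if not tracks or not dets:
        return []
    cost = np.array([[1 - iou(t, d) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 0.9]  # gate out non-overlaps

# split detections at the 0.5 confidence threshold, as in the embodiment
dets = [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.3)]
high = [d for d, s in dets if s > 0.5]
low = [d for d, s in dets if s <= 0.5]

tracks = [(1, 1, 11, 11), (21, 19, 31, 29)]      # Kalman-predicted track boxes
assert match(tracks, high) == [(0, 0)]           # first round: high-score boxes
assert match([tracks[1]], low) == [(0, 0)]       # second round: leftover track vs low-score
```

Recovering targets from the low-score pool in the second round is what lets the tracker survive frames where a vibrating target is momentarily detected with low confidence.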
According to another aspect of the embodiment of the invention, the above multi-target visual tracking flexible body bridge vibration detection model is used for multi-target detection in flexible body bridge vibration.
Further, an alternative implementation procedure for using the above-mentioned multi-target visual tracking flexible body bridge vibration detection model for multi-target detection in flexible body bridge vibration is given as follows:
step 1, acquiring a flexible bridge structure data set by using a high-speed camera and dividing the flexible bridge structure data set into a training data set and a verification data set;
specifically, the flexible bridge structure data set is vibration data of the excited bridge model obtained through high-speed camera shooting. Light compensation was performed after acquisition using a light supplementing lamp, the resolution of the acquired continuous frames was 640 x 512, and the acquired images were as shown in fig. 2. And simultaneously, synchronously acquiring vibration data of the excited bridge model by using an NI9234 acquisition card and an acceleration sensor, wherein the sampling frequency is 25.6kHz, and the vibration data are used for a comparison experiment. The flexible bridge structure data set is divided into a training data set and a verification data set. In the embodiment, the flexible bridge structure data sets used in the invention are all acquired on the diagonal cable bridge model, and 20000 diagonal cable bridge model image data sets are acquired through a high-speed camera at the shooting speed of 2000 frames per second; 380 pieces of collected vibration data of the inclined rope bridge model are sequentially taken from the collected data, and the training data set and the verification data set respectively account for 90% and 10% of the bridge structure data set: training data set 342, verifying data set 38; and the bridge structural body images to be tested can be added or acquired later and then tested.
Step 2, label the training and validation data sets with LabelImg2 to obtain the training set and the validation set. Specifically: the LabelImg2 toolkit is configured with the label classes sensor and sensor1, and labeling with LabelImg2 yields the training set used for training and the validation set used for verification. The detection targets on the bridge are labeled with boxes of identical size; the labels use rotated boxes drawn with the LabelImg2 toolkit, and the labeling effect is shown in fig. 3. Because the shape and size of the inclined target are unchanged across the image sequence, manually labeling with a rotated box of uniform size reduces the error introduced by manual labeling. The labeled data set is split between the training set and the validation set in a 9:1 ratio.
Step 3, constructing a multi-target visual tracking model, wherein the multi-target visual tracking model comprises a feature extraction module, a PANet module, a head positioning module and a target tracking module, and is applied to vibration detection of a flexible body bridge;
step 4, obtaining a group of parameters for training by modifying the super parameters in the file;
step 5, training the flexible body bridge vibration detection model with multi-target visual tracking by calling a training set to obtain candidate weights; specifically, the flexible body bridge vibration detection model with multi-target visual tracking in the step 5 is trained, and the trained object comprises coordinates of a central point of a marking frame, the width and the height of the marking frame, the inclined angle of the marking frame and the type of the marking object. And after training, obtaining training weights and using the weights for parameter adjustment and prediction of the model. The specific steps of training the multi-target visual tracking flexible body bridge vibration detection model are shown in fig. 12, and the training process is as follows: applying the trained super-parameters to a flexible body bridge vibration detection model with multi-target visual tracking, and using partial pictures of a training set as a current training sample; sequentially placing the samples into a flexible body bridge vibration detection model with multi-target visual tracking for training to obtain updated training weight parameters; the updating of the weight parameters comprises the following specific steps: randomly initializing weight parameters to calculate the output of training samples; comparing the output of the training sample with a real frame, and calculating a loss function; calculating the gradient of the loss function to the weight parameter by using a chain rule; the weight parameters are updated according to the gradient values and the learning rate, so that the loss function is minimized. Repeating the steps until the iteration times of the training setting of the network model are completed. 
In the embodiment of the invention, the configuration file is first set so that the batch size of extracted pictures is 4, the hyperparameter learning rate = 0.0032, momentum = 0.843, and the weight decay coefficient = 0.001; the remaining parameters take their default values. The flexible body bridge vibration detection model with multi-target visual tracking is loaded, and the training pictures are fed in batches for training. After the model has run for the set number of iterations with the set parameters such as learning rate, momentum and decay coefficient, the training weights are obtained. The performance of the candidate weights is quantitatively evaluated on the validation set, and the optimal weights are selected using overall quantitative metrics such as precision, recall and mean average precision as the evaluation basis.
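Under the common assumption that these hyperparameters feed a standard SGD-with-momentum optimizer (the embodiment does not spell out the update rule), a single update step looks like the following sketch; the quadratic toy loss is purely illustrative.

```python
import numpy as np

# Hyperparameters as configured in the embodiment (batch size 4 governs how
# pictures are sampled; the update rule below uses the other three).
LR = 0.0032           # learning rate
MOMENTUM = 0.843      # momentum coefficient
WEIGHT_DECAY = 0.001  # weight decay coefficient

def sgd_momentum_step(w, grad, velocity):
    """One SGD-with-momentum update, with L2 weight decay folded into the gradient."""
    g = grad + WEIGHT_DECAY * w
    velocity = MOMENTUM * velocity - LR * g
    return w + velocity, velocity

# Example: a few steps on the simple quadratic loss 0.5 * ||w||^2 (gradient = w).
w = np.array([1.0, -1.0])
v = np.zeros_like(w)
for _ in range(5):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
print(w)  # the weights shrink toward the minimum at the origin
```

The momentum term accumulates past gradients, which is what lets a small learning rate like 0.0032 still make steady progress.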
And 6, loading the optimal weights into the flexible body bridge vibration detection model with multi-target visual tracking to obtain the model with the optimal weights loaded.
And 7, carrying out a series of comparison experiments on the flexible body bridge vibration detection model with multi-target visual tracking loaded with the optimal weights. Target detection and performance comparison are carried out against a series of traditional detection methods such as the traditional normalized correlation coefficient matching method. The position of the center point of the bounding box is regressed, and the resulting displacement track is compared with the displacement track synchronously acquired by the acceleration sensor. The invention improves the algorithm and compares the performance before and after the improvement through different evaluation modes.
The invention compares the improvements of the algorithm through various evaluation modes and verifies the improvement effect through detailed comparison experiments, the detailed experimental data being shown in table 1. The invention uses the normalized root mean square error (NRMSE) as a quantitative evaluation index to measure the degree of fit between the displacement curve obtained by the improved algorithm and that of the acceleration sensor, as shown in formula (11). As can be seen from the data in table 1, each improvement of the invention is effective: after all improvements are added, the mNRMSE is reduced from the original 0.01685 to 0.0137, showing that the improvements of the invention can effectively raise the accuracy of the rotating frame visual tracking algorithm in bridge displacement vibration measurement.
wherein P_i represents the predicted value of the multi-target visual tracking flexible body bridge vibration detection model, Q_i represents the real value measured by the acceleration sensor, and Q_max and Q_min are respectively the maximum and minimum of the real values measured by the acceleration sensor.
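Assuming formula (11) takes the usual range-normalized form NRMSE = RMSE / (Q_max − Q_min), which matches the variables defined above, the metric can be computed directly in NumPy:

```python
import numpy as np

def nrmse(p, q):
    """NRMSE between model predictions p (P_i) and sensor readings q (Q_i),
    normalized by the range of the sensor's real values (Q_max - Q_min)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    rmse = np.sqrt(np.mean((p - q) ** 2))
    return rmse / (q.max() - q.min())

# Example: predictions offset from a 0..4 ramp by a constant 0.1.
q = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
p = q + 0.1
print(nrmse(p, q))  # 0.1 / 4 = 0.025
```

A lower NRMSE means the model's displacement curve fits the sensor's curve more closely, which is how table 1 ranks the variants.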
TABLE 1 network model comparison experiment
In table 1: a is a traditional YOLOv5-s model; b is a head module of the traditional YOLOv5-s model, and the head module is the head positioning module of the invention; c is the RIoU function of the invention adopted by the loss function of the model B; d is based on a C model, a backbone network in a traditional model adopts the feature extraction module, namely a transducer is introduced, and simultaneously, the transducer is also introduced into a neck module; e is a target tracking module introduced on the basis of the D model.
The invention adopts a plurality of deep learning algorithms with different positioning principles for performance comparison. Specifically, the displacement curve of the bounding-box center point obtained by the invention, the displacement curves of the bounding-box center points detected by the different deep learning algorithms, and the displacement curve obtained from the displacement signals acquired by the acceleration sensor are fitted and compared, as shown in figs. 13, 14 and 15. The lines of different shapes in the figures respectively represent the displacement curves of YOLOv5, R3Det, our algorithm and the acceleration sensor. Since the algorithm adopts the RIoU rotating frame detection algorithm, it has better rotating frame regression characteristics and faster convergence speed than traditional detection algorithms. Because the detected object rotates, the vibration signal measured by the algorithm fits the vibration signal acquired by the acceleration sensor more closely.
The results of the quantitative comparisons of the different algorithms are shown in table 2. mAP@.5 represents the average AP of the network at an IoU of 0.5. As can be seen from the data in table 2, the proposed algorithm has high detection accuracy.
TABLE 2 quantitative comparison between different visual displacement detection algorithms
Algorithm    mNRMSE     mAP@.5
R3Det        0.0367     1.0
YOLOv5       0.01685    0.995
Ours         0.0137     1.0
The invention compares the displacement curves measured by R3Det, YOLOv5, the proposed algorithm and the acceleration sensor on a time domain waveform diagram; at the same sampling frequency they exhibit similar attenuation laws and the same variation trend, but their vibration amplitudes differ obviously. The time domain waveforms are further subjected to a fast Fourier transform, as shown in figs. 16 and 17. From the perspective of frequency domain consistency, the comparison of the spectral characteristics of the different algorithms shows that the spectral characteristics obtained by the method are closest to the acceleration spectrum signal, while the other algorithms maintain only approximate local consistency with it, which reflects the effectiveness of the algorithm in visual displacement measurement.
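The time-to-frequency step described here (a fast Fourier transform of the displacement time series) can be sketched with NumPy's real FFT; the damped sine and the sampling frequency below are hypothetical stand-ins for the measured displacement curve.

```python
import numpy as np

fs = 100.0                       # sampling frequency in Hz (assumed)
t = np.arange(0, 4.0, 1.0 / fs)  # 4 s of samples
f0 = 2.5                         # dominant vibration frequency in Hz (illustrative)
x = np.exp(-0.3 * t) * np.sin(2 * np.pi * f0 * t)  # decaying displacement signal

spec = np.abs(np.fft.rfft(x))                  # one-sided amplitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)    # frequency axis for the spectrum
peak = freqs[np.argmax(spec)]
print(peak)  # dominant frequency recovered from the spectrum, 2.5 Hz for this signal
```

Comparing such spectra across algorithms is what the frequency-domain consistency analysis of figs. 16 and 17 amounts to: the closer an algorithm's spectral peak and shape are to the acceleration sensor's spectrum, the better.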
In summary, aiming at problems such as a small detection target, low resolution of the captured image, and inclination between the detected object and the camera, the algorithm of the invention adds an angle parameter and uses the RIoU rotating frame detection algorithm to remedy the inaccurate positioning of the traditional horizontal frame for rotating objects. In addition, the invention introduces the ByteTrack target tracking algorithm and fuses temporal and spatial information, thereby realizing more effective vibration displacement detection. The experimental part evaluates and compares the method against current mainstream deep learning algorithms and the displacement obtained by the acceleration sensor, and the algorithm of the invention shows better measurement results.
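The ByteTrack-style association summarized above — splitting detections by confidence and matching each group against predicted track boxes by IoU — can be sketched as follows. This is a simplified illustration: greedy IoU assignment stands in for the Hungarian algorithm, and the predicted track boxes are taken as given rather than produced by a full Kalman filter.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def greedy_match(tracks, dets, thresh=0.3):
    """Greedy IoU matching (a simple stand-in for the Hungarian algorithm)."""
    pairs, matched_t, matched_d = [], set(), set()
    cands = sorted(((iou(t, d), ti, di) for ti, t in enumerate(tracks)
                    for di, d in enumerate(dets)), reverse=True)
    for score, ti, di in cands:
        if score < thresh or ti in matched_t or di in matched_d:
            continue
        pairs.append((ti, di))
        matched_t.add(ti)
        matched_d.add(di)
    return pairs

def associate(pred_tracks, detections, conf_thresh=0.6):
    """Two-stage ByteTrack-style association: high-score detections first,
    then low-score detections against the remaining unmatched tracks."""
    high = [d for d in detections if d[1] >= conf_thresh]
    low = [d for d in detections if d[1] < conf_thresh]
    m1 = greedy_match(pred_tracks, [d[0] for d in high])
    left = [ti for ti in range(len(pred_tracks)) if ti not in {p[0] for p in m1}]
    m2 = greedy_match([pred_tracks[ti] for ti in left], [d[0] for d in low])
    m2 = [(left[ti], di) for ti, di in m2]  # map back to original track indices
    return m1, m2

# Two predicted track boxes; one high-confidence and one low-confidence detection.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [((1, 1, 11, 11), 0.9), ((21, 21, 31, 31), 0.4)]
print(associate(tracks, dets))  # track 0 matches the high box, track 1 the low box
```

The second, low-score pass is what lets the tracker keep following a target whose detection confidence dips for a few frames, instead of discarding those detections outright.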
While the present invention has been described in detail with reference to the drawings, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims (8)

1. A flexible body bridge vibration detection model with multi-target visual tracking, characterized in that the flexible body bridge vibration detection model with multi-target visual tracking is constructed by utilizing a feature extraction module, a PANet module, a head positioning module and a target tracking module.
2. The multi-target visual tracking flexible body bridge vibration detection model of claim 1, wherein the feature extraction module, based on the backbone network of the YOLOv5-s network model, changes the last C3 module of the backbone network into a Transformer self-attention mechanism module and moves it to the layer after the SPPF module, thereby constructing a feature extraction module composed of C_iBS modules, C3 modules, an SPPF module and a Transformer self-attention mechanism module.
3. The multi-target visual tracking flexible body bridge vibration detection model of claim 1, wherein the PANet module takes the three feature maps X_1, X_2, X_3 output by the feature extraction module as input; starting from feature map X_3, feature map X_3 passes through a C_1BS module to obtain feature map S_3; feature map S_3 is up-sampled and Concat-stacked with feature map X_2, and then a C3 module performs feature extraction on the stacked feature layers to obtain feature map S_2; feature map S_2 is up-sampled and Concat-stacked with feature map X_1, and then a C3 module performs feature extraction on the stacked feature layers to obtain feature map S_1; feature map S_1 is taken without any processing as feature map Q_1; feature map Q_1 is down-sampled once through a C_3BS module and then Concat-stacked with feature map S_2, and a Transformer self-attention mechanism module performs feature extraction on the stacked feature layers to obtain feature map Q_2; feature map Q_2 is down-sampled once through a C_3BS module and then Concat-stacked with feature map S_3, and a Transformer self-attention mechanism module performs feature extraction on the stacked feature layers to obtain feature map Q_3.
4. The multi-target visual tracking flexible body bridge vibration detection model of claim 1, wherein the head positioning module first passes the feature maps Q_1, Q_2, Q_3 output by the PANet module through a C_1BS module to obtain a classification part and a regression part; the two parts then respectively pass through a C_3BS module, the regression part yielding a regression branch and a background branch, and the classification part yielding a classification branch.
5. The multi-target visual tracking flexible body bridge vibration detection model of claim 1, wherein the multi-target visual tracking flexible body bridge vibration detection model adopts a rotating frame overlap loss function L_RIoU, expressed as:
L_RIoU = L_KFIoU + L_CIoU
wherein L_KFIoU and L_CIoU respectively represent the KFIoU loss function and the CIoU loss function.
6. The multi-target visual tracking flexible body bridge vibration detection model of claim 1, wherein the target tracking module takes the result output by the head positioning module as its input, predicts the position of the tracking track of the current frame in the next frame by Kalman filtering, uses IoU as the similarity between the predicted frame and the actual detection frame in the two matching stages, and completes the matching through the Hungarian algorithm.
7. The multi-target visual tracking flexible body bridge vibration detection model of claim 6, wherein the detection frames output by the head positioning module are judged according to their confidence and a threshold, divided into high-score frames and low-score frames, and processed separately:
if a frame is a high-score frame, the position and size of the bounding box in the next frame are predicted by Kalman filtering, the IoU (intersection over union) value between the predicted bounding box and the high-score bounding boxes of the current frame is calculated, and the matching is completed through the Hungarian algorithm;
if a frame is a low-score frame, the position and size of the bounding box in the next frame are predicted by Kalman filtering, the IoU (intersection over union) value between the predicted bounding box and the low-score bounding boxes of the current frame is calculated, and the matching is completed through the Hungarian algorithm.
8. Use of the multi-target visually tracked flexible body bridge vibration detection model of any one of claims 1-7 for multi-target detection in flexible body bridge vibrations.
CN202310393300.XA 2023-04-13 2023-04-13 Flexible body bridge vibration detection model with multi-target visual tracking function and application Pending CN116542912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310393300.XA CN116542912A (en) 2023-04-13 2023-04-13 Flexible body bridge vibration detection model with multi-target visual tracking function and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310393300.XA CN116542912A (en) 2023-04-13 2023-04-13 Flexible body bridge vibration detection model with multi-target visual tracking function and application

Publications (1)

Publication Number Publication Date
CN116542912A true CN116542912A (en) 2023-08-04

Family

ID=87446104

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310393300.XA Pending CN116542912A (en) 2023-04-13 2023-04-13 Flexible body bridge vibration detection model with multi-target visual tracking function and application

Country Status (1)

Country Link
CN (1) CN116542912A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576165A (en) * 2024-01-15 2024-02-20 武汉理工大学 Ship multi-target tracking method and device, electronic equipment and storage medium
CN117576165B (en) * 2024-01-15 2024-04-19 武汉理工大学 Ship multi-target tracking method and device, electronic equipment and storage medium
CN117910120A (en) * 2024-03-20 2024-04-19 西华大学 Buffeting response prediction method for wind-bridge system based on lightweight transducer


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination