CN112395952A - An unmanned aerial vehicle for rail defect detection - Google Patents

An unmanned aerial vehicle for rail defect detection

Info

Publication number: CN112395952A
Application number: CN202011145523.7A
Authority: CN (China)
Prior art keywords: network, YOLOv3, unmanned aerial vehicle, feature
Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 刘建虢, 尹晓雪
Current and original assignee: Xian Cresun Innovation Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Application filed by Xian Cresun Innovation Technology Co Ltd
Priority to CN202011145523.7A
Publication of CN112395952A


Classifications

    • G06V 20/10: Image or video recognition or understanding; scene-specific elements; terrestrial scenes
    • B61K 9/08: Railways; measuring installations for surveying permanent way
    • B61K 9/10: Measuring installations for detecting cracks in rails or welds thereof
    • B64C 39/02: Aircraft not otherwise provided for, characterised by special use
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23213: Clustering using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V 10/40: Extraction of image or video features
    • G06V 2201/07: Indexing scheme; target detection


Abstract

The invention discloses an unmanned aerial vehicle for rail defect detection, comprising: a power system for providing the power required for the flight of the unmanned aerial vehicle; a flight control system, connected with the power system, for stabilizing the attitude of the unmanned aerial vehicle, managing task execution, and handling emergencies; a communication and navigation system, connected with the flight control system, for information transmission during the operation of the unmanned aerial vehicle; a mission load system, connected with the flight control system, pre-loaded with a program implementing a rail defect detection method, which determines the positions and types of rail defects according to the method; and a launch and recovery system for ensuring that the unmanned aerial vehicle ascends smoothly to a safe altitude and speed and returns safely to the ground after the mission is finished. The unmanned aerial vehicle can detect rail defects of different scales, determine defect types, and solve the problem of missed detection of tiny defects; at the same time, it improves detection speed and precision and realizes real-time detection.

Description

An unmanned aerial vehicle for rail defect detection
Technical Field
The invention belongs to the field of defect detection, and particularly relates to an unmanned aerial vehicle for rail defect detection.
Background
Transportation, particularly rail transportation, has become part of everyday life. China's railway transportation is in a stage of rapid development, and train speeds have increased greatly, which places high demands on railway safety. Owing to weather, heavy transport loads and other factors, rails suffer varying degrees of damage over time, including geometric defects, rail component defects and rail surface defects; common rail surface defects include scars, cracks, corrugation, wrinkles, flaking, wear and indentations. If such defects are not maintained and replaced in time, they can develop into internal defects that impair normal rail use and threaten safe train operation.
In China, rail defect detection has long relied on manual inspection and visual examination, which is inefficient; moreover, the results are strongly affected by subjective human factors, weather, illumination and the like, so that some tiny defects are falsely detected or missed, and real-time detection cannot be achieved.
Therefore, how to realize high-precision, real-time detection of rail defects is a problem to be solved in the field.
Disclosure of Invention
In order to realize high-precision, real-time detection of rail defects, the embodiment of the invention provides an unmanned aerial vehicle for rail defect detection.
The specific technical scheme is as follows:
a drone for rail defect detection, comprising:
the power system is used for providing power required by the flight of the unmanned aerial vehicle;
the flight control system is connected with the power system and used for controlling the attitude stability of the unmanned aerial vehicle, managing the unmanned aerial vehicle to execute tasks and processing emergency situations;
the communication navigation system is connected with the flight control system and is used for information transmission in the working process of the unmanned aerial vehicle;
the mission load system is connected with the flight control system, is pre-loaded with a program of a rail defect detection method, and determines the position and the defect type of a rail defect according to the method;
and the launching and recovery system is used for ensuring that the unmanned aerial vehicle smoothly flies at a safe height and speed and safely falls back to the ground from the sky after the mission is finished.
In one embodiment of the present invention, the rail defect detecting method includes:
acquiring a target rail image to be detected;
inputting the target rail image into an improved YOLOv3 network obtained by pre-training, and performing feature extraction on the target rail image by using a backbone network to obtain x feature maps with different scales; x is a natural number of 4 or more;
carrying out feature fusion in a top-down, densely connected manner on the x feature maps of different scales by using an improved FPN (Feature Pyramid Network) to obtain a prediction result corresponding to each scale;
obtaining attribute information of the target rail image based on all prediction results, wherein the attribute information comprises the position and the category of a target in the target rail image;
wherein the improved YOLOv3 network comprises the backbone network and the improved FPN network connected in sequence; the improved YOLOv3 network is obtained, on the basis of a YOLOv3 network, by adding a feature extraction scale, optimizing the feature fusion mode of the FPN network, and performing pruning combined with knowledge-distillation-guided network recovery; and the improved YOLOv3 network is trained from sample images and the positions and categories of the targets corresponding to the sample images.
In one embodiment of the present invention, the backbone network of the improved YOLOv3 network includes:
y residual modules connected in series; y is a natural number of 4 or more; y is greater than or equal to x;
the method for extracting features by using the backbone network to obtain x feature maps with different scales comprises the following steps:
and performing feature extraction on the target rail image by utilizing y residual modules connected in series to obtain x feature maps which are output by the x residual modules in the reverse direction along the input direction and have sequentially increased scales.
In an embodiment of the present invention, the performing, by using an improved FPN network, feature fusion in a top-down dense connection manner on the x feature maps with different scales includes:
for prediction branch Y_i, acquiring the feature map of the corresponding scale from the x feature maps and performing convolution processing on it;
cascading and fusing the convolved feature map with the respectively upsampled feature maps of prediction branches Y_{i-1} to Y_1;
wherein the improved FPN network comprises x prediction branches Y_1 to Y_x of sequentially increasing scale; the scales of the prediction branches Y_1 to Y_x correspond one-to-one to the scales of the x feature maps; the upsampling multiple applied to prediction branch Y_{i-j} is 2^j; i = 2, 3, …, x; and j is a natural number smaller than i.
In an embodiment of the present invention, the performing pruning and guiding network recovery processing in combination with knowledge distillation includes:
in a network obtained by adding a feature extraction scale on the basis of the YOLOv3 network and optimizing the feature fusion mode of the FPN network, performing layer pruning on the residual modules of the backbone network to obtain a YOLOv3-1 network;
carrying out sparse training on the YOLOv3-1 network to obtain a YOLOv3-2 network with BN layer scaling coefficients in sparse distribution;
performing channel pruning on the YOLOv3-2 network to obtain a YOLOv3-3 network;
knowledge distillation is carried out on the YOLOv3-3 network to obtain the improved YOLOv3 network.
In an embodiment of the present invention, before training the modified YOLOv3 network, the method further includes:
determining the quantity to be clustered aiming at the size of the anchor box in the sample image;
acquiring a plurality of sample images with marked target frame sizes;
based on a plurality of sample images marked with the size of the target frame, acquiring a clustering result of the size of the anchor box in the sample images by using a K-Means clustering method;
writing the clustering result into a configuration file of the improved YOLOv3 network.
In one embodiment of the invention, the improved YOLOv3 network further includes a classification network and a non-maxima suppression module.
In an embodiment of the present invention, the obtaining attribute information of the target rail image based on all the prediction results includes:
and classifying all prediction results through the classification network, and then performing prediction frame deduplication processing through the non-maximum suppression module to obtain attribute information of the target rail image.
In one embodiment of the invention, the classification network comprises a SoftMax classifier.
In one embodiment of the invention, the loss function of the sparse training is:

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

wherein \sum_{(x,y)} l(f(x, W), y) represents the original loss function of the network, (x, y) represent the input data and target data of the training process, W represents the trainable weights, and \lambda \sum_{\gamma \in \Gamma} g(\gamma) is the regularization term added for the scaling factors, in which g(\gamma) is the penalty function used for sparse training of the scaling coefficients and \lambda is its weight. Since the scaling factor \gamma is to be made sparse, the L1 norm is selected as the penalty function.
The invention provides an unmanned aerial vehicle for rail defect detection that adopts a rail defect detection method in which the improved YOLOv3 network transmits feature maps from shallow to deep and extracts feature maps of at least four scales. By adding a fine-grained feature extraction scale, the network can detect defects of different scales, especially tiny defects, while also classifying the defects accurately.
In the unmanned aerial vehicle of the invention, the feature fusion mode of the FPN is changed: the feature maps extracted by the backbone network are fused in a top-down, densely connected manner, and the deep features are directly upsampled by different multiples so that all transmitted feature maps have the same size; these feature maps are then fused with the shallow feature maps by concatenation. More original information can thus be utilized and high-dimensional semantic information participates in the shallow network, improving detection precision. Meanwhile, by directly receiving features from shallower layers, more concrete features are obtained, feature loss is effectively reduced, the amount of parameters to be computed is reduced, detection speed is improved, and real-time detection is realized.
In the unmanned aerial vehicle of the invention, the pre-trained network undergoes layer pruning, sparse training, channel pruning and knowledge distillation, with optimized processing parameters selected at each step; this reduces the network volume, eliminates most redundant computation, and greatly improves detection speed while maintaining detection accuracy.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
The present invention will be described in further detail with reference to the accompanying drawings and examples.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of an unmanned aerial vehicle for rail defect detection according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a rail defect detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a prior art YOLOv3 network;
fig. 4 is a schematic structural diagram of an improved YOLOv3 network according to an embodiment of the present invention;
FIG. 5-1 is a graph of weight shift for parameter set 5 selected by an embodiment of the present invention; fig. 5-2 is a weight overlap graph of a parameter combination 5 selected by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
In order to realize high-precision, real-time detection of rail defects, the embodiment of the invention provides an unmanned aerial vehicle for rail defect detection.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an unmanned aerial vehicle for detecting a rail defect according to an embodiment of the present invention. As shown in fig. 1, an unmanned aerial vehicle for rail defect detection provided by an embodiment of the present invention includes:
power system 101 for provide the required power of unmanned aerial vehicle flight, make unmanned aerial vehicle can carry out each item flight activity safely.
The power system 101 includes: the battery, motor, electronic governor, screw can realize hovering, functions such as variable speed.
And the flight control system 102 is connected with the power system and used for controlling the attitude stability of the unmanned aerial vehicle, managing the unmanned aerial vehicle to execute tasks and processing emergency situations.
The flight control system 102 may be said to be the "brain" of the drone, which plays a decisive role in the flight performance of the drone.
And the communication navigation system 103 is connected with the flight control system and used for information transmission in the working process of the unmanned aerial vehicle.
The functions of the communication navigation system 103 mainly include ensuring that remote control commands are transmitted accurately and that the unmanned aerial vehicle receives and sends information in a timely, reliable and accurate manner, so as to guarantee the reliability, accuracy, real-time performance and effectiveness of information feedback.
A mission load system 104 connected to the flight control system, wherein a program of a rail defect detection method is pre-installed in the mission load system, and the position and the defect type of the rail defect are determined according to the method;
and the launching and recovery system 105 is used for ensuring that the unmanned aerial vehicle smoothly ascends to the air to achieve safe height and speed flight, and safely falls back to the ground from the sky after the mission is finished.
The following description mainly refers to a method for detecting a rail defect pre-installed in the mission load system 104, and specific structures of the remaining modules may refer to the related prior art, which is not described herein again.
Referring to fig. 2, fig. 2 is a schematic flow chart of a rail defect detection method according to an embodiment of the present invention. As shown in fig. 2, a rail defect detecting method provided by an embodiment of the present invention may include the following steps:
s1, acquiring a target rail image to be detected;
the target rail image is an image shot by the image acquisition equipment for the rail to be detected.
The image acquisition equipment is deployed in a task load system of the unmanned aerial vehicle and used for executing a detection task of rail defects.
The image acquisition device may include a camera, a video camera, a still camera, a mobile phone, etc.; in an alternative embodiment, the image capture device may be a high resolution camera.
In the embodiment of the present invention, the size of the target rail image is 416 × 416 × 3. Thus, at this step, in one embodiment a 416 × 416 × 3 target rail image may be obtained directly from the image acquisition end; in another embodiment, an image of arbitrary size sent by the image acquisition end may be obtained and then scaled to obtain a 416 × 416 × 3 target rail image.
In both embodiments, the obtained image may further undergo image enhancement operations such as cropping, stitching, smoothing, filtering and edge filling, so as to enhance the features of interest in the image and improve the generalization capability of the dataset.
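As an illustration of this preprocessing step, the following is a minimal sketch in Python, assuming OpenCV and NumPy; plain resizing to 416 × 416 is an assumption (the embodiment only states that images are scaled), and the function name is hypothetical.

```python
import cv2
import numpy as np

def preprocess_rail_image(path, target=416):
    # Hypothetical helper: load an image of arbitrary size and scale it
    # to the 416x416x3 network input described above. Plain resizing is
    # an assumption; the embodiment does not specify the scaling algorithm.
    img = cv2.imread(path)                             # HxWx3, BGR
    img = cv2.resize(img, (target, target))            # scale to 416x416
    img = img[:, :, ::-1].astype(np.float32) / 255.0   # BGR -> RGB, normalize
    return np.ascontiguousarray(img)
```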
S2, inputting the target rail image into an improved YOLOv3 network obtained by pre-training, and extracting the features of the target rail image by using a backbone network to obtain x feature maps with different scales; x is a natural number of 4 or more;
to facilitate understanding of the network structure of the improved YOLOv3 network proposed in the embodiment of the present invention, first, a network structure of a YOLOv3 network in the prior art is introduced, please refer to fig. 3, and fig. 3 is a schematic structural diagram of a YOLOv3 network in the prior art. In fig. 3, the part inside the dashed box is the YOLOv3 network. Wherein the part in the dotted line frame is a backbone (backbone) network of the YOLOv3 network, namely a darknet-53 network; the backbone network of the YOLOv3 network is formed by connecting CBL modules and 5 resn modules in series. The CBL module is a Convolutional network module, and includes a conv layer (convolutive layer, convolutive layer for short), a BN (Batch Normalization) layer and an leakage relu layer corresponding to an activation function leakage relu, which are connected in series, and the CBL represents conv + BN + leakage relu. The resn module is a residual error module, n represents a natural number, and specifically, as shown in fig. 2, res1, res2, res8, res8, and res4 are sequentially arranged along the input direction; the resn module comprises a zero padding (zero padding) layer, a CBL module and a Residual error unit group which are connected in series, the Residual error unit group is represented by Res unit n, the Residual error unit group comprises n Residual error units, each Residual error unit comprises a plurality of CBL modules which are connected in a Residual error Network (ResNet) connection mode, and the feature fusion mode adopts a parallel mode, namely an add mode.
The rest of the network outside the backbone network is the Feature Pyramid Network (FPN), which is divided into three prediction branches Y_1 to Y_3. The scales of prediction branches Y_1 to Y_3 correspond one-to-one to the scales of the feature maps output by the 3 residual modules res4, res8 and res8 counted against the input direction. The prediction results of the branches are denoted Y1, Y2 and Y3 respectively, and the scales of Y1, Y2 and Y3 increase in sequence.
Each prediction branch of the FPN network includes a convolutional network module group, specifically comprising 5 convolutional network modules, i.e., CBL × 5 in fig. 3. In addition, the US (up sampling) module is an upsampling module, and concat indicates cascade-mode feature fusion (concat is short for concatenate).
For the specific structure of each main module in the YOLOv3 network, please refer to the schematic diagram below the dashed box in fig. 3.
In the embodiment of the invention, the improved YOLOv3 network comprises a backbone network and an improved FPN network; the improved YOLOv3 network is formed by increasing a feature extraction scale, optimizing a feature fusion mode of an FPN network, pruning and combining knowledge distillation to guide network recovery processing on the basis of a YOLOv3 network; the improved YOLOv3 network is trained according to the sample image and the position and the category of the target corresponding to the sample image. The network training process is described later.
To facilitate understanding of the present invention, the structure of the modified YOLOv3 network is described below.
For example, referring to fig. 4, in the embodiment of the present invention the backbone network extracts feature maps of at least 4 scales for the feature fusion of the subsequent prediction branches, so the number y of residual modules is greater than or equal to 4, so that the feature maps output by the backbone network can be correspondingly fused into each prediction branch. It can be seen that, compared with the YOLOv3 network, the improved YOLOv3 network adds at least one finer-grained feature extraction scale in the backbone network: compared with fig. 3, the feature map output by the fourth residual module counted against the input direction is additionally extracted for subsequent feature fusion. The four residual modules of the backbone network counted against the input direction thus each output a corresponding feature map, and the scales of the four feature maps increase in sequence. Specifically, the scales of the feature maps are 13 × 13 × 72, 26 × 26 × 72, 52 × 52 × 72 and 104 × 104 × 72 respectively.
Of course, in an alternative embodiment, five feature extraction scales may be set, i.e., the feature map output by the fifth residual module counted against the input direction is additionally extracted for subsequent feature fusion, and so on.
Specifically, for the step S2, obtaining x feature maps with different scales includes:
and obtaining x characteristic graphs which are output by the x dense connection modules along the input reverse direction and have sequentially increased scales.
Referring to fig. 4, the feature maps output by the first to fourth residual modules counted against the input direction are obtained, and the sizes of these four feature maps increase in sequence.
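Reusing the CBL and ResUnit sketches above, the following hypothetical sketch shows a darknet-53-style backbone exposing its last four stages as the x = 4 feature maps; the channel widths follow darknet-53 and are assumptions.

```python
class Backbone(nn.Module):
    # Sketch: darknet-53-style backbone returning four scales
    # (104x104, 52x52, 26x26, 13x13 for a 416x416 input).
    def __init__(self):
        super().__init__()
        self.stem = CBL(3, 32)
        self.stage1 = self._stage(32, 64, n=1)      # 208x208
        self.stage2 = self._stage(64, 128, n=2)     # 104x104
        self.stage3 = self._stage(128, 256, n=8)    # 52x52
        self.stage4 = self._stage(256, 512, n=8)    # 26x26
        self.stage5 = self._stage(512, 1024, n=4)   # 13x13

    @staticmethod
    def _stage(c_in, c_out, n):
        # a "resn" module: strided CBL for downsampling, then n residual units
        return nn.Sequential(CBL(c_in, c_out, k=3, s=2),
                             *[ResUnit(c_out) for _ in range(n)])

    def forward(self, x):
        x = self.stage1(self.stem(x))
        c2 = self.stage2(x)       # largest of the four extracted scales
        c3 = self.stage3(c2)
        c4 = self.stage4(c3)
        c5 = self.stage5(c4)      # smallest scale, deepest features
        return c2, c3, c4, c5     # the x = 4 feature maps
```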
The improved YOLOv3 network transmits the feature maps from shallow to deep, extracts the feature maps with at least four scales, enables the network to detect defects with different scales, especially tiny defects, by increasing the feature extraction scale with fine granularity, and simultaneously realizes accurate classification of the defects.
S3, performing feature fusion in a top-down, densely connected manner on the x feature maps of different scales by using the improved FPN (Feature Pyramid Network) to obtain a prediction result corresponding to each scale;
the feature fusion mode of the top-down dense connection mode is described below with reference to the structure of the improved FPN network shown in fig. 3.
The improved FPN network comprises x prediction branches Y_1 to Y_x of sequentially increasing scale; the scales of prediction branches Y_1 to Y_x correspond one-to-one to the scales of the x feature maps. Illustratively, the improved FPN network of fig. 4 has 4 prediction branches Y_1 to Y_4, whose scales correspond one-to-one to the scales of the 4 feature maps.
For step S3, performing feature fusion in a top-down, densely connected manner on the x feature maps of different scales by using the improved FPN network comprises:
for prediction branch Y_i, acquiring the feature map of the corresponding scale from the x feature maps and performing convolution processing on it;
cascading and fusing the convolved feature map with the respectively upsampled feature maps of prediction branches Y_{i-1} to Y_1;
wherein the upsampling multiple applied to prediction branch Y_{i-j} is 2^j; i = 2, 3, …, x; and j is a natural number smaller than i.
Referring to fig. 4 and taking i = 3, i.e., prediction branch Y_3, as an example, the feature maps for the cascade fusion come from three sources. The first is the feature map of the corresponding scale among the 4 feature maps, after convolution processing: the feature map output by the third residual module counted against the input direction is processed by a CBL module (which can also be understood as 1× upsampling) and has size 52 × 52 × 72. The second comes from prediction branch Y_2 (i.e., Y_{i-1} = Y_2): the feature map output by the second residual module counted against the input direction (size 26 × 26 × 72) is processed by the CBL module of prediction branch Y_2 and then upsampled by 2^1 = 2 times, giving size 52 × 52 × 72. The third comes from prediction branch Y_1 (i.e., Y_{i-2} = Y_1): the feature map output by the first residual module counted against the input direction (size 13 × 13 × 72) is processed by the CBL module of prediction branch Y_1 and then upsampled by 2^2 = 4 times, giving size 52 × 52 × 72. As those skilled in the art will understand, after the above upsampling of the 3 feature maps of different scales output by the backbone network, the sizes of the 3 feature maps to be cascaded and fused are consistent, all being 52 × 52 × 72. Prediction branch Y_3 can then continue with convolution and other processing after the cascade fusion to obtain the prediction result Y3, whose size is 52 × 52 × 72.
The processing of prediction branches Y_2 and Y_4 is similar to that of prediction branch Y_3 and is not repeated here. Prediction branch Y_1 carries out its subsequent prediction directly on the feature map output by the first residual module counted against the input direction, without receiving feature maps from other prediction branches for fusion.
In the original FPN feature fusion method of the YOLOv3 network, deep and shallow network features are added together and then upsampled, and after the addition the feature map is passed through a convolutional layer, which may destroy some original feature information. In this embodiment, feature fusion combines lateral connections with top-down dense connections: instead of the original strictly top-down path, each smaller-scale prediction branch transmits its features directly to every larger-scale prediction branch, and the fusion becomes a dense method in which deep features are directly upsampled by different multiples so that all transmitted feature maps have the same size. These feature maps are fused with the shallow feature map by concatenation, features are re-extracted from the fusion result to remove noise and retain the main information, and prediction is then performed. More original information can thus be utilized, and high-dimensional semantic information participates in the shallow network. This retains the advantage of densely connected networks of preserving more of the original semantic features of the feature maps; for a top-down method, the preserved original semantics are higher-dimensional semantic information, which benefits object classification. By directly receiving features from shallower layers, more concrete features are obtained, so feature loss is effectively reduced, the amount of parameters to be computed is reduced, and the prediction process is accelerated.
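A minimal sketch of this densely connected top-down fusion follows, again assuming PyTorch; the per-branch lateral convolutions and nearest-neighbor upsampling are assumptions, and the function name is hypothetical.

```python
import torch
import torch.nn.functional as F

def dense_topdown_fuse(feats, lateral_convs):
    # feats: backbone maps ordered deepest-first, e.g. [13x13, 26x26, 52x52, 104x104].
    # lateral_convs: one CBL module per prediction branch.
    # Branch Y_i concatenates its own convolved feature map with the feature
    # map of every deeper branch Y_{i-j}, upsampled by 2**j so that all
    # transmitted maps share the branch's spatial size.
    laterals = [conv(f) for conv, f in zip(lateral_convs, feats)]
    fused = []
    for i, lat in enumerate(laterals):
        pieces = [lat]
        for j in range(1, i + 1):  # deeper branches Y_{i-j}
            pieces.append(F.interpolate(laterals[i - j],
                                        scale_factor=2 ** j, mode='nearest'))
        fused.append(torch.cat(pieces, dim=1))  # "concat"-mode cascade fusion
    return fused  # one fused map per prediction branch Y_1 ... Y_x
```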
In the above, the feature fusion method is mainly described, each prediction branch is mainly predicted by using some convolution operations after feature fusion, and for how to obtain each prediction result, reference is made to related prior art, and no description is made here.
In the improved YOLOv3 network of the embodiment of the invention, the 4 prediction branches output feature maps of four scales in total, namely 13 × 13 × 72, 26 × 26 × 72, 52 × 52 × 72 and 104 × 104 × 72. The smallest, 13 × 13 × 72, feature map has the largest receptive field and is suitable for detecting larger targets; the medium, 26 × 26 × 72, feature map has a medium receptive field and is suitable for detecting medium-sized targets; the larger, 52 × 52 × 72, feature map has a smaller receptive field and is suitable for detecting smaller targets; and the largest, 104 × 104 × 72, feature map has the smallest receptive field and is suitable for detecting the smallest, tiny targets. The image is thus divided more finely, and the prediction is more targeted at objects of smaller size.
The network training process is described below. The network training is completed in the server, and the network training can comprise three processes of network pre-training, network pruning and network fine-tuning. The method specifically comprises the following steps:
Firstly, building the network structure: on the basis of a YOLOv3 network, the feature extraction scale is increased and the feature fusion mode of the FPN network is optimized to obtain the network structure shown in fig. 4 as the built network, where x is 4.
And (II) obtaining a plurality of sample images and the positions and the types of the targets corresponding to the sample images. In this process, the position and the category of the target corresponding to each sample image are known, and the manner of determining the position and the category of the target corresponding to each sample image may be: by manual recognition, or by other image recognition tools, and the like. Afterwards, the sample image needs to be marked, and an artificial marking mode can be adopted, and of course, other artificial intelligence methods can also be utilized to carry out non-artificial marking, which is reasonable. The position of each sample image corresponding to the target is marked in the form of a target frame containing the target, the target frame is real and accurate, and each target frame is marked with coordinate information so as to embody the position of the target in the image.
(III) determining the size of the anchor box in the sample image; may include the steps of:
a) determining the quantity to be clustered aiming at the size of the anchor box in the sample image;
in the field of target detection, an anchor box (anchor box) is a plurality of boxes with different sizes obtained by statistics or clustering from real boxes (ground route) in a training set; the anchor box actually restrains the predicted object range and adds the prior experience of the size, thereby realizing the aim of multi-scale learning. In the embodiment of the present invention, since a finer-grained feature extraction scale is desired to be added, the sizes of the target frames (i.e., real frames) marked in the sample image need to be clustered by using a clustering method, so as to obtain a suitable anchor box size suitable for the scene of the embodiment of the present invention.
Wherein, determining the quantity to be clustered aiming at the size of the anchor box in the sample image comprises the following steps:
determining the number of types of the anchor box size corresponding to each scale; and taking the product of the number of the types of the anchor box sizes corresponding to each scale and x as the quantity to be clustered of the anchor box sizes in the sample image.
Specifically, in the embodiment of the present invention, the number of types of anchor box sizes corresponding to each scale is chosen to be 3; there are 4 scales, so the number of anchor box sizes to be clustered in the sample images is 3 × 4 = 12.
b) Acquiring a plurality of sample images with marked target frame sizes;
this step is actually to obtain the size of each target frame in the sample image.
c) Based on a plurality of sample images marked with the size of the target frame, acquiring a clustering result of the size of the anchor box in the sample images by using a K-Means clustering method;
specifically, the size of each target frame can be clustered by using a K-Means clustering method to obtain a clustering result of the size of the anchor box; no further details regarding the clustering process are provided herein.
The distance between different anchor boxes is defined as the Euclidean distance over width and height:

d_{1,2} = \sqrt{(w_1 - w_2)^2 + (h_1 - h_2)^2}

where d_{1,2} is the Euclidean distance between the two anchor boxes, w_1 and w_2 are their widths, and h_1 and h_2 are their heights.
With the number of clusters set to 12, the anchor box size for each prediction branch can be obtained; a clustering sketch is given after this list.
d) And writing the clustering result into a configuration file of the improved YOLOv3 network.
Those skilled in the art can understand that the clustering result is written into the configuration file of each predicted branch of the improved YOLOv3 network according to the anchor box size corresponding to different predicted branches, and then the network pre-training can be performed.
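Under the distance definition above, a minimal NumPy sketch of the K-Means clustering of target-frame sizes follows; the initialization scheme and iteration count are assumptions, and the function name is hypothetical.

```python
import numpy as np

def kmeans_anchors(wh, k=12, iters=100, seed=0):
    # wh: (N, 2) float array of labeled target-frame widths and heights.
    # Returns k anchor sizes (3 per scale x 4 scales = 12), sorted by area
    # so they can be assigned to prediction branches from small to large.
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # d_{1,2} = sqrt((w1 - w2)^2 + (h1 - h2)^2)
        d = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]
```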
And (IV) pre-training the constructed network by utilizing each sample image and the position and the category of the target corresponding to each sample image, wherein the method comprises the following steps:
1) and taking the position and the type of the target corresponding to each sample image as a true value corresponding to the sample image, and training each sample image and the corresponding true value through a built network to obtain a training result of each sample image.
2) And comparing the training result of each sample image with the true value corresponding to the sample image to obtain the output result corresponding to the sample image.
3) And calculating the loss value of the network according to the output result corresponding to each sample image.
4) Adjusting the parameters of the network according to the loss value and repeating steps 1)-3) until the loss value of the network reaches a certain convergence condition, i.e., the loss value reaches its minimum, meaning that the training result of each sample image is consistent with the truth value corresponding to that sample image; the pre-training of the network is thus completed, and a complex network with high accuracy is obtained. A sketch of this training loop follows.
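The following compact sketch illustrates steps 1)-4); the optimizer, learning rate and epoch count are assumptions, and loss_fn stands for a YOLOv3-style detection loss whose exact form the patent does not spell out.

```python
import torch

def pretrain(model, loader, loss_fn, epochs=100, lr=1e-3):
    # Steps 1)-4): forward each sample image, compare the training result
    # with its truth value, compute the loss, and adjust the parameters
    # until the loss converges.
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:      # sample images + truth values
            preds = model(images)           # step 1: training results
            loss = loss_fn(preds, targets)  # steps 2-3: compare and score
            opt.zero_grad()
            loss.backward()
            opt.step()                      # step 4: adjust parameters
```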
(V) network pruning and network fine adjustment; the process is to carry out pruning and guide network recovery processing by combining knowledge distillation.
Pruning and knowledge-based distillation guided network recovery processing are performed, and the method comprises the following steps:
Firstly, in the network obtained by adding a feature extraction scale on the basis of the YOLOv3 network and optimizing the feature fusion mode of the FPN network, layer pruning is performed on the residual modules of the backbone network to obtain a YOLOv3-1 network.
usually, channel pruning is directly performed in the simplified processing process of the YOLOv3 network, but the inventor finds in experiments that the effect of fast speed increase is still difficult to achieve only through channel pruning. Therefore, the treatment process of layer pruning is added before channel pruning.
Specifically, this step may perform layer pruning on the residual modules of the backbone network of the improved-structure YOLOv3 network, so as to obtain the YOLOv3-1 network.
Sparsifying training the YOLOv3-1 network to obtain a YOLOv3-2 network with BN layer scaling coefficients sparsely distributed;
illustratively, a YOLOv3-1 network is subjected to sparse training to obtain a YOLOv3-2 network with a BN layer scaling coefficient in sparse distribution; the method can comprise the following steps:
carrying out sparse training on a YOLOv3-1 network, adding sparse regularization for a scaling factor gamma in the training process, wherein the loss function of the sparse training is as follows:
L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

where \sum_{(x,y)} l(f(x, W), y) represents the original loss function of the network, (x, y) represent the input data and target data of the training process, W represents the trainable weights, and \lambda \sum_{\gamma \in \Gamma} g(\gamma) is the regularization term added for the scaling factors, in which g(\gamma) is the penalty function for sparse training of the scaling coefficients and \lambda is its weight. Since the scaling factor \gamma is to be made sparse, the L1 norm is selected as the penalty function. Meanwhile, because the relative proportion of the latter term is unknown, the parameter \lambda is introduced for adjustment.
The value of \lambda is related to the convergence rate of the sparse training. The application scenario of the embodiment of the invention is rail detection, where the number of target categories can be set to 13, far fewer than the 80 categories of the original YOLOv3 dataset; a larger \lambda can therefore be used without making the sparse training converge too slowly, and convergence can be further accelerated by increasing the model learning rate. However, since excessive parameter values cost network accuracy, after repeatedly adjusting the learning rate and the \lambda parameter, the combination of a 0.25× learning rate and a 0.1× \lambda was finally determined to be the optimal parameter combination for sparse training. This preferred combination of learning rate and weight yields a better weight distribution after sparse training and higher network model accuracy.
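As a sketch of how the regularization term \lambda \sum g(\gamma) can be added in PyTorch (the helper name is hypothetical, and applying the penalty to every BN layer is an assumption):

```python
import torch.nn as nn

def bn_l1_penalty(model, lam):
    # lambda * sum of |gamma| over all BN scaling factors (g = L1 norm)
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules()
                     if isinstance(m, nn.BatchNorm2d))

# inside each step of the sparse training:
#   loss = loss_fn(model(x), y) + bn_l1_penalty(model, lam)
```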
Thirdly, channel pruning is carried out on the YOLOv3-2 network to obtain a YOLOv3-3 network;
after the sparsification training, a network model with the BN layer scaling coefficients distributed sparsely is obtained, so that the importance of which channels is smaller can be determined conveniently. These less important channels can thus be pruned by removing incoming and outgoing connections and the corresponding weights.
Performing a channel pruning operation on the network, pruning a channel corresponding to substantially removing all incoming and outgoing connections of the channel, may directly result in a lightweight network without the use of any special sparse computation packages. In the channel pruning process, the scaling factor serves as a proxy for channel selection; because they are jointly optimized with network weights, the network can automatically identify insignificant channels that can be safely removed without greatly impacting generalization performance.
Specifically, the step may include the steps of:
setting a channel pruning proportion in all channels of all layers, then arranging all BN layer scaling factors in the YOLOv3-2 network in an ascending order, and pruning channels corresponding to the BN layer scaling factors arranged in the front according to the channel pruning proportion.
In a preferred embodiment, the channel pruning proportion may be 60%.
Through channel pruning, redundant channels can be deleted, the calculated amount is reduced, and the detection speed is accelerated.
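The channel selection can be sketched as follows; the surgery that actually removes the selected channels from the convolutional layers and their incoming and outgoing connections is omitted, and the function name is hypothetical.

```python
import torch
import torch.nn as nn

def channel_keep_masks(model, prune_ratio=0.60):
    # Gather all BN scaling factors, order them ascending (via a quantile),
    # and mark for pruning the channels in the lowest `prune_ratio` fraction.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)    # 60% fall below this
    return {name: m.weight.detach().abs() > threshold  # True = keep channel
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```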
However, after channel pruning, the reduction in parameters may cost some precision. Analysis of the influence of different pruning proportions on network precision shows that if the pruning proportion is too large, the network volume is compressed more but the network precision also drops sharply, causing a certain loss of accuracy. A balance must therefore be struck between the network compression proportion and the precision of the compressed network, so a knowledge distillation strategy is introduced to fine-tune the network and improve its accuracy.
Fourthly, knowledge distillation is carried out on the YOLOv3-3 network to obtain an improved YOLOv3 network.
Through pruning, a more compact Yolov3-3 network model is obtained, and then fine tuning is needed to recover the precision. The strategy of knowledge distillation is introduced here.
Specifically, knowledge distillation is introduced into a YOLOv3-3 network, the complex network is used as a teacher network, a YOLOv3-3 network is used as a student network, and the teacher network guides the student network to carry out precision recovery and adjustment, so that an improved YOLOv3 network is obtained.
As a preferred embodiment, the output of the complex network before its Softmax layer is divided by a temperature coefficient to soften the predicted values finally output by the teacher network; the student network then uses the softened predicted values as labels to assist in training the YOLOv3-3 network, so that the precision of the YOLOv3-3 network finally becomes comparable to that of the teacher network. The temperature coefficient is a preset value and does not change during network training.
The reason for introducing the temperature parameter T is that a trained, highly accurate network produces classifications of the input data that are substantially consistent with the real labels. For example, with three classes, if the true training label is [1,0,0], the prediction result may be [0.95,0.02,0.03], very close to the true label value. For the student network, therefore, there would be little difference between training assisted by the teacher network's classification results and training directly on the data. The temperature parameter T can be used to control the degree of softening of the predicted labels, i.e., to increase the deviation of the teacher network's classification results.
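A minimal sketch of the temperature-softened distillation loss on the classification outputs follows; combining it with the detection loss is omitted, and the value T = 3.0 is an assumption.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=3.0):
    # Divide the teacher's pre-Softmax outputs by the preset temperature T
    # to soften its predictions, then train the student against them.
    soft_t = F.softmax(teacher_logits / T, dim=-1)       # softened labels
    log_s = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between softened distributions, scaled by T^2 as usual
    return F.kl_div(log_s, soft_t, reduction='batchmean') * T * T
```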
The fine adjustment process added with the knowledge distillation strategy is compared with the general fine adjustment process, and the network accuracy recovered through knowledge distillation adjustment is higher.
By performing layer pruning, sparse training, channel pruning and knowledge distillation on the pre-trained network, and selecting optimized processing parameters in each step, a simplified network is obtained: the network volume is greatly reduced and most redundant computation is eliminated. The resulting network is the improved YOLOv3 network used subsequently to detect target rail images; based on it, the detection speed is greatly improved while the detection precision is maintained. The method can meet high real-time detection requirements and, owing to the small network size and low resource demand, can even be deployed entirely in the image acquisition device at the image acquisition end.
S4, obtaining attribute information of the target rail image based on all the prediction results, wherein the attribute information comprises the position and the type of the target in the target rail image;
the improved YOLOv3 network further includes a classification network and a non-maxima suppression module; the classification network and the non-maximum suppression module are connected in series after the FPN network.
Obtaining attribute information of the target rail image based on all prediction results, wherein the attribute information comprises:
classifying all prediction results through a classification network, and then performing prediction frame duplicate removal through a non-maximum suppression module to obtain attribute information of the target rail image;
wherein the classification network comprises a SoftMax classifier, the purpose being to realize mutually exclusive classification among the multiple defect categories. Optionally, the classification network may instead follow the original YOLOv3 network in using logistic regression to realize multiple independent binary classifications.
The non-maximum suppression module is configured to perform NMS (non_max_suppression) processing, which de-duplicates the multiple prediction boxes selected for the same target by excluding the prediction boxes with relatively low confidence.
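For illustration, the prediction-box de-duplication can be sketched with torchvision's NMS operator; the IoU threshold value is an assumption, and the wrapper name is hypothetical.

```python
from torchvision.ops import nms

def dedupe_predictions(boxes, scores, iou_thresh=0.45):
    # boxes: (N, 4) tensor in (x1, y1, x2, y2); scores: (N,) confidences.
    # Keeps the higher-confidence box among overlapping predictions of the
    # same target and drops the rest.
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```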
For the content of the classification network and the non-maximum suppression module, reference may be made to the related description of the prior art, and details thereof are not repeated here.
It should be noted that fig. 3 does not show the classification module and the non-maximum suppression module for the sake of simplicity.
For each target, the detection result takes the form of a vector including the position of the prediction box, the confidence that the prediction box contains a defect, and the category of the target in the prediction box. The position of the prediction box represents the position of the target in the target rail image; specifically, it is represented by four values bx, by, bw and bh, where bx and by give the position of the center point of the prediction box, and bw and bh give its width and height.
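If the standard YOLOv3 box parameterization is assumed (the patent does not restate it), the four values can be decoded from the raw network outputs as follows; the function name and arguments are hypothetical.

```python
import torch

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    # cx, cy: grid-cell indices; pw, ph: anchor-box prior size in pixels;
    # stride: input size / grid size (e.g. 416 / 13 = 32).
    bx = (torch.sigmoid(tx) + cx) * stride   # center x in image pixels
    by = (torch.sigmoid(ty) + cy) * stride   # center y in image pixels
    bw = pw * torch.exp(tw)                  # box width from anchor prior
    bh = ph * torch.exp(th)                  # box height from anchor prior
    return bx, by, bw, bh
```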
The target in the embodiment of the invention is characterized by the defect on the railway rail, and the category of the target is the category of the defect to which the target belongs, such as scar, crack, ripple scratch, wrinkle, peeling, abrasion, indentation and the like.
The existing YOLOv3 network contains many convolutional layers because there are 80 types of targets. In the embodiment of the invention, the targets are mainly defects on the rails, and the number of the types of the targets is small, so that a large number of convolution layers are not necessary, network resources are wasted, and the processing speed is reduced.
In addition, optionally, the improved YOLOv3 network may also be obtained by adjusting the number k of convolutional network modules in the module group of each prediction branch of the FPN network, reducing k from 5 in the original YOLOv3 network to 4 or 3, i.e., changing the original CBL × 5 to CBL × 4 or CBL × 3. The number of convolutional layers in the FPN network is thereby reduced, so that, without affecting network precision, the number of network layers is simplified overall for the target rail images of the embodiment of the invention and the network processing speed is improved.
In the scheme provided by the embodiment of the invention, on one hand, a plurality of feature extraction scales are adopted, the feature extraction scale with fine granularity is added for the small target, and the detection precision of the small target in the target rail image can be improved, so that the rail defect can be accurately detected and classified, and the problem of missed detection of the small defect can be solved. On the other hand, the feature fusion mode of the FPN is changed, feature fusion is carried out on feature graphs extracted from a main network in a top-down dense connection mode, deep features are directly subjected to upsampling of different multiples, all transmitted feature graphs have the same size, the feature graphs and shallow feature graphs are fused in a series connection mode, more original information can be utilized, high-dimensional semantic information participates in a shallow network, and the detection precision is improved; meanwhile, more specific characteristics can be obtained by directly receiving the characteristics of a shallower network, the loss of the characteristics can be effectively reduced, the parameter quantity needing to be calculated can be reduced, the detection speed is improved, and real-time detection is realized. In another aspect, layer pruning, sparsification training, channel pruning and knowledge distillation processing are carried out on the pre-trained network, optimized processing parameters are selected in each processing process, the network volume can be reduced, most redundant calculation is eliminated, and the detection speed can be greatly improved under the condition of maintaining the detection precision.
The network improvement and the rail image detection performance of the embodiment of the invention are described in the following by combining the experimental process of the inventor, so as to facilitate the deep understanding of the performance.
For sparse training, the learning rate and λ can be adjusted as a trade-off to balance convergence speed and precision. The present solution tried different learning rates and λ values, as shown in Table 1. Parameter combination 5 was finally selected by comparing the γ weight-distribution graphs; see fig. 5-1 and fig. 5-2 for the γ weight distribution of parameter combination 5. FIG. 5-1 is the weight-shift graph for parameter combination 5; fig. 5-2 is the weight-overlap graph for parameter combination 5.
TABLE 1 Different learning rate and λ combinations

Combination   Learning rate   λ
1
2             0.1×
3             0.1×
4             0.025×
5             0.25×           0.1×
In fact, the initial experimental design here did not include pruning of network layers; the original plan was to perform channel pruning directly. However, analysis of the channel pruning results showed that more than half of the residual units had weights close to 0, so that entire layers of channels would be pruned under the channel pruning rule. This indicates that there are redundant units in the 4 residual modules designed above; therefore, before channel pruning, layer pruning can be performed first to remove most of the redundancy, followed by the relatively finer-grained channel pruning. Because more than half of the residual units are redundant, layer pruning is performed on the residual modules to obtain the YOLOv3-1 network.
Then, carrying out sparse training on the YOLOv3-1 network to obtain a YOLOv3-2 network with BN layer scaling coefficients in sparse distribution;
channel pruning is carried out on the YOLOv3-2 network to obtain a YOLOv3-3 network;
the channel pruning ratio may be 60%, because a small number of target types in the target rail image to be detected are greatly affected in the network compression process, which directly affects the mAP, and therefore, the data set and the network compression ratio are considered. For processing the data set, the embodiment of the present invention selects the type of the target with a smaller number of combinations to balance the number of different types, or directly adopts the data set with more balanced type distribution, which is consistent with the application scenario of the embodiment of the present invention. In addition, the compression ratio is controlled, and the prediction accuracy of the types with small quantity is ensured not to be reduced too much.
Besides analyzing the influence of compression on accuracy, the relationship between detection time and model compression ratio was also considered. The time to detect a rail image was simulated on different platforms (a Tesla V100 server and a Jetson TX2 edge device) for network models processed with different pruning ratios. According to the simulation results, the network compression ratio has only a weak direct influence on the inference time itself but a large influence on the time required by NMS (non-maximum suppression): detection speeds up with compression until the compression ratio reaches 60%, but slows down once the ratio exceeds 60%. The final channel pruning ratio selected is therefore 60%; a thresholding sketch follows.
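A sketch of how such a 60% global channel-pruning threshold can be computed, assuming PyTorch (only the keep masks are derived; rebuilding the pruned convolutions is omitted):

```python
# Collect every BN scaling coefficient in the network, take the 60th
# percentile of their absolute values as a global threshold, and keep only
# the channels whose gamma exceeds it.
import torch
import torch.nn as nn

def channel_prune_masks(model, prune_ratio=0.60):
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return {name: m.weight.data.abs() > threshold    # per-layer keep mask
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```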
Finally, knowledge distillation is carried out on the YOLOv3-3 network to obtain the improved YOLOv3 network; a sketch of the distillation objective follows.
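A minimal sketch of a distillation objective of the kind used to recover accuracy after pruning, shown for the classification term only (temperature T and mixing weight alpha are conventional assumptions, not values from the patent):

```python
# The pruned student matches the softened class distribution of the unpruned
# teacher (KL term, scaled by T^2) while still fitting the ground truth.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=3.0, alpha=0.7):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard
```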
Meanwhile, the detection performance of the improved YOLOv3 network of the invention and that of the original YOLOv3 network were compared by simulation; the results are shown in Table 3.
TABLE 3 Detection performance of the improved YOLOv3 network model versus the original YOLOv3

Network            mAP     Model size   Detection time (Tesla V100)
YOLOv3             0.73    236 MB       42.8 ms
Improved YOLOv3    0.852   222 MB       36.3 ms
As can be seen from Table 3, compared with the original YOLOv3 network, the improved YOLOv3 network formed by adding a fine-grained feature extraction scale and replacing the original horizontally connected FPN with a densely connected FPN improves detection accuracy by 16.7% (mAP 0.73 to 0.852), somewhat reduces the model volume, and improves detection speed by about 15% (42.8 ms to 36.3 ms).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the invention are brought about, in whole or in part, when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
For the embodiments of the electronic device and the computer-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An unmanned aerial vehicle for rail defect detection, comprising:
the power system is used for providing power required by the flight of the unmanned aerial vehicle;
the flight control system is connected with the power system and used for controlling the attitude stability of the unmanned aerial vehicle, managing the unmanned aerial vehicle to execute tasks and processing emergency situations;
the communication navigation system is connected with the flight control system and is used for information transmission in the working process of the unmanned aerial vehicle;
the mission load system is connected with the flight control system, is pre-loaded with a program of a rail defect detection method, and determines the position and the defect type of a rail defect according to the method;
and the launching and recovery system is used for ensuring that the unmanned aerial vehicle takes off smoothly, flies at a safe height and speed, and returns safely to the ground after the mission is finished.
2. The drone of claim 1, wherein the rail defect detection method comprises:
acquiring a target rail image to be detected;
inputting the target rail image into an improved YOLOv3 network obtained by pre-training, and performing feature extraction on the target rail image by using a backbone network to obtain x feature maps with different scales; x is a natural number of 4 or more;
carrying out feature fusion in a top-down and dense connection mode on the x feature maps of different scales by using an improved FPN (feature pyramid network) to obtain a prediction result corresponding to each scale;
obtaining attribute information of the target rail image based on all prediction results, wherein the attribute information comprises the position and the category of a target in the target rail image;
wherein the improved YOLOv3 network comprises the backbone network and the improved FPN network connected in sequence; the improved YOLOv3 network is formed on the basis of a YOLOv3 network by adding a feature extraction scale, optimizing the feature fusion mode of the FPN network, and pruning combined with knowledge-distillation-guided network recovery processing; the improved YOLOv3 network is trained according to sample images and the positions and the types of the targets corresponding to the sample images.
3. The drone of claim 2, wherein the backbone network of the modified YOLOv3 network comprises:
y residual modules connected in series; y is a natural number of 4 or more; y is greater than or equal to x;
the method for extracting features by using the backbone network to obtain x feature maps with different scales comprises the following steps:
and performing feature extraction on the target rail image by using the y serially connected residual modules to obtain x feature maps of sequentially increasing scale, output by the last x residual modules counted backwards along the input direction.
4. The unmanned aerial vehicle of claim 2, wherein the top-down, densely connected feature fusion of the x feature maps of different scales using the modified FPN network comprises:
for a prediction branch Yi, acquiring the feature map of the corresponding scale from the x feature maps and performing convolution processing on it;
performing cascade (series) fusion of the convolved feature map with the feature maps of the prediction branches Yi-1~Y1 after their respective upsampling;
wherein the improved FPN network comprises x prediction branches Y1~Yx of sequentially increasing scale; the scales of the prediction branches Y1~Yx correspond one to one with the scales of the x feature maps; the upsampling multiple applied to prediction branch Yi-j is 2^j; i = 2, 3, …, x; j is a natural number smaller than i.
5. The drone of claim 4, wherein the pruning and knowledge-based distillation guided network recovery process includes:
in a network obtained by adding a feature extraction scale on the basis of the YOLOv3 network and optimizing the feature fusion mode of the FPN network, carrying out layer pruning on the residual modules of the backbone network to obtain a YOLOv3-1 network;
sparse training is carried out on the YOLOv3-1 network to obtain a YOLOv3-2 network with BN layer scaling coefficients in sparse distribution;
performing channel pruning on the YOLOv3-2 network to obtain a YOLOv3-3 network;
knowledge distillation is carried out on the YOLOv3-3 network to obtain the improved YOLOv3 network.
6. The drone of claim 2, further comprising, prior to training the modified YOLOv3 network:
determining the number of clusters for the anchor box sizes in the sample images;
acquiring a plurality of sample images with marked target frame sizes;
based on a plurality of sample images marked with the size of the target frame, acquiring a clustering result of the size of the anchor box in the sample images by using a K-Means clustering method;
writing the clustering result into a configuration file of the improved YOLOv3 network.
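By way of illustration, anchor clustering of this kind is conventionally done with K-Means over the annotated box sizes using 1 - IoU as the distance; the sketch below assumes NumPy and is not the patent's exact procedure:

```python
# YOLO-style anchor clustering: K-Means on (width, height) pairs where the
# distance between a box and a cluster centre is 1 - IoU of the aligned boxes.
import numpy as np

def kmeans_anchors(wh, k, iters=100):
    """wh: (N, 2) array of annotated target-frame widths and heights."""
    def iou(box, centers):
        inter = (np.minimum(box[0], centers[:, 0])
                 * np.minimum(box[1], centers[:, 1]))
        union = box[0] * box[1] + centers[:, 0] * centers[:, 1] - inter
        return inter / union

    centers = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.array([np.argmax(iou(b, centers)) for b in wh])
        new = np.array([wh[assign == c].mean(axis=0) if np.any(assign == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]   # sorted by area
```

The resulting k anchor sizes are what would be written into the network configuration file.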
7. The drone of claim 2, wherein the modified YOLOv3 network further comprises a classification network and a non-maxima suppression module.
8. The drone of claim 7, wherein the deriving attribute information for the target rail image based on all of the predictions comprises:
and classifying all prediction results through the classification network, and then performing prediction frame deduplication processing through the non-maximum suppression module to obtain attribute information of the target rail image.
9. The drone of claim 8, wherein the classification network includes a SoftMax classifier.
10. A drone according to claim 9, characterised in that the loss function of the sparse training is:

L = Σ_{(x,y)} l(f(x, W), y) + λ Σ_{γ∈Γ} g(γ)

wherein Σ_{(x,y)} l(f(x, W), y) represents the original loss function of the network, (x, y) represents the input data and target data of the training process, W represents the trainable weights, g(γ) is the penalty function for sparse training of the scaling coefficients, and λ is its weight. Since the scaling coefficient γ is to be made sparse, the penalty function is chosen as the L1 norm.
CN202011145523.7A 2020-10-23 2020-10-23 A unmanned aerial vehicle for rail defect detection Withdrawn CN112395952A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011145523.7A CN112395952A (en) 2020-10-23 2020-10-23 A unmanned aerial vehicle for rail defect detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011145523.7A CN112395952A (en) 2020-10-23 2020-10-23 A unmanned aerial vehicle for rail defect detection

Publications (1)

Publication Number Publication Date
CN112395952A true CN112395952A (en) 2021-02-23

Family

ID=74596303

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011145523.7A Withdrawn CN112395952A (en) 2020-10-23 2020-10-23 A unmanned aerial vehicle for rail defect detection

Country Status (1)

Country Link
CN (1) CN112395952A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269275A (en) * 2021-06-21 2021-08-17 昆明理工大学 Real-time detection method for silkworm cocoon
CN113284122A (en) * 2021-05-31 2021-08-20 五邑大学 Method and device for detecting roll paper packaging defects based on deep learning and storage medium


Similar Documents

Publication Publication Date Title
CN114241282B (en) Knowledge distillation-based edge equipment scene recognition method and device
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN108647655B (en) Low-altitude aerial image power line foreign matter detection method based on light convolutional neural network
CN110909667B (en) Lightweight design method for multi-angle SAR target recognition network
CN112380921A (en) Road detection method based on Internet of vehicles
CN110070183A (en) A kind of the neural network model training method and device of weak labeled data
CN112288700A (en) Rail defect detection method
CN111507370A (en) Method and device for obtaining sample image of inspection label in automatic labeling image
CN112381763A (en) Surface defect detection method
KR102349854B1 (en) System and method for tracking target
CN112464718B (en) Target detection method based on YOLO-Terse network and storage medium
CN112364721A (en) Road surface foreign matter detection method
CN112395952A (en) A unmanned aerial vehicle for rail defect detection
CN113569672A (en) Lightweight target detection and fault identification method, device and system
CN110599459A (en) Underground pipe network risk assessment cloud system based on deep learning
CN115048870A (en) Target track identification method based on residual error network and attention mechanism
CN110348503A (en) A kind of apple quality detection method based on convolutional neural networks
CN113420651A (en) Lightweight method and system of deep convolutional neural network and target detection method
CN112395953A (en) Road surface foreign matter detection system
CN115393690A (en) Light neural network air-to-ground observation multi-target identification method
CN112380917A (en) A unmanned aerial vehicle for crops plant diseases and insect pests detect
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN117114053A (en) Convolutional neural network model compression method and device based on structure search and knowledge distillation
CN116363469A (en) Method, device and system for detecting infrared target with few samples
CN115147432A (en) First arrival picking method based on depth residual semantic segmentation network

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20210223