CN115830371A

CN115830371A - Deep learning-based rail transit subway steering frame rod member classification detection method

Info

Publication number: CN115830371A
Application number: CN202211487442.4A
Authority: CN
Inventors: 曾柏文; 陈晓龙
Original assignee: Guangzhou Songxing Electric Co ltd
Current assignee: Guangzhou Songxing Electric Co ltd
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-03-21

Abstract

The invention provides a deep learning-based method for classifying and detecting the bogie rod members of rail transit subway trains, which can be applied to a robot inspection workstation for subway train sets in a depot. The position of each rod is located in real time using a deep learning target detection algorithm and a yolov5 network detection model. Deep neural network technology is used to recognize and learn from a large amount of rod image information, so that rods are accurately located and classified whether they are of the same type but different sizes, of different types but the same size, or of the same or different types, and the recognized rods are then further classified, detected and evaluated. The invention uses a k-means algorithm improved by a genetic algorithm, which solves the problem that the traditional k-means algorithm depends on the initial "cluster centers" of the data set; the loss function of yolov5 is also improved by adding an L2-norm term, and the L2-regularized weight update formula effectively reduces the risk of overfitting.

Description

Deep learning-based rail transit subway steering frame rod member classification detection method

Technical Field

The invention relates to the field of image processing and deep learning, in particular to a rail transit subway steering frame rod member classification detection method based on deep learning.

Background

With the rapid development of urban rail transit in China, by June 2022 a total of 51 cities across the country had opened rail transit lines, with 277 lines and an operating mileage of 9,067 kilometers. As the number of subway train sets grows, ensuring their running safety, efficient maintenance and stable operation has become an important issue, and addressing it comprehensively is essential to keeping the failure rate of train sets to a minimum.

Subway train sets run during the day, so their overhaul is concentrated in the middle of the night. The train sets have a complex component structure and a large number of parts, which places great demands on the working condition and the physical and mental health of overhaul personnel; these factors often affect overhaul efficiency and the quality of the completed work.

With the recent development and iterative updating of deep learning technology, breakthroughs have been made in many fields, such as autonomous driving, speech recognition and automatic machine translation. Manual inspection of subway train sets is slow, time-consuming and inefficient. Visual inspection of train sets with a traditional machine learning algorithm such as an SVM can ease the low efficiency of night-time manual overhaul, but its efficiency and accuracy still leave much room for improvement. As far as the inventors are currently aware, detection efficiency and accuracy for the bogie rods of subway train sets remain relatively low, and because these rods are of many types and large in number, they place higher demands on the robustness, recognition accuracy and efficiency of a deep neural network model.

Therefore, a more accurate and efficient rod detection method is needed to reduce the labor load and operating difficulty of night-time overhaul, improve the overhaul accuracy for subway train sets, and speed up the transition from manual overhaul to machine overhaul.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a deep learning-based rail transit subway bogie rod member classification detection method that can be applied to a robot inspection workstation for subway train sets in a depot, thereby reducing the labor load and operating difficulty of night-time overhaul, improving the overhaul accuracy for subway train sets, and speeding up the transition from "manual overhaul" to "machine overhaul".

In order to achieve the purpose, the invention provides the following technical scheme: the rail transit subway steering frame rod component classification detection method based on deep learning is characterized by comprising the following steps of:

Step I, building a picture database of the bogie rods of the subway train set, wherein the database comprises a picture database of rods of the same type and picture databases of rods of different types, and randomly dividing it into a training set and a test set according to a certain proportion;

Step II, inputting the pictures containing rods in the database into a yolov5 target detection network model, and locating the region of each rod; this specifically comprises the following steps:

Step II-1, manually labeling the rod pictures of the subway train set: an xml label file conforming to the VOC2007 data format is generated by manual labeling, giving the annotation information of each image, such as the type, position and frame size of each rod, the type name of each rod, the image number, and the coordinates Row_min, Row_max, Col_min and Col_max of the labeled bounding box; the xml file is then converted into a txt file that yolov5 can read;

Step II-2, based on the training set obtained in Step I, computing the prior frames (anchors) of the various rods of the subway train set with the k-means algorithm improved by a genetic algorithm, so that the anchors better fit candidate frames for rods of different sizes and different types;

Step II-3, preferably, to expand the number of training set samples, applying data augmentation such as random region cropping, random noise filtering, rotation transformation and brightness transformation;

Step II-4, establishing a target localization model for the various rods of the subway train set;

Step II-5, inputting the training-set pictures containing rods into the CSPDarknet layer of yolov5 for training, passing them through multiple convolution layers and upsampling layers, and saving the weight file obtained by training;

Step III, inputting the located rod region images into a deep convolutional neural network, and training the type detection models for all rods;

Step IV, inputting the picture to be detected into the target detection network, and inputting the resulting localization picture into the deep convolutional neural network to obtain the detection results for the different rod types.

The invention is further configured such that, in Step II, the k-means algorithm improved by a genetic algorithm is used for the yolov5 target detection network model to first obtain the prior frames (anchors) of the various rods of the subway train set.

The invention is further configured to: the txt label file comprises the serial number of the rod piece type, the Euclidean distance between the center points of the target frame and the prediction frame, and the diagonal distance of the target frame.

The invention is further configured to: the improved k-means algorithm based on the genetic algorithm comprises the following steps:

S1: first, the prepared label file is read and the samples in the data set are divided into k disjoint subsets; the k-means algorithm is then executed M times, each execution generating a group of k prior frames (anchors) that meet the requirements. This step is repeated until the number of groups reaches the population size M. Because each anchor is a combination of Width and Height, the anchors in each group are sorted by area and the sorted group is used as one chromosome;

S2: selecting several pairs of chromosomes from the population according to the set crossover probability and performing the crossover operation;

S3: selecting several chromosomes from the population according to the set mutation probability and performing the mutation operation;

S4: according to the composition of the chromosomes, a fitness function is calculated for all chromosomes of the current population. The individuals of the population are ranked by fitness value, and the current population is selected by the roulette method so that the size of the new population is M, wherein the fitness function is as follows:

wherein boxes is the number of all target frames in the label file read by the k-means algorithm, i is the index of a sample, j is the index of a cluster, C_j denotes the j-th cluster, X_i denotes a sample belonging to C_j, and y_j denotes the cluster center of C_j. For each element X_i in class C_j, the Euclidean distance to the cluster center is calculated, and the sum of all these distances gives the fitness of the class; the fitness of the chromosome formed by the cluster centers is the reciprocal of 1 plus the sum of the fitness values of all classes.
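The fitness formula itself did not survive in this text. Read literally, the verbal definition above corresponds to the following expression; this is a reconstruction from the wording, not the patent's own typeset formula, and the upper limit k (the number of clusters) is an assumption taken from step S1:

```latex
f = \frac{1}{1 + \sum_{j=1}^{k} \sum_{X_i \in C_j} \lVert X_i - y_j \rVert_2}
```

Under this reading, a chromosome whose cluster centers sit closer to the labeled frames has a smaller total distance and therefore a fitness closer to 1.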

The invention is further configured such that the yolov5 target detection network model uses the Mosaic data augmentation operation to improve the training speed of the model and the accuracy of the network, and adopts a CSPDarknet network structure with a Bottleneck residual structure. The residual structure addresses the problems of vanishing and exploding gradients: the input first passes through a 1 × 1 convolution layer and then a 3 × 3 convolution layer, and the result is added to the initial input through the residual connection. The convolution kernels extract image features, while the pooling layers reduce the dimensionality of the features and further screen and refine them. Transfer learning is carried out using the pre-trained weight file yolov5s.pt.

The invention is further configured such that locating the rod region in Step II specifically comprises: inputting a picture containing a rod into the CSPDarknet layer of yolov5 for training, and saving the weight file obtained after multiple convolution and upsampling operations. During network training, the network outputs prediction frames based on the initial anchor frames, each predicted bounding box containing the following information: the center coordinates of the prediction frame, its height and width, the confidence score of whether the frame contains an object, and the probability of belonging to each category. The prediction is compared with the ground-truth frame, the difference between the two is calculated, and the network parameters are then updated iteratively by backpropagation. The final prediction result is obtained by applying Non-Maximum Suppression (NMS) and a confidence score threshold.
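The last filtering stage described above can be sketched as follows. This is a minimal greedy NMS in plain Python, assuming boxes in (x1, y1, x2, y2) corner form; the function names and the threshold values 0.25 and 0.45 are illustrative assumptions and are not given in the text.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Return indices kept after confidence filtering and greedy NMS.

    The threshold defaults are illustrative assumptions, not source values.
    """
    # Drop low-confidence boxes, then visit the rest from best to worst score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep
```

A real detector would typically run this per category, so that overlapping rods of different types do not suppress each other.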

To reduce the risk of overfitting, the simplest approach is to increase the number of samples with other features and retrain the network, but this lowers training efficiency. Alternatively, the dropout method can be added to a standard neural network, so that each neuron is skipped with a certain probability p. Besides dropout, a weight penalty term can be added to the error function: a term that grows with the magnitude of the weight vector is added to the CIOU loss function, so that gradient descent on the CIOU loss favors smaller weight values, further reducing the risk of overfitting.

The invention is further configured to: the Bounding box loss function of the yolov5 target detection network model is as follows:

in the CIOU_loss function, IOU represents the IOU error, Distance_2 is the Euclidean distance between the center points of the target frame and the prediction frame, Distance_C is the diagonal distance of the target frame (the two enter the loss squared, as Distance_2^2 and Distance_C^2), and α and v are the aspect-ratio terms;

where α is a balance parameter that does not participate in the gradient calculation, v is a parameter measuring the consistency of the aspect ratio, w^gt and h^gt are the width and height of the real target frame, and w^p and h^p are the width and height of the prediction frame;

regularization is implemented using the L2 norm, where w_mn represents the weight between neuron m and neuron n. The weight update formula before regularization is w_ij = w_ij + η δ_j x_ij, where w_ij represents the connection weight between neuron i and neuron j, η represents the learning rate, δ_j represents the error term of neuron j, and x_ij represents the input value passed from neuron i to neuron j. The weight update formula with L2 regularization is w_ij = (1 − ηλ) w_ij + η δ_j x_ij, where λ is a hyper-parameter controlling the regularization strength; the larger λ is, the more heavily large weight values are penalized, which effectively reduces the risk of overfitting.
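The loss formulas are missing from this text. As a hedged reconstruction, the symbol definitions above match the commonly published form of the CIoU loss, written here with the source's symbols. How the L2 term is combined with CIOU_loss is not stated, so the penalised error E' below is an assumption chosen because it reproduces the stated update rule exactly:

```latex
\mathrm{CIOU\_loss} = 1 - \mathrm{IOU}
  + \frac{\mathrm{Distance}_2^{2}}{\mathrm{Distance}_C^{2}} + \alpha v,
\qquad
v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w^{p}}{h^{p}}\right)^{2},
\qquad
\alpha = \frac{v}{(1 - \mathrm{IOU}) + v}

E' = E + \frac{\lambda}{2}\sum_{m,n} w_{mn}^{2}
\;\Longrightarrow\;
w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E'}{\partial w_{ij}}
       = (1 - \eta\lambda)\,w_{ij} + \eta\,\delta_j\,x_{ij}
```

In the commonly published CIoU definition the normalising diagonal is that of the smallest box enclosing both frames, whereas the text here describes Distance_C as the diagonal distance of the target frame; the sketch keeps the source's symbol without resolving that difference.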

The invention is further configured such that Step III comprises: manually labeling the rod images located and identified in Step II, scaling the labeled images by a certain ratio, inputting them by rod type into a rod type detection model based on a deep convolutional neural network, and generating the final weight model through training on a large amount of data and fine-tuning of parameters.

The invention is further configured such that the deep convolutional neural network is a Backbone-based convolutional neural network that uses the Mosaic data augmentation operation to improve the training speed of the model and the accuracy of the network; the network configuration comprises 5 convolutional layers, 3 fully connected layers, Conv + BN + Leaky_relu activation blocks, local response normalization (LRN), and Dropout to counter overfitting.

In summary, the technical scheme of the invention has the following beneficial effects:

1. According to the invention, an end-to-end deep learning target detection algorithm is used and the position of each rod is located in real time by the yolov5 network detection model. Compared with hand-designed image feature extraction methods of traditional image processing, such as SIFT (scale-invariant feature transform), HOG (histogram of oriented gradients) and blob analysis, the yolov5 network detection model describes the information about the rods contained in the input image more comprehensively and accurately.

2. The invention adopts the k-means algorithm improved by a genetic algorithm, which solves the problem that the traditional k-means algorithm depends on the initial cluster centers of the data set. Because the initial cluster centers are selected at random, the final cluster partition depends on that random choice; adding the genetic algorithm mechanism avoids becoming trapped in a local minimum as a result of this dependence.

3. The invention improves the loss function of yolov5 by adding an L2-norm term. With the L2-regularized weight update formula, each weight is multiplied by (1 − ηλ), where η is the learning rate of the weight update and λ is a hyper-parameter controlling the regularization strength; the larger λ is, the more heavily large weight values are penalized, which effectively reduces the risk of overfitting.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a system flow chart of a subway train set rod type detection method based on deep learning of the invention;

FIG. 2 is a flow chart of the invention for computing Anchor based on a genetic algorithm modified k-means algorithm;

FIG. 3 is a flowchart of the specific method of the invention for inputting pictures containing rods in the database into the yolov5 target detection network model to locate the rod regions.

Detailed Description

In order to make the technical solutions of the present invention better understood, the following description of the technical solutions of the present invention with reference to the accompanying drawings of the present invention is made clearly and completely, and other similar embodiments obtained by a person of ordinary skill in the art without any creative effort based on the embodiments in the present application shall fall within the protection scope of the present application. In addition, directional terms such as "upper", "lower", "left", "right", etc. in the following embodiments are directions with reference to the drawings only, and thus, the directional terms used are intended to illustrate rather than limit the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

As used herein, the singular forms "a", "an" and "the" may include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises/comprising," "includes" or "including," etc., specify the presence of stated features, integers, steps, operations, components, parts, or combinations thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, components, parts, or combinations thereof. Also, the terminology used in this specification includes any and all combinations of the associated listed items.

The invention is further described with reference to the drawings and the preferred embodiments.

The embodiment is as follows:

As shown in FIG. 1, which gives the overall workflow of the deep learning-based rod type detection method for subway train sets, the deep learning-based rail transit subway bogie rod member classification detection method comprises the following steps:

Step I, building a picture database of the bogie rods of the subway train set, wherein the database comprises a picture database of rods of the same type and picture databases of rods of different types;

In a specific implementation, the picture database is built from a large number of rod pictures of subway train sets collected by the robots of the overhaul workstations in the depot. It comprises picture databases of rods with the same size and shape and of rods with different sizes and shapes, and is randomly divided into a training set and a test set according to a certain proportion.

Step II, inputting the pictures containing the rod pieces in the database into a yolov5 target detection network model, and carrying out region positioning on the rod pieces; as shown in fig. 3, the method specifically includes the following steps:

Step II-1, manually labeling the rod pictures of the subway train set: an xml label file conforming to the VOC2007 data format is generated by manual labeling, giving the annotation information of each image, such as the type, position and frame size of each rod, the type name of each rod, the image number, and the coordinates Row_min, Row_max, Col_min and Col_max of the labeled bounding box. The xml file is then converted into a txt file that yolov5 can read; the txt label file comprises the serial number of the rod type, the Euclidean distance between the center points of the target frame and the prediction frame, and the diagonal distance of the target frame.
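The xml-to-txt conversion can be sketched as below. This is only an illustration under stated assumptions: it writes the label layout that yolov5 conventionally reads (class index, then box center and size normalised by the image size), which is not necessarily the exact field list described above; the VOC tags xmin/ymin/xmax/ymax are taken to correspond to Col_min, Row_min, Col_max and Row_max; and the class names are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list; the real rod type names are not given in the source.
CLASS_NAMES = ["rod_type_a", "rod_type_b"]


def voc_xml_to_yolo_lines(xml_text, class_names=CLASS_NAMES):
    """Convert one VOC2007-style annotation (as a string) into YOLO txt lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        cls_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        # VOC stores the box corners in pixels.
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # The txt label stores center and size, normalised by the image size.
        cx = (xmin + xmax) / 2.0 / img_w
        cy = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

Each returned string is one line of the txt label file for the corresponding image.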

Step II-2, based on the training set obtained in Step I, the prior frames (anchors) of the various rods of the subway train set are computed with the k-means algorithm improved by a genetic algorithm, so that the anchors better fit candidate frames for rods of different sizes and different types. These anchors replace the ones yolov5 originally obtained from ImageNet, which makes training more efficient.

FIG. 2 shows the flowchart for computing the anchors with the k-means algorithm improved by a genetic algorithm. The specific process of this algorithm comprises the following steps:

S1: the txt label file converted from the xml file in Step II-1 is the label file. It is read by the k-means clustering algorithm, and the samples in the data set are divided into several, usually disjoint, subsets, each of which is called a cluster, with its own centroid. k = 9 is used: yolov5 predicts on 3 feature scales with 3 anchors per scale, so that feature maps of 3 different target scales can be predicted, giving 9 anchors in total. Each execution of the k-means algorithm generates a group of 9 anchors that meet the requirements, and this step is repeated until the population size M is reached. The larger the population size M, the more likely the algorithm is to converge, but the longer it takes; because the solution space is limited in size, M = 30 is generally preferred. Each anchor is a combination of Width and Height, and all anchors in a group are sorted by area to form one chromosome. Each gene of the chromosome is an anchor, and the length of the chromosome is determined by k.

S2: several pairs of chromosomes are selected from the population according to the set crossover probability for the crossover operation. The crossover strategy is two-point crossover, and new individuals are generated by crossover with an empirical crossover probability between 0.45 and 0.85.

S3: several chromosomes are selected from the population according to the set mutation probability for the mutation operation. The mutation strategy is uniform mutation; a small mutation rate is preferred, with an empirical value between 0.001 and 0.01.

S4: according to the composition of the chromosomes, a fitness function is calculated for all chromosomes of the current population, the individuals of the population are ranked by fitness value, and the current population is selected by the roulette method so that the size of the new population is still M, wherein the fitness function is as follows:

wherein boxes is the number of all target frames in the label file read by the k-means algorithm, i is the index of a sample, j is the index of a cluster, C_j denotes the j-th cluster, X_i denotes a sample belonging to C_j, and y_j denotes the cluster center of C_j. For each element X_i belonging to class C_j, the Euclidean distance to the cluster center is calculated and the sum of all these distances gives the fitness of the class; the reciprocal of 1 plus the sum of the fitness values of all classes serves as the core index of the fitness function. With the k-means clustering algorithm improved by a genetic algorithm, anchors closer to the sample set can be generated.
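Steps S2 to S4 can be sketched in plain Python as follows. This is a minimal illustration, not the patent's implementation: a chromosome is taken to be a list of (width, height) anchors, each labeled frame is assumed to belong to its nearest anchor, the fitness follows the verbal definition (reciprocal of 1 plus the summed Euclidean distances), and all function names and the mutation range are hypothetical. The initial population is assumed to come from the repeated k-means runs of S1.

```python
import math
import random


def fitness(chromosome, boxes):
    """1 / (1 + total distance from each (w, h) frame to its nearest anchor)."""
    total = 0.0
    for w, h in boxes:
        total += min(math.hypot(w - aw, h - ah) for aw, ah in chromosome)
    return 1.0 / (1.0 + total)


def two_point_crossover(a, b, rng):
    """Swap the genes between two cut points of two equal-length chromosomes."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def uniform_mutation(chromosome, p_mut, lo, hi, rng):
    """Replace each gene, with probability p_mut, by a uniformly drawn (w, h)."""
    return [
        (rng.uniform(lo, hi), rng.uniform(lo, hi)) if rng.random() < p_mut else gene
        for gene in chromosome
    ]


def roulette_select(population, boxes, m, rng):
    """Draw m chromosomes with probability proportional to their fitness."""
    weights = [fitness(c, boxes) for c in population]
    return rng.choices(population, weights=weights, k=m)
```

Passing an explicit `random.Random` instance keeps runs reproducible. After crossover a child may no longer be sorted by area, so re-sorting it before the next generation would keep the gene order consistent with S1.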

Step II-3, in order to expand the number of samples in the training set, the data can be augmented by: random region cropping, random noise filtering, rotation transformation, brightness transformation and the like;
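The listed augmentations can be illustrated on a grayscale image stored as a list of pixel rows. This is a simplified sketch under assumptions: the noise is taken to be additive Gaussian noise, rotation is limited to 90 degrees, and the helper names are hypothetical. In a real detection pipeline the bounding-box labels would have to be transformed together with the image.

```python
import random


def random_crop(img, out_h, out_w, rng):
    """Cut a random out_h x out_w window from a 2-D image (list of rows)."""
    top = rng.randint(0, len(img) - out_h)
    left = rng.randint(0, len(img[0]) - out_w)
    return [row[left:left + out_w] for row in img[top:top + out_h]]


def add_noise(img, sigma, rng):
    """Add Gaussian noise to every pixel and clamp to the 0..255 range."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in img
    ]


def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img, factor):
    """Scale pixel intensities by factor and clamp to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]
```

Each helper returns a new image, so several of them can be chained to produce one augmented sample.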

The yolov5 target detection model, an end-to-end deep neural network, is adopted, and the Mosaic data augmentation operation is used to improve the training speed of the model and the accuracy of the network. The network structure is CSPDarknet, a fully convolutional network combined with Bottleneck residual skip connections, in which convolutions with a stride of 2 perform the downsampling. Upsampling, downsampling and route operations are also used in the network. The hyper-parameters and the various network parameters are set in the cfg file.

To reduce the dependence on labeled data, the pre-trained weight file yolov5s.pt is used with a transfer learning method to address the asymmetry problem of the labeled data. This improves training efficiency as well as the stability and generalization performance of the model, so that the classification result does not change because of a change in a few pixels.

The network structure comprises 3 yolo layers in total, which work at 8×, 16× and 32× downsampling respectively. Multi-scale feature prediction is performed on each downsampled feature map. Taking a 512 × 512 input as an example, at 8× downsampling the feature map is divided into 64 × 64 grid cells, and each cell predicts 3 candidate target frames from the prior frames obtained by the k-means algorithm. Each candidate frame comprises 3 parts: the frame geometry (w^gt and h^gt are the width and height of the real target frame, and w^p and h^p are the width and height of the prediction frame), the confidence score of whether the frame contains an object, and the probabilities of the N categories. The final prediction result is obtained by applying Non-Maximum Suppression (NMS) and a confidence score threshold.
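The grid arithmetic for the three detection scales can be checked with a few lines of Python. The helper name is hypothetical; it assumes a square input whose side is divisible by each stride and 3 anchors per scale, as stated in the text.

```python
def grid_sizes(input_size, strides=(8, 16, 32), anchors_per_scale=3):
    """Return {stride: (cells_per_side, candidate_frames)} for a square input."""
    out = {}
    for s in strides:
        cells = input_size // s  # grid cells along one side at this stride
        out[s] = (cells, cells * cells * anchors_per_scale)
    return out


print(grid_sizes(512))   # stride 8 gives a 64 x 64 grid
print(grid_sizes(1376))  # stride 32 gives a 43 x 43 grid
```

For a 512 × 512 input this gives 64 × 64, 32 × 32 and 16 × 16 grids, and for a 1376 × 1376 input the 32× scale gives a 43 × 43 grid, each cell predicting 3 candidate frames.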

Assuming that the yolov5 input size is 1376 × 1376, at 32× downsampling the picture is divided into 43 × 43 grid cells and each cell predicts 3 candidate frames. With a Euclidean distance of 7 between the center points of the target frame and the prediction frame and a target frame diagonal distance of 21, the new yolov5 loss function is:

In the CIOU_loss function, IOU represents the IOU error, and α and v are the aspect-ratio terms. α is a balance parameter that does not participate in the gradient calculation, and v is a parameter measuring the consistency of the aspect ratio; w^gt and h^gt are the width and height of the real target frame, and w^p and h^p are the width and height of the prediction frame.
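A numerical sketch of the CIoU loss for a single pair of frames follows. It assumes the commonly published CIoU form, in which the normalising diagonal is that of the smallest box enclosing both frames; the source's own formula is not recoverable here, so this is an illustration rather than the patent's exact loss. Boxes are given as (cx, cy, w, h), and the function name is hypothetical.

```python
import math


def ciou_loss(pred, gt):
    """CIoU loss for two boxes given as (cx, cy, w, h)."""
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx1, gy1 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2
    gx2, gy2 = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2

    # Intersection over union of the two frames.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    iou = inter / union

    # Squared center distance and squared diagonal of the enclosing box
    # (the enclosing-box diagonal is an assumption, see the note above).
    d2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2

    # Aspect-ratio consistency v and its balance weight alpha.
    v = (4 / math.pi ** 2) * (math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + d2 / c2 + alpha * v
```

If the worked values in the text are read as Distance_2 = 7 and Distance_C = 21, the center-distance term alone would contribute 7² / 21² = 1/9 to the loss.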

Regularization is implemented using the L2 norm, where w_mn represents the weight between neuron m and neuron n. The weight update formula before regularization is w_ij = w_ij + η δ_j x_ij, where w_ij represents the connection weight between neuron i and neuron j, η represents the learning rate, δ_j represents the error term of neuron j, and x_ij represents the input value passed from neuron i to neuron j. The weight update formula with L2 regularization is w_ij = (1 − ηλ) w_ij + η δ_j x_ij, where λ is a hyper-parameter controlling the regularization strength; the larger λ is, the more heavily large weight values are penalized, which effectively reduces the risk of overfitting.
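The two update rules differ only in the shrink factor (1 − ηλ), which a one-line function makes visible. The function name and the numeric values are illustrative assumptions; setting lam to 0 recovers the unregularized rule.

```python
def update_weight(w, eta, delta_j, x_ij, lam=0.0):
    """One weight update; lam=0.0 gives the rule without L2 regularization."""
    return (1.0 - eta * lam) * w + eta * delta_j * x_ij


w_plain = update_weight(1.0, eta=0.1, delta_j=0.5, x_ij=2.0)
w_l2 = update_weight(1.0, eta=0.1, delta_j=0.5, x_ij=2.0, lam=0.5)
print(w_plain, w_l2)  # the regularized weight ends up smaller
```

With a zero error term the regularized rule still multiplies the weight by (1 − ηλ) at every step, which is the weight decay effect the text relies on to limit overfitting.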

And Step III, inputting the rod piece positioning area image into a deep convolution neural network, and training detection models of different rod piece types. The method comprises the following steps:

Different rod type detection models are established by constructing a Backbone-based convolutional neural network, which uses the Mosaic data augmentation operation to improve the training speed of the model and the accuracy of the network; the network configuration comprises 5 convolutional layers, 3 fully connected layers, Conv + BN + Leaky_relu activation blocks, local response normalization (LRN), and Dropout to counter overfitting.

The rod images located and identified in Step II-5 are manually labeled, and the labeled images are scaled to 128 × 128 and input into the convolutional neural network. The final weight model is generated through training on a large amount of data and fine-tuning of parameters. The trained neural network makes a prediction for each input sample: the output units of the final fully connected layer are processed by a Softmax function to obtain the similarity between the sample and each rod type, and the category with the maximum similarity is taken as the final network prediction for the sample.
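The final decision step, Softmax over the outputs of the last fully connected layer followed by picking the largest value, can be sketched as follows. The function names and the class names in the usage are hypothetical; the logits stand for the raw outputs of the last layer.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, class_names):
    """Return the most likely class name and its softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]
```

For example, `predict([0.1, 2.5, -1.0], ["rod_a", "rod_b", "rod_c"])` returns "rod_b" together with its probability.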

And Step IV, inputting the picture to be detected into a target detection network, obtaining a positioning picture, and then inputting the positioning picture into a deep convolutional neural network to obtain a rod type detection result.

In conclusion, the invention provides a deep learning-based method for classifying and detecting the bogie rod members of rail transit subway trains, which can be applied to a robot inspection workstation for subway train sets in a depot. The position of each rod is located in real time using a deep learning target detection algorithm and a yolov5 network detection model. Deep neural network technology is used to recognize and learn from a large amount of rod image information, so that rods are accurately located and classified whether they are of the same type but different sizes, of different types but the same size, or of the same or different types, and the recognized rods are then further classified, detected and evaluated. The invention uses a k-means algorithm improved by a genetic algorithm, which solves the problem that the traditional k-means algorithm depends on the initial "cluster centers" of the data set; the loss function of yolov5 is also improved by adding an L2-norm term, and the L2-regularized weight update formula effectively reduces the risk of overfitting.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. The rail transit subway steering frame rod component classification detection method based on deep learning is characterized by comprising the following steps of:

step II-1: manually labeling the rod piece pictures of the subway train set; generating an xml tag file conforming to the VOC2007 data format by a manual labeling method, and obtaining the labeling information of each image, such as the type, position and frame size of each rod piece, the type name of each kind of rod piece, and the image number, in the coordinate regions Row_min, Row_max and Col_min of the labeling bounding box; then converting the xml file into a txt label file that can be recognized by yolov5;

step II-5: inputting the pictures containing the rod pieces in the training set into the CSPDarknet layer of yolov5 for training, passing them through the convolution layers and upsampling layers multiple times, and storing the weight file obtained by training;

step III: inputting the region images in which the rod pieces have been positioned into a deep convolutional neural network, and training type detection models for all the rod pieces;

and step IV: inputting the picture to be detected into the target detection network to obtain a positioning picture, and then inputting the positioning picture into the deep convolutional neural network to obtain detection results for the different rod piece types.
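The xml-to-txt conversion named in step II-1 can be illustrated with a minimal sketch. It assumes the usual VOC2007 tags (size/width, size/height, object/name, bndbox with xmin, ymin, xmax, ymax) and the common yolov5 label layout of one line per object, "class_id x_center y_center width height", normalized by the image size. The function name and the class list are hypothetical, and the txt layout actually used by the applicant may carry different fields.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo_lines(xml_text, class_names):
    """Convert one VOC-style annotation (as a string) to YOLO txt lines.

    Each returned line is "class_id x_center y_center width height",
    with all four box values normalized to [0, 1] by the image size.
    """
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)

    lines = []
    for obj in root.iter("object"):
        # class_names is an assumed, user-supplied list of rod type names
        class_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        x_center = (xmin + xmax) / 2.0 / img_w
        y_center = (ymin + ymax) / 2.0 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines
```

In practice the returned lines would be written to a txt file that shares the image's base name, one file per image.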

2. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 1, wherein said txt tag file of step II-1 includes the rod piece type number, the Euclidean distance between the center points of the target frame and the predicted frame, and the diagonal distance between the target frame.

3. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 1, wherein said k-means algorithm improved by the genetic algorithm comprises the following steps:

s1: first, reading the prepared txt label file and dividing the samples in all data sets into k disjoint subsets; then executing the k-means algorithm M times, each execution generating a group of k prior frames (anchors) that meet the requirements; because each prior frame anchor is equivalent to a combination of Width and Height, the anchors of each group are sorted by area and used as one chromosome; this step is repeated until the number of chromosomes meets the required population size M;

s3: selecting a plurality of pairs of chromosomes from the population to perform mutation operation according to the set mutation probability;

s4: calculating the fitness function for all chromosomes of the current population according to the composition of the chromosomes; sorting the individuals of the population by the value of the fitness function, and selecting from the current population by the roulette method so that the size of the new population is M, wherein the fitness function is as follows:

wherein boxes is the number of all target frames in the label file read by the k-means algorithm, i is the learner index, j is the cluster index, C_j denotes the j-th cluster, X_i denotes a learner belonging to C_j, and y_j denotes the cluster center of C_j; for each element X_i belonging to class C_j, the Euclidean distance from X_i to the cluster center is calculated, and the sum of all these distance values gives the fitness of the class; the reciprocal of the sum of the fitness of all classes plus 1 is the fitness of the chromosome formed by the cluster centers.
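The fitness evaluation and roulette selection of step s4 can be sketched as follows. The sketch assumes that each target frame is a (width, height) pair, that each frame is assigned to its nearest cluster center, and that the fitness is read as 1 / (1 + sum of all Euclidean distances); this reading of the verbal description, and both function names, are assumptions rather than the applicant's exact formula.

```python
import math
import random


def chromosome_fitness(boxes, centers):
    """Fitness of one chromosome (a group of k anchor centers).

    boxes and centers are lists of (width, height) pairs. Each box
    contributes its Euclidean distance to the nearest center; the fitness
    is the reciprocal of (1 + total distance), so tighter clusters score
    closer to 1.
    """
    total = 0.0
    for w, h in boxes:
        total += min(math.hypot(w - cw, h - ch) for cw, ch in centers)
    return 1.0 / (1.0 + total)


def roulette_select(population, fitnesses, m, rng):
    """Draw m chromosomes with probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=m)


# Example: two candidate chromosomes scored against three target frames.
boxes = [(10, 10), (12, 10), (50, 60)]
population = [[(10, 10), (50, 60)], [(30, 30), (40, 40)]]
fitnesses = [chromosome_fitness(boxes, c) for c in population]
new_population = roulette_select(population, fitnesses, 4, random.Random(0))
```

Because the selection is proportional to fitness, a chromosome whose anchors sit close to the labeled frames is more likely to survive into the new population of size M.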

4. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 1, wherein said yolov5 target detection network model uses the Mosaic data enhancement operation to improve the training speed of the model and the accuracy of the network, and adopts a CSPDarknet network structure with a Bottleneck residual structure; the residual structure is used to address the problems of gradient vanishing and gradient explosion: the input first passes through a 1 x 1 convolution layer, then through a 3 x 3 convolution layer, and the result is finally added to the initial input through the residual connection; the convolution kernels are responsible for extracting image features, the pooling layer is responsible for reducing the dimensionality of the image features and further screening refined features, and a pre-trained weight file is used for transfer learning.
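The data flow of the Bottleneck residual structure (1 x 1 convolution, then 3 x 3 convolution, then addition of the initial input) can be illustrated with a small pure-Python sketch. It is deliberately simplified: batch normalization and the activation function are omitted, stride 1 with zero padding is assumed, and the function names are hypothetical. A real model would use a deep learning framework instead of nested lists.

```python
def conv2d(x, weight):
    """Stride-1, zero-padded convolution (cross-correlation form).

    x has layout [C_in][H][W]; weight has layout [C_out][C_in][k][k]
    with odd k, so the output keeps the same height and width.
    """
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(weight[0][0])
    pad = k // 2
    out = []
    for filt in weight:
        plane = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                acc = 0.0
                for ch in range(c_in):
                    for dr in range(k):
                        for dc in range(k):
                            rr, cc = r + dr - pad, c + dc - pad
                            if 0 <= rr < h and 0 <= cc < w:
                                acc += filt[ch][dr][dc] * x[ch][rr][cc]
                plane[r][c] = acc
        out.append(plane)
    return out


def bottleneck(x, w1x1, w3x3):
    """Residual block: x + conv3x3(conv1x1(x)).

    The 3 x 3 convolution must return as many channels as x has,
    otherwise the element-wise addition is not defined.
    """
    y = conv2d(conv2d(x, w1x1), w3x3)
    return [
        [[x[ch][r][c] + y[ch][r][c] for c in range(len(x[0][0]))]
         for r in range(len(x[0]))]
        for ch in range(len(x))
    ]
```

The addition of the untouched input gives the gradient a direct path around the two convolutions, which is the property the claim relies on when it refers to gradient vanishing.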

5. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 1, wherein said step II of performing area positioning on the rod member specifically comprises: inputting a picture containing a rod into the CSPDarknet layer of yolov5 for training, and storing the weight file obtained after multiple rounds of convolution and upsampling; during network training, the network outputs prediction boxes based on the initial anchor boxes, each prediction bounding box containing the following information: the center coordinates of the prediction box, the height and width of the prediction box, the confidence score of whether the box contains an object, and the probability of belonging to a certain category; the prediction is compared with the real frame, the difference between the two is calculated, and the network parameters are then updated by back-propagation and iterated; the final prediction result is obtained by applying the confidence score threshold and non-maximum suppression.
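The final filtering stage of this claim, a confidence score threshold followed by non-maximum suppression, can be sketched as below. Boxes are assumed to be corner-format tuples (x1, y1, x2, y2), suppression is class-agnostic, and the threshold values 0.25 and 0.45 are illustrative assumptions; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_predictions(preds, conf_thres=0.25, iou_thres=0.45):
    """Keep confident boxes, then suppress overlapping duplicates.

    preds is a list of (box, score) pairs. Boxes below conf_thres are
    dropped; the rest are visited from highest to lowest score, and a
    box is kept only if its IoU with every kept box is at most iou_thres.
    """
    candidates = sorted(
        (p for p in preds if p[1] >= conf_thres),
        key=lambda p: p[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_thres for kept_box, _ in kept):
            kept.append((box, score))
    return kept
```

A per-class variant would run the same loop separately for each rod type, so that two different rod types overlapping in the image do not suppress each other.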

6. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 5, wherein, in order to reduce the risk of over-fitting, a weight penalty term is added to the error function: a term that increases with the magnitude of the weight vector is added to the CIOU loss function, so that the gradient descent method searches for solutions with smaller weight values, which further reduces the risk of over-fitting.
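The effect of such a weight penalty on gradient descent can be shown with a short sketch. It assumes the common penalty form (lam / 2) times the sum of squared weights, whose gradient is lam * w; the factor 1/2 and the function names are assumptions made for illustration and are not stated in the claim.

```python
def l2_penalized_loss(base_loss, weights, lam):
    """Base loss plus the assumed penalty (lam / 2) * sum(w ** 2)."""
    return base_loss + 0.5 * lam * sum(w * w for w in weights)


def gradient_step(weights, grads, eta, lam):
    """One gradient-descent step on the penalized loss.

    grads holds the gradient of the base loss for each weight. The
    penalty adds lam * w to each gradient, so the step is equivalent to
    w <- (1 - eta * lam) * w - eta * grad: every weight is shrunk toward
    zero before the usual correction is applied.
    """
    return [w - eta * (g + lam * w) for w, g in zip(weights, grads)]
```

With a zero base gradient the weights decay geometrically by the factor (1 - eta * lam) per step, which is why large weights are penalized more heavily in absolute terms.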

7. The deep learning based rail transit subway bogie rod member classification detection method as claimed in claim 6, wherein said loss function is:

wherein α is a balance parameter that does not participate in gradient calculation; v is a parameter measuring the consistency of the aspect ratio; w^gt and h^gt are the width and height of the real target frame; and w^p and h^p are the width and height of the prediction box;

regularization is achieved using an L2 norm; w_mn represents the weight between neuron m and neuron n; the weight update formula before regularization is w_ij = w_ij + η δ_j x_ij, where w_ij represents the weight of the connection of neurons j to i, η represents the learning rate, δ_j represents the error term of neuron j, and x_ij represents the input value of neuron i to neuron j; the update formula using the L2-regularized weights is w_ij = (1 - ηλ) w_ij + η δ_j x_ij, where λ is a hyper-parameter controlling the regularization strength; the larger λ is, the heavier the penalty on weights with larger values, so that the risk of overfitting can be effectively reduced.
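The symbols α, v, w^gt, h^gt, w^p and h^p named in this claim match the widely used CIOU loss, which can be sketched as follows. The sketch uses the standard published form, 1 - IoU + ρ² / c² + α v, where ρ is the distance between the two box centers and c is the diagonal of the smallest box enclosing both; whether the applicant's formula is identical is an assumption. Boxes are assumed to be (center_x, center_y, width, height) tuples with positive width and height, and the function name is hypothetical.

```python
import math


def ciou_loss(pred, gt):
    """Standard CIoU loss for two (cx, cy, w, h) boxes."""
    px1, py1 = pred[0] - pred[2] / 2, pred[1] - pred[3] / 2
    px2, py2 = pred[0] + pred[2] / 2, pred[1] + pred[3] / 2
    gx1, gy1 = gt[0] - gt[2] / 2, gt[1] - gt[3] / 2
    gx2, gy2 = gt[0] + gt[2] / 2, gt[1] + gt[3] / 2

    # Overlap term
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    iou = inter / union

    # Center-distance term: squared center distance over the squared
    # diagonal of the smallest enclosing box
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2

    # Aspect-ratio term: v measures consistency, alpha balances it
    v = (4 / math.pi ** 2) * (
        math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])
    ) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0

    return 1 - iou + rho2 / c2 + alpha * v
```

In an automatic-differentiation framework α would be computed without tracking gradients, in line with the statement that it does not participate in gradient calculation.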

8. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 1, wherein said step III comprises: for the rod piece images that were manually labeled and positioned in step II, scaling the labeled images by a certain proportion and, according to the different rod piece types, inputting the scaled images into a rod piece type detection model based on a deep convolutional neural network, and generating a final weight model through training on a large amount of data and fine-tuning of parameters.

9. The deep learning-based rail transit subway steering frame rod member classification detection method as claimed in claim 8, wherein said deep convolutional neural network is a backbone-based convolutional neural network that uses the Mosaic data enhancement operation to improve the training speed of the model and the accuracy of the network, and the network configuration comprises 5 convolutional layers, 3 fully connected layers, the activation block Conv + BN + Leaky_relu, local response normalization (LRN), and Dropout to counter overfitting.