CN112464846A - Automatic identification method for abnormal fault of freight train carriage at station - Google Patents

Automatic identification method for abnormal fault of freight train carriage at station

Info

Publication number
CN112464846A
Authority
CN
China
Prior art keywords
fault
image
representing
prediction
layer
Prior art date
Legal status
Granted
Application number
CN202011415713.6A
Other languages
Chinese (zh)
Other versions
CN112464846B (en)
Inventor
刘清
刘同财
李雪琪
谢兆青
王靖博
郭建明
Current Assignee
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202011415713.6A priority Critical patent/CN112464846B/en
Publication of CN112464846A publication Critical patent/CN112464846A/en
Application granted granted Critical
Publication of CN112464846B publication Critical patent/CN112464846B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

The invention provides an automatic identification method for abnormalities and faults of freight train cars at a station. First, left, right and top images of each car are captured with high-speed linear array cameras without stopping the freight train so as to construct a car image data set, and a car fault image training set is formed after preprocessing operations such as cutting, screening and manual labeling. A train car abnormal fault recognition network and its loss function are then constructed; the car fault images in the training set are input, and the recognition network is optimized by a gradient descent algorithm. During testing, the image to be recognized is input into the optimized train car abnormal fault recognition network to obtain a preliminary recognition result, and post-processing operations such as confidence filtering and non-maximum suppression are then performed to obtain the final recognition result. The invention has the advantages of a high recognition rate, high speed and strong real-time performance, realizes monitoring of the train running state and automatic alarming for abnormalities or faults, and further improves the intelligence level of railway transportation.

Description

Automatic identification method for abnormal fault of freight train carriage at station
Technical Field
The invention relates to the field of intelligent supervision of railway traffic safety, in particular to an automatic identification method for abnormal faults of a carriage of a freight train at a station.
Background
To address the safety problems of railway freight cars during operation, a number of safety monitoring systems have already been put into use. They mainly comprise five subsystems: THDS (vehicle axle temperature intelligent detection system), TPDS (vehicle operation quality trackside dynamic monitoring system), TADS (vehicle rolling bearing fault trackside acoustic diagnosis system), TFDS (freight car fault trackside image detection system) and TCDS (passenger car operation safety monitoring system). These systems collect data related to train operation using infrared, acoustic, mechanical, image acquisition and computer technologies, and the running condition of the train is then monitored by manual screening. With the growth of railway lines and rising safety guarantee requirements, the efficiency of traditional train operation monitoring systems and their methods of checking potential safety hazards can no longer meet demand. In the traditional monitoring and troubleshooting workflow, after train images are collected by cameras, workers must examine and verify the collected image samples one by one and manually record information such as train number, train type and fault. Because dwell times in the station are short and the number of passing trains is large, manual identification is prone to missed faults, false fault detections, recording errors and low efficiency. For the train fault identification scenario, object detection technology has natural advantages over manual inspection: on one hand, image-based detection can inspect the train in a non-contact manner without stopping it, so normal operation is not affected; on the other hand, it can greatly reduce the labor intensity of personnel and lower labor costs. Therefore, introducing object detection technology into the train fault identification scenario can greatly promote the development of the railway transportation industry.
Disclosure of Invention
Aiming at the problems that freight train abnormalities and faults are currently identified by manually checking images, that information such as train number, train type and fault is recorded by hand, and that manual identification is prone to missed faults, false fault detections, recording errors and low efficiency, the invention provides an automatic identification method for abnormalities and faults of freight train cars at a station.
To achieve this purpose, the invention adopts the following technical scheme:
step 1: without stopping the freight train, use high-speed linear array cameras to capture a left, a right and a top high-resolution image of each car so as to construct a car high-resolution image data set; shrink each car high-resolution image to a suitable proportion by equal-proportion linear interpolation, cut it into four overlapping image blocks of the same size, screen out the car image samples containing faults from all the image blocks, and construct a car fault image data set from the car image samples containing faults;
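The preprocessing in step 1 (equal-proportion shrinking by linear interpolation followed by cutting into four overlapping blocks of the same size) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the scale factor, the 2×2 patch arrangement and the overlap ratio are assumptions, since the patent only states that four overlapping blocks of equal size are produced.

```python
# Minimal preprocessing sketch for step 1 (illustrative assumptions, not the patented code).
import cv2
import numpy as np

def preprocess_car_image(img, scale=0.25, overlap=0.2):
    """Shrink a high-resolution car image and cut it into 4 overlapping patches."""
    h, w = img.shape[:2]
    small = cv2.resize(img, (int(w * scale), int(h * scale)),
                       interpolation=cv2.INTER_LINEAR)   # equal-proportion linear interpolation
    sh, sw = small.shape[:2]
    # Patch size chosen so that two patches per axis with the given overlap cover the image.
    pw = int(sw / (2 - overlap))
    ph = int(sh / (2 - overlap))
    xs = [0, sw - pw]        # left / right patch origins
    ys = [0, sh - ph]        # top / bottom patch origins
    patches = [small[y:y + ph, x:x + pw] for y in ys for x in xs]
    return patches           # four overlapping blocks of identical size

# Example: patches = preprocess_car_image(cv2.imread("car_left.png"))
```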
step 2: manually label the car fault marking boxes and fault types of each car fault image in the car fault image data set of step 1, count the number of car fault image samples for each fault type, and collect additional samples for any fault type whose number of image samples is below a sample number threshold until the number of car fault image samples of every fault type exceeds the threshold, so as to construct the train car abnormal fault recognition network training set;
step 3: construct a train car abnormal fault recognition network, take the train car abnormal fault recognition network training set of step 2 as input data, construct the train car abnormal fault recognition network loss function by combining the fault types of the car fault image samples in the training set, and obtain the optimized train car abnormal fault recognition network through training with a gradient descent algorithm;
step 4: input the image to be recognized into the optimized train car abnormal fault recognition network, predict the first, second and third prediction feature maps of the image to be recognized, stitch the three prediction feature maps to obtain the preliminary recognition result of the image to be recognized, and then perform operations such as confidence screening and non-maximum suppression to obtain the final recognition result.
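The post-processing in step 4 (confidence screening followed by non-maximum suppression) can be sketched as follows. The thresholds are illustrative assumptions; the boxes, foreground probabilities and class probabilities correspond to the quantities of the preliminary recognition result defined below.

```python
# Post-processing sketch: confidence filtering + class-wise NMS (thresholds are assumptions).
import numpy as np

def iou(box, boxes):
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
    a = (box[2] - box[0]) * (box[3] - box[1])
    b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (a + b - inter + 1e-9)

def postprocess(boxes, scores, probs, conf_thresh=0.5, iou_thresh=0.45):
    cls = probs.argmax(axis=1)
    conf = scores * probs.max(axis=1)            # foreground probability x class probability
    keep = conf > conf_thresh                    # confidence screening
    boxes, conf, cls = boxes[keep], conf[keep], cls[keep]
    final = []
    for c in np.unique(cls):                     # non-maximum suppression per fault class
        idx = np.where(cls == c)[0]
        order = idx[np.argsort(-conf[idx])]
        while order.size:
            best = order[0]
            final.append((boxes[best], float(conf[best]), int(c)))
            rest = order[1:]
            order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return final                                 # [(box, confidence, fault class), ...]
```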
Preferably, the car fault image data set in step 1 is:
{train_s(m, n), s∈[1, S], m∈[1, M], n∈[1, N]}
wherein train_s(m, n) represents the pixel information at the m-th row and n-th column of the s-th car fault image in the car fault image data set, S represents the number of all image samples in the car fault image data set, M is the number of rows of each fault image in the data set, and N is the number of columns of each fault image in the data set;
preferably, the coordinates of the car fault marking box of each car fault image in the car fault image data set in the step 2 are as follows:
box_{s,k} = ((x_{s,k}^{l,t}, y_{s,k}^{l,t}), (x_{s,k}^{r,b}, y_{s,k}^{r,b})), s∈[1, S], k∈[1, K_s]
wherein l denotes the left on the car fault image, t denotes the top, r denotes the right, and b denotes the bottom; S represents the number of all car fault images in the car fault image data set, and K_s represents the total number of car fault marking boxes in the s-th car fault image of the data set; box_{s,k} represents the coordinates of the k-th car fault marking box in the s-th car fault image of the data set; (x_{s,k}^{l,t}, y_{s,k}^{l,t}) are the coordinates of the upper-left corner of the k-th car fault marking box in the s-th car fault image, x_{s,k}^{l,t} being its abscissa and y_{s,k}^{l,t} its ordinate; (x_{s,k}^{r,b}, y_{s,k}^{r,b}) are the coordinates of the lower-right corner of the k-th car fault marking box in the s-th car fault image, x_{s,k}^{r,b} being its abscissa and y_{s,k}^{r,b} its ordinate;
step 2, the compartment fault marking frame category information of each compartment fault image in the compartment fault image data set is as follows:
label_{s,k,c}, s∈[1, S], k∈[1, K_s], c∈[1, C]
wherein C is the total number of fault types in the car fault image data set; label_{s,k,c} indicates that the k-th car fault marking box of the s-th car fault image in the car fault image data set belongs to the c-th fault type;
step 2, the training set of the train compartment abnormal fault recognition network is as follows:
{train_s(m, n), (box_{s,k}, label_{s,k,c})}
s∈[1, S], m∈[1, M], n∈[1, N], k∈[1, K_s], c∈[1, C]
wherein train_s(m, n) is the pixel information at the m-th row and n-th column of the s-th car fault image in the train car abnormal fault recognition network training set, box_{s,k} represents the coordinates of the k-th car fault marking box in the s-th car fault image of the training set, and label_{s,k,c} indicates that the k-th car fault marking box of the s-th car fault image in the training set belongs to the c-th fault type; S represents the number of all image samples in the training set, M is the number of rows of each fault image in the training set, N is the number of columns of each fault image in the training set, K_s represents the total number of car fault marking boxes in the s-th car fault image of the training set, and C is the total number of fault types in the training set;
preferably, the network for identifying an abnormal fault in a train car in step 3 specifically includes: the system comprises a feature extraction network, a channel feature fusion network, a first spatial feature fusion network, a second spatial feature fusion network and a multi-scale prediction layer;
the channel feature fusion network is embedded in the feature extraction network as a sub-module; the feature extraction network is serially cascaded with the first spatial feature fusion network and then is connected with the second spatial feature fusion network in parallel; the second spatial feature fusion network is serially cascaded with the multi-scale prediction layer;
the feature extraction network: the dimensionality reduction convolution module and the residual error module are sequentially stacked and cascaded;
the dimension reduction convolution module is formed by sequentially stacking and cascading a dimension reduction convolution layer, a dimension reduction batch normalization layer and a Leaky ReLU activation layer;
the residual module is formed by sequentially stacking and cascading a plurality of Ghost residual blocks;
the Ghost residual block is composed of a residual convolution layer, a residual batch normalization layer and a ReLU activation layer according to the stacking mode of the traditional residual block;
the feature extraction network is defined as:
the sequential stack of dimension-reduction convolution modules and residual modules described above, parameterized by the following quantities to be optimized: the parameters of the b1-th dimension-reduction convolution layer in the a1-th dimension-reduction convolution module; the translation amount and the scaling quantity of the b2-th dimension-reduction batch normalization layer in the a1-th dimension-reduction convolution module; the parameters of the b3-th residual convolution layer in the a3-th Ghost residual block under the a2-th residual module; and the translation amount and the scaling quantity of the b4-th residual batch normalization layer in the a3-th Ghost residual block under the a2-th residual module;
wherein a1∈[1, N_J], a2∈[1, N_C] and a3∈[1, N_G]; N_J represents the number of dimension-reduction convolution modules in the feature extraction network, N_C represents the number of residual modules in the feature extraction network, and N_G represents the number of Ghost residual blocks in each residual module; b1 ranges over the number of dimension-reduction convolution layers in each dimension-reduction convolution module, b2 over the number of dimension-reduction batch normalization layers in each dimension-reduction convolution module, b3 over the number of residual convolution layers in each Ghost residual block, and b4 over the number of residual batch normalization layers in each Ghost residual block;
The input data of the feature extraction network is a single image in the train car abnormal fault recognition network training set of step 2, and the output data are a low-dimensional feature map Feat1 (M1×N1×C1), a medium-dimensional feature map Feat2 (M2×N2×C2) and a high-dimensional feature map Feat3 (M3×N3×C3);
In the output data of the feature extraction network, M1 is the width of the low-dimensional feature map Feat1, N1 is its height, and C1 is its number of channels; M2 is the width of the medium-dimensional feature map Feat2, N2 is its height, and C2 is its number of channels; M3 is the width of the high-dimensional feature map Feat3, N3 is its height, and C3 is its number of channels;
the first spatial feature fusion network: the first space convolution layer, the first space batch normalization layer and the maximum pooling module are sequentially stacked and cascaded;
the maximum pooling module is formed by connecting a first maximum pooling layer, a second maximum pooling layer, a third maximum pooling layer and a fourth maximum pooling layer in parallel;
the first spatial feature fusion network is defined as:
f_SPP(SPP_kernel_e, SPP_gamma_g, SPP_beta_g), e∈[1, N_SPP_conv], g∈[1, N_SPP_BN]
wherein N_SPP_conv represents the number of first spatial convolution layers in the first spatial feature fusion network and N_SPP_BN represents the number of first spatial batch normalization layers in the first spatial feature fusion network; SPP_kernel_e represents the parameters of the e-th first spatial convolution layer in the first spatial feature fusion network and is a parameter to be optimized; SPP_gamma_g represents the translation amount of the g-th first spatial batch normalization layer in the first spatial feature fusion network and is a parameter to be optimized; SPP_beta_g represents the scaling quantity of the g-th first spatial batch normalization layer in the first spatial feature fusion network and is a parameter to be optimized;
The input data of the first spatial feature fusion network is the high-dimensional feature map Feat3, and the output data is the spatial fusion feature map Feat4 (M4×N4×C4);
In the output data of the first spatial feature fusion network, M4 is the width of the spatial fusion feature map Feat4, N4 is its height, and C4 is its number of channels;
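A minimal sketch of the first spatial feature fusion network described above: a first spatial convolution layer and batch normalization followed by four parallel max-pooling layers whose outputs are concatenated. The pooling window sizes 1, 5, 9 and 13 follow the usual YOLOv4 SPP configuration and are assumptions here; with 512 convolution filters and four pooled branches, the concatenated output has 2048 channels, which is consistent with the Feat4 size (13×13×2048) given in the embodiment below.

```python
# SPP-style block sketch (window sizes and filter count are assumptions).
import tensorflow as tf

def spp_block(x, filters=512):
    x = tf.keras.layers.Conv2D(filters, 1, padding="same", use_bias=False)(x)  # first spatial convolution
    x = tf.keras.layers.BatchNormalization()(x)                                # first spatial batch normalization
    pools = [tf.keras.layers.MaxPooling2D(pool_size=k, strides=1, padding="same")(x)
             for k in (1, 5, 9, 13)]             # four parallel max-pooling branches
    return tf.keras.layers.Concatenate()(pools)  # channel-wise fusion of the branches
```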
the second spatial feature fusion network: the device consists of a second space convolution layer, a second space deconvolution layer, a second space batch normalization layer and a ReLU activation layer which are connected in a cross way;
the second spatial feature fusion network is defined as:
f_PAN(PAN_kernel_p, PAN_kernel_q, PAN_gamma_r, PAN_beta_r), p∈[1, N_PAN_conv], q∈[1, N_PAN_deconv], r∈[1, N_PAN_BN]
wherein N_PAN_conv represents the number of second spatial convolution layers in the second spatial feature fusion network, N_PAN_deconv represents the number of second spatial deconvolution layers in the second spatial feature fusion network, and N_PAN_BN represents the number of second spatial batch normalization layers in the second spatial feature fusion network; PAN_kernel_p represents the parameters of the p-th second spatial convolution layer in the second spatial feature fusion network and is a parameter to be optimized; PAN_kernel_q represents the parameters of the q-th second spatial deconvolution layer in the second spatial feature fusion network and is a parameter to be optimized; PAN_gamma_r represents the translation amount of the r-th second spatial batch normalization layer in the second spatial feature fusion network and is a parameter to be optimized; PAN_beta_r represents the scaling quantity of the r-th second spatial batch normalization layer in the second spatial feature fusion network and is a parameter to be optimized;
The input data of the second spatial feature fusion network are the low-dimensional feature map Feat1, the medium-dimensional feature map Feat2 and the spatial fusion feature map Feat4, and the output data are the first fused feature map Feat5 (M5×N5×C5), the second fused feature map Feat6 (M6×N6×C6) and the third fused feature map Feat7 (M7×N7×C7);
In the output data of the second spatial feature fusion network, M5 is the width of the first fused feature map Feat5, N5 is its height, and C5 is its number of channels; M6 is the width of the second fused feature map Feat6, N6 is its height, and C6 is its number of channels; M7 is the width of the third fused feature map Feat7, N7 is its height, and C7 is its number of channels;
the channel feature fusion network comprises: the average pooling layer, the full-connection layer, the ReLU activation layer and the Sigmoid activation layer are sequentially stacked and cascaded;
the channel feature fusion network is defined as:
f_SE(SE_kernel_z), z∈[1, N_SE]
wherein N_SE represents the number of fully connected layers in the channel feature fusion network; SE_kernel_z represents the parameters of the z-th fully connected layer in the channel feature fusion network and is a parameter to be optimized;
The input data of the channel feature fusion network are the low-dimensional feature map Feat1, the medium-dimensional feature map Feat2 and the high-dimensional feature map Feat3, and the output data are the first tensor Tensor1 (T×T1), the second tensor Tensor2 (T×T2) and the third tensor Tensor3 (T×T3);
In the output data of the channel feature fusion network, T is the number of rows of the first tensor Tensor1, the second tensor Tensor2 and the third tensor Tensor3, T1 is the number of columns of the first tensor Tensor1, T2 is the number of columns of the second tensor Tensor2, and T3 is the number of columns of the third tensor Tensor3;
the multi-scale prediction layer: sequentially stacking and cascading a prediction convolution layer, a prediction batch normalization layer and a ReLU activation layer;
the multi-scale prediction layer is defined as:
f_YO(YO_kernel_x, YO_gamma_y, YO_beta_y), x∈[1, N_YO_conv], y∈[1, N_YO_BN]
wherein N_YO_conv represents the number of prediction convolution layers in the multi-scale prediction layer and N_YO_BN represents the number of prediction batch normalization layers in the multi-scale prediction layer; YO_kernel_x represents the parameters of the x-th prediction convolution layer in the multi-scale prediction layer and is a parameter to be optimized; YO_gamma_y represents the translation amount of the y-th prediction batch normalization layer in the multi-scale prediction layer and is a parameter to be optimized; YO_beta_y represents the scaling quantity of the y-th prediction batch normalization layer in the multi-scale prediction layer and is a parameter to be optimized;
The input data of the multi-scale prediction layer are the first fused feature map Feat5, the second fused feature map Feat6 and the third fused feature map Feat7, and the output data are the first prediction feature map Feat8 (M8×N8×C8), the second prediction feature map Feat9 (M9×N9×C9) and the third prediction feature map Feat10 (M10×N10×C10);
In the output data of the multi-scale prediction layer, M8 is the width of the first prediction feature map Feat8, N8 is its height, and C8 is its number of channels; M9 is the width of the second prediction feature map Feat9, N9 is its height, and C9 is its number of channels; M10 is the width of the third prediction feature map Feat10, N10 is its height, and C10 is its number of channels;
In step 3, the train car abnormal fault recognition network loss function is constructed from a localization loss function, a confidence loss function and a classification loss function;
When a train car fault image is input into the train car abnormal fault recognition network for training, the image is divided into A×A grids, each grid is preset with B anchor boxes, and the network regresses the corresponding A×A×B prediction boxes, but not all prediction boxes participate in the computation of the loss function. When the center point of a fault marking box (box_{s,k}, label_{s,k,c}) in a car fault image train_s(m, n) falls in the i-th grid, the anchor box among the B anchor boxes of that grid that has the largest IoU with the fault marking box is selected to learn the characteristic information of the fault and is regarded as a positive sample, and the remaining B-1 anchor boxes are regarded as negative samples.
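A minimal sketch of this positive/negative assignment rule. Comparing anchors to the marking box by width and height only (shapes aligned at a common corner) is the usual YOLO-style matching and is an assumption here; the patent only states that the anchor with the largest IoU is selected.

```python
# Anchor assignment sketch (width/height IoU matching is an assumption).
import numpy as np

def assign_anchor(gt_box, anchors, grid_size, img_size):
    """gt_box = (x1, y1, x2, y2) in pixels; anchors = [(w, h), ...] in pixels."""
    cx = (gt_box[0] + gt_box[2]) / 2.0
    cy = (gt_box[1] + gt_box[3]) / 2.0
    col = int(cx / img_size * grid_size)          # grid cell containing the center point
    row = int(cy / img_size * grid_size)
    gw, gh = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
    ious = []
    for aw, ah in anchors:                        # IoU of the two shapes aligned at one corner
        inter = min(gw, aw) * min(gh, ah)
        union = gw * gh + aw * ah - inter
        ious.append(inter / union)
    best = int(np.argmax(ious))                   # positive anchor; the remaining B-1 are negatives
    return row, col, best
```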
The positioning loss function is:
L_loc = Σ_{i=1..A×A} Σ_{j=1..B} 1_{i,j}^obj · [1 − IoU_i + d_i²/l_i² + α_i·v_i]
with
IoU_i = |box ∩ p_box| / |box ∪ p_box|
v_i = (4/π²) · (arctan(w/h) − arctan(w_p/h_p))²
α_i = v_i / (1 − IoU_i + v_i)
wherein 1_{i,j}^obj indicates whether the j-th anchor box under the i-th grid is responsible for predicting a certain fault: it takes the value 1 if so and 0 otherwise; "responsible" means that, among all B anchor boxes under the i-th grid, the j-th anchor box has the largest IoU with the fault marking box; IoU_i is the intersection-over-union between the fault marking box (box_{s,k}, label_{s,k,c}) of the car fault image train_s(m, n) that falls in the i-th grid and the corresponding fault prediction box (p_box_{s,k}, p_label_{s,k,c}); d_i is the Euclidean distance between the center points of the fault marking box (box_{s,k}, label_{s,k,c}) in the i-th grid and the corresponding fault prediction box (p_box_{s,k}, p_label_{s,k,c}); l_i is the diagonal length of the smallest rectangle that can simultaneously cover the fault marking box (box_{s,k}, label_{s,k,c}) and the fault prediction box (p_box_{s,k}, p_label_{s,k,c}); v_i measures the consistency of the aspect ratios, with w and h the width and height of the fault marking box and w_p and h_p those of the fault prediction box; and α_i is the trade-off parameter. The localization loss L_loc therefore expresses that, when the k-th fault marking box (box_{s,k}, label_{s,k,c}) of an input image train_s(m, n) falls in the i-th grid and the j-th anchor box is responsible for predicting the fault, the fault prediction box (p_box_{s,k}, p_label_{s,k,c}) generated by that anchor box is used together with the fault marking box (box_{s,k}, label_{s,k,c}) to compute the localization loss.
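The terms above (IoU_i, center distance d_i, enclosing-rectangle diagonal l_i, aspect-ratio term v_i and trade-off parameter α_i) can be illustrated with the following minimal TensorFlow sketch, under the assumption that boxes are given as (x1, y1, x2, y2); it is an illustration of the loss terms, not the exact patented code.

```python
# CIoU-style localization term sketch (box format and epsilon values are assumptions).
import math
import tensorflow as tf

def ciou_loss(pred, target):
    px1, py1, px2, py2 = tf.unstack(pred, axis=-1)
    tx1, ty1, tx2, ty2 = tf.unstack(target, axis=-1)
    inter = tf.maximum(0.0, tf.minimum(px2, tx2) - tf.maximum(px1, tx1)) * \
            tf.maximum(0.0, tf.minimum(py2, ty2) - tf.maximum(py1, ty1))
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + 1e-9)
    # squared distance between box centers (d_i squared)
    d2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0
    # squared diagonal of the smallest rectangle covering both boxes (l_i squared)
    cw = tf.maximum(px2, tx2) - tf.minimum(px1, tx1)
    ch = tf.maximum(py2, ty2) - tf.minimum(py1, ty1)
    l2 = cw ** 2 + ch ** 2 + 1e-9
    # aspect-ratio consistency term v_i and trade-off parameter alpha_i
    v = (4.0 / math.pi ** 2) * tf.square(
        tf.atan((tx2 - tx1) / (ty2 - ty1 + 1e-9)) - tf.atan((px2 - px1) / (py2 - py1 + 1e-9)))
    alpha = v / (1.0 - iou + v + 1e-9)
    return 1.0 - iou + d2 / l2 + alpha * v       # summed over positive anchors in L_loc
```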
The confidence loss function is:
L_conf = −λ_obj · Σ_{i=1..A×A} Σ_{j=1..B} 1_{i,j}^obj · [Ĉ_i·log(C_i) + (1−Ĉ_i)·log(1−C_i)] − λ_noobj · Σ_{i=1..A×A} Σ_{j=1..B} 1_{i,j}^noobj · [Ĉ_i·log(C_i) + (1−Ĉ_i)·log(1−C_i)]
wherein 1_{i,j}^noobj indicates that the j-th anchor box of the i-th grid is not responsible for predicting the fault, i.e. in the i-th grid the IoU between the j-th anchor box and the fault marking box is not the largest among all B anchor boxes; λ_obj and λ_noobj respectively represent the weights used when an anchor box is responsible and not responsible for predicting a certain fault; Ĉ_i is the true value of the confidence, which takes 1 if the j-th anchor box of the i-th grid is responsible for predicting a certain fault and 0 otherwise; and C_i is the confidence of the prediction box output by the multi-scale prediction layer YOLO_head. The confidence loss L_conf therefore consists of the confidence loss of the prediction boxes that contain an object and the confidence loss of the prediction boxes that contain no object.
The classification loss function is:
L_cls = −Σ_{i=1..A×A} 1_i^obj · Σ_{c=1..C} P̂_i(c)·log(P_i(c))
wherein P̂_i(c) is the true value of the class probability: when the j-th anchor box under the i-th grid is responsible for predicting a certain fault (box_{s,k}, label_{s,k,c}), P̂_i is a one-hot matrix of dimension C×1 whose c-th element is 1 and whose remaining elements are 0; P_i(c) denotes the class probability of the prediction box output by the multi-scale prediction layer YOLO_head and is likewise a matrix of dimension C×1; the loss value L_cls between the two is computed by cross entropy.
The train compartment abnormal fault identification network loss function is as follows:
L = L_loc + L_conf + L_cls
wherein L_loc is the localization loss function, L_conf is the confidence loss function, and L_cls is the classification loss function.
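A minimal sketch of how the three terms could be combined in code. The binary cross-entropy confidence term weighted by λ_obj / λ_noobj and the cross-entropy classification term follow the definitions above; the weight values and the masking scheme are assumptions.

```python
# Combined loss sketch: L = L_loc + L_conf + L_cls (weights and masking are assumptions).
import tensorflow as tf

def total_loss(loc_loss, conf_true, conf_pred, obj_mask,
               cls_true, cls_pred, lambda_obj=1.0, lambda_noobj=0.5):
    eps = 1e-9
    # element-wise binary cross-entropy on the confidence of every anchor
    bce = -(conf_true * tf.math.log(conf_pred + eps)
            + (1.0 - conf_true) * tf.math.log(1.0 - conf_pred + eps))
    conf_loss = lambda_obj * tf.reduce_sum(obj_mask * bce) \
              + lambda_noobj * tf.reduce_sum((1.0 - obj_mask) * bce)
    # cross-entropy between one-hot class labels and predicted class probabilities,
    # evaluated only on anchors responsible for a fault
    cls_loss = -tf.reduce_sum(obj_mask[..., tf.newaxis] * cls_true * tf.math.log(cls_pred + eps))
    return loc_loss + conf_loss + cls_loss
```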
Preferably, the first prediction feature map of the image to be recognized in step 4 is Feat8;
the second prediction feature map of the image to be recognized in step 4 is Feat9;
the third prediction feature map of the image to be recognized in step 4 is Feat10;
The preliminary recognition result of the image to be recognized in step 4 comprises the probability that each prediction box belongs to the foreground, the coordinates of the prediction box and the class probabilities of the prediction box.
The probability that the prediction frame belongs to the foreground in the preliminary recognition result of the image to be recognized is defined as follows:
I_v∈[0, 1], v∈[1, N_Rs]
wherein N_Rs represents the number of preliminary recognition results of the image to be recognized, and I_v represents the probability that the prediction box in the v-th preliminary recognition result of the image to be recognized belongs to the foreground;
and the coordinates of a prediction frame in the preliminary identification result of the image to be identified are defined as:
p_box_v = ((px_v^{l,t}, py_v^{l,t}), (px_v^{r,b}, py_v^{r,b})), v∈[1, N_Rs]
wherein l denotes the left on the image to be recognized, t denotes the top, r denotes the right, and b denotes the bottom; (px_v^{l,t}, py_v^{l,t}) are the coordinates of the upper-left corner of the prediction box in the v-th preliminary recognition result of the image to be recognized, px_v^{l,t} being its abscissa and py_v^{l,t} its ordinate; (px_v^{r,b}, py_v^{r,b}) are the coordinates of the lower-right corner of the prediction box in the v-th preliminary recognition result of the image to be recognized, px_v^{r,b} being its abscissa and py_v^{r,b} its ordinate;
and the prediction frame category probability in the preliminary identification result of the image to be identified is defined as:
Pr_v = {pr_v^0, pr_v^1, pr_v^2, pr_v^3, pr_v^4, pr_v^5}, v∈[1, N_Rs]
wherein Pr_v represents the set of all six fault class probabilities in the v-th preliminary recognition result of the image to be recognized; pr_v^0 represents the probability that the v-th preliminary recognition result belongs to fault type 0; pr_v^1 the probability that it belongs to fault type 1; pr_v^2 the probability that it belongs to fault type 2; pr_v^3 the probability that it belongs to fault type 3; pr_v^4 the probability that it belongs to fault type 4; and pr_v^5 the probability that it belongs to fault type 5;
the preliminary identification result of the image to be identified is defined as:
R_first = {(I_v, p_box_v, Pr_v), v∈[1, N_Rs]}
wherein R_first represents the preliminary recognition result of the image to be recognized;
the final recognition result of the image to be recognized is defined as:
R_final = {((px_ε^{l,t}, py_ε^{l,t}), (px_ε^{r,b}, py_ε^{r,b}), Plabel_ε), ε∈[1, N_Re]}
wherein R_final represents the final recognition result of the image to be recognized and N_Re represents the number of final recognition results of the image to be recognized; (px_ε^{l,t}, py_ε^{l,t}) are the coordinates of the upper-left corner of the prediction box in the ε-th final recognition result of the image to be recognized; (px_ε^{r,b}, py_ε^{r,b}) are the coordinates of the lower-right corner of the prediction box in the ε-th final recognition result of the image to be recognized; and Plabel_ε indicates which fault type the ε-th final recognition result of the image to be recognized belongs to.
The invention has the following beneficial effects:
Problems likely to be encountered during actual recognition, such as environmental noise, ambient light that is too strong or too weak, and color deviation, are simulated by manually adjusting the original pictures. This improves the robustness of the recognition network and, at the same time, alleviates to a certain extent the over-fitting problem caused by having few training samples.
The Ghost-block fuses in a novel convolution computation mode: fewer convolution kernels are used to generate the primary feature maps, a linear transformation then produces additional phantom feature maps, and finally the two sets of feature maps are concatenated. In this way nearly half of the parameters in the feature extraction network are compressed, and the recognition speed of the network on the input image is accelerated without reducing the recognition accuracy.
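A minimal Ghost-module sketch matching this description: a standard convolution produces only part of the output channels, a cheap depthwise convolution generates the remaining phantom channels, and the two parts are concatenated. The kernel sizes and the 1:1 split ratio are common GhostNet defaults and are assumptions here.

```python
# Ghost module sketch (kernel sizes and split ratio are assumptions; out_channels assumed even).
import tensorflow as tf

def ghost_module(x, out_channels, kernel_size=1, dw_size=3):
    primary_ch = out_channels // 2
    primary = tf.keras.layers.Conv2D(primary_ch, kernel_size, padding="same",
                                     use_bias=False)(x)          # fewer kernels: primary features
    primary = tf.keras.layers.BatchNormalization()(primary)
    primary = tf.keras.layers.ReLU()(primary)
    ghost = tf.keras.layers.DepthwiseConv2D(dw_size, padding="same",
                                            use_bias=False)(primary)  # cheap linear transformation
    ghost = tf.keras.layers.BatchNormalization()(ghost)
    ghost = tf.keras.layers.ReLU()(ghost)
    return tf.keras.layers.Concatenate()([primary, ghost])       # splice the two parts
```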
The SE-block is introduced so that the network model automatically learns the importance of different channel features: instead of treating all channels equally when feature maps are combined, each channel is weighted, with the weights learned automatically by the SE-block. Although this slightly increases the execution time of the algorithm, it better alleviates the false-detection problem.
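A minimal squeeze-and-excitation sketch matching the channel feature fusion network described above: global average pooling, fully connected layers with ReLU and Sigmoid activations, and channel-wise re-weighting of the input feature map. The reduction ratio of 16 is a common default and an assumption here.

```python
# SE-block sketch (reduction ratio is an assumption).
import tensorflow as tf

def se_block(x, reduction=16):
    channels = x.shape[-1]
    w = tf.keras.layers.GlobalAveragePooling2D()(x)                  # squeeze: 1 x C channel descriptor
    w = tf.keras.layers.Dense(channels // reduction, activation="relu")(w)
    w = tf.keras.layers.Dense(channels, activation="sigmoid")(w)     # learned per-channel weights
    w = tf.keras.layers.Reshape((1, 1, channels))(w)
    return tf.keras.layers.Multiply()([x, w])                        # excitation: re-weight channels
```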
The improved recognition network can complete the recognition task accurately and quickly even on an industrial computer with a low hardware configuration and feed the recognition results back to workers through a display. This solves the problem of high identification cost in the prior art: staff no longer need to check the passing-train samples one by one and only need to confirm the faults in the images output by the network, which greatly improves operating efficiency.
Furthermore, the data acquisition equipment involved in the invention is simple and convenient to install and deploy, the freight train does not need to be stopped, and workers do not need to go to the site to operate; automatic identification of abnormalities and faults of freight train cars can be realized using only the passing-train samples stored on the server.
Drawings
FIG. 1: network structure diagram of the train car abnormal fault recognition network of the invention;
FIG. 2: flow chart of training the train car abnormal fault recognition network of the invention;
FIG. 3: execution flow chart of the train car abnormal fault recognition method of the invention;
FIG. 4: examples of abnormalities or faults that can be recognized by the train car abnormal fault recognition network of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following describes embodiments of the present invention with reference to fig. 1 to 4:
step 1: without stopping the freight train, use high-speed linear array cameras to capture a left, a right and a top high-resolution image of each car so as to construct a car high-resolution image data set; shrink each car high-resolution image to a suitable proportion by equal-proportion linear interpolation, cut it into four overlapping image blocks of the same size, screen out the car image samples containing faults from all the image blocks, and construct a car fault image data set from the car image samples containing faults;
step 1, the compartment fault image data set comprises:
{train_s(m, n), s∈[1, S], m∈[1, M], n∈[1, N]}
wherein train_s(m, n) represents the pixel information at the m-th row and n-th column of the s-th car fault image in the car fault image data set, S = 18031 is the number of all image samples in the car fault image data set, M = 416 is the number of rows of each fault image in the data set, and N = 416 is the number of columns of each fault image in the data set;
step 2: manually label the car fault marking boxes and fault types of each car fault image in the car fault image data set of step 1, count the number of car fault image samples for each fault type, and collect additional samples for any fault type whose number of image samples is below a sample number threshold until the number of car fault image samples of every fault type exceeds the threshold, so as to construct the train car abnormal fault recognition network training set;
step 2, the coordinates of the compartment fault marking frame of each compartment fault image in the compartment fault image data set are as follows:
box_{s,k} = ((x_{s,k}^{l,t}, y_{s,k}^{l,t}), (x_{s,k}^{r,b}, y_{s,k}^{r,b})), s∈[1, S], k∈[1, K_s]
wherein l denotes the left on the car fault image, t denotes the top, r denotes the right, and b denotes the bottom; S = 18031 denotes the number of all car fault images in the car fault image data set, and K_s represents the total number of car fault marking boxes in the s-th car fault image of the data set; box_{s,k} represents the coordinates of the k-th car fault marking box in the s-th car fault image of the data set; (x_{s,k}^{l,t}, y_{s,k}^{l,t}) are the coordinates of the upper-left corner of the k-th car fault marking box in the s-th car fault image, x_{s,k}^{l,t} being its abscissa and y_{s,k}^{l,t} its ordinate; (x_{s,k}^{r,b}, y_{s,k}^{r,b}) are the coordinates of the lower-right corner of the k-th car fault marking box in the s-th car fault image, x_{s,k}^{r,b} being its abscissa and y_{s,k}^{r,b} its ordinate;
step 2, the compartment fault marking frame category information of each compartment fault image in the compartment fault image data set is as follows:
label_{s,k,c}, s∈[1, S], k∈[1, K_s], c∈[1, C]
wherein C = 6 is the total number of fault types in the car fault image data set; label_{s,k,c} indicates that the k-th car fault marking box of the s-th car fault image in the car fault image data set belongs to the c-th fault type;
step 2, the training set of the train compartment abnormal fault recognition network is as follows:
{train_s(m, n), (box_{s,k}, label_{s,k,c})}
s∈[1, S], m∈[1, M], n∈[1, N], k∈[1, K_s], c∈[1, C]
wherein train_s(m, n) is the pixel information at the m-th row and n-th column of the s-th car fault image in the train car abnormal fault recognition network training set, box_{s,k} represents the coordinates of the k-th car fault marking box in the s-th car fault image of the training set, and label_{s,k,c} indicates that the k-th car fault marking box of the s-th car fault image in the training set belongs to the c-th fault type; S = 18031 represents the number of all image samples in the training set, M = 416 is the number of rows of each fault image in the training set, N = 416 is the number of columns of each fault image in the training set, K_s represents the total number of car fault marking boxes in the s-th car fault image of the training set, and C = 6 is the total number of fault types in the training set;
step 3: build the original YOLOv4 network using the deep learning framework TensorFlow, replace part of the convolution computation in the YOLOv4 feature extraction network with Ghost-blocks, and introduce an attention mechanism (SE-block) to construct the train car abnormal fault recognition network. The structure of the train car abnormal fault recognition network is shown in fig. 2. Take the train car abnormal fault recognition network training set of step 2 as input data and construct the train car abnormal fault recognition network loss function by combining the fault types of the car fault image samples in the training set. Train for 50000 iterations with a gradient descent algorithm to obtain the optimized train car abnormal fault recognition network. The training process of the train car abnormal fault recognition network is shown in fig. 1;
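A minimal training-loop sketch for this step. The optimizer variant, learning rate and batch size are assumptions, and the dataset is assumed to yield (image, target) pairs; loss_fn stands for the combined loss L = L_loc + L_conf + L_cls defined elsewhere in this description.

```python
# Training sketch: gradient descent for 50000 iterations (hyperparameters are assumptions).
import tensorflow as tf

def train(model, dataset, loss_fn, iterations=50000, lr=1e-3, batch_size=8):
    optimizer = tf.keras.optimizers.SGD(learning_rate=lr)
    it = iter(dataset.shuffle(1024).batch(batch_size).repeat())
    for step in range(iterations):
        images, targets = next(it)
        with tf.GradientTape() as tape:
            predictions = model(images, training=True)        # three prediction feature maps
            loss = loss_fn(predictions, targets)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        if step % 1000 == 0:
            tf.print("step", step, "loss", loss)
    return model
```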
In step 3, the train car abnormal fault identification network specifically comprises: a feature extraction network, a channel feature fusion network, a first spatial feature fusion network, a second spatial feature fusion network and a multi-scale prediction layer;
the channel feature fusion network is embedded in the feature extraction network as a sub-module; the feature extraction network is serially cascaded with the first spatial feature fusion network and then is connected with the second spatial feature fusion network in parallel; the second spatial feature fusion network is serially cascaded with the multi-scale prediction layer;
the feature extraction network: the dimensionality reduction convolution module and the residual error module are sequentially stacked and cascaded;
the dimension reduction convolution module is formed by sequentially stacking and cascading a dimension reduction convolution layer, a dimension reduction batch normalization layer and a Leaky ReLU activation layer;
the residual module is formed by sequentially stacking and cascading a plurality of Ghost residual blocks;
the Ghost residual block is composed of a residual convolution layer, a residual batch normalization layer and a ReLU activation layer according to the stacking mode of the traditional residual block;
the feature extraction network is defined as:
the sequential stack of dimension-reduction convolution modules and residual modules described above, parameterized by the following quantities to be optimized: the parameters of the b1-th dimension-reduction convolution layer in the a1-th dimension-reduction convolution module; the translation amount and the scaling quantity of the b2-th dimension-reduction batch normalization layer in the a1-th dimension-reduction convolution module; the parameters of the b3-th residual convolution layer in the a3-th Ghost residual block under the a2-th residual module; and the translation amount and the scaling quantity of the b4-th residual batch normalization layer in the a3-th Ghost residual block under the a2-th residual module;
wherein a1∈[1, N_J], a2∈[1, N_C] and a3∈[1, N_G]; N_J represents the number of dimension-reduction convolution modules in the feature extraction network, N_C represents the number of residual modules in the feature extraction network, and N_G represents the number of Ghost residual blocks in each residual module; b1 ranges over the number of dimension-reduction convolution layers in each dimension-reduction convolution module, b2 over the number of dimension-reduction batch normalization layers in each dimension-reduction convolution module, b3 over the number of residual convolution layers in each Ghost residual block, and b4 over the number of residual batch normalization layers in each Ghost residual block;
The input data of the feature extraction network is a single image in the train car abnormal fault recognition network training set of step 2, and the output data are a low-dimensional feature map Feat1 (M1×N1×C1), a medium-dimensional feature map Feat2 (M2×N2×C2) and a high-dimensional feature map Feat3 (M3×N3×C3);
In the output data of the feature extraction network, M1 = 52 is the width of the low-dimensional feature map Feat1, N1 = 52 is its height, and C1 = 256 is its number of channels; M2 = 26 is the width of the medium-dimensional feature map Feat2, N2 = 26 is its height, and C2 = 512 is its number of channels; M3 = 13 is the width of the high-dimensional feature map Feat3, N3 = 13 is its height, and C3 = 1024 is its number of channels;
the first spatial feature fusion network: the first space convolution layer, the first space batch normalization layer and the maximum pooling module are sequentially stacked and cascaded;
the maximum pooling module is formed by connecting a first maximum pooling layer, a second maximum pooling layer, a third maximum pooling layer and a fourth maximum pooling layer in parallel;
the first spatial feature fusion network is defined as:
f_SPP(SPP_kernel_e, SPP_gamma_g, SPP_beta_g), e∈[1, N_SPP_conv], g∈[1, N_SPP_BN]
wherein N_SPP_conv represents the number of first spatial convolution layers in the first spatial feature fusion network and N_SPP_BN represents the number of first spatial batch normalization layers in the first spatial feature fusion network; SPP_kernel_e represents the parameters of the e-th first spatial convolution layer in the first spatial feature fusion network and is a parameter to be optimized; SPP_gamma_g represents the translation amount of the g-th first spatial batch normalization layer in the first spatial feature fusion network and is a parameter to be optimized; SPP_beta_g represents the scaling quantity of the g-th first spatial batch normalization layer in the first spatial feature fusion network and is a parameter to be optimized;
The input data of the first spatial feature fusion network is the high-dimensional feature map Feat3, and the output data is the spatial fusion feature map Feat4 (M4×N4×C4);
In the output data of the first spatial feature fusion network, M4 = 13 is the width of the spatial fusion feature map Feat4, N4 = 13 is its height, and C4 = 2048 is its number of channels;
the second spatial feature fusion network: the device consists of a second space convolution layer, a second space deconvolution layer, a second space batch normalization layer and a ReLU activation layer which are connected in a cross way;
the second spatial feature fusion network is defined as:
f_PAN(PAN_kernel_p, PAN_kernel_q, PAN_gamma_r, PAN_beta_r), p∈[1, N_PAN_conv], q∈[1, N_PAN_deconv], r∈[1, N_PAN_BN]
wherein N_PAN_conv represents the number of second spatial convolution layers in the second spatial feature fusion network, N_PAN_deconv represents the number of second spatial deconvolution layers in the second spatial feature fusion network, and N_PAN_BN represents the number of second spatial batch normalization layers in the second spatial feature fusion network; PAN_kernel_p represents the parameters of the p-th second spatial convolution layer in the second spatial feature fusion network and is a parameter to be optimized; PAN_kernel_q represents the parameters of the q-th second spatial deconvolution layer in the second spatial feature fusion network and is a parameter to be optimized; PAN_gamma_r represents the translation amount of the r-th second spatial batch normalization layer in the second spatial feature fusion network and is a parameter to be optimized; PAN_beta_r represents the scaling quantity of the r-th second spatial batch normalization layer in the second spatial feature fusion network and is a parameter to be optimized;
The input data of the second spatial feature fusion network are the low-dimensional feature map Feat1, the medium-dimensional feature map Feat2 and the spatial fusion feature map Feat4, and the output data are the first fused feature map Feat5 (M5×N5×C5), the second fused feature map Feat6 (M6×N6×C6) and the third fused feature map Feat7 (M7×N7×C7);
In the output data of the second spatial feature fusion network, M5 = 52 is the width of the first fused feature map Feat5, N5 = 52 is its height, and C5 = 128 is its number of channels; M6 = 26 is the width of the second fused feature map Feat6, N6 = 26 is its height, and C6 = 256 is its number of channels; M7 = 13 is the width of the third fused feature map Feat7, N7 = 13 is its height, and C7 = 512 is its number of channels;
the channel feature fusion network comprises: the average pooling layer, the full-connection layer, the ReLU activation layer and the Sigmoid activation layer are sequentially stacked and cascaded;
the channel feature fusion network is defined as:
f_SE(SE_kernel_z), z∈[1, N_SE]
wherein N_SE represents the number of fully connected layers in the channel feature fusion network; SE_kernel_z represents the parameters of the z-th fully connected layer in the channel feature fusion network and is a parameter to be optimized;
The input data of the channel feature fusion network are the low-dimensional feature map Feat1, the medium-dimensional feature map Feat2 and the high-dimensional feature map Feat3, and the output data are the first tensor Tensor1 (T×T1), the second tensor Tensor2 (T×T2) and the third tensor Tensor3 (T×T3);
In the output data of the channel feature fusion network, T = 1 is the number of rows of the first tensor Tensor1, the second tensor Tensor2 and the third tensor Tensor3, T1 = 256 is the number of columns of the first tensor Tensor1, T2 = 512 is the number of columns of the second tensor Tensor2, and T3 = 1024 is the number of columns of the third tensor Tensor3;
the multi-scale prediction layer: sequentially stacking and cascading a prediction convolution layer, a prediction batch normalization layer and a ReLU activation layer;
the multi-scale prediction layer is defined as:
Figure BDA0002815286210000181
wherein the content of the first and second substances,
Figure BDA0002815286210000182
indicating the number of predicted convolutional layers in the multi-scale prediction layer,
Figure BDA0002815286210000183
representing the number of layers of a prediction batch normalization layer in the multi-scale prediction layer; YO _ kernelxRepresenting the parameter of the xth predicted convolutional layer in the multi-scale predicted layer, which is the parameter to be optimized; YO _ gammayRepresenting the translation amount of the ith prediction batch normalization layer in the multi-scale prediction layer, wherein the translation amount is a parameter to be optimized; YO _ betayRepresenting the zoom quantity of the ith prediction batch normalization layer in the multi-scale prediction layer as a parameter to be optimized;
the input data of the multi-scale prediction layer are the first fused feature map Feat5, the second fused feature map Feat6, and the third fused feature map Feat7, and the output data are the first prediction feature map Feat8 (M8 × N8 × C8), the second prediction feature map Feat9 (M9 × N9 × C9), and the third prediction feature map Feat10 (M10 × N10 × C10);
In the output data of the multi-scale prediction layer, M8 = 52 is the width of the first prediction feature map Feat8, N8 = 52 is the height of the first prediction feature map Feat8, and C8 = 33 is the number of channels of the first prediction feature map Feat8; M9 = 26 is the width of the second prediction feature map Feat9, N9 = 26 is the height of the second prediction feature map Feat9, and C9 = 33 is the number of channels of the second prediction feature map Feat9; M10 = 13 is the width of the third prediction feature map Feat10, N10 = 13 is the height of the third prediction feature map Feat10, and C10 = 33 is the number of channels of the third prediction feature map Feat10;
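The 33 output channels of each prediction feature map are consistent with B = 3 anchor boxes times (4 box coordinates + 1 confidence + 6 fault classes). A sketch of one such prediction branch is given below; the hidden width of the 3 x 3 convolution and the helper name yolo_head are assumptions, and only the layer order (prediction convolution layer, prediction batch normalization layer, ReLU) and the 33-channel output follow the text.

import torch.nn as nn

NUM_ANCHORS, NUM_CLASSES = 3, 6                     # B = 3 anchor boxes, 6 fault types
OUT_CHANNELS = NUM_ANCHORS * (5 + NUM_CLASSES)      # 3 x (4 coords + 1 confidence + 6 classes) = 33

def yolo_head(in_channels, hidden=256):
    # prediction branch applied to one fused feature map (Feat5, Feat6 or Feat7)
    return nn.Sequential(
        nn.Conv2d(in_channels, hidden, 3, padding=1, bias=False),  # prediction convolution layer
        nn.BatchNorm2d(hidden),                                    # prediction batch normalization layer
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden, OUT_CHANNELS, 1),                        # 33-channel prediction map
    )

Applying such a branch to Feat5, Feat6 and Feat7 yields prediction maps of sizes 52 x 52 x 33, 26 x 26 x 33 and 13 x 13 x 33, matching Feat8, Feat9 and Feat10 above.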
step 3, constructing the train compartment abnormal fault recognition network loss function through a positioning loss function, a confidence loss function and a classification loss function;
when the train car fault image is input into the train car abnormal fault recognition network for training, the image is divided into A × A grids (A = 52, 26 and 13), each grid is preset with B = 3 anchor boxes, and the network regresses A × A × B corresponding prediction frames, but not all of the prediction frames participate in the calculation of the loss function. When the center point of a fault marking frame (boxs,k, labels,k,c) in a carriage fault image trains(m, n) falls in the i-th grid, the one of the B anchor boxes with the largest IOU with that fault marking frame is selected to learn the characteristic information of the fault and is regarded as a positive sample, and the remaining B - 1 anchor boxes are regarded as negative samples.
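A small sketch of this positive and negative sample assignment is given below; the helper name assign_anchor, the argument layout and the shape-only IOU comparison against the anchor priors are illustrative assumptions, while the rule itself (the anchor box with the largest IOU with the fault marking frame becomes the positive sample) follows the paragraph above.

def assign_anchor(gt_box, anchors_wh, grid_size, img_size):
    # gt_box: fault marking frame (x1, y1, x2, y2) in pixels
    # anchors_wh: list of B (width, height) anchor priors for this scale (placeholder values)
    x1, y1, x2, y2 = gt_box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0          # center point of the marking frame
    cell = img_size / float(grid_size)
    col, row = int(cx // cell), int(cy // cell)        # the grid cell that contains the center

    gw, gh = x2 - x1, y2 - y1
    ious = []
    for aw, ah in anchors_wh:                          # compare shapes as if both boxes share a center
        inter = min(gw, aw) * min(gh, ah)
        union = gw * gh + aw * ah - inter
        ious.append(inter / union)
    best_j = max(range(len(ious)), key=ious.__getitem__)   # positive anchor; the others are negatives
    return col, row, best_j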
The positioning loss function is:
Lloc = Σi=1..A×A Σj=1..B 1ij^obj · [1 - IoUi + di²/li² + αi·vi]
and
IoUi = |boxs,k ∩ p_boxs,k| / |boxs,k ∪ p_boxs,k|
vi = (4/π²) · (arctan(wi/hi) - arctan(wi^p/hi^p))²
αi = vi / (1 - IoUi + vi)
wherein 1ij^obj indicates whether the j-th anchor box under the i-th grid is responsible for predicting a certain fault; if so its value is 1, otherwise it is 0; the so-called "responsible" means that, among all B anchor boxes under the i-th grid, the IOU between the j-th anchor box and the marking frame of that fault is the largest; IoUi is the intersection-over-union between the fault marking frame (boxs,k, labels,k,c) of the carriage fault image trains(m, n) falling in the i-th grid and the corresponding fault prediction frame (p_boxs,k, p_labels,k,c); di is the Euclidean distance between the two center points of the fault marking frame (boxs,k, labels,k,c) and the corresponding fault prediction frame (p_boxs,k, p_labels,k,c); li is the diagonal distance of the smallest rectangle that can simultaneously cover the fault marking frame (boxs,k, labels,k,c) and the fault prediction frame (p_boxs,k, p_labels,k,c); vi measures the consistency of the aspect ratios, with wi and hi the width and height of the fault marking frame and wi^p and hi^p those of the fault prediction frame; αi is the trade-off parameter. The positioning loss Lloc therefore indicates that, when the k-th fault marking frame (boxs,k, labels,k,c) of an input image trains(m, n) falls in the i-th grid and the j-th anchor box is responsible for predicting the fault, the fault prediction frame (p_boxs,k, p_labels,k,c) generated by that anchor box and the fault marking frame (boxs,k, labels,k,c) are used together to calculate the positioning loss.
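The quantities above (IoUi, the center distance di, the enclosing-box diagonal li, the aspect term vi and the trade-off αi) correspond to a CIoU-style regression loss. A minimal PyTorch sketch under that reading is given below; the function name and the (x1, y1, x2, y2) box layout are assumptions.

import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    # pred, target: (N, 4) tensors of matched prediction frames and fault marking frames
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)

    inter = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(0) * \
            (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(0)
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # squared center distance d^2 and squared diagonal l^2 of the smallest covering rectangle
    d2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)
    l2 = cw ** 2 + ch ** 2 + eps

    # aspect-ratio consistency v and trade-off parameter alpha
    v = (4 / math.pi ** 2) * (torch.atan((tx2 - tx1) / (ty2 - ty1 + eps)) -
                              torch.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return (1 - iou + d2 / l2 + alpha * v).mean()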
The confidence loss function is:
Lconf = - Σi=1..A×A Σj=1..B 1ij^obj · λobj · [Ĉi·ln(Ci) + (1 - Ĉi)·ln(1 - Ci)] - Σi=1..A×A Σj=1..B 1ij^noobj · λnoobj · [Ĉi·ln(Ci) + (1 - Ĉi)·ln(1 - Ci)]
and
1ij^noobj = 1 - 1ij^obj
wherein 1ij^noobj indicates that the j-th anchor box of the i-th grid is not responsible for predicting the fault, i.e. in the i-th grid the IOU between the j-th anchor box and the fault marking frame is not the largest among all B anchor boxes; λobj and λnoobj respectively denote the weights used when the anchor box is responsible and not responsible for predicting a certain fault; Ĉi is the true value of the confidence, which takes 1 if the j-th anchor box of the i-th grid is responsible for predicting a certain fault and 0 otherwise; Ci is the confidence of the prediction frame output by the multi-scale prediction layer YOLO_head. The confidence loss Lconf therefore consists of the confidence loss of the prediction frames in which an object exists and the confidence loss of the prediction frames in which no object exists.
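A compact sketch of this two-part confidence loss is given below, reading it as binary cross entropy between the predicted and true confidences; the function signature, the mask representation and the example values of λobj and λnoobj are assumptions, since their values are not fixed in the text.

import torch.nn.functional as F

def confidence_loss(pred_conf, obj_mask, noobj_mask, lambda_obj=1.0, lambda_noobj=0.5):
    # pred_conf: confidences of all prediction frames, values in [0, 1]
    # obj_mask: anchors responsible for a fault; noobj_mask: the remaining anchors
    target = obj_mask.float()                               # true confidence is 1 or 0
    bce = F.binary_cross_entropy(pred_conf, target, reduction="none")
    return lambda_obj * bce[obj_mask].sum() + lambda_noobj * bce[noobj_mask].sum()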
The classification loss function is:
Lcls = - Σi=1..A×A Σj=1..B 1ij^obj · Σc=1..C [P̂ij(c)·ln(Pij(c))]
wherein P̂ij is the true value of the class probability: when the j-th anchor box under the i-th grid is responsible for predicting a certain fault (boxs,k, labels,k,c), P̂ij is a one-hot matrix of dimension C × 1 whose c-th dimension is 1 and whose remaining dimensions are 0; Pij is the class probability of the prediction frame output by the multi-scale prediction layer YOLO_head, also a matrix of dimension C × 1; the loss value Lcls between the two is calculated with the cross entropy.
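The cross-entropy computation described above can be sketched as follows; the function name, tensor shapes and masking convention are assumptions, while restricting the loss to anchors responsible for a fault and comparing against a C x 1 one-hot target follow the text.

import torch

def classification_loss(pred_cls, target_onehot, obj_mask, eps=1e-7):
    # pred_cls, target_onehot: (num_anchors, C) class probabilities and one-hot ground truth
    p = pred_cls[obj_mask].clamp(eps, 1.0)        # only anchors responsible for a fault contribute
    t = target_onehot[obj_mask]
    return -(t * torch.log(p)).sum(dim=1).mean()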
The train compartment abnormal fault identification network loss function is as follows:
L=Lloc+Lconf+Lcls
wherein L islocFor the localization loss function, LconfAs a function of confidence loss, LclsIs a classification loss function.
Step 4: inputting the image to be recognized into the optimized train compartment abnormal fault recognition network, predicting the first prediction feature map, the second prediction feature map and the third prediction feature map of the image to be recognized, splicing the three prediction feature maps to obtain the preliminary recognition result of the image to be recognized, and then performing confidence screening, non-maximum suppression and similar operations to obtain the final recognition result. Finally, the recognition result is stored as an image and a log entry and awaits confirmation of the fault by the staff. The execution flow of the train car abnormal fault identification method is shown in fig. 3, and examples of the abnormalities or faults to be identified by the recognition network are shown in fig. 4.
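The splicing of the three prediction feature maps into one list of candidate frames can be sketched as follows; the (batch, channels, height, width) tensor layout and the omission of anchor-based box decoding are simplifying assumptions.

import torch

def splice_predictions(feat8, feat9, feat10, num_anchors=3, num_classes=6):
    # flatten the A x A x 33 maps (A = 52, 26, 13) into rows of (4 coords + 1 confidence + C classes)
    rows = []
    for feat in (feat8, feat9, feat10):
        b, c, h, w = feat.shape
        feat = feat.view(b, num_anchors, 5 + num_classes, h, w)
        feat = feat.permute(0, 1, 3, 4, 2).reshape(b, -1, 5 + num_classes)
        rows.append(feat)
    return torch.cat(rows, dim=1)      # all candidate prediction frames before screening and NMS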
Step 4, the first prediction characteristic map of the image to be recognized is Feat 8;
step 4, the second prediction characteristic map of the image to be recognized is Feat 9;
step 4, the third prediction characteristic map of the image to be recognized is Feat 10;
and 4, the preliminary identification result of the image to be identified comprises the probability that the prediction frame belongs to the foreground, the coordinate of the prediction frame and the class probability of the prediction frame.
The probability that the prediction frame belongs to the foreground in the preliminary recognition result of the image to be recognized is defined as follows:
Iv ∈ [0,1], v ∈ [1, NRs]
wherein NRs represents the number of preliminary recognition results of the image to be recognized, and Iv represents the probability that the prediction frame in the v-th preliminary recognition result of the image to be recognized belongs to the foreground;
and the coordinates of a prediction frame in the preliminary identification result of the image to be identified are defined as:
p_boxv = (p_boxv^lt, p_boxv^rb), v ∈ [1, NRs]
p_boxv^lt = (p_xv^lt, p_yv^lt), p_boxv^rb = (p_xv^rb, p_yv^rb)
wherein l represents the left of the image to be recognized, t represents the top of the image to be recognized, r represents the right of the image to be recognized, and b represents the bottom of the image to be recognized; p_boxv^lt represents the coordinates of the upper left corner of the prediction frame in the v-th preliminary recognition result of the image to be recognized, p_xv^lt represents the abscissa of the upper left corner of that prediction frame, and p_yv^lt represents the ordinate of the upper left corner of that prediction frame; p_boxv^rb represents the coordinates of the lower right corner of the prediction frame in the v-th preliminary recognition result of the image to be recognized, p_xv^rb represents the abscissa of the lower right corner of that prediction frame, and p_yv^rb represents the ordinate of the lower right corner of that prediction frame;
and the prediction frame category probability in the preliminary identification result of the image to be identified is defined as:
Prv = {Prv^0, Prv^1, Prv^2, Prv^3, Prv^4, Prv^5}
wherein Prv represents the set of all six fault class probabilities in the v-th preliminary recognition result of the image to be recognized, and Prv^c (c = 0, 1, 2, 3, 4, 5) represents the probability that the v-th preliminary recognition result of the image to be recognized belongs to the c-th class of fault;
the preliminary identification result of the image to be identified is defined as:
Rfirst = {(Iv, p_boxv^lt, p_boxv^rb, Prv), v ∈ [1, NRs]}
wherein Rfirst represents the preliminary recognition result of the image to be recognized;
The final recognition result of the image to be recognized is defined as:
Rfinal = {(p_boxε^lt, p_boxε^rb, Plabelε), ε ∈ [1, NRe]}
wherein Rfinal represents the final recognition result of the image to be recognized, and NRe represents the number of final recognition results of the image to be recognized; p_boxε^lt represents the coordinates of the upper left corner of the prediction frame in the ε-th final recognition result of the image to be recognized; p_boxε^rb represents the coordinates of the lower right corner of the prediction frame in the ε-th final recognition result of the image to be recognized; Plabelε indicates the fault class to which the ε-th final recognition result of the image to be recognized belongs.
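Confidence screening followed by non-maximum suppression, which turns the preliminary recognition result Rfirst into the final recognition result Rfinal, can be sketched as follows; the threshold values and the use of torchvision's nms operator are assumptions and not values fixed by the text.

import torch
from torchvision.ops import nms

def postprocess(boxes, fg_prob, cls_prob, conf_thresh=0.5, iou_thresh=0.45):
    # boxes: (N, 4) corner coordinates; fg_prob: (N,) foreground probabilities Iv; cls_prob: (N, C) class probabilities Prv
    scores, labels = (fg_prob.unsqueeze(1) * cls_prob).max(dim=1)   # best class score per prediction frame
    keep = scores > conf_thresh                                     # confidence screening
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    final = []
    for c in labels.unique():                                       # class-wise non-maximum suppression
        idx = (labels == c).nonzero(as_tuple=True)[0]
        kept = nms(boxes[idx], scores[idx], iou_thresh)
        for i in idx[kept]:
            final.append((boxes[i, :2].tolist(), boxes[i, 2:].tolist(), int(labels[i])))
    return final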
The invention provides an automatic identification method for abnormalities and faults of freight train carriages at railway stations, which accesses remote pictures by parsing a command file, identifies the samples of a passing train, and completes functions such as storing fault images and generating log files. In on-site tests, the recognition algorithm after training optimization is greatly improved in both precision and recall compared with the unoptimized recognition algorithm.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (5)

1. An automatic identification method for abnormal faults of freight train carriages at a station is characterized by comprising the following steps:
step 1: respectively shooting a left high-resolution image, a right high-resolution image and a top high-resolution image of each carriage by using a high-speed linear array camera under the condition that a freight train does not stop to construct a carriage high-resolution image data set, reducing the carriage high-resolution image to a proper proportion by using a linear interpolation method in an equal proportion, cutting the carriage high-resolution image into four overlapped image blocks with the same size, screening out carriage image samples containing faults from all the image blocks, and constructing a carriage fault image data set by using the carriage image samples containing the faults;
step 2: manually labeling a carriage fault marking frame and fault types of each carriage fault image in the carriage fault image data set in the step 1, respectively counting the number of carriage fault image samples of each fault type, and collecting fault types of which the number of the image samples is less than a sample number threshold value until the number of the carriage fault image samples of each fault type is greater than the sample number threshold value so as to construct a train carriage abnormal fault identification network training set;
and step 3: constructing a train carriage abnormal fault recognition network, taking the train carriage abnormal fault recognition network training set in the step 2 as input data, constructing a train carriage abnormal fault recognition network loss function by combining the fault types of the carriage fault image samples in the train carriage abnormal fault recognition network training set, and obtaining the optimized train carriage abnormal fault recognition network through gradient descent algorithm training;
and 4, step 4: inputting the image to be recognized into the optimized train compartment abnormal fault recognition network, predicting to obtain a first prediction characteristic diagram, a second prediction characteristic diagram and a third prediction characteristic diagram of the image to be recognized, splicing the first prediction characteristic diagram, the second prediction characteristic diagram and the third prediction characteristic diagram of the image to be recognized to obtain a primary recognition result of the image to be recognized, and performing operations such as confidence screening, non-maximum value suppression and the like to obtain a final recognition result.
2. The automatic identification method for the abnormal fault of the freight train car at the station as claimed in claim 1, characterized in that:
step 1, the compartment fault image data set comprises:
{trains(m,n),s∈[1,S],m∈[1,M],n∈[1,N]}
wherein trains(m, n) represents the pixel information of the m-th row and the n-th column of the s-th carriage fault image in the carriage fault image data set, S represents the number of all image samples in the carriage fault image data set, M is the number of rows of each fault image in the carriage fault image data set, and N is the number of columns of each fault image in the carriage fault image data set.
3. The automatic identification method for the abnormal fault of the freight train car at the station as claimed in claim 1, characterized in that:
step 2, the coordinates of the compartment fault marking frame of each compartment fault image in the compartment fault image data set are as follows:
boxs,k = (boxs,k^lt, boxs,k^rb), s ∈ [1, S], k ∈ [1, Ks]
boxs,k^lt = (xs,k^lt, ys,k^lt)
boxs,k^rb = (xs,k^rb, ys,k^rb)
where l denotes the left of the carriage fault image, t denotes the top of the carriage fault image, r denotes the right of the carriage fault image, and b denotes the bottom of the carriage fault image; S represents the number of all carriage fault images in the carriage fault image data set, and Ks represents the total number of carriage fault marking frames in the s-th carriage fault image of the carriage fault image data set; boxs,k represents the coordinates of the k-th carriage fault marking frame in the s-th carriage fault image of the carriage fault image data set; boxs,k^lt represents the coordinates of the upper left corner of the k-th carriage fault marking frame in the s-th carriage fault image of the carriage fault image data set, xs,k^lt represents the abscissa of that upper left corner, and ys,k^lt represents the ordinate of that upper left corner; boxs,k^rb represents the coordinates of the lower right corner of the k-th carriage fault marking frame in the s-th carriage fault image of the carriage fault image data set, xs,k^rb represents the abscissa of that lower right corner, and ys,k^rb represents the ordinate of that lower right corner;
step 2, the compartment fault marking frame category information of each compartment fault image in the compartment fault image data set is as follows:
labels,k,c,s∈[1,S],k∈[1,Ks],c∈[1,C]
wherein C is the total number of fault types in the carriage fault image data set; labels,k,c indicates that the k-th carriage fault marking frame of the s-th carriage fault image in the carriage fault image data set belongs to the c-th fault type;
step 2, the training set of the train compartment abnormal fault recognition network is as follows:
{trains(m,n),(boxs,k,labels,k,c)}
s∈[1,S],m∈[1,M],n∈[1,N],k∈[1,Ks],c∈[1,C]
wherein trains(m, n) represents the pixel information of the m-th row and the n-th column of the s-th carriage fault image in the train carriage abnormal fault recognition network training set, boxs,k represents the coordinates of the k-th carriage fault marking frame in the s-th carriage fault image of the train carriage abnormal fault recognition network training set, and labels,k,c indicates that the k-th carriage fault marking frame of the s-th carriage fault image in the train carriage abnormal fault recognition network training set belongs to the c-th fault type; S represents the number of all image samples in the train carriage abnormal fault recognition network training set, M is the number of rows of each fault image in the train carriage abnormal fault recognition network training set, N is the number of columns of each fault image in the train carriage abnormal fault recognition network training set, Ks is the total number of carriage fault marking frames in the s-th carriage fault image, and C is the total number of fault types in the train carriage abnormal fault recognition network training set.
4. The automatic identification method for the abnormal fault of the freight train car at the station as claimed in claim 1, characterized in that:
and 3, the train compartment abnormal fault identification network specifically comprises: the system comprises a feature extraction network, a channel feature fusion network, a first spatial feature fusion network, a second spatial feature fusion network and a multi-scale prediction layer;
the channel feature fusion network is embedded in the feature extraction network as a sub-module; the feature extraction network is serially cascaded with the first spatial feature fusion network and then is connected with the second spatial feature fusion network in parallel; the second spatial feature fusion network is serially cascaded with the multi-scale prediction layer;
the feature extraction network: the dimensionality reduction convolution module and the residual error module are sequentially stacked and cascaded;
the dimension reduction convolution module is formed by sequentially stacking and cascading a dimension reduction convolution layer, a dimension reduction batch normalization layer and a Leaky ReLU activation layer;
the residual module is formed by sequentially stacking and cascading a plurality of Ghost residual blocks;
the Ghost residual block is composed of a residual convolution layer, a residual batch normalization layer and a ReLU activation layer according to the stacking mode of the traditional residual block;
the feature extraction network is defined as:
Figure FDA0002815286200000031
a1∈[1,NJ],a2∈[1,NC],a3∈[1,NG]
Figure FDA0002815286200000032
wherein NJ represents the number of dimension-reduction convolution modules in the feature extraction network, NC represents the number of residual modules in the feature extraction network, and NG represents the number of Ghost residual blocks in each residual module of the feature extraction network,
Figure FDA0002815286200000033
represents the number of layers of the dimensionality reduction convolution layer in each dimensionality reduction convolution module,
Figure FDA0002815286200000034
representing the number of layers of the dimensionality reduction batch normalization layer in each dimensionality reduction convolution module,
Figure FDA0002815286200000035
indicates the number of layers of the residual convolutional layer in each Ghost residual block,
Figure FDA0002815286200000036
representing the number of layers of a residual error batch normalization layer in each Ghost residual block;
Figure FDA0002815286200000037
representing the parameters in the b1 dimension reduction convolution layer in the a1 dimension reduction convolution module as the parameters to be optimized;
Figure FDA0002815286200000038
representing the translation amount of a b2 dimension reduction batch normalization layer in an a1 dimension reduction convolution module as a parameter to be optimized;
Figure FDA0002815286200000039
representing the scaling quantity of a b2 dimension reduction batch normalization layer in an a1 dimension reduction convolution module as a parameter to be optimized;
Figure FDA0002815286200000041
representing the parameters in the b3 th residual convolution layer in the a3 th Ghost residual block under the a2 th residual module as the parameters to be optimized;
Figure FDA0002815286200000042
representing the translation amount of a b4 th residual error batch normalization layer in a3 th Ghost residual error block under an a2 th residual error module, wherein the translation amount is a parameter to be optimized;
Figure FDA0002815286200000043
representing the scaling quantity of a b4 th residual error batch normalization layer in a3 th Ghost residual error block under an a2 th residual error module as a parameter to be optimized;
the input data of the feature extraction network is a single image in the train carriage abnormal fault recognition network training set of step 2, and the output data are the low-dimensional feature map Feat1 (M1 × N1 × C1), the medium-dimensional feature map Feat2 (M2 × N2 × C2), and the high-dimensional feature map Feat3 (M3 × N3 × C3);
In the output data of the feature extraction network, M1 is the width of the low-dimensional feature map Feat1, N1 is the height of the low-dimensional feature map Feat1, and C1 is the number of channels of the low-dimensional feature map Feat1; M2 is the width of the medium-dimensional feature map Feat2, N2 is the height of the medium-dimensional feature map Feat2, and C2 is the number of channels of the medium-dimensional feature map Feat2; M3 is the width of the high-dimensional feature map Feat3, N3 is the height of the high-dimensional feature map Feat3, and C3 is the number of channels of the high-dimensional feature map Feat3;
the first spatial feature fusion network: the first space convolution layer, the first space batch normalization layer and the maximum pooling module are sequentially stacked and cascaded;
the maximum pooling module is formed by connecting a first maximum pooling layer, a second maximum pooling layer, a third maximum pooling layer and a fourth maximum pooling layer in parallel;
the first spatial feature fusion network is defined as:
fSPP(SPP_kernele, SPP_γg, SPP_βg), e ∈ [1, N_SPP^conv], g ∈ [1, N_SPP^bn]
wherein N_SPP^conv represents the number of first spatial convolution layers in the first spatial feature fusion network, and N_SPP^bn represents the number of first spatial batch normalization layers in the first spatial feature fusion network; SPP_kernele represents the parameters of the e-th first spatial convolution layer in the first spatial feature fusion network, which are parameters to be optimized; SPP_γg represents the translation amount of the g-th first spatial batch normalization layer in the first spatial feature fusion network, a parameter to be optimized; SPP_βg represents the scaling amount of the g-th first spatial batch normalization layer in the first spatial feature fusion network, a parameter to be optimized;
the input data of the first spatial feature fusion network is the high-dimensional feature map Feat3, and the output data is the spatial fusion feature map Feat4 (M4 × N4 × C4);
In the output data of the first spatial feature fusion network, M4 is the width of the spatial fusion feature map Feat4, N4 is the height of the spatial fusion feature map Feat4, and C4 is the number of channels of the spatial fusion feature map Feat4;
the second spatial feature fusion network: the device consists of a second space convolution layer, a second space deconvolution layer, a second space batch normalization layer and a ReLU activation layer which are connected in a cross way;
the second spatial feature fusion network is defined as:
fPAN(PAN_kernelp, PAN_Ukernelq, PAN_γr, PAN_βr), p ∈ [1, N_PAN^conv], q ∈ [1, N_PAN^deconv], r ∈ [1, N_PAN^bn]
wherein N_PAN^conv represents the number of second spatial convolution layers in the second spatial feature fusion network, N_PAN^deconv represents the number of second spatial deconvolution layers in the second spatial feature fusion network, and N_PAN^bn represents the number of second spatial batch normalization layers in the second spatial feature fusion network; PAN_kernelp represents the parameters of the p-th second spatial convolution layer in the second spatial feature fusion network, which are parameters to be optimized; PAN_Ukernelq represents the parameters of the q-th second spatial deconvolution layer in the second spatial feature fusion network, which are parameters to be optimized; PAN_γr represents the translation amount of the r-th second spatial batch normalization layer in the second spatial feature fusion network, a parameter to be optimized; PAN_βr represents the scaling amount of the r-th second spatial batch normalization layer in the second spatial feature fusion network, a parameter to be optimized;
the input data of the second spatial feature fusion network are the low-dimensional feature map Feat1, the medium-dimensional feature map Feat2, and the spatial fusion feature map Feat4, and the output data are the first fused feature map Feat5 (M5 × N5 × C5), the second fused feature map Feat6 (M6 × N6 × C6), and the third fused feature map Feat7 (M7 × N7 × C7);
In the output data of the second spatial feature fusion network, M5 is the width of the first fused feature map Feat5, N5 is the height of the first fused feature map Feat5, and C5 is the number of channels of the first fused feature map Feat5; M6 is the width of the second fused feature map Feat6, N6 is the height of the second fused feature map Feat6, and C6 is the number of channels of the second fused feature map Feat6; M7 is the width of the third fused feature map Feat7, N7 is the height of the third fused feature map Feat7, and C7 is the number of channels of the third fused feature map Feat7;
the channel feature fusion network comprises: the average pooling layer, the full-connection layer, the ReLU activation layer and the Sigmoid activation layer are sequentially stacked and cascaded;
the channel feature fusion network is defined as:
fSE(SE_kernelz),z∈[1,NSE]
wherein NSE represents the number of fully connected layers in the channel feature fusion network; SE_kernelz represents the parameters of the z-th fully connected layer in the channel feature fusion network, which are parameters to be optimized;
the input data of the channel feature fusion network are the low-dimensional feature map Feat1, the medium-dimensional feature map Feat2, and the high-dimensional feature map Feat3, and the output data are the first tensor Tensor1 (T × T1), the second tensor Tensor2 (T × T2), and the third tensor Tensor3 (T × T3);
In the output data of the channel feature fusion network, T is the number of rows of the first tensor Tensor1, the second tensor Tensor2 and the third tensor Tensor3, T1 is the number of columns of the first tensor Tensor1, T2 is the number of columns of the second tensor Tensor2, and T3 is the number of columns of the third tensor Tensor3;
the multi-scale prediction layer: sequentially stacking and cascading a prediction convolution layer, a prediction batch normalization layer and a ReLU activation layer;
the multi-scale prediction layer is defined as:
fYOLO(YO_kernelx, YO_γy, YO_βy), x ∈ [1, N_YO^conv], y ∈ [1, N_YO^bn]
wherein N_YO^conv represents the number of prediction convolution layers in the multi-scale prediction layer, and N_YO^bn represents the number of prediction batch normalization layers in the multi-scale prediction layer; YO_kernelx represents the parameters of the x-th prediction convolution layer in the multi-scale prediction layer, which are parameters to be optimized; YO_γy represents the translation amount of the y-th prediction batch normalization layer in the multi-scale prediction layer, a parameter to be optimized; YO_βy represents the scaling amount of the y-th prediction batch normalization layer in the multi-scale prediction layer, a parameter to be optimized;
the input data of the multi-scale prediction layer are the first fused feature map Feat5, the second fused feature map Feat6, and the third fused feature map Feat7, and the output data are the first prediction feature map Feat8 (M8 × N8 × C8), the second prediction feature map Feat9 (M9 × N9 × C9), and the third prediction feature map Feat10 (M10 × N10 × C10);
In the output data of the multi-scale prediction layer, M8 is the width of the first prediction feature map Feat8, N8 is the height of the first prediction feature map Feat8, and C8 is the number of channels of the first prediction feature map Feat8; M9 is the width of the second prediction feature map Feat9, N9 is the height of the second prediction feature map Feat9, and C9 is the number of channels of the second prediction feature map Feat9; M10 is the width of the third prediction feature map Feat10, N10 is the height of the third prediction feature map Feat10, and C10 is the number of channels of the third prediction feature map Feat10;
step 3, constructing the train compartment abnormal fault recognition network loss function through a positioning loss function, a confidence loss function and a classification loss function;
when the train carriage fault image is input into the train carriage abnormal fault recognition network for training, the image is divided into A × A grids, each grid is preset with B anchor boxes, and the network regresses A × A × B corresponding prediction frames, but not all of the prediction frames participate in the calculation of the loss function; when the center point of a fault marking frame (boxs,k, labels,k,c) in a carriage fault image trains(m, n) falls in the i-th grid, the one of the B anchor boxes with the largest IOU with that fault marking frame is selected to learn the characteristic information of the fault and is regarded as a positive sample, and the remaining B - 1 anchor boxes are regarded as negative samples;
the positioning loss function is:
Lloc = Σi=1..A×A Σj=1..B 1ij^obj · [1 - IoUi + di²/li² + αi·vi]
and
IoUi = |boxs,k ∩ p_boxs,k| / |boxs,k ∪ p_boxs,k|
vi = (4/π²) · (arctan(wi/hi) - arctan(wi^p/hi^p))²
αi = vi / (1 - IoUi + vi)
wherein 1ij^obj indicates whether the j-th anchor box under the i-th grid is responsible for predicting a certain fault; if so its value is 1, otherwise it is 0; the so-called "responsible" means that, among all B anchor boxes under the i-th grid, the IOU between the j-th anchor box and the marking frame of that fault is the largest; IoUi is the intersection-over-union between the fault marking frame (boxs,k, labels,k,c) of the carriage fault image trains(m, n) falling in the i-th grid and the corresponding fault prediction frame (p_boxs,k, p_labels,k,c); di is the Euclidean distance between the two center points of the fault marking frame (boxs,k, labels,k,c) and the corresponding fault prediction frame (p_boxs,k, p_labels,k,c); li is the diagonal distance of the smallest rectangle that can simultaneously cover the fault marking frame (boxs,k, labels,k,c) and the fault prediction frame (p_boxs,k, p_labels,k,c); vi measures the consistency of the aspect ratios, with wi and hi the width and height of the fault marking frame and wi^p and hi^p those of the fault prediction frame; αi is the trade-off parameter; the positioning loss Lloc therefore indicates that, when the k-th fault marking frame (boxs,k, labels,k,c) of an input image trains(m, n) falls in the i-th grid and the j-th anchor box is responsible for predicting the fault, the fault prediction frame (p_boxs,k, p_labels,k,c) generated by that anchor box and the fault marking frame (boxs,k, labels,k,c) are used together to calculate the positioning loss;
the confidence loss function is:
Lconf = - Σi=1..A×A Σj=1..B 1ij^obj · λobj · [Ĉi·ln(Ci) + (1 - Ĉi)·ln(1 - Ci)] - Σi=1..A×A Σj=1..B 1ij^noobj · λnoobj · [Ĉi·ln(Ci) + (1 - Ĉi)·ln(1 - Ci)]
and
1ij^noobj = 1 - 1ij^obj
wherein 1ij^noobj indicates that the j-th anchor box of the i-th grid is not responsible for predicting the fault, i.e. in the i-th grid the IOU between the j-th anchor box and the fault marking frame is not the largest among all B anchor boxes; λobj and λnoobj respectively denote the weights used when the anchor box is responsible and not responsible for predicting a certain fault; Ĉi is the true value of the confidence, which takes 1 if the j-th anchor box of the i-th grid is responsible for predicting a certain fault and 0 otherwise; Ci is the confidence of the prediction frame output by the multi-scale prediction layer YOLO_head; the confidence loss Lconf therefore consists of the confidence loss of the prediction frames in which an object exists and the confidence loss of the prediction frames in which no object exists;
the classification loss function is:
Lcls = - Σi=1..A×A Σj=1..B 1ij^obj · Σc=1..C [P̂ij(c)·ln(Pij(c))]
wherein P̂ij is the true value of the class probability: when the j-th anchor box under the i-th grid is responsible for predicting a certain fault (boxs,k, labels,k,c), P̂ij is a one-hot matrix of dimension C × 1 whose c-th dimension is 1 and whose remaining dimensions are 0; Pij is the class probability of the prediction frame output by the multi-scale prediction layer YOLO_head, also a matrix of dimension C × 1; the loss value Lcls between the two is calculated with the cross entropy;
The train compartment abnormal fault identification network loss function is as follows:
L=Lloc+Lconf+Lcls
wherein L islocFor the localization loss function, LconfAs a function of confidence loss, LclsIs a classification loss function.
5. The automatic identification method for the abnormal fault of the freight train car at the station as claimed in claim 1, characterized in that:
step 4, the first prediction characteristic map of the image to be recognized is Feat 8;
step 4, the second prediction characteristic map of the image to be recognized is Feat 9;
step 4, the third prediction characteristic map of the image to be recognized is Feat 10;
step 4, the preliminary identification result of the image to be identified comprises the probability that the prediction frame belongs to the foreground, the coordinate of the prediction frame and the class probability of the prediction frame;
the probability that the prediction frame belongs to the foreground in the preliminary recognition result of the image to be recognized is defined as follows:
Iv ∈ [0,1], v ∈ [1, NRs]
wherein NRs represents the number of preliminary recognition results of the image to be recognized, and Iv represents the probability that the prediction frame in the v-th preliminary recognition result of the image to be recognized belongs to the foreground;
and the coordinates of a prediction frame in the preliminary identification result of the image to be identified are defined as:
p_boxv = (p_boxv^lt, p_boxv^rb), v ∈ [1, NRs]
p_boxv^lt = (p_xv^lt, p_yv^lt), p_boxv^rb = (p_xv^rb, p_yv^rb)
wherein l represents the left of the image to be recognized, t represents the top of the image to be recognized, r represents the right of the image to be recognized, and b represents the bottom of the image to be recognized; p_boxv^lt represents the coordinates of the upper left corner of the prediction frame in the v-th preliminary recognition result of the image to be recognized, p_xv^lt represents the abscissa of the upper left corner of that prediction frame, and p_yv^lt represents the ordinate of the upper left corner of that prediction frame; p_boxv^rb represents the coordinates of the lower right corner of the prediction frame in the v-th preliminary recognition result of the image to be recognized, p_xv^rb represents the abscissa of the lower right corner of that prediction frame, and p_yv^rb represents the ordinate of the lower right corner of that prediction frame;
and the prediction frame category probability in the preliminary identification result of the image to be identified is defined as:
Prv = {Prv^0, Prv^1, Prv^2, Prv^3, Prv^4, Prv^5}
wherein Prv represents the set of all six fault class probabilities in the v-th preliminary recognition result of the image to be recognized, and Prv^c (c = 0, 1, 2, 3, 4, 5) represents the probability that the v-th preliminary recognition result of the image to be recognized belongs to the c-th class of fault;
the preliminary identification result of the image to be identified is defined as:
Rfirst = {(Iv, p_boxv^lt, p_boxv^rb, Prv), v ∈ [1, NRs]}
wherein Rfirst represents the preliminary recognition result of the image to be recognized;
the final recognition result of the image to be recognized is defined as:
Rfinal = {(p_boxε^lt, p_boxε^rb, Plabelε), ε ∈ [1, NRe]}
wherein Rfinal represents the final recognition result of the image to be recognized, and NRe represents the number of final recognition results of the image to be recognized; p_boxε^lt represents the coordinates of the upper left corner of the prediction frame in the ε-th final recognition result of the image to be recognized; p_boxε^rb represents the coordinates of the lower right corner of the prediction frame in the ε-th final recognition result of the image to be recognized; Plabelε indicates the fault class to which the ε-th final recognition result of the image to be recognized belongs.
Publications (2)

Publication Number Publication Date
CN112464846A 2021-03-09
CN112464846B 2024-04-02



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant