CN110706197A - Railway foreign matter intrusion detection method based on transfer learning in special scene - Google Patents

Railway foreign matter intrusion detection method based on transfer learning in special scene Download PDF

Info

Publication number
CN110706197A
CN110706197A (application CN201910720026.6A)
Authority
CN
China
Prior art keywords
railway
image
target
domain
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910720026.6A
Other languages
Chinese (zh)
Inventor
李云栋
董晗
刘艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN201910720026.6A
Publication of CN110706197A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • G06T7/0008Industrial image inspection checking presence/absence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection

Abstract

The invention provides a railway foreign object intrusion detection method for special scenes based on transfer learning, which comprises the following steps: S1: collecting data; S2: data expansion: converting images into images of different domains by using an image style converter to realize image style transfer; S3: processing data; S4: detecting and tracking targets. To address the shortage of training samples in severe weather and night scenes, the method first transfers conventional-scene samples into the corresponding scenes and establishes a library of railway-scene target samples under different weather and time conditions. Then, based on a deep convolutional neural network, abnormal targets in the railway scene are detected by combining the SSD target detection algorithm with the Caffe framework; model training under complex weather conditions based on transfer learning is carried out with the generated complex sample library, and the trained model is used for target detection, tracking and behavior analysis.

Description

Railway foreign matter intrusion detection method based on transfer learning in special scene
Technical Field
The invention relates to the field of railway foreign object intrusion detection, and in particular to a railway foreign object intrusion detection method for special scenes based on transfer learning.
Background
High-risk foreign objects such as pedestrians, animals and rocks falling from side slopes that enter the railway perimeter pose a great threat to train operation safety. Real-time detection and early warning of such abnormal events is therefore of great significance for ensuring safe train operation. At present, two kinds of methods are mainly used for railway foreign object intrusion detection: contact detection and non-contact detection. Contact detection technologies include the double electric-net protection detection method, fiber grating detection and the like; non-contact detection technologies include radar, infrared correlation technology, video-based analysis and the like. Video-analysis-based detection, in which video collected by cameras is analyzed at the back end, can detect and warn in real time and is widely used. However, in special scenes this method is easily disturbed by environmental factors, and its detection performance has certain shortcomings that affect normal operation in practical use.
One key issue is efficient detection in night scenes. At night, the real-time images acquired by a high-definition infrared camera differ from daytime images: night infrared images have no color features, low image quality, insufficient texture information and low discriminability. Because ordinary night-scene samples and night railway-scene samples are both scarce, detection models trained with conventional deep learning methods cannot achieve a good detection effect. To solve this problem, a transfer learning method is used to give railway-scene training samples night-scene characteristics, increasing the number of learning samples and improving the adaptability of the detection model to night scenes as well as its detection performance.
Efficient detection in extreme weather is another key issue. Rain, snow, haze and other special weather seriously degrade the imaging quality of the detected images and thus the detection performance. First, the Fourier transform is used to reduce the interference of various background conditions on image detection and to reduce the influence of noise; then, transfer learning is used to increase the number of training samples for special weather scenes, improving the adaptability of the model to such scenes and thereby its detection performance in special weather.
The training samples used by existing methods are too simple to achieve good results on the complex images of special weather, night scenes and railway scenes, so they can hardly meet practical requirements and have limited practical value. To achieve real-time detection in these specific scenes, a large number of learning samples usually has to be collected manually, but samples of special weather and night scenes in railway environments are difficult to collect and small in number, so a good learning effect cannot be obtained and the detection performance is hard to improve.
Disclosure of Invention
Railway foreign object intrusion detection is one of the important means of ensuring railway safety. If foreign objects intruding into the railway perimeter are not detected in time during normal train operation, serious traffic accidents and severe harm may result. An efficient and reliable railway foreign object intrusion detection method therefore has great application value. Conventional detection methods can achieve relatively good results when lighting is good, interference noise is low and the scene is simple, but at night and in special weather the detection images are affected by many factors and the detection results cannot meet practical requirements. To improve detection efficiency and reliability, the invention combines transfer learning and deep learning to realize efficient and reliable detection of railway foreign object intrusion under complex conditions, and provides a railway foreign object intrusion detection method for special scenes based on transfer learning.
In order to solve the technical problems, the invention adopts the following technical scheme:
a railway foreign body intrusion detection method under a special scene based on transfer learning comprises the following steps:
s1: collecting data;
s2: data expansion: converting the image into images of different domains by using an image style converter to realize the transfer of the image style;
s3: processing data;
s4: and detecting and tracking the target.
As a preferred technical solution, the method for collecting data in step S1 includes:
arranging infrared high-definition cameras in railway areas and collecting railway sample images at different times and under different weather conditions; and collecting railway images and non-railway images of similar scenes via the network.
As a preferred technical solution, the data processing method in step S3 includes:
rotating and/or mirroring and/or downsampling and normalizing the existing image data, and adding noise and blurring, so as to increase the complexity of the learning samples and improve the model learning effect;
and labeling all data samples and producing the XML and LMDB files required for model training.
As a preferred technical solution, the target detection and tracking method in step S4 includes:
detecting and identifying deep features around each target through multi-scale feature maps;
extracting target features from feature maps of different layers of the deep neural network, which naturally adds more target scale information;
and tracking specific targets by matching histogram-of-oriented-gradients features with the detection results, and drawing the tracking trajectories.
As a preferred technical solution, the image style transfer method in step S2 includes:
an image style converter G converts images of the X domain into the style of the Y domain, and a converter F converts images of the Y domain into the style of the X domain; after an X-domain picture is converted into G(X) by the converter G, G(X) can also be converted back to X by the converter F; similarly, after a Y-domain picture is converted into F(Y), F(Y) can be converted back to Y by G, i.e. with reference to the following model:
G(X)≈Y F(Y)≈X
F(G(X))≈X G(F(Y))≈Y
and finally mutual transfer between the X-domain and Y-domain image styles is realized.
Beneficial effects:
To achieve real-time and efficient detection in railway scenes under special conditions, the method first addresses the shortage of training samples in severe weather and night scenes by transferring conventional-scene samples into the corresponding scenes and establishing a library of railway-scene target samples under different weather and time conditions; then, based on a deep convolutional neural network, detection, tracking and behavior analysis of abnormal targets in the railway scene are realized by combining a target detection algorithm with the Caffe framework, and model training and target detection under complex weather conditions based on transfer learning are realized with the generated complex sample library.
Drawings
In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.
FIG. 1 is a processing flow chart of a railway foreign body intrusion detection method in a special scene based on transfer learning according to the invention;
FIG. 2 is a schematic diagram of a sample augmentation method in data augmentation of a railway foreign body intrusion detection method in a special scenario based on transfer learning according to the present invention;
FIG. 3 is an example of an image generated by the sample supplement method of the railway foreign body intrusion detection method in a special scene based on transfer learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all directional indicators (such as upper, lower, left, right, front, rear, etc.) in the embodiments of the present invention are only used to explain the relative positional relationship, movement, etc. of the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indicator changes accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
The invention will now be further described with reference to the accompanying drawings.
Referring to fig. 1, a method for detecting railway foreign body intrusion in a special scene based on transfer learning includes the following steps:
s1: collecting data;
s2: data expansion: converting the image into images of different domains by using an image style converter to realize the transfer of the image style;
s3: processing data;
s4: and detecting and tracking the target.
The specific implementation is as follows.
1. Data collection
Infrared high-definition cameras are arranged in the railway area to collect railway sample images at different times (day and night) and under different weather conditions (rain, snow, etc.); railway images and non-railway images of similar scenes are also collected from public data sets and material websites via the network.
2. Data expansion
To address the poor model training results caused by the scarcity of railway-scene training and test samples under complex weather (rain, snow, haze, etc.) and night conditions, an algorithm is applied to convert images between two different scenes. The data expansion method used in the invention is described in detail below.
CycleGAN is essentially two mirror-symmetric GANs forming a ring network. The two GANs share two generators and each has its own discriminator, i.e. there are two discriminators and two generators, and four losses in total.
Each generator consists of an encoder, a converter and a decoder. The encoder first extracts features from the input image with a convolutional neural network, compressing the image into 256 feature maps of size 64 × 64. The converter then combines the dissimilar features of the images, converting the feature vectors of images in the DA domain into feature vectors in the DB domain; by using six ResNet blocks, where each block is a neural network layer consisting of two convolutional layers, the characteristics of the original image can be preserved during conversion. Finally, the decoder uses deconvolution layers to restore low-level features from the feature vectors and produce the generated image. The discriminator itself is a convolutional network: it extracts features from the image and then determines whether those features belong to a particular class through an added convolutional layer that produces a one-dimensional output.
The losses used for training include the reconstruction loss of the generators and the discrimination loss of the discriminators. A discriminator judges whether an input picture is real, so the generated fake picture and the original real picture are both fed to the discriminator and a binary (0/1) classification loss is computed. The losses of the discriminators D_A and D_B are as follows:
L_GAN(F, D_A, B, A) = E_{a∼A}[log D_A(a)] + E_{b∼B}[log(1 − D_A(F(b)))]
L_GAN(G, D_B, A, B) = E_{b∼B}[log D_B(b)] + E_{a∼A}[log(1 − D_B(G(a)))]
the generator is used to reconstruct the picture in order to expect the generated picture g (a) to resemble the original picture B, f (B) as much as possible. At the same time, F (G (A)) is as similar as possible to the original pictures A, G (F (B)) and B. Therefore, L1-loss was taken for the calculation. The resulting LOSS is expressed as:
L_cyc(G, F) = E_{a∼A}[‖F(G(a)) − a‖₁] + E_{b∼B}[‖G(F(b)) − b‖₁]
The total loss of the CycleGAN network is therefore expressed as:
L(G, F, D_A, D_B) = L_GAN(F, D_A, B, A) + L_GAN(G, D_B, A, B) + L_cyc(G, F)
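For reference, the following is a minimal PyTorch sketch of how these three loss terms could be computed for one batch. The generator and discriminator modules (G, F, D_A, D_B) and the image tensors real_A, real_B are assumed to be defined elsewhere, and a binary cross-entropy criterion stands in for the 0/1 classification loss described above; this is an illustrative sketch, not the patent's own implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # the 0/1 two-class discrimination loss
l1 = nn.L1Loss()               # the L1 reconstruction (cycle) loss

def cyclegan_losses(G, F, D_A, D_B, real_A, real_B):
    """Returns (loss of D_A, loss of D_B, generator loss) for one training step."""
    fake_B = G(real_A)          # G(A): should look like domain B
    fake_A = F(real_B)          # F(B): should look like domain A
    rec_A = F(fake_B)           # F(G(A)): should reconstruct A
    rec_B = G(fake_A)           # G(F(B)): should reconstruct B

    def d_loss(D, real, fake):
        # real pictures labeled 1, generated pictures labeled 0
        pred_real, pred_fake = D(real), D(fake.detach())
        return bce(pred_real, torch.ones_like(pred_real)) + \
               bce(pred_fake, torch.zeros_like(pred_fake))

    loss_D_A = d_loss(D_A, real_A, fake_A)   # L_GAN(F, D_A, B, A)
    loss_D_B = d_loss(D_B, real_B, fake_B)   # L_GAN(G, D_B, A, B)

    # generator side: fool both discriminators and keep cycle consistency
    pred_B, pred_A = D_B(fake_B), D_A(fake_A)
    loss_G_adv = bce(pred_B, torch.ones_like(pred_B)) + \
                 bce(pred_A, torch.ones_like(pred_A))
    loss_cyc = l1(rec_A, real_A) + l1(rec_B, real_B)   # L_cyc(G, F)

    return loss_D_A, loss_D_B, loss_G_adv + loss_cyc
```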
to describe the details of the network implementation, in the structure of the generator and the discriminator, for the generator structure, we use 6 residual blocks for training images of size 128 × 128 and 9 residual blocks for training images of resolution 256 × 256 or higher. Let c7s1-k denote the 7 × 7 connected convolutional layer, batch normalization layer, RELU layer with k filters and step 1. dk denotes the 3 × 3 connection convolutional layer, batch normalization layer, RELU layer with k filters and step 2. Rk represents a residual block containing two 3 x 3 convolutional layers with the same number of filters on both layers. uk denotes the 3 x 3 connection convolution layer with k filters and step size 1/2, batch normalization layer, RELU layer. A network with 6 residual blocks comprises: c7s1-64, d128, d256, R256, R256, R256, R256, R256, R256, u128, u64, c7s 1-3. A network with 9 residual blocks comprises: c7s1-64, d128, d256, R256, R256, R256, R256, R256, R256, R256, R256, R256, u128, u64, c7s 1-3. For the discriminator structure, we use 70 × 70PatchGAN [22 ]. Let Ck denote the 4 × 4 connected convolutional layer, batch normalization layer, RELU layer with k filters and step 2. After the last layer, we apply a transform to produce a one-dimensional output. We did not use InstanceNorm for the first C64 layer. We used a leak ReLUs with a parameter of 0.2. The discriminator architecture is: C64-C128-C256-C512.
Qualitative and quantitative results on the data sets collected in the field and over the network demonstrate the effectiveness of the data expansion.
Data set: experiments are performed on the network data sets and field collected data sets collected at step S1. The data set comprises 2000 railway scene images in a common environment and 500 infrared night scene (or rain and snow weather scene) images, is divided into a training set and a testing set according to the proportion of 10: 1, and then is trained.
Parameter settings: the model is implemented with the PyTorch framework. For the network data set, the resolution of the input images is reduced from 960 × 720 to 256 × 256, while for the data set acquired in the field the bottom of the image is cropped and the result is resized to 256 × 256. The output disparity maps of the two input images are fused with a learned linear combination to obtain a final map of size 256 × 256. The training batch size is set to 1 and the initial learning rate for all trials is 1 × 10⁻⁴. The Adam optimization algorithm is used. After the 100th epoch, the learning rate begins to decay linearly.
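A sketch of this training setup (Adam, initial learning rate 1 × 10⁻⁴, batch size 1 assumed in the loader, linear decay after the 100th epoch, and the 120 total epochs mentioned in the next paragraph) is given below; the cyclegan_losses helper is the one from the earlier sketch, and the alternating generator/discriminator updates are a common convention rather than something specified in the patent.

```python
import itertools
import torch

def train(G, F, D_A, D_B, loader, total_epochs=120, decay_start=100):
    """loader yields (real_A, real_B) pairs of 256x256 images with batch size 1."""
    opt_G = torch.optim.Adam(itertools.chain(G.parameters(), F.parameters()), lr=1e-4)
    opt_D = torch.optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=1e-4)

    def lr_lambda(epoch):
        # constant rate up to epoch 100, then linear decay towards zero
        if epoch < decay_start:
            return 1.0
        return max(0.0, 1.0 - (epoch - decay_start) / float(total_epochs - decay_start))

    sched_G = torch.optim.lr_scheduler.LambdaLR(opt_G, lr_lambda)
    sched_D = torch.optim.lr_scheduler.LambdaLR(opt_D, lr_lambda)

    for epoch in range(total_epochs):
        for real_A, real_B in loader:
            loss_D_A, loss_D_B, loss_G = cyclegan_losses(G, F, D_A, D_B, real_A, real_B)
            # update generators first, then discriminators
            opt_G.zero_grad()
            loss_G.backward()
            opt_G.step()
            opt_D.zero_grad()
            (loss_D_A + loss_D_B).backward()
            opt_D.step()
        sched_G.step()
        sched_D.step()
```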
The complete model was trained on a single NVIDIA RTX 2080 GPU with a data set of 2000 pictures of 256 × 256 pixels; generating each picture takes approximately 1 second, and training for 120 epochs takes about 96 hours in total.
In the embodiment of the invention, a sample supplement method based on data expansion is proposed to solve the problem of sample shortage, as shown in FIG. 2. The image style converter G converts images of the X domain into the style of the Y domain, and F converts images of the Y domain into the style of the X domain. After an X-domain picture is converted into G(X) by the converter G, G(X) can also be converted back to X by the converter F; similarly, after a Y-domain picture is converted into F(Y), F(Y) can be converted back to Y by G, i.e. F(G(X)) ≈ X and G(F(Y)) ≈ Y. In this way the mutual transfer of the X-domain and Y-domain image styles is realized.
With this algorithm, railway-scene samples X taken in ordinary weather are combined with non-railway samples Y taken in special environments to generate new railway-scene samples in complex environments, expanding the sample sets for various weather conditions and night scenes. The number of samples can then meet the requirements of model learning, giving the model good detection performance, cross-scene adaptability and higher practical value. A specific example is shown in FIG. 3.
3. Data processing
The existing image data are rotated, mirrored, downsampled and normalized, and noise and blurring such as Gaussian noise and Gaussian blur are added, increasing the complexity of the learning samples and improving the model learning effect. All data samples are labeled, and the XML and LMDB files required for model training are produced.
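These operations could be implemented with OpenCV and NumPy roughly as in the following sketch; the rotation angle, downsampling scale, blur kernel and noise level are illustrative values, not parameters specified in the patent.

```python
import cv2
import numpy as np

def augment(img, angle=10, scale=0.5, sigma=1.5, noise_std=8.0):
    """img: H x W x 3 uint8 image; returns normalized degraded variants."""
    h, w = img.shape[:2]
    out = []

    # rotation about the image centre
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out.append(cv2.warpAffine(img, M, (w, h)))

    # horizontal mirror
    out.append(cv2.flip(img, 1))

    # downsampling (then back to the original size so annotations stay usable)
    small = cv2.resize(img, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    out.append(cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR))

    # Gaussian blur and additive Gaussian noise
    out.append(cv2.GaussianBlur(img, (5, 5), sigma))
    noisy = img.astype(np.float32) + np.random.normal(0, noise_std, img.shape)
    out.append(np.clip(noisy, 0, 255).astype(np.uint8))

    # normalization to [0, 1] for model input
    return [v.astype(np.float32) / 255.0 for v in out]
```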
4. Target detection and tracking
The SSD target detection algorithm is adopted to detect and identify deep features around each target through multi-scale feature maps. Target features are extracted from feature maps of different layers of the deep neural network, which naturally adds more target scale information and improves detection accuracy without affecting speed. In addition, specific targets are tracked by matching Histogram of Oriented Gradients (HOG) features with the detection results, and the tracking trajectories are drawn.
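One possible realization of the HOG-based association between detections in consecutive frames is sketched below, using skimage's hog descriptor and a simple nearest-neighbour match; the (x1, y1, x2, y2) box format, the patch size and the distance threshold are assumptions made for illustration.

```python
import cv2
import numpy as np
from skimage.feature import hog

def box_hog(frame_gray, box, size=(64, 64)):
    """HOG descriptor of one detected box; box = (x1, y1, x2, y2) in pixels."""
    x1, y1, x2, y2 = [int(v) for v in box]
    patch = cv2.resize(frame_gray[y1:y2, x1:x2], size)
    return hog(patch, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def match_detections(prev_feats, cur_boxes, frame_gray, max_dist=0.5):
    """Greedily associate current detections with existing tracks by HOG distance."""
    cur_feats = [box_hog(frame_gray, b) for b in cur_boxes]
    matches = []
    for i, f_prev in enumerate(prev_feats):
        dists = [np.linalg.norm(f_prev - f_cur) for f_cur in cur_feats]
        if dists and min(dists) < max_dist:
            matches.append((i, int(np.argmin(dists))))  # track i continues with this detection
    return matches, cur_feats

# the tracking trajectory can then be drawn by appending each matched box centre
# to the track's point list and plotting it with cv2.polylines on the frame
```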
When the embodiment of the invention is implemented, the target detection and tracking are carried out according to the following steps.
4.1 SSD model structure
The SSD structure is modified from the VGG16 network and likewise trained with conv1_1, conv1_2, conv2_1, conv2_2, conv3_1, conv3_2, conv3_3, conv4_1, conv4_2, conv4_3, conv5_1, conv5_2, conv5_3 (512); fc6 becomes a 3 × 3 convolution with 1024 channels, fc7 a 1 × 1 convolution with 1024 channels, followed by conv6_1, conv6_2, conv7_1, conv7_2, conv8_1, conv8_2, conv9_1, conv9_2 and the loss layer. Then, on the one hand, for conv4_3 (4), fc7 (6), conv6_2 (6), conv7_2 (6), conv8_2 (4) and conv9_2 (4) (the numbers in parentheses are the number of default box types selected for each layer), two parallel 3 × 3 convolution kernels are applied to each feature map (see the Caffe code); the number 8732 in the second-to-last column of the SSD structure is the total number of prior boxes; one of the two 3 × 3 convolutions is used for localization (regression) and the other for confidence (classification). For example, the localization branch of conv6_2 uses a 3 × 3 convolution with 24 kernels; a permute layer is then used to reorder the dimensions, and the flatten layer changes 32 × 19 × 19 × 24 into 32 × 8664, where 32 is the batch size. On the other hand, with reference to conv4_3 (4), fc7 (6), conv6_2 (6), conv7_2 (6), conv8_2 (4), conv9_2 (4) and the data layer (ground-truth boxes), prior boxes are generated by the PriorBox layer.
After these two operations, the processing of each feature map layer is complete. Once the above operations have been performed on all of the feature map outputs listed above, the results are merged along the channel dimension with a Concat layer.
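The two parallel 3 × 3 convolutions per feature map, followed by permute, flatten and channel concatenation, could be expressed in PyTorch as in the following sketch; the channel counts and default box counts in the usage comment are the commonly used SSD300 values and are given only as an assumed example.

```python
import torch
import torch.nn as nn

class MultiBoxHead(nn.Module):
    """One loc/conf pair of parallel 3x3 convolutions per source feature map."""
    def __init__(self, in_channels, num_boxes, num_classes):
        # in_channels / num_boxes: one entry per source layer
        # (conv4_3, fc7, conv6_2, conv7_2, conv8_2, conv9_2)
        super().__init__()
        self.loc = nn.ModuleList([nn.Conv2d(c, n * 4, 3, padding=1)
                                  for c, n in zip(in_channels, num_boxes)])
        self.conf = nn.ModuleList([nn.Conv2d(c, n * num_classes, 3, padding=1)
                                   for c, n in zip(in_channels, num_boxes)])
        self.num_classes = num_classes

    def forward(self, feats):
        b = feats[0].size(0)
        locs, confs = [], []
        for x, l, c in zip(feats, self.loc, self.conf):
            # permute to (batch, H, W, channels) then flatten, as in the Caffe layers
            locs.append(l(x).permute(0, 2, 3, 1).flatten(1))
            confs.append(c(x).permute(0, 2, 3, 1).flatten(1))
        loc = torch.cat(locs, dim=1).view(b, -1, 4)                       # e.g. (B, 8732, 4)
        conf = torch.cat(confs, dim=1).view(b, -1, self.num_classes)      # e.g. (B, 8732, C)
        return loc, conf

# e.g. head = MultiBoxHead([512, 1024, 512, 256, 256, 256], [4, 6, 6, 6, 4, 4], num_classes=21)
```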
4.2 Training
During training, it is first determined which prior boxes the real targets (ground truths) in a training picture are matched with; the bounding boxes corresponding to the matched prior boxes are responsible for predicting them. First, for each ground truth in the picture, the prior box with the largest IOU is found and matched to it, which guarantees that every ground truth is matched to some prior box. Prior boxes matched to a ground truth are called positive samples, whereas a prior box matched to no ground truth can only be matched to the background and is a negative sample. A picture contains few ground truths but many prior boxes, so if matching were done only according to this first principle, most prior boxes would be negative samples and the positive and negative samples would be extremely unbalanced; a second principle is therefore needed. The second principle is: for the remaining unmatched prior boxes, if the IOU with some ground truth is greater than a threshold (generally 0.5), the prior box is also matched to that ground truth. A prior box can only be matched to one ground truth; if the IOUs of several ground truths with a certain prior box all exceed the threshold, the prior box is matched only to the ground truth with the largest IOU. The second principle is applied only after the first, which takes precedence.
Although one ground truth can match several prior boxes, there are far fewer ground truths than prior boxes, so negative samples still greatly outnumber positive samples. To keep the positive and negative samples as balanced as possible, SSD uses hard negative mining: the negative samples are sorted in descending order of confidence error (the lower the predicted background confidence, the larger the error) and the top-k with the largest errors are selected as training negatives, so that the ratio of positive to negative samples is close to 1:3.
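A sketch of the two matching principles and of hard negative mining is given below in PyTorch; the IoU helper, the (x1, y1, x2, y2) box format and the per-box confidence loss passed to the mining step are assumptions made so that the sketch is self-contained.

```python
import torch

def iou(boxes_a, boxes_b):
    """IoU matrix between (Na, 4) and (Nb, 4) boxes in (x1, y1, x2, y2) form."""
    tl = torch.max(boxes_a[:, None, :2], boxes_b[None, :, :2])
    br = torch.min(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    inter = (br - tl).clamp(min=0).prod(dim=2)
    area_a = (boxes_a[:, 2:] - boxes_a[:, :2]).prod(dim=1)
    area_b = (boxes_b[:, 2:] - boxes_b[:, :2]).prod(dim=1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def match_priors(gt_boxes, priors, threshold=0.5):
    """Assign each prior box a ground-truth index, or -1 for background."""
    overlaps = iou(gt_boxes, priors)                  # (num_gt, num_priors)
    # second principle: a prior matches the ground truth with which its IOU
    # is largest, provided that IOU exceeds the threshold
    best_gt_iou, best_gt_idx = overlaps.max(dim=0)
    assign = torch.where(best_gt_iou > threshold, best_gt_idx,
                         torch.full_like(best_gt_idx, -1))
    # first principle (takes precedence): every ground truth keeps its best prior
    best_prior_idx = overlaps.argmax(dim=1)
    assign[best_prior_idx] = torch.arange(gt_boxes.size(0))
    return assign                                      # -1 marks a negative sample

def hard_negative_mask(conf_loss, assign, neg_pos_ratio=3):
    """Keep all positives and the top-k hardest negatives (about a 1:3 ratio)."""
    pos = assign >= 0
    num_neg = neg_pos_ratio * int(pos.sum())
    neg_loss = conf_loss.clone()
    neg_loss[pos] = 0.0                                # ignore positives when ranking
    _, idx = neg_loss.sort(descending=True)
    neg = torch.zeros_like(pos)
    neg[idx[:num_neg]] = True
    return pos, neg
```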
Once the training samples are determined, the loss function follows. The loss function is defined as the weighted sum of the position error (localization loss) and the confidence error (confidence loss):

L(x, c, l, g) = (1/N) (L_conf(x, c) + α · L_loc(x, l, g))

where N is the number of positive prior boxes, the confidence error L_conf(x, c) uses the softmax loss, the position error L_loc(x, l, g) uses the Smooth L1 loss, and the weighting factor α is set to 1 by cross-validation. For the position error, the Smooth L1 loss is defined as follows:
L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{ij}^k · smooth_L1(l_i^m − ĝ_j^m)

ĝ_j^{cx} = (g_j^{cx} − d_i^{cx}) / d_i^{w},  ĝ_j^{cy} = (g_j^{cy} − d_i^{cy}) / d_i^{h},  ĝ_j^{w} = log(g_j^{w} / d_i^{w}),  ĝ_j^{h} = log(g_j^{h} / d_i^{h})

smooth_L1(x) = 0.5 x²  if |x| < 1,  and  |x| − 0.5  otherwise
For the confidence error, the softmax loss is defined as follows:

L_conf(x, c) = − Σ_{i∈Pos} x_{ij}^p · log(ĉ_i^p) − Σ_{i∈Neg} log(ĉ_i^0),  where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)
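Putting the two terms together, the weighted loss could be computed as in the following sketch; the masks pos and neg are those produced by the matching and hard-negative-mining step above, and class index 0 is assumed to denote the background.

```python
import torch
import torch.nn.functional as F

def ssd_loss(loc_pred, conf_pred, loc_target, labels, pos, neg, alpha=1.0):
    """Weighted sum of Smooth L1 position error and softmax confidence error.

    loc_pred:   (num_priors, 4) predicted offsets
    conf_pred:  (num_priors, num_classes) raw class scores
    loc_target: (num_priors, 4) encoded ground-truth offsets
    labels:     (num_priors,) class index per prior (0 = background)
    pos, neg:   boolean masks from the matching / hard negative mining step
    """
    num_pos = pos.sum().clamp(min=1).float()

    # position error: Smooth L1 over positive priors only
    l_loc = F.smooth_l1_loss(loc_pred[pos], loc_target[pos], reduction='sum')

    # confidence error: softmax cross-entropy over positives and mined negatives
    keep = pos | neg
    l_conf = F.cross_entropy(conf_pred[keep], labels[keep], reduction='sum')

    return (l_conf + alpha * l_loc) / num_pos
```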
the performance of the SSD can be improved by Data amplification (Data Augmentation), and the mainly adopted techniques include horizontal flip (horizontal flip), random crop and color distortion (random crop & color distortion), and random sample a patch (small target training sample acquisition).
4.3 Prediction process
For each prediction box, its category (the one with the highest confidence) and confidence value are first determined from the class confidences, and boxes belonging to the background are filtered out. Prediction boxes whose confidence is below a threshold (e.g. 0.5) are then removed. The remaining prediction boxes are decoded, and their real position parameters are obtained from the prior boxes (a clip is generally needed after decoding to prevent a box from extending beyond the picture). After decoding, the boxes are sorted in descending order of confidence and only the top-k (e.g. 400) are kept. Finally, the NMS algorithm filters out prediction boxes with large overlap, and the remaining prediction boxes are the detection result.
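This prediction pipeline (background and confidence filtering, decoding against the prior boxes with clipping, top-k selection and NMS) could be sketched as follows; the (cx, cy, w, h) prior format, the variance constants and class 0 as background are common SSD conventions assumed here, not values stated in the patent.

```python
import torch
import torchvision

def decode(loc, priors, variances=(0.1, 0.2)):
    """Recover (x1, y1, x2, y2) boxes from predicted offsets and prior boxes.
    priors are in normalized (cx, cy, w, h) form; variances follow the usual SSD encoding."""
    cxcy = priors[:, :2] + loc[:, :2] * variances[0] * priors[:, 2:]
    wh = priors[:, 2:] * torch.exp(loc[:, 2:] * variances[1])
    boxes = torch.cat([cxcy - wh / 2, cxcy + wh / 2], dim=1)
    return boxes.clamp(0, 1)                            # clip boxes to the picture

def predict(loc, conf, priors, conf_thresh=0.5, top_k=400, nms_iou=0.45):
    scores, classes = conf.softmax(dim=1).max(dim=1)
    keep = (classes > 0) & (scores > conf_thresh)       # drop background and low-confidence boxes
    boxes = decode(loc[keep], priors[keep])
    scores, classes = scores[keep], classes[keep]

    # keep only the top-k highest-scoring boxes, then apply NMS
    order = scores.argsort(descending=True)[:top_k]
    boxes, scores, classes = boxes[order], scores[order], classes[order]
    kept = torchvision.ops.nms(boxes, scores, nms_iou)
    return boxes[kept], scores[kept], classes[kept]
```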
4.4 Implementation
The embodiment of the invention is implemented on Caffe.
First, the parameters of the detector are defined, mainly including the picture size, the number of categories plus background, the range of feature map scales, the prior box scales for the different feature maps, the aspect ratios used by the prior boxes of each feature map, and the cell size and offset of each feature map, which determine the centers of the prior boxes.
Then, the entire network is constructed.
Finally, for detection on each feature map, a combined layer is defined separately, which mainly performs two convolutions on the feature map to obtain the category confidences and the bounding box positions respectively.
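Prior box generation from the parameters listed in the first step (feature map sizes, scale range, aspect ratios, cell size and offset) could be sketched as follows; the concrete sizes, scales and ratios in the usage comment are the commonly used SSD300 defaults that yield 8732 boxes and are given only as an assumed example.

```python
import math
import torch

def prior_boxes(feat_sizes, scales, aspect_ratios, offset=0.5):
    """Generate SSD prior (default) boxes in normalized (cx, cy, w, h) form.

    feat_sizes:    spatial size of each source feature map, e.g. [38, 19, 10, 5, 3, 1]
    scales:        prior box scale per feature map (fraction of the image size),
                   with one extra value at the end for the sqrt(s_k * s_{k+1}) box
    aspect_ratios: list of ratio lists per feature map, e.g. [[2], [2, 3], ...]
    """
    priors = []
    for k, f in enumerate(feat_sizes):
        for i in range(f):
            for j in range(f):
                cx, cy = (j + offset) / f, (i + offset) / f      # cell centre
                s = scales[k]
                priors.append([cx, cy, s, s])                    # aspect ratio 1 box
                s_prime = math.sqrt(scales[k] * scales[k + 1])   # extra ratio-1 box
                priors.append([cx, cy, s_prime, s_prime])
                for r in aspect_ratios[k]:
                    priors.append([cx, cy, s * math.sqrt(r), s / math.sqrt(r)])
                    priors.append([cx, cy, s / math.sqrt(r), s * math.sqrt(r)])
    return torch.tensor(priors).clamp(0, 1)

# e.g. the usual SSD300 settings give 8732 boxes:
# prior_boxes([38, 19, 10, 5, 3, 1],
#             [0.1, 0.2, 0.375, 0.55, 0.725, 0.9, 1.05],
#             [[2], [2, 3], [2, 3], [2, 3], [2], [2]])
```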
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention and shall be regarded as falling within the scope of the claims and the description.

Claims (5)

1. A railway foreign body intrusion detection method under a special scene based on transfer learning is characterized by comprising the following steps:
s1: collecting data;
s2: data expansion: converting the image into images of different domains by using an image style converter to realize the transfer of the image style;
s3: processing data;
s4: and detecting and tracking the target.
2. The method for detecting the intrusion of foreign objects into the railway under the special scenario based on the transfer learning of claim 1, wherein the data collection method in the step S1 is as follows:
arranging infrared high-definition cameras in railway areas and collecting railway sample images at different times and under different weather conditions; and collecting railway images and non-railway images of similar scenes via the network.
3. The method for detecting the intrusion of foreign objects into the railway under the special scenario based on the transfer learning of claim 1, wherein the data processing method in the step S3 is as follows:
rotating and/or mirroring and/or downsampling and normalizing the existing image data, and adding noise and blurring, so as to increase the complexity of the learning samples and improve the model learning effect;
and labeling all data samples and producing the XML and LMDB files required for model training.
4. The method for detecting the intrusion of the foreign objects into the railway under the special scene based on the transfer learning of claim 1, wherein the method for detecting and tracking the target in the step S4 comprises:
detecting and identifying deep features around each target through multi-scale feature maps;
extracting target features from feature maps of different layers of the deep neural network, which naturally adds more target scale information;
and tracking specific targets by matching histogram-of-oriented-gradients features with the detection results, and drawing the tracking trajectories.
5. The method for detecting the intrusion of foreign objects into the railway under the special scene based on the transfer learning of claim 1, wherein in the step S2, the image style transfer method comprises:
an image style converter G converts images of the X domain into the style of the Y domain, and a converter F converts images of the Y domain into the style of the X domain; after an X-domain picture is converted into G(X) by the converter G, G(X) can also be converted back to X by the converter F; similarly, after a Y-domain picture is converted into F(Y), F(Y) can be converted back to Y by G, i.e. with reference to the following model:
G(X)≈Y F(Y)≈X
F(G(X))≈X G(F(Y))≈Y;
and finally mutual transfer between the X-domain and Y-domain image styles is realized.
CN201910720026.6A 2019-08-06 2019-08-06 Railway foreign matter intrusion detection method based on transfer learning in special scene Pending CN110706197A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910720026.6A CN110706197A (en) 2019-08-06 2019-08-06 Railway foreign matter intrusion detection method based on transfer learning in special scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910720026.6A CN110706197A (en) 2019-08-06 2019-08-06 Railway foreign matter intrusion detection method based on transfer learning in special scene

Publications (1)

Publication Number Publication Date
CN110706197A true CN110706197A (en) 2020-01-17

Family

ID=69193332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910720026.6A Pending CN110706197A (en) 2019-08-06 2019-08-06 Railway foreign matter intrusion detection method based on transfer learning in special scene

Country Status (1)

Country Link
CN (1) CN110706197A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626170A (en) * 2020-05-20 2020-09-04 中铁二院工程集团有限责任公司 Image identification method for railway slope rockfall invasion limit detection
CN111738908A (en) * 2020-06-11 2020-10-02 山东大学 Scene conversion method and system for generating countermeasure network by combining instance segmentation and circulation
CN112116573A (en) * 2020-09-16 2020-12-22 四川嘉能佳网创新能源科技有限责任公司 High-precision infrared image anomaly detection method and system
CN112257533A (en) * 2020-10-14 2021-01-22 吉林大学 Perimeter intrusion detection and identification method
CN112396577A (en) * 2020-10-22 2021-02-23 国网浙江省电力有限公司杭州供电公司 Defect detection method of transformer based on Poisson fusion sample expansion
CN113205510A (en) * 2021-05-25 2021-08-03 石家庄铁道大学 Railway intrusion foreign matter detection method, device and terminal
CN113554624A (en) * 2021-07-23 2021-10-26 深圳市人工智能与机器人研究院 Anomaly detection method and device and computer storage medium
CN115456913A (en) * 2022-11-07 2022-12-09 四川大学 Method and device for defogging night fog map

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684965A (en) * 2018-12-17 2019-04-26 上海资汇信息科技有限公司 A kind of face identification system based near infrared imaging and deep learning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684965A (en) * 2018-12-17 2019-04-26 上海资汇信息科技有限公司 A kind of face identification system based near infrared imaging and deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI LIU et al.: "End-to-End Single Image Fog Removal using Enhanced Cycle Consistent Adversarial Networks", arXiv:1902.01374v1 [cs.CV] *
ZHANG XUEYAN: "Research on the SSD Algorithm and Its Application in Abnormal Target Detection in Railway Scenes", China Excellent Doctoral and Master's Theses Full-text Database (Master), Engineering Science and Technology II *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626170A (en) * 2020-05-20 2020-09-04 中铁二院工程集团有限责任公司 Image identification method for railway slope rockfall invasion limit detection
CN111626170B (en) * 2020-05-20 2023-05-23 中铁二院工程集团有限责任公司 Image recognition method for railway side slope falling stone intrusion detection
CN111738908A (en) * 2020-06-11 2020-10-02 山东大学 Scene conversion method and system for generating countermeasure network by combining instance segmentation and circulation
CN112116573A (en) * 2020-09-16 2020-12-22 四川嘉能佳网创新能源科技有限责任公司 High-precision infrared image anomaly detection method and system
CN112257533A (en) * 2020-10-14 2021-01-22 吉林大学 Perimeter intrusion detection and identification method
CN112257533B (en) * 2020-10-14 2022-04-12 吉林大学 Perimeter intrusion detection and identification method
CN112396577A (en) * 2020-10-22 2021-02-23 国网浙江省电力有限公司杭州供电公司 Defect detection method of transformer based on Poisson fusion sample expansion
CN113205510A (en) * 2021-05-25 2021-08-03 石家庄铁道大学 Railway intrusion foreign matter detection method, device and terminal
CN113205510B (en) * 2021-05-25 2023-02-03 石家庄铁道大学 Railway intrusion foreign matter detection method, device and terminal
CN113554624A (en) * 2021-07-23 2021-10-26 深圳市人工智能与机器人研究院 Anomaly detection method and device and computer storage medium
CN113554624B (en) * 2021-07-23 2023-12-05 深圳市人工智能与机器人研究院 Abnormality detection method, abnormality detection device, and computer storage medium
CN115456913A (en) * 2022-11-07 2022-12-09 四川大学 Method and device for defogging night fog map

Similar Documents

Publication Publication Date Title
CN110706197A (en) Railway foreign matter intrusion detection method based on transfer learning in special scene
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
CN110363140B (en) Human body action real-time identification method based on infrared image
JP6759475B2 (en) Ship detection methods and systems based on multidimensional features of the scene
CN111222396B (en) All-weather multispectral pedestrian detection method
CN105574855B (en) Infrared small target detection method under cloud background based on template convolution and false alarm rejection
CN103914813B (en) The restored method of colored haze image defogging and illumination compensation
CN110232380A (en) Fire night scenes restored method based on Mask R-CNN neural network
US20100034423A1 (en) System and method for detecting and tracking an object of interest in spatio-temporal space
Li et al. Road lane detection with gabor filters
CN111275627B (en) Image snow removing algorithm based on snow model and deep learning fusion
CN109711256B (en) Low-altitude complex background unmanned aerial vehicle target detection method
Jiang et al. Building damage detection via superpixel-based belief fusion of space-borne SAR and optical images
CN111965636A (en) Night target detection method based on millimeter wave radar and vision fusion
Su et al. A new local-main-gradient-orientation HOG and contour differences based algorithm for object classification
CN114973028B (en) Aerial video image real-time change detection method and system
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN107292920A (en) A kind of multidate full-polarization SAR remote sensing imagery change detection method of joint classification
CN110827375B (en) Infrared image true color coloring method and system based on low-light-level image
CN112330562A (en) Heterogeneous remote sensing image transformation method and system
Kročka et al. Extending parking occupancy detection model for night lighting and snowy weather conditions
Huda et al. Effects of pre-processing on the performance of transfer learning based person detection in thermal images
CN109033984A (en) A kind of night mist fast automatic detecting method
Patel et al. Depthwise convolution for compact object detector in nighttime images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117

RJ01 Rejection of invention patent application after publication