CN114037750A - Method for realizing track virtual responder - Google Patents

Method for realizing track virtual responder

Info

Publication number
CN114037750A
Authority
CN
China
Prior art keywords
network
train
image
yolov3
looking camera
Prior art date
Legal status (assumption, not a legal conclusion)
Pending
Application number
CN202111130631.1A
Other languages
Chinese (zh)
Inventor
余祖俊
朱力强
王尧
黄翊轩
Current Assignee
Beijing Jiaotong University
China State Railway Group Co Ltd
Original Assignee
Beijing Jiaotong University
China State Railway Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jiaotong University, China State Railway Group Co Ltd filed Critical Beijing Jiaotong University
Priority to CN202111130631.1A priority Critical patent/CN114037750A/en
Publication of CN114037750A publication Critical patent/CN114037750A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B61 - RAILWAYS
    • B61L - GUIDING RAILWAY TRAFFIC; ENSURING THE SAFETY OF RAILWAY TRAFFIC
    • B61L 15/00 - Indicators provided on the vehicle or vehicle train for signalling purposes; on-board control or communication systems
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B61 - RAILWAYS
    • B61L - GUIDING RAILWAY TRAFFIC; ENSURING THE SAFETY OF RAILWAY TRAFFIC
    • B61L 25/00 - Recording or indicating positions or identities of vehicles or vehicle trains or setting of track apparatus
    • B61L 25/02 - Indicating or recording positions or identities of vehicles or vehicle trains
    • B61L 25/025 - Absolute localisation, e.g. providing geodetic coordinates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/044 - Recurrent networks, e.g. Hopfield networks

Abstract

The invention provides a method for implementing a track virtual transponder. The method comprises: building a vehicle-mounted image acquisition and processing system at the train end and using it to acquire a driving record image set from forward-looking and side-looking viewpoints while the train runs; performing real-time target detection on the image set with an improved YOLOv3 network to acquire the category and position information of the existing markers on the train's track line; and performing content recognition on the image set with an improved CRNN, according to the marker position information, to acquire the text information of the existing markers on the train's track line. The invention detects the positions and recognises the content of existing line markers with machine vision; it can be used for response positioning while a rail transit vehicle runs, removes the dependence on ground transponders and satellites entirely, and realises completely autonomous train response positioning.

Description

Method for realizing track virtual responder
Technical Field
The invention relates to the technical field of rail transit communication, and in particular to a method for implementing a track virtual transponder.
Background
The 13th Five-Year Plan period was a key stage in the rapid construction and development of China's railway network: by the end of 2020, the total operating mileage of China's railways had reached 146,300 km. As the railway network keeps expanding, operating mileage and passenger volume continue to grow, and trains run ever faster and more densely, placing higher demands on traffic-safety risk control and intelligent line maintenance and repair.
At present, moving-block operation is generally adopted at home and abroad to ensure the efficient and safe operation of railway lines, and moving block places high demands on train operation control. The train operation control system is the neural centre of a rail transit system; accurate train positioning is a key technical foundation of train control, and providing accurate and reliable train position information is of great importance to the whole system and a precondition for its normal operation.
Existing high-speed-railway train positioning mainly relies on query transponders as fixed reference points to correct the accumulated error of the on-board odometer. The transponder is an important item of signalling infrastructure, widely used in foreign railways, subways and urban light rail, but transponders must be laid along the line, so construction cost is high and daily maintenance is difficult. Adding query transponders to the large number of existing lines is also impractical. With the rapid development of China's independently developed BeiDou satellite navigation system and the completion of its global constellation, its performance is now comparable to navigation systems such as GPS (Global Positioning System) and GLONASS. BeiDou thus offers a good solution for accurate autonomous train positioning, but satellite signals are still lost in special terrain such as valleys and long tunnels, the satellite-signal "blind area" problem, which is the main shortcoming of that scheme.
Therefore, there is a need to develop a reliable autonomous train positioning system that can perform virtual response under extreme conditions using only the existing features of the line.
In the prior art, a method for implementing a virtual transponder based on a GNSS (Global Navigation Satellite System) works as follows: GNSS-supported positioning is applied to the train control system, assisted by map matching, to realise a virtual transponder; this reduces the need for real transponders, or even replaces them, thereby cutting construction and maintenance costs. FIG. 1 is a block diagram of a virtual transponder based on a GNSS navigation system. A virtual transponder is defined as follows: it simulates a transponder actually placed on the track, and when the train reaches a reference point on the track it sends the transponder transmission module (BTM) a position message identical to the one a real trackside transponder would send. Internally the virtual transponder consists of three modules: a positioning unit module, a safety discrimination module and a virtual transponder (VB) module. The positioning unit module comprises a GNSS receiver and a Kalman filter; after satellite signals are received, the receiver's positioning information is Kalman-filtered.
The safety discrimination module comprises a safety discrimination unit and a map-matching unit; it checks the position information delivered by the positioning unit, matches it against the map in the database, and determines the positioning error.
The virtual transponder (VB) module comprises a VB database, a VB capturer and a VB message generator. The VB database stores the positions of the virtual transponders. While the train runs on the track, its position information, once confirmed by the safety discrimination module, is passed to the VB capturer, which uses dynamic prediction based on the most recent positioning speed to match it against the positions in the VB database. If the capture succeeds, the VB message generator produces a message compatible with a real transponder and passes it to the transponder transmission module (BTM), which forwards it to the on-board computer for processing.
The virtual transponder is intended for real-time positioning, so capture must occur just as the train passes the point where the virtual transponder is located. Because the GNSS receiver delivers data at a fixed frequency, in most cases the train cannot receive a positioning fix exactly at the virtual transponder point, so a capture area must be defined around the virtual transponder point.
The implementation method of the virtual transponder comprises the following steps:
step 1, determining the capture area of the virtual transponder, defined as the circle centred on the virtual transponder point with radius r2, and calculating the capture radius r2 at the current moment according to the following formula:
r2 = (v0/(2H) + a/(4H²) + ξ)·q
in the formula: v0 is the current train speed, in m/s;
H is the positioning frequency, in Hz;
a is the acceleration, in m/s²;
ξ is the correction to the capture radius, ξ < L, where L is the distance between two positioning points;
q is the coefficient controlling the capture rate, 0 < q ≤ 1.
Step 2, the current point has not yet entered the capture area of the virtual transponder: the transponder is in the pre-capture state, and the current positioning result does not trigger it to issue positioning information. The pre-capture condition is: l ≤ l1 and l2 ≥ r2, where l1 is the distance from the previous point to the virtual transponder point, l2 is the distance from the current point to the virtual transponder point, and l is the distance between the two positioning points, i.e. from the previous point to the current point;
step 3, the current point enters the capture area, and the virtual transponder is in the capture state; the capture condition is l2 < r2. If l > l1 and l2 ≥ r2, the state is a missed-capture state;
and step 4, after capture, the virtual transponder generates a message identical to that of an actual transponder and transmits it to the transponder transmission module.
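The capture logic of steps 1 to 3 above can be sketched as follows. This is an illustrative reading of the prior-art scheme: the grouping of terms in the radius formula, v0/(2H) and a/(4H²), is interpreted from the formula as printed, and the function names are invented for illustration.

```python
def capture_radius(v0, H, a, xi, q):
    """Capture radius r2 = (v0/(2H) + a/(4H^2) + xi) * q, with v0 in m/s,
    H the positioning frequency in Hz, a in m/s^2, xi the correction term,
    and 0 < q <= 1 the capture-rate coefficient."""
    return (v0 / (2 * H) + a / (4 * H ** 2) + xi) * q

def capture_state(l, l1, l2, r2):
    """Classify the state from the distances defined in steps 2 and 3:
    l  = distance between the previous and current positioning points,
    l1 = previous point to virtual transponder point,
    l2 = current point to virtual transponder point."""
    if l2 < r2:
        return "captured"      # step 3: current point inside the capture area
    if l <= l1:
        return "pre-capture"   # step 2: approaching, not yet inside
    return "missed"            # step 3: passed the point without capture
```

For example, a train 0.5 m from the virtual transponder point with a 1 m capture radius is classified as captured.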
The above-mentioned prior art method for implementing a virtual transponder based on a GNSS satellite navigation system has the following disadvantages:
the proposal adopts a GNSS satellite positioning system as a source of positioning information, a virtual responder system is constructed by installing a satellite signal receiving device at a train end to acquire the satellite positioning information, although this virtual transponder implementation may replace a physical transponder device such as a conventional terrestrial transponder, the virtual position answering point is provided in the running line of the train, the installation and maintenance cost of the ground equipment is reduced, however, the virtual transponder implemented based on the GNSS satellite system provides a response on the premise that signals of the GNSS satellite system can be received at the virtual response position, the problem of signal loss exists in the environment shielded by the GNSS satellite positioning system such as valleys, tunnels, under bridges, underground road sections and the like, in this case, none of the prior art virtual transponders provides position information well and the system will lose its functionality accordingly.
Therefore, prior-art virtual transponders cannot be applied in complex, unfavourable terrain with many mountains and tunnels, and cannot cope with the various extreme situations in which satellite signals are lost. In addition, GNSS positioning often carries a large error, and high-precision positioning requires differential base stations to be erected along the line, which again incurs construction and maintenance costs. If a base station is damaged under extreme conditions, the response positioning accuracy of prior-art virtual response equipment is severely degraded.
Disclosure of Invention
Embodiments of the present invention provide a method for implementing a track virtual transponder, so as to overcome the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
A method for implementing a track virtual transponder includes:
step 1, a vehicle-mounted image acquisition and processing system arranged at a train end is set up, and a driving record image set of a front view angle and a side view angle in the running process of a train is acquired by the vehicle-mounted image acquisition and processing system;
step 2, carrying out real-time target detection on the driving record image set by utilizing an improved YOLOv3 network, and acquiring the category and position information of the existing marker on the track operation line of the train;
step 3, performing content recognition on the driving record image set with the improved CRNN according to the position information of the existing markers, and acquiring the text information of the existing markers on the track running line of the train;
and step 4, calculating the train's positioning information by matching the acquired text information of the existing markers against an electronic map.
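As an illustration of the map-matching step, looking up recognised marker text in an electronic map might take the following shape. The map encoding (kilometre-post strings mapped to line positions) and the odometer correction are hypothetical; the invention does not specify the map format.

```python
# Hypothetical electronic map: recognised marker text -> line position in km.
# The key format (kilometre posts) is an assumption for illustration only.
ELECTRONIC_MAP = {
    "K1355+200": 1355.200,
    "K1355+700": 1355.700,
}

def locate(marker_text, odometer_offset_m=0.0):
    """Return the absolute line position (km) for a recognised marker text,
    optionally advanced by the odometer distance travelled since the marker
    was passed. Returns None when the text matches no map entry."""
    base_km = ELECTRONIC_MAP.get(marker_text)
    if base_km is None:
        return None  # unrecognised marker: no position fix this cycle
    return base_km + odometer_offset_m / 1000.0
```

A marker read as "K1355+200" with 500 m travelled since passing it would yield a position of 1356.2 km.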
Preferably, the step 1 comprises:
the system comprises a power module, a processor, a forward-looking camera, a side-looking camera, a network switch and a hard disk, wherein the forward-looking camera is arranged in a cab and faces right ahead, the side-looking camera is arranged in a cab and faces a side window, the visual field direction is perpendicular to the advancing direction of a train, the forward-looking camera and the side-looking camera are connected through the switch, the forward-looking camera and the side-looking camera are subjected to parameter setting and secondary development, the forward-looking camera and the side-looking camera are used for collecting driving recording videos under specific resolution, specific shutter time and frame rate parameters in the running process of the train, the driving recording videos are stored into images frame by frame, all the images form a set, and the image set is stored in the hard disk.
Preferably, the step 2 comprises:
selecting the YOLOv3 network as the reference network; improving the input mode of the YOLOv3 network to reduce the pixel loss of small targets during down-sampling; improving the activation function and the multi-scale feature fusion structure of the YOLOv3 network; and improving the output processing of the YOLOv3 network by adopting weighted box fusion (WBF) of target boxes, obtaining the improved YOLOv3 network;
and inputting the images in the image set into an improved YOLOv3 network, outputting the coordinates of the bounding box of the marker in the image and the type number of the marker by the YOLOv3 network, and acquiring the type and position information of the existing marker on the track operation line of the train.
Preferably, the improvement on the input mode of the YOLOv3 network reduces the pixel loss of the small target in the down-sampling process, and includes:
the forward-view images of the rail scene are preprocessed for the YOLOv3 network in a sliding-window manner: a 640 × 640 window slides over the original 1920 × 1080 image, the image part inside the window is cropped and resized to 416 × 416 as network input, the coordinates of targets detected by the network are mapped back to the original image for post-processing, and the target-box parameters on the original image are finally obtained.
Preferably, the improving the activation function and the multi-scale feature fusion structure of the YOLOv3 network, and the improving the output processing of the YOLOv3 network by using the target box weighted fusion WBF, includes:
a spatial pyramid pooling layer is added at the end of the YOLOv3 backbone network: the feature map is pooled at different scales, and the pooled feature maps are concatenated in the depth direction to obtain a new feature map. The backbone of YOLOv3 adopts the darknet53 structure, formed by stacking residual blocks into 52 layers, with the Leaky-ReLU activation function in every layer. Multiple overlapping detection boxes are processed by weighted box fusion (WBF) of target boxes. The YOLOv3 network model is built with the PyTorch deep learning framework, and the network weights are obtained by training on manually annotated images collected by the image acquisition system.
Preferably, the step 3 comprises:
the CRNN network architecture introduces residual connections; a CNN and a bidirectional LSTM neural network are built with the PyTorch deep learning framework, features are extracted from the image and converted into sequence features for output prediction, and the output prediction is optimised with the connectionist temporal classification (CTC) loss, so that the network automatically learns how to align the time sequence with the actual text content, obtaining the improved CRNN network;
inputting the category and position information of the existing marker on the rail operation line of the train into the improved CRNN, and outputting the text information of the existing marker on the rail operation line of the train by the improved CRNN.
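The alignment behaviour that CTC training gives the CRNN can be illustrated with the standard greedy decoding rule: merge consecutive repeated labels, then drop blanks. This is a generic property of CTC, not code from the invention, and the integer labels are placeholders for character classes.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame argmax label sequence into output labels the way
    a CTC alignment is read: merge runs of the same label, remove blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out
```

For instance the frame sequence [blank, 1, 1, blank, 1, 2, 2, blank] decodes to [1, 1, 2]: the blank between the two 1s keeps them as distinct characters.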
As can be seen from the technical scheme provided by the embodiments of the invention, the method can be used for response positioning while a rail transit vehicle runs: machine vision detects the positions of existing markers on the line and recognises their content, the detection and recognition algorithms adopt deep learning methods that satisfy the accuracy and real-time requirements well, dependence on ground transponders and satellites is removed entirely, and completely autonomous train response positioning is realised.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a block diagram of a prior art virtual transponder based on a GNSS satellite navigation system;
FIG. 2 is a schematic diagram of an implementation of a visual virtual transponder detection and identification algorithm according to an embodiment of the present invention;
FIG. 3 is a process flow diagram of a visual virtual transponder detection and identification algorithm provided by an embodiment of the present invention;
fig. 4 is a diagram of a YOLOv3 network structure according to an embodiment of the present invention;
fig. 5 is a structural diagram of a spatial pyramid pooling layer according to an embodiment of the present invention;
fig. 6 is a schematic diagram comparing the Mish activation function with the Leaky-ReLU activation function according to an embodiment of the present invention;
fig. 7 is a schematic diagram comparing WBF and NMS according to an embodiment of the present invention.
Fig. 8 is a diagram illustrating a structure of a CRNN network according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding the embodiments of the present invention, the following description will be further explained by taking several specific embodiments as examples in conjunction with the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
Existing railway virtual transponder schemes generally rely on GNSS for positioning information, cannot be used where satellite signals are lost, and incur high costs to achieve high precision. The invention therefore performs target detection and recognition of the existing markers of the railway line with a deep-learning-based machine vision method, extracts their content and matches it against a pre-built electronic map library to complete the response. The data acquired during the experiments comprise 2000 line images, with which the algorithm was trained and tested; the algorithm achieves a response accuracy of 99.5% and a response time within 150 ms on an Nvidia AGX Xavier platform, meeting the real-time requirement.
The embodiment of the invention adopts a deep neural network target detection architecture (such as YOLOv3) to detect and locate the visual-feature virtual transponder: the text content region is extracted from the forward-looking camera image, while detection and localisation in the side-looking camera image accurately determine the moment the train passes the virtual transponder.
The implementation principle of the detection and recognition algorithm of the visual virtual transponder provided by the embodiment of the invention is shown in fig. 2 and comprises the following processing. For text recognition, considering the advantages of recurrent neural networks for sequential data, a convolutional-recurrent architecture (CRNN) is adopted to recognise the text content and obtain the character string, which is finally matched against the electronic map to compute the train positioning information. In practical application, images from the forward-looking and side-looking cameras are captured in real time by a cyclically polled camera node and preprocessed, which shrinks the target search region, reduces computation, improves real-time performance and raises the proportion of the target in the network input image. The two networks are cascaded: the preprocessed image is the input of the target detection network, and the detection output is the input of the recognition network, realising the two-stage visual virtual transponder detection and recognition algorithm.
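The cascade described above can be sketched as a simple orchestration loop. The two stub functions stand in for the improved YOLOv3 and CRNN networks; their outputs here are invented placeholders, not results of the invention.

```python
def detect(image):
    """Stub for the improved YOLOv3 stage: returns (class_id, box) pairs.
    The single fixed detection below is a placeholder."""
    return [(0, (100, 200, 180, 260))]

def recognise(image, box):
    """Stub for the improved CRNN stage: returns the text inside the box.
    The kilometre-post string is a placeholder."""
    return "K1355+200"

def virtual_transponder_step(front_image):
    """One cycle of the cascaded pipeline: every detected marker box is fed
    to the recogniser, yielding (class_id, box, text) triples."""
    results = []
    for class_id, box in detect(front_image):
        text = recognise(front_image, box)
        results.append((class_id, box, text))
    return results
```

In the real system the returned text would then be matched against the electronic map to produce the position fix.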
The processing flow of the visual virtual responder detection and identification algorithm provided by the embodiment of the invention is shown in fig. 3, and comprises the following processing steps:
and step S10, constructing the vehicle-mounted image acquisition and processing system.
In order to capture the existing markers on the line in real time in the running process of the train, and perform image processing and detection and identification, a vehicle-mounted image acquisition and processing system arranged at the train end needs to be built. The system mainly comprises a core processor, two cameras (front view and side view) and a network switch.
The forward-looking camera is mounted in the cab facing straight ahead, with its position and angle chosen so that the content of trackside markers is clearly visible; the side-looking camera is mounted in the cab facing a side window, with its viewing direction perpendicular to the direction of travel and its angle chosen so that trackside markers fall within the camera's field of view.
And step S20, establishing a sample database.
And collecting the driving recording videos of the forward-looking visual angle and the lateral visual angle in the running process of the train by using the built vehicle-mounted image collecting and processing system, storing the driving recording videos frame by frame as images, and forming an image set by using all the images.
And carrying out data set expansion on the image set through cutting, zooming, brightness adjustment and contrast adjustment to establish an existing marker database of the railway scene.
And step S30, detecting the visual line marker.
Because the virtual response function needs to be realized by using the position information contained in the markers, the positions of the markers on the line need to be labeled, and an electronic map library is established.
The labelling of the image set divides into two parts: labelling the category and position of each marker, and labelling its specific text content. Category and position labels are those required for YOLOv3 network training, i.e. the ground-truth box information needed by the target detection network. Text-content labels are those required for CRNN training, i.e. the ground-truth strings corresponding to the network output. Only after both networks are trained can detection and recognition of a specific target be performed.
The method performs virtual response using the existing markers of the line, and so must first run real-time target detection on the images acquired by the two cameras while the train runs. To guarantee both the accuracy and the real-time performance of the detection algorithm, the YOLOv3 network is selected as the reference network and improved upon. Fig. 4 shows a YOLOv3 network structure provided by an embodiment of the present invention.
The improvements to YOLOv3 comprise: an improved network input mode, which reduces the pixel loss of small targets during down-sampling; an improved activation function and multi-scale feature fusion structure, which raise the detection performance of the network; and improved output processing using Weighted Box Fusion (WBF), which makes the final target-box positions more accurate. The input to the network is the original image collected by the image acquisition system; the output is the coordinates of the marker's bounding box and the marker's category number.
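The core idea of WBF, in contrast to NMS (which keeps only the highest-scoring box and discards the rest), is to fuse overlapping boxes by confidence-weighted averaging of their coordinates. The sketch below shows only that averaging step for one cluster of boxes; a full WBF implementation first groups boxes by IoU, which is omitted here.

```python
def weighted_box_fusion(boxes, scores):
    """Fuse one cluster of overlapping boxes (x1, y1, x2, y2) into a single
    box whose coordinates are the confidence-weighted average of the inputs.
    Returns the fused box and the best score of the cluster."""
    total = sum(scores)
    fused = tuple(
        sum(coord * s for coord, s in zip(coords, scores)) / total
        for coords in zip(*boxes)  # iterate per coordinate: all x1, all y1, ...
    )
    return fused, max(scores)
```

Two equally confident boxes (0, 0, 10, 10) and (2, 2, 12, 12) fuse to (1, 1, 11, 11), whereas NMS would simply keep one of them.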
A problem the invention must solve is that the targets to be detected occupy few pixels in the image, i.e. they are small targets, and small-target detection is a recognised difficulty in the field. As the YOLOv3 structure shows, the network takes a 416 × 416 input and its last feature map is only 13 × 13. Although the network uses an FPN-style pyramid to fuse feature maps from different layers as outputs, even the largest output feature map is only 52 × 52, a down-sampling rate of 8 relative to the network input. This means that a 16 × 16 object on the input image retains only 2 × 2 feature pixels on the largest output map, and on the smallest output map must share a single pixel with its surroundings to express its features. If the image to be detected were simply scaled down to 416 × 416 and fed to the network, the earlier size analysis of the data set shows that many objects would be narrower than 16 pixels at the network input, so their features would be lost during down-sampling.
The invention preprocesses the forward-view image of the railway scene with a sliding window: a 640 × 640 window slides over the original 1920 × 1080 image, and the image part inside the window is cropped and resized to 416 × 416 as network input. This greatly reduces the down-sampling rate applied to the original image before it enters the network, so that more pixels of the target are preserved and targets are prevented, as far as possible, from entering the network already too small to detect. Finally, the coordinates of targets detected by the network are mapped back to the original image for post-processing, yielding the target-box parameters on the original image.
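The window tiling and the mapping of detections back to the original frame can be sketched as follows. The patent fixes only the 640 × 640 window and the 1920 × 1080 frame; the stride used here is an assumption chosen so the windows cover the frame exactly.

```python
def sliding_windows(img_w=1920, img_h=1080, win=640, stride=440):
    """Return the (x0, y0) origins of 640x640 windows covering the frame.
    The stride is an assumed value; a final window is appended if needed so
    the right and bottom edges are always covered."""
    xs = list(range(0, img_w - win + 1, stride))
    ys = list(range(0, img_h - win + 1, stride))
    if xs[-1] != img_w - win:
        xs.append(img_w - win)
    if ys[-1] != img_h - win:
        ys.append(img_h - win)
    return [(x, y) for y in ys for x in xs]

def map_box_to_original(box, origin, net_size=416, win=640):
    """Map a box predicted on the 416x416 network input back to the original
    frame: undo the 416 -> 640 resize, then add the window offset."""
    scale = win / net_size
    x0, y0 = origin
    x1, y1, x2, y2 = box
    return (x1 * scale + x0, y1 * scale + y0,
            x2 * scale + x0, y2 * scale + y0)
```

With these defaults the 1920 × 1080 frame is covered by eight windows, and a full-frame prediction on the 416 × 416 input maps back to the 640 × 640 patch at its window origin.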
For the highway-scene data set, the route is short and the regions where line markers appear in the camera view are relatively fixed, all on the right side of the road, so the original image can be cropped directly to remove regions where markers cannot appear. This reduces the downsampling ratio when the image enters the network, reduces the number of pixels to be processed, and improves the real-time performance of the network.
In order to further fuse information across scales, improve the detection of smaller targets, and let the network attend to both local and global features, the invention applies the idea of the spatial pyramid pooling network (SPP-Net): a spatial pyramid pooling layer is added at the end of the YOLOv3 backbone network, the feature map is pooled at several scales, and the pooled feature maps are concatenated in the depth direction to form a new feature map. Fig. 5 illustrates a spatial pyramid pooling layer according to an embodiment of the present invention. The backbone network of YOLOv3 adopts the darknet53 structure, which borrows the residual-block structure of ResNet and is formed by stacking residual blocks into 52 layers; every layer uses the Leaky-ReLU activation function.
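The SPP block described above can be sketched in PyTorch, the framework the patent itself names. The pooling kernel sizes 5, 9 and 13 are an assumption taken from the common YOLOv3-SPP variant; the patent does not state which scales it pools at.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling block: max-pool the same feature map
    at several scales (stride 1, padded so the spatial size is kept)
    and concatenate the results along the channel axis."""
    def __init__(self, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
            for k in pool_sizes)

    def forward(self, x):
        # Output depth = in_channels * (1 + len(pool_sizes))
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```

Because every pooling branch preserves the spatial size, the block only grows the channel dimension: an 8-channel 13 × 13 map comes out as 32 channels at 13 × 13.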
The formula of the Mish activation function is:
y = x · tanh(ln(1 + e^x))
where x is the input and y is the output.
Fig. 6 is a schematic diagram comparing the Mish activation function with the Leaky-ReLU activation function according to an embodiment of the present invention. The curve of the Mish activation function is shown in the left graph of Fig. 6. The Mish activation function is clearly smoother than the Leaky-ReLU activation function, and a smoother activation function allows information to propagate deeper into the neural network, improving network performance; the experiments in Diganta Misra's paper also show that its effect in convolutional neural networks is far better than that of common activation functions such as Leaky-ReLU. Therefore, the Mish activation function is adopted to replace the Leaky-ReLU activation function in the original YOLOv3 backbone network.
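A minimal numerical sketch of the two activation functions discussed above. The numerically safe softplus and the 0.1 negative slope for Leaky-ReLU (the darknet default) are assumptions, not values stated in the patent.

```python
import math

def softplus(x):
    # Numerically safe ln(1 + e^x): avoid overflow for large x.
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))

def mish(x):
    """Mish: y = x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))."""
    return x * math.tanh(softplus(x))

def leaky_relu(x, slope=0.1):
    """Leaky-ReLU with an assumed 0.1 negative slope."""
    return x if x >= 0 else slope * x
```

Unlike Leaky-ReLU, Mish is smooth everywhere and slightly non-monotonic just below zero, which is the smoothness property the text credits for better information flow.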
In addition, since the present invention requires precise localization of the detected targets, the conventional non-maximum suppression (NMS) method is unsuitable: when processing several overlapping detection boxes, it keeps, according to an intersection-over-union (IoU) threshold, only the coordinates of the box with the highest confidence and discards the information carried by the other boxes. The embodiment of the invention therefore replaces NMS with weighted box fusion (WBF). The WBF algorithm proceeds as follows: first, add every predicted box to a list B and sort B in descending order of confidence score; then create an empty list L and an empty list F (for the fused boxes); iterate over B and look for a matching target box in F (same category, IoU > 0.55). If no match is found, append the box to the tails of both L and F; otherwise append the box to L at the index of the matching box in F. Each position in L may hold several boxes, and the fused value at the corresponding index in F is updated from those boxes. A schematic diagram comparing WBF and NMS provided in an embodiment of the present invention is shown in fig. 7.
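The WBF steps above (sorted list B, cluster list L, fused list F) can be sketched in plain Python. The fusion rule used here, a confidence-weighted average of the member coordinates with the fused score averaged over the cluster, is the standard WBF rule and is assumed, since the patent does not spell out the update formula.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def weighted_box_fusion(boxes, scores, iou_thr=0.55):
    """Return a list of (fused_box, fused_score) tuples."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    clusters = []   # list L: member indices behind each fused box
    fused = []      # list F: current (box, score) per cluster
    for i in order:
        match = None
        for c, (fb, _) in enumerate(fused):
            if iou(boxes[i], fb) > iou_thr:
                match = c
                break
        if match is None:
            clusters.append([i])
            fused.append((boxes[i], scores[i]))
        else:
            clusters[match].append(i)
            members = clusters[match]
            w = sum(scores[j] for j in members)
            nb = tuple(sum(scores[j] * boxes[j][k] for j in members) / w
                       for k in range(4))
            fused[match] = (nb, w / len(members))
    return fused
```

Unlike NMS, every overlapping box contributes to the fused coordinates instead of being discarded.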
Step S40: recognition of the transponder text content
The invention improves the CRNN mainly by optimizing its backbone network. The backbone of the original CRNN uses the VGG-16 structure for feature extraction; this structure is complex, cannot meet the real-time requirement, and its feature-extraction effect is mediocre. The invention therefore builds a new feature-extraction network with the PyTorch deep learning framework and introduces a residual connection structure, so the network achieves a better feature-extraction effect at a smaller computational cost. In addition, convolution-kernel pruning is applied to the trained CRNN to further reduce computation and guarantee real-time performance. The input data is the marker region obtained in step 2, and the output result is the specific marker text content.
Scene text detection and recognition methods have generally benefited from the development of deep learning architectures and show excellent accuracy on common data sets when handling multi-resolution and multi-oriented text. Since the input images of the present invention contain no irregular shapes or distorted text, a CTC-based (Connectionist Temporal Classification) architecture for ordinary regular images is selected, and text recognition is realized by cascading a convolutional layer, a recurrent neural network layer, and a CTC transcription layer. This is the structure of the CRNN network, but the CRNN network has certain shortcomings.
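The CTC transcription step of the pipeline just described can be exercised with PyTorch's built-in `nn.CTCLoss`. The toy dimensions below (12 time steps, batch of 1, 5 classes with the blank at index 0) are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# T time steps from the recurrent layer, N batch size,
# C character classes (CTC blank is class 0 here).
T, N, C = 12, 1, 5
log_probs = torch.randn(T, N, C).log_softmax(dim=2)
targets = torch.tensor([[1, 3, 2]])          # one 3-character label
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((N,), T, dtype=torch.long),
           target_lengths=torch.tensor([3]))
```

CTC sums over all alignments of the 3-character label onto the 12 time steps, which is what lets the network learn the alignment automatically instead of needing per-frame labels.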
Fig. 8 is a diagram of a CRNN network structure according to an embodiment of the present invention. The original CRNN network is designed for horizontal text: its feature extraction finally compresses the input image to one pixel in the vertical direction. For the vertical catenary-pole numbers common on railways, feeding the image directly into the network obviously cannot extract sequence features, so when training the network with railway-scene images, vertical-text samples must first be rotated 90° counterclockwise and then resized to the network input size; the network can then learn the features of vertical text effectively. In the highway-scene simulation experiment all samples are vertical text, so the convolution-kernel sizes are modified directly when building the network: horizontal rectangular convolution and pooling kernels replace the vertical ones, compressing the input image to one pixel in the width direction and extracting its vertical sequence features.
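A minimal sketch of the vertical-text preprocessing described above: rotate the crop 90° counterclockwise so the characters run left to right, then compute the width the crop would occupy at a fixed network input height. The 32-pixel input height is a common CRNN choice and an assumption here.

```python
import numpy as np

def prep_vertical_text(img, out_h=32):
    """Rotate a vertical text crop 90 deg counterclockwise and
    report the (h, w) it would be resized to at a fixed network
    input height, preserving aspect ratio."""
    rotated = np.rot90(img)                 # CCW by default
    h, w = rotated.shape[:2]
    return rotated, (out_h, round(w * out_h / h))
```

A tall 100 × 20 crop becomes a wide 20 × 100 strip, which the recognizer would then see at 32 × 160.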
The original CRNN network adopts VGG-16 as its feature-extraction network; its large parameter count makes forward propagation slow and its feature extraction mediocre. Borrowing the ResNet network structure, a residual-block structure is introduced so the network can make better use of deep features. At the same time, to reduce computation and parameters, the number of convolutional layers is reduced appropriately, and, following the idea of depthwise separable convolution in MobileNet, depthwise separable convolutions replace conventional convolutions in some layers. Through these changes, better recognition accuracy and inference speed than the original network are finally obtained.
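The depthwise separable substitution mentioned above can be sketched as a PyTorch building block. The exact layer arrangement (BN plus ReLU after the pointwise convolution) is an assumption in the MobileNet style; the patent does not give the block layout.

```python
import torch
import torch.nn as nn

def dw_separable(c_in, c_out, k=3):
    """Depthwise separable convolution: a per-channel spatial
    convolution (groups=c_in) followed by a 1x1 pointwise
    convolution. Parameters drop from c_in*c_out*k*k for a
    standard conv to roughly c_in*k*k + c_in*c_out."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in,
                  bias=False),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )
```

For 32 input and 64 output channels the block holds about 2.5k weights versus 18.4k for a standard 3 × 3 convolution, which is where the speed-up comes from.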
Finally, in order to further compress the network model and guarantee the real-time performance of the algorithm, the computation and parameters of the network are reduced by pruning its convolutional layers. A convolutional neural network is typically a stack of convolutional layers, each of which contains several convolution kernels. If the weight tensor of a convolution kernel is K, the basic convolution operation can be expressed as:
F_X = K ∗ X
in the formula: x is the input of the convolutional neural network, and an input feature map X and an output feature map Fx are formed at a convolution kernel K in the process of forward propagation of the convolutional neural network along the network. Any convolution kernel K can be regarded as a characteristic template, and convolution operation is to scan an input image and calculate the matching degree of different positions of the input image and the characteristic template. The larger the number of convolution kernels, the more patterns the convolutional neural network can express, and the stronger the learning ability. Therefore, in practical applications, training a network structure containing more convolution kernels for a particular task will usually achieve the training goal more easily, but this will often result in a network with many redundant or useless convolution kernels. To identify these unnecessary convolution kernels, the current common approach is to compute after training is completeL1 norm of each convolution kernel
Figure RE-GDA0003455057710000142
The convolution kernel with the large L1 norm is regarded as important, and the convolution kernel with the small L1 norm is regarded as unimportant and can be removed from the network, so that the purposes of reducing the calculation amount, avoiding over-training and the like are achieved. However, as can be seen from the nature of convolution operations, each convolution kernel actually represents a feature template at some level of abstraction. For example, in image recognition applications, the convolution kernels of the first layer are typically edge detection templates in different directions. Therefore, the magnitude of the convolution kernel elements does not necessarily accurately reflect whether the features represented by a certain convolution kernel are useful for the target problem. In other words, the convolution kernel L1 norm
Figure RE-GDA0003455057710000143
The magnitude of (2) represents the magnitude of the feature only, and does not reflect the importance of the convolution kernel. The invention adopts the scaling coefficient of the batch normalization layer connected after the convolution layer as the evaluation standard of the importance degree of the convolution channel for pruning, and the principle is that the batch normalization layer normalizes the characteristic diagram output by the previous convolution layer, and the scaling coefficient just reflects the response amplitude of the characteristic diagram, so the invention can be used for judging the importance degree of the corresponding channel characteristic. The formula of the batch normalization layer is:
y = γ · (x − μ_B) / √(σ_B² + ε) + β

where μ_B and σ_B² are the mean and variance of the current batch, ε is a small constant for numerical stability, and the scaling coefficient γ and offset β are learnable parameters.
After the model is pruned, fine-tuning training is performed on the pruned model, which preserves model accuracy as far as possible while achieving a faster forward-propagation speed.
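The γ-based channel selection described above can be sketched without any framework: collect the BN scale factors of a layer and keep the channels with the largest |γ|. The 70 % default keep-ratio is an illustrative assumption; the patent does not state a pruning ratio.

```python
def prune_mask_from_bn_gamma(gammas, keep_ratio=0.7):
    """Channel-importance pruning driven by BN scale factors.

    Channels whose |gamma| falls in the bottom (1 - keep_ratio)
    fraction are marked for removal; the surviving convolution
    kernels are those whose BN scaling stayed large."""
    n_keep = max(1, round(len(gammas) * keep_ratio))
    ranked = sorted(range(len(gammas)), key=lambda i: -abs(gammas[i]))
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(gammas))]
```

The boolean mask can then be used to drop the corresponding convolution kernels (and BN entries) before the fine-tuning pass.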
Finally, the detection network and the recognition network are cascaded to realize the complete line-side visual transponder recognition algorithm. The cascade works as follows: the class label output by the detection network is examined first; for targets labeled as text regions, the target-box coordinates obtained by the detection network are mapped back to the original image, and the corresponding region is cropped and fed into the second-stage recognition network. Because the recognition-network input is obtained from the original image, it has higher resolution, letting the network learn finer features, which benefits character recognition.
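The detection-to-recognition hand-off can be sketched as a crop from the full-resolution frame. The 4-pixel padding is an assumption added to give the recognizer some surrounding context; the patent does not mention padding.

```python
import numpy as np

def crop_for_recognition(image, box, pad=4):
    """Cut the text-region box (x1, y1, x2, y2) out of the
    full-resolution original image, with a small padding, so the
    recognition network sees the highest-resolution pixels."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    x1 = max(0, int(x1) - pad)
    y1 = max(0, int(y1) - pad)
    x2 = min(w, int(x2) + pad)
    y2 = min(h, int(y2) + pad)
    return image[y1:y2, x1:x2]
```

Cropping from the original frame rather than the 416 × 416 network input is what preserves the resolution the text above credits for better character recognition.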
In summary, the embodiment of the present invention provides a method for implementing a track virtual transponder that can be used for transponder-style positioning during rail-transit operation. It uses machine vision to detect the positions and recognize the contents of markers already present on the line, and the detection and recognition algorithms use deep learning, which satisfies the accuracy and real-time requirements of the algorithm well. The method is completely free of dependence on ground transponders and satellites, achieving fully autonomous train positioning, so the system has better responsiveness and reliability under complex and severe conditions such as extreme weather, tunnel sections, mountain sections, equipment damage, and communication interruption.
The machine-vision-based line-feature virtual transponder of the embodiment of the invention is compatible with the line's existing transponder system and train operation control system. Through a technical upgrade, the existing system can achieve reliable, continuous, autonomous positioning under adverse conditions such as extreme weather, geological disasters, ground-equipment faults, and mountain or tunnel environments, which is of great significance for improving the railway operation safety level.
The invention adopts the line's existing markers, such as kilometer posts and catenary-pole numbers, as the position-information source of the virtual transponder, avoiding the installation of additional equipment on the line and greatly reducing construction and maintenance costs.
The invention detects and recognizes targets with a deep-learning-based machine vision method, avoiding manual feature design and extraction, so the algorithm is more robust in applications under different lighting, backgrounds, and weather conditions.
The image-feature-extraction network structure adopted by the invention is lightweight-optimized, with few parameters and little computation, and can run directly on an embedded computing platform without sending image data to a server for processing, making the system more real-time and reliable.
The system reserves interfaces with other sensors and vehicle-mounted positioning schemes, and is convenient to expand and optimize.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The embodiments in this specification are described in a progressive manner; the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the descriptions of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A method for implementing a virtual transponder in a track, comprising:
step 1, setting up a vehicle-mounted image acquisition and processing system at the train end, and acquiring, with the vehicle-mounted image acquisition and processing system, a driving-record image set of forward and side view angles during train operation;
step 2, carrying out real-time target detection on the driving record image set by utilizing an improved YOLOv3 network, and acquiring the category and position information of the existing marker on the track operation line of the train;
step 3, performing content recognition on the driving record image set based on the improved CRNN network according to the position information of the existing markers, and acquiring text information of the existing markers on the track operation line of the train;
step 4, calculating train operation positioning information by matching the acquired text information of the existing markers with an electronic map.
2. The method of claim 1, wherein step 1 comprises:
the system comprises a power module, a processor, a forward-looking camera, a side-looking camera, a network switch and a hard disk; the forward-looking camera is arranged in the cab facing straight ahead, and the side-looking camera is arranged in the cab facing a side window with its viewing direction perpendicular to the train's direction of travel; the forward-looking camera and the side-looking camera are connected through the switch and, through parameter setting and secondary development, are used to record driving videos at a specific resolution, shutter time and frame rate during train operation; the driving-record videos are saved frame by frame as images, all the images form a set, and the image set is stored on the hard disk.
3. The method according to claim 1 or 2, wherein the step 2 comprises:
selecting the YOLOv3 network as a reference network; improving the input mode of the YOLOv3 network to reduce the pixel loss of small targets during down-sampling; improving the activation function and the multi-scale feature fusion structure of the YOLOv3 network; and improving the output processing of the YOLOv3 network by adopting weighted box fusion (WBF) of target boxes, to obtain the improved YOLOv3 network;
and inputting the images in the image set into an improved YOLOv3 network, outputting the coordinates of the bounding box of the marker in the image and the type number of the marker by the YOLOv3 network, and acquiring the type and position information of the existing marker on the track operation line of the train.
4. The method of claim 3, wherein the improving the input mode of the YOLOv3 network to reduce the pixel loss of the small target in the down-sampling process comprises:
the rail-scene forward-view image is preprocessed in a sliding-window mode: a window of 640 × 640 slides over the original 1920 × 1080 image, the image part inside the sliding window is cropped, resized to 416 × 416 and input into the network; the coordinates of targets detected by the network are mapped back to the original image for post-processing, finally obtaining the target-box parameters on the original image.
5. The method of claim 3, wherein the improving the activation function and the multi-scale feature fusion structure of the YOLOv3 network, and the improving the output processing of the YOLOv3 network by the target-box weighted fusion WBF, comprises:
a spatial pyramid pooling layer is added at the end of the YOLOv3 backbone network; the feature map is pooled at different scales and the pooled feature maps are concatenated in the depth direction to obtain a new feature map; the backbone network of YOLOv3 adopts the darknet53 network structure, formed by stacking residual blocks into 52 layers, with the Leaky-ReLU activation function in every layer; multiple overlapping detection boxes are processed by weighted box fusion (WBF) of target boxes; the YOLOv3 network model is built with the PyTorch deep learning framework, and the weights of the YOLOv3 network are obtained by training on manually labeled images collected by the image acquisition system.
6. The method of claim 3, wherein step 3 comprises:
the CRNN network architecture introduces a residual connection structure; a CNN and a bidirectional LSTM neural network are built with the PyTorch deep learning framework; features in the image are extracted and converted into sequence features for output prediction, and the connectionist temporal classification loss (CTC Loss) is adopted for optimized learning of the output prediction, so that the network automatically learns how to align the time sequence with the actual text content, obtaining the improved CRNN network;
inputting the category and position information of the existing marker on the rail operation line of the train into the improved CRNN, and outputting the text information of the existing marker on the rail operation line of the train by the improved CRNN.
CN202111130631.1A 2021-09-26 2021-09-26 Method for realizing track virtual responder Pending CN114037750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111130631.1A CN114037750A (en) 2021-09-26 2021-09-26 Method for realizing track virtual responder

Publications (1)

Publication Number Publication Date
CN114037750A true CN114037750A (en) 2022-02-11


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237448A (en) * 2023-08-11 2023-12-15 北京交通大学 Train fusion positioning method and device based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination