CN115601558A - Single turnout state detection system and detection method and semi-automatic data labeling method - Google Patents


Info

Publication number
CN115601558A
CN115601558A (Application CN202211305472.9A)
Authority
CN
China
Prior art keywords
track
turnout
lines
image
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211305472.9A
Other languages
Chinese (zh)
Inventor
张素民
白日
何睿
李阔
王鑫海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202211305472.9A priority Critical patent/CN115601558A/en
Publication of CN115601558A publication Critical patent/CN115601558A/en
Pending legal-status Critical Current


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/08 — Learning methods
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 — Arrangements for image or video recognition or understanding
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/48 — Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • G06V 10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/82 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention aims to provide a single turnout state detection system, a detection method and a semi-automatic data labeling method. The detection system comprises: an image acquisition module, used to provide original data for the testing process; a turnout data set construction module, used to provide original data for the training and validation processes; a data preprocessing module, used to identify the track region in the original sample data, thereby reducing the target search space and providing cues for screening the turnout targets on the track where the train is located in subsequent steps; and a turnout state discrimination module, used to improve the accuracy of turnout state recognition. The invention addresses problems of the prior art such as the large computational cost of single turnout state detection, the difficulty of turnout state recognition, the high cost of hardware-based recognition and the heavy data labeling workload.

Description

Single turnout state detection system and detection method and semi-automatic data labeling method
Technical Field
The invention belongs to the technical field of rail transit, and relates to a single turnout state detection system, a detection method and a semi-automatic data labeling method.
Background
With the rapid development of rail transit, the operation safety of rail vehicles has attracted wide attention. Turnouts (railway switches) are key railway components that control the crossing and connection of different tracks. A wrongly set switch position or a failed dispatching system can create serious safety hazards, so turnout state recognition is of great significance to safe train operation. Specifically: (1) Turnout state recognition can provide important cues for identifying the train's route through a turnout, and can even prevent rail traffic accidents when the dispatching system fails. (2) The turnout area is a high-risk area for track damage. The point rail structure of a turnout often bears high impact loads; in addition, problems such as non-optimal contact between the point rail and the wheel flange and the toe-end gap between the point rail and the stock rail easily cause wear and plastic deformation in the turnout area, so turnout state recognition can also extract the corresponding turnout area for track fault detection tasks. (3) Turnouts are important reference objects on a track line, and turnout recognition can assist train positioning. Track-selective localization of rail vehicles remains an unsolved task: positioning accuracy based on navigation systems cannot meet all requirements, and navigation signals are not always available; combining turnout recognition with a data map can improve the self-localization quality of rail vehicles.
An existing turnout scene recognition method uses a single-track segmentation model to extract a single track region containing a bifurcation point and then detects the turnout center point, thereby recognizing the track turnout; however, this method can only recognize the turnout area and cannot recognize the turnout state. There is also a turnout recognition method based on traditional image processing: rail features are extracted by graying, median filtering and edge detection; the main rail lines and lateral lines are then extracted by the Hough transform using features such as inclination angle and length; finally, turnout recognition is achieved by linear correlation analysis. However, this method requires manual design and extraction of track features and is computationally expensive; furthermore, it only classifies turnout types, does not detect the real-time turnout state, and cannot provide fully effective route information for train drivers.
An existing turnout state recognition device uses a turnout sensor to recognize the position of the switch rail and judge the turnout state, uses a wheel sensor to monitor the driving signal, and finally designs an early-warning strategy that combines the information from both sensors. This approach recognizes the turnout state directly with sensors, which ensures data reliability, but a dedicated recognition device must be installed in every turnout area to be recognized, so the implementation cost is high and the applicability is poor.
Switch structures in railway environments are mainly single switches, and labeling large-scale switch data sets involves a heavy workload, which poses a great challenge for training and testing related algorithms. Therefore, the invention provides a single turnout state detection system, a detection method and a semi-automatic data labeling method.
Disclosure of Invention
The invention aims to provide a single turnout state detection system that solves the problems of high hardware-based recognition cost and inaccurate detection results in the prior art.
The invention also provides a detection method for the single turnout state detection system, which solves the problems of the large computational cost of single turnout state detection, the difficulty of turnout state recognition and inaccurate track area extraction in the prior art.
The invention further provides a semi-automatic data labeling method for the single turnout state detection system, which solves the problem of the heavy data labeling workload.
The technical scheme adopted by the invention is a single turnout state detection system comprising: an image acquisition module, used to provide original image data for the testing process; a turnout data set construction module, used to provide original data for the training and validation processes; a data preprocessing module, used to identify the track region in the original sample data, thereby reducing the target search space and providing cues for screening the turnout targets on the track where the train is located in subsequent steps; and a turnout state discrimination module, used to improve the accuracy of turnout state recognition.
A detection method of a single turnout state detection system comprises the following steps:
S1, making a turnout data set;
S2, carrying out image preprocessing on the data set images;
S3, constructing a track turnout state discrimination module;
S4, taking the preprocessed turnout images and the corresponding labels as input data, taking the constructed turnout state discrimination model as a detection tool, and obtaining a turnout detection result;
S5, screening and identifying the turnouts of the track where the train is located by combining the track where the train is located and the turnout state discrimination result, and outputting a final turnout state recognition result.
A semi-automatic data labeling method of a single turnout state detection system specifically comprises the following steps:
Step 1, through the single turnout state detection system, collect turnout images on different lines and under different weather conditions, and divide the collected images into a manual labeling set and an automatic labeling set; label the manual labeling set with the VIA labeling tool and randomly divide it into three sub-data sets, train, val and test, in an 8:1:1 ratio, where the manual labeling set contains at least 8000 frames of images and the turnout state instances of each category number no fewer than 10000;
Step 2, train on the labeled data set with the turnout state discrimination deep learning model of S31 to obtain the optimal turnout recognition model; the specific process is as follows:
train and test the turnout state discrimination model in the single turnout state detection system with the manual labeling set of Step 1, with the training round denoted R_i; when both the turnout state detection precision and recall of round R_i exceed 0.85, the round's model parameters are taken as available model parameters and are packaged, together with the round's test result, as the R_i-th available turnout state discrimination model and stored in the available model list; otherwise, the round's model parameters are discarded and the next round of training and testing continues; when the available model list is not empty and three consecutive rounds of test results show no improvement, the operation stops and the best-performing available turnout state discrimination model is selected from the list as the optimal turnout state discrimination model; when the available model list is empty and the number of training rounds reaches the threshold R_threshold (default value 300), the single turnout state detection system cannot meet the detection requirement, the calculation process is stopped, and the system is optimized;
Step 3, randomly shuffle the images in the automatic labeling set and group them in units of 1000 images, with a remainder of fewer than 1000 images counted as one group; randomly select one group of data, detect the turnout states with the optimal turnout state discrimination model obtained in Step 2 to obtain label information for the images in the automatic labeling set, and remove the selected group of data from the automatic labeling data set;
Step 4, manually check whether the automatically generated annotation files meet the labeling requirements, correct unreasonable annotations, remove images for which no annotation information was generated, and add the images with qualified annotations to the manual labeling set;
Step 5, select any group of images from the remaining groups of the automatic labeling set and repeat Steps 2, 3 and 4 until the automatic labeling set is empty, at which point all images of the automatic labeling set have been labeled and the labeling process ends.
The invention has the following beneficial effects:
1. A track region extraction method is provided: track lines are extracted with a designed region growing strategy and then screened and paired, which improves the accuracy of track region extraction and reduces the search space of the deep learning network.
2. A turnout state discrimination model combining a deep learning network with traditional image processing is provided: the deep learning network first recognizes the turnout state, and the traditional image processing method then re-detects difficult samples with low confidence, establishing a turnout state recognition mechanism whose detection process is more transparent and whose results are more reliable than a single detection scheme.
3. By combining the track area where the train is located with the predicted turnout target box information, turnout targets on the train's own track can be distinguished from side-track turnout targets, and turnout targets in different areas are recognized according to their importance.
4. Images can be labeled initially by the deep learning network and then corrected manually, avoiding direct manual labeling and reducing the labeling workload.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flow chart of a single switch state detection system according to an embodiment of the invention.
Fig. 2a is a schematic diagram of a left-turn switch state according to an embodiment of the present invention.
Fig. 2b is a schematic diagram of a right turn switch state in accordance with an embodiment of the present invention.
FIG. 3 is a flow chart of data preprocessing according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the effect of data preprocessing according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a turnout identification model according to an embodiment of the invention.
Fig. 6 is a schematic structural diagram of the TRCSP_N module in the turnout discrimination model of the present invention.
Fig. 7a is a diagram of the effect of the detection result of the YOLOv5 algorithm turnout.
FIG. 7b is a diagram showing the effect of the detection result of the turnout in the algorithm of the present invention.
FIG. 8 is a flow chart of semi-automatic data set annotation according to an embodiment of the invention.
FIG. 9 is a comparison diagram of the effect of the automatically generated labeling box and the manual labeling box in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
As shown in fig. 1, the present invention provides a single switch state detection system, comprising:
the image acquisition module is used for providing original data for a test process;
a turnout data set construction module, used for providing original data for the training and validation processes;
the data preprocessing module is used for identifying the track area of the original sample data, further reducing the target search space and providing clues for turnout targets of the track where the train is located in the subsequent screening process;
and the turnout state judging module is used for improving the turnout state identification accuracy.
The invention also provides a detection method of the single turnout state detection system, which comprises the following specific steps:
S1: Make a turnout data set. Due to the lack of public switch data sets, switch data sets from trains in real operation scenes need to be made.
Further, S1 comprises the following substeps:
S11: The camera is mounted at the center of the train cab so that its optical axis is aligned with the positive direction of train travel and no obstacle blocks the camera's field of view.
Preferably, a Hikvision DS-2DC4223IW-D camera is selected; its maximum detection distance of 200 m meets the turnout state detection requirement, and the collected image size of 1080 × 1920 facilitates post-processing.
S12: recording track scene videos collected by cameras under different lines and different weather working conditions in the running process of a train, then performing frame division and disassembly on the videos, and manually screening out images containing identifiable single turnout states to serve as data set original images.
Preferably, to ensure that the data set is sufficiently representative, data should be collected under different lighting conditions and on different railway lines, the collected raw images should number at least 8000 frames, and each category should have no fewer than 10000 turnout state instances.
S13: Label the screened images with the VIA labeling tool; the label classes are the left-turn turnout state and the right-turn turnout state, as shown in Figs. 2a–2b. The turnout state is judged by identifying the contact relationship between the switch rail and the stock rail: if the left switch rail is not in contact with the fixed stock rail, the turnout is considered a left-turn turnout; similarly, if the right switch rail is not in contact with the fixed stock rail, the turnout is considered a right-turn turnout.
S14: The data set is randomly divided into three sub-data sets, train, val and test, in an 8:1:1 ratio. The train and val sets are used for the training and validation process, and the test set is used for the testing and inference process.
Because adjacent switch images extracted from the video have strong temporal correlation and similarity, the labeled image sequence must be randomly shuffled before the data set is divided to ensure the randomness of the division.
S2: as shown in fig. 3, image pre-processing is performed on the dataset image.
Further, S2 includes the following substeps:
s21: and carrying out graying processing on the RGB image acquired by the data set. The RGB image pixel unit is composed of three channels of R, G, and B, and the gray pixel gray is obtained by averaging the values of the three channels of the RGB image, i.e. gray = (R + G + B)/3. Wherein R, G and B are three channel values of the RGB image respectively.
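As an illustration of this averaging-based graying step, a minimal sketch using NumPy is given below; the function name and the use of OpenCV for loading are illustrative assumptions rather than part of the patent.

```python
import cv2
import numpy as np

def to_gray_average(image: np.ndarray) -> np.ndarray:
    """Convert a 3-channel switch image to grayscale by channel averaging,
    i.e. gray = (R + G + B) / 3 as described in S21."""
    # Work in float to avoid uint8 overflow when summing the three channels.
    gray = image.astype(np.float32).mean(axis=2)
    return gray.astype(np.uint8)

# Example usage (file name is hypothetical):
# img = cv2.imread("switch_frame.png")   # 3-channel image from the data set
# gray = to_gray_average(img)            # single-channel 8-bit image
```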
S22: and extracting the track line information by adopting a region growing method. The region growing algorithm utilizes the gray difference value between the gray value of the rail and the gray value of the rail base surface in the turnout image, the regions with similar gray values are combined, the remaining region is the rail line information, and the extracted rail lines are fragmented.
Further, the specific steps of S22 are as follows:
s221: acquiring a seed starting point of the region growing, wherein the seed starting point is a growth starting point required to be given in advance in the region growing process. Because the turnout position in the turnout image is usually in the lower half part of the image, in order to remove redundant trackless areas and ensure that all track areas are reserved, the area growing method only processes the area of the turnout image with y-axis direction coordinates in the range of [1/3h, h ]. Setting 10 seed starting points, and setting the seed starting points in the image by adopting an evenly distributed method, wherein the image positions of the first 5 seed starting points are (w × i/5,5/9 × h) respectively, the image positions of the last 5 seed starting points are (w × (i-5)/5, 7/9 × h) respectively, wherein h is the height of the original turnout image, w is the width of the original turnout image, and i represents the ith seed starting point.
S222: Let the coordinates of the ith seed start point be (sdx_i, sdy_i), and obtain the coordinates of the pixels in the 8-neighborhood of the seed start point, denoted (X_adj, Y_adj); then (X_adj, Y_adj) = (sdx_i, sdy_i) + R, where R is the array of 8-neighborhood relative offsets, R = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)].
S223: Compare the gray difference between the ith seed start point and each of its 8-neighborhood positions. When the gray difference D((sdx_i, sdy_i)) ≤ gray_t, the seed start point and the corresponding neighborhood position are considered to belong to the same gray region; the gray values of both are set to 255 and the neighborhood position becomes a new seed start point. gray_t is the gray difference threshold, with a default value of 6. The gray difference is expressed as

D((sdx_i, sdy_i)) = gray((sdx_i, sdy_i)) − gray((X_adj, Y_adj))

where D((sdx_i, sdy_i)) is the difference between the gray value of the ith seed start point and that of the corresponding neighbor, gray((sdx_i, sdy_i)) is the gray value at the seed start point, and gray((X_adj, Y_adj)) is the gray value at the neighborhood position.
S224: Repeat S223 until no seed start points remain, which indicates that the corresponding regions have completed the growing process. After region growing, the remaining region, surrounded by the 255-valued grown area, is the track line region.
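A minimal sketch of the seeded region growing of S221–S224 is given below, assuming a grayscale image as input; the seed layout, 8-neighborhood offsets and threshold gray_t = 6 follow the description, while the function and variable names are illustrative.

```python
import numpy as np

def region_grow(gray: np.ndarray, gray_t: int = 6) -> np.ndarray:
    """Grow regions of similar gray value from 10 evenly placed seeds (S221-S224).
    Grown pixels are set to 255 in the returned image; the remaining unmarked
    area in the lower image is the candidate track-line region."""
    h, w = gray.shape
    grown = np.zeros((h, w), dtype=bool)          # pixels already merged
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]  # 8-neighborhood
    # Ten seeds: five on the row 5/9*h and five on the row 7/9*h, evenly spaced.
    seeds = [(int(5 / 9 * h), int(w * i / 5) - 1) for i in range(1, 6)] + \
            [(int(7 / 9 * h), int(w * i / 5) - 1) for i in range(1, 6)]
    stack = []
    for y, x in seeds:
        grown[y, x] = True
        stack.append((y, x))
    while stack:
        y, x = stack.pop()
        for dy, dx in offsets:
            ny, nx = y + dy, x + dx
            # Only the lower part of the image (y in [h/3, h)) is processed.
            if ny < h // 3 or ny >= h or nx < 0 or nx >= w or grown[ny, nx]:
                continue
            # Merge the neighbour when its gray value is close to the current pixel's.
            if abs(int(gray[y, x]) - int(gray[ny, nx])) <= gray_t:
                grown[ny, nx] = True
                stack.append((ny, nx))
    out = gray.copy()
    out[grown] = 255
    return out
```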
S23: and screening and pairing the track lines to determine a track area.
The track lines obtained after S224 often contain a large amount of noise and false track lines that must be removed. A Hough transform algorithm is selected to extract the track lines, and each extracted track line is represented as follows: the jth track line is represented by the two end points of its line segment, denoted (xh_j, yh_j) and (xt_j, yt_j), where (xh_j, yh_j) is the end point of the segment closer to the x-axis, i.e. yh_j < yt_j. From the end point coordinates, the expression of the jth track line is

y = ((yt_j − yh_j) / (xt_j − xh_j)) · (x − xh_j) + yh_j

where y is the y-axis coordinate of the track line and x is the x-axis coordinate of the track line; the slope of the track line is

k_j = (yt_j − yh_j) / (xt_j − xh_j)

and the length of the track line in the pixel coordinate system is

L_j = sqrt((xt_j − xh_j)² + (yt_j − yh_j)²)

where k_j is the slope of the jth track line and L_j is the length of the jth track line.
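A short sketch of this extraction step is given below, using OpenCV's probabilistic Hough transform on the region-growing output; the Canny and Hough parameter values are illustrative assumptions, not values specified in the patent.

```python
import cv2
import numpy as np

def extract_track_lines(track_region: np.ndarray):
    """Extract candidate track-line segments with the probabilistic Hough transform
    and compute slope k_j and pixel length L_j for each segment (S23 formulas)."""
    edges = cv2.Canny(track_region, 50, 150)
    segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                               minLineLength=30, maxLineGap=10)
    lines = []
    if segments is None:
        return lines
    for x1, y1, x2, y2 in segments[:, 0]:
        # Order the end points so that (xh, yh) is the one closer to the x-axis.
        (xh, yh), (xt, yt) = sorted([(x1, y1), (x2, y2)], key=lambda p: p[1])
        k = np.inf if xt == xh else (yt - yh) / (xt - xh)   # slope k_j
        length = float(np.hypot(xt - xh, yt - yh))           # length L_j in pixels
        lines.append({"head": (xh, yh), "tail": (xt, yt), "k": k, "len": length})
    return lines
```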
Further, the specific steps of S23 are as follows:
s231: removing false track lines: when k is j ∈[-∞,-1]∪[1,∞]And L is j >The track line is retained for 60 pixels, otherwise the track line is deleted.Wherein k is j The value range of (A) is an optimal value obtained by manual adjustment and repeated tests; l is a radical of an alcohol j The value range of (A) is based on the empirical value obtained after processing a large number of orbit images, when L is j >When the image is 60 pixels, a good track line rejection effect can be obtained on most track images.
S232: the trajectory lines are grouped according to their slope.
First, compute the slope of each remaining track line and sort all track lines by the absolute value of the slope in descending order. Then select n track lines from the sorted list by equidistant sampling as grouping references, where the default value of n is determined by the total number of track lines; the slopes of these n reference track lines are denoted k_n. Traverse all remaining retained track lines, denoting the slope of the currently traversed line as k_g; when ||k_n| − |k_g|| < k_t, the track line corresponding to k_g is added to the group indicated by k_n. k_t is the slope absolute-difference threshold, with a default value of 0.1.
S233: Repeat S232 for the remaining ungrouped track lines until every track line has been added to a group, then end the grouping process; groups containing only one track line are discarded directly.
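For illustration, a minimal sketch of the screening and grouping of S231–S233 is shown below; it reuses the line dictionaries from the sketch above and simplifies the equidistant-sampling choice of reference lines, which are assumptions of this example rather than the patent's exact procedure.

```python
def filter_and_group(lines, k_t: float = 0.1, min_len: float = 60.0):
    """Discard false track lines (|k| < 1 or length <= 60 px) and group the rest
    by slope similarity, as in S231-S233."""
    kept = [ln for ln in lines
            if abs(ln["k"]) >= 1.0 and ln["len"] > min_len]   # S231
    kept.sort(key=lambda ln: abs(ln["k"]), reverse=True)       # sort by |k|
    groups = []
    for ln in kept:                                            # S232
        for grp in groups:
            # Same group when the absolute slopes differ by less than k_t.
            if abs(abs(grp[0]["k"]) - abs(ln["k"])) < k_t:
                grp.append(ln)
                break
        else:
            groups.append([ln])                                # start a new group
    # S233: groups that contain a single track line are discarded.
    return [grp for grp in groups if len(grp) > 1]
```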
S234: and combining different track line segments of the same track, and carrying out track combination and pairing.
Within each group, the track lines are divided into two classes according to the sign of the slope, denoted class a and class b; classes containing only one track line are not merged. The number of track lines in each class that meets the merging requirement is denoted N (N ≥ 2). Track merging and track line pairing are performed as follows:
the track merging specific process is as follows:
the following processing is respectively carried out on the a and b type group track lines: calculating slope average k of all track lines within a track line class group Are all made of
Figure BDA0003905775220000072
Fine-tuning the slope of N track lines to be k Are all made of . For N track lines, referring to the track line expression in S23, the track line expression is transformed into: y-k Are all made of x=y h -k Are all made of ·x h Wherein y is h Is the y-axis coordinate, x, of a certain point on the track line h Is the x-axis coordinate corresponding to the y-axis coordinate of a certain point on the track line; let C = y h -k Are all made of ·x h Wherein C represents the intercept of the corresponding trajectory line expression; sorting N track lines in a descending order according to the C value, and calculating the inter-track distance between every two adjacent track lines
Figure BDA0003905775220000073
Wherein p belongs to N-1; in the above formula, D p The distance between the P track line after descending sorting and the P +1 track line after the P track line; c p The intercept of the expression of the P track line is shown; c p+1 Is the intercept of the expression of the P +1 track line; k is a radical of Are all made of Is the fine-tuned slope of the trajectory line; n is the number of track lines. When D is present p <D t Then, it indicates that the track lines P and P +1 belong to the same track, and the two track lines are merged, D t The default value is set to 6 for the track line merge distance threshold.
Specifically, the merging process is as follows: extracting the end points of the two ends of the P-th track line as (xh) p ,yh p )、(xt p ,yt p ) (ii) a Extracting the end points of the two ends of the P +1 th track line as (xh) p+1 ,yh p+1 )、(xt p+1 ,yt p+1 ) (ii) a And setting new end points of the combined track line, and marking as H and T. The new end point coordinate is determined according to the original track line end point coordinate when yh p> yh p+1 Then, H = (xh) p+1 ,yh p+1 ) Otherwise H = (xh) p ,yh p ). Similarly, when yt p> yt p+1 Then, T = (xt) p ,yt p ) Otherwise, T = (xt) p+1 ,yt p+1 )。
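A sketch of this merging step is given below, again reusing the line dictionaries from the earlier sketches; the parallel-line distance |C_p − C_{p+1}| / sqrt(1 + k_avg²) follows the description, and the handling of vertical lines is an assumption of this example.

```python
import numpy as np

def merge_collinear(group, d_t: float = 6.0):
    """Merge fragments of the same rail inside one slope class (merging part of S234).
    All lines are given the class-average slope k_avg; fragments whose intercept
    distance is below d_t are fused into one segment spanning the outer end points."""
    finite = [ln for ln in group if np.isfinite(ln["k"])]   # skip vertical lines
    if len(finite) < 2:
        return finite
    k_avg = float(np.mean([ln["k"] for ln in finite]))
    for ln in finite:
        xh, yh = ln["head"]
        ln["C"] = yh - k_avg * xh                  # intercept of y = k_avg*x + C
    finite.sort(key=lambda ln: ln["C"], reverse=True)
    merged = [finite[0]]
    for ln in finite[1:]:
        prev = merged[-1]
        dist = abs(prev["C"] - ln["C"]) / np.sqrt(1.0 + k_avg ** 2)
        if dist < d_t:
            # Same rail: keep the end point closer to the x-axis as the new head
            # and the farther one as the new tail (rule of S234).
            prev["head"] = min(prev["head"], ln["head"], key=lambda p: p[1])
            prev["tail"] = max(prev["tail"], ln["tail"], key=lambda p: p[1])
        else:
            merged.append(ln)
    return merged
```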
Pairing the track lines: truly paired track lines (the two rails of one track) are parallel in the real world. In the turnout image, when the paired track lines lie on the same side of the image center line they remain parallel, i.e. their slopes are equal; when the paired track lines lie on opposite sides of the image center line, their slopes are similar in magnitude and opposite in sign.
Within class a and class b respectively, sort the processed track lines in descending order of the intercept C, and pair the sorted track lines two by two in sequence within each class. When unpaired track lines remain in both class a and class b after this pairing, the remaining class a track line and the remaining class b track line are paired with each other, and this pair is specially marked as LinePairC.
In particular, when the train runs on a curve, or the camera has shifted position during data acquisition, the LinePairC pair may not exist; under normal running conditions, the LinePairC pair forms the boundary lines of the track area where the train is located.
S235: and sequentially connecting the four end points of the matched track lines to serve as a track area.
S236: and repeating S234 and S235 to finish classification, combination and pairing of all grouped track lines, and extracting all track areas in the turnout image.
S24: Screen all track areas according to their positions to determine the track where the train is located. When a track area formed by the LinePairC paired track lines exists, that area is the track where the train is located. Otherwise, the track area where the train is located is screened as follows. All track areas are evaluated with a centroid-based calculation: compute the horizontal coordinate of the center point of the eth track area, denoted xz_e, and compute the track line spacing D_e of the eth track area according to the spacing formula in S234. Because the camera is located at the center of the train cab, the track where the train is located occupies the image region closest to the track center, and its track spacing appears largest in the image. Based on these differences between side-track areas and the train's own track area, the evaluation function is designed as

G_e = 0.3·D_e² − 0.7·(xz_e − w/2)²

where G_e is the evaluation value of the eth track area and w is the width of the original turnout image. One term of the evaluation function considers the track line spacing D_e of the track area: according to the acquisition characteristics of track images, the track area where the train is located usually occupies the middle of the image, so its track line spacing should be the largest. The other term uses the distance between the abscissa of the track area's center point and the image center as an evaluation index: the smaller this distance, the more likely the area is the track where the train is located. Finally, the spacing term D_e² and the center-distance term (xz_e − w/2)² are given different weights to form the evaluation function.
The G value of each track area is calculated with the evaluation function, and the track area with the largest G value is taken as the track area where the train is located.
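The selection rule can be written compactly as below; the dictionary field names are illustrative assumptions, while the 0.3/0.7 weights follow the description.

```python
def select_ego_track(track_areas, image_width: int):
    """Pick the track area most likely to be the train's own track (S24).
    Each entry is assumed to carry its rail spacing D_e and the x-coordinate
    xz_e of its centre point."""
    def score(area):
        d_e = area["spacing"]          # rail spacing D_e of the area
        xz_e = area["center_x"]        # abscissa of the area's centre point
        return 0.3 * d_e ** 2 - 0.7 * (xz_e - image_width / 2) ** 2

    # The area with the largest evaluation value G_e is the ego track.
    return max(track_areas, key=score)

# Example with hypothetical numbers for a 1920-pixel-wide image:
# areas = [{"spacing": 420, "center_x": 930}, {"spacing": 260, "center_x": 1500}]
# ego = select_ego_track(areas, 1920)   # -> the first area
```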
S25: and intercepting the minimum bounding rectangle containing all the track areas from the original image as an output image of data preprocessing.
Compare all corner points of the track areas and take the minimum and maximum values of their horizontal and vertical coordinates: the minimum horizontal and vertical coordinates form the upper-left corner of the minimum circumscribed rectangle, and the maximum horizontal and vertical coordinates form the lower-right corner. The minimum circumscribed rectangle is denoted [x_min, y_min, x_max, y_max].
S26: Adjust the original image labels to correspond to the cropped image. Let an original image label be [x_1, y_1, x_2, y_2]; the corresponding label in the cropped image is then [x_1 − x_min, y_1 − y_min, x_2 − x_min, y_2 − y_min].
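The cropping and label-shifting of S25–S26 amount to a few lines of array slicing, sketched below under the assumption that labels are axis-aligned boxes in original-image pixel coordinates.

```python
import numpy as np

def crop_to_tracks(image: np.ndarray, corner_points, labels):
    """Crop the minimum bounding rectangle of all track-area corner points (S25)
    and shift the bounding-box labels into the cropped coordinate frame (S26).
    `corner_points` is a list of (x, y) corners; `labels` is a list of
    [x1, y1, x2, y2] boxes in original-image coordinates."""
    xs = [p[0] for p in corner_points]
    ys = [p[1] for p in corner_points]
    x_min, y_min, x_max, y_max = min(xs), min(ys), max(xs), max(ys)
    cropped = image[y_min:y_max, x_min:x_max]
    shifted = [[x1 - x_min, y1 - y_min, x2 - x_min, y2 - y_min]
               for x1, y1, x2, y2 in labels]
    return cropped, shifted, (x_min, y_min, x_max, y_max)
```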
As shown in fig. 4, after the data preprocessing operation is performed on the original switch image, the output image only includes the track area of the original image, so that invalid pixels in the image are reduced compared with the original image, and compared with the conventional deep learning network input, the network input image size is smaller, and the search space of the deep learning network is reduced.
S3: as shown in fig. 5, a track turnout state discrimination module is constructed, and the track turnout state discrimination module is divided into a deep learning model and a traditional image processing technology model. The deep learning model is used for detecting all samples of the turnout, and the traditional image processing technology model detects difficult samples again on the basis of the deep learning model processing so as to improve the detection accuracy.
Further, S3 includes the following substeps:
s31: and constructing a deep learning model.
Further, the specific steps of S31 are as follows:
S311: Construct the backbone network. The backbone is based on the CSPDarknet53 structure with an improved CSP module, and is divided into 11 network modules, in sequence: a Focus module, a CONV module, a TRCSP_1 module, a CONV module, a TRCSP_3 module, a CONV module, a TRCSP_3 module, a CONV module, a TRCSP_1 module, an SPP module and an SE module.
The Focus module is a slicing method that preserves image features as much as possible while quickly downsampling the image. Specifically, the image preprocessed in S25 is sliced; let the width of the preprocessed image be W_1 and its height H_1, so each slice has size W_1/2 × H_1/2. The sliced feature maps are then stacked in the channel direction, the stacked feature map is downsampled with a 1 × 1 convolution kernel, and the number of output channels is adjusted to 64.
The CONV module performs convolution operation on the feature map, performs double down sampling on the obtained feature map, and adjusts the number of output channels.
The TRCSP_1 and TRCSP_3 modules are feature extraction modules with a Transformer structure; their structural diagram is shown in Fig. 6, where TRCSP_1 denotes the structure with N = 1, i.e. containing one Transformer component, and TRCSP_3 denotes the structure with N = 3, i.e. containing three Transformer components. Introducing the Transformer structure into the TRCSP_1 and TRCSP_3 modules improves the feature extraction capability of the network, while the CSP structure, by combining the single convolution branch and the Transformer branch in parallel, avoids a large amount of repeated gradient computation and preserves the feature information of the preceding network layers.
In particular, the Transformer module is an effective attention network module whose self-attention mechanism improves the ability to capture different local features. The LayerNorm and Dropout layers in the Transformer module regularize the network, which helps the training process converge and avoids overfitting.
The SPP module is used for realizing spatial pyramid pooling of image features, firstly convolving a feature map input at an upper layer, then reducing model parameters by adopting 3 parallel maximum pooling layers, finally splicing a pooling result and a result after convolution, performing convolution again, and further extracting the image features.
Compared with the traditional CSPDarknet53 module, the trunk network introduces an SE module behind the SPP module to further extract image features. The SE module is a channel attention module, and can automatically learn the importance degree of different channels of the feature map, and more concentrate the attention of the network on useful target features rather than image noise.
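For reference, a standard squeeze-and-excitation (SE) channel-attention block is sketched below in PyTorch; the reduction ratio of 16 and the layer arrangement are conventional choices for SE blocks, not values specified in the patent.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: global average pooling followed
    by a two-layer bottleneck that rescales each channel of the feature map."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: B x C x 1 x 1
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                               # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                    # excite: reweight the channels
```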
S312: and fusing the image features extracted in the step S311 to construct a detection head network.
Further, the specific steps of S312 are as follows:
S3121: Extract the feature maps output by the first and second TRCSP_3 modules in S311, denoted C1 and C2 respectively, and denote the feature map output by the last SE module of the S311 backbone as C3. The feature maps input to the first and second TRCSP_3 modules are denoted C1_p and C2_p respectively, and the feature map input to the last SE module of the S311 backbone is denoted C3_p.
S3122: Convolve C3 to obtain the feature map D3_0; upsample C3 and concatenate it with C2, denoting the resulting feature map D2_0; upsample D2_0 and concatenate it with C1, denoting the resulting feature map D1_0.
S3123: Apply atrous spatial pyramid pooling to the three output feature maps D1_0, D2_0 and D3_0 respectively, and then feed the results back to the inputs of the first TRCSP_3 module, the second TRCSP_3 module and the SE module respectively.
S3124: During the calculation of the first TRCSP_3 module, the second TRCSP_3 module and the SE module, the feedback from S3123 is introduced at the inputs C1_p, C2_p and C3_p; S3121 is repeated and C1, C2 and C3 are recalculated.
S3125: Repeat S3122, S3123 and S3124 twice in turn, cyclically extracting and fusing the image features to obtain D1_2, D2_2 and D3_2. Concatenate D1_2, D2_2 and D3_2 with D1_0, D2_0 and D3_0 in the channel direction to obtain D1, D2 and D3 respectively, which serve as the detection head feature maps for turnout state detection.
S313: Decode the feature maps obtained in S3125 to obtain the prediction box positions, the object class in each prediction box and the class confidence; specifically, the YOLOv5 decoder model is selected as the decoder.
S32: and constructing a turnout state discrimination model of the traditional image processing technology.
Further, the specific steps of S32 are as follows:
S321: Obtain the prediction boxes from S313 whose confidence is lower than 0.5, recover their positions in the original image according to the downsampling ratio, and crop the corresponding prediction box region from the original image, denoted R_pred.
S322: Convert the R_pred region to grayscale, extract the edge features of the image with the Canny operator, extract the track line features with the Hough transform, and store the end point coordinates of the extracted line segments in a list.
S323: First, use the extracted end point information to convert each track line into the form y = K·x + C, where K is the slope and C is the intercept; record the slope and intercept of each track line, with the slope and intercept of the tth track line denoted K_t and C_t respectively.
The track lines are grouped according to their slopes. First, sort all track lines by K_t in descending order, then select nt track lines from the sorted list by equidistant sampling as grouping references, with their slopes denoted kt_nt. Traverse all remaining track lines, denoting the slope of each traversed line as kt_g; when |kt_nt − kt_g| < kt_t, the track line corresponding to kt_g is added to the group indicated by kt_nt. kt_t is the turnout track slope difference threshold, with a default value of 0.2.
Merge the grouped track lines. Let each group contain N_t track lines; groups with only one track line need no merging. Average the slopes of all track lines within a group, denoted K_avg1, sort the track lines in descending order of intercept, and compute the distance between every two adjacent track lines

D_q = |C_q − C_{q+1}| / sqrt(1 + K_avg1²),  q ∈ {1, …, N_t − 1}

where D_q is the distance between the qth track line (after descending sorting) and the (q+1)th track line, C_q and C_{q+1} are the intercepts of the qth and (q+1)th track line expressions, and N_t is the number of track lines. When D_q < D1_t, track lines q and q+1 belong to the same rail and are merged; D1_t is the turnout track line merging distance threshold, with a default value of 4.

Specifically, extract the two end points of the qth track line, (xh_q, yh_q) and (xt_q, yt_q), and the two end points of the (q+1)th track line, (xh_{q+1}, yh_{q+1}) and (xt_{q+1}, yt_{q+1}); set the new end points of the merged track line, denoted H_1 and T_1. The new end point coordinates are determined from the original end points: when yh_q > yh_{q+1}, H_1 = (xh_{q+1}, yh_{q+1}), otherwise H_1 = (xh_q, yh_q); similarly, when yt_q > yt_{q+1}, T_1 = (xt_q, yt_q), otherwise T_1 = (xt_{q+1}, yt_{q+1}).
S324: When the number of screened track lines is not equal to 4, there is a missed or false detection of the turnout track lines; adjust the track line extraction hyperparameters and repeat S322 and S323 until exactly 4 track lines are detected.
S325: Compute the midpoint coordinates of each track line, denoted (xc_n, yc_n), n = 1, 2, 3, 4. Sort the track lines in ascending order of the midpoint abscissa xc_n, and denote the sorted track lines L_left1, L_left2, L_right1 and L_right2 in sequence; the gap between L_left1 and L_left2 is the left-side turnout gap, and the gap between L_right1 and L_right2 is the right-side turnout gap.
S326: Compute the intersection point of L_left1 and L_left2, denoted P_left, and the intersection point of L_right1 and L_right2, denoted P_right. Judge whether each intersection point lies within the range of the corresponding track line segments: if so, the gap line is considered a false gap line, i.e. the two track lines intersect and there is no gap; if the intersection point lies outside the range of the corresponding track line segments, the line is considered a real gap line.
Specifically, if P_left lies within the range of its corresponding track segments and P_right does not, the turnout is in the right-turn state;
if P_left lies outside the range of its corresponding track segments and P_right lies within, the turnout is in the left-turn state;
if neither P_left nor P_right lies within the range of the corresponding track segments, or both do, the turnout state cannot be judged by the traditional image processing method, and the deep learning model's recognition result is output directly.
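A sketch of this intersection test is given below; the line representation and the return values are illustrative assumptions, while the decision rules follow S325–S326.

```python
def switch_state_from_lines(left1, left2, right1, right2):
    """Decide the turnout state from the four fitted rail lines (S325-S326).
    Each line is ((xh, yh), (xt, yt)); the state is derived from whether the
    left/right line pairs intersect inside their own segments (no gap) or
    outside them (real gap)."""
    def intersection_inside(a, b):
        (x1, y1), (x2, y2) = a
        (x3, y3), (x4, y4) = b
        den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
        if den == 0:
            return False                      # parallel lines: no intersection point
        px = ((x1 * y2 - y1 * x2) * (x3 - x4) -
              (x1 - x2) * (x3 * y4 - y3 * x4)) / den
        py = ((x1 * y2 - y1 * x2) * (y3 - y4) -
              (y1 - y2) * (x3 * y4 - y3 * x4)) / den
        inside = lambda seg: (min(seg[0][0], seg[1][0]) <= px <= max(seg[0][0], seg[1][0])
                              and min(seg[0][1], seg[1][1]) <= py <= max(seg[0][1], seg[1][1]))
        return inside(a) and inside(b)

    left_closed = intersection_inside(left1, left2)     # P_left inside -> no left gap
    right_closed = intersection_inside(right1, right2)  # P_right inside -> no right gap
    if left_closed and not right_closed:
        return "right-turn"
    if right_closed and not left_closed:
        return "left-turn"
    return None   # undecidable: fall back to the deep-learning result
```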
S4: and (3) taking the turnout image and the corresponding label preprocessed in the S25 and the S26 as input data, and taking the turnout state judgment model constructed in the S31 and the S32 as a detection tool to obtain a turnout detection result.
Further, S4 includes the following substeps:
S41: Send the turnout images preprocessed in S25 and S26, together with their corresponding labels, to the deep learning model of S31 for the initial turnout state detection, obtaining the predicted target boxes and class confidences.
S42: and judging the category confidence, directly outputting a turnout state judgment result when the confidence is greater than a threshold, mapping the predicted target frame back to the original image when the confidence is less than the judgment threshold, judging the turnout state again by using a turnout state judgment model of the S32 traditional image processing technology, and setting the default value of the confidence threshold to be 0.5. And when the two judgment results are consistent, the judgment result is directly output, and when the two judgment results are inconsistent, the result of the turnout model of the traditional processing technology is output, but the predicted target frame is specially marked. When the turnout model cannot judge the result by the traditional processing technology, the recognition result of the deep learning network model is adopted as the final output result, and special labeling is also carried out. Specifically, the special labeling is to perform special distinction or special text description on the color of the prediction frame in the process of visualizing the detection result so as to indicate the distinction between the prediction frame and other prediction frames directly judging the turnout state.
S5: Combine the outputs of S24 and S42, screen and identify the turnouts on the track where the train is located, and output the final turnout state recognition result. Compare the train's track area image output in S24 with the turnout prediction boxes output in S42, and compute the intersection-over-union J_iou between every recognized prediction box and the track area image one by one. Denote the image of the track area where the train is located as A and a prediction box as B:

J_iou = area(A ∩ B) / area(A ∪ B)

where area(A) is the area of the track region where the train is located and area(B) is the area covered by the prediction box.
When the intersection-over-union is smaller than the threshold, the prediction box is considered a side-track turnout; it does not directly affect train operation safety, but it can still provide important references for tasks such as turnout area identification and train positioning, so it is also retained.
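The screening rule is sketched below; the track region is assumed to be given as a binary mask, and the threshold value 0.3 is an illustrative choice since the patent does not state a specific number.

```python
import numpy as np

def classify_switch_boxes(track_mask: np.ndarray, boxes, iou_threshold: float = 0.3):
    """Split predicted switch boxes into own-track and side-track switches by their
    intersection-over-union with the ego track region (S5)."""
    own_track, side_track = [], []
    track_area = track_mask.astype(bool)
    for x1, y1, x2, y2 in boxes:
        box_mask = np.zeros_like(track_area)
        box_mask[y1:y2, x1:x2] = True
        inter = np.logical_and(track_area, box_mask).sum()
        union = np.logical_or(track_area, box_mask).sum()
        iou = inter / union if union else 0.0
        (own_track if iou >= iou_threshold else side_track).append((x1, y1, x2, y2))
    return own_track, side_track
```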
In particular, in order to enable the driver to reasonably use the switch detection result, the switch state when the intersection ratio is greater than the threshold value is displayed in a conspicuous red color, and the switch state when the intersection ratio is less than the threshold value is displayed in a green color for distinction.
As shown in Figs. 7a–7b, which compare the turnout detection results of the YOLOv5 algorithm and of the algorithm of the invention, the red oval in Fig. 7a marks a missed detection by YOLOv5, while Fig. 7b shows that the single turnout state detection system successfully detects the turnout target at the corresponding position and correctly identifies its state.
The invention also provides a semi-automatic data labeling method of the single turnout state detection system, which comprises the following specific steps as shown in figure 8:
Step 1: Using the single switch state detection system, collect switch images on different lines and under different weather conditions, and divide the collected images into a manual labeling set and an automatic labeling set. Label the manual labeling set with the VIA labeling tool and randomly divide it into three sub-data sets, train, val and test, in an 8:1:1 ratio; the train and val sets are used for the training and validation process, and the test set is used for the testing and inference process. In particular, to ensure the accuracy of subsequent automatic labeling, the manual labeling set contains at least 8000 frames and each category has no fewer than 10000 turnout state instances.
In particular, the images in the manual labeling set need to be screened manually, selecting and accurately labeling only images containing clear turnout states; the automatic labeling set uses the original video sequence images directly, without manual screening, which greatly reduces the workload of building the data set.
Step 2: Train on the labeled data set with the turnout state discrimination deep learning model of S31 to obtain the optimal turnout recognition model.
First, train and test the turnout state discrimination model in the single turnout state detection system with the manual labeling set from Step 1, denoting the training round as R_i. When both the turnout state detection precision and recall of round R_i exceed 0.85, the round's model parameters are taken as available model parameters; they are packaged together with the round's test result as the R_i-th available turnout state discrimination model and stored in the available model list. Otherwise, the round's model parameters are discarded and the next round of training and testing continues. When the available model list is not empty and three consecutive rounds of test results show no improvement, training stops and the best-performing available turnout state discrimination model is selected from the list as the optimal turnout state discrimination model. When the available model list is empty and the number of training rounds reaches the threshold R_threshold (default value 300), the single turnout state detection system cannot meet the detection requirement; the calculation process is stopped and the system is optimized.
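The acceptance and stopping rules of Step 2 can be expressed as a small loop; the callback functions, the scoring of "best performance" as min(precision, recall), and the tuple layout are assumptions of this sketch, while the 0.85 floor, the 3-round patience and the 300-round cap follow the description.

```python
def select_best_model(train_one_round, evaluate, max_rounds: int = 300,
                      metric_floor: float = 0.85, patience: int = 3):
    """Training-and-selection loop of Step 2: keep a round's weights only when both
    precision and recall exceed the floor; stop after `patience` rounds without
    improvement once at least one usable model exists, or abort at `max_rounds`."""
    available, best_score, stale = [], -1.0, 0
    for r in range(1, max_rounds + 1):
        weights = train_one_round(r)                  # user-supplied training callback
        precision, recall = evaluate(weights)         # user-supplied evaluation callback
        score = min(precision, recall)
        if precision > metric_floor and recall > metric_floor:
            available.append((score, r, weights))     # usable model for round R_i
        if available:
            stale = 0 if score > best_score else stale + 1
            best_score = max(best_score, score)
            if stale >= patience:
                break                                  # no improvement for 3 rounds
    if not available:
        raise RuntimeError("system does not meet the detection requirement; optimise it")
    return max(available, key=lambda t: t[0])[2]       # best-performing usable model
```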
Step 3: Randomly shuffle the images in the automatic labeling set and group them in units of 1000 images, with a remainder of fewer than 1000 images counted as one group. Randomly select one group of data, detect the turnout states with the optimal turnout state discrimination model obtained in Step 2 to obtain label information for the images in the automatic labeling set, and remove the selected group from the automatic labeling data set.
Step 4: Manually check whether the automatically generated annotation files meet the labeling requirements, correct unreasonable annotations, remove images for which no annotation information was generated, and add the images with qualified annotations to the manual labeling set.
Step 5: Select any group of images from the remaining groups of the automatic labeling set and repeat Steps 2, 3 and 4 until the automatic labeling set is empty; all images of the automatic labeling set are then labeled and the labeling process ends.
As shown in fig. 9, the boxes with label names are automatic labeling boxes and the boxes without labels are manual labeling boxes; the label quality produced by the semi-automatic labeling method meets the training requirements while greatly improving labeling efficiency and reducing the labeling workload.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A single switch state detection system, comprising:
the image acquisition module is used for providing original image data for a test process;
a turnout data set construction module, used for providing original data for the training and validation processes;
the data preprocessing module is used for identifying the track area of original sample data, further reducing the target search space and providing clues for the turnout target of the track where the train is located in the subsequent screening process;
and the turnout state judging module is used for improving the turnout state identification accuracy.
2. A detection method of a single turnout state detection system is characterized by comprising the following steps:
S1, making a turnout data set;
S2, carrying out image preprocessing on the data set images;
S3, constructing a track turnout state discrimination module;
S4, taking the preprocessed turnout images and the corresponding labels as input data, taking the constructed turnout state discrimination model as a detection tool, and obtaining a turnout detection result;
S5, screening and identifying the turnouts of the track where the train is located by combining the track where the train is located and the turnout state discrimination result, and outputting a final turnout state recognition result.
3. The detection method of the single-turnout state detection system according to claim 2, wherein in the step S2, image preprocessing is performed on the data set image, and the method comprises the following steps:
s21, carrying out gray processing on the RGB image acquired by the data set;
s22, extracting track line information by adopting a region growing method;
s23, screening and pairing the track lines to determine a track area;
s24, screening all track areas according to the track area positions to determine the track where the train is located;
s25, intercepting the minimum external rectangle containing all the track areas from the original image as an output image of data preprocessing; comparing all the corner points of the track area, taking out the maximum value and the minimum value of horizontal and vertical coordinates in all the corner points of the track area, taking the minimum value of the horizontal and vertical coordinates as the upper left corner point of the minimum circumscribed rectangle, taking the maximum value of the horizontal and vertical coordinates as the lower right corner point of the minimum circumscribed rectangle, and marking the minimum circumscribed rectangle as [ x [ [ x ] min ,y min ,x max ,y max ];
S26, adjusting the original image label to correspond to the cropped image: if the original image label is [x_1, y_1, x_2, y_2], the cropped image label is [x_1 − x_min, y_1 − y_min, x_2 − x_min, y_2 − y_min].
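By way of illustration only, the cropping and label-shifting of S25 and S26 might be sketched in Python as follows; the function name, the assumption that corner points arrive as (x, y) integer pairs, and the list-of-boxes label layout are illustrative and not part of the claim:

```python
import numpy as np

def crop_to_track_regions(image, corner_points, labels):
    """Crop the minimum circumscribed rectangle of all track-region corner points
    (S25) and shift the box labels into the cropped frame (S26)."""
    pts = np.asarray(corner_points, dtype=int)      # shape (N, 2): (x, y) corners
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    cropped = image[y_min:y_max, x_min:x_max]       # numpy images index as [y, x]
    # original label [x1, y1, x2, y2] -> [x1 - x_min, y1 - y_min, x2 - x_min, y2 - y_min]
    shifted = [[x1 - x_min, y1 - y_min, x2 - x_min, y2 - y_min]
               for (x1, y1, x2, y2) in labels]
    return cropped, shifted, (x_min, y_min, x_max, y_max)
```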
4. The method for detecting a single turnout state detection system according to claim 3, wherein in the step S22, the track line information is extracted by using a region growing method, and the method comprises the following steps:
s221, acquiring the seed starting points for region growing: 10 seed starting points are set and distributed evenly in the image; for the first 5 seed points (i = 1, …, 5) the image positions are (w·i/5, 5h/9), and for the last 5 (i = 6, …, 10) they are (w·(i−5)/5, 7h/9), where h is the height of the original turnout image, w is its width, and i denotes the ith seed starting point;
s222, denoting the coordinates of the ith seed starting point as (sdx_i, sdy_i), acquiring the pixel coordinates in the 8-neighbourhood of the seed starting point position and denoting them as (X_adj, Y_adj); then (X_adj, Y_adj) = (sdx_i, sdy_i) + R, where R is the array of 8-neighbourhood relative coordinates, R = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)];
S223, comparing the grey-level difference between the ith seed starting point position and each of its 8 neighbouring positions; when D((sdx_i, sdy_i)) ≤ gray_t, the seed starting point position and the corresponding neighbouring position belong to the same grey-level region, their grey values are both set to 255, and the neighbouring position becomes a new seed starting point; gray_t is the grey-difference threshold, with a default value of 6; the grey-level difference is expressed as:
D((sdx_i, sdy_i)) = gray((sdx_i, sdy_i)) − gray((X_adj, Y_adj))
where D((sdx_i, sdy_i)) is the difference between the grey value at the ith seed starting point and the grey value at the corresponding neighbouring position, gray((sdx_i, sdy_i)) is the grey value at the seed starting point position, and gray((X_adj, Y_adj)) is the grey value at the neighbouring position;
and S224, repeating S223 until no seed starting point remains, indicating that the corresponding region has completed the growing process; after the region growing, the remaining region enclosed by grey value 255 is the track line region.
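A minimal Python sketch of the region growing of S221-S224, assuming the comparison in S223 is made against the original grey values before they are overwritten with 255; seed placement is clamped to the image border and all names are illustrative:

```python
import numpy as np

def region_grow_track_lines(gray, gray_t=6):
    """8-neighbourhood region growing from 10 evenly spaced seed points (S221-S224).
    Returns a mask in which grown pixels carry the grey value 255."""
    h, w = gray.shape
    g = gray.astype(np.int16)
    mask = np.zeros((h, w), dtype=np.uint8)
    # first 5 seeds on the row at 5/9*h, last 5 on the row at 7/9*h, clamped to the image
    seeds = [(min(w - 1, w * i // 5), 5 * h // 9) for i in range(1, 6)] + \
            [(min(w - 1, w * i // 5), 7 * h // 9) for i in range(1, 6)]
    R = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    stack = []
    for sx, sy in seeds:
        mask[sy, sx] = 255
        stack.append((sx, sy))
    while stack:                                   # repeat until no seed point remains (S224)
        sx, sy = stack.pop()
        for dx, dy in R:
            nx, ny = sx + dx, sy + dy
            if 0 <= nx < w and 0 <= ny < h and mask[ny, nx] == 0:
                if abs(int(g[ny, nx]) - int(g[sy, sx])) <= gray_t:   # same grey region (S223)
                    mask[ny, nx] = 255             # mark grown pixel; it becomes a new seed
                    stack.append((nx, ny))
    return mask
```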
5. The method for detecting a single turnout state detection system according to claim 3, wherein in the step S23, the track lines are screened and paired to determine the track area, and the method comprises the following steps:
s231, removing false track lines;
Selecting the Hough transform algorithm to extract the track lines, the obtained track lines being expressed as follows: the jth track line is represented by the two endpoints of its line segment, denoted (xh_j, yh_j) and (xt_j, yt_j), where (xh_j, yh_j) is the endpoint of the track line segment closer to the x-axis, i.e. yh_j < yt_j; from the endpoint coordinates, the expression of the jth track line is
y = yh_j + k_j·(x − xh_j)
where y is the y-axis coordinate and x is the x-axis coordinate of a point on the track line; the slope of the track line is
k_j = (yt_j − yh_j)/(xt_j − xh_j)
and the length of the track line in the pixel coordinate system is
L_j = √((xt_j − xh_j)² + (yt_j − yh_j)²)
where k_j is the slope of the jth track line and L_j is the length of the jth track line; when k_j ∈ (−∞, −1] ∪ [1, +∞) and L_j > 60 pixels, the track line is retained, otherwise it is deleted;
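A possible Python sketch of the Hough extraction and the slope/length filter of S231; the Canny and HoughLinesP parameters shown here are assumptions, not values from the claim:

```python
import cv2
import numpy as np

def extract_candidate_track_lines(track_mask, min_len=60):
    """Probabilistic Hough extraction of track line segments followed by the S231
    filter: keep a segment only when |k_j| >= 1 and L_j > 60 pixels."""
    edges = cv2.Canny(track_mask, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                           minLineLength=30, maxLineGap=10)
    kept = []
    if segs is None:
        return kept
    for x1, y1, x2, y2 in segs[:, 0, :]:
        k = np.inf if x1 == x2 else (y2 - y1) / (x2 - x1)   # slope k_j
        L = float(np.hypot(x2 - x1, y2 - y1))               # length L_j in pixels
        if abs(k) >= 1 and L > min_len:
            # order the endpoints so the first one is the head (smaller y, closer to the x-axis)
            head, tail = ((x1, y1), (x2, y2)) if y1 < y2 else ((x2, y2), (x1, y1))
            kept.append((head, tail, k, L))
    return kept
```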
s232, grouping the track lines according to the slope of the track lines;
Calculating the slopes of the remaining track lines and sorting all track lines from large to small by the absolute value of the slope; selecting n track lines from the sorted list by equidistant sampling as reference lines for grouping, where the default value of n is determined from the total number of track lines by the formula shown in the original drawing. The slopes of the n reference tracks are denoted k_n respectively; all remaining retained track lines are traversed, and the slope of the track line visited in each traversal is denoted k_g; when ||k_n| − |k_g|| < k_t, the track line corresponding to k_g is added to the group indicated by k_n, where k_t is the threshold on the difference of slope absolute values, with a default value of 0.1;
s233, repeating S232 for the remaining ungrouped track lines until every track line has been added to a group, ending the grouping process; any group containing only one track line is discarded;
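A sketch of the slope-based grouping of S232-S233; the default for n is an assumption (the claim defines it only by a formula reproduced as a drawing), and single-line groups are discarded as in S233:

```python
def group_lines_by_slope(lines, n=None, k_t=0.1):
    """Group track lines whose slope magnitudes differ from a reference slope by
    less than k_t (S232-S233). `lines` holds (head, tail, k, L) tuples."""
    lines = sorted(lines, key=lambda ln: abs(ln[2]), reverse=True)
    if not lines:
        return [], []
    if n is None:
        n = max(1, len(lines) // 4)      # assumed default; the exact formula is a drawing in the filing
    step = max(1, len(lines) // n)
    ref_slopes = [lines[i][2] for i in range(0, len(lines), step)][:n]   # equidistant sampling
    groups = [[] for _ in ref_slopes]
    ungrouped = []
    for ln in lines:
        for gi, k_ref in enumerate(ref_slopes):
            if abs(abs(k_ref) - abs(ln[2])) < k_t:
                groups[gi].append(ln)
                break
        else:
            ungrouped.append(ln)         # would be regrouped in a further S233 pass
    # groups containing a single track line are discarded (S233)
    return [g for g in groups if len(g) > 1], ungrouped
```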
s234, combining different track line segments of the same track, and carrying out track combination and pairing;
Dividing the track lines inside each group into two classes according to the sign of their slope, denoted class a and class b; class groups containing only one track line are not merged; the number of track lines in each group satisfying the merging requirement is denoted N, with N ≥ 2; the tracks are then merged and the track lines paired according to the following processes:
the track merging specific process is as follows:
The following processing is carried out on the class-a and class-b group track lines respectively: the average slope k_avg of all track lines in the class group is calculated as
k_avg = (1/N)·Σ k_i  (summing the slopes k_i of the N track lines in the group)
Fine-adjusting the slope of N track lines to be k Are all made of (ii) a For the N track lines, the track line expression in S231 is converted into: y-k Are all made of x=y h -k Are all made of ·x h Wherein y is h Is the y-axis coordinate, x, of a certain point on the track line h Is the x-axis coordinate corresponding to the y-axis coordinate of a certain point on the track line; let C = y h -k Are all made of ·x h Wherein C represents the intercept of the corresponding track line expression; sorting the N track lines in descending order according to the C value, and calculating the distance between every two adjacent track lines
Figure FDA0003905775210000033
Wherein p belongs to N-1; in the above formula, D p The distance between the P track line after descending sorting and the P +1 track line after the P track line; c p Truncation of expression for P track lineDistance; c p+1 The intercept is the intercept of the expression of the P +1 track line; k is a radical of Are all made of Is the fine-tuned slope of the trajectory line; n is the number of track lines, when D p <D t Then, it indicates that the track lines P and P +1 belong to the same track, and the two track lines are merged, D t Merging the distance threshold for the track line, with default value set to 6;
The two endpoints of the pth track line are extracted as (xh_p, yh_p) and (xt_p, yt_p); the two endpoints of the (p+1)th track line are extracted as (xh_{p+1}, yh_{p+1}) and (xt_{p+1}, yt_{p+1}); the new endpoints of the merged track line are denoted H and T and are determined from the original endpoint coordinates: when yh_p > yh_{p+1}, H = (xh_{p+1}, yh_{p+1}), otherwise H = (xh_p, yh_p); similarly, when yt_p > yt_{p+1}, T = (xt_p, yt_p), otherwise T = (xt_{p+1}, yt_{p+1});
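A sketch of the merging step of S234 under the distance criterion D_p < D_t; the tuple layout (head, tail, slope, length) follows the earlier sketches and is an assumption:

```python
import numpy as np

def merge_collinear_segments(group, k_avg=None, d_t=6.0):
    """Merge segments that belong to the same physical rail within one class group
    (S234): adjacent segments are merged when
    D_p = |C_p - C_{p+1}| / sqrt(1 + k_avg**2) < d_t."""
    if k_avg is None:
        k_avg = float(np.mean([k for _, _, k, _ in group]))   # common fine-tuned slope
    intercept = lambda seg: seg[0][1] - k_avg * seg[0][0]     # C = y_h - k_avg * x_h
    items = sorted(group, key=intercept, reverse=True)        # descending order of C
    merged = [items[0]]
    for seg in items[1:]:
        prev = merged[-1]
        d = abs(intercept(prev) - intercept(seg)) / np.sqrt(1 + k_avg ** 2)
        if d < d_t:                                           # same rail: fuse the endpoints
            (hx1, hy1), (tx1, ty1) = prev[0], prev[1]
            (hx2, hy2), (tx2, ty2) = seg[0], seg[1]
            H = (hx1, hy1) if hy1 < hy2 else (hx2, hy2)       # endpoint closer to the x-axis
            T = (tx1, ty1) if ty1 > ty2 else (tx2, ty2)       # endpoint farther from the x-axis
            merged[-1] = (H, T, k_avg, float(np.hypot(T[0] - H[0], T[1] - H[1])))
        else:
            merged.append(seg)
    return merged
```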
The specific process of track line pairing is as follows:
In the class-a group and the class-b group respectively, the processed track lines are sorted in descending order of their C values and then paired two by two within each class group; if, after pairing, an unpaired track line remains in both the class-a and the class-b group, the remaining class-a track line and the remaining class-b track line are paired with each other, and this pair is denoted LinepairC;
s235, sequentially connecting four end points of the matched track line to serve as a track area;
and S236, repeating S234 and S235, finishing the classification, combination and pairing of all grouped track lines, and extracting all track areas in the turnout image.
6. The method for detecting the single turnout state detection system according to claim 3, wherein in the step S24, all track areas are screened according to the track area positions to determine the track where the train is located, and the method comprises the following steps:
when a track area formed by the LinepairC paired track lines exists, the area is the track where the train is located; otherwise, screening the track area where the train is located according to the following process:
Screening all track areas according to the centroid calculation formula: the abscissa of the centre point of the eth track area is calculated and denoted xz_e; the track line spacing D_e of the eth track area is calculated from the track line spacing formula; according to the differences and characteristics between the side track areas and the track area where the train is located, the evaluation function is designed as G_e = 0.3·D_e² − 0.7·(xz_e − w/2)², where G_e is the evaluation value of the eth track area and w is the width of the original turnout image; the G value of every track area is calculated from the evaluation function, and the track area with the largest G value is taken as the track area where the train is located.
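A minimal sketch of the evaluation-function screening of claim 6, assuming each candidate track area is represented by a dictionary carrying its rail spacing D_e and centre abscissa xz_e; these names are illustrative only:

```python
def pick_train_track(track_areas, image_width):
    """Score each candidate track area with G_e = 0.3*D_e**2 - 0.7*(xz_e - w/2)**2
    and return the one with the largest score (claim 6)."""
    best, best_g = None, float("-inf")
    for area in track_areas:          # each area is assumed to be {"D": spacing, "xz": centre x, ...}
        g = 0.3 * area["D"] ** 2 - 0.7 * (area["xz"] - image_width / 2) ** 2
        if g > best_g:
            best, best_g = area, g
    return best, best_g
```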
7. The method for detecting the single turnout state detection system according to claim 2, wherein in the step S3, a track turnout state discrimination module is constructed, and the method comprises the following steps:
s31, constructing a deep learning model, and specifically comprising the following steps:
s311, constructing a backbone network, the backbone network being divided into 11 network modules, namely a Focus module, a CONV module, a TRCSP_1 module, a CONV module, a TRCSP_3 module, a CONV module, a TRCSP_3 module, a CONV module, a TRCSP_1 module, an SPP module and an SE module;
the Focus module slices the image after data preprocessing: with the preprocessed image width denoted W_1 and height H_1, the slice size is W_1/2 × H_1/2; the sliced feature maps are then stacked in the channel direction, the stacked feature map is downsampled with a 1 × 1 convolution kernel, and the number of output feature map channels is finally adjusted to 64 (a minimal sketch of this slicing step is given after the module descriptions below);
the CONV module performs 2× downsampling on the feature map output by the Focus module and adjusts the number of output channels;
the TRCSP _1 module and the TRCSP _3 module are feature extraction modules fusing Transform structures, wherein the TRCSP _1 module comprises 1 Transform component, and the TRCSP _3 module comprises 3 Transform modules; the Transform module is an attention network module, and LayerNorm and Dropout layers in the Transform module can regularize the network;
the SPP module implements spatial pyramid pooling of image features: the SPP module first convolves the feature map input from the previous layer, then uses 3 parallel maximum pooling layers to reduce the model parameters, and finally concatenates the pooling results with the convolved result and convolves again to further extract image features;
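For the Focus slicing step described above, a minimal PyTorch-style sketch might look as follows; the channel count of 64 is taken from the claim, everything else (module name, input channels, example shapes) is illustrative:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice a W1 x H1 feature map into four W1/2 x H1/2 maps, stack them along
    the channel axis, and use a 1 x 1 convolution to set the output channels to 64."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=1, stride=1)

    def forward(self, x):
        # take every second pixel with the four possible offsets (the "slices")
        slices = [x[..., ::2, ::2], x[..., 1::2, ::2],
                  x[..., ::2, 1::2], x[..., 1::2, 1::2]]
        return self.conv(torch.cat(slices, dim=1))

# Focus()(torch.randn(1, 3, 416, 416)).shape -> torch.Size([1, 64, 208, 208])
```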
s312, fusing the image features extracted in the S311 to construct a detection head network; the method comprises the following specific steps:
s3121, extracting the feature maps output by the first and second TRCSP_3 modules in S311, denoted C1 and C2 respectively; the feature map output by the last SE module of the S311 backbone network is denoted C3; the feature maps input to the first and second TRCSP_3 modules are denoted C1_p and C2_p, and the feature map input to the last SE module of the S311 backbone network is denoted C3_p;
S3122, convolving C3 to obtain a feature map denoted D3_0; upsampling C3 and concatenating it with the C2 result to obtain a feature map denoted D2_0; upsampling D2_0 and concatenating it with the C1 result to obtain a feature map denoted D1_0;
S3123, applying atrous (dilated) spatial pyramid pooling to the three output feature maps D1_0, D2_0 and D3_0 respectively, and feeding the results back to the input ends of the first TRCSP_3 module, the second TRCSP_3 module and the SE module respectively;
s3124, during the calculation of the first TRCSP_3 module, the second TRCSP_3 module and the SE module, introducing the feedback from S3123 at the input ends on the basis of C1_p, C2_p and C3_p, repeating S3121, and recalculating C1, C2 and C3;
s3125, repeating S3122, S3123 and S3124 in turn twice, cyclically extracting and fusing image features to obtain D1_2, D2_2 and D3_2; concatenating D1_2, D2_2 and D3_2 with D1_0, D2_0 and D3_0 in the channel direction to obtain D1, D2 and D3 respectively as the detection head feature maps for turnout state detection;
s313, decoding the feature map obtained in the S3125 to obtain a prediction frame position, a prediction frame object type and a type confidence level;
s32, constructing a turnout state discrimination model of the traditional image processing technology, and specifically comprising the following steps:
s321, obtaining the prediction boxes whose confidence in S313 is lower than 0.5, obtaining the position of each such prediction box in the original image according to the downsampling ratio, and extracting the region of the original image corresponding to the prediction box, denoted R_pre;
S322, graying the region R_pre, extracting the edge features of the image with the Canny operator, extracting the track line features with the Hough transform method, and storing the extracted line segment endpoint coordinates in a list;
s323, converting each track line into the expression y = Kx + C using the extracted endpoint information, where K is the slope and C is the intercept; the slope and intercept of each track line are recorded, the slope and intercept of the tth track line being denoted Kt and Ct respectively;
Grouping the track lines according to their slopes, specifically: sorting all track lines from large to small by Kt, selecting nt track lines from the sorted list by equidistant sampling as grouping references, and denoting the slopes of the nt reference tracks as Kt_nt respectively; the remaining retained track lines are traversed, the slope of the track line visited in each traversal being denoted kt_g; when |Kt_nt − kt_g| < kt_t, the track line corresponding to kt_g is added to the group indicated by Kt_nt, where kt_t is the turnout track line slope difference threshold, with a default value of 0.2;
Merging the grouped track lines: let each subgroup contain N_t track lines; if a group contains only one track line, no merging is needed; the average slope of all track lines within a group is computed and denoted K_avg1; the track lines are sorted in descending order of intercept, and the distance between every two adjacent track lines is calculated as
Dq = |Cq − C_{q+1}| / √(1 + K_avg1²)
where q ∈ {1, …, N_t−1}; Dq is the distance between the qth track line after descending sorting and the (q+1)th track line following it; Cq is the intercept of the qth track line expression; C_{q+1} is the intercept of the (q+1)th track line expression; N_t is the number of track lines; when Dq < D1_t, the track lines q and q+1 belong to the same track and are merged, where D1_t is the turnout track line merging distance threshold, with default value 4;
The two endpoints of the qth track line are extracted as (xh_q, yh_q) and (xt_q, yt_q); the two endpoints of the (q+1)th track line are extracted as (xh_{q+1}, yh_{q+1}) and (xt_{q+1}, yt_{q+1}); the new endpoints of the merged track line are denoted H_1 and T_1 and are determined from the original endpoint coordinates: when yh_q > yh_{q+1}, H_1 = (xh_{q+1}, yh_{q+1}), otherwise H_1 = (xh_q, yh_q); similarly, when yt_q > yt_{q+1}, T_1 = (xt_q, yt_q), otherwise T_1 = (xt_{q+1}, yt_{q+1});
S324, when the number of screened track lines is not equal to 4, there is missed or false detection of turnout track lines; the track line extraction hyper-parameters are adjusted and S322 and S323 are repeated until 4 track lines are detected;
s325, obtaining the midpoint coordinates of each track line, denoted (xc_n, yc_n), n = 1, 2, 3, 4; sorting the track lines in ascending order of the midpoint abscissa xc_n, and denoting the sorted track lines in turn as L_left1, L_left2, L_right1 and L_right2; the gap between L_left1 and L_left2 is the left turnout gap, and the gap between L_right1 and L_right2 is the right turnout gap;
s326, calculating the intersection point between L_left1 and L_left2, denoted P_left, and the intersection point between L_right1 and L_right2, denoted P_right; judging whether each intersection point lies within the extent of the corresponding track line segments: if the intersection point lies within the extent of the corresponding track line segments, the boundary line is regarded as a false gap line and the two track lines intersect without a gap; if the intersection point lies outside the extent of the corresponding track line segments, the boundary line is regarded as a real gap line;
if P_left lies within the extent of the corresponding track segments and P_right does not, the turnout is in the right-turn state;
if P_left does not lie within the extent of the corresponding track segments and P_right does, the turnout is in the left-turn state;
if neither P_left nor P_right lies within the extent of the corresponding track segments, or both do, the turnout state cannot be judged by the traditional image processing method, and the recognition result of the deep learning model is output directly.
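A sketch of the gap-line intersection test of S326 and the resulting state decision; segments are plain endpoint pairs and the "within range" test is implemented as an axis-aligned extent check, which is one reasonable reading of the claim:

```python
def line_intersection(seg_a, seg_b):
    """Return the intersection point of the two infinite lines through seg_a and
    seg_b, or None if they are parallel. Each segment is ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-9:
        return None
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / denom
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / denom
    return px, py

def inside_segment(pt, seg):
    """True if pt lies within the axis-aligned extent of the segment."""
    (x1, y1), (x2, y2) = seg
    return min(x1, x2) <= pt[0] <= max(x1, x2) and min(y1, y2) <= pt[1] <= max(y1, y2)

def switch_state(l_left1, l_left2, l_right1, l_right2):
    """S326: a gap is 'real' when the intersection lies outside the segments."""
    p_left = line_intersection(l_left1, l_left2)
    p_right = line_intersection(l_right1, l_right2)
    left_closed = p_left is not None and inside_segment(p_left, l_left1)
    right_closed = p_right is not None and inside_segment(p_right, l_right1)
    if left_closed and not right_closed:
        return "right-turn"
    if right_closed and not left_closed:
        return "left-turn"
    return None   # undecidable by the classical method; fall back to the deep model
```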
8. The detection method of the single turnout state detection system according to claim 2, wherein in the step S4, the preprocessed turnout image and the corresponding label are used as input data, the constructed turnout state discrimination model is used as a detection tool, and a turnout detection result is obtained; the method comprises the following steps:
s41, sending the preprocessed turnout image and the corresponding label into the deep network model for a preliminary detection of the turnout state, and obtaining a predicted target box and a class confidence;
s42, judging the class confidence: when the confidence is greater than the threshold, the turnout state judgment result is output directly; when the confidence is less than the judgment threshold, the predicted target box is mapped back to the original image and the turnout state is judged again with the turnout state discrimination model based on traditional image processing, the default confidence threshold being 0.5; when the two judgment results agree, the result is output directly; when they disagree, the result of the traditional-processing turnout model is output, but the predicted target box is specially marked; when the traditional-processing turnout model cannot reach a result, the recognition result of the deep learning network model is adopted as the final output and is likewise specially marked.
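A sketch of the two-stage decision logic of S42, assuming the deep-model output is a dictionary with a state, a confidence and a box, and that the classical discriminator returns None when it cannot decide; the second return value flags the "special marking" case. All names are placeholders:

```python
def fuse_predictions(deep_result, classical_fn, conf_threshold=0.5):
    """S42: trust the deep model above the confidence threshold; otherwise fall
    back to the classical image-processing discriminator."""
    if deep_result["conf"] >= conf_threshold:
        return deep_result["state"], False            # no special marking needed
    classical_state = classical_fn(deep_result["box"])
    if classical_state is None:
        return deep_result["state"], True             # classical method undecided: mark specially
    if classical_state == deep_result["state"]:
        return classical_state, False                 # both methods agree
    return classical_state, True                      # disagreement: output classical result, mark specially
```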
9. The detection method of the single turnout state detection system according to claim 2, wherein in the step S5, turnouts of a track on which a train is located are screened and identified by combining the track on which the train is located and the turnout state discrimination result, and a final turnout state identification result is output; the specific process is as follows:
Comparing the output image of the track area where the train is located with the output turnout prediction boxes, and calculating the intersection-over-union J_iou of every recognized prediction box with the track area image one by one; the track area image where the train is located is denoted A and the prediction box is denoted B,
J_iou = area(A ∩ B) / area(A ∪ B)
where area(A) represents the track area where the train is located and area(B) represents the area of the prediction box;
when the intersection-over-union is larger than the threshold, the prediction box belongs to the track area where the train is located; when it is smaller than the threshold, the prediction box is a side-track turnout; the turnout detection results are displayed in different colours.
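A minimal sketch of the intersection-over-union test of claim 9, approximating the track area A by its axis-aligned bounding box; boxes are [x1, y1, x2, y2] lists and the function name is illustrative:

```python
def box_iou(a, b):
    """Intersection over union J_iou of two axis-aligned boxes [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```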
10. A semi-automatic data labeling method of a single turnout state detection system is characterized by comprising the following steps:
step 1, acquiring turnout images under different lines and different weather conditions through the single turnout state detection system, dividing the acquired images into two parts, a manual labeling set and an automatic labeling set, labeling the manual labeling set with the VIA labeling tool, and randomly dividing the manual labeling set into three sub-data sets, train, val and test, in the ratio 8:1:1, wherein the manual labeling set contains at least 8000 images and each category has no fewer than 10000 turnout state instances;
step 2, training on the labeled data set with the turnout state discrimination deep learning model of S31 to obtain the optimal turnout recognition model; the specific process is as follows:
Training and testing the turnout state discrimination model in the single turnout state detection system with the manual labeling set of step 1, the number of training rounds being denoted R_i; when, at round R_i, the turnout state detection precision and recall both reach 0.85 or more, the parameters of that round are taken as available model parameters and, packaged together with the test result as the R_i-th available turnout state discrimination model, stored in an available model list; otherwise, the parameters of that round are discarded and the next round of training and testing continues; the procedure stops when the available model list is not empty and the test results of 3 consecutive rounds show no further improvement, and the best-performing available turnout state discrimination model is selected from the list as the optimal turnout state discrimination model; when the available model list is empty and the number of training rounds reaches the threshold R_i_threshold, with default value 300, the single turnout state detection system cannot meet the detection requirement, the calculation process is stopped, and the system is optimized;
step 3, randomly shuffling the images in the automatic labeling set, grouping them in units of 1000 images (a final group of fewer than 1000 images is still counted as one group), randomly selecting 1 group of data, detecting the turnout state with the optimal turnout state discrimination model obtained in step 2 to obtain the label information of the images in the automatic labeling set, and removing the selected group from the automatic labeling data set;
step 4, manually checking whether the automatically generated annotation files meet the annotation requirement, correcting unreasonable annotations, removing images for which no annotation information was generated, and adding the images with qualified annotations to the manual labeling set;
step 5, selecting any group from the remaining groups of the automatic labeling set and repeating steps 2, 3 and 4 until the automatic labeling set becomes empty; the labeling of all images of the automatic labeling set is then complete and the labeling process ends.
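A high-level sketch of the claim-10 loop, with train_fn, predict_fn and review_fn standing in for steps 2, 3 and 4; these names and the data layout are placeholders, not part of the claim:

```python
import random

def semi_automatic_labeling(manual_set, auto_set, train_fn, predict_fn, review_fn,
                            group_size=1000):
    """Retrain on the manual set, auto-label one group, hand-check the results,
    promote accepted images, and repeat until the automatic set is empty."""
    random.shuffle(auto_set)
    groups = [auto_set[i:i + group_size] for i in range(0, len(auto_set), group_size)]
    while groups:
        model = train_fn(manual_set)                                  # step 2: best available model
        group = groups.pop(random.randrange(len(groups)))             # step 3: pick a random group
        labelled = [(img, predict_fn(model, img)) for img in group]   # step 3: auto-label it
        accepted = review_fn(labelled)                                # step 4: manual check / fix-up
        manual_set.extend(accepted)
    return manual_set
```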
CN202211305472.9A 2022-10-24 2022-10-24 Single turnout state detection system and detection method and semi-automatic data labeling method Pending CN115601558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211305472.9A CN115601558A (en) 2022-10-24 2022-10-24 Single turnout state detection system and detection method and semi-automatic data labeling method

Publications (1)

Publication Number Publication Date
CN115601558A true CN115601558A (en) 2023-01-13

Family

ID=84848136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211305472.9A Pending CN115601558A (en) 2022-10-24 2022-10-24 Single turnout state detection system and detection method and semi-automatic data labeling method

Country Status (1)

Country Link
CN (1) CN115601558A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116907350A (en) * 2023-09-14 2023-10-20 中国铁道科学研究院集团有限公司电子计算技术研究所 Single turnout geometry measuring method and device, electronic equipment and storage medium
CN116907350B (en) * 2023-09-14 2023-12-15 中国铁道科学研究院集团有限公司电子计算技术研究所 Single turnout geometry measuring method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination