CN113160219B - Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image - Google Patents

Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image

Info

Publication number
CN113160219B
CN113160219B · CN202110518589.4A
Authority
CN
China
Prior art keywords
category
image
railway scene
railway
unmanned aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110518589.4A
Other languages
Chinese (zh)
Other versions
CN113160219A (en)
Inventor
王志鹏
童磊
贾利民
秦勇
耿毅轩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jiaotong University filed Critical Beijing Jiaotong University
Priority to CN202110518589.4A priority Critical patent/CN113160219B/en
Publication of CN113160219A publication Critical patent/CN113160219A/en
Application granted granted Critical
Publication of CN113160219B publication Critical patent/CN113160219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30236Traffic on road, railway or crossing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention provides a real-time railway scene analysis method for unmanned aerial vehicle remote sensing images, which comprises the following steps: acquiring unmanned aerial vehicle remote sensing images in real time, and performing data acquisition and processing on the images to obtain a data set; constructing a railway scene analysis network model, and training and verifying it on the obtained data set to obtain the optimal line loss proportional coefficient; and testing the model on different computers with the optimal line loss proportional coefficient to obtain analysis results, which are then comprehensively evaluated. The method enables real-time, fast and efficient railway scene analysis on an unmanned aerial vehicle on-board computer with limited computing resources, so that the track area in the railway scene can be segmented with high precision.

Description

Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image
Technical Field
The invention relates to the field of rail transit operation safety and guarantee, in particular to a real-time railway scene analysis method for an unmanned aerial vehicle remote sensing image.
Background
Recently, unmanned aerial vehicles (drones) have been widely used in scene parsing tasks across many fields. As an important auxiliary inspection mode alongside manual inspection and rail-inspection-vehicle inspection, automatic inspection based on unmanned aerial vehicles is an important development trend in the field of high-speed railway safety operation. Automatic UAV inspection has multiple advantages such as flexibility, high efficiency and low cost, does not affect the normal operation of trains, and can provide advanced safety guarantees for railway operation. The UAV can carry payload equipment such as a visible-light camera and, at the same time, a small on-board computer whose size and weight meet certain specification requirements; the on-board computer analyzes and processes data such as the video stream from the payload equipment, while also allowing more flexible, customized real-time flight control of the UAV according to requirements. Therefore, a UAV-based automatic railway inspection system has broad application prospects and can bring revolutionary progress to railway inspection.
In recent years, deep learning has developed rapidly, and its results are widely applied in fields such as face recognition, industrial defect detection and intelligent robotics. In the field of automated intelligent railway inspection, constructing deep learning models that effectively detect the areas or objects of interest during railway inspection is likewise an important research topic. However, to realize automatic railway inspection, effective analysis of the railway scene is the first task to be solved. A real-time convolutional neural network model built with deep learning techniques, capable of running on a UAV on-board computer, is a deep model with great potential for real-time railway scene analysis.
Therefore, a real-time railway scene analysis method for unmanned aerial vehicle remote sensing images based on deep learning is urgently needed.
Disclosure of Invention
The invention provides a real-time railway scene analysis method for an unmanned aerial vehicle remote sensing image, which aims to overcome the defects in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme.
The embodiment of the invention provides a real-time railway scene analysis method for an unmanned aerial vehicle remote sensing image, which comprises the following steps:
acquiring an unmanned aerial vehicle remote sensing image in real time, and acquiring and processing data of the image to obtain a data set;
constructing a railway scene analysis network model, and training and verifying the railway scene analysis network model according to the obtained data set to obtain an optimal line loss proportional coefficient;
and testing the model on different computers using the optimal line loss proportional coefficient to obtain analysis results, and comprehensively evaluating the analysis results.
Preferably, performing data acquisition and processing on the images to obtain a data set includes screening the obtained images, annotating the screened images with the labelme software, and dividing them into a training set, a verification set and a test set according to a certain proportion.
Preferably, training and verifying the railway scene analysis network model according to the obtained data set comprises: when the data set only has two semantic categories of a track area and a non-track area, training and verifying the railway scene analysis network model only by adopting a line loss function; and when other semantic categories are included, training and verifying the railway scene analysis network model by adopting the integrated loss function.
Preferably, the integration loss function is represented by the following formula (1):
L = (1-α)·L_CE + α·L_LL (1)
wherein L_CE represents the cross-entropy loss function, L_LL represents the line loss function, and α represents the proportional coefficient.
Preferably, when the railway scene analysis network model is trained and verified on the obtained data set, the track area in the unmanned aerial vehicle remote sensing image of the railway scene needs to be an elongated strip.
Preferably, the overall architecture of the constructed railway scene analysis network model is as shown in Table 1 below:
TABLE 1
[Table 1: overall FDRNet architecture; presented as an image in the original publication]
Preferably, the line loss function is as shown in formula (2) below:
[Formula (2): line loss function L_LL; presented as an image in the original publication]
wherein the pixel point sets corresponding to the track area and the non-track area in the image are P_r and P_n respectively, with |P_r| = N and |P_n| = M; for pixel points p_i ∈ P_r and p_j ∈ P_n, the degrees of membership to the track area are 1/λ_i and 1/λ_j respectively, and f_i and f_j are the probabilities that p_i and p_j are predicted as the track-area class.
Preferably, the degree of membership is calculated according to the following formula (3):
λ = d/d_0 (3)
when only a single track area exists in the image, d is the distance from pixel point p to the track centerline l, and d_0 is the distance from a point on the edge of the track area to the centerline l;
when two or more strip-shaped track areas exist in the image, d is the distance from pixel point p to the centerline l_β of the β-th track area, and d_0 is the distance from a point on the edge of the β-th track area to the centerline l_β.
Preferably, the comprehensive evaluation of the analysis results includes: computing a prediction accuracy evaluation from the analysis results obtained on the test set and the corresponding ground-truth labels, and evaluating the inference speed of the railway scene analysis network model.
Preferably, the prediction accuracy evaluation is calculated according to the following formulas (3) to (4):
IoU = TP / (TP + FP + FN) (3)
mIoU = (1/C) · Σ_c IoU_c (4)
wherein TP represents the number of pixels of a certain semantic category c that are predicted as that category; TN represents the number of pixels that are not of category c and are not predicted as that category; FP represents the number of pixels that are predicted as category c but do not in fact belong to it; FN represents the number of pixels that are not predicted as category c but in fact belong to it; IoU represents the intersection-over-union accuracy of category c; in formula (4), mIoU represents the mean intersection-over-union accuracy over all semantic categories, and C represents the number of semantic categories.
According to the technical scheme provided by the real-time railway scene analysis method for unmanned aerial vehicle remote sensing images, a deep fully decoupled residual convolutional network is designed so that real-time, efficient railway scene analysis can be carried out within the computing capacity of a UAV on-board computer, supporting UAV-based automatic railway inspection to the greatest extent. By designing a customized auxiliary loss function and using it during network training, the segmentation of track and non-track areas can be constrained simultaneously without increasing the computational complexity, so that the predicted track areas are accurately concentrated in strip-shaped regions and are prevented from appearing in other implausible local regions. As a result, real-time, fast and efficient railway scene analysis, including high-precision segmentation of the track area, can be performed on a UAV on-board computer with limited computing resources.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a real-time railway scene analysis method for remote sensing images of unmanned aerial vehicles according to an embodiment of the invention;
FIG. 2 is a schematic comparison of a fully decoupled convolution and standard convolution filter;
FIG. 3 is a schematic diagram comparing the fully decoupled residual module proposed in the present embodiment with the residual module of the prior art;
FIG. 4 is a diagram illustrating a comparison between a conventional pixel coordinate system and a normalized coordinate system;
FIG. 5 is a schematic illustration of a single track area and a dual track area and their centerlines;
FIG. 6 is a schematic diagram showing how the per-category prediction accuracy of the proposed FDRNet and of ERFNet varies with the line loss function ratio α;
FIG. 7 is a graph of FDRNet visual effects trained with an integration loss strategy.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are exemplary only for explaining the present invention and are not construed as limiting the present invention.
As used herein, the singular forms "a", "an", "the" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, but do not preclude the presence or addition of one or more other features, integers, steps, operations, and/or groups thereof. It should be understood that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding of the embodiments of the present invention, the following detailed description will be given by way of example with reference to the accompanying drawings, and the embodiments are not limited to the embodiments of the present invention.
Examples
Fig. 1 is a schematic flow chart of a real-time railway scene analysis method for remote sensing images of unmanned aerial vehicles according to an embodiment of the present invention, and with reference to fig. 1, the method includes:
S1, acquiring unmanned aerial vehicle remote sensing images in real time, and performing data acquisition and processing on the images to obtain a data set.
Under good weather conditions, the unmanned aerial vehicle flies over the railway line and acquires remote sensing images of the railway scene; the acquired images are screened to remove unusable ones. The screened images are annotated with the labelme software, and the semantic categories of interest in the images are labeled manually. In this embodiment the semantic categories are, illustratively, track, vegetation, bare land, road, building and background. The produced data set is divided into a training set, a verification set and a test set according to a certain proportion: the training set is used to train the network, the verification set is used to verify network performance during training, and the test set is used to test the trained network model so as to verify the performance of the constructed model; the ratio of training, verification and test sets is, schematically, 7:2:1.
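By way of illustration only, the screened and annotated images could be divided 7:2:1 as follows; the directory name, file extension and random seed are assumptions of this sketch rather than details of the embodiment:

```python
import random
from pathlib import Path

def split_dataset(image_dir, ratios=(0.7, 0.2, 0.1), seed=42):
    """Randomly split annotated images into train/val/test lists (7:2:1 by default)."""
    images = sorted(Path(image_dir).glob("*.jpg"))  # assumed file extension
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    train = images[:n_train]
    val = images[n_train:n_train + n_val]
    test = images[n_train + n_val:]
    return train, val, test

# assumed directory layout
train_set, val_set, test_set = split_dataset("uav_railway_images")
```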
S2, constructing a railway scene analysis network model, and training and verifying the railway scene analysis network model on the obtained data set to obtain the optimal line loss proportional coefficient.
First, a lightweight railway scene analysis network model, FDRNet (Fully Decoupled Residual ConvNet), is constructed so that the model can run at an adequate speed on an on-board computer.
Constructing a railway scene analysis network model:
the railway scene analysis network model constructed in the embodiment is a complete decoupling residual convolution network model, and is specifically constructed through processing of a deep complete decoupling convolution and a deep complete decoupling residual block.
(1) Deep fully decoupled convolution
The basic idea of fully decoupled convolution is to further decouple the correlations handled by a standard convolution, which greatly reduces the number of parameters and the amount of computation: on the basis of keeping the basic mapping relations of the convolution unchanged, the parameters can be reduced substantially, avoiding excessive computation time and resource occupation. The fully decoupled convolution proposed in this embodiment involves two kinds of correlation decoupling: (1) decoupling of cross-channel and spatial correlations; (2) decoupling of transverse and longitudinal spatial correlations.
The following assumptions are first proposed: the two coupled correlation modes in a convolutional network can be completely decoupled, namely (1) the cross-channel correlation and the spatial correlation in the feature map can be completely decoupled; and (2) furthermore, the two spatial correlations (transverse and longitudinal) in the feature map can also be completely decoupled. FIG. 2 compares the filter banks of a fully decoupled convolution and a standard convolution. Fully decoupled convolution decomposes the standard convolution into three sequential steps: a transverse 1D depthwise convolution, a longitudinal 1D depthwise convolution and a cross-channel 1x1 convolution. The M convolution kernels of the first two depthwise convolutions, in the two different spatial dimensions, correspond respectively to the M channels of the input feature map. The final 1x1 convolution is a special case of an ordinary standard convolution with size 1x1; it mainly establishes the cross-channel correlation mapping in the convolution process and can convert the number of channels of the input feature map from M to N.
Let σ(·) denote the nonlinear activation function; let b^h_m and b^v_m denote the additional offsets of the m-th filter of the transverse 1D depthwise convolution and of the longitudinal 1D depthwise convolution in the fully decoupled convolution process, and let b^p_i denote the additional offset of the i-th filter of the cross-channel 1x1 convolution. Let w^h_m and w^v_m denote the weight vectors of the m-th kernels of the transverse and longitudinal 1D depthwise convolutions, let w^p_{i,m} denote the m-th channel weight of the i-th 1x1 filter, and let a^0_m denote the m-th channel of the input feature map a^0. The i-th channel a_i of the output feature map of the fully decoupled convolution process can then be expressed in terms of the input feature map a^0 as formula (1), where * denotes the convolution operation:
[Formula (1): fully decoupled convolution output channel a_i; presented as an image in the original publication]
Since the cross-channel kernel w^p_{i,m} represents a 1x1 convolution containing only a single scalar parameter, formula (1) can be simplified into formula (2):
[Formula (2); presented as an image in the original publication]
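By way of illustration only (not the patent's reference implementation), the three-step decomposition above can be sketched in PyTorch as follows; the kernel size of 3, the dilation parameter and the placement of a ReLU after each step are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class FullyDecoupledConv2d(nn.Module):
    """Sketch of a fully decoupled convolution: transverse 1D depthwise convolution,
    longitudinal 1D depthwise convolution, then a pointwise 1x1 convolution that
    establishes cross-channel correlation (M -> N channels)."""
    def __init__(self, in_ch, out_ch, k=3, dilation=1):
        super().__init__()
        pad = dilation * (k // 2)
        # transverse (1 x k) depthwise convolution: one kernel per input channel
        self.conv_h = nn.Conv2d(in_ch, in_ch, (1, k), padding=(0, pad),
                                dilation=(1, dilation), groups=in_ch, bias=True)
        # longitudinal (k x 1) depthwise convolution
        self.conv_v = nn.Conv2d(in_ch, in_ch, (k, 1), padding=(pad, 0),
                                dilation=(dilation, 1), groups=in_ch, bias=True)
        # cross-channel 1x1 (pointwise) convolution: in_ch -> out_ch
        self.conv_p = nn.Conv2d(in_ch, out_ch, 1, bias=True)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.conv_h(x))
        x = self.act(self.conv_v(x))
        return self.act(self.conv_p(x))
```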
(2) Deep fully decoupled residual block
This embodiment further proposes a fully decoupled residual module that makes full use of the decomposed form of the fully decoupled convolution. FIG. 3 compares the fully decoupled residual module proposed in this embodiment with residual modules of the prior art. The original residual blocks (bottleneck and non-bottleneck versions) were proposed in ResNet, as shown in FIG. 3-(a) and 3-(b). Considering that the non-bottleneck design can bring higher accuracy, and noting that the bottleneck design also introduces other degradation problems, ERFNet modified the non-bottleneck residual module through one-dimensional factorization to accelerate the model while reducing the parameters of the original non-bottleneck residual module; this is called non-bottleneck-1D, as shown in FIG. 3-(c). Here, the non-bottleneck-1D residual module is further modified by replacing its convolutions with the proposed fully decoupled convolution, further reducing the number of parameters and the time cost, as shown in FIG. 3-(d). The fully decoupled residual module proposed in this embodiment, called non-bottleneck-FD, consists of two fully decoupled convolutions connected together with an identity mapping. The 1D convolutions in non-bottleneck-1D are modified into two 1D depthwise convolutions that consider only spatial correlation, and an additional 1x1 convolution (also known as pointwise convolution) is appended to realize the final cross-channel correlation of the feature map. It should also be noted that a ReLU nonlinear activation function is added after each convolution.
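An illustrative sketch of the non-bottleneck-FD block follows, reusing the FullyDecoupledConv2d sketch above; the dropout placement and applying dilation only to the second convolution are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class NonBottleneckFD(nn.Module):
    """Sketch of non-bottleneck-FD: two fully decoupled convolutions plus an
    identity shortcut (channel count unchanged), with dropout as regularization."""
    def __init__(self, channels, dilation=1, dropout=0.05):
        super().__init__()
        self.fd1 = FullyDecoupledConv2d(channels, channels, k=3, dilation=1)
        self.fd2 = FullyDecoupledConv2d(channels, channels, k=3, dilation=dilation)
        self.drop = nn.Dropout2d(dropout)  # ratio 0.05 as stated in the text

    def forward(self, x):
        out = self.fd1(x)
        out = self.fd2(out)
        out = self.drop(out)
        return torch.relu(out + x)  # identity mapping added back
```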
(3) Deep fully decoupled residual network
The architecture of the railway scene analysis network model constructed in this embodiment is shown in Table 1 below. It is a compact and effective network architecture; although the non-bottleneck-FD module greatly reduces the number of network parameters, this inevitably causes some loss of network performance, so scaling up the network to compensate for this loss is worth considering. The network scale can be expanded in two directions: deepening the network or widening it. Both directions were designed and tested, and the empirical results show that widening the network is the better direction and is more suitable for the current deep learning framework PyTorch. Analysis also showed that increasing the number of intermediate steps in the convolution slows the network down, probably because PyTorch is more sensitive to network depth than to width. It should be noted that, compared with a conventional standard convolution, the proposed fully decoupled convolution increases the number of intermediate steps (not counting the additional batch normalization and subsequent nonlinear activation layers), which in itself deepens the network and increases forward inference time. As seen in Table 1, the encoder consists of layers 1-14 and the decoder of layers 15-19. Inspired by ERFNet, a wider convolutional network architecture is designed, with the same downsampling module used in layers 1, 2 and 8. In these downsampling modules, the max-pooling result and the result of a single 3x3 convolution with stride 2 are concatenated as the final downsampling output, in order to capture richer features. The network uses dilated convolutions with dilation rates of 2, 4 and 8 to obtain more contextual and global information; a convolutional layer with a dilation rate of 16 is not used, giving up the minimal gain in contextual features it might bring while avoiding a further increase in network depth. In addition, the Dropout ratio used is 0.05, whereas the ratio in ERFNet is 0.03; Dropout is included in the architecture as a regularization measure that yields better representations. For the upsampling stage, three successive transposed-convolution upsampling modules, layers 15, 17 and 19, are used to expand the resolution of the feature map back to the original size of the input image.
TABLE 1
[Table 1: FDRNet architecture, encoder layers 1-14 and decoder layers 15-19; presented as an image in the original publication]
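By way of illustration, the downsampling module described above — concatenating a stride-2 3x3 convolution with a 2x2 max-pooling of the input — might be sketched as follows; the channel split, in which the convolution contributes out_ch - in_ch channels so the concatenation yields out_ch, follows ERFNet's convention and is an assumption here:

```python
import torch
import torch.nn as nn

class DownsamplerBlock(nn.Module):
    """Sketch of the downsampling module: a stride-2 3x3 convolution and a 2x2
    max-pooling of the input are concatenated along the channel dimension."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch - in_ch, 3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(torch.cat([self.conv(x), self.pool(x)], dim=1))
```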
It should be noted that when the railway scene analysis network model is trained and verified on the obtained data set, the track area in the unmanned aerial vehicle remote sensing image of the railway scene needs to be an elongated strip.
When the data set only has two semantic categories of a track area and a non-track area, training and verifying the railway scene analysis network model by only adopting a line loss function; and when other semantic categories are included, training and verifying the railway scene analysis network model by adopting the integrated loss function.
(1) Line Loss function (LL)
The railway track area is the main region of concern for UAV-based automatic railway inspection. Accurate prediction of this area plays an important role in subsequent inspection work such as fastener inspection, rail inspection and track-slab inspection. This embodiment proposes a new line loss function to improve the accuracy of the railway category in the railway scene analysis task. The line loss function exploits an excellent characteristic of high-speed railways: they cover long distances and are comparatively straight. For highways, route designers always deliberately introduce curves to prevent driver fatigue; for railways, by contrast, the longer the straight sections the better. Thus, in terms of straightness, railways far exceed highways. At the same time, in the local railway area covered by a UAV remote sensing image, the track area is very straight in most circumstances.
The traditional Cross Entropy (CE) loss function addresses this task by classifying each discrete pixel of the whole image, ignoring the inherent relationships between pixels. It should be noted that the track area in a UAV remote sensing image is always an elongated region. Therefore, the closer a pixel is to the centerline of the strip-shaped railway area, the more likely it is to be part of the railway; the further a pixel is from the centerline, the less likely it is to belong to the railway area. To fully exploit this idea, this embodiment proposes a line loss function, which can largely correct the railway-area pixel classification errors inherent in the conventional loss function.
1) Normalized coordinate system
FIG. 4 compares a conventional pixel coordinate system with the normalized coordinate system. Conventionally, the image coordinate system is established on the pixels of the image, with one pixel as the unit length, as shown in FIG. 4(a). In such a coordinate system, when the image is scaled, the distance between two corresponding points before and after scaling does not remain constant. To solve this problem, this embodiment establishes the normalized coordinate system shown in FIG. 4(b), which takes the length and width of the entire image as unit lengths. For an image of resolution w_0 × h_0, a point with coordinates (w, h) in the conventional coordinate system has coordinates (w/w_0, h/h_0) in the normalized coordinate system. The distance d(p_1, p_2) between two pixels p_1 and p_2 can then be computed in the normalized coordinate system, where d denotes the Euclidean distance. Clearly, the normalized coordinate system has the following properties: (1) the distance between two pixels is invariant to scaling of the image; (2) the maximum distance between two pixels in the normalized coordinate system is √2.
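A small sketch of this normalized distance follows; pixel coordinates being given as (column, row) is an assumption of the sketch:

```python
import math

def normalized_distance(p1, p2, w0, h0):
    """Euclidean distance between two pixels after normalizing coordinates by the
    image width w0 and height h0; invariant to image scaling and bounded by sqrt(2)."""
    x1, y1 = p1[0] / w0, p1[1] / h0
    x2, y2 = p2[0] / w0, p2[1] / h0
    return math.hypot(x1 - x2, y1 - y2)
```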
2) Basic assumptions
Assumption 1: if a track area exists in the railway scene image acquired by the UAV, the two boundary lines of that area are parallel to each other.
Assumption 2: if two track areas exist in the railway scene image acquired by the UAV, these two track areas are also parallel to each other.
The line loss function is shown in the following formula (2):
[Formula (2): line loss function L_LL; presented as an image in the original publication]
wherein the pixel point sets corresponding to the track area and the non-track area in the image are P_r and P_n respectively, with |P_r| = N and |P_n| = M; for pixel points p_i ∈ P_r and p_j ∈ P_n, the degrees of membership (DoM) to the track area are 1/λ_i and 1/λ_j respectively, and f_i and f_j are the probabilities that p_i and p_j are predicted as the track-area class. With the help of the class-score feature map of the last layer of the CNN, the classification probability of each pixel can be calculated using the softmax function.
The degree of membership is calculated according to the following formula (3):
λ = d/d_0 (3)
FIG. 5 illustrates a single track area, a double track area and their centerlines. When only a single track area exists in the image, as shown in FIG. 5(a), d is the distance from pixel point p to the track centerline l, and d_0 is the distance from a point on the edge of the track area to the centerline l.
When two or more strip-shaped track areas exist in the image, as shown in FIG. 5(b), d is the distance from pixel point p to the centerline l_β of the β-th track area, and d_0 is the distance from a point on the edge of the β-th track area to the centerline l_β.
The degrees of membership of non-track-area pixels are all smaller than 1, and those of track-area pixels are all larger than 1.
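The closed form of formula (2) appears only as an image in the original publication. The sketch below therefore assumes one membership-weighted form consistent with the properties stated here — it is non-negative and reaches its minimum of 0 exactly when every track pixel is predicted with f_i = 1 and every non-track pixel with f_j = 0 — and should not be read as the patent's verbatim formula:

```python
import torch

def line_loss(prob_track, track_mask, inv_lambda):
    """Assumed line-loss sketch (not the patent's exact formula).
    prob_track: (..., H, W) softmax probability of the track class (f_i / f_j).
    track_mask: (..., H, W) boolean, True for ground-truth track pixels.
    inv_lambda: (..., H, W) degree of membership 1/lambda = d0/d per pixel
                (clamped in practice to avoid the singularity on the centerline).
    """
    track = track_mask
    non_track = ~track_mask
    # track pixels are pushed towards probability 1, weighted by membership
    loss_r = (inv_lambda[track] * (1.0 - prob_track[track])).mean()
    # non-track pixels are pushed towards probability 0, weighted by membership
    loss_n = (inv_lambda[non_track] * prob_track[non_track]).mean()
    return loss_r + loss_n
```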
Because the LL function can only perform a binary classification task, a network trained with the LL function alone can only distinguish track areas from non-track areas. Therefore, to enable multi-class classification, the network is trained together with the cross-entropy loss function: the CE loss function trains the network for the multi-class task, while the auxiliary line loss function LL imposes a stricter constraint on the track area to achieve more accurate track-area segmentation. The integrated loss function is shown in the following formula (4):
L = (1-α)·L_CE + α·L_LL (4)
wherein L_CE represents the cross-entropy loss function, L_LL represents the line loss function, and α represents the proportional coefficient.
Formula (4) shows that reducing the line loss requires f_i to become larger and f_j to become smaller; the line loss function attains its ideal minimum value of 0 if and only if f_i = 1 and f_j = 0. It is worth pointing out that the line loss function only applies to the binary task of separating track areas from non-track areas, so a network trained using only the line loss function can only distinguish track from non-track areas. To realize multi-class classification, this embodiment trains the model with the combination of the line loss function and the cross-entropy loss function.
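The integrated loss of formula (4) can then be sketched directly, reusing the line_loss function above; the choice α = 0.7 reflects the optimum reported later, and the tensor shapes are assumptions of the sketch:

```python
import torch.nn.functional as F

def integrated_loss(logits, target, prob_track, track_mask, inv_lambda, alpha=0.7):
    """L = (1 - alpha) * L_CE + alpha * L_LL, combining multi-class cross entropy
    with the assumed line-loss sketch above."""
    l_ce = F.cross_entropy(logits, target)          # logits: (B, C, H, W), target: (B, H, W)
    l_ll = line_loss(prob_track, track_mask, inv_lambda)
    return (1.0 - alpha) * l_ce + alpha * l_ll
```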
S3, testing the model on different computers using the optimal line loss proportional coefficient to obtain analysis results, and comprehensively evaluating the analysis results.
The railway scene analysis network model FDRNet is fully and effectively trained using the training and verification sets from step S1 and the loss function described above; after training, the network model under the optimal line loss proportional coefficient α is selected for testing. The pictures in the test set are input to the trained model and predicted one by one, and the analysis results are comprehensively evaluated, which includes: computing a prediction accuracy evaluation from the analysis results obtained on the test set and the corresponding ground-truth labels, and evaluating the inference speed of the railway scene analysis network model.
The prediction accuracy evaluation is calculated according to the following formulas (5) to (6):
IoU = TP / (TP + FP + FN) (5)
mIoU = (1/C) · Σ_c IoU_c (6)
wherein TP (true positive) represents the number of pixels of a certain semantic category c that are predicted as that category; TN (true negative) represents the number of pixels that are not of category c and are not predicted as that category; FP (false positive) represents the number of pixels that are predicted as category c but do not in fact belong to it; FN (false negative) represents the number of pixels that are not predicted as category c but in fact belong to it; IoU represents the intersection-over-union accuracy of category c; in formula (6), mIoU represents the mean intersection-over-union accuracy over all semantic categories, and C represents the number of semantic categories.
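A straightforward sketch of formulas (5)-(6), computing per-class IoU from the TP/FP/FN counts defined above and averaging over all classes:

```python
import numpy as np

def iou_per_class(pred, gt, num_classes):
    """Per-class IoU = TP / (TP + FP + FN) and the mean IoU over all classes.
    pred and gt are integer label maps of the same shape."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious, float(np.nanmean(ious))
```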
The following is a simulation case using the method of this embodiment. Three positions A, B and C along the Jingu high-speed railway corridor were selected for acquiring railway scene pictures; the weather on the acquisition day was good, with sufficient light. Based on these data, a UAV-based railway scene analysis data set was constructed, the constructed real-time railway scene analysis network model was trained with this data set, and the evaluation test was finally completed on the UAV on-board computer.
Step one: acquire unmanned aerial vehicle remote sensing images in real time, and perform data acquisition and processing on the images to obtain a data set.
The data set has five semantic categories (i.e. background, track, building, vegetation and road), with pixel-level annotation for each image. All UAV remote sensing images in the data set were collected at three positions A, B and C along the Jinghushi high-speed railway corridor section. To demonstrate the performance of the different models, 3000 images from zones A and B were used to construct the training data (2700 images) and validation data (300 images), while 300 images from zone C were used for testing. Zone C is completely separate from zones A and B, to avoid a possible high similarity between training and test images. The proposed model is trained on the training set.
Step two: and constructing a railway scene analysis network model, and training and verifying the railway scene analysis network model according to the obtained data set to obtain the optimal line loss proportional coefficient.
The constructed lightweight railway scene analysis network model is fully trained with the integrated loss function on the data set constructed in step one.
In this simulation example, the model is trained with a batch size of 2 (batch size = 2); parameter optimization uses the Adam optimizer with a momentum parameter of 0.9, a weight decay of 2e-4 and an initial learning rate of 5e-4. All networks are trained for 100 rounds (epoch = 100). A consistent random seed is set in the code implementation so that all networks are trained with a fixed picture sequence, ensuring repeatability of the work. Meanwhile, during training the line loss proportional coefficient α in the integrated loss function is varied from 0 to 0.9 in steps of 0.1, and the best result is selected.
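A training-loop sketch reflecting the hyperparameters stated above (batch size 2, Adam with the momentum parameter mapped to beta1 = 0.9, weight decay 2e-4, initial learning rate 5e-4, 100 epochs, fixed random seed) is given below; FDRNet(), train_loader and the seed value are placeholders, and plain cross-entropy stands in where the integrated loss of formula (4) would be used:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)                        # fixed random seed (value assumed)
model = FDRNet(num_classes=5).cuda()        # placeholder for the network of Table 1
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4,
                             betas=(0.9, 0.999),   # momentum 0.9 as beta1
                             weight_decay=2e-4)

for epoch in range(100):                    # epoch = 100
    for images, labels in train_loader:     # batch size 2; loader construction omitted
        optimizer.zero_grad()
        logits = model(images.cuda())
        loss = F.cross_entropy(logits, labels.cuda())  # integrated loss used in practice
        loss.backward()
        optimizer.step()
```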
(1) Optimum ratio of line loss function
According to the experiments, the proposed line loss function is much stronger than the cross-entropy loss function in constraining the network to predict the railway area accurately, but the line loss function can only classify each pixel into two classes. Therefore, the line loss function is used to constrain the training process while the CE loss function performs the multi-class classification. How to properly combine the CE and line loss functions is thus the key issue in constraining the training process for both railway and non-railway regions. To this end, a training strategy completely different from the conventional training process is implemented, in which back-propagation is performed with the integrated loss at every training step. For this strategy, different proportions α of the line loss function are chosen, varying from 0 to 0.9 in steps of 0.1.
FIG. 6 shows how the prediction accuracy of the proposed FDRNet varies with the line loss function ratio α, i.e. the trend of the IoU and mIoU values with α. In terms of both track-area prediction accuracy and overall mIoU, the line loss function improves the accuracy of the proposed FDRNet simultaneously when a suitable α is used. The accuracy reaches its maximum at α = 0.7.
In addition, as shown in Table 2, the accuracy trend across all classes as the line loss function ratio of FDRNet varies from 0 to 0.9 is also reported, together with the maximum accuracy increment and the increment at α = 0.7. It can be concluded with confidence that the line loss function improves the prediction not only of the track region but also of the non-track regions. This is easily understood: when the prediction accuracy of the track area drops, more non-track pixels are predicted as track (FP case) or more track pixels are predicted as non-track (FN case), which eventually also reduces the prediction accuracy of the non-track areas. Table 2 lists the accuracy increase for the different classes of pixels. The maximum accuracy increase for the track category is 7.58%, owing to its relatively strong linear characteristics, while the maximum accuracy increase for the road category is 6.66%. Overall, the mIoU accuracy improves by 3.36%.
More details of the quantitative comparison between the original FDRNet and the FDRNet trained with the integrated loss strategy at α = 0.7 are listed in Table 3. The accuracy of all classes rises to a higher level: background, building, vegetation, track and road accuracies increase from 54.98% to 57.04%, 46.20% to 47.34%, 58.98% to 61.12%, 70.72% to 78.30% and 44.77% to 48.67%, respectively. Finally, the mIoU increases from 55.13% to 58.49%. In particular, the track class achieves the largest increment.
TABLE 2
[Table 2: per-class accuracy change of FDRNet as the line loss ratio α varies from 0 to 0.9; presented as an image in the original publication]
TABLE 3
[Table 3: per-class accuracy of the original FDRNet versus FDRNet trained with the integrated loss strategy (α = 0.7); presented as an image in the original publication]
In summary, the line loss function proposed in this embodiment has the following advantages: (1) the integrated loss strategy can greatly improve the accuracy of railway and non-railway areas, and ultimately the overall accuracy; (2) by the exact definition of the LL function, a model trained with the line loss function concentrates the predicted track areas in the strip-shaped regions of the image while suppressing track predictions at other implausible locations; (3) the line loss function offers better interpretability for railway region segmentation.
Step three: and testing the model by adopting different computers according to the optimal line loss proportional coefficient to obtain an analysis result, and comprehensively evaluating the analysis result.
To demonstrate the advantage of the method in inference speed, comparison experiments were carried out at different resolutions on a single NVIDIA Jetson TX2 embedded device and on a single NVIDIA GeForce RTX 2060 card. Tables 4 and 5 show the inference speed comparison on the single embedded TX2 device and on the single GeForce RTX 2060, respectively, where ms is the number of milliseconds needed to infer one picture and fps is the number of pictures that can be inferred per second.
On the TX2 device, FDRNet reaches a peak of 12.8 fps at a resolution of 512x256. On the RTX 2060 GPU card, FDRNet achieves a peak of 90.9 fps at a resolution of 512x256.
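Inference-speed figures such as these are typically obtained by timing repeated forward passes; the following generic sketch (the warm-up count, number of runs and use of CUDA synchronization are assumptions, not details from the patent) illustrates how ms-per-image and fps can be measured:

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, resolution=(512, 256), runs=100, device="cuda"):
    """Average forward-pass time (ms) and frames per second at a given resolution."""
    model.eval().to(device)
    dummy = torch.randn(1, 3, resolution[1], resolution[0], device=device)
    for _ in range(10):            # warm-up iterations
        model(dummy)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(dummy)
    torch.cuda.synchronize()
    ms = (time.time() - start) / runs * 1000.0
    return ms, 1000.0 / ms
```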
TABLE 4
[Table 4: inference speed comparison on a single NVIDIA Jetson TX2 at different resolutions; presented as an image in the original publication]
TABLE 5
[Table 5: inference speed comparison on a single NVIDIA GeForce RTX 2060 at different resolutions; presented as an image in the original publication]
The results of FDRNet under different training strategies are also given in the accuracy comparison with other models, where "R.IoU" denotes the accuracy of the railway track area category, "mIoU" denotes the comprehensive accuracy over all categories, and "scratch" denotes a model trained from scratch without a pre-trained model. "with LL" means the network is trained with the integrated loss function, and "pretrained, with LL" means the model starts from a pre-trained model and is trained with the integrated loss function.
Table 6 compares the overall performance of FDRNet with other models. As can be seen from Table 6 below, when FDRNet is trained from scratch without the line loss function it cannot obtain satisfactory accuracy: the track class prediction accuracy is 70.72% and the overall prediction accuracy mIoU is 55.13%, which is not outstanding.
However, once the integrated loss back-propagation strategy is adopted, the proposed method improves rapidly and outperforms all the other algorithms, with an mIoU of 58.49%; notably, the track class accuracy increases by 7.58%. In addition, by adopting a model pre-trained on Cityscapes, the method of this embodiment finally reaches the optimal R.IoU of 80.99% and mIoU of 58.82%.
TABLE 6
[Table 6: overall performance comparison between FDRNet and other models; presented as an image in the original publication]
FIG. 7 shows a visual comparison of FDRNet and ERFNet trained with the integrated loss back-propagation strategy. As shown in FIG. 7, several examples of the track region predictions generated by FDRNet (c) demonstrate that, when trained with the integrated loss strategy, it produces more accurate segmentation results in local details and at edges. Most of the predicted track regions are constrained to the expected two elongated areas, which is exactly the purpose of the proposed customized Line Loss (LL) function. When low-luminance pictures are processed, as shown in rows 5 and 6 of FIG. 7, the architecture adapts better to different illumination conditions and can still analyze the track area accurately and effectively, so the model is more robust in different working environments and can better serve UAV-based automatic railway inspection.
In conclusion, the method achieves fast and efficient UAV-based railway scene analysis. The proposed network architecture can run in real time on a UAV on-board computer and, by exploiting the fact that the track area is relatively straight and usually concentrated in strip-shaped regions, it greatly improves the accuracy of track-area segmentation and extraction; it therefore has clear application value for UAV-based automatic railway inspection.
It will be appreciated by those skilled in the art that the foregoing types of applications are merely exemplary, and that other types of applications, whether presently existing or later to be developed, that may be suitable for use with the embodiments of the present invention, are also intended to be encompassed within the scope of the present invention and are hereby incorporated by reference.
It should be understood by those skilled in the art that the foregoing example of determining the invoking policy according to the user information is only for better explaining the technical solution of the embodiment of the present invention, and is not limited to the embodiment of the present invention. Any method of determining the invoking policy based on the user attributes is included in the scope of embodiments of the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
While the invention has been described with reference to specific preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A real-time railway scene analysis method for unmanned aerial vehicle remote sensing images is characterized by comprising the following steps:
acquiring an unmanned aerial vehicle remote sensing image in real time, and acquiring and processing data of the image to obtain a data set;
constructing a railway scene analysis network model, and training and verifying the railway scene analysis network model according to the obtained data set to obtain an optimal line loss proportional coefficient;
the overall architecture of the constructed railway scene analysis network model is shown in Table 1 below:
TABLE 1
[Table 1: FDRNet architecture; presented as an image in the original publication]
The training and verifying of the railway scene analysis network model according to the obtained data set comprises the following steps: when the data set only has two semantic categories of a track area and a non-track area, training and verifying the railway scene analysis network model only by adopting a line loss function; when other semantic categories are included, training and verifying the railway scene analysis network model by adopting an integrated loss function;
the integrated loss function is shown in the following formula (1):
L = (1-α)·L_CE + α·L_LL (1)
wherein L_CE represents the cross-entropy loss function, L_LL represents the line loss function, and α represents the proportional coefficient; the optimal line loss proportional coefficient is obtained by selecting α; the line loss function is shown in the following formula (2):
[Formula (2): line loss function L_LL; presented as an image in the original publication]
wherein the pixel point sets corresponding to the track area and the non-track area in the image are P_r and P_n respectively, with |P_r| = N and |P_n| = M; for pixel points p_i ∈ P_r and p_j ∈ P_n, the degrees of membership to the track area are 1/λ_i and 1/λ_j respectively, and f_i and f_j are the probabilities that p_i and p_j are predicted as the track-area class;
when the railway scene analysis network model is trained and verified according to the obtained data set, the track area in the unmanned aerial vehicle remote sensing image of the railway scene needs to be an elongated strip;
and testing the model on different computers using the optimal line loss proportional coefficient to obtain analysis results, and comprehensively evaluating the analysis results.
2. The method of claim 1, wherein performing data acquisition and processing on the images to obtain the data set comprises screening the acquired images, annotating the screened images with the labelme software, and dividing them into a training set, a verification set and a test set according to a certain proportion.
3. The method of claim 1, wherein the degree of membership is calculated according to the following formula (3):
λ = d/d_0 (3)
when only a single track area exists in the image, d is the distance from pixel point p to the track centerline l, and d_0 is the distance from a point on the edge of the track area to the centerline l;
when two or more strip-shaped track areas exist in the image, d is the distance from pixel point p to the centerline l_β of the β-th track area, and d_0 is the distance from a point on the edge of the β-th track area to the centerline l_β.
4. The method of claim 1, wherein the comprehensive evaluation of the analysis results comprises: computing a prediction accuracy evaluation from the analysis results obtained on the test set and the corresponding ground-truth labels, and evaluating the inference speed of the railway scene analysis network model.
5. The method according to claim 4, wherein the prediction accuracy evaluation is calculated according to the following formulas (3) to (4):
IoU = TP / (TP + FP + FN) (3)
mIoU = (1/C) · Σ_c IoU_c (4)
wherein TP represents the number of pixels of a certain semantic category c that are predicted as that category; TN represents the number of pixels that are not of category c and are not predicted as that category; FP represents the number of pixels that are predicted as category c but do not in fact belong to it; FN represents the number of pixels that are not predicted as category c but in fact belong to it; IoU represents the intersection-over-union accuracy of category c; in formula (4), mIoU represents the mean intersection-over-union accuracy over all semantic categories, and C represents the number of semantic categories.
CN202110518589.4A 2021-05-12 2021-05-12 Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image Active CN113160219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110518589.4A CN113160219B (en) 2021-05-12 2021-05-12 Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110518589.4A CN113160219B (en) 2021-05-12 2021-05-12 Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image

Publications (2)

Publication Number Publication Date
CN113160219A CN113160219A (en) 2021-07-23
CN113160219B true CN113160219B (en) 2023-02-07

Family

ID=76875162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110518589.4A Active CN113160219B (en) 2021-05-12 2021-05-12 Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image

Country Status (1)

Country Link
CN (1) CN113160219B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562653B (en) * 2023-06-28 2023-11-28 广东电网有限责任公司 Distributed energy station area line loss monitoring method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929621A (en) * 2019-11-15 2020-03-27 中国人民解放军63729部队 Road extraction method based on topology information refinement
CN111047630A (en) * 2019-11-13 2020-04-21 芯启源(上海)半导体科技有限公司 Neural network and target detection and depth prediction method based on neural network
CN111144418A (en) * 2019-12-31 2020-05-12 北京交通大学 Railway track area segmentation and extraction method
CN111582225A (en) * 2020-05-19 2020-08-25 长沙理工大学 Remote sensing image scene classification method and device
CN111723675A (en) * 2020-05-26 2020-09-29 河海大学 Remote sensing image scene classification method based on multiple similarity measurement deep learning
CN111767810A (en) * 2020-06-18 2020-10-13 哈尔滨工程大学 Remote sensing image road extraction method based on D-LinkNet
CN111985451A (en) * 2020-09-04 2020-11-24 南京航空航天大学 Unmanned aerial vehicle scene detection method based on YOLOv4
CN112131967A (en) * 2020-09-01 2020-12-25 河海大学 Remote sensing scene classification method based on multi-classifier anti-transfer learning
CN112308860A (en) * 2020-10-28 2021-02-02 西北工业大学 Earth observation image semantic segmentation method based on self-supervision learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204572B (en) * 2016-07-06 2020-12-04 合肥工业大学 Road target depth estimation method based on scene depth mapping
CN110084107A (en) * 2019-03-19 2019-08-02 安阳师范学院 A kind of high-resolution remote sensing image method for extracting roads and device based on improvement MRF

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111047630A (en) * 2019-11-13 2020-04-21 芯启源(上海)半导体科技有限公司 Neural network and target detection and depth prediction method based on neural network
CN110929621A (en) * 2019-11-15 2020-03-27 中国人民解放军63729部队 Road extraction method based on topology information refinement
CN111144418A (en) * 2019-12-31 2020-05-12 北京交通大学 Railway track area segmentation and extraction method
CN111582225A (en) * 2020-05-19 2020-08-25 长沙理工大学 Remote sensing image scene classification method and device
CN111723675A (en) * 2020-05-26 2020-09-29 河海大学 Remote sensing image scene classification method based on multiple similarity measurement deep learning
CN111767810A (en) * 2020-06-18 2020-10-13 哈尔滨工程大学 Remote sensing image road extraction method based on D-LinkNet
CN112131967A (en) * 2020-09-01 2020-12-25 河海大学 Remote sensing scene classification method based on multi-classifier anti-transfer learning
CN111985451A (en) * 2020-09-04 2020-11-24 南京航空航天大学 Unmanned aerial vehicle scene detection method based on YOLOv4
CN112308860A (en) * 2020-10-28 2021-02-02 西北工业大学 Earth observation image semantic segmentation method based on self-supervision learning

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Deep Semantic Segmentation Neural Networks of Railway Scene; Zhengwei He et al.; 2018 37th Chinese Control Conference (CCC); 2018-10-07; full text *
Deep TEC: Deep Transfer Learning with Ensemble; J. Senthilnath et al.; Remote Sensing; 2020-01-10; full text *
Learning Disentangled Feature Representation; Xin Li et al.; arXiv:2007.11430v1; 2020-06-22; full text *
Road extraction network for remote sensing images based on SPUD-ResNet; Li Daidong et al.; Computer Engineering and Applications; 2020-11-11; vol. 57, no. 23; full text *
Road extraction from remote sensing images based on deep learning; Zhao Yang; China Master's Theses Full-text Database, Engineering Science and Technology II; 2020-02-15; full text *
Scene perception and classification detection of remote sensing roads; Yang Jun et al.; Journal of Computer-Aided Design & Computer Graphics; 2007-04-19; vol. 19, no. 3; full text *

Also Published As

Publication number Publication date
CN113160219A (en) 2021-07-23

Similar Documents

Publication Publication Date Title
Zou et al. Robust lane detection from continuous driving scenes using deep neural networks
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
CN109711320B (en) Method and system for detecting violation behaviors of staff on duty
US20200302248A1 (en) Recognition system for security check and control method thereof
CN113486726B (en) Rail transit obstacle detection method based on improved convolutional neural network
CN111709416B (en) License plate positioning method, device, system and storage medium
CN110287826B (en) Video target detection method based on attention mechanism
CN111598030A (en) Method and system for detecting and segmenting vehicle in aerial image
Ma et al. A real-time crack detection algorithm for pavement based on CNN with multiple feature layers
CN111126359A (en) High-definition image small target detection method based on self-encoder and YOLO algorithm
CN112232371B (en) American license plate recognition method based on YOLOv3 and text recognition
US11244188B2 (en) Dense and discriminative neural network architectures for improved object detection and instance segmentation
Ye et al. Steering angle prediction YOLOv5-based end-to-end adaptive neural network control for autonomous vehicles
CN114782798A (en) Underwater target detection method based on attention fusion
CN115131760A (en) Lightweight vehicle tracking method based on improved feature matching strategy
CN113160219B (en) Real-time railway scene analysis method for unmanned aerial vehicle remote sensing image
CN113435370B (en) Method and device for acquiring vehicle queuing length based on image feature fusion
CN109508639B (en) Road scene semantic segmentation method based on multi-scale porous convolutional neural network
Yang et al. Robust visual tracking using adaptive local appearance model for smart transportation
CN116977712A (en) Knowledge distillation-based road scene segmentation method, system, equipment and medium
CN116994164A (en) Multi-mode aerial image fusion and target detection combined learning method
Meng et al. A modified fully convolutional network for crack damage identification compared with conventional methods
CN109583584B (en) Method and system for enabling CNN with full connection layer to accept indefinite shape input
CN116863384A (en) CNN-Transfomer-based self-supervision video segmentation method and system
CN114494893B (en) Remote sensing image feature extraction method based on semantic reuse context feature pyramid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant