CN113011360B - Road traffic sign line detection method and system based on attention capsule network model - Google Patents

Road traffic sign line detection method and system based on attention capsule network model

Info

Publication number
CN113011360B
CN113011360B CN202110331964.4A CN202110331964A CN113011360B CN 113011360 B CN113011360 B CN 113011360B CN 202110331964 A CN202110331964 A CN 202110331964A CN 113011360 B CN113011360 B CN 113011360B
Authority
CN
China
Prior art keywords
road traffic
capsule
traffic sign
sign line
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110331964.4A
Other languages
Chinese (zh)
Other versions
CN113011360A (en)
Inventor
管海燕
于永涛
柯福阳
曹爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Kebo Space Information Technology Co ltd
Original Assignee
Jiangsu Simate Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Simate Technology Co ltd
Priority to CN202110331964.4A
Publication of CN113011360A
Application granted
Publication of CN113011360B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses a road traffic sign line detection method based on an attention capsule network model, which comprises the following steps: S1, the contracted path, which serves as the encoder, captures the context information of road traffic sign lines in the image layer by layer using downsampling operations, and is coupled with a channel attention module to enhance that context information without losing feature details or resolution; S2, the expansion path, which serves as the decoder and is structurally symmetric to the encoder, uses up-sampling operations to restore the position information of the road traffic sign lines in the image, gradually recovering their details and the image resolution, while a spatial attention module captures the spatial information of the road sign lines and enhances their semantic information. The road traffic sign line detection method based on the channel-space attention capsule network model is faster, more robust and more comprehensive in detecting and extracting road traffic sign lines from road traffic sign line image data.

Description

Road traffic sign line detection method and system based on attention capsule network model
Technical Field
The application relates to the field of intelligent traffic and mapping science, in particular to a road traffic sign line detection method and system based on an attention capsule network model.
Background
In recent years, with economic development and social progress, problems concerning road capacity, accessibility and traffic safety have become increasingly prominent. Roads and their detail elements, such as road boundaries and traffic sign lines, are important transportation infrastructure in China. A road traffic sign line (Traffic Index Line) is a traffic facility composed of lines, arrows, characters, patterns, elevation marks, physical marks, raised road signs, outline marks and the like, drawn or mounted on the road surface, which conveys guidance, restriction and warning information to traffic participants. It serves to control and guide traffic and can be used together with signs or on its own. Effective maintenance of road sign lines reduces traffic pressure, keeps vehicles moving in an orderly manner, reduces traffic accidents and improves traffic safety. Rapid acquisition and updating of their high-precision geometric and semantic information, such as shape, position, topology and structural relations, plays an important role in guaranteeing traffic safety and is a basic, core element of intelligent transportation, intelligent high-precision maps, and navigation and positioning services. In urban environments, damage to road markings is generally judged mainly in terms of chromaticity, photosensitivity, degree of damage and the like. At present, inspection is carried out by manual visual survey. Although accurate, this approach is hampered by varied road shapes and complex scenes, and suffers from low measurement efficiency, a low degree of automation, high labor and material costs, and disruption to road traffic, so it cannot meet the need for rapid extraction and updating of high-precision road information.
The availability and clarity of road traffic sign lines are key factors for traffic management systems and traffic accidents. According to semantic knowledge (such as shape) and reflection intensity characteristics of road traffic sign lines, the current method is mainly divided into two main types, namely three-dimensional point cloud driving and two-dimensional image driving road traffic sign line extraction. The three-dimensional point cloud driven road traffic sign line extraction method directly segments the three-dimensional point cloud of the vehicle-mounted/unmanned aerial vehicle to obtain road surface information, and then the road traffic sign line extraction is completed by using the intensity information. However, extracting road traffic sign lines from a massive three-dimensional point cloud remains a very difficult task, especially to process point cloud data with strong concave-convex features and non-uniform distribution. Therefore, the neural network-based deep learning method is used in the road traffic sign line automatic classification study. However, these methods have limitations in terms of processing the mass three-dimensional point cloud data volume of the urban road network. The two-dimensional image driven road traffic sign line extraction method is to complete the road traffic sign line extraction by using mature image processing methods such as Hough transformation, threshold segmentation, mathematical morphology and the like according to semantic information (such as size, direction and shape) of the traffic sign line. However, these methods are difficult to adapt to road traffic sign lines of different sizes and different reflection intensities in complex road environments. 
In addition, because complex scene images or feature images contain a large amount of noise, it is difficult to infer the salient structure of traffic sign lines from sparse two-dimensional image data masked by noise, which limits robustness. Moreover, existing research focuses on extracting the structural information of road boundaries and sign lines independently, ignores the semantic information among different types of objects, and lacks a structured description of the road scene.
Along with the development of deep learning algorithms, the extraction of high-level abstract features of traffic sign lines and the improvement of the extraction precision and automation degree of road sign lines have become current development trends. Some optimization classification algorithms or convolutional neural networks perform end-to-end road traffic sign line detection. The deep learning method consumes more time and labor in the training process, but has high speed in the actual detection process, and the recognition effect on the road traffic sign line characteristics is obviously better than that of the conventional method. However, the number of model architectures proposed for the road traffic sign line is small, and most models only use the common architecture of the Convolutional Neural Network (CNN) model, and no model architecture suitable for the road traffic sign line identification field for the image characteristics of the road traffic sign line is proposed. In addition, the models are simple in structure, and the problems of fusion of high-level depth features and topological semantic information, extraction of road detail elements and fine recognition of similar targets and incomplete targets exist. A capsule network (capsule network) provides an efficient way to model local to global relationships between entities and can learn a view-invariant representation. Through this improved representation learning, the capsule network can achieve better detection and classification performance with fewer parameters. A capsule network is also a neural network, and differs from a normal neural network in that the neurons of the capsule network are a vector (a set of values) rather than a scalar (a single value), and the neurons are called vector neurons. Each value in a vector neuron represents a certain property, such as pose (position, size, direction), deformation, velocity, color, texture, etc. 
The capsule uses the length of its output vector to represent the probability that a pattern of a certain kind exists, and uses the internal details of the vector, such as its direction, to describe the features of that pattern. As a result, capsule networks have achieved good results on tasks such as human action localization, medical image segmentation and text classification. In addition, the recently proposed attention mechanism learns feature-channel or spatial attention weights through gradient computation with forward propagation and backward feedback in a neural network, so that irrelevant information can be ignored and important information emphasized during processing. Introducing the attention mechanism into object recognition tasks allows the context information of the target to be learned. Therefore, the application uses a capsule network model with an encoding-decoding structure together with a context information enhancement module coupling a channel-space attention mechanism to fully obtain an inherent, salient and high-order feature description of road sign lines, thereby providing a rapid, convenient, accurate and timely system for acquiring and evaluating road sign line conditions, and providing data support for intelligent transportation, intelligent high-precision maps, navigation and positioning services and the like.
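The idea that a capsule's vector length encodes an existence probability can be illustrated with a small sketch. The patent text does not name the nonlinearity it uses; the `squash` function below is the form commonly used in capsule-network literature and is an assumption here, as are the function name and the toy input.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Rescale a capsule vector so its length lies in [0, 1) and its direction is kept.

    Assumption: the patent does not specify this nonlinearity; this is the
    commonly used capsule "squash" form, shown for illustration only.
    """
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)          # maps length^2 into [0, 1)
    return scale * s / np.sqrt(sq_norm + eps)  # unit direction times new length

capsule = np.array([3.0, 4.0])                 # toy raw capsule output, length 5
v = squash(capsule)
length = float(np.linalg.norm(v))              # read as "probability the pattern exists"
```

In this sketch the direction of `v` still carries the pattern's properties, while only its length is compressed into a probability-like range.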
Disclosure of Invention
Purpose of the application: in view of the above, it is necessary to provide a road traffic sign line detection method based on an attention capsule network model.
The technical scheme is as follows: the application provides a road traffic sign line detection method based on an attention capsule network model, which comprises the following steps:
extracting characteristics of an input road traffic marking line image to be detected through a contracted path based on an encoder-decoder network model to obtain the capsule characteristics of the road traffic marking line, and coupling a channel attention module to enhance the context information of the road traffic marking lines, wherein the resolution and the scale of each road traffic marking line are different;
deconvolution up-sampling is carried out on the road traffic marking line capsule characteristics through the expansion path of the encoder-decoder network model, and the space information of the road traffic marking lines is captured through the space attention module, so that the road traffic marking line information is obtained.
In one embodiment, the feature extraction of the input road traffic sign line image to be detected is performed through a contracted path based on an encoding-decoding structure network model to obtain a road traffic sign line capsule feature, the channel attention module is coupled to enhance the context information of the road traffic sign line, and the resolution and the scale of each road traffic sign line are different, including:
inputting the road traffic sign line image to be detected into a first convolution layer for feature extraction to obtain low-order road traffic sign line feature information;
inputting the low-order road traffic sign line characteristic information into a capsule initial layer for characteristic conversion to obtain a road traffic sign line capsule vector;
carrying out capsule convolution on the road traffic sign line capsule vectors through N capsule convolution groups with different scales to obtain the road traffic sign line capsule characteristics output by each capsule convolution group;
in one embodiment, the scaling step between the N capsule convolution groups of different scales is 2, and adjacent capsule convolution groups are connected by a 2×2 max pooling layer, so that the spatial resolution of the feature map is gradually reduced from bottom to top; each capsule convolution group comprises 5 capsule convolution layers with 3×3 convolution kernels and the same feature size, and a channel attention module; the road traffic sign line capsule features are input into the channel attention module to acquire the context information of the traffic sign lines and enhance the road traffic sign line features.
In one embodiment, the information enhancement of the road traffic sign line capsule feature through one attention channel to obtain an enhanced road traffic sign line capsule feature includes:
inputting the road traffic sign line capsule feature map into a capsule convolution layer with a 1×1 convolution kernel, converting it into a one-dimensional capsule feature map, and constructing a channel descriptor through a global average pooling operation;
converting the channel descriptor into a channel attention descriptor, wherein the information coded by each element in the descriptor corresponds to the channel in the input road traffic sign line capsule feature map;
and carrying out channel-level product processing on the channel attention descriptor serving as a channel weight function and the input road traffic sign line capsule feature map, and adjusting the road traffic sign line capsule feature.
In one embodiment, the deconvolution up-sampling is performed on each of the road traffic sign line capsule features by the extended path of the encoding-decoding structure network model, and the road traffic sign line spatial information is captured by the spatial attention module, so as to obtain the road traffic sign line information, which includes:
up-sampling the reinforced road traffic sign line capsule characteristics output by the N layer, and then splicing the reinforced road traffic sign line capsule characteristics output by the previous layer to obtain a spliced road traffic sign line capsule characteristic diagram;
the feature obtained after the capsule convolution operation is carried out on the spliced capsule feature map is input to a channel attention module, and the road traffic sign line capsule feature is output and used as the current enhanced road traffic sign line capsule feature to be spliced;
returning to the step of up-sampling the current reinforced road traffic sign line capsule characteristics to be spliced, and then splicing the current reinforced road traffic sign line capsule characteristics with the reinforced road traffic sign line capsule characteristics output by the previous layer to obtain a spliced road traffic sign line capsule characteristic diagram until the current reinforced road traffic sign line capsule characteristics to be spliced with the reinforced road traffic sign line capsule characteristics output by the previous layer are the reinforced road traffic sign line capsule characteristics output by the first layer to obtain a spliced capsule characteristic diagram;
and carrying out convolution operation on the spliced capsule feature map, outputting the capsule feature of the road traffic sign line, and inputting the capsule feature of the road traffic sign line to a space attention module to obtain the enhancement information of the road traffic sign line.
In one embodiment, the convolving operation is performed on the spliced capsule feature map, the capsule feature of the road traffic sign line is output, and the capsule feature is input to the spatial attention module to obtain the enhancement information of the road traffic sign line, including:
inputting the road traffic sign line capsule feature map into a capsule convolution layer with a convolution kernel of 1 multiplied by 1 to convert the road traffic sign line capsule feature map into 2 one-dimensional capsule feature maps;
the first road traffic sign line feature map is converted into a feature matrix, the matrix transversely represents the image size of the road traffic sign line, and the matrix longitudinally represents the feature number;
the second road traffic sign line feature map is converted into a feature matrix, the matrix transversely represents feature numbers, and the matrix longitudinally represents the image size of the road traffic sign line;
multiplying the two feature matrixes to construct a spatial attention matrix, wherein the transverse and longitudinal directions of the matrix represent the image size of the road traffic sign line;
and converting the road traffic sign line capsule feature map into a road traffic sign line capsule feature matrix, multiplying it by the spatial attention matrix, converting the result back into a capsule feature map, and applying two capsule convolution operations with 3×3 convolution kernels to obtain the road traffic sign line capsule feature map.
The application provides a road traffic sign line detection system based on an attention capsule network model, which comprises the following modules:
the contracted path module is used for extracting the characteristics of the input road traffic marking line image to be detected through a contracted path based on the encoder and the decoder to obtain the capsule characteristics of each road traffic marking line, and the resolution and the scale of the capsule characteristics of each road traffic marking line are different;
the channel attention context information enhancement module is used for carrying out channel feature importance analysis based on the road traffic line capsule features, enhancing the context information and obtaining the capsule features of each enhanced traffic sign line;
the expansion path module is used for performing deconvolution up-sampling on the capsule features of each enhanced road traffic sign line through the expansion path based on the encoder and the decoder, and splicing them with the corresponding road traffic sign line features of the contracted path to obtain road sign line detection information;
and the spatial attention module is used for obtaining the spatial information of the road traffic sign lines based on the road traffic sign line capsule features, so as to enhance the road traffic sign line information.
According to the road traffic sign line detection method based on the channel-space attention capsule network model, feature extraction is performed on the input road traffic sign line image to be detected through the contracted path of the encoder-decoder model to obtain the capsule features of the road traffic sign lines, the resolution and scale of which differ from one another; context information enhancement is applied to each road traffic sign line capsule feature through the channel attention mechanism to obtain the enhanced capsule features; and the enhanced capsule features are deconvolved and up-sampled through the expansion path, with the spatial attention module enhancing the spatial information of the road traffic sign lines, to obtain the road traffic sign line detection information. By adopting the encoder-decoder structure and enhancing channel and spatial attention information, an inherent, salient and high-order feature description of the road traffic sign lines is fully obtained, and the recognition accuracy of the road traffic sign lines is improved.
Drawings
FIG. 1 is a flow chart of a road traffic sign line detection method based on a channel-space attention capsule network model in one embodiment;
FIG. 2 is a schematic diagram of a channel attention module in one embodiment;
FIG. 3 is a schematic diagram of a spatial attention model in one embodiment;
FIG. 4 is a flow diagram of an encoder-decoder architecture constructed based on a capsule network and channel-space attention model in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In one embodiment, as shown in fig. 1, there is provided a road traffic sign line detection method based on a channel-space attention capsule network model, including the steps of:
step S1, extracting characteristics of an input road traffic marking line image to be detected through a contracted path based on an encoder-decoder network model to obtain the capsule characteristics of the road traffic marking line, coupling a channel attention module and enhancing the context information of the road traffic marking lines, wherein the resolution and the scale of each road traffic marking line are different.
The road traffic sign line image to be detected can be high-resolution road image data acquired by vehicles, satellites, unmanned aerial vehicles and the like. The encoder is the contracted path: it captures the context information of the road traffic sign lines in the image layer by layer using downsampling operations and is coupled with the channel attention module, which enhances that context information without losing feature details or resolution and improves target detection performance. The contracted path is the feature extraction module of the encoder-decoder model and comprises 2 conventional convolution layers, a capsule initial layer, and N capsule convolution groups of different scales.
In one embodiment, feature extraction is performed on an input road traffic sign line image to be detected through a contracted path based on an encoder-decoder network model to obtain a road traffic sign line capsule feature, and the road traffic sign line capsule feature is coupled with a channel attention module to enhance the context information of the road traffic sign lines, wherein the resolution and the scale of each road traffic sign line are different, and the method comprises the following steps:
inputting the road traffic sign line image to be detected into a first convolution layer for feature extraction to obtain low-order road traffic sign line feature information; inputting the low-order road traffic sign line characteristic information into a capsule initial layer for characteristic conversion to obtain a road traffic sign line capsule vector; carrying out capsule convolution on the road traffic sign line capsule vectors through N capsule convolution groups with different scales to obtain the road traffic sign line capsule characteristics output by each capsule convolution group; and inputting the road traffic sign line capsule characteristics into the attention channel module, acquiring the context information of the traffic sign line, and carrying out information enhancement on the road traffic sign line characteristics.
Specifically, an image block (800×800 pixels) of the road traffic sign lines to be detected is input to a first convolution layer formed by two 3×3 conventional convolutions with ReLU (Rectified Linear Unit) activation, which extracts low-order road traffic sign line feature information (for example, 256-dimensional); the extracted low-order features are input into the primary capsule layer and converted into road traffic sign line capsule vectors, with the number of capsule channels set to 64 and the dimension of each capsule set to 16. That is, the feature map produced by the primary capsule layer convolution contains 64 channels, and each channel holds 16 feature values, so that a 16-dimensional capsule is formed at each pixel. For each entity, each pixel is thus represented by 64 16-dimensional capsule features;
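The capsule layout described above (64 capsule channels of 16 dimensions at every pixel) can be pictured as a reshape of a conventional feature map. The sketch below is a minimal illustration, assuming the primary capsule layer emits 64 × 16 = 1024 values per pixel that are then grouped into capsules; the function name `to_capsules`, the grouping order and the tiny 8×8 spatial size (instead of 800×800) are illustrative choices, not details from the source.

```python
import numpy as np

NUM_CAPSULE_CHANNELS = 64   # from the text: 64 capsule channels
CAPSULE_DIM = 16            # from the text: each capsule is 16-dimensional

def to_capsules(feature_map):
    """Group an (H, W, 64*16) feature map into (H, W, 64, 16) capsules.

    Assumption: consecutive blocks of 16 values along the last axis form one capsule.
    """
    h, w, c = feature_map.shape
    if c != NUM_CAPSULE_CHANNELS * CAPSULE_DIM:
        raise ValueError(f"expected {NUM_CAPSULE_CHANNELS * CAPSULE_DIM} channels, got {c}")
    return feature_map.reshape(h, w, NUM_CAPSULE_CHANNELS, CAPSULE_DIM)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, NUM_CAPSULE_CHANNELS * CAPSULE_DIM))  # toy input
capsules = to_capsules(features)   # capsules[y, x, k] is the k-th 16-d capsule at pixel (y, x)
```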
5 capsule convolution groups of different scales are constructed. The groups are downsampled group by group by a factor of 0.5: in groups 2 to 5, a capsule max pooling operation halves the feature image size. The scales of the 5 capsule convolution groups are therefore {1, 1/2, 1/4, 1/8, 1/16} of the input road traffic sign line image, and the corresponding capsule feature maps are 800×800, 400×400, 200×200, 100×100 and 50×50 pixels. Each capsule convolution group contains 5 capsule convolution layers with the same feature image size and spatial resolution. The four capsule convolution groups other than the last one comprise (1) one 2×2 capsule max pooling layer for downsampling the input feature map; (2) five 3×3 capsule convolution layers of the same size; and (3) one channel attention coupling module (Channel Feature Attention, shown as CFA in FIG. 4), which yields the road traffic sign line capsule feature map;
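The halving of the feature map size from group to group can be checked with a short sketch. The text does not define how "capsule max pooling" treats the capsule dimensions; the element-wise 2×2 maximum below is an assumption, and `max_pool_2x2` is a hypothetical helper name.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over the first two (spatial) axes.

    Assumption: the maximum is taken element-wise, independently for every
    trailing (channel / capsule) position.
    """
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]                                        # drop an odd last row/column
    blocks = x.reshape(h // 2, 2, w // 2, 2, *x.shape[2:])
    return blocks.max(axis=(1, 3))

# Small worked example: each 2x2 block is replaced by its maximum.
grid = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(grid)

# Four pooling steps take an 800x800 map through the five group scales.
x = np.zeros((800, 800))
sizes = [x.shape[0]]
for _ in range(4):
    x = max_pool_2x2(x)
    sizes.append(x.shape[0])
```

Here `sizes` reproduces the 800, 400, 200, 100 and 50 pixel side lengths listed in the text.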
the channel attention operation is applied to the feature map output by each group (as shown in FIG. 2). A 1×1 convolution is performed on the input 64-channel, 16-dimensional capsule feature map (of size H×W, where H and W are the height and width of the feature map: 800×800 pixels for the first capsule convolution group, 400×400 pixels for the second, and so on) to convert it into a one-dimensional capsule feature map A, from which a channel descriptor is built by global average pooling; 2 fully connected layers (with ReLU and sigmoid activation functions, respectively) then convert the channel descriptor into the channel attention descriptor C, in which each element encodes information for the corresponding channel of the input capsule feature map; finally, the channel attention descriptor is used as a channel weight function and multiplied channel-wise with the input capsule feature map to adjust the importance of its channels; after the channel attention coupling operation, high-order road traffic sign line feature maps at 5 scales/spatial resolutions are output.
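The channel attention steps above (1×1 projection, global average pooling, two fully connected layers, channel-wise product) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the 1×1 capsule convolution is modelled as a single linear projection of each 16-dimensional capsule to a scalar, the hidden width of 16 is arbitrary, and all weights are random rather than trained. None of the names come from the source.

```python
import numpy as np

def channel_attention(caps, w_proj, w1, b1, w2, b2):
    """Reweight capsule channels; caps has shape (H, W, C, D)."""
    a = caps @ w_proj                                    # (H, W, C): assumed stand-in for the 1x1 convolution
    descriptor = a.mean(axis=(0, 1))                     # (C,): global average pooling
    hidden = np.maximum(0.0, descriptor @ w1 + b1)       # first fully connected layer, ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # second fully connected layer, sigmoid
    reweighted = caps * weights[None, None, :, None]     # channel-wise product with the input
    return reweighted, weights

rng = np.random.default_rng(0)
H, W, C, D, HIDDEN = 8, 8, 64, 16, 16                    # tiny map; HIDDEN is an assumption
caps = rng.standard_normal((H, W, C, D))
w_proj = rng.standard_normal(D) * 0.1
w1, b1 = rng.standard_normal((C, HIDDEN)) * 0.1, np.zeros(HIDDEN)
w2, b2 = rng.standard_normal((HIDDEN, C)) * 0.1, np.zeros(C)

enhanced, channel_weights = channel_attention(caps, w_proj, w1, b1, w2, b2)
```

Each of the 64 entries of `channel_weights` lies between 0 and 1 and scales every capsule in the corresponding channel, so the output keeps the input's shape.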
Step S2, performing deconvolution up-sampling on the road traffic sign line capsule features through the extended path of the encoder-decoder network model, capturing the road traffic sign line spatial information through a spatial attention module, and obtaining the road traffic sign line information, including:
the expansion path is the decoder part and likewise consists of 4 groups of capsule convolution operations, where the feature maps within each group share the same size and spatial resolution, at {1, 1/2, 1/4, 1/8} of the input high-resolution image scale respectively; going up the expansion path, each of the 4 capsule convolution groups at scales {1/8, 1/4, 1/2, 1} includes one capsule deconvolution layer, 5 capsule convolution layers, and a channel attention module. A 2× capsule deconvolution up-sampling operation is applied group by group across the 4 capsule convolution groups; the up-sampled features of each group are spliced with the features of the corresponding scale from the contracted path (encoder) that have undergone the channel attention operation in step S1; each spliced feature map passes through the 5 capsule convolution layers and is then input to a channel attention module, which adjusts the channel importance of the capsule feature map again;
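One decoder step, up-sampling by 2 and splicing with the encoder feature of the same scale, might look like the sketch below. Two simplifications are assumptions rather than the patented design: nearest-neighbour repetition stands in for the learned capsule deconvolution, and the splice is done along the capsule-channel axis. The helper names are hypothetical.

```python
import numpy as np

def upsample_2x(caps):
    """Nearest-neighbour 2x up-sampling over the spatial axes.

    Assumption: a stand-in for the learned capsule deconvolution in the text.
    """
    return caps.repeat(2, axis=0).repeat(2, axis=1)

def merge_with_skip(decoder_caps, encoder_caps):
    """Up-sample decoder capsules and splice them with same-scale encoder capsules."""
    up = upsample_2x(decoder_caps)
    if up.shape[:2] != encoder_caps.shape[:2]:
        raise ValueError("spatial sizes differ after up-sampling")
    # Assumption: splicing concatenates along the capsule-channel axis.
    return np.concatenate([up, encoder_caps], axis=2)

rng = np.random.default_rng(0)
decoder = rng.standard_normal((4, 4, 64, 16))   # toy feature at the coarser scale
encoder = rng.standard_normal((8, 8, 64, 16))   # toy encoder feature at the finer scale
merged = merge_with_skip(decoder, encoder)      # fed to the 5 capsule layers in the text
```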
after the layer-by-layer up-sampling operation, as shown in FIG. 3, the capsule feature map F is restored to a spatial resolution of 800×800 pixels and is then input to the spatial feature attention module. First, two 1×1 convolution operations convert the input feature map F into 2 one-dimensional capsule feature maps B (H×W×64) and D (H×W×64); then, B and D are reshaped into 2 feature matrices G (N×64) and E (64×N), where N=H×W; next, G and E are multiplied and a softmax operation is applied to construct the spatial attention matrix S (N×N); the input capsule feature map F is then converted into a capsule feature matrix T (N×64×16), and T is multiplied by the spatial attention matrix S; finally, the resulting matrix is reshaped into the output capsule feature map P (H×W×64×16), and two 3×3 capsule convolution operations are applied to output the final capsule feature map and the road traffic sign line information result.
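The matrix shapes in the spatial attention step can be traced with a small NumPy sketch. It follows the symbols F, B, D, G, E, S, T and P from the text, but several details are assumptions: each 1×1 convolution is modelled as a linear projection of every capsule to a scalar, the softmax is taken row-wise, T is flattened to N×(64·16) for the product, weights are random, and the two closing 3×3 capsule convolutions are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(f, w_b, w_d):
    """f: capsule feature map F of shape (H, W, C, D); returns P and S."""
    h, w, c, d = f.shape
    n = h * w
    b = f @ w_b                        # B: (H, W, C), assumed stand-in for a 1x1 convolution
    d_map = f @ w_d                    # D: (H, W, C)
    g = b.reshape(n, c)                # G: N x C
    e = d_map.reshape(n, c).T          # E: C x N
    s = softmax(g @ e, axis=-1)        # S: N x N, row-wise softmax (assumption)
    t = f.reshape(n, c * d)            # T: N x (C*D), flattened capsule matrix
    p = (s @ t).reshape(h, w, c, d)    # P: back to (H, W, C, D)
    return p, s

rng = np.random.default_rng(0)
F = rng.standard_normal((6, 6, 64, 16))        # tiny 6x6 map, so N = 36
P, S = spatial_attention(F, rng.standard_normal(16) * 0.1, rng.standard_normal(16) * 0.1)
```

A tiny map is used on purpose: a dense S for an 800×800 map would have N = 640,000 rows and columns, so this direct form is only meant to make the shapes concrete.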
According to the road traffic sign line detection method based on the channel-space attention capsule network model, the feature extraction is carried out on the input road traffic sign line image to be detected through the contracted path based on the encoder and the decoder, so that the capsule features of the road traffic sign lines are obtained, and the resolution and the scale of the capsule features of the road traffic sign lines are different; carrying out context information enhancement on each road traffic sign line capsule feature through a channel attention mechanism to obtain each enhanced road traffic sign line capsule feature; and deconvoluting and up-sampling the capsule characteristics of each enhanced road traffic sign line through the expansion path, and enhancing the space information of the attention road traffic sign line of the space channel to obtain the detection information of the road traffic sign line. The encoder and decoder structure is adopted, the channel and the spatial attention information are enhanced, the inherent, obvious and high-order characteristic description of the road traffic sign line is fully obtained, and the recognition accuracy of the road traffic sign line is improved.
In one embodiment, as shown in fig. 4, a road traffic sign line detection method based on a channel-space attention capsule network model is provided, which is specifically implemented as follows:
Before the road traffic sign line detection method based on the channel-space attention capsule network model is executed, an encoder-decoder model based on a capsule network and on channel attention and spatial attention mechanisms is constructed. The model comprises a contraction path and an expansion path. The contraction path comprises a first convolution layer (namely two 3×3 convolution layers), a capsule initial layer, and capsule convolution groups with data scales of 1, 1/2, 1/4, 1/8 and 1/16. Each capsule convolution group comprises 5 3×3 capsule convolution layers (the capsule layers in FIG. 4) with the same feature map size, and a channel attention module. Adjacent groups are connected through 2×2 max pooling layers (the max pooling layers in FIG. 4): scale 1 to 1/2, 1/2 to 1/4, 1/4 to 1/8, and 1/8 to 1/16, so the scaling step between groups is 2. The expansion path comprises capsule convolution groups with data scales of 1/8, 1/4, 1/2 and 1; each of these groups includes a capsule deconvolution layer, 5 capsule convolution layers, and a channel attention module.
The 4 capsule convolution groups of the expansion path are upsampled group by group by a factor of 2 through capsule deconvolution; the upsampled features of each group are concatenated with the features of the corresponding scale from the contraction path (encoder) that have undergone the channel attention operation in step S1; each concatenated feature map passes through the 5 capsule convolution layers and is then input to the channel attention module, which readjusts the importance of the channels in the capsule feature map. The capsule feature map with a data scale of 1 is input to the spatial feature attention module to adjust the spatial information of the road traffic sign lines, yielding the enhanced traffic sign line information.
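One expansion-path step (2× upsampling followed by concatenation with the encoder feature of the same scale) can be sketched in NumPy as follows. This is only an illustration under stated assumptions: nearest-neighbour repetition stands in for the learned capsule deconvolution, the concatenation is assumed to run along the capsule-channel axis, and the names `upsample2x` and `decoder_stage` are hypothetical. The channel count is reduced from the 64 of the description to keep the example small.

```python
import numpy as np

def upsample2x(feat):
    """Stand-in for capsule deconvolution: repeat every pixel 2x along H and W."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def decoder_stage(deep, skip):
    """Upsample the deeper feature and concatenate it with the encoder feature.

    deep: (H/2, W/2, C, D) capsule feature map from the coarser scale
    skip: (H, W, C, D) channel-attended encoder feature of the target scale
    Returns a (H, W, 2*C, D) map (concatenation axis is an assumption).
    """
    up = upsample2x(deep)
    return np.concatenate([up, skip], axis=2)

rng = np.random.default_rng(0)
deep = rng.standard_normal((2, 2, 4, 16))   # coarser-scale capsule features
skip = rng.standard_normal((4, 4, 4, 16))   # encoder features at the finer scale
merged = decoder_stage(deep, skip)
```

In the described model the concatenated map would then pass through the 5 capsule convolution layers and the channel attention module before the next upsampling step.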
The image block (800×800 pixels) of the road traffic sign line to be detected is input into a first convolution layer formed by two conventional 3×3 convolutions, whose activation function is ReLU (Rectified Linear Unit), to extract low-order road traffic sign line feature information (for example, 256 dimensions). The extracted low-order features are input into the primary capsule layer and converted into road traffic sign line capsule vectors; the number of capsule channels is set to 64, and each capsule has 16 dimensions. That is, the feature map generated by the primary capsule layer convolution contains 64 channels, and each channel carries 16 feature values, so that a 16-dimensional capsule is formed at every pixel in every channel. Each pixel is therefore represented by 64 16-dimensional capsule features.
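The shape bookkeeping of the primary capsule layer can be sketched as below. The patent does not spell out how the 256-dimensional features are mapped to capsules, so this example assumes a per-pixel linear projection (equivalent to a 1×1 convolution) followed by a reshape; the `squash` nonlinearity is the common choice in capsule networks and is likewise an assumption here, as are the function names and the small 8×8 input.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink each capsule vector so that its length lies in [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def primary_capsules(features, weight, num_channels=64, capsule_dim=16):
    """Convert (H, W, 256) low-order features into (H, W, 64, 16) capsules."""
    h, w, _ = features.shape
    projected = features @ weight                      # (H, W, 64 * 16)
    caps = projected.reshape(h, w, num_channels, capsule_dim)
    return squash(caps)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 256))            # small stand-in for 800x800
weight = rng.standard_normal((256, 64 * 16)) * 0.05    # hypothetical projection weights
caps = primary_capsules(features, weight)
```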
5 capsule convolution groups of different scales are constructed. The groups are downsampled group by group by a factor of 0.5: in groups 2-5, a capsule max pooling operation halves the feature map size. The scales of the 5 groups are therefore {1, 1/2, 1/4, 1/8, 1/16} of the input road traffic sign line image, and the corresponding capsule feature maps measure 800×800, 400×400, 200×200, 100×100 and 50×50 pixels. Each group contains 5 capsule convolution layers with the same feature map size and spatial resolution. The four groups that downsample their input (groups 2-5) each comprise: (1) one 2×2 capsule max pooling layer that downsamples the input feature map; (2) 5 3×3 capsule convolution layers of the same size that perform the convolution operations; (3) one channel attention coupling module (Channel Feature Attention, shown as CFA in fig. 4) that produces the road traffic sign line capsule feature map.
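The feature map sizes listed above follow directly from halving the 800-pixel input once per group. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
from fractions import Fraction

def encoder_feature_sizes(input_size=800, num_groups=5):
    """Return (scale, side length in pixels) for each capsule convolution group."""
    sizes = []
    for group in range(num_groups):
        scale = Fraction(1, 2 ** group)        # 1, 1/2, 1/4, 1/8, 1/16
        sizes.append((scale, int(input_size * scale)))
    return sizes

sizes = encoder_feature_sizes()
for scale, side in sizes:
    print(f"scale {scale}: {side} x {side} pixels")
```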
The output feature map then undergoes the channel attention operation (as shown in fig. 2). A 1×1 convolution is applied to the input 64-channel, 16-dimensional capsule feature map (of size H×W, where H and W denote the height and width of the feature map: 800×800 pixels for the first capsule convolution group, 400×400 pixels for the second, and so on) to convert it into a one-dimensional capsule feature map A, and a global average pooling operation turns this map into a channel descriptor. Two fully connected layers (with ReLU and sigmoid activation functions, respectively) then convert the channel descriptor into a channel attention descriptor C, in which each element encodes information for the corresponding channel of the input capsule feature map. Finally, the channel attention descriptor is used as a channel weight function and multiplied channel-wise with the input capsule feature map, thereby adjusting the importance of the channels in the input capsule feature map. After the channel attention coupling operation, high-order road traffic sign line feature maps at 5 scales (spatial resolutions) are output.
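The channel attention steps above (1×1 convolution, global average pooling, two fully connected layers, channel-wise product) can be sketched in NumPy as follows. The weights are random placeholders, the 1×1 convolution is assumed to collapse each 16-dimensional capsule to a scalar, and the hidden width of the first fully connected layer (16 here) is a hypothetical choice that the description does not specify.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w_caps, w1, w2):
    """feat: (H, W, C, D) capsule map. Returns the reweighted map and descriptor C."""
    a = feat @ w_caps                      # (H, W, C): one-dimensional capsule map A
    z = a.mean(axis=(0, 1))                # (C,): global average pooling -> channel descriptor
    hidden = np.maximum(w1 @ z, 0.0)       # first fully connected layer, ReLU
    c = sigmoid(w2 @ hidden)               # second fully connected layer, sigmoid -> (C,)
    return feat * c[None, None, :, None], c

rng = np.random.default_rng(0)
feat = rng.standard_normal((6, 6, 64, 16))    # small H x W for illustration
w_caps = rng.standard_normal(16)              # hypothetical 1x1 convolution weights
w1 = rng.standard_normal((16, 64)) * 0.1      # hypothetical hidden width of 16
w2 = rng.standard_normal((64, 16)) * 0.1
out, c = channel_attention(feat, w_caps, w1, w2)
```

Because the sigmoid keeps every weight in (0, 1), the module can only attenuate channels relative to one another; it does not change the shape of the capsule feature map.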
Step S2, performing deconvolution up-sampling on the road traffic sign line capsule features through the extended path of the encoder-decoder network model, capturing the road traffic sign line spatial information through a spatial attention module, and obtaining the road traffic sign line information, including:
the expansion path is the decoder part and is likewise formed by 4 capsule convolution groups. The feature maps within each group share the same size and spatial resolution, and the groups correspond to {1, 1/2, 1/4, 1/8} of the input high-resolution image scale. Along the expansion path, going up through the {1/8, 1/4, 1/2, 1} scales, each of the 4 capsule convolution groups includes one capsule deconvolution layer, 5 capsule convolution layers, and a channel attention module. The 4 groups are upsampled group by group by a factor of 2 through capsule deconvolution; the upsampled features of each group are concatenated with the features of the corresponding scale from the contraction path (encoder) that have undergone the channel attention operation in step S1; each concatenated feature map passes through the 5 capsule convolution layers and is then input to the channel attention module, which readjusts the importance of the channels in the capsule feature map;
after the layer-by-layer upsampling operation, as shown in fig. 3, the capsule feature map F is restored to a spatial resolution of 800×800 pixels and is then input to the spatial feature attention module. First, two 1×1 convolution operations convert the input feature map F into 2 one-dimensional capsule feature maps B (H×W×64) and D (H×W×64); then, the capsule feature maps B and D are reshaped into 2 feature matrices G (N×64) and E (64×N), where N = H×W; next, the two feature matrices G and E are multiplied and a softmax operation is applied to construct a spatial attention matrix S (N×N); next, the input capsule feature map F is converted into a capsule feature matrix T (N×64×16), and the capsule feature matrix T is multiplied by the spatial attention matrix S; finally, the matrix obtained from the multiplication is reshaped into an output capsule feature map P (H×W×64×16), and two 3×3 capsule convolution operations are performed to output the final capsule feature map and the road traffic sign line information result.
According to the road traffic sign line detection method based on the channel-space attention capsule network model, features are extracted from the input road traffic sign line image to be detected through the contraction path of the encoder-decoder network, yielding road traffic sign line capsule features that differ in resolution and scale; a channel attention mechanism enhances the context information of each road traffic sign line capsule feature, yielding the enhanced road traffic sign line capsule features; the expansion path then performs deconvolution upsampling on the enhanced capsule features, and the spatial attention module enhances the spatial information of the road traffic sign lines, yielding the road traffic sign line detection information. By adopting an encoder-decoder structure and enhancing channel and spatial attention information, the method obtains an intrinsic, salient, high-order feature description of the road traffic sign lines and improves their recognition accuracy.
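The matrix shapes in the spatial attention module can be traced with a short NumPy sketch. Several details are assumptions rather than statements of the patent: the two 1×1 convolutions are modelled as projections that collapse each 16-dimensional capsule to a scalar, the softmax is taken row-wise, the product is formed as S·T, and the names are hypothetical. The example uses a 4×5 map with 8 channels, because at the full 800×800 resolution the N×N matrix would have 640,000 rows and columns.

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(feat, w_b, w_d):
    """feat: capsule map F of shape (H, W, C, D). Returns P (same shape) and S (N x N)."""
    h, w, c, d = feat.shape
    n = h * w
    b = feat @ w_b                        # (H, W, C): one-dimensional capsule map B
    dm = feat @ w_d                       # (H, W, C): one-dimensional capsule map D
    g = b.reshape(n, c)                   # G: (N, C)
    e = dm.reshape(n, c).T                # E: (C, N)
    s = softmax(g @ e, axis=-1)           # S: (N, N) spatial attention matrix
    t = feat.reshape(n, c * d)            # T: (N, C, D) flattened to (N, C*D)
    p = (s @ t).reshape(h, w, c, d)       # P: (H, W, C, D)
    return p, s

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 5, 8, 16))
w_b = rng.standard_normal(16) * 0.1       # hypothetical 1x1 convolution weights
w_d = rng.standard_normal(16) * 0.1
p, s = spatial_attention(feat, w_b, w_d)
```

Each row of S sums to 1, so every output pixel is a weighted average of the capsules at all pixel positions.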
With the road traffic sign line detection method based on the channel-space attention capsule network model, the capsule network is applied to road traffic sign line detection, and capsule vectors are used to describe the geometry, position, intrinsic attributes and other properties of the road traffic sign lines. The encoder-decoder structure, combined with the channel attention and spatial attention modules, fuses capsule features of different levels and scales and strengthens the context information of the road traffic sign lines. This improves the expression of their semantic information, achieves accurate and effective road traffic sign line detection, and helps safeguard road safety.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and variations should also be regarded as being within the scope of the application.

Claims (3)

1. A method for detecting road traffic sign lines based on an attention capsule network model, the method comprising:
extracting features of an input road traffic marking line image to be detected through a contracted path based on an encoding-decoding structure network model to obtain the capsule features of the road traffic marking line, wherein the encoding-decoding structure network is coupled with a channel attention module to enhance the context information of the road traffic marking lines, and the resolution and the scale of each road traffic marking line are different;
deconvolution up-sampling is carried out on the capsule characteristics of each road traffic sign line through the expansion path of the encoding-decoding structure network model, and the space information of the road traffic sign lines is captured through a space attention module to obtain the information of the road traffic sign lines;
the method comprises the steps of extracting characteristics of an input road traffic marking line image to be detected through a contracted path based on an encoding-decoding structure network model, obtaining the capsule characteristics of the road traffic marking line, coupling a channel attention module, enhancing the context information of the road traffic marking line, wherein the resolution and the scale of each road traffic marking line are different, and specifically comprises the following steps:
inputting the road traffic sign line image to be detected into a first convolution layer for feature extraction to obtain low-order road traffic sign line feature information;
inputting the low-order road traffic sign line characteristic information into a capsule initial layer for characteristic conversion to obtain a road traffic sign line capsule vector;
carrying out capsule convolution on the road traffic sign line capsule vectors through N capsule convolution groups with different scales to obtain the road traffic sign line capsule characteristics output by each capsule convolution group;
the N capsule convolution groups with different scales carry out capsule convolution on the road traffic sign line capsule vectors to obtain the road traffic sign line capsule characteristics output by each capsule convolution group, and the method comprises the following steps:
the scaling step of the N capsule convolution groups with different scales is 2, and the capsule convolution groups are connected through max pooling layers with a 2×2 kernel, so that the spatial resolution of the road traffic sign line feature map is gradually reduced from bottom to top;
each capsule convolution group comprises 5 capsule convolution layers with 3×3 convolution kernels and the same feature size, and a channel attention module;
inputting the road traffic sign line capsule features into the channel attention module to acquire the context information of the traffic sign lines and enhance the road traffic sign line features;
deconvolution up-sampling of the road traffic sign line capsule features by the extended path of the encoding-decoding structural network model, capturing road traffic sign line spatial information by a spatial attention module, and obtaining the road traffic sign line information, comprising:
up-sampling the reinforced road traffic sign line capsule characteristics output by the N layer, and then splicing the reinforced road traffic sign line capsule characteristics output by the previous layer to obtain a spliced road traffic sign line capsule characteristic diagram;
the feature obtained after the capsule convolution operation is carried out on the spliced capsule feature map is input to a channel attention module, and the road traffic sign line capsule feature is output and used as the current enhanced road traffic sign line capsule feature to be spliced;
returning to the step of performing up-sampling on the current reinforced road traffic sign line capsule characteristics to be spliced, and then splicing with the reinforced road traffic sign line capsule characteristics output by the previous layer to obtain a spliced road traffic sign line capsule characteristic diagram until returning to the step of performing splicing on the reinforced road traffic sign line capsule characteristics output by the first layer to obtain a spliced capsule characteristic diagram;
convolving the spliced capsule feature map, outputting the capsule feature of the road traffic sign line, and inputting the capsule feature of the road traffic sign line to a space attention module to obtain the enhancement information of the road traffic sign line;
performing convolution operation on the spliced capsule feature map, outputting the capsule feature of the road traffic sign line, inputting the capsule feature to a spatial attention module, and obtaining the enhancement information of the road traffic sign line, wherein the convolution operation comprises the following steps:
inputting the road traffic sign line capsule feature map into capsule convolution layers with 1×1 convolution kernels to convert it into 2 one-dimensional capsule feature maps;
the first road traffic sign line feature map is converted into a feature matrix, the matrix transversely represents the image size of the road traffic sign line, and the matrix longitudinally represents the feature number;
the second road traffic sign line feature map is converted into a feature matrix, the matrix transversely represents feature numbers, and the matrix longitudinally represents the image size of the road traffic sign line;
multiplying the two feature matrixes to construct a spatial attention matrix, wherein the transverse and longitudinal directions of the matrix represent the image size of the road traffic sign line;
and converting the road traffic sign line capsule feature map into a road traffic sign line capsule feature matrix, multiplying the capsule feature matrix by the spatial attention matrix, transforming the resulting matrix, and performing two capsule convolution operations with 3×3 convolution kernels to obtain the road traffic sign line capsule feature map.
2. The method of claim 1, wherein enhancing the road traffic sign line capsule features through the channel attention module to obtain the enhanced road traffic sign line capsule features comprises:
inputting the road traffic sign line capsule feature map into a capsule convolution layer with a 1×1 convolution kernel to convert it into a one-dimensional capsule feature map, and constructing a channel descriptor through a global average pooling operation;
converting the channel descriptor into a channel attention descriptor, wherein the information coded by each element in the descriptor corresponds to the channel in the input road traffic sign line capsule feature map;
and carrying out channel-level product processing on the channel attention descriptor serving as a channel weight function and the input road traffic sign line capsule feature map, and adjusting the road traffic sign line capsule feature.
3. A road traffic sign line detection system based on an attention capsule network model, the system comprising the following modules:
the contraction path module is used for extracting characteristics of the input road traffic marking line image to be detected through a contraction path based on the encoding-decoding structure to obtain the capsule characteristics of each road traffic marking line, and the resolution and the scale of the capsule characteristics of each road traffic marking line are different, specifically:
inputting the road traffic sign line image to be detected into a first convolution layer for feature extraction to obtain low-order road traffic sign line feature information;
inputting the low-order road traffic sign line characteristic information into a capsule initial layer for characteristic conversion to obtain a road traffic sign line capsule vector;
carrying out capsule convolution on the road traffic sign line capsule vectors through N capsule convolution groups with different scales to obtain the road traffic sign line capsule characteristics output by each capsule convolution group;
the scaling step of the N capsule convolution groups with different scales is 2, and the capsule convolution groups are connected through max pooling layers with a 2×2 kernel, so that the spatial resolution of the road traffic sign line feature map is gradually reduced from bottom to top;
each capsule convolution group comprises 5 capsule convolution layers with 3×3 convolution kernels and the same feature size, and a channel attention module;
inputting the road traffic sign line capsule features into the channel attention module to acquire the context information of the traffic sign lines and enhance the road traffic sign line features;
the channel attention context information enhancement module is used for enhancing the context information by carrying out channel feature importance analysis based on the road traffic line capsule features to obtain the capsule features of each enhanced traffic sign line;
the extended path module is used for performing deconvolution up-sampling on the capsule characteristics of each enhanced road traffic sign line through an extended path based on the encoding-decoding structure, and splicing the capsule characteristics of each road traffic sign line of the contracted path to obtain road sign line detection information, and specifically comprises the following steps:
up-sampling the reinforced road traffic sign line capsule characteristics output by the N layer, and then splicing the reinforced road traffic sign line capsule characteristics output by the previous layer to obtain a spliced road traffic sign line capsule characteristic diagram;
the feature obtained after the capsule convolution operation is carried out on the spliced capsule feature map is input to a channel attention module, and the road traffic sign line capsule feature is output and used as the current enhanced road traffic sign line capsule feature to be spliced;
returning to the step of performing up-sampling on the current reinforced road traffic sign line capsule characteristics to be spliced, and then splicing with the reinforced road traffic sign line capsule characteristics output by the previous layer to obtain a spliced road traffic sign line capsule characteristic diagram until returning to the step of performing splicing on the reinforced road traffic sign line capsule characteristics output by the first layer to obtain a spliced capsule characteristic diagram;
convolving the spliced capsule feature map, outputting the capsule feature of the road traffic sign line, and inputting the capsule feature of the road traffic sign line to a space attention module to obtain the enhancement information of the road traffic sign line;
the space attention module is used for obtaining the road traffic sign line space information based on the road traffic line capsule characteristics so as to obtain the road traffic sign line information enhancement, and specifically comprises the following steps:
inputting the road traffic sign line capsule feature map into capsule convolution layers with 1×1 convolution kernels to convert it into 2 one-dimensional capsule feature maps;
the first road traffic sign line feature map is converted into a feature matrix, the matrix transversely represents the image size of the road traffic sign line, and the matrix longitudinally represents the feature number;
the second road traffic sign line feature map is converted into a feature matrix, the matrix transversely represents feature numbers, and the matrix longitudinally represents the image size of the road traffic sign line;
multiplying the two feature matrixes to construct a spatial attention matrix, wherein the transverse and longitudinal directions of the matrix represent the image size of the road traffic sign line;
and converting the road traffic sign line capsule feature map into a road traffic sign line capsule feature matrix, multiplying the capsule feature matrix by the spatial attention matrix, transforming the resulting matrix, and performing two capsule convolution operations with 3×3 convolution kernels to obtain the road traffic sign line capsule feature map.
CN202110331964.4A 2021-03-29 2021-03-29 Road traffic sign line detection method and system based on attention capsule network model Active CN113011360B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110331964.4A CN113011360B (en) 2021-03-29 2021-03-29 Road traffic sign line detection method and system based on attention capsule network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110331964.4A CN113011360B (en) 2021-03-29 2021-03-29 Road traffic sign line detection method and system based on attention capsule network model

Publications (2)

Publication Number Publication Date
CN113011360A CN113011360A (en) 2021-06-22
CN113011360B true CN113011360B (en) 2023-11-24

Family

ID=76408512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110331964.4A Active CN113011360B (en) 2021-03-29 2021-03-29 Road traffic sign line detection method and system based on attention capsule network model

Country Status (1)

Country Link
CN (1) CN113011360B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN111428556A (en) * 2020-02-17 2020-07-17 浙江树人学院(浙江树人大学) Traffic sign recognition method based on capsule neural network
CN111582201A (en) * 2020-05-12 2020-08-25 重庆理工大学 Lane line detection system based on geometric attention perception
CN111861925A (en) * 2020-07-24 2020-10-30 南京信息工程大学滨江学院 Image rain removing method based on attention mechanism and gate control circulation unit
CN112184687A (en) * 2020-10-10 2021-01-05 南京信息工程大学 Road crack detection method based on capsule characteristic pyramid and storage medium
CN112241728A (en) * 2020-10-30 2021-01-19 中国科学院合肥物质科学研究院 Real-time lane line detection method and system for learning context information by adopting attention mechanism
CN112418027A (en) * 2020-11-11 2021-02-26 青岛科技大学 Remote sensing image road extraction method for improving U-Net network

Also Published As

Publication number Publication date
CN113011360A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
CN113850825B (en) Remote sensing image road segmentation method based on context information and multi-scale feature fusion
Yu et al. A real-time detection approach for bridge cracks based on YOLOv4-FPM
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
CN105354568A (en) Convolutional neural network based vehicle logo identification method
CN108876805B (en) End-to-end unsupervised scene passable area cognition and understanding method
CN114187450A (en) Remote sensing image semantic segmentation method based on deep learning
CN113095152B (en) Regression-based lane line detection method and system
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN116343053B (en) Automatic solid waste extraction method based on fusion of optical remote sensing image and SAR remote sensing image
CN116453121B (en) Training method and device for lane line recognition model
CN113052106A (en) Airplane take-off and landing runway identification method based on PSPNet network
CN115497002A (en) Multi-scale feature fusion laser radar remote sensing classification method
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
Zuo et al. A remote sensing image semantic segmentation method by combining deformable convolution with conditional random fields
Wang et al. Global perception-based robust parking space detection using a low-cost camera
CN115049945A (en) Method and device for extracting lodging area of wheat based on unmanned aerial vehicle image
CN112668662B (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN117274388B (en) Unsupervised three-dimensional visual positioning method and system based on visual text relation alignment
CN113011360B (en) Road traffic sign line detection method and system based on attention capsule network model
CN113033411A (en) Ground semantic cognition method based on segmentation and attention mechanism
CN117037119A (en) Road target detection method and system based on improved YOLOv8
CN115965783A (en) Unstructured road segmentation method based on point cloud and image feature fusion
Zhang et al. Infrastructure 3D Target detection based on multi-mode fusion for intelligent and connected vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 214000 building 50-9, South Shanhe Road, East Xianfeng Road, anzhen street, Xishan District, Wuxi City, Jiangsu Province

Patentee after: Wuxi Simate Intelligent Technology Co.,Ltd.

Country or region after: China

Address before: 214000 building 50-9, South Shanhe Road, East Xianfeng Road, anzhen street, Xishan District, Wuxi City, Jiangsu Province

Patentee before: Jiangsu Simate Technology Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20240417

Address after: Room 901-1, 9 / F, 168 Lushan Road, Jianye District, Nanjing, Jiangsu 210000

Patentee after: JIANGSU KEBO SPACE INFORMATION TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 214000 building 50-9, South Shanhe Road, East Xianfeng Road, anzhen street, Xishan District, Wuxi City, Jiangsu Province

Patentee before: Wuxi Simate Intelligent Technology Co.,Ltd.

Country or region before: China