CN115115915A - Zebra crossing detection method and system based on intelligent intersection - Google Patents


Info

Publication number
CN115115915A
CN115115915A (application CN202210652919.3A)
Authority
CN
China
Prior art keywords
zebra crossing
feature map
feature
image data
attention
Prior art date
Legal status
Pending
Application number
CN202210652919.3A
Other languages
Chinese (zh)
Inventor
闫军
丁丽珠
王艳清
Current Assignee
Smart Intercommunication Technology Co ltd
Original Assignee
Smart Intercommunication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Smart Intercommunication Technology Co ltd filed Critical Smart Intercommunication Technology Co ltd
Priority to CN202210652919.3A
Publication of CN115115915A

Classifications

    • G06V10/806 — Fusion of extracted features, i.e. combining data at the sensor, preprocessing, feature extraction or classification level
    • G06V10/26 — Segmentation of patterns in the image field; detection of occlusion
    • G06V10/764 — Image or video recognition using classification, e.g. of video objects
    • G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82 — Image or video recognition using neural networks
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V20/54 — Surveillance or monitoring of traffic, e.g. cars on the road, trains or boats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a zebra crossing detection method and system based on an intelligent intersection. The method comprises: acquiring a zebra crossing image data set; inputting a plurality of zebra crossing images containing real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction, and outputting a plurality of first zebra crossing feature maps; inputting the first zebra crossing feature maps in parallel into a spatial attention network and a channel attention network of the zebra crossing detection model for feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps; inputting the second zebra crossing feature maps into an upsampling network of the zebra crossing detection model for image size recovery, and outputting a predicted pixel class for each pixel; and constructing a loss function from the real and predicted pixel classes, training and optimizing the zebra crossing detection model according to the loss function to obtain a trained model, and using it to detect the position of the zebra crossing in a zebra crossing image to be detected.

Description

Zebra crossing detection method and system based on intelligent intersection
Technical Field
The application relates to the technical field of intelligent urban traffic management, and in particular to a zebra crossing detection method and system based on an intelligent intersection.
Background
In recent years, with accelerating urbanization, the number of motor vehicles has kept growing, bringing problems such as urban traffic congestion and parking conflicts. Raising the level of intelligence in urban traffic management is therefore a pressing social need, and doing so can effectively improve the current state of traffic management.
An intelligent urban traffic management system relies on artificial intelligence algorithms, cloud service platforms, intelligent hardware and edge computing devices to collect, process and feed back traffic information accurately and in real time. Such a system covers evidence capture and warning for traffic violations such as running red lights and speeding, guidance and space recording for roadside parking, and real-time updating, prediction and publication of traffic congestion conditions. Intelligent traffic management can effectively improve urban road conditions, provide traffic authorities with more evidence of vehicle violations and irregular driving, and enable smarter application and management of the overall traffic flow, with positive effects on urban traffic management, driving safety and related areas.
Zebra crossing detection is an important link in realizing intelligent traffic management. At traffic intersections with heavy traffic flow, efficient and accurate zebra crossing detection is significant for improving traffic safety. However, conventional zebra crossing detection methods output the zebra crossing as its minimum enclosing rectangle. Under certain viewing angles or a fisheye lens the crossing appears elongated or deformed, and crossings may also be worn; in these cases rectangular-box detection performs poorly, so detection accuracy is low, which is not conducive to intelligent urban management.
Summary of the Application
The application aims to solve the technical problem that conventional zebra crossing detection methods have low accuracy. To this end, the application provides a zebra crossing detection method and system based on an intelligent intersection.
The application provides a zebra crossing detection method based on an intelligent crossing, which comprises the following steps:
acquiring a zebra crossing image data set, wherein the zebra crossing image data set comprises a plurality of zebra crossing images containing real pixel class labels;
inputting the zebra crossing images containing the real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction, and outputting a plurality of first zebra crossing feature maps;
inputting the first zebra crossing feature maps in parallel into a spatial attention network and a channel attention network of the zebra crossing detection model respectively for feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps;
inputting the second zebra crossing feature maps into an up-sampling network of the zebra crossing detection model for image size recovery, and outputting a predicted pixel category corresponding to each pixel;
constructing a loss function according to the real pixel category and the predicted pixel category, and performing model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model;
and detecting the position of the zebra crossing in the zebra crossing image to be detected according to the trained zebra crossing detection model.
In one embodiment, the acquiring a zebra crossing image data set, wherein the zebra crossing image data set comprises a plurality of zebra crossing images containing real pixel class labels, includes:
acquiring a first image data set containing zebra stripes, and performing pixel-by-pixel semantic segmentation and labeling on the first image data set to obtain a first zebra stripe image data set;
acquiring a second image data set containing zebra stripes, wherein the second image data set and the first image data set are image data sets acquired under different scenes;
performing model training on a cycle-consistent generative adversarial network (CycleGAN) according to the first image data set and the second image data set to obtain a trained CycleGAN;
inputting the first zebra crossing image data set into the trained CycleGAN, and outputting a second zebra crossing image data set simulating the different scenes;
and obtaining the zebra crossing image data set according to the first zebra crossing image data set and the second zebra crossing image data set.
In one embodiment, the inputting the plurality of zebra crossing images containing the real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction and outputting a plurality of first zebra crossing feature maps includes:
according to the pooling layer of the feature extraction network, feature extraction is carried out on the zebra crossing images containing the real pixel class labels to obtain a first feature map set;
according to the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer of the feature extraction network, feature extraction is carried out on the first feature map set to obtain a plurality of first zebra crossing feature maps;
wherein the ratio of the numbers of output channels of the first, second, third and fourth full convolution layers is 1:2:4:8.
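As a hedged illustration of the channel ratio above, the NumPy toy sketch below stands in for the four full convolution layers with random 1x1 channel-mixing convolutions plus 2x2 average pooling. Only the 1:2:4:8 ratio comes from the text; the base width of 64, the initial stride-2 pooling, and the per-stage pooling are illustrative assumptions.

```python
import numpy as np

# Illustrative base width; only the 1:2:4:8 ratio is given in the text.
BASE = 64
stage_channels = [BASE * r for r in (1, 2, 4, 8)]   # [64, 128, 256, 512]

def conv_stage(x, c_out, rng):
    """Toy stand-in for one full convolution stage: a random 1x1
    channel-mixing convolution followed by 2x2 average pooling."""
    c_in, h, w = x.shape
    w_mat = rng.standard_normal((c_out, c_in)) / np.sqrt(c_in)
    y = np.einsum('oc,chw->ohw', w_mat, x)           # mix channels
    # 2x2 average pooling (stride 2) halves each spatial dimension.
    return y[:, : h // 2 * 2, : w // 2 * 2].reshape(
        c_out, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 256, 256))               # 3-channel input image
x = x[:, ::2, ::2]                                   # initial pooling layer
shapes = []
for c in stage_channels:
    x = conv_stage(x, c, rng)
    shapes.append(x.shape)
print(shapes)   # channel count doubles while spatial size halves per stage
```

The printed shapes show the intended trade: each stage doubles the channel count while halving the spatial resolution.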
In one embodiment, the inputting the plurality of first zebra crossing feature maps into the spatial attention network and the channel attention network of the zebra crossing detection model respectively in parallel to perform feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps includes:
inputting each first zebra crossing feature map into a transverse-dimension spatial attention network, and performing average pooling and longitudinal replication-based expansion on the first zebra crossing feature map to obtain a transverse attention parameter feature map;
inputting each first zebra crossing feature map into a longitudinal-dimension spatial attention network, and performing average pooling and transverse replication-based expansion on the first zebra crossing feature map to obtain a longitudinal attention parameter feature map;
adding the transverse attention parameter feature map and the longitudinal attention parameter feature map pixel by pixel to obtain a spatial attention parameter feature map;
inputting the spatial attention parameter feature map into a normalization layer, and normalizing the spatial attention parameter to obtain a normalized spatial attention parameter;
and multiplying the normalized spatial attention parameter and the first zebra crossing feature map pixel by pixel to obtain a spatial attention weighted feature map.
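The spatial attention steps above can be sketched in NumPy as follows. The sigmoid used for the normalization layer is an assumption, since the text does not name the exact normalization; everything else mirrors the pool-replicate-add-multiply sequence described.

```python
import numpy as np

def spatial_attention(fmap):
    """Sketch of the two-branch spatial attention for one (C, H, W)
    first zebra crossing feature map."""
    c, h, w = fmap.shape
    # Transverse branch: average-pool down each column, then replicate
    # the pooled row longitudinally back to height H.
    transverse = np.repeat(fmap.mean(axis=1, keepdims=True), h, axis=1)
    # Longitudinal branch: average-pool along each row, then replicate
    # the pooled column transversely back to width W.
    longitudinal = np.repeat(fmap.mean(axis=2, keepdims=True), w, axis=2)
    attn = transverse + longitudinal        # pixel-wise addition
    attn = 1.0 / (1.0 + np.exp(-attn))      # normalization layer (assumed sigmoid)
    return attn * fmap                      # pixel-wise re-weighting
```

Row and column averages suit the elongated stripe pattern of a zebra crossing: whole rows or columns that contain crossing pixels are boosted together.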
In one embodiment, the inputting, in parallel, the plurality of first zebra crossing feature maps into the spatial attention network and the channel attention network of the zebra crossing detection model respectively to perform feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps further includes:
inputting each first zebra crossing feature map into two identical convolution layers respectively, and outputting a first feature map and a second feature map;
respectively performing feature map reshaping on the first feature map and the second feature map to obtain a first reshaped feature map and a second reshaped feature map;
performing feature transposition on the second reshaped feature map to obtain a second transposed feature map;
performing matrix multiplication on the first reshaped feature map and the second transposed feature map to obtain a channel feature map, and performing feature map parameter normalization on the channel feature map to obtain a channel-dimension attention feature map;
performing feature map reshaping on the first zebra crossing feature map to obtain a first zebra crossing reshaped feature map;
performing matrix multiplication on the channel-dimension attention feature map and the first zebra crossing reshaped feature map to obtain a re-weighted feature map, and performing feature map reshaping on the re-weighted feature map to obtain a channel attention weighted feature map;
and performing pixel-by-pixel addition on the spatial attention weighted feature map and the channel attention weighted feature map corresponding to each first zebra crossing feature map to obtain a plurality of second zebra crossing feature maps.
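The channel attention steps can be sketched similarly. The random 1x1 convolutions standing in for the two identical convolution layers and the softmax used for feature map parameter normalization are assumptions; the reshape-transpose-matmul sequence follows the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(fmap, rng):
    """Sketch of the channel attention branch for one (C, H, W) map."""
    c, h, w = fmap.shape
    # Two 'identical convolution layers' stood in for by random 1x1 convs.
    w1 = rng.standard_normal((c, c)) / np.sqrt(c)
    w2 = rng.standard_normal((c, c)) / np.sqrt(c)
    f1 = np.einsum('oc,chw->ohw', w1, fmap)   # first feature map
    f2 = np.einsum('oc,chw->ohw', w2, fmap)   # second feature map
    r1 = f1.reshape(c, h * w)                 # first reshaped feature map
    r2t = f2.reshape(c, h * w).T              # second reshaped map, transposed
    chan = r1 @ r2t                           # (C, C) channel feature map
    attn = softmax(chan, axis=-1)             # channel-dimension attention map
    rew = attn @ fmap.reshape(c, h * w)       # matrix-multiply re-weighting
    return rew.reshape(c, h, w)               # channel attention weighted map

rng = np.random.default_rng(0)
y = channel_attention(rng.standard_normal((16, 8, 8)), rng)
print(y.shape)   # shape is preserved: (16, 8, 8)
```

The (C, C) matrix captures the inter-channel dependencies the text refers to: each output channel becomes a weighted mixture of all input channels.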
In one embodiment, the inputting the plurality of second zebra crossing feature maps into an upsampling network of the zebra crossing detection model for image size recovery and outputting a predicted pixel class corresponding to each pixel includes:
inputting each second zebra crossing feature map into 5 deconvolution layers of the upsampling network to restore the image size to that of the original zebra crossing image, obtaining a plurality of second zebra crossing feature maps at the original image size;
and obtaining a prediction pixel class corresponding to each pixel according to the second zebra crossing feature map of each original image size.
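A minimal sketch of the size recovery: with 5 stride-2 upsampling stages the network undoes a 32x spatial reduction. Nearest-neighbour repetition stands in here for the learned deconvolution (transposed convolution) layers, which is an assumption made purely to show the size arithmetic.

```python
import numpy as np

def upsample2x(fmap):
    """Nearest-neighbour stand-in for one stride-2 deconvolution layer:
    doubles both spatial dimensions of a (C, H, W) map."""
    return fmap.repeat(2, axis=1).repeat(2, axis=2)

# Five stride-2 stages undo a 32x reduction: (C, H/32, W/32) -> (C, H, W).
fmap = np.zeros((2, 8, 8))    # e.g. 2 class channels at 1/32 resolution
for _ in range(5):
    fmap = upsample2x(fmap)
print(fmap.shape)             # (2, 256, 256)
```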
In one embodiment, the present application provides a zebra crossing detection system based on intelligent intersections, comprising:
the image data acquisition module is used for acquiring a zebra crossing image data set, wherein the zebra crossing image data set comprises a plurality of zebra crossing images containing real pixel class labels;
the feature extraction network module is used for inputting the zebra crossing images containing the real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction and outputting a plurality of first zebra crossing feature maps;
the attention network module is used for inputting the first zebra crossing feature maps in parallel into a spatial attention network and a channel attention network of the zebra crossing detection model respectively for feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps;
the up-sampling module is used for inputting the second zebra crossing feature maps into an up-sampling network of the zebra crossing detection model for image size recovery and outputting a prediction pixel category corresponding to each pixel;
the model training module is used for constructing a loss function according to the real pixel class and the predicted pixel class, and performing model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model;
and the detection module is used for detecting the position of the zebra crossing in the zebra crossing image to be detected according to the trained zebra crossing detection model.
In one embodiment, the image data acquisition module comprises:
the device comprises a first zebra crossing image data set acquisition module, a second zebra crossing image data set acquisition module and a zebra crossing image data set generation module, wherein the first zebra crossing image data set acquisition module is used for acquiring a first image data set containing zebra crossings, and performing pixel-by-pixel semantic segmentation and labeling on the first image data set to acquire a first zebra crossing image data set;
the second image data set acquisition module is used for acquiring a second image data set containing zebra crossings, wherein the second image data set and the first image data set are image data sets acquired under different scenes;
the CycleGAN module is used for performing model training on a cycle-consistent generative adversarial network (CycleGAN) according to the first image data set and the second image data set to obtain a trained CycleGAN;
the second zebra crossing image data set acquisition module is used for inputting the first zebra crossing image data set into the trained CycleGAN and outputting a second zebra crossing image data set simulating the different scenes;
and the zebra crossing image data set acquisition module is used for acquiring the zebra crossing image data set according to the first zebra crossing image data set and the second zebra crossing image data set.
In one embodiment, the feature extraction network module comprises:
the pooling layer module is used for performing feature extraction on the zebra crossing images containing the real pixel class labels according to a pooling layer of the feature extraction network to obtain a first feature map set;
the full convolution layer module is used for performing feature extraction on the first feature map set according to a first full convolution layer, a second full convolution layer, a third full convolution layer and a fourth full convolution layer of the feature extraction network to obtain a plurality of first zebra crossing feature maps;
wherein the ratio of the numbers of output channels of the first, second, third and fourth full convolution layers is 1:2:4:8.
In one embodiment, the attention network module comprises:
the transverse attention parameter feature map acquisition module is used for inputting each first zebra crossing feature map into a transverse-dimension spatial attention network, and performing average pooling and longitudinal replication-based expansion on the first zebra crossing feature map to obtain a transverse attention parameter feature map;
the longitudinal attention parameter feature map acquisition module is used for inputting each first zebra crossing feature map into a longitudinal-dimension spatial attention network, and performing average pooling and transverse replication-based expansion on the first zebra crossing feature map to obtain a longitudinal attention parameter feature map;
the spatial attention parameter characteristic map acquisition module is used for performing pixel-by-pixel addition on the transverse attention parameter characteristic map and the longitudinal attention parameter characteristic map to obtain a spatial attention parameter characteristic map;
the normalized spatial attention parameter acquisition module is used for inputting the spatial attention parameter characteristic diagram into a normalization layer, and normalizing the spatial attention parameter to obtain a normalized spatial attention parameter;
and the spatial attention weighted feature map acquisition module is used for multiplying the normalized spatial attention parameter and the first zebra crossing feature map pixel by pixel to acquire a spatial attention weighted feature map.
In one embodiment, the attention network module further comprises:
the convolution module is used for respectively inputting each first zebra crossing feature map into two identical convolution layers and outputting a first feature map and a second feature map;
the first feature reshaping module is used for respectively reshaping the first feature map and the second feature map to obtain a first reshaped feature map and a second reshaped feature map;
the feature transposing module is used for performing feature transposing on the second reshaped feature map to obtain a second transposed feature map;
a channel dimension attention feature map obtaining module, configured to perform matrix multiplication on the first reshaped feature map and the second transposed feature map to obtain a channel feature map, and perform feature map parameter normalization on the channel feature map to obtain a channel dimension attention feature map;
the second feature reshaping module is used for performing feature map reshaping on the first zebra crossing feature map to obtain a first zebra crossing reshaped feature map;
the channel attention weighted feature map acquisition module is used for performing matrix multiplication on the channel-dimension attention feature map and the first zebra crossing reshaped feature map to obtain a re-weighted feature map, and performing feature map reshaping on the re-weighted feature map to obtain a channel attention weighted feature map;
a second zebra crossing feature map obtaining module, configured to perform pixel-by-pixel addition on the spatial attention weighted feature map and the channel attention weighted feature map corresponding to each first zebra crossing feature map, so as to obtain multiple second zebra crossing feature maps.
In one embodiment, the upsampling module comprises:
the image size recovery module is used for inputting each second zebra crossing feature map into 5 deconvolution layers of the upsampling network to restore the image size to that of the original zebra crossing image, obtaining a plurality of second zebra crossing feature maps at the original image size;
and the pixel category acquisition module is used for acquiring the predicted pixel category corresponding to each pixel according to the second zebra crossing feature map of each original image size.
In the zebra crossing detection method and system based on the intelligent intersection, the acquired zebra crossing image data set serves as the training set and is fed in turn through the feature extraction network, spatial attention network, channel attention network and upsampling network of the zebra crossing detection model; a loss function is constructed from the real and predicted pixel classes to train and optimize the model, and the trained model is used to detect the position of the zebra crossing in a zebra crossing image to be detected. By adding a dual attention mechanism, the method strengthens spatial feature extraction, making effective use of the elongated, strip-like spatial character of zebra crossing data, and strengthens feature extraction in the channel dimension. This improves the robustness of zebra crossing detection, extracts the spatial and inter-channel characteristics of the zebra crossing accurately, and raises the detection accuracy of the model, thereby improving the detection effect and supporting intelligent urban management.
Drawings
Fig. 1 is a schematic flow chart illustrating steps of a zebra crossing detection method based on an intelligent intersection provided by the present application.
Fig. 2 is a schematic structural diagram of a zebra crossing detection system based on an intelligent intersection provided in the present application.
Detailed Description
The technical solution of the present application is further described in detail by the accompanying drawings and examples.
Referring to fig. 1, the present application provides a zebra crossing detection method based on an intelligent intersection, including:
S10, acquiring a zebra crossing image data set, wherein the zebra crossing image data set comprises a plurality of zebra crossing images containing real pixel class labels;
S20, inputting the zebra crossing images containing the real pixel class labels into a feature extraction network of the zebra crossing detection model for feature extraction, and outputting a plurality of first zebra crossing feature maps;
S30, inputting the first zebra crossing feature maps in parallel into a spatial attention network and a channel attention network of the zebra crossing detection model respectively for feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps;
S40, inputting the second zebra crossing feature maps into an upsampling network of the zebra crossing detection model for image size recovery, and outputting a predicted pixel class corresponding to each pixel;
S50, constructing a loss function according to the real pixel class and the predicted pixel class, and performing model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model;
and S60, detecting the position of the zebra crossing in the zebra crossing image to be detected according to the trained zebra crossing detection model.
In this embodiment, in S10, the zebra crossing image data set is acquired by traffic intersection monitoring cameras. The cameras cover multiple viewing angles, so the captured image data includes zebra crossing images seen transversely, longitudinally, obliquely and from other directions. A polygon labeling tool is used to perform pixel-by-pixel semantic segmentation and labeling of the zebra crossings in the captured images, yielding a plurality of zebra crossing images with real pixel class labels that form the zebra crossing image data set. The pixel class label takes one of two values, background or zebra crossing, and the position of the zebra crossing can be obtained from the classes of the pixels in the image.
In S20, the feature extraction network, the spatial attention network, the channel attention network, and the upsampling network together constitute the zebra crossing detection model. The feature extraction network adjusts the size of the feature maps of the zebra crossing images and enhances feature extraction by expanding the number of feature channels, thereby extracting features from each zebra crossing image so that the specific position of the zebra crossing can be determined.
In S30, the spatial attention network is connected in parallel with the channel attention network to form the attention mechanism network. The spatial attention network strengthens the extraction of the zebra crossing's spatial features, while the channel attention network strengthens the extraction of semantic features by capturing the interdependencies among channels. Operating the two networks in parallel achieves feature extraction in both the spatial and channel dimensions, improves the detection accuracy of the zebra crossing detection model, and allows the position of the zebra crossing to be determined accurately.
In S40, the upsampling network restores the image size: the second zebra crossing feature maps produced by the spatial and channel attention networks are upsampled back to the size of the image originally input to the feature extraction network, and a predicted pixel class is output for each pixel at the restored size. The zebra crossing detection model thus consists of the feature extraction network, the spatial attention network, the channel attention network and the upsampling network; a plurality of zebra crossing images containing real pixel class labels are input into it, and a predicted pixel class is output for each pixel.
In S50, a loss function of the zebra crossing detection model is constructed by comparing the predicted pixel class corresponding to each pixel predicted by the zebra crossing detection model with the real pixel class, and the model is trained and optimized to form a trained zebra crossing detection model.
In S60, the zebra crossing image to be detected may be an image containing a zebra crossing acquired by a traffic intersection monitoring camera. The image is input into the trained zebra crossing detection model, which outputs the pixel class corresponding to each pixel, indicating whether that pixel belongs to the zebra crossing class. The set of pixels assigned to the zebra crossing class then delineates the specific position of the zebra crossing. In this way, the position of the zebra crossing in the image to be detected is detected according to the trained zebra crossing detection model.
According to the zebra crossing detection method based on the intelligent intersection, the acquired zebra crossing image data set is used as the model training set and passed sequentially through the feature extraction network, the spatial attention network, the channel attention network, and the upsampling network of the zebra crossing detection model. A loss function constructed from the real and predicted pixel classes is used to train and optimize the model, and the trained zebra crossing detection model is then used to detect the position of the zebra crossing in the image to be detected. By adding a dual attention mechanism, the method strengthens spatial feature extraction, so the elongated strip-shaped spatial characteristic of zebra crossing data can be exploited effectively, and it also strengthens feature extraction in the channel dimension. This improves the robustness of zebra crossing detection, allows the spatial and inter-channel features of the zebra crossing to be extracted accurately, and raises the detection accuracy of the zebra crossing detection model, thereby improving the zebra crossing detection effect and supporting intelligent city management.
In one embodiment, S10, acquiring a zebra crossing image dataset comprising a plurality of zebra crossing images with true pixel class labels, includes:
s110, acquiring a first image data set containing zebra stripes, and performing pixel-by-pixel semantic segmentation and labeling on the first image data set to obtain a first zebra stripe image data set;
s120, acquiring a second image data set containing zebra crossings, wherein the second image data set and the first image data set are image data sets acquired under different scenes;
s130, performing model training on a cycle generative adversarial network (CycleGAN) according to the first image data set and the second image data set to obtain a trained cycle generative adversarial network;
s140, inputting the first zebra crossing image data set into the trained cycle generative adversarial network, and outputting a second zebra crossing image data set under different simulated scenes;
s150, obtaining a zebra crossing image data set according to the first zebra crossing image data set and the second zebra crossing image data set.
In this embodiment, the first image data set is image data acquired by a traffic intersection monitoring camera. Each image in the first image data set is semantically segmented and labeled pixel by pixel to obtain the first zebra crossing image data set, in which every image carries a real pixel class label: either the background class or the zebra crossing class. The second image data set is acquired under a different scene from the first image data set. Different scenes can be understood as different weather conditions, such as sunny, rainy, or foggy days, or as different times of day, such as morning, midday, or evening. For example, the second image data set may be image data acquired under weather conditions that differ from those of the first image data set.
Model training is performed with the unpaired second and first image data sets as training data: a cycle generative adversarial network (CycleGAN) is adopted to construct the generator and discriminator networks of a rainy- or foggy-scene data generation model, yielding the trained network. The first zebra crossing image data set is then input into this trained network, which outputs a second zebra crossing image data set simulating weather different from that of the first set, for example zebra crossing data generated under rainy or foggy conditions. Fusing the first zebra crossing image data set with the second zebra crossing image data sets of different weather scenes produces the zebra crossing image data set that serves as the model training set for training and optimizing the zebra crossing detection model. Because zebra crossing images under different scene conditions are fused together, the training data are more comprehensive and sufficient; simulating zebra crossing states under different scene conditions through this data enhancement improves detection accuracy under special conditions and thus the accuracy of the zebra crossing detection model.
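The cycle-consistency idea that lets such a network learn from unpaired sunny and rainy data can be illustrated with a minimal sketch. The patent does not disclose the generator architectures; the brightness-shift "generators" G and F below are purely hypothetical stand-ins used only to show the loss term.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """Illustrative cycle-consistency term of CycleGAN-style training:
    translating domain A -> B (G) and back (F) should reproduce the input.
    G and F are placeholder callables standing in for the two generators."""
    loss_x = np.abs(F(G(x)) - x).mean()  # x -> G(x) -> F(G(x)) should equal x
    loss_y = np.abs(G(F(y)) - y).mean()  # y -> F(y) -> G(F(y)) should equal y
    return lam * (loss_x + loss_y)

# Toy "generators": a brightness shift and its exact inverse (hypothetical).
G = lambda img: img * 0.8           # e.g. darken a sunny image toward a rainy look
F = lambda img: img / 0.8           # inverse mapping back to the sunny domain
x = np.random.rand(3, 64, 64)       # unpaired sample from the first (sunny) set
y = np.random.rand(3, 64, 64)       # unpaired sample from the second (rainy) set
loss = cycle_consistency_loss(x, y, G, F)
print(loss < 1e-6)  # True: F inverts G exactly, so both cycles reconstruct the inputs
```

In real training this term is added to the two adversarial losses and minimized jointly over both generators.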
In one embodiment, S20, inputting a plurality of zebra crossing images with real pixel class labels into a feature extraction network of the zebra crossing detection model for feature extraction, and outputting a plurality of first zebra crossing feature maps, includes:
s210, according to a pooling layer of a feature extraction network, performing feature extraction on a plurality of zebra crossing images containing real pixel class labels to obtain a first feature map set;
s220, extracting the features of the first feature map set according to the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer of the feature extraction network to obtain a plurality of first zebra crossing feature maps;
wherein the ratio of the number of output channels of the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer is 1:2:4:8.
In this embodiment, the feature extraction network is a five-stage structure of pooling and full convolution. In the first stage, a pooling layer reduces the feature map size of the zebra crossing images containing real pixel class labels, which reduces the parameters of subsequent model calculation and improves the detection efficiency of the zebra crossing detection model. The pooling layer uses max pooling with a pooling step of 2 to extract features of the zebra crossing image. A convolution layer can follow the pooling layer to expand the number of feature channels and strengthen feature extraction; this convolution layer has a 1 × 1 kernel and a stride of 1, and increases the number of channels from 3 to 32.
The second to fifth stages are full convolution calculations: the first, second, third, and fourth full convolution layers, respectively. Each full convolution layer comprises three convolution layers. In every full convolution layer, the kernel size of the first and second layers is 3 × 3 and that of the third layer is 1 × 1; the stride of the first layer is 2 and that of the second and third layers is 1; the first and second layers leave the number of output channels unchanged while the third layer doubles it; and each of the three convolution layers is followed in turn by a normalization layer and an activation function layer. The output channel counts of the four full convolution layers are 64, 128, 256, and 512, in the ratio 1:2:4:8. The normalization layer includes, but is not limited to, an instance normalization layer, an adaptive instance normalization layer, and the like; the nonlinear activation layer includes, but is not limited to, nonlinear activation functions such as ReLU and Leaky ReLU.
The zebra crossing images containing real pixel class labels are input to the feature extraction network at a size of 3 × 1920 × 1280 (C × H × W, Channel × Height × Width). After the five-stage operation of the pooling layer and the first, second, third, and fourth full convolution layers, a plurality of first zebra crossing feature maps with a feature map size of 512 × 60 × 40 are obtained.
In this embodiment, through a feature extraction network formed by the pooling layer and the full convolution layers, the feature extraction and the reduction of the feature size of the zebra crossing images containing the real pixel class labels are performed, so as to reduce the parameters of subsequent model calculation, which is beneficial to improving the detection accuracy of the zebra crossing detection model.
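The size bookkeeping of the five stages above can be checked with the standard convolution output-size formula. This sketch assumes a pooling kernel of 2 and a padding of 1 for the 3 × 3 stride-2 convolutions, values the text does not state explicitly.

```python
def conv_out(size, kernel, stride, padding):
    """Standard convolution / pooling output-size formula."""
    return (size + 2 * padding - kernel) // stride + 1

# Stage 1: max pooling with stride 2 (kernel 2 assumed), then a 1x1 conv, stride 1.
h, w, c = 1920, 1280, 3
h, w = conv_out(h, 2, 2, 0), conv_out(w, 2, 2, 0)   # pooling halves H and W
c = 32                                              # 1x1 conv expands 3 -> 32 channels

# Stages 2-5: each full convolution layer downsamples once (first 3x3 conv,
# stride 2, padding 1 assumed) and doubles the channel count at its 1x1 conv.
for out_c in (64, 128, 256, 512):                   # channel ratio 1:2:4:8
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    c = out_c

print(c, h, w)  # 512 60 40, matching the stated first zebra crossing feature map size
```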
In one embodiment, S30, inputting the plurality of first zebra crossing feature maps into a space attention network and a channel attention network of the zebra crossing detection model respectively in parallel to perform feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps, includes:
s310, inputting each first zebra crossing feature map into a transverse dimension space attention network, and carrying out average pooling and longitudinal replication type capacity expansion on the first zebra crossing feature maps to obtain a transverse attention parameter feature map;
s320, inputting each first zebra crossing feature map into a longitudinal dimension space attention network, and carrying out average pooling and transverse duplication type capacity expansion on the first zebra crossing feature maps to obtain a longitudinal attention parameter feature map;
s330, adding the transverse attention parameter feature map and the longitudinal attention parameter feature map pixel by pixel to obtain a spatial attention parameter feature map;
s340, inputting the spatial attention parameter characteristic diagram into a normalization layer, and normalizing the spatial attention parameter to obtain a normalized spatial attention parameter;
and S350, multiplying the normalized spatial attention parameter and the first zebra crossing feature map pixel by pixel to obtain a spatial attention weighted feature map.
In this embodiment, the spatial attention network and the channel attention network form an attention mechanism network, and the two parts are connected in parallel. Through the spatial attention network, the extraction of the spatial features of the zebra crossing can be enhanced. The transverse dimension space attention network and the longitudinal dimension space attention network fully consider the transverse or longitudinal strip shape of the zebra crossing in the image, and further fully consider the characteristics of the zebra crossing to extract the characteristics. The spatial attention part carries out parallel attention calculation from two dimensions of the transverse dimension and the longitudinal dimension respectively, and the feature extraction of the zebra crossing can be enhanced. The input of the spatial attention part is a first zebra crossing feature map output by the feature extraction network, and the feature map size is 512 × 60 × 40.
The spatial attention calculation in the transverse dimension performs an average pooling operation with a pooling step of (1, 40) on the first zebra crossing feature map, converting the feature map size to 512 × 60 × 1. The pooled feature map is then expanded by copying along the longitudinal direction, so that the feature parameters along each longitudinal line are identical, restoring the feature map size to 512 × 60 × 40 and yielding the transverse attention parameter feature map.
The spatial attention calculation in the longitudinal dimension is analogous. The first zebra crossing feature map of size 512 × 60 × 40 output by the feature extraction part is average-pooled with a pooling step of (60, 1), converting the feature map size to 512 × 1 × 40. The pooled feature map is then expanded by copying along the transverse direction, so that the feature parameters along each transverse line are identical, restoring the size to 512 × 60 × 40 and yielding the longitudinal attention parameter feature map.
And adding the transverse attention parameter feature map and the longitudinal attention parameter feature map pixel by pixel to obtain an integral spatial attention parameter feature map. Inputting the spatial attention parameter feature map into a normalization layer, normalizing the attention parameter, and multiplying the normalized attention parameter and a first zebra crossing feature map (namely the input feature map of the transverse dimension spatial attention network and the longitudinal dimension spatial attention network) pixel by pixel to realize the feature weighting operation of spatial attention, thereby obtaining a spatial attention weighted feature map, wherein the size of the output feature map is 512 × 60 × 40.
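The two-branch spatial attention just described can be sketched on a single feature map as follows. NumPy is used in place of a deep learning framework, and a sigmoid stands in for the unspecified normalization layer; both are illustrative assumptions.

```python
import numpy as np

def dual_axis_spatial_attention(feat):
    """Sketch of the transverse/longitudinal spatial attention on one
    C x H x W feature map (normalization approximated by a sigmoid)."""
    c, h, w = feat.shape
    # Transverse branch: average-pool with step (1, W), then copy back along W.
    horiz = feat.mean(axis=2, keepdims=True)       # C x H x 1
    horiz = np.repeat(horiz, w, axis=2)            # C x H x W, identical along each row
    # Longitudinal branch: average-pool with step (H, 1), then copy back along H.
    vert = feat.mean(axis=1, keepdims=True)        # C x 1 x W
    vert = np.repeat(vert, h, axis=1)              # C x H x W, identical along each column
    attn = 1.0 / (1.0 + np.exp(-(horiz + vert)))   # pixel-wise addition, then normalize
    return attn * feat                             # pixel-wise feature re-weighting

feat = np.random.rand(512, 60, 40).astype(np.float32)  # a first zebra crossing feature map
out = dual_axis_spatial_attention(feat)
print(out.shape)  # (512, 60, 40): the spatial attention weighted feature map
```

The averaging-then-copying structure is what makes each attention weight constant along one axis, matching the strip shape of a zebra crossing.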
In one embodiment, S30, the parallel inputting of the plurality of first zebra crossing feature maps to the spatial attention network and the channel attention network of the zebra crossing detection model for feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps further includes:
s360, inputting each first zebra crossing feature map into two identical convolution layers respectively, and outputting a first feature map and a second feature map;
s370, respectively performing feature map reshaping on the first feature map and the second feature map to obtain a first reshaped feature map and a second reshaped feature map;
s380, performing feature transposition on the second reshaped feature map to obtain a second transposed feature map;
s390, performing matrix multiplication operation on the first remolded feature map and the second transposed feature map to obtain a channel feature map, and performing feature map parameter normalization on the channel feature map to obtain a channel dimension attention feature map;
s391, performing feature map remodeling on the first zebra crossing feature map to obtain a first zebra crossing remodeling feature map;
s392, performing matrix multiplication operation on the channel dimension attention feature map and the first zebra crossing remodeling feature map to obtain a re-weighted feature map, and performing feature map remodeling on the re-weighted feature map to obtain a channel attention weighted feature map;
and S393, performing pixel-by-pixel addition on the space attention weighted feature map corresponding to each first zebra crossing feature map and the channel attention weighted feature map to obtain a plurality of second zebra crossing feature maps.
In this embodiment, the channel attention network is formed by two identical convolution layer calculations, feature map reshaping, feature transposition, matrix multiplication, feature map parameter normalization, and pixel-by-pixel feature map addition. Through the channel attention network, the interdependencies among different channels can be obtained, effectively enhancing the extraction of semantic features. The channel attention network operates in parallel with the spatial attention network. Its input is the first zebra crossing feature map of size 512 × 60 × 40 output by the feature extraction network. Passing this feature map through two convolution layers with 1 × 1 kernels yields two feature maps, a first feature map and a second feature map, both of size 512 × 60 × 40. Feature map reshaping turns them into the first reshaped feature map and the second reshaped feature map, each of size 512 × (60 × 40). Feature transposition of the second reshaped feature map gives the second transposed feature map, of size 2400 × 512. Matrix multiplication of the first reshaped feature map with the second transposed feature map produces the channel feature map, of size 512 × 512.
And after feature map parameter normalization is carried out on the channel feature map, obtaining an attention feature map of the channel dimension, namely the channel dimension attention feature map.
And (3) performing feature map reshaping on the first zebra crossing feature map input by the channel attention network, wherein the feature map is transformed from 512 multiplied by 60 multiplied by 40 to 512 multiplied by 2400, and obtaining a first zebra crossing reshaping feature map. And performing matrix multiplication on the channel dimension attention feature map and the remolded first zebra crossing remolding feature map to obtain a re-weighted feature map with the size of 512 multiplied by 2400, and remolding the feature map of the re-weighted feature map to obtain the channel attention weighted feature map. The feature map size of the channel attention weighted feature map is 512 × 60 × 40, and the feature re-weighting operation in the channel dimension can be realized. And adding the characteristic output of the spatial attention part and the characteristic output of the channel attention part pixel by pixel to obtain a second zebra crossing characteristic diagram with the characteristic diagram size of 512 multiplied by 60 multiplied by 40.
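The channel attention branch can likewise be sketched in NumPy. For brevity the two 1 × 1 convolutions are replaced by identity mappings and softmax is used as the feature map parameter normalization; both are assumptions, since the patent does not specify the convolution weights or the normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_attention(feat):
    """Sketch of the channel attention branch on one C x H x W feature map."""
    c, h, w = feat.shape
    q = feat.reshape(c, h * w)        # first reshaped feature map,  C x (H*W)
    k = feat.reshape(c, h * w)        # second reshaped feature map, C x (H*W)
    attn = softmax(q @ k.T)           # C x C channel feature map, normalized
    v = feat.reshape(c, h * w)        # reshaped first zebra crossing feature map
    out = attn @ v                    # re-weighted feature map, C x (H*W)
    return out.reshape(c, h, w)       # channel attention weighted feature map

feat = np.random.rand(512, 60, 40).astype(np.float32)
out = channel_attention(feat)
print(out.shape)  # (512, 60, 40), ready to be added pixel-wise to the spatial branch
```

Each row of `attn` sums to one, so every output channel is a weighted mixture of all input channels, which is how the inter-channel dependencies are captured.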
Through the zebra crossing detection method based on the intelligent intersection provided by the application, adding the dual attention mechanism strengthens spatial feature extraction, so the elongated strip-shaped spatial characteristic of zebra crossing data can be exploited effectively, while feature extraction in the channel dimension is strengthened at the same time. This improves the robustness of zebra crossing detection, allows the spatial and inter-channel features of the zebra crossing to be extracted accurately, and raises the detection accuracy of the zebra crossing detection model, thereby improving the zebra crossing detection effect and supporting intelligent city management.
In one embodiment, S40, inputting the plurality of second zebra crossing feature maps into an upsampling network of the zebra crossing detection model for image size recovery, and outputting a predicted pixel class corresponding to each pixel, includes:
s410, inputting each second zebra crossing feature map into the 5 deconvolution layers of the upsampling network to restore the image size to that of the original zebra crossing image, obtaining a plurality of second zebra crossing feature maps at the original image size;
and S420, obtaining a prediction pixel category corresponding to each pixel according to the second zebra crossing feature map of each original image size.
In this embodiment, the upsampling part is the third part of the zebra crossing detection model, and it restores the image size by deconvolution. Its input is the second zebra crossing feature map formed after the spatial attention network and the channel attention network, a feature map of size 512 × 60 × 40. Through the deconvolution process, this feature map is restored to the original input image size, which can be understood as the image size originally input to the zebra crossing detection model, i.e., the size input to the feature extraction network, 3 × 1920 × 1280. The kernel size of the 5 deconvolution layers is 3 × 3 and the convolution stride is 2.
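With the standard transposed-convolution output-size formula one can verify that five such layers restore 60 × 40 to 1920 × 1280. The padding of 1 and output padding of 1 are assumed values that make each layer exactly double the spatial size; the text only fixes the kernel size and stride.

```python
def deconv_out(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Transposed-convolution (deconvolution) output-size formula."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

h, w = 60, 40                       # second zebra crossing feature map spatial size
for _ in range(5):                  # the five deconvolution layers of the upsampling part
    h, w = deconv_out(h), deconv_out(w)
print(h, w)  # 1920 1280, the spatial size originally input to the feature extraction network
```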
The zebra crossing detection model comprises the feature extraction network, the spatial attention network, the channel attention network, and the upsampling network. A plurality of zebra crossing images containing real pixel class labels are input into the feature extraction network, passed in parallel through the spatial attention network and the channel attention network, and then through the upsampling network, which outputs the predicted pixel class corresponding to each pixel of the second zebra crossing feature map at the original image size.
In one embodiment, S50, constructing a loss function according to the real pixel class and the predicted pixel class, and performing model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model, includes:
s510, constructing a loss function according to the real pixel category and the prediction pixel category, wherein the loss function comprises:
$$\mathrm{Loss} = -\sum_{c=1}^{C} y_c \log(P_c)$$
where C represents the number of pixel classes, P_c represents the predicted probability that a pixel belongs to class c (the background class or the zebra crossing class), and y_c represents the real pixel class label for class c.
And S520, performing model training and optimization on the zebra crossing detection model according to the loss function to obtain the trained zebra crossing detection model.
In this embodiment, the loss function may adopt a cross entropy loss function. And performing model training and optimization on the zebra crossing detection model according to the loss function to obtain the trained zebra crossing detection model, predicting the pixel category of each pixel point in the zebra crossing image to be detected, and further obtaining the specific position of the zebra crossing.
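A minimal NumPy sketch of the per-pixel cross-entropy loss described above, for the two-class background/zebra-crossing case; the probability values are hypothetical.

```python
import numpy as np

def pixel_cross_entropy(probs, onehot):
    """Per-pixel cross-entropy, Loss = -sum_c y_c * log(P_c), averaged over pixels.
    probs: predicted class probabilities, shape (N, C); onehot: true labels, (N, C)."""
    return float(-(onehot * np.log(probs)).sum(axis=1).mean())

# Two pixels, two classes (background, zebra crossing) with hypothetical predictions.
probs = np.array([[0.2, 0.8],    # pixel predicted as zebra crossing with P = 0.8
                  [0.9, 0.1]])   # pixel predicted as background with P = 0.9
onehot = np.array([[0.0, 1.0],   # true class: zebra crossing
                   [1.0, 0.0]])  # true class: background
loss = pixel_cross_entropy(probs, onehot)
print(round(loss, 4))  # 0.1643, i.e. the mean of -ln(0.8) and -ln(0.9)
```

During training, minimizing this quantity over all pixels drives the predicted pixel classes toward the real pixel classes.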
Referring to fig. 2, the present application provides an intelligent intersection-based zebra crossing detection system 100. The intelligent intersection-based zebra crossing detection system 100 comprises an image data acquisition module 10, a feature extraction network module 20, an attention network module 30, an upsampling module 40, a model training module 50 and a detection module 60. The image data obtaining module 10 is configured to obtain a zebra crossing image dataset, where the zebra crossing image dataset includes a plurality of zebra crossing images containing real pixel class labels. The feature extraction network module 20 is configured to input the plurality of zebra crossing images containing the real pixel class labels to a feature extraction network of the zebra crossing detection model for feature extraction, and output a plurality of first zebra crossing feature maps. The attention network module 30 is configured to input the plurality of first zebra crossing feature maps to a space attention network and a channel attention network of the zebra crossing detection model in parallel for feature extraction and fusion, and output a plurality of second zebra crossing feature maps.
The up-sampling module 40 is configured to input the plurality of second zebra crossing feature maps to an up-sampling network of the zebra crossing detection model for image size recovery, and output a predicted pixel class corresponding to each pixel.
The model training module 50 is configured to construct a loss function according to the real pixel class and the predicted pixel class, and perform model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model. The detection module 60 is configured to detect a position of a zebra crossing in the zebra crossing image to be detected according to the trained zebra crossing detection model.
In the present embodiment, the description of the image data acquisition module 10 may refer to the description of S10 in the above embodiment. The description of the feature extraction network module 20 may refer to the description of S20 in the above embodiment. The related description of the attention network module 30 may refer to the related description of S30 in the above embodiment. The description of the up-sampling module 40 may refer to the description of S40 in the above embodiment. The relevant description of the model training module 50 may refer to the relevant description of S50 in the above embodiment. The related description of the detection module 60 can refer to the related description of S60 in the above embodiment.
In one embodiment, the image data acquisition module 10 includes a first zebra crossing image data set acquisition module, a second image data set acquisition module, a cycle generative adversarial network module, a second zebra crossing image data set acquisition module, and a zebra crossing image data set acquisition module. The first zebra crossing image data set acquisition module is used for acquiring a first image data set containing zebra crossings and performing pixel-by-pixel semantic segmentation and labeling on it to obtain the first zebra crossing image data set. The second image data set acquisition module is used for acquiring a second image data set containing zebra crossings, the second image data set and the first image data set being acquired under different scenes. The cycle generative adversarial network module is used for performing model training on the cycle generative adversarial network according to the first image data set and the second image data set to obtain the trained network. The second zebra crossing image data set acquisition module is used for inputting the first zebra crossing image data set into the trained cycle generative adversarial network and outputting a second zebra crossing image data set under different simulated scenes. The zebra crossing image data set acquisition module is used for obtaining the zebra crossing image data set from the first and second zebra crossing image data sets.
In this embodiment, the relevant description of the first zebra crossing image data set acquisition module may refer to the relevant description of S110 in the above embodiment. The relevant description of the second image data set acquisition module may refer to the relevant description of S120 in the above embodiment. The relevant description of the cycle generative adversarial network module may refer to the relevant description of S130 in the above embodiment. The relevant description of the second zebra crossing image data set acquisition module may refer to the relevant description of S140 in the above embodiment. The relevant description of the zebra crossing image data set acquisition module may refer to the relevant description of S150 in the above embodiment.
In one embodiment, the feature extraction network module 20 includes a pooling layer module and a full convolution layer module. The pooling layer module is used for performing feature extraction on a plurality of zebra crossing images containing real pixel class labels according to a pooling layer of the feature extraction network to obtain a first feature map set. The full convolution layer module is used for extracting the features of the first feature map set according to the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer of the feature extraction network to obtain a plurality of first zebra crossing feature maps.
Wherein the ratio of the number of output channels of the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer is 1:2:4: 8.
In this embodiment, the relevant description of the pooling layer module may refer to the relevant description of S210 in the above embodiment. The related description of the full convolutional layer module may refer to the related description of S220 in the above embodiment.
In one embodiment, the attention network module 30 includes a lateral attention parameter profile acquisition module, a longitudinal attention parameter profile acquisition module, a spatial attention parameter profile acquisition module, a normalized spatial attention parameter acquisition module, and a spatial attention weighted profile acquisition module. The transverse attention parameter characteristic diagram acquisition module is used for inputting each first zebra crossing characteristic diagram into a transverse dimension space attention network, and performing average pooling and longitudinal replication type capacity expansion on the first zebra crossing characteristic diagrams to acquire the transverse attention parameter characteristic diagram. The longitudinal attention parameter characteristic diagram acquisition module is used for inputting each first zebra crossing characteristic diagram into a longitudinal dimension space attention network, and performing average pooling and transverse replication type capacity expansion on the first zebra crossing characteristic diagrams to acquire a longitudinal attention parameter characteristic diagram.
The spatial attention parameter characteristic diagram obtaining module is used for adding the transverse attention parameter characteristic diagram and the longitudinal attention parameter characteristic diagram pixel by pixel to obtain a spatial attention parameter characteristic diagram. The normalized spatial attention parameter acquisition module is used for inputting the spatial attention parameter characteristic diagram into the normalization layer, and normalizing the spatial attention parameter to acquire a normalized spatial attention parameter. The spatial attention weighted feature map acquisition module is used for multiplying the normalized spatial attention parameter and the first zebra crossing feature map pixel by pixel to acquire a spatial attention weighted feature map.
In this embodiment, the related description of the transverse attention parameter feature map acquisition module may refer to the related description of S310 in the above embodiment. The related description of the longitudinal attention parameter feature map acquisition module may refer to the related description of S320 in the above embodiment. The related description of the spatial attention parameter feature map acquisition module may refer to the related description of S330 in the above embodiment. The related description of the normalized spatial attention parameter acquisition module may refer to the related description of S340 in the above embodiment. The related description of the spatial attention weighted feature map acquisition module may refer to the related description of S350 in the above embodiment.
In one embodiment, the attention network module 30 further includes a convolution module, a first feature reshaping module, a feature transposition module, a channel dimension attention feature map acquisition module, a second feature reshaping module, a channel attention weighted feature map acquisition module, and a second zebra crossing feature map acquisition module. The convolution module is used for respectively inputting each first zebra crossing feature map into two identical convolution layers and outputting a first feature map and a second feature map. The first feature reshaping module is used for respectively reshaping the first feature map and the second feature map to obtain a first reshaped feature map and a second reshaped feature map. The feature transposition module is used for performing feature transposition on the second reshaped feature map to obtain a second transposed feature map. The channel dimension attention feature map acquisition module is used for performing a matrix multiplication operation on the first reshaped feature map and the second transposed feature map to obtain a channel feature map, and performing feature map parameter normalization on the channel feature map to obtain a channel dimension attention feature map.
The second feature reshaping module is used for performing feature map reshaping on the first zebra crossing feature map to obtain a first zebra crossing reshaped feature map. The channel attention weighted feature map acquisition module is used for performing a matrix multiplication operation on the channel dimension attention feature map and the first zebra crossing reshaped feature map to obtain a re-weighted feature map, and performing feature map reshaping on the re-weighted feature map to obtain a channel attention weighted feature map. The second zebra crossing feature map acquisition module is used for performing pixel-by-pixel addition on the spatial attention weighted feature map and the channel attention weighted feature map corresponding to each first zebra crossing feature map to obtain a plurality of second zebra crossing feature maps.
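The channel attention branch (two parallel convolutions, reshape and transpose, matrix multiplication, parameter normalization, and re-weighting) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions: random linear maps stand in for the two identical convolution layers, and a row-wise softmax stands in for the feature map parameter normalization; neither is specified by the embodiment.

```python
import numpy as np

def channel_attention(fmap, rng):
    """Sketch of the channel attention branch on a (C, H, W) feature map."""
    c, h, w = fmap.shape
    flat = fmap.reshape(c, h * w)              # first zebra crossing reshaped feature map

    # Two identical 1x1 convolutions (here: random (C, C) linear maps)
    # produce the first and second feature maps.
    w1 = rng.standard_normal((c, c)) * 0.1
    w2 = rng.standard_normal((c, c)) * 0.1
    first = w1 @ flat                          # first reshaped feature map, (C, H*W)
    second_t = (w2 @ flat).T                   # second transposed feature map, (H*W, C)

    # Channel feature map: (C, C) matrix product, then a row-wise softmax
    # as the feature map parameter normalization.
    energy = first @ second_t
    energy -= energy.max(axis=1, keepdims=True)          # numerical stability
    attn = np.exp(energy) / np.exp(energy).sum(axis=1, keepdims=True)

    # Re-weight the reshaped input and fold back to (C, H, W).
    return (attn @ flat).reshape(c, h, w)

rng = np.random.default_rng(2)
fmap = rng.standard_normal((8, 16, 16))
weighted = channel_attention(fmap, rng)        # channel attention weighted feature map
```

The second zebra crossing feature map would then be the pixel-by-pixel sum of this output and the spatial attention weighted feature map for the same input.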
In this embodiment, the related description of the convolution module may refer to the related description of S360 in the above embodiment. The related description of the first feature reshaping module may refer to the related description of S370 in the above embodiment. The related description of the feature transposition module may refer to the related description of S380 in the above embodiment. The related description of the channel dimension attention feature map acquisition module may refer to the related description of S390 in the above embodiment. The related description of the second feature reshaping module may refer to the related description of S391 in the above embodiment. The related description of the channel attention weighted feature map acquisition module may refer to the related description of S392 in the above embodiment. The related description of the second zebra crossing feature map acquisition module may refer to the related description of S393 in the above embodiment.
In one embodiment, the upsampling module includes an image size recovery module and a pixel class acquisition module. The image size recovery module is used for inputting each second zebra crossing feature map into the 5 deconvolution layers of the up-sampling network to restore the image size to that of the original zebra crossing image, obtaining a plurality of second zebra crossing feature maps at the original image size. The pixel class acquisition module is used for obtaining the predicted pixel class corresponding to each pixel according to each second zebra crossing feature map at the original image size.
In this embodiment, the related description of the image size recovery module may refer to the related description of S410 in the above embodiment. The related description of the pixel class acquisition module may refer to the related description of S420 in the above embodiment.
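The size recovery and per-pixel classification above can be sketched as follows. Five stride-2 deconvolution layers together recover a 32x resolution increase (2^5 = 32); in this illustrative NumPy sketch a nearest-neighbour expansion via `np.kron` stands in for each learned deconvolution layer, and the two-class score map is an assumption of the example.

```python
import numpy as np

def upsample_32x(fmap):
    """Nearest-neighbour stand-in for five stride-2 deconvolution layers:
    each step doubles H and W, so five steps restore a 32x size reduction."""
    for _ in range(5):
        fmap = np.kron(fmap, np.ones((1, 2, 2)))   # (C, H, W) -> (C, 2H, 2W)
    return fmap

# Class scores at 1/32 resolution for a hypothetical 64x64 original image.
rng = np.random.default_rng(3)
scores = rng.standard_normal((2, 2, 2))            # 2 classes: background / zebra crossing
full = upsample_32x(scores)                        # restored to the original 64x64 size
pred = full.argmax(axis=0)                         # predicted pixel class per pixel
```

The per-pixel argmax over the restored score map yields the predicted pixel class that the loss function compares against the real pixel class label during training.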
In the various embodiments described above, the particular order or hierarchy of steps in the processes disclosed is an example of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of steps in the processes may be rearranged without departing from the scope of the present disclosure. The accompanying method claims present elements of the various steps in a sample order, and are not intended to be limited to the specific order or hierarchy.
Those of skill in the art will also appreciate that the various illustrative logical blocks, modules, and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate the interchangeability of hardware and software, various illustrative components, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design requirements of the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The various illustrative logical blocks, or modules, described in the embodiments herein may be implemented or operated by a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in the embodiments herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. For example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which may be located in a user terminal. In the alternative, the processor and the storage medium may reside in different components in a user terminal.
The above embodiments further describe the objects, technical solutions and advantages of the present application in detail. It should be understood that the above are merely exemplary embodiments of the present application and are not intended to limit its scope; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the present application shall fall within the scope of the present application.

Claims (12)

1. A zebra crossing detection method based on an intelligent intersection, characterized by comprising the following steps:
acquiring a zebra crossing image data set, wherein the zebra crossing image data set comprises a plurality of zebra crossing images containing real pixel class labels;
inputting the zebra crossing images containing the real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction, and outputting a plurality of first zebra crossing feature maps;
respectively inputting the first zebra crossing feature maps into a spatial attention network and a channel attention network of the zebra crossing detection model in parallel to perform feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps;
inputting the second zebra crossing feature maps into an up-sampling network of the zebra crossing detection model for image size recovery, and outputting a predicted pixel category corresponding to each pixel;
constructing a loss function according to the real pixel category and the predicted pixel category, and performing model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model;
and detecting the position of the zebra crossing in the zebra crossing image to be detected according to the trained zebra crossing detection model.
2. The zebra crossing detection method based on an intelligent intersection according to claim 1, wherein acquiring the zebra crossing image data set, the zebra crossing image data set comprising a plurality of zebra crossing images containing real pixel class labels, comprises:
acquiring a first image data set containing zebra stripes, and performing pixel-by-pixel semantic segmentation and labeling on the first image data set to obtain a first zebra stripe image data set;
acquiring a second image data set containing zebra stripes, wherein the second image data set and the first image data set are image data sets acquired under different scenes;
performing model training on a cycle generative adversarial network according to the first image data set and the second image data set to obtain a trained cycle generative adversarial network;
inputting the first zebra crossing image data set into the trained cycle generative adversarial network, and outputting a second zebra crossing image data set under different simulated scenes;
and obtaining the zebra crossing image data set according to the first zebra crossing image data set and the second zebra crossing image data set.
3. The zebra crossing detection method based on an intelligent intersection according to claim 1, wherein the step of inputting the zebra crossing images containing the real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction and outputting a plurality of first zebra crossing feature maps comprises:
according to the pooling layer of the feature extraction network, feature extraction is carried out on the zebra crossing images containing the real pixel class labels to obtain a first feature map set;
according to the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer of the feature extraction network, feature extraction is carried out on the first feature map set to obtain a plurality of first zebra crossing feature maps;
wherein a ratio of the number of output channels of the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer is 1:2:4:8.
4. The zebra crossing detection method based on an intelligent intersection according to claim 1, wherein the step of inputting the plurality of first zebra crossing feature maps into the spatial attention network and the channel attention network of the zebra crossing detection model in parallel for feature extraction and fusion respectively and outputting a plurality of second zebra crossing feature maps comprises:
inputting each first zebra crossing feature map into a transverse-dimension spatial attention network, and performing average pooling and longitudinal replication-based expansion on the first zebra crossing feature map to obtain a transverse attention parameter feature map;
inputting each first zebra crossing feature map into a longitudinal-dimension spatial attention network, and performing average pooling and transverse replication-based expansion on the first zebra crossing feature map to obtain a longitudinal attention parameter feature map;
adding the transverse attention parameter feature map and the longitudinal attention parameter feature map pixel by pixel to obtain a spatial attention parameter feature map;
inputting the spatial attention parameter feature map into a normalization layer, and normalizing the spatial attention parameter to obtain a normalized spatial attention parameter;
and multiplying the normalized spatial attention parameter and the first zebra crossing feature map pixel by pixel to obtain a spatial attention weighted feature map.
5. The method according to claim 4, wherein the step of inputting the first zebra crossing feature maps into a spatial attention network and a channel attention network of the zebra crossing detection model in parallel to perform feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps further comprises:
inputting each first zebra crossing feature map into two identical convolution layers respectively, and outputting a first feature map and a second feature map;
respectively performing feature map reshaping on the first feature map and the second feature map to obtain a first reshaped feature map and a second reshaped feature map;
performing feature transposition on the second reshaped feature map to obtain a second transposed feature map;
performing a matrix multiplication operation on the first reshaped feature map and the second transposed feature map to obtain a channel feature map, and performing feature map parameter normalization on the channel feature map to obtain a channel dimension attention feature map;
performing feature map reshaping on the first zebra crossing feature map to obtain a first zebra crossing reshaped feature map;
performing a matrix multiplication operation on the channel dimension attention feature map and the first zebra crossing reshaped feature map to obtain a re-weighted feature map, and performing feature map reshaping on the re-weighted feature map to obtain a channel attention weighted feature map;
and performing pixel-by-pixel addition on the spatial attention weighted feature map and the channel attention weighted feature map corresponding to each first zebra crossing feature map to obtain a plurality of second zebra crossing feature maps.
6. The method according to claim 1, wherein the step of inputting the second zebra crossing feature maps into an upsampling network of the zebra crossing detection model for image size recovery and outputting a predicted pixel class corresponding to each pixel comprises:
inputting each second zebra crossing feature map into the 5 deconvolution layers of an up-sampling network to restore the image size to the original zebra crossing image size, and obtaining a plurality of second zebra crossing feature maps at the original image size;
and obtaining a prediction pixel class corresponding to each pixel according to the second zebra crossing feature map of each original image size.
7. A zebra crossing detection system based on an intelligent intersection, characterized by comprising:
the system comprises an image data acquisition module, a pixel classification module and a pixel classification module, wherein the image data acquisition module is used for acquiring a zebra crossing image dataset which comprises a plurality of zebra crossing images containing real pixel class labels;
the feature extraction network module is used for inputting the zebra crossing images containing the real pixel class labels into a feature extraction network of a zebra crossing detection model for feature extraction and outputting a plurality of first zebra crossing feature maps;
the attention network module is used for respectively inputting the first zebra crossing feature maps into a space attention network and a channel attention network of the zebra crossing detection model in parallel to perform feature extraction and fusion, and outputting a plurality of second zebra crossing feature maps;
the up-sampling module is used for inputting the second zebra crossing feature maps into an up-sampling network of the zebra crossing detection model for image size recovery and outputting a prediction pixel category corresponding to each pixel;
the model training module is used for constructing a loss function according to the real pixel class and the predicted pixel class, and performing model training and optimization on the zebra crossing detection model according to the loss function to obtain a trained zebra crossing detection model;
and the detection module is used for detecting the position of the zebra crossing in the zebra crossing image to be detected according to the trained zebra crossing detection model.
8. The intelligent intersection-based zebra crossing detection system of claim 7, wherein the image data acquisition module comprises:
the first zebra crossing image data set acquisition module is used for acquiring a first image data set containing zebra crossings, and performing pixel-by-pixel semantic segmentation and labeling on the first image data set to acquire a first zebra crossing image data set;
the second image data set acquisition module is used for acquiring a second image data set containing zebra crossings, wherein the second image data set and the first image data set are image data sets acquired under different scenes;
the cycle generative adversarial network module is used for performing model training on a cycle generative adversarial network according to the first image data set and the second image data set to obtain a trained cycle generative adversarial network;
the second zebra crossing image data set acquisition module is used for inputting the first zebra crossing image data set into the trained cycle generative adversarial network, and outputting a second zebra crossing image data set under different simulated scenes;
and the zebra crossing image data set acquisition module is used for acquiring the zebra crossing image data set according to the first zebra crossing image data set and the second zebra crossing image data set.
9. The system of claim 7, wherein the feature extraction network module comprises:
the pooling layer module is used for performing feature extraction on the zebra crossing images containing the real pixel class labels according to a pooling layer of the feature extraction network to obtain a first feature atlas;
the full convolution layer module is used for performing feature extraction on the first feature map set according to a first full convolution layer, a second full convolution layer, a third full convolution layer and a fourth full convolution layer of the feature extraction network to obtain a plurality of first zebra crossing feature maps;
wherein a ratio of the number of output channels of the first full convolution layer, the second full convolution layer, the third full convolution layer and the fourth full convolution layer is 1:2:4:8.
10. The intelligent intersection-based zebra crossing detection system of claim 7, wherein the attention network module comprises:
a transverse attention parameter feature map acquisition module, configured to input each first zebra crossing feature map into a transverse-dimension spatial attention network, and perform average pooling and longitudinal replication-based expansion on the first zebra crossing feature map to obtain a transverse attention parameter feature map;
a longitudinal attention parameter feature map acquisition module, configured to input each first zebra crossing feature map into a longitudinal-dimension spatial attention network, and perform average pooling and transverse replication-based expansion on the first zebra crossing feature map to obtain a longitudinal attention parameter feature map;
the spatial attention parameter feature map acquisition module is used for performing pixel-by-pixel addition on the transverse attention parameter feature map and the longitudinal attention parameter feature map to obtain a spatial attention parameter feature map;
the normalized spatial attention parameter acquisition module is used for inputting the spatial attention parameter feature map into a normalization layer, and normalizing the spatial attention parameter to obtain a normalized spatial attention parameter;
and the spatial attention weighted feature map acquisition module is used for multiplying the normalized spatial attention parameter and the first zebra crossing feature map pixel by pixel to acquire a spatial attention weighted feature map.
11. The intelligent intersection-based zebra crossing detection system of claim 10, wherein the attention network module further comprises:
the convolution module is used for respectively inputting each first zebra crossing feature map into two identical convolution layers and outputting a first feature map and a second feature map;
the first feature reshaping module is used for respectively reshaping the first feature map and the second feature map to obtain a first reshaped feature map and a second reshaped feature map;
the feature transposition module is used for performing feature transposition on the second reshaped feature map to obtain a second transposed feature map;
a channel dimension attention feature map acquisition module, configured to perform a matrix multiplication operation on the first reshaped feature map and the second transposed feature map to obtain a channel feature map, and perform feature map parameter normalization on the channel feature map to obtain a channel dimension attention feature map;
the second feature reshaping module is used for performing feature map reshaping on the first zebra crossing feature map to obtain a first zebra crossing reshaped feature map;
the channel attention weighted feature map acquisition module is used for performing a matrix multiplication operation on the channel dimension attention feature map and the first zebra crossing reshaped feature map to obtain a re-weighted feature map, and performing feature map reshaping on the re-weighted feature map to obtain a channel attention weighted feature map;
a second zebra crossing feature map obtaining module, configured to perform pixel-by-pixel addition on the spatial attention weighted feature map and the channel attention weighted feature map corresponding to each first zebra crossing feature map, so as to obtain multiple second zebra crossing feature maps.
12. The intelligent intersection-based zebra crossing detection system of claim 7, wherein the upsampling module comprises:
the image size recovery module is used for inputting each second zebra crossing feature map into the 5 deconvolution layers of an up-sampling network to restore the image size to that of the original zebra crossing image, and obtaining a plurality of second zebra crossing feature maps at the original image size;
and the pixel category acquisition module is used for acquiring the predicted pixel category corresponding to each pixel according to the second zebra crossing feature map of each original image size.
CN202210652919.3A 2022-06-08 2022-06-08 Zebra crossing detection method and system based on intelligent intersection Pending CN115115915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210652919.3A CN115115915A (en) 2022-06-08 2022-06-08 Zebra crossing detection method and system based on intelligent intersection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210652919.3A CN115115915A (en) 2022-06-08 2022-06-08 Zebra crossing detection method and system based on intelligent intersection

Publications (1)

Publication Number Publication Date
CN115115915A true CN115115915A (en) 2022-09-27

Family

ID=83326738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210652919.3A Pending CN115115915A (en) 2022-06-08 2022-06-08 Zebra crossing detection method and system based on intelligent intersection

Country Status (1)

Country Link
CN (1) CN115115915A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132956A (en) * 2023-09-21 2023-11-28 阜阳交通能源投资有限公司 Road rapid damage detection method and system based on multi-head attention mechanism
CN117152552A (en) * 2023-07-27 2023-12-01 至本医疗科技(上海)有限公司 Method, apparatus and medium for training a model

Similar Documents

Publication Publication Date Title
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
CN112084901B (en) GCAM-based high-resolution SAR image airport runway area automatic detection method and system
Tourani et al. A robust deep learning approach for automatic iranian vehicle license plate detection and recognition for surveillance systems
CN110889449A (en) Edge-enhanced multi-scale remote sensing image building semantic feature extraction method
CN110766098A (en) Traffic scene small target detection method based on improved YOLOv3
CN115115915A (en) Zebra crossing detection method and system based on intelligent intersection
CN111104903A (en) Depth perception traffic scene multi-target detection method and system
CN114677507A (en) Street view image segmentation method and system based on bidirectional attention network
CN115063786A (en) High-order distant view fuzzy license plate detection method
CN112699889A (en) Unmanned real-time road scene semantic segmentation method based on multi-task supervision
Meng et al. A block object detection method based on feature fusion networks for autonomous vehicles
Pham Semantic road segmentation using deep learning
CN114898243A (en) Traffic scene analysis method and device based on video stream
CN112785610B (en) Lane line semantic segmentation method integrating low-level features
CN114519819A (en) Remote sensing image target detection method based on global context awareness
Lin et al. A lightweight, high-performance multi-angle license plate recognition model
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium
CN112597996A (en) Task-driven natural scene-based traffic sign significance detection method
CN114708560B (en) YOLOX algorithm-based illegal parking detection method and system
CN115909140A (en) Video target segmentation method and system based on high-order video monitoring
Xu et al. SPNet: Superpixel pyramid network for scene parsing
Rahmani et al. IR-LPR: A Large Scale Iranian License Plate Recognition Dataset
CN114627400A (en) Lane congestion detection method and device, electronic equipment and storage medium
Li et al. Infrared Small Target Detection Algorithm Based on ISTD-CenterNet.
Toha et al. DhakaNet: unstructured vehicle detection using limited computational resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination