CN115578615A - Night traffic sign image detection model establishing method based on deep learning - Google Patents
Night traffic sign image detection model establishing method based on deep learning
- Publication number: CN115578615A
- Application number: CN202211342707.1A
- Authority: CN (China)
- Prior art keywords: feature map, traffic sign, deep, night traffic, detection model
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06N 3/08: Neural networks; learning methods
- G06V 10/20: Image preprocessing
- G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners
- G06V 10/763: Recognition using clustering; non-hierarchical techniques, e.g. based on statistics of modelling distributions
- G06V 10/82: Recognition using neural networks
- G06V 20/582: Recognition of traffic signs
- G06V 2201/07: Target detection
- Y02T 10/40: Engine management systems (climate change mitigation technologies related to transportation)
Abstract
The invention discloses a deep-learning-based method for building a night traffic sign image detection model. The method comprises: collecting images containing night traffic signs as a training set; clustering prior boxes; introducing an attention mechanism into the YOLOv3 feature extraction network to obtain an improved YOLOv3 network, where the attention mechanism realizes spatial attention through a shallow feature map and channel attention through a deep feature map; determining a bounding-box loss function; and inputting the training set into the improved YOLOv3 network for training to obtain the night traffic sign image detection model. The method addresses problems of existing image detection models in night traffic sign recognition, such as inaccurate localization, difficult feature extraction, and a high recognition failure rate, thereby improving both the recognition effect and the recognition accuracy for night traffic signs.
Description
Technical Field
The invention relates to the field of traffic sign detection and identification, in particular to a night traffic sign image detection model establishing method based on deep learning.
Background
With the gradual maturity of various object detection frameworks and the appearance of large foreign traffic sign data sets, many empirical studies of deep-learning-based traffic sign detection and recognition have appeared. Their results demonstrate the superiority of deep learning in this field, and deep-learning-based detection and recognition of traffic signs has become the research mainstream. However, when mature deep learning models are used to detect traffic signs, the failure rate for traffic sign samples under low-light conditions such as night is found to be far greater than under sufficient lighting. Detection failures manifest as missed detections, inaccurate localization, and low confidence.
The inventor finds that at night, factors such as the light intensity around a traffic sign are reflected in the captured image as obvious differences in how recognizable the sign is; in particular, when other strong light sources surround the sign at night, the sign is difficult to separate from the scene. Traditional image detection models therefore struggle to meet practical application requirements. Since many complex night-time recognition conditions inevitably occur in intelligent transportation applications, it is necessary to study traffic sign detection and recognition under night conditions in depth to improve its effect in practical applications.
Disclosure of Invention
The invention provides a deep-learning-based method for building a night traffic sign image detection model, aiming to solve problems of existing image detection models in night traffic sign recognition, such as inaccurate localization, difficult feature extraction, and a high recognition failure rate, and to improve both the recognition effect and the recognition accuracy for night traffic signs.
The invention is realized by the following technical scheme:
the night traffic sign image detection model building method based on deep learning comprises the following steps:
collecting images containing night traffic signs as a training set;
clustering prior frames;
an attention mechanism is introduced into the YOLOv3 feature extraction network to obtain an improved YOLOv3 network; the attention mechanism realizes spatial attention through a shallow feature map and channel attention through a deep feature map;
determining a frame loss function;
and inputting the training set into an improved YOLOv3 network for training to obtain a night traffic sign image detection model.
Against the prior art's problems of inaccurate localization, difficult feature extraction, and high recognition failure rate when image detection models recognize night traffic signs, the invention provides a deep-learning-based method for building a night traffic sign image detection model. The deep learning network used in the method takes YOLOv3 as its basic framework and is improved in the attention mechanism it introduces.
The inventor finds in research that when generating a night traffic sign recognition model, the feature maps of deep network layers are spatially small but have many channels, and that in blurred, poorly lit night images the extracted channel weights generalize well and concentrate on certain specific features. The method therefore realizes spatial attention through the shallow feature map and channel attention through the deep feature map: the deep feature network carries out the weight distribution of the channel attention mechanism, and the shallow feature network carries out the weight distribution of the spatial attention mechanism. Compared with the traditional CBAM attention mechanism, more channel attention channels are extracted and their weights are fully expressed, making it easier to improve the accuracy of night traffic sign detection and recognition.
Further, the output of the attention mechanism is obtained as follows:
determining, among the feature maps generated by the residual structure of the YOLOv3 network, a deep feature map from a deep network layer and a shallow feature map from a shallow network layer, where the shallow feature map is twice as large as the deep feature map;
taking the deep feature map as the first input of the attention mechanism to obtain a new feature map carrying channel importance differences, called the first feature map;
taking the shallow feature map as the second input of the attention mechanism to obtain a new feature map carrying spatial pixel weight relationships, called the second feature map;
concatenating the first feature map and the second feature map to obtain the output of the attention mechanism.
This scheme specifies the operation of the improved attention mechanism in more detail. As is standard in the field, the backbone feature extraction network of YOLOv3 is Darknet53; the whole backbone is formed by residual convolutions, repeated as Resblock_body residual structures. After image features are extracted by the convolutional network, a residual structure generates feature maps with many channels. The scheme processes feature maps from a deep network layer and a shallow network layer separately, with the shallow feature map twice as large as the deep one: when the deep feature map is 13 × 13, the shallow feature map is 26 × 26. Thus the deep feature network realizes the weight distribution of the channel attention mechanism to give the first feature map, the shallow feature network realizes the weight distribution of the spatial attention mechanism to give the second feature map, and the two are finally concatenated as the output of the improved attention mechanism.
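The attention data flow described above can be sketched with NumPy. The array shapes (a 256-channel 13 × 13 deep map and a 128-channel 26 × 26 shallow map), the random inputs, and the nearest-neighbour up-sampling of the weighted deep map are illustrative assumptions, not the patent's learned operations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes for illustration: deep map 13x13, shallow map 26x26 (twice as large).
deep = rng.random((256, 13, 13))     # (C1, H, W)
shallow = rng.random((128, 26, 26))  # (C2, 2H, 2W)

# Channel attention from the deep map: one weight per channel (sigmoid-squashed).
channel_w = 1.0 / (1.0 + np.exp(-deep.mean(axis=(1, 2))))   # (C1,)
first = deep * channel_w[:, None, None]                      # weighted deep map

# Spatial attention from the shallow map: one weight per pixel.
spatial_w = 1.0 / (1.0 + np.exp(-shallow.mean(axis=0)))      # (2H, 2W)
second = shallow * spatial_w[None, :, :]                     # weighted shallow map

# Upsample the weighted deep map 2x (nearest neighbour stands in for the
# patent's micro-step convolution) and concatenate along channels.
first_up = first.repeat(2, axis=1).repeat(2, axis=2)         # (C1, 2H, 2W)
out = np.concatenate([first_up, second], axis=0)             # (C1+C2, 2H, 2W)
print(out.shape)  # (384, 26, 26)
```

The mean-then-sigmoid weighting is only a placeholder for the learned attention branches; the point is the shape discipline of the two inputs and the concatenated output.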
Further, the first feature map is obtained as follows:
performing max pooling and average pooling on the deep feature map to obtain the maximum and the average value of each channel;
performing global average pooling and global max pooling to obtain channel feature descriptors in the two dimensions of average pooling and max pooling;
passing the two descriptors through a shared fully connected layer, keeping their dimensionality unchanged, and adding the two shared results to obtain a global channel feature descriptor; applying a Sigmoid activation to this descriptor to obtain channel weight coefficients of dimension C1 × 1, where C1 is the number of channels of the input deep feature map;
performing micro-step convolution on the deep feature map to obtain a third feature map;
multiplying the third feature map by the channel weight coefficients to obtain the first feature map.
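A minimal NumPy sketch of these channel attention steps, with a random matrix standing in for the learned shared fully connected layer; the micro-step convolution is omitted and the weights are applied to the input map directly:

```python
import numpy as np

rng = np.random.default_rng(1)
C1 = 8
deep = rng.random((C1, 13, 13))

# Global average and global max pooling give two C1-dimensional channel descriptors.
avg_desc = deep.mean(axis=(1, 2))          # (C1,)
max_desc = deep.max(axis=(1, 2))           # (C1,)

# Shared fully connected layer (same weight matrix for both descriptors,
# dimensionality kept at C1 as the patent requires). The weights are random
# stand-ins for learned parameters.
W = rng.standard_normal((C1, C1)) * 0.1
shared = W @ avg_desc + W @ max_desc       # add the two shared results

# Sigmoid activation yields the C1 x 1 channel weight coefficients.
channel_w = 1.0 / (1.0 + np.exp(-shared))  # (C1,)

# The patent multiplies these weights with the micro-step-convolved feature
# map; here they are applied to the input map directly as a stand-in.
first = deep * channel_w[:, None, None]
print(first.shape)
```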
Further, the micro-step convolution of the deep feature map proceeds as follows:
filling the interior of the deep feature map: between every two adjacent feature points in the X and Y coordinate directions, insert their average value, and replicate the last feature point at the end of each direction; when a feature point has no corresponding reference point in the X or Y coordinate direction, fill with the average of its four diagonal neighbours;
then performing a convolution with a 3 × 3 kernel.
A feature point with no corresponding reference point is one that lies on the boundary and has no outward (up, down, left, or right) partner with which to form the aforementioned pair of feature points; for such a point, the average of its four diagonal neighbours is filled between it and the edge of the feature map.
For the channel attention mechanism, the existing approach to up-sampling in channel convolution is to simply copy and fill along the feature map boundary, which destroys the original spatially distributed deep semantic information of a night image; with dim, unclear pictures under weak illumination, such simple up-sampling noticeably reduces night traffic sign detection accuracy. This scheme instead uses the improved micro-step convolution in place of plain up-sampling: filling the mean of two adjacent points between feature points preserves the spatial structure and contextual semantic information of the deep network to the greatest extent, stably improves detection under conditions such as missed detection, limited pixels, and small targets, and substantially alleviates the problems of existing night traffic sign recognition techniques.
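One reading of this filling rule can be implemented directly. The function below upsamples a 2-D map by inserting midpoints between neighbouring points, replicating the last point at each border, and averaging the four diagonal neighbours at the interleaved positions; the subsequent 3 × 3 convolution is omitted:

```python
import numpy as np

def microstep_fill(x):
    """2x upsample a 2-D map by inserting midpoints between neighbouring
    feature points, replicating the last point at each border, and filling
    positions with no axis-aligned pair with the mean of the four diagonal
    neighbours (one reading of the patent's filling rule)."""
    h, w = x.shape
    # Pad by replicating the last row/column so every point has a neighbour.
    p = np.pad(x, ((0, 1), (0, 1)), mode="edge")
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = x                                    # original points
    out[0::2, 1::2] = (p[:h, :w] + p[:h, 1:]) / 2          # X-direction midpoints
    out[1::2, 0::2] = (p[:h, :w] + p[1:, :w]) / 2          # Y-direction midpoints
    out[1::2, 1::2] = (p[:h, :w] + p[:h, 1:] + p[1:, :w]
                       + p[1:, 1:]) / 4                    # diagonal average
    return out

x = np.array([[1.0, 3.0],
              [5.0, 7.0]])
y = microstep_fill(x)
print(y)
```

On this 2 × 2 input the inserted values are exact midpoints of their neighbours, and the border rows and columns are replicated, which is the behaviour the filling rule describes.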
Further, the second feature map is obtained as follows:
performing local maximum coding and local average coding on the feature points of the shallow feature map on each of its C2 channels, yielding the coding results F_pmax and F_pavg respectively;
for each of F_pmax and F_pavg, generating along the channel direction a feature vector at each of the H2 × W2 feature points, where H2 is the height and W2 the width of the input shallow feature map;
convolving the H2 × W2 vectors with a C2 × 1 vector convolution to obtain two weight maps that have been coded and convolved;
forwarding the two weight maps to a convolutional layer to generate a weight map with only one channel, and activating it with a Sigmoid function to obtain the spatial attention weight matrix;
multiplying the shallow feature map by the spatial attention weight matrix to obtain the second feature map.
Further, the feature tensor of the shallow feature map is F2 ∈ R^(C2 × H2 × W2), where F2 is the input shallow feature map. As is common knowledge in the art, R here is simply the notation for the set of real numbers and carries no special meaning.
Further, in the local maximum coding and local average coding, the filter has width D, height D, and stride 1, where D is determined by a ratio r.
The inventor finds in research that the detection accuracy for traffic sign samples under weak night illumination is far lower than under sufficient illumination, mainly because the image's feature expression is not clear enough and the feature extraction network cannot effectively extract the spatial structure information of the features. To overcome this, when realizing spatial attention on the shallow network, the scheme abandons the prior-art practice of taking the maximum and average across the channels of each feature point, and instead performs local maximum coding and local average coding on each channel with a D × D filter of stride 1. This enlarges the receptive field, enhances the local features of the preprocessed night traffic sign image, and avoids the loss of deep features caused by taking maxima and averages over channel feature points in the prior art.
Moreover, the scheme applies a C2 × 1 vector convolution to the vector formed at each feature point across channels, a two-dimensional weight convolution transformation that effectively captures the contextual semantic features of the same feature point across different channels, strengthens the expressive power of the weight map, improves the model's ability to detect prominent features in preprocessed dim night images, and further improves night traffic sign recognition.
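The spatial attention pipeline can be sketched as follows; the channel count, map size, filter width D, and the random C2 × 1 kernel are illustrative assumptions, and the final merging convolution is reduced to a simple sum:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(2)
C2, H2, W2, D = 4, 8, 8, 3       # assumed sizes; D is the local filter width
shallow = rng.random((C2, H2, W2))

# Local max / average coding: D x D window, stride 1, edge padding keeps the
# spatial size at H2 x W2 (per channel, not across channels).
pad = D // 2
p = np.pad(shallow, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
win = sliding_window_view(p, (D, D), axis=(1, 2))   # (C2, H2, W2, D, D)
F_pmax = win.max(axis=(3, 4))
F_pavg = win.mean(axis=(3, 4))

# A C2 x 1 vector convolution along the channel direction collapses each
# coding result to a single-channel weight map (random stand-in kernel).
k = rng.standard_normal(C2)
w_max = np.tensordot(k, F_pmax, axes=(0, 0))        # (H2, W2)
w_avg = np.tensordot(k, F_pavg, axes=(0, 0))

# Merge the two weight maps and activate with a sigmoid to get the
# spatial attention weight matrix, then reweight the shallow map.
spatial_w = 1.0 / (1.0 + np.exp(-(w_max + w_avg)))
second = shallow * spatial_w[None, :, :]
print(second.shape)
```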
Further, the clustering of prior boxes uses a K-Means++ algorithm whose centroid calculation is based on:
Δx = |C_{i-1} − x|
where C_i is the clustering center, x is a sample point, |C_i| is the number of samples in the class, C_{i-1} is the clustering center of the previous iteration, Δx is the Manhattan distance between a sample point and C_{i-1}, and Δx_M is the Manhattan distance between the median M and C_{i-1}.
In the existing K-Means++ algorithm, the centroid of a cluster is taken as the clustering center for the next round of computation, so a small amount of data can strongly distort the average, making results unstable or even wrong and hindering accurate night traffic sign detection. To overcome this, the scheme optimizes the centroid calculation of K-Means++: weights are adjusted when electing the centroid, and the median M is added as a critical point. This removes the prior art's sensitivity to noise and isolated-point data and is more conducive to accurate detection of traffic signs at night.
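Since the patent's exact centroid formula is not reproduced here, the sketch below shows only one possible reading of the median-based weight adjustment: the Manhattan distances Δx and Δx_M are computed as defined above, and samples farther from the previous center than the median is contribute less to the new centroid:

```python
import numpy as np

rng = np.random.default_rng(3)
pts = rng.random((20, 2)) * 10       # sample points of one cluster (toy data)
c_prev = pts.mean(axis=0)            # previous-iteration cluster center

# Manhattan distance of each sample point to the previous center (delta x),
# and of the per-coordinate median M to the previous center (delta x_M).
dx = np.abs(pts - c_prev).sum(axis=1)            # one distance per sample
M = np.median(pts, axis=0)                       # median, used as critical point
dx_M = np.abs(M - c_prev).sum()

# Hypothetical weight adjustment: samples farther from the previous center
# than the median contribute less, damping noise and isolated points.
w = np.where(dx <= dx_M, 1.0, dx_M / dx)
c_new = (pts * w[:, None]).sum(axis=0) / w.sum()
print(c_new)
```

The damping rule itself is an assumption; the defined quantities (Manhattan distance, median as critical point) come from the text.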
Further, the determined frame loss function comprises a confidence error, a positioning error and a classification error;
wherein, the confidence error and the classification error are calculated by adopting cross entropy error;
the positioning error is calculated by adopting the following method:
calculating the intersection ratio IOU of the prediction frame and the real frame;
calculating the target localization loss L_loc, in which b and b^gt are the center points of the prediction box and the real box respectively, ρ(·) denotes the Euclidean distance, c is the diagonal length of the minimum circumscribed rectangle of b and b^gt, h and h^gt are the diagonals of the prediction box and the real box respectively, α and v are influence factors, and β is the angle between the diagonals of the prediction box and the real box.
When computing the localization loss, the loss function must consider not only that the prediction box moves toward the target box but also the center-point distance and overlap area of the bounding boxes; the angle between the diagonal of the target box and the diagonal of the anchor box also matters. The scheme therefore introduces the influence factors α and v, so that the ratio of, and the angle between, the diagonals of the prediction box and the real box enter the loss calculation.
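The patent's localization loss formula is an image that did not survive extraction; the widely used CIoU loss is shown below as a stand-in because it combines the same named ingredients (IoU, center distance ρ, enclosing-box diagonal c, influence factors α and v):

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU-style localization loss over (x1, y1, x2, y2) boxes; a stand-in
    for the patent's unreproduced formula, built from the same named
    ingredients (IoU, center distance, enclosing-box diagonal, alpha, v)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union
    # Squared center distance over squared enclosing-box diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio influence factors v and alpha.
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

# A perfectly matching box gives zero loss.
print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```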
Further, the method also preprocesses the images in the training set:
computing the brightness of the image and, if it is below 130, applying adaptive Gamma correction for brightness enhancement, in which the Gamma value γ is computed from the average image brightness X;
acquiring the corner feature of the image by using a Shi-Tomasi corner detection algorithm;
and performing data enhancement on the night traffic sign images with the sample number less than the set threshold value.
Against the failure of night traffic sign image detection in the prior art, the scheme also provides a dedicated image preprocessing method. Its core idea is to introduce brightness-adaptive adjustment into adaptive Gamma correction, adjusting image brightness and contrast. Contrast here refers to the spread of pixel values: increasing both the pixel values and the spacing between them improves the discrimination between the traffic sign and the background. Images with different illumination levels are processed adaptively, adjusting brightness and contrast according to a reasonable pixel distribution range for each illumination condition. Traffic sign images are mostly triangular or rectangular, and the sign's outline has obvious corners; detecting the feature expression of these salient corner regions gives highly discriminative features that can be located stably. The corner features of the night traffic sign image are therefore extracted and fused into the training data set of the convolutional neural network, which markedly improves the model's feature extraction for night traffic signs.
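A hedged sketch of the brightness step: the threshold of 130 comes from the text, while the specific formula gamma = log(0.5) / log(X / 255) is a common adaptive choice used purely as a stand-in for the patent's unreproduced γ formula:

```python
import numpy as np

def adaptive_gamma(img):
    """Brighten dark images with gamma correction. The patent computes gamma
    from the mean brightness X via a formula not reproduced here;
    gamma = log(0.5) / log(X / 255) is a common adaptive choice used as a
    stand-in. Images already brighter than 130 are left untouched."""
    X = img.mean()
    if X >= 130:
        return img
    gamma = np.log(0.5) / np.log(X / 255.0)   # < 1 for dark images -> brightens
    out = 255.0 * (img / 255.0) ** gamma
    return out.astype(img.dtype)

dark = np.full((4, 4), 60, dtype=np.uint8)    # mean brightness 60 < 130
bright = adaptive_gamma(dark)
print(bright.mean() > dark.mean())  # True
```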
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The deep-learning-based night traffic sign image detection model building method uses a deep feature network to realize the weight distribution of the channel attention mechanism and a shallow feature network to realize the weight distribution of the spatial attention mechanism, concatenating the two as the output of the improved attention mechanism.
2. For the channel attention mechanism, the method replaces traditional up-sampling with improved micro-step convolution, filling the mean of two adjacent points between feature points; this preserves the spatial structure and contextual semantic information of the deep network to the greatest extent, stably improves detection under missed-detection, limited-pixel, and small-target conditions, and substantially alleviates the problems of existing night traffic sign image recognition techniques.
3. For the spatial attention mechanism, the method abandons the prior-art practice of taking the maximum and average across the channels of each feature point and instead performs local maximum coding and local average coding on each channel, enlarging the receptive field, enhancing the local features of the preprocessed night traffic sign image, and avoiding the deep-feature loss caused by taking maxima and averages over channel feature points.
4. The method applies a two-dimensional weight convolution transformation to the vector formed at each feature point across channels, effectively capturing the contextual semantic features of the same feature point across different channels, strengthening the expressive power of the weight map, and improving the model's detection of prominent features in preprocessed dim night images, further improving night traffic sign recognition.
5. The method optimizes the centroid calculation of the K-Means++ algorithm, adjusting weights when electing the centroid and adding the median M as a critical point, removing the prior art's sensitivity to noise and isolated-point data and facilitating accurate night traffic sign detection.
6. The method introduces the influence factors α and v into the loss function, so that the diagonal ratio and included angle of the prediction box and real box enter the loss calculation, improving detection precision.
7. In image preprocessing, the method introduces brightness-adaptive adjustment into adaptive Gamma correction to adjust image brightness and contrast; it accurately locates stable features, extracts the corner features of the night traffic sign image, and fuses them into the training data set of the convolutional neural network, markedly improving the model's feature extraction for night traffic signs.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a schematic diagram of a network architecture in an embodiment of the present invention;
FIG. 3 is a schematic illustration of an attention mechanism in an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating comparison between before and after image preprocessing according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention. In the description of the present application, it is to be understood that the terms "front", "back", "left", "right", "upper", "lower", "vertical", "horizontal", "high", "low", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus should not be construed as limiting the scope of the present application.
Example 1:
the method for establishing the night traffic sign image detection model based on deep learning shown in fig. 1 comprises the following steps:
collecting images containing night traffic signs as a training set;
clustering prior frames;
an attention mechanism is introduced into a YOLOv3 feature extraction network to obtain an improved YOLOv3 network; the attention mechanism realizes space attention through a shallow feature map and realizes channel attention through a deep feature map;
determining a frame loss function;
and inputting the training set into an improved YOLOv3 network for training to obtain a night traffic sign image detection model.
The workflow of the attention mechanism is shown in fig. 3, and includes:
determining, among the feature maps generated by the YOLOv3 network residual structure, a deep feature map from a deep layer of the network and a shallow feature map from a shallow layer of the network; wherein the shallow feature map is twice the size of the deep feature map;
taking the deep feature map as the first input of the attention mechanism to obtain a new feature map with channel importance differences as the first feature map;
taking the shallow feature map as the second input of the attention mechanism to obtain a new feature map with spatial pixel weight relationships as the second feature map;
and performing connection operation on the first characteristic diagram and the second characteristic diagram to obtain the output of the attention mechanism.
In this embodiment, the method for obtaining the first feature map includes:
performing global maximum pooling and global average pooling operations on the deep feature map to obtain the maximum value and the average value of each channel, the main aim being to aggregate spatial information and extract per-channel descriptors:
let the input deep feature map be F1 ∈ R^(C1×H1×W1) and the shallow feature map be F2 ∈ R^(C2×H2×W2), with C2 = 2C1, H2 = 2H1, W2 = 2W1; F1 is the input deep feature map, C1 is the number of channels of the deep feature map, H1 is the height of the deep feature map, W1 is the width of the deep feature map, C2 is the number of channels of the shallow feature map, H2 is the height of the shallow feature map, and W2 is the width of the shallow feature map;
and executing global average pooling and global maximum pooling to obtain channel feature quantities in the two dimensions of average pooling and maximum pooling, which preliminarily characterize the importance among different channels; the global average pooling and global maximum pooling processes are as follows:
f_avg(X) = (1 / (H1 · W1)) · Σ_{i=1}^{H1} Σ_{j=1}^{W1} X_k(i, j),  f_max(X) = max_{1≤i≤H1, 1≤j≤W1} X_k(i, j)
wherein f_avg(X) and f_max(X) represent the global average pooling function and the global maximum pooling function respectively, X_k(i, j) represents the pixel value with coordinates (i, j), and k denotes the k-th channel.
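The two pooling descriptors above can be sketched in NumPy; this is a minimal illustration, and the array names are hypothetical:

```python
import numpy as np

def global_avg_pool(x):
    # x: (C, H, W) feature tensor -> (C,) per-channel averages
    return x.mean(axis=(1, 2))

def global_max_pool(x):
    # x: (C, H, W) feature tensor -> (C,) per-channel maxima
    return x.max(axis=(1, 2))

# toy deep feature map: 2 channels of 2x2
f1 = np.array([[[1.0, 2.0], [3.0, 4.0]],
               [[0.0, 0.0], [0.0, 8.0]]])
print(global_avg_pool(f1))  # [2.5 2. ]
print(global_max_pool(f1))  # [4. 8.]
```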
Sharing the fully connected layer across the channel feature quantities of the two dimensions, keeping the feature dimension unchanged during sharing, and adding the two shared results to obtain the global channel feature quantity; nonlinear activation of the global channel feature quantity with the Sigmoid function yields channel weight coefficients of dimension C1×1;
and carrying out micro-step convolution on the deep feature map: the deep feature map is first filled internally; the average value of every two feature points is inserted in the X coordinate direction and the Y coordinate direction of the feature map, and the last feature point is copied at the tail of the X coordinate direction and the Y coordinate direction; when a feature point has no corresponding reference point in the X coordinate direction or the Y coordinate direction, the average value of its four diagonal neighbours is used for filling;
In this embodiment, the X coordinate direction is taken as the example; the specific filling in the X coordinate direction is as follows:
when i is even and j is odd (i ∈ H2, j ∈ W2, i = 2k, j = 2k + 1, k a positive integer), the inserted point takes the average of its two adjacent original feature points:
X(i, j) = [X(i − 1, j) + X(i + 1, j)] / 2
when i and j are both odd (i ∈ H2, j ∈ W2, i = 2k + 1, j = 2k + 1, k a positive integer), the tail points are copied:
X(i, W2) = X(i − 1, W2 − 1), X(H2, j) = X(H2 − 1, j − 1)
The Y coordinate direction is handled in the same way.
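The interior filling along one coordinate direction can be sketched on a single row, following the averaging-and-tail-copy rule described above; this is a minimal 1-D illustration, and the function name is hypothetical:

```python
import numpy as np

def microstep_fill_1d(row):
    # Insert the average of each adjacent pair between the original
    # points, and copy the last point at the tail, doubling the length.
    out = []
    for a, b in zip(row[:-1], row[1:]):
        out.extend([a, (a + b) / 2.0])
    out.extend([row[-1], row[-1]])  # tail: copy the last feature point
    return np.array(out)

r = np.array([1.0, 3.0, 5.0])
print(microstep_fill_1d(r))  # [1. 2. 3. 4. 5. 5.]
```

Applying the same rule along the other axis yields the 2H1×2W1 filled map to which the 3×3 convolution is then applied.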
Then, as the last step of the micro-step convolution, a convolution with kernel size 3×3 is performed to obtain the third feature map F′1;
multiplying the third feature map by the channel weight coefficients gives a brand-new feature map F″1, defined as the first feature map:
F′1 = f_conv1(F1)
F″1 = δ[W0 · f_max(F1) + W1 · f_avg(F1)] · F′1
wherein f_conv1 represents the micro-step convolution, W0 and W1 are the weight coefficient matrices of the shared fully connected layer, and δ represents the Sigmoid activation function.
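A minimal sketch of the channel-attention weighting above, assuming identity matrices for the shared fully connected layer and omitting the micro-step convolution; all names here are illustrative, not the embodiment's exact operators:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention_weights(f1, w0, w1):
    # f1: (C1, H1, W1); w0, w1: shared FC weight matrices of shape (C1, C1)
    f_max = f1.max(axis=(1, 2))                 # per-channel maxima
    f_avg = f1.mean(axis=(1, 2))                # per-channel averages
    return sigmoid(w0 @ f_max + w1 @ f_avg)     # (C1,) weights in (0, 1)

f1 = np.ones((2, 4, 4))
f1[1] *= 5.0                 # channel 1 carries a stronger response
w = np.eye(2)                # identity FC weights, for illustration only
cw = channel_attention_weights(f1, w, w)
print(cw)                    # channel 1 receives the larger weight
```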
In this embodiment, the method for obtaining the second feature map includes:
let the input feature tensor of the shallow feature map be F2 ∈ R^(C2×H2×W2); on each of the C2 channels, local maximum coding and local average coding are carried out on the feature points with a D×D filter, a step length of 1 and copied (replicate) padding, giving the coding results F_pmax and F_pavg, wherein D is determined by a ratio r; the concrete formula is as follows:
F_pmax = f_pmax(X(i, j)), F_pavg = f_pavg(X(i, j)), for i = 1, j = 1, …, D
then, on F_pmax and F_pavg respectively, feature vectors of dimension C2×1 are generated along the channel direction at each of the H2×W2 feature points;
using C2×1 vector convolutions, the H2×W2 vectors are convolved to obtain two weight maps, one having undergone coding and one convolution; the mappings of the two weight maps are forwarded to a convolutional layer to generate a weight map with only one channel, and the Sigmoid function is used for activation to obtain the spatial attention weight matrix; the shallow feature map is multiplied by the spatial attention weight matrix to obtain the second feature map F′2; the concrete formula is as follows:
F′2 = δ[f_conv2(f_conc(F_conv1, F_conv2))] · F2
in the formula, X_pmax(i, j) are feature pixel points of the self-defined coding function, f_conv1 is the vector convolution, f_conv2 is the channel-adjusting convolution, f_conc denotes the connection operation, and δ denotes the Sigmoid activation function.
In this embodiment, the formula for the connection operation on the first feature map F″1 and the second feature map F′2, giving the output of the attention mechanism, is expressed as: F_CBAM-1 = f_conc(F″1, F′2).
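The spatial-attention branch can be sketched as follows; the C2×1 vector convolutions are replaced here by a simple mean across channels as a stand-in, so this is only a structural illustration under stated assumptions, not the embodiment's exact operator:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_pool(x, d, mode):
    # D x D sliding window, stride 1, replicate padding, on a (H, W) map.
    p = d // 2
    xp = np.pad(x, p, mode="edge")
    h, w = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            win = xp[i:i + d, j:j + d]
            out[i, j] = win.max() if mode == "max" else win.mean()
    return out

def spatial_attention(f2, d=3):
    # f2: (C2, H2, W2). Pool each channel locally, reduce across channels
    # (a stand-in for the vector convolutions), fuse, gate with Sigmoid.
    fmax = np.stack([local_pool(c, d, "max") for c in f2]).mean(axis=0)
    favg = np.stack([local_pool(c, d, "avg") for c in f2]).mean(axis=0)
    weight = sigmoid(fmax + favg)        # (H2, W2) spatial weight matrix
    return f2 * weight                   # broadcast over the channels

f2 = np.random.default_rng(0).random((2, 4, 4))
out = spatial_attention(f2)
print(out.shape)  # (2, 4, 4)
```

Because the Sigmoid gate lies in (0, 1), the weighted map never exceeds the input, which is the intended re-weighting behaviour.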
In a more preferred embodiment, the training set collected preferably contains all possible traffic signs present in the current area.
Example 2:
A night traffic sign image detection model building method based on deep learning, in which, on the basis of embodiment 1, the K-Means++ algorithm is used to cluster the prior frames and divide them into 9 scales.
In this embodiment, weight adjustment is performed when the K-Means + + algorithm elects the centroid, a median M is added as a critical point, and a centroid calculation formula is as follows:
Δx=|C i-1 -x|
in the formula :Ci Is the clustering center, x is the sample point, | C i I is the number of samples of this class, C i-1 For the clustering center of the last iteration, Δ x is the sample point and C i-1 Manhattan distance, Δ x M Is the median M and C i-1 Manhattan distance of (a).
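Since the embodiment's median-weighted centroid formula is not reproduced in the source, the sketch below shows only standard K-Means++ seeding, using the Manhattan distance named above and the embodiment's 9 prior-frame scales; all names are illustrative:

```python
import numpy as np

def kmeanspp_seed(points, k, rng):
    # Standard K-Means++ seeding with Manhattan (L1) distance; the
    # median-M weighting step of the embodiment is omitted because its
    # exact formula is not given in the source.
    centers = [points[rng.integers(len(points))]]
    while len(centers) < k:
        d = np.min([np.abs(points - c).sum(axis=1) for c in centers], axis=0)
        probs = d / d.sum()              # farther points are more likely
        centers.append(points[rng.choice(len(points), p=probs)])
    return np.array(centers)

rng = np.random.default_rng(42)
wh = rng.random((100, 2)) * 100          # mock (width, height) frame sizes
seeds = kmeanspp_seed(wh, 9, rng)        # 9 prior-frame scales
print(seeds.shape)  # (9, 2)
```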
Example 3:
in the night traffic sign image detection model establishing method based on deep learning, on the basis of the embodiment 1 or 2, a frame loss function comprises a confidence error, a positioning error and a classification error;
wherein, the confidence error and the classification error are calculated by adopting a cross entropy error;
the positioning error is calculated by adopting the following method:
calculating the intersection ratio IOU of the prediction frame and the real frame;
calculating the target localization loss L_loc:
in the formula, b and b_gt are respectively the center points of the real frame and the prediction frame, ρ(·) represents the Euclidean distance, c is the diagonal distance of the minimum circumscribed rectangle of b and b_gt, h and h_gt are respectively the diagonals of the prediction frame and the real frame, α and v are both influence factors, and β is the angle between the diagonals of the prediction frame and the real frame.
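The localization-loss formula itself is not reproduced in the source; its first step, computing the intersection-over-union of a prediction frame and a real frame, can be sketched as:

```python
def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

pred, gt = (0, 0, 2, 2), (1, 1, 3, 3)
print(iou(pred, gt))  # 1/7, about 0.1428
```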
Example 4:
The night traffic sign image detection model building method based on deep learning further comprises, on the basis of any one of the above embodiments, preprocessing the images in the training set.
The embodiment introduces image brightness adaptive adjustment in adaptive Gamma correction:
calculating the brightness of the image, and if the brightness is lower than 130, introducing adaptive Gamma correction for image brightness enhancement; in the introduced adaptive Gamma correction, the Gamma value γ is calculated from the image average brightness X;
after the gamma balance, the image becomes clearer, the traffic sign pattern highlights the color, the shape and the symbol, basic elements such as night reflection and the like are displayed, then the shit-Tomasi corner detection algorithm is used for acquiring the corner features of the night traffic sign image, and the score formula is as follows: r =min(λ 1 ,λ 2 ) (ii) a Wherein R is a fraction of lambda 1 The gradient change information, lambda, of the pixel point in the x direction is represented 2 The gradient change information of the pixel point in the y direction is represented, and if the fraction is larger than a set threshold value, the pixel point is regarded as a corner point.
And finally, performing data enhancement on the night traffic sign images with the number of samples less than the set threshold value.
Experiments have shown that a gamma of 2.2 is most suitable.
The comparison before and after preprocessing by the method of this embodiment is shown in fig. 4, where the left image in fig. 4 is the original image and the right image is the image preprocessed by the method of this embodiment.
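The brightness-adaptive Gamma step can be sketched as follows; the embodiment's own γ(X) formula is not reproduced in the source, so a fixed γ = 2.2 and the common brightening convention out = 255·(I/255)^(1/γ) are assumed here:

```python
import numpy as np

def adaptive_gamma(img, gamma=2.2, threshold=130):
    # Brighten only images whose mean brightness is below the threshold.
    # gamma and the brightening convention are assumptions; the source
    # computes gamma adaptively from the mean brightness X.
    if img.mean() >= threshold:
        return img
    return (255.0 * (img / 255.0) ** (1.0 / gamma)).astype(np.uint8)

dark = np.full((4, 4), 40, dtype=np.uint8)   # a uniformly dark patch
bright = adaptive_gamma(dark)
print(dark.mean(), bright.mean())            # the mean brightness rises
```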
Example 5:
a night traffic sign image detection model building method based on deep learning comprises the following specific processes:
step 1: data acquisition and preprocessing
1.1. Night images containing 45 classes of traffic signs, such as prohibition, indication, warning and other signs, are collected with a vehicle-mounted camera; the side length of most traffic signs in the images lies between 500 and 1000 pixels. Repeated images and data with blurred targets are screened out and deleted.
1.2. The acquired images are made into a data set in VOC format. The images are annotated with the LabelMe tool to obtain xml files, each containing the width and height of the image and the position and category information of the target detection frames. For traffic signs with sparse samples, the night images are data-enhanced using the improved adaptive Gamma correction described in example 4 to adjust image brightness and contrast, together with Shi-Tomasi corner detection processing.
1.3. The processed image data are placed under the JPEGImages folder and the image annotation data under the Annotations folder. All annotated data are divided into training set data and test data according to the proportion of 5.
Step 2: the improved YOLOv3 network model shown in fig. 2 is built:
2.1. The backbone feature extraction network of YOLOv3 is Darknet53; the residual convolution in Darknet53 is divided into two parts by the Residual structure: the trunk part is a 1×1 convolution followed by a 3×3 convolution, while the residual edge part is left unprocessed and the trunk's input and output are combined directly.
2.2. The whole trunk part is composed of such residual convolutions, repeating the Resblock_body residual structure 1+2+8+8+4 times.
2.3. The convolutional backbone network of the YOLOv3 model is used to extract the multi-scale features of the night traffic signs in the data set; three feature layers are extracted in total, located at different positions of the trunk Darknet53 (the middle, the lower-middle and the bottom layer), with shapes (52, 52, 256), (26, 26, 512) and (13, 13, 1024) respectively.
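The three grid sizes follow directly from the backbone's downsampling strides, assuming the usual 416×416 YOLOv3 input, which the source does not state explicitly:

```python
# Darknet-53 emits the three feature layers at strides 8, 16 and 32;
# for a 416x416 input this gives the 52, 26 and 13 grids quoted above.
input_size = 416
strides = [8, 16, 32]
channels = [256, 512, 1024]
shapes = [(input_size // s, input_size // s, c)
          for s, c in zip(strides, channels)]
print(shapes)  # [(52, 52, 256), (26, 26, 512), (13, 13, 1024)]
```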
2.4. The feature maps generated by the residual structure in Darknet-53 are then processed with the modified CBAM-1 described in example 1: spatial attention is realized with the shallow feature map and channel attention with the deep feature map; the first input is the 13×13 feature map from the deep layers of the network, and the second input is the 26×26 feature map from the shallow layers.
2.5. First, global maximum pooling and global average pooling are performed in parallel on the 13×13 feature map to obtain the maximum value and the average value of each channel, giving two-dimensional channel feature quantities that preliminarily characterize the relative importance of the channels.
2.6. The channel feature quantities pass directly through the shared fully connected layer, their dimension unchanged in the process, and the two processed results are added. Nonlinear activation with the Sigmoid function then yields channel weight coefficients of dimension C1×1.
2.7. The input 13×13 feature map then undergoes micro-step convolution: internal filling is performed first, the average of every two feature points is inserted in the X and Y coordinate directions, the last feature point is copied at the tail, and where no reference point exists above, below, left or right, the average of the four diagonal neighbours is taken.
2.8. Finally, the feature map is multiplied by the channel weight coefficients to obtain a new feature map F″1 with channel-importance differences.
2.9. Spatial attention is then realized with the 26×26 feature map: local maximum coding and local average coding with a D×D filter are performed on the feature points per channel, and feature vectors are generated along the channel direction at each spatial position.
2.10. Vector convolution is applied to these vectors, giving two weight maps that have undergone coding and convolution respectively. Their mappings are forwarded to a convolutional layer to generate a single-channel weight map, and the final spatial attention weight matrix is obtained through Sigmoid activation.
2.11. The weight matrix is multiplied by the input feature map to obtain the feature map with spatial pixel-weight relationships.
2.12. Finally, the two feature maps processed by channel attention and spatial attention are connected, serving as the output of the CBAM-1 attention mechanism.
2.13. The 26×26 and 52×52 feature maps are likewise fed through the CBAM-1 attention mechanism, repeating the above process and yielding two enhanced features plus the original 13×13 feature, with shapes (52, 52, 128), (26, 26, 256) and (13, 13, 512); these three feature layers then undergo five convolutions and are passed into YoloHead to obtain the prediction result.
2.14. Score sorting and non-maximum suppression screening are applied to the obtained prediction result.
2.15. The loss must then be computed on the three feature layers; the smallest feature layer is taken as the example here.
2.16. The positions (m, 13, 13, 3, 1) of the points in the feature layer where a target actually exists, and their corresponding classes (m, 13, 13, 3, 45), are retrieved.
2.17. The predicted outputs of yolo_outputs are processed to obtain the reshaped predictions, with shapes (m, 13, 13, 3, 50), (m, 26, 26, 3, 50) and (m, 52, 52, 3, 50) respectively.
2.18. The encoded values of the real frames are obtained and the loss is then calculated; the encoded values have the same meaning as the predicted values and can therefore be used for the loss calculation.
2.19. For each image, the IOU of all real and prediction frames is computed; the prior frame with the largest IOU at each grid point is taken, and if this largest IOU is smaller than ignore_thresh (typically 0.5) the prediction is retained as a negative sample; the purpose of this step is to balance the negative samples.
2.20. The losses of the center and of the width and height are computed; the target loss is obtained by comparing the encoded real frames of step 2.19 with the unprocessed prediction result.
2.21. The confidence loss is computed in two parts: where a target actually exists, the predicted confidence is compared with 1; where no target exists, the maximum IOU value obtained in step 2.19 is compared with 0.
2.22. The class prediction loss is computed as the difference between the predicted class and the real class where a target actually exists.
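The negative-sample balancing of step 2.19 reduces to a simple threshold mask; a sketch using the typical ignore_thresh of 0.5 (names here are illustrative):

```python
import numpy as np

def negative_mask(best_iou, ignore_thresh=0.5):
    # A prediction whose best IOU with any real frame is below the
    # threshold is kept as a negative sample for the confidence loss.
    return best_iou < ignore_thresh

best_iou = np.array([0.1, 0.45, 0.6, 0.9])
print(negative_mask(best_iou))  # [ True  True False False]
```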
Step 3: training with the processed data and the built model:
3.1. The training set is shuffled, and each time a batch of pictures is randomly selected and given data-augmentation preprocessing, including random scaling, translation, image flipping and noise addition. The augmented image data are fed into the model, the error between the predicted and true values of the data is calculated through the loss function, and during back-propagation the weights of the neural network are optimized and updated according to the chain rule.
3.2. The training process is divided into two stages: the first introduces a well-performing model pre-trained on the ImageNet data set; the second updates the whole network's parameters.
3.2.1. The first, frozen-training stage uses a learning rate of 0.001: the structure of the pre-trained model is kept unchanged, the weights of the first k network layers are locked, the unlocked layers are retrained on the traffic sign detection data set, and the network trains for 50 rounds on the training set to obtain new weights;
3.2.2. The second stage uses learning-rate decay and an early-stopping strategy to fine-tune the training parameters: with an initial learning rate of 0.0001, unfrozen training begins from the model structure and parameter weights obtained in the previous stage and continues for 50 rounds on the training set until the loss value starts to converge; the model's loss on the validation set is computed in each period, training stops and the network model is saved when the validation error of a period exceeds that of the previous period; and when the validation loss stays below a predetermined value over k consecutive periods, the learning rate is halved.
After the training is finished, verification is performed with the validation set, completing the model establishment.
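The stage-two schedule (early stop on a rising validation loss, halving the learning rate after k consecutive periods below a target) can be sketched as follows; the thresholds here are illustrative, not the embodiment's exact values:

```python
def train_schedule(val_losses, lr0=1e-4, k=3, target=0.05):
    # Stop when the validation loss rises versus the previous period;
    # halve the learning rate once the loss has stayed below `target`
    # for k consecutive periods.
    lr, below = lr0, 0
    for prev, cur in zip(val_losses, val_losses[1:]):
        if cur > prev:
            return lr, "early-stop"
        below = below + 1 if cur < target else 0
        if below >= k:
            lr, below = lr / 2.0, 0
    return lr, "finished"

print(train_schedule([0.30, 0.20, 0.04, 0.03, 0.02, 0.06]))
# the rate is halved once, then training stops on the final rise
```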
Step 4: the model is tested with the test set.
In this embodiment, the following experimental results are obtained after training the model for 50 epochs:
Original YOLOv3 model + CBAM attention: on a test set with a daytime-to-nighttime picture ratio of 1, the average accuracy is 76.42%, the average recall is 76.73% and the average mAP is 79.54%;
The model of this application: on the same test set with a daytime-to-nighttime picture ratio of 1, the average accuracy is 85.54%, the average recall is 84.92% and the average mAP is 85.51%.
It can be seen that traditional detection methods without any prior information suffer from inaccurate localization, difficult feature extraction and a large waste of computing resources. The invention provides a fast and accurate method for detecting night traffic sign targets, effectively solving the difficulty that a traffic sign is hard to separate from the scene when other strong light sources are present at night.
Example 6:
a computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth in any one of the above embodiments.
All or part of the flow of the method embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed by a processor, the computer program can implement the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electric carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in the jurisdiction.
The processor may be a central processing unit, but may also be other general-purpose processors, digital signal processors, application specific integrated circuits, off-the-shelf programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Claims (10)
1. The night traffic sign image detection model building method based on deep learning is characterized by comprising the following steps of:
collecting images containing night traffic signs as a training set;
clustering prior frames;
an attention mechanism is introduced into a YOLOv3 feature extraction network to obtain an improved YOLOv3 network; the attention mechanism realizes space attention through a shallow feature map and realizes channel attention through a deep feature map;
determining a frame loss function;
and inputting the training set into an improved YOLOv3 network for training to obtain a night traffic sign image detection model.
2. The night traffic sign image detection model building method based on deep learning of claim 1, wherein the attention mechanism is output by the following method:
determining, among the feature maps generated by the YOLOv3 network residual structure, a deep feature map from a deep layer of the network and a shallow feature map from a shallow layer of the network; wherein the shallow feature map is twice the size of the deep feature map;
taking the deep feature map as the first input of the attention mechanism to obtain a new feature map with channel importance differences as the first feature map;
taking the shallow feature map as the second input of the attention mechanism to obtain a new feature map with spatial pixel weight relationships as the second feature map;
and performing connection operation on the first characteristic diagram and the second characteristic diagram to obtain the output of the attention mechanism.
3. The night traffic sign image detection model building method based on deep learning of claim 2, wherein the method for obtaining the first feature map comprises:
performing global maximum pooling and global average pooling operations on the deep feature map to obtain the maximum value and the average value of each channel;
performing global average pooling and global maximum pooling to obtain channel characteristic quantities of two dimensions of average pooling and maximum pooling;
sharing the fully connected layer across the channel feature quantities of the two dimensions, keeping the feature dimension unchanged during sharing, and adding the two shared results to obtain the global channel feature quantity; nonlinear activation of the global channel feature quantity with the Sigmoid function yields channel weight coefficients of dimension C1×1; wherein C1 is the number of channels of the input deep feature map;
carrying out micro-step convolution on the deep characteristic diagram to obtain a third characteristic diagram;
and multiplying the third feature map by the channel weight coefficient to obtain the first feature map.
4. The night traffic sign image detection model building method based on deep learning of claim 3, wherein the method for performing micro-step convolution on the deep feature map comprises the following steps:
and (3) filling the deep layer characteristic diagram inside: filling the average value of every two characteristic points in the X coordinate direction and the Y coordinate direction of the characteristic diagram, and copying the last characteristic point at the tail end of the X coordinate direction and the Y coordinate direction; when a certain characteristic point has no corresponding reference point in the X coordinate direction or the Y coordinate direction, taking the average value of four diagonal angles for filling;
convolution with convolution kernel size of 3 × 3 is performed.
5. The night traffic sign image detection model building method based on deep learning of claim 2, wherein the method for obtaining the second feature map comprises:
performing, on each of the C2 channels, local maximum coding and local average coding on the feature points of the shallow feature map to respectively obtain the coding results F_pmax and F_pavg;
generating, on F_pmax and F_pavg respectively, feature vectors of dimension C2×1 along the channel direction at each of the H2×W2 feature points; wherein H2 is the height of the input shallow feature map and W2 is the width of the input shallow feature map;
using C2×1 vector convolutions to convolve the H2×W2 vectors, obtaining two weight maps that have undergone coding and convolution;
the mapping of the two weight maps is forwarded to a convolution layer, a weight map with only one channel is generated, and a spatial attention weight matrix is obtained by using Sigmoid function activation;
and multiplying the shallow feature map by a space attention weight matrix to obtain the second feature map.
7. The night traffic sign image detection model building method based on deep learning of claim 5, wherein in the local maximum coding and local average coding process, the adopted filter has a width of D, a height of D and a step length of 1, wherein D is determined by a ratio r.
8. The method for building the night traffic sign image detection model based on the deep learning of claim 1, wherein the clustering prior frame is implemented by using a K-Means + + algorithm, and a centroid calculation formula of the K-Means + + algorithm is as follows:
Δx=|C i-1 -x|
in the formula, C_i is the clustering center, x is a sample point, |C_i| is the number of samples of the class, C_{i−1} is the clustering center of the previous iteration, Δx is the Manhattan distance between a sample point and C_{i−1}, and Δx_M is the Manhattan distance between the median M and C_{i−1}.
9. The night traffic sign image detection model building method based on deep learning of claim 1, wherein the determined border loss function includes confidence error, positioning error and classification error;
wherein, the confidence error and the classification error are calculated by adopting cross entropy error;
the positioning error is calculated by adopting the following method:
calculating the intersection ratio IOU of the prediction frame and the real frame;
calculating target location loss L loc :
in the formula, b and b_gt are respectively the center points of the real frame and the prediction frame, ρ(·) represents the Euclidean distance, c is the diagonal distance of the minimum circumscribed rectangle of b and b_gt, h and h_gt are respectively the diagonals of the prediction frame and the real frame, α and v are both influence factors, and β is the angle between the diagonals of the prediction frame and the real frame.
10. The night traffic sign image detection model building method based on deep learning of claim 1, further comprising preprocessing the images in the training set by:
calculating the brightness of the image, and if the brightness is lower than 130, applying adaptive Gamma correction for image brightness enhancement; in the adaptive Gamma correction, the Gamma value γ is calculated as a function of X, wherein X is the average image brightness;
acquiring the corner feature of the image by using a Shi-Tomasi corner detection algorithm;
and performing data enhancement on night traffic sign classes whose number of samples is less than the set threshold.
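The brightness-gated Gamma correction in the preprocessing step can be sketched as follows. The claim's exact γ formula is not legible in the source; γ = log(0.5) / log(X / 255), which maps the mean brightness toward mid-gray, is a common adaptive choice and is used here purely as an assumed stand-in, with the 130 threshold taken from the claim.

```python
import math
import numpy as np

def preprocess_brightness(img, threshold=130.0):
    """Brightness-gated adaptive Gamma correction sketch.

    img: grayscale array with values in [0, 255]. If the mean brightness X
    is below the threshold, an adaptive gamma is derived from X and applied;
    otherwise the image is returned unchanged.
    """
    x = float(img.mean())
    if x >= threshold:
        return img                                   # bright enough, skip
    gamma = math.log(0.5) / math.log(x / 255.0)      # assumed adaptive formula
    norm = np.clip(img / 255.0, 1e-6, 1.0)           # avoid 0 ** gamma issues
    return (norm ** gamma) * 255.0                   # brightened image
```

With this particular γ, a uniformly dark image is lifted so its mean lands at roughly 127.5, which is why the formula is a popular default for low-light enhancement.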
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211342707.1A CN115578615B (en) | 2022-10-31 | 2022-10-31 | Night traffic sign image detection model building method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115578615A true CN115578615A (en) | 2023-01-06 |
CN115578615B CN115578615B (en) | 2023-05-09 |
Family
ID=84586975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211342707.1A Active CN115578615B (en) | 2022-10-31 | 2022-10-31 | Night traffic sign image detection model building method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115578615B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019153908A1 (en) * | 2018-02-11 | 2019-08-15 | 北京达佳互联信息技术有限公司 | Image recognition method and system based on attention model |
CN111667421A (en) * | 2020-05-25 | 2020-09-15 | 武汉大学 | Image defogging method |
WO2020181685A1 (en) * | 2019-03-12 | 2020-09-17 | 南京邮电大学 | Vehicle-mounted video target detection method based on deep learning |
CN112686207A (en) * | 2021-01-22 | 2021-04-20 | 北京同方软件有限公司 | Urban street scene target detection method based on regional information enhancement |
CN113409191A (en) * | 2021-06-02 | 2021-09-17 | 广东工业大学 | Lightweight image super-resolution method and system based on attention feedback mechanism |
CN113780483A (en) * | 2021-11-12 | 2021-12-10 | 首都医科大学附属北京潞河医院 | Nodule ultrasonic classification data processing method and data processing system |
WO2021248687A1 (en) * | 2020-06-10 | 2021-12-16 | 南京理工大学 | Driving fatigue detection method and system combining pseudo 3d convolutional neural network and attention mechanism |
US20210406582A1 (en) * | 2019-06-05 | 2021-12-30 | Boe Technology Group Co., Ltd. | Method of semantically segmenting input image, apparatus for semantically segmenting input image, method of pre-training apparatus for semantically segmenting input image, training apparatus for pre-training apparatus for semantically segmenting input image, and computer-program product |
CN113902625A (en) * | 2021-08-19 | 2022-01-07 | 深圳市朗驰欣创科技股份有限公司 | Infrared image enhancement method based on deep learning |
CN114170533A (en) * | 2021-12-08 | 2022-03-11 | 西安电子科技大学 | Landslide identification method and system based on attention mechanism and multi-mode characterization learning |
CN114863097A (en) * | 2022-04-06 | 2022-08-05 | 北京航空航天大学 | Infrared dim target detection method based on attention system convolutional neural network |
CN114936993A (en) * | 2022-05-13 | 2022-08-23 | 常熟理工学院 | High-resolution and pixel relation attention-enhancing strong fusion remote sensing image segmentation method |
Non-Patent Citations (4)
Title |
---|
BAINING ZHAO et al.: "Detection and Location of Safety Protective Wear in Power Substation Operation Using Wear-Enhanced YOLOv3 Algorithm", IEEE Access *
JIANMING ZHANG et al.: "Real-time traffic sign detection based on multiscale attention and spatial information aggregator", Journal of Real-Time Image Processing *
YAN Hongwen et al.: "Facial posture detection of group-housed pigs based on an improved Tiny-YOLO model", Transactions of the Chinese Society of Agricultural Engineering *
GUAN Liling: "Segmentation of fundus drusen based on convolutional neural networks", China Master's Theses Full-text Database, Medicine and Health Sciences *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117115688A (en) * | 2023-08-17 | 2023-11-24 | 广东海洋大学 | Dead fish identification and counting system and method based on deep learning under low-brightness environment |
CN117475182A (en) * | 2023-09-13 | 2024-01-30 | 江南大学 | Stereo matching method based on multi-feature aggregation |
CN117475182B (en) * | 2023-09-13 | 2024-06-04 | 江南大学 | Stereo matching method based on multi-feature aggregation |
CN117252928A (en) * | 2023-11-20 | 2023-12-19 | 南昌工控机器人有限公司 | Visual image positioning system for modular intelligent assembly of electronic products |
CN117252928B (en) * | 2023-11-20 | 2024-01-26 | 南昌工控机器人有限公司 | Visual image positioning system for modular intelligent assembly of electronic products |
CN117542023A (en) * | 2024-01-04 | 2024-02-09 | 广汽埃安新能源汽车股份有限公司 | Traffic sign detection method, device, electronic equipment and storage medium |
CN117542023B (en) * | 2024-01-04 | 2024-04-19 | 广汽埃安新能源汽车股份有限公司 | Traffic sign detection method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115578615B (en) | 2023-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109241913B (en) | Ship detection method and system combining significance detection and deep learning | |
CN111723748B (en) | Infrared remote sensing image ship detection method | |
CN109859190B (en) | Target area detection method based on deep learning | |
CN115578615A (en) | Night traffic sign image detection model establishing method based on deep learning | |
CN113052210A (en) | Fast low-illumination target detection method based on convolutional neural network | |
CN112766188B (en) | Small target pedestrian detection method based on improved YOLO algorithm | |
CN110781756A (en) | Urban road extraction method and device based on remote sensing image | |
CN112598713A (en) | Offshore submarine fish detection and tracking statistical method based on deep learning | |
CN113076871A (en) | Fish shoal automatic detection method based on target shielding compensation | |
CN110569782A (en) | Target detection method based on deep learning | |
CN114782412B (en) | Image detection method, training method and device of target detection model | |
CN109242826B (en) | Mobile equipment end stick-shaped object root counting method and system based on target detection | |
CN111242026B (en) | Remote sensing image target detection method based on spatial hierarchy perception module and metric learning | |
CN113052170B (en) | Small target license plate recognition method under unconstrained scene | |
CN116664643B (en) | Railway train image registration method and equipment based on SuperPoint algorithm | |
CN113191204B (en) | Multi-scale blocking pedestrian detection method and system | |
CN111199245A (en) | Rape pest identification method | |
CN112418087A (en) | Underwater video fish identification method based on neural network | |
CN116228792A (en) | Medical image segmentation method, system and electronic device | |
CN115861756A (en) | Earth background small target identification method based on cascade combination network | |
CN111126303B (en) | Multi-parking-place detection method for intelligent parking | |
CN112926552A (en) | Remote sensing image vehicle target recognition model and method based on deep neural network | |
CN113486819A (en) | Ship target detection method based on YOLOv4 algorithm | |
CN116168240A (en) | Arbitrary-direction dense ship target detection method based on attention enhancement | |
CN113920421B (en) | Full convolution neural network model capable of achieving rapid classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||