CN111191611B - Traffic sign label identification method based on deep learning - Google Patents

Traffic sign label identification method based on deep learning

Info

Publication number
CN111191611B
CN111191611B (application CN201911425706.1A)
Authority
CN
China
Prior art keywords
layer
label
layers
mark
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911425706.1A
Other languages
Chinese (zh)
Other versions
CN111191611A (en)
Inventor
黄世泽
陶婷
杨玲玉
张肇鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201911425706.1A
Publication of CN111191611A
Application granted
Publication of CN111191611B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

A traffic sign label recognition method based on deep learning. Scene images containing sign labels are captured in real time from a monitoring camera; the original scene image to be detected is input into a trained sign label detection network, which identifies the region of the sign label in the scene image; that region is cropped from the original image and input into a trained character recognition network, which recognizes the characters in the sign label region; finally, the characters are sorted and combined by the upper-left corner coordinates of their bounding boxes to obtain the specific content of the sign label. The invention needs no image preprocessing and no segmentation of individual characters from the sign label region; the content of the sign labels in scene images is detected and recognized with two deep-learning-based object detection network models, so the method recognizes traffic sign labels well even at skewed angles, in blurred scene images, or in poorly lit scene images, with good real-time performance and high accuracy.

Description

Traffic sign label identification method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, and in particular to a traffic sign label identification method based on deep learning.
Background
Traffic sign labels carry a large amount of road and vehicle information, such as speed limit signs, stop (no stopping) signs, vehicle license plates, and rail transit train numbers. Real-time sign label recognition detects and recognizes the specific content of traffic sign labels in scene images captured by a camera; it is an important component of advanced driver assistance systems, automatic driving systems, and traffic management and control, safeguarding driving safety and improving management efficiency. Because real-world environments are complex and variable, traffic sign label recognition is affected by multiple factors such as adverse weather, lighting conditions, and the diversity of sign labels, and remains a challenging problem.
Mainstream traffic sign label recognition methods focus on detecting the position and category of a sign label but cannot obtain its specific content: a speed limit sign, for example, can be detected as such, while the actual limit value remains unknown. Methods that do recognize specific content typically detect the sign label first, separate all its characters with an image processing algorithm, and then recognize each character individually to determine the content of the sign label. Such methods place very high demands on the image processing and character segmentation algorithms; segmentation easily fails on scene images taken in complex environments, so complete characters cannot be obtained, the subsequent steps cannot accurately recognize the content of the sign label, and both the accuracy and the speed of sign label recognition suffer.
Accordingly, there is a need in the art for a real-time, accurate traffic sign label recognition method that can adapt to a variety of complex scenarios.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a traffic sign label identification method based on deep learning, to solve the difficulty existing traffic sign label identification methods have in adapting to complex weather conditions, complex illumination conditions, and diverse sign labels.
The technical scheme is as follows:
the invention provides a traffic sign label identification method based on deep learning, which comprises the following steps (a minimal end-to-end sketch in code is given after the list):
step S1: capturing, in real time, scene images containing traffic sign labels from a monitoring camera;
step S2: inputting the original scene image to be detected captured in step S1 into a trained sign label detection network, and identifying the sign labels in the scene image to obtain their bounding box parameters;
step S3: cropping the region of the sign label from the original image according to the bounding box parameters obtained in step S2, inputting it into a trained character recognition network, and recognizing the characters in the sign label region to obtain the category and bounding box parameters of each character;
step S4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step S3, to obtain the specific content of the sign label.
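The four steps can be summarized in a short Python sketch. All names here (recognize_sign_label, detect_sign_label, recognize_characters, Box, Detection) are illustrative assumptions rather than identifiers from the patent; the two callables stand in for the trained YOLO-Tiny detector and YOLOv3 character recognizer described below, and the image is assumed to be a NumPy-style array indexed as image[y, x].

```python
from typing import Callable, List, NamedTuple, Optional

class Box(NamedTuple):
    x_left: int
    x_right: int
    y_top: int
    y_bottom: int

class Detection(NamedTuple):
    category: str  # e.g. a character class such as "3"
    box: Box

def recognize_sign_label(
    image,
    detect_sign_label: Callable,     # trained sign label detection network (step S2)
    recognize_characters: Callable,  # trained character recognition network (step S3)
) -> Optional[str]:
    # Step S2: locate the sign label; None models the "None Sign" message.
    sign: Optional[Box] = detect_sign_label(image)
    if sign is None:
        return None  # steps S3 and S4 are not executed

    # Step S3: crop the sign label region and recognize its characters.
    region = image[sign.y_top:sign.y_bottom, sign.x_left:sign.x_right]
    characters: List[Detection] = recognize_characters(region)

    # Step S4: sort characters by the x coordinate of the upper-left corner
    # of their bounding boxes (left to right), then concatenate the categories.
    characters = sorted(characters, key=lambda d: d.box.x_left)
    return "".join(d.category for d in characters)
```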
Further, in step S2, inputting the original scene image to be detected into the trained sign label detection network and identifying the traffic sign label in the scene image to obtain the bounding box parameters of the traffic sign label comprises:
step S2.1: inputting the scene image into the sign label detection network to obtain a detection result;
step S2.2: if a sign label is detected, outputting its bounding box parameters {x_left, x_right, y_top, y_bottom}; if no sign label is detected, outputting a "None Sign" message. The bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image.
Further, in step S3, cropping the sign label region from the original image, inputting it into the trained character recognition network, and recognizing the characters in the sign label region to obtain their bounding box parameters comprises:
step S3.1: if the output of step S2 is the "None Sign" message, steps S3 and S4 are not executed; if the output of step S2 is the bounding box parameters {x_left, x_right, y_top, y_bottom}, steps S3 and S4 are executed;
step S3.2: cropping the traffic sign label region from the scene image according to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in step S2;
step S3.3: inputting the cropped sign label region image into the character recognition network to obtain the category and bounding box parameter set U_chr of all characters:
U_chr = {C1: {x_left1, x_right1, y_top1, y_bottom1}, …, Cn: {x_left_n, x_right_n, y_top_n, y_bottom_n}}
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character.
Further, in step S4, sorting and combining the upper-left corner coordinates of the bounding boxes of all characters to obtain the specific content of the sign label comprises:
S4.1: according to the character bounding box parameter set U_chr obtained in step S3, taking the x coordinate x_left of the left boundary of each character's bounding box to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters;
S4.2: sorting the categories in U_chr_x by their corresponding coordinates in ascending order;
S4.3: combining the sorted categories to determine the specific content of the sign label.
Further, in step S2, the sign label detection network is a deep-learning-based object detection network, the YOLO-Tiny network, and the number of detected categories is 1: Sign, i.e. the sign label.
The sign label detection network comprises conv layers, maxpooling layers, route layers, an upsample layer, and yolo layers, 24 layers in total. The conv layers extract basic features of the original image, such as color, texture, and shape, through 3×3 and 1×1 convolution kernels with a stride of 1; the maxpooling layers downsample the previous layer by max pooling with a 2×2 sliding window and a stride of 2; the route layers concatenate deep and shallow feature maps, so that deep and shallow features are learned together; the upsample layer upsamples the image; the yolo layers specify parameters such as the number of categories of the scene images, and compute and output the average training loss value.
Layers 0 through 11 are six conv layers with 3×3 kernels, each followed by a maxpooling layer; layers 12 to 15 are four conv layers with kernel sizes 3×3 and 1×1; layer 16 is a yolo layer; layer 17 is a route layer that takes the feature map of layer 13; layer 18 is a conv layer with a 1×1 kernel; layer 19 is the upsample layer; layer 20 is a route layer that concatenates the feature map of layer 19 with that of layer 8; layers 21 and 22 are two conv layers with 3×3 and 1×1 kernels respectively; layer 23 is a yolo layer, which outputs the final sign label detection result.
Further, in step S3, the character recognition network is a deep-learning-based object detection network, the YOLOv3 network, and the number of detected categories is determined by the characteristics of the traffic sign labels.
The character recognition network comprises 107 layers, organized into a feature extraction functional layer, a feature interaction functional layer, and a classification and bounding box regression functional layer, connected in sequence.
Each of the 107 layers belongs to one of the conv, res, route, upsample, and yolo layer types, where:
the conv layers perform feature extraction on the feature map;
the res layers are residual connection blocks that connect features of different layers through skip connections;
the route layers concatenate feature maps of different dimensions;
the upsample layers upsample the feature map, enlarging its size;
the yolo layers mainly perform loss function calculation, classification prediction, and bounding box regression.
Layers 0 through 74 form the feature extraction functional layer, which takes the cropped image as input, extracts image features, and provides its output to the feature interaction functional layer; it consists of conv layers and res layers.
Layers 75 through 105 form the feature interaction functional layer, which uses a combination of conv, route, upsample, and yolo layers to realize feature interaction between layers.
Layer 106 is the classification and bounding box regression functional layer, consisting of a yolo layer. The network thus realizes character feature extraction, character classification, and bounding box regression, and the final yolo layer outputs the category and bounding box parameters of the traffic sign label characters. (This layer partition is restated compactly in the sketch below.)
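A compact restatement of the 107-layer partition, purely as a reading aid in Python; the range endpoints are inclusive and come directly from the text above:

```python
# Functional partition of the 107-layer YOLOv3-style character recognition
# network: (first_layer, last_layer) -> role, endpoints inclusive.
YOLOV3_FUNCTIONAL_LAYERS = {
    (0, 74): "feature extraction (conv + res layers)",
    (75, 105): "feature interaction (conv + route + upsample + yolo layers)",
    (106, 106): "classification and bounding box regression (yolo layer)",
}
```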
The technical scheme provided by the embodiments of the invention has the following beneficial effects:
1) The invention cascades two deep-learning-based object detection networks to detect the traffic sign labels in scene images and to recognize their specific content, respectively. The method can recognize sign labels of many categories and works well even on small sign labels.
2) Compared with existing methods that segment each character individually, the method needs no separate character segmentation, avoiding recognition errors or failures caused by inaccurate segmentation; compared with existing methods that rely on many image processing steps, it needs no additional image processing and achieves better performance with simple steps.
3) The method detects and recognizes the specific content of traffic sign labels quickly and accurately, and is suitable for complex scenes.
The invention needs no image preprocessing and no segmentation of individual characters from the sign label region; the content of the sign labels in scene images is detected and recognized with two deep-learning-based object detection network models, so the method recognizes traffic sign labels well even at skewed angles, in blurred scene images, or in poorly lit scene images, with good real-time performance and high accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a traffic sign label identification method based on deep learning in an embodiment of the invention;
FIG. 2 is a flowchart of identifying traffic sign labels in a scene image in an embodiment of the invention;
FIG. 3 is a flowchart of cropping the sign label region and recognizing the characters of the sign label region in an embodiment of the invention;
FIG. 4 is a flowchart of sorting and combining character bounding box parameters to obtain the specific content of the sign label in an embodiment of the invention;
FIG. 5 is a set of example license plate scene images used by the deep-learning-based license plate recognition method in embodiment 2;
FIG. 6 is a diagram of the recognition results of the example images of FIG. 5 through the sign label detection network in embodiment 2;
FIG. 7 is a diagram of the license plate regions cropped from the scene images according to the detection results of FIG. 6 in embodiment 2 of the invention;
FIG. 8 is a diagram of the recognition results of FIG. 7 through the character recognition network in embodiment 2 of the invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the protection scope of the invention.
In addition, the technical features involved in the different embodiments of the invention described below may be combined with each other as long as they do not conflict.
Example 1
In this embodiment, a traffic sign label identification method based on deep learning is provided. FIG. 1 is a flowchart of the method according to an embodiment of the invention; as shown in FIG. 1, the method comprises the following steps:
step S1: capturing, in real time, scene images containing traffic sign labels from a monitoring camera;
step S2: inputting the original scene image to be detected captured in step S1 into a trained sign label detection network, and identifying the sign labels in the scene image to obtain their bounding box parameters;
step S3: cropping the region of the sign label from the original image according to the bounding box parameters obtained in step S2, inputting it into a trained character recognition network, and recognizing the characters in the sign label region to obtain the category and bounding box parameters of each character;
step S4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step S3, to obtain the specific content of the sign label.
Through the above steps, the method automatically identifies the region of the traffic sign label in the scene image, crops that region from the scene image according to the bounding box parameters of the target region, recognizes all characters within it, and finally sorts and combines the character bounding box parameters to obtain the specific content of the sign label. The invention can identify both the category and the specific content of a traffic sign label. Compared with existing methods that segment each character individually, it needs no separate character segmentation, avoiding recognition errors or failures caused by inaccurate segmentation; compared with existing methods that rely on many image processing steps, it needs no additional image processing and achieves better performance with simple steps, thereby improving the accuracy and robustness of traffic sign label detection and recognition.
FIG. 2 is a flowchart of identifying traffic sign labels in a scene image, comprising the following steps:
step S21: inputting the scene image into the sign label detection network to obtain a detection result;
step S22: if a sign label is detected, outputting its bounding box parameters {x_left, x_right, y_top, y_bottom}; if no sign label is detected, outputting a "None Sign" message. The bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image.
Specifically, the sign label detection network model adopts a deep-learning-based object detection network, the YOLO-Tiny network, and the number of detected categories is 1: Sign, i.e. the sign label.
The sign label detection network comprises conv layers, maxpooling layers, route layers, an upsample layer, and yolo layers, 24 layers in total. The conv layers extract basic features of the original image, such as color, texture, and shape, through 3×3 and 1×1 convolution kernels with a stride of 1; the maxpooling layers downsample the previous layer by max pooling with a 2×2 sliding window and a stride of 2; the route layers concatenate deep and shallow feature maps, so that deep and shallow features are learned together; the upsample layer upsamples the image; the yolo layers specify parameters such as the number of categories of the scene images, and compute and output the average training loss value.
Layers 0 through 11 are six conv layers with 3×3 kernels, each followed by a maxpooling layer; layers 12 to 15 are four conv layers with kernel sizes 3×3 and 1×1; layer 16 is a yolo layer; layer 17 is a route layer that takes the feature map of layer 13; layer 18 is a conv layer with a 1×1 kernel; layer 19 is the upsample layer; layer 20 is a route layer that concatenates the feature map of layer 19 with that of layer 8; layers 21 and 22 are two conv layers with 3×3 and 1×1 kernels respectively; layer 23 is a yolo layer, which outputs the final sign label detection result.
The original scene image is input into the sign label detection network to determine whether a sign label is present in the scene image. If there is no sign label, the "None Sign" message is output and the subsequent steps are not executed; if a sign label is present, the bounding box parameters {x_left, x_right, y_top, y_bottom} of its region are output and the subsequent steps continue. The 24-layer stack just described is summarized in the code sketch below.
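For reference, the 24-layer stack can be written out as a Python table. The (index, type, detail) tuple format is purely a reading aid of this rewrite, not a structure from the patent; the exact kernel alternation at layers 12-15 (3×3, 1×1, 3×3, 1×1) and the 2× upsample factor are assumptions consistent with the standard YOLOv3-tiny configuration.

```python
# Layer table of the YOLO-Tiny-style sign label detection network as described
# in the text: conv layers use stride 1, maxpool layers use stride 2.
YOLO_TINY_LAYERS = (
    # Layers 0-11: six (conv 3x3 -> maxpool 2x2) pairs.
    [(i, "conv", "3x3") if i % 2 == 0 else (i, "maxpool", "2x2") for i in range(12)]
    + [
        (12, "conv", "3x3"), (13, "conv", "1x1"),
        (14, "conv", "3x3"), (15, "conv", "1x1"),
        (16, "yolo", "first detection head"),
        (17, "route", "takes the feature map of layer 13"),
        (18, "conv", "1x1"),
        (19, "upsample", "2x"),
        (20, "route", "concatenates layers 19 and 8"),
        (21, "conv", "3x3"), (22, "conv", "1x1"),
        (23, "yolo", "second detection head, final sign label output"),
    ]
)
assert len(YOLO_TINY_LAYERS) == 24
```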
FIG. 3 is a flowchart of cropping the sign label region and recognizing the characters of the sign label region according to an embodiment of the invention, comprising the following steps:
step S31: if the output of step S2 is the "None Sign" message, steps S3 and S4 are not executed; if the output of step S2 is the bounding box parameters {x_left, x_right, y_top, y_bottom}, steps S3 and S4 are executed;
step S32: cropping the traffic sign label region from the scene image according to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in step S2;
step S33: inputting the cropped sign label region image into the character recognition network to obtain the category and bounding box parameter set U_chr of all characters:
U_chr = {C1: {x_left1, x_right1, y_top1, y_bottom1}, …, Cn: {x_left_n, x_right_n, y_top_n, y_bottom_n}}
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character.
Specifically, the sign label region segmentation step is a simple image cropping operation that cuts the rectangular box out of the image, so no additional image processing steps are needed; a one-line sketch is given below.
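Assuming the scene image is held as a NumPy-style array indexed as image[y, x] (as produced, for example, by OpenCV's imread), the crop of step S32 is plain array slicing. The file name is illustrative; the coordinates are those reported for FIG. 6(a) below:

```python
import cv2  # OpenCV, assumed available; cv2.imread returns an H x W x 3 array

image = cv2.imread("scene.jpg")  # illustrative file name
x_left, x_right, y_top, y_bottom = 578, 754, 1105, 1190  # values from FIG. 6(a)
sign_region = image[y_top:y_bottom, x_left:x_right]  # cropped sign label region
```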
The character recognition network is a deep-learning-based object detection network, the YOLOv3 network, and the number of detected categories is determined by the characteristics of the traffic sign labels.
The character recognition network comprises 107 layers, organized into a feature extraction functional layer, a feature interaction functional layer, and a classification and bounding box regression functional layer, connected in sequence.
Each of the 107 layers belongs to one of the conv, res, route, upsample, and yolo layer types, where:
the conv layers perform feature extraction on the feature map;
the res layers are residual connection blocks that connect features of different layers through skip connections;
the route layers concatenate feature maps of different dimensions;
the upsample layers upsample the feature map, enlarging its size;
the yolo layers mainly perform loss function calculation, classification prediction, and bounding box regression.
Layers 0 through 74 form the feature extraction functional layer, which takes the cropped image as input, extracts image features, and provides its output to the feature interaction functional layer; it consists of conv layers and res layers.
Layers 75 through 105 form the feature interaction functional layer, which uses a combination of conv, route, upsample, and yolo layers to realize feature interaction between layers.
Layer 106 is the classification and bounding box regression functional layer, consisting of a yolo layer. The network thus realizes character feature extraction, character classification, and bounding box regression, and the final yolo layer outputs the category and bounding box parameters of the traffic sign label characters.
FIG. 4 is a flowchart of sorting and combining character bounding box parameters to obtain the specific content of the sign label according to an embodiment of the invention, comprising the following steps:
S41: according to the character bounding box parameter set U_chr obtained in step 3, taking the x coordinate x_left of the left boundary of each character's bounding box to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters;
S42: sorting the categories in U_chr_x by their corresponding coordinates in ascending order;
S43: combining the sorted categories to determine the specific content of the sign label.
Through the processing of the above steps, the position and the specific content of the traffic sign label in the scene image captured by the camera are obtained. Because the sign label region is cropped from the original scene image in step S3, the character recognition network can accurately recognize the specific content of the sign label while avoiding interference from non-sign-label regions of the image, which improves the robustness of sign label recognition and suits conditions such as complex weather, complex illumination, and diverse sign labels. A minimal code sketch of the sorting-and-combining step follows.
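A minimal sketch of steps S41-S43, assuming U_chr_x is represented as a list of (category, x_left) pairs; a list is used rather than a dict because a plate such as "20020" repeats character categories, so categories cannot serve as unique keys:

```python
from typing import List, Tuple

def combine_characters(u_chr_x: List[Tuple[str, int]]) -> str:
    # Sort (category, x_left) pairs by x_left in ascending order (S42)
    # and concatenate the categories (S43).
    return "".join(category for category, _ in sorted(u_chr_x, key=lambda p: p[1]))

# Hypothetical coordinates for a plate reading "2 0 0 2 0" from left to right:
print(combine_characters([("0", 215), ("2", 120), ("0", 310), ("2", 405), ("0", 500)]))
# -> "20020"
```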
Example 2
This embodiment provides a sign label identification method based on deep learning, specifically a method for identifying a license plate in a scene image, comprising the following steps:
step 1: capturing scene images from a monitoring camera in real time;
step 2: inputting the original scene image to be detected captured in step 1 into a trained sign label detection network, and identifying the license plate region in the scene image to obtain its bounding box parameters;
step 3: cropping the license plate region from the original image according to the bounding box parameters obtained in step 2, inputting it into a trained character recognition network, and recognizing all characters of the license plate region to obtain the category and bounding box parameters of each character;
step 4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step 3, to obtain the specific content of the license plate.
Through the above steps, the method automatically identifies the region of the license plate in the scene image, crops that region from the scene image according to the bounding box parameters of the license plate region, recognizes all characters within it, and finally sorts and combines the character bounding box parameters to obtain the specific content of the license plate. Compared with existing methods that segment each character individually, the method needs no separate character segmentation, avoiding recognition errors or failures caused by inaccurate segmentation; compared with existing methods that rely on many image processing steps, it needs no additional image processing, and it can handle complex weather conditions, license plates at various angles, and similar cases.
An alternative embodiment of the present invention, steps 1 to 4, is described in detail below with reference to fig. 5 to 8.
FIG. 5 shows example license plate scene images in embodiment 2 of the invention: (a) is a blurred image, (b) is a skewed license plate image, (c) is a license plate image under darker illumination, and (d) is a license plate image in rain; the resolutions are 2048×1536, 1920×1200, and 2048×1536. The license plates in the data set are characterized by two fixed Chinese characters meaning "steel transport", followed by five digits.
The original images in FIG. 5 are input into the sign label detection network to obtain detection results. If a license plate is detected, its bounding box parameters {x_left, x_right, y_top, y_bottom} are output; if no license plate is detected, the "None Sign" message is output. The bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image. The number of detection categories of the sign label detection network is 1, namely license: license plate. FIG. 6 shows the recognition results of the example images of FIG. 5 through the sign label detection network in embodiment 2 of the invention; the specific bounding box parameter results are as follows:
For the license plate of FIG. 6(a): {x_left, x_right, y_top, y_bottom} = {578, 754, 1105, 1190}
For the license plate of FIG. 6(b): {x_left, x_right, y_top, y_bottom} = {592, 760, 1152, 1260}
For the license plate of FIG. 6(c): {x_left, x_right, y_top, y_bottom} = {915, 1038, 617, 666}
For the license plate of FIG. 6(d): {x_left, x_right, y_top, y_bottom} = {1109, 1272, 1047, 1140}
According to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in the above step, the license plate region is cropped from the scene image; the cropped license plate region image is input into the character recognition network to obtain the character category and bounding box parameter set U_chr,
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character.
The character recognition network has 12 recognition categories: the two Chinese characters of "steel transport", and the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (a class-list sketch is given below).
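Purely as an illustration, the 12-way class list could be encoded as follows. The pinyin placeholders GANG and YUN for the two fixed Chinese characters (rendered "steel transport" in this translation) are assumptions of this sketch, not labels from the patent:

```python
# 12 recognition categories of the character recognition network:
# the two fixed Chinese characters of the plate prefix, then the digits 0-9.
CHAR_CLASSES = ["GANG", "YUN"] + [str(d) for d in range(10)]
assert len(CHAR_CLASSES) == 12
```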
FIG. 7 shows the license plate regions cropped from the scene images according to the detection results of FIG. 6 in embodiment 2 of the invention, and FIG. 8 shows the recognition results of FIG. 7 through the character recognition network in embodiment 2 of the invention. As shown in FIG. 8, the character recognition accurately recognizes all the characters in the license plate regions, yielding a character bounding box parameter set U_chr for each of FIGS. 8(a)-(d); the concrete coordinate values are annotated in FIG. 8.
From the character bounding box parameter sets U_chr obtained in the above steps, the x coordinate x_left of the left boundary of each character's bounding box is taken to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters; this is done for each of FIGS. 8(a)-(d), with the coordinate values as annotated in FIG. 8. The categories in each U_chr_x are sorted by their corresponding coordinates in ascending order, and the sorted categories are combined to determine the specific content of the license plate. The sorted and combined results are as follows:
For the license plate of FIG. 8(a), the specific content is "steel transport 20020";
for the license plate of FIG. 8(b), "steel transport 20031";
for the license plate of FIG. 8(c), "steel transport 10042";
for the license plate of FIG. 8(d), "steel transport 10068".
Through the processing of the above steps, the position of the license plate in the scene image and its specific content are obtained. Even in complex scenes, such as blurred images, skewed license plates, darker illumination, and rain, the specific content of the license plate can be accurately detected and recognized, which improves the robustness and accuracy of license plate recognition.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above embodiments are merely examples given for clarity of description and are not limiting. Other variations or modifications in different forms can be made by those of ordinary skill in the art on the basis of the above description; it is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (1)

1. The traffic sign label identification method based on deep learning is characterized by comprising the following steps:
step S1: capturing, in real time, scene images containing traffic sign labels from a monitoring camera;
step S2: inputting the original scene image to be detected captured in step S1 into a trained sign label detection network, and identifying the sign labels in the scene image to obtain their bounding box parameters;
step S3: cropping the region of the sign label from the original image according to the bounding box parameters obtained in step S2, inputting the cropped sign label image into a trained character recognition network, and recognizing the characters in the sign label region to obtain the category and bounding box parameters of each character;
step S4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step S3, to obtain the specific content of the sign label;
the step S2 specifically comprises:
step S21: inputting the scene image into the sign label detection network to obtain a detection result;
step S22: if a sign label is detected, outputting its bounding box parameters {x_left, x_right, y_top, y_bottom}; if no sign label is detected, outputting a "None Sign" message;
the bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image;
the sign label detection network is a deep-learning-based object detection network, the YOLO-Tiny network, and the number of detected categories is 1: Sign, i.e. the sign label;
the sign label detection network comprises conv layers, maxpooling layers, route layers, an upsample layer, and yolo layers, 24 layers in total;
the conv layers extract basic features of the original image through 3×3 and 1×1 convolution kernels with a stride of 1;
the maxpooling layers downsample the previous layer by max pooling with a 2×2 sliding window and a stride of 2;
the route layers concatenate deep and shallow feature maps, so that deep and shallow features are learned together;
the upsample layer upsamples the image;
the yolo layers specify the scene image category number parameter, and compute and output the average training loss value;
layers 0 through 11 are six conv layers with 3×3 kernels, each followed by a maxpooling layer;
layers 12 to 15 are four conv layers with kernel sizes 3×3 and 1×1;
layer 16 is a yolo layer;
layer 17 is a route layer that takes the feature map of layer 13;
layer 18 is a conv layer with a 1×1 kernel;
layer 19 is the upsample layer;
layer 20 is a route layer that concatenates the feature map of layer 19 with that of layer 8;
layers 21 and 22 are two conv layers with 3×3 and 1×1 kernels respectively;
layer 23 is a yolo layer, which outputs the final sign label detection result;
the step S3 specifically comprises the following steps:
step S3.1: if the output of step S2 is the "None Sign" message, steps S3 and S4 are not executed; if the output of step S2 is the bounding box parameters {x_left, x_right, y_top, y_bottom}, steps S3 and S4 are executed;
step S3.2: cropping the traffic sign label region from the scene image according to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in step S2;
step S3.3: inputting the cropped sign label region image into the character recognition network to obtain the category and bounding box parameter set U_chr of all characters, U_chr = {C1: {x_left1, x_right1, y_top1, y_bottom1}, …, Cn: {x_left_n, x_right_n, y_top_n, y_bottom_n}},
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character;
the character recognition network is a deep-learning-based object detection network, the YOLOv3 network, and the number of detected categories is determined by the characteristics of the traffic sign labels;
the character recognition network comprises 107 layers, organized into a feature extraction functional layer, a feature interaction functional layer, and a classification and bounding box regression functional layer, connected in sequence; each of the 107 layers belongs to one of the conv, res, route, upsample, and yolo layer types, wherein:
the conv layers perform feature extraction on the feature map;
the res layers are residual connection blocks that connect features of different layers through skip connections;
the route layers concatenate feature maps of different dimensions;
the upsample layers upsample the feature map, enlarging its size;
the yolo layers mainly perform loss function calculation, classification prediction, and bounding box regression;
layers 0 through 74 form the feature extraction functional layer, which takes the cropped image as input, extracts image features, and provides its output to the feature interaction functional layer; the feature extraction functional layer consists of conv layers and res layers;
layers 75 through 105 form the feature interaction functional layer, which uses a combination of conv, route, upsample, and yolo layers to realize feature interaction between layers;
layer 106 is the classification and bounding box regression functional layer, consisting of a yolo layer; the network thus realizes character feature extraction, character classification, and bounding box regression, and the final yolo layer outputs the category and bounding box parameters of the traffic sign label characters;
the step S4 specifically comprises the following steps:
S4.1: according to the character bounding box parameter set U_chr obtained in step S3, taking the x coordinate x_left of the left boundary of each character's bounding box to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters;
S4.2: sorting the categories in U_chr_x by their corresponding coordinates in ascending order;
S4.3: combining the sorted categories to determine the specific content of the sign label.
CN201911425706.1A 2019-12-31 2019-12-31 Traffic sign label identification method based on deep learning Active CN111191611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911425706.1A CN111191611B (en) 2019-12-31 2019-12-31 Traffic sign label identification method based on deep learning


Publications (2)

Publication Number Publication Date
CN111191611A CN111191611A (en) 2020-05-22
CN111191611B (en) 2023-10-13

Family

ID=70708113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911425706.1A Active CN111191611B (en) 2019-12-31 2019-12-31 Traffic sign label identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN111191611B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114071005B (en) * 2020-08-07 2022-12-27 华为技术有限公司 Object detection method, electronic device and computer-readable storage medium
CN112348025B (en) * 2020-11-06 2023-04-07 上海商汤智能科技有限公司 Character detection method and device, electronic equipment and storage medium
CN112435222A (en) * 2020-11-11 2021-03-02 深圳技术大学 Circuit board detection method and device and computer readable storage medium
CN113591543B (en) * 2021-06-08 2024-03-26 广西综合交通大数据研究院 Traffic sign recognition method, device, electronic equipment and computer storage medium
CN113435446B (en) * 2021-07-07 2023-10-31 南京云创大数据科技股份有限公司 Deep learning-based inclined license plate correction method
CN113963329B (en) * 2021-10-11 2022-07-05 浙江大学 Digital traffic sign detection and identification method based on double-stage convolutional neural network
CN115410184A (en) * 2022-08-24 2022-11-29 江西山水光电科技股份有限公司 Target detection license plate recognition method based on deep neural network
CN117951648B (en) * 2024-03-26 2024-06-07 成都正扬博创电子技术有限公司 Airborne multisource information fusion method and system


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930830A (en) * 2016-05-18 2016-09-07 大连理工大学 Road surface traffic sign recognition method based on convolution neural network
CN107133616A (en) * 2017-04-02 2017-09-05 南京汇川图像视觉技术有限公司 A kind of non-division character locating and recognition methods based on deep learning
CN108334881A (en) * 2018-03-12 2018-07-27 南京云创大数据科技股份有限公司 A kind of licence plate recognition method based on deep learning
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110619327A (en) * 2018-06-20 2019-12-27 湖南省瞬渺通信技术有限公司 Real-time license plate recognition method based on deep learning in complex scene

Also Published As

Publication number Publication date
CN111191611A (en) 2020-05-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant