CN111191611B - Traffic sign label identification method based on deep learning - Google Patents

Traffic sign label identification method based on deep learning

Info

Publication number
CN111191611B
CN111191611B (application CN201911425706.1A)
Authority
CN
China
Prior art keywords
layer
label
layers
mark
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911425706.1A
Other languages
Chinese (zh)
Other versions
CN111191611A (en)
Inventor
黄世泽
陶婷
杨玲玉
张肇鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201911425706.1A
Publication of CN111191611A
Application granted
Publication of CN111191611B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/54Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

A traffic sign label recognition method based on deep learning. Scene images containing sign labels are captured in real time from a monitoring camera; the original scene image to be detected is input into a trained sign label detection network, which identifies the region of the sign label in the scene image; that region is cropped from the original image and input into a trained character recognition network, which recognizes the characters in the sign label region; finally, the characters are sorted and combined by the upper-left corner coordinates of their bounding boxes to obtain the specific content of the sign label. The invention needs no image preprocessing and no segmentation of individual characters from the sign label region; the content of the sign labels in scene images is detected and recognized with two deep-learning-based object detection network models, so the method recognizes traffic sign labels well even at skewed angles, in blurred scene images, or in poorly lit scene images, with good real-time performance and high accuracy.

Description

Traffic sign label identification method based on deep learning
Technical Field
The invention relates to the technical field of computer vision, and in particular to a traffic sign label identification method based on deep learning.
Background
Traffic sign labels carry a large amount of road and vehicle information, such as speed limit signs, stop (no stopping) signs, vehicle license plates, and rail transit train numbers. Real-time sign label recognition detects and recognizes the specific content of traffic sign labels in scene images captured by a camera; it is an important component of advanced driver assistance systems, automatic driving systems, and traffic management and control, safeguarding driving safety and improving management efficiency. Because real-world environments are complex and variable, traffic sign label recognition is affected by multiple factors such as adverse weather, lighting conditions, and the diversity of sign labels, and remains a challenging problem.
Mainstream traffic sign label recognition methods focus on detecting the position and category of a sign label but cannot obtain its specific content: a speed limit sign, for example, can be detected as such, while the actual limit value remains unknown. Methods that do recognize specific content typically detect the sign label first, separate all its characters with an image processing algorithm, and then recognize each character individually to determine the content of the sign label. Such methods place very high demands on the image processing and character segmentation algorithms; segmentation easily fails on scene images taken in complex environments, so complete characters cannot be obtained, the subsequent steps cannot accurately recognize the content of the sign label, and both the accuracy and the speed of sign label recognition suffer.
Accordingly, there is a need in the art for a real-time, accurate traffic sign label recognition method that can adapt to a variety of complex scenarios.
Disclosure of Invention
In order to overcome the problems in the related art, the invention provides a traffic sign label identification method based on deep learning, to solve the difficulty existing traffic sign label identification methods have in adapting to complex weather conditions, complex illumination conditions, and diverse sign labels.
The technical scheme is as follows:
the invention provides a traffic sign label identification method based on deep learning, which comprises the following steps (a minimal end-to-end sketch in code is given after the list):
step S1: capturing, in real time, scene images containing traffic sign labels from a monitoring camera;
step S2: inputting the original scene image to be detected captured in step S1 into a trained sign label detection network, and identifying the sign labels in the scene image to obtain their bounding box parameters;
step S3: cropping the region of the sign label from the original image according to the bounding box parameters obtained in step S2, inputting it into a trained character recognition network, and recognizing the characters in the sign label region to obtain the category and bounding box parameters of each character;
step S4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step S3, to obtain the specific content of the sign label.
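The four steps can be summarized in a short Python sketch. All names here (recognize_sign_label, detect_sign_label, recognize_characters, Box, Detection) are illustrative assumptions rather than identifiers from the patent; the two callables stand in for the trained YOLO-Tiny detector and YOLOv3 character recognizer described below, and the image is assumed to be a NumPy-style array indexed as image[y, x].

```python
from typing import Callable, List, NamedTuple, Optional

class Box(NamedTuple):
    x_left: int
    x_right: int
    y_top: int
    y_bottom: int

class Detection(NamedTuple):
    category: str  # e.g. a character class such as "3"
    box: Box

def recognize_sign_label(
    image,
    detect_sign_label: Callable,     # trained sign label detection network (step S2)
    recognize_characters: Callable,  # trained character recognition network (step S3)
) -> Optional[str]:
    # Step S2: locate the sign label; None models the "None Sign" message.
    sign: Optional[Box] = detect_sign_label(image)
    if sign is None:
        return None  # steps S3 and S4 are not executed

    # Step S3: crop the sign label region and recognize its characters.
    region = image[sign.y_top:sign.y_bottom, sign.x_left:sign.x_right]
    characters: List[Detection] = recognize_characters(region)

    # Step S4: sort characters by the x coordinate of the upper-left corner
    # of their bounding boxes (left to right), then concatenate the categories.
    characters = sorted(characters, key=lambda d: d.box.x_left)
    return "".join(d.category for d in characters)
```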
Further, in step S2, inputting the original scene image to be detected into the trained sign label detection network and identifying the traffic sign label in the scene image to obtain the bounding box parameters of the traffic sign label comprises:
step S2.1: inputting the scene image into the sign label detection network to obtain a detection result;
step S2.2: if a sign label is detected, outputting its bounding box parameters {x_left, x_right, y_top, y_bottom}; if no sign label is detected, outputting a "None Sign" message. The bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image.
Further, in step S3, cropping the sign label region from the original image, inputting it into the trained character recognition network, and recognizing the characters in the sign label region to obtain their bounding box parameters comprises:
step S3.1: if the output of step S2 is the "None Sign" message, steps S3 and S4 are not executed; if the output of step S2 is the bounding box parameters {x_left, x_right, y_top, y_bottom}, steps S3 and S4 are executed;
step S3.2: cropping the traffic sign label region from the scene image according to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in step S2;
step S3.3: inputting the cropped sign label region image into the character recognition network to obtain the category and bounding box parameter set U_chr of all characters:
U_chr = {C1: {x_left1, x_right1, y_top1, y_bottom1}, …, Cn: {x_left_n, x_right_n, y_top_n, y_bottom_n}}
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character.
Further, in step S4, sorting and combining the upper-left corner coordinates of the bounding boxes of all characters to obtain the specific content of the sign label comprises:
S4.1: according to the character bounding box parameter set U_chr obtained in step S3, taking the x coordinate x_left of the left boundary of each character's bounding box to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters;
S4.2: sorting the categories in U_chr_x by their corresponding coordinates in ascending order;
S4.3: combining the sorted categories to determine the specific content of the sign label.
Further, in step S2, the sign label detection network is a deep-learning-based object detection network, the YOLO-Tiny network, and the number of detected categories is 1: Sign, i.e. the sign label.
The sign label detection network comprises conv layers, maxpooling layers, route layers, an upsample layer, and yolo layers, 24 layers in total. The conv layers extract basic features of the original image, such as color, texture, and shape, through 3×3 and 1×1 convolution kernels with a stride of 1; the maxpooling layers downsample the previous layer by max pooling with a 2×2 sliding window and a stride of 2; the route layers concatenate deep and shallow feature maps, so that deep and shallow features are learned together; the upsample layer upsamples the image; the yolo layers specify parameters such as the number of categories of the scene images, and compute and output the average training loss value.
Layers 0 through 11 are six conv layers with 3×3 kernels, each followed by a maxpooling layer; layers 12 to 15 are four conv layers with kernel sizes 3×3 and 1×1; layer 16 is a yolo layer; layer 17 is a route layer that takes the feature map of layer 13; layer 18 is a conv layer with a 1×1 kernel; layer 19 is the upsample layer; layer 20 is a route layer that concatenates the feature map of layer 19 with that of layer 8; layers 21 and 22 are two conv layers with 3×3 and 1×1 kernels respectively; layer 23 is a yolo layer, which outputs the final sign label detection result.
Further, in step S3, the character recognition network is a deep-learning-based object detection network, the YOLOv3 network, and the number of detected categories is determined by the characteristics of the traffic sign labels.
The character recognition network comprises 107 layers, organized into a feature extraction functional layer, a feature interaction functional layer, and a classification and bounding box regression functional layer, connected in sequence.
Each of the 107 layers belongs to one of the conv, res, route, upsample, and yolo layer types, where:
the conv layers perform feature extraction on the feature map;
the res layers are residual connection blocks that connect features of different layers through skip connections;
the route layers concatenate feature maps of different dimensions;
the upsample layers upsample the feature map, enlarging its size;
the yolo layers mainly perform loss function calculation, classification prediction, and bounding box regression.
Layers 0 through 74 form the feature extraction functional layer, which takes the cropped image as input, extracts image features, and provides its output to the feature interaction functional layer; it consists of conv layers and res layers.
Layers 75 through 105 form the feature interaction functional layer, which uses a combination of conv, route, upsample, and yolo layers to realize feature interaction between layers.
Layer 106 is the classification and bounding box regression functional layer, consisting of a yolo layer. The network thus realizes character feature extraction, character classification, and bounding box regression, and the final yolo layer outputs the category and bounding box parameters of the traffic sign label characters. (This layer partition is restated compactly in the sketch below.)
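A compact restatement of the 107-layer partition, purely as a reading aid in Python; the range endpoints are inclusive and come directly from the text above:

```python
# Functional partition of the 107-layer YOLOv3-style character recognition
# network: (first_layer, last_layer) -> role, endpoints inclusive.
YOLOV3_FUNCTIONAL_LAYERS = {
    (0, 74): "feature extraction (conv + res layers)",
    (75, 105): "feature interaction (conv + route + upsample + yolo layers)",
    (106, 106): "classification and bounding box regression (yolo layer)",
}
```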
The technical scheme provided by the embodiments of the invention has the following beneficial effects:
1) The invention cascades two deep-learning-based object detection networks to detect the traffic sign labels in scene images and to recognize their specific content, respectively. The method can recognize sign labels of many categories and works well even on small sign labels.
2) Compared with existing methods that segment each character individually, the method needs no separate character segmentation, avoiding recognition errors or failures caused by inaccurate segmentation; compared with existing methods that rely on many image processing steps, it needs no additional image processing and achieves better performance with simple steps.
3) The method detects and recognizes the specific content of traffic sign labels quickly and accurately, and is suitable for complex scenes.
The invention needs no image preprocessing and no segmentation of individual characters from the sign label region; the content of the sign labels in scene images is detected and recognized with two deep-learning-based object detection network models, so the method recognizes traffic sign labels well even at skewed angles, in blurred scene images, or in poorly lit scene images, with good real-time performance and high accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a traffic sign label identification method based on deep learning in an embodiment of the invention;
FIG. 2 is a flowchart of identifying traffic sign labels in a scene image in an embodiment of the invention;
FIG. 3 is a flowchart of cropping the sign label region and recognizing the characters of the sign label region in an embodiment of the invention;
FIG. 4 is a flowchart of sorting and combining character bounding box parameters to obtain the specific content of the sign label in an embodiment of the invention;
FIG. 5 is a set of example license plate scene images used by the deep-learning-based license plate recognition method in embodiment 2;
FIG. 6 is a diagram of the recognition results of the example images of FIG. 5 through the sign label detection network in embodiment 2;
FIG. 7 is a diagram of the license plate regions cropped from the scene images according to the detection results of FIG. 6 in embodiment 2 of the invention;
FIG. 8 is a diagram of the recognition results of FIG. 7 through the character recognition network in embodiment 2 of the invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely below with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without inventive effort fall within the protection scope of the invention.
In addition, the technical features involved in the different embodiments of the invention described below may be combined with each other as long as they do not conflict.
Example 1
In this embodiment, a traffic sign label identification method based on deep learning is provided. FIG. 1 is a flowchart of the method according to an embodiment of the invention; as shown in FIG. 1, the method comprises the following steps:
step S1: capturing, in real time, scene images containing traffic sign labels from a monitoring camera;
step S2: inputting the original scene image to be detected captured in step S1 into a trained sign label detection network, and identifying the sign labels in the scene image to obtain their bounding box parameters;
step S3: cropping the region of the sign label from the original image according to the bounding box parameters obtained in step S2, inputting it into a trained character recognition network, and recognizing the characters in the sign label region to obtain the category and bounding box parameters of each character;
step S4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step S3, to obtain the specific content of the sign label.
Through the above steps, the method automatically identifies the region of the traffic sign label in the scene image, crops that region from the scene image according to the bounding box parameters of the target region, recognizes all characters within it, and finally sorts and combines the character bounding box parameters to obtain the specific content of the sign label. The invention can identify both the category and the specific content of a traffic sign label. Compared with existing methods that segment each character individually, it needs no separate character segmentation, avoiding recognition errors or failures caused by inaccurate segmentation; compared with existing methods that rely on many image processing steps, it needs no additional image processing and achieves better performance with simple steps, thereby improving the accuracy and robustness of traffic sign label detection and recognition.
FIG. 2 is a flowchart of identifying traffic sign labels in a scene image, comprising the following steps:
step S21: inputting the scene image into the sign label detection network to obtain a detection result;
step S22: if a sign label is detected, outputting its bounding box parameters {x_left, x_right, y_top, y_bottom}; if no sign label is detected, outputting a "None Sign" message. The bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image.
Specifically, the sign label detection network model adopts a deep-learning-based object detection network, the YOLO-Tiny network, and the number of detected categories is 1: Sign, i.e. the sign label.
The sign label detection network comprises conv layers, maxpooling layers, route layers, an upsample layer, and yolo layers, 24 layers in total. The conv layers extract basic features of the original image, such as color, texture, and shape, through 3×3 and 1×1 convolution kernels with a stride of 1; the maxpooling layers downsample the previous layer by max pooling with a 2×2 sliding window and a stride of 2; the route layers concatenate deep and shallow feature maps, so that deep and shallow features are learned together; the upsample layer upsamples the image; the yolo layers specify parameters such as the number of categories of the scene images, and compute and output the average training loss value.
Layers 0 through 11 are six conv layers with 3×3 kernels, each followed by a maxpooling layer; layers 12 to 15 are four conv layers with kernel sizes 3×3 and 1×1; layer 16 is a yolo layer; layer 17 is a route layer that takes the feature map of layer 13; layer 18 is a conv layer with a 1×1 kernel; layer 19 is the upsample layer; layer 20 is a route layer that concatenates the feature map of layer 19 with that of layer 8; layers 21 and 22 are two conv layers with 3×3 and 1×1 kernels respectively; layer 23 is a yolo layer, which outputs the final sign label detection result.
The original scene image is input into the sign label detection network to determine whether a sign label is present in the scene image. If there is no sign label, the "None Sign" message is output and the subsequent steps are not executed; if a sign label is present, the bounding box parameters {x_left, x_right, y_top, y_bottom} of its region are output and the subsequent steps continue. The 24-layer stack just described is summarized in the code sketch below.
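For reference, the 24-layer stack can be written out as a Python table. The (index, type, detail) tuple format is purely a reading aid of this rewrite, not a structure from the patent; the exact kernel alternation at layers 12-15 (3×3, 1×1, 3×3, 1×1) and the 2× upsample factor are assumptions consistent with the standard YOLOv3-tiny configuration.

```python
# Layer table of the YOLO-Tiny-style sign label detection network as described
# in the text: conv layers use stride 1, maxpool layers use stride 2.
YOLO_TINY_LAYERS = (
    # Layers 0-11: six (conv 3x3 -> maxpool 2x2) pairs.
    [(i, "conv", "3x3") if i % 2 == 0 else (i, "maxpool", "2x2") for i in range(12)]
    + [
        (12, "conv", "3x3"), (13, "conv", "1x1"),
        (14, "conv", "3x3"), (15, "conv", "1x1"),
        (16, "yolo", "first detection head"),
        (17, "route", "takes the feature map of layer 13"),
        (18, "conv", "1x1"),
        (19, "upsample", "2x"),
        (20, "route", "concatenates layers 19 and 8"),
        (21, "conv", "3x3"), (22, "conv", "1x1"),
        (23, "yolo", "second detection head, final sign label output"),
    ]
)
assert len(YOLO_TINY_LAYERS) == 24
```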
FIG. 3 is a flowchart of cropping the sign label region and recognizing the characters of the sign label region according to an embodiment of the invention, comprising the following steps:
step S31: if the output of step S2 is the "None Sign" message, steps S3 and S4 are not executed; if the output of step S2 is the bounding box parameters {x_left, x_right, y_top, y_bottom}, steps S3 and S4 are executed;
step S32: cropping the traffic sign label region from the scene image according to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in step S2;
step S33: inputting the cropped sign label region image into the character recognition network to obtain the category and bounding box parameter set U_chr of all characters:
U_chr = {C1: {x_left1, x_right1, y_top1, y_bottom1}, …, Cn: {x_left_n, x_right_n, y_top_n, y_bottom_n}}
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character.
Specifically, the sign label region segmentation step is a simple image cropping operation that cuts the rectangular box out of the image, so no additional image processing steps are needed; a one-line sketch is given below.
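Assuming the scene image is held as a NumPy-style array indexed as image[y, x] (as produced, for example, by OpenCV's imread), the crop of step S32 is plain array slicing. The file name is illustrative; the coordinates are those reported for FIG. 6(a) below:

```python
import cv2  # OpenCV, assumed available; cv2.imread returns an H x W x 3 array

image = cv2.imread("scene.jpg")  # illustrative file name
x_left, x_right, y_top, y_bottom = 578, 754, 1105, 1190  # values from FIG. 6(a)
sign_region = image[y_top:y_bottom, x_left:x_right]  # cropped sign label region
```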
The character recognition network is a deep-learning-based object detection network, the YOLOv3 network, and the number of detected categories is determined by the characteristics of the traffic sign labels.
The character recognition network comprises 107 layers, organized into a feature extraction functional layer, a feature interaction functional layer, and a classification and bounding box regression functional layer, connected in sequence.
Each of the 107 layers belongs to one of the conv, res, route, upsample, and yolo layer types, where:
the conv layers perform feature extraction on the feature map;
the res layers are residual connection blocks that connect features of different layers through skip connections;
the route layers concatenate feature maps of different dimensions;
the upsample layers upsample the feature map, enlarging its size;
the yolo layers mainly perform loss function calculation, classification prediction, and bounding box regression.
Layers 0 through 74 form the feature extraction functional layer, which takes the cropped image as input, extracts image features, and provides its output to the feature interaction functional layer; it consists of conv layers and res layers.
Layers 75 through 105 form the feature interaction functional layer, which uses a combination of conv, route, upsample, and yolo layers to realize feature interaction between layers.
Layer 106 is the classification and bounding box regression functional layer, consisting of a yolo layer. The network thus realizes character feature extraction, character classification, and bounding box regression, and the final yolo layer outputs the category and bounding box parameters of the traffic sign label characters.
FIG. 4 is a flowchart of sorting and combining character bounding box parameters to obtain the specific content of the sign label according to an embodiment of the invention, comprising the following steps:
S41: according to the character bounding box parameter set U_chr obtained in step 3, taking the x coordinate x_left of the left boundary of each character's bounding box to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters;
S42: sorting the categories in U_chr_x by their corresponding coordinates in ascending order;
S43: combining the sorted categories to determine the specific content of the sign label.
Through the processing of the above steps, the position and the specific content of the traffic sign label in the scene image captured by the camera are obtained. Because the sign label region is cropped from the original scene image in step S3, the character recognition network can accurately recognize the specific content of the sign label while avoiding interference from non-sign-label regions of the image, which improves the robustness of sign label recognition and suits conditions such as complex weather, complex illumination, and diverse sign labels. A minimal code sketch of the sorting-and-combining step follows.
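A minimal sketch of steps S41-S43, assuming U_chr_x is represented as a list of (category, x_left) pairs; a list is used rather than a dict because a plate such as "20020" repeats character categories, so categories cannot serve as unique keys:

```python
from typing import List, Tuple

def combine_characters(u_chr_x: List[Tuple[str, int]]) -> str:
    # Sort (category, x_left) pairs by x_left in ascending order (S42)
    # and concatenate the categories (S43).
    return "".join(category for category, _ in sorted(u_chr_x, key=lambda p: p[1]))

# Hypothetical coordinates for a plate reading "2 0 0 2 0" from left to right:
print(combine_characters([("0", 215), ("2", 120), ("0", 310), ("2", 405), ("0", 500)]))
# -> "20020"
```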
Example 2
This embodiment provides a sign label identification method based on deep learning, specifically a method for identifying a license plate in a scene image, comprising the following steps:
step 1: capturing scene images from a monitoring camera in real time;
step 2: inputting the original scene image to be detected captured in step 1 into a trained sign label detection network, and identifying the license plate region in the scene image to obtain its bounding box parameters;
step 3: cropping the license plate region from the original image according to the bounding box parameters obtained in step 2, inputting it into a trained character recognition network, and recognizing all characters of the license plate region to obtain the category and bounding box parameters of each character;
step 4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step 3, to obtain the specific content of the license plate.
Through the above steps, the method automatically identifies the region of the license plate in the scene image, crops that region from the scene image according to the bounding box parameters of the license plate region, recognizes all characters within it, and finally sorts and combines the character bounding box parameters to obtain the specific content of the license plate. Compared with existing methods that segment each character individually, the method needs no separate character segmentation, avoiding recognition errors or failures caused by inaccurate segmentation; compared with existing methods that rely on many image processing steps, it needs no additional image processing, and it can handle complex weather conditions, license plates at various angles, and similar cases.
An alternative embodiment of the present invention, steps 1 to 4, is described in detail below with reference to fig. 5 to 8.
FIG. 5 shows example license plate scene images in embodiment 2 of the invention: (a) is a blurred image, (b) is a skewed license plate image, (c) is a license plate image under darker illumination, and (d) is a license plate image in rain; the resolutions are 2048×1536, 1920×1200, and 2048×1536. The license plates in the data set are characterized by two fixed Chinese characters meaning "steel transport", followed by five digits.
The original images in FIG. 5 are input into the sign label detection network to obtain detection results. If a license plate is detected, its bounding box parameters {x_left, x_right, y_top, y_bottom} are output; if no license plate is detected, the "None Sign" message is output. The bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image. The number of detection categories of the sign label detection network is 1, namely license: license plate. FIG. 6 shows the recognition results of the example images of FIG. 5 through the sign label detection network in embodiment 2 of the invention; the specific bounding box parameter results are as follows:
For the license plate of FIG. 6(a): {x_left, x_right, y_top, y_bottom} = {578, 754, 1105, 1190}
For the license plate of FIG. 6(b): {x_left, x_right, y_top, y_bottom} = {592, 760, 1152, 1260}
For the license plate of FIG. 6(c): {x_left, x_right, y_top, y_bottom} = {915, 1038, 617, 666}
For the license plate of FIG. 6(d): {x_left, x_right, y_top, y_bottom} = {1109, 1272, 1047, 1140}
According to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in the above step, the license plate region is cropped from the scene image; the cropped license plate region image is input into the character recognition network to obtain the character category and bounding box parameter set U_chr,
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character.
The character recognition network has 12 recognition categories: the two Chinese characters of "steel transport", and the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 (a class-list sketch is given below).
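Purely as an illustration, the 12-way class list could be encoded as follows. The pinyin placeholders GANG and YUN for the two fixed Chinese characters (rendered "steel transport" in this translation) are assumptions of this sketch, not labels from the patent:

```python
# 12 recognition categories of the character recognition network:
# the two fixed Chinese characters of the plate prefix, then the digits 0-9.
CHAR_CLASSES = ["GANG", "YUN"] + [str(d) for d in range(10)]
assert len(CHAR_CLASSES) == 12
```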
FIG. 7 shows the license plate regions cropped from the scene images according to the detection results of FIG. 6 in embodiment 2 of the invention, and FIG. 8 shows the recognition results of FIG. 7 through the character recognition network in embodiment 2 of the invention. As shown in FIG. 8, the character recognition accurately recognizes all the characters in the license plate regions, yielding a character bounding box parameter set U_chr for each of FIGS. 8(a)-(d); the concrete coordinate values are annotated in FIG. 8.
From the character bounding box parameter sets U_chr obtained in the above steps, the x coordinate x_left of the left boundary of each character's bounding box is taken to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters; this is done for each of FIGS. 8(a)-(d), with the coordinate values as annotated in FIG. 8. The categories in each U_chr_x are sorted by their corresponding coordinates in ascending order, and the sorted categories are combined to determine the specific content of the license plate. The sorted and combined results are as follows:
For the license plate of FIG. 8(a), the specific content is "steel transport 20020";
for the license plate of FIG. 8(b), "steel transport 20031";
for the license plate of FIG. 8(c), "steel transport 10042";
for the license plate of FIG. 8(d), "steel transport 10068".
Through the processing of the above steps, the position of the license plate in the scene image and its specific content are obtained. Even in complex scenes, such as blurred images, skewed license plates, darker illumination, and rain, the specific content of the license plate can be accurately detected and recognized, which improves the robustness and accuracy of license plate recognition.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above embodiments are merely examples given for clarity of description and are not limiting. Other variations or modifications in different forms can be made by those of ordinary skill in the art on the basis of the above description; it is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom remain within the protection scope of the invention.

Claims (1)

1. The traffic sign label identification method based on deep learning is characterized by comprising the following steps:
step S1: capturing, in real time, scene images containing traffic sign labels from a monitoring camera;
step S2: inputting the original scene image to be detected captured in step S1 into a trained sign label detection network, and identifying the sign labels in the scene image to obtain their bounding box parameters;
step S3: cropping the region of the sign label from the original image according to the bounding box parameters obtained in step S2, inputting the cropped sign label image into a trained character recognition network, and recognizing the characters in the sign label region to obtain the category and bounding box parameters of each character;
step S4: sorting and combining the upper-left corner coordinates of the bounding boxes of all characters according to the character bounding box parameters obtained in step S3, to obtain the specific content of the sign label;
the step S2 specifically comprises:
step S21: inputting the scene image into the sign label detection network to obtain a detection result;
step S22: if a sign label is detected, outputting its bounding box parameters {x_left, x_right, y_top, y_bottom}; if no sign label is detected, outputting a "None Sign" message;
the bounding box parameters are the x coordinate x_left of the left boundary, the x coordinate x_right of the right boundary, the y coordinate y_top of the upper boundary, and the y coordinate y_bottom of the lower boundary of the rectangular bounding box of the sign label region in the scene image, where the origin of coordinates is at the upper-left corner of the scene image;
the sign label detection network is a deep-learning-based object detection network, the YOLO-Tiny network, and the number of detected categories is 1: Sign, i.e. the sign label;
the sign label detection network comprises conv layers, maxpooling layers, route layers, an upsample layer, and yolo layers, 24 layers in total;
the conv layers extract basic features of the original image through 3×3 and 1×1 convolution kernels with a stride of 1;
the maxpooling layers downsample the previous layer by max pooling with a 2×2 sliding window and a stride of 2;
the route layers concatenate deep and shallow feature maps, so that deep and shallow features are learned together;
the upsample layer upsamples the image;
the yolo layers specify the scene image category number parameter, and compute and output the average training loss value;
layers 0 through 11 are six conv layers with 3×3 kernels, each followed by a maxpooling layer;
layers 12 to 15 are four conv layers with kernel sizes 3×3 and 1×1;
layer 16 is a yolo layer;
layer 17 is a route layer that takes the feature map of layer 13;
layer 18 is a conv layer with a 1×1 kernel;
layer 19 is the upsample layer;
layer 20 is a route layer that concatenates the feature map of layer 19 with that of layer 8;
layers 21 and 22 are two conv layers with 3×3 and 1×1 kernels respectively;
layer 23 is a yolo layer, which outputs the final sign label detection result;
the step S3 specifically comprises the following steps:
step S3.1: if the output of step S2 is the "None Sign" message, steps S3 and S4 are not executed; if the output of step S2 is the bounding box parameters {x_left, x_right, y_top, y_bottom}, steps S3 and S4 are executed;
step S3.2: cropping the traffic sign label region from the scene image according to the bounding box parameters {x_left, x_right, y_top, y_bottom} output in step S2;
step S3.3: inputting the cropped sign label region image into the character recognition network to obtain the category and bounding box parameter set U_chr of all characters, U_chr = {C1: {x_left1, x_right1, y_top1, y_bottom1}, …, Cn: {x_left_n, x_right_n, y_top_n, y_bottom_n}},
where n represents the number of recognized traffic sign label characters and Cn represents the category of the n-th recognized character;
the character recognition network is a deep-learning-based object detection network, the YOLOv3 network, and the number of detected categories is determined by the characteristics of the traffic sign labels;
the character recognition network comprises 107 layers, organized into a feature extraction functional layer, a feature interaction functional layer, and a classification and bounding box regression functional layer, connected in sequence; each of the 107 layers belongs to one of the conv, res, route, upsample, and yolo layer types, wherein:
the conv layers perform feature extraction on the feature map;
the res layers are residual connection blocks that connect features of different layers through skip connections;
the route layers concatenate feature maps of different dimensions;
the upsample layers upsample the feature map, enlarging its size;
the yolo layers mainly perform loss function calculation, classification prediction, and bounding box regression;
layers 0 through 74 form the feature extraction functional layer, which takes the cropped image as input, extracts image features, and provides its output to the feature interaction functional layer; the feature extraction functional layer consists of conv layers and res layers;
layers 75 through 105 form the feature interaction functional layer, which uses a combination of conv, route, upsample, and yolo layers to realize feature interaction between layers;
layer 106 is the classification and bounding box regression functional layer, consisting of a yolo layer; the network thus realizes character feature extraction, character classification, and bounding box regression, and the final yolo layer outputs the category and bounding box parameters of the traffic sign label characters;
the step S4 specifically comprises the following steps:
S4.1: according to the character bounding box parameter set U_chr obtained in step S3, taking the x coordinate x_left of the left boundary of each character's bounding box to form the character position parameter set U_chr_x = {C1: x_left1, C2: x_left2, …, Cn: x_left_n}, where n represents the number of recognized characters;
S4.2: sorting the categories in U_chr_x by their corresponding coordinates in ascending order;
S4.3: combining the sorted categories to determine the specific content of the sign label.
CN201911425706.1A 2019-12-31 2019-12-31 Traffic sign label identification method based on deep learning Active CN111191611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911425706.1A CN111191611B (en) 2019-12-31 2019-12-31 Traffic sign label identification method based on deep learning


Publications (2)

Publication Number Publication Date
CN111191611A CN111191611A (en) 2020-05-22
CN111191611B (en) 2023-10-13

Family

ID=70708113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911425706.1A Active CN111191611B (en) 2019-12-31 2019-12-31 Traffic sign label identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN111191611B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114071005B (en) * 2020-08-07 2022-12-27 华为技术有限公司 Object detection method, electronic device and computer-readable storage medium
CN112348025B (en) * 2020-11-06 2023-04-07 上海商汤智能科技有限公司 Character detection method and device, electronic equipment and storage medium
CN112435222A (en) * 2020-11-11 2021-03-02 深圳技术大学 Circuit board detection method and device and computer readable storage medium
CN113591543B (en) * 2021-06-08 2024-03-26 广西综合交通大数据研究院 Traffic sign recognition method, device, electronic equipment and computer storage medium
CN113435446B (en) * 2021-07-07 2023-10-31 南京云创大数据科技股份有限公司 Deep learning-based inclined license plate correction method
CN113963329B (en) * 2021-10-11 2022-07-05 浙江大学 Digital traffic sign detection and identification method based on double-stage convolutional neural network
CN115410184A (en) * 2022-08-24 2022-11-29 江西山水光电科技股份有限公司 Target detection license plate recognition method based on deep neural network
CN117951648B (en) * 2024-03-26 2024-06-07 成都正扬博创电子技术有限公司 Airborne multisource information fusion method and system


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930830A (en) * 2016-05-18 2016-09-07 大连理工大学 Road surface traffic sign recognition method based on convolution neural network
CN107133616A (en) * 2017-04-02 2017-09-05 南京汇川图像视觉技术有限公司 A kind of non-division character locating and recognition methods based on deep learning
CN108334881A (en) * 2018-03-12 2018-07-27 南京云创大数据科技股份有限公司 A kind of licence plate recognition method based on deep learning
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110619327A (en) * 2018-06-20 2019-12-27 湖南省瞬渺通信技术有限公司 Real-time license plate recognition method based on deep learning in complex scene

Also Published As

Publication number Publication date
CN111191611A (en) 2020-05-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant