WO2013026205A1

WO2013026205A1 - System and method for detecting and recognizing rectangular traffic signs

Info

Publication number: WO2013026205A1
Application number: PCT/CN2011/078897
Authority: WO
Inventors: Charles Chuanming Wang; Yankun ZHANG; Chuyang HONG
Original assignee: Harman International (Shanghai) Management Co., Ltd.
Priority date: 2011-08-25
Filing date: 2011-08-25
Publication date: 2013-02-28

Abstract

This invention is directed to a method for detection and recognition of rectangular traffic signs by detecting, tracking, and recognizing the traffic signs. The method comprises the steps of receiving a plurality of image frames captured by a camera mounted in a vehicle; performing detection in an initialized region of interest (ROI) of each received image frame to determine whether a traffic sign appears; if a traffic sign appears, tracking the traffic sign in a tracking ROI of each received image frame; calculating a rotation angle of the tracked traffic sign and correcting the sign with the rotation angle; segmenting the digits of the traffic sign into a digit patch; and recognizing the traffic sign based on the digit patch. The present invention is also directed to a system for detecting, tracking, and recognizing rectangular traffic signs, comprising respective units for performing at least the corresponding steps of the method.

Description

SYSTEM AND METHOD FOR DETECTING AND RECOGNIZING RECTANGULAR TRAFFIC SIGNS

Field of The Invention

The present invention relates to a method and system for detecting and recognizing traffic signs in a moving vehicle, and more specifically it relates to a method and system for detecting, tracking and recognizing rectangular traffic signs.

Background of The Invention

Traffic signs, especially speed limit signs, are very important for drivers to ensure driving safety and comfort. Accordingly, detection and recognition of traffic signs is an important component of a driver assistance system. Current studies on traffic sign recognition have focused on circular traffic signs, especially the circular speed limit signs commonly used in Europe, Asia, and Australia. However, little research has addressed the recognition of the rectangular speed limit signs that are common in the United States.

Methods for detecting and recognizing circular speed limit signs are normally based on shape and color transformations. Recognizing rectangular speed limit signs appears more difficult. The traditional method for detecting and recognizing rectangular speed limit signs is based on shape detection, which performs poorly because a rectangular speed limit sign is not a regular polygon. Moreover, after detection, the recognition of rectangular speed limit signs may require rotation correction and digit segmentation. Accordingly, there is a need for an efficient way of detecting and recognizing rectangular traffic signs, especially rectangular speed limit signs.

In the process of detecting and recognizing traffic signs, most known techniques for tracking traffic signs compute a feature map of the entire image frame based on color and gradient information, and then usually apply a Kalman filter to keep track of object candidates. Kalman filter based object tracking is in principle a statistical model for estimating the location of an object in a sequence of video frames, and it requires a sufficient number of video frames to build up the statistical model. Thus, a limitation of this method is that it cannot be used in a real-time application such as traffic sign detection and recognition. On the other hand, a fast tracking method is desired that not only reduces detection time and rejects false positives, but also detects traffic signs more accurately by fine searching in a smaller region of interest (ROI). Therefore, there is a need for a reliable method for fast tracking of traffic signs.

Summary of The Invention

At least one embodiment of the invention is directed to a method for detecting and recognizing rectangular traffic signs. The method comprises the steps of receiving a plurality of image frames captured by a camera mounted in a vehicle; detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame; if a traffic sign is detected, generating a tracking ROI for tracking the detected traffic sign within the initialized ROI; tracking the detected traffic sign within the tracking ROI; calculating a rotation angle of the tracked traffic sign, and rotating and correcting the tracked traffic sign with the rotation angle; segmenting the digits of the corrected traffic sign into a patch; recognizing the traffic sign based on one or more digits of the traffic sign; and outputting the one or more recognized traffic signs in the vehicle.

Another embodiment is directed to a system for detecting and recognizing rectangular traffic signs. The system comprises a receiving unit for receiving a plurality of image frames captured by a camera mounted in a vehicle; a detecting unit for detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame; a tracking unit for generating a tracking ROI for tracking the detected traffic sign within the initialized ROI if a traffic sign is detected, and for tracking the detected traffic sign within the tracking ROI; a correcting unit for calculating a rotation angle of the tracked traffic sign and correcting the tracked traffic sign with the rotation angle; a segmenting unit for segmenting digits of the corrected traffic sign into a digit patch; a recognizing unit for recognizing the traffic sign based on one or more digits of the traffic sign; and an outputting unit for outputting the one or more recognized traffic signs in the vehicle.

Another embodiment is directed to a method of fast tracking traffic signs for detection and recognition.
The method comprises the steps of receiving a plurality of image frames of objects captured by a camera mounted in a vehicle; detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame; and if a traffic sign is present, tracking the traffic sign in a tracking ROI within the initialized ROI.

Another embodiment is directed to an apparatus for fast tracking of traffic signs. The apparatus comprises means for receiving a plurality of image frames captured by a camera mounted in a vehicle; means for detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame; and means for tracking the traffic sign in a tracking ROI within the initialized ROI if a traffic sign is present.

Brief description of the drawings

The features, nature, and advantages of the present invention may be appreciated from the following description in connection with the drawings.

FIG. 1 illustrates a method for detecting and recognizing rectangular traffic signs according to one or more embodiments;

FIG. 2 illustrates an example of a speed limit sign tracking process;

FIG. 3 illustrates an example of a speed limit sign tracking process on the basis of motion information of the vehicle;

FIG. 4 illustrates an example of a rotation alignment procedure;

FIG. 5 illustrates an example of the normalized speed limit digit patches for classification of the recognized speed limit signs;

FIG. 6 illustrates an example of a binary tree linear support vector machine structure for speed limit sign recognition; and

FIG. 7 illustrates an example of a voting process for speed limit sign detection and recognition.

Brief Description of The Invention

No embodiment described herein is necessarily preferable or advantageous over other embodiments. Although many aspects of the present disclosure are shown in the drawings, the drawings are not necessarily drawn to scale or intended to be all-inclusive.

FIG. 1 illustrates a method for detecting, tracking and recognizing rectangular traffic signs. At step 101, a plurality of image or video frames is received from a camera mounted in a vehicle. The received image frames may be converted from color images to grayscale images. Frame differences may be calculated to determine whether detection should be performed on an initialized image region of interest (ROI) where a potential traffic sign may exist. When the value of the frame difference is above a predefined threshold, detection may be performed at step 102 in the initialized ROI to determine whether an object, for example a rectangular traffic sign, exists in the ROI. As a non-limiting example, the threshold may be set at 0.3. The ROI may be initialized as the entire area of each received image frame.
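The frame-difference gating of steps 101 and 102 can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not define the frame-difference metric, so the mean absolute gray-level difference normalized to [0, 1] is assumed here, with 0.3 taken from the example threshold. The ITU-R BT.601 luma weights are one common choice for the color-to-grayscale conversion, and all function names are hypothetical.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB frame to grayscale using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights

def frame_difference(prev_gray: np.ndarray, curr_gray: np.ndarray) -> float:
    """Assumed metric: mean absolute difference of 8-bit gray levels, scaled to [0, 1]."""
    diff = np.abs(np.asarray(curr_gray, dtype=np.float64) - np.asarray(prev_gray, dtype=np.float64))
    return float(np.mean(diff)) / 255.0

def should_run_detection(prev_gray, curr_gray, threshold: float = 0.3) -> bool:
    """Run the detector on the initialized ROI only when consecutive frames differ enough."""
    return frame_difference(prev_gray, curr_gray) > threshold
```

Converting to floating point before subtracting avoids the wrap-around that unsigned 8-bit subtraction would otherwise produce.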

In one embodiment, a Viola-Jones boosting detector may be used for speed limit sign detection and recognition. Instead of the Haar wavelet features widely used in the Viola-Jones boosting detector framework, multi-block local binary pattern (MB-LBP) features may be used to train AdaBoost classifiers. MB-LBP features are advantageous because of the amount of information about image structure they can capture, and they can be calculated rapidly using an integral image. MB-LBP also yields fewer features than, for example, Haar features: for a 14×18 image patch there are 1530 MB-LBP features, about 1/30 of the 47952 Haar features. Therefore, compared with Haar features, MB-LBP features save computation time and hardware resources, two important factors in a driver assistance system.
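The MB-LBP feature can be illustrated with a short sketch. It assumes the common formulation: a 3×3 grid of equal rectangular blocks is summed via an integral image, and each of the eight outer block sums is compared with the central sum, with the bit set when the outer sum is greater than or equal to it. The feature-counting convention (every block size and every position whose grid fits inside the patch) is also an assumption. Under that convention, a 14×18 patch yields exactly the 1530 features quoted above. All function names are illustrative.

```python
import numpy as np

def integral_image(img: np.ndarray) -> np.ndarray:
    """Zero-padded integral image: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def block_sum(ii, x, y, bw, bh):
    """Sum of the bw x bh block with top-left corner (x, y), using four lookups."""
    return ii[y + bh, x + bw] - ii[y, x + bw] - ii[y + bh, x] + ii[y, x]

# Outer blocks of the 3x3 grid, clockwise from the top-left, as (row, col).
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def mb_lbp(ii, x, y, bw, bh):
    """8-bit MB-LBP code of the 3x3 grid of bw x bh blocks starting at (x, y)."""
    sums = [[block_sum(ii, x + c * bw, y + r * bh, bw, bh) for c in range(3)] for r in range(3)]
    center = sums[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBORS):
        if sums[r][c] >= center:
            code |= 1 << bit
    return code

def count_mb_lbp_features(width, height):
    """Count (block size, position) pairs whose 3x3 grid fits inside a width x height patch."""
    total = 0
    for bw in range(1, width // 3 + 1):
        for bh in range(1, height // 3 + 1):
            total += (width - 3 * bw + 1) * (height - 3 * bh + 1)
    return total
```

Each block sum costs four integral-image lookups regardless of block size, which is why MB-LBP features are cheap to evaluate at every scale.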

When an object is detected continuously in two consecutive frames, it can be determined that a traffic sign appears, and the tracking process in a smaller image ROI then starts at step 103. Based on the motion estimation principle used in video compression schemes (e.g., MPEG), the position of a traffic sign object in a video frame should remain in the vicinity of its previous region from frame to frame. Instead of using a Kalman filter to estimate the coordinates of a potential traffic sign in a sequence of video frames, the Viola-Jones detector may be used to search for speed limit sign candidates within a limited, smaller rectangular region.

In a real environment, speed limit signs are often rotated slightly due to various factors, such as wind or camera shaking. The rotation angle of the speed limit signs may be estimated and corrected at step 104. Here, a simple but effective projection method may be used to estimate a rotation angle θ. This method is computationally efficient, being based on the magnitudes of the vertical and horizontal projections of the detected region. Further details of the method are described below.

After the rotation angle is corrected, the speed limit digits in the traffic signs may be segmented into patches for recognition at step 105. Here, an integral image based adaptive threshold method may be used to convert a gray image to a binary image. A connected component labeling algorithm may then be applied to the binary image to find the minimum rectangular region that contains the speed limit digits. The integral image based adaptive threshold algorithm is less sensitive to illumination changes and is described in further detail below. At step 106, a clustering based binary tree linear support vector machine may be used to classify the segmented patches of the speed limit digits into particular types of speed limit signs.

Fig. 2 illustrates a non-limiting example of a road sign tracking process for road sign recognition, wherein Fig. 2a shows a detected traffic sign, and Fig. 2b shows a smaller ROI, enclosed by a rectangle, in which the traffic sign is tracked. In Fig. 2a, a speed limit sign object is detected by an LBP or MB-LBP feature based AdaBoost detector. When two consecutive image frames include the same object, the object may be determined to be a speed limit sign; for example, the detected object is a rectangular sign or a rectangular speed limit sign. Thereafter, the tracking process may begin. The coordinates of the detected object, i.e., a rectangular region enclosing it, may be obtained by the detector searching the whole area of the image frame. In the next image frame, the searching and tracking of the traffic sign may be performed within a rectangular region whose width and height are those of the detected region, proportionally enlarged (e.g., 2-5 times larger than those of the detected sign candidates).

Fig. 2b illustrates the concept of searching within a predetermined rectangle. The image in Fig. 2b is the image frame immediately following that of Fig. 2a. The rectangle is the region of interest (ROI) for searching and tracking, and it may be enlarged proportionally based on the rectangular area of the detected object candidates. This rectangular area is smaller than the whole image frame captured by the camera. In some embodiments, the center of the search area may not be aligned with the center of the traffic sign being detected due to, for example, the motion of the vehicle, as shown in Fig. 2b. Since the rectangular (i.e., tracking) region is smaller than the whole image frame, tracking is faster and more accurate.
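The confirm-then-track logic of Fig. 2 can be sketched as a small state machine, under several assumptions. A sign is confirmed after detections in two consecutive frames, without checking that both boxes belong to the same object. The tracking ROI scales the detected box's width and height by a factor of 3, within the 2-5 range mentioned above, and is clipped to the frame. After a missed detection, the search falls back to the whole frame, which the text does not specify. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    right: float
    bottom: float

def enlarge(rect: Rect, factor: float, frame_w: int, frame_h: int) -> Rect:
    """Scale a box's width and height about its center by `factor`, clipped to the frame."""
    cx = (rect.left + rect.right) / 2
    cy = (rect.top + rect.bottom) / 2
    half_w = (rect.right - rect.left) * factor / 2
    half_h = (rect.bottom - rect.top) * factor / 2
    return Rect(max(0.0, cx - half_w), max(0.0, cy - half_h),
                min(float(frame_w), cx + half_w), min(float(frame_h), cy + half_h))

class SignTracker:
    """Full-frame search until a sign is seen in two consecutive frames, then ROI tracking."""

    def __init__(self, frame_w: int, frame_h: int, factor: float = 3.0):
        self.frame_w, self.frame_h, self.factor = frame_w, frame_h, factor
        self.full_frame = Rect(0.0, 0.0, float(frame_w), float(frame_h))
        self.prev_hit = False
        self.tracking = False
        self.search_roi = self.full_frame

    def update(self, detection: Optional[Rect]) -> Rect:
        """Feed this frame's detection (or None); return the ROI to search in the next frame."""
        if detection is None:
            self.prev_hit = self.tracking = False
            self.search_roi = self.full_frame
        else:
            if self.prev_hit:  # second consecutive hit: the sign is confirmed
                self.tracking = True
            if self.tracking:
                self.search_roi = enlarge(detection, self.factor, self.frame_w, self.frame_h)
            self.prev_hit = True
        return self.search_roi
```

For example, with a 640×480 frame, a first hit at (100, 100, 120, 130) keeps the full-frame search, and a second hit at (102, 100, 122, 130) switches to the ROI (82, 70, 142, 160).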

In some embodiments, the search area may be further limited by using additional motion information, such as turning direction and velocity. Fig. 3 illustrates another embodiment of the tracking process, in which tracking is based on the motion information of the vehicle. As discussed above, when an object is detected continuously in two consecutive image frames, the object may be identified as a traffic sign. Once it is detected, the tracking process may start in a smaller ROI of the image frame. In other words, the coordinates (left, top, right, bottom) of the rectangular object in two consecutive frames have been obtained, and thus the center of the rectangular region of the object can be obtained on the image frame. The region is recorded as cx, cy, w, h. How a smaller ROI is determined from the motion information associated with the vehicle can be explained with the following parameters of the image frames:

$$
\begin{aligned}
\text{frame}_i &: \{rect\}_i,\ cx_i,\ cy_i,\ w_i,\ h_i \\
\text{frame}_{i+1} &: \{rect\}_{i+1},\ cx_{i+1},\ cy_{i+1},\ w_{i+1},\ h_{i+1} \\
\{rect\} &= \{left,\ top,\ right,\ bottom\}
\end{aligned}
$$

Here:

$$
cx = \frac{left + right}{2}, \qquad cy = \frac{top + bottom}{2}, \qquad w = right - left, \qquad h = bottom - top
$$

The displacement of the object between frame_i and frame_{i+1} can be calculated as follows:

$$
dx = cx_{i+1} - cx_i, \qquad dy = cy_{i+1} - cy_i
$$

The displacement of the object may be associated with the speed of the vehicle: the higher the speed, the larger the displacement. Assuming the vehicle travels at a constant speed during a very short time period, the center of the object in frame_{i+2} may be estimated as follows:

$$
cx'_{i+2} = cx_{i+1} + dx, \qquad cy'_{i+2} = cy_{i+1} + dy
$$

Then the rectangle of a smaller ROI (hereinafter designated as "roi") can be obtained:

$$
\begin{aligned}
roi\_left &= cx'_{i+2} - (r + 0.5)\, w_{i+1} \\
roi\_top &= cy'_{i+2} - (r + 0.5)\, h_{i+1} \\
roi\_right &= cx'_{i+2} + (r + 0.5)\, w_{i+1} \\
roi\_bottom &= cy'_{i+2} + (r + 0.5)\, h_{i+1}
\end{aligned}
$$

wherein "r" is a scale factor, 0 < r < 1. In this embodiment, "r" is set to 0.5.

The tracking process may be performed on the estimated roi instead of on the ROI. Compared with the embodiment in which no vehicle motion information is taken into consideration, this method further reduces the search area, since the smaller search area is determined on the basis of the motion information. When the detector detects the object in frame_{i+2}, the real coordinates {rect}_{i+2} can be obtained. The displacement of the object due to the motion of the vehicle can then be updated as follows:

$$
new\_dx = cx_{i+2} - cx_{i+1}, \qquad new\_dy = cy_{i+2} - cy_{i+1}
$$

$$
\begin{aligned}
dx &\leftarrow t \cdot dx + (1 - t) \cdot new\_dx \\
dy &\leftarrow t \cdot dy + (1 - t) \cdot new\_dy
\end{aligned}
$$

wherein "t" is a retardation factor, 0 < t < 1. In this non-limiting example, "t" may be set to 0.4.
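The motion-based ROI prediction and the damped displacement update above can be sketched as follows, with r = 0.5 and t = 0.4 as in the example. Representing boxes as (left, top, right, bottom) tuples is a choice made for the sketch, and the class and method names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)

def center(box: Box) -> Tuple[float, float]:
    left, top, right, bottom = box
    return (left + right) / 2, (top + bottom) / 2

class MotionRoiPredictor:
    def __init__(self, box_i: Box, box_i1: Box, r: float = 0.5, t: float = 0.4):
        """Initialize from detections in two consecutive frames (frame_i, frame_i+1)."""
        (cx0, cy0), (cx1, cy1) = center(box_i), center(box_i1)
        self.dx, self.dy = cx1 - cx0, cy1 - cy0  # initial displacement
        self.last = box_i1
        self.r, self.t = r, t

    def predict_roi(self) -> Box:
        """Constant-speed estimate of the next center, padded by (r + 0.5) box sizes."""
        cx, cy = center(self.last)
        cx_pred, cy_pred = cx + self.dx, cy + self.dy
        w = self.last[2] - self.last[0]
        h = self.last[3] - self.last[1]
        k = self.r + 0.5
        return (cx_pred - k * w, cy_pred - k * h, cx_pred + k * w, cy_pred + k * h)

    def update(self, detected: Box) -> None:
        """Blend the newly observed displacement into the estimate (retardation factor t)."""
        cx_old, cy_old = center(self.last)
        cx_new, cy_new = center(detected)
        new_dx, new_dy = cx_new - cx_old, cy_new - cy_old
        self.dx = self.t * self.dx + (1 - self.t) * new_dx
        self.dy = self.t * self.dy + (1 - self.t) * new_dy
        self.last = detected
```

For example, detections centered at (110, 115) and then (114, 113), with a 20×30 box, give dx = 4 and dy = -2, so the predicted roi is (98, 81, 138, 141). If the next detection is centered at (120, 110), the blended displacement becomes dx = 5.2 and dy = -2.6.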

While the object (the rectangular speed limit sign) is being tracked, rotation angle correction and digit segmentation are applied to it to obtain the digit patches. Fig. 4 illustrates an example of the speed limit sign rotation alignment procedure according to at least one embodiment of the invention. In a real environment, speed limit signs are often rotated slightly, i.e., not in an upright position, due to various factors such as wind or camera shaking. Accordingly, at least one embodiment of the sign recognition process includes rotation correction of the speed limit signs in the image frames.

Assuming the range of the rotation angle is [-10°, 10°] (i.e., -10° ≤ θ ≤ 10°), the four sides of the rectangle bounding the detected traffic sign are enlarged proportionally, e.g. (and without limitation), expanded by one quarter (i.e., 25%), so that all four sides of the sign remain enclosed in the expanded rectangular region of the image frame. In Fig. 4, W represents the width of the expanded rectangle, and H represents its height. The expanded rectangle may be rotated 20 times, by 1° each time.

For each rotated rectangle, a projection can be performed. In the expanded rectangular region, projection curves may be obtained by a horizontal projection and a vertical projection. A first-order difference operation is then performed on the projection curves to obtain the value of their weighted mean square, as given by the formula shown in Fig. 4, wherein x_i represents the horizontal projection curve and y_i represents the vertical projection curve, with

$$
dx_i = x_{i+1} - x_i, \quad i = 0, 1, \ldots, W-1; \qquad dy_i = y_{i+1} - y_i, \quad i = 0, 1, \ldots, H-1.
$$

For each rotation, a projection is performed and a value is calculated. After a number of rotations (e.g., twenty), twenty values are obtained in total. The angle corresponding to the maximum of those twenty (20) values is taken as the final estimate of the rotation angle θ.
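A sketch of the exhaustive angle search follows. The exact weighted-mean-square formula appears only in Fig. 4, so an unweighted sum of squared first differences of the two projection curves is assumed as the score. Nearest-neighbor rotation about the patch center and a 1° step over [-10°, 10°] are also choices made for this sketch. The returned angle is the correction to apply, i.e., the negative of the sign's tilt, and the function names are hypothetical.

```python
import numpy as np

def rotate_nn(img: np.ndarray, deg: float) -> np.ndarray:
    """Rotate an image about its center by `deg` degrees (nearest neighbor, zero fill)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(deg)
    ys, xs = np.mgrid[0:h, 0:w]
    xr, yr = xs - cx, ys - cy
    # Inverse mapping: for every output pixel, look up the source pixel.
    src_x = np.rint(np.cos(t) * xr + np.sin(t) * yr + cx).astype(int)
    src_y = np.rint(-np.sin(t) * xr + np.cos(t) * yr + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out

def projection_score(patch: np.ndarray) -> float:
    """Assumed score: sum of squared first differences of both projection curves."""
    col_proj = patch.sum(axis=0)  # one value per column (length W)
    row_proj = patch.sum(axis=1)  # one value per row (length H)
    return float(np.sum(np.diff(col_proj) ** 2) + np.sum(np.diff(row_proj) ** 2))

def estimate_rotation(patch: np.ndarray, max_deg: int = 10, step: int = 1) -> float:
    """Try every angle in [-max_deg, max_deg] and return the one with the highest score."""
    patch = patch.astype(np.float64)
    angles = np.arange(-max_deg, max_deg + step, step)
    scores = [projection_score(rotate_nn(patch, a)) for a in angles]
    return float(angles[int(np.argmax(scores))])
```

An upright sign produces step-like projection curves with large jumps, so the score peaks when the rotation undoes the tilt. The corrected patch is then `rotate_nn(patch, estimate_rotation(patch))`.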

After the rotation correction, digit segmentation is performed on the corrected images to obtain digit patches. An integral image based adaptive threshold method may be used to convert the gray image frames to binary images. Adaptive thresholding takes spatial variations in illumination into account and is therefore more robust to illumination changes in the image. In a first pass, the method computes the integral image of the input. In a second pass, it uses the integral image to compute the s×s average for each pixel. In one non-limiting embodiment, "s" is set to 4. A connected component labeling algorithm is then applied to the binary image to obtain the digit patches.
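The two-pass adaptive threshold and the subsequent connected component labeling can be sketched as below, under several assumptions. The digits are dark on a light background. The window is s×s pixels (s = 4 as in the text) and is clipped at the image border. A pixel counts as foreground when it is more than k = 15% darker than its window mean, a Bradley-Roth style rule; k is not given in the text. Labeling is 4-connected and done by breadth-first search. The digit region is taken as the bounding box of all components, and the function names are hypothetical.

```python
import numpy as np
from collections import deque

def adaptive_threshold(gray: np.ndarray, s: int = 4, k: float = 0.15) -> np.ndarray:
    """Mark pixels more than a fraction k darker than their s x s local mean as foreground."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    # Pass 1: zero-padded integral image.
    ii = np.zeros((h + 1, w + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(gray, axis=0), axis=1)
    # Pass 2: s x s window around every pixel, clipped to the image.
    ys, xs = np.mgrid[0:h, 0:w]
    y0, y1 = np.clip(ys - s // 2, 0, h), np.clip(ys - s // 2 + s, 0, h)
    x0, x1 = np.clip(xs - s // 2, 0, w), np.clip(xs - s // 2 + s, 0, w)
    sums = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    mean = sums / ((y1 - y0) * (x1 - x0))
    return gray < mean * (1.0 - k)

def component_boxes(binary: np.ndarray):
    """4-connected component labeling by BFS; returns (left, top, right, bottom) per component."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not binary[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes

def digit_region(boxes):
    """Minimum rectangle enclosing all digit components."""
    lefts, tops, rights, bottoms = zip(*boxes)
    return min(lefts), min(tops), max(rights), max(bottoms)
```

Because the threshold is relative to the local mean, a uniformly darker or brighter region shifts the mean and the pixel values together, which is the source of the robustness to illumination changes.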

After the rotation correction and segmentation, the digit patches, each bounded by a minimum rectangular region, may be normalized to the same size. Fig. 5 shows some of the normalized image patches. Once the digit patches are extracted, the speed limit digits within the rectangular regions may be classified. Principal component analysis (PCA) may be applied to extract the feature vectors. A clustering based binary tree linear support vector machine (BTSVM) is then constructed and used to classify the digits. An example of the BTSVM strategy is shown in Fig. 6. The speed limit signs have a plurality of classes, e.g., 15, 30, 45 miles/hour, and so on. At the root of the tree, an object is classified as either class "1" or class "0". An object classified into class "1" (left branch in Fig. 6) is considered a valid sign in one of the plurality of classes, while an object in class "0" (right branch in Fig. 6) is considered as not having any valid digit (e.g., there is no digit on the sign, or the digits cannot be detected due to blur, occlusion, etc.). Therefore, the right branch is terminated.
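Size normalization and PCA feature extraction can be sketched with NumPy alone. The 24×16 target size and the number of principal components are hypothetical, since the text gives neither. Nearest-neighbor resampling stands in for whatever interpolation is actually used, and the principal axes are taken from the SVD of the mean-centered training matrix.

```python
import numpy as np

def normalize_patch(patch: np.ndarray, out_h: int = 24, out_w: int = 16) -> np.ndarray:
    """Nearest-neighbor resize of a digit patch to a fixed size, flattened to a vector."""
    h, w = patch.shape
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return patch[rows[:, None], cols[None, :]].astype(np.float64).ravel()

def fit_pca(X: np.ndarray, n_components: int):
    """Return the training mean and the top principal axes (as rows) of X (samples x features)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_features(X: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project samples onto the principal axes to obtain the feature vectors."""
    return (X - mean) @ components.T
```

In use, every segmented patch is passed through `normalize_patch`, the resulting vectors are stacked row-wise into a training matrix for `fit_pca`, and the same mean and axes are reused for `pca_features` at recognition time.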

Before the rest of the binary tree is constructed, unsupervised clustering of the class feature vectors of the training samples is performed to find their similarities. For instance, all classes ending with a "zero" (0) digit are clustered into one (left) branch, and those ending with a "five" (5) digit are clustered into the other (right) branch. According to these similarities, the BTSVM is recursively divided into left and right branches until every class of digits is a leaf node. Fig. 6 is an exemplary BTSVM for 16 classes of speed limits, i.e., 10, 15, 20, ..., 85. The result can be output to the driver through an output device of a navigation system on the vehicle.
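A compact sketch of the clustering-based tree is given below, under several assumptions. Class splits come from 2-means clustering of the per-class mean feature vectors, seeded with the two most distant class means. Each internal node holds a linear SVM trained by full-batch subgradient descent on the L2-regularized hinge loss over mean-centered features. The root-level "valid digit vs. no digit" rejection node of Fig. 6 is omitted. With toy features in which the classes ending in 0 and in 5 form two groups, the first split reproduces the 0-versus-5 grouping described above. All names and hyperparameters are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM (L2-regularized hinge loss) by full-batch subgradient descent; y in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[viol][:, None] * X[viol]).sum(axis=0) / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def split_classes(centroids):
    """2-means on per-class mean vectors, seeded with the two most distant classes."""
    labels = list(centroids)
    C = np.array([centroids[c] for c in labels])
    dist = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    assign = ((C[:, None, :] - C[[i, j]][None]) ** 2).sum(axis=-1).argmin(axis=1)
    for _ in range(20):
        means = np.stack([C[assign == k].mean(axis=0) for k in (0, 1)])
        new = ((C[:, None, :] - means[None]) ** 2).sum(axis=-1).argmin(axis=1)
        if new.min() == new.max() or (new == assign).all():
            break
        assign = new
    left = [c for c, a in zip(labels, assign) if a == 0]
    right = [c for c, a in zip(labels, assign) if a == 1]
    return left, right

class Node:
    def __init__(self, classes):
        self.classes = classes
        self.left = self.right = None
        self.mu = self.w = self.b = None

def build_btsvm(X, y, classes=None):
    """Recursively split the class set and train one linear SVM per internal node."""
    classes = sorted(set(y.tolist())) if classes is None else classes
    node = Node(classes)
    if len(classes) == 1:
        return node
    mask = np.isin(y, classes)
    Xn, yn = X[mask], y[mask]
    left, right = split_classes({c: Xn[yn == c].mean(axis=0) for c in classes})
    node.mu = Xn.mean(axis=0)
    target = np.where(np.isin(yn, left), -1.0, 1.0)  # left branch is the -1 side
    node.w, node.b = train_linear_svm(Xn - node.mu, target)
    node.left = build_btsvm(X, y, left)
    node.right = build_btsvm(X, y, right)
    return node

def predict(node, x):
    """Walk from the root to a leaf, following the sign of each node's decision value."""
    while node.left is not None:
        score = (x - node.mu) @ node.w + node.b
        node = node.left if score < 0 else node.right
    return node.classes[0]
```

Each recognition then costs only one dot product per tree level (four levels for 16 classes) instead of one classifier per class.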

The voting process may start when a speed limit sign object disappears from view, which may make the voting result more accurate. Fig. 7 shows the voting process. In one embodiment, a weight-based voting strategy may be used. When a car passes a speed limit sign, the size of the sign in the consecutive video or image frames grows from small to large before the sign disappears, because the car approaches the sign and then passes it. Large objects may be easier to identify than small objects, so recognition results for large objects may be more reliable than those for small objects. Voting weights may therefore be assigned based on the size of the recognized sign in the sequence of consecutive video or image frames: the larger the size, the larger the weight. The voting output to the vehicle occupant may be the speed limit of the sign currently being passed. As shown in Fig. 7, the sign image in the eighth frame is larger and, accordingly, the weight assigned to this frame is large. The recognition result of the eighth frame may be selected as the output (e.g., '40').
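The size-weighted vote can be sketched in a few lines. Using the sign's pixel area in each frame as that frame's weight is an assumption, since the text only states that larger signs receive larger weights. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

def weighted_vote(observations: Iterable[Tuple[Hashable, float]]) -> Hashable:
    """Return the label with the largest total weight over a tracked sign's frames.

    Each observation is (recognized_label, sign_area_in_pixels) for one frame, so large
    (close) views of the sign count for more than small, distant ones.
    """
    totals = defaultdict(float)
    for label, area in observations:
        totals[label] += area
    return max(totals, key=totals.get)
```

For example, if three small, distant frames read 45 (areas 100, 120, and 110 pixels) and one large, close frame reads 40 (900 pixels), the weighted vote returns 40, although a plain majority vote would return 45.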

The aforesaid detection and recognition method can be carried out by a computer-implemented system. The system for detecting and recognizing traffic signs may comprise a receiving unit for receiving a plurality of image frames from a camera mounted on a vehicle; a detecting unit for detecting whether a traffic sign appears in an initialized region of interest (ROI) of each received image frame, and for determining that a traffic sign appears when an object is detected continuously in two consecutive image frames; and a tracking unit for tracking the detected traffic sign in a smaller ROI of each received image frame once the detecting unit informs it that a traffic sign has appeared. The tracking unit can be further configured to determine the smaller ROI of each received image frame based on motion information of the vehicle. The tracked traffic signs may be passed to a correcting unit for estimating and correcting the rotation angle of the tracked traffic sign using a projection process. After the rotation correction, the result may be sent to a segmenting unit to segment the digits of the corrected traffic sign into patches. A classifying unit may classify the digit patches to determine the particular type of the detected traffic sign using a clustering based binary tree linear support vector machine (BTSVM). The system may further comprise a normalizing unit for normalizing the regions of the segmented patches to the same size before they are classified by the classifying unit. After the classification results are obtained, a voting unit of the system may perform a voting process using a multi-frame majority voting mechanism, thereby further improving the recognition result.

While the above-described embodiments are illustrative, these embodiments are not intended to limit the scope of the invention. Various modifications to the embodiments, or alternatives, may be available without departing from the spirit or scope of the invention.

Claims

1. A method of detecting and recognizing traffic signs, the method comprising:

receiving a plurality of image frames captured by a camera mounted in a vehicle;

detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame;

if a traffic sign is detected, generating a tracking ROI for tracking the detected traffic sign within the initialized ROI;

tracking the detected traffic sign within the tracking ROI;

calculating a rotation angle of the tracked traffic sign, and rotating and correcting the tracked traffic sign with the rotation angle;

segmenting the digits of the corrected traffic sign into a patch;

recognizing the traffic sign based on one or more digits of the traffic sign; and

outputting the recognized one or more traffic signs in a vehicle.

2. The method according to claim 1, wherein the traffic signs are rectangular traffic signs, particularly speed limit signs.

3. The method according to claim 1, wherein the step of detecting comprises the determination of the appearance of the traffic sign when an object is detected continuously in two consecutive image frames.

4. The method according to claim 1, wherein the initialized ROI is the entire region of each said received image frame.

5. The method according to claim 1, wherein the tracking ROI of each said received image frame is a predetermined plurality of times larger than the detected region of the traffic sign, but smaller than the entire region of each said received image frame.

6. The method according to claim 1, wherein the tracking ROI of each said received image frame is determined on the basis of motion information of the vehicle.

7. The method according to claim 1, wherein the step of correcting further comprises estimating the rotation angle using a projection process.

8. The method according to claim 1, wherein the step of recognizing comprises classifying the patches using a clustering based binary tree linear support vector machine.

9. The method according to claim 1, further comprising a step of normalizing the region of the segmented digit patches to the same size.

10. The method according to claim 1, further comprising a step of voting the traffic sign with selected weighted factors on the basis of the size of the detected traffic sign to get an accurate result for recognition.

11. A system for detecting and recognizing traffic signs, comprising:

a receiving unit for receiving a plurality of image frames captured by a camera mounted in a vehicle;

a detecting unit for detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame;

a tracking unit for generating a tracking ROI for tracking the detected traffic sign within the initialized ROI if a traffic sign is detected, and for tracking the detected traffic sign within the tracking ROI;

a correcting unit for calculating a rotation angle of the tracked traffic sign and correcting the tracked traffic sign with the rotation angle;

a segmenting unit for segmenting digits of the corrected traffic sign into a digit patch;

a recognizing unit for recognizing the traffic sign based on one or more digits of the traffic sign; and

an outputting unit for outputting the recognized one or more traffic signs in a vehicle.

12. The system according to claim 11, wherein the detecting unit is configured to determine the appearance of the traffic sign when an object is detected continuously in two consecutive image frames.

13. The system according to claim 11, wherein the initialized ROI is the entire region of each said received image frame.

14. The system according to claim 11, wherein the tracking ROI of each said received image frame is a predetermined plurality of times larger than the detected region of the traffic sign, and smaller than the entire region of each said received image frame.

15. The system according to claim 11, wherein the tracking unit is configured to determine the tracking ROI of each said received image frame based on motion information of the vehicle.

16. The system according to claim 11, wherein the correcting unit is further configured to estimate the rotation angle using a projection process.

17. The system according to claim 11, wherein the recognizing unit is further configured to classify the digit patches using a clustering based binary tree linear support vector machine.

18. The system according to claim 11, further comprising a normalizing unit for normalizing the region of the segmented digit patches to the same size.

19. The system according to claim 11, further comprising a voting unit for voting the traffic sign with selected weighted factors on the basis of the size of the detected traffic sign to get an accurate result for recognition.

20. A method of fast tracking traffic signs for detection and recognition, comprising the following steps of: receiving a plurality of image frames of objects captured by a camera mounted in a vehicle;

detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame; and

if a traffic sign is present, tracking the traffic sign in a tracking ROI within the initialized ROI.

21. The method according to claim 20, wherein the step of detecting comprises the determination of the appearance of the traffic sign when an object is detected continuously in two consecutive image frames.

22. The method according to claim 20, wherein the initialized ROI is the entire region of each said received image frame.

23. The method according to claim 20, wherein the tracking ROI of each said received image frame is predetermined to be a plurality of times larger than the detected region of the traffic sign, but smaller than the entire region of each said received image frame.

24. The method according to claim 20, wherein the tracking ROI of each said received image frame is determined based on motion information of the vehicle.

25. An apparatus of fast tracking traffic signs, comprising:

means for receiving a plurality of image frames captured by a camera mounted in a vehicle;

means for detecting a presence of a traffic sign in an initialized region of interest (ROI) of each said received image frame;

means for tracking the traffic sign in a tracking ROI within the initialized ROI if a traffic sign is present.

26. The apparatus according to claim 25, wherein the means for detecting is configured to determine an appearance of a traffic sign when an object is detected continuously in two consecutive image frames.

27. The apparatus according to claim 25, wherein the initialized ROI is the entire region of each said received image frame.

28. The apparatus according to claim 25, wherein the tracking ROI of image frames is predetermined to be a plurality of times larger than the detected region of the traffic sign, but smaller than the entire region of each said received image frame.

29. The apparatus according to claim 25, wherein the means for tracking is configured to determine the tracking ROI of each said received image frame based on motion information of the vehicle.