CN102722707A

CN102722707A - License plate character segmentation method based on connected region and gap model

Info

Publication number: CN102722707A
Application number: CN2012101898982A
Authority: CN
Inventors: 蒋龙泉; 王琰滨; 冯瑞; 金城; 薛向阳
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2012-06-11
Filing date: 2012-06-11
Publication date: 2012-10-10

Abstract

The invention belongs to the technical field of digital image processing and pattern recognition, and specifically concerns a license plate character segmentation method based on connected regions and a gap model. The method first binarizes the grayscale license plate image, then labels the connected regions of the binarized image to obtain a preliminary set of character regions, and then builds a gap model from the proportions between standard license plate characters to determine the plate's characters, thereby completing the segmentation of the character regions. The invention makes effective use of the sizes of the characters and gaps of a standard plate and avoids the effect on segmentation of characters made unclear by plate stains, image illumination, and similar causes. Compared with traditional segmentation methods based on connected regions and on projection, it effectively improves segmentation accuracy and better handles segmentation failures caused by unclear images.

Description

License Plate Character Segmentation Method Based on Connected Region and Gap Model

Technical Field

The invention belongs to the technical field of digital image processing and pattern recognition, and in particular relates to a method for license plate recognition in traffic scenes for intelligent traffic management.

Technical Background

An Intelligent Transportation System (ITS) is a comprehensive system of traffic-road monitoring and management systems, vehicle control systems, and highway traffic-safety systems that integrates electronic, computer, information, sensor, and systems-engineering technologies to meet the practical needs of ground transportation. Vehicle License Plate Recognition (VLPR) is one of the important components of an ITS and is very widely used. Built on digital image processing, pattern recognition, computer vision, and related technologies, it analyzes vehicle images or video sequences captured by a camera to obtain each vehicle's unique license plate number, thereby completing the recognition process. With some follow-up processing, license plate recognition enables parking-fee management, measurement of traffic-flow control indicators, vehicle localization, vehicle theft prevention, automated supervision of highway speeding, and similar functions. It therefore has practical significance for maintaining traffic safety and urban public security, preventing congestion, and automating traffic management.

License plate recognition mainly comprises three steps: license plate localization, character segmentation, and character recognition. "License plate localization" uses image-processing and pattern-recognition techniques to accurately locate the plate in a digital vehicle image with an uncertain background, providing an accurate and reliable data source for the subsequent segmentation and recognition steps. "Character segmentation" divides the plate region into individual character regions for the subsequent character recognition. License plate characters are drawn from a limited set of Chinese characters, English letters, and digits; "character recognition" classifies the segmented characters in order to identify them.

Among these, character segmentation links the preceding and following steps of license plate recognition, and its correctness directly affects the recognition result. It is, however, a comparatively difficult part, and the difficulty lies mainly in the following problems:

(1) Owing to weather conditions, plates vary in cleanliness. In the binary plate image obtained after preprocessing, characters may stick together and character strokes may be broken. The segmentation method should therefore avoid both merging several characters into one and splitting one character into several.

(2) Because the character "1" is narrower than the other characters, the method must carefully consider how to distinguish "1" from interfering strokes, especially the vertical borders remaining at the left and right of the plate.

(3) The preprocessed binary plate image does not locate the upper and lower character boundaries precisely and is also disturbed by the plate frame and rivets. Interference also exists at the left and right sides of the plate, in particular the left and right borders, which may stick to the first or last character. The segmentation method must therefore not be affected by these interfering regions.

(4) Owing to illumination at capture time, all or part of the plate may appear very bright, leaving little contrast between characters and background. The segmentation method should therefore adopt suitable grayscale-conversion and binarization methods that accurately separate background regions from character regions.

Several character segmentation methods are in common use today:

(1) Projection-based character segmentation. Vertical projection is used to quickly find the optimal split points between characters, and horizontal projection can be used to remove interference such as the plate frame.

(2) Segmentation based on the geometric features of plate characters. This method first applies a series of mathematical-morphology operations to the binarized plate image to remove useless information, widening the gaps between the characters and the left and right plate borders and between adjacent characters, so that vertical dividing lines between characters are easy to draw.

(3) Template matching with the maximum between-class variance criterion. A plate character-string template is designed according to the structure and size of the character string. The template slides over the plate region to perform matching and classification, and the maximum between-class variance decision criterion determines the best matching position, at which the characters are segmented.

(4) Scan-based character segmentation. This method determines the upper and lower character boundaries by searching from the middle toward both ends, and then segments vertically using a one-dimensional cyclic zero-clearing method together with the layout rules of plate characters to obtain individual characters.

(5) Segmentation based on connected-region analysis. Following the principle that the pixels belonging to one character form a connected region, and combining it with the layout rules of plate characters, this approach handles character segmentation of plates against complex backgrounds fairly well.

(6) Segmentation based on neural networks and color features. Using a tree-structured decision scheme in color space, white and black pixels in the plate region are first identified from luminance information, and a network then identifies blue, red, yellow, and other colors in the plate region. Once the plate type has been determined from its color features, the plate region is binarized. After the plate frame and rivets are removed, projection and character connectivity are used together to segment the characters.

The methods above represent gradual improvements in the development of license plate character segmentation, but none of them handles stained or illumination-affected plates well. In particular, when two or three character regions on the plate are blurred or confused, these methods often fail to segment accurately, and segmentation accuracy drops sharply in real traffic scenes.

Because application scenarios in the traffic domain are complex, the problems that license plate recognition must solve are correspondingly complex. Current technology still has shortcomings in practice. For example, low ambient brightness, poor lighting, special weather conditions, and complex interference from non-plate regions all make plate localization harder; the cleanliness of the plate itself and the lighting conditions make character segmentation difficult; and distinguishing similar characters during recognition is also difficult.

Summary of the Invention

The object of the present invention is to provide a license plate character segmentation method based on connected regions and a gap model, in order to solve the problem described above that plate stains and uneven illumination make plate characters hard to separate.

The method proposed by the invention performs the following steps after the license plate has been located and extracted:

(1) Grayscale conversion and binarization

① Convert the license plate image to grayscale with the component method: take the intensities of the R, G, and B components of the color image as the gray values of three grayscale images, and select one of these grayscale images according to the needs of the application;

Because the red component is more pronounced for blue plates and the green component is more pronounced for yellow plates, blue and yellow plates are converted to grayscale using the corresponding component (red for blue plates, green for yellow plates) in order to increase the contrast between the character regions and the background;

② After the grayscale conversion of step (1)-①, binarize the plate image: using the NiBlack dynamic-threshold binarization algorithm, compute by formula the threshold for each pixel (coordinate point) of the plate image and compare it with the pixel value at that point; a value below the threshold is assigned the background color and a value above the threshold the character color, yielding the binary plate image;

(2) Labeling connected regions

The connected regions are eight-connected: a pixel is considered connected to any of its eight neighbors that has the same pixel value. On this basis, the binary image obtained in step (1)-② is labeled into connected regions recursively; once a connected region has been labeled, its minimum and maximum horizontal and vertical coordinates are recorded;

(3) Removing impurity noise

Compute the geometric features of each connected region from the minimum and maximum horizontal and vertical coordinates recorded in step (2), and judge whether the region is a character region; if it satisfies the conditions it is kept, otherwise it is set to the background color;

(4) Horizontal cutting

A license plate is fastened to the vehicle with rivets. In the binary plate image, some character regions touch a rivet and are thereby joined to the plate frame into a single connected region; such a region does not match the geometric features of a character region and is deleted. The upper-center and lower-center coordinate points of the retained character regions are stored in two separate sets, and a line is fitted to each set, giving one upper and one lower straight line. These two lines separate the character regions from the rivets; steps (2) and (3) are then repeated so that the character regions that were attached to rivets are retained;

(5) Gap calculation to recover the remaining characters

From the gap between adjacent connected regions, determine how many characters and how many gaps lie between the two regions, and then recover the wrongly deleted character regions from the standard width of the known regions and the computed gap width. The standard character width and height are computed from the resulting contiguous retained character regions, and the gap width is computed from the standard width. Finally, the character box defined by the standard width and height is extended left and right gap by gap to determine the seven character regions of the plate, completing the segmentation.

The beneficial effects of the invention are:

(1) By exploiting the connectivity of characters in the binary image and the standard geometry of plate characters, irregular impurity noise can be removed using the geometric information of the connected regions.

(2) Cutting at the top and bottom using the position information of the character regions already identified effectively avoids the effect on segmentation of rivets and other impurities that touch characters.

(3) Character positions are computed from the ratio between the standard character size and the character gaps of the plate. The key benefit is that character regions deleted because stains or illumination made them violate the geometric criteria can be recovered.

Brief Description of the Drawings

Fig. 1 is a flow diagram of the license plate character segmentation method of the invention, based on connected regions and a gap model.

Fig. 2 is a schematic diagram of the principle of the NiBlack binarization algorithm.

Fig. 3 is a flowchart of the connected-region labeling method.

Fig. 4 is a flowchart of the impurity-removal method.

Fig. 5 is a table of the types of retained regions among the plate character regions.

Fig. 6 shows a standard vehicle license plate (image taken from the GA36-2007 standard).

Fig. 7 is a table of plate character gap types.

Fig. 8 is a table of the types of retained regions in the binary plate image after gap extension.

Fig. 9 shows the result of each step when the method of the invention processes one license plate.

Detailed Description

Specific embodiments of the license plate character segmentation method based on connected regions and a gap model are explained below with reference to the drawings; it should be noted that the implementation of the invention is not limited to these embodiments.

In this method, the plate image is first converted to grayscale and binarized; connected regions are then labeled and impurity noise is removed; after horizontal cutting, the labeling and noise-removal steps are repeated; finally, the remaining characters are recovered by gap calculation to obtain the seven character regions.

The specific steps of the method are shown in Fig. 1.

1. Grayscale conversion and binarization

The example plate is a blue plate, so it is first processed on its red component; NiBlack dynamic-threshold binarization is then applied.
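The component method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a nested list of `(R, G, B)` tuples, and the function name `component_gray` and the table `CHANNEL_FOR_PLATE` are hypothetical names, not part of the patent.

```python
# Component-method grayscale: keep a single RGB channel as the gray image.
# The colour-to-channel mapping follows the text (blue plate -> red channel,
# yellow plate -> green channel); all names here are illustrative.

CHANNEL_FOR_PLATE = {"blue": 0, "yellow": 1}  # index into an (R, G, B) tuple


def component_gray(rgb_image, plate_color):
    """Return a 2-D list of gray values taken from one RGB component."""
    channel = CHANNEL_FOR_PLATE[plate_color]
    return [[pixel[channel] for pixel in row] for row in rgb_image]


# A 1x2 patch of a blue plate: one blue background pixel, one white character pixel.
patch = [[(10, 40, 200), (250, 250, 250)]]
gray = component_gray(patch, "blue")
print(gray)  # [[10, 250]]
```

On the red channel the blue background becomes dark while the white character stays bright, which is the contrast gain the text describes.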

The threshold at each pixel is determined from the statistical properties (mean and variance) of its neighborhood. The principle is illustrated in Fig. 2, and the threshold is computed as:

$$T(x, y) = m(x, y) + k \cdot s(x, y) \tag{1}$$

Formula (1) gives the threshold $T(x, y)$ at the coordinate point $(x, y)$ of the image, where $k$ is a weighting coefficient and $s(x, y)$ and $m(x, y)$ are, respectively, the variance and the mean of all pixels in a square neighborhood centered at $(x, y)$. Once the threshold at a point has been obtained, it is compared with the pixel value at that point: a value below the threshold $T$ is assigned the background color, and a value above $T$ the character color, as in the following formula:

$$b(x, y) = \begin{cases} \text{character color}, & f(x, y) > T(x, y) \\ \text{background color}, & \text{otherwise} \end{cases} \tag{2}$$

In formula (2), $b(x, y)$ is the pixel value at $(x, y)$ in the binary image $b$, $f(x, y)$ is the pixel value at $(x, y)$ in the grayscale image, and $T(x, y)$ is the threshold computed above.
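Formulas (1) and (2) can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than details from the patent: the window size of 15 and `k = 0.2` are arbitrary defaults, the neighborhood is clipped at the image border, and the dispersion term uses the standard deviation, as in the usual formulation of Niblack's method, whereas the text speaks of the variance.

```python
from statistics import mean, pstdev


def niblack_binarize(gray, window=15, k=0.2):
    """Return a 2-D list of 0/1 values: 1 = character, 0 = background."""
    height, width = len(gray), len(gray[0])
    half = window // 2
    binary = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Square neighbourhood centred on (x, y), clipped at the borders.
            block = [
                gray[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            threshold = mean(block) + k * pstdev(block)          # formula (1)
            binary[y][x] = 1 if gray[y][x] > threshold else 0    # formula (2)
    return binary


# Dark background (20) with a bright vertical stroke (200) in the middle column.
gray = [[20, 20, 200, 20, 20] for _ in range(5)]
binary = niblack_binarize(gray, window=3)
print(binary[0])  # [0, 0, 1, 0, 0]
```

The nested loops cost O(width × height × window²); a production version would more likely use integral images or a library routine, but the plain form keeps the correspondence with the formulas visible.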

As can be seen, because the plate is heavily soiled, the binarized plate image contains many impurity regions and is strongly affected by noise.

2. Labeling connected regions

In the binarization result of the previous step, black is the character color and white is the background color. Connected regions are labeled recursively, following the flowchart in Fig. 3:

Step 1. Assign a mark to every pixel, initialized to 0;

Step 2. Traverse the pixels. If a pixel is black and its mark is 0, increment the counter by 1 and go to step 3; once all pixels have been traversed, stop;

Step 3. Set the mark of this point to the current counter value, and update the minimum and maximum horizontal and vertical coordinates of this connected region;

Step 4. Check in turn whether each of the eight points surrounding this point is black with mark 0. If one is, move to that point and continue with step 3; if none of the eight is, return recursively to the previous point and continue searching its connected neighbors. When the recursion ends, go to step 5;

Step 5. From the recorded minimum and maximum coordinates, compute feature values of the connected region such as height, width, and aspect ratio, and use them to judge whether the region is a character region. If it is, keep the region; if not, mark the region as background and decrement the counter by 1. Return to step 2.

Once a connected region has been labeled, its minimum and maximum horizontal and vertical coordinates are recorded.
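Steps 1 to 4 can be sketched as follows. The sketch swaps the recursion described in the text for an explicit stack, because deep recursion on a large region can exceed Python's default recursion limit; the labeling result is the same. It assumes character pixels are stored as 1 and background as 0, and the function name `label_regions` is hypothetical. The character test of step 5 is left out here.

```python
def label_regions(binary):
    """Label 8-connected regions of 1-pixels; return (labels, boxes).

    boxes maps each label to [min_x, min_y, max_x, max_y].
    """
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]    # step 1: all marks are 0
    boxes = {}
    counter = 0
    for y0 in range(height):                         # step 2: scan every pixel
        for x0 in range(width):
            if binary[y0][x0] != 1 or labels[y0][x0] != 0:
                continue
            counter += 1
            labels[y0][x0] = counter
            box = [x0, y0, x0, y0]
            stack = [(x0, y0)]
            while stack:                             # steps 3-4: grow the region
                x, y = stack.pop()
                box[0], box[1] = min(box[0], x), min(box[1], y)
                box[2], box[3] = max(box[2], x), max(box[3], y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = counter
                            stack.append((nx, ny))
            boxes[counter] = box
    return labels, boxes


image = [
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, boxes = label_regions(image)
print(boxes)  # {1: [0, 0, 1, 1], 2: [3, 1, 3, 2]}
```

The two diagonal pixels at the top left end up in one region because the connectivity is eight-way; with four-way connectivity they would be two regions.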

3. Removing impurity and noise regions by geometric features

Because of illumination, the plate frame, stains on the plate, and similar causes, the binarized plate image contains considerable impurity noise. The geometric features of each connected region labeled in the previous step are computed from its minimum and maximum horizontal and vertical coordinates to judge whether it is a character region. If it satisfies the conditions it is kept; otherwise it is set to the background color. The impurity-removal procedure is shown in Fig. 4. The height, width, angle, and width-to-height ratio of each connected region are computed, and these features are checked against the conditions for a character region; regions that satisfy them are kept and regions that do not are deleted. The average width and average height of the retained regions are then computed, and any region whose geometric features deviate substantially from these averages is deleted. After this step, only the regions of "E" and "4" remain.
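A two-pass filter of this kind might look like the sketch below. The patent does not give numeric limits, so `min_height_frac`, `max_aspect`, and `max_dev` are invented placeholder values, and the function name is hypothetical. The sketch also simplifies the method: it omits the angle feature and, in the second pass, compares only heights against the mean, on the assumption that a width comparison would wrongly reject the narrow character "1".

```python
def filter_character_boxes(boxes, plate_height,
                           min_height_frac=0.4, max_aspect=1.0, max_dev=0.3):
    """Keep boxes (min_x, min_y, max_x, max_y) that look like characters."""
    def size(box):
        return box[2] - box[0] + 1, box[3] - box[1] + 1    # width, height

    # Pass 1: absolute limits on height and on the width-to-height ratio.
    kept = []
    for box in boxes:
        w, h = size(box)
        if h >= min_height_frac * plate_height and w / h <= max_aspect:
            kept.append(box)
    if not kept:
        return []

    # Pass 2: drop regions whose height strays too far from the mean height.
    mean_h = sum(size(b)[1] for b in kept) / len(kept)
    return [b for b in kept if abs(size(b)[1] - mean_h) <= max_dev * mean_h]


candidates = [
    (5, 8, 20, 35),    # character-sized region
    (25, 8, 40, 36),   # character-sized region
    (45, 2, 48, 6),    # small blob, e.g. a rivet
    (0, 0, 99, 3),     # long flat region, e.g. the plate frame
]
print(filter_character_boxes(candidates, plate_height=40))
# [(5, 8, 20, 35), (25, 8, 40, 36)]
```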

4. Upper and lower cutting

The height and position of the character regions are computed from the minimum and maximum coordinates of the retained connected regions. The upper-center coordinate points of the retained character regions are stored in a set, and a straight line is fitted to this point set by least squares. The lower-center points are fitted in the same way. The two lines cut the characters off at the top and bottom, separating "B" and "2" from the rivets or impurities above and below them. The labeling and impurity-removal steps are then run again, so that the "B" and "2" regions now also match the geometric features of characters and are retained.
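The fitting and cutting can be sketched with an ordinary closed-form least-squares fit. Assumptions in this sketch: boxes are `(min_x, min_y, max_x, max_y)` tuples, image rows grow downward, character pixels are 1 and background is 0, and the helper names `fit_line`, `cut_lines`, and `apply_cut` are hypothetical.

```python
def fit_line(points):
    """Least-squares fit y = a*x + b through (x, y) points; returns (a, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:                  # one region only: fall back to a flat line
        return 0.0, sy / n
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n


def cut_lines(boxes):
    """Fit the upper and lower cutting lines from retained character boxes."""
    top = [((b[0] + b[2]) / 2, b[1]) for b in boxes]       # upper-centre points
    bottom = [((b[0] + b[2]) / 2, b[3]) for b in boxes]    # lower-centre points
    return fit_line(top), fit_line(bottom)


def apply_cut(binary, top_line, bottom_line):
    """Clear every pixel above the upper line or below the lower line."""
    (a_t, b_t), (a_b, b_b) = top_line, bottom_line
    return [
        [v if a_t * x + b_t <= y <= a_b * x + b_b else 0
         for x, v in enumerate(row)]
        for y, row in enumerate(binary)
    ]


# Three retained characters on a slightly tilted plate.
kept = [(10, 8, 20, 30), (30, 9, 40, 31), (50, 10, 60, 32)]
top_line, bottom_line = cut_lines(kept)
print(top_line, bottom_line)
```

For these boxes both lines have a slope of about 0.05, so the cut follows the tilt of the plate instead of removing a fixed band of rows.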

5. Gap calculation

After the preceding steps, not all character regions are retained, owing to stains and to impurity-noise interference such as broken strokes in the Chinese character. To simplify recovery of the remaining characters, the Chinese-character region is not used for the time being; if the other six letters and digits are recovered successfully, the position of the Chinese character can be determined from them.

The table of retained-region types for plate characters is shown in Fig. 5. Solid black squares or circles denote retained character regions, and hollow squares or circles denote wrongly deleted character regions. Recovering the hollow regions is a precondition for segmenting the characters correctly.

As shown in Fig. 6, standard plates all follow a uniform standard: "a sequence of seven characters consisting of a Chinese character for the province followed by letters or Arabic numerals. The layout of a standard plate is $X_1 X_2 \cdot X_3 X_4 X_5 X_6 X_7$, where $X_1$ is the abbreviation of a province or municipality, $X_2$ is an uppercase English letter, $\cdot$ is the separator, and the following five characters are English letters or Arabic numerals. At the plate's original dimensions every character occupies a fixed width of 45 mm, the gaps between characters and between a character and the separator are 12 mm, the separator is 10 mm wide, and all characters are 90 mm high." The standard gap and the standard character width therefore stand in fixed ratios. The gaps among the last five characters and the gap between the first two characters have a ratio to the standard character width of about 0.267; these are called small gaps. Because the separator is treated as an impurity and deleted by the steps above, the gap between the second and third characters has a ratio to the standard character width of about 0.756; this is called the large gap.
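The two ratios follow directly from the quoted dimensions. The large gap is taken here to be the two 12 mm gaps plus the 10 mm separator, which is an inference from the stated value of 0.756 rather than something the text spells out; the symbols $r_{\text{small}}$ and $r_{\text{large}}$ are introduced only for this illustration.

$$
r_{\text{small}} = \frac{12}{45} \approx 0.267, \qquad
r_{\text{large}} = \frac{12 + 10 + 12}{45} = \frac{34}{45} \approx 0.756
$$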

Using the gap-type table shown in Fig. 7, the gap between adjacent connected regions is measured to determine how many characters and how many large and small gaps lie between the two regions. The wrongly deleted character regions are then recovered from the standard width of the known regions and the computed widths of the large and small gaps, so that no character region is missing between the connected regions. The space between "E" and "4" matches the width of one character plus two small gaps, so the character "1" between them is recovered.
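One way to read the gap-type table is as a nearest-match search over the possible combinations of missing characters and gaps. The sketch below is an interpretation, not the patent's table: it assumes each retained region has already been normalized to the standard character width, that the separator has been removed, and that at most one large gap can occur. The name `classify_gap` and the parameter `max_missing` are hypothetical.

```python
SMALL_GAP = 12 / 45   # small gap / standard character width
LARGE_GAP = 34 / 45   # gap across the removed separator / character width


def classify_gap(gap_px, char_width_px, max_missing=5):
    """Return (missing_chars, has_large_gap) that best explains a gap."""
    ratio = gap_px / char_width_px
    candidates = []
    for n in range(max_missing + 1):
        for large in (False, True):
            # n missing characters are separated by n + 1 gaps in total.
            small_gaps = n + 1 - (1 if large else 0)
            expected = n + small_gaps * SMALL_GAP + (LARGE_GAP if large else 0)
            candidates.append((abs(ratio - expected), n, large))
    _, n, large = min(candidates)
    return n, large


# Distances measured at 1 pixel per millimetre, so a character is 45 px wide.
print(classify_gap(12, 45))  # (0, False)
print(classify_gap(34, 45))  # (0, True)
print(classify_gap(69, 45))  # (1, False)
```

The last call corresponds to the "E" and "4" case in the text: 69 px is one 45 px character plus two 12 px small gaps, so one character is missing between the two regions.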

After the gap extension indicated by the gap-type table, the retained character regions "B E142" are contiguous, and only the character regions at the two ends remain to be recovered, so the number of retained-region types is much smaller, as shown in Fig. 8. When three contiguous character regions remain, how to extend to the missing character regions on the left and right can be determined uniquely by checking whether the first gap is a large gap. Here the gap between "B" and "E" matches the width of a large gap, so one character is added on each side. In addition, when four, three, or even two contiguous connected regions without a large gap remain, the number of character regions that fit between the right edge of the contiguous regions and the plate boundary can be determined, and the left and right extension performed accordingly.

These contiguous retained character regions are then used to compute the standard character width and height, which define the character box. Finally, the seven character regions of the plate are boxed, completing the segmentation. The result of processing one plate with the method is shown in Fig. 9.
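The final extension can be sketched as follows. This is a loose generalization of the case analysis in Fig. 8, not a transcription of it. It assumes the contiguous run is given as the left edges of standard-width boxes, that the image is cropped to the plate so that `plate_width` is the right boundary, and that the gap ratios are 12/45 and 34/45; the function name `seven_boxes` is hypothetical.

```python
SMALL_GAP = 12 / 45
LARGE_GAP = 34 / 45


def seven_boxes(run_lefts, char_width, plate_width):
    """Return seven (left, right) character boxes from a contiguous run."""
    small, large = SMALL_GAP * char_width, LARGE_GAP * char_width
    # Locate the run inside the plate: plate index (0..6) of its first box.
    first_index = None
    for i in range(len(run_lefts) - 1):
        gap = run_lefts[i + 1] - run_lefts[i] - char_width
        if abs(gap - large) < abs(gap - small):    # large gap follows index 1
            first_index = 1 - i
            break
    if first_index is None:
        # No large gap in the run: count the characters that fit on the right.
        room = plate_width - (run_lefts[-1] + char_width)
        fit_right = int(room // (char_width + small))
        first_index = 7 - fit_right - len(run_lefts)
    first_index = max(0, min(first_index, 7 - len(run_lefts)))
    # Offsets of the seven boxes relative to the first plate character.
    offsets, x = [], 0.0
    for idx in range(7):
        offsets.append(x)
        x += char_width + (large if idx == 1 else small)
    origin = run_lefts[0] - offsets[first_index]
    return [(origin + o, origin + o + char_width) for o in offsets]


# Five retained characters (plate positions 2 to 6), 1 px per mm, 440 px plate.
boxes = seven_boxes([72, 151, 208, 265, 322], char_width=45, plate_width=440)
print([round(left) for left, _ in boxes])
# [15, 72, 151, 208, 265, 322, 379]
```

In the example the first measured gap is 34 px, a large gap, so the run must start at the second plate character, and one box is added on each side, as in the "B E142" case described above.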

Claims

1. A license plate character segmentation method based on connected regions and a gap model, characterized in that, after the license plate has been located and extracted, the following steps are performed:

(1) Grayscale conversion and binarization

① Convert the license plate image to grayscale with the component method: take the intensities of the three components of the color image as the gray values of three grayscale images, and select one of these grayscale images according to the needs of the application;

Blue and yellow license plates are converted to grayscale;

② After the grayscale conversion of step (1)-①, binarize the plate image: using the NiBlack dynamic-threshold binarization algorithm, compute the threshold for each pixel of the plate image and compare it with the pixel value at that coordinate point; a value below the threshold is assigned the background color and a value above the threshold the character color, yielding the binary plate image;

(2) Labeling connected regions

The connected regions are eight-connected: a pixel is considered connected to any of its eight neighbors that has the same pixel value. On this basis, the binary image obtained in step (1)-② is labeled into connected regions recursively; once a connected region has been labeled, its minimum and maximum horizontal and vertical coordinates are recorded;

(3) Removing impurity noise

Compute the geometric features of each connected region from the minimum and maximum horizontal and vertical coordinates recorded in step (2), and judge whether the region is a character region; if it satisfies the conditions it is kept, otherwise it is set to the background color;

(4) Horizontal cutting

A license plate is fastened to the vehicle with rivets. In the binary plate image, some character regions touch a rivet and are thereby joined to the plate frame into a single connected region; such a region does not match the geometric features of a character region and is deleted. The upper-center and lower-center coordinate points of the retained character regions are stored in two separate sets, and a line is fitted to each set, giving one upper and one lower straight line. These two lines separate the character regions from the rivets; steps (2) and (3) are then repeated so that the character regions that were attached to rivets are retained;

(5) Gap calculation to recover the remaining characters;

From the gap between adjacent connected regions, determine how many characters and how many gaps lie between the two regions, and then recover the wrongly deleted character regions from the standard width of the known regions and the computed gap width. The standard character width and height are computed from the resulting contiguous retained character regions, and the gap width is computed from the standard width. Finally, the character box defined by the standard width and height is extended left and right gap by gap to determine the seven character regions of the plate, completing the segmentation.