CN106127198A - Image character recognition method based on multi-classifier integration - Google Patents
Image character recognition method based on multi-classifier integration
- Publication number
- CN106127198A CN106127198A CN201610442435.0A CN201610442435A CN106127198A CN 106127198 A CN106127198 A CN 106127198A CN 201610442435 A CN201610442435 A CN 201610442435A CN 106127198 A CN106127198 A CN 106127198A
- Authority
- CN
- China
- Prior art keywords
- image
- text
- feature
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
Abstract
The invention provides an image character recognition method based on the integration of multiple classifiers. The method converts a colour image to be recognized into a grayscale image; binarizes the grayscale image and segments out the image regions that contain text; segments each Chinese character from the text block; extracts the grid feature and direction feature of each character; applies a minimum distance classifier on the stroke-density total-length feature for a first, coarse classification stage; and applies a nearest neighbour classifier combining peripheral, grid and direction features for the second, fine classification stage. The advantages of the invention include: the character recognition has both strong anti-interference ability and a strong ability to describe the local structure of characters, and is little affected by stroke width; the classifier-integration technique, which combines the complementary minimum distance and nearest neighbour classifiers, makes the system more reliable; and the system recognizes characters intelligently, with good adaptability and a high recognition rate.
Description
Technical Field
The present invention relates to the field of image character recognition, and more specifically to an image character recognition method based on multi-classifier integration.
Background Art
Social development has entered the information age. With the expansion and deepening of practical activities and the needs of socialization, human beings need to recognize many kinds of information with complex forms and contents. People no longer rely only on their own eyes and ears to obtain this information directly, but use computers to input text automatically. As science and technology advance, research objects of all kinds are being "imaged" and "digitized", and image-based multimedia has rapidly become an important medium of information transmission. The text in an image carries rich high-level semantic information, and extracting it is very helpful for understanding, indexing and retrieving the high-level semantics of images.
Research on text image recognition still faces several problems. First, the amount of image data is large: generally, to achieve high recognition accuracy the original image must have a fairly high resolution, at least 64×64. Second, images are degraded: interference from the target environment, transmission errors, sensor errors, noise, background clutter and deformation all deface the image. Third, accuracy under displacement, rotation, scale change and distortion: as with human vision, the relative position of target and sensor varies, so the system must still recognize the target correctly when it is displaced, rotated, rescaled or distorted. Fourth, real-time performance: many applications, especially in the military field, require targets to be recognized in real time, which demands extremely fast processing and recognition.
Given the current state of character recognition systems, how to improve the recognition rate of printed characters remains a research hotspot, and recognizing text in real-world scenes will be a direction of development for character recognition systems. In addition, building a character recognition system featuring automatic layout analysis, strong fault tolerance, a high recognition rate, self-learning error correction, and easy extensibility is the research goal of character recognition automation. Research on image character recognition technology is therefore particularly important.
Summary of the Invention
To overcome at least one defect of the prior art described above, the present invention provides an automated image character recognition method based on multi-classifier integration with a high recognition rate.
To solve the above technical problems, the technical solution of the present invention is as follows:
An image character recognition method based on multi-classifier integration, the method comprising the following steps:
S1: converting the colour image to be recognized into a grayscale image; this step is omitted if the image to be recognized is already grayscale;
S2: binarizing the obtained grayscale image and segmenting out the image regions that contain text;
S3: segmenting each Chinese character from the text block;
S4: extracting the grid feature and direction feature of each Chinese character;
S5: performing a first, coarse classification stage with a minimum distance classifier on the stroke-density total-length feature;
S6: completing the second, fine classification stage with a nearest neighbour classifier combining the peripheral, grid and direction features.
In a preferred scheme, in step S1 the colour image is converted to grayscale by the weighted-average method, i.e. the weighted average of the R, G and B values is assigned to all three channels: R = G = B = a*R + b*G + c*B, where R, G and B denote red, green and blue, and a, b and c are their respective weights, with b > a > c.
In image character recognition the input is generally a colour RGB image, which carries a large amount of colour information. Processing the full-colour image would slow the system down, and much of that colour information is irrelevant to character recognition and hinders text localization. A grayscale image contains only brightness information, which eases further processing, speeds up execution, and benefits the subsequent text localization. Since the human eye is most sensitive to green, less sensitive to red, and least sensitive to blue, a grayscale image that is easier to recognize is obtained under the condition b > a > c.
In a preferred scheme, in step S2 the grayscale image is binarized with the OTSU algorithm (Otsu's method, also known as the maximum between-class variance method).
Binarization sets the gray value of each pixel to 0 or 255: every pixel whose gray value is greater than or equal to the threshold is judged to belong to the object of interest and set to 255; otherwise it is set to 0, indicating other object regions or the background, so that the processed image shows a clear black-and-white effect. Binarization reduces a 256-level grayscale image to two gray levels by choosing a suitable threshold. The properties of the binarized image depend only on the positions of the pixels with value 0 or 255 and no longer involve other gray levels, which eases further processing; the amounts of data to process and compress are small, and the binarized image still reflects both the global and the local characteristics of the original. To obtain a good binarized image, the choice of threshold is crucial: an appropriate threshold not only removes noise effectively but also separates the image cleanly into the target region and the background, greatly reducing the amount of information and speeding up processing.
In a preferred scheme, in step S3 single characters in the image region are identified by character segmentation: the peaks and valleys of the vertical projection, in the horizontal direction of the image, of the blank gaps between characters are used to split out individual characters.
In a preferred scheme, in step S3, to improve accuracy, a regression-style character segmentation is used: since Chinese characters are square glyphs of roughly uniform size, the character height obtained during line segmentation is used to estimate the character width and thus predict the position of the next character.
In a preferred scheme, in step S4 the grid feature of a character is extracted as follows:
1) Divide the character bitmap into 8×8 cells.
2) Count the black pixels in each cell, denoted P11, P12, … P18, P21 … P88.
3) Compute the total number of black pixels in the character, P = P11 + P12 + … + P18 + P21 + … + P88.
4) Compute each cell's percentage of the total black-pixel count, Pij = Pij × 100 / P, where i and j are integers from 1 to 8. The feature vector (P11, P12, … P18, P21 … P88) is the grid feature of the character.
In a preferred scheme, in step S4 the direction feature of a character is extracted as follows:
Binarize and normalize the character bitmap and extract its contour. Assign each contour point one or two direction attributes, the directions being horizontal, vertical, and the two 45° diagonals, i.e. four angles in all. Divide the bitmap into n×n grid cells and count, in each cell, the occurrences of the four direction attributes, giving a 4-dimensional vector per cell; concatenating all cells yields a 4×n×n-dimensional feature vector, the direction feature.
In a preferred scheme, in step S5 the minimum distance classifier is constructed as follows: 1) Extract the stroke-density length of the characters from the samples as the feature vector for coarse classification. 2) For each class, collect the features of its samples; each dimension of each class has a feature set, from which a mean, the feature centre, is computed. 3) To remove the influence of differing scales among features, each feature dimension is normalized, e.g. rescaled to an interval such as (-1, 1), so as to make it dimensionless. 4) The sample to be classified is judged by the chosen distance criterion.
In a preferred scheme, in step S6 the nearest neighbour classifier is constructed as follows:
1) Initialize the distance to the maximum value.
2) Compute the distance dist between the unknown sample and each training sample.
3) Obtain the largest distance maxdist among the current K nearest samples.
4) If dist is less than maxdist, take that training sample as one of the K nearest neighbours.
5) Repeat steps 2-4 until the distances between the unknown sample and all training samples have been computed.
6) Count the occurrences of each class label among the K nearest neighbours.
7) Select the most frequent class label as the class label of the unknown sample.
Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows. The invention provides an image character recognition method based on multi-classifier integration: the colour image to be recognized is converted to grayscale; the grayscale image is binarized and the regions containing text are segmented out; each Chinese character is segmented from the text block; the grid and direction features of each character are extracted; a minimum distance classifier on the stroke-density total-length feature performs the first, coarse classification stage; and a nearest neighbour classifier combining the peripheral, grid and direction features completes the second, fine classification stage. For feature extraction, combining grid and direction features gives the recognition both strong anti-interference ability and a strong ability to describe the local structure of characters, while being little affected by stroke width. For recognition, artificial-intelligence learning techniques improve the adaptability of the system and yield a high recognition rate. For classifier design, the classifier-integration technique that combines the complementary minimum distance and nearest neighbour classifiers makes the system more reliable.
Brief Description of the Drawings
Fig. 1 is a flowchart of the image character recognition method based on multi-classifier integration.
Fig. 2 is a schematic diagram of grayscale conversion and binarization.
Fig. 3 is a schematic diagram of the regression-style character segmentation.
Fig. 4 is a schematic diagram of grid feature extraction.
Fig. 5 is a schematic diagram of direction feature extraction.
Fig. 6 is a schematic diagram of character recognition with the integrated classifiers.
Fig. 7 is a schematic diagram of segmenting a text block into individual characters.
Fig. 8 is a schematic diagram of outputting the recognized text in a text box.
Detailed Description
The accompanying drawings are for illustration only and shall not be construed as limiting the patent.
To better illustrate this embodiment, some parts in the drawings are omitted, enlarged or reduced, and do not represent the dimensions of the actual product.
Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings.
The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Fig. 1, an image character recognition method based on multi-classifier integration comprises the following steps:
S1: Convert the colour image to be recognized into a grayscale image; this step is omitted if the image is already grayscale.
The conversion uses the weighted-average method, i.e. the weighted average of the R, G and B values is assigned to all three channels: R = G = B = a*R + b*G + c*B, where R, G and B denote red, green and blue, and a, b and c are their respective weights, with b > a > c.
In image character recognition the input is generally a colour RGB image, which carries a large amount of colour information. Processing the full-colour image would slow the system down, and much of that colour information is irrelevant to character recognition and hinders text localization. A grayscale image contains only brightness information, which eases further processing, speeds up execution, and benefits the subsequent text localization. Since the human eye is most sensitive to green, less sensitive to red, and least sensitive to blue, a grayscale image that is easier to recognize is obtained under the condition b > a > c.
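The weighted-average conversion of step S1 can be sketched as follows. This is a minimal illustration only: the function name and the standard luminance weights a = 0.299, b = 0.587, c = 0.114 (which satisfy the patent's condition b > a > c) are assumptions, not values taken from the patent.

```python
def to_grayscale(pixels, a=0.299, b=0.587, c=0.114):
    """Convert an RGB image (nested lists of (R, G, B) tuples) to grayscale
    using the weighted average Gray = a*R + b*G + c*B, with b > a > c so the
    green channel, to which the eye is most sensitive, dominates."""
    return [[round(a * r + b * g + c * bl) for (r, g, bl) in row]
            for row in pixels]

# e.g. pure red, green and blue pixels:
# to_grayscale([[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]) -> [[76, 150, 29]]
```

The green pixel maps to the brightest gray value and the blue pixel to the darkest, reflecting the b > a > c ordering.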
S2: Binarize the obtained grayscale image and segment out the image regions that contain text.
As shown in Fig. 2, the grayscale image is binarized with the OTSU algorithm. Binarization sets the gray value of each pixel to 0 or 255: every pixel whose gray value is greater than or equal to the threshold is judged to belong to the object of interest and set to 255; otherwise it is set to 0, indicating other object regions or the background, so that the processed image shows a clear black-and-white effect. Binarization reduces a 256-level grayscale image to two gray levels by choosing a suitable threshold. The properties of the binarized image depend only on the positions of the pixels with value 0 or 255, which eases further processing; the amounts of data to process and compress are small, and the binarized image still reflects both the global and the local characteristics of the original. To obtain a good binarized image the choice of threshold is crucial: an appropriate threshold not only removes noise effectively but also separates the image cleanly into target region and background, greatly reducing the amount of information and speeding up processing.
The OTSU algorithm divides the image into background and target according to its grayscale characteristics. The larger the between-class variance between background and target, the greater the difference between the two parts of the image; misclassifying part of the target as background, or part of the background as target, shrinks that difference. Therefore the segmentation that maximizes the between-class variance minimizes the probability of misclassification.
The steps of the Otsu algorithm are as follows:
Suppose the image contains L gray levels (0, 1, …, L-1), the number of pixels with gray value i is N(i), and the total number of pixels is N = N(0) + N(1) + … + N(L-1). The probability of a pixel having gray value i is P(i) = N(i)/N.
A threshold t divides the image into a dark class c1 and a bright class c2, and the between-class variance σ² is a function of t: σ²(t) = a1·a2·(u1 − u2)², where aj is the ratio of the area of class cj to the total image area: a1 = Σ_{i=0..t} P(i) and a2 = 1 − a1.
uj is the mean of class cj: u1 = Σ_{i=0..t} i·P(i) / a1 and u2 = Σ_{i=t+1..L-1} i·P(i) / a2. The method selects the optimal threshold t* that maximizes the between-class variance, i.e. with Δu = u1 − u2, σb² = max_t { a1(t)·a2(t)·Δu² }.
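Under the definitions above, a straightforward (unoptimized) exhaustive search over t can be sketched as follows; the function names are illustrative only, and the input is assumed to be a flat list of 8-bit gray values:

```python
def otsu_threshold(gray):
    """Return the threshold t* maximizing the between-class variance
    sigma^2(t) = a1 * a2 * (u1 - u2)^2 over 8-bit gray values."""
    N = len(gray)
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    p = [h / N for h in hist]               # P(i) = N(i) / N
    best_t, best_sigma = 0, -1.0
    for t in range(256):
        a1 = sum(p[: t + 1])                # weight of the dark class c1
        a2 = 1.0 - a1                       # weight of the bright class c2
        if a1 == 0 or a2 == 0:
            continue
        u1 = sum(i * p[i] for i in range(t + 1)) / a1        # mean of c1
        u2 = sum(i * p[i] for i in range(t + 1, 256)) / a2   # mean of c2
        sigma = a1 * a2 * (u1 - u2) ** 2
        if sigma > best_sigma:
            best_t, best_sigma = t, sigma
    return best_t

def binarize(gray, t):
    """Map values at or below the threshold to 0, the rest to 255."""
    return [0 if v <= t else 255 for v in gray]
```

For a strongly bimodal input (e.g. half the pixels at 10, half at 200) the maximizer separates the two modes, as the between-class-variance criterion requires.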
S3: Segment each Chinese character from the text block.
As shown in Fig. 3, single characters in the image region are identified by character segmentation: the peaks and valleys of the vertical projection, in the horizontal direction of the image, of the blank gaps between characters are used to split out individual characters. To improve accuracy, a regression-style character segmentation is used: since Chinese characters are square glyphs of roughly uniform size, the character height obtained during line segmentation is used to estimate the character width and thus predict the position of the next character.
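The projection-profile part of this step can be sketched as follows, assuming a binarized text-line image stored as nested lists with 0 for ink and 255 for paper (names and conventions are illustrative; the regression-style width prediction is omitted):

```python
def segment_columns(binary, fg=0):
    """Split a binarized text line (list of rows, fg = ink value) into
    character column spans using the vertical projection profile:
    columns whose projection is zero are the blank gaps between characters."""
    h, w = len(binary), len(binary[0])
    proj = [sum(1 for y in range(h) if binary[y][x] == fg) for x in range(w)]
    spans, start = [], None
    for x, count in enumerate(proj):
        if count > 0 and start is None:
            start = x                      # entering a peak (a character)
        elif count == 0 and start is not None:
            spans.append((start, x))       # fell into a valley: close the span
            start = None
    if start is not None:
        spans.append((start, w))           # character runs to the right edge
    return spans
```

Each returned (start, end) span is a half-open column interval covering one character; a blank column between two spans is the valley the patent describes.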
S4: Extract the grid feature and direction feature of each Chinese character.
With a single kind of feature, the misrecognition rate of Chinese character recognition is hard to reduce and the robustness hard to improve, because the amount of character information exploited is limited and cannot fully reflect the characteristics of Chinese characters. Any single feature necessarily has recognition "blind spots", i.e. characters that are hard to distinguish using that feature alone. From the viewpoint of pattern recognition, if the space formed by all vectorized features of Chinese characters is called the space Ω (i = 1, 2, …), then recognition using the information of the whole space Ω would be highly robust, since the character information provided is abundant. In practice, however, the trade-off among recognition accuracy, recognition speed (computational load) and system resources must be considered, so any practical OCR system uses only the information of some subspaces, and the incompleteness of that information inevitably leads to recognition blind spots.
Building on the study of these methods, the present invention selects the grid feature and the direction feature for Chinese character recognition. These features have strong anti-interference ability and a strong ability to describe the local structure of characters, are little affected by stroke width, and complement each other, so the blind spots of recognition are greatly reduced and the recognition rate is improved.
As shown in Fig. 4, in step S4 the grid feature of a character is extracted as follows:
1) Divide the character bitmap into m×m cells; in this embodiment, 8×8.
2) Count the black pixels in each cell, denoted P11, P12, … P18, P21 … P88.
3) Compute the total number of black pixels in the character, P = P11 + P12 + … + P18 + P21 + … + P88.
4) Compute each cell's percentage of the total black-pixel count, Pij = Pij × 100 / P, where i and j are integers from 1 to 8. The feature vector (P11, P12, … P18, P21 … P88) is the grid feature of the character.
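Steps 1)-4) above can be sketched as follows, assuming a square binary bitmap (1 = black) whose side length is a multiple of m; the function name is illustrative:

```python
def grid_feature(bitmap, m=8):
    """m*m-dimensional grid feature: divide the bitmap into m x m cells,
    count black pixels per cell, and express each count as a percentage
    of the total black-pixel count (Pij = Pij * 100 / P)."""
    size = len(bitmap)
    cell = size // m
    counts = [[0] * m for _ in range(m)]
    for y in range(size):
        for x in range(size):
            if bitmap[y][x]:
                counts[y // cell][x // cell] += 1
    total = sum(map(sum, counts)) or 1     # guard against an all-white bitmap
    return [counts[i][j] * 100 / total for i in range(m) for j in range(m)]
```

The percentages sum to 100, so the feature is invariant to the absolute number of black pixels, which is one reason it is little affected by stroke width.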
如图5所示,步骤S4中,提取文字方向特征的具体方法如下:As shown in Figure 5, in step S4, the specific method of extracting the text direction feature is as follows:
Binarize and normalize the character bitmap and extract its contour. Assign each contour point one or two direction attributes, chosen from four angles: horizontal, vertical, and the two 45° diagonals. Divide the character lattice into n×n cells and count the occurrences of each of the 4 direction attributes within each cell, forming a 4-dimensional vector per cell. Concatenating the vectors of all cells yields a 4×n×n-dimensional feature vector, the direction feature.
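The direction feature can be sketched as below. The contour extraction (a 4-neighbour test) and the per-point direction assignment (inferred from adjacent contour pixels) are simplified stand-ins for the patent's procedure, so treat this as an illustration of the 4×n×n layout rather than the exact algorithm.

```python
import numpy as np

def direction_feature(bitmap, n=8):
    """Direction feature: per-cell histogram of 4 contour directions
    (horizontal, vertical, +45 deg, -45 deg), concatenated into a
    4*n*n-dimensional vector."""
    b = bitmap.astype(bool)
    h, w = b.shape
    # Contour = black pixels with at least one white 4-neighbour.
    padded = np.pad(b, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    contour = b & ~interior
    feat = np.zeros((n, n, 4))
    for y, x in zip(*np.nonzero(contour)):
        # Crude direction estimate from adjacent contour pixels.
        horiz = x + 1 < w and contour[y, x + 1]
        vert = y + 1 < h and contour[y + 1, x]
        diag1 = y + 1 < h and x + 1 < w and contour[y + 1, x + 1]
        diag2 = y + 1 < h and x - 1 >= 0 and contour[y + 1, x - 1]
        dirs = [d for d, hit in enumerate((horiz, vert, diag1, diag2)) if hit]
        for d in dirs[:2]:                 # at most two direction attributes
            feat[y * n // h, x * n // w, d] += 1
    return feat.ravel()                    # 4*n*n-dimensional vector
```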
S5: As shown in Figure 6, a minimum distance classifier using the stroke-density total-length feature performs the first-layer coarse classification.
The minimum distance classifier uses the stroke-density total-length feature for the first-layer coarse classification. In this method, a pattern is assigned to the class whose representative sample is nearest to it. Let R1, ..., Rc denote the feature vectors representing the c classes and x the feature vector of the pattern to be recognized; |x − Ri| is the distance between x and Ri (i = 1, 2, ..., c). If |x − Ri| is the smallest of these distances, x is assigned to class i.
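In code, the minimum distance rule reduces to an argmin over the class representatives R1, ..., Rc. Euclidean distance is assumed here; the patent does not fix the metric.

```python
import numpy as np

def min_distance_classify(x, class_reps):
    """Assign x to the class whose representative vector Ri is nearest,
    as in the first-layer coarse classification."""
    dists = [np.linalg.norm(x - r) for r in class_reps]
    return int(np.argmin(dists))
```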
S6: A nearest neighbor classifier combining peripheral features, grid features, and direction features completes the second-layer classification and matching.
The nearest neighbor classifier combines grid features and direction features to complete the second-layer classification. It extends minimum distance classification: every sample in the training set serves as a reference, and the pattern to be classified receives the label of the nearest training sample.
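A matching 1-NN sketch; the combined grid-plus-direction vector is assumed to be precomputed for each training sample and for the query.

```python
import numpy as np

def nearest_neighbour_classify(x, train_feats, train_labels):
    """1-NN: the label of the nearest training sample wins."""
    dists = np.linalg.norm(train_feats - x, axis=1)
    return train_labels[int(np.argmin(dists))]
```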
Repeated experiments and research show that a single recognizer cannot fundamentally improve system performance; instead, the recognition results of multiple classifiers should be integrated. Multi-classifier integration improves on any single classifier by combining several complementary classifiers, yielding a more reliable recognition system. The present invention therefore integrates a minimum distance classifier with a nearest neighbor classifier; through this optimized classifier design, the recognition rate and accuracy of characters are further improved.
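One way the two layers could hand off, as a sketch: the coarse layer keeps the k classes closest in the stroke-density feature, and the fine layer runs 1-NN over those classes' training samples using the grid/direction features. The candidate-set size k and this exact hand-off are our assumptions; the patent specifies only the two layers and their classifiers.

```python
import numpy as np

def recognize(x_coarse, x_fine, class_reps, train_by_class, k=10):
    """Two-layer scheme: minimum distance on the coarse feature keeps the
    k most plausible classes, then 1-NN on the fine feature decides among
    the training samples of those classes."""
    dists = [np.linalg.norm(x_coarse - r) for r in class_reps]
    candidates = np.argsort(dists)[:k]          # coarse pre-selection
    best_label, best_d = None, float("inf")
    for c in candidates:
        for feat in train_by_class[int(c)]:     # fine matching
            d = np.linalg.norm(x_fine - feat)
            if d < best_d:
                best_label, best_d = int(c), d
    return best_label
```

Restricting the fine 1-NN search to the k coarse candidates is what makes the cascade cheaper than running 1-NN over the full character set.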
To verify the effectiveness of the present invention, experiments were carried out on an original image containing 697 Chinese characters. The original image is first converted to grayscale for subsequent processing. The whole passage of text is then segmented into individual characters using the regression character-segmentation method; as shown in Figure 7, every character is segmented accurately. Finally, the segmented characters are recognized using multi-feature extraction and multi-classifier integration and output in the form of a text box; the results, shown in Figure 8, are all correct.
Multi-feature extraction and multi-classifier integration make it possible to raise the recognition rate of text in images; their good recognition performance has attracted wide attention and offers broad application prospects. The present invention realizes image character recognition based on a multi-classifier integration method, making the processing and extraction of textual information in images more feasible.
Apparently, the above embodiments of the present invention are merely examples given to clearly illustrate the present invention and do not limit its implementation. Those of ordinary skill in the art may make other changes or modifications of different forms on the basis of the above description; the embodiments need not and cannot be listed exhaustively here. Any modifications, equivalent replacements, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610442435.0A CN106127198A (en) | 2016-06-20 | 2016-06-20 | A kind of image character recognition method based on Multi-classifers integrated |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106127198A true CN106127198A (en) | 2016-11-16 |
Family
ID=57470615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610442435.0A Pending CN106127198A (en) | 2016-06-20 | 2016-06-20 | A kind of image character recognition method based on Multi-classifers integrated |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106127198A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034166A (en) * | 2017-06-08 | 2018-12-18 | 北京君正集成电路股份有限公司 | Confusable character identification model training method and device |
CN109389061A (en) * | 2018-09-26 | 2019-02-26 | 苏州友教习亦教育科技有限公司 | Paper recognition methods and system |
CN110009065A (en) * | 2019-01-14 | 2019-07-12 | 岭南师范学院 | A kind of calligraphy comparison method based on image binaryzation |
CN110378287A (en) * | 2019-07-19 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Document direction recognizing method, device and storage medium |
CN110879965A (en) * | 2019-10-12 | 2020-03-13 | 中国平安财产保险股份有限公司 | Automatic reading and amending method of test paper objective questions, electronic device, equipment and storage medium |
CN110942085A (en) * | 2019-10-25 | 2020-03-31 | 深圳猛犸电动科技有限公司 | Image classification method, image classification device and terminal equipment |
CN111104980A (en) * | 2019-12-19 | 2020-05-05 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for determining classification result |
CN111428069A (en) * | 2020-03-11 | 2020-07-17 | 中交第二航务工程局有限公司 | Construction data acquisition method for slot milling machine |
CN112906686A (en) * | 2021-03-11 | 2021-06-04 | 北京小米移动软件有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN113420767A (en) * | 2021-07-22 | 2021-09-21 | 凌云光技术股份有限公司 | Method, system and device for extracting features for font classification |
CN114627730A (en) * | 2022-03-31 | 2022-06-14 | 北京科技大学 | Braille electronic book |
CN114937277A (en) * | 2022-05-18 | 2022-08-23 | 北京百度网讯科技有限公司 | Image-based text acquisition method and device, electronic equipment and storage medium |
CN118155210A (en) * | 2024-05-08 | 2024-06-07 | China Electronic Product Reliability and Environmental Testing Research Institute (the Fifth Electronics Research Institute of MIIT; CEPREI Laboratory) | A text processing function testing system and method based on machine learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070133883A1 (en) * | 2005-12-12 | 2007-06-14 | Microsoft Corporation | Logical structure and layout based offline character recognition |
CN101630367A (en) * | 2009-07-31 | 2010-01-20 | 北京科技大学 | Rejection method for identifying handwritten character based on multiple classifiers |
CN103996055A (en) * | 2014-06-13 | 2014-08-20 | 上海珉智信息科技有限公司 | Identification method based on classifiers in image document electronic material identification system |
Non-Patent Citations (2)
Title |
---|
WU DAN et al.: "Text Image Binarization Based on Histogram Analysis and the OTSU Algorithm", Computer and Modernization *
LUO XIAOLING et al.: "Research on Image Character Recognition Technology Based on Multi-Classifier Integration and Its Applications", Software *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109034166B (en) * | 2017-06-08 | 2021-09-24 | 北京君正集成电路股份有限公司 | Confusable character recognition model training method and device |
CN110942085B (en) * | 2019-10-25 | 2024-04-09 | 深圳猛犸电动科技有限公司 | Image classification method, image classification device and terminal equipment |
CN113420767B (en) * | 2021-07-22 | 2024-04-26 | 凌云光技术股份有限公司 | Feature extraction method, system and device for font classification |
CN118155210B (en) * | 2024-05-08 | 2024-07-12 | China Electronic Product Reliability and Environmental Testing Research Institute (the Fifth Electronics Research Institute of MIIT; CEPREI Laboratory) | Word processing function test system and method based on machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106127198A (en) | A kind of image character recognition method based on Multi-classifers integrated | |
CN109840521B (en) | Integrated license plate recognition method based on deep learning | |
CN110363182B (en) | Lane detection method based on deep learning | |
CN102509091B (en) | Airplane tail number recognition method | |
CN108009518A (en) | A kind of stratification traffic mark recognition methods based on quick two points of convolutional neural networks | |
CN105809121A (en) | Multi-characteristic synergic traffic sign detection and identification method | |
CN104732215A (en) | Remote-sensing image coastline extracting method based on information vector machine | |
CN101520841A (en) | Real-time and anti-interference method for positioning license plate in high-definition TV video | |
CN104778453A (en) | Night pedestrian detection method based on statistical features of infrared pedestrian brightness | |
CN102867195B (en) | Method for detecting and identifying a plurality of types of objects in remote sensing image | |
CN112488229A (en) | Domain self-adaptive unsupervised target detection method based on feature separation and alignment | |
CN101140625A (en) | A multi-resolution degraded character adaptive recognition system and method | |
CN112560858B (en) | Character and picture detection and rapid matching method combining lightweight network and personalized feature extraction | |
Cai et al. | Traffic sign recognition algorithm based on shape signature and dual-tree complex wavelet transform | |
CN107679453A (en) | Weather radar electromagnetic interference echo recognition methods based on SVMs | |
CN106971158A (en) | A kind of pedestrian detection method based on CoLBP symbiosis feature Yu GSS features | |
CN112784757B (en) | Saliency detection and recognition method of marine SAR ship target | |
CN104834891A (en) | Method and system for filtering Chinese character image type spam | |
CN116563410A (en) | Electrical equipment electric spark image generation method based on two-stage generation countermeasure network | |
CN105354547A (en) | Pedestrian detection method in combination of texture and color features | |
Luo et al. | Alphanumeric character recognition based on BP neural network classification and combined features | |
Wang et al. | A method of fast and robust for traffic sign recognition | |
Xue | Optical character recognition | |
CN114581932A (en) | Picture table line extraction model construction method and picture table extraction method | |
Ji et al. | Directional correlation analysis of local Haar binary pattern for text detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20161116 |