CN109034154A

CN109034154A - Method for extracting and identifying the tax number of a special invoice seal

Info

Publication number: CN109034154A
Application number: CN201810811435.2A
Authority: CN
Inventors: 张文博; 王凯; 姬红兵; 李林; 臧博
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2018-07-23
Filing date: 2018-07-23
Publication date: 2018-12-18

Abstract

The invention discloses a method for extracting and identifying the tax number of a special invoice seal, which mainly solves the problems of low efficiency and low accuracy in existing methods for obtaining the tax number. The implementation steps are: 1) obtaining an RGB image containing the special invoice seal; 2) performing Laplacian sharpening on the RGB image and converting it to the YCrCb color space; 3) performing adaptive thresholding on the Cr channel of the YCrCb color space so that only the content of the special invoice seal is kept; 4) obtaining the outer contour ellipse image of the special invoice seal through Canny edge detection; 5) finding the center point by calculating the centroid coordinates of the ellipse image; 6) cropping, according to the center point, a rectangular area containing only the tax number; 7) dividing the tax number in the rectangular area into single digits; 8) performing template matching on each digit to identify the tax number. The invention effectively speeds up acquisition of the tax number of the special invoice seal, reduces identification errors, and can be used for fast and accurate extraction and identification of the tax number.

Description

Method for extracting and identifying the tax number of the special seal of the invoice

Technical Field

The invention belongs to the technical field of digital image processing and further relates to character recognition technology, in particular to a method for extracting and identifying the tax number of a special invoice seal, which can be used to obtain tax registration number information.

Background Art

A special invoice seal is a seal that enterprises, organizations and individual businesses that use invoices have engraved in accordance with the regulations of the tax authorities, and that is stamped on invoices when they are purchased or issued. The seal impression contains the name of the company or organization, the words "special invoice seal" and the tax registration number, referred to as the tax number for short. The tax number is a tax "identity card" given by the issuing authority. A unified tax number system helps to strengthen tax registration, standardize tax collection and administration, strengthen the management of tax files and tax sources, and promote taxation according to law. Extraction and identification of the tax number are therefore particularly important.

At present there are two main methods for extracting and identifying tax numbers. The first is the traditional manual method, in which a person reads the tax number and types it into a computer; this method is inefficient and error-prone, and its labor and time costs are too high. The second is extraction and recognition based on deep learning: a classifier is trained on a large number of known training samples, a recurrent neural network is then used to locate the digits in the preprocessed image and segment out the images containing digits, and finally the segmented images are sent to the trained classifier for classification. This method effectively reduces the large error and high cost of manual identification. However, both the recurrent neural network and the classifier require specified image features to be extracted as input, so the amount of computation is large and extraction and recognition remain slow.

Summary of the Invention

The purpose of the invention is to address the above deficiencies of the prior art by proposing a method for extracting and identifying the tax number of a special invoice seal. The method calculates the centroid of the ellipse image to quickly locate the center coordinates of the special invoice seal, selects the area where the tax number is located, segments that area into single-digit images, and performs template matching on each of them in turn. The tax number in the special invoice seal is thereby identified quickly and accurately, which improves both the efficiency and the accuracy of tax number recognition.

To achieve the above purpose, the technical solution adopted by the invention is:

(1) Obtain an RGB image containing the special invoice seal;

(2) Perform Laplacian sharpening on the acquired RGB image, and then convert it to the YCrCb color space;

(3) Separate the channels of the YCrCb color space, perform adaptive thresholding on the separated Cr channel to convert the image into a binary image, and then remove the noise of the binary image through dilation and erosion operations to obtain a binary image containing only the special invoice seal;

(4) Perform Canny edge detection on the binary image containing only the special invoice seal, and select the outermost contour, i.e. the outer contour ellipse image of the special invoice seal;

(5) Calculate the centroid coordinates of the outer contour ellipse image, i.e. the coordinates of the center point of the special invoice seal;

(6) Crop, according to the coordinates of the center point, a rectangular area containing only the tax number;

(7) Divide the tax number in the rectangular area into single digits;

(8) Make a search template T for the digits 0 to 9, resize each single-digit image so that it matches the size of the digits in the search template T, feed each resized digit in turn, as the searched image S, to the prepared digit search templates, match the digits using the correlation method, and identify the tax number according to the matching results.

Compared with the prior art, the invention has the following advantages:

First, because the invention uses the zero-order and first-order geometric moments to obtain the center point coordinates of the outer contour ellipse of the special invoice seal, and then precisely locates the tax number according to those coordinates, there is no need to train a network or extract image features. This reduces the amount of computation needed to locate the tax number and improves the positioning accuracy.

Second, because the invention identifies the digits in the special invoice seal by template matching, there is no need to train a classifier or extract image features. This speeds up digit recognition and improves the efficiency of extracting and identifying the tax number.

Brief Description of the Drawings

Figure 1 is a flowchart of the implementation of the invention;

Figure 2 is an RGB image containing a special invoice seal;

Figure 3 is a binary image containing only the content of the special invoice seal;

Figure 4 is the outer contour image of the special invoice seal;

Figure 5 is an image of the outer contour and center point of the special invoice seal;

Figure 6 is a rectangular binary image containing only the tax number;

Figure 7 is the image obtained by inverting the colors of the rectangular binary image containing only the tax number.

Detailed Description

To enable those skilled in the art to better understand the technical solution of the invention, the invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.

Referring to Figure 1, the method for extracting and identifying the tax number of a special invoice seal includes the following steps:

Step 1. Obtain an RGB image containing the special invoice seal, as shown in Figure 2.

Step 2. Perform Laplacian sharpening on the acquired RGB image, and then convert it to the YCrCb color space.

First, Laplacian sharpening is performed on the RGB image obtained in step 1, as follows: the gray value of a center pixel that is lower than the average gray value of its neighborhood is reduced, and the gray value of a center pixel that is higher than the average gray value of its neighborhood is raised. That is, when the gray level of the center pixel is lower than the average gray level of the other pixels in its neighborhood, the gray value of the center pixel is decreased; when it is higher than that average, the gray value of the center pixel is increased. After sharpening, the special invoice seal is easier to extract from the image.
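The sharpening rule above can be sketched with the common 4-neighbor Laplacian kernel. This is a minimal illustration, not the patented implementation: the kernel choice, the replicated border handling, and the function names `laplacian_sharpen` and `sharpen_rgb` are assumptions made for the example.

```python
import numpy as np

def laplacian_sharpen(channel: np.ndarray) -> np.ndarray:
    """Sharpen one 8-bit channel with a 4-neighbor Laplacian (assumed kernel)."""
    img = channel.astype(np.int32)
    # Replicate the border so every pixel has four neighbors.
    p = np.pad(img, 1, mode="edge")
    neighbor_sum = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    # 5*center - sum(neighbors) == center + 4*(center - neighbor mean):
    # pixels below the neighborhood mean get darker, pixels above get brighter.
    sharpened = 5 * img - neighbor_sum
    return np.clip(sharpened, 0, 255).astype(np.uint8)

def sharpen_rgb(rgb: np.ndarray) -> np.ndarray:
    """Apply the sharpening to each of the R, G and B channels independently."""
    return np.stack([laplacian_sharpen(rgb[..., c]) for c in range(3)], axis=-1)
```

In a flat region the output equals the input, so only pixels that differ from their neighborhood average are changed.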

Then, the sharpened RGB image is converted to the YCrCb color space by the following formulas:

Y = (B×1868 + G×9617 + R×4899 + 8192) ÷ 16384;

Cr = ((R−Y)×11682 + 8192) ÷ 16384 + 128;

Cb = ((B−Y)×9241 + 8192) ÷ 16384 + 128;

where "R", "G" and "B" denote the values of the R, G and B channels of the RGB image, and "Y", "Cr" and "Cb" denote the values of the Y, Cr and Cb channels of the YCrCb image.

The YCrCb color space contains three channels: "Y" represents luminance, i.e. the grayscale value; "Cr" and "Cb" both represent chrominance, "Cr" being the red chrominance component and "Cb" the blue chrominance component.
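The fixed-point conversion can be sketched as follows. The sketch assumes that the division by 16384 is an integer (floor) division, implemented as a right shift by 14 bits, and that results are clipped to the 8-bit range. It takes Cr as the red-difference channel (R − Y, constant 11682) and Cb as the blue-difference channel (B − Y, constant 9241), in line with the description of Cr as the red chrominance component. The function name `rgb_to_ycrcb` is hypothetical.

```python
import numpy as np

def rgb_to_ycrcb(rgb: np.ndarray) -> np.ndarray:
    """Convert an (..., 3) uint8 RGB array to YCrCb with 14-bit fixed-point math."""
    v = rgb.astype(np.int64)
    r, g, b = v[..., 0], v[..., 1], v[..., 2]
    # 16384 == 2**14; adding 8192 before the shift rounds to the nearest integer.
    y = (b * 1868 + g * 9617 + r * 4899 + 8192) >> 14
    cr = (((r - y) * 11682 + 8192) >> 14) + 128  # red-difference chroma
    cb = (((b - y) * 9241 + 8192) >> 14) + 128   # blue-difference chroma
    out = np.stack([y, cr, cb], axis=-1)
    return np.clip(out, 0, 255).astype(np.uint8)
```

For a pure red pixel the Cr value saturates at 255 while a neutral gray pixel stays at the mid value 128, which is why the red seal impression stands out in the Cr channel.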

Step 3. Perform adaptive thresholding on the Cr channel of the YCrCb color space, keeping only the content of the special invoice seal.

The Cr channel is separated from the YCrCb color space and adaptively thresholded to convert the image into a binary image. Dilation and erosion operations are then applied to remove noise, giving a binary image that retains only the special invoice seal, as shown in Figure 3.
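Step 3 can be sketched with a local-mean adaptive threshold followed by a 3×3 dilation and erosion. The source does not specify the thresholding variant, window size, offset, or structuring element, so the mean-based rule, the defaults `k=15` and `offset=5.0`, and all function names here are assumptions made for illustration.

```python
import numpy as np

def box_mean(img: np.ndarray, k: int) -> np.ndarray:
    """Mean over a k x k window (k odd), borders replicated, via an integral image."""
    r = k // 2
    p = np.pad(img.astype(np.float64), r, mode="edge")
    ii = np.pad(p, ((1, 0), (1, 0))).cumsum(0).cumsum(1)
    h, w = img.shape
    s = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return s / (k * k)

def adaptive_threshold(img: np.ndarray, k: int = 15, offset: float = 5.0) -> np.ndarray:
    """Foreground (255) where a pixel exceeds its local mean by more than `offset`."""
    return np.where(img > box_mean(img, k) + offset, 255, 0).astype(np.uint8)

def _neighborhood(binary: np.ndarray) -> np.ndarray:
    """Stack the nine 3x3-shifted copies of the image (borders replicated)."""
    p = np.pad(binary, 1, mode="edge")
    h, w = binary.shape
    return np.stack([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

def dilate(binary: np.ndarray) -> np.ndarray:
    return _neighborhood(binary).max(axis=0)

def erode(binary: np.ndarray) -> np.ndarray:
    return _neighborhood(binary).min(axis=0)

def seal_mask(cr: np.ndarray, k: int = 15, offset: float = 5.0) -> np.ndarray:
    """Threshold the Cr channel, then dilate and erode to close small gaps."""
    return erode(dilate(adaptive_threshold(cr, k, offset)))
```

Dilation followed by erosion fills small holes inside the seal strokes; if isolated speckles dominate instead, the reverse order (erosion first) may be the better choice.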

Step 4. Obtain the outer contour ellipse image of the special invoice seal through Canny edge detection.

Canny edge detection is performed on the binary image that retains only the special invoice seal through the following steps:

1) Use a Gaussian filter to smooth the image and filter out noise;

2) Calculate the gradient strength and direction of each pixel in the image;

3) Apply non-maximum suppression to eliminate spurious responses from edge detection;

4) Apply double-threshold detection to determine real and potential edges;

5) Complete edge detection by suppressing isolated weak edges.

After edge detection is complete, the outermost contour is selected, i.e. the outer contour ellipse image of the special invoice seal, as shown in Figure 4.
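The five stages listed above can be sketched compactly in NumPy. This is a simplified illustration under stated assumptions: a 3×3 binomial kernel stands in for the Gaussian filter, gradients come from Sobel kernels, and the thresholds `low` and `high` are caller-supplied values not given in the source. Selecting the outermost contour afterwards is not shown.

```python
import numpy as np

def _conv3(img, k):
    """3x3 cross-correlation with replicated borders."""
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    return sum(k[a, b] * p[a:a + h, b:b + w] for a in range(3) for b in range(3))

def canny(img, low, high):
    """Return a boolean edge map; `low` and `high` must be positive."""
    img = img.astype(np.float64)
    # 1) Smoothing with a 3x3 binomial approximation of a Gaussian.
    g = _conv3(img, np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0)
    # 2) Gradient strength and direction (Sobel kernels).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    gx, gy = _conv3(g, kx), _conv3(g, kx.T)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    # 3) Non-maximum suppression along the quantized gradient direction.
    h, w = mag.shape
    p = np.pad(mag, 1, mode="constant")

    def nb(dy, dx):
        return p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]

    sector = np.floor((ang + 22.5) / 45.0).astype(int) % 4  # 0, 45, 90, 135 degrees
    keep = np.zeros(mag.shape, dtype=bool)
    for s, (dy, dx) in enumerate([(0, 1), (1, 1), (1, 0), (1, -1)]):
        keep |= (sector == s) & (mag >= nb(dy, dx)) & (mag >= nb(-dy, -dx))
    thin = np.where(keep, mag, 0.0)
    # 4) Double threshold: strong edges are certain, weak edges are candidates.
    strong = thin >= high
    weak = (thin >= low) & ~strong
    # 5) Hysteresis: keep weak pixels only if 8-connected to a kept edge.
    edges = strong.copy()
    while True:
        q = np.pad(edges, 1, mode="constant")
        near = np.zeros(edges.shape, dtype=bool)
        for dy in range(3):
            for dx in range(3):
                near |= q[dy:dy + h, dx:dx + w]
        grown = edges | (weak & near)
        if (grown == edges).all():
            return grown
        edges = grown
```

On an ideal step edge this sketch marks the two columns adjacent to the step, because their gradient magnitudes tie; a production implementation would typically thin such ties to a single pixel.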

Step 5. Find the center point of the special invoice seal by calculating the centroid coordinates of the ellipse image.

According to the national standard, the tax number is located at the center of the special invoice seal, so the center point of the seal must be found in order to locate the tax number. The centroid coordinates of the outer contour ellipse image can be calculated in several ways, such as geometric moments or the Hough transform. The invention uses geometric moments, with the following specific steps:

The geometric moment m_pq of order p+q of a discrete image is defined as:

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^{p} y^{q} f(x, y) \tag{1}$$

where x and y are the horizontal and vertical coordinates of the outer contour ellipse image and f(x, y) is the pixel value at (x, y); p is the order of x and q is the order of y, both integers greater than or equal to 0; M is the bound of the abscissa x and N is the bound of the ordinate y of the outer contour ellipse image, both integers greater than or equal to 1.

Taking p = 0 and q = 0 in formula (1) gives the zero-order geometric moment m₀₀:

$$m_{00} = \sum_{x=1}^{M} \sum_{y=1}^{N} f(x, y) \tag{2}$$

Taking p = 1 and q = 0 in formula (1) gives the first-order geometric moment m₁₀ in x:

$$m_{10} = \sum_{x=1}^{M} \sum_{y=1}^{N} x \, f(x, y) \tag{3}$$

Taking p = 0 and q = 1 in formula (1) gives the first-order geometric moment m₀₁ in y:

$$m_{01} = \sum_{x=1}^{M} \sum_{y=1}^{N} y \, f(x, y) \tag{4}$$

Using the zero-order and first-order geometric moments, the centroid coordinates of the outer contour ellipse image of the special invoice seal are obtained by:

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \tag{5}$$

The center point of the outer contour of the special invoice seal is shown in Figure 5.
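The centroid computation from the zero-order and first-order geometric moments can be sketched as below. The sketch assumes 1-based pixel coordinates with x along columns and y along rows, and treats the contour image as an array of pixel values f(x, y); the function names are hypothetical.

```python
import numpy as np

def geometric_moment(f: np.ndarray, p: int, q: int) -> float:
    """Geometric moment m_pq: sum of x**p * y**q * f(x, y) over the image."""
    h, w = f.shape
    # 1-based coordinates (an assumption): x runs over columns, y over rows.
    y, x = np.mgrid[1:h + 1, 1:w + 1]
    return float(np.sum((x ** p) * (y ** q) * f))

def centroid(f: np.ndarray) -> tuple:
    """Centroid (x_bar, y_bar) = (m10 / m00, m01 / m00)."""
    m00 = geometric_moment(f, 0, 0)
    if m00 == 0:
        raise ValueError("image has no foreground pixels")
    return geometric_moment(f, 1, 0) / m00, geometric_moment(f, 0, 1) / m00
```

Because only three sums over the image are needed, the cost is linear in the number of pixels and no parameter search is involved.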

If the Hough transform is used in this step to calculate the centroid coordinates of the outer contour ellipse image, the following steps are required:

A quadratic curve in the plane is expressed as:

Ax² + 2Bxy + Cy² + 2Dx + 2Ey + 1 = 0

where A, B, C, D and E are the five parameters to be determined, and x and y are the horizontal and vertical coordinates of points on the quadratic curve.

The five parameters are obtained by the Hough transform. If B² − AC < 0, the quadratic curve is an ellipse, and the center coordinates of the ellipse are then solved from the ellipse parameter equation.
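Once the five parameters are known, the center can be computed in closed form. A minimal sketch, assuming the parameters have already been estimated by some other means: the center is the point where both partial derivatives of the conic vanish, i.e. the solution of Ax + By + D = 0 and Bx + Cy + E = 0. The function name `ellipse_center` is hypothetical.

```python
def ellipse_center(A, B, C, D, E):
    """Center of the conic A*x^2 + 2*B*x*y + C*y^2 + 2*D*x + 2*E*y + 1 = 0."""
    det = A * C - B * B
    if det <= 0:
        # B^2 - AC < 0 is the ellipse condition stated in the text.
        raise ValueError("not an ellipse: requires B^2 - AC < 0")
    # Cramer's rule on  A*x + B*y = -D,  B*x + C*y = -E.
    x0 = (B * E - C * D) / det
    y0 = (B * D - A * E) / det
    return x0, y0
```

For example, the ellipse (x − 3)²/4 + (y − 2)² = 1, rewritten in the normalized form above, has A = 0.25/5.25, B = 0, C = 1/5.25, D = −0.75/5.25 and E = −2/5.25, and the function returns a center at approximately (3, 2).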

Step 6. Crop, according to the coordinates of the center point, a rectangular area containing only the tax number.

According to the specimen standard for special invoice seals, the tax number is located in the middle of the seal. Based on the center point coordinates and an adaptive threshold, the smallest rectangular area that completely contains the tax number is extracted, i.e. the rectangular binary image containing only the tax number, as shown in Figure 6.
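One way to picture this step is to take a window around the center point and shrink it to the bounding box of the foreground pixels it contains. The source does not give the window dimensions or the exact adaptive rule, so the parameters `half_w` and `half_h`, the bounding-box shrinking, and the function name `crop_tax_number` are all assumptions made for this sketch.

```python
import numpy as np

def crop_tax_number(binary, cx, cy, half_w, half_h):
    """Crop a window centered on (cx, cy), then shrink it to its foreground bounding box."""
    h, w = binary.shape
    top, bottom = max(0, cy - half_h), min(h, cy + half_h + 1)
    left, right = max(0, cx - half_w), min(w, cx + half_w + 1)
    window = binary[top:bottom, left:right]
    rows = np.flatnonzero(window.any(axis=1))  # rows containing foreground
    cols = np.flatnonzero(window.any(axis=0))  # columns containing foreground
    if rows.size == 0:
        raise ValueError("no foreground pixels around the center point")
    return window[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

The window must be small enough to exclude the seal border and the company name arc, and large enough to cover every digit; in practice its size would be derived from the dimensions of the detected outer ellipse.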

Step 7. Divide the tax number into single digits.

The colors of the rectangular area image extracted in step 6 are inverted, giving the image shown in Figure 7. The inverted image is then segmented according to an adaptive threshold to obtain each digit, so that the tax number in the rectangular area is divided into single digits.
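The inversion and segmentation can be sketched with a simple column projection: columns without any foreground pixel are treated as gaps between digits. This is an assumed segmentation rule, since the source only states that an adaptive threshold is used. The sketch also assumes that the digits are dark (0) on a light (255) background before inversion, so that they become the nonzero foreground afterwards.

```python
import numpy as np

def invert(binary: np.ndarray) -> np.ndarray:
    """Invert an 8-bit binary image (0 <-> 255)."""
    return 255 - binary

def split_digits(binary: np.ndarray) -> list:
    """Split an image into digit images at columns that contain no foreground."""
    has_ink = (binary > 0).any(axis=0)
    digits, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a digit begins
        elif not ink and start is not None:
            digits.append(binary[:, start:x])  # a digit ends at the gap
            start = None
    if start is not None:
        digits.append(binary[:, start:])   # last digit touches the right border
    return digits
```

Digits that touch each other would be returned as a single piece by this rule; handling that case would need an additional width check, which is omitted here.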

Step 8. Perform template matching on each digit to identify the tax number.

Make a search template T for the digits 0 to 9, and resize each single-digit image so that it matches the size of the digits in the search template T. Each resized digit is fed in turn, as the searched image S, to the prepared digit search templates; the digits are matched using the correlation method, and the tax number is identified according to the matching results.

The specific process of matching digits by the correlation method is as follows:

(8.1) Let the search template T have m×n pixels and the searched image S have W×H pixels. T is laid over S and translated; the region of S covered by T is the sub-image S_ij. The similarity D(i, j) between the search template T and the sub-image S_ij is obtained by:

$$D(i, j) = \sum_{s=1}^{m} \sum_{t=1}^{n} \left[ S_{ij}(s, t) - T(s, t) \right]^{2}$$

where i is the abscissa and j the ordinate of the top-left corner of the sub-image on the searched image S, with 1≤i≤W−m and 1≤j≤H−n; S_ij(s, t) is the pixel value of S_ij at coordinates (s, t), and T(s, t) is the pixel value of T at coordinates (s, t);

(8.2) Normalizing D(i, j) gives the correlation coefficient R(i, j) of template matching:

$$R(i, j) = \frac{\sum_{s=1}^{m} \sum_{t=1}^{n} S_{ij}(s, t) \, T(s, t)}{\sqrt{\sum_{s=1}^{m} \sum_{t=1}^{n} \left[ S_{ij}(s, t) \right]^{2}} \; \sqrt{\sum_{s=1}^{m} \sum_{t=1}^{n} \left[ T(s, t) \right]^{2}}}$$

(8.3) A correlation coefficient R(i, j) = 1 means that the template and the sub-image are identical. Because some deviation is always present, the template and the sub-image are here regarded as matching if R(i, j) is greater than or equal to 0.9. Determine whether R(i, j) is greater than or equal to 0.9: if so, the sub-image matches the template, the matching search is complete, and the digit corresponding to the image is determined; otherwise the sub-image does not match the template, and the search continues iteratively until a template matching the sub-image is found.
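The matching loop can be sketched as follows, using a normalized cross-correlation as the correlation coefficient and the 0.9 acceptance threshold stated above. The sketch assumes the digit image has already been resized to at least the template size, that templates are supplied as a dictionary from digit label to array, and that the function names are hypothetical.

```python
import numpy as np

def correlation(sub: np.ndarray, tpl: np.ndarray) -> float:
    """Normalized cross-correlation of two equally sized images (1.0 = identical up to scale)."""
    s, t = sub.astype(np.float64), tpl.astype(np.float64)
    denom = np.sqrt((s * s).sum()) * np.sqrt((t * t).sum())
    return float((s * t).sum() / denom) if denom > 0 else 0.0

def max_correlation(search: np.ndarray, template: np.ndarray) -> float:
    """Slide the template over the searched image and return the best coefficient."""
    H, W = search.shape
    n, m = template.shape  # template is m pixels wide and n pixels high
    best = 0.0
    for j in range(H - n + 1):
        for i in range(W - m + 1):
            best = max(best, correlation(search[j:j + n, i:i + m], template))
    return best

def recognize_digit(digit_img, templates, threshold=0.9):
    """Return the label of the first template whose coefficient reaches the threshold."""
    for label, tpl in templates.items():
        if max_correlation(digit_img, tpl) >= threshold:
            return label
    return None  # no template matched
```

Returning `None` when no template reaches the threshold makes unreadable digits explicit instead of forcing a guess; picking the template with the highest coefficient is an alternative design when several templates exceed 0.9.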

Parts of the invention not described in detail belong to the common knowledge of those skilled in the art.

The above description is only an embodiment of the invention and does not limit the invention in any way. Those skilled in the art, having understood the content and principle of the invention, may make various modifications and changes in form and detail without departing from that principle, but such modifications and changes based on the idea of the invention remain within the scope of protection of the claims of the invention.

Claims

1. A method for extracting and identifying the tax number of a special invoice seal, comprising the following steps:

(1) Obtain an RGB image containing the special invoice seal;

(2) Perform Laplacian sharpening on the acquired RGB image, and then convert it to the YCrCb color space;

(3) Separate the channels of the YCrCb color space, perform adaptive thresholding on the separated Cr channel to convert the image into a binary image, and then remove the noise of the binary image through dilation and erosion operations to obtain a binary image containing only the special invoice seal;

(4) Perform Canny edge detection on the binary image containing only the special invoice seal, and select the outermost contour, i.e. the outer contour ellipse image of the special invoice seal;

(5) Calculate the centroid coordinates of the outer contour ellipse image, i.e. the coordinates of the center point of the special invoice seal;

(6) Crop, according to the coordinates of the center point, a rectangular area containing only the tax number;

(7) Divide the tax number in the rectangular area into single digits;

(8) Make a search template T for the digits 0 to 9, resize each single-digit image so that it matches the size of the digits in the search template T, feed each resized digit in turn, as the searched image S, to the prepared digit search templates, match the digits using the correlation method, and identify the tax number according to the matching results.

2. The method according to claim 1, characterized in that in step (2) Laplacian sharpening is performed on the acquired RGB image as follows: the gray value of a center pixel that is lower than the average gray value of its neighborhood is reduced, and the gray value of a center pixel that is higher than the average gray value of its neighborhood is raised.

3. The method according to claim 1, characterized in that in step (2) the sharpened RGB image is converted to the YCrCb color space by the following formulas:

Y＝(B×1868+G×9617+R×4899+8192)÷16384;

Cr = ((R−Y)×11682 + 8192) ÷ 16384 + 128;

Cb = ((B−Y)×9241 + 8192) ÷ 16384 + 128;

Among them, "Y" represents the value of the Y channel in the YCrCb image, "Cr" represents the value of the Cr channel in the YCrCb image, "Cb" represents the value of the Cb channel in the YCrCb image, and "R" represents the value of the R channel in the RGB image, "G" indicates the value of the G channel in the RGB image, and "B" indicates the value of the B channel in the RGB image.

4. The method according to claim 1, characterized in that in step (4) Canny edge detection is performed on the binary image containing only the special invoice seal through the following steps:

(4.1) Denoise the binary image with a Gaussian filter;

(4.2) Calculate the gradient strength and direction of each pixel in the denoised image;

(4.3) Apply non-maximum suppression to eliminate spurious responses from edge detection;

(4.4) Apply double-threshold detection to determine real and potential edges;

(4.5) Complete edge detection by suppressing isolated weak edges.

5. The method according to claim 1, characterized in that: the centroid coordinates of the outer contour ellipse image in step (5) are calculated by geometric moments or Hough transform.

6. The method according to claim 5, characterized in that the centroid coordinates of the outer contour ellipse image are calculated by geometric moments, with the following specific steps:

The geometric moment m_pq of order p+q of a discrete image is defined as:

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^{p} y^{q} f(x, y) \tag{1}$$

where x and y are the horizontal and vertical coordinates of the outer contour ellipse image and f(x, y) is the pixel value at (x, y); p is the order of x and q is the order of y, both integers greater than or equal to 0; M is the bound of the abscissa x and N is the bound of the ordinate y of the outer contour ellipse image, both integers greater than or equal to 1;

Taking p = 0 and q = 0 in formula (1) gives the zero-order geometric moment m₀₀:

$$m_{00} = \sum_{x=1}^{M} \sum_{y=1}^{N} f(x, y) \tag{2}$$

Taking p = 1 and q = 0 in formula (1) gives the first-order geometric moment m₁₀ in x:

$$m_{10} = \sum_{x=1}^{M} \sum_{y=1}^{N} x \, f(x, y) \tag{3}$$

Taking p = 0 and q = 1 in formula (1) gives the first-order geometric moment m₀₁ in y:

$$m_{01} = \sum_{x=1}^{M} \sum_{y=1}^{N} y \, f(x, y) \tag{4}$$

Using the zero-order and first-order geometric moments, the centroid coordinates of the outer contour ellipse image of the special invoice seal are obtained by:

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}} \tag{5}$$

7. The method according to claim 1, characterized in that in step (6) the rectangular area containing only the tax number is the smallest area that completely contains the tax number, extracted according to the centroid coordinates and an adaptive threshold.

8. The method according to claim 1, characterized in that in step (7) the tax number in the rectangular area is divided into single digits through the following operations:

(7.1) Invert the colors of the image in the rectangular area to obtain the inverted image;

(7.2) Segment the inverted image according to an adaptive threshold to obtain single-digit images.

9. The method according to claim 1, characterized in that the specific process of matching digits by the correlation method in step (8) is as follows:

(8.1) Let the search template T have m×n pixels and the searched image S have W×H pixels. T is laid over S and translated; the region of S covered by T is the sub-image S_ij. The similarity D(i, j) between the search template T and the sub-image S_ij is obtained by:

$$D(i, j) = \sum_{s=1}^{m} \sum_{t=1}^{n} \left[ S_{ij}(s, t) - T(s, t) \right]^{2}$$

where i is the abscissa and j the ordinate of the top-left corner of the sub-image on the searched image S, with 1≤i≤W−m and 1≤j≤H−n; S_ij(s, t) is the pixel value of S_ij at coordinates (s, t), and T(s, t) is the pixel value of T at coordinates (s, t);

(8.2) Normalizing D(i, j) gives the correlation coefficient R(i, j) of template matching:

$$R(i, j) = \frac{\sum_{s=1}^{m} \sum_{t=1}^{n} S_{ij}(s, t) \, T(s, t)}{\sqrt{\sum_{s=1}^{m} \sum_{t=1}^{n} \left[ S_{ij}(s, t) \right]^{2}} \; \sqrt{\sum_{s=1}^{m} \sum_{t=1}^{n} \left[ T(s, t) \right]^{2}}}$$

(8.3) Determine whether the correlation coefficient R(i, j) is greater than or equal to 0.9. If so, the sub-image matches the template, the matching search is complete, and the digit corresponding to the image is determined; otherwise the sub-image does not match the template, and the search continues iteratively until a template matching the sub-image is found.