CN114821134A - Method for identifying print style number of publication based on template matching - Google Patents
Method for identifying printed numbers of a publication based on template matching
- Publication number
- CN114821134A (Application CN202210753828.9A)
- Authority
- CN
- China
- Prior art keywords
- layer
- template
- numbers
- weighting
- pixel point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/418—Document matching, e.g. of document images
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses a method for identifying printed numbers of a publication based on template matching, belonging to the technical field of image processing. The method comprises the following steps: presetting a plurality of template layers containing different numbers; acquiring a digital image to be identified; acquiring a plurality of candidate template layers whose intersection counts match the number on a weighted single layer; acquiring a position feature set and a stroke thickness feature set of the number on each weighted single layer; determining the candidate template layer that matches the number on the weighted single layer, and reading the digital information on the weighted single layer from the matched candidate template layer; and, by analogy, identifying the information of each digit in the digital image. By weighting the pixel values in the digit regions of the template images, the method improves the template-matching digit recognition algorithm and improves the efficiency and accuracy of digit recognition for printed publications.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a method for identifying printed numbers of publications based on template matching.
Background
Printed-number recognition has long been a research hotspot in the field of pattern recognition. With the rapid development of the information society, digits increasingly substitute for spoken and written language in expression and memory: mobile phone numbers, driving licence numbers, identity card numbers and medical examination forms all express identity, capability, object and health information through combinations of the Arabic numerals 0 to 9 that a computer can interpret.
Therefore, a key point in designing a processing system for such problems is a digit recognition method with high reliability and a high recognition rate. However, no prior-art digit recognition method achieves perfect recognition: traditional matching algorithms involve a large and complex amount of computation and match inefficiently; variation of the mean gray level of the scene in the image affects the correctness of the matching result; and such algorithms cannot adapt to rotation or scaling of the scene in the image field.
Disclosure of Invention
The invention provides a method for identifying printed numbers of a publication based on template matching, which improves the template-matching digit recognition algorithm by weighting the pixel values in the digit regions of the template images, and improves the efficiency and accuracy of digit recognition for printed publications.
The invention aims to provide a method for identifying printed numbers of a publication based on template matching, comprising the following steps:
presetting a plurality of template layers containing different numbers, wherein each template layer is a binary image of equal size;
acquiring a digital image to be identified, and segmenting each digit in the digital image into a first single layer according to its own connected domain; normalizing each first single layer to obtain a plurality of second single layers of the same size as the template layers; in each template layer and each second single layer, the gray value of pixel points in the digit region is 1 and the gray value of pixel points in the background region is 0;
traversing each pixel point of each template layer along the horizontal direction to obtain a plurality of consecutively arranged pixel point sequences with gray value 1; acquiring the step length of each pixel point sequence in each template layer and the weight of the pixel points in each sequence; weighting the template layer according to these weights to obtain, in turn, the weighted template layer corresponding to each number; processing each second single layer in the same manner to obtain a plurality of weighted single layers;
arranging a plurality of transverse lines on each weighted template layer at equal intervals from top to bottom, and counting the number of first intersection points of each transverse line with the number on the corresponding weighted template layer; by analogy, obtaining the number of second intersection points of each transverse line with the number on each weighted single layer;
acquiring a plurality of candidate template layers that match the number on a weighted single layer, according to the number of second intersection points of each transverse line on the weighted single layer and the number of first intersection points on each weighted template layer;
acquiring a position feature set of the number on each candidate template layer according to the weights of all pixel points on each transverse line of that candidate template layer; acquiring a stroke thickness feature set of the number on each candidate template layer according to the step lengths of the pixel point sequences corresponding to each transverse line; by analogy, obtaining a position feature set and a stroke thickness feature set of the number on each weighted single layer;
determining the candidate template layer that matches the number on the weighted single layer by comparing the position feature set and stroke thickness feature set of the number on the weighted single layer with those of each candidate template layer, and reading the digital information on the weighted single layer from the matched candidate template layer;
and, by analogy, identifying the information of each digit in the digital image.
In an embodiment, the step length of each pixel point sequence and the weight of the pixel point in each pixel point sequence are obtained according to the following steps:
counting the number of pixels in each pixel sequence to obtain the length of each pixel sequence;
setting the weight variation from the pixel points at the two ends of each pixel point sequence to the central pixel point to be an arithmetic progression, setting the sum of the weights of the pixel points in each sequence to 1, and setting the initial weights of the pixel points at the two ends of each sequence to an initial term determined by N, where N is the transverse length of the template layer;
acquiring the step length of each pixel point sequence according to the length of each pixel point sequence and the initial weight of the pixels at the two corresponding ends;
and then, acquiring the weight of the pixel point in each pixel point sequence according to the step length of each pixel point sequence and the initial weight of the pixel points at the two corresponding ends.
In an embodiment, if, in the process of acquiring the candidate template layers matching the number on a weighted single layer, only one candidate template layer is obtained, the digital information on the weighted single layer is read directly from that candidate template layer.
In an embodiment, the candidate template layers matching the number on the weighted single layer are obtained according to the following steps:
acquiring the position of each transverse line on a weighted single layer and the number of second intersection points of that line with the number on the weighted single layer; acquiring the number of first intersection points of the transverse line at the same position with the number on each weighted template layer;
and judging, from the difference between the number of second intersection points of each transverse line with the number on the weighted single layer and the number of first intersection points of the transverse line at the same position with the number on each weighted template layer, which weighted template layers match the number on the weighted single layer; these are the candidate template layers.
In an embodiment, the calculation formula for judging the weighted template layers matched with the number on the weighted single layer is as follows:
D = Σⱼ |Cⱼ − Tⱼ|
where Cⱼ denotes the number of second intersection points of the j-th transverse line with the number on the weighted single layer; Tⱼ denotes the number of first intersection points of the transverse line at the same position with the number on the weighted template layer; and D denotes the accumulated difference of the two intersection counts over all the transverse lines.
When D ≠ 0, it is judged that the number on the weighted single layer does not match the number on the weighted template layer;
when D = 0, it is judged that the number on the weighted single layer matches the number on the weighted template layer, and the weighted template layers so matched are acquired in turn as the candidate template layers.
In an embodiment, the position feature set of the number on each candidate template layer is obtained according to the following steps:
arranging, in the top-to-bottom order of the transverse lines in the candidate template layer, the weights of all pixel points on each transverse line, thereby obtaining the position feature set W = {w₁, w₂, …}, where N represents the transverse length of the candidate template layer and determines the number of weights per transverse line.
In an embodiment, the stroke thickness feature set of the number on each candidate template layer is obtained according to the following steps:
arranging, in the top-to-bottom order of the transverse lines in the candidate template layer, the step lengths of the pixel point sequences corresponding to each transverse line, thereby obtaining the stroke thickness feature set L = {l₁, l₂, …, l_m}, where m represents the total number of intersection points of all the transverse lines on the candidate template layer.
In an embodiment, in the process of determining the candidate template layer that matches the number on the weighted single layer, the match is judged according to the numerical differences, at corresponding positions, between the position feature set of the number on the weighted single layer and that of each candidate template layer, and between the stroke thickness feature set of the number on the weighted single layer and that of each candidate template layer.
In an embodiment, the calculation formula for determining the candidate template layer matched with the number on the weighted single layer is as follows:
P = Σᵢ |wᵢ − w′ᵢ| + Σᵢ |lᵢ − l′ᵢ|
where {wᵢ} denotes the position feature set corresponding to the weighted single layer and {w′ᵢ} the position feature set corresponding to the candidate template layer, |wᵢ − w′ᵢ| being the absolute value of the difference of the i-th values of the two sets; {lᵢ} denotes the stroke thickness feature set corresponding to the weighted single layer and {l′ᵢ} that of the candidate template layer, |lᵢ − l′ᵢ| being the absolute value of the difference of the i-th values; and P denotes the matching error value of the number on the weighted single layer against the number on the candidate template layer.
When P ≤ P₀, it is judged that the number on the weighted single layer matches the number on the candidate template layer;
when P > P₀, it is judged that the number on the weighted single layer does not match the number on the candidate template layer;
where the allowable error component P₀ is obtained by calculating the P values of numbers on weighted single layers known to match correctly against the corresponding candidate template layers.
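The fine-matching criterion above can be sketched in a few lines of Python (a minimal illustration, not the patent's implementation; the function names and example feature values are hypothetical):

```python
def match_error(pos_a, thick_a, pos_b, thick_b):
    """P: sum of element-wise absolute differences of the position feature
    sets and of the stroke thickness feature sets of two layers."""
    p = sum(abs(x - y) for x, y in zip(pos_a, pos_b))
    p += sum(abs(x - y) for x, y in zip(thick_a, thick_b))
    return p

def is_match(pos_a, thick_a, pos_b, thick_b, p0):
    """Match if the error P stays within the allowable error P0."""
    return match_error(pos_a, thick_a, pos_b, thick_b) <= p0
```

In practice P₀ would be calibrated, as the embodiment describes, by computing P over pairs of layers known to match correctly.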
In one embodiment, the process of segmenting each digit in the digital image into a plurality of first single layers according to its own connected domain further comprises:
acquiring a digital image to be identified; converting the digital image into a binary image after graying;
opening operation processing is carried out on the binary image, and isolated small points and burrs on the digital image are removed;
and segmenting a plurality of first single layers according to the connected domain of each digit in the binary image.
The invention has the beneficial effects that:
the invention provides a method for identifying the number of printed publication based on template matching, which carries out the first rough matching by calculating the number of intersection points of a transverse line and numbers on a layer, and can obviously improve the matching speed due to the great reduction of the data volume. The matching of the digital position and the stroke thickness for the second time is accurate matching, and the matching is carried out according to the weight and the change characteristics of each pixel point, so that the error is reduced. The template matching times are reduced, the calculation time is short, the feature weighting is reduced, the matching error is reduced, the efficiency and the accuracy of the number recognition of the printed matter are improved, the character recognition rejection rate is continuously reduced along with the increase of the character template library, and the character recognition rate is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flow chart illustrating the general steps of an embodiment of a method for identifying numbers of printed matters of a publication based on template matching according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention addresses the problem that digit recognition for printed publications is strongly affected by the quality of the scanned digital image, which lowers the recognition rate for character images with occlusion, defects or contamination. By weighting the pixel values in the digit regions of the template images, the template-matching digit recognition algorithm is improved, and the efficiency and accuracy of digit recognition for printed publications are improved.
According to the morphological thinning principle, the importance of edge points and centre points within a character stroke differs: importance decreases gradually from the "skeleton", i.e. the centre point, towards the two edge points. A concept of weight can therefore be added to the character template according to this logic, generating a character template with weighted features, further improving the template-matching digit recognition algorithm and the efficiency and accuracy of digit recognition for printed publications.
The invention provides a method for identifying printed numbers of a publication based on template matching, as shown in Fig. 1, comprising the following steps:
S1, presetting a plurality of template layers containing different numbers, wherein each template layer is a binary image of equal size;
in this embodiment, a standard digital-template image size is set, and the segmented digit images are normalized to template layers of that pixel size.
S2, acquiring a digital image to be identified, and segmenting each digit in the digital image into a first single layer according to its own connected domain; normalizing each first single layer to obtain a plurality of second single layers of the same size as the template layers; in each template layer and each second single layer, the gray value of pixel points in the digit region is 1 and the gray value of pixel points in the background region is 0;
wherein the process of segmenting each digit in the digital image into a plurality of first single layers according to its own connected domain further comprises:
acquiring a digital image to be identified; converting the digital image into a binary image after graying;
opening operation processing is carried out on the binary image, and isolated small points and burrs on the digital image are removed;
and segmenting a plurality of first single layers according to the connected domain of each digit in the binary image.
In this embodiment, the printed numbers of the publication are converted into image information by optical scanning. Noise arises during image acquisition and transmission, and its presence affects subsequent image processing and analysis to a certain extent; an adaptive median filter is therefore used to handle the most probable noise, smoothing non-impulse noise while preserving detail. A 3 × 3 adaptive median filter template is adopted in this embodiment;
then carrying out gray processing on the digital image, obtaining a gray threshold T by using an Otsu algorithm, converting the digital image into a binary image according to the gray threshold T, wherein the pixel point of a digital area is 1, and the pixel point of a background area is 0;
performing opening operation processing on the binary image to remove isolated small points and burrs on the digital image;
segmenting characters according to connected domains of the digital regions, wherein each individual number can form a connected image domain, and acquiring the starting position and the ending position of the row and the column of each connected domain so as to obtain the minimum circumscribed rectangle of a single character, thereby completing the work of segmenting the characters and acquiring a first single layer corresponding to the individual number;
finally, the first single layers of different sizes are geometrically transformed to the same size as the template layers, a bilinear interpolation algorithm being used to normalize the first single layers of all numbers, thereby obtaining the second single layer corresponding to each individual number.
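The preprocessing chain of step S2 (Otsu thresholding, then segmentation by connected domain) can be sketched with plain NumPy. The adaptive median filtering, opening operation and bilinear normalization are omitted for brevity, and all identifiers are illustrative, not from the patent:

```python
import numpy as np

def otsu_threshold(gray):
    """Gray threshold T maximizing the between-class variance (Otsu)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mean_all = (hist * np.arange(256)).sum() / total
    best_t, best_var = 0, -1.0
    cum = cum_mean = 0.0
    for t in range(256):
        cum += hist[t]
        cum_mean += t * hist[t]
        if cum == 0 or cum == total:
            continue
        w0 = cum / total
        m0 = cum_mean / cum
        m1 = (mean_all * total - cum_mean) / (total - cum)
        var = w0 * (1 - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def connected_components(binary):
    """Bounding boxes (r0, c0, r1, c1) of 4-connected foreground regions,
    i.e. the minimum circumscribed rectangle of each digit."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not seen[i, j]:
                seen[i, j] = True
                stack = [(i, j)]
                r0 = r1 = i
                c0 = c1 = j
                while stack:                       # flood fill one region
                    r, c = stack.pop()
                    r0, r1 = min(r0, r), max(r1, r)
                    c0, c1 = min(c0, c), max(c1, c)
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if 0 <= nr < h and 0 <= nc < w \
                                and binary[nr, nc] and not seen[nr, nc]:
                            seen[nr, nc] = True
                            stack.append((nr, nc))
                boxes.append((r0, c0, r1, c1))
    return boxes
```

Each bounding box would then be cropped and resized (bilinearly, in the embodiment) to the template size to yield a second single layer.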
S3, traversing each pixel point of each template layer along the horizontal direction to obtain a plurality of consecutively arranged pixel point sequences with gray value 1; acquiring the step length of each pixel point sequence in each template layer and the weight of the pixel points in each sequence; weighting the template layer according to these weights to obtain, in turn, the weighted template layer corresponding to each number; processing each second single layer in the same manner to obtain a plurality of weighted single layers;
It should be noted that the central points of a digit stroke in the template layer have a greater effect, while the edge points of the stroke have less influence on the recognition of the whole character; the pixel points in the digit region are therefore weighted according to their positions in the template image to improve the accuracy of template matching.
The step length of each pixel point sequence and the weight of the pixel points in each pixel point sequence are obtained according to the following steps:
counting the number of pixels in each pixel sequence to obtain the length of each pixel sequence;
setting the weight variation from the pixel points at the two ends of each pixel point sequence to the central pixel point to be an arithmetic progression, setting the sum of the weights of the pixel points in each sequence to 1, and setting the initial weights of the pixel points at the two ends of each sequence to an initial term determined by N, where N is the transverse length of the template layer;
acquiring the step length of each pixel point sequence according to the length of each pixel point sequence and the initial weight of the pixel points at the corresponding two ends;
and then, acquiring the weight of the pixel point in each pixel point sequence according to the step length of each pixel point sequence and the initial weight of the pixel points at the two corresponding ends.
In this embodiment, the steps for obtaining the pixel point sequences over the whole image are as follows:
(1) starting from the pixel point at the upper-left corner, the template layer is traversed from left to right; when a pixel point with value 1 is encountered it is marked A, and traversal continues until a pixel point with value 0 is encountered, the previous pixel point being marked B; a consecutively arranged pixel point sequence with gray value 1 is thereby acquired, its front pixel point marked A and its rear pixel point marked B, and the length C from A to B is obtained by counting the number of pixel points in the sequence;
(2) the row is traversed further; whenever another pixel point with value 1 is met, marking continues as in step (1), and the pixel point sequences of the row and their lengths are counted until the row has been traversed;
(3) steps (1) and (2) are repeated on the next row until the whole image has been traversed.
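The row-wise traversal of steps (1)-(3) amounts to collecting, for every row, the maximal runs of consecutive 1-pixels. A minimal sketch (identifiers are illustrative):

```python
def horizontal_runs(layer):
    """Return (row, start_col, end_col) for every maximal run of 1-pixels.

    `layer` is a binary image given as a list of rows of 0/1 values; each
    run corresponds to one pixel point sequence A..B of length end-start+1.
    """
    runs = []
    for r, row in enumerate(layer):
        c, n = 0, len(row)
        while c < n:
            if row[c] == 1:
                a = c                       # front pixel point A
                while c < n and row[c] == 1:
                    c += 1
                runs.append((r, a, c - 1))  # rear pixel point B
            else:
                c += 1
    return runs
```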
A length set {C₁, C₂, …, Cₙ} of the pixel point sequences is thus obtained, where n is the number of pixel point sequences in the digit region.
Then the z-th pixel point sequence, of length C_z, is taken. In this embodiment the weights increase gradually from the two edge pixel points of the sequence towards the central pixel point: the weight variations from the two edge points A and B towards the centre point Q form two identical increasing arithmetic progressions AQ and BQ, with an initial term a₁ determined by N, where N is the transverse length of the template layer, and the sum of the weights over the pixel point sequence is 1. According to the properties of the arithmetic progression:
if C_z is even, the step length d of the two identical arithmetic progressions is
d = (1 − C_z·a₁) / (m(m − 1)), with m = C_z / 2,
where C_z represents the total length of the two identical arithmetic progressions and a₁ is the initial term; with these two parameters known, the value of the step length d is obtained.
If C_z is odd, the two identical arithmetic progressions share the centre point Q, and the step length d is
d = (1 − C_z·a₁) / k², with k = (C_z − 1) / 2,
where C_z represents the total length of the two arithmetic progressions and a₁ is the initial term; with these two parameters known, the value of d is obtained. The data sets of the two arithmetic progressions are thus obtained and assigned to the pixel point sequence, giving the weight of each pixel point of the sequence.
In the same way, the weights of the pixel points in all the pixel point sequences of the set {C₁, …, Cₙ} are obtained, i.e. the weight w(x, y) of each pixel point in the digit region, where (x, y) are the coordinates of the pixel point within the digit region.
Finally, the template layers of all the numbers 0 to 9 are weighted in this way to produce the weighted template layer corresponding to each number.
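The arithmetic-progression weighting described above can be sketched as follows. The closed-form step lengths are reconstructed from the stated constraints (end weights equal to an initial term a₁, weights rising toward the centre, total weight 1 per sequence), so treat them as an interpretation rather than the patent's verbatim equations:

```python
def run_weights(length, a1):
    """Weights of one pixel point sequence of the given length.

    End pixels get the initial term a1; weights rise toward the centre as
    an arithmetic progression and the whole sequence sums to 1 (requires
    a1 < 1/length for the progression to be increasing).
    """
    if length == 1:
        return [1.0]
    if length == 2:
        return [0.5, 0.5]              # degenerate: both ends share the total
    if length % 2 == 0:
        m = length // 2                # terms per half-progression
        d = (1 - length * a1) / (m * (m - 1))
        half = [a1 + i * d for i in range(m)]
        return half + half[::-1]
    k = (length - 1) // 2              # steps from an end to the shared centre
    d = (1 - length * a1) / (k * k)
    half = [a1 + i * d for i in range(k + 1)]
    return half + half[-2::-1]         # centre point Q counted once
```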
S4, arranging a plurality of transverse lines on each weighted template layer at equal intervals from top to bottom, and counting the number of first intersection points of each transverse line with the number on the corresponding weighted template layer; by analogy, obtaining the number of second intersection points of each transverse line with the number on each weighted single layer;
acquiring a plurality of candidate template layers that match the number on a weighted single layer, according to the number of second intersection points of each transverse line on the weighted single layer and the number of first intersection points on each weighted template layer;
In this embodiment, three transverse lines are arranged on each weighted template layer at equal intervals from top to bottom, dividing the weighted template layer into four parts; the three transverse positions are H/4, H/2 and 3H/4 within the weighted template layer, where H is the longitudinal length of the weighted template layer, and the lines are recorded as y₁, y₂ and y₃. The sums Sⱼ of the weights of the pixel points on the weighted template layer along y₁, y₂ and y₃ are counted respectively, where j ranges from 1 to 3; since the sum of the weights of the pixel points in a single pixel point sequence is 1, Sⱼ directly represents the number of first intersection points of the transverse line with the number;
similarly, the numbers of second intersection points of the transverse lines with the number on each weighted single layer are obtained.
In addition, if, in the process of acquiring the candidate template layers matching the number on a weighted single layer, only one candidate template layer is obtained, the digital information on the weighted single layer is read directly from that candidate template layer.
The intersection-point feature table of the transverse lines with the numbers on the weighted template layers is shown in Table 1 below:
table 1 is a cross point feature table of numbers on the layer of the horizontal line and the weighted template
Digital signature | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Transverse line is onPoint of intersection of | 2 | 1 | 2 | 2 | 2 | 1 | 2 | 1 | 2 | 2 |
Transverse line is onPoint of intersection of | 2 | 1 | 1 | 1 | 2 | 1 | 2 | 1 | 1 | 1 |
Transverse line is onPoint of intersection of | 2 | 1 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 2 |
It should be noted that when the fonts of the numbers in the template layers and the single layers differ, the numbers of intersection points obtained also differ; for this reason, the digit font in both the template layers and the single layers is specified as Arial in this embodiment.
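Counting the intersection points of a transverse line with a digit reduces to counting the runs of 1-pixels along the corresponding pixel row. A sketch, with the lines placed at H/4, H/2 and 3H/4 (an assumption consistent with the layer being divided into four parts; names are illustrative):

```python
def intersection_counts(layer):
    """Stroke crossings on the three transverse lines at H/4, H/2, 3H/4.

    `layer` is a binary image (list of rows of 0/1); the number of
    crossings on a row equals the number of runs of 1-pixels in it.
    """
    h = len(layer)
    counts = []
    for y in (h // 4, h // 2, 3 * h // 4):
        runs, prev = 0, 0
        for px in layer[y]:
            if px == 1 and prev == 0:   # a 0-to-1 transition starts a run
                runs += 1
            prev = px
        counts.append(runs)
    return counts
```

On a crude ring-shaped "0" this yields two crossings on every line, and on a vertical-bar "1" a single crossing, matching the corresponding columns of Table 1.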
The candidate template layers matching the number on the weighted single layer are obtained according to the following steps:
acquiring the position of each transverse line on a weighted single layer and the number of second intersection points of that line with the number on the weighted single layer; acquiring the number of first intersection points of the transverse line at the same position with the number on each weighted template layer;
and judging, from the difference between the number of second intersection points of each transverse line with the number on the weighted single layer and the number of first intersection points of the transverse line at the same position with the number on each weighted template layer, which weighted template layers match the number on the weighted single layer; these are the candidate template layers.
Specifically, the calculation formula for determining the plurality of weighting template layers matched with the numbers on the weighting single layer is as follows:
E = |B_1 − A_1| + |B_2 − A_2| + |B_3 − A_3|
In the formula, B_j represents the number of second intersection points of the j-th horizontal line with the numbers on the weighting single layer; A_j represents the number of first intersection points of the j-th horizontal line with the numbers on the weighting template layer; E represents the accumulated difference between the first and second intersection counts over the three horizontal lines.
When E ≠ 0, it is judged that the numbers on the weighting single layer do not match the numbers on the weighting template layer;
when E = 0, it is judged that the numbers on the weighting single layer match the numbers on the weighting template layer, and the weighting template layers matched with the numbers on the weighting single layer are acquired in turn, namely the plurality of weighting template layers to be selected.
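This coarse match reduces to comparing three small integers per template digit. A sketch using the counts from Table 1; `TEMPLATE_COUNTS` and `coarse_candidates` are illustrative names, and the accumulated difference must be exactly zero for a template to survive:

```python
# First-intersection counts (l1, l2, l3) per digit, taken from Table 1.
TEMPLATE_COUNTS = {
    0: (2, 2, 2), 1: (1, 1, 1), 2: (2, 1, 2), 3: (2, 1, 2),
    4: (2, 2, 1), 5: (1, 1, 2), 6: (2, 2, 2), 7: (1, 1, 1),
    8: (2, 1, 2), 9: (2, 1, 2),
}

def coarse_candidates(second_counts):
    """Keep templates whose counts satisfy E = sum(|B_j - A_j|) == 0."""
    return [d for d, first_counts in TEMPLATE_COUNTS.items()
            if sum(abs(b - a) for a, b in zip(first_counts, second_counts)) == 0]

print(coarse_candidates((2, 1, 2)))  # → [2, 3, 8, 9]
```

Digits 2, 3, 8 and 9 share the (2, 1, 2) signature, which is exactly why the finer second-stage match on position and stroke thickness is needed.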
S5, acquiring the position feature set of the numbers on each template layer to be selected according to the weights of all pixel points on each horizontal line in each template layer to be selected; acquiring the stroke thickness feature set of the numbers on each template layer to be selected according to the step lengths of the pixel point sequences corresponding to each horizontal line in each template layer to be selected; and, by analogy, acquiring the position feature set and the stroke thickness feature set of the numbers on each weighting single layer;
it should be noted that the longer a pixel point sequence is, the smaller its step length is. The length of the pixel point sequence at each intersection of the three horizontal lines with the number is the stroke thickness of the character at that intersection, so the stroke thickness of the character at the intersection can be represented by the step length.
The position feature set of the numbers on each template layer to be selected is obtained according to the following steps:
according to the weights of all pixel points on each horizontal line in each template layer to be selected, the weights of all the pixel points on each horizontal line are arranged in turn from top to bottom according to the order of the horizontal lines in the template layer to be selected to obtain the position feature set, namely the position feature set is {w_1, w_2, …, w_{3N}}, where N represents the transverse length of the template layer to be selected and each of the three horizontal lines contributes N weights.
The stroke weight characteristic set of the numbers on each template layer to be selected is obtained according to the following steps:
according to the step lengths of the pixel point sequences corresponding to each horizontal line in each template layer to be selected, the step lengths of the pixel point sequences corresponding to each horizontal line are arranged in turn from top to bottom according to the order of the horizontal lines in the template layer to be selected to obtain the stroke thickness feature set, namely the stroke thickness feature set is {d_1, d_2, …, d_M}, where M represents the total number of intersection points of all the horizontal lines on the template layer to be selected.
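Both feature sets are plain concatenations over the three horizontal lines, taken top to bottom. A sketch under simplifying assumptions: the layer is a list of per-pixel weight rows, and the run length of each crossed stroke stands in for the patent's step length (to which it is inversely related); `build_feature_sets` is an illustrative name:

```python
def build_feature_sets(weighted_layer, rows):
    """Position set: the per-pixel weights along each horizontal line,
    concatenated top to bottom (3 lines x N pixels -> 3N values).
    Thickness set: one run length per stroke crossing, used here as a
    stand-in for the step length."""
    position, thickness = [], []
    for r in rows:
        line = weighted_layer[r]
        position.extend(line)
        run = 0
        for px in line:
            if px > 0:
                run += 1               # inside a stroke crossing
            elif run:
                thickness.append(run)  # the crossing just ended
                run = 0
        if run:
            thickness.append(run)      # crossing reaches the right edge
    return position, thickness

# toy weighted layer: two lines of four pixels each
layer = [
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
]
pos, thick = build_feature_sets(layer, rows=[0, 1])
print(pos)    # → [0.0, 0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.5]
print(thick)  # → [2, 1, 1]
```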
S6, judging and acquiring a template layer to be selected which is matched with the numbers on the weighting list layer according to the position feature set and the stroke thickness feature set of the numbers on the weighting list layer and the position feature set and the stroke thickness feature set of the numbers on each template layer to be selected, and acquiring the digital information on the weighting list layer according to the matched template layer to be selected;
By analogy, the information of each number in the digital image is identified in turn.
In the process of judging and acquiring the template layer to be selected matched with the numbers on the weighting single layer, the template layer to be selected matched with the numbers on the weighting single layer is judged according to the numerical differences at corresponding positions between the position feature set of the numbers on the weighting single layer and the position feature set of the numbers on each template layer to be selected, and the numerical differences at corresponding positions between the stroke thickness feature set of the numbers on the weighting single layer and the stroke thickness feature set of the numbers on each template layer to be selected.
Specifically, the calculation formula for judging the template layer to be selected that matches the numbers on the weighting single layer is as follows:
S = Σ_{i=1}^{3N} |w_i − w′_i| + Σ_{k=1}^{M} |d_k − d′_k|
In the formula, {w_1, …, w_{3N}} represents the position feature set corresponding to the weighting single layer and {w′_1, …, w′_{3N}} the position feature set corresponding to the template layer to be selected, so that |w_i − w′_i| is the absolute value of the difference of the i-th values of the two sets; {d_1, …, d_M} represents the stroke thickness feature set corresponding to the weighting single layer and {d′_1, …, d′_M} that corresponding to the template layer to be selected, so that |d_k − d′_k| is the absolute value of the difference of the k-th values; S represents the matching error value of the numbers on the weighting single layer and the numbers on the template layer to be selected.
When S ≤ S_0, it is judged that the numbers on the weighting single layer match the numbers on the weighting template layer to be selected;
when S > S_0, it is judged that the numbers on the weighting single layer do not match the numbers on the weighting template layer to be selected. If they are judged not to match, the procedure returns to the intersection feature matching and compares with the next template character; if no matching character exists in the template library, rejection information is sent to request manual identification.
The allowable error component S_0 is obtained as follows: weighting single layers and weighting template layers to be selected that are known from manual identification to be correctly matched are collected; the S value of each correctly matched pair is calculated to obtain a set of S values; and the maximum value of this set is taken as the allowable error component S_0.
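The fine match and its threshold calibration can be sketched as follows, with illustrative names and toy feature values; S is the summed absolute difference over both feature sets, and the allowable error S_0 is the largest S observed over manually verified correct matches:

```python
def match_error(pos_a, pos_b, thick_a, thick_b):
    """S: sum of absolute differences over the position and
    stroke-thickness feature sets of the two layers."""
    return (sum(abs(x - y) for x, y in zip(pos_a, pos_b)) +
            sum(abs(x - y) for x, y in zip(thick_a, thick_b)))

def calibrate_threshold(known_correct_pairs):
    """Allowable error S0: the largest matching error observed over
    pairs that a human has verified as correct matches."""
    return max(match_error(*pair) for pair in known_correct_pairs)

# two hand-verified (position, position', thickness, thickness') pairs
pairs = [
    ([0.2, 0.8], [0.25, 0.75], [2, 2], [2, 2]),
    ([0.5, 0.5], [0.5, 0.4], [1, 3], [1, 2]),
]
S0 = calibrate_threshold(pairs)
print(round(S0, 2))  # → 1.1
```

A candidate with S ≤ S0 is accepted; otherwise the next candidate is tried, and a rejection is raised if none remains.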
According to the above steps, the number of first intersection points of each horizontal line with the numbers on the template layer corresponding to each of the digits 0 to 9, together with the position feature set and the stroke thickness feature set of the numbers on each template layer, is acquired in turn, and a template library is constructed.
According to the acquired digital image to be recognized, the number of second intersection points of each horizontal line with the numbers on each first single image layer in the digital image to be recognized, together with the position feature set and the stroke thickness feature set of the numbers on the single image layer, is acquired through the above operations; a template layer matched with the numbers on the single image layer to be recognized is then acquired from the template library through the number of intersection points, the position feature set and the stroke thickness feature set, so that the number information in the unknown digital image to be recognized is recognized according to the known template layers.
In this embodiment, the method further includes updating the template library to improve the recognition rate of the characters. The specific steps are as follows:
in this embodiment, the feature table of a recognized character is compared with the feature tables of the standard characters in the template library one by one, and when the feature table of a certain character in the template library matches it, the recognized character is determined to be that standard character. However, due to the diversity of number fonts, there may be no standard character in the template library that matches the recognized character; the system then sends rejection information to prompt the user that the recognition result is not accurate enough and to suggest manual identification, after which the feature table of the rejected character is added to the template library in the form of a template file through learning, so that characters with these features can be recognized accurately later.
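The rejection-and-learning loop described above can be sketched as follows; the flat feature vectors and the names `recognize` and `learn` are illustrative stand-ins for the patent's feature tables:

```python
def recognize(features, library, threshold):
    """library: list of (label, template) pairs; return the label of
    the closest template within the allowable error, else None
    (a rejection, prompting manual identification)."""
    best = min(library,
               key=lambda lt: sum(abs(a - b) for a, b in zip(features, lt[1])),
               default=None)
    if best is None:
        return None
    err = sum(abs(a - b) for a, b in zip(features, best[1]))
    return best[0] if err <= threshold else None

def learn(library, label, features):
    """After manual identification of a rejected character, add its
    feature table as a new template so it is recognized next time."""
    library.append((label, list(features)))

lib = [("3", [2.0, 1.0, 2.0])]
print(recognize([1.0, 1.0, 1.0], lib, 0.5))  # → None (rejected)
learn(lib, "1", [1.0, 1.0, 1.0])
print(recognize([1.0, 1.0, 1.0], lib, 0.5))  # → 1 (library has learned it)
```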
In this embodiment, the number of intersections between the horizontal lines and the numbers on the image layers is calculated first to perform a rough match; since the data amount is greatly reduced, the matching speed is significantly increased. The second match, on number position and stroke thickness, is an accurate match carried out according to the weight and the change characteristics of each pixel point, which reduces the error. The number of template matching operations is reduced, the calculation time is short, the feature weighting reduces the matching error, and the efficiency and accuracy of print number recognition are improved; as the character template library grows, the character rejection rate continuously decreases and the character recognition rate improves.
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for recognizing print numbers of publications based on template matching is characterized by comprising the following steps:
presetting a plurality of template layers containing different numbers, wherein each template layer is a binary image and has equal size;
the method comprises the steps of obtaining a digital image to be identified, and dividing each digital in the digital image into a plurality of first single image layers according to a communication domain of each digital; normalizing each first single image layer to obtain a plurality of second single image layers with the same size as the template image layers; in each template layer and the second single layer, the gray value of the pixel point in the digital area is 1, and the gray value of the pixel point in the background area is 0;
traversing each pixel point of each template layer along the horizontal direction to obtain a plurality of pixel point sequences which are continuously arranged and have a gray value of 1; acquiring the step length of each pixel point sequence in each template layer and the weight of a pixel point in each pixel point sequence; weighting the template image layer according to the weight of the pixel points in each pixel point sequence to sequentially obtain a weighted template image layer corresponding to each number; sequentially comparing each second single image layer in a similar manner, and performing weighting processing to obtain a plurality of weighted single image layers;
arranging a plurality of transverse lines on each weighting template layer at equal intervals from top to bottom, and respectively counting the number of first intersection points of each transverse line and the number on the corresponding weighting template layer; sequentially analogizing to obtain the number of second intersection points of each transverse line and the number on the weighting list layer;
acquiring a plurality of template layers to be selected which are matched with the numbers on the weighting single layers according to the number of second intersection points of each transverse line on one weighting single layer and the number of first intersection points on each weighting template layer;
acquiring a position feature set of numbers on each template layer to be selected according to the weight of all pixel points on each horizontal line corresponding to the template layer to be selected in each template layer to be selected; acquiring a stroke thickness characteristic set of numbers on each template layer to be selected according to the step length of the pixel point sequence corresponding to each horizontal line in each template layer to be selected; sequentially analogizing to obtain a position feature set and a stroke thickness feature set of the numbers on each weighted list layer;
judging and acquiring a template layer to be selected matched with the numbers on the weighting list layer according to a position feature set and a stroke thickness feature set of the numbers on the weighting list layer and a position feature set and a stroke thickness feature set of the numbers on each template layer to be selected, and acquiring digital information on the weighting list layer according to the matched template layer to be selected;
By analogy, the information of each number in the digital image is identified in turn.
2. The method for identifying printed numbers of publications based on template matching according to claim 1, wherein the step length of each pixel point sequence and the weight of the pixel points in each pixel point sequence are obtained by the following steps:
counting the number of pixels in each pixel sequence to obtain the length of each pixel sequence;
setting the weight changes from the pixel points at the two ends of each pixel point sequence to the central pixel point to be an arithmetic progression, setting the sum of the weights of the pixel points in each pixel point sequence to be 1, and setting the initial weights of the pixel points at the two ends of each pixel point sequence to be equal, wherein N is the transverse length of the template layer;
acquiring the step length of each pixel point sequence according to the length of each pixel point sequence and the initial weight of the pixels at the two corresponding ends;
and then, acquiring the weight of the pixel point in each pixel point sequence according to the step length of each pixel point sequence and the initial weight of the pixel points at the two corresponding ends.
3. The method according to claim 1, wherein in the process of obtaining a plurality of template layers to be selected that match the numbers on the weighted list layer, if one template layer to be selected is obtained, the digital information on the weighted list layer is obtained according to the template layer to be selected.
4. The method for identifying numbers of printed publication based on template matching according to claim 1, wherein the template layers to be selected that match the numbers on the weighted list layer are obtained according to the following steps:
acquiring the position of each transverse line on a weighting single layer and the number of second intersection points of that transverse line with the numbers on the weighting single layer; acquiring the number of first intersection points of the transverse line at the same position with the numbers on each weighting template layer;
and judging, according to the difference between the number of second intersection points of each transverse line with the numbers on the weighting single layer and the number of first intersection points of the transverse line at the same position with the numbers on the weighting template layer, the plurality of weighting template layers matched with the numbers on the weighting single layer, namely the plurality of weighting template layers to be selected.
5. The method according to claim 4, wherein the formula for determining the weighted template layers matching the numbers on the weighted list layers is as follows:
E = |B_1 − A_1| + |B_2 − A_2| + |B_3 − A_3|
in the formula, B_j represents the number of second intersection points of the j-th transverse line with the numbers on the weighting single layer; A_j represents the number of first intersection points of the j-th transverse line with the numbers on the weighting template layer; E represents the accumulated difference between the first and second intersection counts over the three transverse lines;
when E ≠ 0, it is judged that the numbers on the weighting single layer are not matched with the numbers on the weighting template layer;
when E = 0, it is judged that the numbers on the weighting single layer are matched with the numbers on the weighting template layer, and the weighting template layers matched with the numbers on the weighting single layer are acquired in turn, namely the plurality of weighting template layers to be selected.
6. The method for identifying numbers of printed publications based on template matching according to claim 1, wherein the position feature set of the numbers on each template layer to be selected is obtained according to the following steps:
according to the weights of all pixel points on each transverse line in each template layer to be selected, arranging the weights of all the pixel points on each transverse line in turn from top to bottom according to the order of the transverse lines in the template layer to be selected to obtain the position feature set, namely the position feature set is {w_1, w_2, …, w_{3N}}, wherein N represents the transverse length of the template layer to be selected and each of the three transverse lines contributes N weights.
7. The method for identifying numbers in a print form of a publication according to claim 6, wherein the stroke weight feature set of the numbers on each template layer to be selected is obtained according to the following steps:
according to the step lengths of the pixel point sequences corresponding to each transverse line in each template layer to be selected, arranging the step lengths of the pixel point sequences corresponding to each transverse line in turn from top to bottom according to the order of the transverse lines in the template layer to be selected to obtain the stroke thickness feature set, namely the stroke thickness feature set is {d_1, d_2, …, d_M}, wherein M represents the total number of intersection points of all the transverse lines on the template layer to be selected.
8. The method for recognizing the numbers of the printed publications based on the template matching as claimed in claim 7, wherein in the process of judging and acquiring the template layer to be selected matching with the numbers on the weighted list layer, the template layer to be selected matching with the numbers on the weighted list layer is judged according to the numerical difference between the position feature set of the numbers on a weighted list layer and the corresponding position in the position feature set of the numbers on each template layer to be selected, and the numerical difference between the stroke thickness feature set of the numbers on the weighted list layer and the corresponding position in the stroke thickness feature set of the numbers on each template layer to be selected.
9. The method according to claim 8, wherein the calculation formula for determining the template layer to be selected that matches the number on the weighted list layer is as follows:
S = Σ_{i=1}^{3N} |w_i − w′_i| + Σ_{k=1}^{M} |d_k − d′_k|
in the formula, {w_1, …, w_{3N}} represents the position feature set corresponding to the weighting single layer and {w′_1, …, w′_{3N}} the position feature set corresponding to the template layer to be selected, |w_i − w′_i| being the absolute value of the difference of the i-th values of the two sets; {d_1, …, d_M} represents the stroke thickness feature set corresponding to the weighting single layer and {d′_1, …, d′_M} that corresponding to the template layer to be selected, |d_k − d′_k| being the absolute value of the difference of the k-th values; S represents the matching error value of the numbers on the weighting single layer and the numbers on the template layer to be selected;
when S ≤ S_0, where S_0 is the allowable error component, it is judged that the numbers on the weighting single layer are matched with the numbers on the weighting template layer to be selected;
when S > S_0, it is judged that the numbers on the weighting single layer are not matched with the numbers on the weighting template layer to be selected.
10. The method for identifying numbers on a printed matter based on template matching according to claim 1, wherein in the process of dividing each number in the digital image into a plurality of first single image layers according to its own connected domain, the method further comprises:
acquiring a digital image to be identified; converting the digital image into a binary image after graying;
opening operation processing is carried out on the binary image, and isolated small points and burrs on the digital image are removed;
and dividing a plurality of first single image layers according to the connected domain of each number in the binary image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210753828.9A CN114821134B (en) | 2022-06-30 | 2022-06-30 | Method for identifying print style number of publication based on template matching |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210753828.9A CN114821134B (en) | 2022-06-30 | 2022-06-30 | Method for identifying print style number of publication based on template matching |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114821134A true CN114821134A (en) | 2022-07-29 |
CN114821134B CN114821134B (en) | 2022-09-02 |
Family
ID=82522229
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210753828.9A Active CN114821134B (en) | 2022-06-30 | 2022-06-30 | Method for identifying print style number of publication based on template matching |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114821134B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010198305A (en) * | 2009-02-25 | 2010-09-09 | Amano Corp | Vehicle number information reading system |
CN102024144A (en) * | 2010-11-23 | 2011-04-20 | 上海海事大学 | Container number identification method |
CN102509383A (en) * | 2011-11-28 | 2012-06-20 | 哈尔滨工业大学深圳研究生院 | Feature detection and template matching-based mixed number identification method |
CN104463195A (en) * | 2014-11-08 | 2015-03-25 | 沈阳工业大学 | Printing style digital recognition method based on template matching |
CN105574531A (en) * | 2015-12-11 | 2016-05-11 | 中国电力科学研究院 | Intersection point feature extraction based digital identification method |
Also Published As
Publication number | Publication date |
---|---|
CN114821134B (en) | 2022-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109829453B (en) | Method and device for recognizing characters in card and computing equipment | |
US5901239A (en) | Skin pattern and fingerprint classification system | |
CN108596197B (en) | Seal matching method and device | |
CN110032938B (en) | Tibetan recognition method and device and electronic equipment | |
CN109919160B (en) | Verification code identification method, device, terminal and storage medium | |
CN109740606B (en) | Image identification method and device | |
CN109241861B (en) | Mathematical formula identification method, device, equipment and storage medium | |
US20170308768A1 (en) | Character information recognition method based on image processing | |
CN108197644A (en) | A kind of image-recognizing method and device | |
CN108830275B (en) | Method and device for identifying dot matrix characters and dot matrix numbers | |
WO2017161636A1 (en) | Fingerprint-based terminal payment method and device | |
CN110647795A (en) | Form recognition method | |
CN111523622B (en) | Method for simulating handwriting by mechanical arm based on characteristic image self-learning | |
CN114038004A (en) | Certificate information extraction method, device, equipment and storage medium | |
CN110738030A (en) | Table reconstruction method and device, electronic equipment and storage medium | |
CN115457565A (en) | OCR character recognition method, electronic equipment and storage medium | |
CN111339932B (en) | Palm print image preprocessing method and system | |
CN111950559A (en) | Pointer instrument automatic reading method based on radial gray scale | |
CN114417904A (en) | Bar code identification method based on deep learning and book retrieval system | |
CN114821134B (en) | Method for identifying print style number of publication based on template matching | |
CN112101343A (en) | License plate character segmentation and recognition method | |
CN109726722B (en) | Character segmentation method and device | |
CN114387592B (en) | Character positioning and identifying method under complex background | |
CN111488870A (en) | Character recognition method and character recognition device | |
CN112906690A (en) | License plate segmentation model training method, license plate segmentation method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||