CN108399425B - Character isomorphism detection method and device - Google Patents


Info

Publication number
CN108399425B
Authority
CN
China
Prior art keywords
node
character image
character
isomorphic
structure chart
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810126894.7A
Other languages
Chinese (zh)
Other versions
CN108399425A (en)
Inventor
周恺卿
莫礼平
曾磊
曹良斌
刘笔余
江威
张轩宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jishou University
Original Assignee
Jishou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jishou University filed Critical Jishou University
Priority to CN201810126894.7A priority Critical patent/CN108399425B/en
Publication of CN108399425A publication Critical patent/CN108399425A/en
Application granted granted Critical
Publication of CN108399425B publication Critical patent/CN108399425B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/754: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries involving a deformation of the sample pattern or of the reference pattern; Elastic matching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/19: Recognition using electronic means
    • G06V30/196: Recognition using electronic means using sequential comparisons of the image signals with a plurality of references
    • G06V30/1983: Syntactic or structural pattern recognition, e.g. symbolic string recognition
    • G06V30/1988: Graph matching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/24: Character recognition characterised by the processing or recognition method
    • G06V30/242: Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244: Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Abstract

An embodiment of the invention provides a character isomorphism detection method and device. The method comprises the following steps: acquiring a character image to be processed and converting it into a curve structure chart; acquiring a training result set and, for each isomorphic group in the training result set, acquiring the node set and edge set of that group; judging, from the node set and edge set of the isomorphic group and the node set and edge set of the curve structure chart, whether the character images contained in the isomorphic group are isomorphic with the curve structure chart; and, if they are isomorphic, acquiring a node mapping set between the character image and the curve structure chart and obtaining the isomorphic mapping between them from the node mapping set. The scheme judges the isomorphism relationship from the node sets and edge sets of the character in the image to be recognized and of the sample characters in each isomorphic group, so that the broad class to which the character in the image belongs can be determined. This narrows the range of subsequent character recognition and improves its accuracy.

Description

Character isomorphism detection method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a character isomorphism detection method and device.
Background
In character recognition, the complexity of the characters and the irregularity of individual handwriting make recognition very difficult. Off-line recognition is especially difficult, because fewer cues that help recognition are available off-line than on-line. In the prior art, the character to be recognized is usually compared directly with a massive set of sample characters. This causes a heavy workload, low recognition speed and low recognition accuracy, mainly because the comparison range is too large.
Disclosure of Invention
In view of the above, an objective of the embodiments of the present invention is to provide a method and an apparatus for detecting character isomorphism, so as to solve the above-mentioned problems.
The preferred embodiment of the present invention provides a method for detecting isomorphism of characters, which comprises:
acquiring a character image to be processed, and converting the character image into a curve structure chart;
acquiring a training result set, and acquiring a node set and an edge set of each isomorphic group in the training result set;
judging whether the character images contained in the isomorphic group are isomorphic with the curve structure chart or not according to the obtained node set and edge set of the isomorphic group and the node set and edge set of the curve structure chart;
and if isomorphism exists, obtaining a node mapping set between the character image and the curve structure chart, and obtaining isomorphism mapping between the character image and the curve structure chart according to the node mapping set.
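To make the last two steps concrete, the following is a minimal Python sketch of an isomorphism test between two small multigraphs given by their node sets and edge sets. It is a brute-force search over node permutations written only for illustration; it is not the judging procedure of the embodiment, and the names `edge_multiset` and `find_isomorphism` are hypothetical.

```python
from collections import Counter
from itertools import permutations

def edge_multiset(edges, mapping=None):
    """Multiset of unordered node pairs, optionally relabelled through mapping."""
    m = mapping or {}
    return Counter(tuple(sorted((m.get(u, u), m.get(v, v)))) for u, v in edges)

def find_isomorphism(nodes_a, edges_a, nodes_b, edges_b):
    """Return a node mapping a -> b that preserves all edges (with multiplicity
    and rings), or None. Brute force: only practical for very small graphs."""
    nodes_a, nodes_b = sorted(nodes_a), sorted(nodes_b)
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return None
    target = edge_multiset(edges_b)
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if edge_multiset(edges_a, mapping) == target:
            return mapping
    return None

# Illustrative data: a star with three edges, and the same shape relabelled.
star = ([-1, -2, -3, -4], [(-1, -2), (-2, -3), (-2, -4)])
other = (["a", "b", "c", "d"], [("b", "a"), ("b", "c"), ("d", "b")])
print(find_isomorphism(*star, *other))
```

The returned dictionary plays the role of a node mapping set; a practical system would prune the search, for example by comparing node degrees first.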
Another preferred embodiment of the present invention provides a device for detecting isomorphism of characters, the device comprising:
the curve structure chart acquisition module is used for acquiring a character image to be processed and converting the character image into a curve structure chart;
a training result set obtaining module, configured to obtain a training result set, and obtain a node set and an edge set of each isomorphic group in the training result set;
the judging module is used for judging whether the character images contained in the isomorphic group are isomorphic with the curve structure chart according to the obtained node set and the obtained edge set of the isomorphic group and the node set and the edge set of the curve structure chart;
and the mapping acquisition module is used for acquiring a node mapping set between the character image and the curve structure chart when the character image and the curve structure chart in the isomorphic group are isomorphic, and acquiring the isomorphic mapping between the character image and the curve structure chart according to the node mapping set.
According to the character isomorphism detection method and device provided by the embodiments of the invention, a character image to be processed is converted into a curve structure chart, and for each isomorphic group in the obtained training result set, whether the character images contained in the isomorphic group are isomorphic with the curve structure chart is judged from the node set and edge set of the isomorphic group and the node set and edge set of the curve structure chart. When they are judged to be isomorphic, a node mapping set between the character image and the curve structure chart is acquired, and the isomorphic mapping between them is obtained from the node mapping set. The scheme judges the isomorphism relationship from the node sets and edge sets of the character in the image to be recognized and of the sample characters in each isomorphic group, so that the broad class to which the character in the image belongs can be determined. This narrows the range of subsequent character recognition and improves its accuracy.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 2 is a flowchart of a character isomorphism detection method according to an embodiment of the present invention.
Fig. 3 is a flowchart of the substeps of step S101 in fig. 2.
Fig. 4 is a schematic diagram of neighborhood points of a character pixel point according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of distribution of neighborhood points of character pixel points according to an embodiment of the present invention.
Fig. 6 is another distribution diagram of neighborhood points of character pixel points according to an embodiment of the present invention.
Fig. 7 is another distribution diagram of neighborhood points of character pixel points according to an embodiment of the present invention.
Fig. 8 is another distribution diagram of neighborhood points of character pixel points according to an embodiment of the present invention.
Fig. 9(a) and (b) are schematic diagrams illustrating the thinning effect of the character image according to the embodiment of the present invention.
Fig. 10(a), (b) and (c) are schematic diagrams for the structural diagram conversion provided by the embodiment of the invention.
Fig. 11(a) and (b) are character structure diagrams provided in the embodiment of the present invention.
Fig. 12 is a schematic diagram of node stacking in the structure diagram according to an embodiment of the present invention.
FIG. 13 is another diagram illustrating node stacking in the structure diagram according to an embodiment of the present invention.
Fig. 14 is a diagram illustrating the effect of eliminating node accumulation according to an embodiment of the present invention.
Fig. 15 is a schematic character diagram of a node with degree 2 according to an embodiment of the present invention.
Fig. 16 is another flowchart of the sub-steps of step S101 in fig. 2.
Fig. 17 is a diagram illustrating the effect of redundant node elimination and edge merging according to an embodiment of the present invention.
FIG. 18 is a schematic diagram of a proximity curve provided by an embodiment of the present invention.
Fig. 19 is a different schematic representation of a diagram provided by an embodiment of the present invention.
Fig. 20(a) and (b) are schematic diagrams of two mutually isomorphic graphs provided by an embodiment of the present invention.
Fig. 21 is a flowchart of the substeps of step S103 in fig. 2.
Fig. 22 is a flowchart of the substeps of step S104 in fig. 2.
Fig. 23 is a functional block diagram of a character isomorphism detection apparatus according to a preferred embodiment of the invention.
Reference numerals: 100 - electronic device; 110 - character isomorphism detection device; 111 - curve structure chart acquisition module; 112 - training result set acquisition module; 113 - judging module; 114 - mapping acquisition module; 120 - processor; 130 - memory.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "disposed," and "connected" are to be construed broadly, e.g., as being fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to fig. 1, a schematic block diagram of an electronic device 100 according to an embodiment of the invention is shown. In this embodiment, the electronic device 100 includes a character isomorphism detection apparatus 110, a processor 120, and a memory 130. Wherein, the memory 130 is electrically connected with the processor 120 directly or indirectly to realize the data transmission or interaction. The character isomorphism detecting device 110 includes at least one software function module which can be stored in the memory 130 in the form of software or firmware or solidified in the operating system in the electronic device 100. The processor 120 is configured to execute an executable module stored in the memory, such as a software function module or a computer program included in the character isomorphism detection apparatus 110, to implement isomorphism detection on characters.
In this embodiment, the electronic device 100 may be, but is not limited to, a computer, a device with an image and data processing function installed in the computer, or the like.
Fig. 2 is a schematic flowchart of a character isomorphism detection method applied to the electronic device 100 shown in fig. 1 according to an embodiment of the present invention. It should be noted that the method provided in this embodiment is not limited by the sequence shown in fig. 2 and described below. The specific process shown in fig. 2 will be described in detail below.
Step S101, acquiring a character image to be processed, and converting the character image into a curve structure chart.
In this embodiment, to facilitate recognition, the acquired character image to be processed first undergoes a series of processing and conversion steps and is converted into a corresponding curve structure chart for subsequent processing.
Referring to fig. 3, in the present embodiment, the step S101 may include three substeps, i.e., step S1011, step S1012 and step S1013.
Step S1011, pre-processing the character image, and converting the pre-processed character image into a structural diagram corresponding to the character image.
Character images cropped from real environments often contain noise caused by attached borders, random stains and uneven illumination. Image preprocessing is therefore needed to make the picture clear and its edges distinct, to highlight the character-related information in the image, and to remove unnecessary information so that character features can be extracted for recognition.
The quality of preprocessing directly affects the performance of the whole optical character recognition (OCR) system, and preprocessing covers a wide range of operations. It generally includes binarization, denoising, inverse deformation transformation, size normalization and similar steps. In summary, the purpose of preprocessing is to filter out noise, enhance useful information, recover degraded information, binarize the image, segment characters from lines and paragraphs, thin characters to obtain their skeletons, normalize characters to reduce differences among like characters, and so on.
In this embodiment, preprocessing of the character image to be processed mainly includes binarization, erosion and thinning. The thinning method adopted in this embodiment is explained below; for the other preprocessing steps, refer to the prior art.
Thinning converts the image into curves or points of width one, called the skeleton. It is generally applied to a binary image and produces a new binary image as output; the basic structural information of the original image must be preserved during the conversion.
In the binary image, a foreground pixel is represented by 1 and a background pixel by 0. For any pixel point p with coordinates (x, y), the points (x+1, y), (x-1, y), (x, y-1) and (x, y+1) are called the 4-neighbors of p, and the points (x+1, y), (x+1, y-1), (x, y-1), (x-1, y-1), (x-1, y), (x-1, y+1), (x, y+1) and (x+1, y+1) are called the 8-neighbors of p. N(p) is the number of 8-neighbors of p whose value is 1. For example, as shown in fig. 4, the 8-neighbors of pixel point p are (n0, n1, …, n7), and n0, n2, n4 and n6 are the 4-neighbors of p.
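The neighbor definitions can be sketched in Python as follows. The image is assumed to be a list of rows indexed as `img[y][x]`, and the assignment of n0 to n7 to particular offsets is an assumption (Fig. 4 is not reproduced here); the text only fixes that n0, n2, n4 and n6 are the 4-neighbors. The function names are hypothetical.

```python
# Offsets (dx, dy) for n0..n7. Assumed ordering: even indices are 4-neighbors.
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1),
           (-1, 0), (-1, 1), (0, 1), (1, 1)]

def neighbours(img, x, y):
    """Values of n0..n7 around (x, y); points outside the image count as 0."""
    h, w = len(img), len(img[0])
    vals = []
    for dx, dy in OFFSETS:
        nx, ny = x + dx, y + dy
        vals.append(img[ny][nx] if 0 <= nx < w and 0 <= ny < h else 0)
    return vals

def n_of_p(img, x, y):
    """N(p): number of 8-neighbors of p = (x, y) whose value is 1."""
    return sum(neighbours(img, x, y))

img = [[0, 1, 0],
       [1, 1, 1],
       [0, 0, 0]]
print(n_of_p(img, 1, 1))            # N(p) for the centre pixel
print(neighbours(img, 1, 1)[0::2])  # values of the 4-neighbors n0, n2, n4, n6
```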
For a pixel point p in a character image, whether the pixel point p is marked or not can be determined by the following rules:
(a) if N(p) is 0, 1 or 8, mark pixel point p;
(b) if N(p) is 2, 5 or 6 and the neighbors of pixel point p with value 1 are consecutive, e.g. n1, n0, n7, n6, n5, mark pixel point p; otherwise do not mark it;
(c) if N(p) is 3 and the distribution of pixel point p and its neighbors conforms to the template in fig. 5, mark pixel point p;
(d) if N(p) is 4 and the distribution of pixel point p and its neighbors conforms to the template in fig. 6, mark pixel point p;
(e) if N(p) is 7 and the neighbor whose pixel value is 0 is one of the 4-neighbors of pixel point p, mark pixel point p;
(f) to keep the lines connected, if pixel point p conforms to the template in fig. 7, do not mark pixel point p.
The pixel value of every marked pixel point in the character image is set to 0, i.e. to background; unmarked pixel points are kept unchanged.
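Rules (a), (b) and (e) depend only on the eight neighbor values and can be sketched directly. The sketch below assumes the values are given in the order n0 to n7 with the 4-neighbors at even indices; rules (c), (d) and (f) need the templates of Figs. 5 to 7, which are not reproduced here, so the hypothetical function `candidate_mark` returns `None` for N(p) of 3 or 4 and does not apply the connectivity exception of rule (f).

```python
def ones_are_consecutive(vals):
    """True if the 1-valued neighbors form one unbroken run around the
    circular sequence n0..n7 (exactly one 0 -> 1 transition)."""
    transitions = sum(1 for i in range(8)
                      if vals[i] == 0 and vals[(i + 1) % 8] == 1)
    return transitions == 1

def candidate_mark(vals):
    """Apply rules (a), (b), (e) to the neighbor values n0..n7.
    Returns True/False, or None when a figure template would be needed."""
    n = sum(vals)
    if n in (0, 1, 8):                    # rule (a)
        return True
    if n in (2, 5, 6):                    # rule (b)
        return ones_are_consecutive(vals)
    if n == 7:                            # rule (e): the single 0 is a 4-neighbor
        return vals.index(0) % 2 == 0
    return None                           # rules (c), (d): templates not available

# Neighbors n1, n0, n7, n6, n5 set, as in the example of rule (b).
print(candidate_mark([1, 1, 0, 0, 0, 1, 1, 1]))
```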
Optionally, in this embodiment, to further reduce the number of pixels in the image, for each pixel point p in the character image, if p conforms to the template in fig. 8 or to any of the templates obtained by rotating that template by 90°, 180° and 270°, the pixel value of p is set to 0. In this embodiment, if the original character image is as shown in fig. 9(a), the character image obtained after the above thinning process is as shown in fig. 9(b).
In this embodiment, before describing the step of converting the character image into the corresponding structure diagram, the concept of an undirected graph is introduced for ease of understanding. An undirected graph G is a triple $\langle V, E, \varphi \rangle$, written $G = \langle V, E, \varphi \rangle$, where V is a non-empty finite set whose elements are called nodes or vertices; E is a subset of the unordered pairs V & V, each element of which is called an edge, with $V \,\&\, V = \{(a, b) \mid a \in V \wedge b \in V\}$; and $\varphi$ is the mapping from the edge set E to V & V, $\varphi: E \to V \,\&\, V$.
For convenience of explanation, the undirected graph $G = \langle V, E, \varphi \rangle$ is abbreviated as $G = \langle V, E \rangle$, where E is a multi-subset of V & V and each element of E is an unordered pair $(v_1, v_2)$, $v_1 \in V$, $v_2 \in V$.
The nodes of an undirected graph may be represented by points on a plane and the edges by line segments (straight or curved) on the plane. The planar drawing thus obtained is called a diagram of the graph. Because the positions of the points representing the nodes are arbitrary, one and the same undirected graph can be drawn as many diagrams of different shapes.
For an undirected graph $G = \langle V, E \rangle$, if an edge $e$ is associated with the unordered pair $(v_i, v_j)$, then $v_i$ and $v_j$ are called the end points of $e$, written $e = (v_i, v_j)$; $e$ is said to be associated with nodes $v_i$ and $v_j$, and $v_i$ and $v_j$ are adjacent nodes. If the two nodes associated with an edge are the same, the edge is called a ring. If several edges are associated with the same pair of nodes, these edges are called parallel edges or multiple edges, and their number is called the multiplicity of the parallel edges.
For an undirected graph $G = \langle V, E \rangle$ and a node $v \in V(G)$, if $k$ rings and $l$ non-ring edges in G are associated with $v$, the number $2k + l$ is called the degree of node $v$, denoted $D(v)$.
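As a small illustration of the degree definition, the following Python sketch counts degrees in a multigraph whose edges are given as node pairs; the function name and the example graph are hypothetical. Adding one to each end point of every edge counts a ring twice, which gives 2k + l.

```python
from collections import Counter

def degrees(nodes, edges):
    """Degree of every node; a ring (u == v) contributes 2."""
    deg = Counter({v: 0 for v in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

# Node "a" carries one ring and one ordinary edge to "b"; "c" is isolated.
print(degrees(["a", "b", "c"], [("a", "a"), ("a", "b")]))
```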
Given a graph $G = \langle V, E \rangle$ and $V' \subseteq V$, the subgraph of G determined by $V'$ is $G(V') = \langle V', E' \rangle$, where $E' = \{ e = (v_i, v_j) \mid e \in E \wedge v_i \in V' \wedge v_j \in V' \}$.
Numeric and English characters can be viewed as being composed of points and curves without width, and can be represented by $G = \langle V, E \rangle$, where V is the set of nodes consisting of isolated points, curve end points and intersections, and E is the set of curves connecting two nodes.
To avoid confusion, the entity representing the structural information of a character image is called a structure diagram; it is defined similarly to an undirected graph. A structure diagram is $G = \langle V, E, info \rangle$, where V is the set of nodes consisting of isolated points, curve end points and intersections, and E is the set of curves connecting two nodes. The elements of E are also called edges, and each edge is a triple $e = (v_1, v_2, q)$, where $v_1 \in V$ and $v_2 \in V$ are the two end points of the curve and $q$ is the set of information needed about the curve. $info$ is information describing the image, such as the length and width of the image.
The character image after the above preprocessing is represented as a two-dimensional array a, in which black dots are represented by 1 and white dots in the image are represented by 0.
First, every black point in the two-dimensional array is set to the unvisited state, and the description information info of the image represented by the array is calculated, such as the length and width of the minimum rectangle containing all black points. The sets V and E are initialized to empty. For each black point a(i, j) in the two-dimensional array, the number N(i, j) of its neighbors whose pixel value is 1 is calculated; if N(i, j) is not equal to 2, a(i, j) is set to v, where v is a negative number not yet in the set V, and v is added to the node set V.
Then select any node $v_i$ that has not been visited. Starting from $v_i$, traverse along a path of unvisited black points in the character image, setting each visited black point to visited, until an unvisited node $v_j$ is encountered, and add $(v_i, v_j, p)$ to the set E, where p is the sequence of points generated by traversing this path from $v_i$ to $v_j$. If unvisited black points remain around $v_i$, start again from $v_i$ and traverse until no unvisited black point remains around it, then set $v_i$ to visited.
The above operation is performed for each node in the character image until all nodes are set to the visited state. The character image can thus be represented by nodes, point sequences and description information, that is, converted into a structure diagram. Fig. 10(a), (b) and (c) show the processes of marking nodes, extracting edge path1 and extracting edge path2, respectively, where $V = \{-1, -2\}$, $e_1 = (-1, -1, path1)$ and $e_2 = (-1, -2, path2)$.
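The conversion described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the embodiment's implementation: the image is a list of rows of 0/1 values, a node is any black point whose number of black 8-neighbors differs from 2, two directly adjacent nodes are joined by a two-point edge, and closed curves that contain no node at all are not handled. The function names are hypothetical.

```python
def black_neighbours(a, i, j):
    """Coordinates of the black 8-neighbors of point (i, j)."""
    h, w = len(a), len(a[0])
    return [(i + di, j + dj)
            for di in (-1, 0, 1) for dj in (-1, 0, 1)
            if (di or dj) and 0 <= i + di < h and 0 <= j + dj < w
            and a[i + di][j + dj] == 1]

def to_structure_diagram(a):
    """Return (node_id, edges, info) for a thinned 0/1 image a."""
    black = [(i, j) for i, row in enumerate(a)
             for j, v in enumerate(row) if v == 1]
    if not black:
        return {}, [], {}
    # Nodes: black points with N(i, j) != 2, labelled -1, -2, ...
    node_id = {}
    for p in black:
        if len(black_neighbours(a, *p)) != 2:
            node_id[p] = -(len(node_id) + 1)
    visited, edges = set(), []
    for start, vid in node_id.items():
        for first in black_neighbours(a, *start):
            if first in node_id:
                if start < first:          # record adjacent node pairs once
                    edges.append((vid, node_id[first], [start, first]))
                continue
            if first in visited:           # this curve was already traversed
                continue
            path, prev, cur = [start, first], start, first
            visited.add(cur)
            # A non-node point has exactly two black neighbors: the previous
            # point and the next one, so the walk is unambiguous.
            while cur not in node_id:
                nxt = next(q for q in black_neighbours(a, *cur) if q != prev)
                path.append(nxt)
                visited.add(nxt)
                prev, cur = cur, nxt
            edges.append((vid, node_id[cur], path))
    rows = [i for i, _ in black]
    cols = [j for _, j in black]
    info = {"height": max(rows) - min(rows) + 1,
            "width": max(cols) - min(cols) + 1}
    return node_id, edges, info

nodes, edges, info = to_structure_diagram([[1, 1, 1, 1, 1]])
print(nodes, [(u, v, len(p)) for u, v, p in edges], info)
```

A curve that leaves a node and returns to the same node is recorded as a ring, because the walk ends at the first node it reaches, including the starting node.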
Fig. 11(a) and (b) show examples in which the characters '8' and 'h' are represented by structure diagrams, where e1, e2 and e3 are the information sets of the curves they represent. For the character in fig. 11(a), $V = \{-1, -2, -3, -4\}$ and $E = \{(-1, -2, e_1), (-2, -3, e_2), (-2, -4, e_3)\}$; for the character in fig. 11(b), $V = \{-1\}$ and $E = \{(-1, -1, e_1), (-1, -1, e_2)\}$.
Step S1012, performing elimination processing on the nodes in the stacked state in the structure diagram.
As can be seen from the above, in this embodiment the condition for deciding whether the point (i, j) is a node is whether N(i, j) is not equal to 2, so several nodes may pile up at the intersection of curves, as shown in fig. 12. In addition, irregular handwriting or defects of the thinning algorithm may also cause nodes to pile up, as shown in fig. 13.
This excess information should be removed since node stacking may result in the resulting graph containing unnecessary nodes and edges, which adds difficulty to subsequent processing.
Node pile-up can be eliminated at two different stages: while the structure diagram is being generated from the character image, or after the structure diagram has been generated. In this embodiment, node pile-up is eliminated on the basis of the generated structure diagram.
The nature of the node pile-up is that the length of the edges is too small, so for each edge in the structure diagram, the relative length of each edge with respect to the structure diagram can be calculated as follows:
Figure BDA0001573760600000081
where $L(e)$ is the relative length of the edge $e$, Width and Height are the width and height of the rectangle containing the points of the structure diagram, $n$ is the number of points contained in the edge $e$, $p_i$ is the $i$-th point of the edge $e$, and $p_{i+1}$ is the $(i+1)$-th point of the edge $e$.
Let G be a structure diagram. Select any unvisited edge $e_i$ and process it as follows:
Calculate $L(e_i)$. If $L(e_i)$ is greater than the preset length value $t$, retain $e_i$. Otherwise, remove $e_i$ from G, and of the two nodes associated with $e_i$ let $v_s$ be the one with the smaller degree and $v_b$ the one with the larger degree; delete node $v_s$ from the structure diagram G. If $D(v_s)$ is not equal to 0, check each edge $e_k = (v_i, v_j, p_k)$ in the diagram: if $v_i$ is equal to $v_s$, change $v_i$ to $v_b$; if $v_j$ is equal to $v_s$, change $v_j$ to $v_b$. Fig. 14 shows the characters of fig. 12 and fig. 13 after the node pile-up has been eliminated.
The choice of the preset length value $t$ has a great influence on the finally generated diagram. If $t$ is too small, node pile-up is not eliminated, and some edges and nodes that should be removed remain in the diagram, making subsequent processing harder. If $t$ is too large, some edges and nodes that carry correct character structure information are removed, which also affects subsequent processing. Experiments show that $t = 0.03$ works well.
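The elimination step can be sketched in Python as below. The exact formula for L(e) is given only as an equation image, so the normalization used here, polyline length divided by (Width + Height), is an assumption made for illustration; the function names are hypothetical. Edges are triples `(u, v, path)` and only the end point labels are rewritten, not the stored paths.

```python
import math

def relative_length(path, width, height):
    """ASSUMED form of L(e): polyline length of the edge divided by
    (width + height) of the rectangle containing the diagram's points."""
    arc = sum(math.dist(path[i], path[i + 1]) for i in range(len(path) - 1))
    return arc / (width + height)

def eliminate_stacking(nodes, edges, width, height, t=0.03):
    """Remove edges with L(e) <= t and merge their end points."""
    nodes, edges = set(nodes), list(edges)
    i = 0
    while i < len(edges):
        u, v, path = edges[i]
        if relative_length(path, width, height) > t:
            i += 1                          # long enough: keep the edge
            continue
        deg = {n: 0 for n in nodes}         # degrees before removing the edge
        for a, b, _ in edges:
            deg[a] += 1
            deg[b] += 1
        del edges[i]
        if u != v:
            # v_s: end point with the smaller degree, v_b: the other one
            vs, vb = (u, v) if deg[u] <= deg[v] else (v, u)
            nodes.discard(vs)
            edges = [(vb if a == vs else a, vb if b == vs else b, p)
                     for a, b, p in edges]
    return nodes, edges

nodes, edges = eliminate_stacking(
    {-1, -2, -3},
    [(-1, -2, [(0, 0), (0, 50)]), (-2, -3, [(0, 50), (0, 52)])],
    width=100, height=100)
print(nodes, [(u, v) for u, v, _ in edges])
```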
After node pile-up has been eliminated as described above, some redundant nodes may remain. These nodes are unnecessary and occur only among nodes of degree 2, such as v1 and v2 in fig. 15, because the two edges that meet at v1 or v2 actually form a single curve.
Referring to fig. 16, in this embodiment, after the nodes in the stacked state in the structure diagram have been eliminated, step S101 may further include step S1014, step S1015 and step S1016.
In step S1014, the nodes with the node degree of 2 included in the structure diagram after the elimination processing are found.
Step S1015, two edges associated with the node are obtained, and a merged edge is created according to the edge information of the two edges and the end point information of each edge.
Step S1016, deleting the node and the two edges from the structure diagram, and adding the merged edge to the structure diagram.
Optionally, each unprocessed node $v$ in the structure diagram is processed as follows:
Calculate $D(v)$. If $D(v)$ is not equal to 2, or $v$ is associated with a ring, do nothing to $v$. Otherwise, let the two edges associated with $v$ be $e_1 = (v_1, v_2, p_1)$ and $e_2 = (w_1, w_2, p_2)$, where
Figure BDA0001573760600000091
Then a new merged edge $e = (y_1, y_2, p)$ is created.
Wherein:
Figure BDA0001573760600000092
m1, m2, m3, m4 take the following values:
Figure BDA0001573760600000093
Figure BDA0001573760600000101
Delete $v$, $e_1$ and $e_2$ from the structure diagram and add $e$ to the edge set. Fig. 17 is a schematic diagram of the character of fig. 14 after redundant node elimination and edge merging.
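The merging of the two edges at a redundant degree-2 node can be sketched in Python as follows. The patent's own rule for the end points and the point order of the merged edge (the values m1 to m4) is given only in equation images, so the orientation logic below is an assumed equivalent: the first edge is oriented to end at v, the second to start at v, and the paths are concatenated. Function names are hypothetical.

```python
def merge_degree_two_nodes(nodes, edges):
    """Repeatedly remove nodes of degree 2 that join two non-ring edges,
    replacing the two edges (u, v, path) by one merged edge."""
    nodes, edges = set(nodes), list(edges)
    changed = True
    while changed:
        changed = False
        for v in sorted(nodes):
            incident = [e for e in edges if v in (e[0], e[1])]
            # D(v) == 2 without a ring means exactly two non-ring edges.
            if len(incident) != 2 or any(e[0] == e[1] for e in incident):
                continue
            e1, e2 = incident
            a = e1 if e1[1] == v else (e1[1], e1[0], e1[2][::-1])  # ends at v
            b = e2 if e2[0] == v else (e2[1], e2[0], e2[2][::-1])  # starts at v
            merged = (a[0], b[1], a[2] + b[2][1:])  # drop the duplicated point v
            edges.remove(e1)
            edges.remove(e2)
            edges.append(merged)
            nodes.discard(v)
            changed = True
            break
    return nodes, edges

nodes, edges = merge_degree_two_nodes(
    {-1, -2, -3},
    [(-2, -1, [(0, 2), (0, 1), (0, 0)]), (-2, -3, [(0, 2), (0, 3)])])
print(nodes, edges)
```

The sketch assumes that each path starts and ends with the coordinates of its end point nodes, so the shared point at v appears once in the merged path.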
In step S1013, the structure diagram after the elimination processing is converted into a curve structure chart.
The structural diagram obtained after the character image is processed contains necessary information for storing the character image, but also contains information which is not needed for character identification, and the structural diagram can be post-processed to remove the unnecessary information, so that the needed information is more convenient to use.
In character image conversionIn the process of going to the structure diagram, a curve e ═ v (v) is described1,v2P) is a sequence p of two end points of the curve and the points that make up the curve, where p describes the entire information of the curve in detail, but in character recognition only some information that is very useful for character recognition should be retained, thus requiring a simplified process.
Optionally, in this embodiment, an information set q containing the direction, length, curvature, and direction change sequence of each curve in the structure diagram is calculated. Each edge e = (v1, v2, p) (
Figure BDA0001573760600000102
where p is the point sequence obtained by traversing the curve from node v1 to v2; the edge is undirected, but the curve it represents can be regarded as directed, running from v1 to v2) is converted into e = (v1, v2, q).
Wherein, the direction, length and curvature of the curve can be calculated by the following formula:
Figure BDA0001573760600000103
Figure BDA0001573760600000104
Figure BDA0001573760600000105
where v1 and v2 are the two end points of the curve, n is the number of points on the curve, pn is the last point on the curve, and p1 is the first point on the curve.
When the shape of the curve is more complex, the above quantities cannot distinguish between more similar edges, such as the characters shown in fig. 18, whose direction, length and curvature are comparable, except that their internal orientation is different. The sequence of changes in direction of the curves is defined in order to allow the discrimination of similar curves.
In this embodiment, the direction change sequence of the curve can be obtained by:
The coordinate sequence
Figure BDA0001573760600000111
is divided into n' segments to form another coordinate sequence
Figure BDA0001573760600000112
where
Figure BDA0001573760600000113
is defined as the rounding function. That is, p' is the sequence obtained by dividing the coordinate sequence p into n' small segments and averaging the coordinates within each segment. On the basis of p', each vector {x, y} in p' is extended by one dimension into a three-dimensional vector {x, y, z} with the added component z set to 0, and then another sequence is calculated:
Figure BDA0001573760600000114
The function angle denotes the angle between two vectors,
Figure BDA0001573760600000115
The sequence di describes how the direction of this curve changes.
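A minimal sketch of such a direction change sequence is given below. Because the exact formulas are only available as figures, several details are assumptions: segment boundaries are taken by integer division, the direction vectors are the differences between consecutive segment means, and the signed angle is computed from the z-component of the 3D cross product (which is the usual reason for extending 2D vectors with z = 0). The function name is hypothetical.

```python
import math

def direction_change_sequence(points, n_segments):
    """points: list of (x, y) with len(points) >= n_segments.
    Returns the signed turning angles (radians) between consecutive
    direction vectors of the segment-averaged curve."""
    n = len(points)
    # Average the coordinates inside each of the n_segments chunks.
    means = []
    for k in range(n_segments):
        lo = (k * n) // n_segments
        hi = max(((k + 1) * n) // n_segments, lo + 1)
        chunk = points[lo:hi]
        means.append((sum(x for x, _ in chunk) / len(chunk),
                      sum(y for _, y in chunk) / len(chunk)))
    # Direction vectors between consecutive segment means.
    vecs = [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(means, means[1:])]
    angles = []
    for (ux, uy), (vx, vy) in zip(vecs, vecs[1:]):
        cross_z = ux * vy - uy * vx  # z-component of (ux, uy, 0) x (vx, vy, 0)
        dot = ux * vx + uy * vy
        angles.append(math.atan2(cross_z, dot))
    return angles
```

A straight stroke yields angles of zero, while a stroke that bends consistently to one side yields angles of a single sign, which is what lets two curves with similar direction, length, and curvature be told apart.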
After the above steps, the structure diagram corresponding to the character image can be changed into an entity representing the character image information, which contains information useful for recognition and is called a curve structure diagram.
Step S102, a training result set is obtained, and a node set and an edge set of the isomorphic group are obtained aiming at each isomorphic group in the training result set.
In this embodiment, a sample set may be trained in advance to obtain a training result set. The training result set includes a plurality of isomorphic groups, and each isomorphic group may include one or more matched characters. The matched characters within the same isomorphic group are isomorphic to one another. The relevant theory of isomorphism is explained first to aid the understanding of what follows.
As described above, one and the same graph can be drawn in different shapes; conversely, two different graphs may be drawn in the same shape, as shown in Fig. 19.
It can be seen that the nodes and edges of G1 and G2 are in one-to-one correspondence and that their connection relations are identical; only the names of the nodes and edges differ. Two such graphs are said to be isomorphic. Mathematically, the nodes of two isomorphic graphs can be put in one-to-one correspondence, and so can their edges. The strict mathematical definition of isomorphism is as follows.
Given two graphs G = (V(G), E(G)) and H = (V(H), E(H)), if there exists a pair of bijections α: V(G) → V(H) and β: E(G) → E(H) such that for any e = (u, v) ∈ E(G), (α(u), α(v)) ∈ E(H) and β(e) = (α(u), α(v)), then the graphs G and H are said to be isomorphic, denoted G ≅ H.
<α, β> is called an isomorphic mapping from G to H, and the set of all isomorphic mappings from G to H is denoted as
Figure BDA0001573760600000122
Isomorphism is an equivalence relation on a set of graphs. Writing G ≅ H for "G is isomorphic to H", this means: G ≅ G; if G ≅ H, then H ≅ G; and if G ≅ H and H ≅ I, then G ≅ I.
In fig. 19:
α={<v1,u1>,<v2,u2>,<v3,u3>,<v4,u4>},
β={<e1,a1>,<e2,a2>,<e3,a3>,<e4,a4>,<e5,a5>,<e6,a6>},
It can be seen that <α, β> is an isomorphic mapping from G1 to G2. For convenience of processing, each function is treated as a set whose elements are ordered pairs.
In this embodiment, for an undirected graph A = <V, E> with v1 ∈ V and v2 ∈ V,
let
Figure BDA0001573760600000129
For simplicity, another definition of undirected graph isomorphism is proposed:
Given two undirected graphs G = (V(G), E(G)) and H = (V(H), E(H)), if there exists a bijective function
α: V(G) → V(H)
such that D(u, v) = D(α(u), α(v)) for any u ∈ V(G) and v ∈ V(G), then the graph G is isomorphic to H, denoted G ≅ H.
α is called an isomorphic node mapping from G to H.
For an undirected graph A = <V, E> and a node v ∈ V, let the multiset N(v) = {D(v, vi) | vi ∈ V ∧ D(v, vi) ≠ 0}; N(v) is called the adjacency weight set of node v.
For an undirected graph A = <V, E>, let the family of multisets P(A) = {N(vi) | vi ∈ V}.
For example, in Fig. 20(a) and Fig. 20(b), P(S) = {{1,1,1}, {2,1}, {2,1}, {1}} and P(S) = P(T).
For an undirected graph A = <V, E>, let the set Q(A, x) = {vi | vi ∈ V ∧ N(vi) = x}, where x ∈ P(A).
If P(A) = P(B), then |Q(A, x)| = |Q(B, x)| necessarily holds for any x ∈ P(A). Let the set Z = {z1, z2, ..., zn}, and
Figure BDA0001573760600000131
Z is the set of distinct elements in P(A) or P(B). For example, in Fig. 20(a) and Fig. 20(b), P(S) = P(T) = {{1,1,1}, {2,1}, {2,1}, {1}} and Z = {{1,1,1}, {2,1}, {1}}.
If P(A) = P(B), let πi(A) = Q(A, zi),
Figure BDA0001573760600000132
πi(B) = Q(B, zi),
Figure BDA0001573760600000133
and |πi(A)| = |πi(B)|. In Fig. 20(a) and Fig. 20(b), Z = {{1,1,1}, {2,1}, {1}}, π1(A) = Q(A, {1,1,1}) = {d}, π1(B) = Q(B, {1,1,1}) = {h}, π2(A) = {a, b}, π2(B) = {e, g}, π3(A) = {c}, π3(B) = {f}.
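The quantities D, N(v), P, Z, and the node classes πi can be computed as in the sketch below. Fig. 20 itself is not reproduced in the text, so the edge list used here is a hypothetical reconstruction chosen only because it yields the values quoted above; multisets are stored as sorted tuples, which is an implementation choice rather than part of the method.

```python
from collections import Counter, defaultdict

def adjacency_weights(edges):
    """D[(u, v)] = number of edges joining u and v (symmetric)."""
    D = Counter()
    for u, v in edges:
        D[(u, v)] += 1
        if u != v:
            D[(v, u)] += 1
    return D

def node_classes(nodes, edges):
    """Map each adjacency weight multiset N(v), stored as a sorted tuple,
    to the set of nodes that share it (the sets Q(A, x))."""
    D = adjacency_weights(edges)
    classes = defaultdict(set)
    for v in nodes:
        weights = [D[(v, w)] for w in nodes if D[(v, w)] != 0]
        classes[tuple(sorted(weights, reverse=True))].add(v)
    return dict(classes)

# Hypothetical edge list that reproduces the values quoted for Fig. 20(a).
S_nodes = ["a", "b", "c", "d"]
S_edges = [("a", "b"), ("a", "b"), ("a", "d"), ("b", "d"), ("d", "c")]

classes = node_classes(S_nodes, S_edges)
# P(S): one multiset per node, so repeated multisets appear repeatedly.
P = sorted((key for key, members in classes.items() for _ in members), reverse=True)
# Z: the distinct multisets only.
Z = sorted(classes, reverse=True)
```

Two graphs whose P families differ can be rejected immediately; when they agree, only nodes in matching classes need to be paired.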
Figure BDA0001573760600000134
If |X| = |Y|, let M_{X→Y} denote the set of all one-to-one mapping functions from X to Y; then |M_{X→Y}| = |X|!. If X = {x1, x2, ..., xn} and Y = {y1, y2, ..., yn}, let the n! permutations of Y be Pi, and let E = {<x1, y1>, <x2, y2>, ..., <xn, yn>}; then
Figure BDA0001573760600000135
Let X and Y be two families of sets. Define an operator · by X·Y = {a | <w, z> ∈ X × Y ∧ a = w ∪ z}, and stipulate
Figure BDA0001573760600000136
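The set of one-to-one mappings and the combining operator can be sketched as follows. Mappings are represented as frozensets of ordered pairs, in line with the convention of treating a function as a set; the function names are hypothetical, and the stipulation shown in the figure above is not modelled.

```python
from itertools import permutations, product

def bijections(X, Y):
    """All one-to-one mappings from X onto Y, each as a frozenset of pairs."""
    X, Y = list(X), list(Y)
    if len(X) != len(Y):
        return []
    return [frozenset(zip(X, perm)) for perm in permutations(Y)]

def combine(F, G):
    """The operator F·G: the union of every member of F with every member of G."""
    return [f | g for f, g in product(F, G)]
```

For example, combining the single mapping from {d} to {h} with the two mappings from {a, b} to {e, g} gives two candidate mappings on {a, b, d}.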
All edges in the edge set E(A) that are associated with the same pair of nodes (that is, parallel edges) are placed in the same set, and the family of these sets is called K(A). In Fig. 20(a), K(S) = {{e1, e2}, {e3}, {e4}, {e5}}.
If the graph to be processed is A
Figure BDA0001573760600000137
And B
Figure BDA0001573760600000138
(1) If P(A) ≠ P(B), then A and B are not isomorphic;
(2) If P(A) = P(B) and A and B are isomorphic, then every isomorphic node mapping α from A to B satisfies the following formula:
Figure BDA0001573760600000141
For each isomorphic node mapping α, there is a set of one-to-one mapping functions βi: E(A) → E(B) such that <α, βi> is an isomorphic mapping from A to B, and the βi satisfy the following formulas:
∪βi = M_{k1(A)→s1(B)} · M_{k2(A)→s2(B)} · … · M_{k|K(A)|(A)→s|K(A)|(B)}
si(B) = {e | e ∈ E(B) ∧ x ∈ ki(A) ∧ x is associated with v1, v2 ∧ e is associated with α(v1), α(v2)}
where ki(A) is the i-th element of K(A). From the above analysis, β is determined by α, so the key to finding an isomorphic mapping is to find the isomorphic node mapping. Since the number of elements in M_{π1(A)→π1(B)} · M_{π2(A)→π2(B)} · … · M_{π|Z|(A)→π|Z|(B)} is
Figure BDA0001573760600000142
So the search space of the algorithm has
Figure BDA0001573760600000143
However, most of the functions in M_{π1(A)→π1(B)} · M_{π2(A)→π2(B)} · … · M_{π|Z|(A)→π|Z|(B)} are not isomorphic node mappings. Exhaustively enumerating every element αi of this set and checking, under the mapping αi, whether
Figure BDA0001573760600000144
holds, that is, whether the matrix Di,j(A) = D(vi, vj) equals Di,j(B) = D(αi(vi), αi(vj)), is time-consuming, space-consuming, and unnecessary.
In this embodiment, some common written forms of digits and letters are selected as sample input and trained to obtain a training result set. For example, let R be a set of groups representing the current training result, where each element of R
Figure BDA0001573760600000145
is a non-empty set and each <Gi, Ci> in it is a recognized sample. Whenever <Gi, Ci> ∈ Rt and <Gj, Cj> ∈ Rt, it must hold that
Figure BDA0001573760600000146
Rt can be viewed as an isomorphic group of samples whose character information graphs are isomorphic.
If Ri ∈ R, Rj ∈ R, and i ≠ j, then for any <Gs, Cs> ∈ Ri and <Gt, Ct> ∈ Rj, Gs and Gt are not isomorphic; that is, the elements of R are isomorphic groups that are mutually non-isomorphic.
In practice, the samples may be trained by the following process:
If an existing training result is to be used, load its data into R; otherwise, let
Figure BDA0001573760600000147
For each sample <B, C> in the input sample library, the following processing is performed:
Convert the image B into its character information graph G, and search R for an isomorphic group that is isomorphic to G
Figure BDA0001573760600000151
If there is none, add the set {<G, C>} to R. If there is one, find a <Gi, Ci> in Rt such that D(G, Gi) = Min{D(G, Gj)}, <Gj, Cj> ∈ Rt, and then check whether Ci equals C: if they are not equal, add the sample to the set Rt; if they are equal, do nothing. Finally, save the data in R to a training result file.
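The training loop above can be sketched as follows. The conversion of an image into a character information graph, the isomorphism test, and the difference measure are passed in as callables; `to_graph`, `is_isomorphic`, and `difference` are hypothetical names standing in for the procedures described elsewhere in this document, and loading or saving R is left out.

```python
def train(samples, to_graph, is_isomorphic, difference, groups=None):
    """samples: iterable of (image, label).
    Returns R, a list of isomorphic groups; each group is a list of
    (graph, label) pairs whose graphs are isomorphic to one another."""
    R = [] if groups is None else groups
    for image, label in samples:
        G = to_graph(image)
        # Testing one member per group suffices: members are mutually isomorphic.
        group = next((Rt for Rt in R if is_isomorphic(G, Rt[0][0])), None)
        if group is None:
            R.append([(G, label)])  # start a new isomorphic group
            continue
        _, nearest_label = min(group, key=lambda s: difference(G, s[0]))
        if nearest_label != label:
            group.append((G, label))  # the closest sample disagrees: keep this one
    return R
```

A sample is stored only when the closest existing sample in its group would have produced a different label, which keeps the groups small.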
TABLE 1 training results set
Figure BDA0001573760600000152
To reduce the number of isomorphism decisions and mapping generations, the nodes and edges of every character information graph in an isomorphic group Rt of R are required to keep the correspondence given by the isomorphic mapping. For a given character information graph, it is then enough to generate an isomorphic mapping with any one character information graph in the group in order to obtain the isomorphic mappings with all character information graphs in the group. The resulting training result set may be as shown in Table 1.
After the training result set is obtained, the curve structure diagram to be recognized can be respectively compared with each isomorphic group in the training result set to detect whether the curve structure diagram is isomorphic.
Step S103, judging whether the character images contained in the isomorphic group are isomorphic with the curve structure chart according to the obtained node set and the edge set of the isomorphic group and the node set and the edge set of the curve structure chart.
In this embodiment, referring to fig. 21, step S103 may include five sub-steps, i.e., step S1031, step S1032, step S1033, step S1034 and step S1035.
Step S1031, detecting whether the number of nodes in the node set of the isomorphic group is the same as the number of nodes in the node set of the curve structure diagram, and whether the number of edges in the edge set of the isomorphic group is the same as the number of edges in the edge set of the curve structure diagram; if so, executing step S1032.
Step S1032, obtaining the multiset family of the character images in the isomorphic group and the multiset family of the curve structure diagram.
Step S1033, detecting whether the multiset family of the character images in the isomorphic group is the same as the multiset family of the curve structure diagram; if so, executing step S1034.
Step S1034, detecting whether the obtained node mapping set is empty; if not, executing step S1035.
Step S1035, determining that the character image is isomorphic with the curve structure diagram.
Let A and B be the curve structure diagram and the isomorphic group used for detection, respectively. If A and B differ in the number of nodes or in the number of edges, then A and B are not isomorphic. If the numbers are equal, proceed to the following calculation steps:
Compute the sets P(A) and P(B). If P(A) ≠ P(B), A and B are not isomorphic. If P(A) = P(B), proceed to the following calculation steps:
Compute the set Z and order its elements Z = {z1, z2, ..., z|Z|} so that |Q(A, z1)| ≤ |Q(A, z2)| ≤ … ≤ |Q(A, z|Z|)|, and then compute πi(A) and πi(B).
Let n = |Z| and compute Mi = M_{πi(A)→πi(B)}, i = 1, 2, …, n;
Isomorphic node mapping set of graphs A to B
Figure BDA0001573760600000161
Let the sets T0, T1, T2, …, Tn and f be empty, let the variable c = 1, and then enter the following loop:
a. Each t ∈ Mc represents a mapping from X = {vi | <vi, vj> ∈ t} to Y = {vi | <vj, vi> ∈ t}. For each such t, determine whether the subgraphs A(X) and B(Y) are equal, that is, whether the matrix Di,j(A(X)) = D(Xi, Xj) equals Di,j(B(Y)) = D(t(Xi), t(Xj)); if they are not equal, delete the element t from Mc;
b. Let Tc = Tc−1 · Mc. For each t ∈ Tc, let X = {vi | <vi, vj> ∈ t} and Y = {vi | <vj, vi> ∈ t}, and determine whether Di,j(A(X)) = D(Xi, Xj) equals Di,j(B(Y)) = D(t(Xi), t(Xj)); if they are not equal, delete the element t from Tc; if they are equal and c equals n, add t to the set f;
c. If c is smaller than n, increase the value of c by 1 and return to step a; if c equals n, the loop ends.
After the above loop finishes, if the set f is empty, the graphs A and B are not isomorphic; otherwise, each element of f is an isomorphic node mapping from A to B.
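The class-by-class search with pruning can be sketched as below. This is an illustrative reading of the steps, not a definitive implementation: graphs are given as a node list plus a list of (u, v) edge pairs, mappings are dicts, the list `partial` plays the role of the sets Tc, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def weights(edges):
    """D[(u, v)] = number of edges joining u and v."""
    D = Counter()
    for u, v in edges:
        D[(u, v)] += 1
        if u != v:
            D[(v, u)] += 1
    return D

def classes(nodes, D):
    """Group nodes by their adjacency weight multiset N(v)."""
    out = defaultdict(list)
    for v in nodes:
        out[tuple(sorted(D[(v, w)] for w in nodes if D[(v, w)]))].append(v)
    return out

def consistent(mapping, DA, DB):
    """Check D(u, v) == D(t(u), t(v)) on the nodes mapped so far."""
    return all(DA[(u, v)] == DB[(mapping[u], mapping[v])]
               for u in mapping for v in mapping)

def node_isomorphisms(nodes_a, edges_a, nodes_b, edges_b):
    """Return every isomorphic node mapping from A to B as a dict."""
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return []
    DA, DB = weights(edges_a), weights(edges_b)
    QA, QB = classes(nodes_a, DA), classes(nodes_b, DB)
    if {k: len(v) for k, v in QA.items()} != {k: len(v) for k, v in QB.items()}:
        return []  # P(A) != P(B)
    partial = [{}]  # partial mappings built so far
    for key in sorted(QA, key=lambda k: len(QA[k])):  # smallest class first
        # Step a: keep only the class-level bijections that pass the check.
        m_c = [dict(zip(QA[key], perm)) for perm in permutations(QB[key])]
        m_c = [t for t in m_c if consistent(t, DA, DB)]
        # Step b: extend the partial mappings and prune again.
        partial = [{**s, **t} for s in partial for t in m_c]
        partial = [t for t in partial if consistent(t, DA, DB)]
    return partial
```

An empty result means the two graphs are not isomorphic. Processing the smallest classes first keeps the intermediate lists short, because inconsistent partial mappings are discarded before larger classes multiply them.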
Step S104, when the character image contained in the isomorphic group is isomorphic with the curve structure chart, obtaining a node mapping set between the character image and the curve structure chart, and obtaining the isomorphic mapping between the character image and the curve structure chart according to the node mapping set.
Referring to fig. 22, in the present embodiment, the step S104 may include four sub-steps of step S1041, step S1042, step S1043 and step S1044.
Step S1041, adding all edges associated with the same node in the edge set of the homogeneous group or the edge set of the curve structure diagram to the same set.
Step S1042, a total set is obtained according to the obtained plurality of sets.
Step S1043, calculating, for each isomorphic node map in the node map set, a mapping function corresponding to the isomorphic node map according to each element in the total set.
And step S1044, obtaining isomorphic mapping between the character image and the curve structure chart according to the isomorphic node mapping and the mapping function.
If the set f of isomorphic node mappings from A to B has been found, all isomorphic mappings from A to B can be obtained by the following steps:
Let the set of isomorphic mappings from A to B be
Figure BDA0001573760600000171
Compute the total set K(A) and let m = |K(A)|, where ki(A) is the i-th element of K(A), that is, a set of all edges in the edge set of the isomorphic group or of the curve structure diagram that are associated with the same nodes.
For each isomorphic node mapping α ∈ f in the node mapping set, compute the mapping function:
si(B) = {e | e ∈ E(B) ∧ x ∈ ki(A) ∧ x is associated with v1, v2 ∧ e is associated with α(v1), α(v2)}, i = 1, 2, …, m
Compute the set M = M_{k1(A)→s1(B)} · M_{k2(A)→s2(B)} · … · M_{km(A)→sm(B)}, and for each element βj of M, add the pair <α, βj> to the set h.
After the above calculation, each element in h is an isomorphic mapping of A to B, i.e.
Figure BDA0001573760600000181
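Generating the edge mappings β for a given node mapping α can be sketched as follows. Edges are given as a dict from edge name to its pair of end nodes; grouping edges by their end-node pair stands in for K(A) and si(B). The names and data layout are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import permutations, product

def edge_mappings(alpha, edges_a, edges_b):
    """edges_*: dict edge_name -> (u, v). Yields each beta as a dict that
    maps edge names of A to edge names of B under the node mapping alpha."""
    def bundle(edges):
        groups = defaultdict(list)
        for name, (u, v) in edges.items():
            groups[frozenset((u, v))].append(name)  # parallel edges share a key
        return groups

    KA, KB = bundle(edges_a), bundle(edges_b)
    per_bundle = []
    for ends, names in KA.items():
        target = KB.get(frozenset(alpha[x] for x in ends), [])  # s_i(B)
        if len(target) != len(names):
            return  # alpha is not an isomorphic node mapping
        per_bundle.append([dict(zip(names, p)) for p in permutations(target)])
    for choice in product(*per_bundle):
        beta = {}
        for part in choice:
            beta.update(part)
        yield beta
```

Only bundles of parallel edges contribute more than one choice, so a graph without parallel edges has exactly one β per α.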
From the foregoing analysis, for two graphs A and B that each have N nodes and B edges, an exhaustive search decides whether they are isomorphic by examining N!·B! mappings, because M_{V(A)→V(B)} contains N! mappings and M_{E(A)→E(B)} contains B! mappings. Thus, in the worst case, determining whether two graphs are isomorphic and generating all isomorphic mappings requires N!·B! searches.
In the detection method provided in this embodiment, the number of one-to-one mappings that must be generated in the worst case to obtain the isomorphic node mappings is x1!·x2!·…·xt! (where t = |Z|, xi = |πi(A)|, and x1 + x2 + … + xt = N), and x1!·x2!·…·xt! ≤ N!. For example, 1!·2!·3! = 12, whereas 6! = 720. Because the algorithm also dynamically deletes the functions that do not belong to the set of isomorphic mappings, its time and space costs are reduced even further. Assuming that a fraction P of the functions in the set remains after each matrix computation, the number of all generated mappings becomes P(…P(P(x1!·P · x2!·P) · x3!·P) · … · xt!·P) = P^(2t−1)·x1!·x2!·…·xt!, so the performance of the algorithm can meet the requirements of character recognition.
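The size of the class-wise search space can be checked with a few lines of arithmetic. The helper name is hypothetical; the class sizes 1, 2, and 3 are the ones used in the example above.

```python
from math import factorial, prod

def class_wise_candidates(class_sizes):
    """Number of node mappings when nodes are only paired within their class."""
    return prod(factorial(x) for x in class_sizes)

sizes = [1, 2, 3]
print(class_wise_candidates(sizes), factorial(sum(sizes)))  # prints "12 720"
```

Six nodes split into classes of sizes 1, 2, and 3 therefore need at most 12 candidate mappings instead of the 720 required without the partition.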
If k isomorphic node mappings are generated, then under each isomorphic node mapping α there exists a set of edge mappings M_{k1(A)→s1(B)} · M_{k2(A)→s2(B)} · … · M_{km(A)→sm(B)}, and each mapping in this set together with α constitutes an isomorphic mapping between the two graphs. The number of elements in M_{k1(A)→s1(B)} · M_{k2(A)→s2(B)} · … · M_{km(A)→sm(B)} depends on the groups of parallel edges in the graph.
In this embodiment, after the step of determining whether the character image included in the isomorphic group is isomorphic with the graph structure diagram, the character isomorphism detection method further includes the steps of:
after judging that the character images contained in the isomorphic group are isomorphic with the curve structure chart, calculating the difference degree between each character image and the curve structure chart.
And selecting a character image with the minimum difference degree with the curve structure diagram, and taking the character in the character image as the character represented by the curve structure diagram.
In this embodiment, each isomorphic group carries a node set, an edge set, and contained character information. After an isomorphic group isomorphic with the graph structure diagram is obtained, a matching character group contained in the isomorphic group can be obtained. Wherein the set of matching characters may include one matching character or a plurality of matching characters. If the matched character set comprises a plurality of matched characters, selecting the matched character with the minimum difference degree with the curve structure diagram from the plurality of matched characters, and using the matched character as the character represented by the curve structure diagram.
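Selecting the matched character with the smallest difference can be sketched as a simple minimum search. The difference measure is supplied by the caller; `difference` and `recognise` are hypothetical names, and the toy measure in the usage note is for illustration only.

```python
def recognise(curve_graph, group, difference):
    """group: list of (graph, character) pairs from one isomorphic group.
    Returns the character whose graph differs least from curve_graph."""
    _, best_char = min(group, key=lambda s: difference(curve_graph, s[0]))
    return best_char
```

With a toy measure such as the absolute difference of two numbers, `recognise(5, [(1, "a"), (6, "b")], lambda g, h: abs(g - h))` returns "b".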
As described above, an edge e = (v1, v2, q) in the curve structure diagram is treated as a directed curve running from v1 to v2. The same curve traversed from v2 to v1 is denoted −e; it is obtained by reversing the direction vector and the direction change sequence of e and negating each element.
If e1 and e2 are edges in two curve structure diagrams, the difference between the curves they represent is denoted L(e1, e2). The greater the difference between the two curves, the greater L(e1, e2); when the two curves are very similar, L(e1, e2) tends to zero. The choice of L is very important. Since the information set on an edge is {direction, length, curvature, direction change sequence}, L can be expressed in terms of these data.
After detecting an isomorphic group isomorphic with the graph structure diagram in the training result set according to the above detection method, in this embodiment, the difference between each matched character in the obtained matched character group and the graph structure diagram can be calculated according to the following formula:
Figure BDA0001573760600000191
Figure BDA0001573760600000192
wherein e1 and e2 respectively represent the graph structure diagram and the matched characters participating in the calculation, and w1, w2, w3 and w4 respectively represent the weight coefficients.
w1, w2, w3, and w4 determine the weight of each parameter in the comparison. Because the corresponding terms are expressed as relative differences, w1, w2, and w3 can be roughly equal. w4 can be increased if e1 and e2 are the only edges in their graphs; otherwise w4 takes a smaller value.
In Table 1, each row lists the characters of one isomorphic group and their graphs. The group G1 contains the most characters and its graphs have only one edge, so the key to distinguishing the characters in this group is the representation of the information set of the curve and the choice of the edge difference function L. If these are chosen poorly, the differences between the character information graph to be recognized and those in the group will be small, and recognition errors may even occur. Improving these key points can therefore greatly improve the recognition rate. It can also be seen that few isomorphism decisions are needed when recognizing an image, because the P sets of different groups mostly differ, and only one isomorphic mapping needs to be generated for each group.
It can also be seen that some groups contain both letters and digits, or both lower-case and upper-case letters, such as the digit '1' and the letter 'l', 's' and 'S', and 'v' and 'V' in G1. Such characters are difficult even for people to distinguish, and the differences between them are small during recognition. One possible solution is to specify the category, such as digits, lower-case letters, or upper-case letters, at recognition time.
Fig. 23 is a block diagram of functional modules of a character isomorphism detecting device 110 according to another preferred embodiment of the present invention. The character isomorphism detection device 110 includes a graph structure diagram obtaining module 111, a training result set obtaining module 112, a determining module 113, and a mapping obtaining module 114.
The graph structure diagram obtaining module 111 is configured to obtain a character image to be processed, and convert the character image into a graph structure diagram.
The training result set obtaining module 112 is configured to obtain a training result set, and obtain a node set and an edge set of each isomorphic group in the training result set.
The judging module 113 is configured to judge whether the character image included in the isomorphic group is isomorphic with the graph structure diagram according to the obtained node set and edge set of the isomorphic group and the obtained node set and edge set of the graph structure diagram.
The mapping obtaining module 114 is configured to, when the character image included in the isomorphic group is isomorphic with the curve structure diagram, obtain a node mapping set between the character image and the curve structure diagram, and obtain an isomorphic mapping between the character image and the curve structure diagram according to the node mapping set.
In summary, the character isomorphism detection method and apparatus provided in the embodiments of the present invention convert a character image to be processed into a curve structure diagram and, for each isomorphic group in the obtained training result set, determine whether the character images included in the isomorphic group are isomorphic with the curve structure diagram according to the node set and edge set of the isomorphic group and the node set and edge set of the curve structure diagram. When they are isomorphic, a node mapping set between the character image and the curve structure diagram is obtained, and the isomorphic mapping between the two is obtained from the node mapping set. The scheme judges the isomorphism relation using the node sets and edge sets of the character in the image to be recognized and of the characters in the isomorphic groups of the samples, so that the broad class to which the character in the image belongs can be obtained, which narrows the range of subsequent character recognition and improves its accuracy.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (6)

1. A character isomorphism detection method, comprising:
acquiring a character image to be processed, and converting the character image into a curve structure chart;
acquiring a training result set, and acquiring a node set and an edge set of each isomorphic group in the training result set;
judging whether the character images contained in the isomorphic group are isomorphic with the curve structure chart or not according to the obtained node set and edge set of the isomorphic group and the node set and edge set of the curve structure chart;
if isomorphism exists, obtaining a node mapping set between the character image and the curve structure chart, and obtaining isomorphism mapping between the character image and the curve structure chart according to the node mapping set;
the step of obtaining the character image to be processed and converting the character image into a curve structure chart comprises the following steps:
preprocessing the character image, and converting the preprocessed character image into a structural diagram corresponding to the character image;
eliminating the nodes in the stacking state in the structure chart;
searching out the nodes with the node degree of 2 in the structure chart after the elimination processing;
acquiring two edges associated with the nodes, and creating a combined edge according to the edge information of the two edges and the end point information of each edge;
deleting the node and the two edges from the structure diagram, and adding the merged edge to the structure diagram;
converting the structure chart after the elimination treatment into a curve structure chart;
the method further comprises the following steps:
after judging that the character images contained in the isomorphic group are isomorphic with the curve structure chart, calculating the difference between each character image and the curve structure chart according to the following mode:
Figure FDA0002705892540000021
Figure FDA0002705892540000022
wherein e1 and e2 respectively represent a curve structure diagram and a character image which participate in calculation, and w1, w2, w3 and w4 respectively represent weight coefficients;
and selecting a character image with the minimum difference degree with the curve structure diagram, and taking the character in the character image as the character represented by the curve structure diagram.
2. The method for detecting character isomorphism according to claim 1, wherein said step of determining whether the character image included in said isomorphism group is isomorphism with said graph structure diagram according to the obtained node set and edge set of said isomorphism group and the node set and edge set of said graph structure diagram includes:
detecting whether the node number in the node set of the isomorphic group is the same as the node number in the node set of the curve structure chart and whether the number of edges in the edge set of the isomorphic group is the same as the number of edges in the edge set of the curve structure chart;
if they are the same, obtaining a multiset family of the character images in the isomorphic group and a multiset family of the curve structure chart;
detecting whether the multiset family of the character images in the isomorphic group is the same as the multiset family of the curve structure chart;
and if they are the same, detecting whether the obtained node mapping set is empty, and if not, judging that the character image and the curve structure chart are isomorphic.
3. The method for detecting the isomorphism of characters in claim 1, wherein said step of obtaining the isomorphism mapping between said character image and said graph structure diagram according to said node mapping set comprises:
adding all edges associated with the same nodes in the edge set of the isomorphic group or the edge set of the curve structure chart into the same set;
obtaining a total set according to the obtained multiple sets;
aiming at each isomorphic node map in the node map set, calculating a mapping function corresponding to the isomorphic node map according to each element in the total set;
and obtaining isomorphic mapping between the character image and the curve structure chart according to the isomorphic node mapping and the mapping function.
4. The character isomorphism detection method according to claim 1, wherein the step of preprocessing the character image and converting the preprocessed character image into a structure chart corresponding to the character image comprises:
for each pixel point in the character image, obtaining a plurality of adjacent points of the pixel point;
counting the number of adjacent points, among the plurality of adjacent points, whose pixel value is 1;
marking the pixel point according to the counted number and the distribution of the pixel point and its plurality of adjacent points;
thinning the character image according to the marking result of the pixel points;
obtaining description information of the thinned character image, and judging, for each pixel point in the character image, whether the pixel point is a node according to the adjacent point information of the pixel point;
when a pixel point is determined to be a node, traversing the edges connected to the node, starting from the node and following the path of the character image, to obtain a sequence containing the points on each traversed edge;
and converting the character image into a structure chart according to the obtained nodes, the sequences and the description information of the character image.
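The neighbour counting and node detection steps of claim 4 can be illustrated with the sketch below. It assumes an 8-neighbourhood, a binary image stored as a list of rows with stroke pixels equal to 1, and an already thinned skeleton; it also assumes that a node is a stroke pixel with exactly one stroke neighbour (an end point) or at least three (a junction). The claim itself does not fix these rules, and the names are hypothetical.

```python
# Offsets of the eight pixels surrounding a pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def count_neighbours(image, r, c):
    """Count adjacent points with pixel value 1, ignoring out-of-bounds cells."""
    rows, cols = len(image), len(image[0])
    return sum(
        image[r + dr][c + dc]
        for dr, dc in OFFSETS
        if 0 <= r + dr < rows and 0 <= c + dc < cols
    )

def find_nodes(skeleton):
    """Return (row, col) of stroke pixels that look like end points or junctions."""
    nodes = []
    for r, row in enumerate(skeleton):
        for c, value in enumerate(row):
            if value != 1:
                continue
            n = count_neighbours(skeleton, r, c)
            if n == 1 or n >= 3:
                nodes.append((r, c))
    return nodes
```

With an 8-neighbourhood, several pixels around a crossing usually qualify as junctions at once, so a real stroke crossing tends to yield a small cluster of nodes joined by very short edges.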
5. The character isomorphism detection method according to claim 1, wherein the step of eliminating the nodes in a stacking state in the structure chart comprises:
calculating the relative length of each edge in the structure chart with respect to the structure chart;
detecting whether the relative length is smaller than a preset length value;
and if the relative length is smaller than the preset length value, judging that the nodes associated with the edge are in a stacking state, and deleting the nodes in the stacking state from the structure chart.
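A hedged sketch of the elimination step in claim 5 follows. The claim does not say what the length is relative to or how a stacked node is deleted, so two assumptions are made here: the relative length of an edge is its length divided by the total length of all edges, and a stacked node is removed by contracting the short edge, that is, merging one end node into the other. The edge representation `(u, v, length)`, the default threshold and the function name are hypothetical.

```python
def eliminate_stacked_nodes(edges, threshold=0.05):
    """Contract every edge whose relative length is below threshold.

    edges is a list of (u, v, length) tuples; a new list is returned.
    """
    edges = list(edges)
    total = sum(length for _, _, length in edges)
    if total == 0:
        return edges
    while True:
        # Pick a short edge between two distinct nodes, if any remains.
        short = next(
            (e for e in edges if e[0] != e[1] and e[2] / total < threshold),
            None,
        )
        if short is None:
            return edges
        keep, drop, _ = short
        edges.remove(short)
        # Redirect every edge that touched the dropped node to the kept node.
        edges = [
            (keep if u == drop else u, keep if v == drop else v, length)
            for u, v, length in edges
        ]
```

Each pass removes one edge, so the loop always terminates; the threshold plays the role of the preset length value.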
6. A character isomorphism detection apparatus, comprising:
a curve structure chart acquisition module, configured to acquire a character image to be processed and convert the character image into a curve structure chart;
a training result set obtaining module, configured to obtain a training result set, and obtain a node set and an edge set of each isomorphic group in the training result set;
a judging module, configured to judge whether the character image contained in the isomorphic group is isomorphic with the curve structure chart according to the obtained node set and edge set of the isomorphic group and the node set and edge set of the curve structure chart;
a mapping obtaining module, configured to obtain a node mapping set between the character image and the curve structure chart when the character image contained in the isomorphic group and the curve structure chart are isomorphic, and obtain an isomorphic mapping between the character image and the curve structure chart according to the node mapping set;
wherein the curve structure chart acquisition module is configured to:
preprocess the character image, and convert the preprocessed character image into a structure chart corresponding to the character image;
eliminate the nodes in a stacking state in the structure chart;
find the nodes whose node degree is 2 in the structure chart after the elimination processing;
acquire the two edges associated with each such node, and create a merged edge according to the edge information of the two edges and the end point information of each edge;
delete the node and the two edges from the structure chart, and add the merged edge to the structure chart;
convert the structure chart after the elimination processing into a curve structure chart;
the mapping obtaining module is further configured to:
after it is judged that the character images contained in the isomorphic group are isomorphic with the curve structure chart, calculate the difference degree between each character image and the curve structure chart in the following manner:
Figure FDA0002705892540000051
Figure FDA0002705892540000061
wherein e1 and e2 respectively represent the curve structure chart and the character image that participate in the calculation, and w1, w2, w3 and w4 respectively represent weight coefficients;
and select the character image with the minimum difference degree from the curve structure chart, and take the character in that character image as the character represented by the curve structure chart.
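The degree-2 merging that claim 6 recites for the curve structure chart acquisition module can be sketched as below. For brevity the sketch reduces the "edge information" to a single length per edge, so the merged edge simply carries the sum of the two lengths; the patent's edges presumably also carry point sequences, which are omitted here. The edge representation `(u, v, length)` and the function name are hypothetical.

```python
def merge_degree_two_nodes(edges):
    """Replace each degree-2 node and its two edges by one merged edge.

    edges is a list of (u, v, length) tuples; a new list is returned.
    """
    edges = list(edges)
    while True:
        # Map each node to the indices of its incident edges
        # (a self-loop is listed twice for its node).
        incident = {}
        for i, (u, v, _) in enumerate(edges):
            incident.setdefault(u, []).append(i)
            incident.setdefault(v, []).append(i)
        # A node qualifies when exactly two distinct edges meet there.
        node = next(
            (n for n, idx in incident.items() if len(idx) == 2 and idx[0] != idx[1]),
            None,
        )
        if node is None:
            return edges
        i, j = incident[node]
        (u1, v1, len1), (u2, v2, len2) = edges[i], edges[j]
        # The merged edge joins the far end points of the two edges.
        far1 = u1 if v1 == node else v1
        far2 = u2 if v2 == node else v2
        merged = (far1, far2, len1 + len2)
        edges = [e for k, e in enumerate(edges) if k not in (i, j)] + [merged]
```

A node that carries only a self-loop is left in place, so a closed curve collapses to a single loop instead of disappearing.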
CN201810126894.7A 2018-02-08 2018-02-08 Character isomorphism detection method and device Active CN108399425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810126894.7A CN108399425B (en) 2018-02-08 2018-02-08 Character isomorphism detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810126894.7A CN108399425B (en) 2018-02-08 2018-02-08 Character isomorphism detection method and device

Publications (2)

Publication Number Publication Date
CN108399425A (en) 2018-08-14
CN108399425B (en) 2021-04-06

Family

ID=63096266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810126894.7A Active CN108399425B (en) 2018-02-08 2018-02-08 Character isomorphism detection method and device

Country Status (1)

Country Link
CN (1) CN108399425B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103838926A (en) * 2014-03-05 2014-06-04 德州学院 Algorithm for deciding isomorphism of optional non-weighted graphs

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055620B (en) * 2006-04-12 2011-04-06 富士通株式会社 Shape comparison device and method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
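A fast thinning algorithm for handwriting based on binary images; Cao Liangbin (曹良斌); Journal of Jishou University (Natural Science Edition); 20180131; pp. 29-33 *
Research on handwritten Chinese character recognition based on structural features; Feng Zhimin (冯志敏); China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 20061215; pp. 14-25 *
Research on methods for recovering dynamic information from handwritten Chinese character images; Su Zhewen (苏哲文); China Doctoral Dissertations Full-text Database, Information Science and Technology; 20111115; pp. 29-30, 88-92 *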

Also Published As

Publication number Publication date
CN108399425A (en) 2018-08-14

Similar Documents

Publication Publication Date Title
US10963632B2 (en) Method, apparatus, device for table extraction based on a richly formatted document and medium
US10699109B2 (en) Data entry from series of images of a patterned document
CN110516577B (en) Image processing method, image processing device, electronic equipment and storage medium
CN110796031A (en) Table identification method and device based on artificial intelligence and electronic equipment
CN101727580B (en) Image processing apparatus, image processing unit, and image processing method
US9959475B2 (en) Table data recovering in case of image distortion
CN113177959B (en) QR code real-time extraction method in rapid movement process
CN102682428A (en) Fingerprint image computer automatic mending method based on direction fields
CN113792853B (en) Training method of character generation model, character generation method, device and equipment
CN112991536B (en) Automatic extraction and vectorization method for geographic surface elements of thematic map
Barbosa et al. On the improvement of multiple circles detection from images using hough transform
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN111445386A (en) Image correction method based on four-point detection of text content
CN110738204A (en) Method and device for positioning certificate areas
CN107358244B (en) A kind of quick local invariant feature extracts and description method
CN111832390B (en) Handwritten ancient character detection method
CN108345853B (en) Character recognition method and device based on isomorphic theory and terminal equipment
CN108399425B (en) Character isomorphism detection method and device
CN111160142A (en) Certificate bill positioning detection method based on numerical prediction regression model
CN107170004A (en) To the image matching method of matching matrix in a kind of unmanned vehicle monocular vision positioning
CN106056599B (en) A kind of object recognition algorithm and device based on Object Depth data
CN114463764A (en) Table line detection method and device, computer equipment and storage medium
CN113177542A (en) Method, device and equipment for identifying characters of seal and computer readable medium
US10796197B2 (en) Automatic method and system for similar images and image fragments detection basing on image content
CN112749691A (en) Image processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20180814

Assignee: HUNAN RUIYANG ELECTRONIC TECHNOLOGY Co.,Ltd.

Assignor: JISHOU University

Contract record no.: X2023980045935

Denomination of invention: Character isomorphism detection method and device

Granted publication date: 20210406

License type: Exclusive License

Record date: 20231108