CN110516655B - Chinese character image stroke processing method and system - Google Patents

Publication number
CN110516655B
Authority
CN
China
Prior art keywords
stroke
point
along
advancing
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910832767.3A
Other languages
Chinese (zh)
Other versions
CN110516655A (en)
Inventor
魏东琦
赛琳伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XI'AN CENTER OF GEOLOGICAL SURVEY CGS
Original Assignee
XI'AN CENTER OF GEOLOGICAL SURVEY CGS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-04
Filing date
2019-09-04
Publication date
2023-04-18
Application filed by XI'AN CENTER OF GEOLOGICAL SURVEY CGS
Priority to CN201910832767.3A
Publication of CN110516655A
Application granted
Publication of CN110516655B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing
    • G06V30/2268 Character recognition characterised by the type of writing of cursive writing using stroke segmentation

Abstract

The application discloses a Chinese character image stroke processing method and system. The method comprises: classifying stroke changes of Chinese characters into four types, namely end points, inflection points, bifurcations and intersections; and, given a point inside a stroke and the stroke advancing direction as input, outputting the type of stroke change encountered along that direction by means of a centerline algorithm. The method is simple to implement and highly practical, effectively resolves strokes that adhere between characters, and correctly segments connected strokes in Chinese characters.

Description

Chinese character image stroke processing method and system
Technical Field
The invention relates to a Chinese character image stroke processing method and a system.
Background
Handwritten Chinese characters are an important way of recording information, and paper was originally the main medium for storing it. With the spread of electronic devices, paper documents are now easily digitized, and handwriting input devices such as handwriting pads and graphics tablets are widely available, but most of this information is still stored as raster images. With the development of artificial intelligence in recent years, offline handwritten Chinese character recognition has made a qualitative leap, and the accuracy of single-character handwriting recognition has reached a commercially usable level. However, recognition of whole passages of handwritten text is still unsatisfactory, and whether the Chinese characters are segmented correctly is an important factor in the overall recognition result. Chinese characters come in many fonts whose forms differ, their structures are complex, and writing styles vary, so segmentation methods based on global features cannot solve the Chinese character segmentation problem well.
Disclosure of Invention
The invention aims to overcome the above shortcomings by providing a Chinese character image stroke processing method that effectively resolves strokes adhering between characters and correctly segments connected strokes in Chinese characters.
To achieve this purpose, the invention adopts the following technical scheme: a Chinese character image stroke processing method, characterized by comprising the following steps:
classifying stroke changes of Chinese characters into four types: end points, inflection points, bifurcations and intersections;
inputting a point in the stroke and the stroke advancing direction, and outputting the change type of the stroke along that direction by means of a centerline algorithm.
Another object of the invention is to provide a Chinese character image stroke processing system, comprising:
an analysis module for classifying stroke changes of Chinese characters into four types: end points, inflection points, bifurcations and intersections;
and a processing module for receiving a point in the stroke and the stroke advancing direction as input and outputting the change type of the stroke along that direction by means of a centerline algorithm.
The beneficial effects of the invention are as follows:
The method is simple to implement. It comprises: classifying stroke changes of Chinese characters into four types, namely end points, inflection points, bifurcations and intersections; and inputting a point in the stroke and the stroke advancing direction, and outputting the change type of the stroke along that direction by means of a centerline algorithm. It is highly practical, effectively resolves strokes adhering between characters, and correctly segments connected strokes in Chinese characters.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of the Chinese character image stroke processing method of the present invention;
FIG. 2 is a schematic diagram of the four stroke change types of the present invention;
FIG. 3 is a schematic diagram of the centerline of the present invention;
FIG. 4 is a schematic diagram of stroke changes of the present invention;
FIG. 5 is a schematic diagram of connected-stroke segmentation between rows according to the present invention;
FIG. 6 is a schematic diagram of connected-stroke segmentation between columns according to the present invention.
Detailed Description
Certain terms are used throughout the description and claims to refer to particular components. As those skilled in the art will appreciate, manufacturers may refer to the same component by different names. This specification and the claims do not distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion and should be interpreted to mean "include, but not limited to". "Substantially" means within an acceptable error range in which a person skilled in the art can solve the technical problem and substantially achieve the technical effect. The description that follows is a preferred embodiment of the application, given to illustrate its general principles and not to limit its scope. The protection scope of the application is defined by the appended claims.
Referring to FIG. 1, the Chinese character image stroke processing method of the present invention includes: step S101, classifying stroke changes of Chinese characters into four types: end points, inflection points, bifurcations and intersections; and step S102, inputting a point in the stroke and the stroke advancing direction, and outputting the change type of the stroke along that direction by means of a centerline algorithm.
In one embodiment, classifying stroke changes of Chinese characters into the four types of end points, inflection points, bifurcations and intersections includes: the four types correspond to stroke-line degrees 1, 2, 3 and 4 respectively.
In one embodiment, a degree exceeding 4 is classified as an intersection.
In one embodiment, the centerline algorithm comprises: advancing from a point within the stroke along the stroke advancing direction; when a point reached lies outside the stroke, the change type is the end point type.
In one embodiment, the centerline algorithm comprises: calculating two boundary points in the directions perpendicular to the stroke advancing direction while advancing from a point in the stroke along the stroke advancing direction; if a boundary point has not been reached after more than a certain number of pixels, the change type is the bifurcation type.
In one embodiment, the certain number of pixels is 9 pixels.
In one embodiment, the centerline algorithm comprises: advancing from a point in the stroke along the stroke advancing direction, and taking the front and rear vectors formed by three consecutive points spaced a fixed number of pixels apart on the centerline between the two boundary points; if the angle between the two vectors exceeds 30 degrees, the change type is the inflection type.
In one embodiment, the fixed number of pixels is 2 pixels.
Another object of the present invention is to provide a Chinese character image stroke processing system, comprising: an analysis module for classifying stroke changes of Chinese characters into four types: end points, inflection points, bifurcations and intersections; and a processing module for receiving a point in the stroke and the stroke advancing direction as input and outputting the change type of the stroke along that direction by means of a centerline algorithm.
In one embodiment, the analysis module assigns stroke-line degrees 1, 2, 3 and 4 to the four change types respectively.
Referring to FIG. 2, strokes are the smallest structural units of a Chinese character. Connected strokes arise during handwriting, but the strokes only overlap and adhere locally. Stroke changes in Chinese characters can be divided into four types: (1) the stroke ends; (2) the stroke turns; (3) the stroke is blocked by another line; (4) the stroke crosses another line. The four types correspond to stroke-line degrees 1, 2, 3 and 4 respectively, and can be described more vividly as end points, inflection points, bifurcations and intersections. A degree greater than 4 is treated as an intersection.
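The mapping from stroke-line degree to change type can be sketched as follows. This is only an illustration: the names `StrokeChange` and `change_type_from_degree` are hypothetical and do not come from the patent.

```python
from enum import IntEnum


class StrokeChange(IntEnum):
    """The four stroke change types, valued by their stroke-line degree."""
    END_POINT = 1      # the stroke ends
    INFLECTION = 2     # the stroke turns
    BIFURCATION = 3    # the stroke is blocked by another line
    INTERSECTION = 4   # the stroke crosses another line


def change_type_from_degree(degree: int) -> StrokeChange:
    """Map a stroke-line degree to a change type; degrees above 4 count as intersections."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return StrokeChange(min(degree, 4))
```

For instance, a junction where five line segments meet would still be reported as `StrokeChange.INTERSECTION`.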
The input image is assumed to be a binary image in which background pixels are 0 and character pixels are 255. Taking the size of one Chinese character to be about 64 × 64 pixels, the image height is scaled to 64 and the width is scaled by the same factor.
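A minimal sketch of this preprocessing in plain Python, assuming a grayscale input given as a list of rows with dark ink on a light background. The function name, the threshold of 128 and the nearest-neighbour resampling are illustrative assumptions; the text itself only fixes the output convention (background 0, characters 255, height 64).

```python
def normalize_character_image(gray, threshold=128, target_height=64):
    """Binarize a grayscale image (ink -> 255, background -> 0) and rescale it.

    The height becomes target_height and the width is scaled by the same factor.
    Assumption: ink is darker than the background, so values below `threshold` are ink.
    """
    height, width = len(gray), len(gray[0])
    scale = target_height / height
    target_width = max(1, round(width * scale))
    result = []
    for i in range(target_height):
        src_row = min(int(i / scale), height - 1)      # nearest-neighbour source row
        row = []
        for j in range(target_width):
            src_col = min(int(j / scale), width - 1)   # nearest-neighbour source column
            row.append(255 if gray[src_row][src_col] < threshold else 0)
        result.append(row)
    return result
```

Nearest-neighbour sampling keeps the output strictly binary, which the boundary test "pixel is 0" relies on.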
The algorithm takes as input a point $P_0$ inside the stroke and the stroke advancing direction $r_0$, and outputs the type of stroke change along that direction.
The centerline algorithm is as follows. Starting from $P_0$, advance 1 pixel along $r_0$ to reach the point $Q_1$ (FIG. 3). This point may lie off the stroke center, so the trajectory must be corrected. From $Q_1$, advance along the two directions perpendicular to $r_0 = (r_x, r_y)$, namely $(-r_y, r_x)$ and $(r_y, -r_x)$, until a stroke boundary is met, that is, until the pixel value at the point is 0. Let the two boundary points be $E_1$ and $E_2$, and let $P_1 = (E_1 + E_2)/2$; then $P_1$ lies on the stroke centerline. Define the vector $r_1 = \overrightarrow{P_0 P_1}$. If $P_0$ also lies on the centerline, then $r_1$ is a vector along the centerline. Applying the same procedure from $P_1$ along $r_1$ yields $P_2$. Repeating the procedure gives the successive trace points $P_1, P_2, P_3, \ldots$ of the stroke centerline.
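One step of this centerline tracking can be sketched in plain Python as below. The sketch assumes the image is a list of rows with background 0, that points are written as (x, y) with the pixel read as `image[y][x]`, and that real-valued positions are rounded to the nearest pixel; the helper names are hypothetical. It takes $r_1$ to be the vector from $P_0$ to $P_1$ and does not handle the case where the advanced point has already left the stroke.

```python
import math


def is_ink(image, x, y):
    """True if (x, y), rounded to the nearest pixel, is inside the image and non-zero."""
    col, row = int(round(x)), int(round(y))
    return 0 <= row < len(image) and 0 <= col < len(image[0]) and image[row][col] != 0


def first_background_point(image, start, direction, max_steps=64):
    """Walk from start in 1-pixel steps; return the first point whose pixel is 0."""
    x, y = start
    for _ in range(max_steps):
        if not is_ink(image, x, y):
            return (x, y)
        x, y = x + direction[0], y + direction[1]
    return None  # no boundary found within max_steps


def centerline_step(image, p0, r0):
    """Advance 1 pixel from p0 along r0 and re-center; return (p1, r1) or None."""
    norm = math.hypot(r0[0], r0[1])
    rx, ry = r0[0] / norm, r0[1] / norm
    q1 = (p0[0] + rx, p0[1] + ry)                        # the point Q1
    e1 = first_background_point(image, q1, (-ry, rx))    # boundary point E1
    e2 = first_background_point(image, q1, (ry, -rx))    # boundary point E2
    if e1 is None or e2 is None:
        return None
    p1 = ((e1[0] + e2[0]) / 2, (e1[1] + e2[1]) / 2)      # midpoint on the centerline
    r1 = (p1[0] - p0[0], p1[1] - p0[1])                  # next advancing direction
    return p1, r1
```

Calling `centerline_step` repeatedly, feeding each returned pair back in, produces the trace points one after another.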
Judging the stroke change type: if some point $Q_n$ lies outside the stroke, the end of the stroke has been reached and 1 is returned. If, while computing the boundary points $E_1$ and $E_2$ along the directions perpendicular to $r_n$, more than 9 pixels are traversed without reaching a boundary point, a stroke bifurcation has been encountered (FIG. 4). In that case, the $r_n$ direction and the two directions perpendicular to $r_n$ are each checked again for whether they extend beyond 9 pixels; this determines the degree of the bifurcation, which is the return value. Let $P_{n-2}$, $P_n$ and $P_{n+2}$ be the centerline points obtained at steps $n-2$, $n$ and $n+2$ respectively. If the angle between the vectors $\overrightarrow{P_{n-2}P_n}$ and $\overrightarrow{P_nP_{n+2}}$ exceeds 30°, the stroke is considered to have turned and 2 is returned.
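The three tests can be combined into one tracking loop, sketched below in plain Python under the same conventions as before (list-of-rows image, background 0, points as (x, y), positions rounded to the nearest pixel). All names are hypothetical. One point is an interpretation rather than something the text states: the degree at a junction is counted here as 1 for the incoming branch plus one for each of the three probed directions (forward and the two perpendiculars) that extends beyond 9 pixels.

```python
import math

MAX_SPAN = 9        # the 9-pixel bifurcation threshold from the text
POINT_GAP = 2       # spacing of the three centerline points (n-2, n, n+2)
TURN_ANGLE = 30.0   # turning threshold in degrees


def is_ink(image, x, y):
    """True if (x, y), rounded to the nearest pixel, is inside the image and non-zero."""
    col, row = int(round(x)), int(round(y))
    return 0 <= row < len(image) and 0 <= col < len(image[0]) and image[row][col] != 0


def probe(image, start, direction, limit=MAX_SPAN):
    """Walk in 1-pixel steps while on ink; return (steps taken, stopping point)."""
    x, y = start
    steps = 0
    while steps <= limit and is_ink(image, x, y):
        x, y = x + direction[0], y + direction[1]
        steps += 1
    return steps, (x, y)


def angle_between(u, v):
    """Angle in degrees between two 2D vectors (0 if either has zero length)."""
    nu, nv = math.hypot(u[0], u[1]), math.hypot(v[0], v[1])
    if nu == 0 or nv == 0:
        return 0.0
    cos = (u[0] * v[0] + u[1] * v[1]) / (nu * nv)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def stroke_change_type(image, p0, r0, max_steps=500):
    """Follow the centerline from p0 along r0; return (change type, position)."""
    points, p, r = [p0], p0, r0
    for _ in range(max_steps):
        norm = math.hypot(r[0], r[1])
        r = (r[0] / norm, r[1] / norm)
        q = (p[0] + r[0], p[1] + r[1])
        if not is_ink(image, q[0], q[1]):
            return 1, p                                  # stroke end
        n1, e1 = probe(image, q, (-r[1], r[0]))
        n2, e2 = probe(image, q, (r[1], -r[0]))
        if n1 > MAX_SPAN or n2 > MAX_SPAN:
            nf, _ = probe(image, q, r)
            # Assumed counting: incoming branch + each direction longer than 9 pixels.
            branches = sum(n > MAX_SPAN for n in (n1, n2, nf))
            return min(1 + branches, 4), q
        p = ((e1[0] + e2[0]) / 2, (e1[1] + e2[1]) / 2)   # re-centered point
        points.append(p)
        if len(points) > 2 * POINT_GAP:
            a, b, c = points[-1 - 2 * POINT_GAP], points[-1 - POINT_GAP], points[-1]
            u, v = (b[0] - a[0], b[1] - a[1]), (c[0] - b[0], c[1] - b[1])
            if angle_between(u, v) > TURN_ANGLE:
                return 2, b                              # stroke turns
        step = (p[0] - points[-2][0], p[1] - points[-2][1])
        if step != (0.0, 0.0):
            r = step
    return 0, p  # no change found within max_steps
```

The return value 0 is an addition for the case where the step budget runs out; the text defines only the values 1 to 4.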
Connected-stroke segmentation is one of the difficulties in Chinese character segmentation. Because a connected stroke should be divided where the stroke type changes, the stroke processing method above can be used to segment connected strokes in Chinese characters.
Example 1
Line-by-line segmentation:
In handwriting, a stroke of a character (usually a vertical stroke) is sometimes so long that it joins a character on the next line. Segmenting such connected strokes is a difficult point in Chinese character row segmentation.
Suppose a Chinese character region spans two adjacent rows and must be split by row. Traverse each column j of the region. Let i be the midpoint, at that column, between the lower boundary of the upper row region and the upper boundary of the next row region. If the point (i, j) belongs to the character region, search for a segmentation position near it: starting from this point and advancing in the vertical direction, find the position where the stroke changes. Following writing habits, the stroke is broken at the returned position. As shown in FIG. 5, the vertical stroke of the character "but" joins the character "crystal" below it, and the connection point is searched downward from a pixel in the middle of the vertical stroke. The stroke bifurcates at the "day" component of "crystal", so the algorithm accurately finds the connected-stroke segmentation position.
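The column traversal described in this example can be sketched as a small driver in plain Python. The function name `find_row_cuts` is hypothetical, and the stroke-change routine is passed in as the callable `classify(image, point, direction)` returning `(change_type, position)`, an assumed interface rather than one defined by the patent. Points are (x, y), the image is a list of rows with background 0, and the search direction is taken to be downward, as in the FIG. 5 example.

```python
def find_row_cuts(image, row_bottom, next_row_top, classify):
    """Collect candidate cut positions between two text rows.

    row_bottom:   lower boundary (row index) of the upper text row
    next_row_top: upper boundary (row index) of the next text row
    classify:     callable(image, point, direction) -> (change_type, position)
    """
    mid_row = (row_bottom + next_row_top) // 2    # the midpoint i between the two boundaries
    cuts = []
    for col in range(len(image[0])):              # traverse each column j
        if image[mid_row][col] == 0:
            continue                              # (i, j) is background: nothing to cut here
        # Search downward from (i, j) for the place where the stroke changes.
        change_type, position = classify(image, (col, mid_row), (0, 1))
        cuts.append((col, change_type, position))
    return cuts
```

Each entry gives the column, the change type found and the position at which the stroke would be broken.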
Example 2
Column-by-column segmentation:
For a region wider than 32 pixels, the region can be split at a position where a vertical line crosses only one stroke. For a column j of the character region, check whether the vertical line at that column intersects exactly one stroke, and record the intersection position p. Let the image array be A. Initially p = 0. Scan from top to bottom. For row i, A[i, j] ≠ 0 means a stroke has been met; if p = 0 at that moment, set p to i. If A[i, j] = 0 and p ≠ 0, set p = -p, marking the end of the stroke's intersection with the vertical line. If A[i, j] ≠ 0 occurs again while p < 0, a second stroke crossing the vertical line has been met, and the loop exits.
For a vertical line that intersects only one stroke, an attempt is made to break the stroke near the intersection. Taking the intersection as the starting point, the stroke change type algorithm above is applied leftward and rightward. If the change type on the left is smaller than that on the right, the stroke is broken on the left, following writing habits; otherwise it is broken on the right. As shown in FIG. 6, the character "make" is joined to the character "go". Starting from a position on the right-falling stroke of "make" and advancing rightward, the stroke is found to turn where it meets "go", so the change type is 2; advancing leftward, the stroke change is found where the stroke meets the left-falling stroke, and the change type is 3. Since 3 ≥ 2, the division is made at the junction between the strokes of "make" and "go".
As another example, in FIG. 5 the falling stroke of "guest" is joined to "view". Starting from a position on that stroke (where the vertical line intersects only this stroke) and advancing leftward and rightward, the stroke change on both sides is an intersection, that is, 4 = 4, so the stroke is broken on the right. As a further example, for the character "rock", 2 < 3, so the stroke is broken on the left.
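The column scan and the left/right rule can be sketched as follows in plain Python. The function names are hypothetical. Instead of encoding the scan state in the sign of p as the text does, the sketch keeps an explicit start row and a flag, which also works when the stroke begins in row 0; the image is a list of rows with background 0.

```python
def single_stroke_crossing(image, col):
    """Return the row where column `col` first meets its only stroke, else None.

    None is returned when the column meets no stroke or more than one stroke.
    """
    start = None        # row where the first ink run begins (p > 0 in the text)
    run_ended = False   # True once that run has ended (p < 0 in the text)
    for row in range(len(image)):
        ink = image[row][col] != 0
        if ink and start is None:
            start = row
        elif not ink and start is not None:
            run_ended = True
        elif ink and run_ended:
            return None  # a second stroke crosses this vertical line
    return start


def choose_break_side(left_type, right_type):
    """Break on the left only if the left change type is smaller than the right one."""
    return "left" if left_type < right_type else "right"
```

With the values from the examples, `choose_break_side(3, 2)` and `choose_break_side(4, 4)` select the right side, while `choose_break_side(2, 3)` selects the left side.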
The foregoing description shows and describes several preferred embodiments of the application. As stated above, however, the application is not limited to the forms disclosed herein; these should not be regarded as excluding other embodiments, and the application may be used in various other combinations, modifications and environments, and may be changed within the scope described herein in accordance with the above teachings or the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the application fall within the protection of the appended claims.

Claims (4)

1. A Chinese character image stroke processing method, characterized by comprising the following steps:
classifying stroke changes of Chinese characters into four types: end points, inflection points, bifurcations and intersections;
inputting a point in the stroke and the advancing direction of the stroke, and outputting the change type of the stroke along that direction by means of a centerline algorithm;
wherein the centerline algorithm comprises: starting from point $P_0$, advancing 1 pixel along direction $r_0$; advancing along the two directions perpendicular to $r_0$, namely $(-r_y, r_x)$ and $(r_y, -r_x)$, until a stroke boundary is met, that is, until the pixel at the point is 0; denoting the two boundary points $E_1$ and $E_2$ and letting $P_1 = (E_1 + E_2)/2$, so that $P_1$ lies on the stroke centerline; denoting the vector $r_1 = \overrightarrow{P_0 P_1}$, so that if $P_0$ also lies on the centerline, $r_1$ is a vector along the centerline; applying the same algorithm from $P_1$ along $r_1$ to obtain $P_2$; and applying the algorithm repeatedly to obtain in turn the trace points $P_1, P_2, P_3, \ldots$ of the stroke centerline;
the four change types correspond to stroke-line degrees 1, 2, 3 and 4 respectively;
a degree exceeding 4 is classified as an intersection;
advancing from a point in the stroke along the stroke advancing direction, when a point reached lies outside the stroke, the change type is the end point type;
calculating two boundary points in the directions perpendicular to the stroke advancing direction while advancing from a point in the stroke along the stroke advancing direction, if a boundary point has not been reached after more than a certain number of pixels, the change type is the bifurcation type;
advancing from a point in the stroke along the stroke advancing direction, and taking the front and rear vectors formed by three consecutive points spaced a fixed number of pixels apart on the centerline between the two boundary points, if the angle between the two vectors exceeds 30 degrees, the change type is the inflection type.
2. The Chinese character image stroke processing method according to claim 1, wherein the certain number of pixels is 9 pixels.
3. The Chinese character image stroke processing method according to claim 1, wherein the fixed number of pixels is 2 pixels.
4. A Chinese character image stroke processing system, characterized by comprising:
an analysis module for classifying stroke changes of Chinese characters into four types: end points, inflection points, bifurcations and intersections;
a processing module for receiving a point in the stroke and the advancing direction of the stroke as input and outputting the change type of the stroke along that direction by means of a centerline algorithm;
wherein the centerline algorithm comprises: starting from point $P_0$, advancing 1 pixel along direction $r_0$; advancing along the two directions perpendicular to $r_0$, namely $(-r_y, r_x)$ and $(r_y, -r_x)$, until a stroke boundary is met, that is, until the pixel at the point is 0; denoting the two boundary points $E_1$ and $E_2$ and letting $P_1 = (E_1 + E_2)/2$, so that $P_1$ lies on the stroke centerline; denoting the vector $r_1 = \overrightarrow{P_0 P_1}$, so that if $P_0$ also lies on the centerline, $r_1$ is a vector along the centerline; applying the same algorithm from $P_1$ along $r_1$ to obtain $P_2$; and applying the algorithm repeatedly to obtain in turn the trace points $P_1, P_2, P_3, \ldots$ of the stroke centerline;
the analysis module assigns stroke-line degrees 1, 2, 3 and 4 to the four change types respectively;
a degree exceeding 4 is classified as an intersection;
advancing from a point in the stroke along the stroke advancing direction, when a point reached lies outside the stroke, the change type is the end point type;
calculating two boundary points in the directions perpendicular to the stroke advancing direction while advancing from a point in the stroke along the stroke advancing direction, if a boundary point has not been reached after more than a certain number of pixels, the change type is the bifurcation type;
advancing from a point in the stroke along the stroke advancing direction, and taking the front and rear vectors formed by three consecutive points spaced a fixed number of pixels apart on the centerline between the two boundary points, if the angle between the two vectors exceeds 30 degrees, the change type is the inflection type.
CN201910832767.3A 2019-09-04 2019-09-04 Chinese character image stroke processing method and system Active CN110516655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910832767.3A CN110516655B (en) 2019-09-04 2019-09-04 Chinese character image stroke processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910832767.3A CN110516655B (en) 2019-09-04 2019-09-04 Chinese character image stroke processing method and system

Publications (2)

Publication Number Publication Date
CN110516655A CN110516655A (en) 2019-11-29
CN110516655B true CN110516655B (en) 2023-04-18

Family

ID=68630890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910832767.3A Active CN110516655B (en) 2019-09-04 2019-09-04 Chinese character image stroke processing method and system

Country Status (1)

Country Link
CN (1) CN110516655B (en)

Also Published As

Publication number Publication date
CN110516655A (en) 2019-11-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant