CN111310548B - Method for identifying stroke types in online handwriting - Google Patents

Method for identifying stroke types in online handwriting

Info

Publication number
CN111310548B
Authority
CN
China
Prior art keywords
stroke
sequence
handwriting
strokes
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911224894.1A
Other languages
Chinese (zh)
Other versions
CN111310548A (en)
Inventor
邹杰
曾蓓蓓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Hande Ruiting Technology Co ltd
Original Assignee
Wuhan Hande Ruiting Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Hande Ruiting Technology Co ltd
Priority to CN201911224894.1A
Publication of CN111310548A
Application granted
Publication of CN111310548B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/30 - Writer recognition; Reading and verifying signatures
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the field of information security and discloses a method for identifying stroke types in online handwriting. The method comprises a preparation stage and a recognition stage. In the preparation stage, a plurality of handwriting samples of a standard Chinese character are collected, template stroke sequences are established, and index items are built for the template stroke subsequences. In the recognition stage, all template stroke subsequences whose length satisfies the condition are looked up according to the length of the stroke sequence to be recognized; a stroke matching algorithm is called to establish the correspondence between the template strokes and the stroke sequence to be recognized; from that correspondence, the types of some of the strokes are obtained, together with shorter stroke subsequences of still unknown type that are delimited by the recognized strokes; and the process is repeated until the types of all strokes have been recognized. For arbitrary, nonstandard handwriting whose written content is known, the method can identify each stroke and its stroke type, which lays a foundation for further extraction of detailed handwriting features.

Description

Method for identifying stroke types in online handwriting
Technical Field
The invention relates to the field of information security, in particular to a method for identifying stroke types in online handwriting.
Background
Online handwriting authentication is a technology that acquires a user's handwriting online through a dedicated input device and compares the personalized features contained in it in order to authenticate the user's identity.
Like a human face, handwriting is typical structured feature data, and the basic unit of handwriting is the stroke. Encouraged by the success of face recognition, researchers have again turned their attention to the extraction of stroke features.
Research shows that stroke features are important for improving handwriting authentication performance and are the basis for extracting higher-level features, including writing skill level, structural layout, the mutual correspondence among strokes, the writing order of strokes, writing rhythm, cadence and pauses, and variations in pressure and speed. However, unlike the uniform composition of the human face, handwriting is complex, multi-element structured data composed of strokes written in a specific order. Every face contains a forehead, eyes, a nose, and a mouth, and the relative positions of these elements are fixed; this certainty clearly helps feature extraction.
Handwriting does not have this property. Its compositional complexity shows in several ways: first, different Chinese characters are composed of different strokes; second, even the same Chinese character can be written in various scripts, such as running script and cursive script, whose stroke compositions differ; in addition, highly personalized variant forms exhibit handwriting structures that are unique and non-normative; finally, even for different instances of the same character in the same script written by the same person at different times, the stroke sequence has many possible forms, because the randomness of writing causes extra strokes, missing strokes, connected strokes, simplified strokes, and altered strokes.
Under these conditions, recognizing the strokes in a piece of handwriting, that is, establishing the correspondence between the handwritten strokes and the strokes of the standard Chinese character, is a difficult task.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art described above and provides a method for identifying stroke types in online handwriting. For arbitrary, nonstandard handwriting whose written content is known, the method can identify each stroke and its stroke type, which lays a foundation for further extraction of detailed handwriting features.
To achieve this purpose, the invention provides a method for identifying stroke types in online handwriting, comprising the following steps:
a) Stroke recognition preparation: collect a plurality of handwriting samples of a standard Chinese character c; segment every sample into strokes; manually label the correspondence between each stroke in the handwriting and each stroke in the standard Chinese character c; take as template handwriting the samples that were labeled manually, or whose labels were computed and whose accuracy passed manual inspection; establish the template stroke sequence of each template handwriting; divide the template stroke sequence into a plurality of template stroke subsequences; and establish index items for the template stroke sequences and template stroke subsequences. Each index item comprises the sequence number of the template stroke sequence to which the subsequence belongs, the length of the template stroke subsequence, the start and end positions of the subsequence within the template stroke sequence, and the types of its immediately preceding and immediately following strokes. The immediately preceding stroke is the stroke that comes just before the first stroke of the subsequence in the template stroke sequence, and the immediately following stroke is the stroke that comes just after the last stroke of the subsequence in the template stroke sequence;
the standard Chinese character c is a Chinese character in a standard writing style and form of expression that is in wide use at present;
the handwriting is a time-series signal obtained by using a dedicated data sensing device to collect, in real time, the information generated by the movement of the pen tip during writing; the data collected by the device at each sampling instant includes the two-dimensional position of the pen tip;
b) Stroke recognition: according to the length of the stroke sequence to be recognized, look up in the index items all template stroke subsequences whose length satisfies the requirement; then take out the corresponding template stroke subsequences one by one according to the index items; call a matching algorithm to establish the correspondence between the template stroke subsequence and the stroke sequence to be recognized, forming a matching sequence; and search the matching sequence for continuous one-to-one stroke correspondences of length greater than 2. If no such correspondence exists in the matching sequence, read the next index item. If none of the template handwriting covered by the index items yields such a correspondence, mark the handwriting as a brand-new written form of the standard Chinese character c, and manually update the template handwriting and the index items to broaden the representativeness of the template handwriting. Otherwise, obtain the types of the unknown strokes in the handwriting to be recognized from the continuous one-to-one stroke correspondence and the labeled types of the template strokes, which also yields several shorter stroke sequences still to be recognized. Repeat the process until all strokes of the handwriting to be recognized are recognized, or until some strokes are recognized and the rest cannot be; in the latter case, the unrecognized part is labeled manually and added to the template handwriting.
Preferably, the stroke recognition preparation stage in step a) comprises the following steps:
A1) Consult reference material and establish a database of the standard stroke orders of Chinese characters, wherein the Chinese characters include meaningful Chinese characters and meaningless symbols written according to the writing rules of Chinese characters;
A2) Collect various handwriting samples corresponding to the standard Chinese character c: these comprise standard handwriting of c as well as nonstandard handwriting that can be correctly identified by other people, by only a few people, or only by the writer, where identification means establishing the correspondence between the handwriting and the standard Chinese character c. Let $h_i = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{n_i}, y_{n_i})\}$ denote the $i$-th handwriting sample of c obtained by the handwriting device, where $(x_j, y_j)$ is the pen-tip position acquired by the device at time $j$, $1 \le j \le n_i$, and $n_i$ is the number of sampling points in $h_i$;
A3) Segment the handwriting into strokes using a key point extraction algorithm. Let $T = \{t_1, t_2, \ldots, t_N\}$ denote the time sequence of handwriting sampling points corresponding to the standard Chinese character c, where $N$ is the number of sampling points. Let $KT = \{kt_1, kt_2, \ldots, kt_M\}$ denote the key point sequence obtained from $T$ by the key point extraction algorithm, where $kt_1 = 1$, $kt_{i-1} < kt_i$ for $1 < i \le M$, $kt_M = N$, and $M$ is the number of key points in $T$. Let $BT = \{bt_1, bt_2, \ldots, bt_{M-1}\}$ denote the stroke sequence of the handwriting defined by the key point sequence $KT$, where the $i$-th stroke $bt_i$ is delimited by the key points $kt_i$ and $kt_{i+1}$, $1 \le i \le M-1$;
A4) Manually label the correspondence between the strokes in the handwriting and the strokes in the standard Chinese character c. Because the randomness of handwriting causes inconsistencies, the correspondence between the handwriting and the strokes of c is confirmed manually in order to obtain correct stroke types. The stroke correspondence is one of four kinds: 1-to-1, where one stroke of c corresponds to one stroke of the handwriting; 1-to-M, where one stroke of c corresponds to several strokes of the handwriting; N-to-1, where several strokes of c correspond to one stroke of the handwriting; and N-to-M, where several strokes of c correspond to several strokes of the handwriting. Handwriting that has been labeled manually, or whose stroke types were computed and whose accuracy passed manual inspection, is taken as template handwriting;
A5) Create the index items of the template stroke sequences and template stroke subsequences. Let $C = \{s_1, s_2, \ldots, s_n\}$ denote the standard stroke type sequence of the standard Chinese character c, where $n$ is the number of strokes in $C$. Add two new stroke types $s_0$ and $s_{n+1}$ to obtain the set $C1 = C \cup \{s_0, s_{n+1}\}$, where $s_0$ is the redundant virtual stroke type of the preparation phase before writing begins and $s_{n+1}$ is the redundant virtual stroke type of the ending phase after writing is finished. Let the stroke index matrix corresponding to the set C1 be $D = (d_{ij})$, $0 \le i \le n+1$, $0 \le j \le n+1$, where the element indices of $D$ correspond to the stroke type numbers in C1 and each element $d_{ij}$ consists of index item sets $d_{ij}^{k}$. The meaning of $d_{ij}^{k}$ is as follows. Let $S_a = \{s_1, s_2, \ldots, s_m\}$ be the $a$-th template stroke sequence corresponding to the set C1, and let $S1_a = \{s_{v-1}, s_v, s_{v+1}, \ldots, s_{v+k}\}$ be a subsequence of $S_a$ that begins with a stroke labeled, manually or by an algorithm, as the $i$-th stroke type and ends with a stroke labeled as the $j$-th stroke type. Then $d_{ij}^{k}$ contains an index item $(a, b, e)$ for $S1_a$, which denotes the template stroke subsequence of length $k$ that runs from the $b$-th stroke to the $e$-th stroke of the $a$-th template stroke sequence of the standard Chinese character c, with $k = e - b + 1$. The immediately preceding stroke of its first stroke, that is, the $(b-1)$-th stroke of the template stroke sequence, is labeled as the $i$-th stroke type in C1, and the immediately following stroke of its last stroke, that is, the $(e+1)$-th stroke of the template stroke sequence, is labeled as the $j$-th stroke type in C1, with $0 \le i \le n+1$ and $0 \le j \le n+1$. Each $d_{ij}^{k}$ in the matrix $D$ is therefore a set of index items, and every index item in it records a template stroke subsequence of $k$ strokes whose immediate predecessor and immediate successor were labeled, manually or by an algorithm, as the $i$-th and $j$-th stroke types of C1 respectively. All template handwriting $S_a$ of the standard Chinese character c is scanned, and index items are established for all template stroke subsequences of every $S_a$;
A6) End.
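As an illustration only, and not as part of the disclosed method, steps A3) and A5) can be sketched in Python under simplifying assumptions: every template stroke carries exactly one stroke type number from C1 (the 1-to-M, N-to-1 and N-to-M cases of step A4) are ignored), the key points are already available, and the matrix D is stored as a dictionary keyed by (i, j, k). The function names `split_strokes` and `build_index` are hypothetical.

```python
from collections import defaultdict

def split_strokes(points, keypoints):
    """Step A3 sketch: cut the sampling points into strokes.

    keypoints are 1-based sample indices kt_1 = 1 < ... < kt_M = N;
    stroke i runs from kt_i to kt_{i+1}, so neighbours share a key point.
    """
    return [points[keypoints[i] - 1: keypoints[i + 1]]
            for i in range(len(keypoints) - 1)]

def build_index(templates, n):
    """Step A5 sketch: index every template stroke subsequence.

    templates: list of label sequences; labels[j-1] is the stroke type
    number (assumed single-valued) given to the j-th template stroke.
    Returns a dict (i, j, k) -> list of index items (a, b, e), 1-based,
    where i and j are the types of the immediate predecessor/successor.
    """
    index = defaultdict(list)
    for a, labels in enumerate(templates, start=1):
        # Pad with the virtual stroke types s_0 (0) and s_{n+1} (n + 1).
        padded = [0] + list(labels) + [n + 1]
        m = len(labels)
        for b in range(1, m + 1):
            for e in range(b, m + 1):
                i, j = padded[b - 1], padded[e + 1]
                index[(i, j, e - b + 1)].append((a, b, e))
    return index
```

For a single template labeled `[1, 2, 3]` with `n = 3`, the whole sequence is stored under the key `(0, 4, 3)` as the index item `(1, 1, 3)`. Enumerating all subsequences is quadratic in the number of strokes per template, which seems acceptable for characters with a few dozen strokes.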
Preferably, the stroke recognition stage in step b) comprises the following steps:
B1) Let $H = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ denote the test handwriting, corresponding to the standard Chinese character c, that a user inputs by writing on a handwriting device. Test handwriting is handwriting input through the handwriting device in which at least one stroke has an unknown type and needs to be recognized; recognizing a stroke type means establishing the correspondence between the strokes of the handwriting and the strokes of the standard Chinese character c;
B2) Segment the test handwriting into strokes: using the same method as in step A3), call the key point extraction algorithm to segment the test handwriting H into strokes. Let $K_h = \{kh_1, kh_2, \ldots, kh_{m+1}\}$ denote the key point sequence of H obtained by the key point extraction algorithm, where $kh_1 = 1$, $kh_{i-1} < kh_i$ for $1 < i \le m+1$, and $kh_{m+1} = n$, with $n$ the number of sampling points in H. Let $B_h = \{bh_1, bh_2, \ldots, bh_m\}$ denote the stroke sequence obtained by dividing H at the key points $K_h$, where the start and end points of the $i$-th stroke $bh_i$ are defined by the key points $kh_i$ and $kh_{i+1}$, $1 \le i \le m$, and $m$ is the number of strokes in the stroke sequence;
B3) Initialize the recognized stroke set $G_x$, the unrecognized stroke sequence set $B_x$, and the stroke sequence set to be recognized $U_x$:
the stroke sequence set to be recognized $U_x$ is a set of quadruples g1 = (p1, p2, p3, p4), where p1 and p2 denote the sequence number in $B_h$ of the first stroke of the stroke sequence to be recognized and the length of that sequence, p3 denotes the sequence number in the set C1 of the stroke type identified for the immediately preceding stroke of the stroke sequence to be recognized, and p4 denotes the sequence number in C1 of the stroke type identified for its immediately following stroke;
a recognized stroke is a stroke of the test handwriting stroke sequence whose stroke type has been determined by the stroke type recognition algorithm from the labeled stroke type data in the template handwriting library;
an unrecognized stroke is a stroke of the test handwriting stroke sequence whose stroke type the stroke type recognition algorithm could not determine from the labeled stroke type data in the template handwriting library;
a stroke to be recognized is a stroke whose type has not yet been determined and which still has to be processed by the stroke type recognition algorithm;
in the initial state, the types of all strokes in the test handwriting stroke sequence are unknown;
the immediately preceding stroke of an unrecognized stroke sequence is defined as follows: if the first stroke of the unrecognized stroke sequence has sequence number q in $B_h$, with $1 < q \le m$, the immediately preceding stroke has sequence number $q-1$; for an unrecognized stroke sequence whose first stroke has sequence number 1 in $B_h$, the immediately preceding stroke has sequence number 0, and the type of stroke 0 is $s_0$, whose sequence number in the set C1 is 0;
the immediately following stroke of an unrecognized stroke sequence is defined as follows: if the last stroke of the unrecognized stroke sequence has sequence number r in $B_h$, with $1 \le r < m$, the immediately following stroke has sequence number $r+1$; for an unrecognized stroke sequence whose last stroke has sequence number m in $B_h$, the immediately following stroke has sequence number $m+1$, and its type is $s_{n+1}$, whose sequence number in the set C1 is $n+1$, where m is the number of strokes in the test handwriting stroke sequence $B_h$ and n is the number of strokes in the standard stroke type sequence C;
the recognized stroke set $G_x$ consists of pairs g2 = (p5, p6), where p5 is the sequence number of the recognized stroke in the test handwriting stroke sequence $B_h$ and p6 is the sequence number of its stroke type in the standard stroke type sequence C; the pair g2 = (p5, p6) means that the p5-th stroke of $B_h$ is recognized as the p6-th stroke type of C;
the unrecognized stroke sequence set $B_x$ consists of quadruples g3 = (p7, p8, p9, p10), where p7 and p8 denote the sequence number in $B_h$ of the first stroke of the unrecognized stroke subsequence and the length of that subsequence, and p9 and p10 have the same meanings as p3 and p4 in the quadruple g1;
initialize the stroke sequence set to be recognized as $U_x = \{(1, m, 0, n+1)\}$, the recognized stroke set $G_x$ as empty, and the unrecognized stroke sequence set $B_x$ as empty;
B4) If the stroke sequence set to be recognized $U_x$ is not empty, select any element $g1 \in U_x$, remove g1 from $U_x$, and go to step B5); otherwise $U_x$ is empty, the recognition task is finished, and jump to step B9);
B5) Recognize the types of the stroke subsequence to be recognized: perform stroke sequence recognition with the four components of the selected element g1 as parameters, obtaining a recognized stroke subsequence set G1 and an unrecognized stroke subsequence set U1. G1 consists of pairs whose components have the same meanings as those of the pair g2 in step B3); U1 consists of quadruples whose components have the same meanings as those of the quadruple g3 in step B3);
B6) If the unrecognized stroke subsequence set U1 contains exactly one element and the recognized stroke subsequence set G1 is empty, jump to step B7); otherwise jump to step B8);
B7) Put the elements of the unrecognized stroke subsequence set U1 into the unrecognized stroke sequence set $B_x$, that is, add the element g1 to $B_x$, which means that the stroke sequence represented by g1 cannot be recognized by the algorithm; jump to step B4);
B8) Put the successfully recognized strokes into the recognized stroke set $G_x$ and the unrecognized strokes into the stroke sequence set to be recognized $U_x$: add the elements of G1 to $G_x$, add the elements of U1 to $U_x$, and jump to step B4);
B9) Stroke recognition ends. If the unrecognized stroke sequence set $B_x$ is not empty, the template handwriting library contains no written form identical or similar to the part of the test handwriting defined by $B_x$; jump to step A4), manually label the unrecognized stroke subsequences, add the labeled stroke sequence to the template stroke sequences, and finally update the stroke index matrix D to increase the representativeness of the template stroke sequences. If $B_x$ is empty, all strokes of the test handwriting have been recognized successfully, and the recognition result is stored in the recognized stroke set $G_x$.
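The control flow of steps B3) to B9) is a worklist loop. The following Python sketch illustrates it under the assumption that step B5) is supplied as a callback; the names `recognize_all` and `recognize_subsequence` are hypothetical and do not appear in the description.

```python
def recognize_all(m, n, recognize_subsequence):
    """Worklist sketch of steps B3) to B9).

    m: number of strokes in the test stroke sequence B_h.
    n: number of strokes in the standard stroke type sequence C.
    recognize_subsequence(p1, p2, p3, p4) -> (G1, U1) stands in for
    step B5): G1 is a list of (stroke_no, type_no) pairs, U1 a list of
    quadruples (start, length, pred_type, succ_type).
    """
    pending = [(1, m, 0, n + 1)]    # U_x, initialised as in step B3
    recognized = set()              # G_x
    unrecognized = []               # B_x
    while pending:                  # B4: stop when U_x is empty
        g1 = pending.pop()
        G1, U1 = recognize_subsequence(*g1)   # B5
        if len(U1) == 1 and not G1:           # B6 -> B7: no progress
            unrecognized.extend(U1)
        else:                                  # B8: keep refining
            recognized.update(G1)
            pending.extend(U1)
    # B9: a non-empty unrecognized list means manual labelling is needed.
    return recognized, unrecognized
```

The loop terminates as long as the callback either recognizes at least one stroke or returns its input unchanged, because every pass then either shortens the remaining subsequences or moves one of them to the unrecognized list.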
Preferably, the stroke sequence recognition in the step B5) includes the following steps:
B51) Input parameters: the test stroke sequence $T = \{t_1, t_2, \ldots, t_m\}$; the number m of strokes in T; the sequence number p1 in T of the first stroke of the stroke subsequence to be recognized and the length p2 of that subsequence; the sequence numbers p3 and p4, in the set C1, of the stroke types of the predecessor and successor strokes of the stroke subsequence to be recognized; the standard Chinese character c to which the handwriting to be recognized belongs; and the stroke index matrix D built from the template handwriting of the standard Chinese character c;
according to the input parameters p1 and p2, extract from the test stroke sequence $T = \{t_1, t_2, \ldots, t_m\}$ the stroke subsequence to be recognized $T1 = \{t_{p1}, t_{p1+1}, \ldots, t_{p1+p2-1}\}$;
initialize the output of the algorithm: set the recognized stroke subsequence set G1 to empty and the unrecognized stroke subsequence set U1 to empty;
B52) Obtain all index item sets that satisfy the condition: query the stroke index matrix D to obtain the index item sets of all relevant template handwriting. Read the element $d_{ij}$ of D and, from $d_{ij}$, read all index item sets $d_{ij}^{k}$ that satisfy the condition, where the number of strokes k lies in the range $b1 \le k \le b2$, with $b1 = p2 \cdot (1 - 10\%)$ and $b2 = p2 \cdot (1 + 10\%)$. Take the union of these index item sets to obtain $Y = \bigcup_{k=b1}^{b2} d_{ij}^{k}$, the union of the index item sets that satisfy the condition, where w denotes the number of index items in the set Y;
B53) Initialize the index item subscript a = 1;
B54) Read the template stroke subsequence defined by the a-th index item: read the a-th element $(p_a, b_a, e_a)$ of the set Y; read the $p_a$-th template stroke sequence $S_{p_a} = \{s_1, s_2, \ldots, s_m\}$ from the template handwriting library of the standard Chinese character c, where m here denotes the number of strokes in $S_{p_a}$; and extract from the template handwriting the template stroke subsequence $S1 = \{s_{b_a}, s_{b_a+1}, \ldots, s_{e_a}\}$ that runs from the $b_a$-th stroke to the $e_a$-th stroke;
B55) Calculate the correspondence between the stroke subsequence to be recognized T1 and the template stroke subsequence S1: compute the stroke correspondence between S1 and T1 with a stroke matching algorithm;
B56) From the correspondence obtained in step B55), calculate the ratio of the total length of the matched strokes to the total length of all strokes. Let $R = \{(r_1, q_1), (r_2, q_2), \ldots, (r_x, q_x)\}$ denote the stroke correspondence between S1 and T1 obtained by the stroke matching algorithm, where any two adjacent matching items in R satisfy $r_i \le r_{i+1}$ and $q_i \le q_{i+1}$, $1 \le i < x$, and the matching item $(r_i, q_i) \in R$, $1 \le i \le x$, means that the $r_i$-th stroke of the template stroke subsequence S1 and the $q_i$-th stroke of the stroke subsequence to be recognized T1 match each other. Let $len1 = \sum_{r \in D1} len(s_r)$, where $D1 = \{r_d \mid (r_d, q_d) \in R, 1 \le d \le x\}$, and $len2 = \sum_{q \in D2} len(t_q)$, where $D2 = \{q_d \mid (r_d, q_d) \in R, 1 \le d \le x\}$. Let len3 and len4 denote the sums of all stroke lengths in the template stroke subsequence S1 and in the stroke subsequence to be recognized T1, respectively, where $len(\cdot)$ denotes the length of a stroke. Calculate the ratios $Rat1 = len1 / len3$ and $Rat2 = len2 / len4$;
B57) If both Rat1 and Rat2 exceed the preset threshold, go to step B58); otherwise the writing of the template stroke subsequence S1 differs considerably from that of the stroke subsequence to be recognized T1, so jump to step B514) and select the next template handwriting for matching;
B58) Search the correspondence R for all continuous one-to-one matching subsequences of length greater than 2. Let $R^{*} = \{(r_d, q_d), (r_{d+1}, q_{d+1}), \ldots, (r_{d+h}, q_{d+h})\}$ denote a matching subsequence of R. $R^{*}$ is called a continuous one-to-one match if and only if, for all j with $0 \le j < h$, the conditions $r_{d+j} + 1 = r_{d+j+1}$ and $q_{d+j} + 1 = q_{d+j+1}$ both hold; the number of matching items in $R^{*}$ is $h + 1$. Let $R1 = \{R_1, R_2, \ldots, R_f\}$ denote all continuous one-to-one matching sequences of length greater than 2 in R, where $R_l = \{(r_{d+j}, q_{d+j}) \mid 0 \le j \le h_l\}$ is the l-th continuous one-to-one matching sequence of length greater than 2, with $h_l \ge 2$, $d > 0$ and $d + h_l \le x$; x is the number of matching items in R, and $R_l$ contains $h_l + 1$ matching items;
B59) If the number f of matching subsequences in R1 is 0, the writing of the template stroke subsequence S1 differs considerably from that of the stroke subsequence to be recognized T1, so jump to step B514) and select the next template handwriting for matching; otherwise go to step B510);
B510) Set the loop variable l = 1;
B511) Obtain the stroke types in the test handwriting from the stroke correspondence and the known stroke types in the template handwriting: take the l-th matching subsequence $R_l$ from R1; for the j-th one-to-one matching item $(r_{d+j}, q_{d+j})$ of $R_l$, set the type of the $q_{d+j}$-th stroke of the test stroke sequence T to $k_{r_{d+j}}$, where $k_{r_{d+j}}$ denotes the stroke type number of the $r_{d+j}$-th stroke of the template stroke subsequence S1, $R_l \in R1$, and $0 \le j \le h_l$. Add the pair $(q_{d+j}, k_{r_{d+j}})$ to the recognized stroke set $G_x$. This completes the recognition of all strokes to be recognized that are covered by the matching subsequence $R_l$; add all recognition results $(q_{d+j}, k_{r_{d+j}})$ related to $R_l$ to the recognized stroke subsequence set G1;
B512) Set l = l + 1; if l > f, jump to step B513); otherwise jump to step B511);
B513) Update the unrecognized stroke subsequence set U1. Let $R2 = \{q \mid (r, q) \in R_l, R_l \in R1, 1 \le l \le f\}$ denote the sequence numbers of the strokes in the test handwriting stroke sequence whose stroke type has been recognized, and let $R3 = \{p1, p1+1, \ldots, p1+p2-1\} - R2$ denote the sequence numbers of the strokes whose stroke type has not been recognized, where p1 and p2 are the starting position and the length of the test stroke subsequence within the test stroke sequence. Sort the sequence numbers in R3 in ascending order, and let R4 denote the set of runs of consecutive unrecognized stroke numbers in the test handwriting stroke sequence, where the j-th run is $\{d_j, d_j + 1, \ldots, d_j + k_j\}$, any two adjacent numbers in a run differ by exactly 1, $1 \le j \le x$, $k_j + 1$ is the number of stroke numbers in the j-th run, and x is the number of runs in R4;
for each run of consecutive unrecognized stroke numbers in R4, $1 \le j \le x$, form the quadruple $(d_j, k_j + 1, y_j, o_j)$, where $d_j$ is the starting sequence number of the run in the test handwriting stroke sequence, $k_j + 1$ is the length of the run, $y_j$ is the stroke type number recognized for the $(d_j - 1)$-th stroke of the test handwriting stroke sequence, and $o_j$ is the stroke type number recognized for the $(d_j + k_j + 1)$-th stroke of the test handwriting stroke sequence;
add the quadruples of all runs in R4 to the set U1, so that $U1 = \{(d_j, k_j + 1, y_j, o_j) \mid 1 \le j \le x\}$, where x is the number of runs in R4, and jump to step B515);
B514) $a = a + 1$; if $a > w$, set $U1 = \{(p1, p2, p3, p4)\}$ and jump to step B515), otherwise jump to step B54), where p1, p2, p3 and p4 are the input parameters of step B51) and $w$ has the meaning given in step B52);
B515) End: return the recognized stroke subsequence set G1 and the unrecognized stroke subsequence set U1.
Preferably, creating the stroke index matrix corresponding to the set C1 in the step A5) includes the following steps:
A51) Start;
A52) Initialize: for every element $d_{ij}$ of the stroke index matrix D, $0 \le i \le n+1$, $0 \le j \le n+1$, set the index item set of each segment length $k$, $0 \le k \le Max$, to empty, where $n$ denotes the number of strokes in the standard Chinese character c and Max denotes the preset maximum possible segment length; set the loop variable $x = 1$;
A53) Take the $x$-th manually labeled template stroke sequence $T_x = \{k_1, k_2, \ldots, k_{n_x}\}$ from the template handwriting library, where $k_j$ indicates that the $j$-th stroke of the $x$-th template handwriting has been manually labeled as corresponding to the $k_j$-th stroke of the standard Chinese character c, $0 \le k_j \le n+1$, $1 \le j \le n_x$, and $n_x$ denotes the number of strokes in the $x$-th template handwriting; construct the sequence with virtual start and end strokes $\{k_0, k_1, \ldots, k_{n_x}, k_{n_x+1}\}$, where $k_0 = 0$ denotes the virtual starting stroke of the standard Chinese character c, $k_{n_x+1} = n+1$ denotes the virtual ending stroke of the standard Chinese character c, and $n$ denotes the number of strokes in the standard Chinese character c;
A54) Set the loop variable $y = 0$;
A55) Set the loop variable $len = 2$;
A56) $z = y + len$; if $z \le n_x + 1$, jump to step A57), otherwise jump to step A59);
A57) Extract a stroke subsequence from the template stroke sequence, create an index item and store it: from the template stroke sequence $T_x$, extract the manually labeled types $k_y$ and $k_z$ of the $y$-th and $z$-th strokes, and add the index item $(x, y+1, z-1)$ to the index item set of the element $d_{k_y k_z}$, where $d_{k_y k_z}$ is the element in row $k_y$ and column $k_z$ of the stroke index matrix D;
A58) $len = len + 1$; jump to step A56);
A59) $y = y + 1$; if $y \le n_x - 1$, jump to step A55), otherwise jump to step A510), where $n_x$ denotes the number of strokes in the $x$-th template handwriting;
A510) $x = x + 1$; if $x \le N$, jump to step A53), otherwise jump to step A511), where $N$ denotes the number of template handwritings in the template handwriting library;
A511) End: return the stroke index matrix D.
Compared with the prior art, the invention has the following advantage: for any handwriting, including non-standard handwriting, whose written content is known, each stroke in the handwriting and its stroke type can be identified, which lays a foundation for further extraction of detailed handwriting features.
Drawings
FIG. 1 is a flow chart of a method of recognition of stroke types in an online handwriting of the present invention;
FIG. 2 is a flowchart showing the preparation stage of the stroke recognition in step A) of FIG. 1;
FIG. 3 is a specific flowchart for creating a stroke index matrix at step A5) of FIG. 2;
FIG. 4 is a flowchart showing the step B) of the stroke recognition stage in FIG. 1;
FIG. 5 is a specific flowchart of the stroke sequence recognition of step B5) of FIG. 4.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples.
A method for identifying stroke types in online handwriting, as shown in FIG. 1, comprises the following steps:
A) Stroke recognition preparation: collect a plurality of handwritings of a standard Chinese character c and segment each handwriting into strokes; manually label the correspondence between each stroke in the handwriting and each stroke in the standard Chinese character c; take as template handwriting any handwriting that has been manually labeled, or whose labels were obtained by calculation and whose accuracy has passed manual inspection; establish the template stroke sequence of each template handwriting and divide it into a plurality of template stroke subsequences; establish index items for the template stroke sequences and template stroke subsequences, each index item comprising the sequence number of the template stroke sequence in which the template stroke subsequence lies, the length of the template stroke subsequence, its start and end points in the template stroke sequence, and the types of the direct predecessor stroke and direct successor stroke of the template stroke subsequence, where the direct predecessor stroke is the stroke immediately before the first stroke of the template stroke subsequence in the template stroke sequence, and the direct successor stroke is the stroke immediately after the last stroke of the template stroke subsequence;
The standard Chinese character c is a Chinese character with standard writing style and expression form which is widely used at present;
the handwriting is a time-series signal obtained by using dedicated data sensing equipment to collect, in real time, the information generated by the movement of the pen tip during writing; the data collected by the data sensing equipment at each sampling moment includes the two-dimensional position of the pen tip;
specifically, as shown in fig. 2, the stroke recognition preparation phase includes the steps of:
A1) Obtain the standard stroke type sequence of the standard Chinese character c: for example, the stroke order of the character "王" (king) is looked up as "horizontal, horizontal, vertical, horizontal", which gives the standard stroke type sequence of the Chinese character "王": $c = \{s_1, s_2, s_3, s_4, s_5, s_6, s_7\}$, where $s_1$, $s_3$, $s_5$ and $s_7$ respectively denote the strokes "horizontal, horizontal, vertical, horizontal" of the character "王", and $s_2$, $s_4$ and $s_6$ respectively denote the connecting strokes between $s_1$ and $s_3$, between $s_3$ and $s_5$, and between $s_5$ and $s_7$; the subscript $i$ denotes the sequence number of the stroke of the character "王" in the standard stroke order database; to handle extra strokes, a stroke type SE is defined: if meaningless strokes exist in the handwriting, they are assigned to SE, which indicates that no stroke in the writing standard corresponds to them;
A2) Collect the various handwritings corresponding to the standard Chinese character c: these comprise standard handwriting of the standard Chinese character c as well as non-standard handwriting, whether it can be correctly identified by other people, by only a few people, or only by the writer, where identification means establishing the correspondence between the handwriting and the standard Chinese character c; let $h_i = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{n_i}, y_{n_i})\}$ denote the $i$-th handwriting corresponding to the standard Chinese character c obtained by the handwriting device, where $(x_j, y_j)$ denotes the pen tip position acquired by the handwriting device at moment $j$, $1 \le j \le n_i$, and $n_i$ denotes the number of sampling points in the $i$-th handwriting $h_i$;
A3) Segment the handwriting into strokes: the handwriting is segmented into strokes with a key point extraction algorithm; let $T = \{t_1, t_2, \ldots, t_N\}$ denote the time sequence of handwriting sampling points corresponding to the standard Chinese character c, where $N$ denotes the number of sampling points in the time sequence; let $KT = \{kt_1, kt_2, \ldots, kt_M\}$ denote the key point sequence obtained from T by the key point extraction algorithm, where $1 = kt_1 < kt_{i-1} < kt_i < kt_M = N$, $1 < i < M$, and $M$ denotes the number of key points in T; let $BT = \{bt_1, bt_2, \ldots, bt_{M-1}\}$ denote the stroke sequence of the handwriting that the key point sequence KT defines on T, where the $i$-th stroke $bt_i$ is delimited by the key points $kt_i$ and $kt_{i+1}$, $1 \le i \le M-1$;
A4) Manually label the correspondence between the strokes in the handwriting and the strokes in the standard Chinese character c: because handwriting is arbitrary and therefore inconsistent, the correspondence between each stroke in the handwriting and each stroke in the standard Chinese character c is labeled and confirmed manually so that correct stroke types are obtained; the stroke correspondence is one of four kinds: 1-to-1, where one stroke of the standard Chinese character c corresponds to one stroke of the handwriting; 1-to-M, where one stroke of the standard Chinese character c corresponds to several strokes of the handwriting; N-to-1, where several strokes of the standard Chinese character c correspond to one stroke of the handwriting; and N-to-M, where several strokes of the standard Chinese character c correspond to several strokes of the handwriting; handwriting that has been manually labeled, or whose stroke types were obtained by calculation and whose accuracy has passed manual inspection, is taken as template handwriting;
A5) Create the index items of the template stroke sequences and template stroke subsequences: let $C = \{s_1, s_2, \ldots, s_n\}$ denote the standard stroke type sequence of the standard Chinese character c, where $n$ denotes the number of strokes contained in C; add two new stroke types $s_0$ and $s_{n+1}$ to obtain the set $C1 = C \cup \{s_0, s_{n+1}\}$, where $s_0$ denotes the redundant virtual stroke type of the preparation phase before writing starts and $s_{n+1}$ denotes the redundant stroke type of the ending phase after writing is finished; let the stroke index matrix corresponding to the set C1 be $D = (d_{ij})$, $0 \le i \le n+1$, $0 \le j \le n+1$, where the element indices of D correspond to the stroke type numbers in the set C1;
the specific meaning of the element $d_{ij}$ is as follows: let $S_a = \{s_1, s_2, \ldots, s_m\}$ be a template stroke sequence corresponding to the set C1, and let $S1_a = \{s_{v-1}, s_v, s_{v+1}, \ldots, s_{v+k}\}$ be a subsequence of $S_a$ that begins with a stroke labeled, manually or by algorithm, as the $i$-th stroke type and ends with a stroke labeled, manually or by algorithm, as the $j$-th stroke type; then $d_{ij}$ contains an index item $(a, b, e)$ for $S1_a$, which states that in the $a$-th template stroke sequence of the standard Chinese character c there is a template stroke subsequence of length $k$ running from the $b$-th stroke to the $e$-th stroke, $k = e - b + 1$; that the immediate predecessor of the first stroke of this subsequence, i.e. the $(b-1)$-th stroke of the template stroke sequence containing it, is identified as the $i$-th stroke type in the set C1; and that the immediate successor of its last stroke, i.e. the $(e+1)$-th stroke of that template stroke sequence, is identified as the $j$-th stroke type in the set C1;
within the element $d_{ij}$ of the matrix D, the index items are grouped by length: the index item set for length $k$ contains exactly those index items that record a template stroke subsequence of $k$ strokes; a stroke of a template stroke sequence being identified as the $i$-th stroke type means that it has been labeled, manually or by algorithm, as the $i$-th stroke type in the set C1 of the standard Chinese character c, $0 \le i \le n+1$; all template handwritings $S_a$ of the standard Chinese character c are scanned, and index items of the template stroke sequence and its template stroke subsequences are established for each of them; in this step, creating the stroke index matrix corresponding to the set C1 comprises the following steps:
A51) Start;
A52) Initialize: for every element $d_{ij}$ of the stroke index matrix D, $0 \le i \le n+1$, $0 \le j \le n+1$, set the index item set of each segment length $k$, $0 \le k \le Max$, to empty, where $n$ denotes the number of strokes in the standard Chinese character c and Max denotes the preset maximum possible segment length; set the loop variable $x = 1$;
A53) Take the $x$-th manually labeled template stroke sequence $T_x = \{k_1, k_2, \ldots, k_{n_x}\}$ from the template handwriting library, where $k_j$ indicates that the $j$-th stroke of the $x$-th template handwriting has been manually labeled as corresponding to the $k_j$-th stroke of the standard Chinese character c, $0 \le k_j \le n+1$, $1 \le j \le n_x$, and $n_x$ denotes the number of strokes in the $x$-th template handwriting; construct the sequence with virtual start and end strokes $\{k_0, k_1, \ldots, k_{n_x}, k_{n_x+1}\}$, where $k_0 = 0$ denotes the virtual starting stroke of the standard Chinese character c, $k_{n_x+1} = n+1$ denotes the virtual ending stroke of the standard Chinese character c, and $n$ denotes the number of strokes in the standard Chinese character c;
A54) Set the loop variable $y = 0$;
A55) Set the loop variable $len = 2$;
A56) $z = y + len$; if $z \le n_x + 1$, jump to step A57), otherwise jump to step A59);
A57) Extract a stroke subsequence from the template stroke sequence, create an index item and store it: from the template stroke sequence $T_x$, extract the manually labeled types $k_y$ and $k_z$ of the $y$-th and $z$-th strokes, and add the index item $(x, y+1, z-1)$ to the index item set of the element $d_{k_y k_z}$, where $d_{k_y k_z}$ is the element in row $k_y$ and column $k_z$ of the stroke index matrix D;
A58) $len = len + 1$; jump to step A56);
A59) $y = y + 1$; if $y \le n_x - 1$, jump to step A55), otherwise jump to step A510), where $n_x$ denotes the number of strokes in the $x$-th template handwriting;
A510) $x = x + 1$; if $x \le N$, jump to step A53), otherwise jump to step A511), where $N$ denotes the number of template handwritings in the template handwriting library;
A511) End: return the stroke index matrix D;
A6) End;
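The index construction of steps A51) to A511) can be sketched in Python as follows. This is a minimal illustration and not part of the claimed method: the function name `build_stroke_index`, the dictionary keyed by (i, j, k) that stands in for the matrix D with its per-length index item sets, and the loop bound y ≤ n_x − 1 in step A59) are assumptions made for the example, and the limit Max is not enforced.

```python
from collections import defaultdict


def build_stroke_index(templates, n):
    """Build the stroke index for one standard character.

    templates: list of label sequences; templates[x - 1][j - 1] is k_j, the
               stroke type number (0..n+1) labeled for stroke j of template x.
    n:         number of stroke types in the standard stroke type sequence C.
    Returns a dict mapping (i, j, k) to a list of index items (x, b, e).
    """
    index = defaultdict(list)
    for x, labels in enumerate(templates, start=1):       # A53, A510
        n_x = len(labels)
        # Positions 0 and n_x + 1 hold the virtual start and end strokes.
        padded = [0] + list(labels) + [n + 1]
        for y in range(0, n_x):                            # A54, A59
            for z in range(y + 2, n_x + 2):                # A55, A56, A58
                b, e = y + 1, z - 1                        # A57
                k = e - b + 1                              # subsequence length
                index[(padded[y], padded[z], k)].append((x, b, e))
    return index
```

For a single template labeled [1, 2, 3] with n = 3, the key (0, 4, 3) holds the index item (1, 1, 3), i.e. the whole template bracketed by the two virtual strokes.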
B) Stroke recognition: according to the length of the stroke sequence to be recognized, look up in the index all template stroke subsequences whose length meets the requirement; then fetch the corresponding template stroke subsequences one by one according to the index items and call a matching algorithm to establish the correspondence between the template stroke subsequence and the stroke sequence to be recognized, forming a matching sequence; search the matching sequence for continuous one-to-one stroke correspondences of length greater than 2; if the matching sequence contains no correspondence meeting this condition, read the next index item; if none of the template handwritings referenced by the index items yields such a correspondence, mark the handwriting to be recognized as a brand-new writing form of the standard Chinese character c, and update the template handwriting and the index items manually so that the template handwriting becomes more representative; otherwise, obtain the types of the unknown strokes in the handwriting to be recognized from the continuous one-to-one stroke correspondence and the labeled stroke types; repeat this process until all strokes to be recognized have been recognized, or until only part of them have been recognized and the rest cannot be, in which case the unrecognized strokes indicate a writing form that is absent from the template handwriting and are returned for manual labeling; as shown in FIG. 4, the stroke recognition stage comprises the following steps:
B1) Let $H = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ denote the test handwriting corresponding to the standard Chinese character c that a user inputs by writing on a handwriting device; test handwriting refers to handwriting input through a handwriting device in which at least one stroke has an unknown stroke type that needs to be recognized, and stroke type recognition refers to establishing the correspondence between the strokes in the handwriting and the strokes in the standard Chinese character c;
B2) Segment the test handwriting into strokes: using the same method as in step A3), call the key point extraction algorithm to segment the test handwriting H into strokes; let $K_h = \{kh_1, kh_2, \ldots, kh_{m+1}\}$ denote the key point sequence of the test handwriting H obtained by the key point extraction algorithm, where $1 = kh_1 < kh_{i-1} < kh_i < kh_{m+1} = n$, $1 < i \le m$, and $n$ denotes the number of sampling points in the test handwriting H; let $B_h = \{bh_1, bh_2, \ldots, bh_m\}$ denote the stroke sequence obtained by dividing the test handwriting H at the key point sequence $K_h$, where the $i$-th stroke $bh_i$ of $B_h$ is delimited by the key points $kh_i$ and $kh_{i+1}$, $1 \le i \le m$, and $m$ denotes the number of strokes in the stroke sequence;
B3) Initialize the recognized stroke set $G_x$, the unrecognized stroke sequence set $B_x$ and the stroke sequence set to be recognized $U_x$:
the stroke sequence set to be recognized $U_x$ is a set of quadruples g1 = (p1, p2, p3, p4), where p1 and p2 respectively denote the sequence number in $B_h$ of the first stroke of the stroke sequence to be recognized and the length of that stroke sequence, p3 denotes the sequence number in the set C1 of the stroke type identified for the immediate predecessor stroke of the stroke sequence to be recognized, and p4 denotes the sequence number in the set C1 of the stroke type identified for the immediate successor stroke of the stroke sequence to be recognized;
a recognized stroke is a stroke in the test handwriting stroke sequence whose stroke type has been determined by the stroke type recognition algorithm from the labeled stroke type data in the template handwriting library;
an unrecognized stroke is a stroke in the test handwriting stroke sequence whose stroke type the stroke type recognition algorithm was unable to determine from the labeled stroke type data in the template handwriting library;
a stroke to be recognized is a stroke that still has to be processed by the stroke type recognition algorithm and whose stroke type has not yet been determined;
in the initial state, the types of all strokes in the test handwriting stroke sequence are unknown;
the immediate predecessor stroke of an unrecognized stroke sequence is defined as follows: if the first stroke of the unrecognized stroke sequence has sequence number $q$ in $B_h$ with $1 < q \le m$, the immediate predecessor stroke has sequence number $q - 1$; for an unrecognized stroke sequence whose first stroke has sequence number 1 in $B_h$, the immediate predecessor stroke has sequence number 0 and stroke type $s_0$, whose sequence number in the set C1 is 0;
the immediate successor stroke of an unrecognized stroke sequence is defined as follows: if the last stroke of the unrecognized stroke sequence has sequence number $r$ in $B_h$ with $1 \le r < m$, the immediate successor stroke has sequence number $r + 1$; for an unrecognized stroke sequence whose last stroke has sequence number $m$ in $B_h$, the immediate successor stroke has sequence number $m + 1$ and stroke type $s_{n+1}$, whose sequence number in the set C1 is $n + 1$, where $m$ denotes the number of strokes in the test handwriting stroke sequence $B_h$ and $n$ denotes the number of strokes contained in the standard stroke type sequence C;
the recognized stroke set $G_x$ consists of two-tuples g2 = (p5, p6), where p5 denotes the sequence number of the recognized stroke in the test handwriting stroke sequence $B_h$ and p6 denotes the sequence number of its stroke type in the standard stroke type sequence C; the two-tuple g2 = (p5, p6) states that the p5-th stroke of the test handwriting stroke sequence $B_h$ is identified as the p6-th stroke type in the standard stroke type sequence C;
the unrecognized stroke sequence set $B_x$ consists of quadruples g3 = (p7, p8, p9, p10), where p7 and p8 respectively denote the sequence number in the test handwriting stroke sequence $B_h$ of the first stroke of the unrecognized stroke subsequence and the length of that subsequence, and p9 and p10 have the same meanings as p3 and p4 in the quadruple g1;
initialize the stroke sequence set to be recognized as $U_x = \{(1, m, 0, n+1)\}$, and initialize the recognized stroke set $G_x$ and the unrecognized stroke sequence set $B_x$ as empty;
B4) If the stroke sequence set to be recognized $U_x$ is not empty, arbitrarily select an element $g1 \in U_x$, remove the element g1 from the set $U_x$, and go to step B5); otherwise the set $U_x$ is empty, the recognition task is finished, and the method jumps to step B9);
B5) Recognize the types of the stroke subsequence to be recognized: perform stroke sequence recognition with the four components of the selected element g1 as parameters to obtain a recognized stroke subsequence set G1 and an unrecognized stroke subsequence set U1, where G1 consists of two-tuples whose components have the same meanings as those of the two-tuple g2 in step B3), and U1 consists of quadruples whose components have the same meanings as those of the quadruple g3 in step B3); as shown in FIG. 5, performing stroke sequence recognition with the four components of the selected element g1 as parameters comprises the following steps:
B51) Input parameters: the test stroke sequence $T = \{t_1, t_2, \ldots, t_m\}$; the number $m$ of strokes in the test stroke sequence T; the sequence number p1 in T of the first stroke of the stroke subsequence to be recognized and the length p2 of that subsequence; the sequence numbers p3 and p4 in the set C1 of the stroke types of the predecessor and successor strokes of the stroke subsequence to be recognized; the standard Chinese character c to which the handwriting to be recognized relates; and the stroke index matrix D built from the template handwritings of the standard Chinese character c;
according to the input parameters p1 and p2, intercept from the test stroke sequence $T = \{t_1, t_2, \ldots, t_m\}$ the stroke subsequence to be recognized $T1 = \{t_{p1}, t_{p1+1}, \ldots, t_{p1+p2-1}\}$;
initialize the output of the algorithm: set the recognized stroke subsequence set G1 to empty and the unrecognized stroke subsequence set U1 to empty;
B52) Obtain all index item sets that meet the condition: obtain the index item sets of all relevant template handwritings by querying the stroke index matrix D; read the element $d_{ij}$ of the stroke index matrix D and, from $d_{ij}$, read all index item sets whose stroke count $k$ lies in the range $b1 \le k \le b2$, where $b1 = p2 \cdot (1 - 10\%)$ and $b2 = p2 \cdot (1 + 10\%)$; take the union of these index item sets to obtain the set Y of index items meeting the condition, and let $w$ denote the number of index items in the set Y;
B53) Initialize the index item subscript $a = 1$;
B54) Read the template stroke subsequence defined by the $a$-th index item: read the $a$-th element $(p_a, b_a, e_a)$ of the set Y; read the $p_a$-th template stroke sequence, and the number of strokes it contains, from the template handwriting library corresponding to the standard Chinese character c; from this template stroke sequence, intercept the template stroke subsequence S1 that starts at the $b_a$-th stroke and ends at the $e_a$-th stroke;
B55) Calculate the correspondence between the stroke subsequence to be recognized T1 and the template stroke subsequence S1: the stroke correspondence between the template stroke subsequence S1 and the stroke subsequence to be recognized T1 is calculated by a stroke matching algorithm;
B56) From the correspondence obtained in step B55), calculate the ratio of the total length of the matched strokes to the total length of all strokes: let $R = \{(r_1, q_1), (r_2, q_2), \ldots, (r_x, q_x)\}$ denote the stroke correspondence between S1 and T1 obtained by the stroke matching algorithm, where any two adjacent matching items in R satisfy $r_i \le r_{i+1}$ and $q_i \le q_{i+1}$, $1 \le i < x$, and the matching item $(r_i, q_i) \in R$, $1 \le i \le x$, states that the $r_i$-th stroke of the template stroke subsequence S1 and the $q_i$-th stroke of the stroke subsequence to be recognized T1 match each other; let len1 be the sum of len(·) over the strokes of S1 whose sequence numbers lie in $D1 = \{r_d \mid (r_d, q_d) \in R,\ 1 \le d \le x\}$, and let len2 be the sum of len(·) over the strokes of T1 whose sequence numbers lie in $D2 = \{q_d \mid (r_d, q_d) \in R,\ 1 \le d \le x\}$; let len3 and len4 respectively denote the sum of all stroke lengths in the template stroke subsequence S1 and in the stroke subsequence to be recognized T1, where the symbol len(·) denotes the length of a stroke; calculate the ratios $Rat1 = len1 / len3$ and $Rat2 = len2 / len4$;
B57) If both Rat1 and Rat2 exceed the preset threshold, go to step B58); otherwise the writing forms of the template stroke subsequence S1 and the stroke subsequence to be recognized T1 differ considerably, so jump to step B514) and select the next template handwriting for matching;
B58) Search the correspondence R for all continuous one-to-one matching subsequences of length greater than 2: let $R^* = \{(r_d, q_d), (r_{d+1}, q_{d+1}), \ldots, (r_{d+h}, q_{d+h})\}$ denote a matching subsequence of the correspondence R; $R^*$ is called a continuous one-to-one match if and only if the conditions $r_{d+j} + 1 = r_{d+j+1}$ and $q_{d+j} + 1 = q_{d+j+1}$ both hold for all $j$ with $0 \le j < h$; the number of matching items in the matching subsequence $R^*$ is $h + 1$; let $R1 = \{R_1, R_2, \ldots, R_f\}$ denote all continuous one-to-one matching sequences of length greater than 2 in the correspondence R, where $R_l$ is the $l$-th continuous one-to-one matching sequence of length greater than 2, with $h_l \ge 2$, $0 \le j < h_l$, $d > 0$ and $d + h_l \le x$, where $x$ denotes the number of matching items in the correspondence R and $h_l$ denotes the number of matching items in the matching subsequence $R_l$;
B59) If the number $f$ of matching subsequences in R1 is less than or equal to 0, the writing forms of the template stroke subsequence S1 and the stroke subsequence to be recognized T1 differ considerably, so jump to step B514) and select the next template handwriting for matching; otherwise go to step B510);
B510) Set the loop variable $l = 1$;
B511) Obtain the stroke types in the test handwriting from the stroke correspondence and the known stroke types in the template handwriting: take the $l$-th matching subsequence $R_l$ from R1; for the $j$-th one-to-one matching item $(r, q)$ in $R_l$, set the type of the $q$-th stroke in the test stroke sequence T to the stroke type number of the $r$-th stroke in the template stroke subsequence S1, where $R_l \in R1$, $0 \le j < h_l$, and $h_l$ denotes the number of one-to-one matching items in $R_l$; add the two-tuple formed by the test stroke sequence number $q$ and the stroke type number assigned to it to the recognized stroke set $G_x$, thereby completing the recognition of the strokes to be recognized that are defined by the matching subsequence $R_l$, and add all recognition results related to the matching subsequence $R_l$ to the recognized stroke subsequence set G1;
B512) $l = l + 1$; if $l > f$, jump to step B513), otherwise jump to step B511);
B513) Update the unrecognized stroke subsequence set U1: let $R2 = \{q \mid (r, q) \in R_l,\ R_l \in R1,\ 1 \le l \le f\}$ denote the sequence numbers of the strokes in the test handwriting stroke sequence whose stroke type has been recognized, and let $R3 = \{p1, p1+1, \ldots, p1+p2\} - R2$ denote the sequence numbers of the strokes whose stroke type has not been recognized, where p1 and p2 denote the starting position and the length of the test stroke subsequence in the test stroke sequence; sort the sequence numbers in R3 in ascending order, and let R4 denote the set of sequences of consecutive unrecognized stroke sequence numbers in the test handwriting stroke sequence, where a sequence is consecutive if any two adjacent sequence numbers in it, at positions $v$ and $v+1$, differ by exactly 1, with $0 \le v \le k_j$ and $1 \le j \le x$; here $k_j$ denotes the length of the $j$-th sequence of stroke sequence numbers and $x$ denotes the number of such sequences in R4;
for each sequence of consecutive unrecognized stroke sequence numbers in R4, $1 \le j \le x$, obtain a quadruple $(d_j, k_j + 1, y_j, o_j)$, where $d_j$ denotes the starting sequence number of the sequence of consecutive unrecognized strokes in the test handwriting stroke sequence, $k_j + 1$ denotes the length of that sequence, $y_j$ denotes the stroke type number recognized for the $(d_j - 1)$-th stroke of the test handwriting stroke sequence, and $o_j$ denotes the stroke type number recognized for the $(d_j + k_j + 1)$-th stroke of the test handwriting stroke sequence;
add the quadruples $(d_j, k_j + 1, y_j, o_j)$ corresponding to all sequences of consecutive unrecognized stroke sequence numbers in R4 to the set U1, so that $U1 = \{(d_j, k_j + 1, y_j, o_j) \mid 1 \le j \le x\}$, where $x$ denotes the number of sequences in R4, and jump to step B515);
B514) a = a + 1; if a > w, set U1 = {(p1, p2, p3, p4)} and jump to step B515); otherwise jump to step B54), where p1, p2, p3 and p4 are the input parameters of step B51);
B515) End: return the recognized stroke subsequence set G1 and the unrecognized stroke subsequence set U1;
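Step B513) can be illustrated with a minimal Python sketch. The function name `unrecognized_runs` and the dictionary `recognized` (stroke number to stroke type number) are hypothetical names introduced for illustration. The sketch assumes that the subsequence covers the stroke numbers p1 to p1+p2-1, consistent with the definition of T1 in step B51), and that a run touching the edge of the subsequence inherits p3 or p4 as its neighbouring type.

```python
def unrecognized_runs(p1, p2, p3, p4, recognized):
    """Group the unrecognized stroke numbers into quadruples (start, length, pred, succ).

    recognized maps a stroke number inside the subsequence to its stroke type number.
    """
    # R3: numbers in the subsequence that have no recognized type (assumed range p1..p1+p2-1)
    pending = [q for q in range(p1, p1 + p2) if q not in recognized]

    # R4: split the ascending numbers into runs whose neighbours differ by exactly 1
    runs = []
    for q in pending:
        if runs and q == runs[-1][-1] + 1:
            runs[-1].append(q)
        else:
            runs.append([q])

    quadruples = []
    for run in runs:
        start, end = run[0], run[-1]
        # Inside the subsequence the neighbour is a recognized stroke;
        # at the edges the input parameters p3 / p4 are reused (assumption).
        pred = recognized[start - 1] if start > p1 else p3
        succ = recognized[end + 1] if end < p1 + p2 - 1 else p4
        quadruples.append((start, len(run), pred, succ))
    return quadruples


# Strokes 3..5 of a 7-stroke subsequence were recognized as types 2, 3 and 4.
print(unrecognized_runs(1, 7, 0, 9, {3: 2, 4: 3, 5: 4}))
```

Each returned quadruple has the same layout as the input parameters (p1, p2, p3, p4), so it can be fed back into the recognition step unchanged.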
B6) If the unrecognized stroke subsequence set U1 contains exactly one element and the recognized stroke subsequence set G1 is empty, jump to step B7); otherwise jump to step B8);
B7) Moving the stroke sequence to be recognized into the unrecognized stroke sequence set $B_x$: add the element g1 to $B_x$, which means that the stroke sequence represented by the element g1 of the stroke sequence set to be recognized $U_x$ cannot be recognized by the algorithm; jump to step B4);
B8) Placing the successfully recognized strokes into the recognized stroke set $G_x$ and the unrecognized strokes into the stroke sequence set to be recognized $U_x$: add the elements of the recognized stroke subsequence set G1 to $G_x$ and the elements of the unrecognized stroke subsequence set U1 to $U_x$; jump to step B4);
B9) End of stroke recognition: if the unrecognized stroke sequence set $B_x$ is not empty, the template handwriting library contains no writing form identical or similar to the test handwriting defined by $B_x$; jump to step A4), manually label the unrecognized stroke segments, add the labeled stroke sequence to the template stroke sequences, and finally update the stroke index matrix D to make the template stroke sequences more representative. If $B_x$ is empty, all strokes in the test handwriting have been recognized successfully, and the recognition result is stored in the recognized stroke set $G_x$.
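The control flow of steps B3) to B9) amounts to a work list that is drained until nothing is left to recognize. The following Python sketch shows that loop only. The names `recognize_all` and `toy_recognizer` are hypothetical, and `toy_recognizer` is a stand-in for step B5) that labels just the middle stroke of any subsequence of three or more strokes; it is not the template matching procedure described above.

```python
def recognize_all(m, n, recognize_subsequence):
    """Drive steps B3) to B9) with a caller-supplied routine for step B5)."""
    to_do = [(1, m, 0, n + 1)]      # U_x, initialized as in step B3)
    recognized = []                 # G_x: (stroke number, stroke type number)
    failed = []                     # B_x: quadruples that could not be recognized
    while to_do:                    # B4) take any pending quadruple
        g1 = to_do.pop()
        G1, U1 = recognize_subsequence(*g1)   # B5)
        if not G1 and len(U1) == 1:           # B6) nothing was recognized
            failed.append(g1)                 # B7)
        else:
            recognized.extend(G1)             # B8)
            to_do.extend(U1)
    return sorted(recognized), sorted(failed)


def toy_recognizer(p1, p2, p3, p4):
    """Hypothetical stand-in for B5): labels only the middle stroke with type 7."""
    if p2 < 3:
        return [], [(p1, p2, p3, p4)]         # same result shape as step B514)
    mid = p1 + p2 // 2
    left = (p1, mid - p1, p3, 7)
    right = (mid + 1, p1 + p2 - 1 - mid, 7, p4)
    return [(mid, 7)], [u for u in (left, right) if u[1] > 0]


done, not_done = recognize_all(5, 8, toy_recognizer)
print(done, not_done)
```

A non-empty second result corresponds to the case in step B9) where the remaining segments are handed back for manual labeling.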
In this embodiment, the handwriting of the standard Chinese character c refers to non-standard handwriting of the character c that is written by a writer and can be correctly recognized by other people, by only a few people, or by the writer alone; the recognition result is the correspondence between the handwriting and the standard Chinese character c. Collecting the various handwritings of the standard Chinese character c means collecting, as far as possible, the various non-standard forms of expression and non-standard stroke orders of the character, including personalized handwriting with extra strokes, missing strokes, or simplified strokes.
The invention discloses a method for identifying stroke types in online handwriting, which can identify each stroke and the stroke types of each stroke in the handwriting for nonstandard arbitrary handwriting with known writing content, and lays a foundation for further extracting detailed characteristics of handwriting writing.
Parts of the invention that are not described in detail are well known in the art. The illustrative embodiments described above are intended to help those skilled in the art understand the invention; the invention is not limited to the scope of these embodiments, and all changes that fall within the spirit and scope of the invention as defined by the appended claims are protected.

Claims (4)

1. A method for identifying stroke types in online handwriting, characterized in that the method comprises the following steps:
A) Stroke recognition preparation: collect a plurality of handwritings of a standard Chinese character c, segment every handwriting into strokes, and manually label the correspondence between each stroke in the handwriting and each stroke in the standard Chinese character c; take as template handwriting any handwriting that has been labeled manually, or whose stroke types were obtained by calculation and whose calculation accuracy has passed manual inspection; establish the template stroke sequence of each template handwriting, divide the template stroke sequence into a plurality of template stroke subsequences, and establish index items for the template stroke sequence and its template stroke subsequences, where an index item comprises the number of the template stroke sequence containing the subsequence, the length of the template stroke subsequence, the start and end strokes of the subsequence in the template stroke sequence, and the stroke types of the immediately preceding and immediately following strokes of the template stroke subsequence;
The standard Chinese character c is a Chinese character with standard writing style and expression form which is widely used at present;
The handwriting is a time-ordered signal sequence obtained by using dedicated data sensing equipment to collect, in real time, the information generated by the movement of the pen tip during writing; the data collected by the data sensing equipment at each sampling moment includes the two-dimensional position of the pen tip;
B) Stroke recognition: according to the length of the stroke sequence to be recognized, search the index items for all template stroke sequences whose length meets the requirement; then fetch the corresponding template stroke subsequences one by one according to the index items, and call a matching algorithm to establish the correspondence between the template stroke subsequence and the stroke sequence to be recognized, forming a matching sequence; search the matching sequence for consecutive one-to-one stroke correspondences of length greater than 2; if no correspondence meeting the condition exists, read the next index item; if none of the template handwritings contained in all the index items yields a correspondence meeting the condition, mark the handwriting to be recognized as a brand-new writing form of the standard Chinese character c, and manually update the template handwriting and the index items to expand the representativeness of the template handwriting; otherwise, obtain the types of the unknown strokes in the handwriting to be recognized from the consecutive one-to-one stroke correspondence and the labeled types of the template strokes, which leaves a plurality of shorter stroke subsequences to be recognized; repeat the above process until all strokes of the handwriting to be recognized are recognized, or until some strokes are recognized and the remaining strokes cannot be recognized, in which case the unrecognized part is returned for manual labeling; wherein recognizing the types of a stroke subsequence to be recognized comprises the following steps:
B51) Input parameters: the test stroke sequence $T = \{t_1, t_2, \ldots, t_m\}$; the number m of strokes in T; the number p1 of the first stroke of the stroke subsequence to be recognized in T and the length p2 of that subsequence; the numbers p3 and p4, in the set C1, of the stroke types of the immediately preceding and immediately following strokes of the stroke subsequence to be recognized; the standard Chinese character c to which the handwriting to be recognized relates; and the stroke index matrix D established from the template handwriting of the standard Chinese character c;
According to the input parameters p1 and p2, the stroke subsequence to be recognized $T1 = \{t_{p1}, t_{p1+1}, \ldots, t_{p1+p2-1}\}$ is cut out of the test stroke sequence $T = \{t_1, t_2, \ldots, t_m\}$;
Initializing the output of the algorithm: the recognized stroke subsequence set G1 and the unrecognized stroke subsequence set U1 are both set to empty;
B52) Acquiring all index item sets that meet the condition: the index item sets of all relevant template handwritings are obtained by querying the stroke index matrix D. The element $d_{ij}$ of D is read, and from $d_{ij}$ all index item sets $d_{ij}^{k}$ that meet the condition are read, where the stroke count k ranges over $b1 \le k \le b2$ with $b1 = p2 \cdot (1 - 10\%)$ and $b2 = p2 \cdot (1 + 10\%)$. The union of these index item sets, $Y = \bigcup_{k=b1}^{b2} d_{ij}^{k}$, is the set of index items meeting the condition, and w denotes the number of index items in Y;
B53) Initialize the index item subscript a = 1;
B54) Reading the template stroke subsequence defined by the a-th index item: read the a-th element $(p_a, b_a, e_a)$ of the set Y, and read the $p_a$-th template stroke sequence $S^{p_a} = \{s_1, s_2, \ldots, s_{n_{p_a}}\}$ from the template handwriting library of the standard Chinese character c, where $n_{p_a}$ is the number of strokes in $S^{p_a}$; cut out the template stroke subsequence $S1 = \{s_{b_a}, s_{b_a+1}, \ldots, s_{e_a}\}$ that starts at the $b_a$-th stroke and ends at the $e_a$-th stroke of the template handwriting;
B55) Calculating the correspondence between the stroke subsequence to be recognized T1 and the template stroke subsequence S1: the stroke correspondence between S1 and T1 is calculated by a stroke matching algorithm;
B56) Calculating, from the correspondence obtained in step B55), the ratio of the total length of the matched strokes to the total length of all strokes: let $R = \{(r_1, q_1), (r_2, q_2), \ldots, (r_x, q_x)\}$ denote the stroke correspondence between S1 and T1 obtained by the stroke matching algorithm, where any two adjacent matching items in R satisfy $r_i \le r_{i+1}$ and $q_i \le q_{i+1}$, $1 \le i < x$, and the matching item $(r_i, q_i) \in R$, $1 \le i \le x$, indicates that the $r_i$-th stroke of the template stroke subsequence S1 and the $q_i$-th stroke of the stroke subsequence to be recognized T1 match each other. Let $len1 = \sum_{r \in D1} len(s_r)$, where $D1 = \{r_d \mid (r_d, q_d) \in R,\ 1 \le d \le x\}$, and $len2 = \sum_{q \in D2} len(t_q)$, where $D2 = \{q_d \mid (r_d, q_d) \in R,\ 1 \le d \le x\}$. Let len3 and len4 denote the sums of all stroke lengths in S1 and T1 respectively, where len(·) is the length of a stroke. The ratios are $Rat1 = len1/len3$ and $Rat2 = len2/len4$;
B57) If both Rat1 and Rat2 exceed the preset threshold, go to step B58); otherwise the writing of the template stroke subsequence S1 differs substantially from that of the stroke subsequence to be recognized T1; jump to step B514) and select the next template handwriting for matching;
B58) Searching the correspondence R for all consecutive one-to-one matching subsequences of length greater than 2: let $R^* = \{(r_d, q_d), (r_{d+1}, q_{d+1}), \ldots, (r_{d+h}, q_{d+h})\}$ denote a matching subsequence of R. $R^*$ is called a consecutive one-to-one match if and only if, for all j with $0 \le j < h$, the conditions $r_{d+j} + 1 = r_{d+j+1}$ and $q_{d+j} + 1 = q_{d+j+1}$ both hold; the number of matching items in $R^*$ is h + 1. Let $R1 = \{R_1, R_2, \ldots, R_f\}$ denote all consecutive one-to-one matching sequences of length greater than 2 in R, where $R_l = \{(r_{d+j}, q_{d+j})\}$ is the l-th consecutive one-to-one matching sequence of length greater than 2, with $h_l \ge 2$, $0 \le j < h_l$, $d > 0$ and $d + h_l \le x$; x is the number of matching items in R and $h_l$ denotes the number of matching items in the matching subsequence $R_l$;
B59) If the number f of matching subsequences in R1 satisfies $f \le 0$, the writing of the template stroke subsequence S1 differs substantially from that of the stroke subsequence to be recognized T1; jump to step B514) and select the next template handwriting for matching; otherwise go to step B510);
B510) Set the loop variable l = 1;
B511) Obtaining the stroke types in the test handwriting from the stroke correspondence and the known stroke types in the template handwriting: take the l-th matching subsequence $R_l$ from R1; for the j-th one-to-one matching item $(r_{d+j}, q_{d+j})$ in $R_l$, set the type of the $q_{d+j}$-th stroke of the test stroke sequence T to the stroke type number of the $r_{d+j}$-th stroke of the template stroke subsequence S1, where $R_l \in R1$, $0 \le j < h_l$, and $h_l$ is the number of one-to-one matching items in $R_l$; add the two-tuple formed by the stroke number and its stroke type number to the recognized stroke set $G_x$, which completes the recognition of the strokes to be recognized defined by the matching subsequence $R_l$, and add all recognition results related to $R_l$ to the recognized stroke subsequence set G1;
B512) l = l + 1; if l > f, jump to step B513); otherwise jump to step B511);
B513) Updating the unrecognized stroke subsequence set U1: let $R2 = \{q \mid (r, q) \in R_l,\ R_l \in R1,\ 1 \le l \le f\}$ denote the numbers of the strokes in the test handwriting stroke sequence whose stroke type has been recognized, and let $R3 = \{p1, p1+1, \ldots, p1+p2\} - R2$ denote the numbers of the strokes whose stroke type has not been recognized, where p1 and p2 are the starting position and the length of the test stroke subsequence in the test stroke sequence. Sort the numbers in R3 in ascending order and let $R4 = \{R4_1, R4_2, \ldots, R4_x\}$ denote the set of consecutive unrecognized number sequences in the test handwriting stroke sequence, where $R4_j = \{d_j, d_j+1, \ldots, d_j+k_j\}$, $1 \le j \le x$. A consecutive unrecognized number sequence is one in which any two adjacent numbers differ by exactly 1; the j-th number sequence contains $k_j + 1$ numbers, and x is the number of number sequences in R4;
For each consecutive unrecognized stroke number sequence $R4_j$ in R4, $1 \le j \le x$, a quadruple $(d_j, k_j+1, y_j, o_j)$ is obtained, where $d_j$ is the starting number of the consecutive unrecognized stroke sequence in the test handwriting stroke sequence, $k_j+1$ is the length of the consecutive unrecognized stroke sequence, $y_j$ is the stroke type number recognized for the $(d_j-1)$-th stroke of the test handwriting stroke sequence, and $o_j$ is the stroke type number recognized for the $(d_j+k_j+1)$-th stroke of the test handwriting stroke sequence;
The quadruples corresponding to all consecutive unrecognized stroke number sequences in R4 are added to the set U1, i.e. $U1 = \{(d_j, k_j+1, y_j, o_j) \mid 1 \le j \le x\}$, where x is the number of number sequences in R4; jump to step B515);
B514) a = a + 1; if a > w, set U1 = {(p1, p2, p3, p4)} and jump to step B515); otherwise jump to step B54), where p1, p2, p3 and p4 are the input parameters of step B51);
B515) End: return the recognized stroke subsequence set G1 and the unrecognized stroke subsequence set U1.
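Steps B56) and B58) can be sketched in Python as follows. The function names `match_ratios` and `consecutive_runs` and the parameter `min_items` are hypothetical. The stroke matching algorithm of step B55) is not specified here, so the correspondence R is taken as an input list of 1-based (r, q) pairs. Reading "length greater than 2" as "at least three matching items" is an assumption, and the run test checks only the stated condition that r and q both increase by exactly 1.

```python
def match_ratios(R, template_lengths, test_lengths):
    """B56): share of matched stroke length in S1 (Rat1) and in T1 (Rat2)."""
    matched_r = {r for r, _ in R}       # D1: matched template stroke numbers
    matched_q = {q for _, q in R}       # D2: matched test stroke numbers
    len1 = sum(template_lengths[r - 1] for r in matched_r)
    len2 = sum(test_lengths[q - 1] for q in matched_q)
    return len1 / sum(template_lengths), len2 / sum(test_lengths)


def consecutive_runs(R, min_items=3):
    """B58): maximal runs of R in which r and q both increase by exactly 1."""
    runs, current = [], []
    for pair in R:
        if current and pair[0] == current[-1][0] + 1 and pair[1] == current[-1][1] + 1:
            current.append(pair)
        else:
            if len(current) >= min_items:
                runs.append(current)
            current = [pair]
    if len(current) >= min_items:
        runs.append(current)
    return runs


R = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 6)]
rat1, rat2 = match_ratios(R, [1.0, 1.0, 2.0, 1.0, 1.0, 4.0], [2.0, 2.0, 2.0, 2.0, 1.0, 1.0])
print(rat1, rat2, consecutive_runs(R))
```

In this example only the first three pairs form a qualifying run; the pair (4, 5) breaks it because the test stroke number jumps from 3 to 5.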
2. The method for identifying stroke types in online handwriting according to claim 1, characterized in that the stroke recognition preparation stage in step A) comprises the following steps:
A1) Consulting reference material and establishing a database of the standard stroke order of Chinese characters, where the Chinese characters include meaningful Chinese characters and meaningless symbols written according to the writing rules of Chinese characters;
A2) Collecting the various handwritings corresponding to the standard Chinese character c: these are the various standard handwritings of the character c and the non-standard handwritings that can be correctly recognized by other people, by only a few people, or by the writer alone, where recognition means establishing the correspondence between the handwriting and the standard Chinese character c. Let $h_i = \{(x_1, y_1), (x_2, y_2), \ldots, (x_{n_i}, y_{n_i})\}$ denote the i-th handwriting of the standard Chinese character c obtained from the handwriting device, where $(x_j, y_j)$ is the pen-tip position collected by the handwriting device at moment j, $1 \le j \le n_i$, and $n_i$ is the number of sampling points in the i-th handwriting $h_i$;
A3) Segmenting the handwriting into strokes: the handwriting is segmented into strokes by a key point extraction algorithm. Let $T = \{t_1, t_2, \ldots, t_N\}$ denote the time sequence of handwriting sampling points of the standard Chinese character c, where N is the number of sampling points in the time sequence, and let $KT = \{kt_1, kt_2, \ldots, kt_M\}$ denote the key point sequence of T obtained by the key point extraction algorithm, where $1 = kt_1 < \cdots < kt_{i-1} < kt_i < \cdots < kt_M = N$, $1 < i < M$, and M is the number of key points in T. Let $BT = \{bt_1, bt_2, \ldots, bt_{M-1}\}$ denote the stroke sequence of the handwriting defined by the key point sequence KT, where the i-th stroke $bt_i$ is delimited by the key points $kt_i$ and $kt_{i+1}$, $1 \le i \le M-1$;
A4) Manually labeling the correspondence between the strokes in the handwriting and the strokes in the standard Chinese character c: to overcome the inconsistency caused by the randomness of handwriting and to obtain correct stroke types, the correspondence between each stroke in the handwriting and each stroke in the standard Chinese character c is labeled and confirmed manually. The stroke correspondence is one of four types: 1 to 1, where one stroke of the standard Chinese character c corresponds to one stroke of the handwriting; 1 to M, where one stroke of the standard Chinese character c corresponds to several strokes of the handwriting; N to 1, where several strokes of the standard Chinese character c correspond to one stroke of the handwriting; and N to M, where several strokes of the standard Chinese character c correspond to several strokes of the handwriting. Handwriting that has been labeled manually, or whose stroke types were obtained by calculation and whose calculation accuracy has passed manual inspection, is taken as template handwriting;
A5) Establishing index items for the template stroke sequences and template stroke subsequences: let $C = \{s_1, s_2, \ldots, s_n\}$ denote the standard stroke type sequence of the standard Chinese character c, where n is the number of strokes contained in C. Two new stroke types $s_0$ and $s_{n+1}$ are added to C to obtain the set $C1 = C \cup \{s_0, s_{n+1}\}$, where $s_0$ denotes the redundant virtual stroke type of the preparation stage before writing starts and $s_{n+1}$ denotes the redundant stroke type of the ending stage after writing is finished. Let the stroke index matrix corresponding to the set C1 be $D = [d_{ij}]$, $0 \le i \le n+1$, $0 \le j \le n+1$, where the element indices of D correspond to the stroke type numbers in C1. The meaning of $d_{ij}^{k}$ is as follows: let $S_a = \{s_1, s_2, \ldots, s_m\}$ be a template stroke sequence corresponding to C1, and let $S1_a = \{s_{v-1}, s_v, s_{v+1}, \ldots, s_{v+k}\}$ be a subsequence of $S_a$ that begins with a stroke labeled, manually or by an algorithm, as the i-th stroke type and ends with a stroke labeled, manually or by an algorithm, as the j-th stroke type. Then $d_{ij}^{k}$ contains an index item $(a, b, e)$ for $S1_a$, which states that in the a-th template stroke sequence of the standard Chinese character c there is a template stroke subsequence of length k that starts at the b-th stroke and ends at the e-th stroke, k = e - b + 1; the immediately preceding stroke of the first stroke of this subsequence, i.e. the (b-1)-th stroke of the template stroke sequence containing it, is labeled as the i-th stroke type in C1, and the immediately following stroke of its last stroke, i.e. the (e+1)-th stroke of the template stroke sequence containing it, is labeled as the j-th stroke type in C1. The element $d_{ij}^{k}$ of D is thus a set of index items, each of which records a template stroke subsequence of length k of some template stroke sequence whose immediately preceding stroke is labeled, manually or by an algorithm, as the i-th stroke type and whose immediately following stroke is labeled as the j-th stroke type, these being stroke types of the set C1 of the standard Chinese character c, $0 \le i \le n+1$, $0 \le j \le n+1$. All template handwritings $S_a$ of the standard Chinese character c are scanned, and index items of the template stroke sequences and template stroke subsequences are established for all of them;
A6) End.
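The segmentation in step A3) can be illustrated with a short Python sketch. The key point extraction algorithm itself is not specified here, so the key point sequence KT is taken as an input; the function name `split_into_strokes` is hypothetical. The sketch assumes that adjacent strokes share their common key point, which follows from stroke $bt_i$ being delimited by $kt_i$ and $kt_{i+1}$.

```python
def split_into_strokes(points, key_points):
    """Cut a sampled trajectory into strokes at 1-based key point indices.

    points:     list of (x, y) pen-tip positions, t_1 .. t_N
    key_points: ascending indices kt_1 .. kt_M with kt_1 = 1 and kt_M = N
    """
    if key_points[0] != 1 or key_points[-1] != len(points):
        raise ValueError("key points must start at 1 and end at N")
    strokes = []
    for start, end in zip(key_points, key_points[1:]):
        # Stroke bt_i runs from kt_i to kt_{i+1}, both end points included.
        strokes.append(points[start - 1:end])
    return strokes


trajectory = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
for stroke in split_into_strokes(trajectory, [1, 3, 5]):
    print(stroke)
```

With M key points the function returns M - 1 strokes, matching the size of the stroke sequence BT.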
3. The method for identifying stroke types in online handwriting according to claim 2, characterized in that the stroke recognition stage in step B) comprises the following steps:
B1) Let $H = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ denote the test handwriting of the standard Chinese character c that a user inputs by writing on a handwriting device, where test handwriting means handwriting input through the handwriting device in which the stroke type of at least one stroke is unknown and needs to be recognized, and recognizing a stroke type means establishing the correspondence between the strokes in the handwriting and the strokes in the standard Chinese character c;
B2) Segmenting the test handwriting into strokes: using the same method as step A3), the key point extraction algorithm is called to segment the test handwriting H into strokes. Let $K_h = \{kh_1, kh_2, \ldots, kh_{m+1}\}$ denote the key point sequence of H obtained by the key point extraction algorithm, where $1 = kh_1 < \cdots < kh_{i-1} < kh_i < \cdots < kh_{m+1} = n$, $1 < i \le m$, and n is the number of sampling points in H. Let $B_h = \{bh_1, bh_2, \ldots, bh_m\}$ denote the stroke sequence obtained by dividing H at the key point sequence $K_h$, where the i-th stroke $bh_i$ of $B_h$ is delimited by the key points $kh_i$ and $kh_{i+1}$, $1 \le i \le m$, and m is the number of strokes in the stroke sequence;
B3) Initializing the recognized stroke set $G_x$, the unrecognized stroke sequence set $B_x$ and the stroke sequence set to be recognized $U_x$:
The stroke sequence set to be recognized $U_x$ is a set of quadruples g1 = (p1, p2, p3, p4), where p1 and p2 are the number, in $B_h$, of the first stroke of the stroke sequence to be recognized and the length of that sequence, p3 is the number in the set C1 of the stroke type recognized for the immediately preceding stroke of the stroke sequence to be recognized, and p4 is the number in the set C1 of the stroke type recognized for the immediately following stroke of the stroke sequence to be recognized;
A recognized stroke is a stroke of the test handwriting stroke sequence whose stroke type has been determined by the stroke type recognition algorithm from the labeled stroke type data in the template handwriting library;
An unrecognized stroke is a stroke of the test handwriting stroke sequence whose stroke type cannot be determined by the stroke type recognition algorithm from the labeled stroke type data in the template handwriting library;
A stroke to be recognized is a stroke that still has to be processed by the stroke type recognition algorithm and whose stroke type has not yet been determined;
In the initial state, the types of all strokes in the test handwriting stroke sequence are unknown;
The immediately preceding stroke of an unrecognized stroke sequence is defined as follows: if the first stroke of the unrecognized stroke sequence has the number q in $B_h$ with $1 < q \le m$, the number of its immediately preceding stroke is q - 1; for an unrecognized stroke sequence whose first stroke has the number 1 in $B_h$, the number of the immediately preceding stroke is 0, and the type of the stroke with number 0 is $s_0$, whose number in the set C1 is 0;
The immediately following stroke of an unrecognized stroke sequence is defined as follows: if the last stroke of the unrecognized stroke sequence has the number r in $B_h$ with $1 \le r < m$, the number of its immediately following stroke is r + 1; for an unrecognized stroke subsequence whose last stroke has the number m in $B_h$, the number of the immediately following stroke is m + 1, and the type of that stroke is $s_{n+1}$, whose number in the set C1 is n + 1, where m is the number of strokes in the test handwriting stroke sequence $B_h$ and n is the number of strokes contained in the standard stroke type sequence C;
The recognized stroke set $G_x$ consists of two-tuples g2 = (p5, p6), where p5 is the number of the recognized stroke in the test handwriting stroke sequence $B_h$ and p6 is the number of its stroke type in the standard stroke type sequence C; the two-tuple g2 = (p5, p6) indicates that the p5-th stroke of $B_h$ is recognized as the p6-th stroke type of the standard stroke type sequence C;
The unrecognized stroke sequence set $B_x$ consists of quadruples g3 = (p7, p8, p9, p10), where p7 and p8 are the number, in the test handwriting stroke sequence $B_h$, of the first stroke of the unrecognized stroke subsequence and the length of that subsequence, and p9 and p10 have the same meanings as p3 and p4 in the quadruple g1;
Initialize the stroke sequence set to be recognized $U_x = \{(1, m, 0, n+1)\}$, initialize the recognized stroke set $G_x$ to empty, and initialize the unrecognized stroke sequence set $B_x$ to empty;
B4) If the stroke sequence set to be recognized $U_x$ is not empty, select an arbitrary element $g1 \in U_x$, remove g1 from $U_x$, and go to step B5); otherwise $U_x$ is empty and the recognition task is finished; jump to step B9);
B5) Recognizing the types of the stroke subsequence to be recognized: perform stroke sequence recognition with the four components of the selected element g1 as parameters to obtain the recognized stroke subsequence set G1 and the unrecognized stroke subsequence set U1, where G1 consists of two-tuples whose components have the same meanings as those of the two-tuple g2 in step B3), and U1 consists of quadruples whose components have the same meanings as those of the quadruple g3 in step B3);
B6) If the unrecognized stroke subsequence set U1 contains exactly one element and the recognized stroke subsequence set G1 is empty, jump to step B7); otherwise jump to step B8);
B7) Moving the stroke sequence to be recognized into the unrecognized stroke sequence set $B_x$: add the element g1 to $B_x$, which means that the stroke sequence represented by the element g1 of the stroke sequence set to be recognized $U_x$ cannot be recognized by the algorithm; jump to step B4);
B8) Placing the successfully recognized strokes into the recognized stroke set $G_x$ and the unrecognized strokes into the stroke sequence set to be recognized $U_x$: add the elements of the recognized stroke subsequence set G1 to $G_x$ and the elements of the unrecognized stroke subsequence set U1 to $U_x$; jump to step B4);
B9) End of stroke recognition: if the unrecognized stroke sequence set $B_x$ is not empty, the template handwriting library contains no writing form identical or similar to the test handwriting defined by $B_x$; jump to step A4), manually label the unrecognized stroke segments, add the labeled stroke sequence to the template stroke sequences, and finally update the stroke index matrix D to make the template stroke sequences more representative. If $B_x$ is empty, all strokes in the test handwriting have been recognized successfully, and the recognition result is stored in the recognized stroke set $G_x$.
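The conventions of step B3) for the immediately preceding and immediately following strokes can be written as a small Python helper. The name `neighbour_types` and the dictionary `recognized` (stroke number to stroke type number, playing the role of $G_x$) are hypothetical. The sketch assumes that any neighbouring stroke inside $B_h$ has already been recognized, which holds for the quadruples produced during recognition.

```python
def neighbour_types(start, length, recognized, m, n):
    """Return (p3, p4) for the stroke subsequence start .. start+length-1 of B_h.

    m: number of strokes in the test handwriting stroke sequence B_h
    n: number of strokes in the standard stroke type sequence C
    """
    end = start + length - 1
    # Virtual stroke 0 has type s_0 (number 0 in C1).
    pred = 0 if start == 1 else recognized[start - 1]
    # Virtual stroke m+1 has type s_{n+1} (number n+1 in C1).
    succ = n + 1 if end == m else recognized[end + 1]
    return pred, succ


# The whole test handwriting: bounded by the two virtual strokes.
print(neighbour_types(1, 5, {}, m=5, n=8))
# Strokes 2..3, with strokes 1 and 4 already recognized as types 1 and 3.
print(neighbour_types(2, 2, {1: 1, 4: 3}, m=5, n=8))
```

The first call reproduces the last two components of the initial quadruple (1, m, 0, n+1).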
4. The method for identifying stroke types in online handwriting according to claim 2, characterized in that establishing the stroke index matrix corresponding to the set C1 in step A5) comprises the following steps:
A51) Start;
A52) Initialization: set each element of the stroke index matrix D to empty, $d_{ij} = \emptyset$, $0 \le i \le n+1$, $0 \le j \le n+1$, and each $d_{ij}^{k} = \emptyset$, $0 \le k \le Max$, where n is the number of strokes in the standard Chinese character c and Max is the preset maximum possible subsequence length; set the loop variable x = 1;
A53) Take the x-th manually labeled template stroke sequence $T_x = \{k_1, k_2, \ldots, k_{n_x}\}$ from the template handwriting library, where $k_j$ indicates that the j-th stroke of the x-th template handwriting is manually labeled as corresponding to the $k_j$-th stroke of the standard Chinese character c, $0 \le k_j \le n+1$, $1 \le j \le n_x$, and $n_x$ is the number of strokes in the x-th template handwriting; construct the sequence with virtual start and end strokes $T_x = \{k_0, k_1, k_2, \ldots, k_{n_x}, k_{n_x+1}\}$, where $k_0 = 0$ denotes the virtual start stroke of the standard Chinese character c, $k_{n_x+1} = n+1$ denotes the virtual end stroke of the standard Chinese character c, and n is the number of strokes in the standard Chinese character c;
a54 Setting a cyclic variable y=0;
A55 A set loop variable len=2;
a56 Z=y+len, if z.ltoreq.n) x +1, jump to step a 57), otherwise jump to step a 59);
a57) Extract a stroke subsequence from the template stroke sequence, create an index item and store it: from the template stroke sequence T_x, take the manually labeled types k_y and k_z of the y-th and z-th segments, and add the index item (x, y+1, z-1) to the set held in the element at row k_y, column k_z of the stroke index matrix D;
a58) Set len = len + 1 and jump to step a56);
a59) Set y = y + 1; if y ≤ n_x - 1, jump to step a55), otherwise jump to step a510), where n_x denotes the number of strokes in the x-th template handwriting;
a510) Set x = x + 1; if x ≤ N, jump to step a53), otherwise jump to step a511), where N denotes the number of template handwritings in the template handwriting library;
a511) End; return the stroke index matrix D.
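The loop structure of steps a51) to a511) can be sketched in Python as below. This is an illustrative reading under stated assumptions, not the patent's reference implementation: the function name `build_stroke_index` is hypothetical, the matrix D is modeled as a dictionary keyed by (k_y, k_z) whose values are sets of index items, the virtual end stroke is assumed to carry the label n+1, and the third index k (0 ≤ k ≤ Max) is not modeled because its role cannot be recovered from the text.

```python
from collections import defaultdict


def build_stroke_index(templates, n):
    """Build a stroke index from manually labeled template stroke sequences.

    templates: list of label sequences; templates[x-1][j-1] is k_j for template x
    n:         number of strokes in the standard Chinese character c
    Returns a dict mapping (k_y, k_z) to a set of index items (x, y+1, z-1).
    """
    index = defaultdict(set)  # stands in for the stroke index matrix D
    for x, labels in enumerate(templates, start=1):  # a53 / a510: x = 1..N
        n_x = len(labels)
        # Assumption: virtual start stroke labeled 0, virtual end stroke n + 1.
        seq = [0] + list(labels) + [n + 1]
        for y in range(n_x):  # a54 / a59: y = 0..n_x - 1
            for z in range(y + 2, n_x + 2):  # a55-a58: len = 2.., z <= n_x + 1
                # a57: the strokes y+1..z-1 of template x lie between
                # a stroke labeled k_y and a stroke labeled k_z.
                index[(seq[y], seq[z])].add((x, y + 1, z - 1))
    return index
```

For example, `build_stroke_index([[1, 2, 3]], n=3)` files the whole template (strokes 1 to 3) under the key (0, 4), that is, between the virtual start and virtual end strokes, so a lookup by the labels of two already recognized neighbors returns every template segment that was written between them.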
CN201911224894.1A 2019-12-04 2019-12-04 Method for identifying stroke types in online handwriting Active CN111310548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911224894.1A CN111310548B (en) 2019-12-04 2019-12-04 Method for identifying stroke types in online handwriting

Publications (2)

Publication Number Publication Date
CN111310548A (en) 2020-06-19
CN111310548B (en) 2023-09-19

Family

ID=71147088

Country Status (1)

Country Link
CN (1) CN111310548B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541328B (en) * 2020-12-07 2022-04-01 四川大学 Handwriting storage method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0922443A (en) * 1995-07-05 1997-01-21 Nippon Telegr & Teleph Corp <Ntt> On-line handwritten character recognition method
JPH09147052A (en) * 1995-11-27 1997-06-06 Ricoh Co Ltd On-line hand-written character recognition method and character description method for on-line hand-written character recognition
US8050500B1 (en) * 2006-07-06 2011-11-01 Senapps, LLC Recognition method and system
CN102542264A (en) * 2011-12-22 2012-07-04 北京语言大学 Method and device for automatically evaluating right and wrong of Chinese character writing on basis of digital handwriting equipment
CN103810506A (en) * 2014-01-03 2014-05-21 南京师范大学 Method for identifying strokes of handwritten Chinese characters
CN103927532A (en) * 2014-04-08 2014-07-16 武汉汉德瑞庭科技有限公司 Handwriting registration method based on stroke characteristics
CN104008363A (en) * 2013-02-26 2014-08-27 佳能株式会社 Handwriting track detection, standardization and online-identification and abnormal radical collection
CN104680196A (en) * 2013-11-27 2015-06-03 夏普株式会社 Handwriting character recognizing method and system
CN105354538A (en) * 2015-10-13 2016-02-24 广东小天才科技有限公司 Chinese character handwriting recognition method and system
CN109472234A (en) * 2018-11-01 2019-03-15 北京爱知之星科技股份有限公司 A kind of method of handwriting input intelligent recognition

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
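Chenglin Liu et al. Model-based stroke extraction and matching for handwritten Chinese character recognition. 《ELSEVIER》. 2001, 2339-2352. *
Serhii Polotskyi et al. Improving Online Handwriting Text/Non-text Classification Accuracy Under Condition of Stroke Context Absence. 《Advances in Computational Intelligence》. 210-221. *
洪洋; 乔晓君; 白晓东. Template-matching-based stroke recognition of handwritten Chinese characters on mobile devices. Computer Engineering and Applications (计算机工程与应用). (20), 110-115. *
赵海春; 林民. Stroke recognition of online handwritten Chinese characters for glyph analysis. Journal of Inner Mongolia Normal University, Natural Science Chinese Edition (内蒙古师范大学学报(自然科学汉文版)). 2008, (06), 49-52. *
邹杰; 孙宝林; 於俊. Online handwriting matching algorithm based on stroke features. Acta Automatica Sinica (自动化学报). (11), 142-155. *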

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant