CN111738167A - Method for recognizing unconstrained handwritten text image - Google Patents

Method for recognizing unconstrained handwritten text image

Info

Publication number
CN111738167A
CN111738167A (application CN202010589597.3A)
Authority
CN
China
Prior art keywords
text
sequence
unconstrained
character
writing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010589597.3A
Other languages
Chinese (zh)
Inventor
周度
毛慧芸
刘曼飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Original Assignee
South China University of Technology SCUT
Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT and Zhuhai Institute of Modern Industrial Innovation of South China University of Technology
Priority to CN202010589597.3A
Publication of CN111738167A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/32 Digital ink
    • G06V 30/36 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/32 Digital ink
    • G06V 30/333 Preprocessing; Feature extraction
    • G06V 30/347 Sampling; Contour coding; Stroke extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a method for recognizing an unconstrained handwritten text image, which comprises the following steps: S1, preprocessing the input unconstrained handwritten text to obtain preprocessed text data; S2, generating a text feature sequence from the preprocessed text data obtained in step S1; S3, extracting text features along the time dimension with a multilayer distillation GRU network, on the basis of the feature sequence obtained in step S2; and S4, outputting the recognition result through a CTC transcription layer. The invention can handle not only the run-on (cursive) strokes found in handwritten text, but also unconstrained spatial relations between characters, including horizontal, vertical, overlapped, multi-line, slanted, and turning writing. Combined with a large amount of labeled unconstrained handwritten text, the invention can train a system that accurately recognizes unconstrained handwritten text.

Description

Method for recognizing unconstrained handwritten text image
Technical Field
The invention relates to the technical field of computer vision and deep learning, and in particular to a method for recognizing an unconstrained handwritten text image.
Background
Online handwritten Chinese text recognition generally refers to the technology by which a user writes Chinese characters with a handwriting input device such as a writing pad, touch screen, or mouse, and a computer converts the pen trajectories captured by the device into the corresponding internal machine codes of the characters. The technology is widely applied in on-screen handwriting input methods, presentation interaction, document transmission, and other fields.
In recognition technology, Chinese text differs greatly from Western languages because of its large character set and many visually similar characters. In recent years, researchers have proposed various solutions for recognizing handwritten text. Segmentation-based methods, for example, have performed well and been influential, but they depend on character segmentation, which is error-prone and puts subsequent recognition at risk. Methods integrating a CNN with an LSTM require the network to project the sequential pen-tip trajectory into a feature map through a path signature or eight-directional feature extraction; this is not natural and is unsuited to overlapped handwritten text. LSTM-based methods can naturally capture the temporal dynamics of the pen-tip trajectory and have been used for character and text recognition without any preprocessing such as feature extraction or over-segmentation. However, the above methods only handle generally non-cursive, simple horizontal handwritten text and cannot account for the unconstrained spatial relations between characters. A new method for recognizing unconstrained handwritten text images is therefore urgently needed, one that can accurately recognize horizontal, vertical, overlapped, multi-line, slanted, and turning writing.
Disclosure of Invention
The invention aims to provide a method for recognizing an unconstrained handwritten text image that solves the problems in the prior art and can accurately recognize text in writing styles such as horizontal, vertical, and overlapped writing.
In order to achieve this purpose, the invention provides the following scheme: a method for recognizing an unconstrained handwritten text image, comprising the following steps:
S1, preprocessing the input unconstrained handwritten text to obtain preprocessed text data;
S2, generating a text feature sequence from the preprocessed text data obtained in step S1;
S3, extracting text features along the time dimension with a multilayer distillation GRU network, on the basis of the feature sequence obtained in step S2;
and S4, outputting the recognition result through a CTC transcription layer.
Preferably, step S1 is as follows: the pen-tip trajectory of the handwritten text is I_0 = {(x_t, y_t, s_t) | t = 1, 2, ..., T}, where T is the number of points in the trajectory, t is the index of a point, (x_t, y_t) are the horizontal and vertical coordinates of the point with index t, and s_t identifies the stroke to which point t belongs; the changes of the horizontal and vertical coordinates between all adjacent points are then computed as Δx_t = x_{t+1} - x_t and Δy_t = y_{t+1} - y_t for t = 1, 2, ..., T-1.
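A minimal sketch of this step in Python, assuming the trajectory arrives as a NumPy array whose rows are points and whose columns are x, y, and the stroke index (the array contents here are illustrative):

    import numpy as np

    # Illustrative pen-tip trajectory: one row per sampled point,
    # columns are (x_t, y_t, s_t); s_t is the pen-down stroke index.
    traj = np.array([[10.0, 20.0, 0],
                     [12.0, 21.0, 0],
                     [30.0, 45.0, 1],
                     [31.0, 47.0, 1]])

    # dx_t = x_{t+1} - x_t and dy_t = y_{t+1} - y_t for t = 1..T-1.
    dx = np.diff(traj[:, 0])
    dy = np.diff(traj[:, 1])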
Preferably, step S2 is specifically as follows: first, points whose parameters satisfy either of the following conditions are removed, to screen the pen-tip trajectory of the handwritten text preprocessed in step S1:

Δx_t² + Δy_t² < T_dis

(Δx_t·Δx_{t+1} + Δy_t·Δy_{t+1}) / √((Δx_t² + Δy_t²)·(Δx_{t+1}² + Δy_{t+1}²)) > T_cos

After the redundant input points are removed, feature extraction generates a four-dimensional feature sequence I = {(Δx_t, Δy_t, 𝕀(s_t ≠ s_{t+1}), 𝕀(s_t = s_{t+1})) | t = 1, 2, ..., T-1}, where the indicator 𝕀(·) is 1 when its argument is true and 0 when it is false; T_dis is a manually set value used to evaluate the Euclidean distance between the current point and the next point, and T_cos is a manually set value used to determine whether a point lies on a straight or near-straight part of the trace.
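A sketch of this screening and feature-generation step under the same trajectory layout as above; T_dis and T_cos are the manually set thresholds described in the text, and the default values below are placeholders rather than values from the patent:

    import numpy as np

    def build_feature_sequence(traj, t_dis=4.0, t_cos=0.99):
        """Remove redundant points, then emit the 4-D feature sequence
        (dx_t, dy_t, 1(s_t != s_{t+1}), 1(s_t == s_{t+1}))."""
        pts = [traj[0]]
        for p in traj[1:]:
            dx, dy = p[0] - pts[-1][0], p[1] - pts[-1][1]
            # Condition 1: within a stroke, too close to the kept point.
            if p[2] == pts[-1][2] and dx * dx + dy * dy < t_dis:
                continue
            # Condition 2: the point continues a near-straight segment,
            # so the previous (middle) point is dropped as redundant.
            if len(pts) >= 2 and p[2] == pts[-1][2] == pts[-2][2]:
                ux, uy = pts[-1][0] - pts[-2][0], pts[-1][1] - pts[-2][1]
                den = np.hypot(ux, uy) * np.hypot(dx, dy)
                if den > 0 and (ux * dx + uy * dy) / den > t_cos:
                    pts[-1] = p
                    continue
            pts.append(p)
        feats = []
        for a, b in zip(pts[:-1], pts[1:]):
            change = float(a[2] != b[2])   # stroke-boundary indicator
            feats.append([b[0] - a[0], b[1] - a[1], change, 1.0 - change])
        return np.asarray(feats)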
Preferably, the multilayer distillation GRU is an ordinary GRU plus the following operations: suppose the output of an ordinary GRU is h = (h_1, h_2, ..., h_T) with h_t ∈ R^D; first, every N time nodes are grouped together, giving h' = (h'_1, h'_2, ..., h'_{T/N}), where h'_t = [h_{tN+0}; h_{tN+1}; ...; h_{tN+(N-1)}] and h'_t ∈ R^{ND}; then each feature vector h'_t ∈ R^{ND} is mapped into another feature space by a learned projection (the mapping formula is given in the original only as a formula image).
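A sketch of one distillation step in Keras, the framework named later in this patent; the grouping is the reshape, and the projection back to D dimensions is written here as a tanh-activated dense layer, which is an assumption since the patent's mapping formula is not reproduced:

    from tensorflow.keras import layers

    def distill_gru_block(x, units=128, n_group=2):
        """GRU over the input, then concatenation of every n_group
        consecutive hidden states (h'_t in R^{N*D}) and projection back
        to `units` dimensions, dividing the sequence length by n_group.
        Assumes the (padded) sequence length is a multiple of n_group."""
        h = layers.GRU(units, return_sequences=True)(x)   # (batch, T, D)
        h = layers.Reshape((-1, n_group * units))(h)      # (batch, T/N, N*D)
        return layers.Dense(units, activation="tanh")(h)  # assumed projection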
Preferably, in step S4 the transcription layer is driven by connectionist temporal classification (CTC), so that no pre-alignment between the input and its corresponding label sequence is required during training. The character set is C' = C ∪ {blank}, where C denotes all the characters in use and "blank" denotes the null mapping. Given an input sequence u = (u_1, u_2, ..., u_T) of length T with u_t ∈ R^{|C'|}, assigning a label to each time point and concatenating the labels yields exponentially many label sequences of length T, denoted π. The probability of each sequence is calculated as:

p(π|u) = ∏_{t=1}^{T} u^t_{π_t},

where u^t_{π_t} is the probability assigned to label π_t at time step t. In CTC, the sequence-to-sequence mapping operation B maps an alignment π onto its transcription L by first deleting the duplicated labels and then deleting the "blank" parts; for example, both "A A _ B" and "_ A B B" map to "A B". The total probability of a transcription is calculated by summing the probabilities of all alignments that correspond to it:

p(L|u) = Σ_{π: B(π)=L} p(π|u).
Meanwhile, the invention also discloses a data enhancement technique for generating a text-level data set from a character-level data set, as follows: in the character-level data, x_min^i and x_max^i respectively denote the minimum and maximum coordinate values of the i-th character along the X-axis, x_first^i and x_last^i respectively denote the X-axis coordinates of the first and last points of the i-th character, Δx_r denotes a random bias term uniformly distributed within (-2, 13), and Δx_line denotes the length of the text line along the X-axis; the same definitions are made on the Y-axis. The following types of handwritten text are synthesized: horizontal, vertical, overlapped, multi-line, turning, and slanted; the inter-character placement for each type is given by a corresponding formula (reproduced in the original only as formula images).

In the operation of generating unconstrained handwritten text, the type of handwritten text to be generated and the number of characters N are first decided; then N character samples are randomly selected from the character data set; finally, the selected character samples are combined, setting the computed distance (Δx, Δy) between adjacent characters, to generate the unconstrained handwritten text.
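A sketch of this generation loop; char_dataset (a list of (points, label) pairs) and offset_fn (the per-style placement rule) are illustrative names, and offset_fn stands in for the patent's placement formulas:

    import random
    import numpy as np

    def synthesize_text_line(char_dataset, offset_fn, n_chars):
        """Pick n_chars random character samples and chain them into one
        unconstrained text line, shifting each sample by the
        style-dependent inter-character distance (dx, dy)."""
        samples = random.sample(char_dataset, n_chars)
        origin = np.zeros(2)
        line_pts, labels = [], []
        for i, (pts, label) in enumerate(samples):
            shifted = pts.copy()
            shifted[:, :2] += origin          # place character at the origin
            line_pts.append(shifted)
            labels.append(label)
            if i + 1 < n_chars:               # distance to the next character
                origin = origin + offset_fn(shifted, samples[i + 1][0])
        return np.concatenate(line_pts), "".join(labels)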
The invention discloses the following technical effects: (1) the technical scheme attends only to the change between adjacent points, i.e. the movement of the pen tip, and because the relative trajectory of the pen tip is far more stable than absolute handwriting coordinates and their distribution, the network load is greatly reduced; (2) the invention can not only handle run-on strokes in handwritten text quickly and effectively, but can also handle the unconstrained spatial relations between characters, including horizontal, vertical, overlapped, multi-line, slanted, and turning writing. Combined with a large amount of labeled unconstrained handwritten text, the invention can train a system that accurately recognizes unconstrained handwritten text, and thus has considerable practical value.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from them without inventive effort.
FIG. 1 is a schematic diagram of a type of unconstrained handwritten text image;
FIG. 2 is a flow chart of a method for recognition of unconstrained handwritten text images in accordance with the present invention;
FIG. 3 is a detailed flow chart of the present invention for recognizing unconstrained handwritten text;
fig. 4 is a structural diagram of a general GRU network.
In fig. 1, the text image is written in horizontal, vertical, overlapping, oblique, turning, and multi-line types.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures 1 to 4 are described in further detail below.
The invention provides a method for identifying an unconstrained handwritten text image, which comprises the following steps:
an original image acquisition section:
In deep learning, supervised learning requires a large amount of labeled data; specifically, this embodiment requires pictures of the various types shown in fig. 1 together with the label corresponding to the text, here "South China University of Technology". The data sets in this embodiment come from the real training data set CASIA-OLHWDB2.0-2.2 and from various types of unconstrained handwritten text generated from the CASIA-OLHWDB1.0-1.2 character set using the data enhancement technique described below. CASIA-OLHWDB2.0-2.2 contains 4072 pages with 41710 lines of handwritten text and 1082220 characters, and all characters fall into 2650 classes. CASIA-OLHWDB1.0-1.2 contains 3129496 character samples.
Data enhancement technique to generate a text-level data set from the character-level data set: in the character-level data, x_min^i and x_max^i respectively denote the minimum and maximum coordinate values of the i-th character along the X-axis, x_first^i and x_last^i respectively denote the X-axis coordinates of the first and last points of the i-th character, Δx_r denotes a random bias term uniformly distributed within (-2, 13), and Δx_line denotes the length of the text line along the X-axis; the same definitions are made on the Y-axis. Using this method, the following types of handwritten text are synthesized: horizontal, vertical, overlapped, multi-line, turning, and slanted, with the inter-character placement for each type given by the corresponding formula (reproduced in the original only as formula images).

In the operation of generating unconstrained handwritten text, the type of handwritten text to be generated and the number of characters N are first decided; then N character samples are randomly selected from the character data set; finally, the selected character samples are combined, setting the computed distance (Δx, Δy) between adjacent characters, to generate the unconstrained handwritten text, as sketched in the code after the corresponding passage above.
The unconstrained handwritten text set used in the ICDAR 2013 Chinese handwriting recognition competition (a popular OHCTR benchmark) was used to test system performance; in addition, the synthesized data set was used to evaluate the robustness of the framework. The experimental results on the ICDAR competition set, in terms of accuracy, are as follows: if an ordinary multilayer gated recurrent network (GRU) is used in the feature refinement stage, the recognition rate is 88.31% and training takes 208 hours; after the first two GRU layers of the recognition system are replaced by the distillation GRU proposed in this patent, training time drops from 208 hours to 95 hours while the recognition rate, 88.33%, remains essentially unchanged; after the synthesized text is introduced, the recognition rate rises from 88.33% to 91.36%, a large improvement. The comparison is shown in Table 1:

TABLE 1

Method | Accuracy AR (%) | Training time (h)
Multilayer GRU | 88.31 | 208
Distillation GRU | 88.33 | 95
Distillation GRU + synthesized text | 91.36 | 102
The recognition technique described above is explained below by taking the recognition of the text 备受观众期待 ("eagerly awaited by the audience"), whose flow is shown in fig. 3, as an example.
pretreatment:
various kinds of deformation of handwritten characters are caused by different writing modes of people in the process of writing the characters. Especially for unconstrained handwritten texts, the types of deformation of the same character are very many due to the problems of continuous strokes, dragging strokes and large stroke thickness difference caused by different writing tools. The original image will have some distortion and noise which will also affect the recognition. The preprocessing process can overcome the influence caused by the deformation to a certain extent, fully exerts the performance of the feature extraction and the classifier, and plays an important role in improving the recognition performance.
Specifically to this embodiment, the pen-tip trajectory input I_0 of any unconstrained handwritten text image can be abstracted as I_0 = {(x_t, y_t, s_t) | t = 1, 2, ..., T}, where t is the index of a point in the trajectory, (x_t, y_t) are its horizontal and vertical coordinates, and s_t identifies the stroke to which point t belongs; note that a stroke here is the trace drawn between one pen-down and the next pen-up, not a stroke in the character-structure sense. The changes of the coordinates between all adjacent points are computed as Δx_t = x_{t+1} - x_t and Δy_t = y_{t+1} - y_t for t = 1, 2, ..., T-1. In the prior art, preprocessing of unconstrained handwritten text likewise abstracts the pen-tip trajectory as I_0 = {(x_t, y_t, s_t) | t = 1, 2, ..., T}, but the input features x and y cannot be normalized into a given interval; this burdens the network and cannot adapt to the complex sequential nature of unconstrained handwritten text. Therefore, after abstracting the pen-tip trajectory of the input handwritten text, this embodiment attends only to the change between adjacent points, i.e. the relative displacement of the pen tip (Δx_t, Δy_t). Relative pen-tip displacement is much more stable than absolute handwriting coordinates and their distribution; seen this way, run-on unconstrained texts share very similar feature patterns, and the only difference between text styles is the pen-tip movement between characters, so the network load is greatly reduced compared with traditional methods.
Feature extraction section
A difficulty of pattern recognition is the large semantic gap between the pattern to be recognized and the patterns a computer can process. The study of text recognition focuses on understanding a particular class of images on a two-dimensional plane, while a computer is essentially a one-dimensional processing machine. Feature extraction is a powerful tool for bridging this semantic gap: it resolves the two-dimensional problem into a one-dimensional one and turns the dot-matrix image representation into a pattern representation, thereby reducing the problem to be understood into the kind of computation that a von Neumann machine is good at.
As shown in fig. 3, with the online text input 备受观众期待, the system first removes redundant points from the text. The guiding idea is that if a point is very close to the previous point, or the angle it forms with the previous two points is flat (i.e. the three points lie almost on a straight line), the point is considered redundant and is removed. Specifically to this embodiment, points whose parameters satisfy either of the following conditions are removed, to screen the pen-tip trajectory of the preprocessed handwritten text:

Δx_t² + Δy_t² < T_dis

(Δx_t·Δx_{t+1} + Δy_t·Δy_{t+1}) / √((Δx_t² + Δy_t²)·(Δx_{t+1}² + Δy_{t+1}²)) > T_cos

After the redundant input points are removed, feature extraction generates a four-dimensional feature sequence I = {(Δx_t, Δy_t, 𝕀(s_t ≠ s_{t+1}), 𝕀(s_t = s_{t+1})) | t = 1, 2, ..., T-1}, where the indicator 𝕀(·) is 1 when its argument is true and 0 when it is false. T_dis is a manually set value whose purpose is to evaluate the Euclidean distance between the current point and the next point; if this distance is very small, or even zero (overlapping points), the current point is removed. T_cos is also a manually set value, whose purpose is to determine whether a point lies on a straight or near-straight part of the writing; if so, it is removed directly. Both values must be set by hand: if they are too small, the sample points remain highly redundant and computation is slow; if they are too large, the sample points become too sparse and precision is lost. The setting depends on the user's trade-off between the speed and the accuracy of the system. Finally, the coordinates of the remaining points in the text generate a feature sequence whose every element is a four-dimensional vector; the "i-th stroke" markings in the figure indicate that elements are grouped by stroke (one stroke is counted from pen-down to pen-up, not a structural stroke of a Chinese character).
After the feature sequence is generated, it is input into a distillation GRU. A conventional recurrent neural network (RNN) processes sequence data through a recursive mechanism, but performs poorly on long sequences: the weight of information from far away keeps shrinking until it barely influences later results, while the computation burden grows, greatly harming system performance. Gated recurrent networks solve such problems well. The GRU model is a variant of the LSTM model that merges the LSTM's input gate and forget gate into a single update gate; the detailed structure of a GRU unit is shown in fig. 4. This makes the network cheaper to compute and easier to converge. In this system, the outputs of two GRU time steps are concatenated and fed to the input of one GRU unit, forming the distillation GRU network; experiments prove that this method effectively reduces training time without reducing the recognition rate.
Constructing a model:
The overall model of the system is built first. To improve the efficiency and computation speed of the system and to meet the demand for fast response, the network is constructed with the Keras deep learning framework. Keras is a high-level neural network API written in pure Python, which can use TensorFlow, Theano, or CNTK as a backend. The structure of this embodiment is shown in fig. 3: the time-series input of the handwritten text is acquired, a feature map is extracted by the processing above and passed through a double-layer distillation GRU network comprising a distillation GRU layer and an ordinary GRU layer, and the result is finally obtained through a CTC transcription layer.
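A sketch of this assembly in Keras, reusing the distill_gru_block sketched earlier; following the description above, it stacks one distillation GRU layer and one ordinary GRU layer, while the layer widths are assumptions:

    from tensorflow import keras
    from tensorflow.keras import layers

    def build_model(num_classes, units=128):
        """Feature sequence -> distillation GRU -> ordinary GRU ->
        per-step softmax over C' = C + {blank}; num_classes includes
        the blank class."""
        inp = layers.Input(shape=(None, 4))               # 4-D features
        h = distill_gru_block(inp, units)                 # distillation GRU
        h = layers.GRU(units, return_sequences=True)(h)   # ordinary GRU
        out = layers.Dense(num_classes, activation="softmax")(h)
        return keras.Model(inp, out)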
Model training
After the model is designed and sufficient data are available, model training can be performed. The purpose of training is to compare the outputs that the model computes over a large amount of data with the labels of that data, and to adjust the network parameters accordingly, until the model can recognize data of the same type. In this embodiment, the pen-tip trajectory of a handwritten text is denoted l = {(x_t, y_t, s_t) | t = 1, 2, ..., T}, where T denotes the number of points, (x_t, y_t) denotes the coordinates of the t-th point, and s_t indicates to which stroke the point belongs. Given a handwritten-text training data set Q and samples (l, z), where z is the label corresponding to the data, the loss function L(Q) of the model network is expressed as:

L(Q) = -Σ_{(l,z)∈Q} log p(z|l).
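A sketch of how this loss is wired in Keras; K.ctc_batch_cost computes -log p(z|l) per sample by the CTC forward-backward algorithm, and the Lambda wiring with illustrative tensor names is the usual Keras pattern, needed because the loss takes four inputs:

    from tensorflow.keras import backend as K
    from tensorflow.keras import layers

    model = build_model(num_classes=2651)   # 2650 character classes + blank
    y_pred = model.output                   # per-step softmax (see above)

    labels    = layers.Input(name="labels", shape=(None,))   # z, padded
    input_len = layers.Input(name="input_len", shape=(1,))   # valid T
    label_len = layers.Input(name="label_len", shape=(1,))   # valid |z|

    # -log p(z | l) for each sample in the batch.
    loss = layers.Lambda(lambda args: K.ctc_batch_cost(*args), name="ctc")(
        [labels, y_pred, input_len, label_len])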
according to the invention, a random steepest descent method SGD is taken as an optimization algorithm, and a GeForceTitan-XGPUs display card is used, so that convergence can be achieved in about 3-4 days.
We assume the character set C' = C ∪ {blank}, where C denotes all characters that may appear in the text-line recognition task and "blank" denotes the output class that is empty and produces no output. The network output is an input sequence u = (u_1, u_2, ..., u_T) of length T with u_t ∈ R^{|C'|}. Here each time point is assigned a label, and combining all time points gives a label sequence; in this way a large number of label sequences of length T (denoted π) can be obtained. The probability of each label sequence is calculated as:

p(π|u) = ∏_{t=1}^{T} u^t_{π_t},

where u^t_{π_t} is the probability assigned to label π_t at time step t. In CTC, the mapping operation B maps a label sequence π onto the sequence L, i.e. the finally output recognized character sequence: consecutive duplicate characters are first collapsed into one, and then the "blank" labels are removed.
As shown in the figure, three label sequences π that all decode to this text are listed; with the blank character written as "_" for ease of recognition, they are of the following form (illustrative):

π1: 备_受_观_众_期_待

π2: 备备受_观观众_期_待待

π3: 备_受受_观_众_期期_待

When such label sequences have their repeated characters collapsed and their blank characters then removed, each outputs the correct text sequence 备受观众期待.
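A sketch of this decoding (best path followed by the mapping B); the blank index and the charset layout are illustrative:

    import numpy as np

    BLANK = 0  # illustrative index of the "blank" class

    def ctc_greedy_decode(probs, charset):
        """probs: (T, |C'|) per-step label distribution. Take the most
        likely label at each step, then apply B: collapse consecutive
        repeats and delete blanks."""
        path = np.argmax(probs, axis=-1)
        out, prev = [], None
        for k in path:
            if k != prev and k != BLANK:
                out.append(charset[k])
            prev = k
        return "".join(out)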
Combined with a large amount of labeled unconstrained handwritten text, the invention can train a system that accurately recognizes unconstrained handwritten text.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.

Claims (6)

1. A method for recognizing an unconstrained handwritten text image, comprising the steps of:
S1, preprocessing the input unconstrained handwritten text to obtain preprocessed text data;
S2, generating a text feature sequence from the preprocessed text data obtained in step S1;
S3, extracting text features along the time dimension with a multilayer distillation GRU network, on the basis of the feature sequence obtained in step S2;
and S4, outputting the recognition result through a CTC transcription layer.
2. The method for recognizing an unconstrained handwritten text image as claimed in claim 1, wherein step S1 is as follows: the pen-tip trajectory of the handwritten text is I_0 = {(x_t, y_t, s_t) | t = 1, 2, ..., T}, where T is the number of points in the trajectory, t is the index of a point, (x_t, y_t) are the horizontal and vertical coordinates of the point with index t, and s_t identifies the stroke to which point t belongs; the changes of the horizontal and vertical coordinates between all adjacent points are computed as Δx_t = x_{t+1} - x_t and Δy_t = y_{t+1} - y_t for t = 1, 2, ..., T-1.
3. The method for recognizing an unconstrained handwritten text image as claimed in claim 1, wherein step S2 is as follows: first, points whose parameters satisfy either of the following conditions are removed, to screen the pen-tip trajectory of the handwritten text preprocessed in step S1:

Δx_t² + Δy_t² < T_dis

(Δx_t·Δx_{t+1} + Δy_t·Δy_{t+1}) / √((Δx_t² + Δy_t²)·(Δx_{t+1}² + Δy_{t+1}²)) > T_cos

after the redundant input points are removed, feature extraction generates a four-dimensional feature sequence I = {(Δx_t, Δy_t, 𝕀(s_t ≠ s_{t+1}), 𝕀(s_t = s_{t+1})) | t = 1, 2, ..., T-1}, wherein the indicator 𝕀(·) is 1 when its argument is true and 0 when it is false; T_dis is a manually set value used to evaluate the Euclidean distance between the current point and the next point, and T_cos is a manually set value used to determine whether a point lies on a straight or near-straight part of the writing.
4. The method for recognizing an unconstrained handwritten text image as claimed in claim 1, wherein the multilayer distillation GRU is an ordinary GRU plus the following operations: suppose the output of an ordinary GRU is h = (h_1, h_2, ..., h_T) with h_t ∈ R^D; first, every N time nodes are grouped together, giving h' = (h'_1, h'_2, ..., h'_{T/N}), where h'_t = [h_{tN+0}; h_{tN+1}; ...; h_{tN+(N-1)}] and h'_t ∈ R^{ND}; then each feature vector h'_t ∈ R^{ND} is mapped into another feature space by a learned projection (the mapping formula is given in the original only as a formula image).
5. The method as claimed in claim 1, wherein in step S4 the transcription layer is driven by connectionist temporal classification (CTC), so that no pre-alignment between the input and its corresponding label sequence is needed during training; the character set is C' = C ∪ {blank}, where C denotes all the characters in use and "blank" denotes the null mapping; given an input sequence u = (u_1, u_2, ..., u_T) of length T with u_t ∈ R^{|C'|}, assigning a label to each time point and concatenating the labels yields exponentially many label sequences of length T, denoted π; the probability of each sequence is calculated as

p(π|u) = ∏_{t=1}^{T} u^t_{π_t};

in CTC, the sequence-to-sequence mapping operation B maps an alignment π onto its transcription L by first deleting the duplicated labels and then deleting the "blank" parts; the total probability of a transcription is calculated by summing the probabilities of all alignments that correspond to it:

p(L|u) = Σ_{π: B(π)=L} p(π|u).
6. A data enhancement technique for generating a text-level data set from a character-level data set, wherein, in the character-level data, x_min^i and x_max^i respectively denote the minimum and maximum coordinate values of the i-th character along the X-axis, x_first^i and x_last^i respectively denote the X-axis coordinates of the first and last points of the i-th character, Δx_r denotes a random bias term uniformly distributed within (-2, 13), and Δx_line denotes the length of the text line along the X-axis, with the same definitions on the Y-axis; the following types of handwritten text are synthesized: horizontal, vertical, overlapped, multi-line, turning, and slanted, the inter-character placement for each type being given by the corresponding formula (reproduced in the original only as formula images); in the operation of generating unconstrained handwritten text, the type of handwritten text to be generated and the number of characters N are first decided, then N character samples are randomly selected from the character data set, and finally the selected character samples are combined, setting the computed distance (Δx, Δy) between adjacent characters, to generate the unconstrained handwritten text.
CN202010589597.3A 2020-06-24 2020-06-24 Method for recognizing unconstrained handwritten text image Pending CN111738167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010589597.3A CN111738167A (en) 2020-06-24 2020-06-24 Method for recognizing unconstrained handwritten text image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010589597.3A CN111738167A (en) 2020-06-24 2020-06-24 Method for recognizing unconstrained handwritten text image

Publications (1)

Publication Number Publication Date
CN111738167A true CN111738167A (en) 2020-10-02

Family

ID=72651026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010589597.3A Pending CN111738167A (en) 2020-06-24 2020-06-24 Method for recognizing unconstrained handwritten text image

Country Status (1)

Country Link
CN (1) CN111738167A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408387A (en) * 2021-06-10 2021-09-17 中金金融认证中心有限公司 Method for generating handwritten text data for complex writing scene and computer product
CN114241495A (en) * 2022-02-28 2022-03-25 天津大学 Data enhancement method for offline handwritten text recognition
CN114529910A (en) * 2022-01-27 2022-05-24 北京鼎事兴教育咨询有限公司 Handwritten character recognition method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893968A (en) * 2016-03-31 2016-08-24 华南理工大学 Text-independent end-to-end handwriting recognition method based on deep learning
CN106570456A (en) * 2016-10-13 2017-04-19 华南理工大学 Handwritten Chinese character recognition method based on full-convolution recursive network
CN108154136A (en) * 2018-01-15 2018-06-12 众安信息技术服务有限公司 For identifying the method, apparatus of writing and computer-readable medium
US20200143191A1 (en) * 2018-11-02 2020-05-07 Iflytek Co., Ltd. Method, apparatus and storage medium for recognizing character

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893968A (en) * 2016-03-31 2016-08-24 华南理工大学 Text-independent end-to-end handwriting recognition method based on deep learning
CN106570456A (en) * 2016-10-13 2017-04-19 华南理工大学 Handwritten Chinese character recognition method based on full-convolution recursive network
CN108154136A (en) * 2018-01-15 2018-06-12 众安信息技术服务有限公司 For identifying the method, apparatus of writing and computer-readable medium
US20200143191A1 (en) * 2018-11-02 2020-05-07 Iflytek Co., Ltd. Method, apparatus and storage medium for recognizing character

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘曼飞 (Liu Manfei): "基于深度学习的联机手写汉字分析与识别" (Online Handwritten Chinese Character Analysis and Recognition Based on Deep Learning), 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology Series) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408387A (en) * 2021-06-10 2021-09-17 中金金融认证中心有限公司 Method for generating handwritten text data for complex writing scene and computer product
CN114529910A (en) * 2022-01-27 2022-05-24 北京鼎事兴教育咨询有限公司 Handwritten character recognition method and device, storage medium and electronic equipment
CN114241495A (en) * 2022-02-28 2022-03-25 天津大学 Data enhancement method for offline handwritten text recognition
CN114241495B (en) * 2022-02-28 2022-05-03 天津大学 Data enhancement method for off-line handwritten text recognition

Similar Documents

Publication Publication Date Title
Xie et al. Learning spatial-semantic context with fully convolutional recurrent network for online handwritten chinese text recognition
Tang et al. Text-independent writer identification via CNN features and joint Bayesian
Pal et al. Handwriting recognition in indian regional scripts: a survey of offline techniques
Tagougui et al. Online Arabic handwriting recognition: a survey
Nguyen et al. A database of unconstrained Vietnamese online handwriting and recognition experiments by recurrent neural networks
CN111738167A (en) Method for recognizing unconstrained handwritten text image
CN108664975B (en) Uyghur handwritten letter recognition method and system and electronic equipment
Al-Omari et al. Handwritten Indian numerals recognition system using probabilistic neural networks
Panyam et al. Modeling of palm leaf character recognition system using transform based techniques
Tan et al. A new handwritten character segmentation method based on nonlinear clustering
CN112651323B (en) Chinese handwriting recognition method and system based on text line detection
CN115461792A (en) Handwritten text recognition method, apparatus and system, handwritten text search method and system, and computer-readable storage medium
Saba et al. Online versus offline Arabic script classification
Ghods et al. Decision fusion of horizontal and vertical trajectories for recognition of online Farsi subwords
Mohammad et al. Contour-based character segmentation for printed Arabic text with diacritics
Singh et al. A bilingual (Gurmukhi-Roman) online handwriting identification and recognition system
Singh et al. Indic script family and its offline handwriting recognition for characters/digits and words: a comprehensive survey
Mukherjee et al. Fusion of spatio-temporal information for indic word recognition combining online and offline text data
Magrina Convolution Neural Network based Ancient Tamil Character Recognition from Epigraphical Inscriptions
Bahashwan et al. Efficient segmentation of arabic handwritten characters using structural features.
Abirami et al. Handwritten mathematical recognition tool
Jung et al. On-line recognition of cursive Korean characters using graph representation
Ramakrishnan et al. Development of OHWR system for Kannada
Assaleh et al. Recognition of handwritten Arabic alphabet via hand motion tracking
Álvarez et al. Fuzzy system for intelligent word recognition using a regular grammar

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201002