CN111738167A - Method for recognizing unconstrained handwritten text image - Google Patents
Method for recognizing unconstrained handwritten text image
- Publication number
- CN111738167A CN111738167A CN202010589597.3A CN202010589597A CN111738167A CN 111738167 A CN111738167 A CN 111738167A CN 202010589597 A CN202010589597 A CN 202010589597A CN 111738167 A CN111738167 A CN 111738167A
- Authority
- CN
- China
- Prior art keywords
- text
- sequence
- unconstrained
- character
- writing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/36—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/333—Preprocessing; Feature extraction
- G06V30/347—Sampling; Contour coding; Stroke extraction
Abstract
The invention discloses a method for recognizing an unconstrained handwritten text image, which comprises the following steps: S1, preprocessing the input unconstrained handwritten text to obtain preprocessed text data; S2, generating a text feature sequence from the preprocessed text data obtained in step S1; S3, extracting text features along the temporal dimension with a multilayer distillation GRU network, on the basis of the feature sequence obtained in step S2; and S4, outputting the recognition result through a CTC transcription layer. The invention can effectively handle not only cursive connections within handwritten text characters but also the unconstrained spatial relations between characters, including horizontal, vertical, overlapped, multi-line, slanted and turning writing. Combined with a large amount of labeled unconstrained handwritten text, the invention can train a system that accurately recognizes unconstrained handwritten text.
Description
Technical Field
The invention relates to the technical field of computer vision and deep learning, and in particular to a method for recognizing an unconstrained handwritten text image.
Background
Online handwritten Chinese text recognition generally refers to the technology in which a user writes Chinese characters with a handwriting input device such as a writing tablet, a touch screen or a mouse, and a computer converts the writing trajectories captured by that device into the corresponding machine-internal character codes. The technology is widely applied in on-screen handwriting input methods, presentation interaction, document transmission and other fields.
Within recognition technology, Chinese text differs greatly from Western scripts because of its large character set and its many visually similar characters. In recent years researchers have proposed various solutions for recognizing handwritten text. Segmentation-based methods, for example, have performed well and been influential; in practice, however, they run into the character-segmentation problem, which puts the subsequent recognition at risk. Methods that integrate a CNN with an LSTM require the network to project the sequential pen-tip trajectory into a feature map via a path signature or an eight-directional feature extraction, which is not natural and is ill-suited to overlapped handwritten text. LSTM-based methods can naturally capture the temporal dynamics of the pen-tip trajectory and have been used for character and text recognition without any preprocessing such as feature extraction or over-segmentation. All of the above methods, however, only address ordinary non-cursive writing and simple horizontal handwritten text, and cannot accommodate unconstrained spatial relations between characters. A new method for recognizing unconstrained handwritten text images is therefore urgently needed, one that accurately recognizes horizontal, vertical, overlapped, multi-line, slanted and turning writing.
Disclosure of Invention
The object of the invention is to provide a method for recognizing an unconstrained handwritten text image which solves the above problems in the prior art and accurately recognizes text in writing styles such as horizontal, vertical and overlapped writing.
In order to achieve this object, the invention provides the following scheme: a method for recognizing an unconstrained handwritten text image, comprising the following steps:
S1, preprocessing the input unconstrained handwritten text to obtain preprocessed text data;
S2, generating a text feature sequence from the preprocessed text data obtained in step S1;
S3, extracting text features along the temporal dimension with a multilayer distillation GRU network, on the basis of the feature sequence obtained in step S2;
and S4, outputting the recognition result through a CTC transcription layer.
Preferably, the step S1 is as follows: nib trace in handwritten text I0={(xt,yt,st) T1, 2.., T }; wherein T represents the number of the middle points of the pen point track, T represents the serial number of the middle points of the pen point track, (x)t,yt) The abscissa and ordinate, s, of a point of sequence number ttRepresenting the strokes to which the t points belong specifically; calculating the variation quantity delta x of the horizontal and vertical coordinates between all adjacent points in the T pointst、ΔytWherein Δ xt=xt+1-xt,Δyt=yt+1-yt,t=1,2,…,T-1。
Preferably, step S2 is as follows: first, to filter the pen-tip trajectory of the handwritten text preprocessed in step S1, the points whose parameters satisfy the following formula are removed:

$\Delta x_t^2+\Delta y_t^2<T_{dis}$

After removing the redundant input points, feature extraction is performed to generate a four-dimensional feature sequence $I=\{(\Delta x_t,\Delta y_t,\mathbb{I}(s_t\neq s_{t+1}),\mathbb{I}(s_t=s_{t+1}))\mid t=1,2,\ldots,T\}$, where the indicator $\mathbb{I}(\cdot)$ equals 1 when its argument is true and 0 when it is false. $T_{dis}$ is a manually set value used to evaluate the Euclidean distance between the current point and the next point; $T_{cos}$ is a manually set value used to determine whether a point lies on, or nearly on, a straight segment of the trajectory.
Preferably, the multilayer distillation GRU is a normal GRU plus the following operations: suppose the output h of a normal GRU is equal to (h)1,h2,…,hT),hT∈RDFirst, every N time nodes are grouped into one group, and h 'is made (h'1,h'2,…,h'T/N) Wherein h't=[ht*B+0;ht*N+1;…;ht*N+(N-1)],h't∈RND(ii) a Then, the feature vector h 'is further processed't∈RNDMapping to another feature spaceThe required mapping formula is as follows:
Preferably, in step S4 the transcription layer is guided by connectionist temporal classification (CTC), so that no pre-alignment between the input image and its corresponding label sequence is required during training. The character set is $C'=C\cup\{\text{blank}\}$, where $C$ contains all the characters in use and "blank" denotes the null label. Given an input sequence $u=(u_1,u_2,\ldots,u_T)$ of length $T$ with $u_t\in\mathbb{R}^{|C'|}$, assigning a label to each time step and concatenating the labels yields exponentially many label sequences of length $T$, denoted by $\pi$. The probability of each sequence is calculated as

$p(\pi\mid u)=\prod_{t=1}^{T}u^t_{\pi_t}$

In CTC, the sequence-to-sequence mapping $\mathcal{B}$ maps an alignment $\pi$ to its transcription $L$: $\mathcal{B}$ first deletes the duplicated labels, and then deletes the "blank" labels. The total probability of a transcription is calculated by summing the probabilities of all alignments that correspond to it:

$p(L\mid u)=\sum_{\pi:\mathcal{B}(\pi)=L}p(\pi\mid u)$
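A minimal sketch of the mapping $\mathcal{B}$ (the function name and blank symbol are illustrative):

```python
def ctc_collapse(pi, blank="_"):
    """Mapping B: delete repeated labels first, then delete blanks."""
    collapsed, prev = [], None
    for label in pi:
        if label != prev:          # keep only the first label of each run
            collapsed.append(label)
        prev = label
    return [c for c in collapsed if c != blank]

# e.g. ctc_collapse(list("aa_ab_")) == ['a', 'a', 'b']
```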
Meanwhile, the invention also discloses a data enhancement technique for generating a text-level data set from a character-level data set, as follows: in the character-level data, $x^i_{\min}$ and $x^i_{\max}$ denote the minimum and maximum coordinate values of the $i$-th character along the X axis, $x^i_{first}$ and $x^i_{last}$ denote the X coordinates of the first and last points of the $i$-th character, $\Delta x_r$ denotes a random bias term uniformly distributed within $(-2,13)$, and $\Delta x_{line}$ denotes the length of the text line along the X axis; the same quantities are defined on the Y axis. The following types of handwritten text are synthesized, each specified by a formula for the displacement between adjacent characters: horizontal, vertical, overlapped, multi-line, turning and slanted. In the operation of generating unconstrained handwritten text, the type of handwritten text to generate and the number of characters $N$ are first decided; then $N$ character samples are randomly selected from the character data set; finally, the selected character samples are combined, with the calculated displacement $(\Delta x,\Delta y)$ set between adjacent characters, to generate the unconstrained handwritten text, as sketched below.
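A sketch of this composition step (Python; `char_dataset` is assumed to hold per-character point arrays with columns (x, y, s), and `step_fn` stands in for the per-style displacement formulas; all names here are assumptions):

```python
import random
import numpy as np

def synthesize_text(char_dataset, n_chars, step_fn):
    """Combine randomly chosen character samples into one unconstrained line.

    step_fn(char) returns the (dx, dy) displacement to the next character,
    e.g. for horizontal text: the character width plus a random bias in (-2, 13).
    """
    samples = random.sample(char_dataset, n_chars)
    offset = np.zeros(2)
    placed = []
    for ch in samples:
        ch = ch.copy()
        ch[:, :2] += offset            # shift the character to its position
        placed.append(ch)
        offset = offset + step_fn(ch)  # advance by the style's (dx, dy)
    return np.concatenate(placed, axis=0)
```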
The invention has the following technical effects: (1) the technical scheme attends only to the change between adjacent points, i.e. the movement of the pen tip; since the relative trajectory of pen-tip movement is far more stable than absolute handwriting coordinates and handwriting distribution, the network burden can be greatly reduced. (2) The invention can rapidly and effectively handle not only cursive connections within handwritten text characters but also the unconstrained spatial relations between characters, including horizontal, vertical, overlapped, multi-line, slanted and turning writing. Combined with a large amount of labeled unconstrained handwritten text, the invention can train a system that accurately recognizes unconstrained handwritten text, and therefore has high practical value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a type of unconstrained handwritten text image;
FIG. 2 is a flow chart of a method for recognition of unconstrained handwritten text images in accordance with the present invention;
FIG. 3 is a detailed flow chart of the present invention for recognizing unconstrained handwritten text;
fig. 4 is a structural diagram of a general GRU network.
In fig. 1, the text image is written in horizontal, vertical, overlapping, oblique, turning, and multi-line types.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures 1 to 4 are described in further detail below.
The invention provides a method for recognizing an unconstrained handwritten text image, which comprises the following steps:
Original image acquisition:
In deep learning, supervised learning requires a large amount of labeled data; specific to this embodiment, pictures of the various types shown in fig. 1 are required together with labels corresponding to the text, e.g. "Southern University of China". The data sets in this embodiment come from the real training set CASIA-OLHWDB2.0-2.2 and from unconstrained handwritten text sets of various types generated from the CASIA-OLHWDB1.0-1.2 character sets with the data enhancement technique described below. CASIA-OLHWDB2.0-2.2 contains 4072 pages with 41710 lines of handwritten text and 1082220 characters, which fall into 2650 classes. CASIA-OLHWDB1.0-1.2 contains 3129496 character samples.
The data enhancement technique generates a text-level data set from a character-level data set: in the character-level data, $x^i_{\min}$ and $x^i_{\max}$ denote the minimum and maximum coordinate values of the $i$-th character along the X axis, $x^i_{first}$ and $x^i_{last}$ denote the X coordinates of the first and last points of the $i$-th character, $\Delta x_r$ denotes a random bias term uniformly distributed within $(-2,13)$, and $\Delta x_{line}$ denotes the length of the text line along the X axis; the same quantities are defined on the Y axis. Using this method, the following types of handwritten text are synthesized, each specified by a displacement formula: horizontal, vertical, overlapped, multi-line, turning and slanted. In the operation of generating unconstrained handwritten text, the type of handwritten text to generate and the number of characters $N$ are first decided; then $N$ character samples are randomly selected from the character data set; finally, the selected character samples are combined, with the calculated displacement $(\Delta x,\Delta y)$ set between adjacent characters, to generate the unconstrained handwritten text.
The OHCTR unconstrained handwritten text set used in the ICDAR 2013 Chinese handwriting recognition competition was used to test system performance; in addition, the synthesized dataset was used to evaluate the robustness of the framework. The experimental results on the ICDAR competition set are accuracy-based. With an ordinary multilayer GRU network in the feature refinement stage, the system recognition rate is 88.31% and training takes 208 hours. After the first two GRU layers of the recognition system are replaced with the distillation GRU network proposed in this patent, training time drops from 208 hours to 95 hours while the recognition rate, at 88.33%, remains essentially unchanged. After the synthesized text is introduced, the recognition rate rises from 88.33% to 91.36%, a substantial improvement. The comparison is shown in Table 1 below:
TABLE 1
Method | Accuracy AR (%) | Training time (h)
---|---|---
Multilayer GRU | 88.31 | 208
Distillation GRU | 88.33 | 95
Distillation GRU + synthesized text | 91.36 | 102
The recognition technique described above is explained below, taking as an example the recognition flowchart in fig. 3 for the text 备受观众期待 ("eagerly awaited by the audience").
pretreatment:
Different writing habits produce many kinds of deformation of handwritten characters. For unconstrained handwritten text in particular, connected strokes, dragged strokes and large differences in stroke thickness caused by different writing tools give the same character a very large number of deformation types. The original trace also carries distortion and noise that affect recognition. Preprocessing can overcome the influence of these deformations to a certain extent, lets the feature extraction and the classifier perform at their best, and plays an important role in improving recognition performance.
Specific to this embodiment, the pen-tip trajectory input $I_0$ of any unconstrained handwritten text image can be abstracted as $I_0=\{(x_t,y_t,s_t)\mid t=1,2,\ldots,T\}$, where $t$ is the index of a point in the trajectory, $(x_t,y_t)$ are the abscissa and ordinate of the point with index $t$, and $s_t$ identifies the stroke to which the point belongs; it should be noted that a stroke here is the trace drawn between pen-down and pen-up, not a stroke in the sense of Chinese character structure. The coordinate variations between all adjacent points among the $T$ points are computed as $\Delta x_t=x_{t+1}-x_t$ and $\Delta y_t=y_{t+1}-y_t$, $t=1,2,\ldots,T-1$. The prior art uses the same abstraction $I_0$ when preprocessing unconstrained handwritten text, but the input features $x$ and $y$ cannot then be normalized into a given interval, which burdens the network and cannot adapt to the complex sequential nature of unconstrained handwritten text. Therefore, after abstracting the pen-tip trajectory of the input handwritten text, this embodiment attends only to the change between adjacent points, i.e. the relative displacement $(\Delta x_t,\Delta y_t)$ of the pen tip. The relative displacement of the pen tip is much more stable than absolute handwriting coordinates and handwriting distribution; from this point of view, cursively connected unconstrained texts share very similar feature patterns, and the only difference between text styles is the pen-tip movement between characters, so the network burden can be greatly reduced compared with traditional methods.
Feature extraction section
A difficulty of pattern recognition is the large semantic gap between the pattern to be recognized and the patterns a computer can process. Text recognition studies the understanding of a particular class of images on a two-dimensional plane, while the computer is essentially a one-dimensional processing machine. Feature extraction is a powerful tool for bridging this semantic gap: it reduces the two-dimensional problem to a one-dimensional one, and turns the dot-matrix image representation into a pattern representation, thereby converting the problem to be understood into the kind of computation a von Neumann machine excels at.
As shown in fig. 3, for the online text input "eagerly awaited by the audience", the system first removes redundant points in the text. The guiding idea is that if the distance between a point and the previous point is very small, or the angle it forms with the previous two points is nearly flat (i.e. the points are almost on a straight line), the point is considered redundant and removed. Specific to this embodiment, the points whose parameters satisfy the following formula are removed first, to filter the pen-tip trajectory of the preprocessed handwritten text:

$\Delta x_t^2+\Delta y_t^2<T_{dis}$

After removing the redundant input points, feature extraction is performed to generate the four-dimensional feature sequence $I=\{(\Delta x_t,\Delta y_t,\mathbb{I}(s_t\neq s_{t+1}),\mathbb{I}(s_t=s_{t+1}))\mid t=1,2,\ldots,T\}$, where $\mathbb{I}(\cdot)$ equals 1 when its argument is true and 0 when it is false. $T_{dis}$ is a manually set value whose purpose is to evaluate the Euclidean distance between the current point and the next; if this distance is very small, or even zero (overlapping points), the current point is removed. $T_{cos}$ is also a manually set value whose purpose is to determine whether a point lies on, or nearly on, a straight segment of writing; if so, it is removed directly. Both values must be set by hand: if they are too small, the sampled points remain highly redundant and computation is slow; if too large, the points become too sparse and accuracy is lost. The setting depends on the user's trade-off between system speed and accuracy. Finally, the coordinates of the remaining points generate a feature sequence whose every element is a four-dimensional vector; the i-th strokes in the figure indicate that elements are grouped by stroke (a stroke is counted from pen-down to pen-up, not a Chinese character stroke).
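A sketch of the redundancy filter and the four-dimensional feature construction (the numeric threshold defaults are placeholders; the patent leaves $T_{dis}$ and $T_{cos}$ to the user's speed/accuracy trade-off):

```python
import numpy as np

def filter_and_featurize(traj, t_dis=4.0, t_cos=0.99):
    """Drop redundant points, then emit rows (dx, dy, I(s != s'), I(s == s')).

    traj: float array of shape (T, 3) with columns (x, y, stroke id).
    t_dis bounds the squared-distance test; t_cos bounds the collinearity
    test (cosine of the turning angle at the previous point).
    """
    kept = [traj[0]]
    for p in traj[1:]:
        q = kept[-1]
        d = p[:2] - q[:2]
        if p[2] == q[2] and d @ d < t_dis:               # overlapping or too close
            continue
        if len(kept) > 1 and p[2] == q[2] == kept[-2][2]:
            u = q[:2] - kept[-2][:2]
            norm = np.linalg.norm(u) * np.linalg.norm(d)
            if norm > 0 and (u @ d) / norm > t_cos:      # almost straight: redundant
                continue
        kept.append(p)
    kept = np.asarray(kept)
    dxy = kept[1:, :2] - kept[:-1, :2]
    same = (kept[1:, 2] == kept[:-1, 2]).astype(float)
    return np.column_stack([dxy, 1.0 - same, same])
```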
After the feature sequence is generated, it is input into the distillation GRU. Conventional recurrent neural networks (RNNs) process sequence data through a recurrence mechanism, but they perform poorly on long sequences: the weight of information far in the past keeps shrinking until it has almost no influence on later results, while the computational burden grows and system performance suffers badly. Gated recurrent networks solve such problems well. The GRU model is a variant of the LSTM model that merges the LSTM's input gate and forget gate into an update gate (the detailed structure of a GRU unit is shown in fig. 4), which makes the network cheaper to compute and easier to converge. In this system, the outputs of two GRU units are concatenated into the input of one GRU unit to form the distillation GRU network; experiments prove that this effectively reduces training time without reducing the recognition rate.
Constructing a model:
the method comprises the steps of building an integral model of the system, and adopting a keras deep learning framework to construct the network of the system in order to improve the efficiency and the calculation speed of the constructed system and meet the requirement of quick response. keras is a high-level neural network API written in pure Python language, to which both tensiflow, thano and CNTK can be applied as a back-end. The system structure diagram of this embodiment is shown in fig. 3, and the method includes acquiring a time sequence input of a handwritten text, extracting a feature map through processing, then passing through a double-layer distillation GRU network including a distillation GRU layer and a normal GRU layer, and finally passing through a CTC transcription layer to obtain a result.
Model training
Once the model is designed and sufficient data is available, model training can be performed. The purpose of model training is to compare the outputs of the model on a large amount of data with the labels of that data, and thereby adjust the parameters of the network until the model can recognize data of the same type. In this embodiment, the pen-tip trajectory of a handwritten text is denoted $l=\{(x_t,y_t,s_t)\mid t=1,2,\ldots,T\}$, where $T$ is the number of points, $(x_t,y_t)$ are the coordinates of the $t$-th point and $s_t$ indicates the stroke to which the point belongs. Given a handwritten-text training set $Q$ and samples $(l,z)$, where $z$ is the label of the data, the loss function $L(Q)$ of the model network is the CTC negative log-likelihood

$L(Q)=-\sum_{(l,z)\in Q}\ln p(z\mid l)$
according to the invention, a random steepest descent method SGD is taken as an optimization algorithm, and a GeForceTitan-XGPUs display card is used, so that convergence can be achieved in about 3-4 days.
We assume the character set to be $C'=C\cup\{\text{blank}\}$, where $C$ denotes all characters that may appear in the text-line recognition task and "blank" denotes the empty output class (no output). The network output forms the CTC input sequence $u=(u_1,u_2,\ldots,u_T)$ with $u_t\in\mathbb{R}^{|C'|}$. Assigning a label to each time step and combining all time steps yields a large number of label sequences of length $T$ (denoted $\pi$). The probability of each label sequence is calculated as $p(\pi\mid u)=\prod_{t=1}^{T}u^t_{\pi_t}$.
in CTC, map operationsThe tag sequence pi can be mapped to a sequence l, i.e. the final output recognition character sequence.Duplicate characters in the tag sequence, separated by "blank", are first removed, and then "blank" is removed.
As shown in the figure, three label sequences π are listed here (with the blank written as "_" for readability; the specific sequences below are illustrative):

π1: 备_受观众期_待待

π2: 备备受_观众期待

π3: 备受受观众_期期待

When the repeated characters between blanks are removed from each of these three sequences and the blanks are then removed, each of them outputs the correct text 备受观众期待 ("eagerly awaited by the audience").
The invention combines a large amount of marked unconstrained handwritten texts and can train a system capable of accurately identifying the unconstrained handwritten texts.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, are merely for convenience of description of the present invention, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.
Claims (6)
1. A method for recognizing an unconstrained handwritten text image, comprising the steps of:
S1, preprocessing the input unconstrained handwritten text to obtain preprocessed text data;
S2, generating a text feature sequence from the preprocessed text data obtained in step S1;
S3, extracting text features along the temporal dimension with a multilayer distillation GRU network, on the basis of the feature sequence obtained in step S2;
and S4, outputting the recognition result through a CTC transcription layer.
2. The method for recognizing an unconstrained handwritten text image as claimed in claim 1, wherein step S1 is as follows: the pen-tip trajectory of the handwritten text is $I_0=\{(x_t,y_t,s_t)\mid t=1,2,\ldots,T\}$, where $T$ is the number of points in the trajectory, $t$ is the index of a point, $(x_t,y_t)$ are the abscissa and ordinate of the point with index $t$, and $s_t$ identifies the stroke to which point $t$ belongs; the coordinate variations between all adjacent points among the $T$ points are computed as $\Delta x_t=x_{t+1}-x_t$ and $\Delta y_t=y_{t+1}-y_t$, $t=1,2,\ldots,T-1$.
3. The method for recognizing an unconstrained handwritten text image as claimed in claim 1, wherein step S2 is as follows: first, to filter the pen-tip trajectory of the handwritten text preprocessed in step S1, the points whose parameters satisfy the following formula are removed:

$\Delta x_t^2+\Delta y_t^2<T_{dis}$

after removing the redundant input points, feature extraction is performed to generate a four-dimensional feature sequence $I=\{(\Delta x_t,\Delta y_t,\mathbb{I}(s_t\neq s_{t+1}),\mathbb{I}(s_t=s_{t+1}))\mid t=1,2,\ldots,T\}$, where $\mathbb{I}(\cdot)$ equals 1 when its argument is true and 0 when it is false; $T_{dis}$ is a manually set value used to evaluate the Euclidean distance between the current point and the next point, and $T_{cos}$ is a manually set value used to determine whether a point lies on, or nearly on, a straight segment of writing.
4. The method for recognizing an unconstrained handwritten text image as claimed in claim 1, wherein the multilayer distillation GRU is an ordinary GRU plus the following operations: suppose the output of an ordinary GRU is $h=(h_1,h_2,\ldots,h_T)$ with $h_t\in\mathbb{R}^D$; first, every $N$ time steps are grouped together, giving $h'=(h'_1,h'_2,\ldots,h'_{T/N})$, where $h'_t=[h_{tN+0};h_{tN+1};\ldots;h_{tN+(N-1)}]$ and $h'_t\in\mathbb{R}^{ND}$; the feature vector $h'_t\in\mathbb{R}^{ND}$ is then mapped into another feature space by a learned mapping.
5. The method of claim 1, wherein in step S4 the transcription layer is guided by connectionist temporal classification (CTC), and the input image and its corresponding label sequence do not need to be aligned in advance during training; the character set is $C'=C\cup\{\text{blank}\}$, where $C$ represents all the characters in use and "blank" represents the null mapping; given an input sequence $u=(u_1,u_2,\ldots,u_T)$ of length $T$ with $u_t\in\mathbb{R}^{|C'|}$, exponentially many label sequences of length $T$, denoted by $\pi$, are obtained by assigning a label to each time point and concatenating the labels; the probability of each sequence is calculated as

$p(\pi\mid u)=\prod_{t=1}^{T}u^t_{\pi_t}$

in CTC, the sequence-to-sequence mapping $\mathcal{B}$ maps an alignment to the transcription $L$, first deleting the duplicated labels and then deleting the "blank" labels; the total probability of a transcription is calculated by summing the probabilities of all alignments that correspond to it:

$p(L\mid u)=\sum_{\pi:\mathcal{B}(\pi)=L}p(\pi\mid u)$
6. A data enhancement technique for generating a text-level data set from a character-level data set, wherein in the character-level data $x^i_{\min}$ and $x^i_{\max}$ denote the minimum and maximum coordinate values of the $i$-th character along the X axis, $x^i_{first}$ and $x^i_{last}$ denote the X coordinates of the first and last points of the $i$-th character, $\Delta x_r$ denotes a random bias term uniformly distributed within $(-2,13)$, and $\Delta x_{line}$ denotes the length of the text line along the X axis, with the same quantities defined on the Y axis; the following types of handwritten text are synthesized, each specified by a displacement formula: horizontal, vertical, overlapped, multi-line, turning and slanted; in the operation of generating the unconstrained handwritten text, the type of handwritten text to generate and the number of characters $N$ are first decided, then $N$ character samples are randomly selected from the character data set, and finally the selected character samples are combined, with the calculated displacement $(\Delta x,\Delta y)$ set between adjacent characters, to generate the unconstrained handwritten text.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010589597.3A CN111738167A (en) | 2020-06-24 | 2020-06-24 | Method for recognizing unconstrained handwritten text image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010589597.3A CN111738167A (en) | 2020-06-24 | 2020-06-24 | Method for recognizing unconstrained handwritten text image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111738167A true CN111738167A (en) | 2020-10-02 |
Family
ID=72651026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010589597.3A Pending CN111738167A (en) | 2020-06-24 | 2020-06-24 | Method for recognizing unconstrained handwritten text image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111738167A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893968A (en) * | 2016-03-31 | 2016-08-24 | 华南理工大学 | Text-independent end-to-end handwriting recognition method based on deep learning |
CN106570456A (en) * | 2016-10-13 | 2017-04-19 | 华南理工大学 | Handwritten Chinese character recognition method based on full-convolution recursive network |
CN108154136A (en) * | 2018-01-15 | 2018-06-12 | 众安信息技术服务有限公司 | For identifying the method, apparatus of writing and computer-readable medium |
US20200143191A1 (en) * | 2018-11-02 | 2020-05-07 | Iflytek Co., Ltd. | Method, apparatus and storage medium for recognizing character |
Non-Patent Citations (1)
Title |
---|
刘曼飞: "基于深度学习的联机手写汉字分析与识别", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408387A (en) * | 2021-06-10 | 2021-09-17 | 中金金融认证中心有限公司 | Method for generating handwritten text data for complex writing scene and computer product |
CN114529910A (en) * | 2022-01-27 | 2022-05-24 | 北京鼎事兴教育咨询有限公司 | Handwritten character recognition method and device, storage medium and electronic equipment |
CN114241495A (en) * | 2022-02-28 | 2022-03-25 | 天津大学 | Data enhancement method for offline handwritten text recognition |
CN114241495B (en) * | 2022-02-28 | 2022-05-03 | 天津大学 | Data enhancement method for off-line handwritten text recognition |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20201002