CN114463760B - Character image writing track recovery method based on double-stream coding - Google Patents

Info

Publication number: CN114463760B
Authority: CN (China)
Legal status: Active (granted)
Application number: CN202210363354.7A
Original language: Chinese (zh)
Other versions: CN114463760A (application publication)
Inventor
黄双萍
陈洲楠
杨代辉
梁景麟
彭政华
Assignees: Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou; South China University of Technology SCUT
Application filed by Guangdong Provincial Laboratory Of Artificial Intelligence And Digital Economy Guangzhou and South China University of Technology SCUT
Priority application: CN202210363354.7A
Application publication: CN114463760A
Granted publication: CN114463760B

Classifications

    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods


Abstract

The invention discloses a method for recovering the writing trajectory of a character image based on dual-stream coding, comprising the following steps: resizing the character image to a preset size and binarizing it; constructing a dual-stream coding network whose input is the character image and whose output is the dual-stream fused encoding feature C; constructing a decoding network whose input is the dual-stream fused encoding feature C and whose output is the predicted writing-trajectory sequence; jointly training the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model; and recovering the writing trajectory with the trained model. During encoding, the method extracts character features in the vertical and horizontal directions separately and downsamples them, which reduces the parameter count while retaining the glyph information needed for the subsequent decoder to reproduce the character shape accurately, effectively improving the writing-trajectory recovery performance.

Description

Character image writing track recovery method based on double-stream coding
Technical Field
The invention relates to the field of character-image pattern recognition, and in particular to a method for recovering the writing trajectory of a character image based on dual-stream coding.
Background
Text data can be roughly divided into two modalities, image data and writing-trajectory data, and text generation technology has developed mainly around these two. Character images are usually captured by devices such as scanners or cameras and stored as bitmap images; such data displays character shapes intuitively and is commonly used for displaying and reading text. Writing trajectories are captured by trajectory-recording interactive devices such as digital pens, handwriting tablets, or touch screens, are usually stored as sequences of pen-tip coordinate points, and may additionally record auxiliary information such as pen-tip pressure and speed. Writing-trajectory recovery from character images is a cross-modal text generation technique that aims to recover writing-motion trajectory information from a character image containing no trajectory information. It is often used as an important technical means for character recognition and data augmentation, and has great application potential in judicial handwriting identification, writing robots, font generation, and text special-effect generation.
The challenge of writing-trajectory recovery algorithms comes first from the complexity of glyph structures. Taking Chinese characters as an example, the national standard GB18030 encodes as many as 70,000 characters, among which structurally complex characters and easily confused character classes abound; a slight error by the recovery model may produce blurred glyphs, confused classes, or meaningless characters. The recovery algorithm must not only cope with the complexity of glyph structures but also learn the spatial distribution of pen-tip positions and the ordering among stroke points (the stroke order of Chinese characters). Consequently, generating writing trajectories is in general harder than generating ordinary character images. Moreover, because the trajectory-recovery task spans the image and trajectory-sequence modalities of text, the characteristics of both modalities and the complex mapping between them must be considered jointly, which makes the design of trajectory-recovery algorithms highly challenging.
Disclosure of Invention
In view of this, the present invention aims to provide a method for recovering the writing trajectory of a character image based on dual-stream coding, so as to solve the problems of weak feature representation, poor generalization, and low trajectory-recovery accuracy in prior-art writing-trajectory recovery.
The invention discloses a method for recovering the writing trajectory of a character image based on dual-stream coding, comprising the following steps:
Step 1: resize the character image to a preset size and binarize it;
Step 2: construct a dual-stream coding network whose input is the character image and whose output is the dual-stream fused encoding feature C;
Step 3: construct a decoding network whose input is the dual-stream fused encoding feature C and whose output is the predicted writing-trajectory sequence;
Step 4: jointly train the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model;
Step 5: recover the writing trajectory with the trained model.
Specifically, the dual-stream coding network comprises a vertical convolutional recurrent neural network, a horizontal convolutional recurrent neural network, and an attention module.
The vertical and horizontal convolutional recurrent neural networks are connected in parallel, and each comprises a CNN encoder and a BiLSTM encoder. The CNN encoder of the vertical branch downsamples the input character image in the vertical direction and, together with convolution operations, encodes it into a one-dimensional directional feature f_h along the horizontal direction. Splitting f_h along its direction dimension yields a feature sequence ordered in that direction, which the BiLSTM encoder of the vertical branch encodes into the stream encoding feature e_h. Symmetrically, the CNN encoder of the horizontal branch downsamples the input character image in the horizontal direction and, together with convolution operations, encodes it into a one-dimensional directional feature f_v along the vertical direction; splitting f_v along its direction dimension yields a feature sequence that the BiLSTM encoder of the horizontal branch encodes into the stream encoding feature e_v.
The attention module fuses the stream encoding features e_h and e_v into the dual-stream fused encoding feature C:
C = Σ_{i=1}^{L} α_i · g_i,
α_i = exp(φ(g_i)) / Σ_{j=1}^{L} exp(φ(g_j)),
where the merged feature g is obtained by concatenating e_h and e_v, g_i is the i-th component of g, L is the length of g, α_i is the attention weight of g_i, and φ(·) denotes a fully connected layer with learnable parameters W.
Optionally, the downsampling operation is an asymmetric pooling operation, an asymmetric convolution operation, or downsampling by a fully connected layer.
optionally, the decoding network is an LSTM decoder, and the LSTM decoder uses dual-stream fusion coding features
Figure 743111DEST_PATH_IMAGE002
Sequentially predicting track points for input; LSTM decoder based on
Figure 898149DEST_PATH_IMAGE018
Predicted value of time
Figure 370719DEST_PATH_IMAGE019
And hidden layer vector
Figure 953010DEST_PATH_IMAGE020
Prediction of
Figure 347082DEST_PATH_IMAGE021
Track point information of time
Figure 40232DEST_PATH_IMAGE022
Figure 570570DEST_PATH_IMAGE023
Wherein, in the step (A),
Figure 323763DEST_PATH_IMAGE024
and
Figure 939552DEST_PATH_IMAGE025
to represent
Figure 233130DEST_PATH_IMAGE021
The position coordinates of the time of day,
Figure 880624DEST_PATH_IMAGE026
to represent
Figure 804718DEST_PATH_IMAGE021
The meaning of the state of the pen point at any moment and 3 states is: "the pen point is contacting with the paper surface", "the current stroke is finished, the temporary pen is lifted" and "all strokes are finished", finally,
Figure 173382DEST_PATH_IMAGE027
a sequence of trajectories is written for the predicted text.
Specifically, in jointly training the dual-stream coding network and the decoding network, the encoder-decoder loss function is:
L = λ_1 · L_2 + λ_2 · L_ce + λ_3 · L_dtw,
where λ_1, λ_2, and λ_3 are preset constants that balance the respective loss weights.
L_2 is the L2 loss, computed as:
L_2 = (1/N) · Σ_{t=1}^{N} [ (x_t − x̂_t)² + (y_t − ŷ_t)² ],
where x_t and y_t are the X- and Y-coordinate predictions of the decoding network, x̂_t and ŷ_t are the X- and Y-coordinate label values, and N is the number of trajectory points.
L_ce is the cross-entropy loss, computed as:
L_ce = −(1/N) · Σ_{t=1}^{N} log P(ŝ_t),
where P(ŝ_t) is the probability the decoding network predicts for the pen-tip state and ŝ_t is the pen-tip state label.
L_dtw is the dynamic time warping loss: a dynamic time warping algorithm finds the optimal alignment path between the predicted and label trajectory sequences, and the sequence distance under the optimal alignment path is computed as the global loss of the predicted sequence.
Given a predicted trajectory sequence P = (p_1, …, p_N) and a label trajectory sequence Q = (q_1, …, q_M), with sequence lengths N and M respectively, let the Euclidean distance function d(p_i, q_j) characterize the distance between trajectory points p_i and q_j. Define an alignment path A = (a_1, …, a_K), where K is the length of the alignment path and each item a_k = (i_k, j_k) defines a correspondence between p_{i_k} and q_{j_k}, subject to:
a_1 = (1, 1),
a_K = (N, M),
(i_{k+1} − i_k, j_{k+1} − j_k) ∈ {(0, 1), (1, 0), (1, 1)},
where i_k denotes the index of the i_k-th trajectory point of P and j_k the index of the j_k-th trajectory point of Q. The dynamic time warping (DTW) algorithm is used to find the alignment path that minimizes the sequence distance as the optimal alignment path, and the corresponding sequence distance is taken as the global loss of the predicted sequence:
L_dtw = min_A Σ_{k=1}^{K} d(p_{i_k}, q_{j_k}).
Preferably, the hidden state of a BiLSTM encoder in the dual-stream coding network is used as the initial hidden state h_0 of the LSTM decoder.
Preferably, λ_1 = 0.5, λ_2 = 1.0, and λ_3 = 1/6000.
Compared with the prior art, the method extracts character features in the vertical and horizontal directions separately during encoding and downsamples them, which reduces the parameter count while retaining the necessary glyph information, helps the subsequent decoder reproduce the character shape accurately, and ultimately improves the writing-trajectory recovery performance effectively.
Drawings
FIG. 1 shows a schematic flow diagram of a method embodying the present invention;
FIG. 2 shows a schematic structural diagram of a dual-stream coding network in an embodiment of the present invention;
fig. 3 shows a schematic structural diagram of a decoding network in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
For reference and clarity, the technical terms, abbreviations, and acronyms used hereinafter are summarized as follows:
CNN: Convolutional Neural Network;
RNN: Recurrent Neural Network;
CRNN: Convolutional Recurrent Neural Network;
BiLSTM: Bi-directional Long Short-Term Memory;
DTW: Dynamic Time Warping.
Fig. 1 shows a schematic flow diagram of an embodiment of the invention. A method for recovering the writing trajectory of a character image based on dual-stream coding comprises the following steps:
Step 1: resize the character image to a preset size and binarize it;
Step 2: construct a dual-stream coding network whose input is the character image and whose output is the dual-stream fused encoding feature C;
Step 3: construct a decoding network whose input is the dual-stream fused encoding feature C and whose output is the predicted writing-trajectory sequence;
Step 4: jointly train the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model;
Step 5: recover the writing trajectory with the trained model.
The specific operation steps of this embodiment are as follows:
(1) Preprocess the input character image: resize it to a preset size while maintaining the aspect ratio, and binarize it.
(2) Construct the dual-stream coding network.
1) As shown in Fig. 2, construct two convolutional recurrent neural network (CRNN) branches, one vertical and one horizontal. Each contains a CNN encoder and a BiLSTM encoder. The two CNN encoders use asymmetric pooling operations in the vertical or horizontal direction, respectively, to downsample in that direction and, together with convolution operations, encode the input character image into the one-dimensional directional features f_h (along the horizontal direction) and f_v (along the vertical direction). Splitting f_h and f_v along their direction dimensions yields feature sequences ordered in those directions, which the BiLSTM encoders encode into the stream encoding features e_h and e_v.
2) Fuse the two features e_h and e_v with an attention mechanism to obtain the dual-stream fused encoding feature C:
C = Σ_{i=1}^{L} α_i · g_i,
α_i = exp(φ(g_i)) / Σ_{j=1}^{L} exp(φ(g_j)),
where the merged feature g is obtained by concatenating e_h and e_v, g_i and g_j are the i-th and j-th components of g, α_i is the attention weight of g_i, φ(·) denotes a fully connected layer with learnable parameters W, and L is the length of g.
(3) Construct the decoding network, perform feature decoding, and output the predicted writing-trajectory sequence.
1) Construct an LSTM decoder that takes the dual-stream fused encoding feature C as input and predicts trajectory points in turn. As shown in Fig. 3, based on the prediction p_{t-1} and hidden-state vector h_{t-1} at time t−1, the LSTM decoder predicts the trajectory-point information p_t at time t. Finally, (p_1, p_2, …, p_N) is the predicted writing-trajectory sequence. The hidden state of a BiLSTM encoder in the dual-stream coding network is used as the initial hidden state h_0 of the LSTM decoder.
2) For the trajectory-point information at time t, set p_t = (x_t, y_t, s_t), where x_t and y_t are the position coordinates at time t and s_t is the one-hot pen-tip state at time t, whose three states respectively represent: "the pen tip is touching the paper", "the current stroke has ended and the pen is lifted temporarily", and "all strokes have ended". In particular, the initial input trajectory point p_0 is set to a fixed starting value.
(4) Construct the loss function of the encoder-decoder network and train the model formed by the dual-stream coding network and the decoding network end to end. The encoder-decoder loss function comprises an L2 loss, a cross-entropy loss, and a dynamic time warping loss.
L2 loss:
L_2 = (1/N) · Σ_{t=1}^{N} [ (x_t − x̂_t)² + (y_t − ŷ_t)² ],
where x_t and y_t are the network's coordinate predictions, x̂_t and ŷ_t are the label values, and N is the number of trajectory points.
Cross-entropy loss:
L_ce = −(1/N) · Σ_{t=1}^{N} log P(ŝ_t),
where P(ŝ_t) is the probability the network predicts for the pen-tip state and ŝ_t is the label value.
Dynamic time warping loss: a dynamic time warping algorithm searches for the optimal alignment path between the predicted and label trajectory sequences, and the sequence distance under the optimal alignment path is computed as the global loss of the predicted sequence, thereby achieving global optimization of the trajectory sequence.
Given a predicted trajectory sequence P = (p_1, …, p_N) and a label trajectory sequence Q = (q_1, …, q_M), with sequence lengths N and M respectively, let the Euclidean distance function d(p_i, q_j) characterize the distance between trajectory points p_i and q_j. Define an alignment path A = (a_1, …, a_K) (where K is the length of the alignment path); each item a_k = (i_k, j_k) of the alignment path defines a correspondence between p_{i_k} and q_{j_k}, subject to:
a_1 = (1, 1),
a_K = (N, M),
(i_{k+1} − i_k, j_{k+1} − j_k) ∈ {(0, 1), (1, 0), (1, 1)}.
The dynamic time warping (DTW) algorithm is used to find the alignment path that minimizes the sequence distance as the optimal alignment path, and the corresponding sequence distance serves as the global loss of the predicted sequence:
L_dtw = min_A Σ_{k=1}^{K} d(p_{i_k}, q_{j_k}).
Encoder-decoder loss function:
L = λ_1 · L_2 + λ_2 · L_ce + λ_3 · L_dtw,
where λ_1, λ_2, and λ_3 are constants that balance the respective loss weights. In practice, we set λ_1, λ_2, and λ_3 to 0.5, 1.0, and 1/6000, respectively.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
All possible combinations of the technical features in the above embodiments may not be described for the sake of brevity, but should be considered as being within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and amendments can be made without departing from the principle of the present invention, and these modifications and amendments should also be considered as the protection scope of the present invention.

Claims (6)

1. A method for restoring writing tracks of character images based on double-stream coding is characterized by comprising the following steps:
Step 1, adjusting a character image to a preset size and carrying out binarization processing;
step 2, constructing a double-stream coding network, wherein the double-stream coding network inputs character images and outputs character images with double-stream fusion coding characteristics
Figure DEST_PATH_IMAGE001
Step 3, constructing a decoding network, wherein the input of the decoding network is the double-current fusion coding characteristic
Figure 253170DEST_PATH_IMAGE001
Outputting a predicted character writing track sequence;
step 4, training a double-flow coding network and a decoding network in a combined manner to obtain a character image writing track recovery network model;
step 5, writing track recovery is carried out by utilizing the trained character image writing track recovery network model;
the double-stream coding network comprises a vertical convolution recurrent neural network, a horizontal convolution recurrent neural network and an attention module;
the vertical convolution recurrent neural network and the horizontal convolution recurrent neural network are connected in parallel, and each comprises a CNN encoder and a BiLSTM encoder; the CNN encoder in the vertical convolution recurrent neural network performs down-sampling in the vertical direction by using a vertical down-sampling operation and, in cooperation with convolution operations, encodes the input character image to obtain a one-dimensional directional feature $F_h$ of the character in the horizontal direction; the one-dimensional directional feature $F_h$ is split along its direction dimension to obtain a feature sequence whose time axis is that direction, and the BiLSTM encoder in the vertical convolution recurrent neural network encodes this feature sequence to obtain the stream coding feature $E_h$; the CNN encoder in the horizontal convolution recurrent neural network performs down-sampling in the horizontal direction by using a horizontal down-sampling operation and, in cooperation with convolution operations, encodes the input character image to obtain a one-dimensional directional feature $F_v$ of the character in the vertical direction; the one-dimensional directional feature $F_v$ is split along its direction dimension to obtain a feature sequence whose time axis is that direction, and the BiLSTM encoder in the horizontal convolution recurrent neural network encodes this feature sequence to obtain the stream coding feature $E_v$;
the attention module fuses the double-stream coding features $E_h$ and $E_v$ to obtain the double-stream fusion coding feature $C$:

$C = \sum_{i=1}^{L} \alpha_i e_i$, where $\alpha_i = \dfrac{\exp(f(e_i))}{\sum_{j=1}^{L} \exp(f(e_j))}$

wherein $E$ is obtained by concatenating the features $E_h$ and $E_v$; $e_i$ and $e_j$ are the $i$-th and $j$-th components of $E$; $\alpha_i$ denotes the attention weight of $e_i$ and $\alpha_j$ denotes the attention weight of $e_j$; $f(\cdot)$ denotes the function of a fully connected layer, $f(e_i) = W e_i$; $L$ is the length of $E$; and $W$ is the learnable parameter of the fully connected layer.
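The attention fusion described above can be sketched in plain NumPy. The feature dimensions and the single-layer linear scoring function $f(e) = W e$ are illustrative assumptions for the sketch, not the patented configuration:

```python
import numpy as np

def attention_fuse(E_h, E_v, W):
    """Fuse two stream coding feature sequences with attention.

    E_h: (L_h, d) horizontal-direction stream features.
    E_v: (L_v, d) vertical-direction stream features.
    W:   (d,) weight of a single fully connected scoring layer.
    Returns the fused feature C of shape (d,) and the attention weights.
    """
    E = np.concatenate([E_h, E_v], axis=0)         # (L, d), L = L_h + L_v
    scores = E @ W                                  # f(e_i) = W . e_i
    scores = scores - scores.max()                  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax attention weights
    return alpha @ E, alpha                         # C = sum_i alpha_i * e_i

rng = np.random.default_rng(0)
E_h = rng.standard_normal((4, 8))
E_v = rng.standard_normal((6, 8))
W = rng.standard_normal(8)
C, alpha = attention_fuse(E_h, E_v, W)
```

The weights form a distribution over all $L$ concatenated components, so both streams compete for attention in a single softmax.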
2. The character image writing track recovery method based on double-stream coding as claimed in claim 1, wherein the down-sampling operation is an asymmetric pooling operation, an asymmetric convolution operation, or a fully connected layer operation.
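A minimal NumPy sketch of the asymmetric pooling option, assuming a 2×1 max-pool kernel that halves only the height (the concrete kernel size is a choice the claim leaves open):

```python
import numpy as np

def asym_max_pool_vertical(x, k=2):
    """Max-pool a feature map only along the height axis (kernel k x 1).

    x: (H, W) feature map with H divisible by k.
    Returns an (H // k, W) map: the width (horizontal direction) is kept
    intact while the vertical dimension is progressively collapsed,
    yielding a one-dimensional horizontal feature after enough stages.
    """
    H, W = x.shape
    return x.reshape(H // k, k, W).max(axis=1)

x = np.arange(24, dtype=float).reshape(4, 6)
y = asym_max_pool_vertical(x)
```

The horizontal-stream encoder would use the transposed variant (1×k kernel) to collapse the width instead.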
3. The character image writing track recovery method based on double-stream coding as claimed in claim 1, wherein the decoding network is an LSTM decoder, and the LSTM decoder takes the double-stream fusion coding feature $C$ as input and predicts the track points sequentially; based on the predicted value $p_{t-1}$ and the hidden layer vector $h_{t-1}$ at time $t-1$, the LSTM decoder predicts the track point information $p_t$ at time $t$:

$p_t = (x_t, y_t, s_t)$, $(p_t, h_t) = \mathrm{LSTM}(p_{t-1}, h_{t-1}, C)$

wherein $x_t$ and $y_t$ denote the position coordinates at time $t$, $s_t$ denotes the pen-tip state at time $t$, and the meanings of its 3 states are: "the pen tip is in contact with the paper surface", "the current stroke is finished and the pen is temporarily lifted", and "all strokes are finished"; finally, $P = (p_1, p_2, \ldots, p_N)$ is the predicted character writing trajectory sequence.
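The sequential decoding can be illustrated with a toy autoregressive loop. The `toy_step` function below is a hypothetical stand-in for one trained LSTM cell update, kept deliberately simple to show the feedback and stopping logic only:

```python
PEN_DOWN, PEN_UP, END = 0, 1, 2  # the three pen-tip states

def decode(step, p0, h0, max_len=50):
    """Greedy autoregressive decoding: feed the previous prediction and
    hidden vector back into the step function until the
    'all strokes are finished' state is emitted."""
    points, p, h = [], p0, h0
    for _ in range(max_len):
        p, h = step(p, h)      # p_t, h_t = LSTM(p_{t-1}, h_{t-1})
        points.append(p)
        if p[2] == END:        # s_t signals that all strokes are finished
            break
    return points

# Hypothetical stand-in for a decoder step: move diagonally, stop at x >= 3.
def toy_step(p, h):
    x, y, s = p
    x, y = x + 1.0, y + 1.0
    return (x, y, END if x >= 3 else PEN_DOWN), h

traj = decode(toy_step, p0=(0.0, 0.0, PEN_DOWN), h0=None)
```

Each emitted tuple corresponds to one track point $(x_t, y_t, s_t)$ of the recovered writing trajectory.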
4. The character image writing track recovery method based on double-stream coding as claimed in claim 3, wherein in the process of jointly training the double-stream coding network and the decoding network, the loss function of the coding-decoding network is:

$\mathcal{L} = \lambda_1 \mathcal{L}_{2} + \lambda_2 \mathcal{L}_{CE} + \lambda_3 \mathcal{L}_{DTW}$

wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are preset constants that balance the respective loss weights;

$\mathcal{L}_{2}$ is the L2 loss, whose calculation formula is:

$\mathcal{L}_{2} = \dfrac{1}{N} \sum_{t=1}^{N} \left[ (x_t - \hat{x}_t)^2 + (y_t - \hat{y}_t)^2 \right]$

wherein $x_t$ and $y_t$ are respectively the X-coordinate and Y-coordinate predicted values of the position output by the decoding network, $\hat{x}_t$ and $\hat{y}_t$ are respectively the label values of the X coordinate and the Y coordinate of the position, and $N$ is the number of track points;

$\mathcal{L}_{CE}$ is the cross-entropy loss, whose calculation formula is:

$\mathcal{L}_{CE} = -\dfrac{1}{N} \sum_{t=1}^{N} \log q_t(\hat{s}_t)$

wherein $q_t(\hat{s}_t)$ is the probability predicted by the decoding network for the pen-tip state $\hat{s}_t$, and $\hat{s}_t$ is the label value of the pen-tip state;

$\mathcal{L}_{DTW}$ is the dynamic time warping loss: an optimal alignment path between the predicted and label trajectory sequences is found by using a dynamic time warping algorithm, and the sequence distance under the optimal alignment path is taken as the global loss of the predicted sequence:

given a predicted trajectory sequence $P$ and a label trajectory sequence $\hat{P}$ whose sequence lengths are respectively $N$ and $M$, a Euclidean distance function $d(p_i, \hat{p}_j)$ is set for characterizing the distance between the track points $p_i$ and $\hat{p}_j$; an alignment path $\pi = (\pi_1, \pi_2, \ldots, \pi_K)$ is defined, wherein $\pi_k = (i_k, j_k)$ and $K$ is the length of the alignment path; each item of the alignment path defines a correspondence between $P$ and $\hat{P}$ subject to:

$\pi_1 = (1, 1)$, $\pi_K = (N, M)$, $\pi_{k+1} - \pi_k \in \{(1, 0), (0, 1), (1, 1)\}$

wherein $p_{i_k}$ denotes the $i_k$-th track point of $P$ and $\hat{p}_{j_k}$ denotes the $j_k$-th track point of $\hat{P}$; the dynamic time warping (DTW) algorithm searches for the alignment path that minimizes the sequence distance as the optimal alignment path, and the corresponding sequence distance is taken as the global loss of the predicted sequence:

$\mathcal{L}_{DTW} = \min_{\pi} \sum_{k=1}^{K} d\left(p_{i_k}, \hat{p}_{j_k}\right)$
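The minimum over alignment paths can be computed with the standard DTW dynamic program. This NumPy sketch uses the Euclidean point distance of the claim; the trajectory contents are made-up examples:

```python
import numpy as np

def dtw_loss(P, Q):
    """Dynamic time warping distance between trajectory sequences P (N, 2)
    and Q (M, 2): the minimum total Euclidean distance over all alignment
    paths that start at (1, 1), end at (N, M) and advance by one of the
    steps (1, 0), (0, 1) or (1, 1)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    N, M = len(P), len(Q)
    D = np.full((N + 1, M + 1), np.inf)   # D[i, j]: best cost aligning P[:i], Q[:j]
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = np.linalg.norm(P[i - 1] - Q[j - 1])   # d(p_i, q_j)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[N, M]

# A repeated label point costs nothing extra: the path simply stays on it,
# which is why DTW tolerates sequences of different lengths.
loss = dtw_loss([(0, 0), (1, 1)], [(0, 0), (1, 1), (1, 1)])
```

In training, a differentiable relaxation (e.g. replacing `min` with a soft minimum) would typically be needed for gradients to flow through the alignment.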
5. The character image writing track recovery method based on double-stream coding as claimed in claim 1, wherein the hidden layer state of the BiLSTM encoders in the double-stream coding network is used as the initial hidden layer state $h_0$ of the LSTM decoder.
6. The character image writing track recovery method based on double-stream coding as claimed in claim 4, wherein $\lambda_1$ takes the value 0.5, $\lambda_2$ takes the value 1.0, and $\lambda_3$ takes the value 1/6000.
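With the weights of claim 6, the joint loss of claim 4 combines as below; the three component values are made-up placeholders standing in for one batch's losses:

```python
# Joint loss of claim 4 with the claim 6 weights.
lambda_1, lambda_2, lambda_3 = 0.5, 1.0, 1 / 6000

l_l2, l_ce, l_dtw = 0.04, 0.2, 120.0   # hypothetical per-batch loss values
total = lambda_1 * l_l2 + lambda_2 * l_ce + lambda_3 * l_dtw
```

The small 1/6000 weight keeps the unnormalized, length-dependent DTW term from dominating the per-point L2 and cross-entropy terms.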
CN202210363354.7A 2022-04-08 2022-04-08 Character image writing track recovery method based on double-stream coding Active CN114463760B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210363354.7A CN114463760B (en) 2022-04-08 2022-04-08 Character image writing track recovery method based on double-stream coding


Publications (2)

Publication Number Publication Date
CN114463760A CN114463760A (en) 2022-05-10
CN114463760B true CN114463760B (en) 2022-06-28

Family

ID=81416905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210363354.7A Active CN114463760B (en) 2022-04-08 2022-04-08 Character image writing track recovery method based on double-stream coding

Country Status (1)

Country Link
CN (1) CN114463760B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117853378A (en) * 2024-03-07 2024-04-09 湖南董因信息技术有限公司 Text handwriting display method based on metric learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410242A (en) * 2018-09-05 2019-03-01 华南理工大学 Method for tracking target, system, equipment and medium based on double-current convolutional neural networks
CN110188669A (en) * 2019-05-29 2019-08-30 华南理工大学 A kind of aerial hand-written character track restoration methods based on attention mechanism
WO2021136144A1 (en) * 2019-12-31 2021-07-08 中兴通讯股份有限公司 Character restoration method and apparatus, storage medium, and electronic device
CN114428866A (en) * 2022-01-26 2022-05-03 杭州电子科技大学 Video question-answering method based on object-oriented double-flow attention network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11158055B2 (en) * 2019-07-26 2021-10-26 Adobe Inc. Utilizing a neural network having a two-stream encoder architecture to generate composite digital images
CN111104532B (en) * 2019-12-30 2023-04-25 华南理工大学 RGBD image joint recovery method based on double-flow network
CN111626238B (en) * 2020-05-29 2023-08-04 京东方科技集团股份有限公司 Text recognition method, electronic device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410242A (en) * 2018-09-05 2019-03-01 华南理工大学 Method for tracking target, system, equipment and medium based on double-current convolutional neural networks
CN110188669A (en) * 2019-05-29 2019-08-30 华南理工大学 A kind of aerial hand-written character track restoration methods based on attention mechanism
WO2021136144A1 (en) * 2019-12-31 2021-07-08 中兴通讯股份有限公司 Character restoration method and apparatus, storage medium, and electronic device
CN114428866A (en) * 2022-01-26 2022-05-03 杭州电子科技大学 Video question-answering method based on object-oriented double-flow attention network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Shuangping Huang et al.; "OBC306: A Large-Scale Oracle Bone Character Recognition Dataset"; 2019 International Conference on Document Analysis and Recognition; 2020-02-03; pp. 681-688 *

Also Published As

Publication number Publication date
CN114463760A (en) 2022-05-10

Similar Documents

Publication Publication Date Title
KR102473543B1 (en) Systems and methods for digital ink interaction
Kosmala et al. On-line handwritten formula recognition using hidden Markov models and context dependent graph grammars
Zhelezniakov et al. Online handwritten mathematical expression recognition and applications: A survey
CN111553350A (en) Attention mechanism text recognition method based on deep learning
Jain et al. Unconstrained OCR for Urdu using deep CNN-RNN hybrid networks
CN113673432A (en) Handwriting recognition method, touch display device, computer device and storage medium
CN111046771A (en) Training method of network model for recovering writing track
CN114463760B (en) Character image writing track recovery method based on double-stream coding
Gan et al. In-air handwritten Chinese text recognition with temporal convolutional recurrent network
US11837001B2 (en) Stroke attribute matrices
JP6055065B1 (en) Character recognition program and character recognition device
US20050276480A1 (en) Handwritten input for Asian languages
CN111738167A (en) Method for recognizing unconstrained handwritten text image
Choudhury et al. Trajectory-based recognition of in-air handwritten Assamese words using a hybrid classifier network
CN113435398B (en) Signature feature identification method, system, equipment and storage medium based on mask pre-training model
CN114757969B (en) Character and image writing track recovery method based on global tracking decoding
Xu et al. On-line sample generation for in-air written chinese character recognition based on leap motion controller
CN115620314A (en) Text recognition method, answer text verification method, device, equipment and medium
Bezine et al. Handwriting perceptual classification and synthesis using discriminate HMMs and progressive iterative approximation
Assaleh et al. Recognition of handwritten Arabic alphabet via hand motion tracking
Alwajih et al. DeepOnKHATT: an end-to-end Arabic online handwriting recognition system
Tan et al. An End-to-End Air Writing Recognition Method Based on Transformer
CN113673635B (en) Hand-drawn sketch understanding deep learning method based on self-supervision learning task
WO2022180725A1 (en) Character recognition device, program, and method
Shi et al. In-air Handwritten English Word Recognition Based on Corner Point Feature Fusion and Contrastive Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant