CN114463760B - Character image writing track recovery method based on double-stream coding - Google Patents
Character image writing track recovery method based on double-stream coding
- Publication number
- CN114463760B · CN202210363354.7A
- Authority
- CN
- China
- Prior art keywords
- double
- network
- coding
- sequence
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Compression Of Band Width Or Redundancy In Fax (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The invention discloses a method for recovering the writing trajectory of a character image based on dual-stream coding, which comprises the following steps: adjusting the character image to a preset size and binarizing it; constructing a dual-stream coding network whose input is the character image and whose output is the dual-stream fused coding feature; constructing a decoding network whose input is the dual-stream fused coding feature and whose output is a predicted writing-trajectory sequence; jointly training the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model; and recovering writing trajectories with the trained model. During encoding, the method extracts character features in the vertical and horizontal directions separately and down-samples them, which reduces the parameter count while retaining the necessary glyph information, helps subsequent decoding reflect the character glyph accurately, and effectively improves the recovery performance of character-image writing trajectories.
Description
Technical Field
The invention relates to the field of character-image pattern recognition, and in particular to a method for recovering the writing trajectory of a character image based on dual-stream coding.
Background
By modality, text data can be roughly divided into two types — image data and writing-trajectory data — and text-generation technology develops mainly around these two modalities. A character image is usually captured by a scanner, camera, or other image-acquisition device and stored as a bitmap; such data displays the character's shape intuitively and is commonly used for displaying and reading text. A character's writing trajectory is captured by a digital pen, handwriting tablet, touch screen, or other trace-recording interactive device, is usually stored as a sequence of pen-tip coordinate points, and may also record auxiliary information such as pen-tip pressure and speed during writing. Recovering the writing trajectory of a character image is a cross-modal character-generation technique that aims to recover the writing motion trajectory from an image containing no trajectory information. It often serves as an important technical means for character recognition and data augmentation, and has great application potential in judicial handwriting identification, writing robots, font generation, and character special-effect generation.
The challenge of writing-trajectory recovery comes first from the complexity of glyph structure. Taking Chinese characters as an example, the national standard GB18030 encodes as many as 70,000 characters, with no shortage of structurally complex characters or easily confused character classes; a slight error by the recovery model may yield blurred glyphs, class confusion, or meaningless characters. Besides overcoming the complexity of glyph structure, the recovery algorithm must also learn the spatial distribution of pen-tip positions and the ordering between trajectory points (the stroke order of Chinese characters). Recovering a writing trajectory is therefore generally harder than generating an ordinary character image. Moreover, because the task spans the image and trajectory-sequence modalities of text, the characteristics of both modalities and the complex mapping between them must be considered together, which makes the design of trajectory-recovery algorithms highly challenging.
Disclosure of Invention
In view of this, the present invention aims to provide a method for recovering the writing trajectory of a character image based on dual-stream coding, so as to solve the problems of weak feature representation, poor generalization, and low trajectory-recovery accuracy in prior-art approaches.
The invention discloses a method for recovering the writing trajectory of a character image based on dual-stream coding, which comprises the following steps:
Step 1, adjusting the character image to a preset size and binarizing it;
Step 2, constructing a dual-stream coding network whose input is the character image and whose output is the dual-stream fused coding feature f;
Step 3, constructing a decoding network whose input is the dual-stream fused coding feature f and whose output is a predicted writing-trajectory sequence;
Step 4, jointly training the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model;
Step 5, recovering writing trajectories with the trained model.
Specifically, the dual-stream coding network comprises a vertical convolutional recurrent neural network, a horizontal convolutional recurrent neural network, and an attention module.
The two branches run in parallel and each comprises a CNN encoder and a BiLSTM encoder. In the vertical branch, the CNN encoder down-samples in the vertical direction and, together with convolution operations, encodes the input character image into a one-dimensional feature along the horizontal direction; splitting this feature along its direction dimension yields a direction-ordered sequence, which the branch's BiLSTM encoder encodes into the stream feature f_v. Symmetrically, in the horizontal branch the CNN encoder down-samples in the horizontal direction and encodes the image into a one-dimensional feature along the vertical direction, whose direction-ordered sequence the branch's BiLSTM encoder encodes into the stream feature f_h.
The attention module fuses the two stream features f_v and f_h into the dual-stream fused coding feature f:

f = Σ_{i=1}^{L} α_i a_i,  α_i = exp(FC(a_i)) / Σ_{j=1}^{L} exp(FC(a_j)),  FC(a) = W a

where f_c = [f_v; f_h] is obtained by concatenating f_v and f_h, a_i and a_j are the i-th and j-th components of f_c, α_i is the attention weight of a_i, FC denotes a fully-connected-layer function, L is the length of f_c, and W is a learnable parameter of the fully connected layer.
Optionally, the down-sampling operation is an asymmetric pooling operation, an asymmetric convolution operation, or down-sampling via a fully-connected-layer operation.
Optionally, the decoding network is an LSTM decoder that takes the dual-stream fused coding feature f as input and predicts trajectory points sequentially. Based on the prediction p_{t-1} and hidden vector h_{t-1} at time t-1, the LSTM decoder predicts the trajectory-point information p_t = (x_t, y_t, s_t) at time t, where x_t and y_t are the position coordinates at time t and s_t is the pen-tip state, a one-hot code over 3 states: "the pen tip is touching the paper", "the current stroke is finished and the pen is briefly lifted", and "all strokes are finished". Finally, the sequence (p_1, ..., p_T) is the predicted writing trajectory.
Specifically, in jointly training the dual-stream coding network and the decoding network, the codec loss function is:

L = λ1·L_2 + λ2·L_CE + λ3·L_DTW

where λ1, λ2, λ3 are preset constants balancing the respective loss weights, and L_2 is the L2 loss:

L_2 = (1/N) Σ_{i=1}^{N} [ (x̂_i − x_i)² + (ŷ_i − y_i)² ]

where x̂_i and ŷ_i are the decoding network's predicted X and Y position coordinates, x_i and y_i are the X- and Y-coordinate label values of the position, and N is the number of trajectory points;
L_CE is the cross-entropy loss:

L_CE = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{3} s_{i,k} log ŝ_{i,k}

where ŝ_{i,k} is the decoding network's predicted probability for pen-tip state k at point i, and s_{i,k} is the corresponding pen-tip-state label value;
L_DTW is the dynamic time warping loss: a dynamic time warping algorithm finds the optimal alignment path between the predicted and label trajectory sequences, and the sequence distance under that path is taken as the global loss of the predicted sequence.
Given a predicted trajectory sequence P and a label trajectory sequence Q with lengths T_P and T_Q, let the Euclidean distance function d(p, q) characterize the distance between trajectory points p and q, and define an alignment path π = (π_1, ..., π_K), where K is the length of the path and each item π_k = (i_k, j_k) defines a correspondence between P and Q, p_{i_k} being the i_k-th trajectory point of P and q_{j_k} the j_k-th trajectory point of Q. The Dynamic Time Warping (DTW) algorithm finds the alignment path minimizing the sequence distance as the optimal path, and the corresponding sequence distance is the global loss of the predicted sequence:

L_DTW = min_π Σ_{k=1}^{K} d(p_{i_k}, q_{j_k})
preferably, the hidden layer state of a BilSTM encoder in a dual stream coding network is used as the hidden layer initial state of an LSTM decoder。
Preferably, λ1 is 0.5, λ2 is 1.0, and λ3 is 1/6000.
Compared with the prior art, the method extracts character features in the vertical and horizontal directions separately during encoding and down-samples them, reducing the parameter count while retaining the necessary glyph information; this helps subsequent decoding reflect the character glyph accurately and effectively improves the recovery performance of character-image writing trajectories.
Drawings
FIG. 1 shows a schematic flow diagram of a method embodying the present invention;
FIG. 2 shows a schematic structural diagram of a dual-stream coding network in an embodiment of the present invention;
fig. 3 shows a schematic structural diagram of a decoding network in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
For reference and clarity, the technical terms, abbreviations or acronyms used hereinafter are to be construed in summary as follows:
CNN: Convolutional Neural Network;
Fig. 1 shows a schematic flow diagram of an embodiment of the invention. A method for recovering the writing trajectory of a character image based on dual-stream coding comprises the following steps:
Step 1, adjusting the character image to a preset size and binarizing it;
Step 2, constructing a dual-stream coding network whose input is the character image and whose output is the dual-stream fused coding feature f;
Step 3, constructing a decoding network whose input is the dual-stream fused coding feature f and whose output is a predicted writing-trajectory sequence;
Step 4, jointly training the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model;
Step 5, recovering writing trajectories with the trained model.
The specific operation steps of this embodiment are as follows:
(1) Preprocess the input character image: resize it to the preset size while maintaining the aspect ratio, and binarize it.
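The preprocessing step above can be sketched as follows. The 64×64 target size and the threshold of 128 are illustrative assumptions, not values from the patent:

```python
import numpy as np

def preprocess(img, size=64, threshold=128):
    """Resize a grayscale character image to size x size while keeping its
    aspect ratio (padding with white), then binarize it."""
    h, w = img.shape
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    # nearest-neighbour resize
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # pad to a square canvas (white background = 255), centring the glyph
    canvas = np.full((size, size), 255, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    # binarize: ink (dark pixels) -> 1, background -> 0
    return (canvas < threshold).astype(np.uint8)
```

A tall 100×50 image, for instance, scales to 64×32 and is centred on the square canvas before thresholding.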
(2) Construct the dual-stream coding network.
1) As shown in Fig. 2, construct two Convolutional Recurrent Neural Network (CRNN) branches, one vertical and one horizontal, each containing a CNN encoder and a BiLSTM encoder. The two CNN encoders use asymmetric pooling in the vertical or horizontal direction, respectively, to down-sample in that direction and, together with convolution operations, encode the input character image into a one-dimensional directional feature along the horizontal or vertical direction. Splitting each one-dimensional feature along its direction dimension yields a direction-ordered sequence, which the branch's BiLSTM encoder encodes into the stream features f_v and f_h, respectively.
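The asymmetric down-sampling idea can be illustrated with a minimal NumPy max-pool; the function name and shapes are illustrative, not the patent's exact layers:

```python
import numpy as np

def asymmetric_max_pool(feat, kh, kw):
    """Max-pool a (H, W) feature map with an asymmetric kernel/stride.
    (kh, kw) = (2, 1) halves the height only (vertical down-sampling, as in
    the vertical branch); (1, 2) halves the width only (horizontal branch)."""
    h, w = feat.shape
    h2, w2 = h // kh, w // kw
    # crop to a multiple of the kernel, then reduce over each pooling window
    return feat[:h2 * kh, :w2 * kw].reshape(h2, kh, w2, kw).max(axis=(1, 3))
```

Repeated vertical pooling collapses the height to 1, leaving a one-dimensional feature along the horizontal (direction-as-time) axis that can be split into the sequence fed to the BiLSTM.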
2) Fuse the two features f_v and f_h with an attention mechanism to obtain the dual-stream fused coding feature f:

f = Σ_{i=1}^{L} α_i a_i,  α_i = exp(FC(a_i)) / Σ_{j=1}^{L} exp(FC(a_j)),  FC(a) = W a

where f_c = [f_v; f_h] is obtained by concatenating f_v and f_h, a_i and a_j are the i-th and j-th components of f_c, α_i is the attention weight of a_i, FC denotes a fully-connected-layer function, L is the length of f_c, and W is a learnable parameter of the fully connected layer.
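The fusion described above can be sketched as follows, assuming a one-layer fully connected scoring map; `attention_fuse` and its weight vector `w` are illustrative stand-ins for the learnable FC parameters:

```python
import numpy as np

def attention_fuse(f_v, f_h, w):
    """Concatenate the two stream encodings into f_c, score each component
    with the linear map w, softmax the scores into attention weights, and
    return the weighted sum of components."""
    f_c = np.concatenate([f_v, f_h], axis=0)   # (L, D), L = L_v + L_h
    scores = f_c @ w                           # (L,) per-component scores
    alpha = np.exp(scores - scores.max())      # numerically stable softmax
    alpha = alpha / alpha.sum()                # attention weights, sum to 1
    return (alpha[:, None] * f_c).sum(axis=0)  # fused feature, shape (D,)
```

With a zero weight vector the attention is uniform and the result is simply the mean of all components — a useful sanity check.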
(3) Construct the decoding network, perform feature decoding, and output the predicted writing-trajectory sequence.
1) Construct an LSTM decoder that takes the dual-stream fused coding feature f as input and predicts trajectory points in turn. As shown in Fig. 3, the LSTM decoder predicts the trajectory-point information p_t at time t from the predicted value p_{t-1} and hidden vector h_{t-1} at time t-1. Finally, the sequence (p_1, ..., p_T) is the predicted writing trajectory. The hidden-layer state of a BiLSTM encoder in the dual-stream coding network is used as the initial hidden-layer state h_0 of the LSTM decoder.
2) For the trajectory-point information at time t, set p_t = (x_t, y_t, s_t), where x_t and y_t are the position coordinates at time t and s_t is a one-hot code for the pen-tip state, whose 3 components represent the 3 states during writing: "the pen tip is touching the paper", "the current stroke is finished and the pen is briefly lifted", and "all strokes are finished". In particular, the initial input trajectory point p_0 is set to a predefined start value.
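The autoregressive decoding loop can be sketched as below; `step_fn` is a hypothetical stand-in for the trained LSTM cell, and the stopping rule follows the "all strokes finished" state described above:

```python
import numpy as np

PEN_DOWN, PEN_UP, PEN_END = 0, 1, 2  # indices of the three one-hot pen states

def decode(step_fn, p0, h0, max_len=100):
    """Feed the previous point p_{t-1} and hidden state h_{t-1} into the
    decoder step, collect predicted points, and stop at the end state."""
    points, p, h = [], p0, h0
    for _ in range(max_len):
        p, h = step_fn(p, h)              # p = (x, y, one-hot pen state)
        points.append(p)
        if int(np.argmax(p[2])) == PEN_END:
            break                          # "all strokes are finished"
    return points
```

A stub `step_fn` that emits the end state after a few calls is enough to exercise the loop; in the real model the step would be the LSTM cell followed by the output heads.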
(4) Construct the codec loss function and train the model formed by the dual-stream coding network and the decoding network end to end. The codec loss function comprises an L2 loss, a cross-entropy loss, and a dynamic time warping loss.
L2 loss:

L_2 = (1/N) Σ_{i=1}^{N} [ (x̂_i − x_i)² + (ŷ_i − y_i)² ]

where x̂_i and ŷ_i are the network's predicted position coordinates, x_i and y_i are the label values, and N is the number of trajectory points.
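A minimal sketch of this L2 term (averaging over the N points is an assumption, since the exact normalization is not reproduced in the text):

```python
import numpy as np

def l2_loss(pred_xy, label_xy):
    """Mean squared position error over the N trajectory points:
    sum of squared x- and y-errors per point, averaged over points."""
    pred_xy, label_xy = np.asarray(pred_xy, float), np.asarray(label_xy, float)
    return float(((pred_xy - label_xy) ** 2).sum(axis=1).mean())
```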
Cross-entropy loss:

L_CE = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{3} s_{i,k} log ŝ_{i,k}

where ŝ_{i,k} is the network's predicted probability for pen-tip state k at point i, and s_{i,k} is the corresponding one-hot label value.
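A corresponding sketch of the pen-state cross-entropy (per-point averaging again assumed; `eps` guards the logarithm):

```python
import numpy as np

def pen_state_ce(probs, labels, eps=1e-12):
    """Cross-entropy between predicted pen-state probabilities (N, 3) and
    one-hot label states (N, 3), averaged over the N trajectory points."""
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    return float(-(labels * np.log(probs + eps)).sum(axis=1).mean())
```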
Dynamic time warping loss: a dynamic time warping algorithm finds the optimal alignment path between the predicted and label trajectory sequences, and the sequence distance under that path is taken as the global loss of the predicted sequence, realizing global optimization of the trajectory sequence.
Given a predicted trajectory sequence P and a label trajectory sequence Q with lengths T_P and T_Q, let the Euclidean distance function d(p, q) characterize the distance between trajectory points p and q, and define an alignment path π = (π_1, ..., π_K) (where K is the length of the path); each item π_k = (i_k, j_k) defines a correspondence between the i_k-th point of P and the j_k-th point of Q.
The Dynamic Time Warping (DTW) algorithm finds the alignment path minimizing the sequence distance as the optimal path, and the corresponding sequence distance is the global loss of the predicted sequence:

L_DTW = min_π Σ_{k=1}^{K} d(p_{i_k}, q_{j_k})
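The DTW global loss can be computed with the classic dynamic-programming recurrence, as in this sketch:

```python
import numpy as np

def dtw_loss(pred, label):
    """DTW distance between predicted and label trajectory sequences: the
    minimum total Euclidean point distance over all monotone alignment
    paths, via the standard O(T_P * T_Q) dynamic program."""
    pred, label = np.asarray(pred, float), np.asarray(label, float)
    tp, tq = len(pred), len(label)
    D = np.full((tp + 1, tq + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, tp + 1):
        for j in range(1, tq + 1):
            d = np.linalg.norm(pred[i - 1] - label[j - 1])
            # extend the cheapest of the three admissible predecessor paths
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[tp, tq])
```

Identical sequences give zero loss; a single predicted point aligned against two label points accumulates the distance to each, which is what makes the loss a global measure over the whole sequence.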
Codec loss function:

L = λ1·L_2 + λ2·L_CE + λ3·L_DTW

where λ1, λ2, λ3 are constants that balance the respective loss weights. In practice, λ1, λ2, and λ3 are set to 0.5, 1.0, and 1/6000, respectively.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
For brevity, not all possible combinations of the technical features in the above embodiments are described; nevertheless, any combination of these features that involves no contradiction should be considered within the scope of this disclosure.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and amendments can be made without departing from the principle of the present invention, and these modifications and amendments should also be considered as the protection scope of the present invention.
Claims (6)
1. A method for recovering the writing trajectory of a character image based on dual-stream coding, characterized by comprising the following steps:
Step 1, adjusting the character image to a preset size and binarizing it;
Step 2, constructing a dual-stream coding network whose input is the character image and whose output is the dual-stream fused coding feature f;
Step 3, constructing a decoding network whose input is the dual-stream fused coding feature f and whose output is a predicted writing-trajectory sequence;
Step 4, jointly training the dual-stream coding network and the decoding network to obtain a writing-trajectory recovery network model;
Step 5, recovering writing trajectories with the trained model;
wherein the dual-stream coding network comprises a vertical convolutional recurrent neural network, a horizontal convolutional recurrent neural network, and an attention module;
the two branches run in parallel and each comprises a CNN encoder and a BiLSTM encoder; in the vertical branch, the CNN encoder down-samples in the vertical direction and, together with convolution operations, encodes the input character image into a one-dimensional feature along the horizontal direction; splitting this feature along its direction dimension yields a direction-ordered sequence, which the branch's BiLSTM encoder encodes into the stream feature f_v; symmetrically, in the horizontal branch the CNN encoder down-samples in the horizontal direction and encodes the image into a one-dimensional feature along the vertical direction, whose direction-ordered sequence the branch's BiLSTM encoder encodes into the stream feature f_h;
the attention module fuses the two stream features f_v and f_h into the dual-stream fused coding feature f:

f = Σ_{i=1}^{L} α_i a_i,  α_i = exp(FC(a_i)) / Σ_{j=1}^{L} exp(FC(a_j)),  FC(a) = W a

where f_c = [f_v; f_h] is obtained by concatenating f_v and f_h, a_i and a_j are the i-th and j-th components of f_c, α_i is the attention weight of a_i, FC denotes a fully-connected-layer function, L is the length of f_c, and W is a learnable parameter of the fully connected layer.
2. The method for recovering the writing trajectory of a character image based on dual-stream coding of claim 1, wherein the down-sampling operation is an asymmetric pooling operation, an asymmetric convolution operation, or a fully-connected-layer down-sampling operation.
3. The method for recovering the writing trajectory of a character image based on dual-stream coding of claim 1, wherein the decoding network is an LSTM decoder that takes the dual-stream fused coding feature f as input and predicts trajectory points sequentially; based on the prediction p_{t-1} and hidden vector h_{t-1} at time t-1, the LSTM decoder predicts the trajectory-point information p_t = (x_t, y_t, s_t) at time t, where x_t and y_t are the position coordinates at time t and s_t is the pen-tip state, a one-hot code over 3 states: "the pen tip is touching the paper", "the current stroke is finished and the pen is briefly lifted", and "all strokes are finished"; finally, the sequence (p_1, ..., p_T) is the predicted writing trajectory.
4. The method for recovering the writing trajectory of a character image based on dual-stream coding of claim 3, wherein, in jointly training the dual-stream coding network and the decoding network, the codec loss function is:

L = λ1·L_2 + λ2·L_CE + λ3·L_DTW

where λ1, λ2, λ3 are preset constants balancing the respective loss weights, and L_2 is the L2 loss:

L_2 = (1/N) Σ_{i=1}^{N} [ (x̂_i − x_i)² + (ŷ_i − y_i)² ]

where x̂_i and ŷ_i are the decoding network's predicted X and Y position coordinates, x_i and y_i are the X- and Y-coordinate label values of the position, and N is the number of trajectory points;

L_CE is the cross-entropy loss:

L_CE = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{3} s_{i,k} log ŝ_{i,k}

where ŝ_{i,k} is the decoding network's predicted probability for pen-tip state k, and s_{i,k} is the pen-tip-state label value;

L_DTW is the dynamic time warping loss: a dynamic time warping algorithm finds the optimal alignment path between the predicted and label trajectory sequences, and the sequence distance under that path is taken as the global loss of the predicted sequence;

given a predicted trajectory sequence P and a label trajectory sequence Q with lengths T_P and T_Q, let the Euclidean distance function d(p, q) characterize the distance between trajectory points p and q, and define an alignment path π = (π_1, ..., π_K), where K is the length of the path and each item π_k = (i_k, j_k) defines a correspondence between the i_k-th trajectory point of P and the j_k-th trajectory point of Q; the Dynamic Time Warping (DTW) algorithm finds the alignment path minimizing the sequence distance as the optimal path, and the corresponding sequence distance is the global loss of the predicted sequence:

L_DTW = min_π Σ_{k=1}^{K} d(p_{i_k}, q_{j_k})
5. The method for recovering the writing trajectory of a character image based on dual-stream coding of claim 3, wherein the hidden-layer state of a BiLSTM encoder in the dual-stream coding network is used as the initial hidden-layer state h_0 of the LSTM decoder.
6. The method for recovering the writing trajectory of a character image based on dual-stream coding of claim 4, wherein λ1 is 0.5, λ2 is 1.0, and λ3 is 1/6000.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210363354.7A CN114463760B (en) | 2022-04-08 | 2022-04-08 | Character image writing track recovery method based on double-stream coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210363354.7A CN114463760B (en) | 2022-04-08 | 2022-04-08 | Character image writing track recovery method based on double-stream coding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114463760A CN114463760A (en) | 2022-05-10 |
CN114463760B true CN114463760B (en) | 2022-06-28 |
Family
ID=81416905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210363354.7A Active CN114463760B (en) | 2022-04-08 | 2022-04-08 | Character image writing track recovery method based on double-stream coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114463760B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117853378A (en) * | 2024-03-07 | 2024-04-09 | 湖南董因信息技术有限公司 | Text handwriting display method based on metric learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109410242A (en) * | 2018-09-05 | 2019-03-01 | 华南理工大学 | Method for tracking target, system, equipment and medium based on double-current convolutional neural networks |
CN110188669A (en) * | 2019-05-29 | 2019-08-30 | 华南理工大学 | A kind of aerial hand-written character track restoration methods based on attention mechanism |
WO2021136144A1 (en) * | 2019-12-31 | 2021-07-08 | 中兴通讯股份有限公司 | Character restoration method and apparatus, storage medium, and electronic device |
CN114428866A (en) * | 2022-01-26 | 2022-05-03 | 杭州电子科技大学 | Video question-answering method based on object-oriented double-flow attention network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11158055B2 (en) * | 2019-07-26 | 2021-10-26 | Adobe Inc. | Utilizing a neural network having a two-stream encoder architecture to generate composite digital images |
CN111104532B (en) * | 2019-12-30 | 2023-04-25 | 华南理工大学 | RGBD image joint recovery method based on double-flow network |
CN111626238B (en) * | 2020-05-29 | 2023-08-04 | 京东方科技集团股份有限公司 | Text recognition method, electronic device and storage medium |
-
2022
- 2022-04-08 CN CN202210363354.7A patent/CN114463760B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109410242A (en) * | 2018-09-05 | 2019-03-01 | 华南理工大学 | Method for tracking target, system, equipment and medium based on double-current convolutional neural networks |
CN110188669A (en) * | 2019-05-29 | 2019-08-30 | 华南理工大学 | A kind of aerial hand-written character track restoration methods based on attention mechanism |
WO2021136144A1 (en) * | 2019-12-31 | 2021-07-08 | 中兴通讯股份有限公司 | Character restoration method and apparatus, storage medium, and electronic device |
CN114428866A (en) * | 2022-01-26 | 2022-05-03 | 杭州电子科技大学 | Video question-answering method based on object-oriented double-flow attention network |
Non-Patent Citations (1)
Title |
---|
OBC306: A Large-Scale Oracle Bone Character Recognition Dataset; Shuangping Huang; 2019 International Conference on Document Analysis and Recognition; 2020-02-03; pp. 681-688 *
Also Published As
Publication number | Publication date |
---|---|
CN114463760A (en) | 2022-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102473543B1 (en) | Systems and methods for digital ink interaction | |
Kosmala et al. | On-line handwritten formula recognition using hidden Markov models and context dependent graph grammars | |
Zhelezniakov et al. | Online handwritten mathematical expression recognition and applications: A survey | |
CN111553350A (en) | Attention mechanism text recognition method based on deep learning | |
Jain et al. | Unconstrained OCR for Urdu using deep CNN-RNN hybrid networks | |
CN113673432A (en) | Handwriting recognition method, touch display device, computer device and storage medium | |
CN111046771A (en) | Training method of network model for recovering writing track | |
CN114463760B (en) | Character image writing track recovery method based on double-stream coding | |
Gan et al. | In-air handwritten Chinese text recognition with temporal convolutional recurrent network | |
US11837001B2 (en) | Stroke attribute matrices | |
JP6055065B1 (en) | Character recognition program and character recognition device | |
US20050276480A1 (en) | Handwritten input for Asian languages | |
CN111738167A (en) | Method for recognizing unconstrained handwritten text image | |
Choudhury et al. | Trajectory-based recognition of in-air handwritten Assamese words using a hybrid classifier network | |
CN113435398B (en) | Signature feature identification method, system, equipment and storage medium based on mask pre-training model | |
CN114757969B (en) | Character and image writing track recovery method based on global tracking decoding | |
Xu et al. | On-line sample generation for in-air written chinese character recognition based on leap motion controller | |
CN115620314A (en) | Text recognition method, answer text verification method, device, equipment and medium | |
Bezine et al. | Handwriting perceptual classification and synthesis using discriminate HMMs and progressive iterative approximation | |
Assaleh et al. | Recognition of handwritten Arabic alphabet via hand motion tracking | |
Alwajih et al. | DeepOnKHATT: an end-to-end Arabic online handwriting recognition system | |
Tan et al. | An End-to-End Air Writing Recognition Method Based on Transformer | |
CN113673635B (en) | Hand-drawn sketch understanding deep learning method based on self-supervision learning task | |
WO2022180725A1 (en) | Character recognition device, program, and method | |
Shi et al. | In-air Handwritten English Word Recognition Based on Corner Point Feature Fusion and Contrastive Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||