CN112632911B - Chinese character coding method based on character embedding


Info

Publication number
CN112632911B
CN112632911B (granted publication of application CN202110001263.4A)
Authority
CN
China
Prior art keywords
character
substructure
parts
embedding
matrix
Prior art date
Legal status
Active
Application number
CN202110001263.4A
Other languages
Chinese (zh)
Other versions
CN112632911A (en)
Inventor
柯逍
刘童安
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN202110001263.4A
Publication of CN112632911A
Application granted
Publication of CN112632911B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a Chinese character coding method based on character embedding, which comprises the following steps. Step S1: construct a Chinese character set, decompose each character into several substructures, construct the substructure set, define the contribution degree of each substructure to each character, and build the substructure-to-character contribution matrix. Step S2: from the obtained substructure set and the contribution matrix, construct and train a substructure embedding matrix, and extract the character embedding matrix. Step S3: input a character and obtain its character embedding through the character embedding matrix. The invention can effectively reduce the dimensionality of Chinese character coding, make the codes of structurally similar Chinese characters positively correlated, and effectively improve character recognition efficiency.

Description

Chinese character coding method based on character embedding
Technical Field
The invention relates to the field of pattern recognition and computer vision, in particular to a Chinese character coding method based on character embedding.
Background
Language is one of the main ways humans transmit information, and writing, the written form of language, is one of the most widespread ways humans transmit information visually.
With the rapid development of technologies such as artificial intelligence and the Internet, automatically recognizing text in images with a computer is of great significance. In character recognition tasks, characters are usually encoded with one-hot coding. This scheme is sparse and ignores the correlation between similar characters; for recognizing English letters and digits it remains serviceable because the number of categories is small. For Chinese character recognition, however, the categories are numerous, with thousands of characters in common use, so one-hot coding slows network convergence and completely ignores the structural similarity between Chinese characters, resulting in low character recognition accuracy and efficiency.
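As a minimal illustration of this drawback (all values are hypothetical toy data): one-hot codes of any two distinct characters are orthogonal, so structurally similar characters such as 木 and 林 look maximally dissimilar, while a dense embedding can give them positive similarity.

```python
# Toy illustration: one-hot codes make all distinct characters equally
# dissimilar, regardless of shared structure (values are hypothetical).
import numpy as np

onehot_mu = np.zeros(3500); onehot_mu[0] = 1.0    # 木 in a 3500-class one-hot
onehot_lin = np.zeros(3500); onehot_lin[1] = 1.0  # 林
print(onehot_mu @ onehot_lin)                     # 0.0: no similarity signal

emb_mu, emb_lin = np.array([0.9, 0.1]), np.array([0.8, 0.2])  # toy embeddings
print(emb_mu @ emb_lin)                           # positive for similar shapes
```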
Disclosure of Invention
In view of the above, the present invention provides a Chinese character coding method based on character embedding, which can effectively reduce the dimensionality of Chinese character coding, make the codes of structurally similar Chinese characters positively correlated, and effectively improve character recognition efficiency.
In order to achieve the purpose, the invention adopts the following technical scheme:
A Chinese character coding method based on character embedding comprises the following steps:
step S1: construct a Chinese character set, decompose each character into several substructures, construct the substructure set, define the contribution degree of each substructure to each character, and build the substructure-to-character contribution matrix;
step S2: from the obtained substructure set and the contribution matrix, construct and train a substructure embedding matrix, and extract the character embedding matrix;
step S3: input a character and obtain its character embedding through the character embedding matrix.
Further, the step S1 is specifically:
step S11: determine the character set to be encoded; let the ia-th Chinese character be char_ia, and let there be n_chars Chinese characters to embed in total, so that the character set is chars = {char_ia | ia = 1, 2, ..., n_chars};
Step S12: split all Chinese characters in chars to obtain the set of all substructures parts = {part_ib | ib = 1, 2, ..., n_parts}, where part_ib is the ib-th substructure and n_parts is the number of elements in parts;
step S13: compute the substructure frequency table nfreqparts = {nfreq_ib | ib = 1, 2, ..., n_parts}, where nfreq_ib denotes that part_ib is a substructure of nfreq_ib characters;
step S14: because the k = 1 split of a character is the character itself, chars is a subset of parts; establish a mapping g such that char_ia = part_g(ia);
Step S15: calculating the contribution degree of each substructure in parts to each character in chars to obtain npartsLine ncharsThe contribution matrix charparts of the column.
Further, the step S12 is specifically:
(1) assume each Chinese character can be split into k substructures;
(2) k is an integer not less than 1, and when k = 1 the split result is the character itself;
(3) the maximum value of k is the stroke count of the character or k_max, where k_max is a manually set maximum number of split parts;
split all Chinese characters in chars according to (1)-(3) to obtain the set of all substructures parts = {part_ib | ib = 1, 2, ..., n_parts}, where part_ib is the ib-th substructure and n_parts is the number of elements in parts.
Further, the step S15 is specifically:
(1) when a Chinese character is split into k parts, the contribution degree of each split substructure to that character is 1/k;
(2) when a substructure appears in several split results of the same character, its contribution degree is computed from the split with the smallest k;
(3) if a substructure does not compose a character, its contribution degree to that character is 0;
compute the contribution degree of each substructure in parts to each character in chars according to (1)-(3), obtaining the contribution matrix charparts with n_parts rows and n_chars columns.
Further, the step S2 is specifically:
step S21: construct a pair of substructure embedding matrices embs1 and embs2, both of size n_parts rows by m columns, where m is the manually set dimension of the embedded vectors;
step S22: one-hot encode every substructure in parts; the encoding of part_ib is ponehot_ib, so the one-hot encodings of all substructures form ponehots = {ponehot_ib | ib = 1, 2, ..., n_parts};
Step S23: for the ib-th substructure, ponehotibWith probability f (nfreq)ib) As the central substructure, the probability calculation method is as follows:
Figure BDA0002881458490000041
wherein min is a minimum function, alpha is a parameter set manually, then a window with the size of t is set, t is a positive integer parameter set manually, the distribution of the ib-th row of charparts is used as the probability distribution of characters, t characters are extracted, the character numbers are mapped to the substructure numbers by mapping g and are placed in the window to be used as related substructures, r substructures are extracted randomly to be used as unrelated substructures, and r is the positive integer parameter set manually;
step S24: a one-hot code is embedded into a vector by a substructure embedding matrix as follows:
emb = ponehot × embs_parts
where embs_parts is a substructure embedding matrix, ponehot is the one-hot code of a substructure, and emb is the embedded vector; the one-hot code of the central substructure is embedded through embs1 to obtain the embedded vector emb1;
step S25: the one-hot codes of the t related substructures are embedded through embs2 to obtain t embedded vectors emb2ps = {emb2p_ic | ic = 1, 2, ..., t}, where emb2p_ic is the ic-th of the t embedded vectors;
step S26: the one-hot codes of the r unrelated substructures are embedded through embs2 to obtain r embedded vectors emb2ns = {emb2n_id | id = 1, 2, ..., r}, where emb2n_id is the id-th of the r embedded vectors;
step S27: the loss is computed and the network optimized using the following formula:
loss = -Σ_ic logsigmoid(emb2p_ic^T · emb1) - Σ_id logsigmoid(-emb2n_id^T · emb1)
where Σ_ic denotes summation over ic = 1, 2, ..., t, Σ_id denotes summation over id = 1, 2, ..., r, and emb2p_ic^T and emb2n_id^T are the transposes of emb2p_ic and emb2n_id; the logsigmoid function is
logsigmoid(x) = log(1/(1 + e^(-x)))
where x is the independent variable, e is the natural constant, and log is the logarithm with base e;
step S28: following steps S23-S27, traverse ib = 1, 2, ..., n_parts repeatedly until the network converges, and take embs1 as the trained substructure embedding matrix;
step S29: extract the character embedding matrix embschar from embs1 through the mapping g, where row ia of embschar is row g(ia) of embs1, and likewise extract the character one-hot table conehots = {conehot_ia | ia = 1, 2, ..., n_chars} from ponehots, where conehot_ia = ponehot_g(ia).
Further, the step S3 is specifically:
step S31: selecting a Chinese character to be coded;
step S32: encode the Chinese character to be encoded into a one-hot code using conehots;
step S33: the one-hot encoding is embedded as a low-dimensional vector using embschar.
Compared with the prior art, the invention has the following beneficial effects:
the invention can effectively reduce the dimension of Chinese character coding, enables the Chinese character coding with similar structure to have positive correlation, and effectively improves the character recognition efficiency
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
Referring to FIG. 1, the present invention provides a Chinese character coding method based on character embedding, comprising the following steps:
step S1: construct a Chinese character set, decompose each character into several substructures, construct the substructure set, define the contribution degree of each substructure to each character, and build the substructure-to-character contribution matrix;
step S2: from the obtained substructure set and the contribution matrix, construct and train a substructure embedding matrix, and extract the character embedding matrix;
step S3: input a character and obtain its character embedding through the character embedding matrix.
In this embodiment, the step S1 specifically includes:
step S11: determine the character set to be encoded; let the ia-th Chinese character be char_ia, and let there be n_chars Chinese characters to embed in total, so that the character set is chars = {char_ia | ia = 1, 2, ..., n_chars};
Step S12: (1) assume each Chinese character can be split into k substructures;
(2) k is an integer not less than 1, and when k = 1 the split result is the character itself;
(3) the maximum value of k is the stroke count of the character or k_max, where k_max is a manually set maximum number of split parts;
split all Chinese characters in chars according to (1)-(3) to obtain the set of all substructures parts = {part_ib | ib = 1, 2, ..., n_parts}, where part_ib is the ib-th substructure and n_parts is the number of elements in parts, as illustrated in the sketch below.
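To make steps S11-S14 concrete, the following is a minimal Python sketch of building the character set, the substructure set, the frequency table, and the mapping g. The three-character set and the decomposition table SPLITS are hypothetical toy data; the patent does not specify the source of the k-way splits (stroke or radical decomposition data would be one option).

```python
# Toy sketch of steps S11-S14. The character set, the decomposition table
# SPLITS, and all names are hypothetical; the patent does not say where the
# k-way splits come from (e.g. stroke or radical decomposition data).
chars = ["木", "林", "森"]  # chars = {char_ia}, n_chars = 3

# SPLITS[ch]: the splits of ch; each split is a tuple of k substructures,
# and the k = 1 split is the character itself.
SPLITS = {
    "木": [("木",)],
    "林": [("林",), ("木", "木")],
    "森": [("森",), ("木", "林"), ("木", "木", "木")],
}

# parts = {part_ib}: all distinct substructures; nfreqparts: the substructure
# frequency table (nfreq_ib = number of characters containing part_ib).
parts = sorted({p for ch in chars for split in SPLITS[ch] for p in split})
nfreqparts = {p: sum(any(p in s for s in SPLITS[ch]) for ch in chars)
              for p in parts}

# Step S14: chars is a subset of parts (the k = 1 splits), so the mapping g
# sends character index ia to the index of the same symbol in parts.
g = {ia: parts.index(ch) for ia, ch in enumerate(chars)}
```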
Step S13: compute the substructure frequency table nfreqparts = {nfreq_ib | ib = 1, 2, ..., n_parts}, where nfreq_ib denotes that part_ib is a substructure of nfreq_ib characters;
step S14: because the k = 1 split of a character is the character itself, chars is a subset of parts; establish a mapping g such that char_ia = part_g(ia);
Step S15: (1) when a Chinese character is split into k parts, the contribution degree of each split substructure to that character is 1/k;
(2) when a substructure appears in several split results of the same character, its contribution degree is computed from the split with the smallest k;
(3) if a substructure does not compose a character, its contribution degree to that character is 0;
compute the contribution degree of each substructure in parts to each character in chars according to (1)-(3), obtaining the contribution matrix charparts with n_parts rows and n_chars columns, as sketched below.
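Continuing the toy example, a sketch of step S15: each part of a k-way split contributes 1/k, the smallest-k split takes precedence when a substructure occurs in several splits of a character, and all other entries remain 0. Treating repeated parts inside a single split as one entry is an assumption of this sketch.

```python
# Toy sketch of step S15, continuing the example above: build the
# n_parts x n_chars contribution matrix charparts.
import numpy as np

n_parts, n_chars = len(parts), len(chars)
charparts = np.zeros((n_parts, n_chars))

for ia, ch in enumerate(chars):
    for split in sorted(SPLITS[ch], key=len):   # smallest k first
        k = len(split)
        for p in set(split):                    # repeats in one split: one entry
            ib = parts.index(p)
            if charparts[ib, ia] == 0.0:        # keep the smallest-k value
                charparts[ib, ia] = 1.0 / k
```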
In this embodiment, the step S2 specifically includes:
step S21: construct a pair of substructure embedding matrices embs1 and embs2, both of size n_parts rows by m columns, where m is the manually set dimension of the embedded vectors;
step S22: one-hot encode every substructure in parts; the encoding of part_ib is ponehot_ib, so the one-hot encodings of all substructures form ponehots = {ponehot_ib | ib = 1, 2, ..., n_parts};
Step S23: for the ib-th substructure, ponehotibWith probability f (nfreq)ib) As the central substructure, the probability calculation method is as follows:
Figure BDA0002881458490000081
wherein min is a minimum function, alpha is a parameter set manually, then a window with the size of t is set, t is a positive integer parameter set manually, the distribution of the ib-th row of charparts is used as the probability distribution of characters, t characters are extracted, the character numbers are mapped to the substructure numbers by mapping g and are placed in the window to be used as related substructures, r substructures are extracted randomly to be used as unrelated substructures, and r is the positive integer parameter set manually;
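A sketch of the sampling in step S23, continuing the example above and using the min-based f from that step; alpha, t, and r are toy values chosen for illustration.

```python
# Toy sketch of the sampling in step S23, continuing the example above.
rng = np.random.default_rng(0)
alpha, t, r = 2.0, 2, 3   # manually set parameters (toy values)

def sample_training_tuple(ib):
    """Center substructure ib -> (ib, t related ids, r unrelated ids) or None."""
    if rng.random() >= min(alpha / nfreqparts[parts[ib]], 1.0):
        return None                      # ib not chosen as central substructure
    row = charparts[ib]
    if row.sum() == 0.0:
        return None                      # substructure composes no character
    ias = rng.choice(n_chars, size=t, p=row / row.sum())
    related = [g[ia] for ia in ias]      # map character ids to substructure ids
    unrelated = rng.choice(n_parts, size=r).tolist()
    return ib, related, unrelated
```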
step S24: a one-hot code is embedded into a vector by a substructure embedding matrix as follows:
emb = ponehot × embs_parts
where embs_parts is a substructure embedding matrix, ponehot is the one-hot code of a substructure, and emb is the embedded vector; the one-hot code of the central substructure is embedded through embs1 to obtain the embedded vector emb1;
step S25: the one-hot codes of the t related substructures are embedded through embs2 to obtain t embedded vectors emb2ps = {emb2p_ic | ic = 1, 2, ..., t}, where emb2p_ic is the ic-th of the t embedded vectors;
step S26: the one-hot codes of the r unrelated substructures are embedded through embs2 to obtain r embedded vectors emb2ns = {emb2n_id | id = 1, 2, ..., r}, where emb2n_id is the id-th of the r embedded vectors;
step S27: the loss is computed and the network optimized using the following formula:
loss = -Σ_ic logsigmoid(emb2p_ic^T · emb1) - Σ_id logsigmoid(-emb2n_id^T · emb1)
where Σ_ic denotes summation over ic = 1, 2, ..., t, Σ_id denotes summation over id = 1, 2, ..., r, and emb2p_ic^T and emb2n_id^T are the transposes of emb2p_ic and emb2n_id; the logsigmoid function is
logsigmoid(x) = log(1/(1 + e^(-x)))
where x is the independent variable, e is the natural constant, and log is the logarithm with base e;
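The loss of step S27 can be written compactly; the sketch below follows the formula above, using logaddexp for a numerically stable logsigmoid.

```python
# NumPy sketch of the step S27 loss; logaddexp gives a numerically stable
# logsigmoid(x) = log(1 / (1 + e^(-x))).
def logsigmoid(x):
    return -np.logaddexp(0.0, -x)

def loss_fn(emb1, emb2ps, emb2ns):
    """emb1: (m,); emb2ps: (t, m) related; emb2ns: (r, m) unrelated."""
    pos = logsigmoid(emb2ps @ emb1).sum()     # pull related vectors toward emb1
    neg = logsigmoid(-(emb2ns @ emb1)).sum()  # push unrelated vectors away
    return -(pos + neg)
```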
step S28: following steps S23-S27, traverse ib = 1, 2, ..., n_parts repeatedly until the network converges, and take embs1 as the trained substructure embedding matrix;
step S29: extract the character embedding matrix embschar from embs1 through the mapping g, where row ia of embschar is row g(ia) of embs1, and likewise extract the character one-hot table conehots = {conehot_ia | ia = 1, 2, ..., n_chars} from ponehots, where conehot_ia = ponehot_g(ia).
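A minimal training sketch covering steps S28-S29 on the toy example. The patent fixes no optimizer, so plain stochastic gradient descent, the learning rate, the embedding dimension m, and the epoch count are assumptions; the gradients are those of the loss above.

```python
# Toy training sketch for steps S28-S29, continuing the example. SGD, lr,
# m, and the epoch count are assumptions; gradients follow the loss above.
m, lr, epochs = 8, 0.05, 200
embs1 = rng.normal(scale=0.1, size=(n_parts, m))
embs2 = rng.normal(scale=0.1, size=(n_parts, m))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(epochs):
    for ib in range(n_parts):
        sample = sample_training_tuple(ib)
        if sample is None:
            continue
        _, related, unrelated = sample
        emb1 = embs1[ib]              # ponehot_ib x embs1 is just row ib
        grad1 = np.zeros(m)
        for ic in related:            # d loss / d emb1 and d loss / d embs2 rows
            s = sigmoid(embs2[ic] @ emb1) - 1.0
            grad1 += s * embs2[ic]
            embs2[ic] -= lr * s * emb1
        for idn in unrelated:
            s = sigmoid(embs2[idn] @ emb1)
            grad1 += s * embs2[idn]
            embs2[idn] -= lr * s * emb1
        embs1[ib] -= lr * grad1

# Step S29: row ia of embschar is row g(ia) of the trained embs1.
embschar = np.stack([embs1[g[ia]] for ia in range(n_chars)])
```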
In this embodiment, the step S3 specifically includes:
step S31: selecting a Chinese character to be coded;
step S32: encode the Chinese character to be encoded into a one-hot code using conehots;
step S33: the one-hot encoding is embedded as a low-dimensional vector using embschar.
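On the toy example, step S3 then reduces to a table lookup:

```python
# Usage sketch of step S3: one-hot encode a character with its row index
# (conehots), then embed it through embschar.
ia = chars.index("林")
conehot = np.zeros(n_chars)
conehot[ia] = 1.0
embedding = conehot @ embschar    # equivalently embschar[ia]
print(embedding.shape)            # (m,): a low-dimensional dense vector
```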
The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the coverage of the present invention.

Claims (4)

1. A Chinese character coding method based on character embedding, characterized by comprising the following steps:
step S1: construct a Chinese character set, decompose each character into several substructures, construct the substructure set, define the contribution degree of each substructure to each character, and build the substructure-to-character contribution matrix;
step S2: from the obtained substructure set and the contribution matrix, construct and train a substructure embedding matrix, and extract the character embedding matrix;
step S3: input a character and obtain its character embedding through the character embedding matrix;
the step S1 specifically includes:
step S11: determine the character set to be encoded; let the ia-th Chinese character be char_ia, and let there be n_chars Chinese characters to embed in total, so that the character set is chars = {char_ia | ia = 1, 2, ..., n_chars};
Step S12: split all Chinese characters in chars to obtain the set of all substructures parts = {part_ib | ib = 1, 2, ..., n_parts}, where part_ib is the ib-th substructure and n_parts is the number of elements in parts;
step S13: compute the substructure frequency table nfreqparts = {nfreq_ib | ib = 1, 2, ..., n_parts}, where nfreq_ib denotes that part_ib is a substructure of nfreq_ib characters;
when a Chinese character is split into k substructures, the contribution degree of each split substructure to that character is 1/k;
if a substructure does not compose a character, its contribution degree to that character is 0;
step S14: because the k = 1 split of a character is the character itself, chars is a subset of parts; establish a mapping g such that char_ia = part_g(ia);
Step S15: compute the contribution degree of each substructure in parts to each character in chars, obtaining the contribution matrix charparts with n_parts rows and n_chars columns;
the step S2 specifically includes:
step S21: construct a pair of substructure embedding matrices embs1 and embs2, both of size n_parts rows by m columns, where m is the manually set dimension of the embedded vectors;
step S22: one-hot encode every substructure in parts; the encoding of part_ib is ponehot_ib, so the one-hot encodings of all substructures form ponehots = {ponehot_ib | ib = 1, 2, ..., n_parts};
Step S23: for the ib-th substructure, take ponehot_ib as the central substructure with probability f(nfreq_ib), computed as:
f(nfreq_ib) = min(α/nfreq_ib, 1)
where min is the minimum function and α is a manually set parameter; then set a window of size t, with t a manually set positive integer; taking the ib-th row of charparts as a probability distribution over characters, draw t characters, map their character indices to substructure indices through the mapping g, and place them in the window as the related substructures; finally draw r substructures at random as unrelated substructures, with r a manually set positive integer;
step S24: a one-hot code is embedded into a vector by a substructure embedding matrix as follows:
emb = ponehot × embs_parts
where embs_parts is a substructure embedding matrix, ponehot is the one-hot code of a substructure, and emb is the embedded vector; the one-hot code of the central substructure is embedded through embs1 to obtain the embedded vector emb1;
step S25: the one-hot codes of the t related substructures are embedded through embs2 to obtain t embedded vectors emb2ps = {emb2p_ic | ic = 1, 2, ..., t}, where emb2p_ic is the ic-th of the t embedded vectors;
step S26: the one-hot codes of the r unrelated substructures are embedded through embs2 to obtain r embedded vectors emb2ns = {emb2n_id | id = 1, 2, ..., r}, where emb2n_id is the id-th of the r embedded vectors;
step S27: the loss is computed and the network optimized using the following formula:
loss = -Σ_ic logsigmoid(emb2p_ic^T · emb1) - Σ_id logsigmoid(-emb2n_id^T · emb1)
where Σ_ic denotes summation over ic = 1, 2, ..., t, Σ_id denotes summation over id = 1, 2, ..., r, and emb2p_ic^T and emb2n_id^T are the transposes of emb2p_ic and emb2n_id; the logsigmoid function is
logsigmoid(x) = log(1/(1 + e^(-x)))
where x is the independent variable, e is the natural constant, and log is the logarithm with base e;
step S28: following steps S23-S27, traverse ib = 1, 2, ..., n_parts repeatedly until the network converges, and take embs1 as the trained substructure embedding matrix;
step S29: extract the character embedding matrix embschar from embs1 through the mapping g, where row ia of embschar is row g(ia) of embs1, and likewise extract the character one-hot table conehots = {conehot_ia | ia = 1, 2, ..., n_chars} from ponehots, where conehot_ia = ponehot_g(ia).
2. The method for encoding Chinese characters based on character embedding of claim 1, wherein said step S12 specifically comprises:
(1) assume each Chinese character can be split into k substructures;
(2) k is an integer not less than 1, and when k = 1 the split result is the character itself;
(3) the maximum value of k is the stroke count of the character or k_max, where k_max is a manually set maximum number of split parts;
split all Chinese characters in chars according to (1)-(3) to obtain the set of all substructures parts = {part_ib | ib = 1, 2, ..., n_parts}, where part_ib is the ib-th substructure and n_parts is the number of elements in parts.
3. The method for encoding Chinese characters based on character embedding of claim 2, wherein said step S15 specifically comprises:
(1) when a Chinese character is split into k parts, the contribution degree of each split substructure to that character is 1/k;
(2) when a substructure appears in several split results of the same character, its contribution degree is computed from the split with the smallest k;
(3) if a substructure does not compose a character, its contribution degree to that character is 0;
compute the contribution degree of each substructure in parts to each character in chars according to (1)-(3), obtaining the contribution matrix charparts with n_parts rows and n_chars columns.
4. The method for encoding Chinese characters based on character embedding of claim 1, wherein said step S3 specifically comprises:
step S31: selecting a Chinese character to be coded;
step S32: encode the Chinese character to be encoded into a one-hot code using conehots;
step S33: the one-hot encoding is embedded as a low-dimensional vector using embschar.
CN202110001263.4A (priority date 2021-01-04, filing date 2021-01-04), Chinese character coding method based on character embedding, Active, granted as CN112632911B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110001263.4A CN112632911B (en) 2021-01-04 2021-01-04 Chinese character coding method based on character embedding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110001263.4A CN112632911B (en) 2021-01-04 2021-01-04 Chinese character coding method based on character embedding

Publications (2)

Publication Number Publication Date
CN112632911A (en) 2021-04-09
CN112632911B (en) 2022-05-13

Family

ID=75290846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110001263.4A Active CN112632911B (en) 2021-01-04 2021-01-04 Chinese character coding method based on character embedding

Country Status (1)

Country Link
CN (1) CN112632911B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4327421A (en) * 1976-05-13 1982-04-27 Transtech International Corporation Chinese printing system
CN103544141A (en) * 2012-07-16 2014-01-29 哈尔滨安天科技股份有限公司 Method and system for extracting significant character strings in binary data
CN109697285A (en) * 2018-12-13 2019-04-30 中南大学 Enhance the hierarchical B iLSTM Chinese electronic health record disease code mask method of semantic expressiveness

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4327421A (en) * 1976-05-13 1982-04-27 Transtech International Corporation Chinese printing system
CN103544141A (en) * 2012-07-16 2014-01-29 哈尔滨安天科技股份有限公司 Method and system for extracting significant character strings in binary data
CN109697285A (en) * 2018-12-13 2019-04-30 中南大学 Enhance the hierarchical B iLSTM Chinese electronic health record disease code mask method of semantic expressiveness

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Principles and Implementation of an Offline Printed Yi Script Recognition System; 朱宗晓 (Zhu Zongxiao); 《万方数据会议库》 (Wanfang Data conference database); 2012-06-25; pp. 1-5 *

Also Published As

Publication number Publication date
CN112632911A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
CN109582789B (en) Text multi-label classification method based on semantic unit information
US8369612B2 (en) System and methods for Arabic text recognition based on effective Arabic text feature extraction
CN108171198B (en) Continuous sign language video automatic translation method based on asymmetric multilayer LSTM
CN109948691B (en) Image description generation method and device based on depth residual error network and attention
CN109492202B (en) Chinese error correction method based on pinyin coding and decoding model
CN110209801A (en) A kind of text snippet automatic generation method based on from attention network
CN109285111B (en) Font conversion method, device, equipment and computer readable storage medium
Puigcerver et al. ICDAR2015 competition on keyword spotting for handwritten documents
CN111753557A (en) Chinese-more unsupervised neural machine translation method fusing EMD minimized bilingual dictionary
CN110196903B (en) Method and system for generating abstract for article
CN111914825B (en) Character recognition method and device and electronic equipment
CN111581374A (en) Text abstract obtaining method and device and electronic equipment
CN111312356A (en) Traditional Chinese medicine prescription generation method based on BERT and integration efficacy information
CN109255381A (en) A kind of image classification method based on the sparse adaptive depth network of second order VLAD
CN101673398A (en) Method for splitting images based on clustering of immunity sparse spectrums
CN112036137A (en) Deep learning-based multi-style calligraphy digital ink simulation method and system
CN112632911B (en) Chinese character coding method based on character embedding
CN106934458A (en) Multilayer automatic coding and system based on deep learning
CN115170403A (en) Font repairing method and system based on deep meta learning and generation countermeasure network
CN111797611B (en) Antithetical couplet generation model, antithetical couplet generation method, antithetical couplet generation device, computer equipment and medium
Chua et al. Unsupervised learning of patterns in data streams using compression and edit distance
CN116226357B (en) Document retrieval method under input containing error information
CN111523325A (en) Chinese named entity recognition method based on strokes
Valy et al. Text Recognition on Khmer Historical Documents using Glyph Class Map Generation with Encoder-Decoder Model.
CN108921911B (en) Method for automatically converting structured picture into source code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant