CN113961696A - Oracle automatic conjugation verification method based on ObiBert

Publication number
CN113961696A
CN113961696A CN202111273361.XA CN202111273361A CN113961696A CN 113961696 A CN113961696 A CN 113961696A CN 202111273361 A CN202111273361 A CN 202111273361A CN 113961696 A CN113961696 A CN 113961696A
Authority
CN
China
Prior art keywords
oracle
conjugation
text
obibert
automatic
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111273361.XA
Other languages
Chinese (zh)
Other versions
CN113961696B (en)
Inventor
熊晶
翟雪
陈利平
刘国英
刘永革
韩胜伟
王楠
张展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anyang Normal University
Original Assignee
Anyang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anyang Normal University
Priority to CN202111273361.XA
Publication of CN113961696A
Application granted
Publication of CN113961696B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/242: Dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The invention discloses an automatic oracle conjugation verification method based on ObiBert, comprising the following steps: S1, collecting a large number of oracle bone explanation texts and building an oracle bone BERT corpus with the direct participation of oracle bone experts; S2, vectorizing the explanation texts in the oracle bone BERT corpus into a summed vector, namely the sum of the token embedding, the text embedding and the position embedding, to obtain the ObiBert neural network model; S3, passing the explanation texts on the conjugated oracle bone pieces through the ObiBert NSP model to judge whether the result of automatic conjugation of the oracle bone fragments is correct. By judging the automatic conjugation result with ObiBert, the invention combines the oracle bone explanation text with the candidate results of computer-based automatic conjugation and selects the candidate with the highest probability; that is, it provides a method for judging whether an automatic conjugation result for oracle bone fragments is correct, which further supports applied oracle bone research.

Description

Oracle automatic conjugation verification method based on ObiBert
Technical Field
The invention belongs to the technical field of oracle bone studies, and particularly relates to an automatic oracle conjugation verification method based on ObiBert.
Background
Oracle bone inscriptions are a treasure of the Chinese nation and have important historical value and scientific research significance. However, because of the nature of the bone and shell material and its long history, oracle bones mostly survive as fragments; correctly piecing these fragments back together is called oracle bone conjugation (rejoining). In practice, the objects of study are images of oracle bones, such as photographs and rubbings, rather than the physical objects. Traditional conjugation research is carried out by oracle bone experts through steps such as collecting oracle bone images, copying, cutting, splicing and proofreading, and only experts with very deep research accumulation and conjugation experience can do it. This has greatly hindered the progress of modern oracle bone studies. Since computer technology was introduced into the field, conjugation research has advanced considerably, because image processing techniques make automatic conjugation of oracle bone fragments based on edges and contours possible. This raises a new problem: the edges and contours of the fragments do not fit together exactly, and because of wear of the material and the existence of tiny fragments, computer-based automatic conjugation of oracle bone fragments (hereinafter, automatic conjugation) produces a large number of candidate results. Image processing alone is therefore not sufficient for oracle bone fragment conjugation research.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides an automatic oracle conjugation verification method based on ObiBert. By combining the oracle bone explanation text, the method selects the candidate with the highest probability from the candidate results of computer-based automatic conjugation; that is, it provides a way to judge whether an automatic conjugation result for oracle bone fragments is correct.
To solve the above technical problem, the invention provides the following technical solution:
The invention provides an automatic oracle conjugation verification method based on ObiBert, comprising the following steps:
S1, collecting a large number of oracle bone explanation texts and constructing an oracle bone BERT corpus;
S2, vectorizing the explanation texts in the oracle bone BERT corpus into a summed vector, specifically the sum of the token embedding, the text embedding and the position embedding, to obtain the ObiBert neural network model;
S3, using the ObiBert NSP model to judge, from the explanation texts on the conjugated oracle bone pieces, whether the result of automatic conjugation of the oracle bone fragments is correct. The judging method is as follows: the explanation texts on any two automatically conjugated oracle bone pieces are extracted to obtain two sentences as input; the NSP model adds a marker symbol to the input and uses the corresponding output as the semantic representation of the explanation text; the two input sentences are separated by a separator symbol, and two different text vectors are added to the two sentences to distinguish them. If the model outputs "correct", the conjugation of the two oracle bone pieces is correct; if the model outputs "wrong", the conjugation of the two oracle bone pieces is wrong.
As a preferred technical solution of the invention, step S1 specifically comprises the following steps:
S11, separating the collected oracle bone text character by character, that is, treating each oracle bone character as one word, and removing the punctuation marks from the text, consistent with the fact that original oracle bone inscriptions carry no punctuation;
S12, constructing a dictionary: counting the frequency of each oracle bone character, representing each character as an integer id according to its frequency, and recording the mapping between characters and ids;
S13, representing each oracle bone explanation text as an id sequence in reading order;
S14, training on the explanation text corpus with the CBOW neural network model of word2vec: the corpus is scanned with a sliding window of size 3, and the center word of each window is predicted from its context to form the training data;
S15, obtaining a parameter matrix after training, in which each row is the word vector of the corresponding oracle bone character in the dictionary and the number of rows equals the size of the dictionary.
As a preferred technical solution of the invention, the method further comprises the following step:
S4, if the two conjugated pieces are judged correct in step S3, treating them as a whole, combining that whole with an adjacent oracle bone piece, and repeating step S3 until all pieces in the automatic conjugation result are judged correct, or retaining the largest number of correctly conjugated pieces as the final conjugation result.
As a preferred technical solution of the invention, the method further comprises the following step:
S5, if the two conjugated pieces are judged wrong in step S3, keeping either one of them, selecting another piece to combine with the adjacent oracle bone piece, and repeating steps S3 and S4 until all pieces in the automatic conjugation result are judged correct, or retaining the combination with the largest number of correctly conjugated pieces as the final conjugation result.
As a preferred technical solution of the invention, the token embedding builds the word vectors of the oracle bone characters: each oracle bone character in an explanation sentence is taken as one segmentation unit (token), and each token is then converted into a vector of fixed dimension. The [CLS] symbol marks the start of the token sequence and the [SEP] symbol marks its end. In view of the particular nature of oracle bone characters, [C] represents a damaged or blurred character that cannot be recognized, and [Un] (where n = 1, 2, 3, ...) represents an oracle bone character that has not yet been recognized.
As a preferred technical solution of the invention, the text embedding operates on pairs of oracle bone explanation sentences. It is implemented as follows: vectors built from the indices 0 and 1 distinguish the two sentences, that is, every token of the first sentence is assigned 0 to form the first vector, and every token of the second sentence is assigned 1 to form the second vector; if there is only one input sentence, its text embedding is a vector whose indices are all 0.
As a preferred technical solution of the invention, the position embedding learns a vector representation for each position in the explanation sentence so that text order information can be processed; the same oracle bone character appearing at different positions is therefore represented by different vectors. It is implemented as a lookup table of suitable size, in which the first row is the vector for any oracle bone character at the first position, the second row is the vector for any oracle bone character at the second position, and so on.
As a preferred technical solution of the invention, NSP stands for Next Sentence Prediction. Its task is to predict whether sentence B is the next sentence after sentence A; the purpose of NSP is to capture the relationship between sentences.
Compared with the prior art, the invention has the following beneficial effects:
The method uses the oracle bone inscription corpus to judge whether the result of automatic conjugation of oracle bone fragments is correct. It combines the oracle bone explanation text with the candidate results of computer-based automatic conjugation and selects the candidate with the highest probability; that is, it provides a method for judging whether an automatic conjugation result is correct, which further supports applied oracle bone research.
Drawings
FIG. 1 is a workflow diagram of the ObiBert-based automatic oracle conjugation verification method of the invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
To achieve the object of the invention, as shown in FIG. 1, one embodiment of the invention provides an automatic oracle conjugation verification method based on ObiBert, comprising the following steps:
S1, collecting a large number of oracle bone explanation texts and constructing an oracle bone BERT corpus. This specifically comprises the following steps:
S11, separating the collected oracle bone text character by character, that is, treating each oracle bone character as one word, and removing the punctuation marks from the text, consistent with the fact that original oracle bone inscriptions carry no punctuation;
S12, constructing a dictionary: counting the frequency of each oracle bone character, representing each character as an integer id according to its frequency, and recording the mapping between characters and ids;
S13, representing each oracle bone explanation text as an id sequence in reading order;
S14, training on the explanation text corpus with the CBOW neural network model of word2vec: the corpus is scanned with a sliding window of size 3, and the center word of each window is predicted from its context to form the training data;
S15, obtaining a parameter matrix after training, in which each row is the word vector of the corresponding oracle bone character in the dictionary and the number of rows equals the size of the dictionary.
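Steps S11 to S14 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, Latin letters stand in for oracle bone characters, and "window of size 3" is assumed to mean one context character on each side of the center character.

```python
from collections import Counter
import unicodedata


def split_characters(text):
    """S11: one character per token; whitespace and punctuation are dropped."""
    return [ch for ch in text
            if not ch.isspace() and not unicodedata.category(ch).startswith("P")]


def build_vocab(sentences):
    """S12: count character frequency; more frequent characters get smaller ids."""
    counts = Counter(ch for sent in sentences for ch in sent)
    # Ties are broken by the character itself so the ids are deterministic.
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return {ch: idx for idx, ch in enumerate(ordered)}


def encode(sentence, vocab):
    """S13: represent a sentence as an id sequence in reading order."""
    return [vocab[ch] for ch in sentence]


def cbow_pairs(ids, window=3):
    """S14: slide a window over the ids; the context predicts the center id."""
    half = window // 2  # assumption: window=3 means one neighbour on each side
    pairs = []
    for i in range(half, len(ids) - half):
        context = ids[i - half:i] + ids[i + 1:i + half + 1]
        pairs.append((context, ids[i]))
    return pairs


# Placeholder texts; each Latin letter stands in for one oracle bone character.
sentences = [split_characters(t) for t in ["ab, ca.", "a b d!"]]
vocab = build_vocab(sentences)
training_pairs = [pair for s in sentences for pair in cbow_pairs(encode(s, vocab))]
```

The resulting (context, center) pairs are the training data of S14. Training a CBOW model on them then yields the parameter matrix of S15, with one row per dictionary entry.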
S2, vectorizing the explanation texts in the oracle bone BERT corpus into a summed vector, specifically the sum of the token embedding, the text embedding and the position embedding, to obtain the ObiBert neural network model.
Specifically, the token embedding builds the word vectors of the oracle bone characters: each oracle bone character in an explanation sentence is taken as one segmentation unit (token), and each token is then converted into a vector of fixed dimension. The [CLS] symbol marks the start of the token sequence and the [SEP] symbol marks its end. In view of the particular nature of oracle bone characters, [C] represents a damaged or blurred character that cannot be recognized, and [Un] (where n = 1, 2, 3, ...) represents an oracle bone character that has not yet been recognized.
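The special tokens can be illustrated with a short sketch. The input markers used here are assumptions made only for this example ("#" for a damaged or illegible character, "?n" for the n-th not-yet-recognized character); the source does not say how such characters are encoded before tokenization.

```python
def to_tokens(chars):
    """Wrap a character sequence in [CLS]/[SEP] and map special cases.

    Hypothetical input markers (not from the source):
      "#"   damaged or blurred character  -> [C]
      "?n"  n-th unrecognized character   -> [Un]
    """
    tokens = ["[CLS]"]
    for ch in chars:
        if ch == "#":
            tokens.append("[C]")
        elif ch.startswith("?") and ch[1:].isdigit():
            tokens.append(f"[U{int(ch[1:])}]")
        else:
            tokens.append(ch)  # ordinary character: one character, one token
    tokens.append("[SEP]")
    return tokens


example = to_tokens(["a", "#", "?2", "b"])
```

Each resulting token would then be looked up in an embedding table to obtain its fixed-dimension vector.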
Specifically, the text embedding operates on pairs of oracle bone explanation sentences. It is implemented as follows: vectors built from the indices 0 and 1 distinguish the two sentences, that is, every token of the first sentence is assigned 0 to form the first vector, and every token of the second sentence is assigned 1 to form the second vector; if there is only one input sentence, its text embedding is a vector whose indices are all 0.
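A minimal sketch of the 0/1 index assignment described above. The function name is hypothetical, and the placement of the special symbols follows the usual BERT convention (the leading [CLS] and the first [SEP] belong to the first sentence), which the source does not spell out.

```python
def segment_ids(tokens_a, tokens_b=None):
    """Assign index 0 to every token of the first sentence and 1 to the second.

    tokens_a is assumed to include the leading [CLS] and its closing [SEP];
    tokens_b, if given, is assumed to include its own closing [SEP].
    """
    ids = [0] * len(tokens_a)
    if tokens_b:
        ids += [1] * len(tokens_b)
    return ids


pair_ids = segment_ids(["[CLS]", "a", "b", "[SEP]"], ["c", "[SEP]"])
single_ids = segment_ids(["[CLS]", "a", "[SEP]"])  # single sentence: all zeros
```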
Specifically, the position embedding learns a vector representation for each position in the explanation sentence so that text order information can be processed; the same oracle bone character appearing at different positions is therefore represented by different vectors. It is implemented as a lookup table of suitable size, in which the first row is the vector for any oracle bone character at the first position, the second row is the vector for any oracle bone character at the second position, and so on.
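The summed input vector of step S2 can be sketched with three lookup tables. The tables below are randomly initialized stand-ins for learned parameters, and the table sizes, dimension and function names are assumptions chosen only for this illustration.

```python
import random


def make_table(rows, dim, seed):
    """Build a rows x dim lookup table of small random values (stand-in for learned weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def input_vectors(token_ids, seg_ids, token_table, segment_table, position_table):
    """For each position, add the token, text (segment) and position embeddings."""
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, seg_ids)):
        vectors.append([t + s + p for t, s, p in
                        zip(token_table[tok], segment_table[seg], position_table[pos])])
    return vectors


DIM = 4                                   # assumed embedding dimension
token_table = make_table(10, DIM, 1)      # one row per dictionary entry
segment_table = make_table(2, DIM, 2)     # row 0: first sentence, row 1: second
position_table = make_table(16, DIM, 3)   # row i: vector for position i

# Token id 2 appears at positions 0 and 2 and receives two different vectors.
vectors = input_vectors([2, 5, 2], [0, 0, 0],
                        token_table, segment_table, position_table)
```

Because the position row differs, the two occurrences of the same character end up with different input vectors, which is the behaviour the position embedding is meant to provide.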
S3, using the ObiBert NSP model to judge, from the explanation texts on the conjugated oracle bone pieces, whether the result of automatic conjugation of the oracle bone fragments is correct. The judging method is as follows: the explanation texts on any two automatically conjugated oracle bone pieces are extracted to obtain two sentences as input; the NSP model adds a marker symbol to the input and uses the corresponding output as the semantic representation of the explanation text; the two input sentences are separated by a separator symbol, and two different text vectors are added to the two sentences to distinguish them. If the model outputs "correct", the conjugation of the two oracle bone pieces is correct; if the model outputs "wrong", the conjugation of the two oracle bone pieces is wrong.
NSP stands for Next Sentence Prediction. Its task is to predict whether sentence B is the next sentence after sentence A; the purpose of NSP is to capture the relationship between sentences.
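The judgment in step S3 can be sketched as building a sentence-pair input and thresholding a next-sentence score. The scoring function here is a hypothetical stand-in for a trained ObiBert NSP model, and the 0.5 threshold is an assumption; the source only states that the model outputs "correct" or "wrong".

```python
def build_nsp_input(text_a, text_b):
    """Join two explanation texts as [CLS] A [SEP] B [SEP] with 0/1 segment indices."""
    tokens = ["[CLS]"] + list(text_a) + ["[SEP]"] + list(text_b) + ["[SEP]"]
    seg_ids = [0] * (len(text_a) + 2) + [1] * (len(text_b) + 1)
    return tokens, seg_ids


def is_conjugation_correct(text_a, text_b, nsp_score, threshold=0.5):
    """Return True if the model considers text_b a plausible continuation of text_a.

    nsp_score is any callable (tokens, seg_ids) -> probability in [0, 1];
    in practice it would wrap the trained NSP model.
    """
    tokens, seg_ids = build_nsp_input(text_a, text_b)
    return nsp_score(tokens, seg_ids) >= threshold


def toy_score(tokens, seg_ids):
    """Toy scorer for demonstration only; it has no linguistic meaning."""
    return 0.9 if tokens[1] == "a" else 0.1


accepted = is_conjugation_correct("ab", "c", toy_score)
rejected = is_conjugation_correct("xb", "c", toy_score)
```

Keeping the scorer as a parameter separates the input construction described in the text from the model itself, so the same check works with whatever NSP model is actually trained.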
S4, if the two conjugated pieces are judged correct in step S3, treating them as a whole, combining that whole with an adjacent oracle bone piece, and repeating step S3 until all pieces in the automatic conjugation result are judged correct, or retaining the largest number of correctly conjugated pieces as the final conjugation result.
S5, if the two conjugated pieces are judged wrong in step S3, keeping either one of them, selecting another piece to combine with the adjacent oracle bone piece, and repeating steps S3 and S4 until all pieces in the automatic conjugation result are judged correct, or retaining the combination with the largest number of correctly conjugated pieces as the final conjugation result.
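One possible reading of steps S4 and S5 is a greedy search, sketched below under loud assumptions: the pieces of a candidate result are given as a list of explanation texts in adjacency order, a verified pair is merged by concatenating its texts, a rejected piece is skipped, and every piece is tried as the starting point so that the largest verified group is kept. The source does not fix these details, and the function names are hypothetical.

```python
def grow_group(pieces, start, verify):
    """S4: starting from one piece, keep merging later pieces that pass the check."""
    group = [start]
    merged_text = pieces[start]
    for idx in range(start + 1, len(pieces)):
        if verify(merged_text, pieces[idx]):
            group.append(idx)
            merged_text += pieces[idx]  # verified pieces are treated as a whole
        # S5 (as read here): a rejected piece is skipped and the group is kept
    return group


def best_conjugation(pieces, verify):
    """Try every piece as the starting point; keep the largest verified group."""
    best = []
    for start in range(len(pieces)):
        group = grow_group(pieces, start, verify)
        if len(group) > len(best):
            best = group
    return best


def toy_verify(text_a, text_b):
    """Toy check standing in for the NSP judgment: consecutive letters continue."""
    return ord(text_b[0]) - ord(text_a[-1]) == 1


kept = best_conjugation(["ab", "cd", "xz", "ef"], toy_verify)
```

With the toy check, the third piece is rejected and the other three are kept as the final result. A real system would pass the NSP-based judgment as `verify` and would also need to respect the physical adjacency of the remaining pieces.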
The method uses the oracle bone inscription corpus to judge whether the result of automatic conjugation of oracle bone fragments is correct. It combines the oracle bone explanation text with the candidate results of computer-based automatic conjugation and selects the candidate with the highest probability; that is, it provides a method for judging whether an automatic conjugation result is correct, which further supports applied oracle bone research.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. An automatic oracle conjugation verification method based on ObiBert, characterized by comprising the following steps:
S1, collecting a large number of oracle bone explanation texts and constructing an oracle bone BERT corpus;
S2, vectorizing the explanation texts in the oracle bone BERT corpus into a summed vector, specifically the sum of the token embedding, the text embedding and the position embedding, to obtain the ObiBert neural network model;
S3, using the ObiBert NSP model to judge, from the explanation texts on the conjugated oracle bone pieces, whether the result of automatic conjugation of the oracle bone fragments is correct; the judging method is as follows: the explanation texts on any two automatically conjugated oracle bone pieces are extracted to obtain two sentences as input; the NSP model adds a marker symbol and uses the corresponding output as the semantic representation of the explanation text; the two input sentences are separated by a separator symbol, and two different text vectors are added to the two sentences to distinguish them; if the model outputs "correct", the conjugation of the two oracle bone pieces is correct; if the model outputs "wrong", the conjugation of the two oracle bone pieces is wrong.
2. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein step S1 specifically comprises the following steps:
S11, separating the collected oracle bone text character by character, that is, treating each oracle bone character as one word, and removing the punctuation marks from the text, consistent with the fact that original oracle bone inscriptions carry no punctuation;
S12, constructing a dictionary: counting the frequency of each oracle bone character, representing each character as an integer id according to its frequency, and recording the mapping between characters and ids;
S13, representing each oracle bone explanation text as an id sequence in reading order;
S14, training on the explanation text corpus with the CBOW neural network model of word2vec: the corpus is scanned with a sliding window of size 3, and the center word of each window is predicted from its context to form the training data;
S15, obtaining a parameter matrix after training, in which each row is the word vector of the corresponding oracle bone character in the dictionary and the number of rows equals the size of the dictionary.
3. The ObiBert-based oracle automatic conjugation verification method according to claim 1, further comprising the following step:
S4, if the two conjugated pieces are judged correct in step S3, treating them as a whole, combining that whole with an adjacent oracle bone piece, and repeating step S3 until all pieces in the automatic conjugation result are judged correct, or retaining the largest number of correctly conjugated pieces as the final conjugation result.
4. The ObiBert-based oracle automatic conjugation verification method according to claim 1, further comprising the following step:
S5, if the two conjugated pieces are judged wrong in step S3, keeping either one of them, selecting another piece to combine with the adjacent oracle bone piece, and repeating steps S3 and S4 until all pieces in the automatic conjugation result are judged correct, or retaining the combination with the largest number of correctly conjugated pieces as the final conjugation result.
5. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein the token embedding builds the word vectors of the oracle bone characters, that is, each oracle bone character in an explanation sentence is taken as one segmentation unit (token), and each token is then converted into a vector of fixed dimension; the [CLS] symbol marks the start of the token sequence; the [SEP] symbol marks its end; in view of the particular nature of oracle bone characters, [C] represents a damaged or blurred character that cannot be recognized; [Un] (where n = 1, 2, 3, ...) represents an oracle bone character that has not yet been recognized.
6. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein the text embedding operates on pairs of oracle bone explanation sentences and is implemented as follows: vectors built from the indices 0 and 1 distinguish the two sentences, that is, every token of the first sentence is assigned 0 to form the first vector, and every token of the second sentence is assigned 1 to form the second vector; if there is only one input sentence, its text embedding is a vector whose indices are all 0.
7. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein the position embedding learns a vector representation for each position in the explanation sentence so that text order information can be processed; the same oracle bone character appearing at different positions is represented by different vectors; it is implemented as a lookup table of suitable size, in which the first row is the vector for any oracle bone character at the first position, the second row is the vector for any oracle bone character at the second position, and so on.
8. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein NSP stands for Next Sentence Prediction, and its task is to predict whether sentence B is the next sentence after sentence A; the purpose of NSP is to capture the relationship between sentences.
CN202111273361.XA 2021-10-29 Automatic oracle conjugation verification method based on ObiBert Active CN113961696B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111273361.XA CN113961696B (en) 2021-10-29 Automatic oracle conjugation verification method based on ObiBert

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111273361.XA CN113961696B (en) 2021-10-29 Automatic oracle conjugation verification method based on ObiBert

Publications (2)

Publication Number Publication Date
CN113961696A (en) 2022-01-21
CN113961696B (en) 2024-05-14

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102693222A (en) * 2012-05-25 2012-09-26 熊晶 Carapace bone script explanation machine translation method based on example
US20130188863A1 (en) * 2012-01-25 2013-07-25 Richard Linderman Method for context aware text recognition
CN108509587A (en) * 2018-03-29 2018-09-07 浙江师范大学 The inquiry inscriptions on bones or tortoise shells opens up database establishment and the search method of figure and its original text and annotations
CN110413785A (en) * 2019-07-25 2019-11-05 淮阴工学院 A kind of Automatic document classification method based on BERT and Fusion Features
CN110807100A (en) * 2019-10-30 2020-02-18 安阳师范学院 Oracle-bone knowledge map construction method and system based on multi-modal data
CN111881260A (en) * 2020-07-31 2020-11-03 安徽农业大学 Neural network emotion analysis method and device based on aspect attention and convolutional memory

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
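张柯文; 李翔; 朱全银; 方强强; 马甲林; 成洁怡; 丁行硕: "A document representation method based on WSD hierarchical memory network modeling", Journal of Huaiyin Institute of Technology, no. 03, 15 June 2020 (2020-06-15) *
王华锋; 王久阳: "A RoBERTa-based joint extraction model for Chinese entities and relations", Journal of North China University of Technology, no. 02, 15 April 2020 (2020-04-15) *
王爱民; 葛彦强; 刘国英; 葛文英; 周宏宇: "Research on key technologies of computer-aided oracle bone inscription conjugation", Computer Measurement & Control, no. 007, 31 December 2010 (2010-12-31) *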

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115587215A (en) * 2022-10-18 2023-01-10 河南大学 Residual broken Chinese character image conjugation method based on sentence continuity
CN115587215B (en) * 2022-10-18 2023-10-20 河南大学 Residual-part Chinese sketch conjugation method based on statement smoothness

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant