CN113961696A - Oracle automatic conjugation verification method based on Obibert - Google Patents
- Publication number
- CN113961696A (application CN202111273361.XA)
- Authority
- CN
- China
- Prior art keywords
- oracle
- conjugation
- text
- obibert
- automatic
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an ObiBert-based automatic oracle bone conjugation verification method, comprising the following steps: S1, collecting a large number of explanation texts (transcriptions) of oracle bone inscriptions and forming an oracle bone BERT corpus with the direct participation of oracle bone experts; S2, converting the oracle bone explanation texts in the corpus into summed vectors, namely the sum of token embedding, text embedding and position embedding, to obtain the ObiBert neural network model; S3, passing the inscription texts on the conjugated oracle bones through the ObiBert NSP model to judge whether the result of automatic conjugation of the oracle bone fragments is correct. By judging the correctness of automatic conjugation results with ObiBert, the invention uses the oracle bone explanation texts to select the most probable option from the candidate results of computer-based automatic conjugation; that is, it provides a method for judging whether the automatic conjugation of oracle bone fragments is correct, further improving oracle bone research applications.
Description
Technical Field
The invention belongs to the technical field of oracle bone inscription research, and particularly relates to an ObiBert-based automatic oracle bone conjugation verification method.
Background
Oracle bone inscriptions are a treasure of the Chinese nation and have important historical value and research significance. However, owing to the nature of the bone and shell material and their long history, oracle bones mostly survive as fragments, and correctly splicing the fragments back together is called oracle bone conjugation (rejoining). In actual oracle bone research, the objects of study are images such as photographs and rubbings rather than the physical bones. Traditional conjugation research is carried out by oracle bone experts through steps such as collecting oracle bone images, copying, cutting, splicing and proofreading, and only experts with very deep research accumulation and conjugation experience can do it, which has greatly hindered the progress of modern oracle bone studies. Since computer technology was introduced into the field, conjugation research has advanced considerably, because image processing techniques make automatic, edge- and contour-based conjugation of oracle bone fragments possible. A new problem arises, however: the edges and contours of oracle bone fragments do not fit together exactly, and because of abrasion of the bone material and the presence of tiny fragments, computer-based automatic conjugation of oracle bone fragments (hereinafter, automatic conjugation) produces a large number of candidate results. Image processing alone is therefore clearly insufficient for oracle bone conjugation research.
Disclosure of Invention
To overcome these shortcomings of the prior art, the invention provides an ObiBert-based automatic oracle bone conjugation verification method, which uses the oracle bone explanation texts to select the most probable option from the candidate results of computer-based automatic conjugation; that is, it provides a method for judging whether the automatic conjugation result of oracle bone fragments is correct.
In order to solve the technical problems, the invention provides the following technical scheme:
The invention provides an ObiBert-based automatic oracle bone conjugation verification method, comprising the following steps:
S1, collecting a large number of explanation texts of oracle bone characters and constructing an oracle bone BERT corpus;
S2, vectorizing the oracle bone explanation texts in the oracle bone BERT corpus into summed vectors to obtain the ObiBert neural network model, where the summed vector is specifically the sum of token embedding, text embedding and position embedding;
S3, judging, with the ObiBert NSP model, whether the result of automatic conjugation of the oracle bone fragments is correct, based on the explanation texts on the conjugated oracle bone pieces. The judging method is as follows: the explanation texts on any two automatically conjugated oracle bone pieces are extracted and concatenated to obtain two sentences as input; the NSP model adds a marker symbol ([CLS]), whose corresponding output is used as the semantic representation of the explanation text; the two input sentences are separated by a separator symbol ([SEP]), and two different text embedding vectors are added to the two sentences to distinguish them. If the model output is positive (the second sentence is judged to follow the first), the conjugation of the two oracle bone pieces is correct; if the output is negative, the conjugation of the two pieces is wrong.
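The decision in step S3 can be sketched in a few lines. The patent does not say how the NSP output becomes a "correct"/"wrong" verdict, so this sketch assumes a hypothetical scorer function `nsp_is_next_probability` (standing in for a trained ObiBert NSP head) that returns P(IsNext) for a pair of texts, and a hypothetical acceptance threshold of 0.5. Ranking several candidate pairs by the same probability mirrors the stated goal of picking the most probable candidate; the inscription strings are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface (not defined by the patent): a trained ObiBert NSP head
# would supply this function, returning P(IsNext) for (text_a, text_b).
NSPScorer = Callable[[str, str], float]


def judge_pair(text_a: str, text_b: str, nsp_is_next_probability: NSPScorer,
               threshold: float = 0.5) -> bool:
    """Accept the conjugation if the model judges text_b to follow text_a."""
    return nsp_is_next_probability(text_a, text_b) >= threshold


def rank_candidates(candidates: Sequence[Tuple[str, str]],
                    nsp_is_next_probability: NSPScorer) -> List[Tuple[float, str, str]]:
    """Score each candidate fragment pair and sort by P(IsNext), highest first."""
    scored = [(nsp_is_next_probability(a, b), a, b) for a, b in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy stand-in for the model: only one pair is treated as a true continuation.
    def toy_scorer(a: str, b: str) -> float:
        return 0.9 if (a, b) == ("癸巳卜", "貞旬亡禍") else 0.1

    print(judge_pair("癸巳卜", "貞旬亡禍", toy_scorer))
    print(rank_candidates([("癸巳卜", "王占曰"), ("癸巳卜", "貞旬亡禍")], toy_scorer)[0])
```

Keeping the scorer as a parameter separates the verification logic from whichever model implementation is used.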
As a preferred technical solution of the present invention, step S1 specifically includes the following steps:
S11, splitting the obtained oracle bone text by character, i.e., treating each oracle bone character as one word, and removing punctuation marks from the text, consistent with the fact that original oracle bone inscriptions carry no sentence-break marks;
S12, constructing a dictionary: counting the frequency of each oracle bone character, representing each character as an integer id according to its frequency, and recording the mapping between characters and ids;
S13, representing each oracle bone explanation text as a sequence of ids in word order;
S14, training on the explanation-text corpus with the CBOW model of word2vec: the corpus is scanned with a sliding window of size 3, and the center word of each window is predicted from its context, which forms the training data;
S15, obtaining a parameter matrix after training, in which each row is the word vector of the corresponding oracle bone character in the dictionary and the number of rows equals the size of the dictionary.
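Steps S11 to S15 can be sketched as below. This is a minimal illustration under stated assumptions: the punctuation set is assumed (the patent lists none), ties in frequency keep first-seen order, edge positions without a full context are skipped when forming CBOW examples, and the embedding matrix is only randomly initialized to show its shape (one row per dictionary entry); the CBOW training itself is omitted. The sample inscription texts are illustrative.

```python
import random
import string
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Assumed punctuation set (ASCII plus common CJK marks); the patent does not list one.
PUNCTUATION = set(string.punctuation) | set("，。、；：？！“”‘’（）《》「」【】")


def split_characters(text: str) -> List[str]:
    """S11: one token per character, with punctuation and whitespace removed."""
    return [ch for ch in text if ch not in PUNCTUATION and not ch.isspace()]


def build_dictionary(corpus: Sequence[List[str]]) -> Dict[str, int]:
    """S12: assign integer ids by descending frequency (ties keep first-seen order)."""
    counts = Counter(ch for sentence in corpus for ch in sentence)
    return {ch: idx for idx, (ch, _) in enumerate(counts.most_common())}


def to_ids(sentence: List[str], dictionary: Dict[str, int]) -> List[int]:
    """S13: represent a sentence as an id sequence in its original order."""
    return [dictionary[ch] for ch in sentence]


def cbow_examples(ids: List[int], window: int = 1) -> List[Tuple[List[int], int]]:
    """S14: (context, center) pairs; window=1 gives a sliding window of size 3.

    Positions without a full context on both sides are skipped (an assumption).
    """
    return [(ids[i - window:i] + ids[i + 1:i + 1 + window], ids[i])
            for i in range(window, len(ids) - window)]


def init_embedding_matrix(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """S15 shape only: one row per dictionary entry; real values come from training."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(vocab_size)]


if __name__ == "__main__":
    texts = ["癸巳卜，貞：旬亡禍。", "甲午卜，貞：旬亡禍。"]  # illustrative inputs
    corpus = [split_characters(t) for t in texts]
    dictionary = build_dictionary(corpus)
    ids = to_ids(corpus[0], dictionary)
    print(ids, cbow_examples(ids)[:2])
    matrix = init_embedding_matrix(len(dictionary), dim=8)
    print(len(matrix), len(matrix[0]))
```

In practice the training step would typically be delegated to a library; with gensim, for example, `Word2Vec(sentences, vector_size=..., window=1, sg=0, min_count=1)` selects CBOW (`sg=0`) with one context word on each side, matching the size-3 window.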
As a preferred technical scheme of the invention, the method further comprises the following step:
S4, if the two conjugated pieces are judged correct in step S3, combining them as a whole with the adjacent oracle bone piece and repeating step S3, until all pieces in the automatic conjugation result are judged correct, or retaining the largest number of correctly conjugated pieces as the final conjugation result.
As a preferred technical scheme of the invention, the method further comprises the following step:
S5, if the two conjugated pieces are judged wrong in step S3, keeping either one of them, selecting another piece to combine with the adjacent oracle bone piece, and repeating steps S3 and S4 until all pieces in the automatic conjugation result are judged correct, or keeping the combination with the largest number of correctly conjugated pieces as the final conjugation result.
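Steps S4 and S5 leave room for interpretation, so the following is only one possible reading, sketched as a greedy loop. It assumes that the candidate result lists fragments in adjacency order, that "combining them as a whole" means concatenating their explanation texts, and that after a rejection (S5) the current group is set aside and verification restarts from the next fragment. `judge` is a hypothetical wrapper around the S3 decision; the function returns the indices of the longest accepted run, i.e. the largest number of correctly conjugated pieces.

```python
from typing import Callable, List, Sequence

# Hypothetical: wraps the S3 NSP decision and returns True if text_b may follow text_a.
Judge = Callable[[str, str], bool]


def verify_chain(fragment_texts: Sequence[str], judge: Judge) -> List[int]:
    """Return indices of the longest run of fragments accepted pairwise (S4/S5 sketch)."""
    if not fragment_texts:
        return []
    best: List[int] = [0]
    current: List[int] = [0]
    merged = fragment_texts[0]  # text of the current combined group
    for i in range(1, len(fragment_texts)):
        if judge(merged, fragment_texts[i]):
            # S4: accepted, so the group is treated as one piece and extended
            current.append(i)
            merged += fragment_texts[i]
        else:
            # S5 (one reading): set the group aside and restart from this fragment
            current = [i]
            merged = fragment_texts[i]
        if len(current) > len(best):
            best = list(current)  # copy, because `current` keeps changing
    return best


if __name__ == "__main__":
    full = "癸巳卜貞旬亡禍"  # illustrative known inscription
    print(verify_chain(["癸巳卜", "貞", "旬亡禍", "甲午"], lambda a, b: (a + b) in full))
```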
As a preferred technical scheme of the invention, token embedding establishes word vectors for oracle bone characters: each oracle bone character in an explanation sentence is taken as a segmentation unit (token), and each token is then converted into a vector representation of fixed dimension. The [CLS] symbol marks the start of the token sequence and the [SEP] symbol marks its end. In view of the particular nature of oracle bone characters, [C] denotes an incomplete or blurred, unrecognizable oracle bone character, and [U_n] (where n = 1, 2, 3, ...) denotes an oracle bone character that has not yet been deciphered.
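The special-token scheme can be sketched as a small tokenizer. The patent does not say how damaged or undeciphered characters appear in the input, so this sketch assumes that an illegible character arrives as `None` and that any symbol outside the set of deciphered characters is an undeciphered glyph code (codes such as `#G17` are hypothetical). It also assumes that `[U_n]` is numbered per distinct undeciphered glyph within one sentence. The resulting tokens would then be mapped to ids and fixed-dimension vectors through the dictionary.

```python
from typing import Dict, Iterable, List, Optional, Sequence


def tokenize(chars: Sequence[Optional[str]], deciphered: Iterable[str]) -> List[str]:
    """Turn one explanation sentence into tokens with the special symbols.

    Assumptions (not from the patent): None marks an illegible character, and any
    symbol outside `deciphered` is an undeciphered glyph code.
    """
    known = set(deciphered)
    unknown_index: Dict[str, int] = {}  # glyph code -> n in [U_n]
    tokens = ["[CLS]"]  # start of the token sequence
    for ch in chars:
        if ch is None:
            tokens.append("[C]")  # incomplete or blurred, unrecognizable character
        elif ch in known:
            tokens.append(ch)
        else:
            # the same undeciphered glyph always gets the same n within the sentence
            n = unknown_index.setdefault(ch, len(unknown_index) + 1)
            tokens.append(f"[U{n}]")
    tokens.append("[SEP]")  # end of the token sequence
    return tokens


if __name__ == "__main__":
    print(tokenize(["癸", None, "卜", "#G17", "#G17", "#G42"], deciphered="癸卜"))
```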
As a preferred technical scheme of the invention, text embedding (the counterpart of BERT's segment embedding) is an operation on pairs of oracle bone explanation sentences, implemented as follows: indexes 0 and 1 are used to distinguish the two explanation sentences, i.e., every token of the first sentence is assigned 0, forming the first vector, and every token of the second sentence is assigned 1, forming the second vector. If there is only one input sentence, its text embedding is a vector whose indices are all 0.
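Building the 0/1 text-embedding indices for a sentence pair can be sketched as follows. The function takes plain character tokens (without markers) and adds `[CLS]` and `[SEP]` itself. Which index the markers receive is not stated in the patent; this sketch assumes BERT's convention, where `[CLS]` and the first `[SEP]` belong to the first sentence (index 0) and the final `[SEP]` to the second (index 1).

```python
from typing import List, Optional, Sequence, Tuple


def build_pair_input(tokens_a: Sequence[str],
                     tokens_b: Optional[Sequence[str]] = None) -> Tuple[List[str], List[int]]:
    """Return [CLS] A [SEP] (B [SEP]) and the matching 0/1 text-embedding indices."""
    tokens = ["[CLS]", *tokens_a, "[SEP]"]
    segment_ids = [0] * len(tokens)  # first sentence, including [CLS] and its [SEP]
    if tokens_b is not None:
        tail = [*tokens_b, "[SEP]"]
        tokens += tail
        segment_ids += [1] * len(tail)  # second sentence
    return tokens, segment_ids


if __name__ == "__main__":
    print(build_pair_input(list("癸巳卜"), list("貞旬")))
    print(build_pair_input(list("癸巳")))  # single sentence: all indices are 0
```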
As a preferred technical scheme of the invention, position embedding learns a vector representation for each position in an oracle bone explanation sentence in order to capture sequence-order information, so the same oracle bone character appearing at different positions is represented by different vectors. It is implemented as follows: a lookup table of suitable size is designed in which the first row is the vector representation for any oracle bone character in the first position, the second row is the vector representation for any oracle bone character in the second position, and so on.
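The position lookup table and the summation of the three embeddings in step S2 can be sketched with plain lists. In a real model all three tables are learned during training; here they are only randomly initialized (Gaussian with standard deviation 0.02, an assumption borrowed from BERT's default initializer), and the table sizes are arbitrary. The example shows that the same token at two positions yields two different summed vectors.

```python
import random
from typing import List, Sequence

Vector = List[float]


def make_table(rows: int, dim: int, rng: random.Random) -> List[Vector]:
    """A lookup table: row i is the vector for index i (random until trained)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


class InputEmbedding:
    """Token + text (segment) + position embeddings, summed element-wise."""

    def __init__(self, vocab_size: int, max_len: int, dim: int,
                 num_segments: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.token = make_table(vocab_size, dim, rng)
        self.segment = make_table(num_segments, dim, rng)
        self.position = make_table(max_len, dim, rng)  # row p: any character at position p

    def __call__(self, token_ids: Sequence[int], segment_ids: Sequence[int]) -> List[Vector]:
        if len(token_ids) != len(segment_ids):
            raise ValueError("token_ids and segment_ids must have the same length")
        if len(token_ids) > len(self.position):
            raise ValueError("sequence longer than the position table")
        return [
            [t + s + p for t, s, p in zip(self.token[tid], self.segment[sid], self.position[pos])]
            for pos, (tid, sid) in enumerate(zip(token_ids, segment_ids))
        ]


if __name__ == "__main__":
    emb = InputEmbedding(vocab_size=10, max_len=8, dim=4)
    first, second = emb([3, 3], [0, 0])  # same token id at positions 0 and 1
    print(first != second)
```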
As a preferred technical scheme of the invention, NSP stands for Next Sentence Prediction. Its task is to predict whether sentence B is the next sentence after sentence A, and its purpose is to capture information between sentences.
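The patent names the NSP task but not how its training pairs are built. The sketch below assumes the 50/50 recipe described for the original BERT (half the time sentence B is the real next sentence, half the time a random sentence from the corpus), with an extra safeguard that a negative never equals the true successor. The short inscription "documents" are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def make_nsp_examples(documents: Sequence[Sequence[str]],
                      seed: int = 0) -> List[Tuple[str, str, int]]:
    """Build (sentence_a, sentence_b, label) triples; label 1 = IsNext, 0 = NotNext."""
    rng = random.Random(seed)
    all_sentences = [s for doc in documents for s in doc]
    examples: List[Tuple[str, str, int]] = []
    for doc in documents:
        for i in range(len(doc) - 1):
            a, true_next = doc[i], doc[i + 1]
            negatives = [s for s in all_sentences if s != true_next]
            if negatives and rng.random() < 0.5:
                examples.append((a, rng.choice(negatives), 0))  # random sentence: NotNext
            else:
                examples.append((a, true_next, 1))  # the real next sentence: IsNext
    return examples


if __name__ == "__main__":
    docs = [["癸巳卜", "貞旬亡禍"], ["甲午卜", "貞旬亡禍", "王占曰"]]
    for example in make_nsp_examples(docs, seed=1):
        print(example)
```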
Compared with the prior art, the invention has the following beneficial effects:
The method judges whether the result of automatic conjugation of oracle bone fragments is correct by means of the oracle bone corpus, using the oracle bone explanation texts to select the most probable option from the candidate results of computer-based automatic conjugation; that is, it provides a method for judging whether the automatic conjugation result of oracle bone fragments is correct, further improving oracle bone research applications.
Drawings
FIG. 1 is a workflow diagram of the ObiBert-based automatic oracle bone conjugation verification method of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
In order to achieve the object of the present invention, as shown in FIG. 1, one embodiment of the present invention provides an ObiBert-based automatic oracle bone conjugation verification method, including the following steps:
S1, collecting a large number of explanation texts of oracle bone characters and constructing the oracle bone BERT corpus. This specifically comprises the following steps:
S11, splitting the obtained oracle bone text by character, i.e., treating each oracle bone character as one word, and removing punctuation marks from the text, consistent with the fact that original oracle bone inscriptions carry no sentence-break marks;
S12, constructing a dictionary: counting the frequency of each oracle bone character, representing each character as an integer id according to its frequency, and recording the mapping between characters and ids;
S13, representing each oracle bone explanation text as a sequence of ids in word order;
S14, training on the explanation-text corpus with the CBOW model of word2vec: the corpus is scanned with a sliding window of size 3, and the center word of each window is predicted from its context, which forms the training data;
S15, obtaining a parameter matrix after training, in which each row is the word vector of the corresponding oracle bone character in the dictionary and the number of rows equals the size of the dictionary.
S2, vectorizing the oracle bone explanation texts in the oracle bone BERT corpus into summed vectors to obtain the ObiBert neural network model, where the summed vector is specifically the sum of token embedding, text embedding and position embedding.
Specifically, token embedding establishes word vectors for oracle bone characters: each oracle bone character in an explanation sentence is used as a segmentation unit (token), and each token is then converted into a vector representation of fixed dimension. The [CLS] symbol marks the start of the token sequence and the [SEP] symbol marks its end. In view of the particular nature of oracle bone characters, [C] denotes an incomplete or blurred, unrecognizable oracle bone character, and [U_n] (where n = 1, 2, 3, ...) denotes an oracle bone character that has not yet been deciphered.
Specifically, text embedding is an operation on pairs of oracle bone explanation sentences, implemented as follows: indexes 0 and 1 are used to distinguish the two explanation sentences, i.e., every token of the first sentence is assigned 0, forming the first vector, and every token of the second sentence is assigned 1, forming the second vector. If there is only one input sentence, its text embedding is a vector whose indices are all 0.
Specifically, position embedding learns a vector representation for each position in an oracle bone explanation sentence in order to capture sequence-order information, so the same oracle bone character appearing at different positions is represented by different vectors. It is implemented as follows: a lookup table of suitable size is designed in which the first row is the vector representation for any oracle bone character in the first position, the second row is the vector representation for any oracle bone character in the second position, and so on.
S3, judging, with the ObiBert NSP model, whether the result of automatic conjugation of the oracle bone fragments is correct, based on the explanation texts on the conjugated oracle bone pieces. The judging method is as follows: the explanation texts on any two automatically conjugated oracle bone pieces are extracted and concatenated to obtain two sentences as input; the NSP model adds a marker symbol ([CLS]), whose corresponding output is used as the semantic representation of the explanation text; the two input sentences are separated by a separator symbol ([SEP]), and two different text embedding vectors are added to the two sentences to distinguish them. If the model output is positive (the second sentence is judged to follow the first), the conjugation of the two oracle bone pieces is correct; if the output is negative, the conjugation of the two pieces is wrong.
NSP stands for Next Sentence Prediction. Its task is to predict whether sentence B is the next sentence after sentence A, and its purpose is to capture information between sentences.
S4, if the two conjugated pieces are judged correct in step S3, combining them as a whole with the adjacent oracle bone piece and repeating step S3, until all pieces in the automatic conjugation result are judged correct, or retaining the largest number of correctly conjugated pieces as the final conjugation result.
S5, if the two conjugated pieces are judged wrong in step S3, keeping either one of them, selecting another piece to combine with the adjacent oracle bone piece, and repeating steps S3 and S4 until all pieces in the automatic conjugation result are judged correct, or keeping the combination with the largest number of correctly conjugated pieces as the final conjugation result.
The method judges whether the result of automatic conjugation of oracle bone fragments is correct by means of the oracle bone corpus, using the oracle bone explanation texts to select the most probable option from the candidate results of computer-based automatic conjugation; that is, it provides a method for judging whether the automatic conjugation result of oracle bone fragments is correct, further improving oracle bone research applications.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. An ObiBert-based automatic oracle bone conjugation verification method, characterized by comprising the following steps:
S1, collecting a large number of explanation texts of oracle bone characters and constructing an oracle bone BERT corpus;
S2, vectorizing the oracle bone explanation texts in the oracle bone BERT corpus into summed vectors to obtain the ObiBert neural network model, where the summed vector is specifically the sum of token embedding, text embedding and position embedding;
S3, judging, with the ObiBert NSP model, whether the result of automatic conjugation of the oracle bone fragments is correct, based on the explanation texts on the conjugated oracle bone pieces; the judging method is as follows: the explanation texts on any two automatically conjugated oracle bone pieces are extracted and concatenated to obtain two sentences as input; the NSP model adds a marker symbol, whose corresponding output is used as the semantic representation of the explanation text; the two input sentences are separated by a separator symbol, and two different text embedding vectors are added to the two sentences to distinguish them; if the model output is correct, the conjugation of the two oracle bone pieces is correct; if the model output is wrong, the conjugation of the two pieces is wrong.
2. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein step S1 specifically comprises the following steps:
S11, splitting the obtained oracle bone text by character, i.e., treating each oracle bone character as one word, and removing punctuation marks from the text, consistent with the fact that original oracle bone inscriptions carry no sentence-break marks;
S12, constructing a dictionary: counting the frequency of each oracle bone character, representing each character as an integer id according to its frequency, and recording the mapping between characters and ids;
S13, representing each oracle bone explanation text as a sequence of ids in word order;
S14, training on the explanation-text corpus with the CBOW model of word2vec: the corpus is scanned with a sliding window of size 3, and the center word of each window is predicted from its context, which forms the training data;
S15, obtaining a parameter matrix after training, in which each row is the word vector of the corresponding oracle bone character in the dictionary and the number of rows equals the size of the dictionary.
3. The ObiBert-based oracle automatic conjugation verification method according to claim 1, further comprising the following step:
S4, if the two conjugated pieces are judged correct in step S3, combining them as a whole with the adjacent oracle bone piece and repeating step S3, until all pieces in the automatic conjugation result are judged correct, or retaining the largest number of correctly conjugated pieces as the final conjugation result.
4. The ObiBert-based oracle automatic conjugation verification method according to claim 1, further comprising the following step:
S5, if the two conjugated pieces are judged wrong in step S3, keeping either one of them, selecting another piece to combine with the adjacent oracle bone piece, and repeating steps S3 and S4 until all pieces in the automatic conjugation result are judged correct, or keeping the combination with the largest number of correctly conjugated pieces as the final conjugation result.
5. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein token embedding establishes word vectors for oracle bone characters, that is, each oracle bone character in an oracle bone explanation sentence is taken as a segmentation unit (token), and each token is then converted into a vector representation of fixed dimension; the [CLS] symbol marks the start of the token sequence; the [SEP] symbol marks its end; in view of the particular nature of oracle bone characters, [C] denotes an incomplete or blurred, unrecognizable oracle bone character; and [U_n] (where n = 1, 2, 3, ...) denotes an oracle bone character that has not yet been deciphered.
6. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein text embedding is an operation on pairs of oracle bone explanation sentences, implemented as follows: indexes 0 and 1 are used to distinguish the two explanation sentences, i.e., every token of the first sentence is assigned 0, forming the first vector, and every token of the second sentence is assigned 1, forming the second vector; if there is only one input sentence, its text embedding is a vector whose indices are all 0.
7. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein position embedding learns a vector representation for each position in an oracle bone explanation sentence in order to capture sequence-order information, so the same oracle bone character appearing at different positions is represented by different vectors; it is implemented as follows: a lookup table of suitable size is designed in which the first row is the vector representation for any oracle bone character in the first position, the second row is the vector representation for any oracle bone character in the second position, and so on.
8. The ObiBert-based oracle automatic conjugation verification method according to claim 1, wherein NSP is Next Sentence Prediction, whose task is to predict whether sentence B is the next sentence after sentence A; the purpose of NSP is to capture information between sentences.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111273361.XA CN113961696B (en) | 2021-10-29 | Automatic oracle conjugation verification method based on ObiBert |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113961696A true CN113961696A (en) | 2022-01-21 |
CN113961696B CN113961696B (en) | 2024-05-14 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102693222A (en) * | 2012-05-25 | 2012-09-26 | Xiong Jing | Carapace bone script explanation machine translation method based on example |
US20130188863A1 (en) * | 2012-01-25 | 2013-07-25 | Richard Linderman | Method for context aware text recognition |
CN108509587A (en) * | 2018-03-29 | 2018-09-07 | Zhejiang Normal University | The inquiry inscriptions on bones or tortoise shells opens up database establishment and the search method of figure and its original text and annotations |
CN110413785A (en) * | 2019-07-25 | 2019-11-05 | Huaiyin Institute of Technology | A kind of Automatic document classification method based on BERT and Fusion Features |
CN110807100A (en) * | 2019-10-30 | 2020-02-18 | Anyang Normal University | Oracle-bone knowledge map construction method and system based on multi-modal data |
CN111881260A (en) * | 2020-07-31 | 2020-11-03 | Anhui Agricultural University | Neural network emotion analysis method and device based on aspect attention and convolutional memory |
Non-Patent Citations (3)
Title |
---|
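张柯文; 李翔; 朱全银; 方强强; 马甲林; 成洁怡; 丁行硕: "A document representation method based on WSD hierarchical memory network modeling", Journal of Huaiyin Institute of Technology, no. 03, 15 June 2020 (2020-06-15) * |
王华锋; 王久阳: "A Roberta-based model for joint extraction of Chinese entities and relations", Journal of North China University of Technology, no. 02, 15 April 2020 (2020-04-15) * |
王爱民; 葛彦强; 刘国英; 葛文英; 周宏宇: "Research on key technologies of computer-aided oracle bone conjugation", Computer Measurement & Control, no. 007, 31 December 2010 (2010-12-31) * |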
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115587215A (en) * | 2022-10-18 | 2023-01-10 | Henan University | Residual broken Chinese character image conjugation method based on sentence continuity |
CN115587215B (en) * | 2022-10-18 | 2023-10-20 | Henan University | Residual-part Chinese sketch conjugation method based on statement smoothness |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108764074B (en) | Subjective item intelligently reading method, system and storage medium based on deep learning | |
CN110110585B (en) | Intelligent paper reading implementation method and system based on deep learning and computer program | |
US6252988B1 (en) | Method and apparatus for character recognition using stop words | |
US20040006467A1 (en) | Method of automatic language identification for multi-lingual text recognition | |
CN112287920B (en) | Burma language OCR method based on knowledge distillation | |
CN110188762B (en) | Chinese-English mixed merchant store name identification method, system, equipment and medium | |
CN113408535B (en) | OCR error correction method based on Chinese character level features and language model | |
CN111274239A (en) | Test paper structuralization processing method, device and equipment | |
CN111967267B (en) | XLNET-based news text region extraction method and system | |
CN110502759B (en) | Method for processing Chinese-Yue hybrid network neural machine translation out-of-set words fused into classification dictionary | |
CN112036330A (en) | Text recognition method, text recognition device and readable storage medium | |
Al Ghamdi | A novel approach to printed Arabic optical character recognition | |
CN113961696A (en) | Oracle automatic conjugation verification method based on Obibert | |
CN114579796B (en) | Machine reading understanding method and device | |
Mostafa et al. | An end-to-end ocr framework for robust arabic-handwriting recognition using a novel transformers-based model and an innovative 270 million-words multi-font corpus of classical arabic with diacritics | |
CN113961696B (en) | Automatic oracle conjugation verification method based on ObiBert | |
Khosrobeigi et al. | A rule-based post-processing approach to improve Persian OCR performance | |
CN115344668A (en) | Multi-field and multi-disciplinary science and technology policy resource retrieval method and device | |
US11270153B2 (en) | System and method for whole word conversion of text in image | |
CN110362803B (en) | Text template generation method based on domain feature lexical combination | |
CN111753840A (en) | Ordering technology for business cards in same city logistics distribution | |
Vasantharajan et al. | Adapting the Tesseract Open-Source OCR Engine for Tamil and Sinhala Legacy Fonts and Creating a Parallel Corpus for Tamil-Sinhala-English | |
CN111626318A (en) | Multi-language harmful information feature intelligent mining method based on deep learning | |
CN116935396B (en) | OCR college entrance guide intelligent acquisition method based on CRNN algorithm | |
CN115146630B (en) | Word segmentation method, device, equipment and storage medium based on professional domain knowledge |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |