CN109918622A - Java-based method and system for converting Word documents to LaTeX documents - Google Patents

Java-based method and system for converting Word documents to LaTeX documents

Info

Publication number
CN109918622A
Authority
CN
China
Prior art keywords
document
text
latex
converted
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910143870.7A
Other languages
Chinese (zh)
Other versions
CN109918622B (en)
Inventor
宋军
徐衡
朱超群
彭艳
张坤
曹威
吴雅笛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing anzhengtong Information Technology Co.,Ltd.
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN201910143870.7A
Publication of CN109918622A
Application granted
Publication of CN109918622B
Status: Active
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method and system for converting a Word document into a LaTeX document. For a Word file submitted by a user, the system uses JACOB to perform an initial analysis of the text, pictures, formulas, tables and other data in the file; extracts the data elements from the source file using Apache POI and JACOB, recording the relative position of each element; classifies each extracted text element with a naive Bayes algorithm and converts the formulas of the source file with a stacked autoencoder; combines the relative position information with each data element to form the information stream of the target LaTeX document; and writes that stream to the target file, producing the final LaTeX document. The invention reduces the difficulty and complexity of converting Word documents to LaTeX documents, gives university teachers and students, researchers and similar users a professional document conversion method, and improves the efficiency of document processing.

Description

Java-based method and system for converting Word documents to LaTeX documents
Technical field
The present invention relates to the field of document conversion and data processing, and more specifically to a Java-based method for converting a Word document into a LaTeX document.
Background
TeX provides a powerful and highly flexible typesetting language with as many as 900 commands. TeX also has a macro facility, so users can define new commands to suit their needs and keep extending the TeX system. LaTeX, developed by Leslie Lamport, is the most popular and most widely used TeX macro package in the world today. Microsoft Office Word, the core program of the Office suite, provides many easy-to-use document creation tools and is the word processor with the largest market share. Its native file format (.docx) has in fact become the most common document standard. Document conversion means converting between document formats such as Word, PDF, TXT, OOXML, ODF and HTML. Examples include a previously proposed method for converting OOXML and ODF documents into HTML, and the conversion between Word and PDF implemented in Adobe Acrobat Professional. Apache POI is an open-source Java library whose main purpose is to access the underlying files of Word documents. JACOB is a Java-COM bridge; through it, a Java application can call COM components and Windows libraries. Together, Apache POI and JACOB make it possible to read and write Microsoft Office Word files.
In the course of realizing the present invention, the inventors found that existing document conversion has three main kinds of problems, both in the technology itself and in how users work with it. First, existing conversion techniques generally target a small number of source formats and one specific target format; their conversion function is narrow and of limited practical value to users. Second, converting between documents that are encoded differently is difficult, as is the case for Microsoft Office Word and LaTeX documents. Finally, a LaTeX document is written in the TeX markup language; producing a complete LaTeX document requires command of nearly all of TeX's descriptive rules as well as the ability to write the code, so for non-specialists writing and typesetting such documents is difficult and complex.
Summary of the invention
The technical problem to be solved by the present invention is to address the above defects by providing a method and system for converting a Word document into a LaTeX document.
The technical solution adopted by the present invention is a Java-based method for converting a Word document into a LaTeX document, comprising the following steps:
S1. For the Word source document file submitted by the user, open the source document file through the Word invocation module of the JACOB component;
S2. In the opened source document file, perform an initial analysis of the various data elements through the JACOB component, and obtain and record the data information of each data element in the source document file;
S3. According to the data information recorded in step S2, extract the various data elements from the source document file using the Apache POI component and the JACOB component;
S4. Convert the data elements extracted in step S3 into information streams, each class of data element forming its own corresponding information stream;
S5. Combine the data information recorded in step S2 with the information stream of each class of data element and, keeping the position of every data element in the source document file unchanged, form the information stream of the target LaTeX document;
S6. Write the information stream of the target LaTeX document formed in step S5 to the target file, thereby converting the Word source document file into a LaTeX document.
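Steps S4 to S6 can be pictured with a minimal Java sketch. It assumes, purely for illustration, that the relative position recorded in step S2 is an integer index and that each element has already been converted to a LaTeX string; the names `LatexAssembler`, `Fragment` and `assemble`, and the `article` document skeleton, are hypothetical and do not come from the patent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch of steps S4 to S6: merge per-type LaTeX fragments back into source order. */
class LatexAssembler {

    /** One converted element: its recorded source position and its LaTeX text. */
    static final class Fragment {
        final int position;  // relative position from step S2 (assumed to be an integer index)
        final String latex;  // LaTeX produced for this element in step S4

        Fragment(int position, String latex) {
            this.position = position;
            this.latex = latex;
        }
    }

    /** Step S5: order all fragments by source position and wrap them in a document skeleton. */
    static String assemble(List<Fragment> fragments) {
        List<Fragment> ordered = new ArrayList<>(fragments);
        ordered.sort(Comparator.comparingInt((Fragment f) -> f.position));
        StringBuilder out = new StringBuilder();
        out.append("\\documentclass{article}\n\\begin{document}\n");
        for (Fragment f : ordered) {
            out.append(f.latex).append("\n");
        }
        out.append("\\end{document}\n");
        return out.toString();
    }
}
```

Step S6 would then amount to writing the returned string to the target `.tex` file, for example with `java.nio.file.Files.write`. Sorting one merged list by position is only one way to keep element order unchanged; the patent does not say how the streams are interleaved.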
Further, the data information obtained and recorded in step S2 includes the class of each data element and its relative position in the source document; the data elements analyzed by the JACOB component include text, picture, table and formula elements.
Further, the initial analysis of the data elements in step S2 specifically consists of determining the storage state of every data element in the source file.
Further, in step S2 the class and relative position of each data element are recorded through the Paragraphs, Item, Text and Table interfaces of the JACOB component.
Further, extracting the data elements from the source document file in step S3 includes:
For text elements, the text elements of the source document are extracted through the get("Text"), get("Font") and get("Size") functions of the JACOB component; a text element comprises the text content, text type and text format;
For picture elements, the pictures in the source document are extracted through the XWPFDocument interface of the Apache POI component and saved as local files using Java's built-in FileOutputStream;
For table elements, the tables in the source document are extracted with the getTable and ReadTable functions of the JACOB component; the dimensions of a table are obtained with the getTableRowsCount and getTableColumnsCount methods of the JACOB component;
For formula elements, using the data information recorded in step S2, the formulas of the source document are extracted through the copy method of the JACOB component together with the getContents function of the clipboard class reached through the Toolkit utility class; the clipboard content is held in a Transferable variable and converted with the getTransferData method;
Each time a class of data element is extracted, its relative position in the source document is recorded.
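The picture branch above can be approximated without any third-party library. The sketch below is a swapped-in technique, not the Apache POI route the patent names: it relies on a .docx file being a ZIP archive in which Word normally stores embedded images under `word/media/`. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Sketch: save the embedded pictures of a .docx as local files using only the JDK. */
class DocxPictureExtractor {

    /** Copies every entry under word/media/ into outDir and returns the saved paths in archive order. */
    static List<Path> extractPictures(Path docx, Path outDir) throws IOException {
        List<Path> saved = new ArrayList<>();
        Files.createDirectories(outDir);
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(docx))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith("word/media/")) {
                    continue;
                }
                // Keep only the file name so a crafted entry cannot escape outDir.
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (fileName.isEmpty()) {
                    continue;
                }
                Path target = outDir.resolve(fileName);
                // Reads up to the end of the current entry; the zip stream stays open.
                Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                saved.add(target);
            }
        }
        return saved;
    }
}
```

Archive order is not necessarily the order in which pictures appear in the document body, so the relative positions recorded in step S2 would still be needed to place each picture.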
Further, the data elements include text, picture, table and formula elements. In step S4, a naive Bayes algorithm classifies the extracted text elements to form the corresponding LaTeX text element information stream; the extracted formula elements are converted on the basis of a stacked autoencoder to form the corresponding LaTeX formula element information stream; and the remaining data elements are turned directly, according to their relative position information, into information streams in the target document format.
Further, classifying the extracted text elements with the naive Bayes algorithm includes the following steps:
A1. Pass the n extracted text elements through the JIEBA word segmentation algorithm and convert them into an n-dimensional feature vector X = {x1, x2, …, xn}, where xi is the i-th feature and 1 ≤ i ≤ n;
A2. Treat the classification of the extracted text data as a binary classification problem: any unknown text sample d belongs to the class set C = {C0, C1}, where C0 denotes body text and C1 denotes title text;
A3. Identify the type of each piece of text data, body text or title text, using the naive Bayes algorithm;
A4. Compute the probability P that the unknown text sample d belongs to each class c, take the class with the highest probability as the class of d, and form the corresponding LaTeX text element according to that class.
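The expression for step A4 is not reproduced in this text. As a hedged reconstruction, assuming the standard naive Bayes model in which the features x_i are conditionally independent given the class (an assumption the source does not confirm), the probability and the decision rule would take the following form, where the symbol δ for the chosen class follows the embodiment:

```latex
P(c \mid d) = \frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}{P(d)}, \qquad c \in C = \{C_0, C_1\}

\delta = \arg\max_{c \in C} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Because P(d) is the same for both classes, it can be dropped when comparing C_0 with C_1.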
Further, converting the extracted formula elements on the basis of the stacked autoencoder includes the following steps:
B1. Encode the formula elements extracted in step S3 using the stacked autoencoder algorithm;
B2. Approximately match the encoding obtained in step B1 against the encoded data already stored in the formula template library;
B3. Input the formula template with the highest matching degree into the system's formula conversion function module WordMathToLaTeX, which further converts the formula format of the source file into an encoding that a LaTeX document can recognize.
Further, in step B3 the formula template with the highest matching degree is determined from the Euclidean distance between the stacked-autoencoder output x and a known sample y, where x1, x2, …, xn and y1, y2, …, yn are the values of the respective vector components after formula encoding.
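The distance expression itself is missing from this text. Assuming it is the ordinary Euclidean distance between the encoded formula x = (x_1, …, x_n) and a known template code y = (y_1, …, y_n), which is what the surrounding wording suggests, it would read:

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```

Under that reading, the template whose code y gives the smallest d(x, y) would be treated as the best match.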
The present invention also proposes a Java-based system for converting a Word document into a LaTeX document, which performs document conversion using any one of the above methods for converting a Word document into a LaTeX document.
In the Java-based method and system of the present invention for converting a Word document into a LaTeX document, the original Word document provided by the user is analyzed intelligently with machine learning algorithms; the closest or best-matching text elements and formula elements are selected automatically; the overall layout of the source file data is integrated with the specific encoding of the target document to form the target file data stream together with supplementary streams such as the table of contents, figure captions, tables and illustrations; and these are written to the target file, thereby converting between documents of different types.
Implementing the Java-based method and system proposed by the present invention for converting a Word document into a LaTeX document has the following beneficial effects:
1. It reduces the difficulty and complexity of converting between different document types and gives university teachers and students, researchers and others a convenient, professional means of document conversion;
2. It lets users convert a simple Word document into the submission format of a professional technical paper, so that researchers and university teachers and students no longer need to learn complex LaTeX code or spend large amounts of time re-typesetting their papers; this improves working efficiency and fills the domestic gap in conversion from Word documents to LaTeX documents.
Brief description of the drawings
The present invention is further described below with reference to the drawings and embodiments. In the drawings:
Fig. 1 is a flowchart of converting a Word document into a LaTeX document;
Fig. 2 is a flowchart of processing the extracted text elements and formula elements with the naive Bayes algorithm and the stacked autoencoder;
Fig. 3 shows the table conversion result when a Word document is converted into a LaTeX document;
Fig. 4 shows the picture conversion result when a Word document is converted into a LaTeX document;
Fig. 5 shows the formula conversion result when a Word document is converted into a LaTeX document;
Fig. 6 shows the overall conversion result when a Word document is converted into a LaTeX document.
Detailed description of the embodiments
For a clearer understanding of the technical features, objects and effects of the present invention, specific embodiments are now described in detail with reference to the drawings.
Referring to Fig. 1, which is a flowchart of converting a Word document into a LaTeX document, the Java-based method proposed by the present invention for converting a Word document into a LaTeX document specifically includes the following steps:
S1. For the Word source document file submitted by the user, open the source document file through the Word invocation module of the JACOB component.
S2. In the opened source document file, perform an initial analysis of the various data elements through the JACOB component, and obtain and record the data information of each data element in the source document file. The data information includes the class of each data element and its relative position in the source document file; in this embodiment, both are recorded through the Paragraphs, Item, Text and Table interfaces of the JACOB component. The data elements analyzed by the JACOB component include text, picture, table and formula elements. The initial analysis specifically consists of determining the storage state of every data element in the source document file.
S3. According to the data information recorded in step S2, extract the various data elements from the source document file using the Apache POI component and the JACOB component. The extraction includes:
For text elements, the text elements of the source document file are extracted through the get("Text"), get("Font") and get("Size") functions of the JACOB component; a text element comprises the text content, text type and text format;
For picture elements, the pictures in the source document file are extracted through the XWPFDocument interface of the Apache POI component and saved as local files using the FileOutputStream class built into Java;
For table elements, the tables in the source document are extracted with the getTable and ReadTable functions of the JACOB component; the dimensions of a table are obtained with the getTableRowsCount and getTableColumnsCount methods of the JACOB component;
For formula elements, using the data information recorded in step S2, the formulas of the source document are extracted through the copy method of the JACOB component together with the getContents function of the clipboard class reached through the Toolkit utility class;
Each time a class of data element is extracted, its relative position in the source document is recorded.
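The clipboard half of the formula branch uses standard AWT calls, and a minimal sketch of it follows. It assumes the copied formula is available as plain text (`DataFlavor.stringFlavor`); the patent does not say which data flavor Word places on the clipboard for an equation, so a real formula may arrive in another flavor. The JACOB copy step is omitted, and the class and method names are hypothetical.

```java
import java.awt.GraphicsEnvironment;
import java.awt.Toolkit;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.IOException;

/** Sketch: read what a preceding copy operation left on the clipboard. */
class ClipboardFormulaReader {

    /** Returns the plain-text content of a Transferable, or null if it offers no string flavor. */
    static String readText(Transferable contents) throws IOException, UnsupportedFlavorException {
        if (contents == null || !contents.isDataFlavorSupported(DataFlavor.stringFlavor)) {
            return null;
        }
        return (String) contents.getTransferData(DataFlavor.stringFlavor);
    }

    /** Reads the system clipboard; returns null when no display (and so no clipboard) is available. */
    static String readSystemClipboard() throws IOException, UnsupportedFlavorException {
        if (GraphicsEnvironment.isHeadless()) {
            return null;
        }
        Transferable contents = Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null);
        return readText(contents);
    }
}
```

Keeping `readText` separate from the system clipboard lookup lets the conversion logic be exercised with a `java.awt.datatransfer.StringSelection` on machines without a display.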
S4. Convert the data elements extracted in step S3 into information streams, each class of data element forming its own corresponding information stream. Specifically, the extracted formula elements are converted on the basis of a stacked autoencoder to form the corresponding LaTeX formula element information stream; the extracted text elements are classified with a naive Bayes algorithm to form the corresponding LaTeX text element information stream; and the remaining data elements are turned directly, according to their relative position information, into information streams in the target document format.
S5. Combine the data information recorded in step S2 with the information stream of each class of data element and, keeping the position of every data element in the source document file unchanged, form the information stream of the target LaTeX document.
S6. Write the information stream of the target LaTeX document formed in step S5 to the target file, thereby converting the Word source document file into a LaTeX document.
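Step S4 says that elements other than text and formulas, such as tables, are turned directly into target-format streams. A minimal sketch of that direct conversion for a table is given below, assuming the cell texts and the column count have already been extracted in step S3. The `tabular` layout with left-aligned, ruled columns and the names `TableToLatex`, `escape` and `toTabular` are illustrative choices, not taken from the patent.

```java
import java.util.List;

/** Sketch: render extracted table cells as a LaTeX tabular environment. */
class TableToLatex {

    /** Escapes the characters that LaTeX treats specially inside table cells. */
    static String escape(String cell) {
        StringBuilder sb = new StringBuilder();
        for (char ch : cell.toCharArray()) {
            switch (ch) {
                case '\\': sb.append("\\textbackslash{}"); break;
                case '~': sb.append("\\textasciitilde{}"); break;
                case '^': sb.append("\\textasciicircum{}"); break;
                case '&': case '%': case '$': case '#': case '_': case '{': case '}':
                    sb.append('\\').append(ch); break;
                default: sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Builds a ruled tabular with the given number of columns; short rows are padded with empty cells. */
    static String toTabular(List<List<String>> rows, int columns) {
        StringBuilder sb = new StringBuilder("\\begin{tabular}{|");
        for (int c = 0; c < columns; c++) {
            sb.append("l|");
        }
        sb.append("}\n\\hline\n");
        for (List<String> row : rows) {
            for (int c = 0; c < columns; c++) {
                if (c > 0) {
                    sb.append(" & ");
                }
                sb.append(c < row.size() ? escape(row.get(c)) : "");
            }
            sb.append(" \\\\\n\\hline\n");
        }
        sb.append("\\end{tabular}\n");
        return sb.toString();
    }
}
```

Merged cells and column alignment are not handled here; the patent does not describe how those are treated.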
Referring to Fig. 2, which is a flowchart of processing the extracted text elements and formula elements with the naive Bayes algorithm and the stacked autoencoder, classifying the extracted text elements with the naive Bayes algorithm specifically includes the following steps:
A1. Pass the n extracted text elements through the JIEBA word segmentation algorithm and convert them into an n-dimensional feature vector X = {x1, x2, …, xn}, where xi is the i-th feature and 1 ≤ i ≤ n;
A2. Treat the classification of the extracted text data as a binary classification problem: any unknown text sample d belongs to the class set C = {C0, C1}, where C0 denotes body text and C1 denotes title text;
A3. Identify the type of each piece of text data, body text or title text, using the naive Bayes algorithm;
A4. Compute the probability P that the unknown text sample d belongs to each class c, take the class with the highest probability as the class δ of d, and form the corresponding LaTeX text element according to δ.
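Steps A1 to A4 can be sketched as a small two-class naive Bayes classifier. The sketch assumes a multinomial model with add-one smoothing over tokens that have already been segmented (the JIEBA step is not reproduced), and it maps title text to `\section{...}`; the smoothing, the choice of LaTeX command and all names are assumptions, since the patent gives no such detail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: decide whether a segmented text element is body text (C0) or title text (C1). */
class TitleBodyNaiveBayes {
    static final int BODY = 0;   // class C0 in step A2
    static final int TITLE = 1;  // class C1 in step A2

    private final List<Map<String, Integer>> tokenCounts = new ArrayList<>();
    private final int[] totalTokens = new int[2];
    private final int[] samples = new int[2];
    private final Set<String> vocabulary = new HashSet<>();

    TitleBodyNaiveBayes() {
        tokenCounts.add(new HashMap<>());
        tokenCounts.add(new HashMap<>());
    }

    /** Adds one labelled training sample given as a list of tokens. */
    void train(List<String> tokens, int label) {
        samples[label]++;
        for (String t : tokens) {
            tokenCounts.get(label).merge(t, 1, Integer::sum);
            totalTokens[label]++;
            vocabulary.add(t);
        }
    }

    /** log P(c) + sum over tokens of log P(x_i | c), with add-one smoothing. */
    double logScore(List<String> tokens, int label) {
        double score = Math.log((double) samples[label] / (samples[BODY] + samples[TITLE]));
        for (String t : tokens) {
            int count = tokenCounts.get(label).getOrDefault(t, 0);
            score += Math.log((count + 1.0) / (totalTokens[label] + vocabulary.size()));
        }
        return score;
    }

    /** Step A4: pick the class with the larger score. */
    int classify(List<String> tokens) {
        return logScore(tokens, TITLE) > logScore(tokens, BODY) ? TITLE : BODY;
    }

    /** Forms the LaTeX text element for the chosen class (hypothetical mapping). */
    static String toLatex(String text, int label) {
        return label == TITLE ? "\\section{" + text + "}" : text + "\n";
    }
}
```

Working with log scores avoids numerical underflow when an element contains many tokens, and it leaves the argmax unchanged.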
Specifically, converting the extracted formula elements on the basis of the stacked autoencoder includes the following steps:
B1. Encode the formula elements extracted in step S3 using the stacked autoencoder algorithm;
B2. Approximately match the encoding obtained in step B1 against the encoded data already stored in the formula template library;
B3. Input the formula template with the highest matching degree into the system's formula conversion function module WordMathToLaTeX, which further converts the formula format of the source file into an encoding that a LaTeX document can recognize. The formula template with the highest matching degree is determined from the Euclidean distance between the stacked-autoencoder output x and a known sample y, where x1, x2, …, xn and y1, y2, …, yn are the values of the respective vector components after formula encoding.
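The approximate matching of steps B2 and B3 can be illustrated as a nearest-neighbour search under Euclidean distance. The sketch assumes the stacked autoencoder has already produced a fixed-length code for the formula and for every template in the library; the encoder itself is not shown, and the names `TemplateMatcher`, `euclidean` and `nearest` are hypothetical.

```java
/** Sketch: pick the formula template whose code is closest to the encoded formula. */
class TemplateMatcher {

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Index of the template code closest to the encoded formula, or -1 if the library is empty. */
    static int nearest(double[] encoded, double[][] templateCodes) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < templateCodes.length; k++) {
            double d = euclidean(encoded, templateCodes[k]);
            if (d < bestDistance) {
                bestDistance = d;
                best = k;
            }
        }
        return best;
    }
}
```

The returned index would identify the template handed to the WordMathToLaTeX module; treating the smallest distance as the highest matching degree is an interpretation of the source wording.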
On the same principle, the present invention further proposes a Java-based system for converting a Word document into a LaTeX document, which provides document conversion using any one of the above methods for converting a Word document into a LaTeX document.
Fig. 3 shows the table conversion result, Fig. 4 the picture conversion result, Fig. 5 the formula conversion result and Fig. 6 the overall conversion result when a Word document is converted into a LaTeX document. Figs. 3 to 6 show that the Java-based method proposed by the present invention effectively converts a Word document into a LaTeX document.
The embodiments of the present invention have been described above with reference to the drawings, but the invention is not limited to the specific embodiments described, which are merely illustrative rather than restrictive. Guided by the present invention, those of ordinary skill in the art may devise many further forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A Java-based method for converting a Word document into a LaTeX document, characterized by comprising the following steps:
S1. For the Word source document file submitted by the user, open the source document file through the Word invocation module of the JACOB component;
S2. In the opened source document file, perform an initial analysis of the various data elements through the JACOB component, and obtain and record the data information of each data element in the source document file;
S3. According to the data information recorded in step S2, extract the various data elements from the source document file using the Apache POI component and the JACOB component;
S4. Convert the data elements extracted in step S3 into information streams, each class of data element forming its own corresponding information stream;
S5. Combine the data information recorded in step S2 with the information stream of each class of data element and, keeping the position of every data element in the source document file unchanged, form the information stream of the target LaTeX document;
S6. Write the information stream of the target LaTeX document formed in step S5 to the target file, thereby converting the Word source document file into a LaTeX document.
2. The method for converting a Word document into a LaTeX document according to claim 1, characterized in that the data information obtained and recorded in step S2 includes the class of each data element and its relative position in the source document file; the data elements analyzed by the JACOB component include text, picture, table and formula elements.
3. The method for converting a Word document into a LaTeX document according to claim 1, characterized in that the initial analysis of the data elements in the source document file in step S2 specifically consists of determining the storage state of every data element in the source document file.
4. The method for converting a Word document into a LaTeX document according to claim 1, characterized in that in step S2 the class and relative position of each data element are recorded through the Paragraphs, Item, Text and Table interfaces of the JACOB component.
5. The method for converting a Word document into a LaTeX document according to claim 1, characterized in that extracting the data elements from the source document file in step S3 includes:
For text elements, the text elements of the source document file are extracted through the get("Text"), get("Font") and get("Size") functions of the JACOB component; a text element comprises the text content, text type and text format;
For picture elements, the pictures in the source document file are extracted through the XWPFDocument interface of the Apache POI component and saved as local files using the FileOutputStream class built into Java;
For table elements, the tables in the source document are extracted with the getTable and ReadTable functions of the JACOB component; the dimensions of a table are obtained with the getTableRowsCount and getTableColumnsCount methods of the JACOB component;
For formula elements, using the data information recorded in step S2, the formulas of the source document are extracted through the copy method of the JACOB component together with the getContents function of the clipboard class reached through the Toolkit utility class;
Each time a class of data element is extracted, its relative position in the source document is recorded.
6. The method for converting a Word document into a LaTeX document according to claim 1, characterized in that the data elements include text, picture, table and formula elements; in step S4, a naive Bayes algorithm classifies the extracted text elements to form the corresponding LaTeX text element information stream; the extracted formula elements are converted on the basis of a stacked autoencoder to form the corresponding LaTeX formula element information stream; and the remaining data elements are turned directly, according to their relative position information, into information streams in the target document format.
7. The method for converting a Word document into a LaTeX document according to claim 6, characterized in that classifying the extracted text elements with the naive Bayes algorithm includes the following steps:
A1. Pass the n extracted text elements through the JIEBA word segmentation algorithm and convert them into an n-dimensional feature vector X = {x1, x2, …, xn}, where xi is the i-th feature and 1 ≤ i ≤ n;
A2. Treat the classification of the extracted text data as a binary classification problem: any unknown text sample d belongs to the class set C = {C0, C1}, where C0 denotes body text and C1 denotes title text;
A3. Identify the type of each piece of text data, body text or title text, using the naive Bayes algorithm;
A4. Compute the probability P that the unknown text sample d belongs to each class c, take the class with the highest probability as the class δ of d, and form the corresponding LaTeX text element according to δ.
8. The method for converting a Word document into a LaTeX document according to claim 6, characterized in that converting the extracted formula elements on the basis of the stacked autoencoder includes the following steps:
B1. Encode the formula elements extracted in step S3 using the stacked autoencoder algorithm;
B2. Approximately match the encoding obtained in step B1 against the encoded data already stored in the formula template library;
B3. Input the formula template with the highest matching degree into the system's formula conversion function module WordMathToLaTeX, which further converts the formula format of the source file into an encoding that a LaTeX document can recognize.
9. The method for converting a Word document into a LaTeX document according to claim 8, characterized in that in step B3 the formula template with the highest matching degree is determined from the Euclidean distance between the stacked-autoencoder output x and a known sample y, where x1, x2, …, xn and y1, y2, …, yn are the values of the respective vector components after formula encoding.
10. A Java-based system for converting a Word document into a LaTeX document, characterized in that it performs document conversion using the method for converting a Word document into a LaTeX document according to any one of claims 1 to 9.
CN201910143870.7A 2019-02-27 2019-02-27 Method for realizing conversion from Word document to LaTeX document based on JAVA Active CN109918622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910143870.7A CN109918622B (en) 2019-02-27 2019-02-27 Method for realizing conversion from Word document to LaTeX document based on JAVA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910143870.7A CN109918622B (en) 2019-02-27 2019-02-27 Method for realizing conversion from Word document to LaTeX document based on JAVA

Publications (2)

Publication Number Publication Date
CN109918622A (en) 2019-06-21
CN109918622B (en) 2020-12-08

Family

ID=66962462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910143870.7A Active CN109918622B (en) 2019-02-27 2019-02-27 Method for realizing conversion from Word document to LaTeX document based on JAVA

Country Status (1)

Country Link
CN (1) CN109918622B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021042542A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Table of contents storage method and apparatus, computer device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040073708A1 (en) * 1999-12-08 2004-04-15 Warnock Kevin L. Internet document services
CN1685312A (en) * 2002-07-19 2005-10-19 Jgr阿奎西申公司 Registry driven interoperability and exchange of documents
CN101055577A (en) * 2006-04-12 2007-10-17 龙搜(北京)科技有限公司 Collector capable of extending markup language
CN101196886A (en) * 2006-12-08 2008-06-11 鸿富锦精密工业(深圳)有限公司 System and method for converting word files into XML files
CN103309848A (en) * 2013-06-14 2013-09-18 广东电网公司佛山供电局 Method for converting excel document into pdf document
CN104008087A (en) * 2014-06-05 2014-08-27 李梦依 Automatic typesetting method and system special for copywriter with standard format
CN104267953A (en) * 2014-09-27 2015-01-07 昆明钢铁集团有限责任公司 Control and method for importing Word test questions based on browser
CN107025407A (en) * 2017-03-22 2017-08-08 国家计算机网络与信息安全管理中心 The malicious code detecting method and system of a kind of office document files

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
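WEIXIN_30379911: "Parsing Word documents with Java", CSDN *
潘若瑛: "Research and implementation of a multi-template, multi-format integrated proofreading and typesetting system for papers", China Master's Theses Full-text Database *
蔡万景 et al.: "Research and implementation of a Web template system for LaTeX authoring", Science & Technology Information *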

Also Published As

Publication number Publication date
CN109918622B (en) 2020-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20220414
Address after: 100000 b1001-01, floor 9, block B, No. 9, Shangdi Third Street, Haidian District, Beijing
Patentee after: Beijing anzhengtong Information Technology Co.,Ltd.
Address before: 430000 Lu Mill Road, Hongshan District, Wuhan, Hubei Province, No. 388
Patentee before: CHINA University OF GEOSCIENCES (WUHAN CITY)