CN104463153B - Method and system for improving the character recognition rate in a format document

Method and system for improving the character recognition rate in a format document

Info

Publication number
CN104463153B
Authority
CN
China
Prior art keywords
character
coding
format document
predetermined
universal standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310450972.6A
Other languages
Chinese (zh)
Other versions
CN104463153A (en)
Inventor
董宁
耿蕾蕾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Founder Apabi Technology Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Apabi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd and Beijing Founder Apabi Technology Co Ltd
Priority to CN201310450972.6A
Publication of CN104463153A
Application granted
Publication of CN104463153B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The present invention provides a method and system for improving the character recognition rate in a format document. For each predetermined character in the format document, the original character code is compared with the standard character code to obtain a code comparison result. Probability statistics are performed on the multiple code comparison results to obtain a probability value, and the probability value is compared with a threshold. If the probability value is greater than the threshold, the format document displays the characters obtained by looking up the original character codes in the universal standard character code library; otherwise, the format document displays the characters recognized by OCR. By using probability statistics to choose between the characters obtained from the original character codes and the characters recognized by OCR, the invention effectively improves the accuracy of character recognition.

Description

Method and system for improving the character recognition rate in a format document
Technical field
The present invention relates to a method for improving the text recognition rate, and specifically to a method and system for improving the character recognition rate in a format document.
Background
To ensure a good reading experience, the typeset documents that book and periodical publishers release before printing are generally format documents. A format document is a file that explicitly records information such as the position, glyph bitmap, font, size, and color of each character; it may also record the code of each character. Because a format document describes the glyph bitmaps and the relative positions of the characters, it is stable: in any computer environment, the format document a reader views has the same visual appearance as the printed book or periodical. PDF is the most common format document.
Although some format documents record character codes, display is generally driven by the glyph bitmaps rather than by the codes. When text characters are extracted from a format document, the recorded character codes may have been produced either with a universal standard coding or with a custom coding. For any particular format document, the coding scheme of its characters is therefore unknown, and the text characters cannot be reliably obtained from the codes.
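The ambiguity described above can be illustrated with a small sketch. It assumes GB2312 as the universal standard coding (the coding named later in this description) and uses a made-up custom table; the names `stored_code` and `custom_table` are hypothetical and do not come from the patent.

```python
# Illustration only: the same stored code can mean different characters
# under different coding schemes. The custom table below is hypothetical.

stored_code = bytes([0xB0, 0xA1])  # code recorded for one character in the document

# Interpreted as universal standard GB2312, the code maps to one character.
as_standard = stored_code.decode("gb2312")

# A hypothetical producer-specific table could assign the same code to another glyph.
custom_table = {0xB0A1: "中"}
as_custom = custom_table[int.from_bytes(stored_code, "big")]

# The two interpretations disagree, so the code alone does not identify the character.
print(as_standard, as_custom)
```

Without knowing which scheme produced the code, a reader of the file cannot tell which of the two characters was intended, which is why the glyph bitmap remains the only reliable record of what is displayed.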
Therefore, the prior art generally uses OCR (Optical Character Recognition) to extract the characters in a format document. However, because OCR has a limited recognition rate, the text characters it recognizes often have a high error rate, which affects the user's reading.
Summary of the invention
To this end, the technical problem to be solved by the present invention is to overcome the high error rate that arises in the prior art when OCR is used to recognize characters, by providing a method and system for improving the character recognition rate in a format document.
To solve the above technical problem, the present invention provides a method for improving the character recognition rate in a format document, comprising the following steps:
Comparing the original character code of each predetermined character in the format document with the universal standard character code of the same predetermined character, to obtain a code comparison result indicating that the codes are identical or different;
Performing probability statistics on the code comparison results of multiple predetermined characters, to obtain the probability value that the predetermined characters use the universal standard character coding;
Comparing the probability value with a threshold: if the probability value is greater than the threshold, obtaining and displaying each predetermined character by looking up its original character code in the universal standard character code library; otherwise, directly displaying the character recognized by OCR for each predetermined character.
In the method for improving the character recognition rate in a format document, before the step of obtaining the code comparison result, the method further comprises the following steps:
Extracting the glyph bitmap of each predetermined character in the format document;
Extracting the original character code of each predetermined character in the format document;
Performing OCR on the glyph bitmap to obtain a recognized character;
Looking up the recognized character in the universal standard character code library to obtain the universal standard character code.
In the method for improving the character recognition rate in a format document, before the step of extracting the original character code, the method further comprises the following step:
Screening out the characters in the format document that have an original character code as the predetermined characters.
In the method for improving the character recognition rate in a format document, after the step of screening out the characters that have an original character code as the predetermined characters, the method further comprises the following step:
Assigning an ID number to each predetermined character.
In the method for improving the character recognition rate in a format document, after the step of extracting the original character code of each predetermined character in the format document, the method further comprises the following step:
Establishing an original character code table, and storing the ID of each predetermined character together with its corresponding original character code in the original character code table.
In the method for improving the character recognition rate in a format document, after the step of obtaining the universal standard character code, the method further comprises the following step:
Establishing a standard character code table, and storing the ID of each predetermined character together with its corresponding standard character code in the standard character code table.
In the method for improving the character recognition rate in a format document, before the probability value is compared with the threshold and the corresponding operation is performed, the method further comprises the following step:
Establishing an editable interface for displaying, modifying, and confirming the characters.
A system for improving the character recognition rate in a format document comprises a code comparison device, a probability statistics device, and a probability-threshold comparison device, wherein
The code comparison device is configured to compare the original character code of each predetermined character in the format document with the universal standard character code of the same predetermined character, to obtain a code comparison result indicating that the codes are identical or different;
The probability statistics device is configured to perform probability statistics on the code comparison results of multiple predetermined characters, to obtain the probability value that the predetermined characters use the universal standard character coding;
The probability-threshold comparison device is configured to compare the probability value with a threshold: if the probability value is greater than the threshold, each predetermined character is obtained by looking up its original character code in the universal standard character code library and is displayed; otherwise, the character recognized by OCR for each predetermined character is displayed directly.
The system for improving the character recognition rate in a format document further comprises a glyph bitmap extraction device, an original character code extraction device, an OCR device, and a universal standard character code lookup device, wherein
The glyph bitmap extraction device is configured to extract the glyph bitmap of each predetermined character in the format document;
The original character code extraction device is configured to extract the original character code of each predetermined character in the format document;
The OCR device is configured to perform OCR on the extracted glyph bitmap to obtain a recognized character;
The universal standard character code lookup device is configured to look up the recognized character in the universal standard character code library to obtain the universal standard character code.
The system for improving the character recognition rate in a format document further comprises a predetermined character screening device, which is configured to screen out the characters in the format document that have an original character code as the predetermined characters.
The system for improving the character recognition rate in a format document further comprises an ID numbering device, which is configured to assign an ID number to each predetermined character.
The system for improving the character recognition rate in a format document further comprises an original character code table establishing device, which is configured to establish an original character code table and to store the ID of each predetermined character together with its corresponding original character code in the original character code table.
The system for improving the character recognition rate in a format document further comprises a standard character code table establishing device, which is configured to establish a standard character code table and to store the ID of each predetermined character together with its corresponding standard character code in the standard character code table.
The system for improving the character recognition rate in a format document further comprises an editable interface establishing device, which is configured to establish an editable interface for displaying, modifying, and confirming the characters.
Compared with the prior art, the above technical solution of the present invention has the following advantages:
1. In the method and system for improving the character recognition rate in a format document according to the present invention, the original character code of each predetermined character in the format document is compared with its universal standard character code to obtain a code comparison result indicating that the codes are identical or different; probability statistics are performed on the multiple code comparison results to obtain a probability value; and the probability value is compared with a threshold. If the probability value is greater than the threshold, the characters obtained by looking up the original character codes in the universal standard character code library are displayed; otherwise, the characters recognized by OCR are displayed. By using probability statistics to choose between these two sources, the invention effectively improves the accuracy of character recognition.
2. In the method and system according to the present invention, before the step of obtaining the code comparison result, the following steps are further included: extracting the glyph bitmap of each predetermined character in the format document; extracting the original character code of each predetermined character in the format document; performing OCR on the glyph bitmap to obtain a recognized character; and looking up the recognized character in the universal standard character code library to obtain the universal standard character code. The recognized character obtained by OCR makes it straightforward to obtain the universal standard character code. The OCR device is a commercially available general-purpose module and is therefore inexpensive.
3. In the method and system according to the present invention, before the step of extracting the original character code, the characters in the format document that have an original character code are screened out as the predetermined characters. This screening reduces the number of characters whose glyph bitmaps need to be extracted, which shortens the run time and improves operating efficiency. The invention also includes a step of assigning an ID number to each predetermined character; ID numbering makes it easier to match each predetermined character accurately, one to one, with its original character code or its recognized character. The invention further includes steps of establishing an original character code table and a standard character code table; these tables manage the original character codes and the standard character codes effectively and reduce the run time.
4. The method and system according to the present invention further include a step of establishing an editable interface. The editable interface can display, modify, and confirm the displayed characters, so that wrongly displayed characters can be corrected by manual intervention.
Description of the drawings
To make the content of the present invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings, in which
Fig. 1 is a flowchart of a method for improving the character recognition rate in a format document according to an embodiment of the present invention;
Fig. 2 is a structural diagram of a system for improving the character recognition rate in a format document according to an embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described in detail below with reference to the drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Embodiment 1
As an embodiment of the present invention, as shown in Fig. 1, a method for improving the character recognition rate in a format document comprises the following steps:
Comparing the original character code of each predetermined character in the format document with the universal standard character code of the same predetermined character, to obtain a code comparison result indicating that the codes are identical or different.
Performing probability statistics on the code comparison results of multiple predetermined characters, to obtain the probability value that the predetermined characters use the universal standard character coding.
Comparing the probability value with a threshold: if the probability value is greater than the threshold, obtaining and displaying each predetermined character by looking up its original character code in the universal standard character code library. Otherwise, directly displaying the character recognized by OCR for each predetermined character.
The present invention uses probability statistics to choose between displaying the characters obtained by looking up the original character codes in the universal standard character code library and displaying the characters recognized by OCR. When the predetermined characters use the universal standard character coding, the characters obtained from the original character codes replace the characters recognized by OCR. Because a character obtained by looking up its original character code in the universal standard character code library is more accurate than one recognized by OCR, the invention improves the accuracy of text recognition overall.
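The three steps of this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: the record type `PredeterminedChar`, the function names, and the dictionary standing in for the universal standard character code library are all assumptions, and the default threshold of 0.9 is taken from the 90% value given later in this description. Falling back to the OCR character when a code is missing from the library is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class PredeterminedChar:
    original_code: int   # code recorded in the format document
    standard_code: int   # standard code of the OCR-recognized character
    ocr_char: str        # character recognized by OCR from the glyph bitmap


def standard_coding_probability(chars):
    """Fraction of predetermined characters whose two codes are identical."""
    if not chars:
        return 0.0
    same = sum(1 for c in chars if c.original_code == c.standard_code)
    return same / len(chars)


def characters_to_display(chars, code_library, threshold=0.9):
    """Choose, for the whole document, between code lookup and OCR output."""
    p = standard_coding_probability(chars)
    if p > threshold:
        # Assumption: use the OCR character if a code is absent from the library.
        return [code_library.get(c.original_code, c.ocr_char) for c in chars]
    return [c.ocr_char for c in chars]
```

For example, if 19 of 20 predetermined characters have matching codes, the probability value is 0.95, which exceeds 0.9, so all 20 characters are taken from the code library, including the one that OCR misread.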
Embodiment 2
As an embodiment of the present invention, on the basis of Embodiment 1, before the step of obtaining the code comparison result, the method further comprises the following steps:
Extracting the glyph bitmap of each predetermined character in the format document.
Performing OCR on the extracted glyph bitmap to obtain a recognized character.
Looking up the recognized character in the universal standard character code library to obtain the universal standard character code. Here, the universal standard character coding is the national standard GB2312.
Extracting the original character code of each predetermined character in the format document.
The steps of obtaining the universal standard character code and obtaining the original character code may be performed at the same time or in either order: the universal standard character code may be obtained first and the original character code second, or the other way round. The purpose of the present invention is achieved as long as both codes are available before the comparison.
The recognized character obtained by OCR makes it straightforward to obtain the universal standard character code.
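The lookup of a recognized character in the universal standard character code library can be sketched with Python's built-in `gb2312` codec standing in for the library. The function name `standard_code` is hypothetical, and the OCR step itself is not shown: the function simply takes the character that OCR is assumed to have produced.

```python
def standard_code(recognized_char):
    """Return the GB2312 code of an OCR-recognized character, or None if it has none."""
    try:
        raw = recognized_char.encode("gb2312")
    except UnicodeEncodeError:
        return None  # the character is outside GB2312
    return int.from_bytes(raw, "big")


print(hex(standard_code("中")))  # 0xd6d0
```

Returning `None` for characters outside GB2312 is a design choice of this sketch; the patent does not say how such characters are handled.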
Embodiment 3
As an embodiment of the present invention, on the basis of Embodiment 2, before the step of extracting the original character code, the method further comprises the following step:
Screening out the characters in the format document that have an original character code as the predetermined characters. This screening reduces the number of characters whose glyph bitmaps need to be extracted, which shortens the run time of the invention and improves operating efficiency.
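The screening step amounts to a filter over the characters of the document. In the sketch below, each character is represented as a dictionary with a hypothetical `original_code` field that is `None` when the document records no code; this representation is an assumption made for illustration only.

```python
def screen_predetermined(chars):
    """Keep only the characters for which the document records an original code."""
    return [c for c in chars if c.get("original_code") is not None]


document_chars = [
    {"bitmap": "bmp0", "original_code": 0xD6D0},
    {"bitmap": "bmp1", "original_code": None},   # no code recorded, so it is skipped
    {"bitmap": "bmp2", "original_code": 0xCEC4},
]
predetermined = screen_predetermined(document_chars)
print(len(predetermined))  # 2
```

Only the screened characters go on to glyph bitmap extraction and OCR, which is where the run-time saving described above comes from.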
Embodiment 4
As an embodiment of the present invention, on the basis of Embodiment 3, after the step of screening out the characters in the format document that have an original character code as the predetermined characters, the method further comprises the following step:
Assigning an ID number to each predetermined character. ID numbering makes it easier to match each predetermined character accurately, one to one, with its original character code or its recognized character.
Embodiment 5
As an embodiment of the present invention, on the basis of Embodiment 4, after the step of extracting the original character code of each predetermined character in the format document, the method further comprises the following step:
Establishing an original character code table, and storing the ID of each predetermined character together with its corresponding original character code in the original character code table. The original character code table manages the original character codes effectively and reduces the run time of the invention.
Embodiment 6
As an embodiment of the present invention, on the basis of Embodiment 4 or Embodiment 5, after the step of obtaining the universal standard character code, the method further comprises the following step:
Establishing a standard character code table, and storing the ID of each predetermined character together with its corresponding standard character code in the standard character code table. The standard character code table manages the standard character codes effectively and reduces the run time of the invention.
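The ID numbering and the two code tables of Embodiments 4 to 6 can be sketched as two dictionaries keyed by the same ID. The function names, the `bitmap` and `original_code` fields, and the `recognize` and `to_standard_code` callables (stand-ins for OCR and for the code library lookup) are all hypothetical.

```python
def build_tables(predetermined, recognize, to_standard_code):
    """Number the predetermined characters and fill both code tables by ID."""
    original_table = {}
    standard_table = {}
    for char_id, ch in enumerate(predetermined):
        original_table[char_id] = ch["original_code"]
        # OCR the glyph bitmap, then look the recognized character up in the library.
        standard_table[char_id] = to_standard_code(recognize(ch["bitmap"]))
    return original_table, standard_table


def compare_by_id(original_table, standard_table):
    """Code comparison result per ID: True if the two codes are identical."""
    return {i: original_table[i] == standard_table[i] for i in original_table}
```

Because both tables share the ID as their key, the later comparison step needs only one lookup per character in each table, with no need to re-match characters by position.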
Embodiment 7
As an embodiment of the present invention, on the basis of the above embodiments, before the probability value is compared with the threshold and the corresponding operation is performed, the method further comprises the following step:
Establishing an editable interface for displaying, modifying, and confirming the characters.
The editable interface can display, modify, and confirm the displayed characters, so that wrongly displayed characters can be corrected by manual intervention.
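The patent does not describe how the editable interface is built. As a rough sketch under that caveat, the state behind such an interface could be modeled as a list of displayed characters with a per-character confirmation flag; the class `EditableText` and its methods are hypothetical.

```python
class EditableText:
    """Minimal model of displayed characters that a user can modify and confirm."""

    def __init__(self, chars):
        self.chars = list(chars)
        self.confirmed = [False] * len(self.chars)

    def show(self):
        """Return the text as currently displayed."""
        return "".join(self.chars)

    def modify(self, index, new_char):
        """Replace a wrongly displayed character; it must then be confirmed again."""
        self.chars[index] = new_char
        self.confirmed[index] = False

    def confirm(self, index):
        """Mark a character as checked by the user."""
        self.confirmed[index] = True
```

A user who spots a wrong character would call `modify` with the correct one and then `confirm` it, which corresponds to the display, modify, and confirm operations named above.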
As an embodiment of the present invention, on the basis of the above embodiments, the threshold value is 90%.
Embodiment 8
As an embodiment of the present invention, as shown in Fig. 2, a system for improving the character recognition rate in a format document comprises a code comparison device, a probability statistics device, and a probability-threshold comparison device, wherein
The code comparison device is configured to compare the original character code of each predetermined character in the format document with the universal standard character code of the same predetermined character, to obtain a code comparison result indicating that the codes are identical or different.
The probability statistics device is configured to perform probability statistics on the code comparison results of multiple predetermined characters, to obtain the probability value that the predetermined characters use the universal standard character coding.
The probability-threshold comparison device is configured to compare the probability value with a threshold: if the probability value is greater than the threshold, each predetermined character is obtained by looking up its original character code in the universal standard character code library and is displayed. Otherwise, the character recognized by OCR for each predetermined character is displayed directly.
The present invention uses probability statistics to choose between displaying the characters obtained by looking up the original character codes in the universal standard character code library and displaying the characters recognized by OCR, and therefore effectively improves the accuracy of text recognition.
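One way to picture the division into three devices is as three interchangeable components wired together. The sketch below is an assumption about structure only: the class `RecognitionSystem`, its field names, and the lambda implementations are hypothetical, and the 0.9 threshold is the 90% value given elsewhere in this description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RecognitionSystem:
    compare: Callable[[int, int], bool]        # code comparison device
    statistics: Callable[[List[bool]], float]  # probability statistics device
    exceeds: Callable[[float], bool]           # probability-threshold comparison device

    def use_original_codes(self, code_pairs: List[Tuple[int, int]]) -> bool:
        """True if the document should be displayed from its original codes."""
        results = [self.compare(orig, std) for orig, std in code_pairs]
        return self.exceeds(self.statistics(results))


system = RecognitionSystem(
    compare=lambda orig, std: orig == std,
    statistics=lambda results: sum(results) / len(results) if results else 0.0,
    exceeds=lambda p: p > 0.9,
)
```

Keeping the devices separate means that, for instance, the threshold rule can be changed without touching the comparison or the statistics.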
Embodiment 9
As an embodiment of the present invention, on the basis of Embodiment 8, the system further comprises a glyph bitmap extraction device, an original character code extraction device, an OCR device, and a universal standard character code lookup device, wherein
The glyph bitmap extraction device is configured to extract the glyph bitmap of each predetermined character in the format document.
The original character code extraction device is configured to extract the original character code of each predetermined character in the format document.
The OCR device is configured to perform OCR on the extracted glyph bitmap to obtain a recognized character.
The universal standard character code lookup device is configured to look up the recognized character in the universal standard character code library to obtain the universal standard character code.
The recognized character obtained by OCR makes it straightforward to obtain the universal standard character code. The OCR device is a commercially available general-purpose module and is therefore inexpensive.
Embodiment 10
As an embodiment of the present invention, on the basis of Embodiment 9, the system further comprises a predetermined character screening device, which is configured to screen out the characters in the format document that have an original character code as the predetermined characters. The predetermined character screening device reduces the number of characters whose glyph bitmaps need to be extracted, which shortens the run time of the invention and improves operating efficiency.
Embodiment 11
As an embodiment of the present invention, on the basis of Embodiment 10, the system further comprises an ID numbering device, which is configured to assign an ID number to each predetermined character. The ID numbering device makes it easier to match each predetermined character accurately, one to one, with its original character code or its recognized character.
Embodiment 12
As an embodiment of the present invention, on the basis of Embodiment 11, the system further comprises an original character code table establishing device, which is configured to establish an original character code table and to store the ID of each predetermined character together with its corresponding original character code in the original character code table. The original character code table establishing device manages the original character codes effectively and reduces the run time of the invention.
Embodiment 13
As an embodiment of the present invention, on the basis of Embodiment 11 or Embodiment 12, the system further comprises a standard character code table establishing device, which is configured to establish a standard character code table and to store the ID of each predetermined character together with its corresponding standard character code in the standard character code table. The standard character code table establishing device manages the standard character codes effectively and reduces the run time of the invention.
Embodiment 14
As an embodiment of the present invention, on the basis of any one of Embodiments 8 to 13, the system further comprises an editable interface establishing device, which is configured to establish an editable interface for displaying, modifying, and confirming the characters. The editable interface can display, modify, and confirm the displayed characters, so that wrongly displayed characters can be corrected by manual intervention.
As an embodiment of the present invention, on the basis of the above embodiments, the threshold value is 90%.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (8)

1. A method for improving the character recognition rate in a format document, characterized by comprising the following steps:
extracting the glyph bitmap of each predetermined character in the format document;
extracting the character original coding of each predetermined character in the format document;
performing OCR recognition on the glyph bitmap to obtain a recognized character;
looking up the recognized character in a universal standard character code library to obtain a character universal standard coding, wherein the character universal standard coding is the national standard GB2312;
comparing the character original coding with the character universal standard coding corresponding to the same predetermined character in the format document to obtain a coding comparison result indicating that the codings are identical or different;
performing probability statistics on the coding comparison results corresponding to a plurality of the predetermined characters to obtain a probability value that the predetermined characters use the character universal standard coding;
comparing the probability value with a threshold; if the probability value is greater than the threshold, obtaining the character for each predetermined character by looking up its character original coding in the universal standard character code library and displaying that character; otherwise, directly displaying the character recognized by OCR for the predetermined character.
2. The method for improving the character recognition rate in a format document according to claim 1, characterized in that, before the step of extracting the character original coding, the method further comprises the following step:
screening out the characters in the format document that have a character original coding as the predetermined characters.
3. The method for improving the character recognition rate in a format document according to claim 2, characterized in that, after the step of screening out the characters in the format document that have a character original coding as the predetermined characters, the method further comprises the following step:
assigning an ID number to each predetermined character.
4. The method for improving the character recognition rate in a format document according to claim 3, characterized in that, after the step of extracting the character original coding of each predetermined character in the format document, the method further comprises the following step:
establishing a character original coding table, and storing the ID of each predetermined character together with its corresponding character original coding in the character original coding table.
5. The method for improving the character recognition rate in a format document according to claim 3 or 4, characterized in that, after the step of obtaining the character universal standard coding, the method further comprises the following step:
establishing a character standard coding table, and storing the ID of each predetermined character together with its corresponding character standard coding in the character standard coding table.
6. The method for improving the character recognition rate in a format document according to claim 1, characterized in that, before the step of comparing the probability value with the threshold and performing the corresponding operation, the method further comprises the following step:
establishing an editable interface for displaying, modifying, and confirming the characters.
7. A system for improving the character recognition rate in a format document, characterized by comprising a glyph bitmap extraction device, a character original coding extraction device, an OCR recognition device, a character universal standard coding correspondence device, a coding comparison device, a probability statistics device, and a probability value and threshold comparison device, wherein:
the glyph bitmap extraction device is used to extract the glyph bitmap of each predetermined character in the format document;
the character original coding extraction device is used to extract the character original coding of each predetermined character in the format document;
the OCR recognition device is used to perform OCR recognition on the extracted glyph bitmap to obtain a recognized character;
the character universal standard coding correspondence device is used to look up the recognized character in a universal standard character code library to obtain a character universal standard coding, wherein the character universal standard coding is the national standard GB2312;
the coding comparison device is used to compare the character original coding with the character universal standard coding corresponding to the same predetermined character in the format document to obtain a coding comparison result indicating that the codings are identical or different;
the probability statistics device is used to perform probability statistics on the coding comparison results corresponding to a plurality of the predetermined characters to obtain a probability value that the predetermined characters use the character universal standard coding;
the probability value and threshold comparison device is used to compare the probability value with a threshold; if the probability value is greater than the threshold, the character for each predetermined character is obtained by looking up its character original coding in the universal standard character code library and is displayed; otherwise, the character recognized by OCR for the predetermined character is displayed directly.
8. The system for improving the character recognition rate in a format document according to claim 7, characterized by further comprising a predetermined character screening device, the predetermined character screening device being used to screen out the characters in the format document that have a character original coding as the predetermined characters.
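The decision logic recited in claims 1 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `PredeterminedChar`, `standard_code`, `match_probability`, and `resolve_text` are hypothetical, the threshold of 0.7 is an arbitrary example value (the claims do not fix one), and glyph extraction and OCR are assumed to have already produced each character's original coding and recognized character. Python's built-in `gb2312` codec stands in for the universal standard character code library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PredeterminedChar:
    """One predetermined character (hypothetical record layout)."""
    char_id: int          # ID number assigned to the character (claim 3)
    original_code: bytes  # character original coding stored in the document
    ocr_char: str         # character recognized by OCR from the glyph bitmap


def standard_code(ch: str) -> Optional[bytes]:
    """Look up the GB2312 code of a recognized character, if it has one."""
    try:
        return ch.encode("gb2312")
    except UnicodeEncodeError:
        return None  # character is outside GB2312


def match_probability(chars: List[PredeterminedChar]) -> float:
    """Fraction of characters whose original coding equals the GB2312 code
    of the OCR result (the probability statistics step)."""
    if not chars:
        return 0.0
    same = sum(1 for c in chars if standard_code(c.ocr_char) == c.original_code)
    return same / len(chars)


def resolve_text(chars: List[PredeterminedChar], threshold: float = 0.7) -> str:
    """Choose between decoding the original codings and using the OCR output."""
    if match_probability(chars) > threshold:
        # The document very likely uses GB2312, so trust the stored codes.
        return "".join(
            c.original_code.decode("gb2312", errors="replace") for c in chars
        )
    # The document likely uses a custom coding, so fall back to OCR.
    return "".join(c.ocr_char for c in chars)


if __name__ == "__main__":
    # Four characters recognized correctly, one misread by OCR (末 read as 未).
    doc = [PredeterminedChar(i, ch.encode("gb2312"), ch)
           for i, ch in enumerate("格式文档")]
    doc.append(PredeterminedChar(4, "末".encode("gb2312"), "未"))
    print(match_probability(doc))
    print(resolve_text(doc))
```

In this sketch a single OCR misreading does not affect the output when most codings agree, because the text is then decoded from the stored codes; when almost no codings agree, the stored codes are treated as a non-standard coding and the OCR result is used instead.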
CN201310450972.6A 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document Expired - Fee Related CN104463153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310450972.6A CN104463153B (en) 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310450972.6A CN104463153B (en) 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document

Publications (2)

Publication Number Publication Date
CN104463153A CN104463153A (en) 2015-03-25
CN104463153B true CN104463153B (en) 2018-09-04

Family

ID=52909169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310450972.6A Expired - Fee Related CN104463153B (en) 2013-09-25 2013-09-25 The method and system of character identification rate in a kind of raising format document

Country Status (1)

Country Link
CN (1) CN104463153B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108038093B (en) * 2017-11-10 2021-06-15 深圳市亿图软件有限公司 PDF character extraction method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101782896A (en) * 2009-01-21 2010-07-21 汉王科技股份有限公司 PDF character extraction method combined with OCR technology
CN102194503A (en) * 2010-03-12 2011-09-21 腾讯科技(深圳)有限公司 Player and character code detection method and device for subtitle file
JP5955579B2 (en) * 2011-07-21 2016-07-20 日東電工株式会社 Protection sheet for glass etching

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5955579A (en) * 1982-09-24 1984-03-30 Fujitsu Ltd Character recognizer
JPH06187505A (en) * 1992-12-21 1994-07-08 Hitachi Ltd Data entry system/method

Also Published As

Publication number Publication date
CN104463153A (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN106599940B (en) Picture character recognition method and device
CN109446873A (en) Hand-written script recognition methods, system and terminal device
US20130181995A1 (en) Handwritten character font library
JP2004139484A (en) Form processing device, program for implementing it, and program for creating form format
CN109299663A (en) Hand-written script recognition methods, system and terminal device
JPH10124289A (en) Binary data encoding method
JP6000992B2 (en) Document file generation apparatus and document file generation method
JP2005173730A (en) Business form ocr program, method, and device
CN109522898A (en) Handwriting samples picture mask method, device, computer equipment and storage medium
JP2009169948A (en) Device and method for determining orientation of document, and program and recording medium thereof
CN108319578B (en) Method for generating medium for data recording
CN108038093A (en) PDF text extraction methods and device
CN104094283B (en) Character-extraction method and character-recognition device using said method
CN109726369A (en) A kind of intelligent template questions record Implementation Technology based on normative document
CN110867243B (en) Image annotation method, device, computer system and readable storage medium
CN104346616B (en) Character recognition device and character identifying method
CN104463153B (en) The method and system of character identification rate in a kind of raising format document
CN107122785B (en) Text recognition model establishing method and device
CN114529933A (en) Contract data difference comparison method, device, equipment and medium
CN107666550B (en) Image forming apparatus and document electronization method
JP2015005100A (en) Information processor, template generation method, and program
CN112861485A (en) Method, device and equipment for processing nuclear power DCS control logic drawing
US9208381B1 (en) Processing digital images including character recognition using ontological rules
JP2005259017A (en) Image processing apparatus, image processing program and storage medium
KR20100089241A (en) Method of producing and recognizing marker for providing augmented reality

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220620

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: FOUNDER APABI TECHNOLOGY Ltd.

Address before: 100871, Beijing, Haidian District Cheng Fu Road 298, founder building, 9 floor

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: FOUNDER APABI TECHNOLOGY Ltd.

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180904