CN115223186A - Character acquisition, recognition, retrieval and analysis method and equipment thereof

Publication number
CN115223186A
Authority
CN
China
Prior art keywords
character
characters
module
image
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210651483.6A
Other languages
Chinese (zh)
Inventor
张洪岭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Changyue Technology Co ltd
Original Assignee
Hefei Changyue Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-06-09
Filing date
2022-06-09
Publication date
2022-10-21
Application filed by Hefei Changyue Technology Co ltd
Priority to CN202210651483.6A
Publication of CN115223186A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a character acquisition, recognition, retrieval and analysis method and equipment thereof, in the technical field of character recognition. The equipment comprises an image acquisition module, a character scanning module, a character analysis module and a character storage module. The method comprises the following steps: S1, collecting natural scene image information containing characters by using the image acquisition module; S2, frame-selecting the character pictures on the image by using a character acquisition module; and S3, scanning the frame-selected character pictures line by line, according to the height of the characters, by using the character scanning module until all the characters are scanned. The character pictures are scanned rapidly line by line and recognized by matching and comparing them against a character database. The method supports recognition of multiple languages and has the advantages of high recognition speed, high recognition efficiency and high accuracy.

Description

Character acquisition, recognition, retrieval and analysis method and equipment thereof
Technical Field
The invention relates to the technical field of character recognition, in particular to a character acquisition, recognition, retrieval and analysis method and equipment thereof.
Background
Automatic character recognition by computer is an important application area of pattern recognition. In production and daily life, people need to process large volumes of text, reports and documents. To reduce manual labor and improve processing efficiency, work on general character recognition methods began in the 1950s, and optical character recognizers were developed. In the 1960s, practical machines using magnetic ink and special fonts were introduced. In the late 1960s, machines that recognized multiple typefaces and handwritten characters appeared, and their recognition accuracy and performance could basically meet requirements. Examples include machines for recognizing handwritten numerals and machines for recognizing printed English letters and numerals, used for letter sorting. In the 1970s, research concentrated on the basic theory of character recognition and on the development of high-performance character recognition machines. Character recognition can be applied in many fields, such as reading, translation, retrieval of literature, letter and parcel sorting, manuscript editing and proofreading, collection and analysis of large numbers of statistical statements and cards, bank check processing, statistical collection of commodity invoices, commodity code recognition, commodity warehouse management, automatic processing of large numbers of credit cards in the collection of fees for water, electricity, gas, rent, personal insurance and the like, and partial automation of office typists' work. It also enables document retrieval and the recognition of various certificates, so that users can enter information conveniently and quickly, which improves working efficiency across industries.
Character recognition (optical character recognition, OCR) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper and then translates their shapes into computer text using character recognition methods; that is, the process of scanning text material, then analyzing and processing the image file to obtain the characters and layout information. How to remove errors or use auxiliary information to improve recognition accuracy is the most important issue in OCR. The main indicators for measuring the performance of an OCR system are the rejection rate, the error rate, the recognition speed, the friendliness of the user interface, product stability, usability and feasibility. Scanning OCR software supports recognition and translation of scanned and photographed material, and image-capture recognition and translation software is image-to-text software that supports text extraction and text editing.
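The first two indicators can be made concrete with a small calculation. The sketch below assumes one common convention (definitions vary between evaluations): the rejection rate is the share of characters the system declines to classify, and the error rate is the share of all characters that receive a wrong label. The function name `ocr_rates` and the use of `None` to mark a rejected character are illustrative assumptions, not part of the source.

```python
from typing import Dict, Optional, Sequence


def ocr_rates(truth: Sequence[str], predicted: Sequence[Optional[str]]) -> Dict[str, float]:
    """Return rejection, error and correct-recognition rates.

    `truth` and `predicted` are aligned character by character;
    `predicted[i]` is None when the recognizer rejected character i.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth:
        raise ValueError("at least one character is required")
    total = len(truth)
    rejected = sum(1 for p in predicted if p is None)
    wrong = sum(1 for t, p in zip(truth, predicted) if p is not None and p != t)
    return {
        "rejection_rate": rejected / total,
        "error_rate": wrong / total,
        "correct_rate": (total - rejected - wrong) / total,
    }


# Five characters: one rejected, one misread, three correct.
rates = ocr_rates("ABCDE", ["A", "B", None, "X", "E"])
```

Under this convention the three rates always sum to one, which makes it easy to see the trade-off between rejecting doubtful characters and accepting wrong ones.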
Existing character recognition equipment has a complex structure, low recognition accuracy, low recognition speed and low efficiency during recognition.
Disclosure of Invention
The invention aims to provide a character acquisition, recognition, retrieval and analysis method and equipment thereof, in order to solve the problems identified in the Background: existing character recognition equipment has a complex structure, low recognition accuracy, low recognition speed and low efficiency during recognition.
To achieve this purpose, the invention provides a character acquisition, recognition, retrieval and analysis device which comprises an image acquisition module, a character scanning module, a character analysis module and a character storage module.
Preferably, the image acquisition module is used for acquiring natural scene image information with characters.
Preferably, the character acquisition module is configured to acquire character information on natural scene image information.
Preferably, the character scanning module is configured to scan character information on image information of a natural scene.
Preferably, the character analysis module is configured to match the scanned character information with character information in a database, and convert the character information into characters corresponding to the character information in the database.
Preferably, the character storage module is used for storing a character database and scanned character information.
The invention also provides a character acquisition, recognition, retrieval and analysis method, which comprises the following steps:
S1, collecting natural scene image information containing characters by using the image acquisition module;
S2, frame-selecting the character pictures on the image by using the character acquisition module;
S3, scanning the frame-selected character pictures line by line, according to the height of the characters, by using the character scanning module until all the characters are scanned;
S4, matching the scanned character information against the character information of the character database stored in the character storage module, finding the character models with the highest similarity, and converting them into editable characters through the character analysis module for output.
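Step S4 amounts to a nearest-template search. The source does not specify the similarity measure, so the following is only a minimal sketch that assumes binary glyph images of equal size and uses the fraction of agreeing pixels as the similarity. The names `similarity`, `best_match` and the toy 3x3 templates are hypothetical.

```python
from typing import Dict, List, Tuple

Glyph = List[List[int]]  # binary grid: 1 = ink, 0 = background


def similarity(a: Glyph, b: Glyph) -> float:
    """Fraction of pixels on which two equally sized binary glyphs agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyphs must have identical dimensions")
    pixels = sum(len(row) for row in a)
    agree = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return agree / pixels


def best_match(scanned: Glyph, database: Dict[str, Glyph]) -> Tuple[str, float]:
    """Return the database character with the highest similarity, and its score."""
    if not database:
        raise ValueError("database is empty")
    return max(
        ((char, similarity(scanned, tmpl)) for char, tmpl in database.items()),
        key=lambda item: item[1],
    )


# Toy 3x3 templates (hypothetical data, for illustration only).
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
}

noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel missing
char, score = best_match(noisy_l, TEMPLATES)
```

A practical system would normalize glyph size before comparison and would likely use a more robust measure; pixel agreement is chosen here only because it is easy to verify by hand.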
Preferably, the character pictures are frame-selected with a rectangular text box whose sides are tangent to the pixel edges of the character picture in both height and width.
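A rectangle that is tangent to the character's pixel edges is the tightest axis-aligned bounding box of the ink pixels. The sketch below illustrates this on a binary image stored as a rectangular list of rows; the helper names `tight_bbox` and `crop` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary grid: 1 = ink, 0 = background


def tight_bbox(image: Image) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of the smallest rectangle
    that contains every ink pixel, or None if the image is blank."""
    rows = [r for r, row in enumerate(image) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(image[0])) if any(row[c] for row in image)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the boxed area out of the image."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
box = tight_bbox(image)
glyph = crop(image, box)
```

Every side of the returned box touches at least one ink pixel, which is what "tangent to the pixel edges" requires.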
Preferably, the character database comprises character model libraries for the regular script (Kai), Song, Hei (boldface), clerical script (Li), running script and imitation Song (Fangsong) typefaces, and further comprises character model libraries for simplified Chinese, traditional Chinese, English, Japanese and Korean.
Compared with the prior art, the invention has the following beneficial effects. The method and equipment collect natural scene image information containing characters through the image acquisition module, frame-select the character pictures on the image with the character acquisition module, and rapidly scan the character pictures line by line with the character scanning module. The scanned character information is matched against the character information of the character database stored in the character storage module, the character models with the highest similarity are found, and the character analysis module converts them into editable characters for output. The invention supports recognition of multiple languages and has the advantages of high recognition speed, high recognition efficiency and high accuracy.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention provides a character acquisition, recognition, retrieval and analysis device which comprises an image acquisition module, a character scanning module, a character analysis module and a character storage module.
The image acquisition module is used for acquiring natural scene image information with characters.
The character acquisition module is used for acquiring character information on the natural scene image information.
The character scanning module is used for scanning character information on the natural scene image information.
The character analysis module is used for matching the scanned character information with character information in a database and converting the character information into characters corresponding to the character information in the database.
The character storage module is used for storing a character database and scanned character information.
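The division of responsibilities among the modules described above can be sketched as a small pipeline. This is only one possible arrangement: the class names, field names and the choice of plain callables for each module are assumptions made for illustration, and the stub functions in the demo stand in for real acquisition and recognition logic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Image = List[List[int]]  # binary grid: 1 = ink, 0 = background


@dataclass
class CharacterStorage:
    """Character storage module: holds the database and the scanned glyphs."""
    database: Dict[str, Image] = field(default_factory=dict)
    scanned: List[Image] = field(default_factory=list)


@dataclass
class RecognitionDevice:
    acquire_image: Callable[[], Image]                  # image acquisition module
    select_regions: Callable[[Image], List[Image]]      # character acquisition module
    scan_region: Callable[[Image], List[Image]]         # character scanning module
    analyze: Callable[[Image, Dict[str, Image]], str]   # character analysis module
    storage: CharacterStorage                           # character storage module

    def run(self) -> str:
        image = self.acquire_image()
        text = []
        for region in self.select_regions(image):
            for glyph in self.scan_region(region):
                self.storage.scanned.append(glyph)
                text.append(self.analyze(glyph, self.storage.database))
        return "".join(text)


# Demo with trivial stubs: each pixel of a one-row image is treated as a glyph.
device = RecognitionDevice(
    acquire_image=lambda: [[1, 0]],
    select_regions=lambda img: [img],
    scan_region=lambda region: [[[px]] for px in region[0]],
    analyze=lambda glyph, db: next(ch for ch, tmpl in db.items() if tmpl == glyph),
    storage=CharacterStorage(database={"#": [[1]], ".": [[0]]}),
)
result = device.run()
```

Passing the modules in as callables keeps each one replaceable, for example swapping the analysis step without touching acquisition or scanning.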
The invention also provides a character acquisition, recognition, retrieval and analysis method, which comprises the following steps:
S1, collecting natural scene image information containing characters by using the image acquisition module;
S2, frame-selecting the character pictures on the image by using the character acquisition module;
S3, scanning the frame-selected character pictures line by line, according to the height of the characters, by using the character scanning module until all the characters are scanned;
S4, matching the scanned character information against the character information of the character database stored in the character storage module, finding the character models with the highest similarity, and converting them into editable characters through the character analysis module for output.
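The source does not describe how the line-by-line scan in step S3 is carried out. One conventional way to realize it is projection-profile segmentation: rows containing ink are grouped into bands (one band per text line, whose height is the character height), and each band is then split at blank columns. The sketch below assumes a binary, rectangular region and is not taken from the source; `runs` and `scan_lines` are hypothetical names.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary grid: 1 = ink, 0 = background


def runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive True flags."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(flags)))
    return spans


def scan_lines(region: Image) -> List[List[Image]]:
    """Split a region into text lines, then each line into character glyphs."""
    lines = []
    for top, bottom in runs([any(row) for row in region]):
        band = region[top:bottom]  # one text line; its height is the character height
        col_ink = [any(row[c] for row in band) for c in range(len(band[0]))]
        lines.append([[row[left:right] for row in band] for left, right in runs(col_ink)])
    return lines


region = [
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
lines = scan_lines(region)
```

This simple split fails when characters touch or when a character has internal gaps (as many Chinese characters do), so a real implementation would need additional merging rules.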
The character pictures are frame-selected with a rectangular text box whose sides are tangent to the pixel edges of the character picture in both height and width.
The character database comprises character model libraries for the regular script (Kai), Song, Hei (boldface), clerical script (Li), running-regular script and imitation Song (Fangsong) typefaces, and further comprises character model libraries for simplified Chinese, traditional Chinese, English, Japanese and Korean.
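The source lists the typeface and language libraries but does not say how they are organized or searched. As a hedged illustration, the sketch below assumes the libraries are keyed by a (language, typeface) pair and that a scanned glyph is compared against every library, keeping the best-scoring entry. The `Match` record, the `agreement` measure and the toy glyph data are all hypothetical.

```python
from typing import Dict, NamedTuple, Tuple

Glyph = Tuple[str, ...]  # rows of "#" (ink) and "." (background)
Library = Dict[str, Glyph]  # character -> character model


class Match(NamedTuple):
    char: str
    language: str
    typeface: str
    score: float


def agreement(a: Glyph, b: Glyph) -> float:
    """Fraction of cells on which two equally sized glyphs agree."""
    cells = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(x == y for x, y in cells) / len(cells)


def search_libraries(glyph: Glyph, libraries: Dict[Tuple[str, str], Library]) -> Match:
    """Compare the glyph against every library and return the best match."""
    candidates = [
        Match(char, language, typeface, agreement(glyph, model))
        for (language, typeface), library in libraries.items()
        for char, model in library.items()
    ]
    if not candidates:
        raise ValueError("no character models available")
    return max(candidates, key=lambda m: m.score)


# Toy 3x3 models (hypothetical data, for illustration only).
LIBRARIES = {
    ("English", "Song"): {"I": ("###", ".#.", "###"), "T": ("###", ".#.", ".#.")},
    ("English", "Hei"): {"I": (".#.", ".#.", ".#."), "L": ("#..", "#..", "###")},
}

scanned = ("##.", ".#.", "###")  # a serifed "I" with one pixel missing
match = search_libraries(scanned, LIBRARIES)
```

Returning the language and typeface along with the character lets the caller restrict later searches to the library that has been matching, which is one way such a database could speed up recognition.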
In summary, the invention collects natural scene image information containing characters through the image acquisition module, frame-selects the character pictures on the image with the character acquisition module, and rapidly scans the character pictures line by line with the character scanning module. The scanned character information is matched against the character information of the character database stored in the character storage module, the character models with the highest similarity are found, and the character analysis module converts them into editable characters for output. The invention supports recognition of multiple languages and has the advantages of high recognition speed, high recognition efficiency and high accuracy.
It should be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the various features of the disclosed embodiments of this invention can be used in any combination with one another as long as no structural conflict exists, and the combination is not exhaustively described in this specification merely for the sake of brevity and resource savings. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (9)

1. A character acquisition, recognition, retrieval and analysis device, characterized in that: the device comprises an image acquisition module, a character scanning module, a character analysis module and a character storage module.
2. The character acquisition, recognition, retrieval and analysis device of claim 1, wherein: the image acquisition module is used for acquiring natural scene image information with characters.
3. The character acquisition, recognition, retrieval and analysis device of claim 1, wherein: the character acquisition module is used for acquiring character information on natural scene image information.
4. The character acquisition, recognition, retrieval and analysis device of claim 1, wherein: the character scanning module is used for scanning character information on the natural scene image information.
5. The character acquisition, recognition, retrieval and analysis device of claim 1, wherein: the character analysis module is used for matching the scanned character information with the character information in the database and converting the character information into characters corresponding to the character information in the database.
6. The character acquisition, recognition, retrieval and analysis device of claim 1, wherein: the character storage module is used for storing a character database and scanned character information.
7. A character acquisition, recognition, retrieval and analysis method, characterized in that the method comprises the following steps:
S1, collecting natural scene image information containing characters by using an image acquisition module;
S2, frame-selecting the character pictures on the image by using a character acquisition module;
S3, scanning the frame-selected character pictures line by line, according to the height of the characters, by using a character scanning module until all the characters are scanned;
S4, matching the scanned character information against the character information of the character database stored in the character storage module, finding the character models with the highest similarity, and converting them into editable characters through the character analysis module for output.
8. The character acquisition, recognition, retrieval and analysis method according to claim 7, wherein: the character pictures are frame-selected with a rectangular text box whose sides are tangent to the pixel edges of the character picture in both height and width.
9. The character acquisition, recognition, retrieval and analysis method according to claim 7, wherein: the character database comprises character model libraries for the regular script (Kai), Song, Hei (boldface), clerical script (Li), running-regular script and imitation Song (Fangsong) typefaces, and also comprises character model libraries for simplified Chinese, traditional Chinese, English, Japanese and Korean.
Application CN202210651483.6A, priority date 2022-06-09, filing date 2022-06-09: Character acquisition, recognition, retrieval and analysis method and equipment thereof. Status: Pending. Publication: CN115223186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210651483.6A CN115223186A (en) 2022-06-09 2022-06-09 Character acquisition, recognition, retrieval and analysis method and equipment thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210651483.6A CN115223186A (en) 2022-06-09 2022-06-09 Character acquisition, recognition, retrieval and analysis method and equipment thereof

Publications (1)

Publication Number Publication Date
CN115223186A (en) 2022-10-21

Family

ID=83607976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210651483.6A Pending CN115223186A (en) 2022-06-09 2022-06-09 Character acquisition, recognition, retrieval and analysis method and equipment thereof

Country Status (1)

Country Link
CN (1) CN115223186A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination