CN114419303A - Character recognition method and scanning system based on scanning software - Google Patents

Character recognition method and scanning system based on scanning software

Info

Publication number
CN114419303A
Authority
CN
China
Prior art keywords
image
binary
character recognition
matrix
scanning software
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111512970.6A
Other languages
Chinese (zh)
Inventor
余烁奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kirin Software Co Ltd
Original Assignee
Kirin Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-12-11
Filing date
2021-12-11
Publication date
2022-04-29
Application filed by Kirin Software Co Ltd
Priority to CN202111512970.6A
Publication of CN114419303A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

The invention provides a character recognition method based on scanning software, comprising the following steps: S1, the scanning software identifies and connects to a connectable scanner, and the connectable scanner scans a document to obtain a text image; S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image; S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image; and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content. Because characters are recognized immediately after a document is scanned through the scanning software, the method enables one-key processing and saving.

Description

Character recognition method and scanning system based on scanning software
Technical Field
The invention relates to the technical field of image processing, in particular to a character recognition method based on scanning software, a scanning system, electronic equipment and a readable storage medium.
Background
Image processing technology is used in a growing number of applications; for example, it is used to perform character recognition on an acquired picture or document. At present, few scanner devices provide a character recognition function, and existing character recognition technology is rarely integrated into scanning software. For example, Chinese patent publication No. CN112862024A discloses a text recognition method and system that acquires an image sample set, inputs the image sample set into a text recognition network for training, obtains a text recognition model for character recognition, and obtains a text recognition result. That method does not perform character recognition from within scanning software on a domestic platform: when a scanner is connected and a scanned document is obtained, no character recognition result is available immediately, and the scanned text must be processed separately before its characters can be recognized.
Therefore, it is necessary to provide a character recognition method which can directly perform character recognition based on scanning software.
Disclosure of Invention
The invention provides a character recognition method and a scanning system based on scanning software, which can immediately perform character recognition based on a document scanned by the scanning software to realize one-key processing and storage.
In order to achieve the above objects and other related objects, the present invention provides a character recognition method based on scanning software, comprising the steps of:
S1, the scanning software identifies and connects to a connectable scanner, and the connectable scanner scans a document to obtain a text image;
S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image;
S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image;
and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content.
Further, the binarization of the text image by the scanning software specifically includes:
converting the text image to grayscale to obtain a grayscale image;
extracting an image gray matrix from the grayscale image;
calculating an image local contrast matrix from the image gray matrix;
and performing binary division of the image local contrast matrix using the Otsu method to obtain the binary image.
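The first three binarization sub-steps can be sketched as follows. This is a minimal illustration only: the text does not state the graying weights or the local contrast formula, so the BT.601 luma weights, the window `radius`, and the contrast definition (max - min) / (max + min + eps) are all assumptions, and the function names are hypothetical.

```python
def to_gray(rgb_rows):
    """Convert rows of (r, g, b) tuples to a gray matrix.

    Assumption: ITU-R BT.601 luma weights; the source does not specify them.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_rows]


def local_contrast(gray, radius=1, eps=1e-6):
    """Compute a local contrast matrix from a gray matrix.

    Assumption: contrast = (max - min) / (max + min + eps) over a square
    window of the given radius, clipped at the image border.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            hi, lo = max(window), min(window)
            out[y][x] = (hi - lo) / (hi + lo + eps)
    return out
```

Under this assumed definition the contrast is normalized by the local brightness, so a uniformly lit or uniformly dim flat region both map to zero, while pixels next to a stroke boundary map to values near one.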
Further, the binary division of the image local contrast matrix using the Otsu method to obtain the binary image specifically includes:
acquiring the maximum and minimum contrast values in the image local contrast matrix;
setting the number of histogram bins and dividing the interval between the minimum and maximum contrast values equally into that number of bins, so that the local contrast value of each pixel falls into its corresponding bin, thereby constructing a histogram;
for any candidate point in the histogram, dividing the histogram into two parts at that point and calculating the intra-class variance and the inter-class variance of the two parts;
selecting the point in the histogram at which the inter-class variance divided by the intra-class variance is largest as the optimal binary segmentation threshold point;
dividing the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point;
performing edge detection on the grayscale image using the Canny operator to determine an edge matrix;
taking the intersection of the first binary matrix and the edge matrix to determine a second binary matrix;
and determining the binary image from the second binary matrix.
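The histogram construction and threshold selection described above can be sketched as follows. The function names, the `eps` guard against a zero intra-class variance, and the choice of the bin boundary as the threshold value are assumptions made for illustration; the text does not fix them.

```python
def build_histogram(values, bins):
    """Split [min, max] into `bins` equal intervals and count values per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all values identical; every value lands in bin 0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    return counts, centers


def ratio_threshold(counts, centers, eps=1e-12):
    """Return the split maximizing inter-class / intra-class variance.

    Returns None if the histogram cannot be split into two non-empty parts.
    """
    total = sum(counts)
    best_ratio, best_t = -1.0, None
    for k in range(1, len(counts)):  # class 0 = bins [0, k), class 1 = bins [k, n)
        n0 = sum(counts[:k])
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue
        m0 = sum(c * x for c, x in zip(counts[:k], centers[:k])) / n0
        m1 = sum(c * x for c, x in zip(counts[k:], centers[k:])) / n1
        v0 = sum(c * (x - m0) ** 2 for c, x in zip(counts[:k], centers[:k])) / n0
        v1 = sum(c * (x - m1) ** 2 for c, x in zip(counts[k:], centers[k:])) / n1
        w0, w1 = n0 / total, n1 / total
        between = w0 * w1 * (m0 - m1) ** 2   # inter-class variance
        within = w0 * v0 + w1 * v1           # intra-class variance
        ratio = between / (within + eps)
        if ratio > best_ratio:
            # Threshold value = boundary between bin k-1 and bin k.
            best_ratio, best_t = ratio, (centers[k - 1] + centers[k]) / 2
    return best_t


def binarize(matrix, threshold):
    """Map values above the threshold to 1 and all others to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in matrix]
```

Because the total variance of the histogram equals the sum of the intra-class and inter-class variances, maximizing their ratio should select the same split as maximizing the inter-class variance alone, as in the classic Otsu formulation (apart from the small `eps` guard).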
Further, the scanning software performs character recognition on the cropped image through a Tesseract-related API.
Based on the same inventive concept, the invention also provides a scanning system for scanning a document with a scanner to obtain a text image and extracting the text content of the text image, the system comprising a device and storage module, a scanned content display module, and a processing key module;
the device and storage module is used for displaying the device information of the scanner and the storage information of the text image, and for saving and sending the text image by means of keys;
the scanned content display module is used for displaying the text image scanned by the scanner;
the processing key module is used for processing the text image when a key is pressed, the processing comprising beautification processing and character recognition processing, wherein the beautification processing comprises: performing image denoising and binarization on the text image to obtain a binary image; and intelligently correcting the binary image based on the Hough transform while selecting a character recognition area for cropping to obtain a cropped image; and the character recognition processing comprises performing character recognition on the cropped image and outputting the recognized content.
Based on the same inventive concept, the present invention also provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program, and the computer program realizes the method of any one of the above items when being executed by the processor.
Based on the same inventive concept, the present invention also provides a readable storage medium having stored therein a computer program which, when executed by a processor, implements the method of any one of the above.
In summary, existing character recognition technology does not recognize characters immediately when a scanned document is acquired through scanning software on a domestic operating system. To solve this problem, the invention provides a method that immediately recognizes the characters of a document scanned through scanning software on a domestic operating system, thereby enabling one-key processing and saving.
Drawings
Fig. 1 is a schematic step diagram of a character recognition method based on scanning software according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a scanning system according to an embodiment of the present invention.
Detailed Description
The character recognition method and scanning system based on scanning software according to the present invention are described in further detail below with reference to figs. 1-2 and the detailed description. The advantages and features of the invention will become more apparent from the following description. Note that the drawings are greatly simplified and not drawn to precise scale; they serve only to illustrate the embodiments of the invention conveniently and clearly. It should be understood that the structures, proportions, sizes, and the like shown in the drawings are provided only to accompany the disclosure of the specification so that those skilled in the art can read and understand it. They do not limit the conditions under which the invention may be implemented and therefore have no substantive technical significance; any structural modification, change in proportion, or adjustment of size that does not affect the efficacy or the achievable purpose of the invention still falls within the scope of the invention.
Referring to fig. 1, an embodiment of the present invention provides a character recognition method based on scanning software, including the following steps:
S1, the scanning software identifies and connects to a connectable scanner, and the connectable scanner scans a document to obtain a text image;
S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image;
S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image;
and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content.
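The Hough-based correction in step S3 is not detailed in the text. One plausible reading is skew correction: text lines form near-horizontal straight lines, so the dominant line angle found by a Hough transform gives the page tilt. The sketch below estimates that angle by voting in (rho, theta) space over a restricted range of angles; the function name, the search range `max_skew`, and the angular `step` are assumptions, and rotating the image by the estimated angle is left to whatever image library is in use.

```python
import math
from collections import Counter


def estimate_skew_degrees(binary, max_skew=15.0, step=0.5):
    """Estimate the tilt of near-horizontal lines in a binary matrix.

    Each foreground pixel (x, y) votes for rho = x*cos(theta) + y*sin(theta)
    at every candidate theta = 90 degrees + skew. The skew whose strongest
    rho bin collects the most votes is returned. A positive result means the
    lines slope downward to the right in image coordinates (y grows downward).
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    best_skew, best_votes = 0.0, -1
    n_steps = int(round(2 * max_skew / step))
    for s in range(n_steps + 1):
        skew = -max_skew + s * step
        theta = math.radians(90.0 + skew)
        c, sn = math.cos(theta), math.sin(theta)
        accumulator = Counter(int(round(x * c + y * sn)) for x, y in points)
        votes = max(accumulator.values()) if accumulator else 0
        if votes > best_votes:
            best_votes, best_skew = votes, skew
    return best_skew
```

Restricting theta to a band around 90 degrees keeps the accumulator small, on the assumption that a scanned page is tilted by only a few degrees; a full 0 to 180 degree sweep would be needed if pages could arrive rotated arbitrarily.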
In this embodiment, step S2 specifically includes:
converting the text image to grayscale to obtain a grayscale image; extracting an image gray matrix from the grayscale image; calculating an image local contrast matrix from the image gray matrix; and performing binary division of the image local contrast matrix using the Otsu method to obtain the binary image. The image local contrast matrix is obtained by filtering the image gray matrix, which effectively removes the influence of uneven illumination and improves the contrast and binary separability of the image.
In this embodiment, the binary division of the image local contrast matrix using the Otsu method to obtain the binary image specifically includes: acquiring the maximum and minimum contrast values in the image local contrast matrix; setting the number of histogram bins and dividing the interval between the minimum and maximum contrast values equally into that number of bins, so that the local contrast value of each pixel falls into its corresponding bin, thereby constructing a histogram; for any candidate point in the histogram, dividing the histogram into two parts at that point and calculating the intra-class variance and the inter-class variance of the two parts; selecting the point in the histogram at which the inter-class variance divided by the intra-class variance is largest as the optimal binary segmentation threshold point; dividing the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point; performing edge detection on the grayscale image using the Canny operator to determine an edge matrix; taking the intersection of the first binary matrix and the edge matrix to determine a second binary matrix; and determining the binary image from the second binary matrix.
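The final intersection step can be illustrated as below. Note that `gradient_edges` is a deliberately simple stand-in and is not the Canny operator (it omits smoothing, non-maximum suppression, and hysteresis); an actual implementation would obtain the edge matrix from a Canny routine in an image library. Both function names and the `threshold` parameter are hypothetical.

```python
def gradient_edges(gray, threshold):
    """Stand-in edge detector (NOT Canny): mark pixels whose
    central-difference gradient magnitude exceeds `threshold`."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges


def intersect(first_binary, edge_matrix):
    """Element-wise AND of two equally sized 0/1 matrices."""
    return [[1 if (a and b) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first_binary, edge_matrix)]
```

Taking the intersection keeps only those pixels that both pass the contrast threshold and lie on a detected edge, which should suppress high-contrast background texture that does not belong to a character stroke.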
In this embodiment, the scanning software generally performs character recognition on the cropped image through a Tesseract-related API.
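The text does not say which Tesseract interface the scanning software calls. As a minimal sketch, the example below substitutes the `tesseract` command-line program for an API binding, using its `stdout` output target together with the `-l` (language) and `--psm` (page segmentation mode) options. The helper names, the default language, and the file name in the usage line are assumptions.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=None):
    """Build the argument list for: tesseract <image> stdout -l <lang> [--psm N]."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    return cmd


def recognize_text(image_path, lang="eng", psm=None):
    """Run Tesseract on the cropped image and return the recognized text.

    Assumes the `tesseract` executable and the requested language data
    are installed on the machine running the scanning software.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A call such as `recognize_text("cropped.png", lang="chi_sim")` would request Simplified Chinese recognition, provided that language pack is installed; the recognized text could then be shown to the user or saved alongside the scanned file.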
Based on the same inventive concept, and referring to fig. 2, the invention also provides a scanning system for scanning a document with a scanner to obtain a text image and extracting the text content of the text image. The system comprises a device and storage module, a scanned content display module, and a processing key module. The device and storage module is used for displaying the device information of the scanner and the storage information of the text image, and for saving and sending the text image by means of keys. The scanned content display module is used for displaying the text image scanned by the scanner. The processing key module is used for processing the text image when a key is pressed, the processing comprising beautification processing and character recognition processing, wherein the beautification processing comprises: performing image denoising and binarization on the text image to obtain a binary image; and intelligently correcting the binary image based on the Hough transform while selecting a character recognition area for cropping to obtain a cropped image; and the character recognition processing comprises performing character recognition on the cropped image and outputting the recognized content.
As shown in fig. 2, reference numerals 1 to 4 and 15 to 18 denote the processing key module, 5 to 14 denote the device and storage module, and 19 denotes the scanned content display module. Specifically, 1 denotes the one-key beautification key, which performs image denoising and binarization on the text image; 2 denotes the intelligent correction key; 3 denotes the character recognition key; 4 denotes the scan start key; 5 denotes the name of the scanning device; 6 denotes the scanner type; 7 denotes the color modes supported by the scanner; 8 denotes the resolutions supported by the scanner; 9 denotes the sizes supported by the scanner; 10 denotes the file format in which the scanned document is saved; 11 denotes the file name under which the scanned document is saved; 12 denotes the path where the scanned document is saved; 13 denotes sending the scanned document by mail; 14 denotes saving the scanned document as another file; 15 denotes the toolbar crop function; 16 denotes the toolbar rotate function; 17 denotes the toolbar flip function; 18 denotes the toolbar watermark function; and 19 denotes the scanned document editing page, i.e., the scanned page.
Based on the same inventive concept, the invention further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the character recognition method based on the scanning software.
The processor may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor (e.g., a GPU), or another data processing chip. The processor is typically used to control the overall operation of the electronic device. In this embodiment, the processor is configured to run the program code stored in the memory or to process data, for example, to run the program code of the character recognition method based on scanning software.
The memory includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory may be an internal storage unit of the electronic device, such as a hard disk or internal memory of the electronic device. In other embodiments, the memory may be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device. The memory may also include both an internal storage unit and an external storage device of the electronic device. In this embodiment, the memory is generally used for storing the operating system installed on the electronic device and various types of application software, such as the program code of the character recognition method based on scanning software. In addition, the memory may be used to temporarily store various types of data that have been output or are to be output.
Based on the same inventive concept, the invention further provides a readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the character recognition method based on scanning software is implemented.
The invention addresses the shortcoming that existing character recognition technology does not immediately recognize characters in a scanned document acquired through scanning software on a domestic operating system. It provides a method that immediately recognizes the characters of a document scanned through scanning software on a domestic operating system, enabling one-key processing and saving.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (7)

1. A character recognition method based on scanning software, characterized by comprising the following steps:
S1, the scanning software identifies and connects to a connectable scanner, and the connectable scanner scans a document to obtain a text image;
S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image;
S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image;
and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content.
2. The character recognition method based on scanning software according to claim 1, wherein the binarization of the text image by the scanning software specifically comprises:
converting the text image to grayscale to obtain a grayscale image;
extracting an image gray matrix from the grayscale image;
calculating an image local contrast matrix from the image gray matrix;
and performing binary division of the image local contrast matrix using the Otsu method to obtain the binary image.
3. The character recognition method based on scanning software according to claim 2, wherein the binary division of the image local contrast matrix using the Otsu method to obtain the binary image specifically comprises:
acquiring the maximum and minimum contrast values in the image local contrast matrix;
setting the number of histogram bins and dividing the interval between the minimum and maximum contrast values equally into that number of bins, so that the local contrast value of each pixel falls into its corresponding bin, thereby constructing a histogram;
for any candidate point in the histogram, dividing the histogram into two parts at that point and calculating the intra-class variance and the inter-class variance of the two parts;
selecting the point in the histogram at which the inter-class variance divided by the intra-class variance is largest as the optimal binary segmentation threshold point;
dividing the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point;
performing edge detection on the grayscale image using the Canny operator to determine an edge matrix;
taking the intersection of the first binary matrix and the edge matrix to determine a second binary matrix;
and determining the binary image from the second binary matrix.
4. The character recognition method based on scanning software according to claim 1, wherein the scanning software performs character recognition on the cropped image through a Tesseract-related API.
5. A scanning system for scanning a document with a scanner to obtain a text image and extracting the text content of the text image, characterized by comprising a device and storage module, a scanned content display module, and a processing key module;
the device and storage module is used for displaying the device information of the scanner and the storage information of the text image, and for saving and sending the text image by means of keys;
the scanned content display module is used for displaying the text image scanned by the scanner;
the processing key module is used for processing the text image when a key is pressed, the processing comprising beautification processing and character recognition processing, wherein the beautification processing comprises: performing image denoising and binarization on the text image to obtain a binary image; and intelligently correcting the binary image based on the Hough transform while selecting a character recognition area for cropping to obtain a cropped image; and the character recognition processing comprises performing character recognition on the cropped image and outputting the recognized content.
6. An electronic device comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any of claims 1 to 4.
7. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 4.
CN202111512970.6A 2021-12-11 2021-12-11 Character recognition method and scanning system based on scanning software Pending CN114419303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111512970.6A CN114419303A (en) 2021-12-11 2021-12-11 Character recognition method and scanning system based on scanning software

Publications (1)

Publication Number Publication Date
CN114419303A true CN114419303A (en) 2022-04-29

Family

ID=81266017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111512970.6A Pending CN114419303A (en) 2021-12-11 2021-12-11 Character recognition method and scanning system based on scanning software

Country Status (1)

Country Link
CN (1) CN114419303A (en)

Similar Documents

Publication Publication Date Title
CN110046529B (en) Two-dimensional code identification method, device and equipment
CN101908136B (en) Table identifying and processing method and system
US7017816B2 (en) Extracting graphical bar codes from template-based documents
US7437002B2 (en) Image recognition system utilizing an edge image and a binary image
US7317835B2 (en) Image processing method and apparatus
US11030447B2 (en) On-device partial recognition systems and methods
CN112308046A (en) Method, device, server and readable storage medium for positioning text region of image
CN114121179B (en) Extraction method and extraction device of chemical structural formula
US10055668B2 (en) Method for the optical detection of symbols
CN113177899A (en) Method for correcting text tilt of medical photocopy, electronic device and readable storage medium
EP3014528B1 (en) Determining barcode locations in documents
CN113557520A (en) Character processing and character recognition method, storage medium and terminal device
EP4369312A1 (en) An image processing method, image forming device and electronic device
US20070053610A1 (en) Image processing apparatus and control method therefor
CN110689063B (en) Training method and device for certificate recognition based on neural network
CN114419303A (en) Character recognition method and scanning system based on scanning software
CN111814780A (en) Bill image processing method, device and equipment and storage medium
US6788819B1 (en) Image processing
CN111526263B (en) Image processing method, device and computer system
CN113365071B (en) Image layered compression method and image layered compression device
CN113435331B (en) Image character recognition method, system, electronic equipment and storage medium
CN116740746A (en) Text recognition method, text recognition device, computer equipment and storage medium
WO1995034046A1 (en) System and method for reading data from prescribed regions of a document of variable size
JP3756660B2 (en) Image recognition method, apparatus and recording medium
CN115700824A (en) Character segmentation method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination