CN114419303A

CN114419303A - Character recognition method and scanning system based on scanning software

Info

Publication number: CN114419303A
Application number: CN202111512970.6A
Authority: CN
Inventors: 余烁奇
Original assignee: Kirin Software Co Ltd
Current assignee: Kirin Software Co Ltd
Priority date: 2021-12-11
Filing date: 2021-12-11
Publication date: 2022-04-29

Abstract

The invention provides a character recognition method based on scanning software, comprising the following steps: S1, the scanning software identifies and connects to a connectable scanner, and the scanner scans a file to obtain a text image; S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image; S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image; and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content. Because characters are recognized immediately from a document scanned by the scanning software, the method enables one-key processing and saving.

Description

Character recognition method and scanning system based on scanning software

Technical Field

The invention relates to the technical field of image processing, in particular to a character recognition method based on scanning software, a scanning system, electronic equipment and a readable storage medium.

Background

Image processing technology is used in more and more applications in modern society; for example, it is used to perform character recognition on an acquired picture or document. At present, few scanner devices offer a character recognition function, and existing character recognition technology is rarely applied in scanning software. For example, Chinese patent publication No. CN112862024A discloses a text recognition method and system, which acquires an image sample set, inputs the image sample set into a text recognition network for training, obtains a text recognition model for character recognition, and obtains a text recognition result. That method does not perform character recognition based on scanning software on a domestic platform: when a scanner is connected and a scanned document is obtained, a character recognition result cannot be obtained immediately, and the scanned text must be processed separately before character recognition can be performed.

Therefore, it is necessary to provide a character recognition method which can directly perform character recognition based on scanning software.

Disclosure of Invention

The invention provides a character recognition method and a scanning system based on scanning software, which perform character recognition immediately on a document scanned by the scanning software, thereby enabling one-key processing and saving.

In order to achieve the above objects and other related objects, the present invention provides a character recognition method based on scanning software, comprising the steps of:

S1, the scanning software identifies and connects to a connectable scanner, and the scanner scans a file to obtain a text image;

S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image;

S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image;

and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content.
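The text does not spell out how the Hough-transform correction in step S3 is computed. As a minimal sketch, assuming the correction means estimating the skew of the text lines and that the binary image is a list of rows with 1 marking ink pixels, the skew angle could be estimated as follows. The function name `estimate_skew_degrees` and its parameters are hypothetical and chosen only for illustration; rotating the image by the estimated angle is left out.

```python
import math


def estimate_skew_degrees(binary, angle_range=15.0, angle_step=0.5):
    """Estimate text-line skew with a Hough transform over near-horizontal lines.

    binary: list of rows; 1 = foreground (ink) pixel, 0 = background.
    Returns the candidate angle in degrees whose Hough accumulator has the
    strongest peak. Positive angles descend to the right, because the image
    y axis grows downward.
    """
    points = [(x, y) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    best_angle, best_votes = 0.0, -1
    steps = int(round(2 * angle_range / angle_step)) + 1
    for i in range(steps):
        skew = -angle_range + i * angle_step
        # A line with slope angle `skew` has its normal at skew + 90 degrees.
        theta = math.radians(skew + 90.0)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        votes = {}
        for x, y in points:
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = int(round(x * cos_t + y * sin_t))
            votes[rho] = votes.get(rho, 0) + 1
        peak = max(votes.values()) if votes else 0
        if peak > best_votes:
            best_angle, best_votes = skew, peak
    return best_angle
```

Pixels that lie on one straight line all vote for the same rho at the correct angle, so the accumulator peak is highest there; at neighbouring angles the votes spread over several rho bins.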

Further, the binarization of the text image by the scanning software specifically includes:

carrying out graying processing on the text image to obtain a grayscale image;

extracting an image gray matrix according to the gray image;

calculating an image local contrast matrix according to the image gray matrix;

and carrying out binary division on the image local contrast matrix by utilizing an Otsu method to obtain the binary image.

Further, the binary division of the image local contrast matrix by using the Otsu method to obtain the binary image specifically includes:

acquiring the maximum value and the minimum value of the contrast value in the image local contrast matrix;

setting the number of histogram groups, equally dividing the interval between the maximum value and the minimum value of the contrast value according to the number of the histogram groups, so that the local contrast value of each pixel point falls into the corresponding interval, and constructing a histogram;

selecting any point in the histogram, dividing the histogram into two parts according to the point, and calculating the intra-class variance and the inter-class variance of the two parts;

selecting a point with the maximum value of the inter-class variance divided by the intra-class variance in the histogram as an optimal binary segmentation threshold point;

dividing the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point;

performing edge detection on the gray level image by using a Canny operator to determine an edge matrix;

taking the intersection of the first binary matrix and the edge matrix to determine a second binary matrix;

and determining a binary image according to the second binary matrix.

Further, the scanning software performs character recognition on the cropped image through a Tesseract-related API.

Based on the same inventive concept, the invention also provides a scanning system for scanning a document through a scanner to obtain a text image and extracting the character content of the text image, the scanning system comprising a device and storage module, a scanned content display module and a processing key module;

the device and storage module is used for displaying the device information of the scanner and the storage information of the text image, and storing and sending the text image through the keys;

the scanning content display module is used for displaying a text image scanned by the scanner;

the processing key module is used for processing the text image when a key is pressed, the processing comprising beautification processing and character recognition processing, wherein the beautification processing comprises: performing image denoising and binarization on the text image to obtain a binary image; and intelligently correcting the binary image based on the Hough transform while selecting a character recognition area and cropping it to obtain a cropped image; and the character recognition processing comprises performing character recognition on the cropped image and outputting the recognized content.

Based on the same inventive concept, the present invention also provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program, and the computer program realizes the method of any one of the above items when being executed by the processor.

Based on the same inventive concept, the present invention also provides a readable storage medium having stored therein a computer program which, when executed by a processor, implements the method of any one of the above.

In summary, existing character recognition technology does not obtain a scanned document through scanning software on a domestic operating system and recognize its characters immediately. To solve this problem, the present invention provides a method that immediately recognizes the characters of a document scanned by scanning software on a domestic operating system, so as to implement one-key processing and saving.

Drawings

Fig. 1 is a schematic step diagram of a character recognition method based on scanning software according to an embodiment of the present invention.

Fig. 2 is a schematic diagram of a scanning system according to an embodiment of the present invention.

Detailed Description

The character recognition method and scanning system based on scanning software according to the present invention are described in further detail below with reference to Figs. 1-2 and the detailed description. The advantages and features of the present invention will become more apparent from the following description. Note that the drawings are in a very simplified form and use imprecise scales; they serve only to describe the embodiments of the present invention conveniently and clearly. To make the objects, features and advantages of the present invention comprehensible, reference is made to the accompanying drawings. It should be understood that the structures, ratios, sizes, and the like shown in the drawings are only intended to accompany the disclosure of the specification so that those skilled in the art can read and understand it. They do not limit the conditions under which the present invention may be implemented and therefore have no substantive technical significance. Any structural modification, change of proportion, or adjustment of size that does not affect the efficacy and achievable purpose of the present invention still falls within the scope of the present invention.

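Referring to Fig. 1, an embodiment of the present invention provides a character recognition method based on scanning software, including the following steps:

S1, the scanning software identifies and connects to a connectable scanner, and the scanner scans a file to obtain a text image;

S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image;

S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image;

and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content.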

In this embodiment, step S2 specifically includes:

carrying out graying processing on the text image to obtain a grayscale image; extracting an image gray matrix according to the gray image; calculating an image local contrast matrix according to the image gray matrix; and carrying out binary division on the image local contrast matrix by utilizing an Otsu method to obtain the binary image. The local contrast matrix of the image is obtained by filtering the gray matrix of the image, so that the influence caused by uneven illumination can be effectively eliminated, and the contrast and the binary separability of the image are improved.
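The text says only that the local contrast matrix is obtained by filtering the gray matrix; it gives no formula. The sketch below therefore rests on two labeled assumptions: graying uses the common BT.601 luma weights, and local contrast is taken as (max - min) / (max + min + eps) over a small window around each pixel. The names `to_gray` and `local_contrast`, the window radius and `eps` are hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to a gray matrix (BT.601 luma weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def local_contrast(gray, radius=1, eps=1e-6):
    """Return (max - min) / (max + min + eps) over each pixel's window.

    Assumption: this normalised max-min measure stands in for the
    unspecified filter in the text. Dividing by the local brightness is
    what makes the value insensitive to slowly varying illumination.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the window, clipped at the image border.
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            lo, hi = min(window), max(window)
            out[y][x] = (hi - lo) / (hi + lo + eps)
    return out
```

Flat regions give a contrast of zero regardless of how bright they are, while pixels near a stroke give a value that depends on the ratio of ink to paper brightness rather than on their absolute levels.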

In this embodiment, the obtaining the binary image by performing binary division on the image local contrast matrix by using the Otsu method specifically includes: acquiring the maximum value and the minimum value of the contrast value in the image local contrast matrix; setting the number of histogram groups, equally dividing the interval between the maximum value and the minimum value of the contrast value according to the number of the histogram groups, so that the local contrast value of each pixel point falls into the corresponding interval, and constructing a histogram; selecting any point in the histogram, dividing the histogram into two parts according to the point, and calculating the intra-class variance and the inter-class variance of the two parts; selecting the point in the histogram at which the inter-class variance divided by the intra-class variance is maximal as the optimal binary segmentation threshold point; dividing the image local contrast matrix into a first binary matrix according to the optimal binary segmentation threshold point; performing edge detection on the gray level image by using a Canny operator to determine an edge matrix; taking the intersection of the first binary matrix and the edge matrix to determine a second binary matrix; and determining the binary image according to the second binary matrix.
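The histogram split and the intersection step described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names `otsu_threshold` and `binarize` are hypothetical, pixels at or above the threshold are treated as foreground, and the Canny edge matrix is taken as a ready-made 0/1 input rather than computed here.

```python
def otsu_threshold(values, bins=64):
    """Pick the split that maximises inter-class / intra-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(values)
    best_score, best_k = -1.0, 1
    for k in range(1, bins):  # classes: bins [0, k) and bins [k, bins)
        n0 = sum(hist[:k])
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue
        m0 = sum(h * c for h, c in zip(hist[:k], centers[:k])) / n0
        m1 = sum(h * c for h, c in zip(hist[k:], centers[k:])) / n1
        v0 = sum(h * (c - m0) ** 2 for h, c in zip(hist[:k], centers[:k])) / n0
        v1 = sum(h * (c - m1) ** 2 for h, c in zip(hist[k:], centers[k:])) / n1
        w0, w1 = n0 / total, n1 / total
        inter = w0 * w1 * (m0 - m1) ** 2
        intra = w0 * v0 + w1 * v1
        score = inter / (intra + 1e-12)  # small constant avoids division by zero
        if score > best_score:
            best_score, best_k = score, k
    return lo + best_k * width


def binarize(contrast, edges, bins=64):
    """Threshold the contrast matrix, then intersect with a 0/1 edge matrix."""
    t = otsu_threshold([v for row in contrast for v in row], bins)
    first = [[1 if v >= t else 0 for v in row] for row in contrast]
    # Intersection: keep a pixel only if both matrices mark it.
    return [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(first, edges)]
```

Because the inter-class and intra-class variances sum to the fixed total variance of the histogram, maximising their ratio selects the same split as the usual Otsu criterion of maximising the inter-class variance alone.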

In this embodiment, the scanning software generally performs character recognition on the cropped image through a Tesseract-related API.
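The text does not say which Tesseract interface is called. As a hedged illustration only, the sketch below swaps in the `tesseract` command-line tool instead of a library API, and assumes Tesseract 4 or later is installed with the `chi_sim` and `eng` language data available. The function names, the default language string and the page segmentation mode are assumptions made for this example.

```python
import subprocess


def build_tesseract_command(image_path, lang="chi_sim+eng", psm=6):
    """Build the argument list for the tesseract command-line tool.

    "stdout" as the output base makes tesseract print the recognised text
    instead of writing a file; -l selects the language data and --psm the
    page segmentation mode (6 treats the image as one uniform text block).
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def recognize_text(image_path, lang="chi_sim+eng", psm=6):
    """Run tesseract on the cropped image and return the recognised text."""
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `recognize_text("cropped.png")` on the output of the cropping step would return the recognized content as a string; `cropped.png` is a placeholder file name.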

Based on the same inventive concept, and referring to Fig. 2, the invention also provides a scanning system for scanning a document through a scanner to obtain a text image and extracting the character content of the text image. The scanning system comprises a device and storage module, a scanned content display module and a processing key module. The device and storage module is used for displaying the device information of the scanner and the storage information of the text image, and for saving and sending the text image through keys. The scanned content display module is used for displaying the text image scanned by the scanner. The processing key module is used for processing the text image when a key is pressed, the processing comprising beautification processing and character recognition processing. The beautification processing comprises: performing image denoising and binarization on the text image to obtain a binary image; and intelligently correcting the binary image based on the Hough transform while selecting a character recognition area and cropping it to obtain a cropped image. The character recognition processing comprises performing character recognition on the cropped image and outputting the recognized content.

As shown in Fig. 2, reference numerals 1 to 4 and 15 to 18 denote the processing key module, 5 to 14 denote the device and storage module, and 19 denotes the scanned content display module. Specifically, 1 denotes a one-key beautification key for performing image denoising and binarization on the text image, 2 denotes an intelligent correction key, 3 denotes a character recognition key, 4 denotes a scan start key, 5 denotes the name of the scanning device, 6 denotes the type of the scanner, 7 denotes the color modes supported by the scanner, 8 denotes the resolutions supported by the scanner, 9 denotes the sizes supported by the scanner, 10 denotes the file format in which the scanned document is saved, 11 denotes the file name under which the scanned document is saved, 12 denotes the path to which the scanned document is saved, 13 denotes sending the scanned document by mail, 14 denotes saving the scanned document as another file (save as), 15 denotes the toolbar crop tool, 16 denotes the toolbar rotate tool, 17 denotes the toolbar flip tool, 18 denotes the toolbar watermark tool, and 19 denotes the scanned document editing page, i.e., the scanned page.

Based on the same inventive concept, the invention further provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the character recognition method based on the scanning software.

The processor may be, in some embodiments, a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor (e.g., a GPU), or another data processing chip. The processor is typically used to control the overall operation of the electronic device. In this embodiment, the processor is configured to execute the program code stored in the memory or to process data, for example, to execute the program code of the character recognition method based on the scanning software.

The memory includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory may be an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. In other embodiments, the memory may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card, and the like provided on the electronic device. Of course, the memory may also include both internal and external storage units of the electronic device. In this embodiment, the memory is generally used for storing an operating system installed in the electronic device and various types of application software, such as the program code of the character recognition method based on the scanning software. In addition, the memory may also be used to temporarily store various types of data that have been output or are to be output.

Based on the same inventive concept, the invention further provides a readable storage medium in which a computer program is stored, and when the computer program is executed by a processor, the character recognition method based on the scanning software is implemented.

The advantage of the invention is that it solves the problem that existing character recognition technology does not obtain a scanned document through scanning software on a domestic operating system and recognize its characters immediately. It provides a method that immediately recognizes the characters of a document scanned by scanning software on a domestic operating system, so as to implement one-key processing and saving.

While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims

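1. A character recognition method based on scanning software, characterized by comprising the following steps:

S1, the scanning software identifies and connects to a connectable scanner, and the scanner scans a file to obtain a text image;

S2, the scanning software performs image denoising and binarization on the text image to obtain a binary image;

S3, the binary image is intelligently corrected based on the Hough transform, and a character recognition area is selected and cropped to obtain a cropped image;

and S4, the scanning software performs character recognition on the cropped image and outputs the recognized content.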

2. The character recognition method based on scanning software according to claim 1, wherein the scanning software performs binarization processing on the text image, and specifically comprises:

carrying out graying processing on the text image to obtain a grayscale image;

extracting an image gray matrix according to the gray image;

calculating an image local contrast matrix according to the image gray matrix;

3. The character recognition method based on scanning software according to claim 2, wherein the obtaining the binary image by binary partitioning the image local contrast matrix using the Otsu method specifically comprises:

and determining a binary image according to the second binary matrix.

4. The method of claim 1, wherein the scanning software performs character recognition on the cropped image through a Tesseract-related API.

5. A scanning system for scanning a document through a scanner to obtain a text image and extracting the character content of the text image, characterized by comprising a device and storage module, a scanned content display module and a processing key module;

6. An electronic device comprising a processor and a memory, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any of claims 1 to 4.

7. A readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 4.