CN114267035A

CN114267035A - Document image processing method and system, electronic device and readable medium

Info

Publication number: CN114267035A
Application number: CN202111587342.4A
Authority: CN
Inventors: 韩亦真; 朱军; 禤少茵
Original assignee: Guangzhou Xinwensu Technology Co ltd
Current assignee: Guangzhou Xinwensu Technology Co ltd
Priority date: 2021-12-23
Filing date: 2021-12-23
Publication date: 2022-04-01

Abstract

The invention relates to a document image processing method, a document image processing system, an electronic device and a readable medium. The document image processing method comprises: acquiring a first document image, and preprocessing the first document image to obtain a first processed image and a second processed image which are identical; executing a text block position detection operation on the first processed image to obtain matrix coordinate information of a plurality of text character blocks; and performing a first binarization operation at each corresponding position in the second processed image based on the matrix coordinate information of the text character blocks to obtain a second document image. In this method, one processed image is used to locate the text character blocks and the other is binarized region by region at the detected positions, so that the characters of a scanned document are binarized per character block. This yields a more effective binarization segmentation and reduces merged and missing character strokes.

Description

Document image processing method and system, electronic device and readable medium

Technical Field

The present invention relates to the field of image processing, and in particular, to a method, a system, an electronic device, and a readable medium for processing a document image.

Background

Documents are widely used in daily life, business, and other settings as carriers for preserving and delivering content. The content of a paper document often needs to be stored as an editable electronic document. When the number of paper documents is small, they can be transcribed by manual typing; in repetitive, high-volume scenarios, the most efficient approach at present is to capture an image of the required content using digital image processing and then convert the text image into text characters using optical character recognition (OCR). However, converting a paper document into an electronic first document image requires acquisition by a camera, a scanner, or a similar device. Uneven illumination during acquisition can introduce noticeable noise and blurring into the acquired first document image.

This significantly affects subsequent processing of the electronic first document image with digital image processing techniques. Therefore, before processing, the image is preprocessed (for example, denoised) to obtain an image closer to the original document. In the prior art, the image is generally preprocessed with a binarization algorithm based on either a single threshold or an adaptive threshold. However, because ink density varies across scanned documents and illumination intensity varies during shooting, the optimal binarization threshold separating each character from the background differs across the document, which reduces the accuracy of subsequent OCR recognition.

Disclosure of Invention

In view of the foregoing disadvantages of the prior art, an object of the present invention is to provide a document image processing method, so as to solve the problem that the image binarization denoising cannot be accurately performed according to the noise condition of the character in the prior art.

It is another object of the present invention to provide a document image processing system.

It is a further object of the present invention to provide an electronic device.

It is a further object of the present invention to provide a computer readable medium.

In order to achieve the purpose, the invention adopts the following technical scheme:

in one aspect, the present invention provides a document image processing method, including:

acquiring a first document image, and preprocessing the first document image to obtain a first processed image and a second processed image which are the same;

executing text block position detection operation on the first processed image to obtain matrix coordinate information of a plurality of text character blocks;

and respectively performing first binarization operation at corresponding positions in the second processed image based on the matrix coordinate information of the text character blocks to obtain a second document image.

Further, the document image processing method, the preprocessing, includes:

performing white balance processing on the first document image to obtain a balanced image;

carrying out image pixel correction on the balance image to obtain a corrected image;

and converting the corrected image into a gray-scale image, and copying the gray-scale image to obtain the first processed image and the second processed image.

Further, in the document image processing method, the text block position detecting operation includes:

carrying out numerical value enhancement operation on the first processed image to obtain an enhanced image;

performing a second binarization operation and erosion processing on the enhanced image;

and acquiring matrix coordinate information of a plurality of text character blocks in the enhanced image based on contour detection operation.

Further, when acquiring the matrix coordinate information of the text character block, the document image processing method further performs the following operations:

merging the matrix coordinate information which accords with the preset rule; the predetermined rule is that the separation distance of the centers of gravity between adjacent text character blocks is less than a predetermined number of pixels while the height or width is a first predetermined proportion of the normal text character block.

Further, in the document image processing method, the numerical enhancement operation is:

when the pixel value of the pixel point is smaller than a first preset pixel value, assigning the pixel value of the pixel point to be 0;

and when the pixel value of the pixel point is larger than a second preset pixel value, assigning the pixel value of the pixel point to be 255.

Further, in the document image processing method, the first binarization operation includes:

acquiring a corresponding target image block in the second processed image based on a plurality of pieces of matrix coordinate information;

obtaining a binarization threshold value based on the gray level histogram data of the target image block; the obtaining rule of the binarization threshold value is as follows: determining a maximum threshold, obtaining a maximum pixel value in pixel values of pixel points of a second predetermined proportion quantity in the target image block, wherein if the maximum pixel value is greater than the maximum threshold, the binarization threshold is the maximum threshold, otherwise, the binarization threshold is the maximum pixel value;

and carrying out binarization processing on the target image block based on the binarization threshold value.

Further, in the document image processing method, the first binarization operation further includes:

and carrying out binarization processing on other areas of the second processed image by using a preset threshold value to eliminate image noise.

In another aspect, the present invention provides a scanned document image processing system using the document image processing method described in any one of the preceding claims, including:

the preprocessing module is used for acquiring a first document image and preprocessing the first document image to obtain a first processed image and a second processed image which are the same;

the text block detection module is used for executing text block position detection operation on the first processing image to obtain matrix coordinate information of a plurality of text character blocks;

and the fine processing module is used for respectively carrying out first binarization operation on corresponding positions in the second processed image based on the matrix coordinate information of the text character blocks to obtain a second document image.

In another aspect, the present invention provides an electronic device comprising:

a memory storing a computer program;

a processor, said computer program when executed by said processor implementing any of the document image processing methods described above.

In another aspect, the present invention provides a computer readable medium storing a computer program which, when executed by a processor, implements any of the document image processing methods described above.

Compared with the prior art, the document image processing method, the document image processing system, the electronic device and the readable medium provided by the invention have the following beneficial effects:

by using the document image processing method provided by the invention, the first document image is preprocessed to obtain the first processed image and the second processed image; the text block detection operation is performed on the first processed image so that each text block is identified individually; and the first binarization operation is performed on the second processed image according to the detected text character blocks to obtain the second document image. One processed image is used to locate the text character blocks and the other is binarized at the detected positions, so that the characters of the scanned document are binarized block by block. This achieves a more effective binarization segmentation and reduces merged and missing character strokes.

Drawings

FIG. 1 is a flow chart of a document image processing method provided by the present invention;

FIG. 2 is an original image of a first document provided by the present invention;

FIG. 3 is a preprocessed image provided by the present invention;

FIG. 4 is an image of the present invention subjected to a text position detection operation;

FIG. 5 is a fine binarization processed image provided by the present invention;

FIG. 6 is a block diagram of a document image processing system according to the present invention.

Detailed Description

In order to make the objects, technical solutions and effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It is to be understood by those skilled in the art that the foregoing general description and the following detailed description are exemplary and explanatory of specific embodiments of the invention, and are not intended to limit the invention.

The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process or method that comprises a list of steps does not include only those steps, but may include other steps not expressly listed or inherent to such process or method. Likewise, without further limitation, one or more devices, subsystems, elements, structures, or components preceded by "comprises ... a" do not preclude the existence of other devices, subsystems, elements, structures, or components. The appearances of the phrases "in one embodiment," "in another embodiment," and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Referring to fig. 1-5, the present invention provides a document image processing method applied to extracting and recognizing characters in a first document image. The method preprocesses the first document image to generate a first processed image and a second processed image with the same attributes, detects text character blocks in the first processed image, and then, based on the basic parameters of each detected text character block, performs a first binarization operation at the corresponding position of the second processed image, so as to obtain a second document image and improve OCR recognition accuracy.

The document image processing method provided by the invention improves the accuracy with which characters are recognized after a document is converted into the first document image, and is particularly useful for batch text conversion. For example, extracting the text of 1000 pages of paper documents by manual input takes a large amount of time. Existing document image processing techniques also use binarization for denoising, but the strokes of binarized characters are easily lost or merged, so that OCR fails to recognize the characters or recognizes content that was not intended.

The document image processing method comprises the following steps:

S1, acquiring a first document image, and preprocessing the first document image to obtain a first processed image and a second processed image which are identical. Generally, the first document image may be obtained by scanning a paper document, or by data transfer (for example, storing the first document image on a USB drive or other storage medium and loading it into a processing device, or downloading it to the processing device via a network). In this embodiment, the main function of the preprocessing is to produce the identical first and second processed images while performing preliminary pixel correction on the first document image, so as to facilitate the subsequent steps.

Further, as a preferred scheme, in this embodiment, the preprocessing performs image enhancement on the first document image. Those skilled in the art may select a suitable image enhancement method, so that the subsequent steps are more convenient and faster and their processing accuracy is improved.

S2, executing text block position detection operation on the first processed image to obtain matrix coordinate information of a plurality of text character blocks; generally, in this embodiment, the text block position detection operation may use a position detection operation commonly used in the art, for example, CTPN detection technology. After obtaining the plurality of text character blocks, the matrix coordinate information corresponding to each text character block is synchronously obtained through the text block position detection operation, which is not repeated or limited herein.

S3, performing a first binarization operation at each corresponding position in the second processed image based on the matrix coordinate information of the text character blocks to obtain a second document image. In this embodiment, the first binarization operation preferably uses a commonly used binarization technique, so as to improve the clarity of the text blocks and produce a high-definition version of the first document image.

Further, as a preferable scheme, in this embodiment, the preprocessing includes:

performing white balance processing on the first document image to obtain a balanced image; those skilled in the art can obtain the balance image by using an appropriate white balance processing technique according to actual requirements, and in this embodiment, the following formula is used:

Gray_ = (avg(C(R)) + avg(C(G)) + avg(C(B))) / 3;

C(R) = C(R) * avg(C(R)) / Gray_;

C(G) = C(G) * avg(C(G)) / Gray_;

C(B) = C(B) * avg(C(B)) / Gray_;

wherein R is the red pixel channel of the first document image; G is the green pixel channel of the first document image; B is the blue pixel channel of the first document image; C(·) is the pixel value of the corresponding pixel channel in the first document image; and avg(·) denotes the average.
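The channel rescaling above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the embodiment's implementation: pixels are a non-empty flat list of (r, g, b) tuples, results are rounded and clipped to 0-255, and the function name and the `conventional` flag are hypothetical. By default the gain is applied exactly as the formulas are written (channel average divided by Gray_); the widely used gray-world variant applies the reciprocal gain (Gray_ divided by the channel average), which the flag selects.

```python
def _clamp(value):
    """Round to the nearest integer and clip to the 8-bit range."""
    return max(0, min(255, round(value)))


def white_balance(pixels, conventional=False):
    """Rescale each channel of a non-empty list of (r, g, b) pixels.

    conventional=False follows the formulas as written: gain = avg / Gray_.
    conventional=True (hypothetical flag) uses the reciprocal: Gray_ / avg.
    """
    n = len(pixels)
    avg = [sum(p[c] for p in pixels) / n for c in range(3)]  # per-channel means
    gray = sum(avg) / 3                                       # Gray_
    gains = []
    for a in avg:
        if a == 0 or gray == 0:
            gains.append(1.0)  # an all-zero channel stays zero either way
        else:
            gains.append(gray / a if conventional else a / gray)
    return [tuple(_clamp(p[c] * gains[c]) for c in range(3)) for p in pixels]
```

When the three channel averages are already equal, every gain is 1 and the image is returned unchanged under either convention.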

Carrying out image pixel correction on the balance image to obtain a corrected image; those skilled in the art can obtain the corrected image by using an appropriate image pixel correction technique according to actual requirements, and in this embodiment, the image pixel correction technique is a gamma correction algorithm.
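As an illustration of the gamma correction step, the following sketch applies the usual power-law mapping through a 256-entry lookup table. The text does not give a gamma value, so the default of 0.8 and the function name are assumptions chosen only for demonstration.

```python
def gamma_correct(values, gamma=0.8):
    """Apply out = 255 * (in / 255) ** gamma to a list of 8-bit values.

    gamma < 1 brightens mid-tones, gamma > 1 darkens them; 0.8 is an
    illustrative default, not a value taken from the embodiment.
    """
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]  # lookup table
    return [lut[v] for v in values]
```

The same table can be applied to each color channel of the balanced image in turn.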

And converting the corrected image into a gray-scale image, and copying the gray-scale image to obtain the first processed image and the second processed image. Specifically, in this embodiment, a conversion formula for converting the corrected image into a grayscale image is as follows:

C(GRAY) = 0.299 * C(R) + 0.587 * C(G) + 0.114 * C(B);

wherein GRAY is the gray-scale image; and C(·) is the pixel value of the corresponding pixel channel in the corrected image.
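The conversion and copy can be sketched as follows. This is a minimal sketch assuming the image is a list of rows of (r, g, b) tuples; the function names are hypothetical. The weights are a parameter that defaults to the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114), which sum to 1; a different tuple can be passed if other weights are wanted.

```python
import copy

BT601_WEIGHTS = (0.299, 0.587, 0.114)  # standard luma weights, sum to 1


def to_gray(image, weights=BT601_WEIGHTS):
    """Convert rows of (r, g, b) tuples into rows of 8-bit gray values."""
    wr, wg, wb = weights
    return [[min(255, round(wr * r + wg * g + wb * b)) for (r, g, b) in row]
            for row in image]


def preprocess_copies(image):
    """Return two independent, identical gray-scale images."""
    gray = to_gray(image)
    return gray, copy.deepcopy(gray)  # deep copy so edits do not leak across
```

A deep copy matters here because the first image is later modified for block detection while the second must keep its original gray values for the region-wise binarization.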

With the preprocessing method of this embodiment, the clarity of the content of the first document image is noticeably improved, as can be seen by comparing fig. 2 and fig. 3. This facilitates the subsequent steps and improves the recognition accuracy of the text character blocks.

Further, as a preferable solution, in this embodiment, the text block position detecting operation includes:

S21, performing a numerical enhancement operation on the first processed image to obtain an enhanced image. Specifically, those skilled in the art can obtain the enhanced image by using an appropriate numerical enhancement operation according to actual needs.

Further, as a preferable scheme, in this embodiment, the numerical value enhancement operation is:

for each pixel point in the first processed image, a pixel value smaller than A is set to 0, and a pixel value larger than B is set to 255 (0 < A < 60, 210 < B < 255), which effectively improves noise reduction and the clarity of the rendered text.
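The numerical enhancement rule is a simple two-sided clamp, sketched below. The parameter names `low` and `high` stand for A and B; the defaults 50 and 220 are illustrative picks inside the stated ranges, not values given in the text.

```python
def enhance(gray, low=50, high=220):
    """Set pixels below `low` to 0 and above `high` to 255; keep the rest.

    gray: rows of 8-bit gray values. low and high correspond to A and B,
    with 0 < A < 60 and 210 < B < 255.
    """
    return [[0 if v < low else 255 if v > high else v for v in row]
            for row in gray]
```

Pixels exactly equal to A or B are left unchanged, since the rule uses strict comparisons.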

S22, performing a second binarization operation and erosion processing on the enhanced image. Specifically, those skilled in the art may process the enhanced image using an appropriate adaptive binarization technique and erosion technique according to actual needs. The second binarization operation is preferably adaptive binarization, whose basic parameters may be chosen according to actual requirements. In this embodiment, the pixel neighborhood size (blockSize) of the adaptive binarization threshold is set to MaxW * 2 + 1, and the offset constant (C) of the adaptive binarization threshold is also set to MaxW * 2 + 1, where MaxW is the pixel size occupied by a single normal character in the first document image. In this embodiment MaxW = 32, so the pixel neighborhood size of the second binarization operation is 65 and the offset constant is 65.

Furthermore, after erosion is applied to the enhanced image, the strokes of each character are better connected together.
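The second binarization and erosion of step S22 can be sketched in pure Python as below. Several points are assumptions rather than details from the text: the adaptive threshold is taken as the local mean minus the offset constant, windows are clipped at the image border, erosion is a square minimum filter, and all function names are hypothetical. With dark text on a light background, the minimum filter enlarges the dark regions, which is what joins neighboring strokes.

```python
def adaptive_params(max_w=32):
    """Return (blockSize, C), both MaxW * 2 + 1 as in this embodiment."""
    return max_w * 2 + 1, max_w * 2 + 1


def _window(img, x, y, radius):
    """Pixel values in the square window around (x, y), clipped at borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]


def adaptive_binarize(gray, block_size, offset_c):
    """Mean-based adaptive threshold: 255 if pixel > local mean - C, else 0."""
    radius = block_size // 2
    out = []
    for y, row in enumerate(gray):
        out_row = []
        for x, v in enumerate(row):
            win = _window(gray, x, y, radius)
            threshold = sum(win) / len(win) - offset_c
            out_row.append(255 if v > threshold else 0)
        out.append(out_row)
    return out


def erode(binary, radius=1):
    """Minimum filter: grows dark (0) regions so nearby strokes connect."""
    return [[min(_window(binary, x, y, radius)) for x in range(len(row))]
            for y, row in enumerate(binary)]
```

This direct implementation costs one window scan per pixel, which is fine for illustration; a production version would normally rely on an image library's optimized routines.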

S23, acquiring matrix coordinate information of a plurality of text character blocks in the enhanced image based on a contour detection operation. Specifically, those skilled in the art may perform the contour detection operation by using a suitable contour detection technique according to actual requirements, and thereby obtain a plurality of text character blocks and the corresponding matrix coordinate information.

According to the technical scheme of this embodiment, each text character block and its matrix coordinate information can be accurately obtained through the numerical enhancement operation, the second binarization operation, the erosion processing, and the contour detection operation. This ensures the accuracy of the fine binarization of the second processed image, in which binarization is applied to each individual text character block region, so that the first document image is rendered more clearly, as can be seen by comparing fig. 3 and fig. 4.
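To illustrate how block coordinates can be derived from the binarized image, the sketch below uses 8-connected component labeling as a simple stand-in for contour detection; this substitution, the (x, y, width, height) box format, and the function name are assumptions. Dark pixels (value 0) are treated as text.

```python
from collections import deque


def text_block_boxes(binary):
    """Return (x, y, width, height) for each 8-connected dark region."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 0 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited dark pixel.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny in (cy - 1, cy, cy + 1):
                    for nx in (cx - 1, cx, cx + 1):
                        if (0 <= nx < w and 0 <= ny < h
                                and not seen[ny][nx] and binary[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes
```

Boxes are returned in raster-scan order of their first pixel, so the result is deterministic for a given image.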

Further, as a preferred solution, in this embodiment, when acquiring the matrix coordinate information of the text character block, the following operation is further performed:

merging the matrix coordinate information that satisfies a predetermined rule; the predetermined rule is that the distance between the centers of gravity of adjacent text character blocks is less than a predetermined number of pixels, while the height or width of a block is a first predetermined proportion of that of a normal text character block. Generally, text character blocks whose centers of gravity are within 3-5 pixels of each other and whose width or height is smaller than one half or one third of the median of the widths and heights of all blocks are merged to obtain new and more accurate matrix coordinate information; see the merging of normal characters and punctuation marks in fig. 4. This embodiment ensures that the text character blocks are obtained more accurately and reduces the amount of calculation in the subsequent steps.
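The merging rule can be sketched as a single greedy pass over the boxes. The sketch makes several assumptions: boxes are (x, y, width, height) tuples, the geometric center of each box stands in for its center of gravity, the reference size is the median over all widths and heights, and the function name and default parameters (5 pixels, ratio 0.5) are illustrative picks from the ranges mentioned above.

```python
from statistics import median


def merge_boxes(boxes, max_dist=5, ratio=0.5):
    """Merge small boxes into a nearby box; returns the merged box list."""
    if not boxes:
        return []
    # Reference size: median of all widths and heights.
    ref = median([d for (_, _, w, h) in boxes for d in (w, h)])

    def center(b):
        return b[0] + b[2] / 2, b[1] + b[3] / 2

    def is_small(b):
        return b[2] < ratio * ref or b[3] < ratio * ref

    def union(a, b):
        x0, y0 = min(a[0], b[0]), min(a[1], b[1])
        x1 = max(a[0] + a[2], b[0] + b[2])
        y1 = max(a[1] + a[3], b[1] + b[3])
        return x0, y0, x1 - x0, y1 - y0

    merged = []
    for box in sorted(boxes):
        for i, kept in enumerate(merged):
            (ax, ay), (bx, by) = center(kept), center(box)
            close = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 < max_dist
            if close and (is_small(kept) or is_small(box)):
                merged[i] = union(kept, box)  # absorb into the existing box
                break
        else:
            merged.append(box)
    return merged
```

A single pass is enough for the common case of a punctuation mark or detached stroke next to a character; chains of merges would need the pass repeated until the list stops changing.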

Further, as a preferable solution, in this embodiment, the first binarization operation includes:

obtaining a binarization threshold based on the gray-level histogram data of the target image block. The binarization threshold is obtained as follows: a maximum threshold is determined; the maximum pixel value among the pixel values of a second predetermined proportion of the pixel points in the target image block is obtained; if this maximum pixel value is greater than the maximum threshold, the binarization threshold is the maximum threshold; otherwise, the binarization threshold is the maximum pixel value. In this embodiment, a dynamic binarization threshold is used for the binarization operation, which ensures accurate processing of the target image block and a high-definition rendering of the text character block. The maximum threshold must be determined before the fine binarization is performed. The maximum threshold is preferably 50 to 150, and more preferably 100.

And binarizing the target image block based on the binarization threshold. In the binarization operation, a pixel whose value is smaller than the binarization threshold is set to 0; otherwise it is set to 255.

In a specific implementation process of this embodiment, the following may be implemented:

3.1 Extract a target image block from the second processed image according to the matrix coordinate information.

3.2 Compute the gray-level histogram gray_hist (i.e., the number of pixels at each gray value) of the target image block.

3.3 Obtain a threshold variable T from the gray-level histogram gray_hist: T is increased from 0 and stops when the number of pixel points with values smaller than T reaches 28% of the pixel points of the target image block, or when T > M, where M is the maximum threshold.

3.4 Binarize the target image block with threshold T: if the pixel value of a pixel point is greater than or equal to T, it is set to 255; otherwise it is set to 0.

3.5 Write the processed target image block back to the corresponding area of the second processed image.

3.6 Repeat steps 3.1-3.5 until all of the matrix coordinate information has been processed.
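Steps 3.1-3.6 can be sketched as follows. The sketch assumes the image is a list of rows of 8-bit gray values, boxes are (x, y, width, height) tuples, and the threshold is capped at the maximum threshold M (default 100, the preferred value above); the function names are hypothetical.

```python
def block_threshold(block, proportion=0.28, max_threshold=100):
    """Steps 3.2-3.3: raise T until `proportion` of pixels lie below it."""
    gray_hist = [0] * 256
    for row in block:
        for v in row:
            gray_hist[v] += 1
    total = sum(gray_hist)
    below = 0  # number of pixels with value < t
    for t in range(min(max_threshold, 255) + 1):
        if below >= proportion * total:
            return t
        below += gray_hist[t]
    return min(max_threshold, 255)  # cap at the maximum threshold M


def binarize_blocks(image, boxes, proportion=0.28, max_threshold=100):
    """Steps 3.1-3.6: binarize each text block of `image` in place."""
    for (x, y, w, h) in boxes:
        block = [row[x:x + w] for row in image[y:y + h]]         # 3.1
        t = block_threshold(block, proportion, max_threshold)    # 3.2-3.3
        for dy, row in enumerate(block):                          # 3.4-3.5
            image[y + dy][x:x + w] = [255 if v >= t else 0 for v in row]
    return image
```

For a block with three pixels of value 10 and seven of value 200, the count below T first reaches 28% at T = 11, so the dark pixels become 0 and the rest 255. For a block with no dark pixels, T stops at the cap M and the whole block becomes white.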

Specifically, by implementing the technical scheme of this embodiment, the clarity of the text character blocks in the second processed image is effectively improved, and region-wise binarization and high-definition processing are achieved, as can be seen by comparing fig. 5 with fig. 2.

Further, as a preferable solution, in this embodiment, the first binarization operation further includes:

and binarizing the other areas of the second processed image using a predetermined threshold to eliminate image noise. In the present embodiment, the predetermined threshold is 120-. This effectively removes noise outside the text character blocks, making the image clearer, as can be seen by comparing fig. 5 with fig. 2.
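Binarizing the areas outside the text character blocks can be sketched with a mask, as below. The sketch assumes (x, y, width, height) boxes and the same polarity as step 3.4 (values at or above the threshold become 255); the function name is hypothetical, and the threshold is a required argument because no single value is fixed here.

```python
def binarize_background(image, boxes, threshold):
    """Binarize every pixel outside `boxes`; pixels inside stay untouched."""
    h, w = len(image), len(image[0])
    in_block = [[False] * w for _ in range(h)]
    for (x, y, bw, bh) in boxes:
        for j in range(y, min(h, y + bh)):
            for i in range(x, min(w, x + bw)):
                in_block[j][i] = True  # mark text-block pixels
    return [[v if in_block[j][i] else (255 if v >= threshold else 0)
             for i, v in enumerate(row)]
            for j, row in enumerate(image)]
```

The function returns a new image rather than modifying its input, so it can be applied either before or after the per-block binarization.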

Referring to fig. 6, the present invention further provides a scanned document image processing system using the document image processing method according to any of the foregoing embodiments, including:

The present invention also provides an electronic device comprising:

a memory storing a computer program;

a processor, said computer program implementing the document image processing method of any of the preceding embodiments when executed by said processor.

The present invention also provides a computer-readable medium storing a computer program which, when executed by a processor, implements the document image processing method according to any of the foregoing embodiments.

More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.

It should be understood that equivalents and modifications of the technical solution and inventive concept thereof may occur to those skilled in the art, and all such modifications and alterations should fall within the scope of the appended claims.

Claims

1. A document image processing method, comprising:

2. The document image processing method according to claim 1, wherein the preprocessing includes:

3. The document image processing method according to claim 1, wherein the text block position detecting operation includes:

4. The document image processing method according to claim 3, wherein, in acquiring matrix coordinate information of the text character block, the following operation is further performed:

5. The document image processing method according to claim 3, wherein the numerical enhancement operation is:

6. The document image processing method according to claim 1, wherein the first binarization operation includes:

7. The document image processing method according to claim 6, wherein the first binarization operation further comprises:

8. A scanned document image processing system using the document image processing method of any one of claims 1 to 7, comprising:

9. An electronic device, comprising:

a memory storing a computer program;

a processor, said computer program, when executed by said processor, implementing the document image processing method of any of claims 1-7.

10. A computer-readable medium, in which a computer program is stored which, when being executed by a processor, carries out the document image processing method according to any one of claims 1 to 7.