CN116259056A - Table character recognition method, apparatus, medium and electronic device - Google Patents

Table character recognition method, apparatus, medium and electronic device

Info

Publication number
CN116259056A
CN116259056A (application CN202310241025.XA)
Authority
CN
China
Prior art keywords
image
cell image
cell
original
scaling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310241025.XA
Other languages
Chinese (zh)
Inventor
王健
贾岿
袁野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hongji Information Technology Co Ltd
Original Assignee
Shanghai Hongji Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hongji Information Technology Co Ltd filed Critical Shanghai Hongji Information Technology Co Ltd
Priority to CN202310241025.XA priority Critical patent/CN116259056A/en
Publication of CN116259056A publication Critical patent/CN116259056A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/1444: Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G06V30/1448: Selective acquisition, locating or processing of specific regions based on markings or identifiers characterising the document or the area
    • G06V30/16: Image preprocessing
    • G06V30/166: Normalisation of pattern dimensions
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/412: Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The application relates to a table character recognition method that performs table frame-line recognition on an original table image; cuts the original table image according to the recognized frame-line coordinates to extract first cell images; scales each first cell image to obtain a second cell image; and puts the second cell images back in place to obtain a new table image. The method recognizes the frame lines of the original table image, then locally shrinks the region (sub-image) of each table cell by a preset scaling factor and generates a new image of the current page, noticeably widening the spacing between texts. OCR recognition is then performed on the new image to obtain its recognition result, which largely prevents the sticking phenomenon of OCR character recognition: the recognized text information is more accurate, garbled text caused by stuck-together characters no longer appears when the user reads the result, and readability is improved.

Description

Table character recognition method, apparatus, medium and electronic device
Technical Field
The present disclosure relates to the field of document processing technologies, and in particular to a table character recognition method and apparatus, a computer storage medium, and an electronic device.
Background
In intelligent document understanding scenarios there are many kinds of documents, for example orders, long documents and emails, and a large number of them contain tabular content. To understand a document well, the user needs an accurate grasp of the table structure and of the text inside the table.
Character recognition relies on OCR (Optical Character Recognition) technology. When the spacing across cells is very tight, OCR algorithms often recognize two character strings as a single string, which causes a sticking problem in table character recognition. When sticking occurs, downstream tasks such as information extraction fail: after recognition, the characters of two adjacent cells are merged together and hard to separate, the resulting text is difficult for a user to read and understand, and the reading experience is poor.
Therefore, to solve the above problems of character recognition, a better method is needed that avoids character sticking during recognition.
Disclosure of Invention
In order to solve the above problems, the present application proposes a form character recognition method, apparatus, computer storage medium and electronic device.
In one aspect of the present application, a method for identifying a form character is provided, including the following steps:
carrying out table frame line identification on the original table image;
cutting the original table image according to the identified table frame line coordinates to extract a first cell image;
scaling the first cell image to obtain a second cell image;
and replacing the second cell image to obtain a new form image.
As an optional embodiment of the present application, optionally, the performing table frame line identification on the original table image includes:
identifying a form region of the original form image;
identifying a transverse line and a longitudinal line of the form area according to the form area;
determining the cells of the table area according to the crossing points of the transverse lines and the longitudinal lines;
and determining the coordinates of the table frame lines according to the cells of the table area.
As an optional embodiment of the present application, optionally, the scaling the first cell image to obtain a second cell image includes:
presetting a scaling k;
multiplying the scaling k by the size of the first cell image, and reducing or enlarging the first cell image according to the multiplied size to obtain the second cell image.
As an optional embodiment of the present application, optionally, the replacing the second cell image to obtain a new table image includes:
calculating centers of the first cell image and the second cell image respectively;
placing the second cell image back to the original form image, and enabling centers of the second cell image and the first cell image to coincide;
and deleting the first cell image to obtain the new image.
As an optional embodiment of the present application, optionally, the replacing the second cell image back to the original form image includes:
determining the original position of the first cell image according to the coordinates of the table frame lines;
and replacing the second cell image to the original position of the first cell image.
As an optional embodiment of the present application, optionally, after the second cell image is replaced to obtain a new form image, the method further includes:
performing OCR (optical character recognition) on the new form image to obtain an OCR recognition result; the OCR recognition result comprises text content and text positions of texts in the new form image;
and adjusting the OCR recognition result according to the table region and the scaling k to obtain the recognition result of the original table picture.
As an optional embodiment of the present application, optionally, the text position includes coordinates of a first rectangular box surrounding the text;
the step of adjusting the OCR recognition result according to the table area and the scaling k to obtain a recognition result of the original table picture includes:
judging whether the coordinates of the first rectangular frame are within the range of the table frame-line coordinates;
if yes, multiplying the reciprocal 1/k of the scaling ratio by the size of the first rectangular frame, and enlarging or reducing the first rectangular frame according to the multiplied size to obtain a second rectangular frame;
calculating coordinates of a second rectangular frame when centers of the first rectangular frame and the second rectangular frame are coincident;
and taking the coordinates of the text content and the second rectangular frame as the identification result of the original table picture.
In another aspect of the present application, there is provided a form character recognition apparatus, including:
the table frame line identification module is used for carrying out table frame line identification on the original table image;
the table cell cutting module is used for cutting the original table image according to the identified table frame line coordinates and extracting a first cell image;
the image scaling module is used for scaling the first cell image to obtain a second cell image;
and the image resetting module is used for returning the second cell image to the original position to obtain a new form image.
As an alternative embodiment of the present application, alternatively,
the table frame line identification module comprises: the first identification module is used for identifying the form area of the original form image; the second identification module is used for identifying transverse lines and longitudinal lines of the form area according to the form area; the cell determining module is used for determining cells of the table area according to the crossing points of the transverse lines and the longitudinal lines; the coordinate determining module is used for determining the coordinates of the table frame lines according to the cells of the table area;
the image scaling module comprises: the configuration module is used for presetting a scaling ratio k; the scaling module is used for multiplying the scaling k by the size of the first cell image, and reducing or enlarging the first cell image according to the multiplied size to obtain the second cell image;
the image resetting module comprises: a calculation module for calculating centers of the first cell image and the second cell image, respectively; a reset module, configured to put the second cell image back to the original table image, and make the centers of the second cell image and the first cell image coincide; a new image acquisition module, configured to delete the first cell image, and obtain the new image;
wherein the reset module comprises: the first reset module is used for determining the original position of the first cell image according to the coordinates of the table frame lines; the second reset module is used for placing the second cell image back at the original position of the first cell image;
the image resetting module further comprises:
the OCR recognition module is used for carrying out OCR recognition on the new form image to obtain an OCR recognition result; the OCR recognition result comprises text content and text positions of texts in the new form image; the text location includes coordinates of a first rectangular box surrounding the text;
the text adjustment module is configured to adjust the OCR recognition result according to the table area and the scaling k, and obtain a recognition result of the original table picture, where the text adjustment module includes:
judging whether the coordinates of the first rectangular frame are within the range of the table frame-line coordinates;
if yes, multiplying the reciprocal 1/k of the scaling ratio by the size of the first rectangular frame, and enlarging or reducing the first rectangular frame according to the multiplied size to obtain a second rectangular frame;
calculating coordinates of a second rectangular frame when centers of the first rectangular frame and the second rectangular frame are coincident;
and taking the coordinates of the text content and the second rectangular frame as the identification result of the original table picture.
In another aspect of the present application, a computer storage medium is provided, where a computer program is stored, and the program when executed by a processor implements the method for identifying characters in a table.
In another aspect of the present application, there is also provided an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the form character recognition method when executing the executable instructions.
Technical effects of the invention:
the method comprises the steps of carrying out table frame line identification on an original table image; cutting the original table image according to the identified table frame line coordinates to extract a first cell image; scaling the first cell image to obtain a second cell image; and replacing the second cell image to obtain a new form image. The method can identify the form frame line of the original form image, then locally reduce the area (sub-image) where each form cell is positioned according to a certain scaling factor, generate a new image of the current page, and remarkably widen the distance between texts. Then OCR recognition is carried out on the basis of the new image to obtain a recognition result of the new image, so that the adhesion phenomenon of OCR character recognition is well prevented, the text information obtained by recognition is more accurate, the occurrence of text information disorder caused by character adhesion can not occur in user reading, and the user readability is improved.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features and aspects of the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic diagram of a device acceptance sheet in the background art;
FIG. 2 is a schematic flow chart of the method for recognizing characters in a form according to the present invention;
FIG. 3 is a schematic zoom diagram showing a new image obtained by zooming a first cell image according to the present invention;
FIG. 4 is a schematic diagram of an application system of the electronic device of the present invention;
fig. 5 shows a schematic diagram of the composition of an application of the device according to the invention.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Example 1
In this embodiment, the following term definitions are first made:
1. Table image: a table presented in the form of an image; divided into the original table image and the new table image.
2. Cell image: an image cut from the table image according to the table cells.
3. First cell image: the cell image before scaling.
4. Second cell image: the cell image after scaling.
In the device acceptance sheet shown in FIG. 1, the texts "2021-10-22-20:14" and "drafting node" in adjacent boxed cells of the approval opinion may be recognized by an OCR algorithm as a single string object "2021-10-22-20:14 drafting node". Such a recognition result is unacceptable; the industry refers to this as the sticking problem of table character recognition. The problem needs to be solved to avoid table characters being stuck together.
To solve the above problem, the application proposes the following general idea: first, perform table frame-line recognition on the original table image A; then locally shrink the region (sub-image) where each table cell is located by a certain scaling factor, generating a new image B of the current page. OCR recognition is then performed on B, so that character sticking can be avoided.
As shown in fig. 2, in one aspect of the present application, a method for identifying a table character is provided, including the following steps:
s1, carrying out table frame line identification on an original table image;
s2, cutting the original table image according to the identified table frame line coordinates, and extracting a first cell image;
s3, scaling the first cell image to obtain a second cell image;
s4, returning the second cell image to the original position to obtain a new form image.
Embodiments of the above steps S1 to S4 of the present application will be specifically described below.
As shown in FIG. 3, the current page is composed of a plurality of table cells, and the distance d1 between the text in the table and the table frame is small, so that when an OCR algorithm performs recognition, the texts "2021-10-22-20:14" and "drafting node" in the boxed cells may be recognized as a single string object "2021-10-22-20:14 drafting node".
Therefore, this technique first performs table frame-line recognition on the original table image A to obtain the frame-line coordinates of each cell.
As an optional implementation manner of the application, optionally, S1, performing table frame line identification on the original table image includes:
identifying a form region of the original form image;
identifying a transverse line and a longitudinal line of the form area according to the form area;
determining the cells of the table area according to the crossing points of the transverse lines and the longitudinal lines;
and determining the coordinates of the table frame lines according to the cells of the table area.
Table frame-line recognition means acquiring the frame-line information and the coordinate information of the cells of the original table image. It is implemented by recognizing the original table image with a table recognition method, yielding the cell frame-line information, coordinates and other information of the original table image. The specific steps are as follows:
the first step: identifying a table area;
and a second step of: and identifying the transverse lines, the vertical lines and the unit cells.
A table recognition algorithm is applied to the table area of the original table image to obtain the frame lines, coordinates and other information of each cell. What a table recognition algorithm yields is the table area and the table's horizontal lines, vertical lines and cells; examples include table recognition algorithms based on deep learning or based on OpenCV.
In this embodiment, the user may implement the above recognition of frame lines, coordinates, etc. with an existing table recognition algorithm of their choice; this embodiment is not limited thereto.
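The two steps above (detect the table area's horizontal and vertical lines, then derive cells from their crossings) can be sketched with a simple projection heuristic. This is only an illustration assuming a clean binary image with unbroken ruling lines; the function name `find_frame_lines` and the threshold `min_frac` are inventions of this sketch, and a production system would use a deep-learning or OpenCV-based table recognizer as the text suggests.

```python
import numpy as np

def find_frame_lines(binary, min_frac=0.8):
    """Locate frame lines by axis projection: a row (or column) whose ink
    coverage exceeds min_frac of the image width (or height) is a line."""
    h, w = binary.shape
    horiz = np.where(binary.sum(axis=1) >= min_frac * w)[0]  # y of horizontal lines
    vert = np.where(binary.sum(axis=0) >= min_frac * h)[0]   # x of vertical lines
    return horiz, vert

# Toy 7x9 "table": outer frame plus one inner horizontal and one inner vertical line.
img = np.zeros((7, 9), dtype=int)
img[[0, 3, 6], :] = 1   # horizontal lines at y = 0, 3, 6
img[:, [0, 4, 8]] = 1   # vertical lines at x = 0, 4, 8
ys, xs = find_frame_lines(img)
# Each cell is the rectangle between consecutive line coordinates.
cells = [(int(x0), int(y0), int(x1), int(y1))
         for y0, y1 in zip(ys, ys[1:])
         for x0, x1 in zip(xs, xs[1:])]
print(cells)  # [(0, 0, 4, 3), (4, 0, 8, 3), (0, 3, 4, 6), (4, 3, 8, 6)]
```

The intersections of the detected horizontal and vertical coordinates directly yield the frame-line coordinates of each cell.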
S2, cutting the original table image according to the identified table frame line coordinates, and extracting a first cell image.
After the coordinates of the table frame lines of each cell are identified, the table cells of the original table image can be cut, and the original table image is divided into a plurality of first cell images. That is, according to the coordinates of the table frame line of each cell, all the cut images of each cell, namely the first cell image, are extracted.
In this embodiment, after the table frame line coordinates are obtained, each table unit may be divided according to the corresponding table frame line coordinates, and each table unit is divided based on the table frame line coordinates, so that the original table image may be cut into a plurality of first cell images.
As shown in FIG. 3, after cutting, the cell image containing "2021-10-22-20:14" (one of the first cell images) is segmented out. After segmentation, the original table image is composed of the set of segmented first cell images;
each first cell image then waits to be scaled.
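Step S2 can be sketched as plain array slicing once the frame-line boxes are known. The helper name `cut_cells` and the `(x0, y0, x1, y1)` box convention are assumptions of this illustration, not part of the patent:

```python
import numpy as np

def cut_cells(table_img, cell_boxes):
    """Extract one first-cell image per (x0, y0, x1, y1) frame-line box."""
    return [table_img[y0:y1 + 1, x0:x1 + 1].copy()
            for x0, y0, x1, y1 in cell_boxes]

page = np.arange(48).reshape(6, 8)    # stand-in for a 6x8 original table image
boxes = [(0, 0, 3, 2), (4, 0, 7, 2)]  # two cells in the top row
cells = cut_cells(page, boxes)
print([c.shape for c in cells])  # [(3, 4), (3, 4)]
```

Copying each slice keeps the first cell images independent of the page buffer, so the page can later be modified without corrupting them.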
As an optional embodiment of the present application, optionally, S3, scaling the first cell image to obtain a second cell image includes:
presetting a scaling k;
multiplying the scaling k by the size of the first cell image, and reducing or enlarging the first cell image according to the multiplied size to obtain the second cell image.
After the frame-line coordinates of the first cell images are obtained, each first cell image may be enlarged or reduced in order to widen the visible spacing between cells.
In this embodiment, in order to increase the distance between texts in the cell images, each cell image is first reduced by a reduction ratio.
The user sets the reduction ratio k in advance; each first cell image is then scaled by k to obtain a reduced second cell image.
As shown in FIG. 3, each first cell image is reduced according to the preset scaling ratio k. After the reduction, the image in which each cell is located shrinks, yielding a second cell image; at this point the current page image is composed of a plurality of reduced first cell images, i.e. second cell images.
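Scaling a first cell image by the preset ratio k amounts to an image resize; in practice one would call something like OpenCV's `cv2.resize`, but the nearest-neighbour version below keeps the sketch dependency-free. `scale_cell` is a name invented here:

```python
import numpy as np

def scale_cell(cell, k):
    """Resize a cell image by factor k (k < 1 shrinks, k > 1 enlarges)
    using nearest-neighbour sampling."""
    h, w = cell.shape[:2]
    nh, nw = max(1, round(h * k)), max(1, round(w * k))
    ys = np.minimum((np.arange(nh) / k).astype(int), h - 1)
    xs = np.minimum((np.arange(nw) / k).astype(int), w - 1)
    return cell[np.ix_(ys, xs)]

first = np.ones((40, 100), dtype=np.uint8)  # a 40x100 first cell image
second = scale_cell(first, 0.8)             # preset scaling ratio k = 0.8
print(second.shape)  # (32, 80)
```

Because each dimension is simply multiplied by k, the same function also handles the later enlargement case (k > 1) mentioned in the claims.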
To match the position of the corresponding first cell image, in this embodiment the second cell images need to be put back: after each second cell image is restored in its original cell area, the current page becomes a new image composed of a set of reduced cell images.
As an optional embodiment of the present application, optionally, S4, placing the second cell image back to the original position, to obtain a new form image, including:
calculating centers of the first cell image and the second cell image respectively;
placing the second cell image back to the original form image, and enabling centers of the second cell image and the first cell image to coincide;
and deleting the first cell image to obtain the new image.
Each second cell image is restored in its corresponding original cell area. In this embodiment, the reduced cell image is put back into the original cell area with the cell center as reference, yielding the new table image B of the current page.
The second cell image obtained by scaling is reset according to the coordinate center of the original cell image, i.e. its center is placed within the original cell area. Specifically, the centers of the first cell image and the second cell image are first calculated; when resetting, the first cell image serves as the reference and each second cell image is placed back into the original cell area so that the centers of the second cell image and the first cell image coincide.
Once a second cell image has been put back and its center coincides with that of the first cell image, the corresponding first cell image is deleted. Image replacement is then complete, and the table image is composed of a plurality of scaled second cell images.
As an optional embodiment of the present application, optionally, the replacing the second cell image back to the original form image includes:
determining the original position of the first cell image according to the coordinates of the table frame lines;
and replacing the second cell image to the original position of the first cell image.
The table frame lines and coordinate information were calculated earlier and are not described again here. The second cell image is placed back at the original position of the first cell image.
All second cell images are reset with the cell center as reference; in the new image only the size is scaled and the center positions are unchanged, which makes it convenient to compute the distances between cells.
As an optional embodiment of the present application, optionally, after the second cell image is replaced to obtain a new form image, the method further includes:
performing OCR (optical character recognition) on the new form image to obtain an OCR recognition result; the OCR recognition result comprises text content and text positions of texts in the new form image;
and adjusting the OCR recognition result according to the table region and the scaling k to obtain the recognition result of the original table picture.
As shown in FIG. 3, each cell image of the new image is reduced and the text spacing in the image is noticeably widened, which prevents sticking when OCR recognition is performed.
The text spacing in the cell images is determined mainly by the scaling ratio k, which is set by the user.
With the increased text spacing, the new image is well suited for OCR; each OCR object in the new image is recognized with a preset OCR recognition algorithm.
Table recognition is separate from text OCR. An OCR character recognition algorithm mainly follows these steps: 1) page orientation recognition; 2) recognizing text objects with a target detection algorithm, where a text object is a continuous character sequence and the character spacing can be controlled through parameters; 3) a character classification algorithm recognizing what each character is.
In this embodiment, no particular OCR recognition algorithm is required; the user selects one.
OCR text recognition is performed on each cell image of the new image to obtain the text content of each cell. Specifically, during recognition, the cell images of the new image are traversed, the recognition results of all recognized objects are collected, and the OCR object types are output.
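The traversal described above might look like the following, where `ocr_engine` is a hypothetical callable (for example a pytesseract or PaddleOCR wrapper) returning `(text, box)` pairs for a sub-image; as noted, the patent does not prescribe any specific OCR algorithm:

```python
import numpy as np

def recognize_new_image(new_image, cells, ocr_engine):
    """Traverse the new table image cell by cell, run OCR on each sub-image
    and shift the returned boxes from cell-local to page coordinates."""
    results = []
    for x0, y0, x1, y1 in cells:
        sub = new_image[y0:y1 + 1, x0:x1 + 1]
        for text, (bx0, by0, bx1, by1) in ocr_engine(sub):
            results.append((text, (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0)))
    return results

# Dummy engine standing in for a real OCR call.
dummy = lambda sub: [("2021-10-22", (1, 1, 8, 4))]
page = np.zeros((20, 20))
out = recognize_new_image(page, [(10, 10, 19, 19)], dummy)
print(out)  # [('2021-10-22', (11, 11, 18, 14))]
```

Shifting the boxes into page coordinates at this point means the later 1/k restoration can work entirely in the original image's coordinate frame.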
As an optional embodiment of the present application, optionally, the text position includes coordinates of a first rectangular box surrounding the text;
the step of adjusting the OCR recognition result according to the table area and the scaling k to obtain a recognition result of the original table picture includes:
judging whether the coordinates of the first rectangular frame are within the range of the table frame-line coordinates;
if yes, multiplying the reciprocal 1/k of the scaling ratio by the size of the first rectangular frame, and enlarging or reducing the first rectangular frame according to the multiplied size to obtain a second rectangular frame;
calculating coordinates of a second rectangular frame when centers of the first rectangular frame and the second rectangular frame are coincident;
and taking the coordinates of the text content and the second rectangular frame as the identification result of the original table picture.
In this embodiment, text OCR and table recognition are separate algorithms. The OCR character recognition algorithm first performs page orientation recognition, then recognizes text objects with a target detection algorithm.
Text OCR does not consider the position of table edges. Its second step is target detection: finding which foreground regions of the page are text strings to be recognized; each found object is called an OCR object. Normally each line is one OCR object, but if a line of text contains punctuation or spaces it may be recognized as several OCR objects. Whether a line is recognized as one object or several depends on a spacing-control parameter (the user-determined k).
Accordingly, following the character OCR recognition above, all OCR objects in the new image are traversed to obtain the text information of each OCR object (i.e. cell image) and the four-point coordinates of its bounding box. These four-point coordinates, i.e. the coordinates within the cell image where the OCR object is located, depend on the bounding-box extraction/recognition algorithm; refer to the relevant bounding-box geometry computation.
After the text information of the OCR object and the four-point coordinates of its bounding box are obtained, if the OCR object lies within the table area, the four coordinates of the bounding box are scaled by the ratio 1/k with the cell center as reference, and the bounding-box coordinates are updated.
In order to accurately detect the position of the OCR object, a detection step is provided to judge whether the OCR object is located within the original table area; valid OCR objects are effectively identified through this detection condition.
If the detection judges that the OCR object is located within the original table area, the four-point coordinates of the bounding box are scaled by the ratio 1/k; otherwise, an adjustment is made, which may be either automatic or manual.
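The 1/k rescaling about the center can be sketched as follows. This is an illustrative fragment based on the description above; the function names and the table-area containment check are assumptions, not the patent's code.

```python
def scale_box_about_center(box, factor):
    """Scale a 4-point bounding box about its own center.
    box: four (x, y) corner tuples; factor: e.g. 1/k to undo an earlier
    enlargement by the scaling ratio k."""
    cx = sum(x for x, _ in box) / 4.0
    cy = sum(y for _, y in box) / 4.0
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in box]

def inside_table(box, table_rect):
    """Check whether every corner of `box` lies within the table
    rectangle (x0, y0, x1, y1); used as the validity condition."""
    x0, y0, x1, y1 = table_rect
    return all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in box)

# Example: a box found inside the table area is shrunk back by 1/k.
k = 2.0  # assumed enlargement ratio used earlier in the pipeline
box = [(10, 10), (30, 10), (30, 20), (10, 20)]
if inside_table(box, (0, 0, 100, 100)):
    box = scale_box_about_center(box, 1.0 / k)
```

The center stays fixed while the width and height shrink by the factor, which is exactly the mapping back to the original (unscaled) table coordinates.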
Therefore, the present application cuts out the table cells, scales them, and then performs OCR, which changes the original distance d1 between texts into d2. The spacing is noticeably widened, preventing the adhesion phenomenon in OCR character recognition.
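The cut, scale, and replace flow described above can be sketched as a minimal pure-Python illustration. This is not the patent's implementation: it assumes a 2-D list of pixels, an integer scaling ratio k, nearest-neighbour enlargement by pixel repetition, and clipping at the image borders.

```python
def rescale_and_replace(img, cell, k):
    """Cut the cell (x0, y0, x1, y1) out of `img` (a 2-D list of pixel
    rows), enlarge it k-fold by pixel repetition, and paste it back so
    that the centers of the old and new cell images coincide.
    Illustrative sketch; k is assumed to be a positive integer."""
    x0, y0, x1, y1 = cell
    patch = [row[x0:x1] for row in img[y0:y1]]
    # Nearest-neighbour enlargement: repeat every pixel k times in x and y.
    scaled = [[p for p in row for _ in range(k)]
              for row in patch for _ in range(k)]
    h, w = len(scaled), len(scaled[0])
    cy, cx = (y0 + y1) // 2, (x0 + x1) // 2   # center of the original cell
    ty, tx = cy - h // 2, cx - w // 2         # top-left of the scaled patch
    out = [row[:] for row in img]
    for dy, row in enumerate(scaled):
        for dx, p in enumerate(row):
            y, x = ty + dy, tx + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):  # clip at borders
                out[y][x] = p                  # enlarged patch covers old cell
    return out
```

For k > 1 the enlarged patch fully covers the original cell region, which plays the role of "deleting the first cell image" in the embodiment.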
It should be apparent to those skilled in the art that all or part of the above method embodiments may be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the control methods described above.
Example 2
As shown in fig. 4, in another aspect of the present application, there is further provided an electronic device, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the form character recognition method when executing the executable instructions.
Embodiments of the present disclosure provide an electronic device that includes a processor and a memory for storing processor-executable instructions, wherein the processor is configured to implement any of the previously described table character recognition methods when executing the executable instructions.
It should be noted here that the number of processors may be one or more. The electronic device of the embodiment of the disclosure may further include an input device and an output device. The processor, memory, input device, and output device may be connected by a bus or by other means, which is not specifically limited herein.
The memory, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and various modules, such as the programs or modules corresponding to the table character recognition method in the embodiments of the present disclosure. The processor executes the various functional applications and data processing of the electronic device by running the software programs or modules stored in the memory.
The input device may be used to receive input numbers or signals, where the signals may be key signals generated in connection with user settings and function control of the device/terminal/server. The output device may include a display device such as a display screen.
Example 3
In another aspect of the present application, a computer storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the table character recognition method described above.
The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the storage medium may also comprise a combination of the above kinds of memories.
Example 4
As shown in fig. 5, based on the implementation principle of embodiment 1, in another aspect of the present application, there is further provided a table character recognition device, including:
the table frame line identification module is used for carrying out table frame line identification on the original table image;
the table cell cutting module is used for cutting the original table image according to the identified table frame line coordinates and extracting a first cell image;
the image scaling module is used for scaling the first cell image to obtain a second cell image;
and the image resetting module is used for returning the second cell image to the original position to obtain a new form image.
The functions and interaction principles of the table frame line recognition module, the table cell cutting module, the image scaling module, and the image resetting module are described in detail in the corresponding steps S1-S4 of embodiment 1 and are not repeated in this embodiment.
As an optional embodiment of the present application:
the table frame line identification module comprises: the first identification module is used for identifying the form area of the original form image; the second identification module is used for identifying transverse lines and longitudinal lines of the form area according to the form area; the cell determining module is used for determining cells of the table area according to the crossing points of the transverse lines and the longitudinal lines; the coordinate determining module is used for determining the coordinates of the table frame wires according to the cells of the table area;
the image scaling module comprises: the configuration module is used for presetting a scaling rate k; the scaling module is used for multiplying the scaling k by the size of the first cell image, and reducing or amplifying the first cell image according to the multiplied size to obtain the second cell image;
the image resetting module comprises: a calculation module for calculating centers of the first cell image and the second cell image, respectively; a reset module, configured to put the second cell image back to the original table image, and make the centers of the second cell image and the first cell image coincide; a new image acquisition module, configured to delete the first cell image, and obtain the new image;
wherein the reset module includes: a first reset module, configured to determine the original position of the first cell image according to the coordinates of the table frame lines; and a second reset module, configured to place the second cell image at the original position of the first cell image;
the image resetting module further comprises:
the OCR recognition module is used for carrying out OCR recognition on the new form image to obtain an OCR recognition result; the OCR recognition result comprises text content and text positions of texts in the new form image; the text location includes coordinates of a first rectangular box surrounding the text;
the text adjustment module is configured to adjust the OCR recognition result according to the table area and the scaling k, and obtain a recognition result of the original table picture, where the text adjustment module includes:
judging whether the coordinates of the first rectangular frame are within the range of the coordinates of the table frame lines;
if yes, multiplying the reciprocal 1/k of the scaling ratio by the size of the first rectangular frame, and enlarging or reducing the first rectangular frame according to the multiplied size to obtain a second rectangular frame;
calculating coordinates of a second rectangular frame when centers of the first rectangular frame and the second rectangular frame are coincident;
and taking the coordinates of the text content and the second rectangular frame as the identification result of the original table picture.
The functions of the above modules and the interaction principle between the modules are described in embodiment 1, and this embodiment is not repeated.
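As an illustration of how the cell determining module can derive cells from the crossing points of the transverse and longitudinal lines, the following sketch assumes the detected lines have already been reduced to their x- and y-coordinates; `cells_from_grid_lines` is an illustrative name, not the patent's code, and only a regular (non-merged) grid is handled.

```python
def cells_from_grid_lines(xs, ys):
    """Derive cell rectangles from the x-coordinates of the detected
    longitudinal (vertical) lines and the y-coordinates of the
    transverse (horizontal) lines: each pair of adjacent crossings
    bounds one cell (x0, y0, x1, y1). Illustrative sketch only."""
    xs, ys = sorted(xs), sorted(ys)
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(len(ys) - 1)
            for i in range(len(xs) - 1)]
```

A grid with three vertical lines and three horizontal lines thus yields a 2-by-2 arrangement of four cells, whose coordinates then serve as the table frame line coordinates used for cutting.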
The modules or steps of the invention described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; alternatively, they may be fabricated separately into individual integrated-circuit modules, or multiple modules or steps among them may be fabricated into a single integrated-circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for identifying a form character, comprising the steps of:
carrying out table frame line identification on the original table image;
cutting the original table image according to the identified table frame line coordinates to extract a first cell image;
scaling the first cell image to obtain a second cell image;
and returning the second cell image to the original position to obtain a new form image.
2. The method of claim 1, wherein the performing table frame line recognition on the original table image comprises:
identifying a form region of the original form image;
identifying a transverse line and a longitudinal line of the form area according to the form area;
determining the cells of the table area according to the crossing points of the transverse lines and the longitudinal lines;
and determining the coordinates of the table frame lines according to the cells of the table area.
3. The method of claim 2, wherein scaling the first cell image to obtain a second cell image comprises:
presetting a scaling k;
multiplying the scaling k by the size of the first cell image, and reducing or enlarging the first cell image according to the multiplied size to obtain the second cell image.
4. The method of claim 3, wherein said returning the second cell image to the original position to obtain a new form image comprises:
calculating centers of the first cell image and the second cell image respectively;
placing the second cell image back to the original form image, and enabling centers of the second cell image and the first cell image to coincide;
and deleting the first cell image to obtain the new form image.
5. The method of claim 4, wherein said replacing the second cell image back to the original form image comprises:
determining the original position of the first cell image according to the coordinates of the table frame lines;
and replacing the second cell image to the original position of the first cell image.
6. The method of claim 3, further comprising, after returning the second cell image to the original position to obtain a new form image:
performing OCR (optical character recognition) on the new form image to obtain an OCR recognition result; the OCR recognition result comprises text content and text positions of texts in the new form image;
and adjusting the OCR recognition result according to the table region and the scaling k to obtain the recognition result of the original table picture.
7. The method of claim 6, wherein the text position includes coordinates of a first rectangular box surrounding the text;
the step of adjusting the OCR recognition result according to the table area and the scaling k to obtain a recognition result of the original table picture includes:
judging whether the coordinates of the first rectangular frame are within the range of the coordinates of the table frame lines;
if yes, multiplying the reciprocal 1/k of the scaling ratio by the size of the first rectangular frame, and enlarging or reducing the first rectangular frame according to the multiplied size to obtain a second rectangular frame;
calculating coordinates of a second rectangular frame when centers of the first rectangular frame and the second rectangular frame are coincident;
and taking the coordinates of the text content and the second rectangular frame as the identification result of the original table picture.
8. A form character recognition apparatus, comprising:
the table frame line identification module is used for carrying out table frame line identification on the original table image;
the table cell cutting module is used for cutting the original table image according to the identified table frame line coordinates and extracting a first cell image;
the image scaling module is used for scaling the first cell image to obtain a second cell image;
and the image resetting module is used for returning the second cell image to the original position to obtain a new form image.
9. A computer storage medium having stored thereon a computer program which when executed by a processor implements the method of table character recognition of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the tabular character recognition method of any of claims 1-7 when executing the executable instructions.
CN202310241025.XA 2023-03-14 2023-03-14 Table character recognition method, apparatus, medium and electronic device Pending CN116259056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310241025.XA CN116259056A (en) 2023-03-14 2023-03-14 Table character recognition method, apparatus, medium and electronic device

Publications (1)

Publication Number Publication Date
CN116259056A true CN116259056A (en) 2023-06-13

Family

ID=86680773



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination