CN113705576A - Text recognition method and device, readable storage medium and equipment - Google Patents

Text recognition method and device, readable storage medium and equipment

Info

Publication number
CN113705576A
CN113705576A (application CN202111279462.8A)
Authority
CN
China
Prior art keywords
image
text
recognition
recognized
table structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111279462.8A
Other languages
Chinese (zh)
Other versions
CN113705576B (en)
Inventor
Liu Dan (刘丹)
Zhang Hengxing (张恒星)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Zhongye Intelligent Technology Co ltd
Original Assignee
Jiangxi Zhongye Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Zhongye Intelligent Technology Co ltd filed Critical Jiangxi Zhongye Intelligent Technology Co ltd
Priority to CN202111279462.8A priority Critical patent/CN113705576B/en
Publication of CN113705576A publication Critical patent/CN113705576A/en
Application granted granted Critical
Publication of CN113705576B publication Critical patent/CN113705576B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL › G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection › G06T7/11 Region-based segmentation
    • G06T7/10 Segmentation; Edge detection › G06T7/187 involving region growing; involving region merging; involving connected component labelling
    • G06T7/60 Analysis of geometric attributes › G06T7/62 of area, perimeter, diameter or volume

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The invention provides a text recognition method, apparatus, readable storage medium, and device. The method comprises the following steps: acquiring an image to be recognized; performing character and table recognition on the image to be recognized with a preset image recognition model, so as to extract the text data, the table structure, and their respective coordinate information; performing connected-region segmentation on the table structure based on a preset region segmentation module to identify the effective rectangular regions defined by the table structure, and determining the coordinate information of those regions from the coordinate information of the table structure; and fusing the text data with the effective rectangular regions according to the coordinate correspondence, based on the coordinate information of the text data and of the effective rectangular regions, and outputting the fusion result so as to recognize the text content recorded in the image. The invention automatically joins and merges multi-line text within table cells, avoids problems such as jumbled sentences and semantic incoherence in the recognition result, and improves text recognition accuracy.

Description

Text recognition method and device, readable storage medium and equipment
Technical Field
The present invention relates to the field of image information recognition technologies, and in particular, to a text recognition method, an apparatus, a readable storage medium, and a device.
Background
With the continuous development of computer technology, information technology occupies an increasingly important position in daily life. Its rapid growth means that the information of human society is constantly updated, so people must extract the knowledge they need from large volumes of information and process that information efficiently. Document materials are numerous and disordered; they must be classified, stored, and organized before they can be used, dedicated archives and document libraries must be established for some of them, and information sometimes needs to be exchanged and searched, all of which calls for reducing labor cost and improving efficiency. Documents in table form appear in every aspect of life and occupy an important position in the national economy and daily affairs.
Existing OCR systems recognize simple, table-free printed matter well, but perform poorly on text with complex backgrounds, irregular layouts, and tables. In complex table-type documents, the text inside each cell is an independent module; a conventional OCR system cannot automatically join and merge the multiple lines of content within a table, so the recognized sentences come out jumbled and semantically incoherent. Moreover, table text data is complex and varied, and most of the fonts are small and outside the common Song and regular-script families; conventional OCR systems have a low recognition rate on such fonts and readily confuse visually similar characters.
Disclosure of Invention
Based on this, the invention aims to provide a text recognition method, apparatus, readable storage medium, and device, so as to solve the technical problems of low accuracy and frequent errors in existing text recognition.
According to the embodiment of the invention, the text recognition method comprises the following steps:
acquiring an image to be recognized;
carrying out character and table recognition on an image to be recognized by adopting a preset image recognition model so as to extract text data, a table structure and respective coordinate information of the text data and the table structure in the image to be recognized;
performing connected region segmentation on the table structure based on a preset region segmentation module to identify an effective rectangular region defined by the table structure, and determining coordinate information of the effective rectangular region according to the coordinate information of the table structure;
and fusing the text data with the effective rectangular areas according to the coordinate correspondence, based on the coordinate information of the text data and of the effective rectangular areas, and outputting the fusion result so as to recognize the text content recorded in the image to be recognized.
In addition, the text recognition method according to the above embodiment of the present invention may further have the following additional technical features:
further, after extracting the text data in the image to be recognized, the method further includes:
and performing keyword error correction on the text data based on a pre-constructed keyword lexicon.
Further, after the step of performing keyword error correction on the text data based on a pre-constructed keyword lexicon, the method further includes:
and inputting the text data after error correction and the text data before error correction separately into a preset language model for scoring, retaining whichever scores higher.
Further, performing table recognition on the image to be recognized by using the preset image recognition model includes:
performing linear recognition on an image to be recognized by adopting a preset image recognition model to obtain a linear data set, wherein the linear data set comprises linear data and coordinate information thereof;
screening, combining and/or rejecting straight line data in the straight line data set based on a preset processing rule to obtain an effective straight line data set;
wherein the table structure is formed by straight line data in the valid straight line data set, and the preset processing rule includes:
eliminating straight lines whose angle with the positive x-axis is between 15 and 75 degrees;
eliminating straight lines shorter than 50 pixels;
merging straight lines whose spacing is less than 10 pixels;
and eliminating straight lines parallel to the image edges and less than 15 pixels away from them.
Further, the step of performing connected region segmentation on the table structure based on a preset region segmentation module to identify an effective rectangular region defined by the table structure includes:
mapping each straight line in the effective straight-line data set to the corresponding position of a blank picture, wherein the blank picture has the same pixel dimensions as the image to be recognized;
performing connected region segmentation on the blank picture by adopting a preset region segmentation module, and extracting all rectangular regions of the blank picture;
and screening the rectangular areas according to their areas and IoU (intersection-over-union) ratios, eliminating non-effective rectangular areas to obtain the effective rectangular areas defined by the table structure.
Further, the character recognition of the image to be recognized by adopting the preset image recognition model comprises:
adopting the preset image recognition model to detect text lines of the image to be recognized, and then carrying out OCR character recognition on each text line;
and removing the character box of the recognized character to obtain the text data.
Further, after the step of acquiring the image to be recognized, the method further includes:
preprocessing the image to be identified, wherein the preprocessing mode comprises one or more of image size normalization, graying processing, binarization processing, bilateral filtering processing, mathematical morphology processing and image rotation processing;
wherein the image rotation processing includes:
performing straight-line detection on the image to be recognized through a Radon transform to find the angle set Γ1 of all straight lines in the image;
screening the angle set according to line length and position, eliminating line angles that do not meet the conditions, to obtain an angle set Γ2;
taking the mode of the angle set Γ2 to obtain the predicted angle α of the image to be recognized;
rotating the image to be recognized a first time according to the predicted angle α;
and performing four-class angle prediction on the once-rotated image, and rotating the image a second time according to the prediction result.
A text recognition apparatus according to an embodiment of the present invention includes:
the image acquisition module is used for acquiring an image to be identified;
the information identification module is used for carrying out character and table identification on the image to be identified by adopting a preset image identification model so as to extract text data, a table structure and respective coordinate information of the text data and the table structure in the image to be identified;
the region segmentation module is used for performing connected region segmentation on the table structure based on a preset region segmentation module so as to identify an effective rectangular region defined by the table structure, and determining coordinate information of the effective rectangular region according to the coordinate information of the table structure;
and the data fusion module is used for fusing the text data with the effective rectangular areas according to the coordinate correspondence, based on the coordinate information of the text data and of the effective rectangular areas, and outputting the fusion result so as to recognize the text content recorded in the image to be recognized.
The invention also proposes a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the text recognition method described above.
The invention also proposes a text recognition device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the text recognition method as described above when executing the program.
Compared with the prior art: the invention performs character and table recognition on the image to be recognized using trained models to extract the text data, the table structure, and their respective coordinates; performs connected-region segmentation on the table structure to obtain the effective rectangular regions and their coordinates; and then fuses the extracted text with the effective rectangular regions according to the coordinate correspondence. This automatically joins and merges the multi-line text within the table and the table itself, avoids problems such as jumbled sentences and semantic incoherence in the recognition result, and greatly improves text recognition accuracy.
Drawings
FIG. 1 is a flow chart of a text recognition method in a first embodiment of the present invention;
FIG. 2 is a flow chart of a text recognition method in a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a text recognition apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a text recognition apparatus in a fourth embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
To facilitate an understanding of the invention, the invention will now be described more fully with reference to the accompanying drawings. Several embodiments of the invention are presented in the drawings. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for illustrative purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Embodiment One
Referring to fig. 1, a text recognition method according to a first embodiment of the present invention is shown, where the text recognition method can be implemented by software and/or hardware, and the method specifically includes steps S01-S04.
In step S01, an image to be recognized is acquired.
Specifically, when the file to be recognized is a paper document (such as a printed matter), it can be converted into an image to be recognized in a suitable format by photographing, scanning, or the like. When photographing or scanning, the paper document should be laid flat and as free of obvious stains as possible, so as to minimize noise interference in subsequent images. When the file to be recognized is an electronic file (such as a PDF) rather than an image, it can be converted into an image to be recognized in a suitable format by image conversion or the like.
And step S02, recognizing characters and tables of the image to be recognized by adopting a preset image recognition model so as to extract the text data, the table structure and the coordinate information of the text data and the table structure in the image to be recognized.
Specifically, in some optional embodiments of this embodiment, the preset image recognition model may include at least one of a DBnet network, an attention mechanism-based RCNN and a Unet network model, and specifically, the DBnet network may be used to perform text line detection on the image to be recognized, and then perform OCR character recognition on the detected text line through the attention mechanism-based RCNN, so as to extract text data and coordinate information thereof in the image to be recognized. Meanwhile, the Unet network model or the optimized Unet network model can be used for extracting the table lines of the image to be recognized so as to extract the table structure and the coordinate information of the image to be recognized.
In specific implementation, a training sample set may be collected first, then samples in the training sample set are labeled, and then the labeled training sample set is used to train the DBnet network, the attention mechanism-based RCNN and the optimized Unet network model, so as to train and obtain the DBnet network capable of performing text line detection on images, the attention mechanism-based RCNN capable of performing OCR character recognition on text lines, and the optimized Unet network model capable of performing table recognition on images.
Step S03, performing connected region segmentation on the table structure based on a preset region segmentation module to identify an effective rectangular region defined by the table structure, and determining coordinate information of the effective rectangular region according to the coordinate information of the table structure.
Specifically, in some optional implementations of this embodiment, the preset region segmentation module may be, but is not limited to, the Two-Pass algorithm, which finds and marks all connected regions of an image by traversing the image twice. After the effective rectangular areas of the table structure are identified, their coordinate information can be determined: each effective rectangular area is delimited by the table lines (generally straight lines) of the table structure, and the coordinate information of the table structure includes the coordinates of every pixel of every table line, so the coordinates of every pixel on the boundary of each effective rectangular area follow directly.
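As a minimal illustration (not the patent's implementation), a simplified Two-Pass labeling over a binary grid might look like the following sketch; the grid representation and 4-connectivity are assumptions:

```python
def two_pass_label(grid):
    """Label 4-connected regions of 1-cells in a binary grid (Two-Pass).

    Pass 1 assigns provisional labels and records equivalences in a
    union-find structure; pass 2 resolves them to final labels.
    """
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}  # union-find over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    next_label = 1
    # Pass 1: provisional labels from the up/left neighbors.
    for y in range(h):
        for x in range(w):
            if not grid[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            neighbors = [l for l in (up, left) if l]
            if not neighbors:
                labels[y][x] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:
                m = min(neighbors)
                labels[y][x] = m
                for n in neighbors:
                    parent[find(n)] = find(m)  # record equivalence
    # Pass 2: resolve equivalences to compact final labels.
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                remap.setdefault(root, len(remap) + 1)
                labels[y][x] = remap[root]
    return labels, len(remap)
```

In practice a library routine (e.g. a connected-components function from an image-processing library) would replace this, but the two passes above are the essence of the algorithm the embodiment names.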
And step S04, according to the text data and the coordinate information of the effective rectangular area, fusing the text data and the effective rectangular area according to the coordinate corresponding relation, and outputting a fusion result to identify the text content recorded in the image to be identified.
Each effective rectangular area corresponds to one cell area of the table. It should be understood that, based on the coordinate information of the effective rectangular areas, the area range defined by each effective rectangular area can be determined, and the characters of the text data whose coordinates fall within that range are filled into the corresponding position of the effective rectangular area for fusion, thereby automatically joining and merging the multi-line text within the table and the table itself.
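A hedged sketch of this fusion step (the box format and reading-order rule are assumptions, not taken from the patent): each recognized text fragment is assigned to the effective rectangular area containing its center, and fragments in the same cell are joined top-to-bottom, left-to-right.

```python
def fuse_text_into_cells(texts, cells):
    """Assign recognized text fragments to table cells by coordinates.

    texts: list of (string, (x0, y0, x1, y1)) recognized fragments.
    cells: list of (x0, y0, x1, y1) effective rectangular areas.
    Returns one joined string per cell, fragments ordered in reading
    order (top-to-bottom, then left-to-right).
    """
    assigned = [[] for _ in cells]
    for s, (tx0, ty0, tx1, ty1) in texts:
        cx, cy = (tx0 + tx1) / 2, (ty0 + ty1) / 2  # fragment center
        for i, (x0, y0, x1, y1) in enumerate(cells):
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                assigned[i].append((ty0, tx0, s))  # sort key: row, then column
                break
    return ["".join(s for _, _, s in sorted(frags)) for frags in assigned]
```

For Chinese text the fragments would typically be joined without separators, as here; a space-delimited language would join with `" "` instead.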
In summary, the text recognition method of the above embodiment performs character and table recognition on the image to be recognized using trained models to extract the text data, the table structure, and their respective coordinates; performs connected-region segmentation on the table structure to obtain the effective rectangular regions and their coordinates; and then fuses the extracted text with the effective rectangular regions according to the coordinate correspondence. This automatically joins and merges the multi-line text within the table and the table itself, avoids problems such as jumbled sentences and semantic incoherence in the recognition result, and greatly improves text recognition accuracy.
Embodiment Two
Referring to fig. 2, a text recognition method according to a second embodiment of the present invention is shown, where the text recognition method can be implemented by software and/or hardware, and the method specifically includes steps S1-S5.
And step S1, acquiring an image to be recognized, and preprocessing the image to be recognized.
In this embodiment, the image to be recognized is converted from a paper document as follows: the document is converted into a PDF by photographing or scanning, and the PDF is then format-parsed, with measures such as resolution control and file-size filtering, and converted into an image to be recognized in a specific format (such as jpg or png).
The preprocessing mode comprises one or more of image size normalization, graying processing, binarization processing, bilateral filtering processing, mathematical morphology processing and image rotation processing. The purpose of image preprocessing is to extract valid information and attenuate redundant or invalid information, thereby improving image quality. By way of example and not limitation, in this embodiment, the step of preprocessing the image to be recognized specifically includes:
step 1.1, converting the picture into a gray-scale image by using a weighted average method;
step 1.2, redundant or invalid information of the picture is removed by using a bilateral filtering method;
step 1.3: performing small-area filling and edge deburring on the text in the picture using morphological processing such as opening and closing operations;
step 1.4: performing two rounds of angle rotation correction on the picture using an affine transformation and a VGG network model, so as to reduce the effect of coordinate offset errors on the fused text. That is, the image rotation processing is implemented by the following steps:
step 1.4.1, performing straight-line detection on the picture through a Radon transform to find the angle set Γ1 of all straight lines; screening the angle set according to line length and position and eliminating line angles that do not meet the conditions (for example, short lines and lines at offset positions) to improve the accuracy of the picture-angle prediction, obtaining the angle set Γ2; then taking the mode of Γ2 to obtain the predicted angle α of the picture, where α is the angle of the picture relative to the horizontal or vertical direction;
step 1.4.2, rotating the picture a first time according to the predicted angle α, bringing it into a horizontal or vertical state;
and step 1.4.3, performing VGG four-class angle prediction on the once-rotated picture, and rotating the picture a second time according to the prediction result.
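Steps 1.4.1 and 1.4.2 can be sketched as follows; the minimum-length threshold and the rounding of angles to whole degrees are assumptions the patent does not fix:

```python
import math
from collections import Counter

def predict_skew_angle(segments, min_len=50):
    """Estimate the skew angle alpha from detected line segments.

    segments: list of ((x0, y0), (x1, y1)) straight lines.
    Filters out short lines (the Gamma1 -> Gamma2 screening), rounds each
    remaining angle to the nearest degree, and returns the mode.
    """
    angles = []
    for (x0, y0), (x1, y1) in segments:
        if math.hypot(x1 - x0, y1 - y0) < min_len:
            continue  # eliminate lines that do not meet the length condition
        angles.append(round(math.degrees(math.atan2(y1 - y0, x1 - x0))))
    if not angles:
        return 0  # no usable lines: assume no skew
    return Counter(angles).most_common(1)[0][0]  # mode of Gamma2
```

Rotating the picture by the negative of this angle (step 1.4.2) brings it into a horizontal or vertical state; the four-class VGG prediction of step 1.4.3 then resolves the remaining 0/90/180/270-degree ambiguity.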
It should be noted that, before processing, the image is generally required to be upright, which facilitates subsequent processing. After one rotation the image is only adjusted into a horizontal or vertical state; it therefore still needs to be rotated by 0, 90, 180, or 270 degrees (i.e., the four-class detection rotation) to bring it upright, for example rotating a horizontal image 90 degrees counterclockwise into the upright state.
In a specific implementation, this embodiment uses the VGG16 network model to perform the four-class detection rotation. Specifically, all training samples can each be rotated by 0, 90, 180, and 270 degrees, so that each sample yields 4 training pictures with different rotation angles; four-class learning and training is then performed on the VGG16 network model with this training picture set. The result is a VGG16 model that performs four-class detection rotation, so that a picture at any of the four rotation angles (for example, 270 degrees) fed into the model is rotated back to the upright state (0 degrees) and output. The VGG16 model uses a CNN structure to extract features, including 13 convolutional layers that fully extract the picture's high-dimensional features; the optimal model is saved through parameter tuning, then frozen and deployed.
And step S2, recognizing characters and tables of the image to be recognized by adopting a preset image recognition model so as to extract the text data, the table structure and the coordinate information of the text data and the table structure in the image to be recognized.
In some optional cases of this embodiment, the process of performing table recognition on the image to be recognized by using the preset image recognition model specifically includes the following steps:
performing linear recognition on an image to be recognized by adopting a preset image recognition model to obtain a linear data set, wherein the linear data set comprises linear data and coordinate information thereof;
screening, combining and/or rejecting straight line data in the straight line data set based on a preset processing rule to obtain an effective straight line data set;
wherein the table structure is composed of the straight-line data in the effective straight-line data set.
Specifically, the preset image recognition model may be a Unet network model or an optimized Unet network model. In a specific implementation, a large number of pictures containing tables may be collected, the table lines on the pictures manually labeled pixel by pixel into two classes, horizontal lines and vertical lines, and the optimized Unet network model then trained and tuned on these pictures; the optimal parameter model is saved, frozen, and deployed with TF Serving, forming a model that can perform table-line recognition on the image to be recognized.
It should be noted that the output of the Unet network model is all the straight lines on the image to be recognized, i.e., a straight-line data set, which at this point may contain not only table lines but also other interfering lines. The data set therefore needs to be processed with preset processing rules so as to select the actually required table lines and obtain the effective straight-line data set. Specifically, after extensive experimental analysis, this embodiment determines the preset processing rules as:
eliminating straight lines whose angle with the positive x-axis is between 15 and 75 degrees;
eliminating straight lines shorter than 50 pixels;
merging straight lines whose spacing is less than 10 pixels;
and eliminating straight lines parallel to the image edges and less than 15 pixels away from them.
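A hedged sketch of the four rules above (the line representation and the image-size parameters are assumptions); merging is simplified here to keeping one representative of nearby same-orientation lines:

```python
import math

def filter_table_lines(segments, width, height):
    """Apply the four screening rules to detected line segments.

    segments: list of ((x0, y0), (x1, y1)); width/height: image size.
    """
    kept = []
    for (x0, y0), (x1, y1) in segments:
        length = math.hypot(x1 - x0, y1 - y0)
        angle = abs(math.degrees(math.atan2(y1 - y0, x1 - x0))) % 180
        tilt = min(angle, 180 - angle)  # angle to the x-axis, in 0..90
        if 15 <= tilt <= 75:
            continue  # rule 1: oblique line, not a table line
        if length < 50:
            continue  # rule 2: shorter than 50 pixels
        near_h = tilt < 15  # roughly horizontal
        if near_h and (min(y0, y1) < 15 or max(y0, y1) > height - 15):
            continue  # rule 4: parallel to top/bottom edge, < 15 px away
        if not near_h and (min(x0, x1) < 15 or max(x0, x1) > width - 15):
            continue  # rule 4: parallel to left/right edge, < 15 px away
        kept.append(((x0, y0), (x1, y1)))
    # Rule 3: merge lines whose spacing is below 10 px, comparing only
    # lines of the same orientation (keep one representative).
    merged = []
    for seg in kept:
        (x0, y0), (x1, y1) = seg
        horiz = abs(y1 - y0) <= abs(x1 - x0)
        mid = ((x0 + x1) / 2, (y0 + y1) / 2)
        if any(h == horiz and math.hypot(mid[0] - m[0], mid[1] - m[1]) < 10
               for _, m, h in merged):
            continue
        merged.append((seg, mid, horiz))
    return [seg for seg, _, _ in merged]
```

A production version would merge nearby lines by fitting one averaged line rather than discarding duplicates, but the screening order and thresholds follow rules ① to ④ as stated.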
On the other hand, in some optional cases of this embodiment, the process of performing character recognition on the image to be recognized by using the preset image recognition model may specifically include the following steps:
adopting the preset image recognition model to detect text lines of the image to be recognized, and then carrying out OCR character recognition on each text line;
and removing the character box of the recognized character to obtain the text data.
Specifically, text lines are first detected by the DBnet network, and OCR character recognition is then performed on each text line by the attention-based RCNN convolutional network.
The attention-based RCNN convolutional network model first runs a sliding CNN over the input picture to extract features; the resulting feature sequence is fed into an LSTM stacked on top of the CNN to encode it; decoding is then performed with an attention model, which outputs the label sequence. The attention model allows the decoder, at each decoding step, to compute a variable context vector by taking a weighted average of the encoder's hidden states, so that the most relevant information can be read at every moment without relying entirely on the hidden state of the previous time step.
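The variable context vector described above can be illustrated numerically; this is a generic dot-product attention sketch with toy values, not the patent's exact network:

```python
import numpy as np

def attention_context(encoder_states, decoder_state):
    """Weighted average of encoder hidden states.

    Weights come from a softmax over dot-product scores between each
    encoder state and the current decoder state, so the context vector
    varies at every decoding step.
    """
    scores = encoder_states @ decoder_state          # shape (T,)
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax attention
    return weights @ encoder_states                  # shape (H,) context

# Toy example: 3 encoder states of dimension 2, one decoder state.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
s = np.array([10.0, 0.0])
ctx = attention_context(H, s)
```

Here the decoder state aligns with the first and third encoder states, so the context is essentially their average: attention lets the decoder pull from any time step, not only the previous hidden state.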
And step S3, performing keyword error correction on the text data based on a pre-constructed keyword lexicon, inputting the text data after error correction and the text data before error correction separately into a preset language model for scoring, and retaining whichever scores higher.
In conventional OCR recognition, because table-type text data is complex and varied and most fonts are small fonts outside the Song and regular-script families, conventional OCR systems have a low recognition rate on such fonts and readily confuse visually similar characters. To solve this technical problem, this embodiment proposes a keyword error-correction mechanism: after the text data is extracted, keyword error correction is performed on it, addressing OCR's tendency to misrecognize characters. The specific error-correction process is as follows:
and replacing the wrongly-written characters in the recognized sentence with the sensitive wrongly-written characters by using a kenlm training language model to obtain a new sentence. And importing the trained language model, and respectively scoring the original sentence and the sentence newly obtained after replacement. If the score of the newly obtained sentence is higher than that of the original sentence, the problem of the original sentence is shown, the character is modified into the character which needs to be replaced, otherwise, the original sentence is kept unchanged.
To improve error-correction efficiency, in some optional cases of this embodiment only error-prone or more important words and phrases are corrected. In a specific implementation, important or error-prone keywords, including keywords set by the user, are collected in advance to construct a keyword lexicon, and the lexicon is embedded into the language model for training. After the text data is extracted, the language model matches it against each keyword in the keyword lexicon by edit distance, thereby completing the keyword correction of the text data.
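The edit-distance matching can be sketched as follows; the `LEXICON` entries and the distance threshold are illustrative assumptions.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via a rolling dynamic-programming row."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # dp[j]: deletion, dp[j-1]: insertion, prev: substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# hypothetical keyword lexicon of error-prone terms
LEXICON = ["invoice number", "total amount"]

def match_keyword(phrase, max_dist=2):
    """Snap an OCR phrase to the closest lexicon keyword within max_dist edits."""
    best = min(LEXICON, key=lambda k: edit_distance(phrase, k))
    return best if edit_distance(phrase, best) <= max_dist else phrase
```

A phrase close to a lexicon entry (e.g. one misrecognized character) is snapped to it, while unrelated text passes through untouched.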
Step S4, performing connected region segmentation on the table structure based on a preset region segmentation module to identify an effective rectangular region defined by the table structure, and determining coordinate information of the effective rectangular region according to the coordinate information of the table structure.
In some optional cases of this embodiment, the step of performing connected component segmentation on the table structure based on a preset component segmentation module to identify an effective rectangular area defined by the table structure includes:
mapping each straight line in the effective straight line data set to a corresponding position of a blank picture, wherein the pixels of the blank picture and the image to be identified are the same;
performing connected region segmentation on the blank picture by adopting a preset region segmentation module, and extracting all rectangular regions of the blank picture;
and screening the rectangular areas according to the areas of the rectangular areas and the IOU ratio, and eliminating non-effective rectangular areas to obtain effective rectangular areas limited by the table structure.
It should be noted that, after the table lines are extracted, they are transferred to a blank picture of the same pixel size for connected-region segmentation, instead of segmenting the original image directly. This greatly reduces the workload and avoids interference from other noise points on the original image, which improves the precision of connected-region segmentation, accurately segments each cell region of the table structure, and improves the accuracy of the subsequent fusion with the text. In addition, a rectangular-region screening mechanism conditioned on region area and the IOU ratio is added, which can quickly filter out the required effective rectangular regions, handling interference on the blank picture or erroneous lines in the effective straight-line data set and further improving precision.
Therefore, in a specific case of the present embodiment, the overall process of extracting the effective rectangular region of the table structure in the image to be recognized (i.e., each table region) may specifically be as follows (steps one to six):
Step one: labeling the training data set at the pixel level into two categories, horizontal lines and vertical lines; training and tuning the optimized Unet network model on the pictures; saving the optimal parameter model; and freezing the model for deployment with TF Serving;
Wherein an architecture with nested, dense skip connections is added to the Unet network, reducing the semantic gap between the encoder and the decoder. During convolution, feature maps at different scales learn different picture features: low-level feature maps capture rich spatial information and learn object boundary features, while high-level semantic feature maps learn object position features. The Unet network fuses shallow fine-grained information and deep semantic information at 5 scales by splicing along the channel dimension, generating 320 feature maps of the same resolution, which are then convolved with filters of size 3 x 3.
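The channel-dimension splicing of shallow and deep feature maps can be illustrated with a minimal NumPy sketch. Nearest-neighbour upsampling stands in for the network's learned upsampling, and the shapes are toy values, not the 320-map configuration above.

```python
import numpy as np

def fuse(shallow, deep):
    """Fuse a shallow (fine-detail) and a deep (semantic) feature map by
    upsampling the deep map to the shallow resolution, then concatenating
    along the channel dimension -- the splicing step described above."""
    c_d, h_d, w_d = deep.shape
    _, h_s, w_s = shallow.shape
    scale_h, scale_w = h_s // h_d, w_s // w_d
    up = deep.repeat(scale_h, axis=1).repeat(scale_w, axis=2)  # upsample
    return np.concatenate([shallow, up], axis=0)               # channel concat

shallow = np.zeros((16, 8, 8))   # fine spatial detail at full resolution
deep = np.ones((32, 4, 4))       # high-level semantics at half resolution
fused = fuse(shallow, deep)      # combined (16 + 32)-channel map
```

The fused tensor keeps both kinds of information side by side in the channel axis, ready for the subsequent 3 x 3 convolution.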
Step two: scaling the preprocessed image to be recognized to 1024 x 1024, performing region segmentation on it with the optimized Unet model trained in step one, traversing the predicted value of each point in the image, marking a point as a target point when its predicted value is greater than a given threshold, and performing least-squares straight-line fitting on all target points to obtain a straight-line data set L1.
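The threshold-then-fit step can be sketched as below for a single line; a real pipeline would first cluster target points per line. `np.polyfit` performs the least-squares fit.

```python
import numpy as np

def fit_line_from_mask(prob_map, threshold=0.5):
    """Collect points whose predicted value exceeds the threshold and fit a
    straight line y = k*x + b to them by least squares."""
    ys, xs = np.nonzero(prob_map > threshold)   # coordinates of target points
    k, b = np.polyfit(xs, ys, deg=1)            # least-squares line fit
    return k, b

# toy probability map with a bright horizontal line at row 3
prob = np.zeros((10, 10))
prob[3, :] = 0.9
k, b = fit_line_from_mask(prob)   # slope ~0, intercept ~3
```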
Step three: and screening, merging and rejecting the straight line data set L1 to obtain an effective straight line data set L2.
Step four: an effective straight line data set L2 is drawn on a blank picture of the same pixel, and the image is subjected to region extraction based on a Two-Pass connected region marking method to obtain a rectangular region set.
Specifically, the Two-Pass algorithm finds and marks all connected regions in the image by traversing the image twice. The algorithm is realized by the following steps:
(1) first scan
Visit the current pixel B(x, y); if B(x, y) == 1:
a. if the neighborhood pixel values of B (x, y) are all zero, then B (x, y) is assigned a new label:
B(x, y) = label, label = label + 1
wherein x and y represent the horizontal and vertical coordinates of the current pixel, respectively, and B (x, y) represents the pixel value with coordinates (x, y).
b. If one or more pixels in the 8-neighborhood of B(x, y) already carry label values:
assigning the minimum label value among the 8-neighborhood pixels to B(x, y), and recording the equivalence relation among all label values in the neighborhood, where pixels with equivalent label values belong to the same connected region;
LabelSet[i] = { label_m, label_n }
where LabelSet[i] records the set of label values belonging to the same connected region, and label_m and label_n are two label values found to be equivalent.
(2) Second pass scanning
The current pixel B(x, y) is visited; if B(x, y) > 1, the minimum label value in the equivalence class of label = B(x, y) is found and assigned to B(x, y).
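A runnable sketch of the Two-Pass procedure, using a union-find structure to record label equivalences. It is shown with a 4-neighbourhood for brevity; the text above uses the 8-neighbourhood, but the idea is identical.

```python
import numpy as np

def two_pass_label(binary):
    """Two-Pass connected-component labelling (4-neighbourhood sketch)."""
    labels = np.zeros_like(binary, dtype=int)
    parent = {}                      # union-find over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    next_label = 1
    h, w = binary.shape
    # first pass: assign provisional labels and record equivalences
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            neighbours = [labels[y - 1, x] if y else 0,
                          labels[y, x - 1] if x else 0]
            neighbours = [n for n in neighbours if n]
            if not neighbours:
                labels[y, x] = next_label       # brand-new label
                parent[next_label] = next_label
                next_label += 1
            else:
                m = min(neighbours)
                labels[y, x] = m
                for n in neighbours:            # record label equivalence
                    parent[find(n)] = find(m)
    # second pass: replace each label with its equivalence-class representative
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```

An L-shaped region that first receives two provisional labels is merged into one component in the second pass, while genuinely disconnected pixels keep distinct labels.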
Step five: obtaining an area set S of all rectangular areas (calculated according to the minimum bounding rectangle of the areas), and screening the rectangular areas according to the area size and a rectangular intersection ratio (IOU ratio), wherein the screening conditions are as follows:
Figure 88742DEST_PATH_IMAGE003
wherein: si represents the ith area in the area S set, x and y represent the horizontal and vertical coordinates of the pixel points,Ω i represents the spatial region of the Si and,w i representsΩ i The width of the abscissa of the spatial region,h i representsΩ i The height of the ordinate of the spatial region, γ, represents the threshold value of the intersection ratio.
Step six: screening the rectangular regions again by the angle of each region's minimum bounding rectangle, keeping regions whose angle lies between -10 and 10 degrees, thereby determining the effective rectangular regions.
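Steps five and six can be sketched as a single screening function. Reading the "IOU ratio" as the ratio of the component's pixel area to its minimum-bounding-rectangle area is an assumption here, as are the numeric thresholds.

```python
def screen_regions(regions, min_area=100, gamma=0.8):
    """Keep regions that are large enough, nearly fill their bounding box
    (fill ratio >= gamma), and are close to upright (angle within +/-10 deg).
    Each region: dict with pixel area S, bbox width w, height h, angle in deg."""
    valid = []
    for r in regions:
        fill_ratio = r["S"] / (r["w"] * r["h"])   # ~1.0 for true rectangles
        if r["S"] >= min_area and fill_ratio >= gamma and -10 <= r["angle"] <= 10:
            valid.append(r)
    return valid

regions = [
    {"S": 900, "w": 30, "h": 30, "angle": 2},    # clean upright cell -> kept
    {"S": 300, "w": 30, "h": 30, "angle": 0},    # sparse blob -> rejected
    {"S": 900, "w": 30, "h": 30, "angle": 45},   # tilted -> rejected
]
kept = screen_regions(regions)
```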
And step S5, fusing the text data and the effective rectangular regions according to the correspondence of their coordinate information, and outputting the fusion result so as to identify the text content recorded in the image to be recognized.
In a specific implementation, the fusion result may merge the two kinds of information, that is, fill the text into the table for storage; alternatively, the fusion result may determine the correspondence between the two and store the associated information in a database according to a preset template format (e.g., (key, value)), which facilitates fast traversal and query at a later stage.
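A minimal sketch of the coordinate-based fusion, assigning each recognized text fragment to the cell whose rectangle contains the fragment's centre point. The cell ids and box formats are illustrative.

```python
def fuse_text_and_cells(texts, cells):
    """texts: list of (text, (x1, y1, x2, y2)) OCR results.
    cells: dict of cell_id -> (x1, y1, x2, y2) effective rectangles.
    Returns cell_id -> concatenated text, ready to store as (key, value)."""
    def centre(box):
        x1, y1, x2, y2 = box
        return (x1 + x2) / 2, (y1 + y2) / 2

    result = {}
    for text, box in texts:
        cx, cy = centre(box)
        for cell_id, (x1, y1, x2, y2) in cells.items():
            if x1 <= cx <= x2 and y1 <= cy <= y2:   # centre falls inside cell
                result.setdefault(cell_id, []).append(text)
                break
    return {cid: " ".join(parts) for cid, parts in result.items()}

cells = {"A1": (0, 0, 100, 50), "B1": (100, 0, 200, 50)}
texts = [("Name", (10, 10, 60, 40)), ("Alice", (110, 10, 170, 40))]
fused = fuse_text_and_cells(texts, cells)
```

Using the centre point rather than the full box keeps the assignment robust when a text box slightly overlaps a neighbouring cell border.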
EXAMPLE III
Another aspect of the present invention further provides a text recognition apparatus, referring to fig. 3, which shows a text recognition apparatus according to a third embodiment of the present invention, the text recognition apparatus includes:
the image acquisition module 11 is used for acquiring an image to be identified;
the information identification module 12 is configured to perform character and table identification on an image to be identified by using a preset image identification model, so as to extract text data, a table structure and respective coordinate information of the text data and the table structure in the image to be identified;
the region segmentation module 13 is configured to perform connected region segmentation on the table structure based on a preset region segmentation module to identify an effective rectangular region defined by the table structure, and determine coordinate information of the effective rectangular region according to coordinate information of the table structure;
and the data fusion module 14 is configured to fuse the text data and the effective rectangular region according to the coordinate information of the text data and the effective rectangular region, and output a fusion result to identify text content recorded in the image to be identified.
Further, in some optional embodiments of the present invention, the text recognition apparatus further comprises:
and the keyword error correction module is used for performing keyword error correction on the text data based on a pre-constructed keyword lexicon.
Further, in some optional embodiments of the present invention, the text recognition apparatus further comprises:
and the error-correction scoring module is used for inputting the text data after error correction and the text data before error correction into a preset language model respectively for scoring, and retaining the text data with the higher score.
Further, in some optional embodiments of the present invention, the information identification module 12 includes:
the system comprises a line identification unit, a line identification unit and a data processing unit, wherein the line identification unit is used for performing line identification on an image to be identified by adopting a preset image identification model to obtain a line data set, and the line data set comprises line data and coordinate information thereof;
the straight line screening module is used for screening, combining and/or rejecting straight line data in the straight line data set based on a preset processing rule to obtain an effective straight line data set;
wherein the table structure is composed of line data among the valid line data sets.
Further, in some optional embodiments of the present invention, the region segmentation module 13 includes:
the straight line migration unit is used for mapping each straight line in the effective straight line data set to a corresponding position of a blank picture, and the pixels of the blank picture and the image to be identified are the same;
the region segmentation unit is used for performing connected region segmentation on the blank picture by adopting a preset region segmentation module and extracting all rectangular regions of the blank picture;
and the region screening unit is used for screening the rectangular regions according to the areas of the rectangular regions and the IOU ratio, and eliminating non-effective rectangular regions to obtain effective rectangular regions defined by the table structure.
Further, in some optional embodiments of the present invention, the information identification module 12 further includes:
the information recognition unit is used for detecting text lines of the image to be recognized by adopting the preset image recognition model and then performing OCR character recognition on each text line;
and the character frame removing unit is used for removing the character frames of the recognized characters to obtain the text data.
Further, in some optional embodiments of the present invention, the text recognition apparatus further comprises:
and the image preprocessing module is used for preprocessing the image to be identified, and the preprocessing mode comprises one or more of image size normalization, graying processing, binarization processing, bilateral filtering processing and mathematical morphology processing.
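Two of the listed preprocessing steps, graying and binarization, can be sketched in NumPy; bilateral filtering and mathematical morphology are omitted, and the global threshold value is an arbitrary assumption.

```python
import numpy as np

def preprocess(rgb, thresh=128):
    """Minimal preprocessing sketch: luminance grayscale conversion followed
    by global binarization (0 or 255)."""
    gray = rgb @ np.array([0.299, 0.587, 0.114])      # ITU-R BT.601 luminance
    binary = (gray >= thresh).astype(np.uint8) * 255  # global threshold
    return binary

img = np.zeros((4, 4, 3))
img[0, 0] = [255, 255, 255]     # a single white pixel on a black background
out = preprocess(img)
```

In the embodiment above, a production pipeline would typically use an adaptive rather than global threshold; the sketch only shows the shape of the transformation.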
The functions or operation steps of the modules and units when executed are substantially the same as those of the method embodiments, and are not described herein again.
In summary, in the text recognition apparatus in the above embodiment of the present invention, the image is preprocessed by using methods such as image size normalization, graying processing, bilateral filtering, and mathematical morphology processing, so as to extract valid information, weaken redundant or invalid information, and facilitate detection and recognition of a text; by constructing a keyword word base table, keyword error correction is performed on the recognized characters by adopting a method based on an editing distance and a language model, and the recognition accuracy is improved. Extracting table lines of the text by using a Unet model, carrying out region segmentation on the picture by using a connected region marking method, extracting all rectangular regions, screening the rectangular regions according to the area and IOU ratio, and removing non-effective rectangular regions; and completing effective splicing of the recognition results through fusion of the effective rectangular area and the OCR recognition results.
Example four
Referring to fig. 4, a text recognition apparatus according to a fourth embodiment of the present invention is shown, which includes a memory 20, a processor 10, and a computer program 30 stored in the memory and running on the processor, wherein the processor 10 implements the text recognition method as described above when executing the computer program 30.
The processor 10 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip in some embodiments, and is used to execute program code stored in the memory 20 or to process data, such as executing an access restriction program.
The memory 20 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 20 may in some embodiments be an internal storage unit of the text recognition device, for example a hard disk of the text recognition device. The memory 20 may also be an external storage device of the text recognition device in other embodiments, such as a plug-in hard disk provided on the text recognition device, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 20 may also include both an internal storage unit of the text recognition apparatus and an external storage device. The memory 20 may be used not only to store application software installed in the text recognition apparatus and various kinds of data, but also to temporarily store data that has been output or will be output.
It should be noted that the configuration shown in fig. 4 does not constitute a limitation of the text recognition device, and in other embodiments the text recognition device may include fewer or more components than shown, or combine some components, or a different arrangement of components.
In summary, in the text recognition device in the above embodiment of the present invention, the image is preprocessed by using methods such as image size normalization, graying processing, bilateral filtering, and mathematical morphology processing, so as to extract valid information, weaken redundant or invalid information, and facilitate detection and recognition of a text; by constructing a keyword word base table, keyword error correction is performed on the recognized characters by adopting a method based on an editing distance and a language model, and the recognition accuracy is improved. Extracting table lines of the text by using a Unet model, carrying out region segmentation on the picture by using a connected region marking method, extracting all rectangular regions, screening the rectangular regions according to the area and IOU ratio, and removing non-effective rectangular regions; and completing effective splicing of the recognition results through fusion of the effective rectangular area and the OCR recognition results.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the text recognition method as described above.
Those of skill in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be viewed as implementing logical functions, can be embodied in any computer-readable storage medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable storage medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable storage medium may even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above examples are merely illustrative of several embodiments of the present invention, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the appended claims.

Claims (10)

1. A method of text recognition, the method comprising:
acquiring an image to be identified;
recognizing characters and tables of the image to be recognized by adopting a preset image recognition model so as to extract text data, table structures and respective coordinate information of the text data and the table structures in the image to be recognized;
performing connected region segmentation on the table structure based on a preset region segmentation module to identify an effective rectangular region defined by the table structure, and determining coordinate information of the effective rectangular region according to the coordinate information of the table structure;
and fusing the text data and the effective rectangular area according to the coordinate correspondence relationship according to the text data and the coordinate information of the effective rectangular area, and outputting a fusion result so as to identify the text content recorded in the image to be identified.
2. The text recognition method according to claim 1, further comprising, after extracting text data from the image to be recognized:
and performing keyword error correction on the text data based on a pre-constructed keyword lexicon.
3. The method of claim 2, wherein after the step of performing keyword error correction on the text data based on a pre-constructed keyword lexicon, further comprising:
and respectively inputting the text data after error correction and the text data before error correction into a preset language model for scoring, and retaining the text data with the higher score.
4. The text recognition method of claim 1, wherein performing table recognition on the image to be recognized by using the preset image recognition model comprises:
performing linear recognition on the image to be recognized by adopting the preset image recognition model to obtain a linear data set, wherein the linear data set comprises linear data and coordinate information thereof;
screening, combining and/or rejecting straight line data in the straight line data set based on a preset processing rule to obtain an effective straight line data set;
wherein the table structure is formed by straight line data in the valid straight line data set, and the preset processing rule includes:
removing straight lines whose included angle with the positive x-axis is between 15 degrees and 75 degrees;
eliminating straight lines with the length smaller than 50 pixel values;
merging the straight lines with the straight line pitch smaller than 10 pixel values;
and eliminating straight lines parallel to the image edges at a distance of less than 15 pixels.
5. The text recognition method of claim 4, wherein the step of performing connected region segmentation on the table structure based on a preset region segmentation module to identify the effective rectangular region defined by the table structure comprises:
mapping each straight line in the effective straight line data set to a corresponding position of a blank picture, wherein the pixels of the blank picture and the image to be identified are the same;
performing connected region segmentation on the blank picture by adopting a preset region segmentation module, and extracting all rectangular regions of the blank picture;
and screening the rectangular areas according to the areas of the rectangular areas and the IOU ratio, and eliminating non-effective rectangular areas to obtain effective rectangular areas limited by the table structure.
6. The text recognition method of claim 1, wherein performing character recognition on the image to be recognized by using the preset image recognition model comprises:
adopting the preset image recognition model to detect text lines of the image to be recognized, and then carrying out OCR character recognition on each text line;
and removing the character box of the recognized character to obtain the text data.
7. The text recognition method according to any one of claims 1 to 6, further comprising, after the step of acquiring the image to be recognized:
preprocessing the image to be identified, wherein the preprocessing mode comprises one or more of image size normalization, graying processing, binarization processing, bilateral filtering processing, mathematical morphology processing and image rotation processing;
wherein the image rotation processing includes:
performing straight-line detection on the image to be recognized through a Radon transform to find the angle set Γ1 of all the straight lines in the image;
screening the angle set according to the length and position of each straight line, and eliminating straight-line angles that do not meet the conditions to obtain an angle set Γ2;
taking the mode of the angle set Γ2 to obtain a predicted angle Ã of the image to be recognized;
performing first angle rotation on the image to be recognized according to a prediction angle Ã;
and performing four-class angle prediction on the image to be recognized after the first rotation, and performing a second angle rotation on the image according to the prediction result.
8. A text recognition apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be identified;
the information identification module is used for carrying out character and table identification on the image to be identified by adopting a preset image identification model so as to extract text data, a table structure and respective coordinate information of the text data and the table structure in the image to be identified;
the region segmentation module is used for performing connected region segmentation on the table structure based on a preset region segmentation module so as to identify an effective rectangular region defined by the table structure, and determining coordinate information of the effective rectangular region according to the coordinate information of the table structure;
and the data fusion module is used for fusing the text data and the effective rectangular area according to the coordinate correspondence relationship according to the text data and the coordinate information of the effective rectangular area, and outputting a fusion result so as to identify the text content recorded in the image to be identified.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the text recognition method according to any one of claims 1 to 7.
10. A text recognition apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the text recognition method of any one of claims 1 to 7 when executing the program.
CN202111279462.8A 2021-11-01 2021-11-01 Text recognition method and device, readable storage medium and equipment Active CN113705576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279462.8A CN113705576B (en) 2021-11-01 2021-11-01 Text recognition method and device, readable storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111279462.8A CN113705576B (en) 2021-11-01 2021-11-01 Text recognition method and device, readable storage medium and equipment

Publications (2)

Publication Number Publication Date
CN113705576A true CN113705576A (en) 2021-11-26
CN113705576B CN113705576B (en) 2022-03-25

Family

ID=78647542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279462.8A Active CN113705576B (en) 2021-11-01 2021-11-01 Text recognition method and device, readable storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113705576B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850249A (en) * 2021-12-01 2021-12-28 深圳市迪博企业风险管理技术有限公司 Method for formatting and extracting chart information
CN114627482A (en) * 2022-05-16 2022-06-14 四川升拓检测技术股份有限公司 Method and system for realizing table digital processing based on image processing and character recognition
CN115249362A (en) * 2022-09-20 2022-10-28 京华信息科技股份有限公司 OCR table recognition method and system based on connectivity of pixels in stable direction
CN115331013A (en) * 2022-10-17 2022-11-11 杭州恒生聚源信息技术有限公司 Data extraction method and processing equipment for line graph

Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298693A (en) * 2011-05-18 2011-12-28 浙江大学 Expressway bend detection method based on computer vision
CN102354363A (en) * 2011-09-15 2012-02-15 西北工业大学 Identification method of two-dimensional barcode image on high-reflect light cylindrical metal
CN103886760A (en) * 2014-04-02 2014-06-25 李涛 Real-time vehicle type detection system based on traffic video
CN104050684A (en) * 2014-05-27 2014-09-17 华中科技大学 Video moving object classification method and system based on on-line training
CN104239411A (en) * 2014-08-12 2014-12-24 中国科学技术大学 Color and position clustering and angular point detection-based detection method for grid-shaped radar
CN104361352A (en) * 2014-11-13 2015-02-18 东北林业大学 Solid wood panel defect separation method based on compressed sensing
CN104408456A (en) * 2014-10-28 2015-03-11 沈阳建筑大学 Hough transformation linear detection method based on dynamic threshold range

Patent Citations (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298693A (en) * 2011-05-18 2011-12-28 浙江大学 Expressway bend detection method based on computer vision
CN102354363A (en) * 2011-09-15 2012-02-15 西北工业大学 Identification method for two-dimensional barcode images on highly reflective cylindrical metal
CN103886760A (en) * 2014-04-02 2014-06-25 李涛 Real-time vehicle type detection system based on traffic video
CN104050684A (en) * 2014-05-27 2014-09-17 华中科技大学 Video moving object classification method and system based on on-line training
CN104239411A (en) * 2014-08-12 2014-12-24 中国科学技术大学 Color and position clustering and angular point detection-based detection method for grid-shaped radar
CN104408456A (en) * 2014-10-28 2015-03-11 沈阳建筑大学 Hough transformation linear detection method based on dynamic threshold range
CN104361352A (en) * 2014-11-13 2015-02-18 东北林业大学 Solid wood panel defect separation method based on compressed sensing
CN106055653A (en) * 2016-06-01 2016-10-26 深圳市唯特视科技有限公司 Video synopsis object retrieval method based on image semantic annotation
CN106446881A (en) * 2016-07-29 2017-02-22 北京交通大学 Method for extracting lab test result from medical lab sheet image
CN107590138A (en) * 2017-08-18 2018-01-16 浙江大学 Neural machine translation method based on a part-of-speech attention mechanism
CN108062388A (en) * 2017-12-15 2018-05-22 北京百度网讯科技有限公司 Interactive reply generation method and device
CN109597997A (en) * 2018-12-07 2019-04-09 上海宏原信息科技有限公司 Aspect-level sentiment classification method and device based on comment entities, and model training therefor
CN109916914A (en) * 2019-04-10 2019-06-21 清华大学深圳研究生院 Product defect detection method and device
CN110334585A (en) * 2019-05-22 2019-10-15 平安科技(深圳)有限公司 Table recognition method, apparatus, computer equipment and storage medium
CN110598575A (en) * 2019-08-21 2019-12-20 科大讯飞股份有限公司 Table layout analysis and extraction method and related device
CN110781665A (en) * 2019-10-29 2020-02-11 腾讯科技(深圳)有限公司 Method, device and equipment for evaluating quality of error correction pair and storage medium
US20210141781A1 (en) * 2019-11-11 2021-05-13 Salesforce.Com, Inc. System and Method for Unsupervised Density Based Table Structure Identification
CN111444367A (en) * 2020-03-24 2020-07-24 哈尔滨工程大学 Image title generation method based on global and local attention mechanism
CN111639637A (en) * 2020-05-29 2020-09-08 北京百度网讯科技有限公司 Table identification method and device, electronic equipment and storage medium
CN111782853A (en) * 2020-06-23 2020-10-16 西安电子科技大学 Semantic image retrieval method based on attention mechanism
CN111899241A (en) * 2020-07-28 2020-11-06 华中科技大学 Quantitative online detection method and system for surface-mount component defects on PCBs (printed circuit boards) before reflow
CN111914805A (en) * 2020-08-18 2020-11-10 科大讯飞股份有限公司 Table structuring method and device, electronic equipment and storage medium
CN112347288A (en) * 2020-11-10 2021-02-09 北京北大方正电子有限公司 Character and picture vectorization method
CN112784192A (en) * 2021-01-22 2021-05-11 南京万得资讯科技有限公司 Method for cleaning embedded advertisements in page text content
CN112766418A (en) * 2021-03-02 2021-05-07 阳光财产保险股份有限公司 Image text direction classification method, device, equipment and storage medium
CN112686223A (en) * 2021-03-12 2021-04-20 腾讯科技(深圳)有限公司 Table identification method and device and computer readable storage medium
CN113051896A (en) * 2021-04-23 2021-06-29 百度在线网络技术(北京)有限公司 Method and device for correcting text, electronic equipment and storage medium
CN113343968A (en) * 2021-05-28 2021-09-03 广州云从人工智能技术有限公司 Multi-template certificate rapid verification method, system, medium and device
CN113283355A (en) * 2021-05-31 2021-08-20 平安国际智慧城市科技股份有限公司 Form image recognition method and device, computer equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NARAYANA DARAPANENI ET AL: "Handwritten Form Recognition Using Artificial Neural Network", 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS) *
XIAOJIE XIA ET AL: "An Efficient off-Line Handwritten Japanese Address Recognition System", 2019 International Conference on Document Analysis and Recognition (ICDAR) *
LI CHANGYUN ET AL: "Intelligent Sensing Technology and Its Applications in Electrical Engineering", 31 May 2017 *
WANG WEIWEI: "Research on the Three-Dimensional Distribution Characteristics of Fissure Evolution in Expansive Soil and the Fractal Laws of Its Microstructure", 31 March 2019 *
HUANG LING: "Research on Content-Based Image Retrieval Methods for Photocopied Certificates", China Masters' Theses Full-Text Database (Information Science and Technology) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850249A (en) * 2021-12-01 2021-12-28 深圳市迪博企业风险管理技术有限公司 Method for formatting and extracting chart information
CN114627482A (en) * 2022-05-16 2022-06-14 四川升拓检测技术股份有限公司 Method and system for realizing table digital processing based on image processing and character recognition
CN114627482B (en) * 2022-05-16 2022-08-12 四川升拓检测技术股份有限公司 Method and system for realizing table digital processing based on image processing and character recognition
CN115249362A (en) * 2022-09-20 2022-10-28 京华信息科技股份有限公司 OCR table recognition method and system based on connectivity of pixels in stable direction
CN115249362B (en) * 2022-09-20 2022-12-27 京华信息科技股份有限公司 OCR table recognition method and system based on connectivity of pixels in stable direction
CN115331013A (en) * 2022-10-17 2022-11-11 杭州恒生聚源信息技术有限公司 Data extraction method and processing equipment for line graph
CN115331013B (en) * 2022-10-17 2023-02-24 杭州恒生聚源信息技术有限公司 Data extraction method and processing equipment for line graph

Also Published As

Publication number Publication date
CN113705576B (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN113705576B (en) Text recognition method and device, readable storage medium and equipment
CA3027038C (en) Document field detection and parsing
CN105868758B (en) method and device for detecting text area in image and electronic equipment
CN109948510B (en) Document image instance segmentation method and device
Kasturi et al. Document image analysis: A primer
US7120318B2 (en) Automatic document reading system for technical drawings
US8442319B2 (en) System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking
Tamilselvi et al. A Novel Text Recognition Scheme using Classification Assisted Digital Image Processing Strategy
CN113158808B (en) Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction
CN109784342B (en) OCR (optical character recognition) method and terminal based on deep learning model
CN111860525B (en) Bottom-up optical character recognition method suitable for terminal block
CN110619326B (en) English test paper composition detection and identification system and method based on scanning
CN111079641B (en) Answer content identification method, related device and readable storage medium
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN114663904A (en) PDF document layout detection method, device, equipment and medium
CN114937278A (en) Text content extraction and identification method based on line text box word segmentation algorithm
CN114758341A (en) Intelligent contract image identification and contract element extraction method and device
CN114581928A (en) Form identification method and system
CN116524520A (en) Text recognition method and device, storage medium and electronic equipment
Lin et al. Multilingual corpus construction based on printed and handwritten character separation
Kumar et al. Line based robust script identification for indianlanguages
CN114639106A (en) Image-text recognition method and device, computer equipment and storage medium
Bloomberg et al. Document image applications
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
Al-Barhamtoshy et al. Arabic OCR segmented-based system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant