CN116311276A - Document image correction method, device, electronic equipment and readable medium - Google Patents

Info

Publication number
CN116311276A
Authority
CN
China
Prior art keywords
document image
corrected
text
result
degrees
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310220642.1A
Other languages
Chinese (zh)
Inventor
陶提
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Taimei Digital Technology Co ltd
Original Assignee
Shanghai Taimei Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Taimei Digital Technology Co ltd filed Critical Shanghai Taimei Digital Technology Co ltd
Priority to CN202310220642.1A priority Critical patent/CN116311276A/en
Publication of CN116311276A publication Critical patent/CN116311276A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/16 Image preprocessing
    • G06V30/1607 Correcting image deformation, e.g. trapezoidal deformation caused by perspective
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables

Abstract

The invention discloses a document image correction method, a device, electronic equipment and a readable medium. The document image correction method comprises the following steps: acquiring the offset probabilities that a document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction; when the maximum of the offset probabilities is greater than a first threshold, correcting the document image to the standard direction based on the offset angle corresponding to that maximum probability; and when the maximum of the offset probabilities is less than or equal to the first threshold, performing text detection and text recognition on the document image, judging whether the text recognition result has semantic information, and correcting the document image to the standard direction based on the judgment result. Because the correction is driven by the offset probabilities of the document image, the image can be brought to the standard direction, which improves the OCR recognition accuracy of the document image.

Description

Document image correction method, device, electronic equipment and readable medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and apparatus for correcting a document image, an electronic device, and a readable medium.
Background
In the prior art, text content in a document image is generally recognized by OCR (Optical Character Recognition) technology. OCR generally requires that the placement direction of a document image match the recognition direction of the OCR system; otherwise, the recognition result is likely to be wrong and to contain a large amount of garbled text.
Accordingly, in view of the above-mentioned technical problems, it is necessary to provide a document image correction method, apparatus, electronic device, and readable medium.
Disclosure of Invention
The invention aims to provide a document image correction method, a device, electronic equipment and a readable medium, which can correct the direction of a document image so as to improve the OCR recognition accuracy of the document image.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
in a first aspect, the present invention provides a document image correction method, including:
acquiring the offset probabilities that a document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction; when the maximum of the offset probabilities is greater than a first threshold, correcting the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability; and when the maximum of the offset probabilities is less than or equal to the first threshold, recognizing the text content of the document image to be corrected, judging whether the recognition result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result.
In one or more embodiments, correcting the document image to be corrected based on the semantic information judgment result of the recognition result includes: and when the identification result has semantic information, determining the current direction of the document image to be corrected as the standard direction.
In one or more embodiments, identifying text content of the document image to be corrected, judging whether the identification result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the identification result, including: performing text detection on the document image to be corrected, and performing text recognition on the text detection result to obtain a recognition result of text content of the document image to be corrected; and when the identification result does not have semantic information, determining the offset angle of the document image to be corrected based on a text detection model with a transverse text word detection function and a longitudinal text word detection function, and correcting the document image to be corrected to the standard direction based on the offset angle.
In one or more embodiments, recognizing the text content of the document image to be corrected, judging whether the recognition result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result includes: performing text detection on the document image to be corrected, and performing text recognition on the text detection result to obtain a recognition result of the text content of the document image to be corrected; and when the recognition result does not have semantic information, determining the offset angle of the document image to be corrected based on the results of the text detection and the text recognition, and correcting the document image to be corrected to the standard direction based on the offset angle.
In one or more embodiments, determining the offset angle of the document image to be rectified based on the results of the text detection and the text recognition includes: text detection is carried out on the document image to be corrected based on a transverse text detection model, and text recognition is carried out on the text detection result; and determining the offset angle of the text word in the document image to be corrected based on the text recognition result.
In one or more embodiments, determining the offset angle of the text word in the document image to be corrected based on the result of the text recognition includes: determining, based on the text recognition result, the proportion of single-character text boxes among all text boxes in the text detection result; when the proportion is greater than a second threshold, determining that the offset angle of the text word in the document image to be corrected is 90° or 270°; and when the proportion is less than or equal to the second threshold, determining that the offset angle of the text word in the document image to be corrected is 180°.
In one or more embodiments, correcting the document image to be corrected based on the offset angle includes: and when the offset angle of the document image to be corrected is 180 degrees, rotating the document image to be corrected by 180 degrees so as to correct the document image to be corrected to the standard direction.
In one or more embodiments, correcting the document image to be corrected based on the offset angle includes: when the offset angle of the document image to be corrected is 90 degrees or 270 degrees, rotating the document image to be corrected by 90 degrees; identifying the text content of the rotated document image to be corrected, and judging whether the identification result of the text content has semantic information or not; when the identification result has semantic information, determining that the current direction of the rotated document image to be corrected is the standard direction; and when the identification result does not have semantic information, rotating the rotated document image to be corrected by 180 degrees so as to correct the document image to be corrected to the standard direction.
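The 90°/270° branch described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `resolve_quarter_turn` and the callables `rotate`, `recognize` and `has_semantics` are hypothetical placeholders for whatever rotation routine, OCR pipeline and semantic check are actually used, and the toy "image" is just a dictionary recording its remaining offset.

```python
def resolve_quarter_turn(image, rotate, recognize, has_semantics):
    """Disambiguate a 90-or-270 degree offset with one trial rotation.

    rotate, recognize and has_semantics are hypothetical callables
    supplied by the caller; they are assumptions, not names from the source.
    """
    rotated = rotate(image, 90)            # trial rotation by 90 degrees
    if has_semantics(recognize(rotated)):  # readable text: already upright
        return rotated
    return rotate(rotated, 180)            # otherwise flip by a further 180 degrees


# Toy stand-ins: the "image" only records its remaining offset in degrees.
def toy_rotate(image, degrees):
    return {"offset": (image["offset"] + degrees) % 360}

def toy_recognize(image):
    return "readable text" if image["offset"] == 0 else ""

fixed_from_270 = resolve_quarter_turn({"offset": 270}, toy_rotate, toy_recognize, bool)
fixed_from_90 = resolve_quarter_turn({"offset": 90}, toy_rotate, toy_recognize, bool)
```

Both calls end at an offset of 0 in this toy model: the first after the trial rotation alone, the second after the additional 180° rotation.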
In one or more embodiments, acquiring the offset probabilities that the document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction includes: inputting the document image to be corrected into a rotation model, and classifying image features of the document image to be corrected based on the rotation model; and, based on the classification result, outputting the offset probabilities that the document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction.
In a second aspect, the present invention provides a document image correction apparatus comprising:
an acquisition module, used for acquiring the offset probabilities that the document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction; a first correction module, used for correcting the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability when the maximum of the offset probabilities is greater than a first threshold; and a second correction module, used for recognizing the text content of the document image to be corrected when the maximum of the offset probabilities is less than or equal to the first threshold, judging whether the recognition result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result.
In a third aspect, the present invention provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the document image correction method as described above when executing the program.
In a fourth aspect, the present invention provides a computer readable medium having computer executable instructions embodied therein, which when executed by a processor, are adapted to carry out a document image correction method as described above.
Compared with the prior art, the document image correction method and device provided by the invention acquire the offset probabilities that a document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction, and correct the document image based on these offset probabilities, so that the document image can be brought to the standard direction and the OCR recognition accuracy of the document image is improved. A document image whose offset angle has high confidence can be rotated directly according to the offset probabilities, while a document image whose offset angle has low confidence undergoes text recognition and semantic classification, and its offset angle is then determined in combination with the recognition result, which improves the accuracy of the correction.
Drawings
FIG. 1 is a schematic view of an application scenario of a document image correction method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a document image correction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a document image correction device according to an embodiment of the present invention;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the invention is to be read in conjunction with the accompanying drawings, and it is to be understood that the scope of the invention is not limited to the specific embodiments.
Throughout the specification and claims, unless explicitly stated otherwise, the term "comprise" or variations thereof such as "comprises" or "comprising", etc. will be understood to include the stated element or component without excluding other elements or components.
In order to facilitate understanding of the technical solutions of the present application, the following first explains in detail the technical terms that may occur in the present invention.
Document image: refers to images in a document that contain text content. For example, an image in a document in a format such as PDF formed by photographing or scanning paper text may be used. For document images in a PDF document, each page may be a document image.
Standard direction: refers to a direction matching the recognition direction of the recognition system. Generally, a document image is input into the recognition system in a standard direction, so that higher recognition accuracy can be obtained. In general, document images in a standard direction are more in line with the reading habit of human beings.
OCR (Optical Character Recognition): refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer characters by a character recognition method. In other words, the characters in a paper document are optically converted into a black-and-white dot-matrix image, and a computer processes and analyzes that image to convert the characters or text in it into an editable text format.
An OCR system needs to convert characters or text in the text regions of a document image into editable text, and the direction of the document image directly affects the recognition accuracy and efficiency of the OCR system. If the direction of the document image is incorrect, the OCR system may misrecognize characters or text, or fail to recognize them at all. For example, in a Chinese OCR system, if the direction of an image is incorrect, text may be recognized as traditional characters or as garbled characters.
In order to avoid the problems, the document image correction method and the device provided by the invention can correct the document image so that the document image has a correct recognition direction.
Referring to fig. 1, an exemplary application scenario diagram of a document image correction method according to the present invention is shown. In the implementation scenario shown in fig. 1, it includes a client 101, a document image correction server 102, and a network 103. The network 103 is a medium to provide a communication link between the client 101 and the document image correction server 102. The network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, etc., and the network 103 may be at least one of a local area network, a metropolitan area network, and a wide area network.
The client 101 may be an electronic device for providing an image of a document to be rectified. For example, the electronic device may be a mobile terminal such as a smart phone, a tablet computer, a laptop portable notebook computer, or a terminal such as a desktop computer, a projection computer, which is not limited in the embodiment of the present invention.
The document image correction server 102 refers to a server for running any one of document image correction programs and providing a corresponding document image correction service. The document image rectification server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), basic cloud computing services such as big data and artificial intelligence platforms, and the like.
A user can upload a document image to be corrected to the document image correction server 102 through the client 101. The document image correction server 102 classifies image features of the document image to be corrected to acquire the offset probabilities that the document image is offset by 0°, 90°, 180° and 270° relative to its standard direction, and corrects the document image to be corrected based on the offset probabilities. When the maximum of the offset probabilities is greater than a first threshold, the document image correction server 102 may correct the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability. When the maximum of the offset probabilities is less than or equal to the first threshold, the document image correction server 102 may recognize the text content of the document image to be corrected, judge whether the recognition result of the text content has semantic information, and correct the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result.
Fig. 2 is a flowchart of a method for correcting a document image according to an embodiment of the invention. The document image correction method specifically comprises the following steps:
S201: Acquire the offset probabilities that the document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction.
In the present embodiment, the offset angle means the angle by which the image is offset clockwise relative to the standard direction. Of course, in other embodiments, the offset angle may be the angle by which the image is offset counterclockwise relative to the standard direction.
It will be appreciated that document images (pages) in a document (e.g., a PDF document) typically have one of four placement directions: 0° (no offset from the standard direction), 90° (offset 90° from the standard direction), 180° (offset 180° from the standard direction), and 270° (offset 270° from the standard direction). The 0° direction is the standard direction, which matches the OCR recognition direction, and the OCR recognition result obtained in this direction is more accurate.
In an exemplary embodiment, the method for acquiring the offset probabilities that the document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction specifically includes: inputting the document image to be corrected into a rotation model, and classifying image features of the document image to be corrected based on the rotation model; and, based on the classification result, outputting the offset probabilities that the document image to be corrected is offset by 0°, 90°, 180° and 270° relative to its standard direction.
It should be noted that the image features of the document image to be corrected may be extracted by an artificial intelligence model (such as a neural network model), or may be extracted manually by a worker using image feature extraction software. The extracted image features include texture features, color features, spatial relationship features, and the like.
The rotation model can be obtained by training an existing four-class classification model. The classes output by the rotation model are 0°, 90°, 180° and 270°, and the model can also output the probability that the document image to be corrected belongs to each direction class. The rotation model may be a multi-layer neural network model, or a conventional classifier such as a support vector machine (SVM) or a random forest.
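As a minimal sketch of the output stage only, assuming the classifier produces four raw scores (logits) in the order 0°, 90°, 180°, 270°, a softmax can turn them into per-class probabilities. The function name, the class order and the example scores are hypothetical; the source does not state how the probabilities are normalized.

```python
import math

ANGLES = (0, 90, 180, 270)  # class order is an assumption, not from the source


def offset_probabilities(logits):
    """Convert four raw classifier scores into probabilities via softmax."""
    if len(logits) != len(ANGLES):
        raise ValueError("expected one score per angle class")
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return {angle: value / total for angle, value in zip(ANGLES, exps)}


# Hypothetical scores in which the 180-degree class dominates.
probs = offset_probabilities([0.3, -0.8, 3.0, -0.8])
most_likely = max(probs, key=probs.get)
```

The resulting dictionary maps each candidate offset angle to a probability, and the four probabilities sum to 1.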
A multi-layer neural network rotation model may include a shallow convolutional network and a self-attention neural network connected in sequence. The shallow convolutional network can be used to extract image feature data from the document image to be corrected. The self-attention neural network can be used to classify the document image to be corrected according to the image feature data and to output the offset probabilities that the document image is offset by 0°, 90°, 180° and 270° relative to its standard direction. Optionally, the shallow convolutional network may include a plurality of convolutional blocks and one fully connected layer, and each convolutional block may include a convolution (Conv) layer, a pooling (Pool) layer, a batch normalization (Batch Normalization, BN) layer, and an activation (ReLU) layer.
S202: When the maximum of the offset probabilities is greater than a first threshold, correct the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability.
The first threshold measures the reliability of the offset probabilities output by the rotation model. Typically, if the maximum of the offset probabilities output by the rotation model is greater than the first threshold, that maximum probability is considered trustworthy. The first threshold may be set according to actual needs; in general, different model types, training methods and training data lead to different first thresholds.
For example, in one embodiment, the first threshold may be set at 80%. Assume that the offset probabilities output by the rotation model for the four direction classes 0°, 90°, 180° and 270° of a certain document image to be corrected are 6%, 2%, 90% and 2% respectively. The maximum of the offset probabilities is then 90%, which corresponds to an offset angle of 180°. Since the maximum probability (90%) is greater than the first threshold (80%), the document image to be corrected can be considered to be offset by 180° relative to its standard direction. At this time, the rotation model can rotate the document image to be corrected by 180° according to its current offset angle (180°), which corrects the document image to the standard direction.
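The worked example above can be reproduced with a short sketch. The function name `pick_offset` is hypothetical; the 80% threshold and the four probabilities are the example values from the text, and the second input is an invented low-confidence case.

```python
FIRST_THRESHOLD = 0.80  # example value from the text; tune per model and data


def pick_offset(probs, threshold=FIRST_THRESHOLD):
    """Return the most likely offset angle if it is trusted, else None."""
    angle = max(probs, key=probs.get)
    # Strictly greater than: a probability equal to the threshold is not trusted.
    return angle if probs[angle] > threshold else None


confident = {0: 0.06, 90: 0.02, 180: 0.90, 270: 0.02}
uncertain = {0: 0.40, 90: 0.10, 180: 0.45, 270: 0.05}  # hypothetical input

trusted_angle = pick_offset(confident)   # 180
fallback_case = pick_offset(uncertain)   # None: go to text recognition instead
```

Returning `None` in the low-confidence case makes it explicit that the classifier output alone is not used and that the text recognition path must decide.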
It is understood that when the offset angle of the document image to be corrected is 0 °, it means that the current direction of the document image to be corrected is the standard direction without further rotation correction. When the offset angle of the document image to be corrected is 180 degrees, the document image to be corrected is rotated 180 degrees clockwise or anticlockwise, and the document image to be corrected can be corrected to the standard direction. When the offset angle of the document image to be corrected is 90 degrees, the document image to be corrected is rotated by 90 degrees anticlockwise or 270 degrees clockwise, and then the document image to be corrected can be corrected to the standard direction. When the offset angle of the document image to be corrected is 270 degrees, the document image to be corrected is rotated by 270 degrees anticlockwise or 90 degrees clockwise, and the document image to be corrected can be corrected to the standard direction.
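Under the clockwise-offset convention of this embodiment, the correcting rotation is simply the complement of the offset modulo 360°. The sketch below checks this on a tiny grid standing in for an image; the function names are hypothetical, and a real system would use an image library's rotation routine instead of nested lists.

```python
def rotate_cw(grid, degrees):
    """Rotate a 2D list clockwise by a multiple of 90 degrees."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees // 90) % 4):
        # Reversing the rows and transposing is one clockwise quarter turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def correction_cw(offset_cw):
    """Clockwise rotation that undoes a clockwise offset."""
    return (360 - offset_cw) % 360


def correct(grid, offset_cw):
    return rotate_cw(grid, correction_cw(offset_cw))


upright = [[1, 2, 3],
           [4, 5, 6]]

# Offset the grid by each angle, then confirm the correction restores it.
restored = {offset: correct(rotate_cw(upright, offset), offset)
            for offset in (0, 90, 180, 270)}
```

For a 90° offset the correction is 270° clockwise, which is the same as 90° counterclockwise, matching the description above.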
S203: When the maximum of the offset probabilities is less than or equal to the first threshold, recognize the text content of the document image to be corrected, judge whether the recognition result of the text content has semantic information, and correct the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result.
It will be appreciated that the offset probabilities output by the rotation model may be considered unreliable when their maximum is less than or equal to the first threshold. In this case, it is necessary to further recognize the text content of the document image to be corrected, judge whether the recognition result of the text content has semantic information, and correct the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result.
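Steps S201 to S203 can be tied together in a short control-flow sketch. Every callable passed in (`classify`, `rotate_cw`, `recognize`, `has_semantics`, `fallback`) is a hypothetical placeholder for a component the source describes only in prose, and the toy "image" is a dictionary that records its clockwise offset.

```python
def correct_orientation(image, classify, rotate_cw, recognize, has_semantics,
                        fallback, threshold=0.80):
    """Two-stage orientation correction; all callables are assumptions."""
    probs = classify(image)                        # S201: four offset probabilities
    angle = max(probs, key=probs.get)
    if probs[angle] > threshold:                   # S202: trust the classifier
        return rotate_cw(image, (360 - angle) % 360)
    text = recognize(image)                        # S203: fall back to OCR output
    if has_semantics(text):
        return image                               # already in the standard direction
    return fallback(image)                         # offset must be found another way


# Toy stand-ins so the sketch runs end to end.
def toy_rotate(image, degrees):
    return {"offset": (image["offset"] + degrees) % 360}

result = correct_orientation(
    {"offset": 180},
    classify=lambda im: {0: 0.06, 90: 0.02, 180: 0.90, 270: 0.02},
    rotate_cw=toy_rotate,
    recognize=lambda im: "",
    has_semantics=bool,
    fallback=lambda im: im,
)
```

Passing the components in as parameters keeps the sketch independent of any particular classifier or OCR engine.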
In an exemplary embodiment, the method for identifying the text content of the document image to be rectified specifically includes: and carrying out text detection on the document image to be corrected, and carrying out text recognition on the text detection result to obtain a recognition result of the text content of the document image to be corrected.
Text detection on the document image to be corrected means detecting the text regions in the document image and separating them from the background region in preparation for subsequent text recognition. Text detection typically involves the following steps: image preprocessing, in which the input image is denoised, converted to grayscale, binarized and so on, so that text regions can be detected more reliably; and text word (text line) detection, in which image processing and computer vision techniques are used to detect and segment the text in the image into text boxes. Commonly used text word detection algorithms include edge detection, region segmentation, template-based matching methods, and the like. Text detection may also be implemented using deep neural network models, such as CTPN (Connectionist Text Proposal Network), PSENet (Shape Robust Text Detection with Progressive Scale Expansion Network), and DBNet (Differentiable Binarization Network).
Text recognition refers to performing character segmentation and character recognition on the text lines found by text detection, so that the characters in a text image are converted into editable text. Text recognition typically includes the following steps: character segmentation, a key link of text recognition, in which different characters or pieces of text are separated for subsequent character recognition; character recognition, in which the segmented characters or text are recognized by extracting and classifying character features with techniques such as computer vision, machine learning and deep learning, so as to obtain higher recognition accuracy and efficiency; and post-processing, in which the recognition result is refined by methods such as error correction, spell checking and recognition result correction to obtain a more accurate text recognition result. Text recognition may be implemented using neural network models, such as CRNN (Convolutional Recurrent Neural Network) and RARE (Robust text recognizer with Automatic REctification).
In an exemplary embodiment, the method for correcting the document image to be corrected based on the semantic information judgment result of the recognition result specifically includes: and when the identification result has semantic information, determining the current direction of the document image to be corrected as the standard direction.
The semantic information refers to knowledge and information about things, concepts, relationships, and the like obtained by people through understanding language symbols. In natural language processing and computer science, semantic information generally refers to meaning information in natural language data in the form of text, voice, and the like, and is a high-level and abstract knowledge representation form.
When the text recognition result has semantic information, it means that the recognized text is coherent with its context, or that each character is meaningfully associated with its surrounding characters. In general, performing text detection and text recognition on a document image in the standard direction yields text content with semantic information, whereas performing them on a document image in any other offset direction yields text content without semantic information, in which rare characters or garbled characters with no association between them may appear. Therefore, whether the current direction of the document image is its standard direction can be judged by whether the text recognition result has semantic information.
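The source does not say how the presence of semantic information is computed. Purely as an illustrative stand-in, the sketch below treats text as meaningful when enough of its tokens appear in a known vocabulary. The function name, the vocabulary, the 0.6 ratio and the whitespace tokenization are all assumptions; a real system for Chinese documents would need a different tokenizer and most likely a trained classifier.

```python
def has_semantic_information(text, vocabulary, min_known_ratio=0.6):
    """Toy heuristic: share of tokens found in a vocabulary (an assumption)."""
    tokens = [token.lower().strip(".,;:!?") for token in text.split()]
    tokens = [token for token in tokens if token]
    if not tokens:
        return False  # empty OCR output carries no semantic information
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens) >= min_known_ratio


VOCAB = {"the", "patient", "signed", "consent", "form"}  # hypothetical vocabulary

upright_ok = has_semantic_information("The patient signed the consent form.", VOCAB)
garbled_ok = has_semantic_information("qx zv 7k lp", VOCAB)
```

In this toy run the readable sentence passes and the garbled string fails, which mirrors the upright-versus-rotated distinction described above.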
In an exemplary embodiment, the method for correcting the document image to be corrected based on the semantic information judgment result of the recognition result specifically includes: when the recognition result does not have semantic information, determining an offset angle of the document image to be corrected based on the results of the text detection and the text recognition, and correcting the document image to be corrected to the standard direction based on the offset angle.
Specifically, the method for determining the offset angle of the document image to be corrected based on the results of the text detection and the text recognition comprises the following steps: performing text detection on the document image to be corrected based on a transverse text detection model, and performing text recognition on the text detection result; and determining the offset angle of the text word in the document image to be corrected based on the text recognition result. Further, the method for determining the offset angle of the text word in the document image to be corrected based on the text recognition result specifically includes: determining, based on the text recognition result, the ratio of single-character text boxes to total text boxes in the text detection result; when the ratio is larger than a second threshold value, determining that the offset angle of the text word in the document image to be corrected is 90 degrees or 270 degrees; and when the ratio is smaller than or equal to the second threshold value, determining that the offset angle of the text word in the document image to be corrected is 180 degrees.
In this embodiment, the lateral text detection model for text detection is trained using lateral (i.e., text line lateral arrangement) text samples, and the second threshold is a standard threshold for determining the offset angle of the text word. When the offset angle of the document image is 90 ° or 270 °, most of the text detection results are single character text boxes. When the offset angle of the document image is 0 ° or 180 °, there are fewer single-character text boxes appearing in the result of text detection. Therefore, the offset angle of the text word in the document image can be judged according to the number ratio of the single-character text boxes to the total text boxes in the text detection result.
In this embodiment, the lateral direction refers to a direction parallel to the arrangement direction of the text lines in a document image when the offset angle of the document image is 0 ° or 180 °; the longitudinal direction refers to a direction parallel to the arrangement direction of the text lines in a document image when the offset angle of the document image is 90 ° or 270 °.
It is understood that when the recognition result of the text content does not have semantic information in the current direction of the document image to be corrected, the current direction of the document image to be corrected may be considered to be other than the standard direction (offset 0 °). At this time, there are three possible cases: the current direction of the document image to be corrected is offset by 90 °, 180 °, or 270 °. Therefore, when the ratio of single-character text boxes in the text detection result is smaller than or equal to the second threshold value, the offset angle of the text word is 180 degrees; when the ratio of single-character text boxes in the text detection result is larger than the second threshold value, the offset angle of the text word is 90 degrees or 270 degrees.
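The ratio rule of this embodiment can be sketched in Python as follows. This is only an illustrative sketch: the function name `estimate_offset_candidates`, the representation of each text box by its recognized string, and the default threshold of 0.5 are assumptions, since the specification does not give a concrete value for the second threshold.

```python
from typing import Sequence, Tuple


def estimate_offset_candidates(
    box_texts: Sequence[str],
    second_threshold: float = 0.5,  # hypothetical value, not from the source
) -> Tuple[int, ...]:
    """Return the candidate offset angles for an image without semantic text.

    box_texts holds the recognized string of each detected text box
    (an assumed representation of the detection and recognition results).
    """
    if not box_texts:
        raise ValueError("no text boxes were detected")
    # Count the boxes whose recognized content is a single character.
    single_char_boxes = sum(1 for text in box_texts if len(text.strip()) == 1)
    ratio = single_char_boxes / len(box_texts)
    if ratio > second_threshold:
        # Mostly single-character boxes: the text lines run vertically.
        return (90, 270)
    # Few single-character boxes: the text lines are horizontal but inverted.
    return (180,)
```

Returning both 90 and 270 as candidates reflects that the ratio alone cannot distinguish the two quarter-turn cases; a further check is needed to decide between them.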
Of course, in other embodiments, a text detection model with a transverse and a longitudinal text word detection function may be directly used to detect the arrangement direction of the text word in the document image to be corrected, so as to directly output the offset angle of the text word according to the arrangement direction of the text word. Such text detection models typically require training using both lateral and longitudinal text samples.
In an exemplary embodiment, the means for correcting the document image to be corrected based on the offset angle specifically includes: and when the offset angle of the document image to be corrected is 180 degrees, rotating the document image to be corrected by 180 degrees so as to correct the document image to be corrected to the standard direction.
It is understood that when the offset angle of the document image to be corrected is 180 °, the document image to be corrected can be corrected to its standard direction regardless of whether it is rotated 180 ° clockwise or counterclockwise. Thus, in designing the automatic correction mechanism, the rotation correction of the document image may be performed in either rotation direction (clockwise or counterclockwise).
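The equivalence of the two rotation directions for a 180 ° correction can be illustrated with a small sketch. Here an image is modelled as a list of pixel rows, and the helper names `rotate_90_cw`, `rotate_90_ccw` and `rotate_180` are hypothetical; a real implementation would more likely rely on an image library.

```python
from typing import List

Grid = List[List[int]]


def rotate_90_cw(grid: Grid) -> Grid:
    """Rotate a row-major pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rotate_90_ccw(grid: Grid) -> Grid:
    """Rotate a row-major pixel grid 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rotate_180(grid: Grid) -> Grid:
    """Rotate a pixel grid by 180 degrees (reverse the rows and each row)."""
    return [row[::-1] for row in grid[::-1]]
```

Applying either quarter-turn helper twice gives the same grid as `rotate_180`, which is why the choice of direction does not matter in the 180 ° case.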
In an exemplary embodiment, the means for correcting the document image to be corrected based on the offset angle specifically includes: when the offset angle of the document image to be corrected is 90 degrees or 270 degrees, rotating the document image to be corrected by 90 degrees; identifying the text content of the rotated document image to be corrected, and judging whether the identification result of the text content has semantic information or not; when the identification result has semantic information, determining that the current direction of the rotated document image to be corrected is the standard direction; and when the text recognition result does not have semantic information, rotating the rotated document image to be corrected by 180 degrees so as to correct the document image to be corrected to the standard direction.
It is understood that when the offset angle of the document image to be corrected is 90 ° or 270 °, the offset angle of the document image to be corrected after rotating by 90 ° may be 0 ° or 180 °. At this time, the text content of the rotated document image to be corrected is required to be identified, and whether the identification result of the text content has semantic information is determined. When the identification result has semantic information, the current offset angle of the rotated document image to be corrected can be determined to be 0 degrees, namely the current direction of the rotated document image to be corrected is the standard direction of the rotated document image to be corrected. When the identification result does not have semantic information, the current offset angle of the rotated document image to be corrected can be determined to be 180 degrees, and the rotated document image to be corrected is rotated 180 degrees again at the moment, so that the document image to be corrected can be corrected to the standard direction.
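The rotate-then-recheck procedure for the 90 ° or 270 ° case might be sketched as below. The callables `rotate` and `has_semantics` are placeholders for the rotation routine and the recognition-plus-semantic-judgment step, neither of which is given a concrete interface in the specification.

```python
from typing import Callable, TypeVar

Image = TypeVar("Image")


def correct_quarter_turn(
    image: Image,
    rotate: Callable[[Image, int], Image],   # hypothetical: rotate(image, degrees)
    has_semantics: Callable[[Image], bool],  # hypothetical: OCR + semantic check
) -> Image:
    """Correct an image whose offset angle is known to be 90 or 270 degrees."""
    # After one quarter turn the remaining offset is either 0 or 180 degrees.
    rotated = rotate(image, 90)
    if has_semantics(rotated):
        # Readable text: the image is already in the standard direction.
        return rotated
    # Still unreadable: the remaining offset is 180 degrees.
    return rotate(rotated, 180)
```

As a usage example, the image can be simulated by its offset angle alone, with `rotate = lambda offset, degrees: (offset + degrees) % 360` and `has_semantics = lambda offset: offset == 0`; both a starting offset of 90 and one of 270 then end at 0.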
In summary, the document image correction method provided by the invention obtains the offset probabilities that the document image to be corrected is offset by 0 °, 90 °, 180 ° and 270 ° relative to its standard direction, and corrects the document image to be corrected based on those offset probabilities, which can improve the OCR recognition accuracy of the document image. A document image whose offset angle has higher confidence can be rotated and corrected directly based on the offset probabilities; for a document image whose offset angle has lower confidence, text recognition and semantic classification are performed, and the offset angle of the document image is further determined from the recognition result, which can improve the correction accuracy of the document image.
Referring to fig. 3, based on the same inventive concept as the document image correction method described above, in one embodiment of the present invention, a document image correction apparatus 300 is provided, which includes an acquisition module 301, a first correction module 302, and a second correction module 303.
The acquisition module 301 is configured to acquire the offset probabilities that the document image to be corrected is offset by 0 °, 90 °, 180 °, and 270 ° respectively with respect to its standard direction. The first correction module 302 is configured to correct the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability when the maximum probability of the offset probabilities is greater than a first threshold. The second correction module 303 is configured to identify text content of the document image to be corrected when the maximum probability of the offset probabilities is less than or equal to the first threshold, determine whether the identification result of the text content has semantic information, and correct the document image to be corrected to the standard direction based on the semantic information determination result of the identification result.
Specifically, the acquisition module 301 may be configured to input a document image to be corrected into a rotation model, and classify image features of the document image to be corrected based on the rotation model; and, based on the classification result, output the offset probabilities that the document image to be corrected is offset by 0 °, 90 °, 180 ° and 270 ° respectively relative to its standard direction.
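One common way for a four-class classifier to produce such offset probabilities is a softmax over its raw output scores. The sketch below assumes this, which the specification does not state; the function name `offset_probabilities` and the idea that the rotation model emits four scores in the order 0 °, 90 °, 180 °, 270 ° are illustrative assumptions.

```python
import math
from typing import Dict, Sequence

ANGLES = (0, 90, 180, 270)


def offset_probabilities(scores: Sequence[float]) -> Dict[int, float]:
    """Map four raw classifier scores to offset probabilities via softmax."""
    if len(scores) != len(ANGLES):
        raise ValueError("expected one score per candidate offset angle")
    # Subtract the maximum score for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return {angle: value / total for angle, value in zip(ANGLES, exps)}
```

The resulting dictionary maps each candidate offset angle to a probability, and the four probabilities sum to 1.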
The identifying of the text content of the document image to be rectified specifically comprises the following steps: and carrying out text detection on the document image to be corrected, and carrying out text recognition on the text detection result to obtain a recognition result of the text content of the document image to be corrected.
The second correction module 303 may be configured to determine, when the recognition result has semantic information, that the current direction of the document image to be corrected is the standard direction. It may also be configured to determine, when the recognition result does not have semantic information, the offset angle of the document image to be corrected based on the results of the text detection and the text recognition, and to correct the document image to be corrected to the standard direction based on the offset angle.
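The division of work between the first and second correction modules can be sketched as a simple threshold dispatch. The function name `select_correction_path`, the string labels, and the default first threshold of 0.9 are assumptions made for illustration; the specification does not fix a value for the first threshold.

```python
from typing import Dict, Optional, Tuple


def select_correction_path(
    probs: Dict[int, float],
    first_threshold: float = 0.9,  # hypothetical value, not from the source
) -> Tuple[str, Optional[int]]:
    """Decide which correction module should handle the image.

    probs maps each candidate offset angle (0, 90, 180, 270) to its probability.
    """
    best_angle = max(probs, key=probs.get)
    if probs[best_angle] > first_threshold:
        # High confidence: correct directly using the most probable angle.
        return ("first_correction_module", best_angle)
    # Low confidence: fall back to text recognition and semantic judgment.
    return ("second_correction_module", None)
```

Note that a maximum probability exactly equal to the threshold is routed to the second module, matching the "less than or equal to" condition in the description.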
Referring to fig. 4, an embodiment of the present invention further provides an electronic device 400, where the electronic device 400 includes at least one processor 401, a storage 402 (e.g., a nonvolatile memory), a memory 403, and a communication interface 404, and the at least one processor 401, the storage 402, the memory 403, and the communication interface 404 are connected together via a bus 405. The at least one processor 401 is operative to invoke at least one program instruction stored or encoded in the storage 402 to cause the at least one processor 401 to perform the various operations and functions of the document image correction method described in various embodiments of the present specification.
In embodiments of the present description, electronic device 400 may include, but is not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile electronic devices, smart phones, tablet computers, cellular phones, Personal Digital Assistants (PDAs), handsets, messaging devices, wearable electronic devices, consumer electronic devices, and the like.
Embodiments of the present invention also provide a computer-readable medium having computer-executable instructions carried thereon, which when executed by a processor, may be used to implement the various operations and functions of the document image correction method described in the various embodiments of the present specification.
The computer readable medium in the present invention may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, systems, and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing descriptions of specific exemplary embodiments of the present invention are presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain the specific principles of the invention and its practical application to thereby enable one skilled in the art to make and utilize the invention in various exemplary embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims (12)

1. A document image correction method, comprising:
acquiring offset probabilities of offset 0 degrees, 90 degrees, 180 degrees and 270 degrees of a document image to be corrected relative to a standard direction of the document image to be corrected;
when the maximum probability in the offset probabilities is larger than a first threshold value, correcting the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability;
and when the maximum probability in the offset probabilities is smaller than or equal to the first threshold value, identifying the text content of the document image to be corrected, judging whether the identification result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the identification result.
2. The document image correction method according to claim 1, wherein correcting the document image to be corrected based on the semantic information judgment result of the recognition result includes:
and when the identification result has semantic information, determining the current direction of the document image to be corrected as the standard direction.
3. The document image correction method according to claim 1, wherein recognizing text content of the document image to be corrected, judging whether the recognition result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result, comprises:
performing text detection on the document image to be corrected, and performing text recognition on the text detection result to obtain a recognition result of text content of the document image to be corrected;
and when the identification result does not have semantic information, determining the offset angle of the document image to be corrected based on a text detection model with a transverse text word detection function and a longitudinal text word detection function, and correcting the document image to be corrected to the standard direction based on the offset angle.
4. The document image correction method according to claim 1, wherein recognizing text content of the document image to be corrected, judging whether the recognition result of the text content has semantic information, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the recognition result, comprises:
performing text detection on the document image to be corrected, and performing text recognition on the text detection result to obtain a recognition result of text content of the document image to be corrected;
and when the recognition result does not have semantic information, determining an offset angle of the document image to be corrected based on the results of the text detection and the text recognition, and correcting the document image to be corrected to the standard direction based on the offset angle.
5. The document image correction method according to claim 4, wherein determining the offset angle of the document image to be corrected based on the results of the text detection and the text recognition includes:
text detection is carried out on the document image to be corrected based on a transverse text detection model, and text recognition is carried out on the text detection result;
and determining the offset angle of the text word in the document image to be corrected based on the text recognition result.
6. The document image correction method according to claim 5, wherein determining the offset angle of the text word in the document image to be corrected based on the result of the text recognition includes:
determining the ratio of a single-character text box to a total text box in the text detection result based on the text recognition result;
when the ratio is larger than a second threshold value, determining that the offset angle of the text word in the document image to be corrected is 90 degrees or 270 degrees;
and when the ratio is smaller than or equal to the second threshold value, determining that the offset angle of the text word in the document image to be corrected is 180 degrees.
7. The document image correction method according to claim 3 or 6, wherein correcting the document image to be corrected based on the offset angle includes:
and when the offset angle of the document image to be corrected is 180 degrees, rotating the document image to be corrected by 180 degrees so as to correct the document image to be corrected to the standard direction.
8. The document image correction method according to claim 3 or 6, wherein correcting the document image to be corrected based on the offset angle includes:
when the offset angle of the document image to be corrected is 90 degrees or 270 degrees, rotating the document image to be corrected by 90 degrees;
identifying the text content of the rotated document image to be corrected, and judging whether the identification result of the text content has semantic information or not;
when the identification result has semantic information, determining that the current direction of the rotated document image to be corrected is the standard direction;
and when the identification result does not have semantic information, rotating the rotated document image to be corrected by 180 degrees so as to correct the document image to be corrected to the standard direction.
9. The document image correction method according to claim 1, wherein acquiring offset probabilities of the document image to be corrected offset by 0 °, 90 °, 180 °, and 270 ° with respect to a standard direction thereof, respectively, comprises:
inputting a document image to be corrected into a rotation model, and classifying image features of the document image to be corrected based on the rotation model;
based on the classification result, outputting the offset probabilities that the document image to be corrected is offset by 0 °, 90 °, 180 ° and 270 ° respectively relative to its standard direction.
10. A document image correction apparatus, comprising:
the acquisition module is used for acquiring the offset probabilities that the document image to be corrected is offset by 0 °, 90 °, 180 ° and 270 ° respectively relative to its standard direction;
the first correction module is used for correcting the document image to be corrected to the standard direction based on the offset angle corresponding to the maximum probability when the maximum probability in the offset probabilities is larger than a first threshold value;
and the second correction module is used for identifying the text content of the document image to be corrected when the maximum probability in the offset probabilities is smaller than or equal to the first threshold value, judging whether the identification result of the text content has semantic information or not, and correcting the document image to be corrected to the standard direction based on the semantic information judgment result of the identification result.
11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the document image correction method of any one of claims 1 to 9 when the program is executed by the processor.
12. A computer readable medium having computer executable instructions carried thereon, which when executed by a processor is adapted to carry out the document image correction method according to any one of claims 1 to 9.
CN202310220642.1A 2023-03-08 2023-03-08 Document image correction method, device, electronic equipment and readable medium Pending CN116311276A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310220642.1A CN116311276A (en) 2023-03-08 2023-03-08 Document image correction method, device, electronic equipment and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310220642.1A CN116311276A (en) 2023-03-08 2023-03-08 Document image correction method, device, electronic equipment and readable medium

Publications (1)

Publication Number Publication Date
CN116311276A true CN116311276A (en) 2023-06-23

Family

ID=86791965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310220642.1A Pending CN116311276A (en) 2023-03-08 2023-03-08 Document image correction method, device, electronic equipment and readable medium

Country Status (1)

Country Link
CN (1) CN116311276A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740740A (en) * 2023-08-11 2023-09-12 浙江太美医疗科技股份有限公司 Method for judging same-line text, method for ordering documents and application thereof
CN116740740B (en) * 2023-08-11 2023-11-21 浙江太美医疗科技股份有限公司 Method for judging same-line text, method for ordering documents and application thereof

Similar Documents

Publication Publication Date Title
US10817741B2 (en) Word segmentation system, method and device
US8867828B2 (en) Text region detection system and method
CN110569341B (en) Method and device for configuring chat robot, computer equipment and storage medium
CN111488826A (en) Text recognition method and device, electronic equipment and storage medium
CN111476067A (en) Character recognition method and device for image, electronic equipment and readable storage medium
US20130108160A1 (en) Character recognition device, character recognition method, character recognition system, and character recognition program
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
CN112861842A (en) Case text recognition method based on OCR and electronic equipment
CN115546488B (en) Information segmentation method, information extraction method and training method of information segmentation model
CN113221735A (en) Multimodal-based scanned part paragraph structure restoration method and device and related equipment
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium
Akinbade et al. An adaptive thresholding algorithm-based optical character recognition system for information extraction in complex images
CN116089648A (en) File management system and method based on artificial intelligence
CN110796210A (en) Method and device for identifying label information
CN114419636A (en) Text recognition method, device, equipment and storage medium
US9378428B2 (en) Incomplete patterns
CN115984886A (en) Table information extraction method, device, equipment and storage medium
CN116110066A (en) Information extraction method, device and equipment of bill text and storage medium
CN113221718B (en) Formula identification method, device, storage medium and electronic equipment
CN112149523B (en) Method and device for identifying and extracting pictures based on deep learning and parallel-searching algorithm
CN115294593A (en) Image information extraction method and device, computer equipment and storage medium
CN113496115B (en) File content comparison method and device
CN113807343A (en) Character recognition method and device, computer equipment and storage medium
US20170262726A1 (en) Tex line detection
CN116469106A (en) Document image correction method, device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination