CN112434640B - Method, device and storage medium for determining rotation angle of document image - Google Patents

Publication number
CN112434640B
Authority
CN
China
Prior art keywords
text line
text
determining
category
document image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011410416.2A
Other languages
Chinese (zh)
Other versions
CN112434640A (en)
Inventor
刘坚强
彭鑫
周代国
吴鹏杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Original Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Xiaomi Technology Wuhan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd, Xiaomi Technology Wuhan Co Ltd
Priority to CN202011410416.2A
Publication of CN112434640A
Application granted
Publication of CN112434640B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The present disclosure relates to a method, an apparatus, and a storage medium for determining a rotation angle of a document image. The method includes: detecting and cropping text lines included in the document image to obtain a plurality of text line images; determining angles of the plurality of text line images; determining a reference angle of the document image based on the angles of the text line images; and determining the rotation angle of the document image based on the reference angle. Because the rotation angle is determined from the text line images cropped from the document image and their corresponding angles, interference from the background of the document image is eliminated, the complexity of determining the rotation angle is reduced, and the accuracy of the determined rotation angle is improved.

Description

Method, device and storage medium for determining rotation angle of document image
Technical Field
The disclosure relates to the technical field of computer image processing, in particular to a method, a device and a storage medium for determining a rotation angle of a document image.
Background
With the rapid development of portable photographic equipment, people can conveniently take high-quality photos with a terminal. Besides recording moments of daily life, a terminal can photograph a document to obtain a digital copy, and the text in the photo can then be extracted and recognized through optical character recognition (OCR) so that important information can be recorded and shared. In real life, however, photographed document images generally have a certain rotation angle, and when the angle is large (e.g., 90°, 180°, 270°), the text recognition result is greatly affected. If the rotation angle of the document image can be predicted and the image rotated accordingly, the accuracy of OCR can be greatly improved.
In the related art, the rotation angle of a document image is determined mainly in two ways: predicting the rotation angle with a convolutional neural network (CNN), and calculating the rotation angle from character structure features. In the CNN-based approach, the quadrant direction of the document image is predicted first, the document image is rotated into a specified angle range, and an OCR network then predicts the rotation angle of the rotated image. This approach mainly targets cases where the document area occupies the main part of the image. In real life, however, the document area may occupy only a small part of the captured image and the background may be complex. Network inference must then be performed on the whole document image, which is time-consuming, and the accuracy of the prediction cannot be guaranteed. The approach based on character structure features depends mainly on the accuracy of detecting the direction of each text line and of extracting Chinese character stroke features. When the document image contains a large number of text lines, the running time of the algorithm increases linearly, which affects the user experience. Moreover, real-life scenes are varied and extremely complex, so this approach cannot guarantee accuracy in practical application scenarios, and the algorithm has poor robustness.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a method, apparatus, and storage medium for determining a rotation angle of a document image.
According to a first aspect of embodiments of the present disclosure, there is provided a method of determining a rotation angle of a document image, including: detecting and cropping text lines included in the document image to obtain a plurality of text line images; determining angles of the plurality of text line images, where the angle of a text line image is the included angle between the first edge of the text line in the text line image and the horizontal direction; determining a reference angle of the document image based on the angles of the text line images; and determining the rotation angle of the document image based on the reference angle.
In one embodiment, detecting and cropping the text lines included in the document image to obtain a plurality of text line images includes: detecting the plurality of text lines included in the document image to obtain a plurality of text detection boxes, and determining the overlapping degree of the text line corresponding to each text detection box; cropping text lines whose overlapping degree is equal to a preset overlapping degree threshold with a straight text line detection algorithm to obtain text line images; and cropping text lines whose overlapping degree is larger than the preset overlapping degree threshold with a curved text line detection algorithm to obtain text line images.
In another embodiment, cropping with the curved text line detection algorithm to obtain one or more text line images includes: dividing the edge point set of the text detection box into an upper edge point set and a lower edge point set; performing curve fitting on the edge points in the upper edge point set to obtain an upper edge curve, and on the edge points in the lower edge point set to obtain a lower edge curve; determining, based on the upper edge curve and the lower edge curve, the coordinates of each center point within the text line width range corresponding to the text detection box; determining the cropping height of the text detection box based on the coordinates of the center points within the width range; cropping the text detection box at a specified width, based on the width range and the cropping height, to obtain a plurality of rectangular images; and stitching the rectangular images along the horizontal direction to obtain a text line image.
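The geometry of the curved text line cropping steps above can be illustrated with a minimal sketch. It is not the disclosed implementation: piecewise-linear interpolation stands in for the curve fitting step, the cropping height is assumed to be the largest gap between the two curves at the strip centers, image coordinates are assumed (y grows downward, so the upper edge has the smaller y), and the names `interp` and `strip_rectangles` are hypothetical. Edge points are assumed to have distinct x coordinates.

```python
def interp(points, x):
    """Piecewise-linear interpolation through (x, y) points.

    Stand-in for the curve fitting step; assumes distinct x coordinates.
    """
    pts = sorted(points)
    if x <= pts[0][0]:
        return pts[0][1]
    for (xa, ya), (xb, yb) in zip(pts, pts[1:]):
        if xa <= x <= xb:
            t = (x - xa) / (xb - xa)
            return ya + t * (yb - ya)
    return pts[-1][1]


def strip_rectangles(upper, lower, strip_width):
    """Return one (x0, y_top, x1, y_bottom) rectangle per vertical strip."""
    # Width range shared by the upper and lower edge curves.
    x_min = max(min(p[0] for p in upper), min(p[0] for p in lower))
    x_max = min(max(p[0] for p in upper), max(p[0] for p in lower))

    # Left boundary of each strip of the specified width.
    starts = []
    x = x_min
    while x < x_max:
        starts.append(x)
        x += strip_width

    # Center point of each strip: midway between the two curves.
    centers = []
    gaps = []
    for x0 in starts:
        xc = min(x0 + strip_width / 2, x_max)
        y_up, y_low = interp(upper, xc), interp(lower, xc)
        centers.append((y_up + y_low) / 2)
        gaps.append(y_low - y_up)

    # Assumed cropping height: the largest gap between the curves.
    height = max(gaps)
    return [
        (x0, yc - height / 2, min(x0 + strip_width, x_max), yc + height / 2)
        for x0, yc in zip(starts, centers)
    ]
```

Cropping each returned rectangle from the document image and concatenating the crops from left to right would then give the straightened text line image.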
In yet another embodiment, determining the reference angle of the document image based on the angles of the text line images includes: clustering the text line images based on their angles to obtain clustered text line categories, and determining the number of text line images in each clustered text line category; determining a reference class according to the number of text line images in each clustered text line category; and performing category judgment on the reference class to obtain the category corresponding to the reference class, and determining the reference angle of the document image based on that category.
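As an illustration of the clustering step only, the following sketch groups text line images by angle. The clustering algorithm is not specified in this passage, so a simple gap-based grouping of sorted angles is used as a stand-in; the function name `cluster_by_angle` and the `tolerance` parameter are hypothetical.

```python
def cluster_by_angle(angles, tolerance=5.0):
    """Group indices of text line images with similar angles.

    Stand-in clustering (an assumption, not the disclosed algorithm):
    angles are sorted, and a new cluster starts whenever the gap to the
    previous angle exceeds `tolerance` degrees.
    """
    order = sorted(range(len(angles)), key=lambda i: angles[i])
    clusters = []
    for i in order:
        if clusters and angles[i] - angles[clusters[-1][-1]] <= tolerance:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters
```

The number of text line images in each clustered category is then simply the length of each returned index list.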
In yet another embodiment, determining the reference class according to the number of text line images in each clustered text line category includes: selecting a first number of text line categories in descending order of the number of text line images in each clustered text line category; and, if the difference between the numbers of text line images in the first number of text line categories is larger than a first number threshold, determining the text line category with the largest number of text line images among them as the reference class.
In still another embodiment, the method of determining a rotation angle of a document image further includes: if the difference between the numbers of text line images in the first number of text line categories is smaller than or equal to the first number threshold, selecting, from the first number of text line categories, the text line category whose text line images have the largest average area as the reference class.
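The selection rule in the two embodiments above can be sketched as follows. This is a hedged illustration: each clustered category is represented as a non-empty list of `(angle, area)` pairs, the "difference" is assumed to be the gap between the largest and smallest counts among the selected categories, and the name `pick_reference_class` and the default values are hypothetical.

```python
def pick_reference_class(clusters, first_number=2, first_threshold=3):
    """Choose the reference class from clustered text line categories.

    `clusters` is a list of non-empty lists of (angle_deg, area) pairs
    (an assumed representation).
    """
    # Keep the `first_number` categories with the most text line images.
    top = sorted(clusters, key=len, reverse=True)[:first_number]
    counts = [len(c) for c in top]

    # Clear winner by count: take the most populated category.
    if max(counts) - min(counts) > first_threshold:
        return top[0]

    # Otherwise prefer the category whose images have the largest average area.
    return max(top, key=lambda c: sum(area for _, area in c) / len(c))
```

The area tie-break favors the category whose text lines are visually dominant when the counts alone do not separate the candidates.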
In yet another embodiment, determining the reference class according to the number of text line images in each clustered text line category includes: selecting a first number of text line categories in descending order of the number of text line images in each clustered text line category; and determining, among the first number of text line categories, the text line category whose text line images have the largest average area as the reference class.
In yet another embodiment, performing category judgment on the reference class to obtain the category corresponding to the reference class includes: acquiring a second number of text line images from the text line images corresponding to the reference class; determining the category of each of the second number of text line images; and judging the category of the reference class based on the categories of the second number of text line images to obtain the category corresponding to the reference class. The category of a text line image is one of: forward horizontal text line, reverse horizontal text line, forward vertical text line, or reverse vertical text line.
In yet another embodiment, determining the categories of the second number of text line images includes: inputting the second number of text line images in the reference class into a trained text line image classification model to obtain the categories of the second number of text line images in the reference class.
In yet another embodiment, judging the category of the reference class based on the categories of the second number of text line images to obtain the category corresponding to the reference class includes: determining, among the second number of text line images, the category of each text line image and the number of text line images corresponding to each category; and, when the number of text line images in the most frequent category is larger than a second number threshold, taking the most frequent category as the category corresponding to the reference class.
In still another embodiment, the method for determining a rotation angle of a document image further includes: when the number of text line images in the most frequent category is smaller than or equal to the second number threshold, taking a default category as the category corresponding to the reference class.
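The voting rule in the two embodiments above can be sketched as follows, assuming the per-image categories have already been predicted by the classification model. The category strings, the function name `vote_category`, and the choice of forward horizontal as the default category are illustrative assumptions; this passage does not state which category is the default.

```python
from collections import Counter

# Illustrative labels for the four text line categories.
CATEGORIES = (
    "forward_horizontal",
    "reverse_horizontal",
    "forward_vertical",
    "reverse_vertical",
)


def vote_category(predicted, second_threshold, default="forward_horizontal"):
    """Return the category of the reference class by majority vote.

    `predicted` holds one predicted category per sampled text line image.
    The most frequent category wins only if its count exceeds
    `second_threshold`; otherwise the (assumed) default is returned.
    """
    if not predicted:
        return default
    category, count = Counter(predicted).most_common(1)[0]
    return category if count > second_threshold else default
```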
In still another embodiment, determining the rotation angle of the document image based on the reference angle includes: determining the average angle of the reference class according to the angle of each text line image in the reference class; and determining the rotation angle of the document image based on the category corresponding to the reference class and the average angle.
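One way the category and the average angle could be combined is sketched below. The mapping from category to a coarse offset (0°, 180°, 90°, 270°) and the sign convention are assumptions made purely for illustration; this passage does not specify how the two quantities are combined. The names `CATEGORY_OFFSET` and `rotation_angle` are hypothetical.

```python
# Assumed coarse offset, in degrees, for each text line category.
CATEGORY_OFFSET = {
    "forward_horizontal": 0.0,
    "reverse_horizontal": 180.0,
    "forward_vertical": 90.0,
    "reverse_vertical": 270.0,
}


def rotation_angle(angles, category):
    """Combine the average angle of the reference class with its category.

    `angles` are the angles (degrees) of the text line images in the
    reference class. The result is normalised to [0, 360).
    """
    average = sum(angles) / len(angles)
    return (CATEGORY_OFFSET[category] + average) % 360.0
```

For example, a reference class with an average angle of 15° whose category is reverse horizontal would, under these assumptions, give a rotation angle of 195°.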
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for determining a rotation angle of a document image, including: a clipping unit configured to detect and crop text lines included in the document image to obtain a plurality of text line images, and to determine angles of the plurality of text line images, where the angle of a text line image is the included angle between the first edge of the text line in the text line image and the horizontal direction; a screening unit configured to determine a reference angle of the document image based on the angles of the text line images; and a determining unit configured to determine the rotation angle of the document image based on the reference angle.
In an embodiment, the clipping unit detects and crops the text lines included in the document image to obtain a plurality of text line images in the following manner: detecting the plurality of text lines included in the document image to obtain a plurality of text detection boxes, and determining the overlapping degree of the text line corresponding to each text detection box; cropping text lines whose overlapping degree is equal to a preset overlapping degree threshold with a straight text line detection algorithm to obtain text line images; and cropping text lines whose overlapping degree is larger than the preset overlapping degree threshold with a curved text line detection algorithm to obtain text line images.
In another embodiment, the clipping unit crops with the curved text line detection algorithm to obtain one or more text line images in the following manner: dividing the edge point set of the text detection box into an upper edge point set and a lower edge point set; performing curve fitting on the edge points in the upper edge point set to obtain an upper edge curve, and on the edge points in the lower edge point set to obtain a lower edge curve; determining, based on the upper edge curve and the lower edge curve, the coordinates of each center point within the text line width range corresponding to the text detection box; determining the cropping height of the text detection box based on the coordinates of the center points within the width range; cropping the text detection box at a specified width, based on the width range and the cropping height, to obtain a plurality of rectangular images; and stitching the rectangular images along the horizontal direction to obtain a text line image.
In yet another embodiment, the screening unit determines the reference angle of the document image based on the angles of the plurality of text line images in the following manner: clustering the text line images based on their angles to obtain clustered text line categories, and determining the number of text line images in each clustered text line category; determining a reference class according to the number of text line images in each clustered text line category; and performing category judgment on the reference class to obtain the category corresponding to the reference class, and determining the reference angle of the document image based on that category.
In yet another embodiment, the screening unit determines the reference class according to the number of text line images in each clustered text line category in the following manner: selecting a first number of text line categories in descending order of the number of text line images in each clustered text line category; and, if the difference between the numbers of text line images in the first number of text line categories is larger than a first number threshold, determining the text line category with the largest number of text line images among them as the reference class.
In a further embodiment, the screening unit is further configured to: if the difference between the numbers of text line images in the first number of text line categories is smaller than or equal to the first number threshold, select, from the first number of text line categories, the text line category whose text line images have the largest average area as the reference class.
In yet another embodiment, the screening unit determines the reference class according to the number of text line images in each clustered text line category in the following manner: selecting a first number of text line categories in descending order of the number of text line images in each clustered text line category; and determining, among the first number of text line categories, the text line category whose text line images have the largest average area as the reference class.
In yet another embodiment, the determining unit performs category judgment on the reference class to obtain the category corresponding to the reference class in the following manner: acquiring a second number of text line images from the text line images corresponding to the reference class; determining the category of each of the second number of text line images; and judging the category of the reference class based on the categories of the second number of text line images to obtain the category corresponding to the reference class. The category of a text line image is one of: forward horizontal text line, reverse horizontal text line, forward vertical text line, or reverse vertical text line.
In a further embodiment, the determining unit determines the categories of the second number of text line images in the reference class in the following manner: inputting the second number of text line images in the reference class into a trained text line image classification model to obtain the categories of the second number of text line images in the reference class.
In yet another embodiment, the determining unit judges the category of the reference class based on the categories of the second number of text line images to obtain the category corresponding to the reference class in the following manner: determining, among the second number of text line images, the category of each text line image and the number of text line images corresponding to each category; and, when the number of text line images in the most frequent category is larger than a second number threshold, taking the most frequent category as the category corresponding to the reference class.
In a further embodiment, the determining unit is further configured to: when the number of text line images in the most frequent category is smaller than or equal to the second number threshold, take a default category as the category corresponding to the reference class.
In still another embodiment, the determining unit determines the rotation angle of the document image based on the reference angle in the following manner: determining the average angle of the reference class according to the angle of each text line image in the reference class; and determining the rotation angle of the document image based on the category corresponding to the reference class and the average angle.
According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus for determining a rotation angle of a document image, including: a memory for storing instructions; and a processor for calling the instructions stored in the memory to execute any one of the above methods for determining the rotation angle of the document image.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein instructions which, when executed by a processor, perform any one of the above-described methods of determining a rotation angle of a document image.
The technical solutions provided by embodiments of the present disclosure may include the following beneficial effects: the rotation angle of the document image is determined based on the plurality of text line images obtained by cropping the document image and the angles corresponding to the text line images, so that interference from the background of the document image is eliminated, the complexity of determining the rotation angle is reduced, and the accuracy and precision of the determined rotation angle are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a method of determining a rotation angle of a document image according to an exemplary embodiment.
Fig. 2 is a schematic diagram of a text line image, according to an exemplary embodiment.
Fig. 3 is a schematic diagram of a text line image, according to an exemplary embodiment.
Fig. 4 is a schematic diagram illustrating cropping according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating another method of determining a rotation angle of a document image according to an exemplary embodiment.
Fig. 6 is a schematic diagram of a text line image, according to an exemplary embodiment.
Fig. 7 is a schematic diagram of yet another text line image shown in accordance with an exemplary embodiment.
Fig. 8 is a schematic diagram of yet another text line image shown in accordance with an exemplary embodiment.
Fig. 9 is a flowchart illustrating still another method of determining a rotation angle of a document image according to an exemplary embodiment.
Fig. 10 is a block diagram illustrating a rotation angle determining apparatus of a document image according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In the related art, the rotation angle of a document image is determined mainly in two ways: predicting the rotation angle with a convolutional neural network (CNN), and calculating the rotation angle from character structure features. In the CNN-based approach, a CNN-based classification model first predicts the quadrant direction of the document image, which determines the angle quadrant in which the document image lies. The classification model has four categories: 0°, 90°, 180°, and 270°. The document image is then rotated into the range (-45°, 45°), and an OCR network predicts the rotation angle of the rotated document image. The rotation angle of the document image is determined from the angle quadrant together with the rotation angle predicted after this rotation. However, this approach requires network inference on the whole document image, which is time-consuming. In practical applications, the background of the document image may be complex and the text may occupy only a small area of the image, and the training data of the neural network cannot cover all scenes, so the robustness of the neural network is poor. If the document image contains a large number of both horizontal and vertical text lines, the accuracy of the classification model drops rapidly, which seriously affects the accuracy of the predicted rotation angle.
In the approach based on character structure features, the direction of each text line is detected from the projection features of the document image and the positional relationship between adjacent Chinese characters in the text line. The layout information of the text lines is determined from the direction of the text lines and the aspect ratio of each character in them, and whether the current document image is inverted is then determined from extracted Chinese character stroke features, which completes the calculation of the rotation angle. This approach depends mainly on the accuracy of text line direction detection and of Chinese character stroke feature extraction. Direction detection and stroke feature extraction must be performed on all text lines in the document image, so when the document image contains many text lines, the running time of the algorithm increases linearly, which affects the user experience. Moreover, real-life scenes are varied and extremely complex, so this approach cannot guarantee accuracy in practical application scenarios, and the algorithm has poor robustness.
In view of this, embodiments of the present disclosure provide a method for determining a rotation angle of a document image. The method crops the document image based on text line detection to obtain a plurality of text line images and the angle corresponding to each text line image, and determines the rotation angle of the document image based on those angles. Background interference outside the text lines is avoided during this process, which reduces the amount of information needed to determine the rotation angle. In addition, the determination of the rotation angle of the document image is not limited by the angle of the text line images, which helps improve the accuracy of the determined rotation angle.
Fig. 1 is a flowchart illustrating a method of determining a rotation angle of a document image according to an exemplary embodiment. As shown in fig. 1, the method of determining the rotation angle of the document image includes the following steps S11 to S13.
In step S11, text lines included in the document image are detected and cropped, resulting in a plurality of text line images.
In the embodiments of the present disclosure, the document image is an image containing text. To determine the position of each text line in the document image, text line detection may be performed on the document image to obtain a plurality of text detection boxes. The document image can then be cropped according to the positions of the text detection boxes to obtain a plurality of text line images. Cropping the document image according to the text detection boxes eliminates background interference other than text in the document image and thus reduces the complexity of determining the rotation angle of the document image.
In step S12, angles of a plurality of text line images are determined.
In the embodiment of the present disclosure, in order to facilitate determination of the rotation angle of the document image, it may be determined according to the inclination angle corresponding to each text line. When determining the angles of the text line images, the first edge of the text line image can be overlapped with the horizontal direction, and then the included angle formed between the first edge of the text line in the text line image and the horizontal direction is the angle of the text line image. Wherein the first edge of the text line image may be a relatively longer one of the plurality of edges of the text line image. Further, the method has more convincing and accurate performance when determining the angle of the text line image.
In one example, when a text line image is cropped, its first edge may be cropped along the horizontal direction so that the first edge of the text line image coincides with the horizontal direction. When determining the angles of the text line images, the angle between the text line in each text line image and the first edge of that text line image can then be determined. That is, the angle formed between a text line and the first edge of the corresponding text line image is taken as the angle of the text line image. For example, if the included angle formed between the text line in a text line image and the first edge of the text line image is 15 degrees, then 15 degrees is the angle corresponding to that text line image. The document image can therefore be rotated in a targeted manner.
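The angle definition above can be illustrated with a minimal sketch, assuming each text line is represented by the four corners of a rectangle listed in order. The function name `text_line_angle`, the image-style coordinate convention, and the normalization to the interval [-90, 90) are assumptions made for this illustration and are not taken from the disclosure.

```python
import math


def text_line_angle(corners):
    """Angle in degrees between the longer rectangle edge and the horizontal.

    corners: four (x, y) points listed in order around the rectangle.
    The result is normalized to the interval [-90, 90) (an assumed convention).
    """
    (x0, y0), (x1, y1), (x2, y2) = corners[0], corners[1], corners[2]
    edge_a = (x1 - x0, y1 - y0)
    edge_b = (x2 - x1, y2 - y1)
    # The "first edge" is taken to be the longer of the two adjacent edges.
    dx, dy = max(edge_a, edge_b, key=lambda e: math.hypot(e[0], e[1]))
    angle = math.degrees(math.atan2(dy, dx))
    # An edge has no direction, so fold the angle into [-90, 90).
    if angle >= 90:
        angle -= 180
    elif angle < -90:
        angle += 180
    return angle
```

With this convention a horizontal box yields 0 and a vertical box yields -90, which matches the horizontal and vertical text line angles used later in the description.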
In step S13, a reference angle of the document image is determined based on angles of the text line images.
In the embodiment of the present disclosure, the reference angle is an angle used for determining the rotation angle of the document image. The angles corresponding to the respective text line images in the document image may differ, whereas the document image must be rotated by a single angle. Therefore, to determine the rotation angle required for the document image, the reference angle of the document image may be determined from among the angles corresponding to the plurality of text line images in the document image, and the document image may then be rotated with reference to the reference angle. In one example, the reference angle may be determined based on the number of text line images corresponding to the same angle: among the angles corresponding to the respective text line images, the angle shared by the largest number of text line images may be determined as the reference angle. When OCR then performs character recognition on the document image rotated based on the reference angle, the inclination angles of a plurality of text lines in the rotated document image can meet the character recognition requirement of the OCR, which helps improve the character recognition accuracy of the OCR. In another example, the reference angle may be determined from an angle range: a plurality of angle ranges of the document image are determined according to the angles corresponding to the text line images, and the reference angle is determined based on the angle range containing the largest number of text line images. In this way, when the text line images in the document image correspond to different angles, rotating the document image by the reference angle determined from the angle range allows the inclination angles of a plurality of text lines in the rotated document image to approach the same direction.
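The two examples above can be sketched as follows. This is only an illustration: the function names, the fixed `bin_width` used to form angle ranges, and the choice of the mean of the fullest range as the reference angle are assumptions, since the disclosure does not specify how the angle ranges are formed or how an angle is derived from the selected range.

```python
from collections import Counter


def reference_angle_by_mode(angles):
    """Return the angle shared by the largest number of text line images."""
    return Counter(angles).most_common(1)[0][0]


def reference_angle_by_range(angles, bin_width=10.0):
    """Return the mean angle of the most populated angle range.

    Angle ranges are fixed-width bins [k * bin_width, (k + 1) * bin_width).
    """
    bins = {}
    for a in angles:
        bins.setdefault(int(a // bin_width), []).append(a)
    fullest = max(bins.values(), key=len)
    return sum(fullest) / len(fullest)
```

For instance, `reference_angle_by_mode([15, 15, 0, 15, -90])` selects 15 because three text line images share that angle.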
In step S14, the rotation angle of the document image is determined based on the reference angle.
In the embodiment of the disclosure, the current inclination angle of the document image relative to the horizontal direction is determined according to the determined reference angle. The rotation angle of the document image is therefore not restricted to the angle of an individual text line image, which helps improve the accuracy with which the rotation angle of the document image is determined.
With the above embodiment, cutting out the text line images eliminates interference from the parts of the document image other than text, so that the rotation angle of the document image can be determined more accurately once the reference angle of the document image is determined.
In one embodiment, the document image may contain one or more types of text lines. Types of text lines include straight text lines and curved text lines. To ensure the integrity of the text lines in the text line images, the type of the text line in the current text detection box may be determined before cutting, and the text line may then be cut in a manner suited to its type. The type of a text line may be determined based on the text line overlap degree. Specifically, text line detection yields the area of the text line region in each text detection box and the area of the minimum area rectangle corresponding to the text line. The text line overlap degree of the text line is the ratio of the area of the text line region to the area of the minimum area rectangle. If the text line overlap degree is equal to a preset overlap degree threshold, the corresponding text line is determined to be a straight text line, and the text line image is cut out based on a straight text line clipping algorithm. If the text line overlap degree is smaller than the preset overlap degree threshold, the corresponding text line is determined to be a curved text line, and the text line image is cut out based on a curved text line clipping algorithm. Here, the text line region is the region obtained along the edge of the text line, and the minimum area rectangle corresponding to the text line is the rectangle of minimum area that contains all the text of the text line. For example, Fig. 2 is a schematic diagram of a text line in a text line image. The text line is straight, and the texts in it have the same size and arrangement direction, so the text line region obtained based on the text line edge and the minimum area rectangle corresponding to the text line are the same region. Fig. 3 is a schematic diagram of a curved text line in a text line image. The size, the arrangement direction, or both of the texts in the text line may differ, so the text line region corresponding to the text line is the region labeled 1, the minimum area rectangle corresponding to the text line is the region labeled 2, and the minimum area rectangle of the curved text line is larger than the text line region. Therefore, the preset overlap degree threshold may be set to 1 and compared with the text line overlap degree: if the text line overlap degree is equal to 1, the text line is straight; if it is less than 1, the text line is curved.
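The overlap degree test can be sketched as below, assuming the text line region is available as a polygon and the width and height of the minimum area rectangle have already been computed elsewhere. The function names and the small tolerance `tol` (added because floating-point areas rarely compare exactly equal to the threshold) are assumptions for illustration, not part of the disclosure.

```python
def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def classify_text_line(region_points, rect_width, rect_height,
                       threshold=1.0, tol=1e-6):
    """Return ("straight" or "curved", overlap degree) for one text line.

    The overlap degree is the text line region area divided by the area of
    the minimum area rectangle. `tol` is an assumed numerical tolerance.
    """
    overlap = polygon_area(region_points) / (rect_width * rect_height)
    label = "straight" if overlap >= threshold - tol else "curved"
    return label, overlap
```

A rectangular region that fills its minimum area rectangle gives an overlap degree of 1 and is classified as straight, while an arched region leaves part of the rectangle empty and is classified as curved.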
In another embodiment, a straight text line may be cut using a straight text line cutting algorithm as follows. Text line detection yields both the text lines contained in the document image and the edge point set of each text detection box. The minimum area rectangle corresponding to the text line is calculated from the coordinates of the edge points of the current text detection box, and the coordinates of the four corners of the minimum area rectangle, together with its width and height, are then determined. From the width, the height, and the coordinates of the four corners of the minimum area rectangle, the four corner coordinates of the text line image to be cut out for the text line can be determined. Based on perspective transformation, a perspective transformation matrix relating the four corners of the minimum area rectangle to the four corner coordinates of the text line image is obtained, and the perspective transformation matrix is applied to the document image so that the text line in the current text detection box is cut from the document image, yielding the text line image corresponding to that text line. For example, the minimum area rectangle corresponding to the text detection box is obtained from the edge point set of the text detection box, which gives the four corner points (x1, y1), (x2, y2), (x3, y3), (x4, y4) of the minimum area rectangle together with its width and height. From the width, the height, and the coordinates of the four corners of the minimum area rectangle, the coordinates of the four corner points after clipping are obtained as (0, 0), (width, 0), (width, height), (0, height). Based on perspective transformation, a perspective transformation matrix M corresponding to the text detection box is obtained from (x1, y1), (x2, y2), (x3, y3), (x4, y4) and (0, 0), (width, 0), (width, height), (0, height). M is applied to the document image, and the text line in the current text detection box is cut from the document image to obtain the text line image corresponding to that text line.
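As a rough sketch of how a perspective transformation matrix can be obtained from four corner correspondences, the following solves the standard eight-unknown linear system in plain Python. The helper names (`target_corners`, `perspective_matrix`, `apply_matrix`) and the assumed corner order (top-left, top-right, bottom-right, bottom-left) are illustrative only; a practical implementation would more likely rely on an image-processing library, and the resampling of pixels is omitted here.

```python
import math


def target_corners(src):
    """Corners (0,0), (w,0), (w,h), (0,h) for a rectangle given in that order."""
    width = math.hypot(src[1][0] - src[0][0], src[1][1] - src[0][1])
    height = math.hypot(src[2][0] - src[1][0], src[2][1] - src[1][1])
    return [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def perspective_matrix(src, dst):
    """3x3 matrix M mapping each src corner to the matching dst corner."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_matrix(M, point):
    """Map one (x, y) point through M using homogeneous coordinates."""
    x, y = point
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return ((M[0][0] * x + M[0][1] * y + M[0][2]) / w,
            (M[1][0] * x + M[1][1] * y + M[1][2]) / w)
```

Mapping each corner of the minimum area rectangle through M should land on the corresponding corner of the cropped text line image, which is a convenient sanity check before resampling pixels.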
In yet another embodiment, if a straight text line clipping algorithm were used to clip a curved text line, a large number of non-text regions would be introduced during clipping, which can significantly degrade subsequent OCR character recognition. To avoid such interference, a curved text line may be clipped as follows. For ease of understanding, the clipping is described with reference to the clipping schematic diagram shown in Fig. 4. Text line detection yields the edge point set of each text detection box. The edge point set of each text detection box is divided into an upper edge point set and a lower edge point set. Curve fitting is performed on the edge points in the upper edge point set to obtain an upper edge curve, and on the edge points in the lower edge point set to obtain a lower edge curve, so that the coordinates of each center point 3 within the width range of the text line in the text line image can be determined from the upper edge curve and the lower edge curve. Specifically, the height range and the width of the text line can be obtained based on the correspondence between the edge points in the upper edge point set and the edge points in the lower edge point set. To facilitate cutting the text line, the coordinates of the center point 3 between each upper edge point and the corresponding lower edge point are determined within the width range of the text line according to that correspondence, and the cutting height of the text line image to be cut out can then be determined from the coordinates of the center points. To facilitate obtaining the minimum area rectangle corresponding to the text line, cutting is performed within the width range of the text line with each center point as the center, according to a specified width and the determined cutting height, so that a plurality of rectangles 4 are obtained within the width range of the current text line. The rectangles 4 are spliced along the horizontal direction based on the horizontal coordinates of the center points, yielding a rectangular text line image corresponding to the curved text line in the text detection box. In one example, the specified width may be determined in image pixels of the document image, for example 1 pixel or 2 pixels. The smaller the specified width, the less interference information the cut-out text line image contains, and the more accurately the subsequent OCR can recognize it.
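The strip-and-splice idea can be sketched on a toy image stored as a list of rows. This is a simplified illustration under several assumptions: the fitted upper and lower edge curves are passed in as callables (the curve fitting itself is omitted), the cutting height is supplied as a parameter rather than derived from the center points, each strip is centered using the curves evaluated at its left column, and the function name `straighten_curved_line` is hypothetical.

```python
def straighten_curved_line(image, upper, lower, x_start, x_end,
                           crop_height, strip_width=1):
    """Cut vertical strips centered on the text line and splice them.

    image: 2D list indexed as image[y][x].
    upper, lower: callables returning the fitted edge y-coordinate at x.
    Returns a crop_height x (x_end - x_start) image as a list of rows.
    """
    last_row = len(image) - 1
    columns = []
    for x0 in range(x_start, x_end, strip_width):
        # Center point between the upper and lower edge curves,
        # evaluated at the left column of this strip.
        center_y = (upper(x0) + lower(x0)) / 2.0
        top = int(center_y) - crop_height // 2
        for x in range(x0, min(x0 + strip_width, x_end)):
            column = []
            for r in range(crop_height):
                y = min(max(top + r, 0), last_row)  # clamp to the image
                column.append(image[y][x])
            columns.append(column)
    # Splice the strips side by side in left-to-right order.
    return [[col[r] for col in columns] for r in range(crop_height)]
```

With a strip width of 1 pixel, a text line that drifts downward across the image comes out as a flat band in the spliced result, which is the effect the embodiment describes for narrow strips.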
The following embodiment will explain a determination process of a reference angle of a document image.
In one embodiment, to distinguish the types of the text line images in the document image, the text line images are clustered according to their corresponding angles, so that the text line categories present in the document image and the number of text line images in each text line category can be determined. In an example, a clustering algorithm may be used to cluster the text line images according to their corresponding angles. For example, Mean-Shift clustering is a density-based non-parametric clustering algorithm that determines the final number of categories automatically. Clustering with such an algorithm groups the text line images by their angle attribute and thereby classifies the text lines automatically. For example, in the present disclosure, the angle corresponding to a text line image is the angle between the first edge of the text line in the text line image and the horizontal direction, and, to facilitate subsequent character recognition, the first edge of the text line image corresponding to each text line is made to coincide with the horizontal direction when cutting. Therefore, if the document image is placed upright with a rotation angle of 0° and contains both horizontal and vertical text lines, the angle corresponding to a horizontal text line is 0° and the angle corresponding to a vertical text line is -90°. Mean-Shift clustering can then automatically divide the text line images into a horizontal text line category and a vertical text line category, and determine the angle on which each category is based and the number of text line images in each of the two categories.
In another example, if the document image is placed at a certain angle, the angle corresponding to each text line image may take any value. In this case, clustering may be performed based on a specified angle difference range: the differences between the angles corresponding to the text line images are compared, and text line images whose angles differ by less than the specified angle difference range are clustered into one category. The smaller the angle difference range, the closer the angles of the text line images clustered into the same category.
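A minimal one-dimensional Mean-Shift sketch is shown below to make the clustering step concrete. It uses a flat kernel; the `bandwidth` parameter plays a role similar to the specified angle difference range. The function name, the default values, and the rule for merging converged modes are assumptions for illustration, and wrap-around of angles (for example 179° versus -179°) is not handled.

```python
def mean_shift_1d(angles, bandwidth=5.0, max_iter=100, tol=1e-3):
    """Cluster angles with a flat-kernel mean shift.

    Returns (labels, centers): labels[i] is the cluster index of angles[i],
    and centers[k] is the mode that cluster k converged to.
    """
    modes = []
    for a in angles:
        m = a
        for _ in range(max_iter):
            # Shift the estimate to the mean of all angles inside the window.
            window = [x for x in angles if abs(x - m) <= bandwidth]
            new_m = sum(window) / len(window)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    centers, labels = [], []
    for m in modes:
        # Modes that end up within one bandwidth are treated as one cluster.
        for i, c in enumerate(centers):
            if abs(m - c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers
```

Counting how often each label occurs then gives the number of text line images in each text line category, which is the quantity the following paragraphs use to choose the reference class.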
Further, a reference class of the document image is determined according to the number of text line images in each text line category after clustering. In one example, to ensure that the determined reference angle is close to the true inclination angle of the document image, the reference class may be determined as the text line category having the largest number of text line images. When the rotation angle of the document image is then determined based on the resulting reference angle, a plurality of texts in the document image can be corrected. In another example, the reference class may be determined as the text line category that has a relatively large number of text line images and the largest average text line image area. When the rotation angle of the document image is then determined based on the resulting reference angle, the document image can be corrected visually. After the reference class of the document image is obtained, the reference class is judged to determine the category corresponding to it, so as to determine the reference angle of the document image. The category may indicate whether the text lines are arranged horizontally or vertically. Different reference angles can be determined for different categories.
In an example, the reference class of the document image may be determined based on the number of text line images in each text line category. The number of text line images in each text line category is determined from the clustered text line categories, and a first number of text line categories is selected in descending order of the number of text line images. A larger number of text line images in a text line category indicates that the text line images in the document image are more uniform in category. The first number of text line categories may be characterized as the text line categories having a relatively large number of text line images in the document image. The numbers of text line images in the first number of text line categories are compared by difference. If the difference is larger than a first number threshold, the numbers of text line images in those categories differ significantly, and the text line category with the largest number of text line images among them may be determined as the reference class. When the document image is rotated according to the reference angle determined from this reference class, the inclination angles of a plurality of text lines in the document image can be corrected.
In another example, the numbers of text line images in the first number of text line categories are compared by difference. If the difference is less than or equal to the first number threshold, the numbers of text line images in those text line categories are equal or similar, and the reference class may instead be determined based on the average area of the text line image regions. As described above, the region area of each text line image can be acquired by text line detection, and the total region area of each text line category can be obtained from the region areas of its text line images. Dividing the total region area of a text line category by the number of text line images in that category gives the average text line image region area of the category. The average area is determined in this way for each of the first number of text line categories, and the text line category with the largest average text line image region area is selected as the reference class. When the document image is rotated according to the reference angle determined from this reference class, the inclination angle of the text lines occupying most of the area of the document image can be corrected.
In yet another example, the reference class of the document image may be determined based on both the number of text line images in each text line category and the average text line image region area of each text line category. The number of text line images in each text line category is determined from the clustered text line categories, and a first number of text line categories is selected in descending order of the number of text line images. The first number of text line categories may be characterized as the text line categories having a relatively large number of text line images in the document image. The average text line image region area of each of the first number of text line categories is determined, and the text line category with the largest average area among them is selected as the reference class. When the document image is rotated according to the reference angle determined from this reference class, the text lines that are both relatively numerous and largest in area in the document image can be corrected.
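The selection rules in the preceding examples can be combined in one small sketch. Each text line category is assumed to be given as a non-empty list of text line image region areas, so that both the count and the average area are available. The function name, the default values, the `always_use_area` flag, and the choice to measure the count difference between the two most numerous categories are assumptions; the disclosure does not state how the difference is taken when more than two categories are compared.

```python
def select_reference_class(categories, first_number=2, count_threshold=3,
                           always_use_area=False):
    """Pick the reference class among the most numerous text line categories.

    categories: dict mapping a category key to a non-empty list of region
        areas, one entry per text line image in that category.
    always_use_area: if True, skip the count comparison and always choose
        the largest average area among the top categories.
    """
    ranked = sorted(categories, key=lambda k: len(categories[k]), reverse=True)
    top = ranked[:first_number]
    if len(top) == 1:
        return top[0]
    # Assumed reading: compare the two largest counts.
    gap = len(categories[top[0]]) - len(categories[top[1]])
    if gap > count_threshold and not always_use_area:
        return top[0]
    # Counts are similar (or area is forced): use the largest average area.
    return max(top, key=lambda k: sum(categories[k]) / len(categories[k]))
```

With ten small horizontal lines and two large vertical lines, the default rule picks the horizontal category because the counts differ clearly, whereas the area-based rule picks the vertical one.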
Based on the same inventive concept, the present disclosure also provides another method of determining a rotation angle of a document image.
Fig. 5 is a flowchart illustrating a method of determining a rotation angle of a document image according to an exemplary embodiment. As shown in Fig. 5, the method of determining the rotation angle of the document image includes the following steps S21 to S29.
In step S21, text lines included in the document image are detected and cut, resulting in a plurality of text line images.
In step S22, angles of a plurality of text line images are determined.
In step S23, based on angles of the plurality of text line images, the plurality of text line images are clustered to obtain clustered text line categories, and the number of text line images in each text line category after clustering is determined.
In step S24, a reference class is determined based on the number of text line images in each text line class after clustering.
In step S25, a second number of text line images among the text line images corresponding to the reference class is acquired.
In the disclosed embodiment, the categories of text line images include a forward horizontal text line, an inverted horizontal text line, a forward vertical text line, and an inverted vertical text line. The text line in the text line image shown in Fig. 2 may be a forward horizontal text line, the text line in the text line image shown in Fig. 6 may be an inverted horizontal text line, the text line in the text line image shown in Fig. 7 may be a forward vertical text line, and the text line in the text line image shown in Fig. 8 may be an inverted vertical text line. Figs. 6 to 8 illustrate some exemplary text line images of the present disclosure. When the text line images are cut, the clustering angles corresponding to forward horizontal and inverted horizontal text lines may be the same, or the clustering angles of forward vertical and inverted vertical text lines may be the same, or the clustering angles corresponding to forward horizontal, inverted horizontal, forward vertical, and inverted vertical text lines may all differ. Consequently, the text line images in the reference class may belong to several categories. The text line image category corresponding to the reference class is determined according to the categories of a second number of text line images in the reference class, so that the rotation angle by which the document image needs to be rotated can be determined according to that category. The second number may be a specified number, or may be the number of text line images randomly extracted according to a specified proportion of the number of text line images in the reference class, which is not limited in the present disclosure.
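Acquiring the second number of text line images can be sketched as a random draw from the reference class, either by a specified count or by a specified proportion. The function name `sample_second_number`, the `seed` parameter, and the rounding rule for the proportion are assumptions for illustration.

```python
import random


def sample_second_number(reference_images, count=None, proportion=None,
                         seed=None):
    """Randomly pick text line images from the reference class.

    Exactly one of `count` (a specified number) or `proportion` (a fraction
    of the reference class size) is expected.
    """
    if count is None and proportion is None:
        raise ValueError("specify either count or proportion")
    rng = random.Random(seed)
    if count is None:
        # At least one image is sampled even for very small proportions.
        count = max(1, round(len(reference_images) * proportion))
    count = min(count, len(reference_images))
    return rng.sample(list(reference_images), count)
```

Only the sampled images are then passed to the category classifier, which is where the saving in computation described later comes from.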
In step S26, the category of the second number of text line images is determined.
In an embodiment of the present disclosure, the categories of the second number of text line images may be determined based on a conventional machine learning algorithm, for example Bayesian classification, a decision tree, or a support vector machine. The categories of the second number of text line images may also be determined based on a deep learning algorithm, for example a convolutional neural network (CNN).
In step S27, the reference class is judged based on the categories of the second number of text line images, and the category corresponding to the reference class is obtained.
In an embodiment of the present disclosure, whether the text line images in the second number belong to the same category is determined from the category determined for each of them. If they all belong to the same text line image category, the reference class contains a single text line image category, and that category is the category corresponding to the reference class. If they belong to different text line image categories, the reference class contains mixed text line image categories, and the category corresponding to the reference class can be determined based on the number of text line images in each of the different categories.
In step S28, a reference angle of the document image is determined based on the category to which the reference category corresponds.
In step S29, the rotation angle of the document image is determined based on the reference angle.
With the above embodiment, the reference angle of the document image is determined according to the categories of the text line images in the reference class, so that the determined reference angle fits the actual inclination angle of the document image more closely. Because the category corresponding to the reference class is determined from only some of the text line images in the reference class, the amount of calculation is reduced, the reference angle can be determined quickly, and the time needed to calculate the rotation angle of the document image is saved.
In one embodiment, the categories of the second number of text line images in the reference class may be determined using a trained text line image category classification model. The text line image categories in the model may include forward horizontal text lines, inverted horizontal text lines, forward vertical text lines, and inverted vertical text lines, with 0, 1, 2, and 3 used as the respective category indexes. After a text line image is input into the model, the text line image category of that image can be determined from the category index output by the model. In one example, if the model does not determine a text line image category for a text line image, the text line image category of that image may be set to the default category.
In one example, the text line image category classification model may be obtained by training a convolutional neural network that uses a lightweight neural network model as its backbone network. To train the model, a batch of text line images is collected in advance or synthesized by an algorithm, and each image is labeled with its text line image category; the labeled images form the training text line image set. A plurality of training text line images are randomly extracted from the training text line image set and input into the convolutional neural network, and the network is trained based on the classification results it outputs and the categories labeled for the training text line images, yielding the trained text line image category classification model. In one example, training data prepared for a character recognition algorithm can be reused as the training text line image set, which improves the utilization of the training data and reduces development cost.
In another embodiment, the category corresponding to the reference class may be determined according to the number of text line images of each category among the second number of text line images. For the second number of text line images, the text line image category of each text line image and the number of text line images in each text line image category are determined. The number of text line images in the most numerous category is compared with a second number threshold, and when it is larger than the second number threshold, the most numerous category is taken as the category corresponding to the reference class. For example, let the second number be S, and let 0, 1, 2, and 3 represent a forward horizontal text line, an inverted horizontal text line, a forward vertical text line, and an inverted vertical text line, respectively. Based on the text line image category determined for each text line image, the numbers of text line images corresponding to 0, 1, 2, and 3 are S0, S1, S2, and S3. The maximum value MAX(S0, S1, S2, S3) is determined and compared with the second number threshold; if MAX(S0, S1, S2, S3) is greater than the second number threshold, the category corresponding to the reference class is the text line image category corresponding to MAX(S0, S1, S2, S3). In one example, the second number threshold may be a specified threshold.
In still another embodiment, when the number of text line images in the most numerous category is less than or equal to the second number threshold, a default category may be taken as the category corresponding to the reference class, so that the determination of the rotation angle can still proceed normally. The default category may be any one of the text line image categories.
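The vote over the sampled text line images, including the fallback to a default category, can be sketched as follows. The category indexes 0 to 3 follow the description above; the constant names, the function name, and the choice of the forward horizontal text line as the default category are assumptions for illustration.

```python
from collections import Counter

FORWARD_HORIZONTAL = 0
INVERTED_HORIZONTAL = 1
FORWARD_VERTICAL = 2
INVERTED_VERTICAL = 3


def category_of_reference_class(predicted, second_threshold,
                                default=FORWARD_HORIZONTAL):
    """Return the category corresponding to the reference class.

    predicted: category indexes predicted for the sampled text line images.
    second_threshold: the most numerous category must exceed this count,
        otherwise the default category is returned.
    """
    counts = Counter(predicted)
    if not counts:
        return default
    category, max_count = counts.most_common(1)[0]
    return category if max_count > second_threshold else default
```

For the predictions `[1, 1, 1, 0, 2]` and a threshold of 2, MAX(S0, S1, S2, S3) is S1 = 3, so the reference class is treated as inverted horizontal text.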
In yet another embodiment, the rotation angle of the document image may be determined according to the average angle of the text line images in the reference class and the category corresponding to the reference class. The average angle of the reference class is determined from the angles of the text line images in the reference class; determining the rotation angle from this average makes the rotation angle more reasonable, so that each text line in the document image can be properly corrected. A correspondence between the category corresponding to the reference class and the rotation angle determination mode is preset. After the category corresponding to the reference class is determined, the rotation angle of the document image can be determined quickly based on this correspondence. In an example, the preset correspondence may include the following, where A is the average angle of the reference class. If the category corresponding to the reference class is a forward horizontal text line, the rotation angle is determined as Angle = A. If the category corresponding to the reference class is an inverted horizontal text line, the rotation angle is determined as Angle = A + 180. If the category corresponding to the reference class is a forward vertical text line, the rotation angle is determined as Angle = A - 90. If the category corresponding to the reference class is an inverted vertical text line, the rotation angle is determined as Angle = A + 180 - 90.
Once the category corresponding to the reference class and the average angle of the reference class are determined, the rotation angle of the document image can be determined quickly according to the corresponding rotation angle determination mode.
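The preset correspondence can be expressed as a small lookup table keyed by the category index (0 to 3 as in the classification model described earlier). The names `ROTATION_OFFSETS` and `rotation_angle` are illustrative assumptions; the offsets themselves are the ones listed above.

```python
# Offset added to the average angle A for each category index:
# 0 forward horizontal, 1 inverted horizontal,
# 2 forward vertical, 3 inverted vertical.
ROTATION_OFFSETS = {
    0: 0.0,            # Angle = A
    1: 180.0,          # Angle = A + 180
    2: -90.0,          # Angle = A - 90
    3: 180.0 - 90.0,   # Angle = A + 180 - 90
}


def rotation_angle(reference_angles, category):
    """Rotation angle from the reference class angles and its category."""
    average = sum(reference_angles) / len(reference_angles)
    return average + ROTATION_OFFSETS[category]
```

For a reference class with angles 14° and 16°, the average angle A is 15°, giving 15°, 195°, -75°, and 105° for the four categories respectively.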
In still another embodiment, when the angles of the text line images in the reference class are the same except for a few individual outliers, the rotation angle of the document image may be determined based on the category corresponding to the reference class and the angle shared by the text line images in the reference class. This keeps the abnormal data from introducing errors and improves correction accuracy.
In an implementation scenario, the method of determining the rotation angle of a document image may be as shown in Fig. 9.
Fig. 9 is a flowchart showing a method of determining a rotation angle of a document image according to an exemplary embodiment, including the following steps S31 to S35.
In step S31, text lines in the document image are cut out by the text detection box, and a plurality of text line images and angles corresponding to the text line images are obtained.
In step S32, the angles corresponding to the obtained text line images are clustered by means of Mean-Shift, and a plurality of text line categories of the document image are determined.
In step S33, a reference class is determined based on the number of text line images in each text line class after clustering.
In step S34, the category corresponding to the reference class and the average angle of the reference class are determined.
In step S35, the rotation angle of the document image is determined based on the category corresponding to the reference class, the average angle of the reference class, and the preset correspondence between the category corresponding to the reference class and the manner of determining the rotation angle.
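Steps S32 to S34 can be sketched as follows. This is an illustrative, simplified one-dimensional Mean-Shift with a flat kernel written in plain Python, not the implementation of the disclosure. The bandwidth value, the function names, and the choice to ignore angle wrap-around (for example 179 versus -179) are assumptions of this sketch. Text line detection (S31) and the category classification are taken as given, and the input list is assumed to be non-empty.

```python
def mean_shift_1d(angles, bandwidth=5.0, max_iter=100, tol=1e-6):
    """Cluster angles with a flat-kernel mean shift (simplified sketch)."""
    modes = []
    for start in angles:
        mode = start
        for _ in range(max_iter):
            # Shift the estimate to the mean of all angles inside the window.
            window = [a for a in angles if abs(a - mode) <= bandwidth]
            shifted = sum(window) / len(window)
            converged = abs(shifted - mode) < tol
            mode = shifted
            if converged:
                break
        modes.append(mode)

    # Angles whose modes lie within one bandwidth share a cluster.
    clusters = []  # each entry: [mode, member angles]
    for angle, mode in zip(angles, modes):
        for cluster in clusters:
            if abs(cluster[0] - mode) <= bandwidth:
                cluster[1].append(angle)
                break
        else:
            clusters.append([mode, [angle]])
    return [members for _, members in clusters]


def reference_class_and_average(angles, bandwidth=5.0):
    """Pick the most populous cluster (S33) and average its angles (S34)."""
    clusters = mean_shift_1d(angles, bandwidth)
    reference = max(clusters, key=len)
    return reference, sum(reference) / len(reference)
```

Here the reference class is chosen purely by count; other embodiments in the description also take the average text line image area into account.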
Based on the same concept, embodiments of the present disclosure also provide a device for determining the rotation angle of a document image.
It can be appreciated that, in order to implement the above functions, the apparatus for determining the rotation angle of the document image provided in the embodiments of the present disclosure includes a hardware structure and/or a software module that perform respective functions. The disclosed embodiments may be implemented in hardware or a combination of hardware and computer software, in combination with the various example elements and algorithm steps disclosed in the embodiments of the disclosure. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not to be considered as beyond the scope of the embodiments of the present disclosure.
Fig. 10 is a block diagram illustrating a device for determining the rotation angle of a document image according to an exemplary embodiment. Referring to Fig. 10, the device 100 for determining the rotation angle of a document image includes a cropping unit 101, a screening unit 102, and a determination unit 103.
The cropping unit 101 is configured to detect and crop the text lines included in the document image to obtain a plurality of text line images, and to determine the angles of the plurality of text line images, where the angle of a text line image is the included angle between a first edge of the text line in the text line image and the horizontal direction.
The screening unit 102 is configured to determine a reference angle of the document image based on the angles of the text line images.
The determination unit 103 is configured to determine the rotation angle of the document image based on the reference angle.
In an embodiment, the cropping unit 101 detects and crops the text lines included in the document image to obtain a plurality of text line images in the following manner: a plurality of text lines included in the document image are detected to obtain a plurality of text detection boxes, and the text line overlapping degree corresponding to each text detection box is determined. Text lines whose overlapping degree is equal to a preset overlapping degree threshold are cropped using a straight text line detection algorithm to obtain text line images. Text lines whose overlapping degree is greater than the preset overlapping degree threshold are cropped using a curved text line detection algorithm to obtain text line images.
In another embodiment, the cropping unit 101 crops with the curved text line detection algorithm to obtain one or more text line images in the following manner: the edge point set of the text detection box is divided into an upper edge point set and a lower edge point set. Curve fitting is performed on the edge points in the upper edge point set to obtain an upper edge curve, and on the edge points in the lower edge point set to obtain a lower edge curve. Based on the upper edge curve and the lower edge curve, the coordinates of each center point within the text line width range corresponding to the text detection box are determined. The cropping height of the text detection box is determined based on the coordinates of the center points within the width range. Based on the width range and the cropping height, the text detection box is cropped at a specified width to obtain a plurality of rectangular images. The rectangular images are stitched together along the horizontal direction to obtain a text line image.
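The geometry of the curved text line cropping can be sketched as follows. This is only an illustration under several assumptions that the description does not specify: the edge curves are modelled as quadratics fitted by least squares, image coordinates have y increasing downward, and a single cropping height equal to the largest gap between the two fitted curves is used for every strip. The function names are hypothetical. The sketch returns crop rectangles only; cropping the pixels and stitching them horizontally are left out.

```python
def fit_quadratic(points):
    """Least-squares fit of y = a*x^2 + b*x + c (needs >= 3 distinct x)."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]
    t = [sum((x ** k) * y for x, y in points) for k in range(3)]
    # Normal equations for the unknowns (c, b, a) as an augmented matrix.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    c, b, a = (m[i][3] / m[i][i] for i in range(3))
    return lambda x: a * x * x + b * x + c


def strip_rectangles(upper_pts, lower_pts, strip_width):
    """Return (x, y_top, width, height) crops centered on the fitted center line."""
    upper, lower = fit_quadratic(upper_pts), fit_quadratic(lower_pts)
    xs = [x for x, _ in upper_pts + lower_pts]
    x_min, x_max = min(xs), max(xs)
    strips = []  # (left x, strip width, x of the strip's center point)
    x0 = x_min
    while x0 < x_max:
        width = min(strip_width, x_max - x0)
        strips.append((x0, width, x0 + width / 2))
        x0 += strip_width
    # Assumed rule: one shared height, the largest gap between the two curves.
    height = max(lower(xc) - upper(xc) for _, _, xc in strips)
    # Center y of each strip is the midpoint between the two fitted curves.
    return [(x0, (upper(xc) + lower(xc)) / 2 - height / 2, width, height)
            for x0, width, xc in strips]
```

Because every rectangle has the same height, the resulting crops could be concatenated side by side to form a straightened text line image.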
In still another embodiment, the screening unit 102 determines the reference angle of the document image based on the angles of the plurality of text line images in the following manner: the text line images are clustered based on their angles to obtain clustered text line categories, and the number of text line images in each clustered text line category is determined. A reference class is determined according to the number of text line images in each clustered text line category. Category judgment is performed on the reference class to obtain the category corresponding to the reference class, and the reference angle of the document image is determined based on the category corresponding to the reference class.
In yet another embodiment, the screening unit 102 determines the reference class according to the number of text line images in each clustered text line category in the following manner: a first number of text line categories are selected from the clustered text line categories in descending order of the number of text line images. If the difference between the numbers of text line images in the first number of text line categories is greater than a first number threshold, the text line category with the largest number of text line images among the first number of text line categories is determined as the reference class.
In yet another embodiment, the screening unit 102 is further configured to: if the difference between the numbers of text line images in the first number of text line categories is less than or equal to the first number threshold, select, from the first number of text line categories, the text line category with the largest average text line image area as the reference class.
In yet another embodiment, the screening unit 102 determines the reference class according to the number of text line images in each clustered text line category in the following manner: a first number of text line categories are selected from the clustered text line categories in descending order of the number of text line images. Among the first number of text line categories, the text line category with the largest average text line image area is determined as the reference class.
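The selection rules above can be sketched as follows. This is a hedged illustration: the `TextLine` type, the function names, and the default values are hypothetical, and the "difference between the numbers" is assumed here to mean the difference between the two most populous categories, which the description does not spell out.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    angle: float  # angle of the text line image
    area: float   # area of the text line image


def mean_area(text_lines):
    """Average text line image area of one clustered category."""
    return sum(line.area for line in text_lines) / len(text_lines)


def select_reference_class(classes, first_number=2, first_threshold=3):
    """Choose the reference class among the most populous clustered categories."""
    # Keep the `first_number` categories with the most text line images.
    top = sorted(classes, key=len, reverse=True)[:first_number]
    # Counts too close to call: fall back to the largest average area.
    if len(top) > 1 and len(top[0]) - len(top[1]) <= first_threshold:
        return max(top, key=mean_area)
    # Clear winner by count.
    return top[0]
```

The embodiment that always selects by average area corresponds to calling `max(top, key=mean_area)` unconditionally.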
In still another embodiment, the determination unit 103 performs category judgment on the reference class to obtain the category corresponding to the reference class in the following manner: a second number of text line images are acquired from the text line images corresponding to the reference class. The categories of the second number of text line images are determined. Category judgment is performed on the reference class based on the categories of the second number of text line images to obtain the category corresponding to the reference class. The category of a text line image is a forward horizontal text line, an inverted horizontal text line, a forward vertical text line, or an inverted vertical text line.
In still another embodiment, the determination unit 103 determines the categories of the second number of text line images in the reference class in the following manner: the second number of text line images in the reference class are input into a trained text line image classification model to obtain the categories of the second number of text line images in the reference class.
In still another embodiment, the determination unit 103 performs category judgment on the reference class based on the categories of the second number of text line images to obtain the category corresponding to the reference class in the following manner: the category of each text line image among the second number of text line images, and the number of text line images corresponding to each category, are determined. When the number of text line images in the category with the most text line images is greater than a second number threshold, that category is taken as the category corresponding to the reference class.
In a further embodiment, the determination unit 103 is further configured to: when the number of text line images in the category with the most text line images is less than or equal to the second number threshold, take a default category as the category corresponding to the reference class.
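The vote over the sampled text line images can be sketched as follows. The classifier outputs are assumed to be plain string labels, and the choice of a forward horizontal text line as the default category is an assumption of this sketch; the description only states that a default category exists. The name `vote_category` is hypothetical.

```python
from collections import Counter


def vote_category(predicted, second_threshold, default="forward_horizontal"):
    """Return the winning category, or the default if the vote is too weak."""
    if not predicted:
        return default
    # Most frequent predicted category and how many images voted for it.
    category, count = Counter(predicted).most_common(1)[0]
    return category if count > second_threshold else default
```

With a second number threshold of 3, four votes for one category out of five sampled images would be accepted, while a two-against-two split would fall back to the default.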
In still another embodiment, the determination unit 103 determines the rotation angle of the document image based on the reference angle in the following manner: the average angle of the reference class is determined according to the angles of the text line images in the reference class. The rotation angle of the document image is determined based on the category corresponding to the reference class and the average angle.
The specific manner in which the various modules of the device in the above embodiments perform their operations has been described in detail in connection with the embodiments of the method, and will not be described in detail here.
Further, in exemplary embodiments, the device for determining the rotation angle of a document image may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above method. For example, the device for determining the rotation angle of a document image includes: a memory for storing instructions; and a processor for calling the instructions stored in the memory to execute the method for determining the rotation angle of a document image provided by any one of the above embodiments.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as a memory, is also provided, the instructions being executable by a processor of the device for determining the rotation angle of a document image to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
It is further understood that the term "plurality" in this disclosure means two or more, and other quantifiers are to be understood similarly. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship. The singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It is further understood that the terms "first," "second," and the like are used to describe various information, but such information should not be limited to these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the expressions "first", "second", etc. may be used entirely interchangeably. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
It will be further understood that, unless specifically stated otherwise, "connected" includes both a direct connection, in which no other element is present between the two, and an indirect connection, in which another element is present between them.
It will be further understood that although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. A method of determining a rotation angle of a document image, the method comprising:
detecting and cutting text lines included in the document image to obtain a plurality of text line images;
Determining angles of the plurality of text line images, wherein the angles of the text line images are included angles between a first edge of a text line in the text line images and a horizontal direction;
determining a reference angle of the document image based on angles of the text line images;
Determining a rotation angle of the document image based on the reference angle;
Wherein the determining the reference angle of the document image based on the angles of the text line images includes:
Clustering the text line images based on the angles of the text line images to obtain clustered text line categories, and determining the number of the text line images in each clustered text line category;
determining a reference class according to the number of text line images in each text line class after clustering;
judging the category of the reference class to obtain the category corresponding to the reference class, and determining the reference angle of the document image based on the category corresponding to the reference class;
the step of judging the class of the reference class to obtain the class corresponding to the reference class comprises the following steps:
acquiring a second number of text line images in the text line images corresponding to the reference class;
Determining a category of the second number of text line images;
Judging the category of the reference class based on the category of the second number of text line images to obtain the category corresponding to the reference class; the categories of the text line images comprise forward horizontal text lines, reverse horizontal text lines, forward vertical text lines or reverse vertical text lines.
2. The method for determining a rotation angle of a document image according to claim 1, wherein the detecting and clipping the text lines included in the document image to obtain a plurality of text line images includes:
Detecting a plurality of text lines included in the document image to obtain a plurality of text detection boxes, and determining the overlapping degree of the text lines corresponding to the text detection boxes;
Cutting text lines with the overlapping degree equal to a preset overlapping degree threshold value by adopting a straight text line detection algorithm to obtain text line images;
and cutting the text lines with the text line overlapping degree larger than a preset overlapping degree threshold value by adopting a curved text line detection algorithm to obtain a text line image.
3. The method for determining a rotation angle of a document image according to claim 2, wherein said clipping using a curved text line detection algorithm results in one or more text line images, comprising:
dividing an edge point set of the text detection box into an upper edge point set and a lower edge point set;
Performing curve fitting on the edge points in the upper edge point set to obtain an upper edge curve, and performing curve fitting on the edge points in the lower edge point set to obtain a lower edge curve;
determining coordinates of each center point in a text line width range corresponding to the text detection box based on the upper edge curve and the lower edge curve;
Determining the clipping height of the text detection box based on the coordinates of each corresponding center point in the width range;
based on the width range and the cutting height, cutting the text detection box according to the appointed width to obtain a plurality of rectangular images;
And splicing the rectangular images along the horizontal direction to obtain a text line image.
4. The method for determining a rotation angle of a document image according to claim 1, wherein the determining a reference class according to the number of text line images in each text line class after clustering comprises:
Sequentially determining a first number of text line categories in the clustered text line categories according to the sequence of the number of the text line images from more to less;
and if the difference between the numbers of text line images in the first number of text line categories is larger than a first number threshold, determining the text line category with the largest number of text line images among the first number of text line categories as the reference class.
5. The method of determining a rotation angle of a document image according to claim 4, wherein the method of determining a rotation angle of a document image further comprises:
if the difference between the numbers of text line images in the first number of text line categories is smaller than or equal to the first number threshold, selecting, from the first number of text line categories, the text line category with the largest average text line image area as the reference class.
6. The method for determining a rotation angle of a document image according to claim 1, wherein the determining a reference class according to the number of text line images in each text line class after clustering comprises:
Sequentially determining a first number of text line categories in the clustered text line categories according to the sequence of the number of the text line images from more to less;
And determining the text line category with the largest average area of the text line image area as a reference category in the first number of text line categories.
7. The method of determining a rotation angle of a document image according to claim 1, wherein said determining a category of said second number of text line images comprises:
And inputting the second number of text line images in the reference class into a trained text line image class classification model to obtain the class of the second number of text line images in the reference class.
8. The method for determining a rotation angle of a document image according to claim 7, wherein the performing the category judgment on the reference category based on the category of the second number of text line images to obtain the category corresponding to the reference category includes:
determining the category of each text line image and the number of the text line images corresponding to each category in the second number of the text line images;
And when the number of the text line images corresponding to the category with the largest number of the text line images is larger than a second number threshold, taking the category with the largest number of the text line images as the category corresponding to the reference category.
9. The method of determining a rotation angle of a document image according to claim 8, wherein the method of determining a rotation angle of a document image further comprises:
and taking a default category as the category corresponding to the reference category under the condition that the number of the text line images corresponding to the category with the largest number of the text line images is smaller than or equal to the second number threshold.
10. The method of determining a rotation angle of a document image according to claim 1, wherein the determining the rotation angle of the document image based on the reference angle includes:
Determining an average angle of the reference class according to the angles of the text line images in the reference class;
And determining the rotation angle of the document image based on the class corresponding to the reference class and the average angle.
11. A rotation angle determining apparatus for determining a rotation angle of a document image, the rotation angle determining apparatus comprising:
a cropping unit configured to detect and crop text lines included in the document image to obtain a plurality of text line images, and to determine angles of the text line images, wherein the angle of a text line image is the included angle between a first edge of the text line in the text line image and a horizontal direction;
a screening unit configured to determine a reference angle of the document image based on angles of the text line images;
A determining unit configured to determine a rotation angle of the document image based on the reference angle;
the screening unit is configured to determine a reference angle of the document image based on angles of the plurality of text line images in the following manner:
Clustering the text line images based on the angles of the text line images to obtain clustered text line categories, and determining the number of the text line images in each clustered text line category;
determining a reference class according to the number of text line images in each text line class after clustering;
judging the category of the reference class to obtain the category corresponding to the reference class, and determining the reference angle of the document image based on the category corresponding to the reference class;
the screening unit is configured to perform category judgment on the reference class in the following manner to obtain a category corresponding to the reference class:
acquiring a second number of text line images in the text line images corresponding to the reference class;
Determining a category of the second number of text line images;
Judging the category of the reference class based on the category of the second number of text line images to obtain the category corresponding to the reference class; the categories of the text line images comprise forward horizontal text lines, reverse horizontal text lines, forward vertical text lines or reverse vertical text lines.
12. The apparatus for determining a rotation angle of a document image according to claim 11, wherein said screening unit determines the reference angle of said document image based on the angles of the plurality of text line images in the following manner:
Clustering the text line images based on the angles of the text line images to obtain clustered text line categories, and determining the number of the text line images in each clustered text line category;
determining a reference class according to the number of text line images in each text line class after clustering;
and judging the category of the reference class to obtain the category corresponding to the reference class, and determining the reference angle of the document image based on the category corresponding to the reference class.
13. A rotation angle determining apparatus for determining a rotation angle of a document image, the rotation angle determining apparatus comprising:
A memory for storing instructions; and
A processor for invoking the instructions stored in the memory to perform the method of determining the rotation angle of a document image as claimed in any one of claims 1-10.
14. A computer-readable storage medium having stored therein instructions which, when executed by a processor, perform the method of determining a rotation angle of a document image according to any one of claims 1-10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011410416.2A CN112434640B (en) 2020-12-04 2020-12-04 Method, device and storage medium for determining rotation angle of document image

Publications (2)

Publication Number Publication Date
CN112434640A CN112434640A (en) 2021-03-02
CN112434640B true CN112434640B (en) 2024-04-30

Family

ID=74691915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011410416.2A Active CN112434640B (en) 2020-12-04 2020-12-04 Method, device and storage medium for determining rotation angle of document image

Country Status (1)

Country Link
CN (1) CN112434640B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359889B (en) * 2022-03-14 2022-06-21 北京智源人工智能研究院 Text recognition method for long text data
CN115830613A (en) * 2023-01-09 2023-03-21 广州佰锐网络科技有限公司 Document intelligent acquisition sorting method, calling method, storage medium and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211048A (en) * 2019-05-28 2019-09-06 湖北华中电力科技开发有限责任公司 A kind of complicated archival image Slant Rectify method based on convolutional neural networks
CN110458167A (en) * 2019-08-20 2019-11-15 浙江工业大学 A kind of metalwork surface curvature line of text antidote
CN111353961A (en) * 2020-03-12 2020-06-30 上海合合信息科技发展有限公司 Document curved surface correction method and device
CN111553344A (en) * 2020-04-17 2020-08-18 携程旅游信息技术(上海)有限公司 Method, system, device and storage medium for correcting inclination of text image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8787695B2 (en) * 2012-11-20 2014-07-22 Eastman Kodak Company Image rectification using text line tracks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种快速的文档图像倾斜角检测算法;吴军;侯德文;刘江;;电子技术与软件工程(02);全文 *

Also Published As

Publication number Publication date
CN112434640A (en) 2021-03-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant