CN112434696A - Text direction correction method, device, equipment and storage medium - Google Patents

Text direction correction method, device, equipment and storage medium

Info

Publication number
CN112434696A
Authority
CN
China
Prior art keywords
text
target
correction
corrected
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011458939.4A
Other languages
Chinese (zh)
Inventor
孙磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202011458939.4A priority Critical patent/CN112434696A/en
Publication of CN112434696A publication Critical patent/CN112434696A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images

Abstract

The embodiment of the invention discloses a text direction correction method, device, equipment and storage medium. Each target polygon frame is determined from an acquired text image to be corrected in combination with a target network model, where the target network model is trained with a set training method; the text correction direction of each target polygon frame is determined according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame; and the text image to be corrected is corrected according to each text correction direction to obtain a corrected text image. This solves the problem in the prior art that image correction cannot be performed directly on a text image with an arbitrary inclination angle: the target polygon frames in the text image to be corrected are determined by a target network model trained with the set training method, the text image to be corrected is then corrected accordingly, and text images at any inclination angle can be corrected.

Description

Text direction correction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a text direction correction method, a text direction correction device, text direction correction equipment and a storage medium.
Background
The text plays an important role in the life of people, and the text information contains abundant and accurate semantic information. Society has entered the digital age, and information storage and transmission are more and more using the medium of digital images, such as text scanning, digital archives. In the process of text-to-image conversion, due to human operation, hardware limitation and the like, the obtained picture may have a certain angle of inclination, even an inversion, which brings great challenges to subsequent text detection and text recognition.
In the prior art, conventional image processing methods such as projection-based methods, Hough-transform-based methods and K-nearest-neighbor clustering methods are used to perform tilt correction on a text image. There are also machine-learning-based methods, such as a method for correcting a text picture using PSENet and a method for correcting a text picture based on a rectangular frame.
However, the traditional image processing methods can only perform tilt correction on a text picture and cannot turn an inverted picture right side up. Although PSENet can correct an inclined text picture and detect whether the picture is inverted, the tilt correction and the inversion handling are processed in two separate steps, which is time-consuming. The correction method based on a rectangular frame only corrects text pictures in the specific directions of 0°, 90°, 180° and 270°, and cannot correct the text direction at an arbitrary angle.
Disclosure of Invention
The invention provides a text direction correction method, device, equipment and storage medium, which are used to correct text that is captured at an oblique angle in any direction.
In a first aspect, an embodiment of the present invention provides a text direction correction method, where the text direction correction method includes:
determining each target polygon frame according to the acquired text image to be corrected and a target network model, wherein the target network model is trained by adopting a set training method;
determining the text correction direction of each target polygon frame according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame;
and correcting the text image to be corrected according to each text correction direction to obtain a corrected text image.
In a second aspect, an embodiment of the present invention further provides a text direction correction apparatus, where the text direction correction apparatus includes:
the target frame determining module is used for determining each target polygon frame according to the acquired text image to be corrected and a target network model, wherein the target network model is trained by adopting a set training method;
the direction determining module is used for determining the text correction direction of each target polygon frame according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame;
and the correction module is used for correcting the text image to be corrected according to each text correction direction to obtain a corrected text image.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement a text direction correction method as in any one of the embodiments of the present invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement a text direction correction method according to any one of the embodiments of the present invention.
The embodiments of the invention provide a text direction correction method, device, equipment and storage medium. Each target polygon frame is determined from an acquired text image to be corrected in combination with a target network model, where the target network model is trained with a set training method; the text correction direction of each target polygon frame is determined according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame; and the text image to be corrected is corrected according to each text correction direction to obtain a corrected text image. This solves the problem in the prior art that image correction cannot be performed directly on a text image with an arbitrary inclination angle: a target network model trained with the set training method determines the target polygon frames in the text image to be corrected, the text correction direction is determined from the target polygon frames, the text image to be corrected is then corrected, and text images at any inclination angle can be corrected with a simple procedure.
Drawings
Fig. 1 is a flowchart of a text direction correction method according to a first embodiment of the present invention;
FIG. 2 is an exemplary diagram of a target polygon box in a text to be corrected with a standard orientation according to a first embodiment of the present invention;
FIG. 3 is a diagram illustrating an example of a target polygon frame in a text to be corrected, the target polygon frame having an inclination angle according to a first embodiment of the present invention;
FIG. 4 is a flowchart of a text direction correction method according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating an implementation of a training target network model in a text direction correction method according to a second embodiment of the present invention;
fig. 6 is a flowchart illustrating an implementation of obtaining a training sample set in a text direction correction method according to a second embodiment of the present invention;
fig. 7 is a schematic structural diagram of a text direction correction apparatus according to a third embodiment of the present invention;
fig. 8 is a schematic structural diagram of a computer device in the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a text direction correction method according to an embodiment of the present invention, which is applicable to a case of correcting a text image in any direction, and the method can be executed by a text direction correction apparatus, and specifically includes the following steps:
and step S110, determining each target polygon frame according to the acquired text image to be corrected and a target network model.
And the target network model is trained by adopting a set training method.
In this embodiment, the text image to be corrected may be understood as a text image that needs to be corrected, for example, when a text scan is performed and a text is converted into an image, the obtained image may be inclined due to human operations or hardware limitations, so that an angle correction is needed and the image obtained when the text is converted into the image is taken as the image to be corrected. The target network model can be understood as a convolutional neural network model which is based on deep learning and is trained in advance according to a set training method. The target polygon frame may be understood as a polygon frame generated for texts in different areas in the text image to be corrected, and the texts in different areas in the text image to be corrected are divided and framed by the target polygon frame.
The text image to be corrected is obtained by scanning a text, or is acquired directly from a storage space such as a database, and is input into the target network model to obtain the target polygon frames output by the model. The target network model is based on Faster-RCNN and LSTM: Faster-RCNN is used to extract the text frames to be recognized, and LSTM is used to regress the target polygon frame of the text. The text image to be corrected may contain noise; for example, when the text is placed on a desk for shooting, the obtained text image to be corrected contains desk information. The text frames in the text image to be corrected are extracted by Faster-RCNN, and the text frames are processed by LSTM to obtain the target polygon frame of each text area in the text image.
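As a rough illustration of this flow, the sketch below assumes a hypothetical target_network callable that wraps the trained Faster-RCNN + LSTM model and returns one 16-point target text label per detected text region; neither the callable nor the label layout is part of a documented API.

```python
import cv2

def detect_target_polygon_boxes(image_path, target_network):
    """Hypothetical inference flow assumed from the description above:
    the trained target network takes the text image to be corrected and
    returns, for each text region, a 16-point target text label
    (14 polygon points followed by the two bounding-rectangle corners)."""
    image = cv2.imread(image_path)                 # text image to be corrected
    labels = target_network(image)                 # list of (16, 2) point arrays, one per region
    polygon_boxes = [lab[:14] for lab in labels]   # the 14 target polygon frame points
    rect_corners = [lab[14:] for lab in labels]    # upper-left and lower-right of the rectangle
    return polygon_boxes, rect_corners
```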
For example, fig. 2 provides an exemplary diagram of a target polygon frame in a text to be corrected in the standard direction. The rectangular frame 101 is the minimum bounding rectangular frame containing the first text region, in the same direction as the text image 1 to be corrected, and the polygon frame 102 is the target polygon frame containing the first text region; the rectangular frame 103 is the minimum bounding rectangular frame containing the second text region, in the same direction as the text image 1 to be corrected, and the polygon frame 104 is the target polygon frame containing the second text region. Compared with a rectangular frame, the target polygon frame conforms more closely to the actual extent of the text region and describes the text region more accurately. Fig. 3 is obtained by rotating fig. 2 by a certain angle and provides an exemplary diagram of a target polygon frame with an inclination angle in a text to be corrected. The rectangular frame 201 in fig. 3 is the minimum bounding rectangular frame containing the first text region, in the same direction as the text image 2 to be corrected, and the polygon frame 202 is the target polygon frame containing the first text region; the rectangular frame 203 is the minimum bounding rectangular frame containing the second text region, in the same direction as the text image 2 to be corrected, and the polygon frame 204 is the target polygon frame containing the second text region. The text box 205 is a rectangular box containing all of the text; taking a book picture as an example, the text box 205 corresponds to the image of one page of the book, and the text image 2 to be corrected corresponds to an image containing that page and other irrelevant information. Compared with fig. 2, the rectangular boxes in fig. 3 have changed, yet the target polygon box still describes the text region accurately, and point 1 in the target polygon box is still located at the upper left corner relative to the standard direction of the text to be corrected. The tilt angle of the text image to be corrected is thus determined from the predicted target polygon box.
And step S120, determining the text correction direction of each target polygon frame according to the coordinates of the first target point and the second target point in the target text label of each target polygon frame.
In this embodiment, the target text label may be understood as mark information storing the coordinates of a plurality of position points. Seven points are selected along the top of the text region framed by the target polygon frame and seven points along the bottom; combined with the two points at the upper left corner and the lower right corner of the minimum circumscribed rectangle frame, the coordinate values of these 16 points in total form the target text label. The 14 points in fig. 2 or fig. 3 are the 14 position points of the target polygon frame; together with the two points at the upper left (15) and lower right (16) of the minimum bounding rectangle frame (e.g., rectangle frame 101 or 103), the coordinate values of these 16 points constitute the target text label. The first target point is point 1 in the figure and the second target point is point 7 in the figure. The text correction direction is understood to be the angle at which the text image to be corrected is tilted; for example, if the text correction direction is 10°, the text image to be corrected is tilted by 10°, and rotating the text image to be corrected by 10° corrects it back to the standard direction. Angles in the embodiments of the present application are measured clockwise, and the specific standard angle direction may be chosen according to the actual situation, which is not limited by the embodiments of the present application.
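A minimal sketch of assembling such a 16-point label from the 14 polygon points is given below; the point ordering and the NumPy representation are assumptions made for illustration.

```python
import numpy as np

def build_text_label(poly_pts):
    """Assemble the 16-point target text label described above: 14 polygon
    points (7 along the top edge, 7 along the bottom edge) plus the upper-left
    and lower-right corners of the axis-aligned minimum bounding rectangle.
    The (14, 2) input layout is an assumption for illustration."""
    poly_pts = np.asarray(poly_pts, dtype=np.float32)    # shape (14, 2)
    x_min, y_min = poly_pts.min(axis=0)
    x_max, y_max = poly_pts.max(axis=0)
    corners = np.array([[x_min, y_min], [x_max, y_max]], dtype=np.float32)
    return np.vstack([poly_pts, corners])                # shape (16, 2)
```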
In a two-dimensional rectangular coordinate system, given the coordinates of two points, the angles of a right triangle having the two points as vertices can be obtained by calculation with respect to the origin, the horizontal axis and the vertical axis of the coordinate system, and the text correction direction can then be calculated using elementary trigonometry.
And step S130, correcting the text image to be corrected according to the text correction directions to obtain a corrected text image.
When a text image to be corrected is processed, a plurality of target polygon frames may be obtained and a plurality of text correction directions calculated. One text correction direction may be selected from the plurality of text correction directions to correct the whole text image to be corrected; alternatively, the texts in the plurality of target polygon frames are corrected in turn, each according to its own text correction direction, yielding a plurality of corrected images from which the corrected text image is finally obtained. When the text image to be corrected is corrected, the coordinates of each pixel point in the text are converted according to the text correction direction to obtain the corrected coordinates of each pixel point, and hence the corrected text image.
The embodiment of the invention provides a text direction correction method, in which each target polygon frame is determined from an acquired text image to be corrected in combination with a target network model, where the target network model is trained with a set training method; the text correction direction of each target polygon frame is determined according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame; and the text image to be corrected is corrected according to each text correction direction to obtain a corrected text image. This solves the problem in the prior art that image correction cannot be performed directly on a text image with an arbitrary inclination angle: a target network model trained with the set training method determines the target polygon frames in the text image to be corrected, the text correction direction is determined from the target polygon frames, the text image to be corrected is then corrected, and text images at any inclination angle can be corrected.
Example two
Fig. 4 is a flowchart of a text direction correction method according to a second embodiment of the present invention. The technical scheme of the embodiment is further refined on the basis of the technical scheme, and specifically mainly comprises the following steps:
and step S210, determining each target polygon frame according to the acquired text image to be corrected and a target network model.
Further, fig. 5 provides a flowchart for implementing the training of the target network model in the text direction correction method, where the training of the target network model includes:
step S211, a training sample set including at least one training sample is obtained.
The training sample set is obtained through a preset sample selection method, and the training sample is composed of a rotary text image and a corresponding standard text label.
In the embodiment of the present application, the training samples may be understood as data samples in training the neural network model; a training sample set may be understood as a set of one or more training samples. Rotating the text image can be understood as text images of different inclination angles; the standard text label can be understood as a text label corresponding to each polygon frame in the rotated text image, namely coordinates of 14 points in the polygon frame and 2 points in the rectangular frame.
Further, fig. 6 provides a flowchart for implementing acquiring a training sample set in the text direction correction method, where the acquiring step of the training sample set includes:
step S2111, at least one training text image and a corresponding training text label are obtained.
In this embodiment, the training text image may be understood as a text image for training a network model to be trained, where a text region in the training text image is divided by polygon boxes, and each polygon box corresponds to one training text label, that is, coordinate values of 16 points. The method comprises the steps of shooting a text in advance to obtain a training text image, carrying out text region division on the training text image in an image processing or manual division mode and the like, and determining a corresponding training text label.
Step S2112, the training text image is rotated based on each set rotation angle to obtain each rotated text image and a corresponding standard text label, the rotated text image and the corresponding standard text label are used as a training sample, and each standard text label comprises a set initial position coordinate.
In the present embodiment, the set rotation angle may be 1°, 5°, 10°, 20°, and so on. For example, if the set rotation angle increases in steps of 10°, the training text image is rotated by 10°, 20°, 30° … 360° in turn, so that 36 rotated text images can be obtained from one training text image. The set initial position coordinates can be understood as the coordinates of the point at the upper left corner position of the polygon frame corresponding to each text region in the training text image, for example point 1 in fig. 2 or fig. 3. Each polygon frame has a standard text label, and the coordinates of point 1 (the first point) in the standard text label are always located at the upper left corner with respect to the standard direction of the text image; the coordinates of point 1 are the set initial position coordinates. In this way the text correction direction can be calculated from the predicted target polygon frame, and the correction of the text image to be corrected can then be realized.
The training text image is rotated in turn by each set rotation angle, and the corresponding training text label must be angle-corrected to obtain the rotated standard text label. Each rotated text image and its corresponding standard text label form one training sample, and the training samples at all angles form the training sample set.
Illustratively, the embodiment of the present application provides a method for obtaining a training sample set. When the training text image is rotated, the picture is rotated with warpAffine in OpenCV, which fills the rotated picture with black borders so that it becomes a rectangle again. After the training text label is rotated in the same way, angle correction is needed so that it matches the rotated and padded image. The embodiment of the application provides a coordinate conversion formula, where the coordinates of the point to be rotated are (X0, Y0) and the coordinates of the rotated point are (X, Y):
X = (X0 - W0/2)·cos α - (Y0 - H0/2)·sin α + W1/2
Y = (X0 - W0/2)·sin α + (Y0 - H0/2)·cos α + H1/2
where W0 is the image width of the training text image; H0 is the image height of the training text image; W1 is the image width of the rotated text image; H1 is the image height of the rotated text image; and α is the set rotation angle.
After the training text image is rotated, calculating coordinates of 14 points corresponding to a polygon frame in a training text label according to the coordinate conversion formula to obtain 14 rotated coordinates; and determining the rotated polygonal frame, determining the minimum circumscribed rectangular frame of the polygonal frame in the direction of the rotated text image, combining the coordinates of the upper left corner and the lower right corner of the circumscribed rectangular frame with the coordinates of 14 points of the rotated polygonal frame to form a standard text label, and further forming a training sample set.
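A minimal sketch of this sample-generation step, assuming OpenCV and NumPy and the 14-point polygon representation described above (the exact padding and label layout used in the patent may differ):

```python
import cv2
import numpy as np

def rotate_sample(image, poly_pts, alpha_deg):
    """Rotate a training text image by alpha_deg (clockwise) about its centre,
    pad the result to a full rectangle with black borders, transform the 14
    polygon points, and append the upper-left / lower-right corners of the new
    minimum bounding rectangle to form the standard text label."""
    h0, w0 = image.shape[:2]
    # OpenCV's angle is counter-clockwise, so negate it for a clockwise rotation.
    M = cv2.getRotationMatrix2D((w0 / 2.0, h0 / 2.0), -alpha_deg, 1.0)
    cos, sin = abs(M[0, 0]), abs(M[0, 1])
    w1, h1 = int(w0 * cos + h0 * sin), int(w0 * sin + h0 * cos)   # padded canvas size
    M[0, 2] += w1 / 2.0 - w0 / 2.0    # re-centre the rotated content
    M[1, 2] += h1 / 2.0 - h0 / 2.0
    rotated = cv2.warpAffine(image, M, (w1, h1), borderValue=(0, 0, 0))
    pts = np.asarray(poly_pts, dtype=np.float32)                  # (14, 2)
    pts_h = np.hstack([pts, np.ones((len(pts), 1), np.float32)])  # homogeneous coords
    rot_pts = (M @ pts_h.T).T                                     # rotated polygon points
    x_min, y_min = rot_pts.min(axis=0)
    x_max, y_max = rot_pts.max(axis=0)
    label = np.vstack([rot_pts, [[x_min, y_min], [x_max, y_max]]])  # 16-point label
    return rotated, label
```

Calling rotate_sample once per set rotation angle (10°, 20° … 360°) then yields the rotated text images and standard text labels that make up the training sample set.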
And S2113, forming a training sample set according to each training sample.
And S212, sequentially inputting each training sample into a given network model to be trained to obtain a corresponding prediction polygon frame.
In the embodiment of the application, the network model to be trained can be understood as an untrained deep learning-based neural network model, and is used for predicting a polygon frame in a text image; the predicted polygon frame may be understood as a polygon frame predicted by the network model to be trained according to the input rotation text image.
The training samples are input into the network model to be trained, and the network model to be trained automatically processes the rotated text image in each training sample to obtain a predicted polygon frame; a single rotated text image may yield several predicted polygon frames.
Step S213, a given loss function expression is adopted, and a loss function is determined by combining the predicted text label in the predicted polygon frame and the corresponding standard text label.
In the present embodiment, the predicted text label is constituted by the coordinate values of a plurality of points in the predicted polygon frame. Each predicted polygon frame has a predicted text label, and each predicted text label has a corresponding standard text label. The corresponding loss function is obtained from the loss function expression, the predicted text label and the corresponding standard text label.
Illustratively, the loss function expression may be:
L = (1/N)·Σ Lcls(c, c*) + (λ/Np)·Σ Lloc(b, b*) + (μ/Np)·Σ [Lloc(h, h*) + Lloc(ω, ω*)]
where c, b, h and ω are the predicted classification, predicted rectangle, predicted height and predicted width, respectively, and c*, b*, h* and ω* represent the corresponding true values. N is the number of positive and negative text box proposals obtained by Faster-RCNN, Np is the number of positive proposals, and λ and μ are weights used to balance the classification and detection losses. Lcls is the classification loss function and Lloc is the position loss function. Lcls(c, c*) is the classification loss over positive and negative text samples, Lloc(b, b*) is the loss function of the predicted rectangular frame, and Lloc(h, h*) and Lloc(ω, ω*) are the losses of the polygon frame, which is stored in the network as offsets from the upper left corner of the rectangular frame. Because the distribution of the offsets after rotation is more complex, the original detection loss weight μ was adjusted through a number of experiments, and the best result was obtained at 6μ; therefore the polygon frame loss is multiplied by 6 when training the network. This way of adjusting the loss weight of a complex task improved the effect of the network in the experiments and produced better polygon prediction results.
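A minimal PyTorch-style sketch of this combined loss, assuming smooth-L1 for Lloc and cross-entropy for Lcls (the patent does not fix these choices), and assuming simple per-proposal tensors:

```python
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, cls_targets, box_pred, box_true,
                   h_pred, h_true, w_pred, w_true,
                   pos_mask, lam=1.0, mu=1.0):
    """Classification loss over all proposals plus position losses for the
    rectangle and for the polygon offsets, with the polygon term weighted by
    6*mu as described above. Tensor shapes are assumptions for illustration."""
    n = cls_logits.shape[0]                       # all positive and negative proposals
    np_pos = pos_mask.sum().clamp(min=1)          # number of positive proposals
    l_cls = F.cross_entropy(cls_logits, cls_targets, reduction="sum") / n
    l_box = F.smooth_l1_loss(box_pred[pos_mask], box_true[pos_mask], reduction="sum") / np_pos
    l_poly = (F.smooth_l1_loss(h_pred[pos_mask], h_true[pos_mask], reduction="sum")
              + F.smooth_l1_loss(w_pred[pos_mask], w_true[pos_mask], reduction="sum")) / np_pos
    return l_cls + lam * l_box + 6.0 * mu * l_poly
```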
And S214, performing back propagation on the network model to be trained through the loss function to obtain a target network model.
And in the training process of the neural network model, continuously updating the adjustment model by a back propagation method until the output of the model is consistent with the target. And after the loss function is determined, performing back propagation on the network model to be trained by using the loss function to obtain a target network model. The embodiment of the invention does not limit the specific back propagation process and can be set according to specific conditions.
Step S220, determining a text correction direction corresponding to each target polygon frame according to the coordinates of the first target point and the second target point in the target text label of each target polygon frame and a predetermined direction formula.
In the present embodiment, the predetermined direction formula may be understood as a predetermined mathematical calculation formula for calculating the text correction direction from coordinate values. Illustratively, the coordinates of the first target point are (x1, y1), the coordinates of the second target point are (x2, y2), and angle is the text correction direction; the embodiment of the application provides a formula for calculating the text correction direction:
angle = arctan((y2 - y1) / (x2 - x1))
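A small sketch of this computation; using atan2 so that inverted text (angles beyond ±90°) is also handled is an assumption beyond the bare arctan form above:

```python
import math

def text_correction_direction(p1, p2):
    """Compute the text correction direction in degrees (clockwise, mapped to
    [0, 360)) from the first target point p1 = (x1, y1) and the second target
    point p2 = (x2, y2) of a target text label."""
    (x1, y1), (x2, y2) = p1, p2
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    return angle % 360.0

# Example: a text line whose first and seventh points differ by about 10 degrees of tilt.
print(text_correction_direction((100.0, 50.0), (198.5, 67.4)))  # roughly 10.0
```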
and step S230, correcting the text image to be corrected according to the text correction directions to obtain a corrected text image.
Further, the text image to be corrected is corrected according to the text correction directions, and the corrected text image can be obtained by the following method:
determining pixel point coordinates of each pixel point corresponding to each target polygon frame in the text image to be corrected; determining the text correction direction corresponding to each target polygon frame as a target correction direction; and determining pixel point correction coordinates corresponding to each target polygon frame according to each target correction direction, each pixel point coordinate and a predetermined coordinate correction formula, and forming a corrected text image according to each pixel point correction coordinate.
In the present embodiment, the target correction direction may be understood as a final correction direction of the text correction; the pixel point coordinates are coordinates of pixel points in the text image to be corrected, and the pixel point correction coordinates are coordinates after pixel point correction.
The text image to be corrected comprises one or more target polygon frames, for example two target polygon frames. The pixel point coordinates of each pixel point in the first target polygon frame are determined, the text correction direction corresponding to the first target polygon frame is determined as a target correction direction, and the pixel point coordinates of each pixel point and the target correction direction are substituted into the predetermined coordinate correction formula to obtain the pixel point correction coordinates corresponding to each pixel point; a corrected image is then formed from the correction coordinates of each pixel point. In essence, each pixel point is moved from point A to point B while its original RGB information remains unchanged, forming a corrected image. The pixel points of the second target polygon frame are moved in the same manner to obtain another corrected image, and the corrected text image is formed from the two corrected images. This method rotates the image of each text area separately for subsequent text recognition and other operations, and is suitable for the case where the picture contains few text areas.
Further, the text image to be corrected is corrected according to the text correction directions, and the corrected text image can be obtained by the following method:
determining angle intervals corresponding to the correction directions of the texts based on a preset angle division rule, and recording the occurrence times of the angle intervals; determining the angle interval with the highest occurrence frequency as a target angle interval, and selecting a target correcting direction from the target angle interval; and determining the correction coordinates of each pixel point according to the target correction direction, the pixel point coordinates of each pixel point in the text image to be corrected and a predetermined coordinate correction formula, and forming a corrected text image according to the correction coordinates of each pixel point.
In the present embodiment, the angle division rule may be understood as a way of dividing the angles from 0° to 360°, for example in steps of 10°; the angle intervals are then [0°, 10°], (10°, 20°], …, (350°, 360°]. The target angle interval may be understood as the angle interval with the highest number of occurrences among all angle intervals, and it is used to determine the target correction direction.
Take as an example a text image to be corrected containing 10 target polygon frames, each corresponding to one text correction direction, say 10°, 11°, 9.5°, 11°, 9.8°, 11°, 10.9°, 11°, 11° and 11°. The angle interval corresponding to 10°, 9.5° and 9.8° is [0°, 10°], and the angle interval corresponding to 11° and 10.9° is (10°, 20°]; the interval [0°, 10°] therefore occurs 3 times and the interval (10°, 20°] occurs 7 times, so (10°, 20°] is taken as the target angle interval. A target correction direction is selected from the target angle interval, for example the median value 15° or the maximum value 20°. The text image to be corrected is then corrected uniformly according to the target correction direction: the pixel point coordinates and the target correction direction are substituted in turn into the coordinate correction formula to obtain the pixel point correction coordinates corresponding to each pixel point, and the corrected text image is formed from the pixel point correction coordinates. This method is suitable for images whose text basically follows the same angle, such as form images and scanned book images.
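A minimal sketch of this interval-voting step, as described below; the bin boundaries and the choice of the interval midpoint as the target correction direction follow the example above, and other selections such as the interval maximum would work the same way:

```python
import math
from collections import Counter

def vote_target_direction(directions, bin_size=10.0):
    """Count how many per-box text correction directions fall into each
    bin_size-degree interval [0, 10], (10, 20], ..., pick the most frequent
    interval, and return its midpoint as the target correction direction."""
    bins = Counter(max(0, math.ceil(d / bin_size) - 1) for d in directions)
    best_bin, _ = bins.most_common(1)[0]
    return best_bin * bin_size + bin_size / 2.0

# Example from the text: ten per-box directions, seven of them in (10, 20].
dirs = [10, 11, 9.5, 11, 9.8, 11, 10.9, 11, 11, 11]
print(vote_target_direction(dirs))   # -> 15.0
```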
Further, the coordinate correction formula is:
x = (x0 - w0/2)·cos(angle) - (y0 - h0/2)·sin(angle) + w1/2
y = (x0 - w0/2)·sin(angle) + (y0 - h0/2)·cos(angle) + h1/2
where x is the abscissa of the pixel point correction coordinate; y is the ordinate of the pixel point correction coordinate; x0 is the abscissa of the pixel point coordinate; y0 is the ordinate of the pixel point coordinate; w0 is the image width of the text image to be corrected; h0 is the image height of the text image to be corrected; w1 is the image width of the corrected text image; h1 is the image height of the corrected text image; and angle is the target correction direction. The corrected text image is an image from which background noise such as a table or a machine has been removed and which contains only the text region, so the image height and the image width of the corrected text image should be the height and the width of the whole text region. First, the coordinates of each pixel point in the text image to be corrected are identified (for example, the coordinates of each pixel point contained in the text box 205 in fig. 3), and the image height and the image width of the corrected text image are determined from the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate among those coordinates. Alternatively, when the length and width of the processed image are known in advance, for example when scanning A4 paper whose length and width are known, the corrected image height and image width are known and can be input directly into the computer device as known values; when a number of images of the same specification are scanned, the input information is used directly in the calculation, without recomputing it every time.
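A rough sketch of applying such a correction pixel by pixel; the rotation-about-centre form mirrors the reconstruction above and is an assumption rather than the patent's exact formula, and the nearest-pixel assignment is for illustration only:

```python
import math
import numpy as np

def correct_pixel(x0, y0, w0, h0, w1, h1, angle_deg):
    """Map a pixel of the w0 x h0 text image to be corrected to its position
    in the w1 x h1 corrected image by rotating about the image centre by the
    target correction direction."""
    a = math.radians(angle_deg)
    dx, dy = x0 - w0 / 2.0, y0 - h0 / 2.0
    x = dx * math.cos(a) - dy * math.sin(a) + w1 / 2.0
    y = dx * math.sin(a) + dy * math.cos(a) + h1 / 2.0
    return x, y

def correct_image(img, angle_deg, w1, h1):
    """Form the corrected text image by moving every pixel to its corrected
    coordinates while keeping its original RGB value."""
    h0, w0 = img.shape[:2]
    out = np.zeros((h1, w1) + img.shape[2:], dtype=img.dtype)
    for y0 in range(h0):
        for x0 in range(w0):
            x, y = correct_pixel(x0, y0, w0, h0, w1, h1, angle_deg)
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < w1 and 0 <= yi < h1:
                out[yi, xi] = img[y0, x0]
    return out
```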
The embodiment of the invention provides a text direction correction method, which solves the problem that the image correction cannot be directly carried out on a text image with any inclination angle in the prior art, and the target network model trained by a set training method is adopted to determine a target polygon frame in the text image to be corrected. The method comprises the steps of obtaining a training sample set through a set sample selection method, obtaining text images of all inclined angles to the maximum degree, obtaining rotary text images of multiple angles by rotating one training sample image, and enabling the training sample set to be simple and convenient in expansion mode. The application also provides two modes for correcting the text image to be corrected, and the text image to be corrected is corrected according to different types and different requirements.
EXAMPLE III
Fig. 7 is a schematic structural diagram of a text direction correction apparatus according to a third embodiment of the present invention, where the apparatus includes: a target frame determination module 31, a direction determination module 32 and a correction module 33.
The target frame determining module 31 is configured to determine each target polygon frame according to the acquired text image to be corrected and a target network model, where the target network model is trained by using a set training method; a direction determining module 32, configured to determine a text correction direction of each target polygon frame according to coordinates of a first target point and a second target point in a target text label of each target polygon frame; and the correcting module 33 is configured to correct the text image to be corrected according to each text correcting direction, so as to obtain a corrected text image.
The embodiment of the invention provides a text direction correction apparatus, which solves the problem in the prior art that image correction cannot be performed directly on a text image with an arbitrary inclination angle. A target network model trained with a set training method determines the target polygon frames in the text image to be corrected, the text correction direction is determined from the target polygon frames, and the text image to be corrected is then corrected, so that text images at any inclination angle can be corrected. The method is simple, can directly process a text image to be corrected in any direction, saves time and improves processing efficiency.
Further, the apparatus further comprises:
the system comprises a sample set acquisition module, a standard text label acquisition module and a data processing module, wherein the sample set acquisition module is used for acquiring a training sample set containing at least one training sample, the training sample set is acquired by a preset sample selection method, and the training sample consists of a rotary text image and a corresponding standard text label;
the prediction frame determining module is used for sequentially inputting the training samples into a given network model to be trained to obtain corresponding prediction polygon frames;
the function determining module is used for determining a loss function by adopting a given loss function expression and combining a predicted text label in the predicted polygon frame and a corresponding standard text label;
and the model determining module is used for performing back propagation on the network model to be trained through the loss function to obtain a target network model.
Further, a sample set acquisition module comprising:
the acquisition unit is used for acquiring at least one training text image and a corresponding training text label;
the rotating unit is used for rotating the training text images based on the set rotating angles to obtain the rotating text images and the corresponding standard text labels, the rotating text images and the corresponding standard text labels are used as a training sample, and the standard text labels comprise set initial position coordinates;
and the forming unit is used for forming a training sample set according to each training sample.
Further, the direction determining module 32 is specifically configured to: and determining the text correction direction corresponding to each target polygon frame according to the coordinates of the first target point and the second target point in the target text label of each target polygon frame and a predetermined direction formula.
Further, the correction module 33 includes:
the coordinate determination unit is used for determining pixel point coordinates of each pixel point corresponding to each target polygon frame in the text image to be corrected;
a direction determining unit configured to determine the text correction direction corresponding to each of the target polygon frames as a target correction direction;
and the first correction unit is used for determining pixel correction coordinates corresponding to each target polygon frame according to each target correction direction, each pixel coordinate and a predetermined coordinate correction formula, and forming a corrected text image according to each pixel correction coordinate.
Further, the correction module 33 includes:
the interval determining unit is used for determining an angle interval corresponding to each text correction direction based on a preset angle division rule and recording the occurrence frequency of each angle interval;
the direction selection unit is used for determining the angle interval with the highest occurrence frequency as a target angle interval and selecting a target correction direction from the target angle interval;
and the second correction unit is used for determining correction coordinates of all the pixel points according to the target correction direction, the pixel point coordinates of all the pixel points in the text image to be corrected and a predetermined coordinate correction formula, and forming a corrected text image according to all the pixel point correction coordinates.
Further, the coordinate correction formula is:
x = (x0 - w0/2)·cos(angle) - (y0 - h0/2)·sin(angle) + w1/2
y = (x0 - w0/2)·sin(angle) + (y0 - h0/2)·cos(angle) + h1/2
where x is the abscissa of the pixel point correction coordinate; y is the ordinate of the pixel point correction coordinate; x0 is the abscissa of the pixel point coordinate; y0 is the ordinate of the pixel point coordinate; w0 is the image width of the text image to be corrected; h0 is the image height of the text image to be corrected; w1 is the image width of the corrected text image; h1 is the image height of the corrected text image; and angle is the target correction direction.
The text direction correcting device provided by the embodiment of the invention can execute the text direction correcting method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executing method.
Example four
Fig. 8 is a schematic structural diagram of a computer apparatus according to a fourth embodiment of the present invention, as shown in fig. 8, the apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of processors 40 in the device may be one or more, and one processor 40 is taken as an example in fig. 8; the processor 40, the memory 41, the input device 42 and the output device 43 in the apparatus may be connected by a bus or other means, for example in fig. 8.
The memory 41, which is a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the text direction correction method in the embodiment of the present invention (for example, the target block determination module 31, the direction determination module 32, and the correction module 33 in the text direction correction apparatus). The processor 40 executes various functional applications of the apparatus and data processing, i.e., implements the above-described text direction correction method, by executing software programs, instructions, and modules stored in the memory 41.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 43 may include a display device such as a display screen.
EXAMPLE five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a text direction correction method, including:
determining each target polygon frame according to the acquired text image to be corrected and a target network model, wherein the target network model is trained by adopting a set training method;
determining the text correction direction of each target polygon frame according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame;
and correcting the text image to be corrected according to each text correction direction to obtain a corrected text image.
Of course, the storage medium containing the computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the text direction correction method provided by any embodiments of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the text direction correction apparatus, the units and modules included in the embodiment are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A text direction correction method, comprising:
determining each target polygon frame according to the acquired text image to be corrected and a target network model, wherein the target network model is trained by adopting a set training method;
determining the text correction direction of each target polygon frame according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame;
and correcting the text image to be corrected according to each text correction direction to obtain a corrected text image.
2. The method of claim 1, wherein the step of training the target network model comprises:
acquiring a training sample set containing at least one training sample, wherein the training sample set is acquired by a preset sample selection method, and the training sample consists of a rotary text image and a corresponding standard text label;
inputting the training samples into a given network model to be trained in sequence to obtain a corresponding prediction polygon frame;
determining a loss function by adopting a given loss function expression and combining a predicted text label in the predicted polygon frame and a corresponding standard text label;
and performing back propagation on the network model to be trained through the loss function to obtain a target network model.
3. The method of claim 2, wherein the step of obtaining the training sample set comprises:
acquiring at least one training text image and a corresponding training text label;
rotating the training text images based on the set rotation angles to obtain the rotation text images and corresponding standard text labels, wherein the rotation text images and the corresponding standard text labels are used as a training sample, and each standard text label comprises a set initial position coordinate;
a training sample set is formed from each of the training samples.
4. The method of claim 1, wherein determining the text correction direction for each of the target polygon boxes based on the coordinates of the first target point and the second target point in the target text label of each of the target polygon boxes comprises:
and determining the text correction direction corresponding to each target polygon frame according to the coordinates of the first target point and the second target point in the target text label of each target polygon frame and a predetermined direction formula.
5. The method according to claim 1, wherein the correcting the text image to be corrected according to each text correction direction to obtain a corrected text image comprises:
determining pixel point coordinates of each pixel point corresponding to each target polygon frame in the text image to be corrected;
determining the text correction direction corresponding to each of the target polygon boxes as a target correction direction;
and determining pixel point correction coordinates corresponding to each target polygon frame according to each target correction direction, each pixel point coordinate and a predetermined coordinate correction formula, and forming a corrected text image according to each pixel point correction coordinate.
6. The method according to claim 1, wherein the correcting the text image to be corrected according to each text correction direction to obtain a corrected text image comprises:
determining an angle interval corresponding to each text correction direction based on a preset angle division rule, and recording the occurrence frequency of each angle interval;
determining the angle interval with the highest occurrence frequency as a target angle interval, and selecting a target correcting direction from the target angle interval;
and determining the correction coordinates of each pixel point according to the target correction direction, the pixel point coordinates of each pixel point in the text image to be corrected and a predetermined coordinate correction formula, and forming a corrected text image according to the correction coordinates of each pixel point.
7. The method according to claim 5 or 6, wherein the coordinate correction formula is:
x = (x0 - w0/2)·cos(angle) - (y0 - h0/2)·sin(angle) + w1/2
y = (x0 - w0/2)·sin(angle) + (y0 - h0/2)·cos(angle) + h1/2
wherein x is the abscissa of the pixel point correction coordinate; y is the ordinate of the pixel point correction coordinate; x0 is the abscissa of the pixel point coordinate; y0 is the ordinate of the pixel point coordinate; w0 is the image width of the text image to be corrected; h0 is the image height of the text image to be corrected; w1 is the image width of the corrected text image; h1 is the image height of the corrected text image; and angle is the target correction direction.
8. A text orientation correction apparatus, characterized by comprising:
the target frame determining module is used for determining each target polygon frame according to the acquired text image to be corrected and a target network model, wherein the target network model is trained by adopting a set training method;
the direction determining module is used for determining the text correction direction of each target polygon frame according to the coordinates of a first target point and a second target point in the target text label of each target polygon frame;
and the correction module is used for correcting the text image to be corrected according to each text correction direction to obtain a corrected text image.
9. A computer device, the device comprising:
one or more processors;
a storage device for storing one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the text direction correction method as recited in any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a text direction correction method according to any one of claims 1 to 7.
CN202011458939.4A 2020-12-11 2020-12-11 Text direction correction method, device, equipment and storage medium Pending CN112434696A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011458939.4A CN112434696A (en) 2020-12-11 2020-12-11 Text direction correction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011458939.4A CN112434696A (en) 2020-12-11 2020-12-11 Text direction correction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112434696A true CN112434696A (en) 2021-03-02

Family

ID=74692438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011458939.4A Pending CN112434696A (en) 2020-12-11 2020-12-11 Text direction correction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434696A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449724A (en) * 2021-06-09 2021-09-28 浙江大华技术股份有限公司 Image text correction method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490198A (en) * 2019-08-12 2019-11-22 上海眼控科技股份有限公司 Text orientation bearing calibration, device, computer equipment and storage medium
CN111783763A (en) * 2020-07-07 2020-10-16 厦门商集网络科技有限责任公司 Text positioning box correction method and system based on convolutional neural network
US20200364485A1 (en) * 2019-05-16 2020-11-19 Bank Of Montreal Deep-learning-based system and process for image recognition

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364485A1 (en) * 2019-05-16 2020-11-19 Bank Of Montreal Deep-learning-based system and process for image recognition
CN110490198A (en) * 2019-08-12 2019-11-22 上海眼控科技股份有限公司 Text orientation bearing calibration, device, computer equipment and storage medium
CN111783763A (en) * 2020-07-07 2020-10-16 厦门商集网络科技有限责任公司 Text positioning box correction method and system based on convolutional neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449724A (en) * 2021-06-09 2021-09-28 浙江大华技术股份有限公司 Image text correction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108304814B (en) Method for constructing character type detection model and computing equipment
CN111027563A (en) Text detection method, device and recognition system
US5892854A (en) Automatic image registration using binary moments
JP2019149788A (en) Image processing apparatus, and control method and program for the same
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN111368638A (en) Spreadsheet creation method and device, computer equipment and storage medium
WO2021051527A1 (en) Image segmentation-based text positioning method, apparatus and device, and storage medium
WO2022105569A1 (en) Page direction recognition method and apparatus, and device and computer-readable storage medium
CN113436080A (en) Seal image processing method, device, equipment and storage medium
CN111062317A (en) Method and system for cutting edges of scanned document
CN112949649B (en) Text image identification method and device and computing equipment
CN112926565B (en) Picture text recognition method, system, equipment and storage medium
CN112434696A (en) Text direction correction method, device, equipment and storage medium
CN113052162B (en) Text recognition method and device, readable storage medium and computing equipment
CN106056575B (en) A kind of image matching method based on like physical property proposed algorithm
CN113936137A (en) Method, system and storage medium for removing overlapping of image type text line detection areas
CN114332880A (en) Text detection method, device, equipment and storage medium
CN112183253A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN111508045A (en) Picture synthesis method and device
CN114359889B (en) Text recognition method for long text data
CN116468611B (en) Image stitching method, device, equipment and storage medium
CN113850238B (en) Document detection method and device, electronic equipment and storage medium
CN111597375B (en) Picture retrieval method based on similar picture group representative feature vector and related equipment
CN116704513B (en) Text quality detection method, device, computer equipment and storage medium
CN113850805B (en) Multi-document detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination