CN114494686A - Text image correction method, text image correction device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN114494686A
Authority
CN
China
Prior art keywords
text image
alternative
control point
corrected
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210110162.5A
Other languages
Chinese (zh)
Inventor
范森
乔美娜
刘珊珊
吕鹏原
章成全
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210110162.5A
Publication of CN114494686A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

The disclosure provides a text image correction method, a text image correction device, electronic equipment and a storage medium, relates to the technical field of artificial intelligence, in particular to the technical field of deep learning and computer vision, and can be applied to scenes such as optical character recognition. The specific implementation scheme is as follows: determining at least one first alternative control point sequence from the boundary of a text region to be corrected of a text image to be corrected; obtaining alternative corrected text image data of at least one alternative corrected text image according to the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected; and determining a target corrected text image from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.

Description

Text image correction method, text image correction device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the field of deep learning and computer vision technologies, and can be applied to scenes such as optical character recognition. In particular, the invention relates to a text image rectification method, a text image rectification device, an electronic device and a storage medium.
Background
With the development of computer technology, text image recognition technology has also developed and is widely used in many fields, such as content auditing, text image digitization, and text image translation. During text image recognition, the text lines in a text image may be distorted, in which case the text image needs to be corrected.
Text image rectification may refer to an operation of restoring distorted text in a text image to a flat state.
Disclosure of Invention
The disclosure provides a text image correction method, a text image correction device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a text image rectification method including: determining at least one first alternative control point sequence from the boundary of a text region to be corrected of a text image to be corrected, wherein the first alternative control point sequence comprises a plurality of first alternative control points; obtaining alternative corrected text image data of at least one alternative corrected text image according to the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected; and determining a target corrected text image from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.
According to another aspect of the present disclosure, there is provided a text image rectification apparatus including: a first determining module, configured to determine at least one first alternative control point sequence from the boundary of a text region to be corrected of a text image to be corrected, wherein the first alternative control point sequence comprises a plurality of first alternative control points; a first obtaining module, configured to obtain alternative corrected text image data of at least one alternative corrected text image according to position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and position information of each of a plurality of expected control points included in an expected control point sequence of an expected text image corresponding to the text image to be corrected; and a second determining module, configured to determine a target corrected text image from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described in the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which the text image rectification method and apparatus may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a text image rectification method according to an embodiment of the present disclosure;
FIG. 3A schematically illustrates an example schematic of a text image rectification process according to an embodiment of the disclosure;
FIG. 3B schematically shows an example schematic diagram of a second alternative control point sequence in accordance with an embodiment of the present disclosure;
FIG. 3C schematically illustrates an example schematic diagram of an expected control point sequence according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a block diagram of a text image rectification apparatus according to an embodiment of the present disclosure; and
FIG. 5 schematically shows a block diagram of an electronic device adapted to implement a text image rectification method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The quality of the text image recognition effect is related to the quality of the text image. If there are distorted lines of text in the text image, the quality of the text image will be affected. Thus, the text image with distortion needs to be corrected.
Therefore, the embodiments of the present disclosure provide a text image rectification scheme. At least one first alternative control point sequence is determined from the boundary of the text region to be corrected of the text image to be corrected. The first alternative control point sequence includes a plurality of first alternative control points. Alternative corrected text image data of at least one alternative corrected text image is obtained according to the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected. A target corrected text image is determined from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.
The plurality of first alternative control points are all located on the boundary of the text region to be corrected and are strongly correlated with the text image to be corrected. The first alternative control points do not need to be labeled in order, which reduces the difficulty of labeling the control points. On this basis, the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence are used to generate an alternative corrected text image corresponding to each of the at least one first alternative control point sequence. Finally, a target corrected text image is determined from the at least one alternative corrected text image according to an evaluation result obtained by evaluating each alternative corrected text image. Regression prediction of the control points is therefore not required, and the correction quality of the text image is effectively guaranteed.
Fig. 1 schematically illustrates an exemplary system architecture to which the text image rectification method and apparatus may be applied, according to an embodiment of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios. For example, in another embodiment, an exemplary system architecture to which the method and apparatus for correcting a text image may be applied may include a terminal device, but the terminal device may implement the method and apparatus for correcting a text image provided in the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired and/or wireless communication links, and so forth.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as a knowledge reading application, a web browser application, a search application, an instant messaging tool, a mailbox client, and/or social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services. For example, the server 105 may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of conventional physical hosts and VPS (Virtual Private Server) services. The server 105 may also be a server of a distributed system or a server that incorporates a blockchain.
It should be noted that the text image rectification method provided by the embodiment of the present disclosure may be generally executed by the terminal device 101, 102, or 103. Accordingly, the text image rectification device provided by the embodiment of the present disclosure may also be disposed in the terminal device 101, 102, or 103.
Alternatively, the text image rectification method provided by the embodiment of the present disclosure may also be generally executed by the server 105. Accordingly, the text image rectification device provided by the embodiment of the present disclosure may be generally disposed in the server 105. The text image rectification method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the text image rectification device provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely used as representations of the operations for description, and should not be construed as representing the execution order of the respective operations. The method need not be performed in the exact order shown, unless explicitly stated.
Fig. 2 schematically shows a flowchart of a text image rectification method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S230.
In operation S210, at least one first alternative control point sequence is determined from a boundary of a text region to be corrected of a text image to be corrected. The first sequence of alternative control points includes a plurality of first alternative control points.
In operation S220, candidate corrected text image data of at least one candidate corrected text image is obtained according to the position information of each of the plurality of first candidate control points included in the at least one first candidate control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected.
In operation S230, a target corrected text image is determined from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.
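Operations S220 and S230 can be read as a generate-and-select loop over the control point sequences produced by operation S210. The following is a minimal sketch of that flow, not the disclosed implementation: the names `rectify`, `warp` and `evaluate` are hypothetical, the warping and evaluation steps are passed in as callables because this part of the description does not fix them, and the convention that a higher evaluation score is better is an assumption.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Point = Tuple[float, float]
Image = TypeVar("Image")


def rectify(
    image: Image,
    candidate_sequences: Sequence[Sequence[Point]],
    expected_sequence: Sequence[Point],
    warp: Callable[[Image, Sequence[Point], Sequence[Point]], Image],
    evaluate: Callable[[Image], float],
) -> Image:
    """Return the best-scoring alternative corrected image.

    candidate_sequences: the first alternative control point sequences (S210).
    warp: hypothetical callable mapping (image, source points, expected points)
          to an alternative corrected image (S220).
    evaluate: hypothetical scoring callable; higher is assumed to be better (S230).
    """
    if not candidate_sequences:
        raise ValueError("at least one control point sequence is required")
    # S220: one alternative corrected image per control point sequence.
    candidates = [warp(image, seq, expected_sequence) for seq in candidate_sequences]
    # S230: keep the candidate with the best evaluation result.
    return max(candidates, key=evaluate)
```

Passing the two steps in as parameters keeps the sketch independent of any particular warping model or flatness measure.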
According to an embodiment of the present disclosure, a text image may refer to an image including text information. The text information may include flat text information and non-flat text information. The non-flat text information may include warped text information. The region occupied by the text information on the text image may be referred to as a text region. A text region corresponding to flat text information may be referred to as a flat text region. A text region corresponding to non-flat text information may be referred to as a non-flat text region. If the text image includes non-flat text regions and the non-flat text regions need to be corrected, the non-flat text regions may be referred to as text regions to be corrected, and the text image may be referred to as a text image to be corrected. A text image may be referred to as a flat text image if it does not include non-flat text regions. The expected text image may be a flat text image, i.e., a text image that does not include non-flat text regions. The expected text image may refer to the text image that is expected to be obtained by correcting the text image to be corrected. The expected text image corresponds to the text image to be corrected.
According to an embodiment of the present disclosure, the text region may include an interior of the text region and an outline of the text region. The contour may include a plurality of boundaries. For example, the plurality of boundaries may include a first boundary along the reading direction and a second boundary along the reading direction. The first boundary and the second boundary may be two corresponding boundaries. For example, the first boundary is an upper boundary. The second boundary is a lower boundary. The boundary may include a plurality of control points.
According to an embodiment of the present disclosure, the text region to be corrected may include an outline of the text region to be corrected. The first alternative control point sequence may include a plurality of first alternative control points. The plurality of first alternative control points may be control points on the outline of the text region to be corrected. For example, the plurality of first alternative control points may include at least one first alternative control point on a first boundary along the reading direction and at least one first alternative control point on a second boundary along the reading direction of the text region to be corrected. The number of first alternative control points located on different boundaries of the outline of the text region to be corrected may be the same or different. The distance between two adjacent first alternative control points located on the same boundary may be the same or different. For example, if the distances between adjacent first alternative control points located on the same boundary are the same, the first alternative control points located on that boundary are uniformly distributed.
According to an embodiment of the present disclosure, the expected text image may include an expected text region. The expected text region may refer to the text region that is expected to be obtained by correcting the text region to be corrected of the text image to be corrected. The expected text region may include an outline of the expected text region. The expected control point sequence may include a plurality of expected control points. The plurality of expected control points may be control points on the outline of the expected text region. For example, the plurality of expected control points may include at least one expected control point on a first boundary of the expected text region along the reading direction and at least one expected control point on a second boundary of the expected text region along the reading direction. The number of expected control points located on different boundaries of the outline of the expected text region may be the same or different. The distance between two adjacent expected control points located on the same boundary may be the same or different. For example, if the distances between adjacent expected control points located on the same boundary are the same, the expected control points located on that boundary are uniformly distributed.
According to an embodiment of the present disclosure, for each first alternative control point sequence of the at least one first alternative control point sequence, and for each of the plurality of first alternative control points included in that sequence, there is a corresponding expected control point among the plurality of expected control points included in the expected control point sequence. That is, the plurality of expected control points included in the expected control point sequence may correspond one-to-one to the first alternative control points included in each first alternative control point sequence.
According to an embodiment of the present disclosure, image segmentation may be performed on the text image data to be corrected of the text image to be corrected to obtain the text region to be corrected. After the text region to be corrected is obtained, a plurality of first contour points may be extracted from the contour of the text region to be corrected using a contour point extraction strategy. The contour point extraction strategy may include the findContours function in OpenCV. A plurality of first alternative control points may be determined from the plurality of first contour points. The plurality of first alternative control points may be sorted at least once to obtain at least one first alternative control point sequence. Different first alternative control point sequences may be used to obtain different alternative corrected text images. For example, the plurality of first alternative control points may include at least one first alternative control point on a first boundary along the reading direction and at least one first alternative control point on a second boundary along the reading direction of the text region to be corrected. The number of first alternative control points located on the first boundary and the number of first alternative control points located on the second boundary may be the same. The distance between two adjacent first alternative control points located on the same boundary may be the same.
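One way to obtain uniformly distributed control points on a boundary is to resample the extracted contour points at equal arc-length intervals. The sketch below assumes the contour points of one boundary are already available as an ordered polyline (the extraction itself, for example with findContours, is outside the sketch); the function name `sample_uniform` and the choice of arc-length resampling are illustrative assumptions, not taken from the disclosure.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sample_uniform(boundary: Sequence[Point], m: int) -> List[Point]:
    """Return m points evenly spaced by arc length along an ordered polyline."""
    if m < 1 or len(boundary) < 2:
        raise ValueError("need m >= 1 and at least two boundary points")
    if m == 1:
        return [(float(boundary[0][0]), float(boundary[0][1]))]
    # Cumulative arc length at each polyline vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    samples: List[Point] = []
    seg = 0
    for k in range(m):
        target = total * k / (m - 1)
        # Advance to the polyline segment that contains the target length.
        while seg < len(boundary) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = boundary[seg], boundary[seg + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples
```

Calling the function once per boundary with the same `m` yields the same number of control points on the first and second boundaries.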
According to an embodiment of the present disclosure, an expected text region may be determined according to the text region to be corrected. An expected control point sequence may be determined from the boundary of the expected text region. The expected control point sequence may include a plurality of expected control points. For example, after the expected text region is obtained, a plurality of second contour points may be extracted from the contour of the expected text region using a contour point extraction strategy. A plurality of expected control points may be determined from the plurality of second contour points. Each expected control point may have a first alternative control point corresponding to it. For example, the plurality of expected control points may include at least one expected control point on a first boundary of the expected text region along the reading direction and at least one expected control point on a second boundary along the reading direction. The number of expected control points located on the first boundary and the number of expected control points located on the second boundary may be the same. The distance between two adjacent expected control points located on the same boundary may be the same.
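For a flat expected text region, the expected control points can be written down directly. The following is a minimal sketch, assuming the expected text region is an axis-aligned rectangle in image coordinates (y increasing downward) and that the points are numbered clockwise, upper boundary left to right and then lower boundary right to left; the function name `expected_control_points` and the rectangle assumption are illustrative and not stated in the disclosure.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def expected_control_points(width: float, height: float, m: int) -> List[Point]:
    """Return 2*m evenly spaced control points on a width x height rectangle.

    Indices 0..m-1 run left to right along the upper boundary (y = 0);
    indices m..2m-1 run right to left along the lower boundary (y = height).
    """
    if m < 2:
        raise ValueError("this sketch needs m >= 2 to span the boundary")
    xs = [width * k / (m - 1) for k in range(m)]
    upper = [(x, 0.0) for x in xs]
    lower = [(x, float(height)) for x in reversed(xs)]
    return upper + lower
```

With this numbering, the expected control point at a given index is the target position for the alternative control point carrying the same index.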
According to the embodiment of the disclosure, for each first alternative control point sequence in at least one first alternative control point sequence, the alternative corrected text image corresponding to the first alternative control point sequence is determined according to the position information of each of the plurality of first alternative control points included in the first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence. Thereby, alternative rectified text images corresponding to the at least one first alternative control point sequence can be obtained.
According to an embodiment of the present disclosure, after the at least one alternative corrected text image is obtained, flatness evaluation may be performed on the alternative corrected text image data of each of the at least one alternative corrected text image to obtain an evaluation result. A target corrected text image may then be determined from the at least one alternative corrected text image according to the evaluation result.
According to an embodiment of the present disclosure, the plurality of first alternative control points are all located on the boundary of the text region to be corrected and are strongly correlated with the text image to be corrected. The first alternative control points do not need to be labeled in order, which reduces the difficulty of labeling the control points. On this basis, the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence are used to generate an alternative corrected text image corresponding to each of the at least one first alternative control point sequence. Finally, a target corrected text image is determined from the at least one alternative corrected text image according to an evaluation result obtained by evaluating each alternative corrected text image. Regression prediction of the control points is therefore not required, and the correction quality of the text image is effectively guaranteed.
According to an embodiment of the present disclosure, operation S210 may include the following operations.
A plurality of second alternative control point sequences are determined from the boundary of the text region to be corrected of the text image to be corrected. At least one first alternative control point sequence is determined from the plurality of second alternative control point sequences.
According to an embodiment of the present disclosure, the second alternative control point sequence may include a plurality of second alternative control points. The plurality of second alternative control points may be control points on the outline of the text region to be corrected. For example, the plurality of second alternative control points may include at least one second alternative control point on a first boundary along the reading direction and at least one second alternative control point on a second boundary along the reading direction of the text region to be corrected. The number of second alternative control points located on different boundaries of the outline of the text region to be corrected may be the same or different. The distance between two adjacent second alternative control points located on the same boundary may be the same or different. For example, if the distances between adjacent second alternative control points located on the same boundary are the same, the second alternative control points located on that boundary are uniformly distributed.
According to the embodiment of the disclosure, image segmentation can be performed on the text image data to be corrected of the text image to be corrected, so that the text region to be corrected of the text image to be corrected is obtained. After obtaining the text region to be corrected, a plurality of first contour points may be extracted from the contour of the text region to be corrected using a contour point extraction strategy. A plurality of second alternative control points may be determined from the plurality of first contour points.
According to an embodiment of the present disclosure, the plurality of second alternative control points may be sorted multiple times to obtain a plurality of second alternative control point sequences. For example, each second alternative control point may have a sequence number corresponding to that second alternative control point. The sequence numbers of the second alternative control points may be sorted multiple times to obtain a plurality of sequence number orderings, and the plurality of second alternative control point sequences may be obtained according to the plurality of sequence number orderings. After the plurality of second alternative control point sequences are obtained, at least one first alternative control point sequence may be determined from them based on a predetermined sequence screening strategy. The predetermined sequence screening strategy may refer to a strategy for determining, from the plurality of second alternative control point sequences, the sequences that satisfy a predetermined sequence condition. Satisfying the predetermined sequence condition may include any two line segments being disjoint. Each line segment may be determined from an alternative control point pair. Each alternative control point pair may include a second alternative control point on the first boundary and the corresponding second alternative control point on the second boundary. Each line segment may be the line connecting the two second alternative control points included in the alternative control point pair.
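The screening condition above (no two pair segments intersect) can be checked with a standard orientation-based segment intersection test. The sketch below is illustrative only: the function names are hypothetical, and the pairing rule, which matches index i on the first boundary with index 2M-1-i on the second boundary, is an assumption based on a clockwise numbering of the 2M control points.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c is left of a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_box(a: Point, b: Point, c: Point) -> bool:
    """True if c lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Touching or collinear-overlap cases.
    return ((d1 == 0 and _in_box(q1, q2, p1)) or (d2 == 0 and _in_box(q1, q2, p2))
            or (d3 == 0 and _in_box(p1, p2, q1)) or (d4 == 0 and _in_box(p1, p2, q2)))


def satisfies_sequence_condition(sequence: Sequence[Point]) -> bool:
    """True if no two pair segments of a 2M-point sequence intersect."""
    m = len(sequence) // 2
    if m == 0 or len(sequence) != 2 * m:
        return False
    # Assumed pairing: index i (first boundary) with index 2M-1-i (second boundary).
    segments = [(sequence[i], sequence[2 * m - 1 - i]) for i in range(m)]
    return not any(
        segments_intersect(*segments[i], *segments[j])
        for i in range(m) for j in range(i + 1, m)
    )


def screen(sequences: Sequence[Sequence[Point]]) -> List[Sequence[Point]]:
    """Keep only the sequences that satisfy the sequence condition."""
    return [s for s in sequences if satisfies_sequence_condition(s)]
```

A sequence whose ordering swaps two points on one boundary produces crossing segments and is rejected, while a consistently ordered sequence passes.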
According to an embodiment of the present disclosure, the plurality of second alternative control point sequences are screened to obtain at least one first alternative control point sequence, so that more accurate alternative control point sequences participate in generating the alternative corrected text images, which improves the correction quality of the text image.
According to an embodiment of the present disclosure, the second alternative control point sequence may comprise a plurality of second alternative control points. The plurality of second alternative control points may include M second alternative control points on a first boundary of the text region to be corrected in the reading direction and M second alternative control points on a second boundary.
According to the embodiment of the disclosure, the second candidate control point corresponding to the 0th sequence number represents the top left corner point. The second alternative control point corresponding to the (M-1)th sequence number represents the upper right corner point. The second alternative control point corresponding to the Mth sequence number represents the lower right corner point. The second alternative control point corresponding to the (2M-1)th sequence number represents the lower left corner point. M is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, the reading direction may be an X-axis direction. The first boundary and the second boundary may be two opposing boundaries. For example, the first boundary may be an upper boundary. The second boundary may be a lower boundary. The distance between two adjacent second alternative control points located at the first boundary may be the same. The distance between two adjacent second alternative control points located on the second boundary may be the same. I.e. the second alternative control points located at the same boundary are evenly distributed.
According to an embodiment of the disclosure, in each second alternative control point sequence, each second alternative control point may have a sequence number corresponding to the second alternative control point. Different sequence numbers may have different meanings, and therefore different second alternative sequences of control points correspond to different alternative rectified text images.
According to an embodiment of the present disclosure, the second alternative control point sequence may include 2M second alternative control points. M may be an integer greater than or equal to 1. The value of M may be configured according to actual service requirements, and is not limited herein. For example, M = 24. The sequence may include 2M sequence numbers, i.e., the 0th sequence number through the (2M-1)th sequence number. The second candidate control point corresponding to the 0th sequence number may characterize the top left corner point. The second alternative control point corresponding to the (M-1)th sequence number may characterize the top right corner point. The second candidate control point corresponding to the Mth sequence number may represent the lower right corner point. The second alternative control point corresponding to the (2M-1)th sequence number may characterize the lower left corner point. The second candidate control points corresponding to the 0th through (2M-1)th sequence numbers may be arranged clockwise from the first boundary to the second boundary, that is, the first boundary may include the second candidate control points corresponding to the 0th through (M-1)th sequence numbers, and the second boundary may include the second candidate control points corresponding to the Mth through (2M-1)th sequence numbers.
For example, M = 24. The first boundary includes the second alternative control points corresponding to the 0th through 23rd sequence numbers. The second boundary includes the second candidate control points corresponding to the 24th through 47th sequence numbers. The first distance between any two adjacent second alternative control points located on the first boundary is the same. The second distance between any two adjacent second alternative control points located on the second boundary is the same. The first distance and the second distance may be the same.
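The numbering convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two boundaries are simplified to straight line segments between the four corner points, M is at least 2, and the function name `boundary_control_points` is hypothetical rather than taken from the disclosure.

```python
import numpy as np

def boundary_control_points(top_left, top_right, bottom_right, bottom_left, m):
    """Return a (2m, 2) array whose row n is the point with sequence number n.

    Assumption: both boundaries are straight segments and m >= 2.
    """
    tl, tr, br, bl = (np.asarray(p, dtype=float)
                      for p in (top_left, top_right, bottom_right, bottom_left))
    t = np.linspace(0.0, 1.0, m)[:, None]   # evenly spaced positions along a boundary
    first = tl + t * (tr - tl)              # numbers 0 .. m-1, left to right
    second = br + t * (bl - br)             # numbers m .. 2m-1, right to left (clockwise)
    return np.vstack([first, second])

# M = 24 on a 230 x 50 region: 48 points, 10 pixels apart on each boundary.
points = boundary_control_points((0, 0), (230, 0), (230, 50), (0, 50), 24)
```

With these inputs, sequence numbers 0, 23, 24 and 47 land on the top left, top right, lower right and lower left corners respectively.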
According to an embodiment of the present disclosure, determining at least one first alternative control point sequence from the plurality of second alternative control point sequences may include the following operations.
For each second candidate control point sequence in the plurality of second candidate control point sequences, the second candidate control point sequence is determined as a first candidate control point sequence when it is determined that the kth line segment and the hth line segment do not intersect, according to the position information of the second candidate control point corresponding to the kth sequence number on the first boundary, the position information of the second candidate control point corresponding to the (k+M)th sequence number on the second boundary, the position information of the second candidate control point corresponding to the hth sequence number on the first boundary and the position information of the second candidate control point corresponding to the (h+M)th sequence number on the second boundary. The kth line segment is determined according to the second candidate control point corresponding to the kth sequence number and the second candidate control point corresponding to the (k+M)th sequence number. The hth line segment is determined according to the second candidate control point corresponding to the hth sequence number and the second candidate control point corresponding to the (h+M)th sequence number. k and h are each an integer greater than or equal to 0 and less than or equal to (M-1), and k ≠ h.
According to an embodiment of the present disclosure, each second alternative control point sequence may include 2M second alternative control points. The 2M second alternative control points may include M alternative control point pairs. Each alternative control point pair may include two corresponding second alternative control points. Connecting the two corresponding second candidate control points included in each candidate control point pair yields a line segment, that is, each candidate control point pair may have a line segment corresponding to the candidate control point pair. Each second alternative control point sequence may thus correspond to M line segments. The lth candidate control point pair may include the second candidate control point corresponding to the lth sequence number on the first boundary and the second candidate control point corresponding to the (l+M)th sequence number on the second boundary. The lth line segment corresponds to the lth candidate control point pair, that is, the lth line segment may be obtained by connecting the second candidate control point corresponding to the lth sequence number on the first boundary and the second candidate control point corresponding to the (l+M)th sequence number on the second boundary, which are included in the lth candidate control point pair. l may be an integer greater than or equal to 0 and less than or equal to (M-1).
According to an embodiment of the present disclosure, for each of the plurality of second alternative control point sequences, if no two of the M line segments corresponding to the second candidate control point sequence intersect, the second candidate control point sequence may be determined as a first candidate control point sequence. That is, based on the straddle test or the quick rejection test, whether the kth line segment (l = k) intersects the hth line segment (l = h) may be determined according to the position information of the second candidate control point corresponding to the kth sequence number on the first boundary, the position information of the second candidate control point corresponding to the (k+M)th sequence number on the second boundary, the position information of the second candidate control point corresponding to the hth sequence number on the first boundary, and the position information of the second candidate control point corresponding to the (h+M)th sequence number on the second boundary. If it is determined that the kth line segment and the hth line segment do not intersect, the second alternative control point sequence may be determined as a first alternative control point sequence.
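A minimal sketch of this screening step is given below. It applies the quick rejection test followed by the straddle test to every pair of the M line segments, pairing sequence number k with sequence number k + M as described above. The helper names (`cross`, `segments_intersect`, `is_first_candidate`) and the toy coordinates are assumptions made for illustration only.

```python
from itertools import combinations

def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    # Quick rejection test: the two bounding boxes must overlap.
    if (max(p1[0], p2[0]) < min(q1[0], q2[0]) or
            max(q1[0], q2[0]) < min(p1[0], p2[0]) or
            max(p1[1], p2[1]) < min(q1[1], q2[1]) or
            max(q1[1], q2[1]) < min(p1[1], p2[1])):
        return False
    # Straddle test: the endpoints of each segment lie on opposite sides
    # of (or on) the line through the other segment.
    return (cross(p1, p2, q1) * cross(p1, p2, q2) <= 0 and
            cross(q1, q2, p1) * cross(q1, q2, p2) <= 0)

def is_first_candidate(points, m):
    """points[n] is the control point with sequence number n, len(points) == 2 * m."""
    segments = [(points[k], points[k + m]) for k in range(m)]
    return not any(segments_intersect(a, b, c, d)
                   for (a, b), (c, d) in combinations(segments, 2))

# Toy data (hypothetical): in `disjoint`, number k sits directly above number k + 3.
disjoint = [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
crossing = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
```

For the toy data, `is_first_candidate(disjoint, 3)` keeps the sequence, while `is_first_candidate(crossing, 3)` rejects it because its three segments all meet at (1, 0.5).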
According to an embodiment of the present disclosure, determining a plurality of second alternative control point sequences from the boundaries of the text region to be corrected of the text image to be corrected may include the following operations.
Based on a predetermined sorting strategy, the sequence numbers of the second alternative control points are adjusted multiple times to obtain a sequence number sequence after each adjustment. Each second alternative control point sequence is then obtained according to the sequence number sequence after the corresponding adjustment.
According to an embodiment of the present disclosure, the predetermined ranking policy may refer to a policy how to rank the plurality of second alternative control points. For example, the predetermined ordering policy may be a policy of adjusting respective sequence numbers of the plurality of second candidate control points.
According to the embodiment of the disclosure, in each second alternative control point sequence, each second alternative control point has a sequence number corresponding to the second alternative control point. And the sequence numbers corresponding to the same second alternative control point in different second alternative control point sequences are different. A plurality of sequence number sequences can be obtained by adjusting the sequence number corresponding to each second alternative control point. And obtaining a second alternative control point sequence corresponding to the sequence number sequence according to each sequence number sequence in the sequence number sequences.
According to the embodiment of the disclosure, based on a predetermined sorting policy, the respective sequence numbers of the plurality of second candidate control points are adjusted for a plurality of times to obtain a sequence number after each adjustment, which may include the following operations.
For the ith adjustment, the sequence number of the second candidate control point whose initial sequence number is the ith sequence number is adjusted to the 0th sequence number. i is an integer greater than or equal to 1 and less than or equal to (2M-1). When j-i is greater than 0, the sequence number of the second alternative control point whose initial sequence number is the jth sequence number is adjusted to the (j-i)th sequence number. j is an integer greater than or equal to 1 and less than or equal to (2M-1), and j ≠ i. When j-i is less than 0, the sequence number of the second candidate control point whose initial sequence number is the jth sequence number is adjusted to the (j+2M-i)th sequence number.
According to the embodiment of the disclosure, for the ith adjustment of the (2M-1) adjustments, the sequence number of the second candidate control point whose initial sequence number is the ith sequence number may be adjusted to the 0th sequence number. If j-i > 0, the sequence number of the second candidate control point whose initial sequence number is the jth sequence number can be adjusted to the (j-i)th sequence number. If j-i < 0, the sequence number of the second alternative control point whose initial sequence number is the jth sequence number can be adjusted to the (j+2M-i)th sequence number. In this way, each of the 2M second alternative control points can be traversed, each second alternative control point is used in turn as the second alternative control point corresponding to the 0th sequence number, and the plurality of second alternative control points are renumbered accordingly.
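The renumbering rule amounts to a cyclic shift of the sequence numbers, which can be sketched as follows. The function names are hypothetical, and one assumption is made explicit: the loop also covers j = 0, which falls under the j - i < 0 branch, so that every point receives a new number. That reading matches the 303_10 example in fig. 3B.

```python
def renumber(num_points, i):
    """Map each initial sequence number j to its number after the i-th adjustment."""
    new_number = {}
    for j in range(num_points):          # assumption: j = 0 is included
        if j == i:
            new_number[j] = 0
        elif j - i > 0:
            new_number[j] = j - i
        else:                            # j - i < 0
            new_number[j] = j + num_points - i
    return new_number

def second_candidate_sequences(points):
    """Return the initial sequence plus one sequence per adjustment i = 1 .. 2M-1."""
    n = len(points)                      # n = 2M
    sequences = [list(points)]
    for i in range(1, n):
        new_number = renumber(n, i)
        sequence = [None] * n
        for j, point in enumerate(points):
            sequence[new_number[j]] = point
        sequences.append(sequence)
    return sequences

# Labels follow fig. 3B (M = 5): ten points, hence ten sequences.
labels = ["303_%d" % n for n in range(11, 21)]
sequences = second_candidate_sequences(labels)
```

Here `sequences[0]` is the initial ordering starting at 303_11, and `sequences[9]` starts at 303_20 followed by 303_11 through 303_19.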
According to an embodiment of the present disclosure, operation S220 may include the following operations.
Based on a predetermined transformation algorithm, alternative corrected text image data of the at least one alternative corrected text image is obtained according to the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected.
According to an embodiment of the present disclosure, the predetermined transformation algorithm may be used to obtain alternative rectified text image data of the alternative rectified text image according to the respective position information of the plurality of first alternative control points and the respective position information of the plurality of expected control points. The predetermined transformation algorithm may comprise a non-rigid registration algorithm. The non-rigid registration algorithm may comprise a thin-plate spline interpolation algorithm.
According to an embodiment of the present disclosure, for each of the at least one first candidate control point sequence, based on a predetermined transformation matrix, all pixels of the expected text image may be interpolated according to the position information of each of the plurality of first candidate control points included in the first candidate control point sequence and the position information of the expected control point corresponding to each of the plurality of first candidate control points, so as to obtain candidate corrected text image data of the candidate corrected text image.
According to an embodiment of the present disclosure, the predetermined transformation algorithm may include a Thin Plate Spline (TPS) interpolation algorithm.
According to an embodiment of the present disclosure, for each of the plurality of first candidate control point sequences, the first TPS transformation matrix may be determined according to position information of each of the plurality of first candidate control points included in the first candidate control point sequence and position information of each of the plurality of expected control points included in the expected control point sequence. The respective position information of all control points included in the alternative corrected text image is determined according to the first TPS transformation matrix. A pixel value corresponding to each control point in the text image to be rectified is determined according to the first TPS transformation matrix and the respective position information of all the control points included in the alternative rectified text image. The pixel value corresponding to each control point is processed by using an interpolation algorithm to obtain the respective pixel values of all the control points included in the alternative corrected text image. Alternative corrected text image data of the alternative corrected text image is generated according to the respective pixel values of all the control points included in the alternative corrected text image. The interpolation algorithm may comprise a bilinear interpolation algorithm.
According to an embodiment of the present disclosure, determining the first TPS transformation matrix according to the position information of each of the plurality of first candidate control points included in the first candidate control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence may include: and determining a first TPS transformation parameter according to the position information of each of a plurality of first candidate control points included in the first candidate control point sequence. The first radial basis function is determined based on respective position information of a plurality of prospective control points comprised by the sequence of prospective control points. And determining a first TPS transformation matrix between the text image data to be rectified and the alternative rectified text image data according to the first TPS transformation parameter and the position information of each of a plurality of expected control points included in the expected control point sequence.
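The disclosure does not spell out the TPS formulas, so the following is only a sketch using one standard thin plate spline formulation (radial basis U(r) = r² log r² plus an affine term) with NumPy. The basis is built on the expected control points, the fitted parameters map positions in the corrected image back to positions in the image to be corrected, and pixel values are read with bilinear interpolation. All function names are hypothetical, and the sketch assumes a single-channel image and distinct, non-collinear control points.

```python
import numpy as np

def _tps_basis(r2):
    # U(r) = r^2 * log(r^2), with U(0) = 0.
    return r2 * np.log(np.where(r2 == 0, 1.0, r2))

def fit_tps(expected_pts, candidate_pts):
    """Solve for parameters mapping expected positions to candidate positions."""
    n = len(expected_pts)
    d2 = ((expected_pts[:, None, :] - expected_pts[None, :, :]) ** 2).sum(-1)
    affine = np.hstack([np.ones((n, 1)), expected_pts])
    system = np.zeros((n + 3, n + 3))
    system[:n, :n] = _tps_basis(d2)
    system[:n, n:] = affine
    system[n:, :n] = affine.T
    target = np.zeros((n + 3, 2))
    target[:n] = candidate_pts
    return np.linalg.solve(system, target)

def apply_tps(params, expected_pts, xy):
    """Map positions xy (K, 2) in the corrected image into the source image."""
    d2 = ((xy[:, None, :] - expected_pts[None, :, :]) ** 2).sum(-1)
    basis = np.hstack([_tps_basis(d2), np.ones((len(xy), 1)), xy])
    return basis @ params

def bilinear_sample(img, xy):
    """Read img (H, W) at fractional positions xy, clamping to the image border."""
    h, w = img.shape
    x = np.clip(xy[:, 0], 0, w - 1)
    y = np.clip(xy[:, 1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bottom = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bottom * fy

def rectify(img, candidate_pts, expected_pts, out_h, out_w):
    """Produce one alternative corrected image of size (out_h, out_w)."""
    params = fit_tps(expected_pts, candidate_pts)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    source_xy = apply_tps(params, expected_pts, grid)
    return bilinear_sample(img, source_xy).reshape(out_h, out_w)
```

When the candidate control points coincide with the expected control points, the fitted mapping reduces to the identity and `rectify` returns the input image unchanged, which is a convenient sanity check before feeding in real control point sequences.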
According to an embodiment of the present disclosure, operation S230 may include the following operations.
The alternative corrected text image data of the at least one alternative corrected text image is evaluated to obtain the respective evaluation value of the at least one alternative corrected text image. A target evaluation value is determined from the respective evaluation values of the at least one candidate rectified text image. The alternative corrected text image corresponding to the target evaluation value is determined as the target corrected text image.
According to an embodiment of the present disclosure, the evaluation value may be used to characterize the flatness of the alternative rectified text image. The relationship between the magnitude of the evaluation value and the flatness may be set according to actual service requirements, and is not limited herein. For example, the larger the value of the evaluation value, the better the flatness. Alternatively, the smaller the value of the evaluation value, the better the flatness.
According to the embodiment of the disclosure, for each of at least one alternative rectified text image, the alternative rectified text image data of the alternative rectified text image is evaluated to obtain the evaluation value of the alternative rectified text image. For example, the candidate corrected text image data of the candidate corrected text image may be evaluated based on the flatness evaluation policy to obtain an evaluation value of the candidate corrected text image. Thereby, evaluation values corresponding to the respective at least one candidate rectified text image can be obtained.
According to the embodiment of the disclosure, the respective evaluation values of the at least one candidate corrected text image can be ranked to obtain a ranking result. A target evaluation value is determined from the at least one evaluation value according to the ranking result. If a larger evaluation value indicates better flatness, the target evaluation value may be the maximum evaluation value. If a smaller evaluation value indicates better flatness, the target evaluation value may be the minimum evaluation value.
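The selection step can be sketched as a ranking followed by taking the first entry. The function name `select_target` and the flag `larger_is_better` are hypothetical. The flag stands for the service-specific choice of whether a larger or a smaller evaluation value means better flatness.

```python
def select_target(candidates, evaluation_values, larger_is_better=True):
    """Rank candidates by evaluation value and return the best one."""
    order = sorted(range(len(candidates)),
                   key=lambda n: evaluation_values[n],
                   reverse=larger_is_better)
    return candidates[order[0]]

# Hypothetical evaluation values for three alternative corrected images.
images = ["candidate_a", "candidate_b", "candidate_c"]
values = [0.2, 0.9, 0.5]
best_if_larger = select_target(images, values)                            # "candidate_b"
best_if_smaller = select_target(images, values, larger_is_better=False)   # "candidate_a"
```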
According to an embodiment of the present disclosure, evaluating the candidate rectified text image data of the at least one candidate rectified text image to obtain a respective evaluation value of the at least one candidate rectified text image may include the following operations.
And processing the alternative corrected text image data of the at least one alternative corrected text image by using the flat text image recognition model to obtain respective evaluation values of the at least one alternative corrected text image. The flat text image recognition model is obtained by training a preset classifier by using a training sample. The training samples include sample flat text image data for a sample flat text image and sample warped text image data for a sample warped text image. The sample warped text image data is obtained by warping the sample flat text image data.
According to an embodiment of the present disclosure, the predetermined classifier may include a decision tree model, a logistic regression model, or a neural network model. The type of the predetermined classifier may be configured according to actual service requirements, and is not limited herein.
In accordance with an embodiment of the present disclosure, the sample warped text image data may be warped sample flat text image data. The training sample may include, in addition to the sample flat text image data and the sample warped text image data, a first true result corresponding to the sample flat text image and a second true result corresponding to the sample warped text image. For example, the first true result may be characterized by a first predetermined identity. The second true result may be characterized by a second predetermined identity. The first predetermined flag may be "1". The second predetermined flag may be "0".
According to an embodiment of the present disclosure, the sample flat text image data may be processed using a predetermined classifier to obtain a first result corresponding to the sample flat text image. The sample warped text image data is processed using the predetermined classifier to obtain a second result corresponding to the sample warped text image. Based on the loss function, an output value is obtained according to the first result, the second result, the first true result and the second true result. The model parameters of the predetermined classifier are adjusted according to the output value until a predetermined condition is met. The predetermined classifier obtained when the predetermined condition is met is determined as the flat text image recognition model. For example, the first result and the first true result are input to the loss function to obtain a first output value. The second result and the second true result are input to the loss function to obtain a second output value. The output value is obtained according to the first output value and the second output value. Satisfying the predetermined condition may include at least one of: the output value converges, and the number of training rounds reaches the maximum number of training rounds.
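The training loop above can be sketched with a logistic regression classifier, one of the classifier types the disclosure lists. Several details are assumptions made for illustration: the images are represented by hypothetical feature vectors, the loss function is taken to be binary cross-entropy, and the output value is the sum of the first and second output values. The names `train_flatness_classifier` and `evaluate` are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(pred, label):
    """Binary cross-entropy between predictions and a constant true label."""
    eps = 1e-12
    return -np.mean(label * np.log(pred + eps) + (1 - label) * np.log(1 - pred + eps))

def train_flatness_classifier(flat_x, warped_x, lr=0.5, max_rounds=2000, tol=1e-7):
    """Flat samples carry the true result 1, warped samples the true result 0."""
    w, b = np.zeros(flat_x.shape[1]), 0.0
    previous = np.inf
    for _ in range(max_rounds):                      # stop at the maximum training round
        first = sigmoid(flat_x @ w + b)              # first result
        second = sigmoid(warped_x @ w + b)           # second result
        output = bce(first, 1.0) + bce(second, 0.0)  # first + second output value
        if abs(previous - output) < tol:             # stop when the output value converges
            break
        previous = output
        grad_w = (flat_x.T @ (first - 1.0) / len(flat_x)
                  + warped_x.T @ second / len(warped_x))
        grad_b = np.mean(first - 1.0) + np.mean(second)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def evaluate(x, w, b):
    """Evaluation value in (0, 1); larger means 'more likely flat' in this sketch."""
    return sigmoid(x @ w + b)

# Hypothetical one-dimensional feature, e.g. a text-line curvature score.
flat_features = np.array([[0.10], [0.20], [0.15]])
warped_features = np.array([[0.90], [0.80], [1.00]])
weights, bias = train_flatness_classifier(flat_features, warped_features)
```

After training on this toy data, `evaluate(flat_features, weights, bias)` yields values above 0.5 and `evaluate(warped_features, weights, bias)` yields values below 0.5.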
According to the embodiment of the disclosure, the flat text image recognition model can be utilized to process the data of the at least one alternative corrected text image, and the respective evaluation value of the at least one alternative corrected text image is obtained. For example, for each candidate rectified text image data of the at least one candidate rectified text image data, the candidate rectified text image data is processed by using the flat text image recognition model to obtain an evaluation value of the candidate rectified text image.
According to the embodiment of the disclosure, the quality of at least one generated alternative corrected text image is evaluated by using the flat text image recognition model, regression prediction of alternative control points is not needed, the implementation is easy, and the scale of model parameters is small, so that the flat text image recognition model can be trained quickly. In addition, the sample distorted text image data is obtained by distorting the sample flat text image data, so that the difficulty of obtaining the sample distorted text image data is reduced, manual marking of the sample distorted text image data is not needed, marking cost is reduced, and the robustness of the flat text image recognition model in an unmarked scene is improved.
According to an embodiment of the present disclosure, the sample warped text image data is obtained by warping sample flat text image data, and may include: the sample warped text image data is obtained based on a predetermined transformation algorithm from position information of each of a plurality of first sample control points included in the first sequence of sample control points and position information of each of a plurality of second sample control points included in the second sequence of sample control points corresponding to the sample flattened text image. The first sequence of sample control points is derived from the second sequence of sample control points.
According to an embodiment of the present disclosure, the first sequence of sample control points may include a plurality of first sample control points. The second sequence of sample control points may comprise a plurality of second sample control points. Each first sample control point may have a sequence number corresponding to the first sample control point. The respective sequence numbers of the plurality of first sample control points may be adjusted to obtain the adjusted sequence numbers of the plurality of first sample control points. And sequencing the first sample control points according to the adjusted sequence numbers of the plurality of first sample control points to obtain a second sample control point sequence. The above-mentioned adjustment may be random adjustment or sequential adjustment.
According to an embodiment of the present disclosure, the predetermined transformation algorithm may comprise a non-rigid registration algorithm. The second TPS transformation matrix may be determined according to position information of each of a plurality of first sample control points included in the first sample control point sequence and position information of each of a plurality of second sample control points included in the second sample control point sequence. The respective position information of all control points included in the sample warped text image is determined according to the second TPS transformation matrix. The pixel value corresponding to each sample control point in the sample warped text image is determined according to the second TPS transformation matrix and the respective position information of all the control points included in the sample warped text image. The pixel value corresponding to each sample control point is processed by using an interpolation algorithm to obtain the respective pixel values of all the sample control points included in the sample warped text image. Sample warped text image data of the sample warped text image is generated from the respective pixel values of all sample control points included in the sample warped text image. The interpolation algorithm may comprise a bilinear interpolation algorithm.
According to an embodiment of the present disclosure, determining the second TPS transform matrix according to the position information of each of the plurality of first sample control points included in the first sample control point sequence and the position information of each of the plurality of second sample control points included in the second sample control point sequence may include: and determining a second TPS conversion parameter according to the position information of each of the plurality of first sample control points included in the first sample control point sequence. And determining a second radial basis function according to the position information of each of the plurality of second sample control points included in the second sample control point sequence. And determining a second TPS transformation matrix between the sample flat text image data and the sample warped text image data according to the second TPS transformation parameters and the position information of each of the plurality of second sample control points included in the second sample control point sequence.
The following further describes the text image rectification method according to an embodiment of the disclosure with reference to fig. 3A, fig. 3B, and fig. 3C in combination with a specific embodiment.
Fig. 3A schematically illustrates an example schematic of a text image rectification process according to an embodiment of the disclosure.
As shown in fig. 3A, in 300A, a plurality of second alternative control points 302 are determined from the boundaries of the text region to be corrected of the text image to be corrected 301. And adjusting the sequence numbers of the second alternative control points 302 for multiple times based on a preset sequencing strategy to obtain a sequence number after each adjustment. And obtaining each second alternative control point sequence according to the sequence number sequence after each adjustment, so that a plurality of second alternative control point sequences 303 can be obtained. At least one first alternative control point sequence 304 is determined from the plurality of second alternative control point sequences 303.
And obtaining alternative corrected text image data 306 of the at least one alternative corrected text image according to the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence 304 and the position information of each of the plurality of expected control points included in the expected control point sequence 305 of the expected text image corresponding to the text image to be corrected.
The candidate rectified text image data 306 of the at least one candidate rectified text image is processed by the flat text image recognition model 307, resulting in respective evaluation values 308 of the at least one candidate rectified text image. A target evaluation value is determined from the respective evaluation values 308 of the at least one candidate rectified text image. The corrected text image alternative corresponding to the target evaluation value is determined as the target corrected text image 309.
Fig. 3B schematically shows an example schematic diagram of a second alternative control point sequence according to an embodiment of the present disclosure.
As shown in fig. 3B, in 300B, the plurality of second alternative control point sequences 303 in fig. 3A may be 10 second alternative control point sequences, i.e., a second alternative control point sequence 303_1 to a second alternative control point sequence 303_10. Each second alternative control point sequence comprises 10 second alternative control points, namely a second alternative control point 303_11, a second alternative control point 303_12, a second alternative control point 303_13, a second alternative control point 303_14 and a second alternative control point 303_15 on a first boundary in the reading direction, and a second alternative control point 303_16, a second alternative control point 303_17, a second alternative control point 303_18, a second alternative control point 303_19 and a second alternative control point 303_20 on a second boundary. Each second alternative control point has a sequence number corresponding to the second alternative control point.
The second candidate control point sequence 303_1 includes a second candidate control point 303_11 corresponding to the 0th sequence number, a second candidate control point 303_12 corresponding to the 1st sequence number, a second candidate control point 303_13 corresponding to the 2nd sequence number, a second candidate control point 303_14 corresponding to the 3rd sequence number, a second candidate control point 303_15 corresponding to the 4th sequence number, a second candidate control point 303_16 corresponding to the 5th sequence number, a second candidate control point 303_17 corresponding to the 6th sequence number, a second candidate control point 303_18 corresponding to the 7th sequence number, a second candidate control point 303_19 corresponding to the 8th sequence number, and a second candidate control point 303_20 corresponding to the 9th sequence number.
The second candidate control point sequence 303_10 includes a second candidate control point 303_20 corresponding to the 0th sequence number, a second candidate control point 303_11 corresponding to the 1st sequence number, a second candidate control point 303_12 corresponding to the 2nd sequence number, a second candidate control point 303_13 corresponding to the 3rd sequence number, a second candidate control point 303_14 corresponding to the 4th sequence number, a second candidate control point 303_15 corresponding to the 5th sequence number, a second candidate control point 303_16 corresponding to the 6th sequence number, a second candidate control point 303_17 corresponding to the 7th sequence number, a second candidate control point 303_18 corresponding to the 8th sequence number, and a second candidate control point 303_19 corresponding to the 9th sequence number.
Fig. 3C schematically illustrates an example schematic diagram of a prospective control point sequence in accordance with an embodiment of the disclosure.
As shown in fig. 3C, in 300C, the expected control point sequence 305 in fig. 3A includes 10 expected control points, i.e., an expected control point 3050 corresponding to sequence number 0, an expected control point 3051 corresponding to sequence number 1, an expected control point 3052 corresponding to sequence number 2, an expected control point 3053 corresponding to sequence number 3, an expected control point 3054 corresponding to sequence number 4, an expected control point 3055 corresponding to sequence number 5, an expected control point 3056 corresponding to sequence number 6, an expected control point 3057 corresponding to sequence number 7, an expected control point 3058 corresponding to sequence number 8, and an expected control point 3059 corresponding to sequence number 9.
Each expected control point in fig. 3C corresponds to the second alternative control point with the same sequence number in fig. 3B.
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with relevant laws and regulations and do not violate public order and good customs.
The above are merely exemplary embodiments; the disclosure is not limited thereto, and other text image rectification methods known in the art may be used as long as the rectification quality of the text image can be ensured.
Fig. 4 schematically shows a block diagram of a text image rectification apparatus according to an embodiment of the present disclosure.
As shown in fig. 4, the text image rectification apparatus 400 may include a first determination module 410, a first obtaining module 420, and a second determination module 430.
A first determining module 410, configured to determine at least one first alternative control point sequence from a boundary of a text region to be corrected of a text image to be corrected. The first sequence of alternative control points includes a plurality of first alternative control points.
A first obtaining module 420, configured to obtain candidate corrected text image data of the at least one candidate corrected text image according to the respective position information of the plurality of first candidate control points included in the at least one first candidate control point sequence and the respective position information of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected.
The second determining module 430 is configured to determine the target corrected text image from the at least one alternative corrected text image according to an evaluation result obtained by evaluating alternative corrected text image data of the at least one alternative corrected text image.
According to an embodiment of the present disclosure, the first obtaining module 420 may include a first obtaining sub-module, a first determining sub-module, and a second determining sub-module.
The first obtaining sub-module is configured to evaluate the alternative corrected text image data of the at least one alternative corrected text image to obtain respective evaluation values of the at least one alternative corrected text image.
The first determining sub-module is configured to determine a target evaluation value from the respective evaluation values of the at least one alternative corrected text image.
The second determining sub-module is configured to determine the alternative corrected text image corresponding to the target evaluation value as the target corrected text image.
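The three sub-modules above can be sketched as a short selection routine. This is a minimal illustration, not the disclosed implementation: the function name `select_target` and the `evaluate` callback are hypothetical, and the sketch assumes that the target evaluation value is the highest score, which the text does not state.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_target(candidates: Sequence[T], evaluate: Callable[[T], float]) -> T:
    """Pick the alternative corrected image associated with the target evaluation value.

    Assumption (not stated in the source): the target value is the maximum score.
    """
    if not candidates:
        raise ValueError("at least one alternative corrected text image is required")
    # First obtaining sub-module: one evaluation value per alternative image.
    scores = [evaluate(candidate) for candidate in candidates]
    # First determining sub-module: choose the target evaluation value.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    # Second determining sub-module: return the image that produced it.
    return candidates[best_index]
```

Any scoring function can be passed as `evaluate`; in the disclosure this role is played by the flat text image recognition model.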
According to an embodiment of the present disclosure, the first obtaining sub-module may include an obtaining unit.
The obtaining unit is configured to process the alternative corrected text image data of the at least one alternative corrected text image using a flat text image recognition model to obtain the respective evaluation values of the at least one alternative corrected text image. The flat text image recognition model is obtained by training a preset classifier with training samples. The training samples include sample flat text image data of a sample flat text image and sample distorted text image data of a sample distorted text image, and the sample distorted text image data is obtained by distorting the sample flat text image data.
According to an embodiment of the present disclosure, obtaining the sample distorted text image data by distorting the sample flat text image data may include: obtaining the sample distorted text image data, based on a predetermined transformation algorithm, from the position information of each of a plurality of first sample control points included in a first sample control point sequence and the position information of each of a plurality of second sample control points included in a second sample control point sequence corresponding to the sample flat text image. The first sample control point sequence is derived from the second sample control point sequence.
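One way to derive a first sample control point sequence from the second is to offset each point of the flat image by a small random amount. The sketch below assumes this random jitter; the text only says that the first sequence is derived from the second, so the derivation rule, the function name `perturb_control_points`, and the parameters `max_offset` and `seed` are all illustrative assumptions.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def perturb_control_points(
    flat_points: List[Point], max_offset: float, seed: int = 0
) -> List[Point]:
    """Derive distorted control points from the control points of a flat image.

    Assumption: each point is shifted independently by at most `max_offset`
    along each axis. The source does not specify how the derivation is done.
    """
    rng = random.Random(seed)  # seeded so that samples are reproducible
    return [
        (x + rng.uniform(-max_offset, max_offset),
         y + rng.uniform(-max_offset, max_offset))
        for x, y in flat_points
    ]
```

The two point sequences would then be passed to the predetermined transformation algorithm to produce the distorted sample image.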
According to an embodiment of the present disclosure, the first determination module 410 may include a third determination submodule and a fourth determination submodule.
The third determining sub-module is configured to determine a plurality of second alternative control point sequences from the boundary of the text region to be corrected of the text image to be corrected.
The fourth determining sub-module is configured to determine the at least one first alternative control point sequence from the plurality of second alternative control point sequences.
According to an embodiment of the present disclosure, the second alternative control point sequence includes a plurality of second alternative control points. The plurality of second alternative control points include M second alternative control points on a first boundary and M second alternative control points on a second boundary of the text region to be corrected along the reading direction. The second alternative control point with sequence number 0 represents the upper left corner point, the one with sequence number (M-1) represents the upper right corner point, the one with sequence number M represents the lower right corner point, and the one with sequence number (2M-1) represents the lower left corner point. M is an integer greater than or equal to 1.
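The corner numbering above can be written down directly as a function of M. The helper name `corner_indices` and the dictionary keys are hypothetical and serve only as a compact restatement of the convention.

```python
from typing import Dict


def corner_indices(m: int) -> Dict[str, int]:
    """Sequence numbers of the four corner points for M points per boundary."""
    if m < 1:
        raise ValueError("M must be an integer greater than or equal to 1")
    return {
        "upper_left": 0,
        "upper_right": m - 1,
        "lower_right": m,
        "lower_left": 2 * m - 1,
    }
```

For the 10-point sequences of fig. 3B (M = 5), this gives sequence numbers 0, 4, 5, and 9.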
According to an embodiment of the present disclosure, the fourth determination submodule may include a fifth determination submodule.
The fifth determining sub-module is configured to, for each second alternative control point sequence in the plurality of second alternative control point sequences, determine the second alternative control point sequence as a first alternative control point sequence when it is determined that the k-th line segment does not intersect the h-th line segment, according to the position information of the second alternative control point with sequence number k on the first boundary, the position information of the second alternative control point with sequence number (k + M) on the second boundary, the position information of the second alternative control point with sequence number h on the first boundary, and the position information of the second alternative control point with sequence number (h + M) on the second boundary. The k-th line segment is determined by the second alternative control points with sequence numbers k and (k + M). The h-th line segment is determined by the second alternative control points with sequence numbers h and (h + M). k and h are each an integer greater than or equal to 0 and less than or equal to (M-1), and k ≠ h.
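The non-intersection condition can be checked with a standard orientation-based segment test. The sketch below is one possible reading of the condition, not the disclosed implementation: the function names are hypothetical, segments that merely touch are treated as intersecting (an assumption), and the pairing of sequence number k with (k + M) is taken literally from the text.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Signed area of triangle (a, b, c); the sign gives the turn direction."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _in_box(a: Point, b: Point, c: Point) -> bool:
    """True if c lies inside the bounding box of segment a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Return True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    # Collinear or touching cases.
    return ((d1 == 0 and _in_box(q1, q2, p1)) or (d2 == 0 and _in_box(q1, q2, p2))
            or (d3 == 0 and _in_box(p1, p2, q1)) or (d4 == 0 and _in_box(p1, p2, q2)))


def is_first_alternative_sequence(points: List[Point], m: int) -> bool:
    """Keep a sequence only if no two segments (k, k + M) and (h, h + M) intersect."""
    if len(points) != 2 * m:
        raise ValueError("expected 2M control points")
    for k in range(m):
        for h in range(k + 1, m):
            if segments_intersect(points[k], points[k + m], points[h], points[h + m]):
                return False
    return True
```

Applying `is_first_alternative_sequence` to every second alternative control point sequence and keeping those for which it returns True yields the first alternative control point sequences.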
According to an embodiment of the present disclosure, the third determination submodule may include a second obtaining submodule and a third obtaining submodule.
The second obtaining sub-module is configured to adjust the respective sequence numbers of the plurality of second alternative control points multiple times based on a predetermined ordering strategy, to obtain a sequence of sequence numbers after each adjustment.
The third obtaining sub-module is configured to obtain each second alternative control point sequence according to the sequence of sequence numbers after each adjustment.
According to an embodiment of the present disclosure, the second obtaining sub-module may include a first adjusting unit, a second adjusting unit, and a third adjusting unit.
The first adjusting unit is configured to, for the i-th adjustment, adjust the sequence number of the second alternative control point whose initial sequence number is i to 0. i is an integer greater than or equal to 1 and less than or equal to (2M-1).
The second adjusting unit is configured to, when j-i > 0, adjust the sequence number of the second alternative control point whose initial sequence number is j to (j-i). j is an integer greater than or equal to 1 and less than or equal to (2M-1), and j ≠ i.
The third adjusting unit is configured to, when j-i < 0, adjust the sequence number of the second alternative control point whose initial sequence number is j to (j+2M-i).
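Taken together, the three adjusting units amount to a cyclic renumbering of the 2M control points. The following sketch applies the rules for every i from 1 to (2M-1). The function names are hypothetical, and the sketch assumes that the rule for j-i < 0 also covers the point whose initial sequence number is 0, which is what the sequences 303_1 and 303_10 of fig. 3B suggest.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def adjusted_number(j: int, i: int, total: int) -> int:
    """New sequence number of the point with initial number j in the i-th adjustment."""
    if j == i:
        return 0              # first adjusting unit
    if j - i > 0:
        return j - i          # second adjusting unit
    return j + total - i      # third adjusting unit (total = 2M)


def adjusted_sequences(points: Sequence[T]) -> List[List[T]]:
    """Return the 2M-1 reordered sequences, one per adjustment i = 1 .. 2M-1."""
    total = len(points)
    sequences = []
    for i in range(1, total):
        reordered = list(points)
        for j, point in enumerate(points):
            reordered[adjusted_number(j, i, total)] = point
        sequences.append(reordered)
    return sequences


labels = [f"303_{n}" for n in range(11, 21)]  # initial order, as in sequence 303_1
last = adjusted_sequences(labels)[-1]         # adjustment i = 9
```

Here `last` starts with 303_20 followed by 303_11 to 303_19, which matches the second alternative control point sequence 303_10 described above.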
According to an embodiment of the present disclosure, the first obtaining module may include a fourth obtaining submodule.
The fourth obtaining sub-module is configured to obtain, based on a predetermined transformation algorithm, the alternative corrected text image data of the at least one alternative corrected text image according to the respective position information of the plurality of first alternative control points included in the at least one first alternative control point sequence and the respective position information of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected.
According to an embodiment of the present disclosure, the predetermined transformation algorithm comprises a thin-plate spline interpolation algorithm.
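To illustrate how thin-plate spline interpolation turns two control point sequences into a coordinate mapping, the following is a minimal pure-Python sketch. It is not the disclosed implementation: the names `fit_tps`, `_kernel`, and `_solve` are hypothetical, the small Gaussian elimination stands in for a numerical library, and the mapping direction (from expected points to alternative points, as used for backward warping) is an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def _kernel(p: Point, q: Point) -> float:
    """Thin-plate radial basis U(r) = r^2 * log(r), with U(0) = 0."""
    r2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def _solve(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [[aug[r][n + c] / aug[r][r] for c in range(len(b[0]))] for r in range(n)]


def fit_tps(src: Sequence[Point], dst: Sequence[Point]) -> Callable[[Point], Point]:
    """Fit a 2D thin-plate spline that maps every src[i] exactly onto dst[i]."""
    n = len(src)
    if n != len(dst) or n < 3:
        raise ValueError("need at least 3 matching control point pairs")
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0, 0.0] for _ in range(size)]
    for i, p in enumerate(src):
        for j, q in enumerate(src):
            a[i][j] = _kernel(p, q)           # radial part K
        affine_row = (1.0, p[0], p[1])
        for c, value in enumerate(affine_row):
            a[i][n + c] = value               # affine part P
            a[n + c][i] = value               # side conditions P^T w = 0
        b[i] = [dst[i][0], dst[i][1]]
    coef = _solve(a, b)

    def transform(pt: Point) -> Point:
        out = []
        for c in range(2):
            value = coef[n][c] + coef[n + 1][c] * pt[0] + coef[n + 2][c] * pt[1]
            value += sum(coef[i][c] * _kernel(pt, p) for i, p in enumerate(src))
            out.append(value)
        return (out[0], out[1])

    return transform
```

In a rectification setting, `src` would hold the expected control points and `dst` the first alternative control points; evaluating the returned function at each pixel of the expected image gives the location to sample in the text image to be corrected.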
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method as described above.
Fig. 5 schematically shows a block diagram of an electronic device adapted to implement a text image rectification method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the electronic device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the electronic device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above, such as the text image rectification method. For example, in some embodiments, the text image rectification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the text image rectification method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the text image rectification method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A text image rectification method comprising:
determining at least one first alternative control point sequence from the boundary of a text region to be corrected of a text image to be corrected, wherein the first alternative control point sequence comprises a plurality of first alternative control points;
obtaining alternative corrected text image data of at least one alternative corrected text image according to the position information of each of the plurality of first alternative control points included in the at least one first alternative control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected; and
determining a target corrected text image from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.
2. The method of claim 1, wherein the determining a target rectified text image from the at least one alternative rectified text image according to an evaluation result of evaluating alternative rectified text image data of the at least one alternative rectified text image comprises:
evaluating the alternative corrected text image data of the at least one alternative corrected text image to obtain respective evaluation values of the at least one alternative corrected text image;
determining a target evaluation value from the respective evaluation values of the at least one alternative rectified text image; and
determining the alternative corrected text image corresponding to the target evaluation value as the target corrected text image.
3. The method of claim 1 or 2, wherein the evaluating the alternative rectified text image data of the at least one alternative rectified text image to obtain respective evaluated values of the at least one alternative rectified text image comprises:
processing the alternative corrected text image data of the at least one alternative corrected text image by using a flat text image recognition model to obtain respective evaluation values of the at least one alternative corrected text image,
the flat text image recognition model is obtained by training a preset classifier by using a training sample, wherein the training sample comprises sample flat text image data of a sample flat text image and sample distorted text image data of a sample distorted text image, and the sample distorted text image data is obtained by performing distortion processing on the sample flat text image data.
4. The method of claim 3, wherein the sample distorted text image data being obtained by performing distortion processing on the sample flat text image data comprises:
obtaining the sample distorted text image data, based on a predetermined transformation algorithm, from respective position information of a plurality of first sample control points included in a first sample control point sequence and respective position information of a plurality of second sample control points included in a second sample control point sequence corresponding to the sample flat text image, wherein the first sample control point sequence is obtained from the second sample control point sequence.
5. The method according to any one of claims 1 to 4, wherein the determining at least one first alternative control point sequence from the boundary of the text region to be corrected of the text image to be corrected comprises:
determining a plurality of second alternative control point sequences from the boundary of the text region to be corrected of the text image to be corrected; and
determining the at least one first alternative sequence of control points from the plurality of second alternative sequences of control points.
6. The method of claim 5, wherein the second sequence of alternative control points comprises a plurality of second alternative control points, the plurality of second alternative control points comprising M second alternative control points on a first boundary and M second alternative control points on a second boundary of the text region to be corrected in the reading direction;
the second alternative control point corresponding to the 0 th serial number represents an upper left corner point, the second alternative control point corresponding to the (M-1) th serial number represents an upper right corner point, the second alternative control point corresponding to the M th serial number represents a lower right corner point, and the second alternative control point corresponding to the (2M-1) th serial number represents a lower left corner point, wherein M is an integer greater than or equal to 1.
7. The method of claim 6, wherein the determining at least one first alternative sequence of control points from the plurality of second alternative sequences of control points comprises:
determining, for each of the plurality of second candidate control point sequences, a second candidate control point sequence as the first candidate control point sequence if it is determined that the k-th line segment does not intersect the h-th line segment based on the position information of the second candidate control point corresponding to the k-th sequence number on the first boundary, the position information of the second candidate control point corresponding to the (k + M) -th sequence number on the second boundary, the position information of the second candidate control point corresponding to the h-th sequence number on the first boundary, and the position information of the second candidate control point corresponding to the (h + M) -th sequence number on the second boundary,
the kth line segment is determined according to a second alternative control point corresponding to the kth serial number and a second alternative control point corresponding to the (k + M) th serial number;
the h line segment is determined according to a second alternative control point corresponding to the h serial number and a second alternative control point corresponding to the (h + M) serial number;
wherein k and h are each an integer of 0 or more and (M-1) or less and k ≠ h.
8. The method according to claim 6 or 7, wherein the determining a plurality of second alternative control point sequences from the boundaries of the text region to be corrected of the text image to be corrected comprises:
adjusting, based on a predetermined ordering strategy, the respective sequence numbers of the plurality of second alternative control points multiple times to obtain a sequence of sequence numbers after each adjustment; and
obtaining each second alternative control point sequence according to the sequence of sequence numbers after each adjustment.
9. The method according to claim 8, wherein the adjusting the respective sequence numbers of the plurality of second candidate control points a plurality of times based on a predetermined ordering policy to obtain a sequence number after each adjustment, comprises:
for the i-th adjustment, adjusting the sequence number of the second alternative control point whose initial sequence number is i to 0, wherein i is an integer greater than or equal to 1 and less than or equal to (2M-1);
when j-i > 0, adjusting the sequence number of the second alternative control point whose initial sequence number is j to (j-i), wherein j is an integer greater than or equal to 1 and less than or equal to (2M-1), and j ≠ i; and
when j-i < 0, adjusting the sequence number of the second alternative control point whose initial sequence number is j to (j+2M-i).
10. The method according to any one of claims 1 to 9, wherein the obtaining of the candidate corrected text image data of at least one candidate corrected text image according to the position information of each of the plurality of first candidate control points included in the at least one first candidate control point sequence and the position information of each of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected comprises:
obtaining, based on a predetermined transformation algorithm, the alternative corrected text image data of the at least one alternative corrected text image according to the respective position information of the plurality of first alternative control points included in the at least one first alternative control point sequence and the respective position information of the plurality of expected control points included in the expected control point sequence of the expected text image corresponding to the text image to be corrected.
11. The method of claim 10, wherein the predetermined transformation algorithm comprises a thin-plate spline interpolation algorithm.
12. A text image rectification apparatus comprising:
a first determining module, configured to determine at least one first alternative control point sequence from the boundary of a text region to be corrected of a text image to be corrected, wherein the first alternative control point sequence comprises a plurality of first alternative control points;
a first obtaining module, configured to obtain candidate corrected text image data of at least one candidate corrected text image according to respective position information of a plurality of first candidate control points included in the at least one first candidate control point sequence and respective position information of a plurality of expected control points included in an expected control point sequence of an expected text image corresponding to the text image to be corrected; and
a second determining module, configured to determine a target corrected text image from the at least one alternative corrected text image according to an evaluation result obtained by evaluating the alternative corrected text image data of the at least one alternative corrected text image.
13. The apparatus of claim 12, wherein the first obtaining module comprises:
a first obtaining sub-module, configured to evaluate the alternative corrected text image data of the at least one alternative corrected text image to obtain respective evaluation values of the at least one alternative corrected text image;
a first determining sub-module for determining a target evaluation value from the respective evaluation values of the at least one alternative rectified text image; and
a second determining sub-module configured to determine the candidate corrected text image corresponding to the target evaluation value as the target corrected text image.
14. The apparatus of claim 12 or 13, wherein the first obtaining submodule comprises:
an obtaining unit configured to process candidate corrected text image data of the at least one candidate corrected text image using a flat text image recognition model to obtain respective evaluation values of the at least one candidate corrected text image,
the flat text image recognition model is obtained by training a preset classifier by using a training sample, wherein the training sample comprises sample flat text image data of a sample flat text image and sample distorted text image data of a sample distorted text image, and the sample distorted text image data is obtained by performing distortion processing on the sample flat text image data.
15. The apparatus of claim 14, wherein the sample distorted text image data being obtained by performing distortion processing on the sample flat text image data comprises:
obtaining the sample distorted text image data, based on a predetermined transformation algorithm, from respective position information of a plurality of first sample control points included in a first sample control point sequence and respective position information of a plurality of second sample control points included in a second sample control point sequence corresponding to the sample flat text image, wherein the first sample control point sequence is obtained from the second sample control point sequence.
16. The apparatus of any of claims 12-15, wherein the first determining module comprises:
a third determining submodule, configured to determine a plurality of second candidate control point sequences from boundaries of a to-be-corrected text region of the to-be-corrected text image; and
a fourth determination submodule for determining the at least one first alternative sequence of control points from the plurality of second alternative sequences of control points.
17. The apparatus of claim 16, wherein each second alternative control point sequence comprises a plurality of second alternative control points, the plurality of second alternative control points comprising M second alternative control points on a first boundary and M second alternative control points on a second boundary of the text region to be corrected in the reading direction;
the second alternative control point corresponding to the 0th sequence number represents an upper-left corner point, the second alternative control point corresponding to the (M-1)-th sequence number represents an upper-right corner point, the second alternative control point corresponding to the M-th sequence number represents a lower-right corner point, and the second alternative control point corresponding to the (2M-1)-th sequence number represents a lower-left corner point, wherein M is an integer greater than or equal to 1.
18. The apparatus of claim 17, wherein the fourth determining submodule comprises:
a fifth determining submodule configured to, for each second alternative control point sequence of the plurality of second alternative control point sequences, determine the second alternative control point sequence as a first alternative control point sequence when it is determined that a k-th line segment and an h-th line segment do not intersect, the determination being made according to position information of the second alternative control point corresponding to the k-th sequence number on the first boundary, position information of the second alternative control point corresponding to the (k+M)-th sequence number on the second boundary, position information of the second alternative control point corresponding to the h-th sequence number on the first boundary, and position information of the second alternative control point corresponding to the (h+M)-th sequence number on the second boundary,
wherein the k-th line segment is determined according to the second alternative control point corresponding to the k-th sequence number and the second alternative control point corresponding to the (k+M)-th sequence number;
the h-th line segment is determined according to the second alternative control point corresponding to the h-th sequence number and the second alternative control point corresponding to the (h+M)-th sequence number; and
k and h are each an integer greater than or equal to 0 and less than or equal to (M-1), and k ≠ h.
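The non-intersection condition of claim 18 can be sketched with a standard orientation (cross-product) test. The code below follows the claim's pairing literally, joining point k to point k+M, and accepts a sequence only if no two such segments cross. The function names are hypothetical, and the test is a simplification: it detects proper crossings only, so segments that merely touch or overlap collinearly are treated as non-intersecting.

```python
def _orient(p, q, r):
    """Signed area of triangle pqr: >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segment ab and segment cd properly cross each other."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def is_first_alternative(points):
    """Check a 2M-point sequence: no segment (k, k+M) may cross (h, h+M)."""
    m = len(points) // 2
    for k in range(m):
        for h in range(k + 1, m):
            if segments_cross(points[k], points[k + m],
                              points[h], points[h + m]):
                return False
    return True
```

Checking every pair costs O(M²) segment tests per sequence, which is negligible for the small M typical of a text-line boundary.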
19. The apparatus of claim 17 or 18, wherein the third determining submodule comprises:
a second obtaining submodule configured to adjust the respective sequence numbers of the plurality of second alternative control points multiple times based on a preset ordering strategy, to obtain an adjusted sequence number sequence for each adjustment; and
a third obtaining submodule configured to obtain each second alternative control point sequence according to the adjusted sequence number sequence of the corresponding adjustment.
20. The apparatus of claim 19, wherein the second obtaining submodule comprises:
a first adjusting unit configured to, for the i-th adjustment, adjust the sequence number of the second alternative control point whose initial sequence number is i to 0, where i is an integer greater than or equal to 1 and less than or equal to (2M-1);
a second adjusting unit configured to adjust the sequence number of the second alternative control point whose initial sequence number is j to (j-i) when j-i > 0, where j is an integer greater than or equal to 1 and less than or equal to (2M-1) and j ≠ i; and
a third adjusting unit configured to adjust the sequence number of the second alternative control point whose initial sequence number is j to (j+2M-i) when j-i < 0.
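The three adjusting units of claim 20 together amount to a cyclic shift of the sequence numbers: the new number of the point initially numbered j is (j - i) mod 2M. A minimal sketch follows; `shifted_sequences` is a hypothetical name, and applying the same rule to the point initially numbered 0, which the claim's range for j does not cover, is an assumption.

```python
def shifted_sequences(points):
    """Return the 2M-1 renumbered sequences, one per adjustment i = 1..2M-1.

    For adjustment i, the point with initial number j receives the number
    (j - i) mod 2M, which equals j - i when j - i > 0, j + 2M - i when
    j - i < 0, and 0 when j == i.
    """
    n = len(points)  # n = 2M
    sequences = []
    for i in range(1, n):
        reordered = [None] * n
        for j, point in enumerate(points):
            reordered[(j - i) % n] = point
        sequences.append(reordered)
    return sequences
```

For four points labelled a, b, c, d, the first adjustment (i = 1) yields the order b, c, d, a, so each adjustment tries a different point as the upper-left corner.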
21. The apparatus of any one of claims 12-20, wherein the first obtaining module comprises:
a fourth obtaining submodule configured to obtain, based on a predetermined transformation algorithm, the alternative corrected text image data of the at least one alternative corrected text image according to respective position information of a plurality of first alternative control points included in the at least one first alternative control point sequence and respective position information of a plurality of expected control points included in an expected control point sequence of an expected text image corresponding to the text image to be corrected.
22. The apparatus of claim 21, wherein the predetermined transformation algorithm comprises a thin-plate spline interpolation algorithm.
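Claim 22 names thin-plate spline interpolation as one possible transformation algorithm. The sketch below fits a two-dimensional thin-plate spline that maps one set of control points exactly onto another, using only the Python standard library. The names `fit_tps`, `_u` and `_solve` are hypothetical, the code maps individual points rather than resampling a pixel grid, and it is an illustration of the general technique rather than the patent's implementation.

```python
import math


def _u(r2):
    """TPS radial basis U(r) = r^2 * log(r), given r^2; U(0) = 0."""
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate control points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + k] / m[i][i] for k in range(len(b[0]))]
            for i in range(n)]


def fit_tps(src, dst):
    """Return warp(x, y) such that warp(src[i]) == dst[i] (up to rounding).

    Needs at least three control points that are not all collinear.
    """
    n = len(src)
    size = n + 3  # n radial weights plus 3 affine coefficients
    a = [[0.0] * size for _ in range(size)]
    b = [[0.0, 0.0] for _ in range(size)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            a[i][j] = _u((xi - xj) ** 2 + (yi - yj) ** 2)
        for k, value in enumerate((1.0, xi, yi)):
            a[i][n + k] = value  # affine part P
            a[n + k][i] = value  # side conditions P^T w = 0
        b[i] = [float(dst[i][0]), float(dst[i][1])]
    coef = _solve(a, b)

    def warp(x, y):
        out = []
        for d in range(2):
            v = coef[n][d] + coef[n + 1][d] * x + coef[n + 2][d] * y
            for (px, py), c in zip(src, coef):
                v += c[d] * _u((x - px) ** 2 + (y - py) ** 2)
            out.append(v)
        return tuple(out)

    return warp
```

In an image correction setting, `src` would typically hold the first alternative control points and `dst` the expected control points (or the reverse when sampling backwards from the output grid); a production implementation would rely on a numerical library for the linear solve.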
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 11.
CN202210110162.5A 2022-01-28 2022-01-28 Text image correction method, text image correction device, electronic equipment and storage medium Pending CN114494686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210110162.5A CN114494686A (en) 2022-01-28 2022-01-28 Text image correction method, text image correction device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210110162.5A CN114494686A (en) 2022-01-28 2022-01-28 Text image correction method, text image correction device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114494686A true CN114494686A (en) 2022-05-13

Family

ID=81479238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210110162.5A Pending CN114494686A (en) 2022-01-28 2022-01-28 Text image correction method, text image correction device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114494686A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115187995A (en) * 2022-07-08 2022-10-14 北京百度网讯科技有限公司 Document correction method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767429A (en) * 2016-08-18 2018-03-06 阿里巴巴集团控股有限公司 Curve generation method and equipment
CN113255664A (en) * 2021-05-26 2021-08-13 北京百度网讯科技有限公司 Image processing method, related device and computer program product
WO2021208369A1 (en) * 2020-04-17 2021-10-21 嘉楠明芯(北京)科技有限公司 Image correction method and apparatus
CN113591528A (en) * 2021-02-05 2021-11-02 腾讯科技(深圳)有限公司 Document correction method, device, computer equipment and storage medium
US20210390296A1 (en) * 2020-06-16 2021-12-16 Beijing Baidu Netcom Science And Technology Co., Ltd. Optical character recognition method and apparatus, electronic device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767429A (en) * 2016-08-18 2018-03-06 阿里巴巴集团控股有限公司 Curve generation method and equipment
WO2021208369A1 (en) * 2020-04-17 2021-10-21 嘉楠明芯(北京)科技有限公司 Image correction method and apparatus
US20210390296A1 (en) * 2020-06-16 2021-12-16 Beijing Baidu Netcom Science And Technology Co., Ltd. Optical character recognition method and apparatus, electronic device and storage medium
CN113591528A (en) * 2021-02-05 2021-11-02 腾讯科技(深圳)有限公司 Document correction method, device, computer equipment and storage medium
CN113255664A (en) * 2021-05-26 2021-08-13 北京百度网讯科技有限公司 Image processing method, related device and computer program product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
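Tang Yabo; Xu Shoushi: "A new method for fast and precise correction of target positions in satellite remote sensing images", Journal of Remote Sensing (遥感学报), no. 06, 10 December 2005 (2005-12-10) *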

Similar Documents

Publication Publication Date Title
CN111986178A (en) Product defect detection method and device, electronic equipment and storage medium
CN113436100B (en) Method, apparatus, device, medium, and article for repairing video
CN112381183B (en) Target detection method and device, electronic equipment and storage medium
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN115063875B (en) Model training method, image processing method and device and electronic equipment
CN109377508B (en) Image processing method and device
CN111709428B (en) Method and device for identifying positions of key points in image, electronic equipment and medium
CN112308051B (en) Text box detection method and device, electronic equipment and computer storage medium
CN110516598B (en) Method and apparatus for generating image
CN110633717A (en) Training method and device for target detection model
CN114792355A (en) Virtual image generation method and device, electronic equipment and storage medium
CN113705362A (en) Training method and device of image detection model, electronic equipment and storage medium
CN113608805A (en) Mask prediction method, image processing method, display method and equipment
CN113657518A (en) Training method, target image detection method, device, electronic device, and medium
CN115311469A (en) Image labeling method, training method, image processing method and electronic equipment
CN115101069A (en) Voice control method, device, equipment, storage medium and program product
CN114494686A (en) Text image correction method, text image correction device, electronic equipment and storage medium
CN114495101A (en) Text detection method, and training method and device of text detection network
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN114119990A (en) Method, apparatus and computer program product for image feature point matching
CN113516697A (en) Image registration method and device, electronic equipment and computer-readable storage medium
CN116310356B (en) Training method, target detection method, device and equipment of deep learning model
CN115187995B (en) Document correction method, device, electronic equipment and storage medium
CN114511862B (en) Form identification method and device and electronic equipment
CN115564976A (en) Image processing method, apparatus, medium, and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination