CN105354571B - Distorted text image baseline estimation method based on curve projection - Google Patents
Publication number: CN105354571B (application number CN201510695611.7A)
Authority: CN (China)
Legal status: Expired (fee related). The legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
 G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
 G06V30/10—Character recognition
 G06V30/14—Image acquisition
 G06V30/146—Aligning or centring of the image pickup or image field
 G06V30/1475—Inclination or skew detection or correction of characters or of the image to be recognised
 G06V30/1478—Inclination or skew detection or correction of character lines

 G—PHYSICS
 G06—COMPUTING; CALCULATING OR COUNTING
 G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
 G06V10/00—Arrangements for image or video recognition or understanding
 G06V10/20—Image preprocessing
 G06V10/24—Aligning, centring, orientation detection or correction of the image
 G06V10/247—Aligning, centring, orientation detection or correction of the image by affine transforms, e.g. correction due to perspective effects; Quadrilaterals, e.g. trapezoids
Abstract
The invention discloses a distorted text image baseline estimation method based on curve projection. The method at least includes: extracting the edge image of a distorted text image; cutting the edge image into strip images; calculating the projection map of each strip image; estimating the optimal projection lines of each strip image according to its projection map; and obtaining the baselines of the distorted text image from the optimal projection lines and boundary lines of the strip images. The invention at least solves the technical problem of how to extract the curved baselines of a distorted text image.
Description
Technical Field
The embodiment of the invention relates to the technical field of digital image processing and computer vision, in particular to a distorted text image baseline estimation method based on curve projection.
Background
When a camera is used to photograph the pages of curved documents such as books and periodicals, the resulting images are often accompanied by severe geometric distortion due to the perspective effect of the camera and the curvature of the pages. Such geometric distortion causes serious problems in subsequent text image analysis tasks such as layout analysis and character recognition, and therefore it is usually necessary to first perform distortion correction on the distorted text image. One of the first problems involved is how to robustly and accurately extract the baselines of the curved text lines in the image.
Text line baselines are a family of invisible horizontal lines in a text image that are parallel to one another and along which the printed content of the document (e.g., text lines, charts, etc.) is aligned. For curved document pages, the family of baselines is generally no longer a family of straight lines but a family of curves. Moreover, due to the perspective effect of the camera, the text line baselines in the image are no longer parallel to one another. In addition, because of the complex and varied layouts of document pages and the ubiquitous presence of factors such as non-character interference, imaging noise, occlusion, low resolution, and character blurring, accurate and robust extraction of text line baselines from distorted text images is usually very challenging.
To estimate curved text line baselines, a common family of methods first segments the horizontal text lines in the image, then extracts a reference point for each character in a text line, and finally fits the reference points with a B-spline curve to obtain an estimate of the text line baseline. Depending on how the text lines are obtained, such methods can be further subdivided into text-line-tracking-based methods, connected-branch-clustering-based methods, and image-segmentation-based methods.
Early text line tracking was performed directly on binary images. Such a method first selects a connected branch in the binary image as the starting seed for tracking according to some strategy, and then grows the seed by searching for neighboring connected branches around it. It should be noted that connected branches often become stuck together due to factors such as image blur and low resolution; tracking at the level of connected branches is therefore often unstable and causes numerous tracking errors. Furthermore, this class of methods is very language-sensitive: most Chinese characters consist of many connected branches, so tracking on Chinese documents often produces erroneous text lines.
An improvement on the above is to perform text line tracking directly on the grayscale image. One filtering-based method applies a bank of anisotropic Gaussian filters to the grayscale image to extract the ridge lines of the text lines, and then tracks the obtained ridge lines to extract the text lines. Another improved method exploits the high similarity between image patches from the same text line: it introduces a self-similarity measure between image patches and builds a text line tracking algorithm on top of this measure. Text-line-tracking-based methods are typically very sensitive to image noise. In addition, non-textual objects and complex layout structures in text images often cause the tracking algorithm to fail.
The problem of extracting text lines can generally be viewed as a clustering problem over connected branches. Based on this view, a bottom-up method was recently proposed to segment text lines in handwritten Chinese document images. The method first constructs a distance measure between connected branches using supervised learning, then organizes all connected branches in the binary image into a tree structure using a minimum spanning tree, and finally dynamically prunes the minimum spanning tree to obtain the segmented text lines. In a similar vein, text line segmentation has also been posed as an energy minimization problem over the states of connected branches in the image, with a cost function measuring the interaction between text lines and text line curvature; the resulting optimization problem is solved with a graph-cut method to obtain the text line segmentation. Connected-branch-clustering-based methods are generally more robust than text-line-tracking-based methods. However, the large number of manually set parameters, heuristic merging rules, and inability to handle changes in connected-branch topology often lead to poor performance in practice.
Unlike the above-described methods, image-segmentation-based methods treat text line extraction as a classical image segmentation problem. In this vein, a text line segmentation method based on density estimation and image level sets has been proposed. One significant advantage of this class of methods is that it is language-independent and can therefore be applied to text images in different languages. Inspired by the seam carving technique, some researchers have applied seam carving directly to text line segmentation of text images, with good results. Like most image segmentation methods, however, image-segmentation-based methods have a significant limitation: the segmentation result is very sensitive to image noise, image resolution, and adhesion between characters, all of which are very common in text images captured by cameras (especially camera phones).
In view of the above, the present invention is particularly proposed.
Disclosure of Invention
The main aim of the embodiments of the invention is to provide a distorted text image baseline estimation method based on curve projection, which at least partially solves the technical problem of how to estimate the baselines of a distorted text image.
In order to achieve the above object, according to one aspect of the present invention, the following technical solutions are provided:
a distorted text image baseline estimation method based on curve projection, the method at least comprising:
extracting an edge image of the distorted text image;
cutting the edge image into strip images;
calculating a projection map of each strip image;
estimating the optimal projection lines of each strip image according to its projection map;
and obtaining the baselines of the distorted text image according to the optimal projection lines and boundary lines of the strip images.
Further, extracting the edge image of the distorted text image specifically includes:
Step 1: calculating an edge image of the input image using the Canny operator;
Step 2: performing a morphological closing operation and a removal operation on the edge image;
Step 3: performing a morphological dilation operation on the image obtained in Step 2.
Further, calculating the projection map of the strip image specifically includes:
calculating the Radon transform matrix (denoted R̃(ρ, θ)) corresponding to the strip image, and performing a coordinate transformation on R̃(ρ, θ) to obtain the projection map R(k, θ) corresponding to the strip image, wherein the coordinate transformation formula is as follows:
wherein,
H represents the height of the strip image;
k represents the row index of the strip image;
ρ represents the distance from the center of the strip image to the projection line;
θ represents the angle between the normal of a projection line of the strip image and the abscissa axis of the strip image;
α denotes the minimum angle between a projection line of the strip image and the abscissa axis;
β denotes the maximum angle between a projection line of the strip image and the abscissa axis.
Further, estimating the optimal projection lines of the strip image according to the projection map specifically includes:
constructing a constrained optimization problem on the projection map R(k, θ) and calculating the optimal projection lines of the strip image using a dynamic programming algorithm, wherein the constrained optimization problem is as follows:
wherein,
θ_k represents the optimal angle parameter of the projection line through the k-th (k = 1, …, H) row center point of the strip image;
p represents a power exponent parameter applied to the projection values;
λ represents a weight parameter;
φ(θ_1, …, θ_H) represents a smoothing term on the projection line angle parameters, used to smooth the angle parameters of adjacent projection lines, and is defined as follows:
wherein σ is a set parameter controlling the sensitivity of the smoothing term to the difference between the angles of adjacent projection lines.
Further, calculating the optimal projection lines of the strip image using a dynamic programming algorithm specifically includes:
constructing a weighted directed graph:
performing discrete sampling on the k and θ coordinates of the projection map R(k, θ) to obtain a series of grid points (k_s, θ^j) (1 ≤ s ≤ n, 1 ≤ j ≤ m) on the k-θ plane,
wherein k_s are the discrete sample points in the k direction,
θ^j are the discrete sample points in the θ direction,
n is the total number of sample points along the k coordinate,
m is the total number of sample points along the θ coordinate,
and taking the grid points as the vertices of the weighted directed graph;
constructing a directed edge between two vertices (k_{s-1}, θ^i) and (k_s, θ^j) if and only if their corresponding projection lines satisfy the non-intersection condition, wherein, denoting the angle of the projection line corresponding to vertex (k_{s-1}, θ^i) by θ^i, two adjacent projection lines do not intersect if and only if θ^j lies within the following interval:
wherein,
v_A represents the row coordinate of the intersection of the previous projection line with the left border of the strip image, and is calculated according to the following formula:
v_B represents the row coordinate of the intersection of the previous projection line with the right border of the strip image, and is calculated according to the following formula:
w represents the width of the strip image;
assigning to the directed edge connecting vertices (k_{s-1}, θ^i) and (k_s, θ^j) the following weight:
wherein,
p represents the specified power exponent;
Δ_k represents the sampling interval of the k coordinate of the projection map;
λ represents the weight parameter;
σ represents the set parameter controlling the sensitivity of the smoothing term to the difference between the angles of adjacent projection lines;
h represents the angle step and is calculated according to the following formula:
and solving for the longest path on the weighted directed graph.
Further, obtaining the baselines of the distorted text image according to the optimal projection lines and boundary lines of the strip images specifically includes:
for the optimal projection line through each point on the center line of the leftmost strip image, calculating the intersections of the optimal projection line with the left and right boundary lines of the strip image using the projection line equation, wherein, if a strip image overlaps its adjacent strip images, the center line of the overlapping region is taken as the boundary line of the strip image;
proceeding from left to right, taking the intersection of the optimal projection line of the previous strip image with its right boundary line as the starting point of the optimal projection line of the current strip image, and calculating the intersection of the optimal projection line of the current strip image with its right boundary line using the projection line equation;
repeating the above process until the rightmost strip image has been processed;
and approximating all the intersection points with a cubic spline curve to obtain the baseline.
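The left-to-right chaining of intersection points described above can be sketched as follows. This is a simplified illustration, not the patent's exact procedure: it assumes one optimal angle per strip for a single baseline, uses the line equation v_right = v_left + w · tan(θ), and omits the final cubic-spline approximation; all names and parameters are illustrative.

```python
import math

def chain_baseline(start_v, strip_widths, strip_angles_deg):
    """Starting from row coordinate start_v on the left border of the
    leftmost strip, propagate the baseline strip by strip: the optimal
    projection line of each strip starts where the previous one met the
    shared boundary.  Returns the row coordinates at each boundary."""
    points = [start_v]
    v = start_v
    for w, a in zip(strip_widths, strip_angles_deg):
        # intersection with the right boundary of the current strip
        v = v + w * math.tan(math.radians(a))
        points.append(v)
    return points
```

The returned boundary points would then be approximated with a cubic spline to yield the smooth baseline curve.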
Compared with the prior art, the above technical solution has at least the following beneficial effects:
The embodiments of the invention extract the edge map of an image of a curved document (such as a book), cut the edge map into strips, compute the projection map of each strip image using the Radon transform, and construct a constrained optimization problem on the projection map to compute the optimal projection lines of the strip image. Solving the optimization problem can be converted into an optimal path computation on a weighted directed graph, which can be solved quickly by a classical dynamic programming algorithm. Finally, the optimal projection lines obtained on the individual strip images are connected to obtain a complete baseline estimate for the image. The embodiments of the invention can be applied to extracting the baselines of curved document images and hence to the distortion correction of distorted book page images. The method does not depend on text line segmentation and offers low computational complexity, high accuracy, and wide applicability. It can be applied to high-quality correction of distorted documents such as book pages, with broad application prospects in document digitization, digital library construction, and the preservation of precious historical documents and books.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the abovedescribed advantages at the same time.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the means particularly pointed out in the written description and claims hereof as well as the appended drawings.
It should be noted that this summary is not intended to identify essential features of the claimed subject matter, nor is it intended to be used to determine the scope of the claimed subject matter. The claimed subject matter is not limited to addressing any or all of the disadvantages noted in the background.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention without limiting it. It is obvious that the drawings in the following description show only some embodiments, and that a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flow diagram illustrating a distorted text image baseline estimation method based on curvilinear projection according to an exemplary embodiment;
FIG. 2 is a schematic diagram illustrating a strip image coordinate system definition and projection line estimation in accordance with an exemplary embodiment;
FIG. 3a is an edge map of a strip image shown in accordance with an exemplary embodiment;
FIG. 3b shows the result of the Radon transform of a strip image according to an exemplary embodiment;
FIG. 3c is a projection map of a strip image obtained from the Radon transform, according to an exemplary embodiment;
FIG. 4a is a schematic diagram illustrating the construction of the weighted directed graph in accordance with an exemplary embodiment;
FIG. 4b is a diagram illustrating the computation of the non-intersection constraint between adjacent projection lines in accordance with an exemplary embodiment;
FIG. 5a is a diagram illustrating the result of computing the optimal projection lines on a strip projection map, according to an exemplary embodiment;
FIG. 5b illustrates the optimal projection lines on a strip image according to an exemplary embodiment;
FIG. 5c is a diagram illustrating the projection histogram of a strip image computed along a fixed image direction in accordance with an exemplary embodiment;
FIG. 5d is a diagram illustrating the projection histogram of a strip image computed along the optimal projection lines in accordance with an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating the connection of strip baselines in accordance with an exemplary embodiment;
FIG. 7 is a diagram illustrating baseline estimates obtained on multiple distorted text images using a method according to an embodiment of the invention.
These drawings and the description are not intended to limit the scope of the present invention in any way, but rather to illustrate the inventive concept to those skilled in the art by reference to specific embodiments.
Detailed Description
The technical problems solved, the technical solutions adopted, and the technical effects achieved by the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of this application without inventive effort fall within the scope of protection of the invention. The embodiments of the invention can be embodied in many different ways as defined and covered by the claims.
It should be noted that in the following description, numerous specific details are set forth in order to provide an understanding. It may be evident, however, that the subject invention may be practiced without these specific details.
It should be noted that, unless explicitly defined or conflicting, the embodiments and technical features in the present invention may be combined with each other to form a technical solution.
In order to solve the problem of extraction of a curved baseline of a distorted text image, the embodiment of the invention provides a distorted text image baseline estimation method based on curve projection. FIG. 1 is a flow diagram illustrating a distorted text image baseline estimation method based on curvilinear projection according to an exemplary embodiment. As shown in fig. 1, the method may include at least steps S100 to S108.
S100: an edge image of the distorted text image is extracted.
In this step, the edge map of the input image is first computed using the Canny operator; the edge map then undergoes a morphological closing operation and a removal operation; finally, a morphological dilation operation is applied to the resulting image. The Canny operator is a multi-stage edge detection algorithm comprising denoising, computing the brightness gradient of the image, tracing edges in the image, and parameter adjustment.
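The morphological post-processing in this step can be sketched as below. This is a minimal NumPy illustration under stated assumptions: the input is already a binary edge map (the Canny step and the removal of small components are omitted), a 3×3 square structuring element is assumed, and all function names are the author's own, not from the patent.

```python
import numpy as np

def dilate(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary dilation with a k x k square structuring element:
    each output pixel is the max over its k x k neighbourhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant")
    out = np.zeros_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def erode(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary erosion: min over the k x k neighbourhood
    (out-of-image pixels treated as foreground)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=1)
    out = np.ones_like(img)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out

def close_then_dilate(edge: np.ndarray) -> np.ndarray:
    """Morphological closing (dilation then erosion) followed by a
    final dilation, mirroring steps 2-3 of the edge-extraction stage."""
    return dilate(erode(dilate(edge)))
```

The closing bridges small gaps between nearby edge fragments (e.g., broken character strokes), which is why it precedes the final dilation.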
S102: the edge image is sliced into strip images.
In this step, the resulting edge image is cut into a number of mutually overlapping vertical strip images according to the image size, the height of each strip image being equal to the height of the image.
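The slicing into overlapping vertical strips can be sketched as follows; the strip width and overlap are illustrative parameters, since the patent does not fix their values.

```python
import numpy as np

def cut_into_strips(edge: np.ndarray, strip_w: int, overlap: int):
    """Cut an edge image into vertical strips of width strip_w that
    overlap horizontally by `overlap` pixels; the strip height equals
    the image height.  Returns (strip, left_column) pairs.
    Requires overlap < strip_w so the window advances."""
    h, w = edge.shape
    step = strip_w - overlap
    strips = []
    left = 0
    while left < w:
        right = min(left + strip_w, w)
        strips.append((edge[:, left:right], left))
        if right == w:
            break
        left += step
    return strips
```

Each strip is then processed independently in the following steps; the overlap regions provide the shared boundary lines used later when chaining the per-strip baselines.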
S104: a projection view of the strip image is calculated.
In this step, for each strip image, its Radon transform matrix (denoted R̃(ρ, θ)) is calculated, where ρ and θ are the parameters of a projection line under the Radon transform: ρ is the distance from the center of the strip image to the projection line, and θ is the angle between the normal of the projection line and the x coordinate axis of the strip image. A coordinate transformation is applied to the Radon transform matrix R̃(ρ, θ), and the resulting projection map of the strip image is denoted R(k, θ), wherein the coordinate transformation formula is as follows:
wherein,
H represents the height of the strip image;
k represents the row index of the strip image;
ρ represents the distance from the center of the strip image to the projection line;
θ represents the angle between the normal of a projection line of the strip image and the abscissa axis of the strip image;
α denotes the minimum angle between a projection line of the strip image and the abscissa axis;
β denotes the maximum angle between a projection line of the strip image and the abscissa axis.
By way of example, FIG. 2 presents a schematic diagram of the computation of the strip projection map. For each strip image, two coordinate systems are established on the strip: an image coordinate system uov and a projection coordinate system xoy. The origin of the image coordinate system is located at the upper-left corner of the strip image, with the u and v axes parallel to the column and row directions of the image, respectively; the origin of the projection coordinate system is located at the center of the strip image, with the x and y axes parallel to the u and v axes, respectively. The projection map of the strip image consists of the projection values of the image along multiple directions through each point on the center line of the strip image. In FIG. 2, p is a point on the center line of the strip image. To estimate the optimal projection lines of the strip, the optimal projection line through each point on the center line of the strip is determined. By way of example, FIG. 3a shows an edge map of a strip image according to an exemplary embodiment; FIG. 3b shows the result of the Radon transform of the strip image; FIG. 3c shows the projection map of the strip image obtained from the Radon transform.
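The projection map can be illustrated directly, without going through the Radon matrix, by summing pixels along each candidate line through each center-line point. The sketch below is a simplified stand-in with nearest-neighbour sampling; the angle set and sampling scheme are illustrative, not the patent's.

```python
import math
import numpy as np

def projection_map(strip: np.ndarray, angles_deg):
    """R[k, j] = sum of strip pixels along the line through the k-th
    center-line point at angle angles_deg[j] to the horizontal."""
    h, w = strip.shape
    cx = (w - 1) / 2.0                     # column of the center line
    R = np.zeros((h, len(angles_deg)))
    for k in range(h):
        for j, a in enumerate(angles_deg):
            t = math.tan(math.radians(a))
            total = 0.0
            for u in range(w):
                # row reached at column u by a line of slope t through (cx, k)
                v = int(round(k + (u - cx) * t))
                if 0 <= v < h:
                    total += strip[v, u]
            R[k, j] = total
    return R
```

A horizontal row of edge pixels produces a sharp maximum at the angle aligned with it, which is exactly the signal the optimization in the next step exploits.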
S106: from the projection map, the optimal projection line of the strip image is estimated.
In this step, a constrained optimization problem is constructed on the projection map R(k, θ), and all the optimal projection lines of the strip image are calculated using a dynamic programming algorithm. The constructed constrained optimization problem is as follows:
wherein,
θ_k represents the optimal angle parameter of the projection line through the k-th (k = 1, …, H) row center point of the strip image; these are the parameters to be estimated;
p represents a power exponent parameter applied to the projection values; preferably, p ≥ 3;
λ represents a weight parameter;
φ(θ_1, …, θ_H) represents a smoothing term on the projection line angle parameters, used to smooth the angle parameters of adjacent projection lines, and is defined as follows:
wherein σ is a set parameter controlling the sensitivity of the smoothing term to the difference between the angles of adjacent projection lines.
Here, dynamic programming refers to the following: each decision depends on the current state, and in turn causes a state transition. As a decision sequence is generated through the changing states, this process of multistage optimal decision making is called dynamic programming.
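As a generic illustration of such a multistage decision process (a toy example under assumed inputs, unrelated to the patent's specific angle parameters and weights), consider choosing one value per stage so as to maximize the total per-stage reward plus a smoothness bonus between consecutive choices:

```python
def dp_best_sequence(rewards, smooth_bonus):
    """rewards[s][j]: reward of choice j at stage s.
    smooth_bonus(i, j): bonus (or penalty) for moving from choice i
    to choice j between consecutive stages.
    Returns (best total value, best choice sequence)."""
    n_stages, n_choices = len(rewards), len(rewards[0])
    best = list(rewards[0])            # best[j]: best value ending at choice j
    back = [[None] * n_choices]        # backpointers per stage
    for s in range(1, n_stages):
        new_best, ptr = [], []
        for j in range(n_choices):
            cands = [best[i] + smooth_bonus(i, j) for i in range(n_choices)]
            i_star = max(range(n_choices), key=lambda i: cands[i])
            new_best.append(cands[i_star] + rewards[s][j])
            ptr.append(i_star)
        best, back = new_best, back + [ptr]
    j = max(range(n_choices), key=lambda jj: best[jj])
    seq = [j]
    for s in range(n_stages - 1, 0, -1):   # walk the backpointers
        j = back[s][j]
        seq.append(j)
    return max(best), seq[::-1]
```

In the patent's setting, the stages correspond to the sampled rows k_s, the choices to the sampled angles θ^j, and the smoothness bonus to the σ-controlled smoothing term.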
In an optional embodiment, when the optimal projection lines are calculated using the dynamic programming algorithm, a weighted directed graph first needs to be constructed. The specific steps are as follows:
Firstly, discrete sampling is performed on the k and θ coordinates of the projection map R(k, θ) to obtain a series of grid points (k_s, θ^j) (1 ≤ s ≤ n, 1 ≤ j ≤ m) on the k-θ plane, as shown in FIG. 4a, wherein k_s is a discrete sample point in the k direction, θ^j is a discrete sample point in the θ direction, n is the total number of sample points along the k coordinate, and m is the total number of sample points along the θ coordinate. These grid points are taken as the vertices of the constructed weighted directed graph.
Second, the edges of the weighted directed graph are connected. A directed edge is constructed between two vertices (k_{s-1}, θ^i) and (k_s, θ^j) if and only if their corresponding projection lines satisfy the non-intersection condition. For convenience, the angle of the projection line corresponding to vertex (k_{s-1}, θ^i) is denoted by the same symbol as the vertex. Two adjacent projection lines do not intersect if and only if θ^j lies within the following interval:
wherein,
v_A represents the row coordinate of the intersection of the previous projection line with the left border of the strip image, and is calculated according to the following formula:
v_B represents the row coordinate of the intersection of the previous projection line with the right border of the strip image, and is calculated according to the following formula:
w represents the width of the strip image;
and finally, calculating the weight of the weighted directed graph edge. For connecting verticesAnda directed edge of (2), to which a weight is given
Wherein,
p represents the specified power exponent;
Δ_k represents the sampling interval of the k coordinate of the projection map;
λ represents the weight parameter;
σ represents the set parameter controlling the sensitivity of the smoothing term to the difference between the angles of adjacent projection lines;
h represents the angle step, and is calculated according to the following formula:
after the weighted directed graph is obtained, the constraint optimization problem is solved
And converting into the longest path planning problem on the constructed weighted directed graph. For this purpose, a virtual start point and an end point are added to the left and right of the weighted directed graph, respectively, where the start point is connected to all the leftmost vertices of the weighted directed graph, the end point is connected to all the rightmost vertices of the weighted directed graph, and the weights of all the connected edges are set to zero. Thus, the constraint optimization problem is converted to solve a longest path from a left starting point of the weighted directed graph to a right end point of the weighted directed graph. The problem is solved by a classical longest path problem, and can be quickly solved by a classical algorithm such as Dijkstra.
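The conversion to a longest-path computation can be sketched as follows: a layered acyclic graph processed in layer order by dynamic programming, with the virtual source and sink handled implicitly (zero-weight initialization of the first layer, and a final maximum over the last layer). The weights and labels below are illustrative, not the patent's edge weights.

```python
def longest_path_layered(layers, edge_weight):
    """layers: list of lists of vertex labels, one list per layer.
    edge_weight(u, v): weight of edge u -> v between adjacent layers,
    or None if the edge is absent.  Assumes every vertex on the best
    path is reachable.  Returns (best value, best vertex sequence)."""
    # virtual source: every first-layer vertex starts with value 0
    best = {v: (0.0, [v]) for v in layers[0]}
    for prev, cur in zip(layers, layers[1:]):
        new_best = {}
        for v in cur:
            cands = []
            for u in prev:
                w = edge_weight(u, v)
                if w is not None and u in best:
                    cands.append((best[u][0] + w, best[u][1] + [v]))
            if cands:
                new_best[v] = max(cands, key=lambda c: c[0])
        best = new_best
    # virtual sink: pick the best last-layer vertex
    return max(best.values(), key=lambda c: c[0])
```

Because each vertex is relaxed exactly once per layer, the running time is linear in the number of edges, which is what makes the per-strip optimization fast.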
As an example, fig. 4a gives a schematic diagram of constructing the weighted directed graph. The k and θ coordinates of the projection map R(k, θ) are discretely sampled to obtain a series of grid points on the k-θ plane, and these grid points serve as the vertices of the directed graph. When constructing directed edges, edges are added only between adjacent vertices that satisfy the disjointness constraint. As an example, fig. 4b shows a schematic diagram of checking the disjointness constraint between two adjacent projection lines. To ensure that two adjacent projection lines do not intersect, the current projection line must lie in an angular region determined by the previous projection line and the current point. In fig. 4b, L denotes the center line of the strip image, k_{s-1} and k_s are two adjacent points on that center line, and the straight line AB passes through k_{s-1} and intersects the left and right boundaries of the strip image at points A and B. As the figure shows, for the line through k_s not to intersect the line AB, the line through k_s must lie within the angular region defined by the three points A, B, and k_s. As an example, fig. 5a shows the result of computing the optimal projection lines on a strip projection map, which corresponds to computing an optimal path from left to right on the constructed weighted directed graph. Fig. 5b shows the corresponding optimal projection lines on the strip image. As can be seen from fig. 5b, these optimal projection lines follow the baselines of the strip image. Note that the baselines of a strip image are generally not parallel, owing to image distortion and camera perspective effects. Figs. 5c and 5d show the projection histograms of the strip image computed along the fixed image direction and along the optimal projection lines, respectively.
It can be seen that the projection histogram obtained along the optimal projection lines has distinct peaks and exhibits no aliasing between adjacent text lines.
S108: and obtaining a base line of the distorted text image according to the optimal projection line and the boundary line of the strip image.
In this step, for the leftmost strip image, the intersection points of each optimal projection line (one through each point on the strip's center line) with the left and right boundaries of the strip are computed from the projection-line equation; if the strip overlaps its neighbouring strip, the center line of the overlapping region is taken as the boundary line between the strips. Proceeding from left to right, the intersection of the previous strip's optimal projection line with its right boundary is taken as the starting point of the current strip's optimal projection line, and the intersection of that line with the current strip's right boundary is computed from the strip projection-line equation. This process is repeated until the rightmost strip image has been processed. Finally, all the intersection points are approximated with a cubic spline curve to obtain a smooth baseline.
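The final stitching step can be sketched with SciPy's cubic-spline interpolation. The coordinates below are illustrative values only, standing in for the intersection points collected left to right across the strips; they are not data from the patent.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Stitch per-strip baseline segments into one smooth baseline.
# xs: column coordinates of the projection-line/boundary intersections,
# ys: corresponding row coordinates, collected left to right across strips.
xs = np.array([0.0, 50.0, 100.0, 150.0, 200.0])
ys = np.array([10.0, 12.5, 11.0, 9.0, 10.5])   # illustrative values only
baseline = CubicSpline(xs, ys)

# The spline passes through every intersection point exactly, and can be
# evaluated densely to draw the smooth baseline.
assert np.allclose(baseline(xs), ys)
dense = baseline(np.linspace(0.0, 200.0, 401))
```

`CubicSpline` fits a piecewise-cubic interpolant, so the baseline is C2-smooth between intersection points, which matches the "smooth base line" the text asks for.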
As an example, fig. 6 gives a schematic diagram of the baseline connection of adjacent strip images. In the figure, S_{i-1}, S_i and S_{i+1} denote three adjacent strip images that overlap pairwise. When the baselines are connected, the starting point of each baseline segment is placed on the center line of the overlapping region between strips.
To verify the embodiment of the invention, baseline extraction was performed on a number of actually photographed text images with different distorted shapes. As an example, fig. 7 illustrates the baselines estimated for several distorted text images using the method provided by the embodiment of the invention. The first row shows the original distorted text images; it can be seen that these images have different distorted shapes and different layout structures, and some of them also contain margin noise introduced during imaging. The second row shows the image baselines extracted using the method provided by the embodiments of the present invention. The third row shows the projection histograms of the images projected along the extracted baselines. It can be seen that the projection histograms have distinct peaks, and the peaks and valleys are well separated, which facilitates subsequent layout segmentation and analysis. The fourth row shows the extracted baselines on locally enlarged images. It can be seen that the image baselines extracted by the embodiment of the invention are highly accurate, and the extracted baselines fit the curved text lines well, providing good characteristic lines for subsequent image distortion correction.
While the steps in this embodiment are described as being performed in the above sequence, those skilled in the art will appreciate that, in order to achieve the effect of this embodiment, the steps may not be performed in such a sequence, and may be performed simultaneously or in a reverse sequence, and these simple changes are all within the scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail above. Although specific examples have been employed herein to illustrate the principles and practice of the invention, the foregoing descriptions of embodiments are merely provided to assist in understanding the principles of embodiments of the invention; also, it will be apparent to those skilled in the art that variations may be made in the embodiments and applications of the invention without departing from the spirit and scope of the invention.
It should be noted that: the numerals in the drawings are only for the purpose of illustrating the invention more clearly and are not to be construed as unduly limiting the scope of the invention.
The terms "comprises," "comprising," or any other similar term are intended to cover a nonexclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/device. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other elements in a process, method, article, or apparatus/device that comprises the element, i.e., the meaning of "comprising a" does not exclude the meaning of "comprising another".
The various steps of the present invention may be implemented on general-purpose computing devices; for example, they may be centralized on a single computing device (such as a personal computer, server computer, handheld or portable device, tablet-type device, or multiprocessor apparatus) or distributed over a network of computing devices. They may be performed in a different order than shown or described herein, or implemented as separate integrated circuit modules, or multiple modules or steps among them may be implemented as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The methods provided by the present invention may be implemented using programmable logic devices, or as computer program software or program modules (including routines, programs, objects, components, data structures, etc.) that perform particular tasks or implement particular abstract data types, for example as a computer program product which, when executed, causes a computer to perform the methods described herein. The computer program product includes a computer-readable storage medium having computer program logic or code portions embodied in the medium for performing the method. The computer-readable storage medium may be a built-in medium installed in the computer or a removable medium detachable from the computer main body (e.g., a storage device using hot-plug technology). The built-in media include, but are not limited to, rewritable non-volatile memory such as RAM, ROM, flash memory, and hard disks. The removable media include, but are not limited to: optical storage media (e.g., CD-ROMs and DVDs), magneto-optical storage media (e.g., MOs), magnetic storage media (e.g., magnetic tapes or removable disks), media with built-in rewritable non-volatile memory (e.g., memory cards), and media with built-in ROMs (e.g., ROM cartridges).
The present invention is not limited to the abovedescribed embodiments, and any variations, modifications, or alterations that may occur to one skilled in the art without departing from the spirit of the invention fall within the scope of the invention.
While there has been shown, described, and pointed out detailed description of the basic novel features of the invention as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the system may be made by those skilled in the art without departing from the spirit of the invention.
Claims (5)
1. A distorted text image baseline estimation method based on curve projection is characterized by at least comprising the following steps:
extracting an edge image of the distorted text image;
cutting the edge image into strip images;
calculating a projection view of the strip image;
estimating an optimal projection line of the strip image according to the projection diagram;
obtaining a base line of the distorted text image according to the optimal projection line and the boundary line of the strip image, specifically:
calculating the intersection point of the optimal projection line and the left and right boundaries of the strip image by using a projection line equation for the optimal projection line passing through each point on the central line of the leftmost strip image of the image, wherein if the strip image is overlapped with the adjacent strip images, the central line of the overlapped part is selected as the boundary line of the strip image;
from left to right, taking the intersection point of the optimal projection line of the previous strip image and the right boundary thereof as the starting point of the optimal projection line of the current strip image, and calculating the intersection point of the optimal projection line of the strip image and the right boundary of the strip image by utilizing a strip projection line equation;
repeating the above processes until the calculation of the strip image positioned at the rightmost side of the image is completed;
and approximating all the intersection points by utilizing a cubic spline curve so as to obtain a base line.
2. The method for estimating a baseline of a distorted text image based on curved projection as claimed in claim 1, wherein the extracting the edge image of the distorted text image specifically comprises:
step 1: calculating an edge image of the input image by using a Canny operator;
step 2: performing morphological closing operation and removing operation on the edge image;
and step 3: and (3) performing morphological dilation operation on the image obtained in the step (2).
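As an illustrative (non-claim) sketch, the three steps of claim 2 might look as follows. The patent specifies the Canny operator; since that operator is not reproduced here, a simple gradient-magnitude threshold stands in for it, and the threshold, structuring elements, and minimum component size are all assumed values.

```python
import numpy as np
from scipy import ndimage

def edge_map(img, thresh=0.2, min_size=5):
    """Sketch of claim 2's pre-processing: (1) edge detection (gradient
    magnitude standing in for the Canny operator), (2) morphological closing
    plus removal of small connected components, (3) morphological dilation."""
    gy, gx = np.gradient(img.astype(float))
    edges = np.hypot(gx, gy) > thresh                       # step 1 (stand-in for Canny)
    closed = ndimage.binary_closing(edges, structure=np.ones((3, 3)))  # step 2a
    labels, n = ndimage.label(closed)
    if n == 0:
        return np.zeros_like(closed, dtype=bool)
    sizes = np.asarray(ndimage.sum(closed, labels, range(1, n + 1)))
    keep = np.isin(labels, 1 + np.flatnonzero(sizes >= min_size))      # step 2b
    return ndimage.binary_dilation(keep, structure=np.ones((3, 3)))    # step 3
```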
3. A distorted text image baseline estimation method based on curved projection as claimed in claim 1, wherein said calculating the projection view of the strip image specifically comprises:
calculating a Radon transform matrix corresponding to the strip image, and performing a coordinate transformation on the matrix to obtain the projection map R(k, θ) corresponding to the strip image, wherein the coordinate transformation formula is as follows:
wherein,
the H represents the height of the stripe image;
the k represents the row index of the strip image;
the ρ represents the distance from the center of the strip image to the projection line;
the theta represents an included angle between a normal of a projection line of the strip image and an abscissa axis of the strip image;
α represents the minimum angle between the projection line of the strip image and the axis of abscissa;
the β represents the maximum angle between the projection line of the strip image and the abscissa axis.
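As an illustrative (non-claim) sketch of the projection map in claim 3: since the exact Radon transform and (ρ, θ) → (k, θ) coordinate-change formulas are not reproduced in this text, the sketch below computes R(k, θ) directly, by summing the strip's pixels along the line through each center-line point (k, w/2) at angle θ. The geometry (center line at column w/2, nearest-neighbour sampling) is assumed.

```python
import numpy as np

def projection_map(strip, thetas):
    """Direct computation of a projection map R(k, theta): for each row k on
    the strip's center line and each candidate angle theta, sum the pixels of
    the strip along the line through (k, w/2) with angle theta to the abscissa
    axis, using nearest-neighbour row sampling."""
    H, w = strip.shape
    cols = np.arange(w)
    R = np.zeros((H, len(thetas)))
    for j, th in enumerate(thetas):
        # Row coordinate of each line at every column, for all k at once.
        rows = np.arange(H)[:, None] + (cols[None, :] - w / 2.0) * np.tan(th)
        ri = np.rint(rows).astype(int)
        valid = (ri >= 0) & (ri < H)                 # ignore samples outside the strip
        vals = strip[np.clip(ri, 0, H - 1), cols[None, :]] * valid
        R[:, j] = vals.sum(axis=1)
    return R
```

A horizontal line of text pixels then produces a single sharp peak in R(·, 0), which is exactly the behaviour the optimal-projection-line search exploits.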
4. A distorted text image baseline estimation method based on curved projection as claimed in claim 3, wherein said estimating the optimal projection line of the strip image according to the projection map specifically comprises:
constructing a constraint optimization problem on a projection graph R (k, theta), and calculating an optimal projection line of the strip image by using a dynamic programming algorithm; wherein the constraint optimization problem is as follows:
wherein,
the θ_k represents the optimal included-angle parameter corresponding to the projection line through the center point of the k-th row of the strip image, where k = 1, …, H;
the p represents a power exponent parameter of the projection value;
said λ represents a weight parameter;
the Φ(θ_1, …, θ_H) represents the smoothing term for the projection-line included-angle parameters, used to smooth the included-angle parameters of adjacent projection lines, and is defined as follows:
wherein, the σ is a set parameter used for controlling the sensitivity of the smoothing term to the angle difference between adjacent projection lines.
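As an illustrative (non-claim) sketch of evaluating the objective in claim 4: the exact forms of the data term and of Φ are not reproduced in this text, so the sketch assumes a data term summing R(k, θ_k)^p over the rows and a Gaussian smoothing term over adjacent-angle differences, matching the listed symbols p, λ, and σ.

```python
import numpy as np

def objective(R, theta_idx, thetas, p=2.0, lam=1.0, sigma=1.0):
    """Hypothetical objective for one assignment of angles to rows.
    R: sampled projection map (rows: k, columns: theta samples);
    theta_idx: chosen theta-sample index per row; thetas: sampled angles."""
    data = sum(R[k, theta_idx[k]] ** p for k in range(len(theta_idx)))
    d = np.diff(thetas[theta_idx])                     # adjacent-angle differences
    phi = np.exp(-d ** 2 / (2.0 * sigma ** 2)).sum()   # rewards smooth angle sequences
    return data + lam * phi
```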
5. The method for estimating a baseline of a distorted text image based on curved projection as claimed in claim 4, wherein the calculating the optimal projection line of the strip image by using a dynamic programming algorithm specifically comprises:
a weighted directed graph is constructed that is,
discretely sampling the k coordinate and the θ coordinate of the projection map R(k, θ) to obtain a series of grid points (k_s, θ^j), 1 ≤ s ≤ n, 1 ≤ j ≤ m, on the k-θ plane;
wherein, the k_s are discrete sampling points in the k direction,
the θ^j are discrete sampling points in the θ direction,
the n is the total number of sampling points in the k coordinate,
the m is the total number of sampling points in the θ coordinate,
taking the grid points as vertexes of a weighted directed graph;
if and only if the projection lines corresponding to two vertices (k_{s-1}, θ^i) and (k_s, θ^j) satisfy the disjointness condition, constructing a directed edge between the two vertices; denoting the projection-line angle corresponding to vertex (k_{s-1}, θ^i) by that vertex's symbol, two adjacent projection lines, through k_{s-1} at angle θ^i and through k_s at angle θ^j, do not intersect if and only if θ^j lies within the following interval:
wherein,
the v_A represents the row coordinate of the intersection of the previous projection line with the left border of the strip image and is calculated according to the following formula:
the v_B represents the row coordinate of the intersection of the previous projection line with the right border of the strip image and is calculated according to the following formula:
the w represents the width of the strip image;
for the directed edge connecting vertices (k_{s-1}, θ^i) and (k_s, θ^j), assigning a weight to the edge according to the following formula,
Wherein,
the p represents a specified power exponent;
said Δ_{k}Representing a sampling interval for k coordinates of the projection view;
said λ represents a weight parameter;
the σ represents a set parameter used for controlling the sensitivity of the smoothing term to the angle difference between adjacent projection lines;
the h represents an angle step and is calculated according to the following formula:
solving for the longest path on the weighted directed graph.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201510695611.7A CN105354571B (en)  20151023  20151023  Distortion text image baseline estimation method based on curve projection 
Publications (2)
Publication Number  Publication Date 

CN105354571A CN105354571A (en)  20160224 
CN105354571B true CN105354571B (en)  20190205 
Family
ID=55330538
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201510695611.7A Expired  Fee Related CN105354571B (en)  20151023  20151023  Distortion text image baseline estimation method based on curve projection 
Country Status (1)
Country  Link 

CN (1)  CN105354571B (en) 
Families Citing this family (6)
Publication number  Priority date  Publication date  Assignee  Title 

CN105842728B (en) *  20160324  20181130  东华理工大学  Digitize the pulse base estimation method in nuclear spectrum measurement system 
CN107730511B (en) *  20170920  20201027  北京工业大学  Tibetan historical literature text line segmentation method based on baseline estimation 
CN107845058A (en) *  20170928  20180327  成都大熊智能科技有限责任公司  A kind of method that threedimensionalreconstruction based on edge line realizes projection distortion correction 
CN112241411B (en) *  20201023  20220726  湖南省交通规划勘察设计院有限公司  Spreadsheet structured identification and extraction method based on CAD basic elements 
CN113298054B (en) *  20210727  20211008  国际关系学院  Text region detection method based on embedded spatial pixel clustering 
CN113901904A (en) *  20210929  20220107  北京百度网讯科技有限公司  Image processing method, face recognition model training method, device and equipment 
Citations (4)
Publication number  Priority date  Publication date  Assignee  Title 

CN101149801A (en) *  20071023  20080326  北京大学  Complex structure file image inclination quick detection method 
CN101192269A (en) *  20061129  20080604  佳能株式会社  Method and device for estimating vanishing point from image, computer program and its storage medium 
CN102156884A (en) *  20110425  20110817  中国科学院自动化研究所  Straight segment detecting and extracting method 
US20140140627A1 (en) *  20121120  20140522  Hao Wu  Image rectification using sparselydistributed local features 

Legal Events
Date  Code  Title  Description

C06  Publication
PB01  Publication
C10  Entry into substantive examination
SE01  Entry into force of request for substantive examination
GR01  Patent grant
CF01  Termination of patent right due to non-payment of annual fee (granted publication date: 20190205)